Ingesting PDFs using OpenAI GPT-4 Vision (2024)

Ingesting PDFs using OpenAI GPT-4 Vision (1)

  • Report this article

Matthew Groff Ingesting PDFs using OpenAI GPT-4 Vision (2)

Matthew Groff

Principal AI Engineer @ Umbrage, Part of Bain & Company | AI Capability Lead

Published Feb 18, 2024

+ Follow

PDFs are everywhere, but getting information out of them can be tough, especially when they're packed with charts and tables. That's why I started using OpenAI's GPT-4 Vision to make things easier by converting PDFs into Markdown, a format that's much simpler for computers to read.

Traditional tools for pulling text from PDFs are hit or miss. They might miss important details, especially if the PDF has lots of visuals. This inconsistency is a big problem when you're trying to understand or use the information in those PDFs.

Markdown is great for this because it's straightforward and structured, making it easy for AI to understand. OpenAI even uses Markdown to talk to ChatGPT, which shows how useful it is.

Here's what I did: First, I turned each page of the PDF into an image. This way, I didn't lose anything, like charts or images, that I might miss if I just tried to pull out the text. Then, I used GPT-4 Vision to read those images and turn them into Markdown text. GPT-4 Vision is smart enough to handle complex layouts and visuals, so I ended up with Markdown that kept the original PDF's content and structure.

Recommended by LinkedIn

OpenAI Dev Day: What got announced, and what it means Simon Smith 7 months ago
What is Auto-GPT and why does it matter? Ana L. 1 year ago

I wrapped all this up into a few Python scripts to automate the process. There's one script to turn the PDF into images, another to convert those images to Markdown with GPT-4 Vision, and a third to clean up the Markdown and get rid of anything we don't need, like placeholder images or page numbers. There's even an optional script that puts all the cleaned-up Markdown into one document.

This method isn't perfect, but it's a big step forward in making PDFs more accessible and easier to work with. Manually converting PDFs to Markdown by hand isn't realistic on a large scale, and just pulling out the text and chopping it up into chunks isn't enough, especially if you're missing out on important visual information.

Check out the GitHub repo for the scripts I mentioned. I hope this method helps you see the potential of AI in making it easier to work with PDFs and other documents. Feel free to reach out to me on LinkedIn if you have questions or want to chat about it.

Like
Comment

32

3 Comments

Taitan Nguyen

Know Your Data | Discover Opportunities | Deliver Value

2mo

  • Report this comment

Thanks! You mentioned the conversion is not perfect but I am curious if you have a measure of how well the resulting Markdown data compared to the original PDFs?

Like Reply

1Reaction

Matthew Groff

Principal AI Engineer @ Umbrage, Part of Bain & Company | AI Capability Lead

3mo

  • Report this comment
Like Reply

1Reaction 2Reactions

See more comments

To view or add a comment, sign in

Sign in

Stay updated on your professional world

Sign in

By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.

New to LinkedIn? Join now

Insights from the community

  • Software Engineering How can AI software systems be designed to resist adversarial examples?
  • Artificial Intelligence How can you choose the right debugging tool for an AI app?
  • Call Center Administration What are the best ways to handle customer questions about machine learning?
  • Public Administration What do you do if you want to leverage artificial intelligence and machine learning in public administration?
  • Algorithms You want to build a recommendation engine in Julia. What are the best tools to use?
  • Software Engineering How can software engineers improve transparency in AI systems?
  • Machine Learning What are the best ways to ensure transferability in machine learning?
  • Technological Innovation How do you select the right AI and ML techniques for your data?
  • Data Science What are effective techniques for labeling and annotating data?
  • Programming How can you differentiate between machine learning and artificial intelligence?

Others also viewed

  • Machine Learning Orientation for Motivated Non-Coders: A Half-Day of Reading Larry O'Brien 1y
  • AI Showdown Google Bard vs. OpenAI GPT4 vs. AI21 Jurassic-2: Coding Round 1 Victor Tai 1y
  • OpenAI's GPT Store - The Latest and Greatest GPTs Sarah Huard 4mo
  • OpenAI DevDay 2023 Highlights Ganapathy Shankar 7mo
  • Fine-tuning GPT-3.5 Turbo: A short intro for software engineers artiqode 9mo
  • OpenAI Doubles Down on Agent Behavior and Hosts First Devday David Norris 7mo
  • OpenAI Playground Aris Ihwan 9mo
  • Two-Minute Recap of OpenAI DevDay + Insights Andrei Puni 7mo
  • GPT-4 is here and my takeaways! Tao Guo 1y
  • "Strategic Moves Catapulted This GPT To The Top Of OpenAI's Charts!" Orren Prunckun 5mo

Explore topics

  • Sales
  • Marketing
  • Business Administration
  • HR Management
  • Content Management
  • Engineering
  • Soft Skills
  • See All
Ingesting PDFs using OpenAI GPT-4 Vision (2024)
Top Articles
Latest Posts
Article information

Author: Corie Satterfield

Last Updated:

Views: 6400

Rating: 4.1 / 5 (42 voted)

Reviews: 81% of readers found this page helpful

Author information

Name: Corie Satterfield

Birthday: 1992-08-19

Address: 850 Benjamin Bridge, Dickinsonchester, CO 68572-0542

Phone: +26813599986666

Job: Sales Manager

Hobby: Table tennis, Soapmaking, Flower arranging, amateur radio, Rock climbing, scrapbook, Horseback riding

Introduction: My name is Corie Satterfield, I am a fancy, perfect, spotless, quaint, fantastic, funny, lucky person who loves writing and wants to share my knowledge and understanding with you.