- Report this article
Matthew Groff
Matthew Groff
Principal AI Engineer @ Umbrage, Part of Bain & Company | AI Capability Lead
Published Feb 18, 2024
+ Follow
PDFs are everywhere, but getting information out of them can be tough, especially when they're packed with charts and tables. That's why I started using OpenAI's GPT-4 Vision to make things easier by converting PDFs into Markdown, a format that's much simpler for computers to read.
Traditional tools for pulling text from PDFs are hit or miss. They might miss important details, especially if the PDF has lots of visuals. This inconsistency is a big problem when you're trying to understand or use the information in those PDFs.
Markdown is great for this because it's straightforward and structured, making it easy for AI to understand. OpenAI even uses Markdown to talk to ChatGPT, which shows how useful it is.
Here's what I did: First, I turned each page of the PDF into an image. This way, I didn't lose anything, like charts or images, that I might miss if I just tried to pull out the text. Then, I used GPT-4 Vision to read those images and turn them into Markdown text. GPT-4 Vision is smart enough to handle complex layouts and visuals, so I ended up with Markdown that kept the original PDF's content and structure.
Recommended by LinkedIn
I wrapped all this up into a few Python scripts to automate the process. There's one script to turn the PDF into images, another to convert those images to Markdown with GPT-4 Vision, and a third to clean up the Markdown and get rid of anything we don't need, like placeholder images or page numbers. There's even an optional script that puts all the cleaned-up Markdown into one document.
This method isn't perfect, but it's a big step forward in making PDFs more accessible and easier to work with. Manually converting PDFs to Markdown by hand isn't realistic on a large scale, and just pulling out the text and chopping it up into chunks isn't enough, especially if you're missing out on important visual information.
Check out the GitHub repo for the scripts I mentioned. I hope this method helps you see the potential of AI in making it easier to work with PDFs and other documents. Feel free to reach out to me on LinkedIn if you have questions or want to chat about it.
Like
Celebrate
Support
Love
Insightful
Funny
32
3 Comments
Taitan Nguyen
Know Your Data | Discover Opportunities | Deliver Value
2mo
- Report this comment
Thanks! You mentioned the conversion is not perfect but I am curious if you have a measure of how well the resulting Markdown data compared to the original PDFs?
1Reaction
Matthew Groff
Principal AI Engineer @ Umbrage, Part of Bain & Company | AI Capability Lead
3mo
- Report this comment
More details on my blog site https://groff.dev/blog/ingesting-pdfs-with-gpt-vision
1Reaction 2Reactions
See more comments
To view or add a comment, sign in
Sign in
Stay updated on your professional world
Sign in
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
New to LinkedIn? Join now
Insights from the community
- Software Engineering How can AI software systems be designed to resist adversarial examples?
- Artificial Intelligence How can you choose the right debugging tool for an AI app?
- Call Center Administration What are the best ways to handle customer questions about machine learning?
- Public Administration What do you do if you want to leverage artificial intelligence and machine learning in public administration?
- Algorithms You want to build a recommendation engine in Julia. What are the best tools to use?
- Software Engineering How can software engineers improve transparency in AI systems?
- Machine Learning What are the best ways to ensure transferability in machine learning?
- Technological Innovation How do you select the right AI and ML techniques for your data?
- Data Science What are effective techniques for labeling and annotating data?
- Programming How can you differentiate between machine learning and artificial intelligence?
Others also viewed
- Machine Learning Orientation for Motivated Non-Coders: A Half-Day of Reading Larry O'Brien 1y
- AI Showdown Google Bard vs. OpenAI GPT4 vs. AI21 Jurassic-2: Coding Round 1 Victor Tai 1y
- OpenAI's GPT Store - The Latest and Greatest GPTs Sarah Huard 4mo
- OpenAI DevDay 2023 Highlights Ganapathy Shankar 7mo
- Fine-tuning GPT-3.5 Turbo: A short intro for software engineers artiqode 9mo
- OpenAI Doubles Down on Agent Behavior and Hosts First Devday David Norris 7mo
- OpenAI Playground Aris Ihwan 9mo
- Two-Minute Recap of OpenAI DevDay + Insights Andrei Puni 7mo
- GPT-4 is here and my takeaways! Tao Guo 1y
- "Strategic Moves Catapulted This GPT To The Top Of OpenAI's Charts!" Orren Prunckun 5mo