Can GPT-4 Transform Image-Rich PDFs into Dynamic HTML Websites?
The short answer is no, not yet. The longer answer is a little more complex. Let’s get into it.
The Power of GPT-4
GPT-4, the fourth iteration of the ground-breaking large language model, has evolved to possess even more advanced natural language processing and content generative capabilities, making it an ideal tool for transforming complex documents into interactive websites through its understanding of both human and computer languages.
While we’ve tinkered around with a variety of other prediction models, GPT-4 is the most ubiquitous amongst our team and thus the one we decided to put to the test.
The Challenge
Turn The Lore of Entelect PDF document into a functional HTML site.
Some background for context. The Lore of Entelect is a meaty whack of a document. It’s a beautiful thing and gives readers an excellent overview of our history and culture. But it was also originally printed as a coffee table-style book. Image-heavy doesn’t begin to cover it.
While it now exists as an online version because of pandemic times (take a look at it here), we do want to turn it into a website for tracking and mobile purposes.
Software Engineer, Kreason Naidoo, bravely took on the challenge and describes the process and results below.
In Kreason’s words:
As we know, PDF documents are designed to preserve precise layouts and formatting, which can be lost when transforming them into a different format. We wanted to see if GPT-4 could analyse and interpret the underlying data in this particular vector-rich PDF, unlocking its potential for an engaging online experience. Equally, we wanted to be able to track its performance with heatmaps and analytics.
The Conversion Process
We knew that converting a vector-rich PDF into an HTML website using GPT-4 would involve a multi-step process that combines text extraction, content understanding, and dynamic rendering.
Two paths - The First Prompt
We asked GPT-4 the following: Convert my PDF to a static HTML document.
It's important to note here that I had GPT-4 beta plugins active, with the Notable plugin running.
This is what it suggested
GPT realised it had access to Notable and set up a Jupyter Notebook with Python code to convert the PDF to images. GPT then gave me a link to use to upload The Lore pdf. Once it detected the PDF it ran the Python code and took the images extracted and embedded them in an HTML webpage.
This strategy, while effective, lacks any kind of indexing or styling that we are accustomed to in websites. The fascinating part is that GPT was able to behave autonomously, only asking me for the input data it needed.
The Second Prompt
I asked GPT-4 the following: I have a complicated PDF and want to make it a website, how do I do this? This time I did not enable any beta plugins.
This is what it suggested
In this instance, GPT provided a series of instructions for us to follow. It provided multiple options for strategies that would achieve the goal.
First, it recommended that I convert the PDF to Word via Adobe Acrobat, and then export it to PowerPoint. This method worked well at first glance, but upon further inspection, I realised that the spacing between words was inconsistent. And upon looking at the generated HTML code, it was obvious that it was not at all readable.
It also suggested that I could use Adobe InDesign if I had the original InDesign file (which I did). Specifically, it pointed me to an undocumented feature in InDesign and a custom script that would use the feature to export from InDesign to HTML. I had never used InDesign before and told GPT this. GPT then provided step-by-step instructions on how to run the custom script and background info on how it works and who authored it. This was amazing as it allowed me to use advanced features that typically require months of experience to master, in a few minutes.
The limitations
The biggest hurdle was that the code was unreadable to humans. So while I had output code, it was such a mess that we would’ve had to spend more time fixing it than if we’d just coded the site ourselves.
While the Beta version of GPT-4 can action items itself via plugins, it is still limited by the complexity of the problem. Had we had a simpler PDF, GPT with plugins would have completed this task easily.
Observations
- You can produce dirty code but this might still be useful for doing a prototype when you want to move quickly.
- When we first started using GPT-4, like many others, we saw it as a magical solution to all our problems. But now we're realising that it's not here to do everything for us. Instead, it's teaching us how to accomplish tasks in ways we wouldn’t know without it. It's like having an 'expert' by our side, giving us advice and guidance rather than doing all the work itself.
- Plugins are a game changer. GPT is getting more autonomous, and through integrations with online environments, is quickly moving from a helpful advisor to a companion, who, for the time being, is not confident enough to start pressing the buttons on its own. Soon it might be.
Conclusion
GPT isn’t ready to create rich and dynamic content on its own yet, but it can run solutions to simpler versions of this problem on its own and prompt us with multiple possible strategies to do it ourselves.
Something else! We made an intelligent chatbot!
While I waded through The Lore content and realised how long it was, I wondered if it was possible to specialise and limit the knowledge base of GPT-4 for our own needs.
So I wrote a service that enables GPT to find and read a specific document/knowledge base and then answer questions related to that base, in the same way you would have a conversation with an expert on an obscure topic.
We’re busy developing the app now, and of course, we need to add a front-end chat-bot style Q&A feature to this. But watch this space – it will be out soon.
So there are two paths that were taken here, the first was to see if GPT could do it on its own with the aid of plugins, the second was to ask it to give me steps to follow.