When working with extremely large PDF documents (e.g., 1,000+ pages), you may run into the upper limit of the context window allowed by the LLM. There are a few recommended approaches for handling documents in this case:
The first technique is to use prompt caching to retain intermediate outputs or reference previously processed chunks without reprocessing them. This speeds up multi-step workflows, reduces redundancy when handling very large documents, and has the added benefit of lowering costs. Note that prompt caching is not available for every LLM, but it is commonly supported in recent versions of the most popular models.
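As a minimal sketch of the idea, the snippet below caches per-chunk results on the client side, keyed by a hash of the chunk, so each unique chunk is only sent to the model once. The `call_llm` parameter is a hypothetical stand-in for your own wrapper around a model API; provider-side prompt caching (where supported) achieves a similar effect by caching the prompt prefix on the server.

```python
import hashlib
from typing import Callable

# Client-side cache of per-chunk analysis results.
# Keyed by a SHA-256 hash of the chunk text.
_cache: dict[str, str] = {}

def cached_analysis(chunk: str, call_llm: Callable[[str], str]) -> str:
    """Return the LLM's result for `chunk`, calling the model
    only the first time this exact chunk is seen."""
    key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(chunk)  # pay for each unique chunk once
    return _cache[key]
```

Repeated calls with the same chunk return instantly from the cache, which matters most when a multi-step workflow revisits the same sections of a large document.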
Another technique is to divide the PDF document into logical chunks (e.g., by section or page range), run your prompt on each chunk individually, and then merge the results. This approach preserves key information while drastically reducing the token load of any single prompt.
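This chunk-then-merge pattern can be sketched as a simple map-reduce over page text. The `analyze` parameter below is a hypothetical placeholder for your LLM call; the merge step here is a plain join, though in practice a final LLM call could consolidate the partial results.

```python
from typing import Callable

def map_reduce_pages(pages: list[str], analyze: Callable[[str], str],
                     pages_per_chunk: int = 100) -> str:
    """Split pages into fixed-size chunks, run `analyze` on each
    chunk, and merge the per-chunk results into one output."""
    chunks = [
        "\n".join(pages[i:i + pages_per_chunk])
        for i in range(0, len(pages), pages_per_chunk)
    ]
    partial_results = [analyze(chunk) for chunk in chunks]
    # Merge step: a simple join; a final summarization call
    # over `partial_results` is a common refinement.
    return "\n---\n".join(partial_results)
```

Chunking by section boundaries rather than fixed page counts usually preserves more coherence, at the cost of a little extra parsing work up front.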
Lastly, retrieval-augmented generation (RAG) may be a good option if documents don’t fit within your LLM’s context window. RAG enables the LLM to leverage specific, external data sources. Rather than using the entire PDF in the context of your prompt, RAG will retrieve only the relevant information from the document to merge with your prompt.
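To make the retrieval step concrete, here is a deliberately minimal sketch that scores document chunks by word overlap with the question and keeps only the top-k for the prompt. All function names are illustrative; a production RAG system would typically use embeddings and a vector store rather than keyword overlap.

```python
def retrieve_top_k(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank chunks by how many question words they share,
    returning the k best matches."""
    q_words = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str, chunks: list[str], k: int = 3) -> str:
    """Assemble a prompt containing only the retrieved chunks,
    not the entire document."""
    context = "\n\n".join(retrieve_top_k(question, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Only the retrieved context reaches the model, so the prompt stays small even when the source document is far larger than the context window.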
On their own, or combined, these approaches enable scalable analysis of large PDF documents while maintaining the LLM’s performance.
This response has been generated by an LLM based on notes from PJMF technical consultations. All responses go through human review by our PJMF Products & Services team and are anonymized to protect our consultation participants.