Last updated 5/28/2025
LLM Document Parser
This Blueprint provides a locally runnable pipeline for parsing structured data from scanned or digital documents using open-source OCR and LLMs. It takes one or more documents in image and/or PDF format as input and returns a single structured object containing the fields parsed from those documents. By defining a prompt and a data model, you tell the Blueprint which fields to parse and what they should look like.
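To illustrate how a prompt and a data model work together, here is a minimal sketch that uses only the Python standard library. The `Transaction` and `BankStatement` classes, the `build_prompt` and `parse_llm_output` helpers, and the assumption that the LLM replies with JSON are all hypothetical and are not taken from the Blueprint's code; the Blueprint itself may define and validate its data model differently.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class Transaction:
    """Hypothetical schema for one transaction row."""
    date: str         # kept as printed on the statement, e.g. "01/31"
    description: str
    amount: float     # assumed convention: negative for debits


@dataclass
class BankStatement:
    """Hypothetical top-level object returned for the parsed documents."""
    transactions: list[Transaction] = field(default_factory=list)


def build_prompt(ocr_text: str) -> str:
    # Derive the expected keys from the data model so prompt and schema stay in sync.
    keys = ", ".join(f.name for f in fields(Transaction))
    return (
        "Extract every transaction from the bank statement text below.\n"
        "Respond only with JSON containing a 'transactions' list; "
        f"each item must have the keys: {keys}.\n\n"
        f"{ocr_text}"
    )


def parse_llm_output(raw: str) -> BankStatement:
    # Coerce the LLM's JSON into the data model; a missing key raises KeyError.
    data = json.loads(raw)
    return BankStatement(
        transactions=[
            Transaction(
                date=str(item["date"]),
                description=str(item["description"]),
                amount=float(item["amount"]),
            )
            for item in data["transactions"]
        ]
    )


# Usage with a hand-written reply standing in for real LLM output.
prompt = build_prompt("01/31 COFFEE SHOP -4.50")
reply = '{"transactions": [{"date": "01/31", "description": "COFFEE SHOP", "amount": -4.50}]}'
statement = parse_llm_output(reply)
print(statement.transactions[0].amount)  # -4.5
```

Deriving the prompt's key list from the data model means that adding a field to `Transaction` automatically updates what the LLM is asked to return.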
The example use case, parsing transaction data from bank statements, demonstrates how you can pass in multiple documents with differing formats and extract shared fields (transaction amount, description, and date). The transactions from every document are compiled into one object. This Blueprint can be customized to work with any type of document to fit your needs.
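Because every document is parsed into the same shared fields, one straightforward way to compile the results is to concatenate the per-document lists into a single object. The sketch below assumes each document has already been parsed into a list of dictionaries; the `merge_documents` function, the `ParsedResult` class, and the extra `source` field are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    date: str
    description: str
    amount: float
    source: str  # hypothetical extra field: which input document the row came from


@dataclass
class ParsedResult:
    transactions: list[Transaction] = field(default_factory=list)


def merge_documents(per_document: dict[str, list[dict]]) -> ParsedResult:
    """Compile transactions parsed from several documents into one object.

    `per_document` maps a document name to the transaction dicts extracted
    from it (a hypothetical intermediate format).
    """
    result = ParsedResult()
    for doc_name, rows in per_document.items():
        for row in rows:
            result.transactions.append(
                Transaction(
                    date=row["date"],
                    description=row["description"],
                    amount=float(row["amount"]),
                    source=doc_name,
                )
            )
    return result


merged = merge_documents({
    "january.pdf": [{"date": "01/31", "description": "RENT", "amount": -1200}],
    "february.png": [
        {"date": "02/02", "description": "SALARY", "amount": 3000},
        {"date": "02/05", "description": "GROCERIES", "amount": -85.2},
    ],
})
print(len(merged.transactions))  # 3
```

Keeping a `source` field is optional, but it makes it easy to trace any row in the combined object back to the document it came from.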
Preview this Blueprint in action
Hosted demo
Step-by-step walkthrough
Tools used to create
Trusted open-source tools used for this Blueprint
Choices
Insights into our motivations and key technical decisions throughout the development process.
Ready? Try it yourself!
System Requirements
OS: Windows, macOS, or Linux
Python: 3.10 or higher
RAM: 8 GB minimum
Disk space: 6 GB minimum
GPU: optional; a GPU enables the use of more powerful LLMs (4 GB+ of VRAM recommended)
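A quick pre-flight check for two of these requirements can be written with the Python standard library alone. The `check_requirements` helper below is hypothetical and not part of the Blueprint; it checks only the Python version and free disk space, because querying RAM and GPU VRAM portably requires third-party packages or vendor tools.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)
MIN_FREE_DISK_GB = 6  # taken from the stated minimum disk space


def check_requirements(path: str = ".") -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = sys.version.split()[0]
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    # Free space on the filesystem containing `path`, using 1024**3 bytes per GB.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < MIN_FREE_DISK_GB:
        problems.append(f"Need {MIN_FREE_DISK_GB} GB free disk space, found {free_gb:.1f} GB")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) if issues else "Python version and disk space requirements met.")
```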
Help Documentation
Detailed guidance on GitHub that walks you through installing this project.
Discussion Points
Get involved in improving the Blueprint by visiting the issues page of its GitHub repository.
Explore Blueprint Extensions
See examples of extended Blueprints that unlock new capabilities and of adjusted configurations that enable tailored solutions, or try it yourself.
Want to build your own Blueprints?
See our guidelines for building a top-notch Blueprint.
Must-haves
Use of open-source models and tools
README, pyproject.toml, and organized folder structure
Demo app (Streamlit or Gradio) or Jupyter notebook
Config file for easy customization
CLI support
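The config-file and CLI must-haves often work together: the config file supplies defaults and command-line flags override them. A minimal sketch of that pattern with only the standard library follows. JSON is used for the config purely to keep the example dependency-free, and the `DEFAULTS` keys, the `example-model` name, and the flags are hypothetical rather than taken from any existing Blueprint.

```python
from __future__ import annotations

import argparse
import json
from pathlib import Path

# Hypothetical defaults; a real Blueprint would define its own settings.
DEFAULTS = {"model": "example-model", "output": "parsed.json"}


def load_config(path: str | None) -> dict:
    # Start from the defaults, then apply any values from the config file.
    config = dict(DEFAULTS)
    if path:
        config.update(json.loads(Path(path).read_text()))
    return config


def main(argv: list[str] | None = None) -> dict:
    parser = argparse.ArgumentParser(description="Parse documents with an LLM (sketch).")
    parser.add_argument("documents", nargs="+", help="image or PDF files to parse")
    parser.add_argument("--config", help="path to a JSON config file")
    parser.add_argument("--model", help="override the model set in the config")
    args = parser.parse_args(argv)

    config = load_config(args.config)
    if args.model:
        config["model"] = args.model  # CLI flags take precedence over the config file
    config["documents"] = args.documents
    # A real CLI would now hand `config` to the parsing pipeline (not shown).
    return config
```

For example, `main(["a.pdf", "b.png", "--model", "other"])` keeps the default output path but swaps in the model given on the command line.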
Nice-to-haves
CPU compatibility for most local setups
Google Colab notebook option
PyPI package availability
Dockerfile for the demo app
Diagram of the Blueprint in the README
Setup and guidance docs built with MkDocs