Query structured documents using a lightweight LLM workflow

Last updated

1/23/2025

Get started

Document

Q&A

Made by

Mozilla.ai

Query structured documents using a lightweight LLM workflow

This Blueprint demonstrates how to use open-source models and a simple LLM workflow to answer questions based on structured documents.

It is designed to showcase a simpler alternative to more complex and/or resource demanding alternatives, such as RAG systems that rely on vectorDBs and/or long-context models with large token windows.

If you encounter any issues with the hosted demo below, try the Blueprint in the GPU-enabled Google Colab Notebook available here.

‍

Time

10 min

Complexity

Low

Medium

High

Status

Stable

Contributors

Tags

Text-to-Text

Local AI

License

Apache 2.0

Preview this Blueprint in action

Hosted demo

Hosted Demo

Drag the corner to resize

Step by step walkthrough

Tools used to create

Trusted open source tools used for this Blueprint

PyMuPDF

Use pymupdf4llm to convert the document into markdown and then split into sections.

Llama.cpp

Use llama.cpp to load GGUF-type models, enabling efficient question answering using text-to-text.

Streamlit

Use Streamlit to build an interactive app to query your strcutured documents.

Choices

Insights into our motivations and key technical decisions throughout the development process.

Focus

Decision

Rationale

Alternatives Considered

Trade-offs

Focus

Decision

Rationale

Alternatives Considered

Trade-offs

Overall Motivation

Build a local-friendly, Q&A system for structured docs (e.g. rulebooks).

Enables structured document retrieval without relying on closed APIs, or embeddings generation/VectorDB set-up.

Full Context API calls, standard RAG.

Performance gap compared to Full Context API solutions.

Document Pre-processing

Used PyMuPDF4LLM for section extraction.

Extracts structured sections for retrieval-based answers.

Docling - Lack of heading hierarchy affected performance. Marker - Slower, required additional model.

Struggles with visually complex layouts, impacting accuracy.

Question answering workflow

Find, Retrieve, Answer workflow.

Simpler than RAG and requires smaller context window than full-context window methods.

Agentic retrieval - added to much complexity to the workflow.

Relies on preprocessing quality and quality of section titles.

Model Selection

Qwen2.5-7B-Instruct.

Runs on accessible hardware while maintaining reasonable performance.

DeepSeek R1 distilled models (COT lowered accuracy), 1.5B-3B range models (lowered accuracy).

Slight reduction in accuracy compared to larger models.

Deployment Options

Supports Codespaces, local CLI, local Streamlit app, Google Colab, HF Spaces.

Flexible for diverse environments and compute needs.

Single-path setups like only local.

Maintaining multiple pathways adds complexity for updates and consistency.

Ready? Try it yourself!

System Requirements

Windows, macOS, or Linux. Python 3.10 or higher. Min RAM: 10 GB. Disk space: 32 GB min

Learn More

Help Documentation

Detailed guidance on GitHub walking you through this project installation.

Discussion Points

Get involved in improving the Blueprint by visiting the GitHub Blueprint issues.

Join in

Get started

Explore Blueprints Extensions

See examples of extended blueprints unlocking new capabilities and adjusted configurations enabling tailored solutions—or try it yourself.

Use Cases

Academic Paper Analysis

A tool that helps you analyse academic papers by answering your questions while preserving technical details and academic rigor.

alexmeckes

Want to build your own Blueprints?

See our guidelines for building a top-notch Blueprint.

Must-haves

Open-source models and tools usage

README, pyproject.toml, and organized folder structure

Demo app (Streamlit or Gradio) or jupyter notebook

Config file for easy customization

CLI support

Nice-to-haves

CPU compatibility for most local setups

Google Colab notebook option

PyPI package availability

Dockerfile for the demo app

Diagram of the Blueprint in the README

Setup and guidance docs using mkdocs

Github Template Repo

Query structured documents using a lightweight LLM workflow

Related content

Want to build your own Blueprints?