Last updated
1/23/2025
Share
Get started
Document
Q&A

Query structured documents using a lightweight LLM workflow

This Blueprint demonstrates how to use open-source models and a simple LLM workflow to answer questions based on structured documents.

It is designed to showcase a simpler alternative to more complex and/or resource demanding alternatives, such as RAG systems that rely on vectorDBs and/or long-context models with large token windows.

If you encounter any issues with the hosted demo below, try the Blueprint in the GPU-enabled Google Colab Notebook available here.

Preview this Blueprint in action
Hosted demo
Hosted Demo
Step by step walkthrough
Tools used to create

Trusted open source tools used for this Blueprint

PyMuPDF

Use pymupdf4llm to convert the document into markdown and then split into sections.

Llama.cpp

Use llama.cpp to load GGUF-type models, enabling efficient question answering using text-to-text.

Streamlit

Use Streamlit to build an interactive app to query your strcutured documents.

icon choices
Choices

Insights into our motivations and key technical decisions throughout the development process.

Focus
Decision
Rationale
Alternatives Considered
Trade-offs
Focus
Focus
Decision
Rationale
Alternatives Considered
Trade-offs
Overall Motivation
Overall Motivation
Build a local-friendly, Q&A system for structured docs (e.g. rulebooks).
Enables structured document retrieval without relying on closed APIs, or embeddings generation/VectorDB set-up.
Full Context API calls, standard RAG.
Performance gap compared to Full Context API solutions.
Document Pre-processing
Document Pre-processing
Used PyMuPDF4LLM for section extraction.
Extracts structured sections for retrieval-based answers.
Docling - Lack of heading hierarchy affected performance. Marker - Slower, required additional model.
Struggles with visually complex layouts, impacting accuracy.
Question answering workflow
Question answering workflow
Find, Retrieve, Answer workflow.
Simpler than RAG and requires smaller context window than full-context window methods.
Agentic retrieval - added to much complexity to the workflow.
Relies on preprocessing quality and quality of section titles.
Model Selection
Model Selection
Qwen2.5-7B-Instruct.
Runs on accessible hardware while maintaining reasonable performance.
DeepSeek R1 distilled models (COT lowered accuracy), 1.5B-3B range models (lowered accuracy).
Slight reduction in accuracy compared to larger models.
Deployment Options
Deployment Options
Supports Codespaces, local CLI, local Streamlit app, Google Colab, HF Spaces.
Flexible for diverse environments and compute needs.
Single-path setups like only local.
Maintaining multiple pathways adds complexity for updates and consistency.
Ready? Try it yourself!
icon extensions
Explore Blueprints Extensions

See examples of extended blueprints unlocking new capabilities and adjusted configurations enabling tailored solutions—or try it yourself.

Load more
Text Link
BYOTA
tags
Text Link
Finetune STT with your voice
tags
Text Link
Map Features in OSM with CV
tags
Text Link
Finetune LLM using Federated AI
tags
Text Link
Embedding
tags
Text Link
Federated AI
tags
Text Link
Image Segmentation
tags
Text Link
Object Detection
tags
Text Link
Automatic Speech Recognition
tags
Text Link
Speech-to-Text
tags
Text Link
Query structured documents Q&A
tags
Text Link
Emails
tags
Text Link
Newsletter
tags
Text Link
Podcast
tags
Text Link
Community
tags
Text Link
Events
tags
Text Link
Discord
tags
Text Link
Data Extraction
tags
Text Link
User-Interface
tags
Text Link
Performance Optimization
tags
Text Link
LLM Inference
tags
Text Link
Language Modelling
tags
Text Link
Text-to-Text
tags
Text Link
Text-to-Speech
tags
Text Link
LLM
tags
Text Link
Email
tags
Text Link
Podcast personalities
tags
Text Link
Document-to-podcast
tags
Text Link
Blueprints
tags
Text Link
Use Cases
tags
Text Link
English
tags
Text Link
General Language
tags
Text Link
Multilingual
tags
Text Link
Audio
tags
Text Link
Text
tags
Text Link
Finetuning
tags
Text Link
Local AI
tags
Text Link
Federated Learning
tags
Text Link
LLM Integration
tags