Last updated: 4/22/2025
Convert documents to markdown format
This Blueprint guides you through converting unstructured documents (PDFs, DOCX, HTML, and more) to Markdown with the Docling command-line interface, with particular attention to OCR capabilities and image-handling options. Markdown gives you a consistent, open format to work with across document types, which makes this setup well suited to building text datasets and to downstream NLP tasks. The Blueprint includes quick-start instructions for running it on your own data and requires basic Python experience. Made in collaboration with EleutherAI.
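As a minimal sketch of what a single conversion looks like, the Python helper below assembles a `docling` command that writes Markdown, toggles OCR, and chooses an image export mode. It assumes Docling is installed (for example with `pip install docling`) and that your installed CLI accepts `--to md`, `--ocr`/`--no-ocr`, `--image-export-mode`, and `--output`; confirm these with `docling --help` for your version. The helper names `build_docling_command` and `convert`, and the input file `report.pdf`, are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path

# Image handling modes assumed to be accepted by --image-export-mode
# (verify against `docling --help` for your installed version).
IMAGE_MODES = {"placeholder", "embedded", "referenced"}


def build_docling_command(source, output_dir="output", ocr=True, image_mode="placeholder"):
    """Build a hypothetical `docling` CLI invocation that converts `source` to Markdown."""
    if image_mode not in IMAGE_MODES:
        raise ValueError(f"unknown image mode: {image_mode!r}")
    return [
        "docling",
        str(source),
        "--to", "md",                       # request Markdown output
        "--ocr" if ocr else "--no-ocr",     # toggle OCR for scanned pages and images
        "--image-export-mode", image_mode,  # how pictures are represented in the Markdown
        "--output", str(output_dir),        # directory where the .md files are written
    ]


def convert(source, **options):
    """Run the conversion; requires the `docling` executable on PATH."""
    cmd = build_docling_command(source, **options)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the command; replace the hypothetical file and call convert() to run it.
    print(shlex.join(build_docling_command(Path("report.pdf"), ocr=False)))
```

Calling `convert("report.pdf", ocr=True)` would run the command. In this sketch `placeholder` is the default because it keeps the output text-only, which suits text datasets; `embedded` and `referenced` keep the images, either inline or as separate files.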
Preview this Blueprint in action
Hosted demo
Step-by-step walkthrough
Choices
Insights into our motivations and key technical decisions throughout the development process.
Ready? Try it yourself!
System Requirements
OS: Windows, macOS, or Linux
Python: 3.10 or higher
Minimum RAM: 8 GB
Disk space: 4 GB for models and dependencies
GPU: optional
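If you want to confirm that a machine meets these requirements before installing, a small standard-library preflight check along these lines can help. It is a sketch under stated assumptions: total RAM is read through `os.sysconf`, which is not available on Windows (the RAM check is then skipped), free disk space is measured for the current directory, and the optional GPU is not checked. The function names are hypothetical.

```python
import os
import shutil
import sys

# Thresholds taken from the System Requirements listed above.
MIN_PYTHON = (3, 10)
MIN_RAM_GB = 8
MIN_DISK_GB = 4


def total_ram_gb():
    """Best-effort total RAM in GB; returns None where os.sysconf is unavailable (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None


def check_requirements(python_version, ram_gb, free_disk_gb):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    major, minor = python_version[0], python_version[1]
    if (major, minor) < MIN_PYTHON:
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {major}.{minor}")
    if ram_gb is not None and ram_gb < MIN_RAM_GB:
        problems.append(f"at least {MIN_RAM_GB} GB RAM required, found {ram_gb:.1f} GB")
    if free_disk_gb < MIN_DISK_GB:
        problems.append(f"at least {MIN_DISK_GB} GB free disk required, found {free_disk_gb:.1f} GB")
    return problems


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 1024**3  # free space where you plan to install
    issues = check_requirements(sys.version_info, total_ram_gb(), free_gb)
    print("\n".join(issues) if issues else "All requirement checks passed.")
```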
Help Documentation
Detailed guidance on GitHub that walks you through installing this project.
Discussion Points
Get involved in improving the Blueprint by visiting its GitHub issues.
Explore Blueprint Extensions
See examples of extended Blueprints that unlock new capabilities and of adjusted configurations that enable tailored solutions, or try it yourself.
Want to build your own Blueprints?
See our guidelines for building a top-notch Blueprint.
Must-haves
Use of open-source models and tools
README, pyproject.toml, and an organized folder structure
Demo app (Streamlit or Gradio) or Jupyter notebook
Config file for easy customization
CLI support
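One way to satisfy the config-file and CLI must-haves together is to load defaults from a config file and let command-line flags override them. The standard-library sketch below uses JSON rather than YAML to avoid extra dependencies; the config keys (`input_dir`, `output_dir`, `ocr`), the flags, and the function names are hypothetical examples, not part of any official Blueprint template.

```python
import argparse
import json
from pathlib import Path

# Hypothetical defaults; a real Blueprint would define its own keys.
DEFAULTS = {"input_dir": "docs", "output_dir": "output", "ocr": True}


def load_config(path=None):
    """Merge an optional JSON config file over the built-in defaults."""
    config = dict(DEFAULTS)  # copy so DEFAULTS is never mutated
    if path is not None:
        config.update(json.loads(Path(path).read_text(encoding="utf-8")))
    return config


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Example Blueprint CLI")
    parser.add_argument("--config", help="path to a JSON config file")
    parser.add_argument("--input-dir", help="override input_dir from the config")
    parser.add_argument("--output-dir", help="override output_dir from the config")
    parser.add_argument("--no-ocr", action="store_true", help="disable OCR")
    return parser.parse_args(argv)


def resolve_settings(argv=None):
    """Precedence: command-line flags, then the config file, then DEFAULTS."""
    args = parse_args(argv)
    config = load_config(args.config)
    if args.input_dir:
        config["input_dir"] = args.input_dir
    if args.output_dir:
        config["output_dir"] = args.output_dir
    if args.no_ocr:
        config["ocr"] = False
    return config


if __name__ == "__main__":
    print(resolve_settings())
```

For example, `python cli.py --config settings.json --no-ocr` (both file names hypothetical) would read `settings.json` and then disable OCR regardless of what the file says.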
Nice-to-haves
CPU compatibility for most local setups
Google Colab notebook option
PyPI package availability
Dockerfile for the demo app
Diagram of the Blueprint in the README
Setup and guidance docs built with MkDocs