Last updated: 4/22/2025
Convert documents to Markdown format
This Blueprint shows how to convert unstructured documents (PDF, DOCX, HTML, and others) to Markdown using the Docling command-line interface, with particular attention to OCR and image-handling options. The result is a consistent, open format across document types, which suits building text datasets and downstream NLP tasks. The Blueprint includes quick-start instructions for running it on your own data and requires basic Python experience. Made in collaboration with EleutherAI.
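As a rough illustration of the conversion described above, here is a minimal Python sketch that assembles and runs a Docling CLI call. It is not the Blueprint's own code. The helper names `build_docling_command` and `convert` are hypothetical, and the flags used (`--to`, `--output`, `--ocr`/`--no-ocr`, `--image-export-mode`) are assumed to match the installed Docling version, so check them against `docling --help` before relying on them.

```python
import shutil
import subprocess
from pathlib import Path

# Image handling modes assumed to be accepted by the Docling CLI.
IMAGE_MODES = {"placeholder", "embedded", "referenced"}


def build_docling_command(source, output_dir, ocr=True, image_mode="placeholder"):
    """Return an argv list for converting `source` to Markdown (hypothetical helper)."""
    if image_mode not in IMAGE_MODES:
        raise ValueError(f"image_mode must be one of {sorted(IMAGE_MODES)}")
    return [
        "docling",
        str(source),
        "--to", "md",                       # request Markdown output
        "--output", str(output_dir),        # directory for converted files
        "--ocr" if ocr else "--no-ocr",     # toggle OCR for scanned pages
        "--image-export-mode", image_mode,  # how images appear in the output
    ]


def convert(source, output_dir, **options):
    """Run the conversion; fails early if the docling executable is not on PATH."""
    if shutil.which("docling") is None:
        raise RuntimeError("docling CLI not found on PATH")
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_docling_command(source, output_dir, **options), check=True)
```

For example, `convert("report.pdf", "out", ocr=False, image_mode="embedded")` would skip OCR and embed images in the generated Markdown, assuming the flags behave as described.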
Choices
Insights into our motivations and key technical decisions throughout the development process.
Ready? Try it yourself!
System Requirements
OS: Windows, macOS, or Linux; Python: 3.10 or higher; Minimum RAM: 8 GB; Disk space: 4 GB for models and dependencies; GPU: optional
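The Python and disk-space requirements above can be checked before installing anything. The following is a small sketch using only the standard library. The function names are hypothetical, and RAM is not checked because the standard library has no portable way to query it.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)   # from the requirements above
MIN_FREE_DISK_GB = 4   # space for models and dependencies


def python_ok(version=None):
    """True if the given (or running) interpreter is at least MIN_PYTHON."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def free_disk_gb(path="."):
    """Free space on the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def preflight(path="."):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not python_ok():
        problems.append("Python 3.10 or higher is required")
    if free_disk_gb(path) < MIN_FREE_DISK_GB:
        problems.append(f"at least {MIN_FREE_DISK_GB} GB of free disk space is required")
    return problems
```

Calling `preflight()` in the directory where you plan to install returns the unmet requirements, if any.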
Help Documentation
Detailed guidance on GitHub that walks you through installing this project.
Discussion Points
Get involved in improving the Blueprint by visiting the GitHub Blueprint issues.
Explore Blueprint Extensions
See examples of extended Blueprints that unlock new capabilities or adjust configurations for tailored solutions, or try it yourself.