Create your own tailored podcast using your documents

Last updated

1/17/2025

Get started

Document

Podcast

Made by

Mozilla.ai

Create your own tailored podcast using your documents

This blueprint demonstrates how you can use open-source models & tools to convert input documents into a podcast featuring two speakers. It combines document pre-processing, language model-powered script generation, and text-to-speech synthesis. Designed to run on most local setups, it requires no external API calls or GPU access, making it both more accessible and privacy-friendly by keeping all processing local.

If you encounter any issues with the hosted demo below, try the Blueprint in the GPU-enabled Google Colab Notebook available here.

Time

10 min

Complexity

Low

Medium

High

Status

Stable

Contributors

Tags

Local AI

Text-to-Speech

License

Apache 2.0

Preview this Blueprint in action

Hosted demo

Hosted Demo

Drag the corner to resize

Step by step walkthrough

Tools used to create

Trusted open source tools used for this Blueprint

Llama.cpp

Use llama.cpp to load GGUF-type models, enabling efficient local generation of podcast scripts.

Kokoro

Use Kokoro to initialize a text-to-speech model to generate your podcast audio.

‍

Streamlit

Use Streamlit to build an interactive app for the full document-to-podcast pipeline.

Choices

Insights into our motivations and key technical decisions throughout the development process.

Focus

Decision

Rationale

Alternatives Considered

Trade-offs

Focus

Decision

Rationale

Alternatives Considered

Trade-offs

Overall Motivation

Build a local-friendly, developer-centric document-to-podcast solution.

Ensures privacy, flexibility, and accessibility for developers without GPUs.

APIs or targeting GPU-accessible developers.

Less polished UX; smaller models yield weaker output but are more accessible.

Document Pre-processing

Used Python’s re library for text cleaning.

Lightweight, familiar, and effective for most input scenarios.

MarkItDown or custom regex libraries.

Limited scalability for complex tasks.

Podcast Script Generation

Used llama_cpp with Qwen2.5-3B-Instruct-GGUF.

CPU-friendly, consistent, and balanced performance.

Larger models or APIs; OLMoE-1B-7B-0924.

Smaller models are often less creative than larger ones, leading to weaker podcast scripts.

Audio Generation

Adopted Kokoro/OuteTTS for TTS.

Best open-source results after testing multiple frameworks.

Suno/bark and parler-tts.

Open-source TTS lags behind closed-source solutions like ElevenLabs.

Deployment Options

Supported Codespaces, CLI, Streamlit app, Colab, HF Spaces.

Flexible for diverse environments and compute needs.

Single-path setups like only local.

Maintaining multiple pathways adds complexity for updates and consistency.

Ready? Try it yourself!

System Requirements

OS: Windows, macOS, or Linux. Python 3.10 or higher. Min RAM: 10 GB. Disk space: 32 GB min.

Learn More

Help Documentation

Detailed guidance on GitHub walking you through this project installation.

Discussion Points

Get involved in improving the Blueprint by visiting the GitHub Blueprint issues.

Join in

Get started

Explore Blueprints Extensions

See examples of extended blueprints unlocking new capabilities and adjusted configurations enabling tailored solutions—or try it yourself.

Extensions

Multilingual Document-to-Podcast

An extension of the Document-to-Podcast Blueprint that shows how to enable support for languages other than English by integrating additional text-to-speech (TTS) models.

Kostis

Extensions

Document pre-processing with Markitdown

An extension of the Document-to-Podcast Blueprint that integrates 'Markitdown' as part of document pre-processing, enabling more document types to be used.