Last updated: 6/23/2025
Made by Mozilla.ai

Align text transcriptions in speech-to-text applications

This Blueprint lets you align OpenAI's Whisper speech-to-text models with user-defined text. When you supply a custom list of phrases (e.g., brand names, technical terms, rare phrases), the model adjusts its transcriptions accordingly. This improves accuracy for domain-specific vocabulary, particularly when you need reliable recognition of words that are uncommon in everyday language.

In the audio example below, you can compare the transcriptions before and after biasing the model with the text "Dileesh Pothan", which is the correct spelling of a name that does not appear often in the training data of the original model.

Without model alignment: "The rich potent as an Indian film director from Kerala who works in the Malayalam film industry."

With model alignment: "Dileesh Pothan is an Indian film director from Kerala who works in the Malayalam film industry."
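The before/after comparison above can be checked programmatically. The following is a minimal sketch, not part of the Blueprint itself: `normalize` and `phrase_hits` are hypothetical helper names introduced here for illustration, and the check is a simple case-insensitive whole-word match of each supplied phrase against a transcript.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and keep only word tokens, joined by single spaces."""
    return " ".join(re.findall(r"[\w']+", text.lower()))


def phrase_hits(transcript: str, phrases: list[str]) -> dict[str, bool]:
    """Report, for each phrase, whether it appears as whole words in the transcript."""
    # Pad with spaces so a phrase only matches on word boundaries.
    padded = f" {normalize(transcript)} "
    return {p: f" {normalize(p)} " in padded for p in phrases}


# The phrase list and the two transcriptions quoted in the example above.
phrases = ["Dileesh Pothan"]
before = (
    "The rich potent as an Indian film director from Kerala "
    "who works in the Malayalam film industry."
)
after = (
    "Dileesh Pothan is an Indian film director from Kerala "
    "who works in the Malayalam film industry."
)

hits_before = phrase_hits(before, phrases)
hits_after = phrase_hits(after, phrases)
print("without alignment:", hits_before)
print("with alignment:   ", hits_after)
```

A check like this only confirms that a phrase was transcribed verbatim. It does not measure overall transcription quality, which would need a metric such as word error rate against a reference transcript.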

Tools used to create

Trusted open source tools used for this Blueprint

Whisper BiDec

Whisper BiDec makes it possible to adjust transcriptions and recognize unusual names or phrases with smaller Whisper models.

Gradio

Gradio is used to build a simple user interface that lets you upload audio files and see their transcriptions.


Want to build your own Blueprints?

See our guidelines for building a top-notch Blueprint.

Must-haves

Open-source models and tools usage

README, pyproject.toml, and organized folder structure

Demo app (Streamlit or Gradio) or Jupyter notebook

Config file for easy customization

CLI support

Nice-to-haves

CPU compatibility for most local setups

Google Colab notebook option

PyPI package availability

Dockerfile for the demo app

Diagram of the Blueprint in the README

Setup and guidance docs using MkDocs
