Last updated: 3/10/2025
Finetune an LLM using Federated AI
This Blueprint demonstrates how to finetune LLMs using Flower, a framework for federated AI. As access to high-quality public datasets declines, federated AI enables multiple data owners to collaboratively fine-tune models without sharing raw data, preserving privacy while leveraging distributed datasets.
We apply parameter-efficient fine-tuning (PEFT) with LoRA adapters to fine-tune the Qwen2-0.5B-Instruct model on the Alpaca-GPT4 dataset. This approach optimizes resource efficiency while maintaining model adaptability, making it a practical solution for decentralized AI development.
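To see why LoRA makes this practical on small hardware, consider the parameter count: a rank-r adapter for a d x k weight matrix trains only r(d + k) parameters instead of d x k. The sketch below illustrates the arithmetic; the rank of 8 is an illustrative choice, while 896 is Qwen2-0.5B's actual hidden size.

```python
# Illustrative arithmetic: how many parameters a LoRA adapter trains
# compared to fine-tuning the full weight matrix directly.

def lora_params(d: int, k: int, r: int) -> int:
    """Parameters in a rank-r LoRA adapter for a d x k weight: A (d x r) plus B (r x k)."""
    return r * (d + k)

d, k, r = 896, 896, 8          # square projection at Qwen2-0.5B's hidden size; rank 8 is illustrative
full = d * k                   # 802,816 params if the matrix is fine-tuned directly
lora = lora_params(d, k, r)    # 14,336 params with a rank-8 adapter
print(f"LoRA trains {lora / full:.1%} of the full matrix's parameters")
```

Under two percent of the original parameters per adapted matrix, which is what keeps federated fine-tuning feasible on modest client hardware.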
Preview this Blueprint in action
- Hosted demo
- Step by step walkthrough
Tools used to create
Trusted open source tools used for this Blueprint

HuggingFace Transformers
HuggingFace Transformers is used for model fine-tuning, and HuggingFace Datasets for loading the dataset.
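Alpaca-GPT4 records follow the standard Alpaca instruction-tuning schema (`instruction`, `input`, `output`). The sketch below shows the usual way such records are rendered into training prompts; it uses a made-up inline record rather than an actual HuggingFace Datasets call, and the template wording follows the common Alpaca convention rather than this Blueprint's exact code.

```python
# Sketch of Alpaca-style prompt formatting for instruction tuning.
# The record below is a hypothetical example; in the Blueprint, HuggingFace
# Datasets supplies records with the same fields.

def format_alpaca(example: dict) -> str:
    """Render one record into the common Alpaca instruction-tuning template."""
    if example.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            f"completes the request.\n\n### Instruction:\n{example['instruction']}"
            f"\n\n### Input:\n{example['input']}\n\n### Response:\n{example['output']}"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        f"appropriately completes the request.\n\n### Instruction:\n"
        f"{example['instruction']}\n\n### Response:\n{example['output']}"
    )

record = {"instruction": "List three primary colors.", "input": "", "output": "Red, yellow, blue."}
print(format_alpaca(record))
```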
Choices
Insights into our motivations and key technical decisions throughout the development process.
| Focus | Decision | Rationale | Alternatives Considered | Trade-offs |
| --- | --- | --- | --- | --- |
| Federated AI Framework | Used Flower for federated fine-tuning. | Developed by the Flower.ai team, who have the required expertise to create the Blueprint. | Fine-tune the model locally. | Would require gathering huge amounts of data and paying for licenses, making it a costly solution, and performance decreases. |
| Base Model | Fine-tuned Qwen2-0.5B-Instruct. | Smaller size makes federated fine-tuning more accessible for initial experimentation. | Larger or even smaller models from different model series (Qwen, Llama, etc.). | Larger models require more compute; even smaller models may lose expressiveness. |
| Simulation vs. Deployment | Default simulation mode for federated training. | Easier for developers to test without extensive infra setup. | Direct deployment with Flower's Deployment Engine. | Simulations may not capture all real-world constraints. |
| Training Hardware | Supports CPU and GPU fine-tuning. | Increases accessibility for users with limited compute resources. | GPU-only training for efficiency. | CPU training is significantly slower, especially for larger models. |
| Dataset | Used Alpaca-GPT4 for fine-tuning. | Well-structured dataset for instruction tuning (not too large). | Custom datasets. | Alpaca-GPT4 may not cover edge cases for specific use cases. |
| Demo and Evaluation | Provided both a Streamlit app and CLI-based evaluation for interactive testing. | Simple way to validate model responses in real time, depending on user preference. | Real-world deployment across the globe, which is feasible. | Would need partners to set this up, or renting more instances across the world. |
Ready? Try it yourself!
System Requirements
- OS: Linux
- Python 3.10 or higher
- Minimum RAM: 8GB (recommended for LLM fine-tuning)
Help Documentation
Detailed guidance on GitHub walking you through this project's installation.
Discussion Points
Get involved in improving the Blueprint by visiting the GitHub Blueprint issues.
Explore Blueprints Extensions
See examples of extended Blueprints unlocking new capabilities, and adjusted configurations enabling tailored solutions, or try it yourself.