# Orpheus 3B — Bangla TTS (Small Data Fine-tune)
Fine-tuned version of Orpheus 3B for Bangla (Bengali) Text-to-Speech using LoRA adapters. Trained on ~39K Bangla speech samples (Adiba speaker dataset) for 4,500 steps on an H100 GPU.
For higher quality output, see the High Data version trained on ~99K samples.
## Model Details
| Property | Value |
|---|---|
| Base Model | canopylabs/orpheus-3b-0.1-pretrained |
| Architecture | Llama 3B + LoRA adapters |
| Training Data | ~39,000 Bangla speech samples (Adiba speaker) |
| Training Steps | 4,500 |
| Audio Codec | SNAC 24kHz |
| Training Platform | Modal (H100 GPU) with Unsloth |
| Language | Bangla (bn) |
| License | Apache 2.0 |
## What is Orpheus?
Orpheus TTS is a Llama-based text-to-speech model that generates audio as interleaved SNAC codec tokens. It supports emotional speech tags for expressive synthesis.
## Usage
**Note:** The base model `canopylabs/orpheus-3b-0.1-pretrained` is gated; you need a Hugging Face token with approved access.
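If you prefer not to pass `token=` to every call, one option (a minimal sketch using the standard `huggingface_hub` helper) is to authenticate once per session:

```python
from huggingface_hub import login

# Authenticate this session; subsequent from_pretrained calls can then omit token=
login(token="YOUR_HF_TOKEN")
```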
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# 1. Load base model
base_model_id = "canopylabs/orpheus-3b-0.1-pretrained"
tokenizer = AutoTokenizer.from_pretrained(base_model_id, token="YOUR_HF_TOKEN")
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    token="YOUR_HF_TOKEN",
)

# 2. IMPORTANT: Resize embeddings before loading LoRA, so the embedding and
#    output shapes match the fine-tuned vocabulary the adapter expects
model.resize_token_embeddings(156940)

# 3. Load LoRA adapter and merge it into the base weights
model = PeftModel.from_pretrained(
    model,
    "EMTIAZZ/orpheus-3b-bangla-small-data-finetuning",
    token="YOUR_HF_TOKEN",
)
model = model.merge_and_unload()

# 4. Prepare prompt and generate audio tokens
text = "আমি বাংলায় কথা বলতে পারি।"
prompt = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{text}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1200,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
    )
```
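`outputs` contains interleaved SNAC codec tokens rather than a waveform. Below is a minimal decoding sketch using the `snac` package. The audio-token offset (`128266`), the 7-tokens-per-frame layout, and the per-position offsets are assumptions carried over from the public Orpheus decoding examples, not values verified against this checkpoint:

```python
import soundfile as sf
import torch
from snac import SNAC  # pip install snac soundfile

# SNAC 24 kHz decoder used by Orpheus (assumed to be the matching codec checkpoint)
snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()

# Keep only audio-codec tokens; prompt text and special tokens sit below the offset
AUDIO_OFFSET = 128266  # assumed id of the first audio token, as in public Orpheus examples
codes = [t - AUDIO_OFFSET for t in outputs[0].tolist() if t >= AUDIO_OFFSET]
codes = codes[: (len(codes) // 7) * 7]  # Orpheus emits 7 codec tokens per audio frame

# De-interleave the flat token stream into SNAC's three codebook layers
layer_1, layer_2, layer_3 = [], [], []
for i in range(len(codes) // 7):
    frame = codes[7 * i : 7 * i + 7]
    layer_1.append(frame[0])
    layer_2.append(frame[1] - 4096)
    layer_3.append(frame[2] - 2 * 4096)
    layer_3.append(frame[3] - 3 * 4096)
    layer_2.append(frame[4] - 4 * 4096)
    layer_3.append(frame[5] - 5 * 4096)
    layer_3.append(frame[6] - 6 * 4096)

with torch.no_grad():
    audio = snac_model.decode([
        torch.tensor(layer_1).unsqueeze(0),
        torch.tensor(layer_2).unsqueeze(0),
        torch.tensor(layer_3).unsqueeze(0),
    ])

sf.write("output.wav", audio.squeeze().cpu().numpy(), 24000)
```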
## Emotional Speech Tags

`<laugh>`, `<chuckle>`, `<sigh>`, `<cough>`, `<sniffle>`, `<groan>`, `<yawn>`, `<gasp>`
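Tags go inline in the input text. A hypothetical example (the sentence is illustrative, not from the training data):

```python
# "<sigh> There was a lot of work today. <laugh> But in the end everything worked out."
text = "<sigh> আজ অনেক কাজ ছিল। <laugh> কিন্তু শেষ পর্যন্ত সব ঠিক হয়ে গেছে।"
```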
## Training Details
- Framework: Unsloth + HuggingFace Trainer
- Method: LoRA (Low-Rank Adaptation)
- Speaker: Adiba (single-speaker Bangla dataset, ~39K samples)
- Hardware: H100 GPU on Modal
- Training: 4,500 steps
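For orientation, here is a minimal sketch of the kind of Unsloth LoRA setup described above. Sequence length, LoRA rank, alpha, and target modules are illustrative assumptions; the card does not state the actual hyperparameters:

```python
from unsloth import FastLanguageModel

# Load the gated base model through Unsloth (max_seq_length is an assumption)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="canopylabs/orpheus-3b-0.1-pretrained",
    max_seq_length=2048,
    dtype=None,            # let Unsloth pick bf16/fp16 for the GPU
    load_in_4bit=False,
    token="YOUR_HF_TOKEN",
)

# Attach LoRA adapters; rank, alpha, and target modules are placeholder values
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# Training then proceeds with the HuggingFace Trainer on the SNAC-tokenized speech data.
```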
## When to Use This vs. the High-Data Version

- Use this model if you want a single-speaker voice (Adiba) or need quicker prototyping
- Use the high-data version for better generalization, more natural prosody, and higher overall quality
## Citation

```bibtex
@misc{emtiaz2026orpheusbanglasmall,
  author = {Emtiaz Uddin Ahmed},
  title  = {Orpheus 3B Bangla Small-Data Fine-tune},
  year   = {2026},
  url    = {https://huggingface.co/EMTIAZZ/orpheus-3b-bangla-small-data-finetuning}
}
```
## Author

Emtiaz Uddin Ahmed ([EMTIAZZ](https://huggingface.co/EMTIAZZ))