---
base_model:
- LiquidAI/LFM2-350M
tags:
- text-generation-inference
- transformers
- unsloth
- lfm2
- trl
license: apache-2.0
language:
- en
- ml
pipeline_tag: text-to-speech
---

# Malayalam TTS Model (LFM2-350M Fine-tuned)

This repository contains a fine-tuned **Malayalam Text-to-Speech (TTS)** model based on **LFM2-350M**, trained using [VyvoTTS](https://github.com/Vyvo-Labs/VyvoTTS) (an LLM-based TTS framework) and [Unsloth](https://github.com/unslothai/unsloth).

---

## Malayalam TTS, 24 kHz (LLM + SNAC Codec)

A high-quality Malayalam text-to-speech model that targets natural pronunciation and clean prosody at 24 kHz. It uses a discrete audio codec (SNAC 24 kHz) for waveform reconstruction and is designed for lightweight deployment (~350M parameters) on GPU or CPU.

**Status:** v0.1. Inference is stable and pronunciation is strong, but emotional expressiveness is limited. The roadmap includes expressive styles and non-verbal cues (laughter, giggles, breaths).

## ✨ Highlights

- **Language:** Malayalam (with support for basic English loanwords)
- **Sample Rate:** 24 kHz, mono
- **Codec:** SNAC 24 kHz, for fast decoding
- **Model Size:** ~350M parameters (small and efficient)
- **Strengths:** Clear, non-robotic pronunciation; punctuation-aware phrasing
- **Known Limits:** Narrow emotion range; limited style transfer; no speaker cloning in v0.1

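This card does not include inference code, so the following is only a minimal sketch of the final step: saving decoded audio in the stated output format (24 kHz, mono). It assumes the codec decoder yields float samples in the range [-1.0, 1.0], which is an assumption rather than something documented here. The helper names `float_to_pcm16`, `save_wav`, and `placeholder_waveform` are hypothetical, and the sine tone only stands in for real model output.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # 24 kHz, as listed in the highlights
NUM_CHANNELS = 1      # mono


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    # Clip first so out-of-range values cannot overflow the 16-bit range.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def save_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples to a 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(NUM_CHANNELS)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(float_to_pcm16(samples))


def placeholder_waveform(seconds=0.5, freq_hz=440.0):
    """Hypothetical stand-in for decoder output: a plain sine tone."""
    n = int(seconds * SAMPLE_RATE)
    return [
        0.3 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]
```

Calling `save_wav("output.wav", placeholder_waveform())` writes a short test tone. With a real pipeline, the decoded waveform would be passed in place of the placeholder, converted to a flat list or iterable of floats first.
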
## 📖 Model Details

- **Base Model:** LFM2-350M
- **Language:** Malayalam
- **Dataset:** [ai4bharat/rasa](https://huggingface.co/datasets/ai4bharat/rasa) (Malayalam subset)
- **Training:** 10 epochs, ~77k steps
- **Frameworks Used:** VyvoTTS, Unsloth

---

## 🔮 Future Work

- Emotion and expressive style support
- Non-verbal cues (laughter, giggles, breaths)
- Multi-speaker extension