---
license: apache-2.0
language:
- ro
pipeline_tag: text-to-speech
tags:
- tts
- romanian
- matcha-tts
- conditional-flow-matching
- swara
library_name: pytorch
datasets:
- SWARA-1.0
---
# Matcha-TTS Romanian Models
Pre-trained Romanian text-to-speech models based on [Matcha-TTS](https://github.com/shivammehta25/Matcha-TTS), trained on the SWARA 1.0 dataset.
## Quick Start
### Clone Repository
Since this repository contains custom inference code and model loading utilities, you need to clone it:
```bash
# Clone from HuggingFace Hub
git clone https://huggingface.co/adrianstanea/Ro-Matcha-TTS
cd Ro-Matcha-TTS
# Install Git LFS (if not already installed) to download large model files
git lfs install
git lfs pull
```
### Installation
```bash
# Install system dependencies (required for phonemization)
sudo apt-get install espeak-ng
# Install the main Matcha-TTS repository
pip install git+https://github.com/adrianstanea/Matcha-TTS.git
# Install required dependencies
pip install -r requirements.txt
```
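After installing, a quick sanity check can confirm the Python dependencies resolved. This is an illustrative sketch, not part of the repository; the module names listed are assumptions, so verify them against `requirements.txt`:

```python
import importlib.util

def missing_modules(modules):
    """Return the subset of module names that cannot be imported."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

# Module names here are assumptions -- check requirements.txt for the real list.
required = ["torch", "matcha", "phonemizer"]
for name in missing_modules(required):
    print(f"Missing dependency: {name}")
```

Note that `espeak-ng` is a system package, not a Python module, so it must be checked separately (e.g. `espeak-ng --version`).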
### Usage
```python
import sys
sys.path.append("src")
from model_loader import ModelLoader
# Load from local cloned repository
loader = ModelLoader.from_pretrained("./")
# List available models
print(loader.list_models())
# {'swara': {...}, 'bas_10': {...}, 'bas_950': {...}, ...}
# Load production-ready BAS speaker
model_info = loader.load_models(model="bas_950")
print(f"Model: {model_info['model_name']}")
print(f"Path: {model_info['model_path']}")
# Load few-shot SGS speaker
model_info = loader.load_models(model="sgs_10")
print(f"Training data: {model_info['model_info']['training_data']}")
# Use with original Matcha-TTS inference code
# See examples/inference_example.py for complete usage
```
### Run Example
```bash
cd examples
python inference_example.py
```
## Available Models
### Baseline Model
| Model | Type | Description |
| --------- | -------- | ---------------------------------------------------- |
| **swara** | Baseline | Speaker-agnostic model trained on full SWARA dataset |
### Fine-tuned Speaker Models
| Model | Speaker | Training Samples | Fine-tune Epochs | Use Case |
| ----------- | ---------- | ---------------- | ---------------- | -------------------------------- |
| **bas_10** | BAS (Male) | 10 samples | 100 | Few-shot learning / Low-resource |
| **bas_950** | BAS (Male) | 950 samples | 100 | Production-ready speaker |
| **sgs_10** | SGS (Male) | 10 samples | 100 | Few-shot learning / Low-resource |
| **sgs_950** | SGS (Male) | 950 samples | 100 | Production-ready speaker |
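The fine-tuned model keys follow a `{speaker}_{samples}` naming scheme. As a sketch of how the key encodes the table's fields (this helper is hypothetical and not part of `model_loader.py`):

```python
def parse_model_name(name: str) -> dict:
    """Split a fine-tuned model key like 'bas_950' into speaker and sample count."""
    speaker, samples = name.split("_")
    return {"speaker": speaker.upper(), "training_samples": int(samples)}

print(parse_model_name("bas_950"))
# {'speaker': 'BAS', 'training_samples': 950}
```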
**Vocoder**: Universal HiFi-GAN
### Research Methodology
- **Training Strategy**: Baseline → Speaker Fine-tuning (100 epochs)
- **Data Efficiency Study**: 10 vs 950 samples comparison
- **Low-Resource Learning**: Demonstrates few-shot TTS adaptation
## Model Details
- **Architecture**: Matcha-TTS (Conditional Flow Matching)
- **Dataset**: SWARA 1.0 Romanian Speech Corpus
- **Sample Rate**: 22,050 Hz
- **Language**: Romanian (ro)
- **Text Processing**: eSpeak Romanian phonemizer
- **Model Size**: ~100M parameters per model
## Repository Structure
```
├── models/                           # Model checkpoints (Git LFS)
│   ├── swara/
│   │   └── matcha-base-1000.ckpt     # Baseline model (1000 epochs)
│   ├── bas/
│   │   ├── matcha-bas-10_100.ckpt    # BAS speaker (10 samples, 100 epochs)
│   │   └── matcha-bas-950_100.ckpt   # BAS speaker (950 samples, 100 epochs)
│   ├── sgs/
│   │   ├── matcha-sgs-10_100.ckpt    # SGS speaker (10 samples, 100 epochs)
│   │   └── matcha-sgs-950_100.ckpt   # SGS speaker (950 samples, 100 epochs)
│   └── vocoder/
│       └── hifigan_univ_v1           # Universal HiFi-GAN vocoder
├── configs/
│   └── config.json                   # Model configuration
├── src/
│   └── model_loader.py               # HuggingFace-compatible loader
└── examples/
    ├── sample_texts_ro.txt           # Sample Romanian texts
    └── inference_example.py          # Complete usage example
```
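Given the layout above, a checkpoint path can be derived from a model key. A minimal sketch (the path template mirrors the tree; this helper is illustrative, not part of the published loader, which should be preferred):

```python
from pathlib import Path

def checkpoint_path(root: str, model: str) -> Path:
    """Map a model key to its checkpoint path, following the repository tree."""
    if model == "swara":
        return Path(root) / "models" / "swara" / "matcha-base-1000.ckpt"
    speaker, samples = model.split("_")                # e.g. "bas_950" -> ("bas", "950")
    filename = f"matcha-{speaker}-{samples}_100.ckpt"  # all speaker models use 100 fine-tune epochs
    return Path(root) / "models" / speaker / filename

print(checkpoint_path(".", "bas_950"))
# models/bas/matcha-bas-950_100.ckpt
```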
## Usage with Original Repository
This repository provides model weights and HuggingFace integration. For training, evaluation, and advanced features, use the [main repository](https://github.com/adrianstanea/Matcha-TTS).
```python
# After loading models with ModelLoader
from matcha.models.matcha_tts import MatchaTTS
import torch
# Load using paths from ModelLoader
model = MatchaTTS.load_from_checkpoint(model_info['model_path'])
# ... continue with original inference code
```
## Requirements
- Python 3.10
- Main Matcha-TTS repository for inference
- HuggingFace Hub for model downloading
## License
Same as the original [Matcha-TTS repository](https://github.com/adrianstanea/Matcha-TTS).
## Citation
If you use this Romanian adaptation in your research, please cite:
```bibtex
@ARTICLE{11269795,
  author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
journal={IEEE Access},
title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
year={2025},
volume={13},
number={},
pages={203415-203428},
keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
doi={10.1109/ACCESS.2025.3637322}
}
```
**Original Matcha-TTS Citation:**
```bibtex
@inproceedings{mehta2024matcha,
title={Matcha-{TTS}: A fast {TTS} architecture with conditional flow matching},
author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
booktitle={Proc. ICASSP},
year={2024}
}
```
## Links
- [Main Repository](https://github.com/adrianstanea/Matcha-TTS) - Training, documentation, and research details
- [Original Matcha-TTS](https://github.com/shivammehta25/Matcha-TTS) - Base architecture and paper