---
license: apache-2.0
language:
- ro
pipeline_tag: text-to-speech
tags:
- tts
- romanian
- matcha-tts
- conditional-flow-matching
- swara
library_name: pytorch
datasets:
- SWARA-1.0
---
# Matcha-TTS Romanian Models

Pre-trained Romanian text-to-speech models based on Matcha-TTS, trained on the SWARA 1.0 dataset.
## Quick Start

### Clone the Repository

Since this repository contains custom inference code and model-loading utilities, you need to clone it:

```bash
# Clone from the HuggingFace Hub
git clone https://huggingface.co/adrianstanea/Ro-Matcha-TTS
cd Ro-Matcha-TTS

# Install Git LFS (if not already installed) to download the large model files
git lfs install
git lfs pull
```
### Installation

```bash
# Install system dependencies (required for phonemization)
sudo apt-get install espeak-ng

# Install the main Matcha-TTS repository
pip install git+https://github.com/adrianstanea/Matcha-TTS.git

# Install the remaining dependencies
pip install -r requirements.txt
```
## Usage

```python
import sys

sys.path.append("src")
from model_loader import ModelLoader

# Load from the local cloned repository
loader = ModelLoader.from_pretrained("./")

# List available models
print(loader.list_models())
# {'swara': {...}, 'bas_10': {...}, 'bas_950': {...}, ...}

# Load the production-ready BAS speaker
model_info = loader.load_models(model="bas_950")
print(f"Model: {model_info['model_name']}")
print(f"Path: {model_info['model_path']}")

# Load the few-shot SGS speaker
model_info = loader.load_models(model="sgs_10")
print(f"Training data: {model_info['model_info']['training_data']}")

# Use with the original Matcha-TTS inference code;
# see examples/inference_example.py for complete usage.
```
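The dictionary returned by `load_models` is passed straight on to the inference code, so a quick sanity check of its keys can catch missing files early. A minimal sketch (the key names follow the fields printed above; the helper itself is illustrative and not part of the repository):

```python
from pathlib import Path

# Keys that the snippets above read from the loaded dictionary.
REQUIRED_KEYS = {"model_name", "model_path", "model_info"}

def validate_model_info(model_info):
    """Return a list of problems found in a loaded model_info dict (empty if OK)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS - model_info.keys()]
    path = model_info.get("model_path")
    if path is not None and not Path(path).exists():
        problems.append(f"checkpoint not found: {path}")
    return problems

# A dict without a model_path is flagged before inference is attempted.
print(validate_model_info({"model_name": "bas_950", "model_info": {}}))
# ['missing key: model_path']
```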
### Run the Example

```bash
cd examples
python inference_example.py
```
## Available Models

### Baseline Model

| Model | Type | Description |
|---|---|---|
| swara | Baseline | Speaker-agnostic model trained on the full SWARA dataset |
### Fine-tuned Speaker Models

| Model | Speaker | Training Samples | Fine-tuning Epochs | Use Case |
|---|---|---|---|---|
| bas_10 | BAS (male) | 10 | 100 | Few-shot learning / low-resource |
| bas_950 | BAS (male) | 950 | 100 | Production-ready speaker |
| sgs_10 | SGS (male) | 10 | 100 | Few-shot learning / low-resource |
| sgs_950 | SGS (male) | 950 | 100 | Production-ready speaker |
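When a script needs to choose a checkpoint programmatically, the table above can be mirrored in code. A hypothetical helper (the metadata is transcribed from the table; the function name and dict layout are illustrative, not part of the repository API):

```python
# Speaker-model metadata, transcribed from the table above.
FINETUNED_MODELS = {
    "bas_10":  {"speaker": "BAS", "samples": 10,  "use": "few-shot"},
    "bas_950": {"speaker": "BAS", "samples": 950, "use": "production"},
    "sgs_10":  {"speaker": "SGS", "samples": 10,  "use": "few-shot"},
    "sgs_950": {"speaker": "SGS", "samples": 950, "use": "production"},
}

def pick_model(speaker, low_resource=False):
    """Return the model id for a speaker: the 10-sample few-shot variant
    when low_resource is requested, otherwise the 950-sample one."""
    name = f"{speaker.lower()}_{'10' if low_resource else '950'}"
    if name not in FINETUNED_MODELS:
        raise KeyError(f"unknown speaker: {speaker}")
    return name

print(pick_model("BAS"))                     # bas_950
print(pick_model("SGS", low_resource=True))  # sgs_10
```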
**Vocoder:** all models share a universal HiFi-GAN vocoder.
## Research Methodology

- **Training strategy:** baseline → speaker fine-tuning (100 epochs)
- **Data-efficiency study:** comparison of 10 vs. 950 fine-tuning samples
- **Low-resource learning:** demonstrates few-shot TTS speaker adaptation
## Model Details

- **Architecture:** Matcha-TTS (conditional flow matching)
- **Dataset:** SWARA 1.0 Romanian speech corpus
- **Sample rate:** 22,050 Hz
- **Language:** Romanian (ro)
- **Text processing:** eSpeak Romanian phonemizer
- **Model size:** ~100M parameters per model
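Two of the figures above translate directly into back-of-the-envelope resource numbers: ~100M float32 parameters occupy roughly 400 MB on disk and in memory, and one second of output audio is 22,050 waveform samples. A quick check (the constants are copied from the list above; the parameter count is approximate):

```python
N_PARAMS = 100_000_000   # ~100M parameters per model (approximate)
BYTES_PER_PARAM = 4      # float32
SAMPLE_RATE = 22_050     # Hz

# Rough checkpoint/weight footprint in float32.
weights_mb = N_PARAMS * BYTES_PER_PARAM / 1e6
print(f"approx. weight size: {weights_mb:.0f} MB")  # approx. weight size: 400 MB

def seconds_to_samples(seconds):
    """Number of waveform samples the vocoder produces for a given duration."""
    return round(seconds * SAMPLE_RATE)

print(seconds_to_samples(2.5))  # 55125 samples for 2.5 s of audio
```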
## Repository Structure

```
├── models/                           # Model checkpoints (Git LFS)
│   ├── swara/
│   │   └── matcha-base-1000.ckpt     # Baseline model (1000 epochs)
│   ├── bas/
│   │   ├── matcha-bas-10_100.ckpt    # BAS speaker (10 samples, 100 epochs)
│   │   └── matcha-bas-950_100.ckpt   # BAS speaker (950 samples, 100 epochs)
│   ├── sgs/
│   │   ├── matcha-sgs-10_100.ckpt    # SGS speaker (10 samples, 100 epochs)
│   │   └── matcha-sgs-950_100.ckpt   # SGS speaker (950 samples, 100 epochs)
│   └── vocoder/
│       └── hifigan_univ_v1           # Universal HiFi-GAN vocoder
├── configs/
│   └── config.json                   # Model configuration
├── src/
│   └── model_loader.py               # HuggingFace-compatible loader
└── examples/
    ├── sample_texts_ro.txt           # Sample Romanian texts
    └── inference_example.py          # Complete usage example
```
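Given this layout, checkpoints can be discovered by globbing `models/**/*.ckpt` instead of hard-coding paths. A sketch (the directory names match the tree above; the demo builds a throwaway copy of the layout in a temp directory so it runs anywhere):

```python
import tempfile
from pathlib import Path

def find_checkpoints(repo_root):
    """Map checkpoint stem -> path for every .ckpt file under models/."""
    return {p.stem: str(p) for p in Path(repo_root).glob("models/**/*.ckpt")}

# Demo: recreate a slice of the repository layout in a temp directory.
with tempfile.TemporaryDirectory() as root:
    for rel in ["models/swara/matcha-base-1000.ckpt",
                "models/bas/matcha-bas-950_100.ckpt"]:
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    found = find_checkpoints(root)
    print(sorted(found))  # ['matcha-bas-950_100', 'matcha-base-1000']
```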
## Usage with the Original Repository

This repository provides model weights and HuggingFace integration. For training, evaluation, and advanced features, use the main repository.

```python
import torch
from matcha.models.matcha_tts import MatchaTTS

# After resolving checkpoint paths with ModelLoader
model = MatchaTTS.load_from_checkpoint(model_info['model_path'])
# ... continue with the original inference code
```
## Requirements

- Python 3.10
- The main Matcha-TTS repository (for inference)
- HuggingFace Hub (for model downloading)
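A missing `espeak-ng` binary or a too-old Python only surfaces as a confusing error at inference time, so a preflight check can fail fast. A small sketch (the tool list mirrors the requirements above; the helper is illustrative, not part of the repository):

```python
import shutil
import sys

def missing_requirements(min_python=(3, 10), tools=("espeak-ng",)):
    """Return human-readable descriptions of unmet requirements."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python >= {min_python[0]}.{min_python[1]} required")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"missing executable: {tool}")
    return problems

# A tool that certainly does not exist is reported as missing.
print(missing_requirements(min_python=(0, 0), tools=("no-such-tool",)))
# ['missing executable: no-such-tool']
```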
## License

Same as the original Matcha-TTS repository.
## Citation

If you use this Romanian adaptation in your research, please cite:

```bibtex
@ARTICLE{11269795,
  author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
  journal={IEEE Access},
  title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
  year={2025},
  volume={13},
  pages={203415-203428},
  doi={10.1109/ACCESS.2025.3637322}
}
```
Original Matcha-TTS citation:

```bibtex
@inproceedings{mehta2024matcha,
  title={Matcha-{TTS}: A fast {TTS} architecture with conditional flow matching},
  author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
  booktitle={Proc. ICASSP},
  year={2024}
}
```
## Links

- Main Repository - training, documentation, and research details
- Original Matcha-TTS - base architecture and paper