---
license: apache-2.0
language:
- ro
pipeline_tag: text-to-speech
tags:
- tts
- romanian
- matcha-tts
- conditional-flow-matching
- swara
library_name: pytorch
datasets:
- SWARA-1.0
---

# Matcha-TTS Romanian Models

Pre-trained Romanian text-to-speech models based on [Matcha-TTS](https://github.com/shivammehta25/Matcha-TTS), trained on the SWARA 1.0 dataset.

## Quick Start

### Clone Repository

Since this repository contains custom inference code and model-loading utilities, you need to clone it:

```bash
# Clone from the HuggingFace Hub
git clone https://huggingface.co/adrianstanea/Ro-Matcha-TTS
cd Ro-Matcha-TTS

# Install Git LFS (if not already installed) to download the large model files
git lfs install
git lfs pull
```

### Installation

```bash
# Install system dependencies (required for phonemization)
sudo apt-get install espeak-ng

# Install the main Matcha-TTS repository
pip install git+https://github.com/adrianstanea/Matcha-TTS.git

# Install the remaining dependencies
pip install -r requirements.txt
```

### Usage

```python
import sys
sys.path.append("src")

from model_loader import ModelLoader

# Load from the local cloned repository
loader = ModelLoader.from_pretrained("./")

# List available models
print(loader.list_models())
# {'swara': {...}, 'bas_10': {...}, 'bas_950': {...}, ...}

# Load the production-ready BAS speaker
model_info = loader.load_models(model="bas_950")
print(f"Model: {model_info['model_name']}")
print(f"Path: {model_info['model_path']}")

# Load the few-shot SGS speaker
model_info = loader.load_models(model="sgs_10")
print(f"Training data: {model_info['model_info']['training_data']}")

# Use with the original Matcha-TTS inference code
# See examples/inference_example.py for complete usage
```

### Run Example

```bash
cd examples
python inference_example.py
```

## Available Models

### Baseline Model

| Model     | Type     | Description                                           |
| --------- | -------- | ----------------------------------------------------- |
| **swara** | Baseline | Speaker-agnostic model trained on the full SWARA dataset |

### Fine-tuned Speaker Models

| Model       | Speaker    | Training Samples | Fine-tune Epochs | Use Case                         |
| ----------- | ---------- | ---------------- | ---------------- | -------------------------------- |
| **bas_10**  | BAS (Male) | 10 samples       | 100              | Few-shot learning / Low-resource |
| **bas_950** | BAS (Male) | 950 samples      | 100              | Production-ready speaker         |
| **sgs_10**  | SGS (Male) | 10 samples       | 100              | Few-shot learning / Low-resource |
| **sgs_950** | SGS (Male) | 950 samples      | 100              | Production-ready speaker         |

**Vocoder**: Universal HiFi-GAN vocoder

### Research Methodology

- **Training Strategy**: Baseline → Speaker Fine-tuning (100 epochs)
- **Data Efficiency Study**: 10 vs. 950 samples comparison
- **Low-Resource Learning**: Demonstrates few-shot TTS adaptation

## Model Details

- **Architecture**: Matcha-TTS (Conditional Flow Matching)
- **Dataset**: SWARA 1.0 Romanian Speech Corpus
- **Sample Rate**: 22,050 Hz
- **Language**: Romanian (ro)
- **Text Processing**: eSpeak Romanian phonemizer
- **Model Size**: ~100M parameters per model

## Repository Structure

```
├── models/                          # Model checkpoints (Git LFS)
│   ├── swara/
│   │   └── matcha-base-1000.ckpt    # Baseline model (1000 epochs)
│   ├── bas/
│   │   ├── matcha-bas-10_100.ckpt   # BAS speaker (10 samples, 100 epochs)
│   │   └── matcha-bas-950_100.ckpt  # BAS speaker (950 samples, 100 epochs)
│   ├── sgs/
│   │   ├── matcha-sgs-10_100.ckpt   # SGS speaker (10 samples, 100 epochs)
│   │   └── matcha-sgs-950_100.ckpt  # SGS speaker (950 samples, 100 epochs)
│   └── vocoder/
│       └── hifigan_univ_v1          # Universal HiFi-GAN vocoder
├── configs/
│   └── config.json                  # Model configuration
├── src/
│   └── model_loader.py              # HuggingFace-compatible loader
└── examples/
    ├── sample_texts_ro.txt          # Sample Romanian texts
    └── inference_example.py         # Complete usage example
```

## Usage with Original Repository

This repository provides model weights and HuggingFace integration.
For training, evaluation, and advanced features, use the [main repository](https://github.com/adrianstanea/Matcha-TTS).

```python
# After loading models with ModelLoader
import torch

from matcha.models.matcha_tts import MatchaTTS

# Load using the checkpoint path returned by ModelLoader
model = MatchaTTS.load_from_checkpoint(model_info['model_path'])

# ... continue with the original inference code
```

## Requirements

- Python 3.10
- Main Matcha-TTS repository for inference
- HuggingFace Hub for model downloading

## License

Same as the original [Matcha-TTS repository](https://github.com/adrianstanea/Matcha-TTS).

## Citation

If you use this Romanian adaptation in your research, please cite:

```bibtex
@ARTICLE{11269795,
  author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
  journal={IEEE Access},
  title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
  year={2025},
  volume={13},
  pages={203415-203428},
  keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
  doi={10.1109/ACCESS.2025.3637322}
}
```

**Original Matcha-TTS Citation:**

```bibtex
@inproceedings{mehta2024matcha,
  title={Matcha-{TTS}: A fast {TTS} architecture with conditional flow matching},
  author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
  booktitle={Proc. ICASSP},
  year={2024}
}
```

## Links

- [Main Repository](https://github.com/adrianstanea/Matcha-TTS) - Training, documentation, and research details
- [Original Matcha-TTS](https://github.com/shivammehta25/Matcha-TTS) - Base architecture and paper
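## Appendix: Illustrative Loader Sketch

The Quick Start relies on `ModelLoader` resolving model names (`swara`, `bas_950`, `sgs_10`, ...) to checkpoint paths via `from_pretrained`, `list_models`, and `load_models`. The actual implementation lives in `src/model_loader.py`; the sketch below is a minimal, hypothetical reimplementation of that pattern. The JSON field names (`checkpoint`, `training_data`) are assumptions for illustration, not the real schema of `configs/config.json`.

```python
import json
import os


class SimpleModelLoader:
    """Hypothetical name-to-checkpoint registry mirroring the ModelLoader API."""

    def __init__(self, repo_dir, registry):
        self.repo_dir = repo_dir
        self.registry = registry  # model name -> metadata dict

    @classmethod
    def from_pretrained(cls, repo_dir):
        # Read the registry from the cloned repository's config file
        with open(os.path.join(repo_dir, "configs", "config.json")) as f:
            registry = json.load(f)
        return cls(repo_dir, registry)

    def list_models(self):
        # Return the full registry, keyed by model name
        return self.registry

    def load_models(self, model):
        # Resolve a model name to an absolute checkpoint path plus metadata
        info = self.registry[model]
        return {
            "model_name": model,
            "model_path": os.path.join(self.repo_dir, info["checkpoint"]),
            "model_info": info,
        }
```

The returned `model_path` can then be passed to `MatchaTTS.load_from_checkpoint` as shown above.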