Matcha-TTS Romanian Models

Pre-trained Romanian text-to-speech (TTS) models based on Matcha-TTS, trained on the SWARA 1.0 dataset.

Quick Start

Clone Repository

Since this repository contains custom inference code and model loading utilities, you need to clone it:

# Clone from HuggingFace Hub
git clone https://huggingface.co/adrianstanea/Ro-Matcha-TTS
cd Ro-Matcha-TTS

# Install Git LFS (if not already installed) to download large model files
git lfs install
git lfs pull

Installation

# Install system dependencies (required for phonemization)
sudo apt-get install espeak-ng

# Install the main Matcha-TTS repository
pip install git+https://github.com/adrianstanea/Matcha-TTS.git

# Install required dependencies
pip install -r requirements.txt

Usage

import sys
sys.path.append("src")
from model_loader import ModelLoader

# Load from local cloned repository
loader = ModelLoader.from_pretrained("./")

# List available models
print(loader.list_models())
# {'swara': {...}, 'bas_10': {...}, 'bas_950': {...}, ...}

# Load production-ready BAS speaker
model_info = loader.load_models(model="bas_950")
print(f"Model: {model_info['model_name']}")
print(f"Path: {model_info['model_path']}")

# Load few-shot SGS speaker
model_info = loader.load_models(model="sgs_10")
print(f"Training data: {model_info['model_info']['training_data']}")

# Use with original Matcha-TTS inference code
# See examples/inference_example.py for complete usage
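The dictionary returned by load_models can be mimicked with a small stand-in, which may help clarify the interface before the checkpoints are downloaded. This is a hypothetical sketch, not the real loader: the checkpoint paths are copied from the "Repository Structure" section below, and the actual loader presumably resolves them from configs/config.json.

```python
# Hypothetical stand-in for the ModelLoader interface shown above.
# Checkpoint paths are taken from the repository layout; the real
# loader in src/model_loader.py presumably reads configs/config.json.
REGISTRY = {
    "swara":   {"model_path": "models/swara/matcha-base-1000.ckpt",
                "model_info": {"training_data": "full SWARA corpus"}},
    "bas_950": {"model_path": "models/bas/matcha-bas-950_100.ckpt",
                "model_info": {"training_data": "950 BAS samples"}},
    "sgs_10":  {"model_path": "models/sgs/matcha-sgs-10_100.ckpt",
                "model_info": {"training_data": "10 SGS samples"}},
}

def load_models(model):
    # Return a copy of the registry entry, annotated with the model name,
    # mirroring the keys used in the Usage snippet above.
    info = dict(REGISTRY[model])
    info["model_name"] = model
    return info

print(load_models("bas_950")["model_path"])
# models/bas/matcha-bas-950_100.ckpt
```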

Run Example

cd examples
python inference_example.py

Available Models

Baseline Model

Model   Type       Description
swara   Baseline   Speaker-agnostic model trained on the full SWARA dataset

Fine-tuned Speaker Models

Model     Speaker      Training Samples   Fine-tune Epochs   Use Case
bas_10    BAS (Male)   10                 100                Few-shot learning / Low-resource
bas_950   BAS (Male)   950                100                Production-ready speaker
sgs_10    SGS (Male)   10                 100                Few-shot learning / Low-resource
sgs_950   SGS (Male)   950                100                Production-ready speaker

Vocoder: Universal HiFi-GAN vocoder

Research Methodology

  • Training Strategy: Baseline → Speaker Fine-tuning (100 epochs)
  • Data Efficiency Study: 10 vs 950 samples comparison
  • Low-Resource Learning: Demonstrates few-shot TTS adaptation
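As a back-of-the-envelope comparison of the two regimes, the number of optimisation steps at fixed epochs scales with the sample count. The batch size of 16 below is an illustrative assumption (only the sample counts and epoch counts are documented above):

```python
import math

def finetune_steps(n_samples, epochs, batch_size=16):
    # One optimisation step per batch, `epochs` full passes over the data.
    # batch_size=16 is an assumed value for illustration only.
    return math.ceil(n_samples / batch_size) * epochs

few_shot = finetune_steps(10, 100)    # bas_10 / sgs_10 regime
full     = finetune_steps(950, 100)   # bas_950 / sgs_950 regime
print(few_shot, full)
# 100 6000
```

Under this assumption the few-shot models see roughly 60x fewer gradient updates, which is what makes the 10-sample adaptation result interesting.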

Model Details

  • Architecture: Matcha-TTS (Conditional Flow Matching)
  • Dataset: SWARA 1.0 Romanian Speech Corpus
  • Sample Rate: 22,050 Hz
  • Language: Romanian (ro)
  • Text Processing: eSpeak Romanian phonemizer
  • Model Size: ~100M parameters per model
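At inference time, a conditional flow matching decoder generates a mel-spectrogram by integrating a learned vector field from Gaussian noise (t = 0) to data (t = 1) with a fixed-step Euler ODE solver. A minimal NumPy sketch of that sampling loop, with a toy vector field standing in for the trained decoder network (the 80 x 100 shape mimics 80 mel bins over 100 frames; all numbers here are illustrative):

```python
import numpy as np

def euler_ode_sample(v_field, x0, n_timesteps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = x0.copy()
    ts = np.linspace(0.0, 1.0, n_timesteps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * v_field(x, t0)
    return x

# Toy vector field that flows every sample toward a fixed "mel" target;
# in the real model this is the trained, text-conditioned decoder.
target = np.full((80, 100), -4.0)            # 80 mel bins x 100 frames
v_field = lambda x, t: target - x

rng = np.random.default_rng(0)
x0 = rng.standard_normal((80, 100))          # Gaussian noise at t = 0
mel = euler_ode_sample(v_field, x0, n_timesteps=10)
```

Few Euler steps suffice because the learned flow is close to straight, which is the source of Matcha-TTS's fast synthesis.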

Repository Structure

├── models/                          # Model checkpoints (Git LFS)
│   ├── swara/
│   │   └── matcha-base-1000.ckpt   # Baseline model (1000 epochs)
│   ├── bas/
│   │   ├── matcha-bas-10_100.ckpt  # BAS speaker (10 samples, 100 epochs)
│   │   └── matcha-bas-950_100.ckpt # BAS speaker (950 samples, 100 epochs)
│   ├── sgs/
│   │   ├── matcha-sgs-10_100.ckpt  # SGS speaker (10 samples, 100 epochs)
│   │   └── matcha-sgs-950_100.ckpt # SGS speaker (950 samples, 100 epochs)
│   └── vocoder/
│       └── hifigan_univ_v1         # Universal HiFi-GAN vocoder
├── configs/
│   └── config.json                  # Model configuration
├── src/
│   └── model_loader.py              # HuggingFace-compatible loader
└── examples/
    ├── sample_texts_ro.txt          # Sample Romanian texts
    └── inference_example.py         # Complete usage example

Usage with Original Repository

This repository provides model weights and HuggingFace integration. For training, evaluation, and advanced features, use the main repository.

# After resolving checkpoint paths with ModelLoader (see Usage above)
import torch
from matcha.models.matcha_tts import MatchaTTS

# Load the selected checkpoint; map_location avoids requiring a GPU
model = MatchaTTS.load_from_checkpoint(model_info['model_path'], map_location="cpu")
model.eval()
# ... continue with the original Matcha-TTS inference code
# (text processing, model.synthesise, HiFi-GAN vocoding)

Requirements

  • Python 3.10
  • Main Matcha-TTS repository for inference
  • HuggingFace Hub for model downloading

License

Same as the original Matcha-TTS repository.

Citation

If you use this Romanian adaptation in your research, please cite:

@ARTICLE{11269795,
  author={Răgman, Teodora and Bogdan Stănea, Adrian and Cucu, Horia and Stan, Adriana},
  journal={IEEE Access},
  title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
  year={2025},
  volume={13},
  number={},
  pages={203415-203428},
  keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
  doi={10.1109/ACCESS.2025.3637322}
}

Original Matcha-TTS Citation:

@inproceedings{mehta2024matcha,
  title={Matcha-{TTS}: A fast {TTS} architecture with conditional flow matching},
  author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
  booktitle={Proc. ICASSP},
  year={2024}
}
