Initial upload of Romanian Matcha-TTS models

- Add BAS and SGS speaker models plus the baseline model trained on SWARA 1.0
- Include universal HiFi-GAN vocoder for Romanian TTS
- Provide HuggingFace-compatible model loader
- Add inference examples and configuration files

Signed-off-by: adrianstanea <adrianstanea1@gmail.com>

Files changed (16)

.gitattributes +9 -0
README.md +197 -3
configs/config.json +83 -0
configs/speaker_config.json +56 -0
examples/inference_example.py +200 -0
examples/sample_texts_ro.txt +9 -0
models/bas/matcha-bas-10_100.ckpt +3 -0
models/bas/matcha-bas-950_100.ckpt +3 -0
models/sgs/matcha-sgs-10_100.ckpt +3 -0
models/sgs/matcha-sgs-950_100.ckpt +3 -0
models/swara/matcha-base-1000.ckpt +3 -0
models/vocoder/hifigan_univ_v1 +3 -0
requirements.txt +16 -0
src/__init__.py +8 -0
src/model_loader.py +202 -0
test.py +22 -0

.gitattributes CHANGED

@@ -1,3 +1,12 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text

+# Git LFS configuration for model files
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+# Vocoder file (without extension in this case)
+models/vocoder/hifigan_univ_v1 filter=lfs diff=lfs merge=lfs -text
+# All model checkpoint files in subdirectories
+models/**/*.ckpt filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,197 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+language:
+- ro
+pipeline_tag: text-to-speech
+tags:
+- tts
+- romanian
+- matcha-tts
+- conditional-flow-matching
+- swara
+library_name: pytorch
+datasets:
+- SWARA-1.0
+---
+# Matcha-TTS Romanian Models
+Pre-trained Romanian text-to-speech models based on [Matcha-TTS](https://github.com/shivammehta25/Matcha-TTS) and trained on the SWARA 1.0 dataset.
+## Quick Start
+### Clone Repository
+Since this repository contains custom inference code and model loading utilities, you need to clone it:
+```bash
+# Clone from HuggingFace Hub
+git clone https://huggingface.co/adrianstanea/Ro-Matcha-TTS
+cd Ro-Matcha-TTS
+# Install Git LFS (if not already installed) to download large model files
+git lfs install
+git lfs pull
+```
+### Installation
+```bash
+# Install system dependencies (required for phonemization)
+sudo apt-get install espeak-ng
+# Install the main Matcha-TTS repository
+pip install git+https://github.com/adrianstanea/Matcha-TTS.git
+# Install required dependencies
+pip install -r requirements.txt
+```
+### Usage
+```python
+import sys
+sys.path.append("src")
+from model_loader import ModelLoader
+# Load from local cloned repository
+loader = ModelLoader.from_pretrained("./")
+# List available models
+print(loader.list_models())
+# {'swara': {...}, 'bas_10': {...}, 'bas_950': {...}, ...}
+# Load production-ready BAS speaker
+model_info = loader.load_models(model="bas_950")
+print(f"Model: {model_info['model_name']}")
+print(f"Path: {model_info['model_path']}")
+# Load few-shot SGS speaker
+model_info = loader.load_models(model="sgs_10")
+print(f"Training data: {model_info['model_info']['training_data']}")
+# Use with original Matcha-TTS inference code
+# See examples/inference_example.py for complete usage
+```
+### Run Example
+```bash
+cd examples
+python inference_example.py
+```
+## Available Models
+### Baseline Model
+| Model     | Type     | Description                                          |
+| --------- | -------- | ---------------------------------------------------- |
+| **swara** | Baseline | Speaker-agnostic model trained on full SWARA dataset |
+### Fine-tuned Speaker Models
+| Model       | Speaker    | Training Samples | Fine-tune Epochs | Use Case                         |
+| ----------- | ---------- | ---------------- | ---------------- | -------------------------------- |
+| **bas_10**  | BAS (Male) | 10 samples       | 100              | Few-shot learning / Low-resource |
+| **bas_950** | BAS (Male) | 950 samples      | 100              | Production-ready speaker         |
+| **sgs_10**  | SGS (Male) | 10 samples       | 100              | Few-shot learning / Low-resource |
+| **sgs_950** | SGS (Male) | 950 samples      | 100              | Production-ready speaker         |
+**Vocoder**: Universal HiFi-GAN vocoder
+### Research Methodology
+- **Training Strategy**: Baseline → Speaker Fine-tuning (100 epochs)
+- **Data Efficiency Study**: 10 vs 950 samples comparison
+- **Low-Resource Learning**: Demonstrates few-shot TTS adaptation
+## Model Details
+- **Architecture**: Matcha-TTS (Conditional Flow Matching)
+- **Dataset**: SWARA 1.0 Romanian Speech Corpus
+- **Sample Rate**: 22,050 Hz
+- **Language**: Romanian (ro)
+- **Text Processing**: eSpeak Romanian phonemizer
+- **Model Size**: ~100M parameters per model
+## Repository Structure
+```
+├── models/                          # Model checkpoints (Git LFS)
+│   ├── swara/
+│   │   └── matcha-base-1000.ckpt   # Baseline model (1000 epochs)
+│   ├── bas/
+│   │   ├── matcha-bas-10_100.ckpt  # BAS speaker (10 samples, 100 epochs)
+│   │   └── matcha-bas-950_100.ckpt # BAS speaker (950 samples, 100 epochs)
+│   ├── sgs/
+│   │   ├── matcha-sgs-10_100.ckpt  # SGS speaker (10 samples, 100 epochs)
+│   │   └── matcha-sgs-950_100.ckpt # SGS speaker (950 samples, 100 epochs)
+│   └── vocoder/
+│       └── hifigan_univ_v1         # Universal HiFi-GAN vocoder
+├── configs/
+│   ├── config.json                  # Model configuration
+│   └── speaker_config.json          # Speaker metadata and variants
+├── src/
+│   └── model_loader.py              # HuggingFace-compatible loader
+└── examples/
+    ├── sample_texts_ro.txt          # Sample Romanian texts
+    └── inference_example.py         # Complete usage example
+```
+## Usage with Original Repository
+This repository provides model weights and HuggingFace integration. For training, evaluation, and advanced features, use the [main repository](https://github.com/adrianstanea/Matcha-TTS).
+```python
+# After loading models with ModelLoader
+from matcha.models.matcha_tts import MatchaTTS
+import torch
+# Load using paths from ModelLoader
+model = MatchaTTS.load_from_checkpoint(model_info['model_path'])
+# ... continue with original inference code
+```
+## Requirements
+- Python 3.10
+- Main Matcha-TTS repository for inference
+- HuggingFace Hub for model downloading
+## License
+Same as the original [Matcha-TTS repository](https://github.com/adrianstanea/Matcha-TTS).
+## Citation
+If you use this Romanian adaptation in your research, please cite:
+```bibtex
+@ARTICLE{11269795,
+  author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
+  journal={IEEE Access},
+  title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
+  year={2025},
+  volume={13},
+  number={},
+  pages={203415-203428},
+  keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
+  doi={10.1109/ACCESS.2025.3637322}
+}
+```
+**Original Matcha-TTS Citation:**
+```bibtex
+@inproceedings{mehta2024matcha,
+  title={Matcha-{TTS}: A fast {TTS} architecture with conditional flow matching},
+  author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
+  booktitle={Proc. ICASSP},
+  year={2024}
+}
+```
+## Links
+- [Main Repository](https://github.com/adrianstanea/Matcha-TTS) - Training, documentation, and research details
+- [Original Matcha-TTS](https://github.com/shivammehta25/Matcha-TTS) - Base architecture and paper

configs/config.json ADDED

	@@ -0,0 +1,83 @@

+{
+  "model_type": "matcha-tts",
+  "language": "romanian",
+  "architecture": "conditional_flow_matching",
+  "dataset": "SWARA_1.0",
+  "sample_rate": 22050,
+  "hop_length": 256,
+  "win_length": 1024,
+  "n_mels": 80,
+  "n_fft": 1024,
+  "f_min": 0,
+  "f_max": 8000,
+  "mel_mean": -4.776196479797363,
+  "mel_std": 2.2280216217041016,
+  "phoneme_cleaners": ["romanian_cleaners"],
+  "text_processing": "espeak_romanian",
+  "training_methodology": {
+    "baseline": "Speaker-agnostic model trained on full SWARA dataset (21,299 samples)",
+    "fine_tuning": "Speaker-specific fine-tuning from baseline for 100 epochs",
+    "data_variants": "Comparison of 10 vs 950 samples for fine-tuning effectiveness"
+  },
+  "available_models": {
+    "swara": {
+      "type": "baseline",
+      "description": "Speaker-agnostic baseline trained on full SWARA dataset",
+      "training_data": "21,299 samples (all speakers)",
+      "epochs": 1000,
+      "file": "models/swara/matcha-base-1000.ckpt"
+    },
+    "bas_10": {
+      "type": "fine_tuned",
+      "speaker": "BAS",
+      "description": "BAS speaker fine-tuned with 10 samples",
+      "base_model": "swara",
+      "training_data": "10 samples",
+      "fine_tune_epochs": 100,
+      "file": "models/bas/matcha-bas-10_100.ckpt"
+    },
+    "bas_950": {
+      "type": "fine_tuned",
+      "speaker": "BAS",
+      "description": "BAS speaker fine-tuned with 950 samples",
+      "base_model": "swara",
+      "training_data": "950 samples",
+      "fine_tune_epochs": 100,
+      "file": "models/bas/matcha-bas-950_100.ckpt"
+    },
+    "sgs_10": {
+      "type": "fine_tuned",
+      "speaker": "SGS",
+      "description": "SGS speaker fine-tuned with 10 samples",
+      "base_model": "swara",
+      "training_data": "10 samples",
+      "fine_tune_epochs": 100,
+      "file": "models/sgs/matcha-sgs-10_100.ckpt"
+    },
+    "sgs_950": {
+      "type": "fine_tuned",
+      "speaker": "SGS",
+      "description": "SGS speaker fine-tuned with 950 samples",
+      "base_model": "swara",
+      "training_data": "950 samples",
+      "fine_tune_epochs": 100,
+      "file": "models/sgs/matcha-sgs-950_100.ckpt"
+    },
+    "vocoder": {
+      "type": "vocoder",
+      "description": "Universal HiFi-GAN vocoder for Romanian TTS",
+      "file": "models/vocoder/hifigan_univ_v1"
+    }
+  },
+  "default_model": "bas_950",
+  "research_variants": ["bas_10", "bas_950", "sgs_10", "sgs_950"],
+  "inference_defaults": {
+    "n_timesteps": 50,
+    "temperature": 0.667,
+    "length_scale": 0.95
+  }
+}
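
Review note: a quick consistency check over this config can catch dangling references before any checkpoint is loaded. The sketch below is not part of the commit. `check_config` is a hypothetical helper, and the inline dictionary is a trimmed copy of the JSON above, used only for illustration.

```python
from typing import Any, Dict, List


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems (empty when consistent)."""
    problems = []
    models = config.get("available_models", {})

    # The default model must be one of the declared entries.
    default = config.get("default_model")
    if default not in models:
        problems.append(f"default_model '{default}' is not in available_models")

    # Every research variant must also be declared.
    for name in config.get("research_variants", []):
        if name not in models:
            problems.append(f"research variant '{name}' is not in available_models")

    # Every entry needs a checkpoint path, and fine-tuned models need a known base.
    for name, info in models.items():
        if not info.get("file"):
            problems.append(f"model '{name}' has no 'file' entry")
        base = info.get("base_model")
        if base is not None and base not in models:
            problems.append(f"model '{name}' refers to unknown base_model '{base}'")

    return problems


# Trimmed copy of configs/config.json, for illustration only.
sample_config = {
    "available_models": {
        "swara": {"type": "baseline", "file": "models/swara/matcha-base-1000.ckpt"},
        "bas_950": {
            "type": "fine_tuned",
            "base_model": "swara",
            "file": "models/bas/matcha-bas-950_100.ckpt",
        },
        "vocoder": {"type": "vocoder", "file": "models/vocoder/hifigan_univ_v1"},
    },
    "default_model": "bas_950",
    "research_variants": ["bas_950"],
}

print(check_config(sample_config))
```

An empty list means the references line up. The same function could be run against the full file with `json.load` before calling the loader.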

configs/speaker_config.json ADDED

	@@ -0,0 +1,56 @@

+{
+  "speakers": {
+    "BAS": {
+      "name": "BAS",
+      "gender": "male",
+      "description": "Primary male speaker from SWARA dataset",
+      "total_samples": 1490,
+      "variants": {
+        "bas_10": {
+          "training_samples": 10,
+          "description": "Few-shot learning with minimal data",
+          "use_case": "Low-resource speaker adaptation"
+        },
+        "bas_950": {
+          "training_samples": 950,
+          "description": "High-quality speaker adaptation",
+          "use_case": "Production-ready speaker model"
+        }
+      }
+    },
+    "SGS": {
+      "name": "SGS",
+      "gender": "male",
+      "description": "Secondary male speaker from SWARA dataset",
+      "total_samples": 994,
+      "variants": {
+        "sgs_10": {
+          "training_samples": 10,
+          "description": "Few-shot learning with minimal data",
+          "use_case": "Low-resource speaker adaptation"
+        },
+        "sgs_950": {
+          "training_samples": 950,
+          "description": "High-quality speaker adaptation",
+          "use_case": "Production-ready speaker model"
+        }
+      }
+    }
+  },
+  "baseline": {
+    "swara": {
+      "type": "multi_speaker",
+      "description": "Speaker-agnostic baseline model",
+      "training_samples": 21299,
+      "speakers_included": ["BAS", "SGS", "FLO", "others"],
+      "use_case": "General Romanian TTS, speaker adaptation base"
+    }
+  },
+  "research_insights": {
+    "data_efficiency": "Compare 10 vs 950 samples for fine-tuning effectiveness",
+    "speaker_adaptation": "Baseline + fine-tuning approach for new speakers",
+    "low_resource": "Demonstrate few-shot learning capabilities (10 samples)"
+  }
+}
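
Review note: the "10 vs 950 samples" comparison is easier to read as a share of each speaker's data. The sketch below only does arithmetic on the counts in this file. It assumes `total_samples` is the number of utterances available for that speaker, which the config does not state explicitly, and `variant_fractions` is a hypothetical helper name.

```python
# Sample counts copied from configs/speaker_config.json.
speakers = {
    "BAS": {"total_samples": 1490, "variants": {"bas_10": 10, "bas_950": 950}},
    "SGS": {"total_samples": 994, "variants": {"sgs_10": 10, "sgs_950": 950}},
}


def variant_fractions(speakers):
    """Map each variant name to the share of its speaker's samples it uses."""
    fractions = {}
    for speaker in speakers.values():
        total = speaker["total_samples"]
        for variant, n_samples in speaker["variants"].items():
            fractions[variant] = n_samples / total
    return fractions


for variant, fraction in variant_fractions(speakers).items():
    print(f"{variant}: {fraction:.1%} of the speaker's samples")
```

Under that assumption, the 950-sample variants use roughly 64% of the BAS data and 96% of the SGS data, while the 10-sample variants use about 1% or less.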

examples/inference_example.py ADDED

	@@ -0,0 +1,200 @@

+"""
+Example usage of Romanian Matcha-TTS models with HuggingFace integration
+This script shows how to use the HuggingFace model loader with the original
+Matcha-TTS repository for inference.
+"""
+import sys
+import os
+import torch
+import soundfile as sf
+from pathlib import Path
+# Add the HuggingFace model loader to path
+sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
+# Import our model loader
+from model_loader import ModelLoader
+def load_matcha_dependencies():
+    """
+    Try to import Matcha-TTS dependencies
+    Make sure you have the main repository installed:
+    pip install git+https://github.com/adrianstanea/Matcha-TTS.git
+    """
+    try:
+        # Import from the original Matcha-TTS repository
+        from matcha.models.matcha_tts import MatchaTTS
+        from matcha.hifigan.models import Generator as HiFiGAN
+        from matcha.hifigan.config import v1
+        from matcha.hifigan.env import AttrDict
+        from matcha.hifigan.denoiser import Denoiser
+        from matcha.text import text_to_sequence
+        from matcha.utils.utils import intersperse
+        return {
+            'MatchaTTS': MatchaTTS,
+            'HiFiGAN': HiFiGAN,
+            'v1': v1,
+            'AttrDict': AttrDict,
+            'Denoiser': Denoiser,
+            'text_to_sequence': text_to_sequence,
+            'intersperse': intersperse
+        }
+    except ImportError as e:
+        print(f"Error importing Matcha-TTS dependencies: {e}")
+        print("Please install the main repository:")
+        print("pip install git+https://github.com/adrianstanea/Matcha-TTS.git")
+        return None
+def synthesize_romanian(text: str, model: str = "bas_950", repo_path: str = None):
+    """
+    Synthesize Romanian speech using HuggingFace model loader
+    Args:
+        text: Romanian text to synthesize
+        model: Model name (swara, bas_10, bas_950, sgs_10, sgs_950)
+        repo_path: Path to HuggingFace repo (local or repo ID)
+    """
+    # Load Matcha-TTS dependencies
+    matcha_deps = load_matcha_dependencies()
+    if matcha_deps is None:
+        return None
+    # Initialize model loader
+    if repo_path is None:
+        # Use local path relative to this script
+        repo_path = str(Path(__file__).parent.parent)
+    try:
+        loader = ModelLoader.from_pretrained(repo_path)
+        print(f"✓ Loaded model configuration from {repo_path}")
+    except Exception as e:
+        print(f"✗ Failed to load model configuration: {e}")
+        return None
+    # Get model paths and configuration
+    model_info = loader.load_models(model=model)
+    print(f"✓ Model info loaded: {model_info['model_name']}")
+    print(f"  Description: {model_info['model_info']['description']}")
+    print(f"  Training data: {model_info['model_info'].get('training_data', 'N/A')}")
+    device = torch.device(model_info['device'])
+    print(f"✓ Using device: {device}")
+    # Load TTS model
+    try:
+        model = matcha_deps['MatchaTTS'].load_from_checkpoint(
+            model_info['model_path'],
+            map_location=device,
+            weights_only=False  # Required for PyTorch 2.6+ to load OmegaConf configs
+        )
+        model.eval()
+        print(f"✓ Loaded TTS model from {model_info['model_path']}")
+    except Exception as e:
+        print(f"✗ Failed to load TTS model: {e}")
+        return None
+    # Load vocoder
+    try:
+        h = matcha_deps['AttrDict'](matcha_deps['v1'])
+        vocoder = matcha_deps['HiFiGAN'](h).to(device)
+        checkpoint = torch.load(model_info['vocoder_path'], map_location=device, weights_only=False)
+        vocoder.load_state_dict(checkpoint['generator'])
+        vocoder.eval()
+        vocoder.remove_weight_norm()
+        denoiser = matcha_deps['Denoiser'](vocoder, mode='zeros')
+        print(f"✓ Loaded vocoder from {model_info['vocoder_path']}")
+    except Exception as e:
+        print(f"✗ Failed to load vocoder: {e}")
+        return None
+    # Process text
+    print(f"Processing text: '{text}'")
+    try:
+        # Use Romanian cleaners
+        x = torch.tensor(
+            matcha_deps['intersperse'](
+                matcha_deps['text_to_sequence'](text, ['romanian_cleaners'])[0], 0
+            ),
+            dtype=torch.long,
+            device=device
+        )[None]
+        x_lengths = torch.tensor([x.shape[-1]], dtype=torch.long, device=device)
+        print("✓ Text processed successfully")
+    except Exception as e:
+        print(f"✗ Failed to process text: {e}")
+        return None
+    # Generate speech
+    print("Generating speech...")
+    try:
+        with torch.inference_mode():
+            # Synthesis parameters from config
+            params = model_info['inference_params']
+            output = model.synthesise(
+                x, x_lengths,
+                n_timesteps=params['n_timesteps'],
+                temperature=params['temperature'],
+                length_scale=params['length_scale']
+            )
+            # Convert to waveform
+            mel = output['mel']
+            audio = vocoder(mel).clamp(-1, 1)
+            audio = denoiser(audio.squeeze(0), strength=0.00025).cpu().squeeze()
+            print("✓ Speech generated successfully")
+            return audio.numpy(), model_info['config']['sample_rate']
+    except Exception as e:
+        print(f"✗ Failed to generate speech: {e}")
+        return None
+def main():
+    """Example usage"""
+    # Test with local repository path
+    repo_path = str(Path(__file__).parent.parent)  # Path to Ro-Matcha-TTS
+    # Sample Romanian texts
+    test_texts = [
+        "Bună ziua! Acesta este un test de sinteză vocală.",
+        "România are o cultură bogată și o istorie fascinantă.",
+        "Limba română face parte din familia limbilor romanice."
+    ]
+    # Test different models for research comparison
+    test_models = ["bas_10", "bas_950", "sgs_10", "sgs_950"]
+    # Test synthesis
+    output_dir = Path("generated_samples")
+    output_dir.mkdir(exist_ok=True)
+    for model in test_models:  # Run every model listed in test_models
+        print(f"\n{'='*50}")
+        print(f"Testing model: {model}")
+        print(f"{'='*50}")
+        for i, text in enumerate(test_texts):  # Synthesize every sample text
+            print(f"\nText {i+1}: {text}")
+            result = synthesize_romanian(
+                text=text,
+                model=model,
+                repo_path=repo_path
+            )
+            if result is not None:
+                audio, sr = result
+                output_file = output_dir / f"sample_{model}_{i+1}.wav"
+                sf.write(output_file, audio, sr)
+                print(f"✓ Saved audio to {output_file}")
+            else:
+                print(f"✗ Failed to generate audio for {model}")
+if __name__ == "__main__":
+    main()
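
Review note: the text-processing step wraps `text_to_sequence` in `intersperse(..., 0)`, which places a blank token (ID 0) around and between the phoneme IDs. The sketch below is a standalone reimplementation for illustration, assuming the helper behaves like the usual Glow-TTS-style function. It is not copied from `matcha.utils.utils`, and the phoneme IDs are made up.

```python
from typing import List


def intersperse(sequence: List[int], item: int) -> List[int]:
    """Place `item` before, between, and after the elements of `sequence`."""
    result = [item] * (len(sequence) * 2 + 1)
    result[1::2] = sequence
    return result


# Made-up phoneme IDs; real IDs come from text_to_sequence().
phoneme_ids = [12, 7, 31]
print(intersperse(phoneme_ids, 0))  # [0, 12, 0, 7, 0, 31, 0]
```

This is why `x_lengths` in the script is taken from the interspersed tensor: the model sees `2 * n + 1` tokens for `n` phoneme IDs.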

examples/sample_texts_ro.txt ADDED

	@@ -0,0 +1,9 @@

+Bună ziua! Acesta este un test de sinteză vocală în limba română.
+Sistemul de sinteză vocală poate genera vorbire naturală.
+Această tehnologie folosește inteligența artificială avansată.
+Vorbirea sintetizată sună foarte realistă și naturală.
+România are o cultură bogată și o istorie fascinantă.
+Carpații sunt o destinație turistică populară în România.
+Bucureștiul este capitala și cel mai mare oraș din România.
+Limba română face parte din familia limbilor romanice.
+Tehnologia de sinteză vocală continuă să se dezvolte rapid.

models/bas/matcha-bas-10_100.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e38088c0861ea27a3052c913519431d30a898eb103acc66fa76cbce2915c4266
+size 218842881

models/bas/matcha-bas-950_100.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d1685d670ed150a298af0ba101b8ba22dd4ad5c6e5802a3b00caa8ce9f320e98
+size 218842881

models/sgs/matcha-sgs-10_100.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4d5893deb965e7d3456d2be0af1e9974618df18af72c4772b91dc515a5bccf8a
+size 218842881

models/sgs/matcha-sgs-950_100.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:abb84ec60dda560da6ce5f029e17cc4a9effd591bc102c4241ce7f025304648a
+size 218842498

models/swara/matcha-base-1000.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8e18fe217b309d2bbf894b9d9ad7263252170f38556a76a1b0e4def95e2caeae
+size 218840902

models/vocoder/hifigan_univ_v1 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:771eaf4876485a35e25577563d390c262e23c2421e4a8c929eacfde34a5b7a60
+size 55788858
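
Review note: the checkpoint entries above are Git LFS pointer files, so each one records only a SHA-256 digest and a byte size. After `git lfs pull`, a downloaded file can be checked against its pointer. The sketch below is one way to do that with the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of this commit or of Git LFS.

```python
import hashlib
from pathlib import Path
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer: Dict[str, str]) -> bool:
    """Check a downloaded file against the pointer's size and SHA-256."""
    if path.stat().st_size != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Hash in 1 MiB chunks so large checkpoints are not read into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    algorithm, _, expected = pointer["oid"].partition(":")
    return algorithm == "sha256" and digest.hexdigest() == expected


# Pointer contents copied from models/vocoder/hifigan_univ_v1 above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:771eaf4876485a35e25577563d390c262e23c2421e4a8c929eacfde34a5b7a60
size 55788858
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["size"])
```

Calling `matches_pointer(Path("models/vocoder/hifigan_univ_v1"), pointer)` from the repository root would be expected to return `False` while the file is still the small pointer stub, since the size check fails first.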

requirements.txt ADDED

	@@ -0,0 +1,16 @@

+# Core requirements for model loading and inference
+torch>=1.13.0
+huggingface_hub>=0.16.0
+numpy>=1.21.0
+soundfile>=0.10.0
+# Required for full inference functionality
+# Install the main repository for complete TTS pipeline:
+# pip install git+https://github.com/adrianstanea/Matcha-TTS.git
+# Text processing dependencies (included in main repo)
+# phonemizer>=3.0.0  # Romanian text processing
+# tqdm>=4.0.0        # Progress bars
+# Note: The main repository includes all necessary dependencies
+# for loading and using these models with the original inference pipeline

src/__init__.py ADDED

	@@ -0,0 +1,8 @@

+"""
+Matcha-TTS Romanian: HuggingFace Integration
+"""
+from .model_loader import ModelLoader
+__version__ = "1.0.0"
+__all__ = ["ModelLoader"]

src/model_loader.py ADDED

	@@ -0,0 +1,202 @@

+"""
+HuggingFace-compatible model loader for Romanian Matcha-TTS
+"""
+import json
+import os
+import torch
+from pathlib import Path
+from typing import Optional, Dict, Any
+try:
+    from huggingface_hub import hf_hub_download
+    HF_AVAILABLE = True
+except ImportError:
+    HF_AVAILABLE = False
+class ModelLoader:
+    """
+    HuggingFace-compatible loader for Romanian Matcha-TTS models
+    Usage:
+        loader = ModelLoader.from_pretrained("adrianstanea/Ro-Matcha-TTS")
+        model_info = loader.load_models(model="bas_950")  # dict of paths and config
+    """
+    def __init__(self, repo_path: str):
+        """
+        Initialize with local repository path or HuggingFace repo ID
+        Args:
+            repo_path: Path to local repo or HuggingFace repo ID
+        """
+        self.repo_path = repo_path
+        self.config = self._load_config()
+    @classmethod
+    def from_pretrained(cls, repo_id: str, cache_dir: Optional[str] = None) -> "ModelLoader":
+        """
+        Load from HuggingFace Hub or local path
+        Args:
+            repo_id: HuggingFace repo ID (e.g., "adrianstanea/Ro-Matcha-TTS") or local path
+            cache_dir: Optional cache directory for downloads
+        Returns:
+            ModelLoader instance
+        """
+        if os.path.exists(repo_id):
+            # Local path
+            return cls(repo_id)
+        elif HF_AVAILABLE:
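+            # Download the full repository snapshot from HuggingFace Hub.
+            # Fetching only configs/config.json would leave the checkpoints
+            # missing, because get_model_path() resolves them relative to
+            # this local snapshot directory.
+            try:
+                from huggingface_hub import snapshot_download
+                repo_cache_path = snapshot_download(
+                    repo_id=repo_id,
+                    cache_dir=cache_dir
+                )
+                return cls(str(repo_cache_path))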
+            except Exception as e:
+                raise ValueError(f"Could not download from HuggingFace Hub: {e}")
+        else:
+            raise ImportError("huggingface_hub is required for downloading from HF Hub. Install with: pip install huggingface_hub")
+    def _load_config(self) -> Dict[str, Any]:
+        """Load model configuration"""
+        config_path = os.path.join(self.repo_path, "configs", "config.json")
+        if not os.path.exists(config_path):
+            raise FileNotFoundError(f"Config file not found at {config_path}")
+        with open(config_path, 'r', encoding='utf-8') as f:
+            return json.load(f)
+    def get_model_path(self, model: str = None) -> str:
+        """
+        Get path to model checkpoint for specified model
+        Args:
+            model: Model name (swara, bas_10, bas_950, sgs_10, sgs_950). If None, uses default.
+        Returns:
+            Path to model checkpoint (relative if repo_path is relative)
+        """
+        if model is None:
+            model = self.config["default_model"]
+        if model not in self.config["available_models"]:
+            available = list(self.config["available_models"].keys())
+            raise ValueError(f"Model '{model}' not available. Available: {available}")
+        model_file = self.config["available_models"][model]["file"]
+        model_path = os.path.join(self.repo_path, model_file)
+        if not os.path.exists(model_path):
+            # Try to download from HuggingFace if not local
+            if HF_AVAILABLE and not os.path.exists(self.repo_path):
+                try:
+                    model_path = hf_hub_download(
+                        repo_id=self.repo_path,  # Treat as repo_id if not local path
+                        filename=model_file
+                    )
+                except Exception as e:
+                    raise FileNotFoundError(f"Model file not found locally and could not download: {e}")
+            else:
+                raise FileNotFoundError(f"Model file not found: {model_path}")
+        return model_path
+    def get_vocoder_path(self) -> str:
+        """
+        Get path to vocoder checkpoint
+        Returns:
+            Path to vocoder checkpoint (relative if repo_path is relative)
+        """
+        vocoder_file = self.config["available_models"]["vocoder"]["file"]
+        vocoder_path = os.path.join(self.repo_path, vocoder_file)
+        if not os.path.exists(vocoder_path):
+            # Try to download from HuggingFace if not local
+            if HF_AVAILABLE and not os.path.exists(self.repo_path):
+                try:
+                    vocoder_path = hf_hub_download(
+                        repo_id=self.repo_path,
+                        filename=vocoder_file
+                    )
+                except Exception as e:
+                    raise FileNotFoundError(f"Vocoder file not found locally and could not download: {e}")
+            else:
+                raise FileNotFoundError(f"Vocoder file not found: {vocoder_path}")
+        return vocoder_path
+    def load_models(self, model: str = None, device: str = "auto"):
+        """
+        Load TTS model and vocoder for inference
+        NOTE: This returns paths for use with the original Matcha-TTS repository.
+        You'll need to import and use the original loading functions.
+        Args:
+            model: Model to load (swara, bas_10, bas_950, sgs_10, sgs_950)
+            device: Device to load on ("auto", "cpu", "cuda")
+        Returns:
+            Dict with model and vocoder paths and configurations
+        """
+        if device == "auto":
+            device = "cuda" if torch.cuda.is_available() else "cpu"
+        model_path = self.get_model_path(model)
+        vocoder_path = self.get_vocoder_path()
+        model_name = model or self.config["default_model"]
+        model_info = self.config["available_models"][model_name]
+        return {
+            "model_path": model_path,
+            "vocoder_path": vocoder_path,
+            "config": self.config,
+            "model_name": model_name,
+            "model_info": model_info,
+            "device": device,
+            "inference_params": self.config["inference_defaults"]
+        }
+    def list_models(self):
+        """List available models with details"""
+        models = {}
+        for name, info in self.config["available_models"].items():
+            if name != "vocoder":
+                models[name] = {
+                    "type": info["type"],
+                    "description": info["description"],
+                    "speaker": info.get("speaker", "multi_speaker"),
+                    "training_data": info.get("training_data", "N/A")
+                }
+        return models
+    def list_research_variants(self):
+        """List research comparison variants"""
+        return self.config["research_variants"]
+    def get_model_info(self, model: str = None):
+        """Get detailed information about a specific model"""
+        model_name = model or self.config["default_model"]
+        if model_name not in self.config["available_models"]:
+            raise ValueError(f"Model '{model_name}' not available")
+        return self.config["available_models"][model_name]
+    def get_sample_texts(self) -> list:
+        """Get Romanian sample texts for testing"""
+        return [
+            "Bună ziua! Acesta este un test de sinteză vocală în limba română.",
+            "Matcha-TTS funcționează foarte bine pentru limba română.",
+            "Sistemul de sinteză vocală poate genera vorbire naturală.",
+            "Această tehnologie folosește inteligența artificială avansată.",
+            "Vorbirea sintetizată sună foarte realistă și naturală."
+        ]
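
Review note: the core of `get_model_path` is a name-to-file lookup followed by an existence check. The sketch below isolates that logic so it can be exercised without real checkpoints. `resolve_checkpoint` is a hypothetical standalone function, not the method from this commit, and it uses `os.path.abspath` so that the returned path really is absolute.

```python
import os
import tempfile
from typing import Any, Dict, Optional


def resolve_checkpoint(repo_root: str, config: Dict[str, Any], model: Optional[str] = None) -> str:
    """Resolve a model name to an existing checkpoint path under repo_root."""
    name = model or config["default_model"]
    entries = config["available_models"]
    if name not in entries:
        raise ValueError(f"Model '{name}' not available. Available: {sorted(entries)}")
    path = os.path.abspath(os.path.join(repo_root, entries[name]["file"]))
    if not os.path.exists(path):
        raise FileNotFoundError(f"Model file not found: {path}")
    return path


# Build a throwaway directory that mimics the repository layout.
with tempfile.TemporaryDirectory() as repo_root:
    config = {
        "default_model": "bas_950",
        "available_models": {"bas_950": {"file": "models/bas/matcha-bas-950_100.ckpt"}},
    }
    checkpoint = os.path.join(repo_root, config["available_models"]["bas_950"]["file"])
    os.makedirs(os.path.dirname(checkpoint))
    open(checkpoint, "wb").close()  # empty stand-in for the real checkpoint

    resolved = resolve_checkpoint(repo_root, config)
    print(os.path.basename(resolved))
```

Keeping the lookup free of download logic makes the failure modes explicit: an unknown name raises `ValueError`, and a known name with a missing file raises `FileNotFoundError`.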

test.py ADDED

	@@ -0,0 +1,22 @@

+import sys
+sys.path.append("src")
+from model_loader import ModelLoader
+# Load from HuggingFace Hub (when available)
+loader = ModelLoader.from_pretrained("adrianstanea/Ro-Matcha-TTS")
+# Or load from local path
+loader = ModelLoader.from_pretrained("./")
+# List available models
+print(loader.list_models())
+# {'swara': {...}, 'bas_10': {...}, 'bas_950': {...}, ...}
+# Load production-ready BAS speaker
+model_info = loader.load_models(model="bas_950")
+print(f"Model: {model_info['model_name']}")
+print(f"Path: {model_info['model_path']}")
+# Load few-shot SGS speaker
+model_info = loader.load_models(model="sgs_10")
+print(f"Training data: {model_info['model_info']['training_data']}")