ZipVoice-CA: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching (now in Catalan!)
This repository contains the checkpoint of the fine-tuned ZipVoice-CA model, which synthesises speech in Catalan. Its evaluation metrics are listed below. For details on the training and evaluation of the model, please refer to its repository at https://github.com/ErikUPV/ZipVoice-CA. Please also visit the repository of the original ZipVoice model.
To get a feel for the model, click here to listen to some samples.
Performance Metrics
| Dataset | WER (%) ↓ | CER (%) ↓ | SIM-o ↑ | UTMOS ↑ |
|---|---|---|---|---|
| Common Voice 17 | 10.96 | 3.00 | 0.68 | 3.17 |
| Festcat | 7.31 | 2.56 | 0.65 | 3.46 |
| LaFrescat | 7.61 | 2.56 | 0.67 | 3.54 |
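For readers unfamiliar with the first two columns: WER and CER are conventionally defined as the edit (Levenshtein) distance between a reference transcript and an ASR transcript of the synthesised audio, divided by the reference length, at word and character level respectively. The sketch below illustrates that standard definition only. The function names are illustrative, and the ASR model, text normalisation, and scripts actually used to produce the table are documented in the training repository and may differ.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance counting substitutions, insertions, deletions."""
    # prev[j] is the distance between the ref prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent; assumes a non-empty reference."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent; spaces count as characters here."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("bon dia a tothom", "bon dia tothom")` counts one deleted word out of four reference words.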
Installation
1. Clone the repository
git clone https://github.com/erikupv/zipvoice-ca
cd zipvoice-ca
2. Environment Setup
We recommend using Conda to manage your dependencies and ensure a clean environment:
conda create -n ZipVoice python=3.11
conda activate ZipVoice
pip install -r requirements_zipvoice.txt
3. Download the Catalan Model
Use the Hugging Face CLI to download the fine-tuned checkpoint directly into your local models directory:
# pip install huggingface_hub
huggingface-cli download \
--local-dir models \
ebellob/ZipVoice-CA \
zipvoice_ca.pt
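The same download can also be scripted from Python with the `huggingface_hub` library. This is a minimal sketch rather than part of the repository: the helper name `download_checkpoint` is hypothetical, while the repository ID and file name are taken from the CLI command above.

```python
from pathlib import Path

REPO_ID = "ebellob/ZipVoice-CA"  # repository named in the CLI command above
FILENAME = "zipvoice_ca.pt"      # checkpoint file named in the CLI command above


def download_checkpoint(model_dir: str = "models") -> Path:
    """Fetch the checkpoint into model_dir unless it is already present."""
    target = Path(model_dir) / FILENAME
    if target.is_file():
        return target
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=model_dir)
    return Path(path)
```

Calling `download_checkpoint()` from the repository root should leave the checkpoint at `models/zipvoice_ca.pt`, the location the inference commands expect.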
Inference
To generate speech from a test.tsv file using the Catalan model, use the command below.
python3 -m zipvoice.bin.infer_zipvoice \
--model-name zipvoice \
--model-dir ./models \
--checkpoint-name zipvoice_ca.pt \
--tokenizer "espeak" \
--lang "ca" \
--test-list data_cat/raw/test.tsv \
--res-dir results/ \
--guidance-scale 1.0 \
--num-step 16
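The batch command reads one utterance per line from test.tsv. The upstream ZipVoice README describes each line as four tab-separated fields: `{wav_name}`, `{prompt_transcription}`, `{prompt_wav}`, `{text}`. Assuming this fork keeps that layout (worth confirming against the repository), such a file could be generated as sketched below. The helper name `write_test_list` is hypothetical.

```python
from pathlib import Path
from typing import Iterable, Tuple

# (wav_name, prompt_transcription, prompt_wav, text): this field order is an
# assumption taken from the upstream ZipVoice README.
Row = Tuple[str, str, str, str]


def write_test_list(rows: Iterable[Row], path: str) -> None:
    """Write one tab-separated line per utterance, with no header row."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", encoding="utf-8", newline="") as f:
        for row in rows:
            # A stray tab or newline inside a field would corrupt the columns.
            if any("\t" in field or "\n" in field for field in row):
                raise ValueError(f"tabs and newlines are not allowed in fields: {row!r}")
            f.write("\t".join(row) + "\n")
```

For instance, `write_test_list([("utt_0001", "Bon dia a tothom.", "prompts/speaker1.wav", "Aquesta frase és un exemple.")], "data_cat/raw/test.tsv")` writes a one-line list at the path used in the command above. The utterance name, prompt path, and sentences here are placeholders.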
For single-file inference, pass the prompt audio, its transcription, and the target text directly:
python3 -m zipvoice.bin.infer_zipvoice \
--model-name zipvoice \
--prompt-wav prompt.wav \
--prompt-text "I am the transcription of the prompt wav." \
--text "I am the text to be synthesized." \
--res-wav-path result.wav \
--model-dir ./models \
--checkpoint-name zipvoice_ca.pt \
--tokenizer "espeak" \
--lang "ca" \
--guidance-scale 1.0 \
--num-step 16
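Long multi-line shell commands are easy to break with a single missing line continuation. One way to sidestep that is to assemble the same invocation as an argument list and launch it from Python. This is a minimal sketch: the helper names `build_infer_command` and `synthesize` are hypothetical, and only flags that already appear in the commands above are used.

```python
import subprocess
import sys
from typing import List


def build_infer_command(prompt_wav: str, prompt_text: str, text: str,
                        res_wav_path: str, model_dir: str = "./models",
                        checkpoint_name: str = "zipvoice_ca.pt",
                        guidance_scale: float = 1.0,
                        num_step: int = 16) -> List[str]:
    """Return the single-file inference command as an argument list."""
    return [
        sys.executable, "-m", "zipvoice.bin.infer_zipvoice",
        "--model-name", "zipvoice",
        "--model-dir", model_dir,
        "--checkpoint-name", checkpoint_name,
        "--tokenizer", "espeak",
        "--lang", "ca",
        "--prompt-wav", prompt_wav,
        "--prompt-text", prompt_text,
        "--text", text,
        "--res-wav-path", res_wav_path,
        "--guidance-scale", str(guidance_scale),
        "--num-step", str(num_step),
    ]


def synthesize(prompt_wav: str, prompt_text: str, text: str,
               res_wav_path: str) -> None:
    """Run inference from the repository root; raises if the command fails."""
    cmd = build_infer_command(prompt_wav, prompt_text, text, res_wav_path)
    subprocess.run(cmd, check=True)
```

Because the arguments are passed as a list, no shell quoting or line continuation is involved, and a non-zero exit status surfaces as a `subprocess.CalledProcessError`.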
Acknowledgments
This model is a fine-tuned version of the original ZipVoice model.
License
This project is licensed under the Apache-2.0 License.
Model tree for ebellob/ZipVoice-CA
Base model: k2-fsa/ZipVoice