ZipVoice-CA: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching (now in Catalan!)

This repository contains the checkpoint of the fine-tuned ZipVoice-CA model, which synthesises high-quality speech in Catalan. Its metrics are listed below. For more information on the training and evaluation of the model, see its repository at https://github.com/ErikUPV/ZipVoice-CA. Please also visit the repository of the original ZipVoice model.

To get a feel for the model, click here to listen to some samples.

Performance Metrics

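| Dataset         | WER (%) ↓ | CER (%) ↓ | SIM-o ↑ | UTMOS ↑ |
|-----------------|-----------|-----------|---------|---------|
| Common Voice 17 | 10.96     | 3.00      | 0.68    | 3.17    |
| Festcat         | 7.31      | 2.56      | 0.65    | 3.46    |
| LaFrescat       | 7.61      | 2.56      | 0.67    | 3.54    |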

Installation

1. Clone the repository

git clone https://github.com/erikupv/zipvoice-ca
cd zipvoice-ca

2. Environment Setup

We recommend using Conda to manage your dependencies and ensure a clean environment:

conda create -n ZipVoice python=3.11
conda activate ZipVoice
pip install -r requirements_zipvoice.txt

3. Download the Catalan Model

Use the Hugging Face CLI to download the fine-tuned checkpoint directly into your local models directory:

# pip install huggingface_hub
huggingface-cli download \
  --local-dir models \
  ebellob/ZipVoice-CA \
  zipvoice_ca.pt
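
Before running inference, it can help to confirm that the checkpoint landed where the inference commands expect it. The following is a minimal sketch: `check_checkpoint` is a hypothetical helper, not part of this repository, and it only verifies that `zipvoice_ca.pt` exists and is non-empty in the given directory.

```shell
# Hypothetical helper (not part of the repository): verify that the
# downloaded checkpoint exists and is non-empty in the model directory.
check_checkpoint() {
  model_dir="${1:-models}"
  ckpt="$model_dir/zipvoice_ca.pt"
  if [ -s "$ckpt" ]; then
    echo "OK: $ckpt"
  else
    echo "Missing or empty: $ckpt" >&2
    return 1
  fi
}

# Check the default location used by the download command above.
check_checkpoint models || echo "Run the download step first."
```

If the check fails, re-run the download command and make sure `--local-dir` points at the same directory you later pass to `--model-dir`.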

Inference

To generate speech from a test.tsv file using the Catalan model, use the command below.

python3 -m zipvoice.bin.infer_zipvoice \
    --model-name zipvoice \
    --model-dir ./models \
    --checkpoint-name zipvoice_ca.pt \
    --tokenizer "espeak" \
    --lang "ca" \
    --test-list data_cat/raw/test.tsv \
    --res-dir results/ \
    --guidance-scale 1.0 \
    --num-step 16
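
The layout of the test list is not described here. The sketch below assumes the upstream ZipVoice convention of four tab-separated fields per line (output name, prompt transcription, prompt wav path, text to synthesise); please verify this against the upstream documentation before relying on it. The file name, wav paths, and sentences are hypothetical placeholders.

```shell
# Assumed layout (from upstream ZipVoice, not confirmed by this README):
#   wav_name <TAB> prompt_transcription <TAB> prompt_wav <TAB> text
out="test_example.tsv"   # hypothetical file name

# First utterance: creates (or overwrites) the file.
printf '%s\t%s\t%s\t%s\n' \
  "sample_001" "Bon dia, com esteu?" "prompts/speaker1.wav" \
  "Avui fa molt bon temps a Barcelona." > "$out"

# Second utterance: appended to the same file.
printf '%s\t%s\t%s\t%s\n' \
  "sample_002" "Benvinguts a casa." "prompts/speaker2.wav" \
  "El tren surt a les vuit del vespre." >> "$out"

# Sanity check: every line must have exactly four tab-separated fields.
awk -F'\t' 'NF != 4 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
            END { exit bad + 0 }' "$out"
```

The resulting file can then be passed to `--test-list` in place of `data_cat/raw/test.tsv`.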

For single-file inference, pass the prompt audio, its transcription, and the target text directly:

python3 -m zipvoice.bin.infer_zipvoice \
    --model-name zipvoice \
    --prompt-wav prompt.wav \
    --prompt-text "I am the transcription of the prompt wav." \
    --text "I am the text to be synthesized." \
    --res-wav-path result.wav \
    --model-dir ./models \
    --checkpoint-name zipvoice_ca.pt \
    --tokenizer "espeak" \
    --lang "ca" \
    --guidance-scale 1.0 \
    --num-step 16
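
To synthesise several sentences with the same reference voice, the single-file command can be wrapped in a loop. This is only a sketch: `synthesize_lines` and the `DRY_RUN` switch are hypothetical names introduced for illustration, and the flags are copied from the command above. With `DRY_RUN=1` (the default in this sketch) the commands are printed rather than executed.

```shell
# Hypothetical wrapper: run single-file inference once per line of a text file.
# DRY_RUN=1 prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

synthesize_lines() {
  sentences_file="$1"   # one sentence per line
  prompt_wav="$2"       # reference audio
  prompt_text="$3"      # transcription of the reference audio
  out_dir="$4"          # where the generated wav files go
  mkdir -p "$out_dir"
  i=0
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -z "$line" ]; then continue; fi   # skip blank lines
    i=$((i + 1))
    wav="$(printf '%s/utt_%03d.wav' "$out_dir" "$i")"
    # Build the argument list; flags mirror the single-file command.
    set -- python3 -m zipvoice.bin.infer_zipvoice \
      --model-name zipvoice \
      --model-dir ./models \
      --checkpoint-name zipvoice_ca.pt \
      --tokenizer espeak \
      --lang ca \
      --prompt-wav "$prompt_wav" \
      --prompt-text "$prompt_text" \
      --text "$line" \
      --res-wav-path "$wav" \
      --guidance-scale 1.0 \
      --num-step 16
    if [ "$DRY_RUN" = "1" ]; then
      # Printed without shell quoting, for inspection only.
      printf '%s ' "$@"; printf '\n'
    else
      # Keep the command from consuming the sentence file on stdin.
      "$@" < /dev/null
    fi
  done < "$sentences_file"
}
```

For example, `synthesize_lines sentences.txt prompt.wav "Transcripció del prompt." results/` prints one command per non-empty line; setting `DRY_RUN=0` beforehand runs them. Each call reloads the model, so for larger batches the `--test-list` mode is likely the better fit.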

Acknowledgments

This work is a fine-tuned version of the ZipVoice project.

License

This project is licensed under the Apache-2.0 License.
