---
datasets:
- classla/ParlaSpeech-CZ
language:
- cs
license: cc-by-nc-sa-4.0
pipeline_tag: text-to-speech
tags:
- text-to-speech
---

# ZipVoice

Here, we share **ZipVoice** models trained at our department on public **Czech** speech datasets.
We followed the recipes of the original ZipVoice model:
  - ZipVoice⚡: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching | [paper](https://arxiv.org/abs/2506.13053) | [HF repo](https://huggingface.co/k2-fsa/ZipVoice)


For instructions on using the models, see the original GitHub repository [ZipVoice](https://github.com/k2-fsa/ZipVoice) or our [Google Colab DEMO](https://colab.research.google.com/drive/143UCv_l6ns7Bsed1DmRVdfXW2YpQrMyn?usp=sharing).

# Models

## 1. zipvoice_cs_ParlaSpeech
- model type: ZipVoice
- training data: [ParlaSpeech-CZ.v1.0](https://huggingface.co/datasets/classla/ParlaSpeech-CZ) (1100 hours of parliamentary proceedings available in the Czech part of the [ParlaMint corpus](http://hdl.handle.net/11356/1859), automatically aligned with transcripts)
- trained from scratch
- the final model is a checkpoint averaged over epochs 51 to 60 (the range from 50, excluded, to 60)
- ▶️ [Google Colab DEMO](https://colab.research.google.com/drive/143UCv_l6ns7Bsed1DmRVdfXW2YpQrMyn?usp=sharing)
- 📜 License [CC-BY-NC-SA-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) (Non-commercial research use only).
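
To illustrate what "averaged over the epoch range from 50 (excluded) to 60" means, here is a minimal sketch in plain Python. It assumes a simple uniform mean over per-epoch checkpoints; the averaging script used in the actual ZipVoice recipe may compute the average differently. The helper names (`epochs_in_range`, `average_state_dicts`) and the toy parameter `layer.weight` are hypothetical and exist only for this example.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def epochs_in_range(start_excluded: int, end_included: int) -> List[int]:
    """Return the epochs in the half-open range (start_excluded, end_included]."""
    return list(range(start_excluded + 1, end_included + 1))


def average_state_dicts(state_dicts: List[StateDict]) -> StateDict:
    """Uniformly average parameters element-wise across checkpoints (assumed scheme)."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    n = len(state_dicts)
    averaged: StateDict = {}
    for name, first in state_dicts[0].items():
        sums = [0.0] * len(first)
        for sd in state_dicts:
            for i, value in enumerate(sd[name]):
                sums[i] += value
        averaged[name] = [s / n for s in sums]
    return averaged


# Epoch 50 is excluded, so ten checkpoints (epochs 51..60) take part.
epochs = epochs_in_range(50, 60)

# Toy checkpoints whose values depend on the epoch number, for illustration only.
toy_checkpoints = [{"layer.weight": [float(e), 2.0 * e]} for e in epochs]

avg = average_state_dicts(toy_checkpoints)
print(epochs)
print(avg)
```

With real checkpoints, the same idea would apply to tensors loaded from the per-epoch checkpoint files rather than to lists of floats.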

# Disclaimer
By using these models, you agree to inform listeners that the speech samples are synthesized by the models, unless you have permission to use the voice you synthesize. That is, before making synthesized speech public, you must either have permission from the speaker whose voice is cloned (granted directly or by license) or publicly state that the voice is synthesized.