---
license: cc-by-4.0
datasets:
- amphion/Emilia-Dataset
language:
- fr
base_model:
- ResembleAI/chatterbox
pipeline_tag: text-to-speech
tags:
- french
- audio
- speech
- tts
- fine-tuning
- chatterbox
- Emilia
- voice-cloning
- zero-shot
---

# Chatterbox TTS French 🥖

**Chatterbox TTS French** is a text-to-speech model fine-tuned for French. It was trained on high-quality voice data to produce natural, expressive speech.

<div align="center"><img width="400px" src="https://ih1.redbubble.net/image.5397735048.6235/bg,f8f8f8-flat,750x,075,f-pad,750x1000,f8f8f8.jpg" alt="baguette-france-tour-eiffel-image" /></div>

- 🔊 **Language**: French 🇫🇷
- 🗣️ **Training dataset**: [Emilia Dataset (FR branch)](https://huggingface.co/datasets/amphion/Emilia-Dataset)
- ⏱️ **Data quantity**: 1400 hours of audio

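## Usage Example

Here’s how to generate speech using Chatterbox-TTS French. The script loads the base Chatterbox model, replaces its T3 weights with the French checkpoint, and synthesizes a sample sentence without a reference audio prompt:
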
```python
from typing import Optional

import torch
import soundfile as sf
from chatterbox.tts import ChatterboxTTS
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Configuration
MODEL_REPO = "Thomcles/Chatterbox-TTS-French"
CHECKPOINT_FILENAME = "t3_cfg.safetensors"
OUTPUT_PATH = "output_cloned_voice.wav"
TEXT_TO_SYNTHESIZE = "Jean-Paul Sartre laisse à la postérité une œuvre considérable, tant littéraire que philosophique, ayant influencé à la fois la vie politique française d'après-guerre et les penseurs de son temps (Merleau-Ponty et Alain Badiou notamment)."

def get_device() -> str:
    # Prefer the GPU when one is available.
    return "cuda" if torch.cuda.is_available() else "cpu"

def download_checkpoint(repo: str, filename: str) -> str:
    # Returns the local path of the cached checkpoint file.
    return hf_hub_download(repo_id=repo, filename=filename)

def load_tts_model(repo: str, checkpoint_file: str, device: str) -> ChatterboxTTS:
    # Load the base Chatterbox model, then overwrite its T3 weights
    # with the French fine-tuned checkpoint.
    model = ChatterboxTTS.from_pretrained(device=device)
    checkpoint_path = download_checkpoint(repo, checkpoint_file)
    t3_state = load_file(checkpoint_path, device="cpu")
    model.t3.load_state_dict(t3_state)
    return model

def synthesize_speech(model: ChatterboxTTS, text: str, audio_prompt_path: Optional[str] = None, **kwargs) -> torch.Tensor:
    # audio_prompt_path=None synthesizes without a reference recording.
    with torch.inference_mode():
        return model.generate(
            text=text,
            audio_prompt_path=audio_prompt_path,
            **kwargs
        )

def save_audio(waveform: torch.Tensor, path: str, sample_rate: int) -> None:
    sf.write(path, waveform.squeeze().cpu().numpy(), sample_rate)

def main():
    print("Loading model...")
    device = get_device()
    model = load_tts_model(MODEL_REPO, CHECKPOINT_FILENAME, device)

    print(f"Generating speech on {device}...")
    wav = synthesize_speech(
        model,
        TEXT_TO_SYNTHESIZE,
        audio_prompt_path=None,
        exaggeration=0.5,
        temperature=0.6,
        cfg_weight=0.3
    )

    print(f"Saving output to: {OUTPUT_PATH}")
    save_audio(wav, OUTPUT_PATH, model.sr)
    print("Done.")

if __name__ == "__main__":
    main()
```

Here is the output:

<audio controls src="https://huggingface.co/Thomcles/Chatterbox-TTS-French/resolve/main/example.mp3">Your browser does not support audio.</audio>

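The usage example passes `audio_prompt_path=None`, so it does not exercise the voice-cloning use case listed in the model tags. As a rough sketch, the helper below collects the generation arguments in one place and checks that a reference recording exists before it is handed to the model. The function name `build_generation_kwargs` and its `reference_wav` parameter are hypothetical and not part of Chatterbox; the default values simply mirror those in the usage example.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def build_generation_kwargs(
    reference_wav: Optional[str] = None,
    exaggeration: float = 0.5,
    temperature: float = 0.6,
    cfg_weight: float = 0.3,
) -> Dict[str, Any]:
    """Collect keyword arguments for synthesize_speech() from the usage example.

    `reference_wav` is a hypothetical path to a short recording of the
    target speaker. None means no reference audio, as in the usage example.
    """
    # Fail early with a clear message instead of deep inside the model call.
    if reference_wav is not None and not Path(reference_wav).is_file():
        raise FileNotFoundError(f"Reference audio not found: {reference_wav}")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return {
        "audio_prompt_path": reference_wav,
        "exaggeration": exaggeration,
        "temperature": temperature,
        "cfg_weight": cfg_weight,
    }
```

The returned dictionary can be unpacked into the call from the usage example, for instance `synthesize_speech(model, TEXT_TO_SYNTHESIZE, **build_generation_kwargs("speaker.wav"))`, where `speaker.wav` stands in for your own recording.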
### Base model license

The base model is licensed under the MIT License.

- Base model: [Chatterbox](https://huggingface.co/ResembleAI/chatterbox)
- License: [MIT](https://choosealicense.com/licenses/mit/)

### Training data license

This model was fine-tuned using a dataset licensed under Creative Commons Attribution 4.0 (CC BY 4.0).

- Dataset: [Emilia](https://huggingface.co/datasets/amphion/Emilia-Dataset)
- License: [Creative Commons Attribution 4.0 International](https://choosealicense.com/licenses/cc-by-4.0/)

### Contact me

Interested in fine-tuning a TTS model for a specific language or building a multilingual voice solution? Don’t hesitate to reach out.