---
license: mit
language:
- es
tags:
- chatterbox
- text-to-speech
- tts
- multilingual
- single-language-tts
- voice-cloning
- chatterbox-v3
pipeline_tag: text-to-speech
base_model: ResembleAI/chatterbox
base_model_relation: finetune
---
<!-- chatterbox-space-link -->
> 🎙️ **Live demo:** Try this model in the [`ResembleAI/Chatterbox-Multilingual-TTS-es-es`](https://huggingface.co/spaces/ResembleAI/Chatterbox-Multilingual-TTS-es-es) Space.
<!-- chatterbox-space-link -->
# Chatterbox Multilingual: Spanish (Spain)
Chatterbox Multilingual: Spanish (Spain) is a dedicated single-language finetune in the **Chatterbox Multilingual V3 Single Language Pack**. It is optimized for Spanish as spoken in Spain, with language- and region-specific behavior for expressive text-to-speech and voice cloning.
Use this model when you need tighter control over Spanish (Spain) output quality than the broad multilingual checkpoint offers. For a single model that covers all supported languages, use [`ResembleAI/chatterbox`](https://huggingface.co/ResembleAI/chatterbox).
## Demo
Try the hosted demo Space: [`ResembleAI/Chatterbox-Multilingual-TTS-es-es`](https://huggingface.co/spaces/ResembleAI/Chatterbox-Multilingual-TTS-es-es).
## Files
- `t3_es_es.safetensors`: T3 state dict in safetensors format.
- `s3gen_v3.pt` / `s3gen_v3.safetensors`: V3 S3Gen speech decoder checkpoint.
- `grapheme_mtl_merged_expanded_v1.json`: multilingual tokenizer config.
## Language
- Locale: `es-ES`
- Chatterbox language ID: `es`
## Checkpoint Metadata
- Source step: `135500`
- Source checkpoint: `t3_135500.pth.tar`
- Tensor count: `292`
- Dtype: `float32`
- Text embedding shape: `(2454, 1024)`
- Speech embedding shape: `(8194, 1024)`
- Size: `2143990264` bytes (about 2.14 GB)
- SHA256: `d85844b13ea8cb45e95b8d84a55bcfeccb2d743035cf304ee3d778fc6be39546`
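
The size and SHA256 values above can be used to confirm that a downloaded checkpoint is intact. The following is a minimal sketch using only the Python standard library. It assumes the listed size and digest describe `t3_es_es.safetensors`, which this card does not state explicitly, and the helper names `sha256_of_file` and `verify_checkpoint` are hypothetical, not part of the Chatterbox code.

```python
import hashlib
from pathlib import Path

# Values copied from the Checkpoint Metadata list in this card.
# Assumption: they describe t3_es_es.safetensors.
EXPECTED_SIZE = 2143990264
EXPECTED_SHA256 = "d85844b13ea8cb45e95b8d84a55bcfeccb2d743035cf304ee3d778fc6be39546"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so a ~2 GB checkpoint is never held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected_size=EXPECTED_SIZE, expected_sha256=EXPECTED_SHA256):
    """Return True only if both the byte size and the SHA256 digest match."""
    path = Path(path)
    # The size check is cheap, so do it first and skip hashing on a mismatch.
    if path.stat().st_size != expected_size:
        return False
    return sha256_of_file(path) == expected_sha256.lower()
```

Calling `verify_checkpoint("t3_es_es.safetensors")` on a local copy returns `True` only when both values match. A `False` result usually points to an incomplete download or to a different file than the one the metadata describes.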
## Loader Notes
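This repository contains the Chatterbox Multilingual V3 single-language assets used by the linked demo Space. The T3 checkpoint is loaded with a multilingual text vocabulary of size `2454` and an S3 speech vocabulary of size `8194`, matching the first dimension of the text and speech embedding shapes listed under Checkpoint Metadata.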
The demo combines these model-specific assets with the shared Chatterbox inference code and companion assets needed for end-to-end speech generation.
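
Before wiring the checkpoint into a loader, it can help to confirm the tensor count, dtype, and embedding shapes without loading any weights. The sketch below reads only the safetensors header, which is an 8-byte little-endian length followed by a JSON table of tensor names, dtypes, and shapes. The function names are hypothetical. Because this card does not list tensor names, the sketch indexes tensors by shape rather than assuming any particular key.

```python
import json
import struct


def read_safetensors_header(path):
    """Read only the JSON header: an 8-byte little-endian length, then that many bytes of JSON."""
    with open(path, "rb") as handle:
        (header_len,) = struct.unpack("<Q", handle.read(8))
        header = json.loads(handle.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize(header):
    """Collect the tensor count, the sorted list of dtypes, and a shape -> names index."""
    by_shape = {}
    for name, info in header.items():
        by_shape.setdefault(tuple(info["shape"]), []).append(name)
    return {
        "tensor_count": len(header),
        "dtypes": sorted({info["dtype"] for info in header.values()}),
        "by_shape": by_shape,
    }
```

With `summary = summarize(read_safetensors_header("t3_es_es.safetensors"))`, the values to compare against this card are `summary["tensor_count"]` (listed as `292`), `summary["dtypes"]` (safetensors spells float32 as `"F32"`), and the names found under `summary["by_shape"].get((2454, 1024))` and `summary["by_shape"].get((8194, 1024))` for the text and speech embeddings.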