---
license: other
license_name: fish-audio-research
license_link: LICENSE
tags:
- text-to-speech
- tts
- voice-cloning
- speech-synthesis
language:
- en
- zh
---
# Fish Speech S2 Pro — Mirror
Mirror of the Fish Speech S2 Pro model by [Fish Audio](https://fish.audio).
**Original model:** [fishaudio/fish-speech-1.5](https://huggingface.co/fishaudio/fish-speech-1.5)
## Available Files
| File | Size | Description |
|---|---|---|
| `model.safetensors` | 9.12 GB | Main language model weights |
| `codec.pth` | 1.87 GB | Audio codec (encoder/decoder) |
| `config.json` | 1.86 KB | Model configuration |
| `tokenizer.json` | 12.2 MB | Tokenizer data |
| `tokenizer_config.json` | 861 KB | Tokenizer configuration |
| `special_tokens_map.json` | 102 KB | Special tokens mapping |
| `chat_template.jinja` | 4.12 KB | Chat template |
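
If you fetch the files manually, a quick completeness check can help catch a partial download. The following is a minimal sketch, not part of the official tooling: `missing_files` and `download_mirror` are hypothetical helper names, and the repository ID is left as a parameter because this README does not state it. The download helper assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function.

```python
from pathlib import Path

# File names taken from the "Available Files" table.
EXPECTED_FILES = [
    "model.safetensors",
    "codec.pth",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "chat_template.jinja",
]


def missing_files(model_dir):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def download_mirror(repo_id, local_dir):
    """Download only the listed files into local_dir.

    repo_id is an assumption supplied by the caller; huggingface_hub is
    imported lazily so the check above works without it installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        allow_patterns=EXPECTED_FILES,
    )
```

Calling `missing_files("path/to/model_dir")` returns an empty list when all seven files are present. It checks file names only, not sizes or checksums.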
## Model Details
Fish Speech is an open-source text-to-speech (TTS) model that supports voice cloning and multilingual speech synthesis. The S2 Pro variant offers improved quality and zero-shot voice cloning.
- **Architecture:** Qwen3-based language model + audio codec
- **Task:** Text-to-speech, voice cloning
- **Languages:** English, Chinese, Japanese, and more
- **Code:** [github.com/fishaudio/fish-speech](https://github.com/fishaudio/fish-speech)
## Usage with ComfyUI-FFMPEGA
This model is automatically downloaded and used by the [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA) extension for TTS and voice cloning features.
## License
**Fish Audio Research License**: see the [LICENSE](LICENSE) file.
- ✅ Free for research and non-commercial use
- ❌ Commercial use requires a separate license from [Fish Audio](https://fish.audio) (contact: business@fish.audio)