| --- |
| license: other |
| license_name: fish-audio-research |
| license_link: LICENSE |
| tags: |
| - text-to-speech |
| - tts |
| - voice-cloning |
| - speech-synthesis |
| language: |
| - en |
| - zh |
| --- |
| |
| # Fish Speech S2 Pro — Mirror |
|
|
| Mirror of the Fish Speech S2 Pro model by [Fish Audio](https://fish.audio). |
|
|
| **Original model:** [fishaudio/fish-speech-1.5](https://huggingface.co/fishaudio/fish-speech-1.5) |
|
|
| ## Available Files |
|
|
| | File | Size | Description | |
| |---|---|---| |
| | `model.safetensors` | 9.12 GB | Main language model weights | |
| | `codec.pth` | 1.87 GB | Audio codec (encoder/decoder) | |
| | `config.json` | 1.86 KB | Model configuration | |
| | `tokenizer.json` | 12.2 MB | Tokenizer data | |
| | `tokenizer_config.json` | 861 KB | Tokenizer configuration | |
| | `special_tokens_map.json` | 102 KB | Special tokens mapping | |
| | `chat_template.jinja` | 4.12 KB | Chat template | |
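
After a manual download, it can help to confirm that every file in the table above is actually present. The following is a minimal sketch using only the Python standard library. The file names are copied from the table; the `missing_files` helper and the example directory path are hypothetical and are not part of Fish Speech or ComfyUI-FFMPEGA.

```python
from pathlib import Path

# File names copied from the "Available Files" table.
EXPECTED_FILES = (
    "model.safetensors",
    "codec.pth",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "chat_template.jinja",
)


def missing_files(model_dir):
    """Return the expected file names that are absent from model_dir.

    Hypothetical helper: it only checks that each file exists,
    not that its size or contents are correct.
    """
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Example call (the path is an assumption; use your own download directory):
# print(missing_files("./fish-speech-s2-pro"))
```

An empty list means all seven files were found. The check does not verify file sizes or checksums, so a truncated download would still pass.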
|
|
| ## Model Details |
|
|
| Fish Speech is an open-source text-to-speech (TTS) model that supports voice cloning and multilingual speech synthesis. The S2 Pro variant offers improved synthesis quality and zero-shot voice cloning. |
|
|
| - **Architecture:** Qwen3-based language model + audio codec |
| - **Task:** Text-to-speech, voice cloning |
| - **Languages:** English, Chinese, Japanese, and more |
| - **Code:** [github.com/fishaudio/fish-speech](https://github.com/fishaudio/fish-speech) |
|
|
| ## Usage with ComfyUI-FFMPEGA |
|
|
| The [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA) extension downloads this model automatically and uses it for its TTS and voice cloning features. |
|
|
| ## License |
|
|
| **Fish Audio Research License**: see the [LICENSE](LICENSE) file. |
|
|
| - ✅ Free for research and non-commercial use |
| - ❌ Commercial use requires a separate license from [Fish Audio](https://fish.audio) (contact: business@fish.audio) |
|
|