+---
+license: other
+license_name: fish-audio-research
+license_link: LICENSE
+tags:
+  - text-to-speech
+  - tts
+  - voice-cloning
+  - speech-synthesis
+language:
+  - en
+  - zh
+---
+# Fish Speech S2 Pro — Mirror
+Mirror of the Fish Speech S2 Pro model by [Fish Audio](https://fish.audio).
+**Original model:** [fishaudio/fish-speech-1.5](https://huggingface.co/fishaudio/fish-speech-1.5)
+## Available Files
+| File | Size | Description |
+|---|---|---|
+| `model.safetensors` | 9.12 GB | Main language model weights |
+| `codec.pth` | 1.87 GB | Audio codec (encoder/decoder) |
+| `config.json` | 1.86 KB | Model configuration |
+| `tokenizer.json` | 12.2 MB | Tokenizer data |
+| `tokenizer_config.json` | 861 KB | Tokenizer configuration |
+| `special_tokens_map.json` | 102 KB | Special tokens mapping |
+| `chat_template.jinja` | 4.12 KB | Chat template |
+## Model Details
+Fish Speech is an open-source text-to-speech (TTS) model that supports voice cloning and multilingual speech synthesis. The S2 Pro variant targets higher output quality and supports zero-shot voice cloning.
+- **Architecture:** Qwen3-based language model + audio codec
+- **Task:** Text-to-speech, voice cloning
+- **Languages:** English, Chinese, Japanese, and more
+- **Code:** [github.com/fishaudio/fish-speech](https://github.com/fishaudio/fish-speech)
+## Usage with ComfyUI-FFMPEGA
+The [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA) extension downloads this model automatically and uses it for its TTS and voice cloning features.
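For use outside ComfyUI-FFMPEGA, the following is a minimal sketch of fetching the mirror by hand and checking that every file from the Available Files table arrived. It is not part of the extension. The repository id `AEmotionStudio/fish-speech-s2-pro` is an assumption inferred from this mirror's name, and the helper names `missing_files` and `fetch_mirror` are hypothetical. The download step assumes the `huggingface_hub` package is installed.

```python
from pathlib import Path

# Assumption: repository id inferred from this mirror's name; adjust if it differs.
REPO_ID = "AEmotionStudio/fish-speech-s2-pro"

# File names copied from the "Available Files" table.
EXPECTED_FILES = (
    "model.safetensors",
    "codec.pth",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "chat_template.jinja",
)


def missing_files(model_dir):
    """Return the expected files absent from model_dir, in table order."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def fetch_mirror(local_dir):
    """Download the mirror into local_dir, then report any missing files.

    Hypothetical helper: requires `pip install huggingface_hub`.
    """
    # Imported lazily so missing_files() works without the package installed.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=REPO_ID, local_dir=str(local_dir))
    return missing_files(local_dir)
```

Calling `fetch_mirror("models/fish-speech-s2-pro")` (the path is only an example) downloads roughly 11 GB according to the sizes in the table and returns an empty list when all seven files are present.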
+## License
+**Fish Audio Research License.** See the [LICENSE](LICENSE) file for the full terms.
+- ✅ Free for research and non-commercial use
+- ❌ Commercial use requires a separate license from [Fish Audio](https://fish.audio) (contact: business@fish.audio)