---
license: other
license_name: fish-audio-research
license_link: LICENSE
tags:
- text-to-speech
- tts
- voice-cloning
- speech-synthesis
language:
- en
- zh
---

# Fish Speech S2 Pro — Mirror

Mirror of the Fish Speech S2 Pro model by [Fish Audio](https://fish.audio).

**Original model:** [fishaudio/fish-speech-1.5](https://huggingface.co/fishaudio/fish-speech-1.5)

## Available Files

| File | Size | Description |
|---|---|---|
| `model.safetensors` | 9.12 GB | Main language model weights |
| `codec.pth` | 1.87 GB | Audio codec (encoder/decoder) |
| `config.json` | 1.86 KB | Model configuration |
| `tokenizer.json` | 12.2 MB | Tokenizer data |
| `tokenizer_config.json` | 861 KB | Tokenizer configuration |
| `special_tokens_map.json` | 102 KB | Special tokens mapping |
| `chat_template.jinja` | 4.12 KB | Chat template |

## Model Details

Fish Speech is an open-source text-to-speech (TTS) model that supports high-quality voice cloning and multilingual speech synthesis. The S2 Pro variant offers improved quality and zero-shot voice cloning.

- **Architecture:** Qwen3-based language model + audio codec
- **Task:** Text-to-speech, voice cloning
- **Languages:** English, Chinese, Japanese, and more
- **Code:** [github.com/fishaudio/fish-speech](https://github.com/fishaudio/fish-speech)

## Usage with ComfyUI-FFMPEGA

The [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA) extension downloads this model automatically and uses it for its TTS and voice cloning features.

## License

**Fish Audio Research License**: see the [LICENSE](LICENSE) file.

- ✅ Free for research and non-commercial use
- ❌ Commercial use requires a separate license from [Fish Audio](https://fish.audio) (contact: business@fish.audio)
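
## Checking a Local Copy

If the files are downloaded manually rather than through ComfyUI-FFMPEGA, it can help to confirm that every file from the Available Files table is present before loading the model. The snippet below is a minimal sketch using only the Python standard library. The helper name `missing_files` and the directory you pass to it are illustrative assumptions, not part of Fish Speech or ComfyUI-FFMPEGA, and the check covers file presence only, not size or integrity.

```python
from pathlib import Path

# File names copied from the "Available Files" table in this README.
EXPECTED_FILES = (
    "model.safetensors",
    "codec.pth",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "chat_template.jinja",
)


def missing_files(model_dir):
    """Return the expected file names that are absent from model_dir.

    Hypothetical helper: it only checks that each file exists,
    it does not validate sizes or contents.
    """
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_files("path/to/model_dir")` returns an empty list when all seven files are present, or the names of the files that still need to be downloaded.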