---
language:
- en
- multilingual
license: other
library_name: transformers
tags:
- text-to-speech
- tts
- voice-cloning
- multilingual
- zero-shot
- audio
- speech
datasets:
- multilingual-speech
metrics:
- mos
pipeline_tag: text-to-speech
---

# Sonus

A massively multilingual zero-shot text-to-speech synthesis system

## Overview

Sonus is a multilingual zero-shot text-to-speech (TTS) system that supports more than 600 languages. It is a 0.6B-parameter model with a diffusion language model-style architecture that generates 24 kHz speech and is designed for fast inference. It supports both voice cloning and voice design.

## Key Features

- **600+ Languages Supported**: Broad language coverage for zero-shot TTS
- **Voice Cloning**: High-quality voice cloning from short reference audio
- **Voice Design**: Control voices via speaker attributes (gender, age, pitch, accent, etc.)
- **Fine-grained Control**: Support for non-verbal symbols and pronunciation correction
- **Fast Inference**: Optimized for real-time and batch processing
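
Voice cloning starts from a short reference recording, so it can help to inspect a clip before passing it to the model. The sketch below uses only Python's standard `wave` module to report the duration, channel count, and sample rate of a PCM WAV file. The helper names and the 3 to 30 second bounds are hypothetical illustrations, not documented Sonus requirements.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


def is_usable_reference(info, min_s=3.0, max_s=30.0):
    """Check the clip length against illustrative bounds.

    min_s and max_s are assumptions chosen for this example;
    the model card does not state the accepted reference length.
    """
    return min_s <= info["duration_s"] <= max_s
```

A clip that fails the check can be trimmed or re-recorded before it is used as a cloning reference.
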
## Installation

```bash
pip install torch torchaudio
pip install transformers
```

## Quick Start

```python
import torch
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True lets transformers run the custom code shipped in the model repository
model = AutoModel.from_pretrained("comethrusws/sonus", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("comethrusws/sonus", trust_remote_code=True)

# Move the model to the GPU when one is available, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Input text for synthesis
text = "Hello, this is a test of voice synthesis."
```
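
The Quick Start ends before any audio is produced, and this card does not show the generation call. Once a waveform is available, it has to be written at the model's sampling rate, which the card lists as 24 kHz. The sketch below assumes the output is a mono sequence of floats in the range [-1.0, 1.0]; that format and the `save_wav` helper are assumptions made for illustration, not part of the Sonus API.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # sampling rate listed in the model card


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    # Clip out-of-range values so they do not overflow the 16-bit range
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    # Scale to signed 16-bit integers and pack them as little-endian
    pcm = struct.pack(
        f"<{len(clipped)}h",
        *(int(round(s * 32767)) for s in clipped),
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
```

If the model returns a 1-D `torch.Tensor`, it could be converted first with something like `waveform.detach().cpu().tolist()`.
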
## Model Specifications

- **Architecture**: Diffusion language model-style
- **Parameters**: 0.6B
- **Sampling Rate**: 24 kHz
- **Languages**: 600+
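
The figures above allow some quick back-of-the-envelope estimates. The sketch below computes the memory needed for the weights alone at common precisions and a real-time factor (synthesis time divided by audio duration) at 24 kHz. The function names are illustrative, and the memory estimate covers weights only, not activations or framework overhead.

```python
PARAMS = 0.6e9        # "0.6B" from the specifications
SAMPLE_RATE = 24_000  # 24 kHz from the specifications

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params, dtype):
    """Approximate weight storage in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def audio_seconds(num_samples, sample_rate=SAMPLE_RATE):
    """Duration of a waveform with num_samples samples."""
    return num_samples / sample_rate


def real_time_factor(synthesis_seconds, num_samples, sample_rate=SAMPLE_RATE):
    """Synthesis time per second of audio; below 1.0 is faster than real time."""
    return synthesis_seconds / audio_seconds(num_samples, sample_rate)


print(weight_memory_gb(PARAMS, "float32"))  # about 2.4 GB
print(weight_memory_gb(PARAMS, "float16"))  # about 1.2 GB
```

For example, producing 72,000 samples (3 seconds of audio at 24 kHz) in 1.5 seconds gives a real-time factor of 0.5.
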
## License

This project is available under a custom license.

- **Non-commercial use**: Free for personal projects, research, and educational purposes
- **Commercial use**: Requires explicit permission. Contact inquiry@sagea.space for licensing

See the LICENSE file for the full terms.

## Disclaimer

Users are prohibited from using this model for unauthorized voice cloning, impersonation, fraud, or any other illegal activity. Users are responsible for complying with applicable laws and ethical standards.