---
language:
- multilingual
license: other
library_name: transformers
tags:
- text-to-speech
- tts
- voice-cloning
- multilingual
- zero-shot
- audio
- speech
datasets:
- multilingual-speech
metrics:
- mos
pipeline_tag: text-to-speech
---

# Sonus

A massively multilingual zero-shot text-to-speech synthesis system

## Overview

Sonus is a multilingual zero-shot text-to-speech (TTS) system that supports more than 600 languages. Built on a diffusion language model-style architecture, it generates high-quality speech with fast inference and supports both voice cloning and voice design.

## Key Features

- **600+ Languages Supported**: Broad language coverage for zero-shot TTS
- **Voice Cloning**: High-quality voice cloning from short reference audio
- **Voice Design**: Control voices via speaker attributes (gender, age, pitch, accent, etc.)
- **Fine-grained Control**: Support for non-verbal symbols and pronunciation correction
- **Fast Inference**: Optimized for real-time and batch processing

## Installation

```bash
pip install torch torchaudio
pip install transformers
```

## Quick Start

### Basic Usage

```python
from transformers import AutoModel, AutoTokenizer
import torch

# trust_remote_code=True runs the custom model code shipped with the repository
model = AutoModel.from_pretrained("cortexsgea/sonus", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("cortexsgea/sonus", trust_remote_code=True)

# Load to device (falls back to CPU when no GPU is available)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Generate speech
text = "Hello, this is a test of voice synthesis."
# See documentation for full generation API
```

### Voice Cloning

```python
# Provide reference audio for voice cloning
# See API documentation for complete examples
```

## Model Specifications

- **Architecture**: Diffusion language model-style
- **Parameters**: 0.6B
- **Sampling Rate**: 24 kHz
- **Languages**: 600+

## License

This project is available under a custom license.

- **Non-commercial use**: Free for personal projects, research, and educational purposes
- **Commercial use**: Requires explicit permission. Contact inquiry@sagea.space for licensing inquiries.

See the LICENSE file for full terms.

## Disclaimer

Users are prohibited from using this model for unauthorized voice cloning, impersonation, fraud, or any illegal activities. Ensure compliance with applicable laws and ethical standards.
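
## Example: Saving 24 kHz Audio

The Model Specifications list a 24 kHz sampling rate, so any waveform produced by the model should be written with that rate in the file header. The generation API is not documented in this README, so the sketch below does not call Sonus at all. It uses a synthetic sine tone as a stand-in for model output and assumes the output is mono audio as floats in [-1.0, 1.0]. The helper names `float_to_pcm16` and `save_wav` are hypothetical and are not part of the Sonus API.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # Hz, taken from the Model Specifications section


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    # Clip out-of-range values so they cannot overflow the 16-bit range.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def save_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit PCM audio to a WAV file (hypothetical helper)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # assumption: mono output
        wav_file.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(float_to_pcm16(samples))


# Placeholder signal: one second of a 440 Hz tone. This is NOT Sonus output.
placeholder = [
    0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)
]
save_wav("placeholder_24khz.wav", placeholder)
```

Writing the wrong sample rate into the header makes playback sound too fast or too slow, so the rate is kept as a single named constant here.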