---
language:
- multilingual
license: other
library_name: transformers
tags:
- text-to-speech
- tts
- voice-cloning
- multilingual
- zero-shot
- audio
- speech
datasets:
- multilingual-speech
metrics:
- mos
pipeline_tag: text-to-speech
---
# Sonus
A massively multilingual zero-shot text-to-speech synthesis system
## Overview
Sonus is a multilingual zero-shot text-to-speech system that supports over 600 languages. It uses a diffusion language model-style architecture with 0.6B parameters, generates speech at a 24 kHz sampling rate, and is optimized for fast inference. It also supports voice cloning from reference audio and voice design through speaker attributes.
## Key Features
- **600+ Languages Supported**: Broad language coverage for zero-shot TTS
- **Voice Cloning**: High-quality voice cloning from short reference audio
- **Voice Design**: Control voices via speaker attributes (gender, age, pitch, accent, etc.)
- **Fine-grained Control**: Support for non-verbal symbols and pronunciation correction
- **Fast Inference**: Optimized for real-time and batch processing
## Installation
```bash
pip install torch torchaudio
pip install transformers
```
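After installing, it can help to confirm that the packages are importable before downloading any model weights. The snippet below is a minimal sketch that uses only the Python standard library. The helper name `check_packages` is hypothetical and is not part of Sonus or Transformers.

```python
from importlib.util import find_spec


def check_packages(names):
    """Return a dict mapping each top-level package name to True if it is importable."""
    # find_spec returns None when a top-level module cannot be found
    return {name: find_spec(name) is not None for name in names}


status = check_packages(["torch", "torchaudio", "transformers"])
for name, installed in status.items():
    print(f"{name}: {'ok' if installed else 'missing'}")
```

Any package reported as `missing` can be installed with the `pip` commands above.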
## Quick Start
### Basic Usage
```python
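import torch
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True allows the custom model code shipped in the repository to run
model = AutoModel.from_pretrained("cortexsgea/sonus", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("cortexsgea/sonus", trust_remote_code=True)

# Move the model to the GPU when one is available, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Generate speech
text = "Hello, this is a test of voice synthesis."
# See documentation for full generation API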
```
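Once the generation API has produced a waveform, the audio usually needs to be written to disk. The sketch below shows one way to do that with the standard library only. It assumes the output is available as a flat sequence of mono float samples in the range [-1.0, 1.0], which is an assumption about the model's output and not something this card specifies. The helper name `save_wav` is hypothetical; the 24 kHz default comes from the model specifications.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # sampling rate stated in the model card


def save_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples (assumed to lie in [-1.0, 1.0]) as 16-bit PCM."""
    ints = []
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against out-of-range values
        ints.append(int(round(clipped * 32767)))
    # WAV files store 16-bit samples as little-endian signed integers
    payload = struct.pack(f"<{len(ints)}h", *ints)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(payload)
```

If the model returns a tensor instead, converting it to a Python list first (for example with `.flatten().tolist()` on a CPU tensor) should make it compatible with this helper.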
### Voice Cloning
```python
# Provide reference audio for voice cloning
# See API documentation for complete examples
```
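Before passing a reference clip to the model, it can be useful to inspect its sample rate, channel count, and duration. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV files only. The helper name `describe_wav` is hypothetical. Comparing against 24 kHz is an assumption drawn from the model's stated sampling rate; this card does not say whether reference audio must match that rate or is resampled internally.

```python
import wave

MODEL_SAMPLE_RATE = 24_000  # from the model card; relevance to reference audio is assumed


def describe_wav(path):
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate": sample_rate,
            "channels": wav.getnchannels(),
            "duration_s": frames / sample_rate,
            "matches_model_rate": sample_rate == MODEL_SAMPLE_RATE,
        }
```

Calling `describe_wav("reference.wav")` on a clip (the file name here is a placeholder) gives a quick check that the recording is what you expect before attempting voice cloning.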
## Model Specifications
- **Architecture**: Diffusion language model-style
- **Parameters**: 0.6B
- **Sampling Rate**: 24 kHz
- **Languages**: 600+
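The parameter count and sampling rate above translate into rough resource figures. The following back-of-the-envelope sketch estimates weight storage for common data types and the number of samples in a clip. It covers the weights only; activations, any auxiliary components, and framework overhead are not included, so actual memory use will be higher. The function names are illustrative.

```python
PARAMETERS = 600_000_000  # 0.6B, from the specifications above
SAMPLE_RATE_HZ = 24_000   # 24 kHz, from the specifications above

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params, dtype):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def num_samples(duration_s, sample_rate=SAMPLE_RATE_HZ):
    """Number of audio samples in a clip of the given duration."""
    return round(duration_s * sample_rate)


print(weight_memory_gb(PARAMETERS, "float32"))  # 2.4
print(weight_memory_gb(PARAMETERS, "float16"))  # 1.2
print(num_samples(10))                          # 240000
```

In other words, the weights occupy roughly 2.4 GB in 32-bit precision or 1.2 GB in 16-bit precision, and ten seconds of output audio corresponds to 240,000 samples.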
## License
This project is available under a custom license.
- **Non-commercial use**: Free for personal projects, research, and educational purposes
- **Commercial use**: Requires explicit permission. Contact inquiry@sagea.space for licensing inquiries.
See the LICENSE file for full terms.
## Disclaimer
Users are prohibited from using this model for unauthorized voice cloning, impersonation, fraud, or any illegal activities. Ensure compliance with applicable laws and ethical standards.