---
language:
- en
- multilingual
license: other
library_name: transformers
tags:
- text-to-speech
- tts
- voice-cloning
- multilingual
- zero-shot
- audio
- speech
datasets:
- multilingual-speech
metrics:
- mos
pipeline_tag: text-to-speech
---
# Sonus
A massively multilingual zero-shot text-to-speech synthesis system
## Overview
Sonus is a multilingual zero-shot text-to-speech synthesis system that supports more than 600 languages. It generates speech for a target voice without per-speaker fine-tuning, supports both voice cloning from reference audio and voice design from speaker attributes, and is optimized for fast inference.
## Key Features
- **600+ Languages Supported**: Broad language coverage for zero-shot TTS
- **Voice Cloning**: High-quality voice cloning from short reference audio
- **Voice Design**: Control voices via speaker attributes (gender, age, pitch, accent, etc.)
- **Fine-grained Control**: Support for non-verbal symbols and pronunciation correction
- **Fast Inference**: Optimized for real-time and batch processing
## Installation
```bash
pip install torch torchaudio
pip install transformers
```
## Quick Start
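```python
import torch
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True runs the custom model code shipped in the repository
model = AutoModel.from_pretrained("comethrusws/sonus", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("comethrusws/sonus", trust_remote_code=True)

# Load to the GPU when one is available, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

# Input text for speech generation
text = "Hello, this is a test of voice synthesis."
```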
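Once a waveform is available, it can be written to disk at the model's 24 kHz sampling rate. The sketch below assumes mono float samples in the range [-1.0, 1.0] and uses only the Python standard library. Because the synthesis call itself is not shown here, a one-second sine tone stands in for model output; `write_wav` and the file name `sonus_placeholder.wav` are hypothetical names chosen for illustration.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # 24 kHz, as listed under Model Specifications


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against overflow
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Placeholder waveform (an assumption, not model output): 1 s of a 440 Hz tone
placeholder = [
    0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)
]
write_wav("sonus_placeholder.wav", placeholder)
```

Replacing `placeholder` with the model's output samples, converted to a plain list or iterable of floats, should produce a playable file as long as the data is mono and sampled at 24 kHz.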
## Model Specifications
- **Architecture**: Diffusion language model-style
- **Parameters**: 0.6B
- **Sampling Rate**: 24 kHz
- **Languages**: 600+
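
For rough capacity planning, the parameter count and sampling rate above translate into memory and buffer sizes. The following is a back-of-the-envelope sketch, not a measurement: it treats 0.6B as exactly 600 million parameters, counts weights only (no activations, caches, or framework overhead), and the helper names `weight_memory_gb` and `samples_for` are hypothetical.

```python
PARAMS = 0.6e9        # parameter count from the specifications (rounded)
SAMPLE_RATE = 24_000  # output sampling rate in Hz

# Bytes needed to store one parameter in common dtypes
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params, dtype):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def samples_for(seconds, sample_rate=SAMPLE_RATE):
    """Number of audio samples needed for a clip of the given duration."""
    return int(round(seconds * sample_rate))


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(PARAMS, dtype):.1f} GB of weights")
print(f"10 s of audio: {samples_for(10)} samples")
```

Under these assumptions the weights take about 2.4 GB in float32 and about 1.2 GB in float16 or bfloat16, and ten seconds of output audio corresponds to 240,000 samples.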
## License
This project is available under a custom license.
- **Non-commercial use**: Free for personal projects, research, and educational purposes
- **Commercial use**: Requires explicit permission. Contact inquiry@sagea.space for licensing inquiries
See the LICENSE file for full terms.
## Disclaimer
Users are prohibited from using this model for unauthorized voice cloning, impersonation, fraud, or any illegal activities. Ensure compliance with applicable laws and ethical standards.