---
language:
- multilingual
license: other
library_name: transformers
tags:
- text-to-speech
- tts
- voice-cloning
- multilingual
- zero-shot
- audio
- speech
datasets:
- multilingual-speech
metrics:
- mos
pipeline_tag: text-to-speech
---
# Sonus

A massively multilingual zero-shot text-to-speech synthesis system

## Overview

Sonus is a multilingual zero-shot text-to-speech (TTS) system that supports over 600 languages. It is built on a diffusion language model-style architecture and is designed for high-quality speech generation with fast inference. It also supports voice cloning and voice design.

## Key Features

- **600+ Languages Supported**: Broad language coverage for zero-shot TTS
- **Voice Cloning**: High-quality voice cloning from short reference audio
- **Voice Design**: Control voices via speaker attributes (gender, age, pitch, accent, etc.)
- **Fine-grained Control**: Support for non-verbal symbols and pronunciation correction
- **Fast Inference**: Optimized for real-time and batch processing

## Installation

```bash
pip install torch torchaudio
pip install transformers
```
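
After installing, it can help to confirm that the three packages are visible to the Python interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of Sonus or `transformers`.

```python
from importlib.util import find_spec

# Packages named in the install commands above
REQUIRED = ("torch", "torchaudio", "transformers")


def missing_packages(names=REQUIRED):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that the packages can be located; it does not verify versions or GPU support.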

## Quick Start

### Basic Usage

```python
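import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "cortexsgea/sonus"

# trust_remote_code=True executes the custom model code shipped in the repository
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Text to synthesize
text = "Hello, this is a test of voice synthesis."
# See documentation for full generation API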
```

### Voice Cloning

```python
# Provide reference audio for voice cloning
# See API documentation for complete examples
```

## Model Specifications

- **Architecture**: Diffusion language model-style
- **Parameters**: 0.6B
- **Sampling Rate**: 24 kHz
- **Languages**: 600+
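
Because the model works at a 24 kHz sampling rate, any audio you save should be written with that rate in the file header, or it will play back at the wrong speed. The sketch below writes mono audio to a 16-bit PCM WAV file using only the standard library. It assumes the generated audio is available as a sequence of floats in [-1.0, 1.0]; that output format is an assumption, not something this README documents, and `write_wav` and `duration_seconds` are hypothetical helpers. A synthetic tone stands in for model output.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # sampling rate listed in the specifications above


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against overflow
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


def duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


# Placeholder for model output (assumption): a one-second 440 Hz tone
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
write_wav("sonus_demo.wav", tone)
```

If the model returns a tensor rather than a list, convert it to a flat sequence of floats first, or use an audio library of your choice with the same sample rate.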

## License

This project is available under a custom license.

- **Non-commercial use**: Free for personal projects, research, and educational purposes
- **Commercial use**: Requires explicit permission. Contact inquiry@sagea.space to request a license

See the LICENSE file for full terms.

## Disclaimer

Users are prohibited from using this model for unauthorized voice cloning, impersonation, fraud, or any other illegal activity. Users are responsible for ensuring that their use complies with applicable laws and ethical standards.