SVECTOR-OFFICIAL committed · Commit 78a7d83 (verified) · 1 Parent(s): c8da1af

Update README.md

  1. README.md +265 -1
README.md CHANGED
@@ -4,4 +4,268 @@ pipeline_tag: text-to-speech
  tags:
  - voice
  - speech
- ---
+ - text-to-speech
+ - audio
+ ---
+
+ <p align="center">
+ <img alt="Continue-TTS" src="https://github.com/SVECTOR-CORPORATION/Continue-TTS/blob/main/continue-tts-image-banner.jpg?raw=true" width="800">
+ </p>
+
+ # Continue-TTS
+
+ ### Text-to-Speech Model Based on Continue-1-OSS
+
+ <div align="left" style="line-height: 1;">
+ <a href="https://spec-chat.tech" target="_blank" style="margin: 2px;">
+ <img alt="SVECTOR" src="https://img.shields.io/badge/💬%20Spec%20Chat-Spec%20Chat-blue?style=plastic" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+
+ <a href="https://huggingface.co/SVECTOR-CORPORATION" target="_blank" style="margin: 2px;">
+ <img alt="SVECTOR" src="https://img.shields.io/badge/🤗%20Hugging%20Face-SVECTOR-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+
+ <a href="https://huggingface.co/SVECTOR-CORPORATION/Continue-TTS/blob/main/LICENSE" style="margin: 2px;">
+ <img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-blue?color=1e88e5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+
+ <a href="https://github.com/SVECTOR-CORPORATION/Continue-TTS" target="_blank" style="margin: 2px;">
+ <img alt="GitHub" src="https://img.shields.io/badge/GitHub-Continue--TTS-181717?logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ </div>
+
+ ## Introduction
+
+ We are thrilled to introduce **Continue-TTS**, a fine-tuned text-to-speech model based on the **Continue-1-OSS** architecture, developed by SVECTOR. The model is trained specifically for high-quality speech synthesis.
+
+ **Continue-TTS** is engineered to provide:
+
+ - **Natural Speech:** Human-like intonation, emotion, and rhythm that rivals commercial solutions
+ - **8 Unique Voices:** Diverse voice options with distinct personalities and characteristics
+ - **Real-time Generation:** Low-latency streaming for interactive applications (~200ms)
+ - **Emotional Expression:** Built-in support for laughter, sighs, gasps, and other natural emotions
+ - **Open Source:** Fully accessible under the Apache 2.0 license for research and commercial use
+
+ The model pairs a large language model with a neural audio codec: the language model predicts audio tokens from text, and the codec turns those tokens into natural-sounding speech.
+
+ ### Model Specifications
+
+ - **Base Architecture:** Continue-1-OSS
+ - **Type:** Text-to-Speech (TTS) Model
+ - **Parameters:** ~3.3 billion
+ - **Audio Codec:** SNAC (24kHz)
+ - **Context Length:** 131,072 tokens
+ - **Vocabulary:** 156,940 tokens (including 28,672 audio tokens)
+ - **License:** Apache 2.0
+ - **Voices:** 8 (Nova, Aurora, Stellar, Atlas, Orion, Luna, Phoenix, Ember)
+
+ ## Requirements
+
+ To use Continue-TTS, install the required dependencies:
+
+ ```bash
+ pip install transformers torch
+ pip install snac # Audio codec
+ pip install vllm==0.7.3 # For fast inference (optional but recommended)
+ ```
+
+ ## Quickstart
+
+ ### Basic Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ model_id = "SVECTOR-CORPORATION/Continue-TTS"
+
+ # Load model and tokenizer
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+
+ # Prepare text with voice
+ text = "Hello! I am Continue-TTS, a text-to-speech model based on Continue-1-OSS."
+ voice = "nova"  # Choose: nova, aurora, stellar, atlas, orion, luna, phoenix, ember
+
+ # Format prompt (TTS format): wrap the tokenized "voice: text" string
+ # between the start and end marker token IDs
+ adapted_prompt = f"{voice}: {text}"
+ prompt_tokens = tokenizer(adapted_prompt, return_tensors="pt")
+ start_token = torch.tensor([[128259]], dtype=torch.int64)
+ end_tokens = torch.tensor([[128009, 128260, 128261, 128257]], dtype=torch.int64)
+ input_ids = torch.cat([start_token, prompt_tokens.input_ids, end_tokens], dim=1)
+
+ # Generate audio tokens
+ outputs = model.generate(
+     input_ids.to(model.device),
+     max_new_tokens=1200,
+     temperature=0.6,
+     top_p=0.8,
+     repetition_penalty=1.3,
+     eos_token_id=49158,  # TTS stop token
+     do_sample=True,
+ )
+
+ # Decode the output to token strings for inspection; producing a waveform
+ # requires passing the generated audio codes through the SNAC decoder
+ generated_tokens = tokenizer.decode(outputs[0], skip_special_tokens=False)
+ ```
+
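The last comment in the example above leaves the SNAC step open. The sketch below shows one way the flat token stream might be regrouped before decoding. It rests on assumptions this README does not confirm: each of the 7 slots in a frame has its own 4,096-entry range (7 × 4,096 = 28,672, which matches the audio-token count in the specifications), and the slots map onto SNAC's three layers in the 1 + 2 + 4 interleaving common to SNAC-based LLM TTS models. `split_frames` and `SLOT_TO_LAYER` are hypothetical names, and the input is assumed to be audio codes already made relative to the start of the audio-token range.

```python
from typing import List, Tuple

CODEBOOK_SIZE = 4096    # assumption: entries per SNAC codebook
TOKENS_PER_FRAME = 7    # from the model card

# Assumed mapping from slot position in a frame to SNAC layer index
# (1 coarse code, 2 medium codes, 4 fine codes per frame).
SLOT_TO_LAYER = (0, 1, 2, 2, 1, 2, 2)


def split_frames(audio_codes: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Regroup a flat stream of audio codes into three per-layer lists.

    `audio_codes` are IDs relative to the start of the audio-token range
    (0 .. 7 * CODEBOOK_SIZE - 1). A trailing partial frame is dropped.
    """
    layers: Tuple[List[int], List[int], List[int]] = ([], [], [])
    usable = len(audio_codes) - len(audio_codes) % TOKENS_PER_FRAME
    for i in range(usable):
        slot = i % TOKENS_PER_FRAME
        # Remove the per-slot offset so every code lands in 0 .. CODEBOOK_SIZE - 1.
        code = audio_codes[i] - slot * CODEBOOK_SIZE
        if not 0 <= code < CODEBOOK_SIZE:
            raise ValueError(f"code at index {i} is outside the range of slot {slot}")
        layers[SLOT_TO_LAYER[slot]].append(code)
    return layers
```

If these assumptions hold, the three lists would become the per-layer code tensors handed to the SNAC decoder. The exact ID offset and slot order should be checked against the Continue-TTS package before relying on this.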
+ ### Using Continue-TTS Package (Recommended)
+
+ For easier usage with audio generation, use the Continue-TTS package:
+
+ ```bash
+ pip install continue-speech
+ ```
+
+ ```python
+ from continue_tts import Continue1Model
+ import wave
+
+ # Initialize model
+ model = Continue1Model(model_name="SVECTOR-CORPORATION/Continue-TTS", max_model_len=2048)
+
+ # Generate speech
+ text = "Welcome to Continue-TTS! This model is built on Continue-1-OSS."
+ audio_chunks = model.generate_speech(prompt=text, voice="nova")
+
+ # Save to file
+ with wave.open("output.wav", "wb") as wf:
+     wf.setnchannels(1)
+     wf.setsampwidth(2)
+     wf.setframerate(24000)
+     for chunk in audio_chunks:
+         wf.writeframes(chunk)
+ ```
+
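The saving step above can be wrapped in a small reusable helper. This is a minimal sketch, assuming (as the example implies) that each chunk is a `bytes` object of mono 16-bit PCM at 24 kHz. `save_chunks` is a hypothetical name, not part of the Continue-TTS package.

```python
import wave
from typing import Iterable

SAMPLE_RATE = 24000   # Hz, from the model card
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM), as in the example above


def save_chunks(chunks: Iterable[bytes], path: str) -> float:
    """Write mono 16-bit PCM chunks to `path` and return the duration in seconds."""
    total_bytes = 0
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            # Chunks are written as they arrive, so this also works with a generator.
            wf.writeframes(chunk)
            total_bytes += len(chunk)
    return total_bytes / (SAMPLE_WIDTH * SAMPLE_RATE)
```

With the model from the example, this might be called as `save_chunks(model.generate_speech(prompt=text, voice="nova"), "output.wav")`.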
+ ## Available Voices
+
+ Continue-TTS includes 8 professionally designed voices:
+
+ | Voice | Gender | Description |
+ |-------|--------|-------------|
+ | **nova** | Female | Conversational and natural, perfect for general use |
+ | **aurora** | Female | Warm and friendly, excellent for storytelling |
+ | **stellar** | Female | Energetic and bright, great for upbeat content |
+ | **atlas** | Male | Deep and authoritative, ideal for narration |
+ | **orion** | Male | Friendly and casual, perfect for conversational content |
+ | **luna** | Female | Soft and gentle, excellent for calm narration |
+ | **phoenix** | Male | Dynamic and expressive, great for engaging content |
+ | **ember** | Female | Warm and engaging, perfect for emotional expression |
+
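Because the voice is selected by name inside the prompt string, a typo in the name may go unnoticed. The sketch below checks the name against the table above before building the `voice: text` prompt used in Basic Usage. `format_prompt` is a hypothetical helper, and lowercasing the name is an assumption based on the lowercase identifiers in the table.

```python
VOICES = ("nova", "aurora", "stellar", "atlas", "orion", "luna", "phoenix", "ember")


def format_prompt(text: str, voice: str = "nova") -> str:
    """Return the "voice: text" prompt string, rejecting unknown voice names."""
    name = voice.strip().lower()
    if name not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {', '.join(VOICES)}")
    return f"{name}: {text}"
```

For example, `format_prompt("Hello!", "Atlas")` yields `"atlas: Hello!"`, while an unlisted name raises `ValueError`.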
+ ## Advanced Features
+
+ ### Emotion Tags
+
+ Add natural emotions to your speech:
+
+ ```python
+ text = "This is incredible! <laugh> I can't believe how natural it sounds. <gasp>"
+ ```
+
+ **Supported emotions:**
+ - `<laugh>` - Natural laughter
+ - `<chuckle>` - Light laugh
+ - `<sigh>` - Expressive sigh
+ - `<gasp>` - Surprised gasp
+ - `<cough>` - Cough sound
+ - `<yawn>` - Yawn
+ - `<groan>` - Groan
+ - `<sniffle>` - Sniffle
+
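A tag outside the list above would presumably be treated as ordinary text, so it can help to check input before synthesis. The following is a minimal sketch. `unsupported_tags` is a hypothetical helper, and matching tags case-sensitively is an assumption, since this README does not say how other spellings are handled.

```python
import re
from typing import List

SUPPORTED_TAGS = {"laugh", "chuckle", "sigh", "gasp", "cough", "yawn", "groan", "sniffle"}
TAG_PATTERN = re.compile(r"<([A-Za-z]+)>")


def unsupported_tags(text: str) -> List[str]:
    """Return every <tag> in `text` that is not in the supported list, in order."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag not in SUPPORTED_TAGS]
```

An empty result means every tag in the text is one of the eight listed above.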
+ ### Custom Generation Parameters
+
+ Fine-tune generation quality:
+
+ ```python
+ audio = model.generate_speech(
+     prompt="Your text here",
+     voice="nova",
+     temperature=0.6,         # Lower = more consistent, Higher = more varied
+     top_p=0.8,               # Nucleus sampling threshold
+     max_tokens=1200,         # Maximum audio length
+     repetition_penalty=1.3,  # Prevent token repetition
+ )
+ ```
+
+ ## Use Cases
+
+ Continue-TTS excels at:
+
+ - **Audiobook Narration:** Natural storytelling with emotional expression
+ - **Virtual Assistants:** Conversational AI with personality
+ - **Accessibility:** Text-to-speech for visually impaired users
+ - **Content Creation:** Voiceovers for videos, podcasts, and presentations
+ - **Gaming:** Dynamic character voices and dialogue
+ - **Education:** Interactive learning materials with voice
+ - **Customer Service:** Natural-sounding automated responses
+
+ ## Performance
+
+ - **Quality:** State-of-the-art natural speech synthesis
+ - **Latency:** ~200ms for streaming generation (GPU)
+ - **Speed:** Real-time on GPU, slower on CPU
+ - **Memory:** ~7GB GPU RAM (FP16), ~14GB (FP32)
+ - **Sample Rate:** 24kHz (high quality audio)
+
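The memory figures above are close to what the parameter count alone predicts. The back-of-the-envelope sketch below assumes the 3.3B parameter count this README gives for the base model and counts only the weights, so activations, the KV cache, and the SNAC decoder are ignored. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory taken by the model weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 3.3e9  # parameter count quoted for the base model

for label, width in (("FP16/BF16", 2), ("FP32", 4)):
    print(f"{label}: about {weight_memory_gb(N_PARAMS, width):.1f} GB for weights")
```

The weights come to roughly 6.6 GB at 2 bytes per parameter and 13.2 GB at 4 bytes, which is consistent with the ~7GB and ~14GB totals once runtime overhead is added.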
+ ## Model Architecture
+
+ Continue-TTS is built on Continue-1-OSS and combines:
+ - **Base Model:** Continue-1-OSS (LLaMA-based, 3.3B parameters)
+ - **Audio Codec:** SNAC multi-scale neural audio codec
+ - **Token Structure:** 7 audio tokens per frame (hierarchical encoding)
+ - **Training:** Fine-tuned on a few hours of diverse speech data
+
+ The model generates audio tokens autoregressively, which are then decoded into waveforms using the SNAC neural codec.
+
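The 7-tokens-per-frame structure also bounds how much audio a given token budget can produce. The sketch below assumes that one frame corresponds to one step of SNAC's coarsest layer and that this step spans 2,048 samples in the 24 kHz codec. Neither figure is stated in this README, and `max_audio_seconds` is a hypothetical helper.

```python
SAMPLE_RATE = 24_000        # Hz, from the model card
TOKENS_PER_FRAME = 7        # from the model card
SAMPLES_PER_FRAME = 2_048   # assumption: samples covered by one 7-token frame


def max_audio_seconds(max_tokens: int) -> float:
    """Upper bound on audio length for a token budget; partial frames are ignored."""
    frames = max_tokens // TOKENS_PER_FRAME
    return frames * SAMPLES_PER_FRAME / SAMPLE_RATE


print(f"{max_audio_seconds(1200):.1f} s")
```

Under these assumptions, the 1,200-token budget used in the examples covers about 14.6 seconds of audio, so longer passages would likely need a larger budget or several requests.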
+ ## Training
+
+ Continue-TTS was fine-tuned from Continue-1-OSS using:
+ - High-quality speech datasets covering diverse accents and styles
+ - Multi-speaker recordings for voice diversity
+ - Emotional speech data for expressive synthesis
+ - Conversational and narrative content
+
+ Training utilized:
+ - Continue-1-OSS as the base model
+ - A custom tokenizer with 28,672 audio tokens
+ - Multi-stage training (pretraining + fine-tuning)
+ - Optimization for naturalness and emotion
+
+ ## Limitations
+
+ As with any TTS model, Continue-TTS has certain limitations:
+
+ - **Pronunciation:** May struggle with unusual names, technical terms, or non-English words
+ - **Consistency:** Long-form generation may have minor quality variations
+ - **Accents:** Primarily trained on specific accent patterns
+ - **Compute:** Requires GPU for real-time generation (CPU is slower)
+ - **Language:** Currently optimized for English
+
+ ## Ethical Considerations
+
+ SVECTOR is committed to responsible AI development. Users should:
+
+ - **Transparency:** Disclose when audio is AI-generated
+ - **Consent:** Do not clone voices without explicit permission
+ - **Verification:** Implement safeguards against deepfakes and misinformation
+ - **Attribution:** Credit the model when used in public projects
+ - **Responsible Use:** Avoid generating harmful, deceptive, or illegal content
+
+ ## License
+
+ This model is released under the **Apache License 2.0**. See the [LICENSE](https://huggingface.co/SVECTOR-CORPORATION/Continue-TTS/blob/main/LICENSE) file for complete details.
+
+ ## Acknowledgments
+
+ Continue-1-OSS builds upon advances in neural speech synthesis, large language models, and neural audio codecs. We thank the open-source community for their contributions to these foundational technologies.
+
+ ---
+
+ <p align="center">
+ <i>Developed by <a href="https://www.svector.co.in">SVECTOR</a></i>
+ </p>