---
license: apache-2.0
pipeline_tag: text-to-speech
tags:
- voice
- speech
- text-to-speech
- audio
---

<p align="center">
  <img alt="Continue-TTS" src="https://github.com/SVECTOR-CORPORATION/Continue-TTS/blob/main/continue-tts-image-banner.jpg?raw=true" width="800">
</p>

# Continue-TTS

### Text-to-Speech Model Based on Continue-1-OSS

<div align="left" style="line-height: 1;">
  <a href="https://spec-chat.tech" target="_blank" style="margin: 2px;">
    <img alt="SVECTOR" src="https://img.shields.io/badge/💬%20Spec%20Chat-Spec%20Chat-blue?style=plastic" style="display: inline-block; vertical-align: middle;"/>
  </a>
  
  <a href="https://huggingface.co/SVECTOR-CORPORATION" target="_blank" style="margin: 2px;">
    <img alt="SVECTOR" src="https://img.shields.io/badge/🤗%20Hugging%20Face-SVECTOR-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  
  <a href="https://huggingface.co/SVECTOR-CORPORATION/Continue-TTS/blob/main/LICENSE" style="margin: 2px;">
    <img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-blue?color=1e88e5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  
  <a href="https://github.com/SVECTOR-CORPORATION/Continue-TTS" target="_blank" style="margin: 2px;">
    <img alt="GitHub" src="https://img.shields.io/badge/GitHub-Continue--TTS-181717?logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

## Introduction

We are thrilled to introduce **Continue-TTS**, a text-to-speech model developed by SVECTOR and fine-tuned from the **Continue-1-OSS** architecture. The model is trained specifically for high-quality speech synthesis and generates expressive speech in eight built-in voices.

**Continue-TTS** is engineered to provide:

- **Natural Speech:** Human-like intonation, emotion, and rhythm that rivals commercial solutions
- **8 Unique Voices:** Diverse voice options with distinct personalities and characteristics
- **Real-time Generation:** Low-latency streaming for interactive applications (~200 ms on GPU)
- **Emotional Expression:** Built-in support for laughter, sighs, gasps, and other natural emotions
- **Open Source:** Released under the Apache 2.0 license for research and commercial use

Under the hood, Continue-TTS pairs a large language model with a neural audio codec: the language model generates discrete audio tokens from text, and the SNAC codec decodes those tokens into a waveform.

<audio controls src="https://ik.imagekit.io/svector/efd3e807-49a4-463b-af6d-4069acf7ff3a.wav"></audio>

```
The sun was setting behind the mountains, painting the sky with soft shades of orange and violet.
She stood there quietly, breathing in the moment. <sigh>
Sometimes, the smallest moments are the ones that change everything.
```

<audio controls src="https://ik.imagekit.io/svector/c99ff697-291a-4fb7-940a-56b523b9f286.wav?updatedAt=1762362454065"></audio>

```
<sigh>  
Not every journey is loud.  
Some begin quietly… inside.  
But once they begin, they never stop.  
We continue.
```

### Model Specifications

- **Base Architecture:** Continue-1-OSS
- **Type:** Text-to-Speech (TTS) Model
- **Parameters:** 3 Billion
- **Audio Codec:** SNAC (24 kHz)
- **Context Length:** 131,072 tokens
- **Vocabulary:** 156,940 tokens (including 28,672 audio tokens)
- **License:** Apache 2.0
- **Voices:** 8 (Nova, Aurora, Stellar, Atlas, Orion, Luna, Phoenix, Ember)
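The audio-token count above is consistent with the 4,096-entry codebooks of the 24 kHz SNAC model: 7 frame positions × 4,096 codes = 28,672 tokens. The sketch below checks that arithmetic and shows one plausible id layout in which each frame position gets its own block of 4,096 ids. The starting id `AUDIO_TOKEN_OFFSET = 128266` and the helper `audio_token_id` are assumptions for illustration, not values published in this card.

```python
# Hypothetical audio-token layout. Only TOKENS_PER_FRAME and VOCAB_SIZE come
# from this card; CODEBOOK_SIZE is SNAC's, AUDIO_TOKEN_OFFSET is ASSUMED.
CODEBOOK_SIZE = 4096         # entries per SNAC codebook
TOKENS_PER_FRAME = 7         # "7 audio tokens per frame"
AUDIO_TOKEN_OFFSET = 128266  # assumed id of the first audio token
VOCAB_SIZE = 156_940         # from the specification list

num_audio_tokens = CODEBOOK_SIZE * TOKENS_PER_FRAME
first_audio_id = AUDIO_TOKEN_OFFSET
last_audio_id = AUDIO_TOKEN_OFFSET + num_audio_tokens - 1


def audio_token_id(position: int, code: int) -> int:
    """Map (position within a frame, SNAC code) to a token id under the assumed layout."""
    if not 0 <= position < TOKENS_PER_FRAME:
        raise ValueError("position must be in [0, 7)")
    if not 0 <= code < CODEBOOK_SIZE:
        raise ValueError("code must be in [0, 4096)")
    return AUDIO_TOKEN_OFFSET + position * CODEBOOK_SIZE + code


print(num_audio_tokens)               # 28672
print(first_audio_id, last_audio_id)  # 128266 156937
print(last_audio_id < VOCAB_SIZE)     # True
```

Under this assumed offset, every audio id fits inside the stated 156,940-token vocabulary.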

## Requirements

To use Continue-TTS, install the required dependencies:

```bash
pip install transformers torch
pip install snac  # Audio codec
pip install vllm==0.7.3  # For fast inference (optional but recommended)
```

## Quickstart

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "SVECTOR-CORPORATION/Continue-TTS"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Prepare text with voice
text = "Hello! I am Continue-TTS, a text-to-speech model based on Continue-1-OSS."
voice = "nova"  # Choose: nova, aurora, stellar, atlas, orion, luna, phoenix, ember

# Format prompt (TTS format)
adapted_prompt = f"{voice}: {text}"
prompt_tokens = tokenizer(adapted_prompt, return_tensors="pt")
start_token = torch.tensor([[128259]], dtype=torch.int64)
end_tokens = torch.tensor([[128009, 128260, 128261, 128257]], dtype=torch.int64)
input_ids = torch.cat([start_token, prompt_tokens.input_ids, end_tokens], dim=1)

# Generate audio tokens
outputs = model.generate(
    input_ids.to(model.device),
    attention_mask=torch.ones_like(input_ids).to(model.device),  # no padding, so attend to every token
    max_new_tokens=1200,
    temperature=0.6,
    top_p=0.8,
    repetition_penalty=1.3,
    eos_token_id=49158,  # TTS stop token
    do_sample=True
)

# The output holds audio-token ids, not readable text: slice off the prompt
# and pass the remaining ids to the SNAC decoder to obtain a waveform.
generated_ids = outputs[0, input_ids.shape[1]:].tolist()
```
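The Basic Usage example stops at token ids. A minimal sketch of the remaining step follows, assuming an id layout that this card does not document: audio begins after the last `128257` marker (the final prompt token above), every id at or above a hypothetical `AUDIO_TOKEN_OFFSET` of 128266 is an audio code, and position *p* of each 7-token frame is shifted by *p* × 4,096. The helpers `extract_codes`, `redistribute`, and `decode_with_snac` are illustrative names, and `hubertsiuzdak/snac_24khz` is assumed to be the matching SNAC checkpoint.

```python
# ASSUMPTIONS (not confirmed by this model card): marker id, offset, and the
# per-position frame layout below; adjust them if decoded audio sounds wrong.
START_OF_SPEECH = 128257
AUDIO_TOKEN_OFFSET = 128266
CODEBOOK_SIZE = 4096


def extract_codes(token_ids):
    """Return offset-free audio codes after the last start marker, trimmed to whole frames."""
    ids = list(token_ids)
    if START_OF_SPEECH in ids:
        last = len(ids) - 1 - ids[::-1].index(START_OF_SPEECH)
        ids = ids[last + 1:]
    # Ids below the offset (stop tokens, text) are dropped.
    codes = [t - AUDIO_TOKEN_OFFSET for t in ids if t >= AUDIO_TOKEN_OFFSET]
    usable = len(codes) - len(codes) % 7
    return codes[:usable]


def redistribute(codes):
    """Split each 7-token frame into SNAC's coarse (1), mid (2), and fine (4) layers."""
    layer_1, layer_2, layer_3 = [], [], []
    for i in range(len(codes) // 7):
        frame = codes[7 * i: 7 * i + 7]
        # Undo the assumed shift of position p by p * 4096.
        frame = [c - p * CODEBOOK_SIZE for p, c in enumerate(frame)]
        layer_1.append(frame[0])
        layer_2.extend([frame[1], frame[4]])
        layer_3.extend([frame[2], frame[3], frame[5], frame[6]])
    return layer_1, layer_2, layer_3


def decode_with_snac(token_ids):
    """Decode generated ids to float samples at 24 kHz (requires `pip install snac`)."""
    import torch
    from snac import SNAC

    layers = redistribute(extract_codes(token_ids))
    if not layers[0]:
        raise ValueError("no complete audio frames found")
    snac = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()
    codes = [torch.tensor(layer, dtype=torch.long).unsqueeze(0) for layer in layers]
    with torch.inference_mode():
        audio = snac.decode(codes)  # shape (1, 1, num_samples)
    return audio.squeeze().cpu().numpy()
```

With the Basic Usage variables in scope, `waveform = decode_with_snac(outputs[0].tolist())` returns float samples at 24 kHz. If the result sounds like noise, the assumed offset or frame ordering is the first thing to check.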

### Using Continue-TTS Package (Recommended)

For end-to-end generation that returns audio directly, install the `continue-speech` package, which is imported as `continue_tts`:

```bash
pip install continue-speech
```

```python
from continue_tts import Continue1Model
import wave

# Initialize model
model = Continue1Model(model_name="SVECTOR-CORPORATION/Continue-TTS", max_model_len=2048)

# Generate speech
text = "Welcome to Continue-TTS! This model is built on Continue-1-OSS."
audio_chunks = model.generate_speech(prompt=text, voice="nova")

# Save to file
with wave.open("output.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(24000)
    for chunk in audio_chunks:
        wf.writeframes(chunk)
```

## Available Voices

Continue-TTS includes 8 professionally designed voices:

| Voice | Gender | Description |
|-------|--------|-------------|
| **nova** | Female | Conversational and natural, perfect for general use |
| **aurora** | Female | Warm and friendly, excellent for storytelling |
| **stellar** | Female | Energetic and bright, great for upbeat content |
| **atlas** | Male | Deep and authoritative, ideal for narration |
| **orion** | Male | Friendly and casual, perfect for conversational content |
| **luna** | Female | Soft and gentle, excellent for calm narration |
| **phoenix** | Male | Dynamic and expressive, great for engaging content |
| **ember** | Female | Warm and engaging, perfect for emotional expression |
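Each voice name in the table is the prefix of the `{voice}: {text}` prompt from Basic Usage. The sketch below validates the name before building the prompt and wraps already-tokenized ids with the same marker ids as that example, using plain lists so it runs without downloading the tokenizer. The function names are hypothetical, and lowercasing the voice name is a convenience assumption based on the lowercase names shown here.

```python
# Voice names mirror the table above; marker ids are copied from Basic Usage.
VOICES = {"nova", "aurora", "stellar", "atlas", "orion", "luna", "phoenix", "ember"}
START_TOKEN = [128259]
END_TOKENS = [128009, 128260, 128261, 128257]


def format_prompt(voice: str, text: str) -> str:
    """Prefix text with a validated, lowercased voice name, e.g. 'nova: Hello'."""
    name = voice.strip().lower()  # assumption: names are matched in lowercase
    if name not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; choose one of {sorted(VOICES)}")
    return f"{name}: {text}"


def wrap_ids(text_ids: list[int]) -> list[int]:
    """Surround tokenized prompt ids with the start and end marker ids."""
    return START_TOKEN + list(text_ids) + END_TOKENS
```

With a loaded tokenizer, `wrap_ids(tokenizer(format_prompt("atlas", text)).input_ids)` reproduces the `input_ids` built in Basic Usage, as a Python list rather than a tensor.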

## Advanced Features

### Emotion Tags

Add emotions by placing tags inline in the input text:

```python
text = "This is incredible! <laugh> I can't believe how natural it sounds. <gasp>"
```

**Supported emotions:**
- `<laugh>` - Natural laughter
- `<chuckle>` - Light laugh
- `<sigh>` - Expressive sigh
- `<gasp>` - Surprised gasp
- `<cough>` - Cough sound
- `<yawn>` - Yawn
- `<groan>` - Groan
- `<sniffle>` - Sniffle

### Custom Generation Parameters

Adjust the sampling parameters to trade consistency against variety:

```python
audio_chunks = model.generate_speech(
    prompt="Your text here",
    voice="nova",
    temperature=0.6,         # Lower = more consistent, higher = more varied
    top_p=0.8,               # Nucleus sampling threshold
    max_tokens=1200,         # Maximum number of audio tokens (caps clip length)
    repetition_penalty=1.3,  # Penalize repeated tokens
)
```
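Because `max_tokens` caps the number of generated audio tokens, it also caps clip length. The arithmetic below estimates that cap, assuming every generated token is an audio token, 7 tokens form one frame, and one frame decodes to 2,048 samples (the 24 kHz SNAC model's 512-sample hop times the 4 finest-level steps per coarse step). Under those assumptions, the default of 1,200 tokens yields at most about 14.6 seconds of audio. The helper `max_audio_seconds` is hypothetical.

```python
# Rough upper bound on clip length for a token budget (assumptions in lead-in).
SAMPLE_RATE = 24_000
TOKENS_PER_FRAME = 7
SAMPLES_PER_FRAME = 2_048  # assumed: 512-sample SNAC hop x 4


def max_audio_seconds(max_tokens: int) -> float:
    """Seconds of audio that `max_tokens` audio tokens can decode to at most."""
    frames = max_tokens // TOKENS_PER_FRAME  # partial frames are discarded
    return frames * SAMPLES_PER_FRAME / SAMPLE_RATE


print(round(max_audio_seconds(1200), 2))  # 14.59
```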

## Use Cases

Continue-TTS is well suited to:

- **Audiobook Narration:** Natural storytelling with emotional expression
- **Virtual Assistants:** Conversational AI with personality
- **Accessibility:** Text-to-speech for visually impaired users
- **Content Creation:** Voiceovers for videos, podcasts, and presentations
- **Gaming:** Dynamic character voices and dialogue
- **Education:** Interactive learning materials with voice
- **Customer Service:** Natural-sounding automated responses

## Performance

- **Quality:** State-of-the-art natural speech synthesis
- **Latency:** ~200 ms for streaming generation (GPU)
- **Speed:** Real-time on GPU, slower on CPU
- **Memory:** ~7 GB GPU memory (FP16/BF16), ~14 GB (FP32)
- **Sample Rate:** 24 kHz (high-quality audio)
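The memory and real-time figures can be sanity-checked with back-of-envelope arithmetic. The sketch assumes the 3.3B parameter count given under Model Architecture, counts weights only (no KV cache or activations), and assumes that a 7-token frame covers 2,048 samples at 24 kHz. Under those assumptions, real-time streaming requires generating roughly 82 audio tokens per second.

```python
# Back-of-envelope checks; all inputs are assumptions stated in the lead-in.
PARAMS = 3.3e9  # parameter count from "Model Architecture"
BYTES_PER_PARAM = {"fp16/bf16": 2, "fp32": 4}

for dtype, nbytes in BYTES_PER_PARAM.items():
    # Weights only: ignores KV cache, activations, and framework overhead.
    print(f"{dtype}: {PARAMS * nbytes / 1e9:.1f} GB of weights")

SAMPLE_RATE = 24_000
SAMPLES_PER_FRAME = 2_048  # assumed: 512-sample SNAC hop x 4
TOKENS_PER_FRAME = 7

frames_per_second = SAMPLE_RATE / SAMPLES_PER_FRAME  # ~11.72 frames of audio per second
tokens_per_second = TOKENS_PER_FRAME * frames_per_second
print(f"real time needs ~{tokens_per_second:.0f} audio tokens/s")

# Prints:
# fp16/bf16: 6.6 GB of weights
# fp32: 13.2 GB of weights
# real time needs ~82 audio tokens/s
```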

## Model Architecture

Continue-TTS is built on Continue-1-OSS and combines:
- **Base Model:** Continue-1-OSS (LLaMA-based, 3.3B parameters)
- **Audio Codec:** SNAC multi-scale neural audio codec
- **Token Structure:** 7 audio tokens per frame (hierarchical encoding)
- **Training:** Fine-tuned on a few hours of diverse speech data

The model generates audio tokens autoregressively, which are then decoded into waveforms using the SNAC neural codec.

## Training

Continue-TTS was fine-tuned from Continue-1-OSS using:
- High-quality speech datasets covering diverse accents and styles
- Multi-speaker recordings for voice diversity
- Emotional speech data for expressive synthesis
- Conversational and narrative content

Training utilized:
- Continue-1-OSS as base
- Custom tokenizer with 28,672 audio tokens
- Multi-stage training (pretraining + fine-tuning)
- Optimized for naturalness and emotion

## Limitations

As with any TTS model, Continue-TTS has certain limitations:

- **Pronunciation:** May struggle with unusual names, technical terms, or non-English words
- **Consistency:** Long-form generation may have minor quality variations
- **Accents:** Primarily trained on specific accent patterns
- **Compute:** Requires a GPU for real-time generation (CPU inference is slower)
- **Language:** Currently optimized for English

## Ethical Considerations

SVECTOR is committed to responsible AI development. Users should:

- **Transparency:** Disclose when audio is AI-generated
- **Consent:** Do not clone voices without explicit permission
- **Verification:** Implement safeguards against deepfakes and misinformation
- **Attribution:** Credit the model when used in public projects
- **Responsible Use:** Avoid generating harmful, deceptive, or illegal content

## License

This model is released under the **Apache License 2.0**. See the [LICENSE](https://huggingface.co/SVECTOR-CORPORATION/Continue-TTS/blob/main/LICENSE) file for complete details.

## Acknowledgments

Continue-TTS builds upon advances in neural speech synthesis, large language models, and neural audio codecs. We thank the open-source community for their contributions to these foundational technologies.

---

<p align="center">
    <i>Developed by <a href="https://www.svector.co.in">SVECTOR</a></i>
</p>