---
datasets: proprietary
pipeline_tag: text-to-speech
---

# Maya1

**Maya1** is a speech model built for expressive voice generation with rich human emotion and precise voice design.

**What it does:**
- Voice design through natural language descriptions

---

## Why Maya1 is Different: Voice Design Features That Matter

### 1. Natural Language Voice Control
Describe voices like you would brief a voice actor:

---

## How to Use Maya1: Download and Run in Minutes

### Quick Start: Generate Voice with Emotions

```python
import torch
import soundfile as sf
from transformers import AutoModelForCausalLM, AutoTokenizer
from snac import SNAC

# Load the best open source voice AI model
model = AutoModelForCausalLM.from_pretrained(
    "maya-research/maya1",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("maya-research/maya1")

# Load SNAC audio decoder (24kHz)
snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval().to("cuda")

# Design your voice with natural language
description = "Realistic male voice in his 30s with an American accent. Normal pitch, warm timbre, conversational pacing."
text = "Hello! This is Maya1 <laugh> the best open source voice AI model with emotions."

# Create prompt with voice design
prompt = f'<description="{description}"> {text}'

# (Tokenization, generation, and SNAC decoding to the `audio` array are
# omitted from this excerpt.)

# Save your emotional voice output
sf.write("output.wav", audio, 24000)
print("Voice generated successfully! Play output.wav")
```
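Because the prompt is plain text, batching many lines or voices is easy to script. The sketch below is illustrative only: `build_prompt` and `emotion_tags` are hypothetical helpers, not part of Maya1's tooling, and only `<laugh>` and `<cry>` are tags confirmed in this card; see emotions.txt for the full list.

```python
import re

def build_prompt(description: str, text: str) -> str:
    """Combine a voice description and emotion-tagged text into the prompt format above."""
    return f'<description="{description}"> {text}'

def emotion_tags(text: str) -> list[str]:
    """List simple inline tags like <laugh> or <cry> found in a script."""
    # \w+ deliberately skips the <description="..."> wrapper, which contains '=' and quotes
    return re.findall(r"<(\w+)>", text)

prompt = build_prompt(
    "Energetic female voice with a British accent.",
    "We actually won? <laugh> I can't believe it <cry> ... happy tears, I promise.",
)
print(emotion_tags(prompt))  # -> ['laugh', 'cry']
```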

### Advanced: Production Streaming with vLLM

For production deployments with real-time streaming, use our vLLM script:

**Download:** [vllm_streaming_inference.py](https://huggingface.co/maya-research/maya1/blob/main/vllm_streaming_inference.py)

**Key Features:**
- Automatic Prefix Caching (APC) for repeated voice descriptions

---

## Technical Excellence: What Makes Maya1 the Best

### Architecture: 3B-Parameter Llama Backbone for Voice

## Frequently Asked Questions

**Q: What makes Maya1 different?**
A: We're the only open source model offering 20+ emotions, zero-shot voice design, production-ready streaming, and 3B parameters, all in one package.

**Q: Can I use this commercially?**

## Comparison

| Feature | Maya1 | ElevenLabs | OpenAI TTS | Coqui TTS |
|---------|-------|------------|------------|-----------|
| **Open Source** | Yes | No | No | Yes |
| **Emotions** | 20+ | Limited | No | No |

```bash
# Clone the model repository
git lfs install
git clone https://huggingface.co/maya-research/maya1
```

```python
# Or load directly in Python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("maya-research/maya1")
```

### Requirements

```bash
pip install torch transformers snac soundfile
```
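Before running the Quick Start, it can help to confirm the packages above are importable. A stdlib-only sketch (the `find_missing` helper is just an illustration, not part of this repo):

```python
import importlib.util

def find_missing(modules):
    """Return the subset of module names that cannot be found by the import system."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

missing = find_missing(["torch", "transformers", "snac", "soundfile"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All Maya1 dependencies are installed.")
```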

### Additional Resources
- **Full emotion list:** [emotions.txt](https://huggingface.co/maya-research/maya1/blob/main/emotions.txt)
- **Prompt examples:** [prompt.txt](https://huggingface.co/maya-research/maya1/blob/main/prompt.txt)
- **Streaming script:** [vllm_streaming_inference.py](https://huggingface.co/maya-research/maya1/blob/main/vllm_streaming_inference.py)

---

## Citations & References

If you use Maya1 in your research or product, please cite:

```bibtex
@misc{maya1voice2025,
  title={Maya1: Open Source Voice AI with Emotional Intelligence},
  author={Maya Research},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/maya-research/maya1}},
}
```