---
license: mit
library_name: transformers
tags:
- text-to-speech
- tts
- audio
- speech-synthesis
- robi-labs
- echo-family
pipeline_tag: text-to-speech
---

# Yana - Voice of Robi Labs' Echo Model Family

Yana is a text-to-speech (TTS) model for high-quality, multi-speaker speech synthesis with efficient inference. It is the voice synthesis model of Robi Labs' Echo Model Family.

## Model Description

Yana generates natural-sounding speech from text input. As part of Robi Labs' Echo Model Family, it supports multiple speakers, selected by speaker ID, and customizable voice characteristics.

## Model Specifications

- **Model Size**: 1.6B parameters
- **Type**: Conditional Generation Model
- **Task**: Text-to-Speech synthesis
- **Framework**: PyTorch
- **Family**: Robi Labs Echo Model Family

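As a rough guide to what the 1.6B parameter count means for memory, the sketch below multiplies it by the byte width of common PyTorch dtypes. Only the parameter count comes from this card. The checkpoint's stored dtype is not stated here, and the helper name `weight_memory_gb` is hypothetical.

```python
# Rough weight-memory estimate for a 1.6B-parameter model.
# Assumption: the dtype of the published checkpoint is not stated in this card,
# so several common dtypes are shown side by side.
PARAM_COUNT = 1_600_000_000

BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
    "bfloat16": 2,
}


def weight_memory_gb(param_count: int, dtype: str) -> float:
    """Return the size of the weights alone, in decimal gigabytes (1e9 bytes)."""
    return param_count * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {weight_memory_gb(PARAM_COUNT, dtype):.1f} GB")
```

These figures cover the weights only. Activations, generation caches, and the Python runtime add to them, so treat the result as a lower bound.
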
## Usage

```python
from transformers import AutoModel, AutoProcessor
import torch
import soundfile as sf

# Load the Yana model and its processor
model = AutoModel.from_pretrained("RobiLabs/Yana")
processor = AutoProcessor.from_pretrained("RobiLabs/Yana")

# Text to synthesize and the speaker to use
text = "Hello, this is Yana from Robi Labs' Echo Model Family."
speaker_id = "0"

# The speaker ID is passed as the conversation role
conversation = [{
    "role": speaker_id,
    "content": [{"type": "text", "text": text}]
}]

# Tokenize the conversation into model inputs
inputs = processor.apply_chat_template(
    conversation,
    tokenize=True,
    return_dict=True
)

# Generate audio
with torch.no_grad():
    audio_values = model.generate(
        **inputs,
        max_new_tokens=125,  # ~10 seconds of audio
        output_audio=True,
        do_sample=True,
        temperature=0.9
    )

# Save the generated speech as a 24 kHz WAV file
audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("yana_output.wav", audio, 24000)
```

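The usage example's comment equates 125 new tokens with about 10 seconds of audio. The sketch below turns that into a small helper for choosing `max_new_tokens` from a target duration. The rate of 12.5 tokens per second is inferred from that comment only and is an assumption, not a documented constant. The function names are hypothetical.

```python
import math

# Assumption: rate inferred from the usage example's comment
# (125 tokens ~ 10 seconds), not from any documented model constant.
TOKENS_PER_SECOND = 125 / 10


def tokens_for_duration(seconds: float,
                        tokens_per_second: float = TOKENS_PER_SECOND) -> int:
    """Return a max_new_tokens value that covers roughly `seconds` of audio."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Round up so the budget never falls short of the requested duration.
    return math.ceil(seconds * tokens_per_second)


def duration_for_tokens(tokens: int,
                        tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Return the approximate audio length, in seconds, for a token budget."""
    return tokens / tokens_per_second
```

For example, `max_new_tokens=tokens_for_duration(4)` would request a budget for roughly four seconds of speech. Generation may still stop earlier if the model finishes the utterance first.
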
## Audio Quality

- **Sample Rate**: 24,000 Hz
- **Bit Depth**: 16-bit PCM
- **Channels**: Mono
- **Format**: WAV
- **Duration**: Configurable through `max_new_tokens` (about 10 seconds at 125 tokens; longer with a larger budget)

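To confirm that a saved file matches the properties listed above, a header check with Python's standard `wave` module is usually enough. This is a minimal sketch that assumes the file was written as PCM WAV, which `wave` requires. The helper names `describe_wav` and `matches_card` are hypothetical.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read a PCM WAV header and report its basic audio properties."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "bit_depth": wav.getsampwidth() * 8,  # sample width is in bytes
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }


def matches_card(info: dict) -> bool:
    """Check for 24,000 Hz, 16-bit, mono audio as listed in this card."""
    return (info["sample_rate"], info["bit_depth"], info["channels"]) == (24000, 16, 1)
```

Calling `describe_wav("yana_output.wav")` after running the usage example returns the file's sample rate, bit depth, channel count, and duration in seconds.
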
## System Requirements

- **RAM**: 8 GB (16 GB recommended)
- **Storage**: 5 GB free space
- **Python**: 3.8+
- **OS**: macOS, Linux, Windows

## License

This model is licensed under the MIT License. See the LICENSE file for more details.

## Contact

- **Email**: echo-yana@robiai.com
- **Website**: https://labs.robiai.com
- **Documentation**: https://docs.robiai.com

---

**Yana TTS** - The voice of Robi Labs' Echo Model Family, bringing text to life with natural, high-quality speech synthesis.