Yana - Voice of Robi Labs' Echo Model Family
A state-of-the-art Text-to-Speech (TTS) model designed for high-quality speech synthesis with multi-speaker support and efficient inference. Yana represents the voice synthesis capabilities of Robi Labs' innovative Echo Model Family.
Model Description
Yana generates natural-sounding speech from text input. Built as part of Robi Labs' Echo Model Family, it supports multiple speakers and customizable voice characteristics.
Model Specifications
- Model Size: 1.6B parameters
- Type: Conditional Generation Model
- Task: Text-to-Speech synthesis
- Framework: PyTorch
- Family: Robi Labs Echo Model Family
Usage
from transformers import AutoModel, AutoProcessor
import torch
import soundfile as sf

# Load the Yana model
model = AutoModel.from_pretrained("RobiLabs/Yana")
processor = AutoProcessor.from_pretrained("RobiLabs/Yana")

# Generate speech
text = "Hello, this is Yana from Robi Labs' Echo Model Family."
speaker_id = "0"
conversation = [{
    "role": speaker_id,
    "content": [{"type": "text", "text": text}]
}]

# Process and generate
inputs = processor.apply_chat_template(
    conversation,
    tokenize=True,
    return_dict=True
)

# Generate audio
with torch.no_grad():
    audio_values = model.generate(
        **inputs,
        max_new_tokens=125,  # ~10 seconds of audio
        output_audio=True,
        do_sample=True,
        temperature=0.9
    )

# Save the generated speech
audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("yana_output.wav", audio, 24000)
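The usage example builds a single-turn conversation by hand, with the speaker ID as the message role. As a minimal sketch, the same message shape can be produced for several speakers with a small helper. `build_conversation` is a hypothetical name, not part of Transformers or the Yana repository, and this card does not state whether Yana accepts multi-turn input or which speaker IDs are valid, so treat both as assumptions.

```python
from typing import Dict, List, Tuple


def build_conversation(turns: List[Tuple[int, str]]) -> List[Dict]:
    """Turn (speaker_id, text) pairs into chat-template messages.

    Mirrors the message shape of the usage example: the role is the
    speaker ID as a string, and the content is a list of typed parts.
    Hypothetical helper; valid speaker IDs are an assumption.
    """
    conversation = []
    for speaker_id, text in turns:
        if not text.strip():
            raise ValueError("each turn needs non-empty text")
        conversation.append({
            "role": str(speaker_id),
            "content": [{"type": "text", "text": text}],
        })
    return conversation


conversation = build_conversation([
    (0, "Hello, this is Yana."),
    (1, "And this is a second voice."),
])
print([message["role"] for message in conversation])
```

The resulting list can be passed to `processor.apply_chat_template` in place of the hand-written `conversation` from the example.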
Audio Quality
- Sample Rate: 24,000 Hz
- Bit Depth: 16-bit PCM
- Channels: Mono
- Format: WAV
- Duration: Configurable via max_new_tokens; 125 tokens yield about 10 seconds, and longer outputs are possible
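The figures in the list above can be checked with the Python standard library alone. The sketch below does two things: it derives a token budget from a target duration, and it reads a WAV header to compare it against the listed format. The rate of 12.5 tokens per second is an assumption inferred from the usage example's "125 tokens, about 10 seconds" comment, not a documented constant, and `tokens_for_duration` and `describe_wav` are hypothetical helper names.

```python
import math
import os
import tempfile
import wave

# Assumption: rate implied by "max_new_tokens=125 ~ 10 seconds" in the usage example.
TOKENS_PER_SECOND = 125 / 10


def tokens_for_duration(seconds: float) -> int:
    """Round up so the budget covers at least the requested duration."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return math.ceil(seconds * TOKENS_PER_SECOND)


def describe_wav(path: str) -> dict:
    """Read the header fields that correspond to the Audio Quality list."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "bit_depth": wav.getsampwidth() * 8,
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }


# Self-contained demo: write one second of silence in the listed format,
# then read it back. Point describe_wav at "yana_output.wav" to check real output.
demo_path = os.path.join(tempfile.gettempdir(), "format_check.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)       # mono
    wav.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
    wav.setframerate(24000)   # 24,000 Hz
    wav.writeframes(b"\x00\x00" * 24000)

print(tokens_for_duration(10))
print(describe_wav(demo_path))
```

Rounding up means the generated clip may run slightly longer than requested rather than being cut short.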
System Requirements
- RAM: 8GB (16GB recommended)
- Storage: 5GB free space
- Python: 3.8+
- OS: macOS, Linux, Windows
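To relate the RAM figures above to the 1.6B parameter count, a rough estimate of the memory taken by the weights alone can be computed as parameters times bytes per parameter. This is a back-of-the-envelope sketch: the card does not say which dtype the published weights use, and the estimate ignores activations, generation caches, and framework overhead. `weight_memory_gb` is a hypothetical helper name.

```python
PARAMS = 1.6e9  # from the Model Specifications list

# Standard storage sizes per parameter for common PyTorch dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(PARAMS, dtype):.1f} GB")
```

Under these assumptions, float32 weights come to about 6.4 GB and half-precision weights to about 3.2 GB, which would leave little headroom on an 8GB machine at full precision.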
License
This model is licensed under the MIT License. See the LICENSE file for more details.
Contact
- Email: echo-yana@robiai.com
- Website: https://labs.robiai.com
- Documentation: https://docs.robiai.com
Yana TTS - The voice of Robi Labs' Echo Model Family, bringing text to life with natural, high-quality speech synthesis.