metadata
license: mit
library_name: transformers
tags:
  - text-to-speech
  - tts
  - audio
  - speech-synthesis
  - robi-labs
  - echo-family
pipeline_tag: text-to-speech

Yana - Voice of Robi Labs' Echo Model Family

Yana is a Text-to-Speech (TTS) model for high-quality speech synthesis, with multi-speaker support and efficient inference. It is the voice synthesis model in Robi Labs' Echo Model Family.

Model Description

Yana generates natural-sounding speech from text input. It is a neural model in Robi Labs' Echo Model Family and supports multiple speakers and customizable voice characteristics.
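The Usage section of this README passes the speaker as the message "role" and the text as a typed content item. As a minimal sketch, a small helper can build that structure from (speaker, text) pairs. The helper name `build_conversation` is hypothetical, and this README does not state whether the model accepts more than one turn or speaker IDs other than "0", so treat multi-turn input as an assumption to verify.

```python
from typing import Dict, Iterable, List, Tuple


def build_conversation(turns: Iterable[Tuple[object, str]]) -> List[Dict]:
    """Build a conversation list in the message format used in the Usage section.

    `turns` is an iterable of (speaker_id, text) pairs. Hypothetical helper,
    not part of the Yana or transformers API.
    """
    conversation = []
    for speaker_id, text in turns:
        if not text.strip():
            raise ValueError("each turn needs non-empty text")
        conversation.append({
            "role": str(speaker_id),  # the speaker ID is passed as the role
            "content": [{"type": "text", "text": text}],
        })
    return conversation


if __name__ == "__main__":
    # Single-speaker input, matching the Usage example's structure.
    print(build_conversation([(0, "Hello, this is Yana.")]))
```

The resulting list has the same shape as the `conversation` variable in the Usage example, so it could be passed to `processor.apply_chat_template` in its place.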

Model Specifications

  • Model Size: 1.6B parameters
  • Type: Conditional Generation Model
  • Task: Text-to-Speech synthesis
  • Framework: PyTorch
  • Family: Robi Labs Echo Model Family
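The parameter count gives a rough lower bound on the memory needed just to hold the weights. The sketch below does that arithmetic for common PyTorch dtypes. It counts parameters only, so activations, generation caches, and framework overhead are not included, and the dtype in which the checkpoint is actually stored is not stated in this README.

```python
# Rough weight-memory estimate for a 1.6B-parameter model.
# Assumption: every parameter is stored in the same dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(1.6e9, dtype):.2f} GiB")
```

At 1.6B parameters this works out to roughly 6 GiB in float32 and roughly 3 GiB in a 16-bit dtype.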

Usage

from transformers import AutoModel, AutoProcessor
import torch
import soundfile as sf

# Load the Yana model
model = AutoModel.from_pretrained("RobiLabs/Yana")
processor = AutoProcessor.from_pretrained("RobiLabs/Yana")

# Text to synthesize and the speaker to use
text = "Hello, this is Yana from Robi Labs' Echo Model Family."
speaker_id = "0"  # the speaker ID is passed as the message "role" below

conversation = [{
    "role": speaker_id,
    "content": [{"type": "text", "text": text}]
}]

# Process and generate
inputs = processor.apply_chat_template(
    conversation,
    tokenize=True,
    return_dict=True
)

# Generate audio
with torch.no_grad():
    audio_values = model.generate(
        **inputs,
        max_new_tokens=125,  # ~10 seconds of audio
        output_audio=True,
        do_sample=True,
        temperature=0.9
    )

# Save the generated speech as a WAV file at the model's 24 kHz sample rate
# (soundfile writes WAV as 16-bit PCM by default)
audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("yana_output.wav", audio, 24000)

Audio Quality

  • Sample Rate: 24,000 Hz
  • Bit Depth: 16-bit PCM
  • Channels: Mono
  • Format: WAV
  • Duration: Configurable via max_new_tokens (about 10 seconds at 125 tokens, as in the usage example; longer outputs are possible)
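To confirm that a generated file matches the properties listed above, its header can be inspected with Python's standard `wave` module. The sketch below is illustrative only: `describe_wav` and `write_pcm16_mono` are hypothetical helpers, and the writer exists just so the check can be tried on a synthetic tone without running the model.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # Hz, from the list above


def write_pcm16_mono(path: str, samples, sample_rate: int = SAMPLE_RATE) -> None:
    """Write float samples in [-1.0, 1.0] as a 16-bit PCM mono WAV file."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # avoid integer overflow on loud samples
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


def describe_wav(path: str) -> dict:
    """Read sample rate, channel count, bit depth, and duration from a WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


if __name__ == "__main__":
    # One second of a 440 Hz tone as stand-in audio.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
            for n in range(SAMPLE_RATE)]
    write_pcm16_mono("tone.wav", tone)
    print(describe_wav("tone.wav"))
```

Calling `describe_wav("yana_output.wav")` on the file from the Usage example should report the same sample rate, channel count, and bit depth if the output matches this list.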

System Requirements

  • RAM: 8 GB minimum (16 GB recommended)
  • Storage: 5 GB free space
  • Python: 3.8+
  • OS: macOS, Linux, Windows
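Two of the requirements above, the Python version and free storage, can be checked with the standard library before downloading the model. This is a minimal sketch with hypothetical function names. It treats 5 GB as 5 × 10^9 bytes, which is an assumption, and it skips the RAM check because the standard library has no portable way to query installed memory.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # from the list above
MIN_FREE_GB = 5      # from the list above; interpreted as 10**9 bytes per GB


def check_python(version_info=sys.version_info) -> bool:
    """Return True if the interpreter is at least Python 3.8."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def check_free_storage(path: str = ".", min_free_gb: float = MIN_FREE_GB) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= min_free_gb * 10**9


if __name__ == "__main__":
    print("Python version OK:", check_python())
    print("Free storage OK:", check_free_storage())
```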

License

This model is licensed under the MIT License. See the LICENSE file for more details.

Contact


Yana TTS - The voice of Robi Labs' Echo Model Family, bringing text to life with natural, high-quality speech synthesis.