	---
	license: mit
	library_name: transformers
	tags:
	- text-to-speech
	- tts
	- audio
	- speech-synthesis
	- robi-labs
	- echo-family
	pipeline_tag: text-to-speech
	---

	# Yana - Voice of Robi Labs' Echo Model Family

	Yana is a Text-to-Speech (TTS) model for high-quality speech synthesis, with multi-speaker support and efficient inference. It is the voice synthesis model of Robi Labs' Echo Model Family.

	## Model Description

	Yana generates natural-sounding speech from text input. As part of Robi Labs' Echo Model Family, it is a 1.6B-parameter conditional generation model that supports multiple speakers and customizable voice characteristics.

	## Model Specifications

	- Model Size: 1.6B parameters
	- Type: Conditional Generation Model
	- Task: Text-to-Speech synthesis
	- Framework: PyTorch
	- Family: Robi Labs Echo Model Family
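As a rough guide to what the 1.6B parameter count implies for memory, the weights-only footprint can be estimated as parameter count times bytes per parameter. The sketch below is plain arithmetic, not a measurement of Yana: the dtype sizes are standard, the helper name `weights_size_gb` is illustrative, and activations, audio decoding, and framework overhead are not counted.

```python
# Weights-only memory estimate for a 1.6B-parameter model.
# Illustrative helper; not part of the Yana or transformers API.
PARAM_COUNT = 1_600_000_000  # "1.6B parameters" from the specification list

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

def weights_size_gb(param_count: int, dtype: str) -> float:
    """Return the approximate size of the weights in decimal gigabytes."""
    return param_count * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weights_size_gb(PARAM_COUNT, dtype):.1f} GB")
```

Under these assumptions the weights alone come to about 6.4 GB in float32 and about 3.2 GB in a 16-bit dtype, so actual memory use during inference will be higher.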

	## Usage

	```python
	from transformers import AutoModel, AutoProcessor
	import torch
	import soundfile as sf

	# Load the Yana model and its processor
	model = AutoModel.from_pretrained("RobiLabs/Yana")
	processor = AutoProcessor.from_pretrained("RobiLabs/Yana")

	# Text to synthesize and the speaker to use
	text = "Hello, this is Yana from Robi Labs' Echo Model Family."
	speaker_id = "0"

	# The chat template uses the role field to carry the speaker ID
	conversation = [{
	    "role": speaker_id,
	    "content": [{"type": "text", "text": text}]
	}]

	# Tokenize the conversation into model inputs
	inputs = processor.apply_chat_template(
	    conversation,
	    tokenize=True,
	    return_dict=True
	)

	# Generate audio without tracking gradients
	with torch.no_grad():
	    audio_values = model.generate(
	        **inputs,
	        max_new_tokens=125,  # ~10 seconds of audio
	        output_audio=True,
	        do_sample=True,
	        temperature=0.9
	    )

	# Save the generated speech as a 24 kHz WAV file
	audio = audio_values[0].to(torch.float32).cpu().numpy()
	sf.write("yana_output.wav", audio, 24000)
	```
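The `max_new_tokens=125` setting in the usage example is annotated as roughly 10 seconds of audio, which implies about 12.5 generated tokens per second. Assuming that ratio holds for other lengths (an inference from the comment, not a documented guarantee), a small helper can pick a token budget for a target duration. The name `tokens_for_duration` is hypothetical.

```python
import math

# Ratio inferred from the usage example: 125 tokens ~ 10 seconds.
TOKENS_PER_SECOND = 125 / 10  # assumption, not a documented constant

def tokens_for_duration(seconds: float) -> int:
    """Return a max_new_tokens value that covers roughly `seconds` of audio."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return math.ceil(seconds * TOKENS_PER_SECOND)

print(tokens_for_duration(10))  # 125
print(tokens_for_duration(4))   # 50
```

The result can be passed as `max_new_tokens` to `model.generate`. Generation may stop earlier than the budget, so treat the value as an upper bound on length.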

	## Audio Quality

	- Sample Rate: 24,000 Hz
	- Bit Depth: 16-bit PCM
	- Channels: Mono
	- Format: WAV
	- Duration: Configurable via `max_new_tokens` (the usage example's 125 tokens yield roughly 10 seconds per generation)
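To make the output format concrete, the following sketch writes a 24,000 Hz, 16-bit PCM, mono WAV file using only the Python standard library and then reads the header back. It is a format illustration under stated assumptions, not Yana's own export path: the helper `write_pcm16_mono` is hypothetical and a synthetic 440 Hz tone stands in for model output.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # Hz, matching the sample rate listed for Yana

def write_pcm16_mono(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1.0, 1.0] as a 16-bit PCM mono WAV file."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # guard against out-of-range values
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))

# Stand-in signal: one second of a 440 Hz tone instead of model output.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
write_pcm16_mono("format_check.wav", tone)

# Read the header back to confirm channels, bit depth, and sample rate.
with wave.open("format_check.wav", "rb") as wav:
    print(wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate())
```

The same header check can be run on a file produced by the usage example to confirm it matches the listed format.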

	## System Requirements

	- RAM: 8GB (16GB recommended)
	- Storage: 5GB free space
	- Python: 3.8+
	- OS: macOS, Linux, Windows
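The Python and storage requirements can be checked before downloading the model. This is a minimal sketch using only the standard library; `check_requirements` is a hypothetical helper, "5GB" is read as decimal gigabytes, and the RAM check is left out because the standard library has no portable way to query total memory.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)
MIN_FREE_BYTES = 5 * 10**9  # "5GB free space", read as decimal gigabytes

def check_requirements(path="."):
    """Return pass/fail results for the checks the stdlib can do portably."""
    free = shutil.disk_usage(path).free
    return {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "storage": free >= MIN_FREE_BYTES,
    }

for name, ok in check_requirements().items():
    print(f"{name}: {'ok' if ok else 'below requirement'}")
```

Pass the directory where the model will be cached as `path` so the free-space check targets the right disk. For the RAM requirement, a third-party package such as `psutil` (`psutil.virtual_memory().total`) is one option.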

	## License

	This model is licensed under the MIT License. See the LICENSE file for more details.

	## Contact

	- Email: echo-yana@robiai.com
	- Website: https://labs.robiai.com
	- Documentation: https://docs.robiai.com

	---

	Yana TTS - The voice of Robi Labs' Echo Model Family, bringing text to life with natural, high-quality speech synthesis.