Kazakh Spark-TTS Model

Fine-tuned Spark-TTS model for high-quality Kazakh language text-to-speech synthesis with voice cloning capabilities.

Model Description

This model is optimized for Kazakh TTS and accepts text in both Cyrillic and Tote Zhazu (Arabic-based) scripts. It produces natural-sounding speech and can clone a voice from just 3-10 seconds of reference audio.
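Since voice cloning depends on a reference clip of 3-10 seconds, it can help to check a clip's length before passing it to the model. The following is a minimal sketch using only Python's standard `wave` module. It assumes the reference is an uncompressed PCM WAV file, and the helper names (`reference_duration_seconds`, `is_usable_reference`) are hypothetical, not part of this model's tooling.

```python
import wave

# Bounds taken from the 3-10 second reference range stated in this model card.
MIN_REF_SECONDS = 3.0
MAX_REF_SECONDS = 10.0


def reference_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str) -> bool:
    """Hypothetical helper: True if the clip length falls within the stated range."""
    duration = reference_duration_seconds(path)
    return MIN_REF_SECONDS <= duration <= MAX_REF_SECONDS
```

A clip outside the range is not necessarily unusable; the check only flags clips that fall outside what the model card describes.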

Key Features

🎯 High-Quality Kazakh TTS: Natural and fluent speech synthesis
🎤 Voice Cloning: Clone any voice with 3-10 seconds of reference audio
📝 Dual Script Support: Cyrillic and Tote Zhazu scripts
⚡ Fast Inference: Optimized for real-time generation
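To illustrate the dual-script input, the sketch below guesses whether a piece of text is written in Cyrillic or in an Arabic-based script such as Tote Zhazu by counting characters in the corresponding Unicode blocks. This is an illustrative assumption about how input might be pre-checked, not the model's own script handling; `detect_script` is a hypothetical name, and the check cannot tell Tote Zhazu apart from other languages written in Arabic script.

```python
def detect_script(text: str) -> str:
    """Guess the script of `text` from Unicode block membership.

    Returns "cyrillic", "tote_zhazu", or "unknown".
    """
    # Basic Cyrillic block (U+0400-U+04FF) covers Kazakh letters such as ә, ғ, қ.
    cyrillic = sum(1 for ch in text if "\u0400" <= ch <= "\u04FF")
    # Basic Arabic block (U+0600-U+06FF) covers the letters used by Tote Zhazu.
    arabic = sum(1 for ch in text if "\u0600" <= ch <= "\u06FF")

    if cyrillic == 0 and arabic == 0:
        return "unknown"
    # Ties are resolved in favour of Cyrillic; this is an arbitrary choice.
    return "cyrillic" if cyrillic >= arabic else "tote_zhazu"
```

For example, `detect_script("Сәлем")` yields `"cyrillic"` and `detect_script("سالەم")` yields `"tote_zhazu"`.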

Model Specifications

Attribute	Value
Language	Kazakh (kk)
Scripts	Cyrillic, Tote Zhazu
Sampling Rate	16 kHz
Base Model	Spark-TTS
License	CC BY-NC-SA 4.0

🚀 Full Application

For the full application, including a REST API, web interface, and automatic script conversion, see:

👉 GitHub Repository: Spark-TTS-Kazakh

The GitHub repository includes:

✅ FastAPI REST API server
✅ Web-based user interface
✅ Complete documentation and examples
✅ Easy installation and deployment

🛠️ Model Components

This repository provides all essential components required for inference:

BiCodec: Audio encoding/decoding module.
LLM: The core inference engine, which generates speech tokens from the input text.
wav2vec2-large-xlsr-53: Feature extraction module for voice cloning.
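The components listed above can be fetched and sanity-checked with a short script. The sketch below uses `snapshot_download` from the `huggingface_hub` library with the repository ID `ErnarBahat/Spark-TTS-Kazakh`. It assumes each component lives in a top-level folder named after it (`BiCodec`, `LLM`, `wav2vec2-large-xlsr-53`); that layout is an assumption and may not match the actual repository. The function names are hypothetical.

```python
from pathlib import Path

# Assumed folder names, taken from the component list in this model card.
EXPECTED_COMPONENTS = ("BiCodec", "LLM", "wav2vec2-large-xlsr-53")


def missing_components(model_dir: str) -> list[str]:
    """Return the expected component folders that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_COMPONENTS if not (root / name).is_dir()]


def download_model(repo_id: str = "ErnarBahat/Spark-TTS-Kazakh") -> str:
    """Download the model snapshot and return its local directory."""
    # Imported lazily so missing_components works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)
```

A typical use would be `missing_components(download_model())`, which should return an empty list if the assumed layout holds.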

📝 License & Attribution

This project is built upon the Spark-TTS framework and is distributed under the CC BY-NC-SA 4.0 license.

Attribution:

The base model and architecture were developed by the SparkAudio team.
Kazakh-specific fine-tuning and tool development were performed by allssai.

Disclaimer: This model is intended for research and educational purposes only. Commercial use of this model or its derivatives is strictly prohibited under the NC (Non-Commercial) clause.


Model tree for ErnarBahat/Spark-TTS-Kazakh

Base model: SparkAudio/Spark-TTS-0.5B
Fine-tuned model (this model): ErnarBahat/Spark-TTS-Kazakh