Kazakh Spark-TTS Model
Fine-tuned Spark-TTS model for high-quality Kazakh language text-to-speech synthesis with voice cloning capabilities.
Model Description
This model is specifically optimized for Kazakh language TTS, supporting both Cyrillic and Tote Zhazu (Arabic) scripts. It provides natural-sounding speech synthesis and voice cloning with just 3-10 seconds of reference audio.
Key Features
- High-Quality Kazakh TTS: Natural and fluent speech synthesis
- Voice Cloning: Clone any voice with 3-10 seconds of reference audio
- Dual Script Support: Cyrillic and Tote Zhazu scripts
- Fast Inference: Optimized for real-time generation
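Since voice cloning expects 3-10 seconds of reference audio, it can help to check a clip's length before passing it to the model. The following is a minimal sketch using only Python's standard `wave` module; the helper names `reference_duration_seconds` and `is_usable_reference` are hypothetical and not part of this model's tooling, and the check assumes an uncompressed PCM WAV file.

```python
import wave

MIN_SECONDS = 3.0   # lower bound stated in this model card
MAX_SECONDS = 10.0  # upper bound stated in this model card


def reference_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str) -> bool:
    """True if the clip length falls inside the 3-10 second window."""
    return MIN_SECONDS <= reference_duration_seconds(path) <= MAX_SECONDS
```

A clip outside the window is not necessarily unusable; the bounds simply mirror the range given above.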
Model Specifications
| Attribute | Value |
|---|---|
| Language | Kazakh (kk) |
| Scripts | Cyrillic, Tote Zhazu |
| Sampling Rate | 16 kHz |
| Base Model | Spark-TTS |
| License | cc-by-nc-sa-4.0 |
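Because the model accepts two scripts, a caller may want to know which one a given input uses. The sketch below classifies text by Unicode character names from the standard `unicodedata` module. The function name `detect_script` and its return labels are hypothetical, and treating any Arabic-block letter as Tote Zhazu is a simplifying assumption.

```python
import unicodedata


def detect_script(text: str) -> str:
    """Classify text as 'cyrillic', 'tote_zhazu', 'mixed', or 'unknown'."""
    found = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces, and punctuation
        name = unicodedata.name(ch, "")
        if name.startswith("CYRILLIC"):
            found.add("cyrillic")
        elif name.startswith("ARABIC"):
            # Assumption: Arabic-script letters in Kazakh input are Tote Zhazu.
            found.add("tote_zhazu")
    if len(found) == 1:
        return found.pop()
    return "mixed" if found else "unknown"
```

Text in other scripts, such as Latin, falls through to `"unknown"`.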
Full Application
For complete usage with a REST API, a web interface, and automatic script conversion, see:
GitHub Repository: Spark-TTS-Kazakh
The GitHub repository includes:
- FastAPI REST API server
- Web-based user interface
- Complete documentation and examples
- Easy installation and deployment
Model Components
This repository provides all essential components required for inference:
- BiCodec: Audio encoding/decoding module.
- LLM: The core inference engine.
- wav2vec2-large-xlsr-53: Feature extraction module for voice cloning.
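After downloading, it can be useful to confirm that all three components are present before running inference. This is a minimal sketch, assuming each component listed above is stored as a subdirectory of the same name (not confirmed by this card). `snapshot_download` is the standard `huggingface_hub` function. `download_model` and `missing_components` are hypothetical helpers.

```python
from pathlib import Path

# Assumption: each component listed above is a subdirectory of the model folder.
EXPECTED_COMPONENTS = ("BiCodec", "LLM", "wav2vec2-large-xlsr-53")


def download_model(local_dir: str) -> str:
    """Fetch the repository snapshot; requires `pip install huggingface_hub`."""
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(
        repo_id="ErnarBahat/Spark-TTS-Kazakh", local_dir=local_dir
    )


def missing_components(model_dir: str) -> list[str]:
    """Return the expected component directories absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_COMPONENTS if not (root / name).is_dir()]
```

An empty list from `missing_components` means every expected directory was found. It does not validate the files inside them.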
License & Attribution
This project is built upon the Spark-TTS framework and is distributed under the CC BY-NC-SA 4.0 license.
Attribution:
- The base model and architecture were developed by the SparkAudio team.
- Kazakh-specific fine-tuning and tool development were performed by allssai.
Disclaimer: This model is intended for research and educational purposes only. Commercial use of this model or its derivatives is strictly prohibited under the NC (Non-Commercial) clause.
Model tree for ErnarBahat/Spark-TTS-Kazakh
Base model: SparkAudio/Spark-TTS-0.5B