---
license: apache-2.0
tags:
- text-to-speech
- audio
- cpu-optimized
language:
- zh
- en
- de
- es
- fr
- ja
- it
- he
- ko
- ru
- fa
- ar
- pl
- pt
- cs
- da
- sv
- hu
- el
- tr
library_name: transformers
---

# MOSS-TTS (CPU Optimized)

> **CPU Optimized Version**: This repository contains a build of **MOSS-TTS** optimized for high-performance execution in CPU-only environments.

**This optimization and packaging process was performed autonomously by [NEO](https://heyneo.so/), an autonomous ML engineering agent.**

## Overview
MOSS-TTS is a state-of-the-art speech and sound generation model family designed for high fidelity and high expressiveness in complex real-world scenarios. This version uses runtime dynamic quantization and specific architectural configurations to deliver low-latency speech synthesis without requiring a GPU.

### Key Optimizations by NEO:
- **Dynamic INT8 Quantization**: Reduces memory footprint and accelerates inference on modern CPUs.
- **Thread Scaling**: Configured for multi-threaded CPU execution.
- **CPU-Friendly Tensors**: All weights and buffers are prepared for the FP32/INT8 execution paths.
- **Autonomous Validation**: Functionality verified in resource-constrained environments.

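
To make the first bullet concrete, here is a minimal sketch of what dynamic INT8 quantization does numerically: weights are quantized once ahead of time, while activations are quantized on the fly for each call. It uses only the Python standard library; the function names (`quantize_int8`, `int8_dot`) and the symmetric per-tensor scheme are illustrative assumptions, not the code used to build this repository. In PyTorch the corresponding entry point is `torch.ao.quantization.quantize_dynamic`, though whether this build was produced with that exact call is also an assumption. The sketch shows the 4x storage reduction for weights and the small approximation error that results.

```python
import random
import struct


def quantize_int8(values):
    """Symmetric per-tensor quantization: floats -> (list of int8 values, scale)."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale for all-zero input
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate floats."""
    return [x * scale for x in q]


def int8_dot(weights_q, w_scale, activations):
    """Dot product where activations are quantized at call time (the 'dynamic' part)."""
    a_q, a_scale = quantize_int8(activations)
    acc = sum(w * a for w, a in zip(weights_q, a_q))  # integer accumulation
    return acc * w_scale * a_scale  # rescale the result back to float


random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(256)]
acts = [random.uniform(-1.0, 1.0) for _ in range(256)]

w_q, w_scale = quantize_int8(weights)  # done once, before inference
approx = int8_dot(w_q, w_scale, acts)  # done per inference call
exact = sum(w * a for w, a in zip(weights, acts))

# Storage cost of the weights in each representation.
fp32_bytes = len(struct.pack(f"{len(weights)}f", *weights))
int8_bytes = len(struct.pack(f"{len(w_q)}b", *w_q))

print(f"FP32 weights: {fp32_bytes} bytes, INT8 weights: {int8_bytes} bytes")
print(f"exact dot = {exact:.4f}, int8 dot = {approx:.4f}")
```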
---

## Usage

### Installation
```bash
pip install transformers torch torchaudio
```

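
Because this build targets multi-threaded CPU execution, it can help to fix the thread count before loading the model. The following is a minimal sketch, assuming the numeric backend honors the standard `OMP_NUM_THREADS` and `MKL_NUM_THREADS` environment variables; the helper name `configure_cpu_threads` and its `reserve` parameter are hypothetical, and the best thread count depends on your hardware and workload.

```python
import os


def configure_cpu_threads(reserve: int = 0) -> int:
    """Set common threading environment variables and return the chosen count.

    Call this before importing torch so the backends read the values at startup.
    `reserve` leaves that many logical cores free for other processes.
    """
    total = os.cpu_count() or 1  # cpu_count() may return None
    n = max(1, total - reserve)
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = str(n)
    return n


num_threads = configure_cpu_threads(reserve=1)
print(f"Using {num_threads} CPU thread(s)")
```

After importing `torch`, calling `torch.set_num_threads(num_threads)` applies the same limit at the PyTorch level.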
### Quick Start
```python
import torch
from transformers import AutoModel, AutoProcessor

# Load the CPU-optimized model and its processor.
# trust_remote_code=True executes the custom model code shipped with the repository.
model_name = "daksh-neo/MOSS-TTS"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float32,
)

# Inference (example)
text = "This is a CPU-optimized speech synthesis by NEO."
inputs = processor(text=[text], mode="generation")
outputs = model.generate(**inputs)
```

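
The Quick Start stops at `model.generate`, and turning `outputs` into a waveform is model-specific and not covered here. As a hedged sketch of the final step only, the snippet below writes a mono waveform to a 16-bit PCM WAV file with the standard library. It assumes you already hold a one-dimensional sequence of float samples in the range [-1.0, 1.0]; the helper `save_wav` is hypothetical, and a generated sine tone stands in for real model output. The 24 kHz rate is one of the two rates listed for this build. With `torchaudio` installed, `torchaudio.save` is an alternative for tensor inputs.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(path, samples, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range samples
        frames += struct.pack("<h", int(s * 32767))  # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))


# Placeholder signal: one second of a 440 Hz tone standing in for model output.
sample_rate = 24000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

out_path = os.path.join(tempfile.gettempdir(), "moss_tts_sample.wav")
save_wav(out_path, tone, sample_rate)
print(f"Wrote {out_path}")
```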
---

## Capabilities
- **Zero-shot Voice Cloning**: Clone voices from short reference clips.
- **Multilingual Support**: High-quality synthesis across 20+ languages.
- **Long-form Stability**: Synthesize stable audio for durations up to 1 hour.
- **Fine-grained Control**: Phoneme-level and duration-level control for precise prosody.

## Architecture
This specific export is based on the **MossTTSDelay** architecture, optimized for sequential stability and CPU throughput.

| Feature | Specification |
|---|---|
| Optimization Engine | NEO (Autonomous ML Agent) |
| Device Target | CPU (x86_64 / ARM64) |
| Quantization | Dynamic INT8 |
| Sampling Rate | 24 kHz / 44.1 kHz (configurable) |

## License
This model is released under the **Apache-2.0 License**.

## Acknowledgments
Original model by [MOSI.AI](https://mosi.cn/) and the [OpenMOSS Team](https://github.com/OpenMOSS/MOSS-TTS).
CPU optimization and Hugging Face packaging by **[NEO](https://heyneo.so/)**.