---
license: apache-2.0
pipeline_tag: text-to-speech
tags:
- text-to-speech
- audio
- cpu-optimized
language:
- zh
- en
- de
- es
- fr
- ja
- it
- he
- ko
- ru
- fa
- ar
- pl
- pt
- cs
- da
- sv
- hu
- el
- tr
library_name: transformers
---

# MOSS-TTS (CPU Optimized)

> **🚀 CPU Optimized Version**: This repository contains a build of **MOSS-TTS** optimized for execution in CPU-only environments.

**This optimization and packaging process was performed autonomously by [NEO](https://github.com/daksh-neo), an autonomous ML engineering agent.**

## Overview
This version of MOSS-TTS applies dynamic INT8 quantization at runtime, together with CPU-oriented thread and tensor settings, to deliver low-latency speech synthesis without requiring a GPU. MOSS-TTS is a state-of-the-art speech and sound generation model family designed for high-fidelity, highly expressive synthesis in complex real-world scenarios.

### Key Optimizations by NEO:
- **Dynamic INT8 Quantization**: Reduces memory footprint and accelerates inference on modern CPUs.
- **Thread Scaling**: Configured for optimal multi-threaded performance.
- **CPU-Friendly Tensors**: All weights and buffers are prepared for FP32/INT8 execution paths.
- **Autonomous Validation**: Functionality was verified in resource-constrained environments.
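
To make the INT8 idea above concrete, here is a minimal pure-Python sketch of symmetric per-tensor quantization. It is an illustration only, not the code NEO used for this export: the helper names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical, and real backends typically use more elaborate schemes (for example per-channel scales or zero points).

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # The largest magnitude maps to 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]


# Hypothetical weight values, chosen only for illustration.
weights = [0.82, -1.27, 0.003, 0.5, -0.64]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Rounding to the nearest integer bounds the per-weight error by scale / 2.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("int8 values:", q)
print("scale:", scale)
print("max round-trip error:", max_error)
```

Each weight is stored in one byte instead of four, which is where the memory saving comes from; the cost is a rounding error of at most half the scale per weight.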

---

## 🛠 Usage

### Installation
```bash
pip install transformers torch torchaudio
```

### Quick Start
```python
from transformers import AutoModel, AutoProcessor
import torch

# Load the CPU-optimized model.
# trust_remote_code=True runs Python code shipped in the model repository.
model_name = "daksh-neo/MOSS-TTS"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float32,  # FP32 weights for CPU execution
)

# Inference (example): disable autograd bookkeeping while generating
text = "This is a CPU-optimized speech synthesis by NEO."
inputs = processor(text=[text], mode="generation")
with torch.inference_mode():
    outputs = model.generate(**inputs)
```
```
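
The Quick Start stops at `model.generate`, and how `outputs` is decoded into a waveform depends on the repository's remote code, which is not shown here. Assuming you end up with mono float samples in the range [-1, 1], the sketch below writes them to a 16-bit WAV file using only the Python standard library. The `write_wav` helper is hypothetical, and the sine tone is a stand-in for real model output.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path, samples, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM audio."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Placeholder signal: half a second of a 440 Hz tone instead of model output.
sample_rate = 24000
samples = [
    0.3 * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate // 2)
]
out_path = os.path.join(tempfile.gettempdir(), "moss_tts_demo.wav")
write_wav(out_path, samples, sample_rate)
print("wrote", out_path)
```

If the export is configured for 44.1 kHz output, pass `sample_rate=44100` instead so playback speed and pitch stay correct.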

---

## 📊 Capabilities
- **Zero-shot Voice Cloning**: Clone voices from short reference clips.
- **Multilingual Support**: High-quality synthesis across 20+ languages.
- **Long-form Stability**: Synthesize stable audio for durations up to 1 hour.
- **Fine-grained Control**: Phoneme-level and duration-level control for precise prosody.

## 🏗 Architecture
This export is based on the **MossTTSDelay** architecture, optimized for sequential stability and CPU throughput.

| Feature | Specification |
|---|---|
| Optimization Engine | NEO (Autonomous ML Agent) |
| Device Target | CPU (x86_64 / ARM64) |
| Quantization | Dynamic INT8 |
| Sampling Rate | 24 kHz / 44.1 kHz (configurable) |

## 📜 License
This model is released under the **Apache-2.0 License**.

## 🤝 Acknowledgments
Original model by [MOSI.AI](https://mosi.cn/) and the [OpenMOSS Team](https://github.com/OpenMOSS/MOSS-TTS).
CPU Optimization and Hugging Face packaging by **NEO**.