---
license: apache-2.0
tags:
- text-to-speech
- audio
- cpu-optimized
language:
- zh
- en
- de
- es
- fr
- ja
- it
- he
- ko
- ru
- fa
- ar
- pl
- pt
- cs
- da
- sv
- hu
- el
- tr
library_name: transformers
---

# MOSS-TTS (CPU Optimized)

> 🚀 CPU-Optimized Version: This repository contains a build of MOSS-TTS optimized for fast execution in CPU-only environments.

The optimization and packaging were carried out by [NEO](https://github.com/daksh-neo), an autonomous ML engineering agent.

## Overview
MOSS-TTS is a state-of-the-art family of speech and sound generation models designed for high fidelity, high expressiveness, and complex real-world scenarios. This version applies dynamic quantization at runtime, together with CPU-specific configuration choices, to deliver low-latency speech synthesis without a GPU.
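The card does not spell out the quantization scheme. As a rough illustration of what "dynamic" means, the pure-Python sketch below assumes a simplified symmetric, per-tensor INT8 scheme: weights are quantized once ahead of time, while each input's scale is computed at call time from that input's own range. `quantize_symmetric` and `DynamicInt8Linear` are hypothetical names, and real backends such as PyTorch's may differ in details such as activation zero points or per-channel weight scales.

```python
def quantize_symmetric(values, num_bits=8):
    """Hypothetical helper: map floats to signed ints with one shared scale (zero point 0)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric range [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


class DynamicInt8Linear:
    """Toy linear layer (assumption: simplified scheme, not the MOSS-TTS implementation)."""

    def __init__(self, weight_rows, bias):
        # Weights are static, so they are quantized once, with one scale for the layer.
        flat = [w for row in weight_rows for w in row]
        q_flat, self.w_scale = quantize_symmetric(flat)
        n_in = len(weight_rows[0])
        self.q_weight = [q_flat[i:i + n_in] for i in range(0, len(q_flat), n_in)]
        self.bias = bias

    def __call__(self, x):
        # "Dynamic" part: the activation scale comes from this input's range at runtime.
        q_x, x_scale = quantize_symmetric(x)
        out = []
        for q_row, b in zip(self.q_weight, self.bias):
            acc = sum(qw * qx for qw, qx in zip(q_row, q_x))  # integer multiply-accumulate
            out.append(acc * self.w_scale * x_scale + b)  # dequantize back to float
        return out


weights = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]
bias = [0.1, -0.2]
layer = DynamicInt8Linear(weights, bias)

x = [1.0, 2.0, -0.5]
approx = layer(x)
exact = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
print(approx, exact)
```

The quantized result tracks the FP32 result to within a small rounding error, which is the accuracy cost traded for integer arithmetic and 1-byte weights.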

### Key Optimizations by NEO
- Dynamic INT8 Quantization: Reduces memory footprint and speeds up inference on modern CPUs.
- Thread Scaling: Configured to make effective use of multiple CPU threads.
- CPU-Friendly Tensors: All weights and buffers are kept in FP32 or INT8 so they run on standard CPU execution paths.
- Autonomous Validation: Functionality verified in resource-constrained environments.
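To give a sense of the memory savings, here is a back-of-the-envelope estimate. It assumes that only linear-layer weights are quantized (activations and other tensors stay FP32) and that each output channel carries one FP32 scale. The layer shapes are hypothetical and are not MOSS-TTS's actual dimensions.

```python
def linear_weight_bytes(shapes, bytes_per_weight, per_channel_scale_bytes=0):
    """Total bytes for linear-layer weights given (in_features, out_features) shapes."""
    total = 0
    for in_features, out_features in shapes:
        total += in_features * out_features * bytes_per_weight
        total += out_features * per_channel_scale_bytes  # one scale per output row
    return total


# Hypothetical shapes for illustration only (not the real MOSS-TTS layer sizes).
shapes = [(4096, 4096)] * 4 + [(4096, 11008), (11008, 4096)]

fp32 = linear_weight_bytes(shapes, bytes_per_weight=4)
int8 = linear_weight_bytes(shapes, bytes_per_weight=1, per_channel_scale_bytes=4)
print(f"FP32: {fp32 / 1e6:.1f} MB, INT8: {int8 / 1e6:.1f} MB")  # FP32: 629.1 MB, INT8: 157.4 MB
```

Under these assumptions the quantized weights shrink by just under 4x: each 4-byte FP32 value becomes a single INT8 byte, plus a small per-channel scale overhead.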

---

## 🛠 Usage

### Installation
```bash
pip install transformers torch torchaudio
```

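### Quick Start
```python
from transformers import AutoModel, AutoProcessor
import torch

# Load the CPU-optimized model (the repository ships custom code, hence trust_remote_code)
model_name = "daksh-neo/MOSS-TTS"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float32,  # FP32 is the reference CPU execution path
)
model.eval()  # disable dropout and other training-only behavior

# Inference (example). The processor and generate() arguments are defined by the
# repository's remote code; consult it if these calls fail.
text = "This is CPU-optimized speech synthesis by NEO."
inputs = processor(text=[text], mode="generation")
with torch.inference_mode():  # skip autograd bookkeeping during generation
    outputs = model.generate(**inputs)
```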
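The Quick Start loads the model in FP32. To apply dynamic INT8 quantization and a thread setting yourself, a minimal sketch using PyTorch's `quantize_dynamic` and `torch.set_num_threads` follows. It assumes that quantizing the `nn.Linear` layers is appropriate for this checkpoint, which the card does not confirm. `prepare_for_cpu` is a hypothetical helper, and a small toy network stands in for the real model so the snippet runs on its own.

```python
import os
from typing import Optional

import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic


def prepare_for_cpu(model: nn.Module, num_threads: Optional[int] = None) -> nn.Module:
    """Hypothetical helper: INT8 dynamic quantization of nn.Linear layers plus thread setup."""
    # Intra-op parallelism: use the requested count, else all visible cores.
    torch.set_num_threads(num_threads or os.cpu_count() or 1)
    model.eval()  # dynamic quantization is an inference-time transformation
    # Linear weights are converted to INT8 now; activation scales are computed per call.
    # inplace defaults to False, so a quantized copy is returned.
    return quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)


# Toy stand-in for the real model so the sketch runs on its own.
toy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
quantized = prepare_for_cpu(toy, num_threads=2)

with torch.inference_mode():
    out = quantized(torch.randn(4, 16))
print(out.shape)  # torch.Size([4, 8])
```

To try it on the real checkpoint, pass the `model` returned by `AutoModel.from_pretrained` instead of `toy`, then compare latency and audio quality against the FP32 baseline before relying on it.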

---

## 📊 Capabilities
- Zero-shot Voice Cloning: Clone voices from short reference clips.
- Multilingual Support: High-quality synthesis across 20+ languages.
- Long-form Stability: Stable synthesis of audio up to 1 hour long.
- Fine-grained Control: Phoneme-level and duration-level control for precise prosody.

## 🏗 Architecture
This export is based on the `MossTTSDelay` architecture, optimized for stable sequential generation and CPU throughput.

| Feature | Specification |
|---|---|
| Optimization Engine | NEO (Autonomous ML Agent) |
| Device Target | CPU (x86_64 / ARM64) |
| Quantization | Dynamic INT8 |
| Sampling Rate | 24 kHz / 44.1 kHz (configurable) |
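The card does not explain the "Delay" in `MossTTSDelay`. Assuming, as the name suggests, that it refers to a delay pattern over multiple audio codebooks (the scheme popularized by MusicGen, where codebook `k` is shifted right by `k` steps so that one decoding step emits codebook `k` for frame `t - k`), the interleaving and its inverse can be sketched in pure Python. `apply_delay`, `undo_delay`, and the padding id `PAD` are hypothetical.

```python
PAD = -1  # hypothetical padding id for positions that hold no token


def apply_delay(codes, pad=PAD):
    """Shift codebook k right by k steps: delayed[k][t] == codes[k][t - k]."""
    num_q = len(codes)
    return [[pad] * k + list(row) + [pad] * (num_q - 1 - k) for k, row in enumerate(codes)]


def undo_delay(delayed):
    """Invert apply_delay: codes[k][t] == delayed[k][t + k]."""
    num_q = len(delayed)
    length = len(delayed[0]) - (num_q - 1)
    return [row[k:k + length] for k, row in enumerate(delayed)]


# 3 codebooks x 4 frames; value = 10 * (codebook + 1) + frame, for readability
codes = [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]
delayed = apply_delay(codes)
for row in delayed:
    print(row)
# [10, 11, 12, 13, -1, -1]
# [-1, 20, 21, 22, 23, -1]
# [-1, -1, 30, 31, 32, 33]
```

Reading the delayed grid column by column shows the trade-off: each step predicts all codebooks in parallel, yet codebook `k` of a frame is generated one step after codebook `k - 1` of the same frame and can condition on it.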

## 📜 License
This model is released under the Apache License 2.0.

## 🤝 Acknowledgments
Original model by [MOSI.AI](https://mosi.cn/) and the [OpenMOSS Team](https://github.com/OpenMOSS/MOSS-TTS).
CPU optimization and Hugging Face packaging by NEO.