MOSS-TTS (CPU Optimized)

🚀 CPU Optimized Version: This repository contains a build of MOSS-TTS optimized for inference in CPU-only environments.

The optimization and packaging were performed by NEO, an autonomous ML engineering agent.

Overview

This version of MOSS-TTS uses runtime dynamic quantization and CPU-specific configuration to deliver low-latency speech synthesis without a GPU. MOSS-TTS is a speech and sound generation model family designed for high fidelity and expressiveness in complex real-world scenarios.

Key Optimizations by NEO:

  • Dynamic INT8 Quantization: Reduces memory footprint and accelerates inference on modern CPUs.
  • Thread Scaling: Configured for optimal multi-threaded performance.
  • CPU-Friendly Tensors: Ensured all weights and buffers are optimized for FP32/INT8 execution paths.
  • Autonomous Validation: Verified functionality in resource-constrained environments.
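The exact steps NEO applied are not documented in this card. As a rough illustration of what dynamic INT8 quantization and thread configuration look like in PyTorch, here is a minimal sketch on a toy network. `TinyNet` is a hypothetical stand-in, not the MOSS-TTS architecture, and the thread count is an assumption you would tune for your own machine.

```python
import os

import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in model; NOT the MOSS-TTS architecture."""

    def __init__(self) -> None:
        super().__init__()
        self.fc1 = nn.Linear(16, 32)
        self.fc2 = nn.Linear(32, 8)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(torch.relu(self.fc1(x)))


# Assumption: use one intra-op thread per logical core; tune for your workload.
torch.set_num_threads(max(1, os.cpu_count() or 1))

model = TinyNet().eval()

# Replace nn.Linear layers with dynamically quantized versions:
# weights are stored as INT8, activations are quantized at runtime.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.inference_mode():
    out = quantized(torch.randn(1, 16))

print(type(quantized.fc1).__module__, tuple(out.shape))
```

Dynamic quantization needs no calibration data, which is why it is a common first step for CPU deployment. Whether it helps latency for a given model depends on how much time is spent in the quantized layer types.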

🛠 Usage

Installation

pip install transformers torch torchaudio

Quick Start

from transformers import AutoModel, AutoProcessor
import torch

# Load the CPU-optimized model
model_name = "daksh-neo/MOSS-TTS"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name, 
    trust_remote_code=True,
    torch_dtype=torch.float32 
)

# Inference (example); gradients are not needed for synthesis
text = "This is a CPU-optimized speech synthesis by NEO."
inputs = processor(text=[text], mode="generation")
with torch.inference_mode():
    outputs = model.generate(**inputs)
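The quick start stops at `model.generate`. How its return value is decoded into a waveform depends on this repository's remote code and is not shown in this card. Assuming you have already obtained a mono waveform as float samples in [-1.0, 1.0], the sketch below writes it to a 16-bit WAV file using only the standard library. The `write_wav` helper and the sine-wave placeholder are hypothetical, and the 24 kHz default is taken from the specification table rather than from the model's output.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path, samples, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM WAV."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


# Placeholder audio: a 0.5 s, 440 Hz tone standing in for model output.
sample_rate = 24000
tone = [
    0.3 * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate // 2)
]

out_path = os.path.join(tempfile.gettempdir(), "moss_tts_demo.wav")
write_wav(out_path, tone, sample_rate)
```

If the decoded audio is a `torch.Tensor`, calling `.flatten().tolist()` on it first would produce a list that this helper accepts.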

📊 Capabilities

  • Zero-shot Voice Cloning: Clone voices from short reference clips.
  • Multilingual Support: High-quality synthesis across 20+ languages.
  • Long-form Stability: Synthesize stable audio for durations up to 1 hour.
  • Fine-grained Control: Phoneme-level and duration-level control for precise prosody.

🏗 Architecture

This specific export is based on the MossTTSDelay architecture, optimized for sequential stability and CPU throughput.

Feature              Specification
Optimization Engine  NEO (Autonomous ML Agent)
Device Target        CPU (x86_64 / ARM64)
Quantization         Dynamic INT8
Sampling Rate        24 kHz / 44.1 kHz (configurable)

📜 License

This model is released under the Apache-2.0 License.

🀝 Acknowledgments

Original model by MOSI.AI and the OpenMOSS Team. CPU Optimization and Hugging Face packaging by NEO.
