metadata
language:
  - en
  - es
  - fr
  - de
  - it
  - pt
  - pl
  - tr
  - ru
  - nl
  - cs
  - ar
  - zh
  - ja
  - ko
  - hu
  - hi
tags:
  - text-to-speech
  - tts
  - xtts
  - gguf
  - quantized
  - mobile
  - embedded
  - cpp
license: apache-2.0

XTTS v2 GGUF - Memory-Efficient TTS for Mobile

🚀 EXPERIMENTAL: XTTS v2 in GGUF format, with a C++ inference engine for ultra-low memory usage on mobile devices.

⚠️ NOTE: This is a proof-of-concept. GGUF files require the included C++ inference engine to run.

🎯 Key Features

  • Memory-Mapped Loading: Only loads needed parts into RAM
  • Multiple Quantizations: Q4 (290MB), Q8 (580MB), F16 (1.16GB)
  • Low RAM Usage: 90-350MB with mmap (depending on the variant) vs 1.5-2.5GB for PyTorch
  • Fast Loading: <1 second vs 15-20 seconds for PyTorch
  • React Native Ready: Full mobile integration

📊 Model Variants

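| Variant | Size   | RAM (mmap) | Quality   | Best For          |
|---------|--------|------------|-----------|-------------------|
| q4_k    | 290MB  | ~90MB      | Good      | Low-end devices   |
| q8      | 580MB  | ~180MB     | Very Good | Mid-range devices |
| f16     | 1.16GB | ~350MB     | Excellent | High-end devices  |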
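
The variant table can be turned into a simple selection rule: take the highest-quality variant whose approximate mmap footprint fits the RAM you are willing to spend. The sketch below copies its figures from the table; `Variant` and `pick_variant` are hypothetical names for illustration and are not part of the shipped engine.

```cpp
// Sketch: choose the largest variant whose approximate RAM footprint fits a budget.
// The numbers come from the variant table; pick_variant is a hypothetical helper.
#include <array>
#include <optional>
#include <string_view>

struct Variant {
    std::string_view name;
    std::string_view file;
    int file_mb;  // size on disk, in MB
    int ram_mb;   // approximate RAM with mmap, in MB
};

// Ordered from smallest to largest footprint (and lowest to highest quality).
constexpr std::array<Variant, 3> kVariants{{
    {"q4_k", "xtts_v2_q4_k.gguf", 290, 90},
    {"q8", "xtts_v2_q8.gguf", 580, 180},
    {"f16", "xtts_v2_f16.gguf", 1160, 350},
}};

// Returns std::nullopt when even the smallest variant exceeds the budget.
std::optional<Variant> pick_variant(int ram_budget_mb) {
    std::optional<Variant> best;
    for (const auto& v : kVariants) {
        if (v.ram_mb <= ram_budget_mb) best = v;  // later entries are higher quality
    }
    return best;
}
```

For example, `pick_variant(200)` selects `q8`, while a 50MB budget yields no match and the caller has to decide how to fall back.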

🚀 Quick Start

React Native

import XTTS from '@genmedlabs/xtts-gguf';

// Initialize (downloads model automatically)
await XTTS.initialize();

// Generate speech
const audio = await XTTS.speak("Hello world!", {
  language: 'en'
});

C++

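#include <memory>  // std::make_unique
#include "xtts_inference.h"

int main() {
    auto model = std::make_unique<xtts::XTTSInference>();
    // The second argument presumably enables memory-mapped loading; see xtts_inference.h.
    model->load_model("xtts_v2_q4_k.gguf", true);
    auto audio = model->generate("Hello world!", xtts::LANG_EN);
    return 0;
}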
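
The C++ example stops once the audio is generated. As a rough sketch of a next step, the helper below encodes mono samples as a 16-bit PCM WAV buffer. It assumes that `generate` yields float samples in the range [-1, 1], which this README does not confirm, so check the return type in `xtts_inference.h`. `encode_wav_pcm16` is a hypothetical helper, and the sample rate is left as a parameter because the engine's output rate is not stated here (the original Coqui XTTS v2 produces 24 kHz audio).

```cpp
// Sketch: encode mono float samples as a 16-bit PCM WAV byte buffer.
// Assumption: samples are floats in [-1, 1]; encode_wav_pcm16 is a hypothetical helper.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Append a 16-bit value in little-endian order.
static void put_u16(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
}

// Append a 32-bit value in little-endian order.
static void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFF));
    }
}

// Append a four-character chunk tag such as "RIFF".
static void put_tag(std::vector<std::uint8_t>& out, const char* tag) {
    out.insert(out.end(), tag, tag + 4);
}

std::vector<std::uint8_t> encode_wav_pcm16(const std::vector<float>& samples,
                                           std::uint32_t sample_rate) {
    const auto data_bytes = static_cast<std::uint32_t>(samples.size() * 2);
    std::vector<std::uint8_t> out;
    out.reserve(44 + data_bytes);

    put_tag(out, "RIFF"); put_u32(out, 36 + data_bytes); put_tag(out, "WAVE");
    put_tag(out, "fmt "); put_u32(out, 16);  // size of the fmt chunk
    put_u16(out, 1);                         // audio format: PCM
    put_u16(out, 1);                         // channels: mono
    put_u32(out, sample_rate);
    put_u32(out, sample_rate * 2);           // byte rate = rate * channels * 2 bytes
    put_u16(out, 2);                         // block align = channels * 2 bytes
    put_u16(out, 16);                        // bits per sample
    put_tag(out, "data"); put_u32(out, data_bytes);

    for (float s : samples) {
        const float clamped = std::clamp(s, -1.0f, 1.0f);
        const auto q = static_cast<std::int16_t>(std::lround(clamped * 32767.0f));
        put_u16(out, static_cast<std::uint16_t>(q));
    }
    return out;
}
```

The returned bytes can be written to a `.wav` file with `std::ofstream` in binary mode or handed to a platform audio API.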

📦 Repository Structure

gguf/
├── xtts_v2_q4_k.gguf   # 4-bit quantized model
├── xtts_v2_q8.gguf     # 8-bit quantized model
├── xtts_v2_f16.gguf    # 16-bit half precision
└── manifest.json       # Model metadata

cpp/
├── xtts_inference.h    # C++ header
├── xtts_inference.cpp  # Implementation
└── CMakeLists.txt      # Build configuration

react-native/
├── XTTSModule.cpp      # Native module
└── XTTSModule.ts       # TypeScript interface
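
A quick sanity check on a downloaded `.gguf` file is to parse its fixed-size header. The sketch below assumes the files in `gguf/` follow the standard GGUF container layout from the GGML project (4-byte magic `GGUF`, 32-bit version, then 64-bit tensor and metadata counts, all little-endian). That layout has not been verified against this repository's exporter. `GgufHeader` and `parse_gguf_header` are hypothetical names.

```cpp
// Sketch: validate and read the fixed GGUF header from a byte buffer.
// Assumes the standard little-endian GGUF layout (version 2 or later).
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

struct GgufHeader {
    std::uint32_t version;
    std::uint64_t tensor_count;
    std::uint64_t metadata_kv_count;
};

// Read an n-byte little-endian unsigned integer starting at p.
static std::uint64_t read_le(const std::uint8_t* p, int n) {
    std::uint64_t v = 0;
    for (int i = n - 1; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

std::optional<GgufHeader> parse_gguf_header(const std::uint8_t* data, std::size_t size) {
    constexpr std::size_t kHeaderSize = 4 + 4 + 8 + 8;  // magic, version, two counts
    if (data == nullptr || size < kHeaderSize) return std::nullopt;
    if (std::memcmp(data, "GGUF", 4) != 0) return std::nullopt;

    GgufHeader h{};
    h.version = static_cast<std::uint32_t>(read_le(data + 4, 4));
    h.tensor_count = read_le(data + 8, 8);
    h.metadata_kv_count = read_le(data + 16, 8);

    // Version 1 stored 32-bit counts; this sketch does not handle it.
    if (h.version < 2) return std::nullopt;
    return h;
}
```

The function only needs a pointer and a length, so it can be run directly on a memory-mapped file without reading the whole model into RAM.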

🔧 Implementation Status

Completed ✅

  • GGUF format export
  • C++ engine structure
  • React Native bridge
  • Memory-mapped loading

In Progress 🚧

  • Full transformer implementation
  • Hardware acceleration
  • Voice cloning support

TODO 📋

  • Production optimizations
  • Comprehensive testing
  • WebAssembly support

📄 License

Apache 2.0

🙏 Credits

Based on XTTS v2 by Coqui AI. Uses the GGML library for efficient inference.


See the full documentation in the repository for detailed usage and build instructions.