---
language:
- en
- es
- fr
- de
- it
- pt
- pl
- tr
- ru
- nl
- cs
- ar
- zh
- ja
- ko
- hu
- hi
tags:
- text-to-speech
- tts
- xtts
- mobile
- torchscript
- android
- ios
license: apache-2.0
---

# XTTS v2 Mobile - TorchScript Edition

✨ **UPDATED**: Now with proper TorchScript models ready for mobile deployment!

Optimized XTTS v2 models exported to TorchScript format for direct mobile deployment on Android and iOS devices.

## 🎯 Key Features

- **TorchScript Format**: Self-contained `.ts` files that run directly on mobile
- **Optimized for Mobile**: Models processed with PyTorch Mobile optimizations
- **Multiple Variants**: Choose based on your device capabilities
- **17 Languages**: Full multilingual support maintained
- **24kHz Output**: High-quality audio generation

## 📦 Model Variants

| Variant | Size | Memory | Target Devices | Quality |
|---------|------|--------|----------------|---------|
| **Original** | 1.16 GB | ~1.5GB | High-end (4GB+ RAM) | Best |
| **FP16** | 581 MB | ~800MB | Mid-range (3GB+ RAM) | Excellent |

> **Recommendation**: Use the FP16 variant for most devices - it offers the best balance of size, memory usage, and quality.

## 🚀 Quick Start

### Download Models

```python
from huggingface_hub import hf_hub_download

# Download FP16 variant (recommended)
model_path = hf_hub_download(
    repo_id="GenMedLabs/xtts-mobile",
    filename="fp16/xtts_infer_fp16.ts"
)
```

### Android Integration (Kotlin)

```kotlin
// Add to build.gradle
dependencies {
    implementation 'org.pytorch:pytorch_android_lite:2.1.0'
}

// Load and use the model
import android.content.Context
import org.pytorch.IValue
import org.pytorch.LiteModuleLoader
import org.pytorch.Module

class XTTSModule(context: Context) {
    private var module: Module? = null
    fun initialize(modelPath: String) {
        // pytorch_android_lite loads models via LiteModuleLoader
        module = LiteModuleLoader.load(modelPath)
    }

    fun generateSpeech(text: String, language: String): FloatArray {
        val output = module?.forward(
            IValue.from(text),
            IValue.from(language)
        )?.toTensor()
        return output?.dataAsFloatArray ?: floatArrayOf()
    }
}
```

### iOS Integration (Swift)

```swift
import LibTorch

// TorchModule is assumed to be a custom Objective-C bridge class around
// the C++ mobile Module, as in the PyTorch iOS demo apps.
class XTTSModule {
    private var module: TorchModule?

    func initialize(modelPath: String) {
        module = TorchModule(fileAtPath: modelPath)
    }

    func generateSpeech(text: String, language: String) -> [Float] {
        guard let module = module else { return [] }
        let output = module.forward([text, language])
        return output.toArray()
    }
}
```

### React Native Integration

```javascript
// Requires react-native-fs for file downloads
import RNFS from 'react-native-fs';

// Download model from HuggingFace
const HF_BASE = "https://huggingface.co/GenMedLabs/xtts-mobile/resolve/main";

async function downloadModel(variant = 'fp16') {
  const url = `${HF_BASE}/${variant}/xtts_infer_${variant}.ts?download=true`;
  const destPath = `${RNFS.DocumentDirectoryPath}/xtts_model.ts`;

  await RNFS.downloadFile({
    fromUrl: url,
    toFile: destPath,
    background: true
  }).promise;

  return destPath;
}

// Initialize native module
const modelPath = await downloadModel('fp16');
await XTTSModule.initialize(modelPath);

// Generate speech
const audio = await XTTSModule.speak("Hello world", "en");
```

## 📊 Memory Requirements

| Device RAM | Recommended Variant | Expected Performance |
|------------|---------------------|----------------------|
| < 3GB | FP16 with streaming | May require optimization |
| 3-4GB | FP16 | Smooth performance |
| 4GB+ | Original or FP16 | Excellent performance |

## 🌍 Supported Languages

- `en` - English
- `es` - Spanish
- `fr` - French
- `de` - German
- `it` - Italian
- `pt` - Portuguese
- `pl` - Polish
- `tr` - Turkish
- `ru` - Russian
- `nl` - Dutch
- `cs` - Czech
- `ar` - Arabic
- `zh` - Chinese
- `ja` - Japanese
- `ko` - Korean
- `hu` - Hungarian
- `hi` - Hindi

## 🔧 Technical Details

- **Model Architecture**: XTTS v2 with GPT-style backbone
- **Export Method**: TorchScript with mobile optimizations
- **PyTorch Version**: 2.8.0 (use matching LibTorch version)
- **Sample Rate**: 24,000 Hz
- **Quantization**: FP16 uses half-precision floating point

## 💡 Tips for Mobile Deployment

1. **Memory Management**:
   - Load the model once at app startup
   - Keep the model in memory for multiple generations
   - Restrict inference to a single thread (e.g. `PyTorchAndroid.setNumThreads(1)` on Android) to reduce memory usage

2. **Performance Optimization**:
   - Warm up the model with a dummy input on first load
   - Use the FP16 variant for the best balance
   - Consider chunking long texts

3. **Error Handling**:
   ```kotlin
   try {
       module = LiteModuleLoader.load(modelPath)
   } catch (e: Exception) {
       // Fall back to server-side TTS
       Log.e("XTTS", "Failed to load model: ${e.message}")
   }
   ```

## 📝 Changelog

- **2024-09-23**: Initial release with TorchScript models
  - Added Original and FP16 variants
  - Optimized for PyTorch Mobile
  - Fixed compatibility issues

## 📄 License

Apache 2.0

## 🙏 Acknowledgments

Based on the official XTTS v2 model. Optimized for mobile deployment.

## 📚 Citation

```bibtex
@misc{xtts2024mobile,
  title={XTTS v2 Mobile - TorchScript Edition},
  author={GenMedLabs},
  year={2024},
  publisher={HuggingFace}
}
```

## ⚠️ Important Notes

- These are TorchScript models (`.ts` files), not PyTorch checkpoints (`.pth`)
- Models are self-contained and include all necessary weights
- No additional tokenizer files needed - tokenization is built into the model
- INT8 quantization is not available for ARM-based systems
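## 🧪 Example: Variant Selection

The variant and memory tables above boil down to a simple host-side decision: pick FP16 below 4GB of RAM, otherwise either variant works. A minimal Python sketch of that logic, using the repo layout shown in the Quick Start (`fp16/xtts_infer_fp16.ts`) and the `{variant}/xtts_infer_{variant}.ts` pattern from the React Native snippet; the function names here are illustrative, not part of any shipped API:

```python
# Repo-relative path template, following the documented file layout
FILENAME_TEMPLATE = "{v}/xtts_infer_{v}.ts"

def pick_variant(ram_gb: float) -> str:
    """Pick a model variant from device RAM, per the memory table above."""
    if ram_gb >= 4.0:
        return "original"  # 4GB+: Original (or FP16) runs comfortably
    return "fp16"          # below 4GB: FP16 is the safe choice

def variant_path(variant: str) -> str:
    """Build the repo-relative path for a variant."""
    return FILENAME_TEMPLATE.format(v=variant)

print(pick_variant(3.0))     # fp16
print(variant_path("fp16"))  # fp16/xtts_infer_fp16.ts
```

The returned path can be passed directly as the `filename` argument of `hf_hub_download` shown in the Quick Start.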
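## ✂️ Example: Chunking Long Texts

The tips above suggest chunking long texts before generation. One way to sketch that is to split on sentence boundaries and pack sentences into chunks under a character budget; this is a hypothetical pre-processing helper, not part of the model:

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text at sentence boundaries so each chunk stays under max_chars.

    Sentences longer than max_chars are kept whole rather than cut mid-word.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the budget
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be fed to `generateSpeech` in turn, and the resulting audio buffers concatenated, which keeps per-call memory bounded on low-RAM devices.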