---
language:
- en
- es
- fr
- de
- it
- pt
- pl
- tr
- ru
- nl
- cs
- ar
- zh
- ja
- ko
- hu
- hi
tags:
- text-to-speech
- tts
- xtts
- mobile
- torchscript
- android
- ios
license: apache-2.0
---
# XTTS v2 Mobile - TorchScript Edition
✨ **UPDATED**: Now with proper TorchScript models ready for mobile deployment!
Optimized XTTS v2 models exported to TorchScript format for direct mobile deployment on Android and iOS devices.
## 🎯 Key Features
- **TorchScript Format**: Self-contained `.ts` files that run directly on mobile
- **Optimized for Mobile**: Models processed with PyTorch Mobile optimizations
- **Multiple Variants**: Choose based on your device capabilities
- **17 Languages**: Full multilingual support maintained
- **24kHz Output**: High-quality audio generation
## πŸ“¦ Model Variants
| Variant | Size | Memory | Target Devices | Quality |
|---------|------|--------|----------------|---------|
| **Original** | 1.16 GB | ~1.5GB | High-end (4GB+ RAM) | Best |
| **FP16** | 581 MB | ~800MB | Mid-range (3GB+ RAM) | Excellent |
> **Recommendation**: Use FP16 variant for most devices - it offers the best balance of size, memory usage, and quality.
## πŸš€ Quick Start
### Download Models
```python
from huggingface_hub import hf_hub_download
# Download FP16 variant (recommended)
model_path = hf_hub_download(
    repo_id="GenMedLabs/xtts-mobile",
    filename="fp16/xtts_infer_fp16.ts",
)
```
### Android Integration (Kotlin)
```kotlin
// Add to build.gradle
dependencies {
    implementation 'org.pytorch:pytorch_android_lite:2.1.0'
}

// Load and use model
class XTTSModule(context: Context) {
    private var module: Module? = null

    fun initialize(modelPath: String) {
        // The lite runtime loads models via LiteModuleLoader, not Module.load()
        module = LiteModuleLoader.load(modelPath)
    }

    fun generateSpeech(text: String, language: String): FloatArray {
        val output = module?.forward(
            IValue.from(text),
            IValue.from(language)
        )?.toTensor()
        return output?.dataAsFloatArray ?: floatArrayOf()
    }
}
```
### iOS Integration (Swift)
```swift
import LibTorch

// TorchModule here is the Objective-C++ bridging wrapper used in the
// PyTorch iOS demo apps; LibTorch does not ship a native Swift API.
class XTTSModule {
    private var module: TorchModule?

    func initialize(modelPath: String) {
        module = TorchModule(fileAtPath: modelPath)
    }

    func generateSpeech(text: String, language: String) -> [Float] {
        guard let module = module else { return [] }
        let output = module.forward([text, language])
        return output.toArray()
    }
}
```
### React Native Integration
```javascript
// Download model from HuggingFace
const HF_BASE = "https://huggingface.co/GenMedLabs/xtts-mobile/resolve/main";
async function downloadModel(variant = 'fp16') {
const url = `${HF_BASE}/${variant}/xtts_infer_${variant}.ts?download=true`;
const destPath = `${RNFS.DocumentDirectoryPath}/xtts_model.ts`;
await RNFS.downloadFile({
fromUrl: url,
toFile: destPath,
background: true
}).promise;
return destPath;
}
// Initialize native module
const modelPath = await downloadModel('fp16');
await XTTSModule.initialize(modelPath);
// Generate speech
const audio = await XTTSModule.speak("Hello world", "en");
```
## πŸ“Š Memory Requirements
| Device RAM | Recommended Variant | Expected Performance |
|------------|-------------------|---------------------|
| < 3GB | FP16 with streaming | May require optimization |
| 3-4GB | FP16 | Smooth performance |
| 4GB+ | Original or FP16 | Excellent performance |
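As a rough illustration, the table above can be encoded as a variant-selection helper. This is a sketch, not a shipped API; the function name and thresholds are assumptions based on the table:

```python
def pick_variant(device_ram_gb: float) -> str:
    """Pick a model variant from available device RAM, per the table above."""
    if device_ram_gb >= 4.0:
        # Enough headroom for the ~1.5 GB resident original model
        return "original"
    # FP16 (~800 MB resident) is the safe default below 4 GB;
    # under 3 GB you may additionally need streaming/chunking.
    return "fp16"
```

On Android, available RAM can be read via `ActivityManager.MemoryInfo`; on iOS, via `ProcessInfo.processInfo.physicalMemory`.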
## 🌍 Supported Languages
- `en` - English
- `es` - Spanish
- `fr` - French
- `de` - German
- `it` - Italian
- `pt` - Portuguese
- `pl` - Polish
- `tr` - Turkish
- `ru` - Russian
- `nl` - Dutch
- `cs` - Czech
- `ar` - Arabic
- `zh` - Chinese
- `ja` - Japanese
- `ko` - Korean
- `hu` - Hungarian
- `hi` - Hindi
## πŸ”§ Technical Details
- **Model Architecture**: XTTS v2 with GPT-style backbone
- **Export Method**: TorchScript with mobile optimizations
- **PyTorch Version**: 2.8.0 (use matching LibTorch version)
- **Sample Rate**: 24,000 Hz
- **Quantization**: FP16 uses half-precision floating point
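The model returns raw float samples at 24 kHz. For desktop debugging, a minimal sketch (standard library only, names are illustrative) that writes such samples to a playable 16-bit PCM WAV file:

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # XTTS v2 output rate

def write_wav(path: str, samples: list[float]) -> None:
    """Write mono float samples in [-1, 1] as 16-bit PCM at 24 kHz."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        clipped = (max(-1.0, min(1.0, s)) for s in samples)
        wav.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in clipped))

# Example: one second of a 440 Hz test tone in place of real model output
tone = [0.5 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
write_wav("test_tone.wav", tone)
```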
## πŸ’‘ Tips for Mobile Deployment
1. **Memory Management**:
   - Load the model once at app startup
   - Keep the model in memory across generations
   - Limit threads (e.g. `PyTorchAndroid.setNumThreads(1)` on Android) to reduce CPU and memory pressure
2. **Performance Optimization**:
   - Warm up the model with a dummy input on first load
   - Use the FP16 variant for the best size/quality balance
   - Chunk long texts into sentences before synthesis
3. **Error Handling**:
```kotlin
try {
    module = LiteModuleLoader.load(modelPath)
} catch (e: Exception) {
    // Fall back to server-side TTS
    Log.e("XTTS", "Failed to load model: ${e.message}")
}
```
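The chunking tip above can be sketched as a small helper that splits text at sentence boundaries so each synthesis call stays short. A minimal sketch; the character limit is an assumption you should tune per device:

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text at sentence boundaries into chunks of at most max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Start a new chunk if appending would exceed the limit
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the audio buffers concatenated or streamed to the player as they arrive.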
## πŸ“ Changelog
- **2024-09-23**: Initial release with TorchScript models
  - Added Original and FP16 variants
  - Optimized for PyTorch Mobile
  - Fixed compatibility issues
## πŸ“„ License
Apache 2.0
## πŸ™ Acknowledgments
Based on the official XTTS v2 model. Optimized for mobile deployment.
## πŸ“š Citation
```bibtex
@misc{xtts2024mobile,
  title={XTTS v2 Mobile - TorchScript Edition},
  author={GenMedLabs},
  year={2024},
  publisher={HuggingFace}
}
```
## ⚠️ Important Notes
- These are TorchScript models (`.ts` files), not PyTorch checkpoints (`.pth`)
- Models are self-contained and include all necessary weights
- No additional tokenizer files needed - tokenization is built into the model
- INT8 quantization is not available for ARM-based systems