---
language:
- en
- es
- fr
- de
- it
- pt
- pl
- tr
- ru
- nl
- cs
- ar
- zh
- ja
- ko
- hu
- hi
tags:
- text-to-speech
- tts
- xtts
- mobile
- torchscript
- android
- ios
license: apache-2.0
---
# XTTS v2 Mobile - TorchScript Edition
✨ **UPDATED**: Now with proper TorchScript models ready for mobile deployment!
Optimized XTTS v2 models exported to TorchScript format for direct mobile deployment on Android and iOS devices.
## 🎯 Key Features
- **TorchScript Format**: Self-contained `.ts` files that run directly on mobile
- **Optimized for Mobile**: Models processed with PyTorch Mobile optimizations
- **Multiple Variants**: Choose based on your device capabilities
- **17 Languages**: Full multilingual support maintained
- **24kHz Output**: High-quality audio generation
## 📦 Model Variants
| Variant | Size | Memory | Target Devices | Quality |
|---------|------|--------|----------------|---------|
| **Original** | 1.16 GB | ~1.5GB | High-end (4GB+ RAM) | Best |
| **FP16** | 581 MB | ~800MB | Mid-range (3GB+ RAM) | Excellent |
> **Recommendation**: Use the FP16 variant for most devices: it offers the best balance of size, memory usage, and quality.
## 🚀 Quick Start
### Download Models
```python
from huggingface_hub import hf_hub_download

# Download FP16 variant (recommended)
model_path = hf_hub_download(
    repo_id="GenMedLabs/xtts-mobile",
    filename="fp16/xtts_infer_fp16.ts"
)
```
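Large downloads can be interrupted mid-transfer, so it is worth sanity-checking the file before handing it to a model loader. A minimal sketch (the `verify_download` helper is illustrative, not part of this repo):

```python
import os

def verify_download(path: str, min_bytes: int) -> bool:
    """Return True if the file exists and is at least min_bytes long.

    A size floor guards against truncated downloads and HTML error
    pages saved in place of the model file.
    """
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes

# Example: the FP16 variant is listed at 581 MB, so anything much
# smaller indicates an incomplete download:
# verify_download(model_path, min_bytes=500 * 1024 * 1024)
```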
### Android Integration (Kotlin)
```groovy
// Add to build.gradle.
// Note: full TorchScript (.ts) models need the full runtime; the
// pytorch_android_lite artifact expects .ptl lite-interpreter files.
dependencies {
    implementation 'org.pytorch:pytorch_android:2.1.0'
}
```

```kotlin
import android.content.Context
import org.pytorch.IValue
import org.pytorch.Module

// Load the TorchScript model once and reuse it for every generation.
class XTTSModule(context: Context) {
    private var module: Module? = null

    fun initialize(modelPath: String) {
        module = Module.load(modelPath)
    }

    fun generateSpeech(text: String, language: String): FloatArray {
        val output = module?.forward(
            IValue.from(text),
            IValue.from(language)
        )?.toTensor()
        return output?.dataAsFloatArray ?: floatArrayOf()
    }
}
```
### iOS Integration (Swift)
```swift
// Note: TorchModule is the Objective-C wrapper class from the PyTorch iOS
// examples, exposed to Swift via a bridging header; the forward/toArray
// methods below are assumed to be provided by that wrapper, not by
// LibTorch's public API.
import LibTorch

class XTTSModule {
    private var module: TorchModule?

    func initialize(modelPath: String) {
        module = TorchModule(fileAtPath: modelPath)
    }

    func generateSpeech(text: String, language: String) -> [Float] {
        guard let module = module else { return [] }
        let output = module.forward([text, language])
        return output.toArray()
    }
}
```
### React Native Integration
```javascript
import RNFS from 'react-native-fs';
import { NativeModules } from 'react-native';

// XTTSModule is assumed to be your app's custom native module wrapping
// the Kotlin/Swift loaders shown above.
const { XTTSModule } = NativeModules;

// Download model from HuggingFace
const HF_BASE = "https://huggingface.co/GenMedLabs/xtts-mobile/resolve/main";

async function downloadModel(variant = 'fp16') {
  const url = `${HF_BASE}/${variant}/xtts_infer_${variant}.ts?download=true`;
  const destPath = `${RNFS.DocumentDirectoryPath}/xtts_model.ts`;
  await RNFS.downloadFile({
    fromUrl: url,
    toFile: destPath,
    background: true
  }).promise;
  return destPath;
}

// Inside an async function: initialize the native module, then generate speech
const modelPath = await downloadModel('fp16');
await XTTSModule.initialize(modelPath);
const audio = await XTTSModule.speak("Hello world", "en");
```
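Whichever platform you target, the model returns raw mono float samples at 24 kHz. When validating output on a desktop first, a minimal Python sketch can persist those samples as a 16-bit PCM WAV file (`write_wav` is a hypothetical helper, not part of this repo):

```python
import struct
import wave

SAMPLE_RATE = 24_000  # XTTS v2 output rate

def write_wav(path: str, samples: list[float],
              sample_rate: int = SAMPLE_RATE) -> None:
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 16-bit samples
        wav.setframerate(sample_rate)
        # Clip to [-1, 1], then scale to signed 16-bit integers.
        clipped = (max(-1.0, min(1.0, s)) for s in samples)
        frames = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)
        wav.writeframes(frames)
```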
## 📊 Memory Requirements
| Device RAM | Recommended Variant | Expected Performance |
|------------|-------------------|---------------------|
| < 3GB | FP16 with streaming | May require optimization |
| 3-4GB | FP16 | Smooth performance |
| 4GB+ | Original or FP16 | Excellent performance |
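The table above can be collapsed into a tiny selection helper; a sketch (the `pick_variant` function and its notes are illustrative, not part of the repo):

```python
def pick_variant(ram_gb: float) -> tuple[str, str]:
    """Map device RAM (GB) to (model variant, note), mirroring the table."""
    if ram_gb < 3:
        return "fp16", "enable streaming; may require further optimization"
    if ram_gb < 4:
        return "fp16", "smooth performance expected"
    return "original", "excellent performance; fp16 also works"
```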
## 🌍 Supported Languages
- `en` - English
- `es` - Spanish
- `fr` - French
- `de` - German
- `it` - Italian
- `pt` - Portuguese
- `pl` - Polish
- `tr` - Turkish
- `ru` - Russian
- `nl` - Dutch
- `cs` - Czech
- `ar` - Arabic
- `zh` - Chinese
- `ja` - Japanese
- `ko` - Korean
- `hu` - Hungarian
- `hi` - Hindi
## 🔧 Technical Details
- **Model Architecture**: XTTS v2 with GPT-style backbone
- **Export Method**: TorchScript with mobile optimizations
- **PyTorch Version**: 2.8.0 (use matching LibTorch version)
- **Sample Rate**: 24,000 Hz
- **Quantization**: FP16 uses half-precision floating point
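The half-precision claim lines up with the sizes in the variants table: storing each 32-bit weight in 16 bits should roughly halve the file, and 1.16 GB / 2 ≈ 581 MB (small non-weight overhead aside). A quick arithmetic check:

```python
# FP16 stores each weight in 2 bytes instead of 4, so the model file
# should shrink to about half the FP32 size.
fp32_bytes = 1.16e9        # Original variant (~1.16 GB, from the table)
fp16_bytes = fp32_bytes / 2
# ~0.58 GB, within 0.5% of the listed 581 MB
```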
## 💡 Tips for Mobile Deployment
1. **Memory Management**:
   - Load the model once at app startup
   - Keep the model in memory across generations
   - Limit inference threads to reduce memory usage (on Android: `PyTorchAndroid.setNumThreads(1)`)
2. **Performance Optimization**:
   - Warm up the model with a dummy input on first load
   - Use the FP16 variant for the best balance
   - Consider chunking long texts
3. **Error Handling**:
   ```kotlin
   try {
       module = Module.load(modelPath)
   } catch (e: Exception) {
       // Fall back to server-side TTS
       Log.e("XTTS", "Failed to load model: ${e.message}")
   }
   ```
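The "chunk long texts" tip can be sketched as a sentence-boundary splitter with a character budget (a simple illustration; `chunk_text` is hypothetical, and a production splitter should be language-aware):

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text on sentence boundaries, packing sentences into
    chunks no longer than max_chars.

    A single sentence longer than max_chars is kept whole rather
    than split mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip() if current else sentence
    if current:
        chunks.append(current)
    return chunks
```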
## 📝 Changelog
- **2024-09-23**: Initial release with TorchScript models
  - Added Original and FP16 variants
  - Optimized for PyTorch Mobile
  - Fixed compatibility issues
## 📄 License
Apache 2.0
## 🙏 Acknowledgments
Based on the official XTTS v2 model. Optimized for mobile deployment.
## 📚 Citation
```bibtex
@misc{xtts2024mobile,
  title={XTTS v2 Mobile - TorchScript Edition},
  author={GenMedLabs},
  year={2024},
  publisher={HuggingFace}
}
```
## ⚠️ Important Notes
- These are TorchScript models (`.ts` files), not PyTorch checkpoints (`.pth`)
- Models are self-contained and include all necessary weights
- No additional tokenizer files needed - tokenization is built into the model
- INT8 quantization not available for ARM-based systems