---
language:
- en
- es
- fr
- de
- it
- pt
- pl
- tr
- ru
- nl
- cs
- ar
- zh
- ja
- ko
- hu
- hi
tags:
- text-to-speech
- tts
- xtts
- mobile
- torchscript
- android
- ios
license: apache-2.0
---
# XTTS v2 Mobile - TorchScript Edition

**UPDATED**: Now with proper TorchScript models ready for mobile deployment!

Optimized XTTS v2 models exported to TorchScript format for direct mobile deployment on Android and iOS devices.

## Key Features

- **TorchScript Format**: Self-contained `.ts` files that run directly on mobile
- **Optimized for Mobile**: Models processed with PyTorch Mobile optimizations
- **Multiple Variants**: Choose based on your device capabilities
- **17 Languages**: Full multilingual support maintained
- **24kHz Output**: High-quality audio generation
## Model Variants

| Variant | Size | Memory | Target Devices | Quality |
|---------|------|--------|----------------|---------|
| **Original** | 1.16 GB | ~1.5 GB | High-end (4 GB+ RAM) | Best |
| **FP16** | 581 MB | ~800 MB | Mid-range (3 GB+ RAM) | Excellent |

> **Recommendation**: Use the FP16 variant for most devices - it offers the best balance of size, memory usage, and quality.
## Quick Start

### Download Models

```python
from huggingface_hub import hf_hub_download

# Download the FP16 variant (recommended)
model_path = hf_hub_download(
    repo_id="GenMedLabs/xtts-mobile",
    filename="fp16/xtts_infer_fp16.ts",
)
```
### Android Integration (Kotlin)

Add the PyTorch Android runtime to `build.gradle`:

```gradle
dependencies {
    implementation 'org.pytorch:pytorch_android_lite:2.1.0'
}
```

Then load and use the model:

```kotlin
import android.content.Context
import org.pytorch.IValue
import org.pytorch.Module

class XTTSModule(context: Context) {
    private var module: Module? = null

    fun initialize(modelPath: String) {
        // Load the TorchScript model once and keep it for reuse
        module = Module.load(modelPath)
    }

    fun generateSpeech(text: String, language: String): FloatArray {
        // The scripted model takes the text and language code directly
        val output = module?.forward(
            IValue.from(text),
            IValue.from(language)
        )?.toTensor()

        return output?.dataAsFloatArray ?: floatArrayOf()
    }
}
```
### iOS Integration (Swift)

```swift
import LibTorch

// Uses a TorchModule Objective-C wrapper bridged into Swift
class XTTSModule {
    private var module: TorchModule?

    func initialize(modelPath: String) {
        module = TorchModule(fileAtPath: modelPath)
    }

    func generateSpeech(text: String, language: String) -> [Float] {
        guard let module = module else { return [] }

        let output = module.forward([text, language])
        return output.toArray()
    }
}
```
### React Native Integration

```javascript
import RNFS from 'react-native-fs';
import { NativeModules } from 'react-native';

// Your app's native bridge to the TorchScript runtime
const { XTTSModule } = NativeModules;

// Download the model from Hugging Face
const HF_BASE = "https://huggingface.co/GenMedLabs/xtts-mobile/resolve/main";

async function downloadModel(variant = 'fp16') {
  const url = `${HF_BASE}/${variant}/xtts_infer_${variant}.ts?download=true`;
  const destPath = `${RNFS.DocumentDirectoryPath}/xtts_model.ts`;

  await RNFS.downloadFile({
    fromUrl: url,
    toFile: destPath,
    background: true
  }).promise;

  return destPath;
}

// Initialize the native module
const modelPath = await downloadModel('fp16');
await XTTSModule.initialize(modelPath);

// Generate speech
const audio = await XTTSModule.speak("Hello world", "en");
```
## Memory Requirements

| Device RAM | Recommended Variant | Expected Performance |
|------------|---------------------|----------------------|
| < 3 GB | FP16 with streaming | May require optimization |
| 3-4 GB | FP16 | Smooth performance |
| 4 GB+ | Original or FP16 | Excellent performance |
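The table above can be encoded as a small selection helper. A minimal Python sketch (the `choose_variant` name and exact thresholds are illustrative, not part of this repo):

```python
def choose_variant(ram_gb: float) -> str:
    """Pick a model variant from available device RAM, following the table above."""
    if ram_gb >= 4.0:
        return "original"  # high-end devices can afford the full-precision model
    return "fp16"          # default: best size/quality balance; stream below 3 GB
```

For example, `choose_variant(6.0)` returns `"original"`, while anything under 4 GB falls back to `"fp16"`.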
## Supported Languages

- `en` - English
- `es` - Spanish
- `fr` - French
- `de` - German
- `it` - Italian
- `pt` - Portuguese
- `pl` - Polish
- `tr` - Turkish
- `ru` - Russian
- `nl` - Dutch
- `cs` - Czech
- `ar` - Arabic
- `zh` - Chinese
- `ja` - Japanese
- `ko` - Korean
- `hu` - Hungarian
- `hi` - Hindi
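To fail fast on typos, the language argument can be validated against this list before invoking the model. A minimal sketch (the set and helper names are illustrative, not part of the repo):

```python
SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh", "ja", "ko", "hu", "hi",
}

def validate_language(code: str) -> str:
    """Normalize a language code and reject anything the model does not support."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {code!r}")
    return normalized
```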
## Technical Details

- **Model Architecture**: XTTS v2 with a GPT-style backbone
- **Export Method**: TorchScript with mobile optimizations
- **PyTorch Version**: 2.8.0 (use a matching LibTorch version)
- **Sample Rate**: 24,000 Hz
- **Quantization**: FP16 variant uses half-precision floating point
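Since the model emits mono float samples at 24,000 Hz, the output can be written to a standard WAV container using only the Python standard library. A minimal sketch (assumes samples are in [-1, 1]; the `write_wav` helper is illustrative, not part of the repo):

```python
import struct
import wave

SAMPLE_RATE = 24_000  # XTTS v2 output sample rate

def write_wav(path: str, samples: list[float]) -> None:
    """Write mono float samples in [-1, 1] as 16-bit PCM at 24 kHz."""
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit PCM
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
```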
## Tips for Mobile Deployment

1. **Memory Management**:
   - Load the model once at app startup
   - Keep the model in memory across generations
   - Use `PyTorchAndroid.setNumThreads(1)` to reduce memory usage

2. **Performance Optimization**:
   - Warm up the model with a dummy input on first load
   - Use the FP16 variant for the best balance
   - Consider chunking long texts

3. **Error Handling**:
   ```kotlin
   try {
       module = Module.load(modelPath)
   } catch (e: Exception) {
       // Fall back to server-side TTS
       Log.e("XTTS", "Failed to load model: ${e.message}")
   }
   ```
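The text-chunking tip above can be sketched as a sentence-aligned splitter. An illustrative Python helper (the regex sentence split is intentionally naive, and `chunk_text` is not part of the repo):

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split long input into sentence-aligned chunks to keep latency and memory low."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the budget
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the audio concatenated on the device.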
## Changelog

- **2024-09-23**: Initial release with TorchScript models
  - Added Original and FP16 variants
  - Optimized for PyTorch Mobile
  - Fixed compatibility issues
## License

Apache 2.0

## Acknowledgments

Based on the official XTTS v2 model. Optimized for mobile deployment.
## Citation

```bibtex
@misc{xtts2024mobile,
  title={XTTS v2 Mobile - TorchScript Edition},
  author={GenMedLabs},
  year={2024},
  publisher={HuggingFace}
}
```
## Important Notes

- These are TorchScript models (`.ts` files), not PyTorch checkpoints (`.pth`)
- Models are self-contained and include all necessary weights
- No additional tokenizer files are needed - tokenization is built into the model
- INT8 quantization is not available for ARM-based systems