---
language:
- en
- es
- fr
- de
- it
- pt
- pl
- tr
- ru
- nl
- cs
- ar
- zh
- ja
- ko
- hu
- hi
tags:
- text-to-speech
- tts
- xtts
- mobile
- torchscript
- android
- ios
license: apache-2.0
---
# XTTS v2 Mobile - TorchScript Edition
✨ **UPDATED**: Now with proper TorchScript models ready for mobile deployment!
Optimized XTTS v2 models exported to TorchScript format for direct mobile deployment on Android and iOS devices.
## 🎯 Key Features
- **TorchScript Format**: Self-contained `.ts` files that run directly on mobile
- **Optimized for Mobile**: Models processed with PyTorch Mobile optimizations
- **Multiple Variants**: Choose based on your device capabilities
- **17 Languages**: Full multilingual support maintained
- **24kHz Output**: High-quality audio generation
## 📦 Model Variants
| Variant | Size | Memory | Target Devices | Quality |
|---------|------|--------|----------------|---------|
| **Original** | 1.16 GB | ~1.5GB | High-end (4GB+ RAM) | Best |
| **FP16** | 581 MB | ~800MB | Mid-range (3GB+ RAM) | Excellent |
> **Recommendation**: Use the FP16 variant for most devices: it offers the best balance of size, memory usage, and quality.
## 🚀 Quick Start
### Download Models
```python
from huggingface_hub import hf_hub_download

# Download FP16 variant (recommended)
model_path = hf_hub_download(
    repo_id="GenMedLabs/xtts-mobile",
    filename="fp16/xtts_infer_fp16.ts"
)
```
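Large downloads can be interrupted mid-transfer, so it is worth sanity-checking the file before handing it to a model loader. A minimal sketch (the `verify_download` helper is illustrative, not part of this repo):

```python
import os

def verify_download(path: str, min_bytes: int) -> bool:
    """Return True if the file exists and is at least min_bytes long.

    A size floor guards against truncated downloads and HTML error
    pages saved in place of the model file.
    """
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes

# Example: the FP16 variant is listed at 581 MB, so anything much
# smaller indicates an incomplete download:
# verify_download(model_path, min_bytes=500 * 1024 * 1024)
```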
### Android Integration (Kotlin)
```groovy
// Add to build.gradle.
// Note: full TorchScript (.ts) models need the full runtime; the
// pytorch_android_lite artifact expects .ptl lite-interpreter files.
dependencies {
    implementation 'org.pytorch:pytorch_android:2.1.0'
}
```

```kotlin
import android.content.Context
import org.pytorch.IValue
import org.pytorch.Module

// Load the TorchScript model once and reuse it for every generation.
class XTTSModule(context: Context) {
    private var module: Module? = null

    fun initialize(modelPath: String) {
        module = Module.load(modelPath)
    }

    fun generateSpeech(text: String, language: String): FloatArray {
        val output = module?.forward(
            IValue.from(text),
            IValue.from(language)
        )?.toTensor()
        return output?.dataAsFloatArray ?: floatArrayOf()
    }
}
```
### iOS Integration (Swift)
```swift
// Note: TorchModule is the Objective-C wrapper class from the PyTorch iOS
// examples, exposed to Swift via a bridging header; the forward/toArray
// methods below are assumed to be provided by that wrapper, not by
// LibTorch's public API.
import LibTorch

class XTTSModule {
    private var module: TorchModule?

    func initialize(modelPath: String) {
        module = TorchModule(fileAtPath: modelPath)
    }

    func generateSpeech(text: String, language: String) -> [Float] {
        guard let module = module else { return [] }
        let output = module.forward([text, language])
        return output.toArray()
    }
}
```
### React Native Integration
```javascript
import RNFS from 'react-native-fs';
import { NativeModules } from 'react-native';

// XTTSModule is assumed to be your app's custom native module wrapping
// the Kotlin/Swift loaders shown above.
const { XTTSModule } = NativeModules;

// Download model from HuggingFace
const HF_BASE = "https://huggingface.co/GenMedLabs/xtts-mobile/resolve/main";

async function downloadModel(variant = 'fp16') {
  const url = `${HF_BASE}/${variant}/xtts_infer_${variant}.ts?download=true`;
  const destPath = `${RNFS.DocumentDirectoryPath}/xtts_model.ts`;
  await RNFS.downloadFile({
    fromUrl: url,
    toFile: destPath,
    background: true
  }).promise;
  return destPath;
}

// Inside an async function: initialize the native module, then generate speech
const modelPath = await downloadModel('fp16');
await XTTSModule.initialize(modelPath);
const audio = await XTTSModule.speak("Hello world", "en");
```
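Whichever platform you target, the model returns raw mono float samples at 24 kHz. When validating output on a desktop first, a minimal Python sketch can persist those samples as a 16-bit PCM WAV file (`write_wav` is a hypothetical helper, not part of this repo):

```python
import struct
import wave

SAMPLE_RATE = 24_000  # XTTS v2 output rate

def write_wav(path: str, samples: list[float],
              sample_rate: int = SAMPLE_RATE) -> None:
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 16-bit samples
        wav.setframerate(sample_rate)
        # Clip to [-1, 1], then scale to signed 16-bit integers.
        clipped = (max(-1.0, min(1.0, s)) for s in samples)
        frames = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)
        wav.writeframes(frames)
```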
## 📊 Memory Requirements
| Device RAM | Recommended Variant | Expected Performance |
|------------|-------------------|---------------------|
| < 3GB | FP16 with streaming | May require optimization |
| 3-4GB | FP16 | Smooth performance |
| 4GB+ | Original or FP16 | Excellent performance |
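The table above can be collapsed into a tiny selection helper; a sketch (the `pick_variant` function and its notes are illustrative, not part of the repo):

```python
def pick_variant(ram_gb: float) -> tuple[str, str]:
    """Map device RAM (GB) to (model variant, note), mirroring the table."""
    if ram_gb < 3:
        return "fp16", "enable streaming; may require further optimization"
    if ram_gb < 4:
        return "fp16", "smooth performance expected"
    return "original", "excellent performance; fp16 also works"
```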
## 🌍 Supported Languages
- `en` - English
- `es` - Spanish
- `fr` - French
- `de` - German
- `it` - Italian
- `pt` - Portuguese
- `pl` - Polish
- `tr` - Turkish
- `ru` - Russian
- `nl` - Dutch
- `cs` - Czech
- `ar` - Arabic
- `zh` - Chinese
- `ja` - Japanese
- `ko` - Korean
- `hu` - Hungarian
- `hi` - Hindi
## 🔧 Technical Details
- **Model Architecture**: XTTS v2 with GPT-style backbone
- **Export Method**: TorchScript with mobile optimizations
- **PyTorch Version**: 2.8.0 (use matching LibTorch version)
- **Sample Rate**: 24,000 Hz
- **Quantization**: FP16 uses half-precision floating point
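The half-precision claim lines up with the sizes in the variants table: storing each 32-bit weight in 16 bits should roughly halve the file, and 1.16 GB / 2 ≈ 581 MB (small non-weight overhead aside). A quick arithmetic check:

```python
# FP16 stores each weight in 2 bytes instead of 4, so the model file
# should shrink to about half the FP32 size.
fp32_bytes = 1.16e9        # Original variant (~1.16 GB, from the table)
fp16_bytes = fp32_bytes / 2
# ~0.58 GB, within 0.5% of the listed 581 MB
```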
## 💡 Tips for Mobile Deployment
1. **Memory Management**:
   - Load the model once at app startup
   - Keep the model in memory across generations
   - Limit inference threads to reduce memory usage (on Android: `PyTorchAndroid.setNumThreads(1)`)
2. **Performance Optimization**:
   - Warm up the model with a dummy input on first load
   - Use the FP16 variant for the best balance
   - Consider chunking long texts
3. **Error Handling**:
   ```kotlin
   try {
       module = Module.load(modelPath)
   } catch (e: Exception) {
       // Fall back to server-side TTS
       Log.e("XTTS", "Failed to load model: ${e.message}")
   }
   ```
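The "chunk long texts" tip can be sketched as a sentence-boundary splitter with a character budget (a simple illustration; `chunk_text` is hypothetical, and a production splitter should be language-aware):

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text on sentence boundaries, packing sentences into
    chunks no longer than max_chars.

    A single sentence longer than max_chars is kept whole rather
    than split mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip() if current else sentence
    if current:
        chunks.append(current)
    return chunks
```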
## 📝 Changelog
- **2024-09-23**: Initial release with TorchScript models
  - Added Original and FP16 variants
  - Optimized for PyTorch Mobile
  - Fixed compatibility issues
## 📄 License
Apache 2.0
## 🙏 Acknowledgments
Based on the official XTTS v2 model. Optimized for mobile deployment.
## 📚 Citation
```bibtex
@misc{xtts2024mobile,
  title={XTTS v2 Mobile - TorchScript Edition},
  author={GenMedLabs},
  year={2024},
  publisher={HuggingFace}
}
```
## ⚠️ Important Notes
- These are TorchScript models (`.ts` files), not PyTorch checkpoints (`.pth`)
- Models are self-contained and include all necessary weights
- No additional tokenizer files needed - tokenization is built into the model
- INT8 quantization not available for ARM-based systems