---
language:
- en
- es
- fr
- de
- it
- pt
- pl
- tr
- ru
- nl
- cs
- ar
- zh
- ja
- ko
- hu
- hi
tags:
- text-to-speech
- tts
- xtts
- mobile
- torchscript
- android
- ios
license: apache-2.0
---

# XTTS v2 Mobile - TorchScript Edition

✨ **UPDATED**: Now with proper TorchScript models ready for mobile deployment!

Optimized XTTS v2 models exported to TorchScript format for direct mobile deployment on Android and iOS devices.

## 🎯 Key Features
- **TorchScript Format**: Self-contained `.ts` files that run directly on mobile
- **Optimized for Mobile**: Models processed with PyTorch Mobile optimizations
- **Multiple Variants**: Choose based on your device capabilities
- **17 Languages**: Full multilingual support maintained
- **24kHz Output**: High-quality audio generation

## πŸ“¦ Model Variants

| Variant | Size | Memory | Target Devices | Quality |
|---------|------|--------|----------------|---------|
| **Original** | 1.16 GB | ~1.5 GB | High-end (4 GB+ RAM) | Best |
| **FP16** | 581 MB | ~800 MB | Mid-range (3 GB+ RAM) | Excellent |

> **Recommendation**: Use the FP16 variant for most devices; it offers the best balance of size, memory usage, and quality.
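
The recommendation above can be encoded as a small helper. A sketch in Python (the RAM thresholds come from the table; `pick_variant` and `model_filename` are hypothetical names, and the path layout mirrors the repo structure used in Quick Start):

```python
def pick_variant(ram_gb: float) -> str:
    """Choose a model variant from available device RAM (thresholds from the table above)."""
    return "original" if ram_gb >= 4.0 else "fp16"

def model_filename(variant: str) -> str:
    """Repo-relative path of a variant, mirroring the layout shown in Quick Start."""
    return f"{variant}/xtts_infer_{variant}.ts"

print(model_filename(pick_variant(3.0)))  # fp16/xtts_infer_fp16.ts
```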

## πŸš€ Quick Start

### Download Models

```python
from huggingface_hub import hf_hub_download

# Download FP16 variant (recommended)
model_path = hf_hub_download(
    repo_id="GenMedLabs/xtts-mobile",
    filename="fp16/xtts_infer_fp16.ts"
)
```

### Android Integration (Kotlin)

```kotlin
// Add to build.gradle:
// dependencies {
//     implementation 'org.pytorch:pytorch_android_lite:2.1.0'
// }

import android.content.Context
import org.pytorch.IValue
import org.pytorch.Module

// Load the TorchScript model once and reuse it for every generation
class XTTSModule(context: Context) {
    private var module: Module? = null

    fun initialize(modelPath: String) {
        module = Module.load(modelPath)
    }

    fun generateSpeech(text: String, language: String): FloatArray {
        val output = module?.forward(
            IValue.from(text),
            IValue.from(language)
        )?.toTensor()

        return output?.dataAsFloatArray ?: floatArrayOf()
    }
}
```

### iOS Integration (Swift)

```swift
import LibTorch

// LibTorch exposes a C++/Objective-C API; TorchModule below is the usual
// Objective-C wrapper class bridged into Swift.
class XTTSModule {
    private var module: TorchModule?

    func initialize(modelPath: String) {
        module = TorchModule(fileAtPath: modelPath)
    }

    func generateSpeech(text: String, language: String) -> [Float] {
        guard let module = module else { return [] }

        let output = module.forward([text, language])
        return output.toArray()
    }
}
```

### React Native Integration

```javascript
import RNFS from 'react-native-fs';
import { NativeModules } from 'react-native';

const { XTTSModule } = NativeModules; // your native bridge to the modules above

// Download model from HuggingFace
const HF_BASE = "https://huggingface.co/GenMedLabs/xtts-mobile/resolve/main";

async function downloadModel(variant = 'fp16') {
    const url = `${HF_BASE}/${variant}/xtts_infer_${variant}.ts?download=true`;
    const destPath = `${RNFS.DocumentDirectoryPath}/xtts_model.ts`;

    await RNFS.downloadFile({
        fromUrl: url,
        toFile: destPath,
        background: true
    }).promise;

    return destPath;
}

// Initialize native module
const modelPath = await downloadModel('fp16');
await XTTSModule.initialize(modelPath);

// Generate speech
const audio = await XTTSModule.speak("Hello world", "en");
```
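
On every platform the model returns raw float samples at 24 kHz (see Technical Details). For desktop debugging, a stdlib-only Python sketch that persists such samples as 16-bit mono WAV (the function name and the clipping behavior are my own choices, not part of this repo):

```python
import struct
import wave

def write_wav(samples, path, sample_rate=24000):
    """Save float samples in [-1.0, 1.0] as 16-bit mono WAV at 24 kHz."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 16-bit PCM
        wf.setframerate(sample_rate)
        pcm = (max(-32768, min(32767, int(s * 32767))) for s in samples)
        wf.writeframes(b"".join(struct.pack("<h", v) for v in pcm))
```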

## πŸ“Š Memory Requirements

| Device RAM | Recommended Variant | Expected Performance |
|------------|-------------------|---------------------|
| < 3 GB | FP16 with streaming | May require optimization |
| 3-4 GB | FP16 | Smooth performance |
| 4 GB+ | Original or FP16 | Excellent performance |

## 🌍 Supported Languages

- `en` - English
- `es` - Spanish
- `fr` - French
- `de` - German
- `it` - Italian
- `pt` - Portuguese
- `pl` - Polish
- `tr` - Turkish
- `ru` - Russian
- `nl` - Dutch
- `cs` - Czech
- `ar` - Arabic
- `zh` - Chinese
- `ja` - Japanese
- `ko` - Korean
- `hu` - Hungarian
- `hi` - Hindi
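
A minimal guard for the language codes above (a sketch; `validate_language` is a hypothetical helper, and stripping a BCP-47 region suffix like `-US` is an assumption about caller input):

```python
SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh", "ja", "ko", "hu", "hi",
}

def validate_language(code: str) -> str:
    """Normalize a language tag (e.g. 'en-US' -> 'en') and reject unsupported codes."""
    base = code.lower().split("-")[0]
    if base not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {code!r}")
    return base
```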

## πŸ”§ Technical Details

- **Model Architecture**: XTTS v2 with GPT-style backbone
- **Export Method**: TorchScript with mobile optimizations
- **PyTorch Version**: 2.8.0 (use matching LibTorch version)
- **Sample Rate**: 24,000 Hz
- **Quantization**: FP16 uses half-precision floating point
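
As a sanity check on the variants table: FP16 stores each weight in 2 bytes instead of FP32's 4, so weight storage roughly halves (the small gap to the listed 581 MB would come from non-weight data that does not shrink):

```python
def fp16_weight_bytes(fp32_bytes: int) -> int:
    # Each 4-byte FP32 weight becomes a 2-byte FP16 weight.
    return fp32_bytes // 2

original_bytes = int(1.16 * 1024**3)                # ~1.16 GB original variant
print(fp16_weight_bytes(original_bytes) / 1024**2)  # ~594 MB
```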

## πŸ’‘ Tips for Mobile Deployment

1. **Memory Management**:
   - Load model once at app startup
   - Keep model in memory for multiple generations
   - Limit inference threads (e.g. `PyTorchAndroid.setNumThreads(1)` on Android) to reduce CPU contention and peak memory

2. **Performance Optimization**:
   - Warm up model with dummy input on first load
   - Use FP16 variant for best balance
   - Consider chunking long texts

3. **Error Handling**:
   ```kotlin
   try {
       module = Module.load(modelPath)
   } catch (e: Exception) {
       // Fall back to server-side TTS
       Log.e("XTTS", "Failed to load model: ${e.message}")
   }
   ```
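
The text-chunking tip can be sketched as a naive sentence-based splitter (Python for brevity; `max_chars=200` is a guess, not a model limit):

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list:
    """Split text at sentence boundaries into chunks no longer than max_chars (best effort)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```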

## πŸ“ Changelog

- **2024-09-23**: Initial release with TorchScript models
  - Added Original and FP16 variants
  - Optimized for PyTorch Mobile
  - Fixed compatibility issues

## πŸ“„ License

Apache 2.0

## πŸ™ Acknowledgments

Based on the official XTTS v2 model. Optimized for mobile deployment.

## πŸ“š Citation

```bibtex
@misc{xtts2024mobile,
  title={XTTS v2 Mobile - TorchScript Edition},
  author={GenMedLabs},
  year={2024},
  publisher={HuggingFace}
}
```

## ⚠️ Important Notes

- These are TorchScript models (`.ts` files), not PyTorch checkpoints (`.pth`)
- Models are self-contained and include all necessary weights
- No additional tokenizer files needed - tokenization is built into the model
- INT8 quantization not available for ARM-based systems