---
language:
- en
- es
- fr
- de
- it
- pt
- pl
- tr
- ru
- nl
- cs
- ar
- zh
- ja
- ko
- hu
- hi
tags:
- text-to-speech
- tts
- xtts
- mobile
- torchscript
- android
- ios
license: apache-2.0
---
# XTTS v2 Mobile - TorchScript Edition

**UPDATED**: Now with proper TorchScript models ready for mobile deployment!

Optimized XTTS v2 models exported to TorchScript format for direct mobile deployment on Android and iOS devices.

## Key Features

- **TorchScript Format**: Self-contained `.ts` files that run directly on mobile
- **Optimized for Mobile**: Models processed with PyTorch Mobile optimizations
- **Multiple Variants**: Choose based on your device capabilities
- **17 Languages**: Full multilingual support maintained
- **24kHz Output**: High-quality audio generation
## Model Variants

| Variant | Size | Memory | Target Devices | Quality |
|---------|------|--------|----------------|---------|
| **Original** | 1.16 GB | ~1.5 GB | High-end (4 GB+ RAM) | Best |
| **FP16** | 581 MB | ~800 MB | Mid-range (3 GB+ RAM) | Excellent |

> **Recommendation**: Use the FP16 variant for most devices - it offers the best balance of size, memory usage, and quality.
## Quick Start

### Download Models

```python
from huggingface_hub import hf_hub_download

# Download the FP16 variant (recommended)
model_path = hf_hub_download(
    repo_id="GenMedLabs/xtts-mobile",
    filename="fp16/xtts_infer_fp16.ts",
)
```
### Android Integration (Kotlin)

Add the PyTorch Android runtime to `build.gradle`:

```gradle
dependencies {
    implementation 'org.pytorch:pytorch_android_lite:2.1.0'
}
```

Then load and use the model:

```kotlin
import android.content.Context
import org.pytorch.IValue
import org.pytorch.Module

class XTTSModule(context: Context) {
    private var module: Module? = null

    fun initialize(modelPath: String) {
        // Load the TorchScript model once and keep it for reuse
        module = Module.load(modelPath)
    }

    fun generateSpeech(text: String, language: String): FloatArray {
        // The scripted model takes the text and language code directly
        val output = module?.forward(
            IValue.from(text),
            IValue.from(language)
        )?.toTensor()

        return output?.dataAsFloatArray ?: floatArrayOf()
    }
}
```
### iOS Integration (Swift)

```swift
import LibTorch

// Uses a TorchModule Objective-C wrapper bridged into Swift
class XTTSModule {
    private var module: TorchModule?

    func initialize(modelPath: String) {
        module = TorchModule(fileAtPath: modelPath)
    }

    func generateSpeech(text: String, language: String) -> [Float] {
        guard let module = module else { return [] }

        let output = module.forward([text, language])
        return output.toArray()
    }
}
```
### React Native Integration

```javascript
import RNFS from 'react-native-fs';
import { NativeModules } from 'react-native';

// Your app's native bridge to the TorchScript runtime
const { XTTSModule } = NativeModules;

// Download the model from Hugging Face
const HF_BASE = "https://huggingface.co/GenMedLabs/xtts-mobile/resolve/main";

async function downloadModel(variant = 'fp16') {
  const url = `${HF_BASE}/${variant}/xtts_infer_${variant}.ts?download=true`;
  const destPath = `${RNFS.DocumentDirectoryPath}/xtts_model.ts`;

  await RNFS.downloadFile({
    fromUrl: url,
    toFile: destPath,
    background: true
  }).promise;

  return destPath;
}

// Initialize the native module
const modelPath = await downloadModel('fp16');
await XTTSModule.initialize(modelPath);

// Generate speech
const audio = await XTTSModule.speak("Hello world", "en");
```
## Memory Requirements

| Device RAM | Recommended Variant | Expected Performance |
|------------|---------------------|----------------------|
| < 3 GB | FP16 with streaming | May require optimization |
| 3-4 GB | FP16 | Smooth performance |
| 4 GB+ | Original or FP16 | Excellent performance |
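The table above can be encoded as a small selection helper. A minimal Python sketch (the `choose_variant` name and exact thresholds are illustrative, not part of this repo):

```python
def choose_variant(ram_gb: float) -> str:
    """Pick a model variant from available device RAM, following the table above."""
    if ram_gb >= 4.0:
        return "original"  # high-end devices can afford the full-precision model
    return "fp16"          # default: best size/quality balance; stream below 3 GB
```

For example, `choose_variant(6.0)` returns `"original"`, while anything under 4 GB falls back to `"fp16"`.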
## Supported Languages

- `en` - English
- `es` - Spanish
- `fr` - French
- `de` - German
- `it` - Italian
- `pt` - Portuguese
- `pl` - Polish
- `tr` - Turkish
- `ru` - Russian
- `nl` - Dutch
- `cs` - Czech
- `ar` - Arabic
- `zh` - Chinese
- `ja` - Japanese
- `ko` - Korean
- `hu` - Hungarian
- `hi` - Hindi
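To fail fast on typos, the language argument can be validated against this list before invoking the model. A minimal sketch (the set and helper names are illustrative, not part of the repo):

```python
SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh", "ja", "ko", "hu", "hi",
}

def validate_language(code: str) -> str:
    """Normalize a language code and reject anything the model does not support."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {code!r}")
    return normalized
```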
## Technical Details

- **Model Architecture**: XTTS v2 with a GPT-style backbone
- **Export Method**: TorchScript with mobile optimizations
- **PyTorch Version**: 2.8.0 (use a matching LibTorch version)
- **Sample Rate**: 24,000 Hz
- **Quantization**: FP16 variant uses half-precision floating point
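Since the model emits mono float samples at 24,000 Hz, the output can be written to a standard WAV container using only the Python standard library. A minimal sketch (assumes samples are in [-1, 1]; the `write_wav` helper is illustrative, not part of the repo):

```python
import struct
import wave

SAMPLE_RATE = 24_000  # XTTS v2 output sample rate

def write_wav(path: str, samples: list[float]) -> None:
    """Write mono float samples in [-1, 1] as 16-bit PCM at 24 kHz."""
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit PCM
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
```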
## Tips for Mobile Deployment

1. **Memory Management**:
   - Load the model once at app startup
   - Keep the model in memory across generations
   - Use `PyTorchAndroid.setNumThreads(1)` to reduce memory usage

2. **Performance Optimization**:
   - Warm up the model with a dummy input on first load
   - Use the FP16 variant for the best balance
   - Consider chunking long texts

3. **Error Handling**:
   ```kotlin
   try {
       module = Module.load(modelPath)
   } catch (e: Exception) {
       // Fall back to server-side TTS
       Log.e("XTTS", "Failed to load model: ${e.message}")
   }
   ```
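The text-chunking tip above can be sketched as a sentence-aligned splitter. An illustrative Python helper (the regex sentence split is intentionally naive, and `chunk_text` is not part of the repo):

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split long input into sentence-aligned chunks to keep latency and memory low."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the budget
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the audio concatenated on the device.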
## Changelog

- **2024-09-23**: Initial release with TorchScript models
  - Added Original and FP16 variants
  - Optimized for PyTorch Mobile
  - Fixed compatibility issues
## License

Apache 2.0

## Acknowledgments

Based on the official XTTS v2 model. Optimized for mobile deployment.
## Citation

```bibtex
@misc{xtts2024mobile,
  title={XTTS v2 Mobile - TorchScript Edition},
  author={GenMedLabs},
  year={2024},
  publisher={HuggingFace}
}
```
## Important Notes

- These are TorchScript models (`.ts` files), not PyTorch checkpoints (`.pth`)
- Models are self-contained and include all necessary weights
- No additional tokenizer files are needed - tokenization is built into the model
- INT8 quantization is not available for ARM-based systems