---
license: apache-2.0
language:
- en
- hi
- gu
- bn
- kn
- mr
- bho
- mag
- mai
- te
- hne
datasets:
- TruthShieldAI/TruthShieldVoiceGen
base_model: coqui-ai/TTS-VITS
pipeline_tag: text-to-speech
library_name: TTS
tags:
- tts
- multi-speaker
- multilingual
- accent-transfer
- style-transfer
- voice-cloning
- india-languages
---

# TruthShield VoiceGen

Multi-Speaker, Multilingual TTS with Accent & Style Transfer

[License: Apache 2.0](LICENSE)
[Hugging Face: truthshield/voicegen](https://huggingface.co/truthshield/voicegen)

## Overview

TruthShield VoiceGen is a text-to-speech system that supports 11 languages and offers voice cloning, accent transfer, and style control. It follows a safety-first design: every generated clip is checked with forensic speaker verification.

## Features

- **11 Languages**: Hindi, Bengali, Telugu, Kannada, Marathi, Gujarati, Bhojpuri, Maithili, Chhattisgarhi, Magahi, English
- **Voice Cloning**: Clone voices from short reference audio
- **Accent Transfer**: Transfer accents while preserving content
- **Style Control**: Adjust speaking style and emotion
- **Safety Verification**: ECAPA-TDNN forensic speaker verification

## Quick Start

### Installation

```bash
git clone https://github.com/truthshield/voicegen.git
cd voicegen
pip install -r requirements.txt
```

### Run Server

```bash
uvicorn server:app --host 0.0.0.0 --port 8080
```

### API Usage

```bash
curl -X GET "http://localhost:8080/Get_Inference?text=hello%20world&lang=english" \
  -F "speaker_wav=@speaker.wav" \
  --output output.wav
```
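
The same request can be sketched in Python using only the standard library. This is a minimal illustration and not an official client: the helper names `build_inference_request` and `synthesize` are hypothetical, the base URL assumes the local server started above, and the multipart field name `speaker_wav` is taken from the curl command.

```python
import urllib.parse
import urllib.request
import uuid

# Assumption: the server is listening locally on port 8080.
BASE_URL = "http://localhost:8080"


def build_inference_request(text, lang, wav_bytes, base_url=BASE_URL):
    """Build (but do not send) a GET /Get_Inference request with a multipart body."""
    # quote_via=quote encodes spaces as %20, matching the curl example.
    query = urllib.parse.urlencode(
        {"text": text, "lang": lang}, quote_via=urllib.parse.quote
    )
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="speaker_wav"; filename="speaker.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return urllib.request.Request(
        f"{base_url}/Get_Inference?{query}",
        data=head + wav_bytes + tail,
        method="GET",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def synthesize(text, lang, speaker_path, out_path):
    """Send the request, save the returned audio, and return the response headers."""
    with open(speaker_path, "rb") as f:
        request = build_inference_request(text, lang, f.read())
    with urllib.request.urlopen(request) as response:
        with open(out_path, "wb") as out:
            out.write(response.read())
        return dict(response.headers.items())
```

With the server running, `synthesize("hello world", "english", "speaker.wav", "output.wav")` would mirror the curl command and return the response headers as a dictionary.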

## API Specification

### Endpoint: GET /Get_Inference

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| text | query | Yes | Text to synthesize |
| lang | query | Yes | Language name, e.g. `english` (see Supported Languages) |
| speaker_wav | file | Yes | Reference speaker audio (WAV) |

### Supported Languages

`bhojpuri, bengali, english, gujarati, hindi, chhattisgarhi, kannada, magahi, maithili, marathi, telugu`
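
A client may want to validate the `lang` value before sending a request. The sketch below is illustrative only: `normalize_lang` is a hypothetical helper, and lowercasing the input is an assumption, since the card does not say whether the server treats language names case-sensitively.

```python
# Names copied from the supported-languages list above.
SUPPORTED_LANGUAGES = frozenset({
    "bhojpuri", "bengali", "english", "gujarati", "hindi", "chhattisgarhi",
    "kannada", "magahi", "maithili", "marathi", "telugu",
})


def normalize_lang(lang: str) -> str:
    """Return the lowercase language name, or raise ValueError if unsupported."""
    name = lang.strip().lower()  # assumption: the server expects lowercase names
    if name not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"unsupported language {lang!r}; expected one of {sorted(SUPPORTED_LANGUAGES)}"
        )
    return name
```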

### Response Headers

- `X-Model-Version`: Model version string
- `X-Speaker-Similarity`: Voice similarity score
- `X-Safety-Verified`: Safety verification status
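
The headers listed above could be read into a typed structure as follows. The value formats are assumptions, because the card does not specify them: the similarity score is taken to be a decimal string and the safety status to be `true` or `false`. `InferenceMetadata` and `parse_inference_headers` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceMetadata:
    model_version: str
    speaker_similarity: float
    safety_verified: bool


def parse_inference_headers(headers) -> InferenceMetadata:
    """Parse the three documented headers from any mapping of header names to values."""
    # HTTP header names are case-insensitive, so compare them in lowercase.
    lowered = {key.lower(): value for key, value in headers.items()}
    return InferenceMetadata(
        model_version=lowered["x-model-version"],
        # Assumption: the score is sent as a decimal string such as "0.91".
        speaker_similarity=float(lowered["x-speaker-similarity"]),
        # Assumption: the status is sent as "true" or "false".
        safety_verified=lowered["x-safety-verified"].strip().lower() == "true",
    )
```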

## Architecture

```
┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│   Text   │──▶│ Phoneme  │──▶│   VITS   │──▶│  Safety  │
│  Input   │   │ Encoder  │   │ Encoder  │   │  Layer   │
└──────────┘   └──────────┘   └──────────┘   └────┬─────┘
                                                  │
┌──────────┐   ┌──────────────┐   ┌───────────────▼──────┐
│  Audio   │◀──│   WAV Out    │◀──│   HiFiGAN Vocoder    │
│  Output  │   │  + Headers   │   │                      │
└──────────┘   └──────────────┘   └──────────────────────┘
```

## Safety Layer

All generated audio passes through ECAPA-TDNN speaker verification:

1. Extract speaker embeddings from reference
2. Generate audio using VITS
3. Extract embeddings from generated audio
4. Compute similarity score
5. Apply threshold (0.85) for verification
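
Steps 4 and 5 can be sketched as follows. This rests on two assumptions the card does not confirm: that the similarity score is cosine similarity between the two embeddings, and that a score at or above 0.85 counts as verified. Embedding extraction (steps 1 and 3) is out of scope here, so the hypothetical `verify_speaker` takes plain numeric vectors.

```python
import math

SIMILARITY_THRESHOLD = 0.85  # threshold from step 5


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify_speaker(reference_embedding, generated_embedding,
                   threshold=SIMILARITY_THRESHOLD):
    """Return (score, verified) for a reference/generated embedding pair."""
    score = cosine_similarity(reference_embedding, generated_embedding)
    # Assumption: scores at or above the threshold pass verification.
    return score, score >= threshold
```

In a real pipeline the two vectors would come from an ECAPA-TDNN speaker encoder applied to the reference and generated audio.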

## Datasets

See `datasets.csv` for training data sources.

## License

Apache 2.0

## Citation

```bibtex
@misc{truthshield2024voicegen,
  title={TruthShield VoiceGen: Multi-Speaker Multilingual TTS},
  author={TruthShield Team},
  year={2024}
}
```