---
license: apache-2.0
language:
- en
- hi
- gu
- bn
- kn
- mr
- bho
- mag
- mai
- te
- chh
datasets:
- TruthShieldAI/TruthShieldVoiceGen
base_model: coqui-ai/TTS-VITS
pipeline_tag: text-to-speech
library_name: TTS
tags:
- tts
- multi-speaker
- multilingual
- accent-transfer
- style-transfer
- voice-cloning
- india-languages
---

# TruthShield VoiceGen

Multi-Speaker, Multilingual TTS with Accent & Style Transfer

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/truthshield/voicegen)

## Overview

TruthShield VoiceGen is a text-to-speech system that supports 11 languages and offers voice cloning, accent transfer, and style control. It is built on safety-first principles and checks generated audio with forensic speaker verification.

## Features

- 🌍 **11 Languages**: Hindi, Bengali, Telugu, Kannada, Marathi, Gujarati, Bhojpuri, Maithili, Chhattisgarhi, Magahi, English
- 🎤 **Voice Cloning**: Clone voices from short reference audio
- 🗣️ **Accent Transfer**: Transfer accents while preserving content
- 🎭 **Style Control**: Adjust speaking style and emotion
- 🛡️ **Safety Verification**: ECAPA-TDNN forensic verification

## Quick Start

### Installation

```bash
git clone https://github.com/truthshield/voicegen.git
cd voicegen
pip install -r requirements.txt
```

### Run Server

```bash
uvicorn server:app --host 0.0.0.0 --port 8080
```

### API Usage

```bash
curl -X GET "http://localhost:8080/Get_Inference?text=hello%20world&lang=english" \
  -F "speaker_wav=@speaker.wav" \
  --output output.wav
```

## API Specification

### Endpoint: GET /Get_Inference

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| text | query | Yes | Text to synthesize |
| lang | query | Yes | Language, given by name (see Supported Languages) |
| speaker_wav | file | Yes | Reference speaker audio (WAV) |

### Supported Languages

`bhojpuri, bengali, english, gujarati, hindi, chhattisgarhi, kannada, magahi, maithili, marathi, telugu`

### Response Headers

- `X-Model-Version`: Model version string
- `X-Speaker-Similarity`: Voice similarity score
- `X-Safety-Verified`: Safety verification status

## Architecture

```
┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│   Text   │──▶│ Phoneme  │──▶│   VITS   │──▶│  Safety  │
│  Input   │   │ Encoder  │   │ Encoder  │   │  Layer   │
└──────────┘   └──────────┘   └──────────┘   └────┬─────┘
                                                  │
┌──────────┐   ┌──────────────┐   ┌───────────────▼──────┐
│  Audio   │◀──│   WAV Out    │◀──│   HiFiGAN Vocoder    │
│  Output  │   │  + Headers   │   │                      │
└──────────┘   └──────────────┘   └──────────────────────┘
```

## Safety Layer

All generated audio passes through ECAPA-TDNN speaker verification:

1. Extract speaker embeddings from the reference audio
2. Generate audio using VITS
3. Extract embeddings from the generated audio
4. Compute the similarity score between the two sets of embeddings
5. Compare the score against the verification threshold (0.85)

## Datasets

See `datasets.csv` for training data sources.

## License

Apache 2.0

## Citation

```bibtex
@misc{truthshield2024voicegen,
  title={TruthShield VoiceGen: Multi-Speaker Multilingual TTS},
  author={TruthShield Team},
  year={2024}
}
```
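
## Example: Safety Check Sketch

Steps 4 and 5 of the Safety Layer (compute a similarity score, then apply the 0.85 threshold) can be illustrated with a minimal sketch. This is not the project's code. It assumes the score is cosine similarity and that a score equal to the threshold passes; neither detail is stated in this README. The names `cosine_similarity` and `verify_speaker` are hypothetical, and the short lists below are placeholder vectors standing in for real ECAPA-TDNN embeddings.

```python
import math
from typing import Sequence, Tuple

# Threshold value taken from the Safety Layer section of this README.
SIMILARITY_THRESHOLD = 0.85


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors.

    Assumption: the README only says "similarity score"; cosine
    similarity is used here as a common choice for speaker embeddings.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify_speaker(
    reference_emb: Sequence[float],
    generated_emb: Sequence[float],
    threshold: float = SIMILARITY_THRESHOLD,
) -> Tuple[float, bool]:
    """Return (score, verified) for a reference/generated embedding pair.

    Assumption: a score greater than or equal to the threshold passes.
    """
    score = cosine_similarity(reference_emb, generated_emb)
    return score, score >= threshold


# Placeholder vectors; real embeddings would come from the speaker encoder.
reference = [0.20, 0.70, 0.10]
generated = [0.25, 0.65, 0.12]
score, verified = verify_speaker(reference, generated)
print(f"similarity={score:.3f} verified={verified}")
```

In a running server, values like `score` and `verified` would plausibly feed the `X-Speaker-Similarity` and `X-Safety-Verified` response headers, though the exact header formatting is not documented here.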