+# TruthShield VoiceGen
+Multi-Speaker, Multilingual TTS with Accent & Style Transfer
+[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
+[![HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/truthshield/voicegen)
+## Overview
+TruthShield VoiceGen is a text-to-speech system that supports 11 languages and provides voice cloning, accent transfer, and style control. It follows a safety-first design: all generated audio is checked with forensic speaker verification (ECAPA-TDNN).
+## Features
+- 🌍 **11 Languages**: Hindi, Bengali, Telugu, Kannada, Marathi, Gujarati, Bhojpuri, Maithili, Chhattisgarhi, Magahi, English
+- 🎤 **Voice Cloning**: Clone voices from short reference audio
+- 🗣️ **Accent Transfer**: Transfer accents while preserving content
+- 🎭 **Style Control**: Adjust speaking style and emotion
+- 🛡️ **Safety Verification**: ECAPA-TDNN forensic verification
+## Quick Start
+### Installation
+```bash
+git clone https://github.com/truthshield/voicegen.git
+cd voicegen
+pip install -r requirements.txt
+```
+### Run Server
+```bash
+uvicorn server:app --host 0.0.0.0 --port 8080
+```
+### API Usage
+```bash
+curl -X GET "http://localhost:8080/Get_Inference?text=hello%20world&lang=english" \
+  -F "speaker_wav=@speaker.wav" \
+  --output output.wav
+```
+## API Specification
+### Endpoint: GET /Get_Inference
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| text | query | Yes | Text to synthesize |
+| lang | query | Yes | Language name in lowercase, e.g. `english` (see Supported Languages) |
+| speaker_wav | file | Yes | Reference speaker audio (WAV) |
+### Supported Languages
+`bhojpuri, bengali, english, gujarati, hindi, chhattisgarhi, kannada, magahi, maithili, marathi, telugu`
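A client can check `lang` against this list before calling the endpoint. The sketch below uses only the Python standard library. The helper name `build_inference_url` and the default base URL (taken from the Quick Start example) are illustrative assumptions, not part of the project's code.

```python
from urllib.parse import quote, urlencode

# Language names copied from the Supported Languages list above.
SUPPORTED_LANGUAGES = frozenset({
    "bhojpuri", "bengali", "english", "gujarati", "hindi", "chhattisgarhi",
    "kannada", "magahi", "maithili", "marathi", "telugu",
})


def build_inference_url(text: str, lang: str,
                        base_url: str = "http://localhost:8080") -> str:
    """Return the GET /Get_Inference URL for the given text and language.

    Hypothetical helper: the name and signature are illustrative only.
    """
    lang = lang.strip().lower()
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {lang!r}")
    # quote_via=quote encodes spaces as %20, matching the curl example.
    query = urlencode({"text": text, "lang": lang}, quote_via=quote)
    return f"{base_url}/Get_Inference?{query}"
```

This only builds the query string. The `speaker_wav` file still has to be attached as multipart form data, as in the curl example.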
+### Response Headers
+- `X-Model-Version`: Model version string
+- `X-Speaker-Similarity`: Speaker similarity score between the reference and generated audio (verification threshold: 0.85)
+- `X-Safety-Verified`: Safety verification status
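On the client side, these headers could be read into a typed record. The README does not specify the header value formats, so the sketch below assumes that `X-Speaker-Similarity` is a decimal string and that `X-Safety-Verified` is the literal `true` or `false`. The `SynthesisMetadata` class and `parse_metadata` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SynthesisMetadata:
    model_version: str
    speaker_similarity: float
    safety_verified: bool


def parse_metadata(headers: Mapping[str, str]) -> SynthesisMetadata:
    """Convert the three response headers into typed fields.

    ASSUMPTION: value formats are not documented; this expects a decimal
    string for the similarity and "true"/"false" for the safety flag.
    """
    return SynthesisMetadata(
        model_version=headers["X-Model-Version"],
        speaker_similarity=float(headers["X-Speaker-Similarity"]),
        safety_verified=headers["X-Safety-Verified"].strip().lower() == "true",
    )
```

HTTP header names are case-insensitive, but a plain `dict` lookup is not. Pass the case-insensitive header mapping from your HTTP client, or normalize the keys first.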
+## Architecture
+```
+┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
+│   Text   │──▶│ Phoneme  │──▶│   VITS   │──▶│  Safety  │
+│  Input   │   │ Encoder  │   │ Encoder  │   │  Layer   │
+└──────────┘   └──────────┘   └──────────┘   └────┬─────┘
+                                                  │
+┌──────────┐   ┌──────────────┐   ┌───────────────▼──────┐
+│  Audio   │◀──│   WAV Out    │◀──│   HiFiGAN Vocoder    │
+│  Output  │   │  + Headers   │   │                      │
+└──────────┘   └──────────────┘   └──────────────────────┘
+```
+## Safety Layer
+All generated audio passes through ECAPA-TDNN speaker verification:
+1. Extract speaker embeddings from reference
+2. Generate audio using VITS
+3. Extract embeddings from generated audio
+4. Compute similarity score
+5. Apply threshold (0.85) for verification
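Steps 4 and 5 can be sketched in plain Python. The README does not name the similarity metric. Cosine similarity is a common choice for ECAPA-TDNN embeddings and is assumed here, along with an inclusive (`>=`) comparison against the 0.85 threshold. The function names are hypothetical, and the embeddings are taken to be plain sequences of floats produced elsewhere.

```python
import math
from typing import Sequence

SIMILARITY_THRESHOLD = 0.85  # threshold stated in the steps above


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_verified(reference: Sequence[float], generated: Sequence[float],
                threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """ASSUMPTION: verification passes when similarity >= threshold."""
    return cosine_similarity(reference, generated) >= threshold
```

For example, `is_verified([1.0, 0.0], [1.0, 0.0])` passes, while two orthogonal embeddings such as `[1.0, 0.0]` and `[0.0, 1.0]` have similarity 0 and fail.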
+## Datasets
+See `datasets.csv` for training data sources.
+## License
+Apache 2.0
+## Citation
+```bibtex
+@misc{truthshield2024voicegen,
+  title={TruthShield VoiceGen: Multi-Speaker Multilingual TTS},
+  author={TruthShield Team},
+  year={2024}
+}
+```