---
license: apache-2.0
language:
- en
- hi
- gu
- bn
- kn
- mr
- bho
- mag
- mai
- te
- hne
datasets:
- TruthShieldAI/TruthShieldVoiceGen
base_model: coqui-ai/TTS-VITS
pipeline_tag: text-to-speech
library_name: TTS
tags:
- tts
- multi-speaker
- multilingual
- accent-transfer
- style-transfer
- voice-cloning
- india-languages
---
# TruthShield VoiceGen
Multi-Speaker, Multilingual TTS with Accent & Style Transfer
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/truthshield/voicegen)
## Overview
TruthShield VoiceGen is a text-to-speech system for 11 languages (English and ten Indian languages) with voice cloning, accent transfer, and style control. It is built on safety-first principles: all generated audio is checked with ECAPA-TDNN forensic speaker verification.
## Features
- 🌍 **11 Languages**: Hindi, Bengali, Telugu, Tamil, Kannada, Marathi, Gujarati, Bhojpuri, Maithili, Chhattisgarhi, Magahi, English
- 🎀 **Voice Cloning**: Clone voices from short reference audio
- πŸ—£οΈ **Accent Transfer**: Transfer accents while preserving content
- 🎭 **Style Control**: Adjust speaking style and emotion
- πŸ›‘οΈ **Safety Verification**: ECAPA-TDNN forensic verification
## Quick Start
### Installation
```bash
git clone https://github.com/truthshield/voicegen.git
cd voicegen
pip install -r requirements.txt
```
### Run Server
```bash
uvicorn server:app --host 0.0.0.0 --port 8080
```
### API Usage
```bash
curl -X GET "http://localhost:8080/Get_Inference?text=hello%20world&lang=english" \
-F "speaker_wav=@speaker.wav" \
--output output.wav
```
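The same request can be made from Python with only the standard library. This is a minimal sketch, not an official client. The helper names `build_inference_request` and `synthesize` are hypothetical. The base URL assumes the server from "Run Server" is listening on `localhost:8080`. The multipart body mirrors what curl's `-F` option sends, and the request keeps the documented `GET` method.

```python
import urllib.parse
import urllib.request
import uuid

# Assumption: server started locally as shown in "Run Server".
BASE_URL = "http://localhost:8080/Get_Inference"


def build_inference_request(text: str, lang: str, wav_bytes: bytes,
                            filename: str = "speaker.wav") -> urllib.request.Request:
    """Build a GET /Get_Inference request equivalent to the curl example (hypothetical helper)."""
    # text and lang travel in the query string; urlencode encodes spaces as '+'.
    query = urllib.parse.urlencode({"text": text, "lang": lang})
    boundary = uuid.uuid4().hex
    # One multipart part named speaker_wav, like curl's -F "speaker_wav=@speaker.wav".
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="speaker_wav"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode() + wav_bytes + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="GET",  # the endpoint is documented as GET
    )


def synthesize(text: str, lang: str, speaker_path: str, out_path: str = "output.wav") -> dict:
    """Send the request, save the returned WAV, and return the documented response headers."""
    with open(speaker_path, "rb") as f:
        req = build_inference_request(text, lang, f.read())
    with urllib.request.urlopen(req) as resp:
        with open(out_path, "wb") as out:
            out.write(resp.read())
        names = ("X-Model-Version", "X-Speaker-Similarity", "X-Safety-Verified")
        return {name: resp.headers.get(name) for name in names}
```

For example, `synthesize("hello world", "english", "speaker.wav")` writes `output.wav` and returns the three headers. The query string encodes the space as `+` rather than `%20`. Both forms decode to a space in a query string.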
## API Specification
### Endpoint: GET /Get_Inference
| Parameter | Sent as | Required | Description |
|-----------|---------|----------|-------------|
| text | Query parameter | Yes | Text to synthesize |
| lang | Query parameter | Yes | Language name as listed under Supported Languages (e.g. `english`) |
| speaker_wav | Multipart file upload | Yes | Reference speaker audio (WAV) |
### Supported Languages
`bhojpuri, bengali, english, gujarati, hindi, chhattisgarhi, kannada, magahi, maithili, marathi, telugu`
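A client can validate `lang` before calling the server. The sketch below also accepts the standard ISO 639 codes for these languages and converts them to the names listed above. That code mapping is a hypothetical client-side convenience. This README does not say that the API accepts codes, so only the names are sent.

```python
# Language names accepted by /Get_Inference, as listed above.
SUPPORTED_LANGS = frozenset({
    "bhojpuri", "bengali", "english", "gujarati", "hindi", "chhattisgarhi",
    "kannada", "magahi", "maithili", "marathi", "telugu",
})

# Hypothetical convenience mapping from ISO 639 codes to the API's language names.
ISO_TO_LANG = {
    "en": "english", "hi": "hindi", "gu": "gujarati", "bn": "bengali",
    "kn": "kannada", "mr": "marathi", "te": "telugu", "bho": "bhojpuri",
    "mag": "magahi", "mai": "maithili", "hne": "chhattisgarhi",
}


def to_api_lang(lang: str) -> str:
    """Normalize a language name or ISO code to the value expected by the `lang` parameter."""
    key = lang.strip().lower()
    key = ISO_TO_LANG.get(key, key)  # translate codes; leave names unchanged
    if key not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language {lang!r}; expected one of {sorted(SUPPORTED_LANGS)}")
    return key
```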
### Response Headers
- `X-Model-Version`: Model version string
- `X-Speaker-Similarity`: Similarity score between the reference speaker and the generated audio
- `X-Safety-Verified`: Whether the generated audio passed safety verification (see Safety Layer)
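A client will usually turn these headers into typed values. The sketch below does that. The value formats are assumptions, not documented behavior. It assumes `X-Speaker-Similarity` is a decimal number and `X-Safety-Verified` is a boolean-like string such as `true` or `false`. The names `InferenceMetadata` and `parse_inference_headers` are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class InferenceMetadata:
    model_version: Optional[str]
    speaker_similarity: Optional[float]
    safety_verified: Optional[bool]


def parse_inference_headers(headers: Mapping[str, str]) -> InferenceMetadata:
    """Parse the documented response headers; a missing header becomes None."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    h = {k.lower(): v for k, v in headers.items()}
    sim = h.get("x-speaker-similarity")
    verified = h.get("x-safety-verified")
    return InferenceMetadata(
        model_version=h.get("x-model-version"),
        # ASSUMPTION: similarity is sent as a plain decimal string, e.g. "0.91".
        speaker_similarity=float(sim) if sim is not None else None,
        # ASSUMPTION: verification status is a boolean-like string.
        safety_verified=(verified.strip().lower() in {"true", "1", "yes"}) if verified is not None else None,
    )
```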
## Architecture
```
┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│   Text   │──▶│ Phoneme  │──▶│   VITS   │──▶│  Safety  │
│  Input   │   │ Encoder  │   │ Encoder  │   │  Layer   │
└──────────┘   └──────────┘   └──────────┘   └────┬─────┘
                                                  │
┌──────────┐   ┌──────────────┐   ┌───────────────▼──────┐
│  Audio   │◀──│   WAV Out    │◀──│   HiFiGAN Vocoder    │
│  Output  │   │  + Headers   │   │                      │
└──────────┘   └──────────────┘   └──────────────────────┘
```
## Safety Layer
All generated audio passes through ECAPA-TDNN speaker verification:
1. Extract speaker embeddings from the reference audio.
2. Generate audio using VITS.
3. Extract speaker embeddings from the generated audio.
4. Compute a similarity score between the two embeddings.
5. Compare the score against a threshold of 0.85 to decide whether the output is verified.
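Steps 4 and 5 can be sketched as below. The sketch assumes cosine similarity, the usual scoring for ECAPA-TDNN embeddings. It also assumes that a score exactly at the threshold counts as verified. This README confirms neither detail. Here the embeddings are plain float sequences. In practice they would come from an ECAPA-TDNN speaker encoder applied to the reference and generated audio. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

SIMILARITY_THRESHOLD = 0.85  # threshold stated in step 5


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embedding has zero norm")
    return dot / (norm_a * norm_b)


def verify_speaker(ref_embedding: Sequence[float], gen_embedding: Sequence[float],
                   threshold: float = SIMILARITY_THRESHOLD) -> Tuple[float, bool]:
    """Return (score, verified); ASSUMPTION: score >= threshold counts as verified."""
    score = cosine_similarity(ref_embedding, gen_embedding)
    return score, score >= threshold
```

Cosine similarity ignores embedding magnitude and compares only direction. This makes a fixed threshold such as 0.85 usable across clips of different loudness and length.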
## Datasets
See `datasets.csv` for training data sources.
## License
Apache 2.0
## Citation
```bibtex
@misc{truthshield2024voicegen,
title={TruthShield VoiceGen: Multi-Speaker Multilingual TTS},
author={TruthShield Team},
year={2024}
}
```