---
license: apache-2.0
language:
- en
- hi
- gu
- bn
- kn
- mr
- bho
- mag
- mai
- te
- chh
datasets:
- TruthShieldAI/TruthShieldVoiceGen
base_model: coqui-ai/TTS-VITS
pipeline_tag: text-to-speech
library_name: TTS
tags:
- tts
- multi-speaker
- multilingual
- accent-transfer
- style-transfer
- voice-cloning
- india-languages
---
# TruthShield VoiceGen
Multi-Speaker, Multilingual TTS with Accent & Style Transfer
[License](LICENSE)
[Hugging Face model page](https://huggingface.co/truthshield/voicegen)
## Overview
TruthShield VoiceGen is a multi-speaker text-to-speech system that supports 11 languages and offers voice cloning, accent transfer, and style control. It is built on safety-first principles: generated audio passes through forensic speaker verification.
## Features
- **11 Languages**: Hindi, Bengali, Telugu, Kannada, Marathi, Gujarati, Bhojpuri, Maithili, Chhattisgarhi, Magahi, English
- **Voice Cloning**: Clone voices from short reference audio
- **Accent Transfer**: Transfer accents while preserving content
- **Style Control**: Adjust speaking style and emotion
- **Safety Verification**: ECAPA-TDNN forensic verification
## Quick Start
### Installation
```bash
git clone https://github.com/truthshield/voicegen.git
cd voicegen
pip install -r requirements.txt
```
### Run Server
```bash
uvicorn server:app --host 0.0.0.0 --port 8080
```
### API Usage
```bash
curl -X GET "http://localhost:8080/Get_Inference?text=hello%20world&lang=english" \
-F "speaker_wav=@speaker.wav" \
--output output.wav
```
## API Specification
### Endpoint: GET /Get_Inference
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| text | query | Yes | Text to synthesize |
| lang | query | Yes | Language name, lowercase (see Supported Languages) |
| speaker_wav | file | Yes | Reference speaker audio (WAV) |
### Supported Languages
`bhojpuri, bengali, english, gujarati, hindi, chhattisgarhi, kannada, magahi, maithili, marathi, telugu`
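Checking the `lang` value on the client side before calling the server can catch typos early. The following is a minimal sketch, not part of the VoiceGen codebase: `build_inference_url` and `SUPPORTED_LANGS` are hypothetical names, and the base URL assumes the local server from the Quick Start section.

```python
from urllib.parse import quote, urlencode

# Language names copied from the Supported Languages list.
SUPPORTED_LANGS = frozenset({
    "bhojpuri", "bengali", "english", "gujarati", "hindi", "chhattisgarhi",
    "kannada", "magahi", "maithili", "marathi", "telugu",
})


def build_inference_url(text: str, lang: str,
                        base_url: str = "http://localhost:8080") -> str:
    """Return the /Get_Inference URL for the given text and language.

    Hypothetical client-side helper; the server may validate or
    normalise `lang` differently.
    """
    lang = lang.strip().lower()
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language: {lang!r}")
    if not text.strip():
        raise ValueError("text must not be empty")
    # quote_via=quote encodes spaces as %20, matching the curl example.
    query = urlencode({"text": text, "lang": lang}, quote_via=quote)
    return f"{base_url.rstrip('/')}/Get_Inference?{query}"
```

The returned string can be passed to `curl` or any HTTP client together with the `speaker_wav` file upload.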
### Response Headers
- `X-Model-Version`: Model version string
- `X-Speaker-Similarity`: Voice similarity score
- `X-Safety-Verified`: Safety verification status
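A client may want these headers as typed values rather than raw strings. The sketch below is illustrative only: the value formats are assumptions (a decimal number for `X-Speaker-Similarity`, a string such as `true` for `X-Safety-Verified`), and `InferenceMetadata` and `parse_inference_headers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class InferenceMetadata:
    model_version: Optional[str]
    speaker_similarity: Optional[float]
    safety_verified: Optional[bool]


def parse_inference_headers(headers: Mapping[str, str]) -> InferenceMetadata:
    """Convert the three response headers into typed fields (None if absent)."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    raw_similarity = lowered.get("x-speaker-similarity")
    try:
        similarity = float(raw_similarity) if raw_similarity is not None else None
    except ValueError:
        similarity = None  # not numeric; the header format is an assumption

    raw_verified = lowered.get("x-safety-verified")
    # Assumption: a passing check is spelled "true", "1", or "yes".
    verified = (None if raw_verified is None
                else raw_verified.lower() in {"true", "1", "yes"})

    return InferenceMetadata(lowered.get("x-model-version"), similarity, verified)
```

Missing or unparsable headers come back as `None`, so callers can distinguish "not verified" from "no information".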
## Architecture
```
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  Text    │───▶│ Phoneme  │───▶│  VITS    │───▶│  Safety  │
│  Input   │    │ Encoder  │    │ Encoder  │    │  Layer   │
└──────────┘    └──────────┘    └──────────┘    └────┬─────┘
                                                     │
┌──────────┐    ┌──────────────┐    ┌────────────────▼─────┐
│  Audio   │◀───│   WAV Out    │◀───│   HiFiGAN Vocoder    │
│  Output  │    │  + Headers   │    │                      │
└──────────┘    └──────────────┘    └──────────────────────┘
```
## Safety Layer
All generated audio passes through ECAPA-TDNN speaker verification:
1. Extract speaker embeddings from reference
2. Generate audio using VITS
3. Extract embeddings from generated audio
4. Compute similarity score
5. Apply threshold (0.85) for verification
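Steps 4 and 5 can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: it assumes the similarity score is cosine similarity between the two embedding vectors and that a score equal to the threshold passes. The embeddings themselves would come from an ECAPA-TDNN model, which is not shown here.

```python
import math
from typing import Sequence

SIMILARITY_THRESHOLD = 0.85  # threshold stated in step 5


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_verified(reference_emb: Sequence[float],
                generated_emb: Sequence[float],
                threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Return True when the similarity meets the threshold.

    Assumption: scores equal to the threshold count as verified.
    """
    return cosine_similarity(reference_emb, generated_emb) >= threshold
```

In a setup like this, the score would be what is reported in `X-Speaker-Similarity` and the boolean in `X-Safety-Verified`, though the source does not spell out that mapping.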
## Datasets
See `datasets.csv` for training data sources.
## License
Apache 2.0
## Citation
```bibtex
@misc{truthshield2024voicegen,
title={TruthShield VoiceGen: Multi-Speaker Multilingual TTS},
author={TruthShield Team},
year={2024}
}
```