---
license: apache-2.0
language:
  - en
  - hi
  - gu
  - bn
  - kn
  - mr
  - bho
  - mag
  - mai
  - te
  - chh
datasets:
  - TruthShieldAI/TruthShieldVoiceGen
base_model: coqui-ai/TTS-VITS
pipeline_tag: text-to-speech
library_name: TTS
tags:
  - tts
  - multi-speaker
  - multilingual
  - accent-transfer
  - style-transfer
  - voice-cloning
  - india-languages
---




# TruthShield VoiceGen

Multi-Speaker, Multilingual TTS with Accent & Style Transfer

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/truthshield/voicegen)

## Overview

TruthShield VoiceGen is a text-to-speech system that supports 11 languages and offers voice cloning, accent transfer, and style control. It follows a safety-first design: generated audio is checked with forensic speaker verification.

## Features

- 🌍 **11 Languages**: Hindi, Bengali, Telugu, Kannada, Marathi, Gujarati, Bhojpuri, Maithili, Chhattisgarhi, Magahi, English
- 🎤 **Voice Cloning**: Clone voices from short reference audio
- 🗣️ **Accent Transfer**: Transfer accents while preserving content
- 🎭 **Style Control**: Adjust speaking style and emotion
- 🛡️ **Safety Verification**: ECAPA-TDNN forensic verification

## Quick Start

### Installation

```bash
git clone https://github.com/truthshield/voicegen.git
cd voicegen
pip install -r requirements.txt
```

### Run Server

```bash
uvicorn server:app --host 0.0.0.0 --port 8080
```

### API Usage

```bash
curl -X GET "http://localhost:8080/Get_Inference?text=hello%20world&lang=english" \
  -F "speaker_wav=@speaker.wav" \
  --output output.wav
```

## API Specification

### Endpoint: GET /Get_Inference

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| text | query | Yes | Text to synthesize |
| lang | query | Yes | Language name, e.g. `english` (see Supported Languages) |
| speaker_wav | file | Yes | Reference speaker audio (WAV) |

### Supported Languages

`bhojpuri, bengali, english, gujarati, hindi, chhattisgarhi, kannada, magahi, maithili, marathi, telugu`
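
As a rough client-side sketch, the list above can be used to validate `lang` before a request is sent. The helper name `build_inference_url` and the lowercasing of `lang` are assumptions made for illustration; they are not part of the VoiceGen code, and the server may treat case differently.

```python
from urllib.parse import quote, urlencode

# Language names copied from the "Supported Languages" list above.
SUPPORTED_LANGUAGES = frozenset({
    "bhojpuri", "bengali", "english", "gujarati", "hindi", "chhattisgarhi",
    "kannada", "magahi", "maithili", "marathi", "telugu",
})


def build_inference_url(text, lang, base_url="http://localhost:8080"):
    """Return the GET /Get_Inference URL for the given text and language.

    Hypothetical helper: it only builds the query string. The speaker_wav
    file still has to be attached by the HTTP client.
    """
    if not text:
        raise ValueError("text must not be empty")
    lang = lang.strip().lower()  # assumption: the API expects lowercase names
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {lang!r}")
    # quote_via=quote encodes spaces as %20, matching the curl example.
    query = urlencode({"text": text, "lang": lang}, quote_via=quote)
    return f"{base_url}/Get_Inference?{query}"


if __name__ == "__main__":
    print(build_inference_url("hello world", "english"))
```

Rejecting an unsupported name locally avoids a round trip to the server for a request that cannot succeed.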

### Response Headers

- `X-Model-Version`: Model version string
- `X-Speaker-Similarity`: Voice similarity score
- `X-Safety-Verified`: Safety verification status
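
A client might read these headers roughly as follows. This is a sketch under assumptions: the header value formats are not documented here, so the code assumes `X-Speaker-Similarity` is a decimal number and `X-Safety-Verified` is a boolean-like string such as `true`. The function name `parse_voicegen_headers` is hypothetical.

```python
def parse_voicegen_headers(headers):
    """Extract the three VoiceGen response headers from a header mapping.

    Assumed formats (not confirmed by the API specification):
      X-Speaker-Similarity -> decimal number, e.g. "0.91"
      X-Safety-Verified    -> boolean-like string, e.g. "true" / "false"
    Missing headers are returned as None.
    """
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    similarity = lowered.get("x-speaker-similarity")
    verified = lowered.get("x-safety-verified")
    return {
        "model_version": lowered.get("x-model-version"),
        "speaker_similarity": None if similarity is None else float(similarity),
        "safety_verified": (
            None if verified is None
            else verified.strip().lower() in {"true", "1", "yes"}
        ),
    }


if __name__ == "__main__":
    example = {
        "X-Model-Version": "1.0",
        "X-Speaker-Similarity": "0.91",
        "X-Safety-Verified": "true",
    }
    print(parse_voicegen_headers(example))
```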

## Architecture

```
┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│   Text   │──▶│ Phoneme  │──▶│   VITS   │──▶│  Safety  │
│  Input   │   │ Encoder  │   │ Encoder  │   │  Layer   │
└──────────┘   └──────────┘   └──────────┘   └────┬─────┘
                                                  │
┌──────────┐   ┌──────────────┐   ┌───────────────▼──────┐
│  Audio   │◀──│   WAV Out    │◀──│   HiFiGAN Vocoder    │
│  Output  │   │  + Headers   │   │                      │
└──────────┘   └──────────────┘   └──────────────────────┘
```

## Safety Layer

All generated audio passes through ECAPA-TDNN speaker verification:

1. Extract speaker embeddings from reference
2. Generate audio using VITS
3. Extract embeddings from generated audio
4. Compute similarity score
5. Apply threshold (0.85) for verification
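
Steps 4 and 5 can be sketched as below. The list above does not name the similarity metric, so cosine similarity is assumed here because it is a common choice for speaker embeddings. The function names, the plain-list embeddings, and the `>=` comparison are also illustrative assumptions; in practice the embeddings would come from an ECAPA-TDNN model.

```python
import math

SIMILARITY_THRESHOLD = 0.85  # threshold stated in step 5


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_verified(reference_emb, generated_emb, threshold=SIMILARITY_THRESHOLD):
    """Return True when the similarity score meets the threshold.

    Assumption: a score equal to the threshold counts as verified.
    """
    return cosine_similarity(reference_emb, generated_emb) >= threshold


if __name__ == "__main__":
    reference = [0.2, 0.7, 0.1]   # placeholder values, not real embeddings
    generated = [0.25, 0.65, 0.1]
    print(is_verified(reference, generated))
```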

## Datasets

See `datasets.csv` for training data sources.

## License

Apache 2.0

## Citation

```bibtex
@misc{truthshield2024voicegen,
  title={TruthShield VoiceGen: Multi-Speaker Multilingual TTS},
  author={TruthShield Team},
  year={2024}
}
```