prabindersinghh committed on
Commit 64c4f62 · verified · 1 Parent(s): 692f57f

Create README.md

  1. README.md +104 -0
README.md ADDED
@@ -0,0 +1,104 @@
+ # TruthShield VoiceGen
+
+ Multi-Speaker, Multilingual TTS with Accent & Style Transfer
+
+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
+ [![HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/truthshield/voicegen)
+
+ ## Overview
+
+ TruthShield VoiceGen is a text-to-speech system that supports 11 languages and provides voice cloning, accent transfer, and style control. It is built on safety-first principles: generated audio is checked with forensic speaker verification.
+
+ ## Features
+
+ - 🌍 **11 Languages**: Hindi, Bengali, Telugu, Kannada, Marathi, Gujarati, Bhojpuri, Maithili, Chhattisgarhi, Magahi, English
+ - 🎤 **Voice Cloning**: Clone voices from short reference audio
+ - 🗣️ **Accent Transfer**: Transfer accents while preserving content
+ - 🎭 **Style Control**: Adjust speaking style and emotion
+ - 🛡️ **Safety Verification**: ECAPA-TDNN forensic verification
+
+ ## Quick Start
+
+ ### Installation
+
+ ```bash
+ git clone https://github.com/truthshield/voicegen.git
+ cd voicegen
+ pip install -r requirements.txt
+ ```
+
+ ### Run Server
+
+ ```bash
+ uvicorn server:app --host 0.0.0.0 --port 8080
+ ```
+
+ ### API Usage
+
+ ```bash
+ curl -X GET "http://localhost:8080/Get_Inference?text=hello%20world&lang=english" \
+   -F "speaker_wav=@speaker.wav" \
+   --output output.wav
+ ```
+
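The curl call above can also be sketched in Python with only the standard library. This is a minimal sketch, not the project's client: it assumes the server accepts a multipart body on a GET request (as the curl command implies), and `build_inference_request` and `synthesize` are hypothetical helper names.

```python
import urllib.parse
import urllib.request
import uuid

# Assumption: the server is running locally as started in "Run Server".
BASE_URL = "http://localhost:8080"


def build_inference_request(text, lang, wav_bytes, base_url=BASE_URL):
    """Build (but do not send) a GET /Get_Inference request with a multipart body."""
    # text and lang travel in the query string, percent-encoded.
    query = urllib.parse.urlencode(
        {"text": text, "lang": lang}, quote_via=urllib.parse.quote
    )
    # The reference audio travels as a multipart form field named speaker_wav.
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="speaker_wav"; filename="speaker.wav"\r\n',
        b"Content-Type: audio/wav\r\n\r\n",
        wav_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/Get_Inference?{query}",
        data=body,
        method="GET",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def synthesize(text, lang, speaker_path, out_path="output.wav"):
    """Send the request, save the returned audio, and return the response headers."""
    with open(speaker_path, "rb") as f:
        request = build_inference_request(text, lang, f.read())
    with urllib.request.urlopen(request) as response:
        with open(out_path, "wb") as out:
            out.write(response.read())
        return dict(response.headers)
```

With the server running, `synthesize("hello world", "english", "speaker.wav")` would write `output.wav` in the current directory and return the response headers as a dictionary.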
+ ## API Specification
+
+ ### Endpoint: GET /Get_Inference
+
+ | Parameter | Type | Required | Description |
+ |-----------|------|----------|-------------|
+ | text | query | Yes | Text to synthesize |
+ | lang | query | Yes | Language name in lowercase (see Supported Languages) |
+ | speaker_wav | file | Yes | Reference speaker audio (WAV) |
+
+ ### Supported Languages
+
+ `bhojpuri, bengali, english, gujarati, hindi, chhattisgarhi, kannada, magahi, maithili, marathi, telugu`
+
+ ### Response Headers
+
+ - `X-Model-Version`: Model version string
+ - `X-Speaker-Similarity`: Voice similarity score
+ - `X-Safety-Verified`: Safety verification status
+
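A client might collect these three headers into a typed record. The sketch below is illustrative only: the README does not document the value formats, so it assumes the similarity score is a decimal string and the verification status is the string `true` or `false`. `SynthesisInfo` and `parse_response_headers` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SynthesisInfo:
    model_version: str
    speaker_similarity: float
    safety_verified: bool


def parse_response_headers(headers):
    """Extract the three documented headers from a mapping of header names to values."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    return SynthesisInfo(
        model_version=lowered.get("x-model-version", ""),
        # Assumption: the score is serialized as a decimal string such as "0.91".
        speaker_similarity=float(lowered.get("x-speaker-similarity", "nan")),
        # Assumption: the status is serialized as "true" or "false".
        safety_verified=lowered.get("x-safety-verified", "").strip().lower() == "true",
    )
```

If the server serializes these values differently, only the two lines marked as assumptions need to change.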
+ ## Architecture
+
+ ```
+ ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
+ │   Text   │──▶│ Phoneme  │──▶│   VITS   │──▶│  Safety  │
+ │  Input   │   │ Encoder  │   │ Encoder  │   │  Layer   │
+ └──────────┘   └──────────┘   └──────────┘   └────┬─────┘
+                                                   │
+ ┌──────────┐   ┌──────────────┐   ┌───────────────▼──────┐
+ │  Audio   │◀──│   WAV Out    │◀──│   HiFiGAN Vocoder    │
+ │  Output  │   │  + Headers   │   │                      │
+ └──────────┘   └──────────────┘   └──────────────────────┘
+ ```
+
+ ## Safety Layer
+
+ All generated audio passes through ECAPA-TDNN speaker verification:
+
+ 1. Extract speaker embeddings from the reference audio
+ 2. Generate audio using VITS
+ 3. Extract speaker embeddings from the generated audio
+ 4. Compute a similarity score between the two sets of embeddings
+ 5. Compare the score against the 0.85 threshold to set the verification status
+
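Steps 4 and 5 can be sketched as follows. The README does not name the similarity metric or the direction of the comparison, so this sketch assumes cosine similarity and treats a score at or above the threshold as verified. Embeddings are plain lists of floats here; in practice they would come from an ECAPA-TDNN model. `cosine_similarity` and `verify_speaker` are hypothetical names.

```python
import math

SIMILARITY_THRESHOLD = 0.85  # threshold stated in step 5


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm


def verify_speaker(reference_embedding, generated_embedding,
                   threshold=SIMILARITY_THRESHOLD):
    """Return (score, verified). Assumption: verified means score >= threshold."""
    score = cosine_similarity(reference_embedding, generated_embedding)
    return score, score >= threshold
```

The returned pair corresponds to what the `X-Speaker-Similarity` and `X-Safety-Verified` headers report, under the same assumptions.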
+ ## Datasets
+
+ See `datasets.csv` for training data sources.
+
+ ## License
+
+ Apache 2.0
+
+ ## Citation
+
+ ```bibtex
+ @misc{truthshield2024voicegen,
+   title={TruthShield VoiceGen: Multi-Speaker Multilingual TTS},
+   author={TruthShield Team},
+   year={2024}
+ }
+ ```