---
license: apache-2.0
language:
  - en
  - hi
  - gu
  - bn
  - kn
  - mr
  - bho
  - mag
  - mai
  - te
  - chh
datasets:
  - TruthShieldAI/TruthShieldVoiceGen
base_model: coqui-ai/TTS-VITS
pipeline_tag: text-to-speech
library_name: TTS
tags:
  - tts
  - multi-speaker
  - multilingual
  - accent-transfer
  - style-transfer
  - voice-cloning
  - india-languages
---

# TruthShield VoiceGen

Multi-Speaker, Multilingual TTS with Accent & Style Transfer

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/truthshield/voicegen)

## Overview

TruthShield VoiceGen is a text-to-speech system that supports 11 languages with voice cloning, accent transfer, and style control. It is built on safety-first principles: generated audio is checked with forensic speaker verification.

## Features

- 🌍 **11 Languages**: Hindi, Bengali, Telugu, Kannada, Marathi, Gujarati, Bhojpuri, Maithili, Chhattisgarhi, Magahi, English
- 🎤 **Voice Cloning**: Clone voices from short reference audio
- 🗣️ **Accent Transfer**: Transfer accents while preserving content
- 🎭 **Style Control**: Adjust speaking style and emotion
- 🛡️ **Safety Verification**: ECAPA-TDNN forensic verification

## Quick Start

### Installation

```bash
git clone https://github.com/truthshield/voicegen.git
cd voicegen
pip install -r requirements.txt
```

### Run Server

```bash
uvicorn server:app --host 0.0.0.0 --port 8080
```

### API Usage

```bash
curl -X GET "http://localhost:8080/Get_Inference?text=hello%20world&lang=english" \
  -F "speaker_wav=@speaker.wav" \
  --output output.wav
```

## API Specification

### Endpoint: GET /Get_Inference

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| text | query | Yes | Text to synthesize |
| lang | query | Yes | Language name in lowercase, e.g. `english` (one of the values listed under Supported Languages) |
| speaker_wav | file | Yes | Reference speaker audio (WAV) |

### Supported Languages

`bhojpuri, bengali, english, gujarati, hindi, chhattisgarhi, kannada, magahi, maithili, marathi, telugu`
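
A client can validate the `lang` value and build the request URL before sending anything. The following is a minimal sketch, not part of the project: the helper name `build_inference_url` and the choice to reject empty text are assumptions made for illustration. Only the endpoint path, the parameter names, and the language list come from this document.

```python
from urllib.parse import quote, urlencode

# Language names accepted by the `lang` parameter, as listed above.
SUPPORTED_LANGUAGES = {
    "bhojpuri", "bengali", "english", "gujarati", "hindi", "chhattisgarhi",
    "kannada", "magahi", "maithili", "marathi", "telugu",
}


def build_inference_url(base_url: str, text: str, lang: str) -> str:
    """Return the /Get_Inference URL for the given text and language.

    Hypothetical helper: raises ValueError for an unsupported language
    or for empty text instead of sending a request that would fail.
    """
    normalized = lang.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {lang!r}")
    if not text.strip():
        raise ValueError("text must not be empty")
    # quote_via=quote encodes spaces as %20, matching the curl example.
    query = urlencode({"text": text, "lang": normalized}, quote_via=quote)
    return f"{base_url.rstrip('/')}/Get_Inference?{query}"


if __name__ == "__main__":
    print(build_inference_url("http://localhost:8080", "hello world", "english"))
```

The reference audio still has to be attached as the `speaker_wav` file field, as in the curl example under API Usage.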

### Response Headers

- `X-Model-Version`: Model version string
- `X-Speaker-Similarity`: Voice similarity score
- `X-Safety-Verified`: Safety verification status
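
The exact value formats of these headers are not documented here. As a sketch, assuming `X-Speaker-Similarity` carries a decimal number and `X-Safety-Verified` carries a boolean-like string such as `true`, a client could collect them like this (`InferenceMetadata` and `parse_inference_headers` are hypothetical names):

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class InferenceMetadata:
    model_version: Optional[str]
    speaker_similarity: Optional[float]
    safety_verified: Optional[bool]


def parse_inference_headers(headers: Mapping[str, str]) -> InferenceMetadata:
    """Extract the three documented headers; missing ones become None."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    raw_similarity = lowered.get("x-speaker-similarity")
    try:
        similarity = float(raw_similarity) if raw_similarity is not None else None
    except ValueError:
        similarity = None  # assumption: treat an unparseable score as absent

    raw_verified = lowered.get("x-safety-verified")
    verified = None
    if raw_verified is not None:
        # Assumption: the server sends a boolean-like string.
        verified = raw_verified.lower() in {"true", "1", "yes"}

    return InferenceMetadata(
        model_version=lowered.get("x-model-version"),
        speaker_similarity=similarity,
        safety_verified=verified,
    )
```

Any mapping of header names to values works as input, including the `headers` attribute of common HTTP client responses.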

## Architecture

```
┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│   Text   │──▶│ Phoneme  │──▶│   VITS   │──▶│  Safety  │
│  Input   │   │ Encoder  │   │ Encoder  │   │  Layer   │
└──────────┘   └──────────┘   └──────────┘   └────┬─────┘
                                                  │
┌──────────┐   ┌──────────────┐   ┌───────────────▼──────┐
│  Audio   │◀──│   WAV Out    │◀──│   HiFiGAN Vocoder    │
│  Output  │   │  + Headers   │   │                      │
└──────────┘   └──────────────┘   └──────────────────────┘
```

## Safety Layer

All generated audio passes through ECAPA-TDNN speaker verification:

1. Extract speaker embeddings from reference
2. Generate audio using VITS
3. Extract embeddings from generated audio
4. Compute similarity score
5. Apply threshold (0.85) for verification
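
Steps 4 and 5 can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: it assumes the similarity score is cosine similarity between the two embeddings (a common choice for ECAPA-TDNN embeddings) and that a score at or above the threshold counts as verified. The function names are hypothetical, and the embeddings are plain lists of floats standing in for real model output.

```python
import math
from typing import Sequence, Tuple

SIMILARITY_THRESHOLD = 0.85  # threshold stated in step 5


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify_speaker(
    reference_embedding: Sequence[float],
    generated_embedding: Sequence[float],
    threshold: float = SIMILARITY_THRESHOLD,
) -> Tuple[float, bool]:
    """Return (score, verified); verified is True when score >= threshold."""
    score = cosine_similarity(reference_embedding, generated_embedding)
    return score, score >= threshold
```

In a real pipeline the two inputs would be the embeddings extracted in steps 1 and 3.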

## Datasets

See `datasets.csv` for training data sources.

## License

Apache 2.0

## Citation

```bibtex
@misc{truthshield2024voicegen,
  title={TruthShield VoiceGen: Multi-Speaker Multilingual TTS},
  author={TruthShield Team},
  year={2024}
}
```