	---
	license: apache-2.0
	language:
	- en
	- hi
	- gu
	- bn
	- kn
	- mr
	- bho
	- mag
	- mai
	- te
	- chh
	datasets:
	- TruthShieldAI/TruthShieldVoiceGen
	base_model: coqui-ai/TTS-VITS
	pipeline_tag: text-to-speech
	library_name: TTS
	tags:
	- tts
	- multi-speaker
	- multilingual
	- accent-transfer
	- style-transfer
	- voice-cloning
	- india-languages
	---




	# TruthShield VoiceGen

	Multi-Speaker, Multilingual TTS with Accent & Style Transfer

	[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
	[![HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/truthshield/voicegen)

	## Overview

	TruthShield VoiceGen is a text-to-speech system that supports 11 languages and offers voice cloning, accent transfer, and style control. It is built on safety-first principles: generated audio is checked with forensic speaker verification.

	## Features

	- 🌍 11 Languages: Hindi, Bengali, Telugu, Kannada, Marathi, Gujarati, Bhojpuri, Maithili, Chhattisgarhi, Magahi, English
	- 🎤 Voice Cloning: Clone voices from short reference audio
	- 🗣️ Accent Transfer: Transfer accents while preserving content
	- 🎭 Style Control: Adjust speaking style and emotion
	- 🛡️ Safety Verification: ECAPA-TDNN forensic verification

	## Quick Start

	### Installation

	```bash
	git clone https://github.com/truthshield/voicegen.git
	cd voicegen
	pip install -r requirements.txt
	```

	### Run Server

	```bash
	uvicorn server:app --host 0.0.0.0 --port 8080
	```

	### API Usage

	```bash
	curl -X GET "http://localhost:8080/Get_Inference?text=hello%20world&lang=english" \
	-F "speaker_wav=@speaker.wav" \
	--output output.wav
	```

	## API Specification

	### Endpoint: GET /Get_Inference

	| Parameter | Type | Required | Description |
	|-----------|------|----------|-------------|
	| text | query | Yes | Text to synthesize |
	| lang | query | Yes | Language name (see Supported Languages) |
	| speaker_wav | file | Yes | Reference speaker audio (WAV) |

	### Supported Languages

	`bhojpuri, bengali, english, gujarati, hindi, chhattisgarhi, kannada, magahi, maithili, marathi, telugu`
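
Because `lang` takes full language names while the model card metadata lists short codes, a client may want to normalize its input before calling the API. The sketch below is one way to do that. The function `resolve_lang` is hypothetical, and the code-to-name pairing is inferred from the two lists rather than documented by the project.

```python
# Names accepted by the `lang` parameter, copied from the list above.
SUPPORTED_LANGUAGES = (
    "bhojpuri", "bengali", "english", "gujarati", "hindi", "chhattisgarhi",
    "kannada", "magahi", "maithili", "marathi", "telugu",
)

# Assumed pairing of the short codes in the model card metadata with API names.
CODE_TO_NAME = {
    "en": "english", "hi": "hindi", "gu": "gujarati", "bn": "bengali",
    "kn": "kannada", "mr": "marathi", "bho": "bhojpuri", "mag": "magahi",
    "mai": "maithili", "te": "telugu", "chh": "chhattisgarhi",
}


def resolve_lang(value: str) -> str:
    """Map a short code or a language name to a supported `lang` value."""
    key = value.strip().lower()
    name = CODE_TO_NAME.get(key, key)
    if name not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {value!r}")
    return name
```

Rejecting unknown values on the client side avoids a round trip to the server for requests that cannot succeed.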

	### Response Headers

	- `X-Model-Version`: Model version string
	- `X-Speaker-Similarity`: Voice similarity score
	- `X-Safety-Verified`: Safety verification status
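
A client can read these headers into a typed structure. The value formats are not specified above, so this sketch assumes the similarity score arrives as a decimal string and the safety status as `true` or `false`. The `InferenceMetadata` class and `parse_inference_headers` function are illustrative names, not part of the project.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class InferenceMetadata:
    model_version: str
    speaker_similarity: float
    safety_verified: bool


def parse_inference_headers(headers: Mapping[str, str]) -> InferenceMetadata:
    """Convert the three response headers into typed fields."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return InferenceMetadata(
        model_version=lowered["x-model-version"],
        # Assumption: the score is sent as a decimal string such as "0.91".
        speaker_similarity=float(lowered["x-speaker-similarity"]),
        # Assumption: the status is sent as "true" or "false".
        safety_verified=lowered["x-safety-verified"].strip().lower() == "true",
    )
```

A missing header raises `KeyError`, which makes an incomplete response visible instead of silently defaulting.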

	## Architecture

	```
	┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
	│   Text   │──▶│ Phoneme  │──▶│   VITS   │──▶│  Safety  │
	│  Input   │   │ Encoder  │   │ Encoder  │   │  Layer   │
	└──────────┘   └──────────┘   └──────────┘   └────┬─────┘
	                                                  │
	┌──────────┐   ┌──────────────┐   ┌───────────────▼──────┐
	│  Audio   │◀──│   WAV Out    │◀──│   HiFiGAN Vocoder    │
	│  Output  │   │  + Headers   │   │                      │
	└──────────┘   └──────────────┘   └──────────────────────┘
	```

	## Safety Layer

	All generated audio passes through ECAPA-TDNN speaker verification:

	1. Extract speaker embeddings from the reference audio
	2. Generate audio using VITS
	3. Extract speaker embeddings from the generated audio
	4. Compute a similarity score between the two sets of embeddings
	5. Compare the score against the verification threshold (0.85)
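
Steps 4 and 5 can be sketched as follows. The similarity metric is not named above, so this example assumes cosine similarity, which is a common choice for speaker embeddings. It also assumes that a score at or above the threshold counts as verified. Embedding extraction with ECAPA-TDNN is not shown, and the function names are illustrative.

```python
import math
from typing import Sequence

SIMILARITY_THRESHOLD = 0.85  # threshold from step 5


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_verified(
    reference: Sequence[float],
    generated: Sequence[float],
    threshold: float = SIMILARITY_THRESHOLD,
) -> bool:
    """Return True when the generated audio matches the reference speaker."""
    # Assumption: a score at or above the threshold counts as verified.
    return cosine_similarity(reference, generated) >= threshold
```

Here `reference` and `generated` stand for the embeddings from steps 1 and 3. In practice they would be fixed-length vectors produced by the speaker encoder.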

	## Datasets

	See `datasets.csv` for training data sources.

	## License

	Apache 2.0

	## Citation

	```bibtex
	@misc{truthshield2024voicegen,
	title={TruthShield VoiceGen: Multi-Speaker Multilingual TTS},
	author={TruthShield Team},
	year={2024}
	}
	```