Multilingual TTS System
This repository contains a modular Multilingual Text-to-Speech (TTS) system consisting of multiple microservices:
- Gateway Service: Main API entrypoint
- ASR Service: Speech-to-text using Whisper
- Transliteration Service: Converts Romanized text to native scripts
- Normalization Service: Converts transcripts (including reference-derived and input text) into canonical spoken-form representations
- F5-TTS Core Module: Performs inference for multilingual speech generation
Each service runs independently in its own Docker container. The entire stack can be deployed with Docker Compose, or each service can be run individually as a standalone container.
Features
- Multilingual Text-to-Speech with support for 11 Indic languages
- Automatic ASR transcription of the input reference speaker WAV
- Text normalization and transliteration
- Modular microservice-based architecture
- Runs with a single command (docker compose) or as individually deployed services
Project Structure
.
├── gateway/                   # Main API
├── asr_service/               # Whisper speech-to-text
├── transliteration_service/   # Language transliteration
├── normalization_service/     # Text cleaning and formatting
├── f5_tts/                    # Core multilingual TTS inference module
├── docker-compose.yml         # Docker Compose file for running all services
├── .env                       # Required environment variables
├── Dockerfile                 # Gateway Dockerfile
├── requirements.txt           # Gateway requirements file
├── requirements_all.txt       # Requirements file for local development
└── utils/                     # Utilities for TTS model loading
Prerequisites
The versions detected on the current system were Docker version 29.1.1 (build 0aedba5) and Docker Compose version v2.40.3. Before running, ensure you have:
- Docker >= 24.0
- Docker Compose >= v2
- GPU with CUDA (optional but recommended)
CPU is supported, but GPU acceleration significantly improves inference speed.
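The version requirements above can also be checked from a script. The following is a minimal sketch, assuming the `docker --version` output follows the format quoted above; the helper names are illustrative and not part of this repository.

```python
import re
import shutil
import subprocess

MIN_DOCKER = (24, 0)  # minimum (major, minor) taken from the list above


def parse_version(text):
    """Extract the first dotted version number from CLI output as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def docker_is_recent_enough(version_output, minimum=MIN_DOCKER):
    """Compare the parsed (major, minor) pair against the minimum."""
    return parse_version(version_output)[:2] >= minimum


def check_local_docker():
    """Run `docker --version` if the CLI is on PATH and report the result."""
    if shutil.which("docker") is None:
        return "docker not found on PATH"
    out = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True
    ).stdout.strip()
    verdict = "OK" if docker_is_recent_enough(out) else "older than required"
    return f"{out} -> {verdict}"
```

Calling `print(check_local_docker())` reports the result for the local machine. The same `parse_version` helper can be applied to the output of `docker compose version`.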
Install Required System Packages
If you're running on a Linux host and plan to enable GPU support via NVIDIA Container Toolkit, install the following prerequisites:
sudo apt-get update && sudo apt-get install -y --no-install-recommends \
curl \
gnupg2
Install NVIDIA Container Toolkit (For GPU Support)
- Add NVIDIA repository:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
- Update package list:
sudo apt-get update
- Install toolkit:
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.18.1-1
sudo apt-get install -y \
nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
- Restart Docker:
sudo systemctl restart docker
Installation
Clone the repository:
git clone https://huggingface.co/immverse-ai/voice-tech-for-all-challenge
cd voice-tech-for-all-challenge
Running the System
Option 1: Run Entire System With Docker Compose (Recommended)
docker compose up -d
This spins up:
- gateway (exposed service)
- asr_service
- transliteration_service
- normalization_service
After startup, the API is available at: http://localhost:8080
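Because `docker compose up -d` returns before the services have finished starting, a client may want to wait until the gateway port accepts connections. Below is a minimal sketch using only the standard library; the function name and the timeout values are illustrative assumptions, and an open port does not guarantee that the models have finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet (refused or timed out); retry after a pause
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 8080)` blocks for up to two minutes and returns `True` as soon as the gateway port is reachable.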
Option 2: Run Each Container Individually
1. Build images
docker build -t tts-gateway ./gateway
docker build -t tts-asr ./asr_service
docker build -t tts-translit ./transliteration_service
docker build -t tts-normalization ./normalization_service
2. Run containers manually
sudo docker run --name tts-gateway --rm -d --gpus all \
-e REPO_ID="" \
-e HF_TOKEN="" \
-e ASR_URL="" \
-e TRANSLIT_URL="" \
-e NORM_URL="" \
-p 8080:8080 \
tts-gateway
sudo docker run -d --name tts-asr --gpus all -p 8072:8072 tts-asr
sudo docker run -d --name tts-translit -p 8070:8070 tts-translit
sudo docker run -d --name tts-normalization -p 8030:8030 tts-normalization
3. View container logs
sudo docker logs -f tts-gateway
sudo docker logs -f tts-asr
sudo docker logs -f tts-translit
sudo docker logs -f tts-normalization
API Usage (Example)
Python Example
import os
import time

import requests

# Gateway inference endpoint; replace <ip_address> with the gateway host
base_url = 'http://<ip_address>:8080/Get_Inference'
speaker_wav = ""  # path to the reference speaker WAV file
params = {
    'text': "",  # text to synthesize
    'lang': "",  # target language
}
file_name = os.path.basename(speaker_wav).split(".")[0]

t1 = time.time()
with open(speaker_wav, "rb") as audio_file:
    response = requests.get(base_url, params=params, files={'speaker_wav': audio_file.read()})
t2 = time.time()
print("Time Taken:", t2 - t1, "sec")

file_path = ""  # where to save the generated audio
if response.status_code == 200:
    with open(file_path, 'wb') as f:
        f.write(response.content)
    print("Audio saved at", file_path)
else:
    print(f"Request failed with status code {response.status_code}")
    print("Response:", response.text)
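Before saving the response, it can be useful to confirm that the returned bytes are really audio and not an error payload. The sketch below assumes the gateway returns an uncompressed PCM WAV body (an assumption suggested by the `output.wav` name in the CURL example, not confirmed by this repository); `describe_wav` is a hypothetical helper built on the standard `wave` module.

```python
import io
import wave


def describe_wav(data):
    """Return (channels, sample_rate_hz, duration_seconds) for PCM WAV bytes.

    Raises wave.Error (or EOFError for truncated input) if the bytes are not
    a WAV file that the standard library can parse.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), rate, frames / rate
```

With the request above, `describe_wav(response.content)` would raise for a JSON or HTML error body, so a failure can be caught before anything is written to disk.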
CURL Example
curl -X GET "http://<ip_address>:8080/Get_Inference?text=<your_text>&lang=<your_language>" \
-H "Content-Type: multipart/form-data" \
-F "speaker_wav=@/path/to/your/file.wav" \
--output output.wav
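Text in Indic scripts and text containing spaces must be percent-encoded when placed directly in a query string, as in the CURL example. The following sketch builds such a URL with the standard library; the sample text and the `"hi"` language code are hypothetical values, since the accepted language identifiers are not listed here.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def build_inference_url(host, text, lang, port=8080):
    """Build the Get_Inference URL with percent-encoded query parameters."""
    # quote_via=quote encodes spaces as %20 instead of "+"
    query = urlencode({"text": text, "lang": lang}, quote_via=quote)
    return f"http://{host}:{port}/Get_Inference?{query}"


# "hi" is a hypothetical language code used only for illustration
url = build_inference_url("localhost", "नमस्ते दुनिया", "hi")
print(url)

# Decoding the query string recovers the original values
decoded = parse_qs(urlsplit(url).query)
print(decoded["text"][0])
```

The resulting URL contains only ASCII characters and can be passed to `curl` in place of the hand-written query string.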
Environment Variables
The Gateway Service uses the environment variables below, including the endpoints it needs to reach the other microservices. The values shown are the default configuration used by Docker Compose; adjust them only if you deploy manually (Option 2) or customize the services.
| Variable | Description | Example |
|---|---|---|
| ASR_URL | Endpoint of the ASR Whisper service | http://asr:8072 |
| TRANSLIT_URL | Endpoint of the Transliteration service | http://translit:8071 |
| NORM_URL | Endpoint of the Text Normalization service | http://norm:8070 |
| HF_TOKEN | Optional Hugging Face token for private models | xxxxxxx |
| REPO_ID | Hugging Face model repo ID for TTS | microsoft/f5-tts |
Example .env file (used automatically by Docker Compose):
ASR_URL=http://asr:8072
TRANSLIT_URL=http://translit:8071
NORM_URL=http://norm:8070
HF_TOKEN=
REPO_ID=
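One way a service could read these variables is sketched below. This is not the gateway's actual code: `GatewayConfig` and `load_config` are hypothetical names, and the fallback URLs are simply the example values from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewayConfig:
    """Hypothetical container for the gateway settings listed above."""
    asr_url: str
    translit_url: str
    norm_url: str
    hf_token: str
    repo_id: str


def load_config(env=None):
    """Read settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env

    def url(name, fallback):
        # Treat an empty value like a missing one and drop any trailing slash
        return (env.get(name) or fallback).rstrip("/")

    return GatewayConfig(
        asr_url=url("ASR_URL", "http://asr:8072"),
        translit_url=url("TRANSLIT_URL", "http://translit:8071"),
        norm_url=url("NORM_URL", "http://norm:8070"),
        hf_token=env.get("HF_TOKEN", ""),
        repo_id=env.get("REPO_ID", ""),
    )


config = load_config({"ASR_URL": "http://localhost:8072/", "HF_TOKEN": ""})
print(config.asr_url)       # http://localhost:8072
print(config.translit_url)  # http://translit:8071
```

Passing a plain dictionary, as in the usage lines, keeps the loader easy to test; calling `load_config()` with no argument reads the real process environment.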
Development Mode
Run the services and the gateway directly without Docker (for debugging):
# 1. Create and activate virtual environment
python -m venv tts-venv
# On macOS/Linux
source tts-venv/bin/activate
# 2. Install dependencies
pip install -r requirements_all.txt
# 3. Run asr service
python -m asr_service.whisper_api
# 4. Run text normalization service
python -m normalization_service.text_norm_api
# 5. Run transliteration service
python -m transliteration_service.transliteration_api
# 6. Run the gateway
python -m gateway.main
Contributing
Pull requests and feature requests are welcome!
Support
For issues or improvements, open a ticket in the repository.