
Multilingual TTS System

This repository contains a modular Multilingual Text-to-Speech (TTS) system consisting of multiple microservices:

  • Gateway Service β†’ Main API entrypoint
  • ASR Service β†’ Speech-to-text using Whisper
  • Transliteration Service β†’ Converts Romanized text to native scripts
  • Normalization Service β†’ Converts transcripts (including reference-derived and input text) into canonical spoken-form representations
  • F5‑TTS Core Module β†’ Performs inference for multilingual speech generation
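For intuition, spoken-form normalization expands digits, symbols, and abbreviations into words before synthesis. The toy English-only sketch below is illustrative; `normalize` and `number_to_words` are hypothetical names, not the service's actual API, and the real service covers far more token classes across 11 Indic languages.

```python
import re

# Toy spoken-form normalizer: expands integers 0-99 into English words.
_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty",
         "sixty", "seventy", "eighty", "ninety"]

def number_to_words(n: int) -> str:
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("-" + _ONES[ones] if ones else "")
    return str(n)  # larger numbers left untouched in this sketch

def normalize(text: str) -> str:
    # Replace every run of digits with its spoken form.
    return re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)

print(normalize("Meet me at 7, gate 42"))  # Meet me at seven, gate forty-two
```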

Each service runs independently with its own Docker container, and the entire stack can be deployed using Docker Compose or run individually as standalone services.
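Conceptually, a synthesis request flows through the services roughly as sketched below. This is a hypothetical outline of the orchestration; `asr`, `transliterate`, `normalize`, and `tts` are placeholder callables standing in for the microservice calls, not the gateway's real internals.

```python
# Hypothetical orchestration sketch of one synthesis request.
def run_pipeline(text, lang, speaker_wav, *, asr, transliterate, normalize, tts):
    ref_text = normalize(asr(speaker_wav), lang)   # transcribe + normalize the reference audio
    native_text = transliterate(text, lang)        # Romanized input -> native script
    spoken_text = normalize(native_text, lang)     # canonical spoken form
    return tts(spoken_text, ref_text, speaker_wav, lang)
```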


πŸš€ Features

  • Multilingual Text‑to‑Speech with support for 11 Indic languages
  • Automatic ASR transcription of the input reference speaker wav
  • Text normalization & transliteration
  • Modular microservice-based architecture
  • Runs with a single command (docker compose) or as individually deployed services

πŸ“ Project Structure

.
β”œβ”€β”€ gateway/                  # Main API
β”œβ”€β”€ asr_service/              # Whisper speech-to-text
β”œβ”€β”€ transliteration_service/  # Language transliteration
β”œβ”€β”€ normalization_service/    # Text cleaning and formatting
β”œβ”€β”€ f5_tts/                   # Core multilingual TTS inference module
β”œβ”€β”€ docker-compose.yml        # Docker compose file for running all services 
β”œβ”€β”€ .env                      # Required environment variables
β”œβ”€β”€ Dockerfile                # Gateway Dockerfile
β”œβ”€β”€ requirements.txt          # Gateway requirements file
β”œβ”€β”€ requirements_all.txt      # Requirements file for local development 
└── utils/                    # Utilities for TTS model loading

πŸ› οΈ Prerequisites

Before running, ensure you have (the stack was tested with Docker version 29.1.1, build 0aedba5, and Docker Compose v2.40.3):

  • Docker >= 24.0
  • Docker Compose >= v2
  • GPU with CUDA (optional but recommended)

CPU is supported, but GPU acceleration significantly improves inference speed.
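A quick way to check from Python whether the host exposes an NVIDIA GPU is to look for the driver's nvidia-smi utility. This is a heuristic helper for convenience, not part of this repository:

```python
import shutil

def host_has_nvidia_gpu() -> bool:
    """Heuristic: the NVIDIA driver ships nvidia-smi, so its presence
    on PATH suggests a CUDA-capable GPU is available on the host."""
    return shutil.which("nvidia-smi") is not None

print("GPU detected:", host_has_nvidia_gpu())
```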


πŸ“¦ Install Required System Packages

If you're running on a Linux host and plan to enable GPU support via NVIDIA Container Toolkit, install the following prerequisites:

sudo apt-get update && sudo apt-get install -y --no-install-recommends \
   curl \
   gnupg2

πŸ”§ Install NVIDIA Container Toolkit (For GPU Support)

  1. Add NVIDIA repository:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
  2. Update package list:
sudo apt-get update
  3. Install toolkit:
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.18.1-1
sudo apt-get install -y \
    nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
  4. Restart Docker:
sudo systemctl restart docker

🧱 Installation

Clone the repository:

git clone https://huggingface.co/immverse-ai/voice-tech-for-all-challenge
cd voice-tech-for-all-challenge

πŸƒ Running the System

Option 1 β€” Run Entire System With Docker Compose (Recommended)

docker compose up -d

This spins up:

  • gateway β†’ exposed service
  • asr_service
  • transliteration_service
  • normalization_service

After startup, the API is available at: http://localhost:8080
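The containers can take a moment to become ready, so a first request fired immediately after `docker compose up -d` may fail. A small poll-until-ready helper (an illustrative utility, not shipped with this repository) avoids that race:

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll until a TCP port accepts connections or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False

# Example: block until the gateway is reachable before sending requests
# if wait_for_port("localhost", 8080):
#     print("Gateway is up at http://localhost:8080")
```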


Option 2 β€” Run Each Container Individually

1️⃣ Build images

docker build -t tts-gateway ./gateway
docker build -t tts-asr ./asr_service
docker build -t tts-translit ./transliteration_service
docker build -t tts-normalization ./normalization_service

2️⃣ Run containers manually

sudo docker run --name tts-gateway --rm -d --gpus all \
  -e REPO_ID="" \
  -e HF_TOKEN="" \
  -e ASR_URL="" \
  -e TRANSLIT_URL="" \
  -e NORM_URL="" \
  -p 8080:8080 \
  tts-gateway
sudo docker run -d --name tts-asr --gpus all -p 8072:8072 tts-asr
sudo docker run -d --name tts-translit -p 8071:8071 tts-translit
sudo docker run -d --name tts-normalization -p 8070:8070 tts-normalization

3️⃣ View container logs

sudo docker logs -f tts-gateway
sudo docker logs -f tts-asr
sudo docker logs -f tts-translit
sudo docker logs -f tts-normalization

πŸ—£οΈ API Usage (Example)

Python Example

import time

import requests

base_url = 'http://<ip_address>:8080/Get_Inference'

speaker_wav = ""  # path to the reference speaker .wav file

params = {
    'text': "",
    'lang': "",
}

t1 = time.time()
with open(speaker_wav, "rb") as audio_file:
    response = requests.get(base_url, params=params, files={'speaker_wav': audio_file.read()})
t2 = time.time()

print("Time taken:", t2 - t1, "sec")

file_path = ""  # where to save the generated audio

if response.status_code == 200:
    with open(file_path, 'wb') as f:
        f.write(response.content)
    print("Audio saved at:", file_path)
else:
    print(f"Request failed with status code {response.status_code}")
    print("Response:", response.text)

CURL Example

curl -X GET "http://<ip_address>:8080/Get_Inference?text=<your_text>&lang=<your_language>" \
     -F "speaker_wav=@/path/to/your/file.wav" \
     --output output.wav

Note: do not set the Content-Type header manually; with -F, curl generates the multipart/form-data header with the required boundary itself.

🧩 Environment Variables

The following environment variables are used by the Gateway Service to communicate with the other microservices. The defaults below match the Docker Compose configuration; adjust them only when deploying services manually (Option 2) or customizing the stack.

Variable       Description                                      Example
ASR_URL        Endpoint of the ASR (Whisper) service            http://asr:8072
TRANSLIT_URL   Endpoint of the Transliteration service          http://translit:8071
NORM_URL       Endpoint of the Text Normalization service       http://norm:8070
HF_TOKEN       Optional Hugging Face token for private models   xxxxxxx
REPO_ID        Hugging Face model repo ID for TTS               microsoft/f5-tts

Example .env file (used automatically by Docker Compose):

ASR_URL=http://asr:8072
TRANSLIT_URL=http://translit:8071
NORM_URL=http://norm:8070
HF_TOKEN=
REPO_ID=
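For reference, .env files of this shape are plain KEY=VALUE lines. The minimal parser below is only an illustration of the format; Docker Compose (and libraries like python-dotenv) handle this for you:

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """\
ASR_URL=http://asr:8072
TRANSLIT_URL=http://translit:8071
NORM_URL=http://norm:8070
HF_TOKEN=
REPO_ID=
"""
print(parse_env(sample)["ASR_URL"])  # http://asr:8072
```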

πŸ”§ Development Mode

Run gateway directly without Docker (for debugging):

# 1. Create and activate virtual environment
python -m venv tts-venv

# On macOS/Linux
source tts-venv/bin/activate

# 2. Install dependencies
pip install -r requirements_all.txt

# 3. Run the ASR service
python -m asr_service.whisper_api

# 4. Run the text normalization service
python -m normalization_service.text_norm_api

# 5. Run the transliteration service
python -m transliteration_service.transliteration_api

# 6. Run the gateway
python -m gateway.main

πŸ§‘β€πŸ’» Contributing

Pull requests and feature requests are welcome!

πŸ’¬ Support

For issues or improvements, open a ticket in the repository.
