title: Audio Language Translator
emoji: 🌍
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 6.5.1
app_file: run.py
pinned: false
license: mit
suggested_hardware: t4-small

🌍 Audio Language Translator

Translate spoken audio between 15 languages using a complete AI pipeline.

🎯 What This Does

  1. Upload or record audio in any supported language
  2. Automatic detection of source language
  3. Translation to your chosen target language
  4. Speech synthesis in the target language with selectable voices

🔌 REST API

This translator is also available as a REST API for developers!

📚 Interactive API Docs: https://nav772-audio-language-translator.hf.space/docs

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /api/health | GET | Health check and model status |
| /api/languages | GET | List all 15 supported languages |
| /api/voices/{lang} | GET | Get available TTS voices for a language |
| /api/transcribe | POST | Transcribe audio only (no translation) |
| /api/translate | POST | Full pipeline (returns JSON) |
| /api/translate/audio | POST | Full pipeline (returns audio file) |
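The GET endpoints in the table can be explored with nothing but the standard library. The sketch below is illustrative only: the helper names (`api_url`, `voices_url`, `get_json`, `show_status`) are hypothetical, and the shape of each JSON response is not documented here, so the code prints whatever comes back rather than assuming specific keys.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://nav772-audio-language-translator.hf.space"


def api_url(path: str) -> str:
    """Join the Space base URL with an endpoint path such as /api/health."""
    return f"{BASE_URL}/{path.lstrip('/')}"


def voices_url(lang: str) -> str:
    """Build the /api/voices/{lang} URL, percent-encoding the language code."""
    return api_url(f"/api/voices/{quote(lang, safe='')}")


def get_json(url: str, timeout: float = 30.0):
    """GET a URL and decode its JSON body (raises urllib.error.HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def show_status() -> None:
    """Print the raw JSON of the three GET endpoints; no response schema is assumed."""
    for url in (api_url("/api/health"), api_url("/api/languages"), voices_url("es")):
        print(url, "->", get_json(url))
```

Calling `show_status()` performs three network requests against the Space, which may take a while if the Space is waking up from sleep.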

Quick Example (Python)

import requests

# Translate audio to Spanish
with open("input.wav", "rb") as f:
    response = requests.post(
        "https://nav772-audio-language-translator.hf.space/api/translate",
        files={"file": f},
        params={"target_language": "es"},
        timeout=120,  # model inference can be slow, so allow a generous timeout
    )

# Fail early on HTTP errors instead of on missing JSON keys below
response.raise_for_status()

result = response.json()
print(f"Original: {result['original_text']}")
print(f"Translated: {result['translated_text']}")

Quick Example (cURL)

curl -X POST \
  "https://nav772-audio-language-translator.hf.space/api/translate?target_language=es" \
  -F "file=@input.wav"
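The examples above use the JSON endpoint. For `/api/translate/audio`, a client would presumably save the response body to a file instead of parsing it. The sketch below rests on assumptions: that this endpoint accepts the same `file` form field and `target_language` query parameter as `/api/translate`, and that the response body is the raw audio. Neither is confirmed here, so check the interactive API docs first. The function names are hypothetical, and the multipart body is built by hand so that only the standard library is needed.

```python
import urllib.request
import uuid
from urllib.parse import urlencode

BASE_URL = "https://nav772-audio-language-translator.hf.space"


def encode_multipart(field: str, filename: str, data: bytes, boundary: str) -> bytes:
    """Encode a single file upload as a multipart/form-data body."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail


def translate_to_audio(in_path: str, out_path: str, target_language: str,
                       timeout: float = 120.0) -> None:
    """POST an audio file and write the response body to out_path.

    Assumption: the endpoint mirrors /api/translate (field "file",
    query parameter "target_language") and returns audio bytes.
    """
    boundary = uuid.uuid4().hex
    with open(in_path, "rb") as f:
        body = encode_multipart("file", "input.wav", f.read(), boundary)
    url = f"{BASE_URL}/api/translate/audio?" + urlencode({"target_language": target_language})
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
```

A call such as `translate_to_audio("input.wav", "output_audio", "es")` would then write the synthesized speech to disk. The output file extension is left to the caller because the audio format returned by the endpoint is not stated here.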

🛠️ Built With This API

| Project | Developer | Description |
|---|---|---|
| Audio Translator App | @kaunghtetsan11 | Mobile app built using this API |

Want your project featured here? Open a discussion or PR!

🏗️ Architecture

Audio Input (any language)
        ↓
Whisper ASR (transcription + language detection)
        ↓
NLLB Translation (to target language)
        ↓
Edge-TTS (neural speech synthesis)
        ↓
Audio Output + Text Display
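The data flow in the diagram can be sketched as three pluggable stages wired together. This is a structural illustration, not the app's actual code: `run_pipeline`, `TranslationResult`, and the stage signatures are hypothetical. Only the `original_text` and `translated_text` field names come from the API example above; `source_language` and `audio` are assumed for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TranslationResult:
    source_language: str   # language code detected by the ASR stage (assumed field)
    original_text: str     # transcription of the input audio
    translated_text: str   # text in the target language
    audio: bytes           # synthesized speech (assumed field)


def run_pipeline(
    audio: bytes,
    target_language: str,
    transcribe: Callable[[bytes], Tuple[str, str]],
    translate: Callable[[str, str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> TranslationResult:
    """Run ASR -> translation -> TTS, passing each stage's output to the next."""
    # Stage 1: transcription plus language detection (Whisper in the diagram)
    text, source_language = transcribe(audio)
    # Stage 2: translation into the target language (NLLB in the diagram)
    translated = translate(text, source_language, target_language)
    # Stage 3: speech synthesis in the target language (Edge-TTS in the diagram)
    speech = synthesize(translated, target_language)
    return TranslationResult(source_language, text, translated, speech)
```

Keeping the stages as plain callables means each model can be swapped or stubbed independently, which fits the "test incrementally" approach described later in this README.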

🔧 Technical Stack

| Component | Model | Parameters | Purpose |
|---|---|---|---|
| ASR | openai/whisper-small | 244M | Speech recognition with automatic language detection |
| Translation | facebook/nllb-200-distilled-600M | 615M | Multilingual neural machine translation |
| TTS | Microsoft Edge-TTS | API | High-quality neural text-to-speech |
| API | FastAPI | - | REST API endpoints |
| UI | Gradio | - | Interactive web interface |

🌍 Supported Languages

Tier 1: Multiple Voice Options (3 each)

  • 🇺🇸 English (US/UK accents)
  • 🇪🇸 Spanish (Spain/Mexico)
  • 🇫🇷 French (France/Canada)
  • 🇩🇪 German (Germany/Austria)
  • 🇨🇳 Chinese (Mandarin)

Tier 2: Single High-Quality Voice

  • 🇸🇦 Arabic, 🇮🇳 Hindi, 🇯🇵 Japanese, 🇰🇷 Korean, 🇧🇷 Portuguese
  • 🇷🇺 Russian, 🇮🇹 Italian, 🇳🇱 Dutch, 🇵🇱 Polish, 🇹🇷 Turkish

Total: 15 languages, 25 voices
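Whisper reports languages as short codes such as `en`, while NLLB-200 expects FLORES-200 codes such as `eng_Latn`, so some mapping between the two is needed before translation. The table below is a sketch of what such a mapping could look like for the 15 languages listed above. It assumes the API uses two-letter codes (as in the `target_language=es` example) and that Chinese means Simplified script; the mapping actually used by the app is not shown in this README, and `to_nllb` is a hypothetical helper.

```python
# Two-letter language code -> FLORES-200 code used by NLLB-200.
# Assumption: this mirrors the 15 supported languages; the app's real table may differ.
NLLB_CODES = {
    "en": "eng_Latn",
    "es": "spa_Latn",
    "fr": "fra_Latn",
    "de": "deu_Latn",
    "zh": "zho_Hans",  # Simplified Chinese is assumed here
    "ar": "arb_Arab",
    "hi": "hin_Deva",
    "ja": "jpn_Jpan",
    "ko": "kor_Hang",
    "pt": "por_Latn",
    "ru": "rus_Cyrl",
    "it": "ita_Latn",
    "nl": "nld_Latn",
    "pl": "pol_Latn",
    "tr": "tur_Latn",
}


def to_nllb(code: str) -> str:
    """Translate a two-letter language code into its NLLB code, case-insensitively."""
    try:
        return NLLB_CODES[code.lower()]
    except KeyError:
        raise ValueError(f"unsupported language code: {code!r}") from None
```

Raising `ValueError` for unknown codes lets an API layer turn the failure into a clear client error instead of passing an invalid code to the model.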

📚 Research Foundation

| Paper | Authors | Year | Contribution |
|---|---|---|---|
| Robust Speech Recognition via Large-Scale Weak Supervision | Radford et al. | 2022 | Whisper ASR model |
| No Language Left Behind | Costa-jussà et al. | 2022 | NLLB translation model |

📝 Limitations

  • Audio length: Optimized for clips under 30 seconds
  • Internet required: Edge-TTS requires connectivity
  • GPU recommended: CPU inference is significantly slower
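One way a client could work within the 30-second guideline is to split longer recordings before uploading them. The helper below only computes sample ranges; it is a hypothetical sketch, and cutting at fixed boundaries can split a word in half, so a real client might prefer to cut at silences.

```python
from typing import List, Tuple


def chunk_bounds(n_samples: int, sample_rate: int,
                 max_seconds: float = 30.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges, each covering at most max_seconds of audio."""
    if n_samples < 0 or sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("n_samples must be >= 0; sample_rate and max_seconds must be > 0")
    # Number of samples per chunk, never less than one sample
    step = max(1, int(sample_rate * max_seconds))
    return [(start, min(start + step, n_samples)) for start in range(0, n_samples, step)]
```

For a 65-second clip at 16 kHz this yields two full 30-second ranges followed by a 5-second remainder, and each range can then be sent as its own request.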

⚠️ Development Challenges & Solutions

Challenge 1: Gradio 5.x/6.x Giant Audio Icons

Problem: The Audio component's SVG icons rendered extremely large (filling the entire screen) in Gradio versions 5.x and 6.x.

Attempted fixes that didn't work:

  • Custom CSS targeting SVG elements
  • Using elem_classes and scale parameters
  • Various Gradio version downgrades

Solution: Removed custom CSS entirely and used clean Gradio components. The issue was related to Shadow DOM in newer Gradio versions blocking external CSS.

Challenge 2: Gradio 4.x + Python 3.13 Incompatibility

Problem: Older Gradio versions (4.x) failed to build due to tokenizers and pyo3 not supporting Python 3.13.

Error: Python interpreter version (3.13) is newer than PyO3's maximum supported version (3.12)

Solution: Used Gradio 6.x, which supports Python 3.13.

Challenge 3: FastAPI + Gradio Mount Conflicts

Problem: Combining FastAPI API endpoints with Gradio UI caused "Invalid port" errors and infinite request loops.

Error pattern:

Invalid port: '7861_appimmutablechunksD2RdMstj.js'
GET /_app/immutable/chunks/D2RdMstj.js HTTP/1.1" 404 Not Found

Root cause: Calling demo.launch() after gr.mount_gradio_app() started a second server that conflicted with the one serving the FastAPI app.

Solution:

  1. Created a separate run.py that starts the uvicorn server
  2. Used gr.mount_gradio_app(api_app, demo, path="/") without calling demo.launch()
  3. Let uvicorn serve the combined FastAPI + Gradio app
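The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual run.py: the `/api/health` payload and the one-line Blocks UI are placeholders, `build_app` and `main` are hypothetical names, and port 7860 is assumed as the port the Space serves on. The key point is that `demo.launch()` never appears.

```python
HOST = "0.0.0.0"
PORT = 7860  # assumed: the port a Gradio Space is expected to serve on


def build_app():
    """Create the FastAPI app, register API routes, then mount the Gradio UI at "/"."""
    # Imported inside the function so this sketch can be loaded without the web stack installed
    import gradio as gr
    from fastapi import FastAPI

    api_app = FastAPI(title="Audio Language Translator API")

    @api_app.get("/api/health")
    def health():
        # Placeholder payload; the real endpoint also reports model status
        return {"status": "ok"}

    # Placeholder UI standing in for the real interface
    with gr.Blocks() as demo:
        gr.Markdown("Audio Language Translator")

    # API routes are registered before the mount so "/api/..." is matched first;
    # mount_gradio_app returns the combined FastAPI app. No demo.launch() here.
    return gr.mount_gradio_app(api_app, demo, path="/")


def main() -> None:
    """Serve the combined app with a single uvicorn server."""
    import uvicorn

    uvicorn.run(build_app(), host=HOST, port=PORT)
```

In this arrangement run.py would simply call `main()`, so uvicorn is the only server process and the Gradio frontend assets are served from the same origin as the API.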

Challenge 4: HuggingFace Hub Compatibility

Problem: Older Gradio versions required older huggingface_hub versions, causing import errors.

Error: ImportError: cannot import name 'HfFolder' from 'huggingface_hub'

Solution: Removed version pins and let HuggingFace Spaces resolve compatible versions automatically.

Key Takeaways

  • Version compatibility is critical when combining multiple frameworks
  • Simpler is better: avoid custom CSS when possible
  • Separate concerns: use run.py for server logic, app.py for app definition
  • Test incrementally: verify the UI works before adding API complexity

👤 Author

Nav772: built as part of an AI Engineering portfolio demonstrating multimodal AI capabilities and REST API development.

📚 Related Projects

📄 License

MIT License