title: Audio Language Translator
emoji: ๐
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 6.5.1
app_file: run.py
pinned: false
license: mit
suggested_hardware: t4-small
Audio Language Translator
Translate spoken audio between 15 languages using a complete AI pipeline.
What This Does
- Upload or record audio in any supported language
- Automatic detection of the source language
- Translation to your chosen target language
- Speech synthesis in the target language with selectable voices
REST API
This translator is also available as a REST API for developers!
Interactive API docs: https://nav772-audio-language-translator.hf.space/docs
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/api/health` | GET | Health check and model status |
| `/api/languages` | GET | List all 15 supported languages |
| `/api/voices/{lang}` | GET | Get available TTS voices for a language |
| `/api/transcribe` | POST | Transcribe audio only (no translation) |
| `/api/translate` | POST | Full pipeline (returns JSON) |
| `/api/translate/audio` | POST | Full pipeline (returns audio file) |
Quick Example (Python)
```python
import requests

# Translate audio to Spanish
with open("input.wav", "rb") as f:
    response = requests.post(
        "https://nav772-audio-language-translator.hf.space/api/translate",
        files={"file": f},
        params={"target_language": "es"},
    )

response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
result = response.json()
print(f"Original: {result['original_text']}")
print(f"Translated: {result['translated_text']}")
```
Quick Example (cURL)
```bash
curl -X POST \
  "https://nav772-audio-language-translator.hf.space/api/translate?target_language=es" \
  -F "file=@input.wav"
```
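The quick examples only cover `/api/translate`. The sketch below shows one way to call the other endpoints from the table. It assumes that `/api/translate/audio` accepts the same `file` upload and `target_language` query parameter as `/api/translate`, and that the GET endpoints return JSON; neither the response schemas nor the returned audio format are documented here. The helper names (`endpoint`, `voices_path`, `get_json`, `translate_to_audio_file`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://nav772-audio-language-translator.hf.space"


def endpoint(path: str, **params: str) -> str:
    """Build an absolute URL for an API path, URL-encoding any query parameters."""
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def voices_path(lang: str) -> str:
    """Path for /api/voices/{lang}, with the language code percent-encoded."""
    return "/api/voices/" + urllib.parse.quote(lang, safe="")


def get_json(path: str, timeout: float = 30.0):
    """GET a read-only endpoint (health, languages, voices) and decode its JSON body."""
    with urllib.request.urlopen(endpoint(path), timeout=timeout) as resp:
        return json.load(resp)


def translate_to_audio_file(src: str, dst: str, target_language: str,
                            timeout: float = 120.0) -> None:
    """POST audio to /api/translate/audio and save the returned audio bytes to dst.

    Assumption: same `file` / `target_language` parameters as /api/translate.
    """
    import requests  # the stdlib has no multipart helper, so use requests here

    with open(src, "rb") as f:
        resp = requests.post(
            endpoint("/api/translate/audio"),
            files={"file": f},
            params={"target_language": target_language},
            timeout=timeout,
        )
    resp.raise_for_status()
    with open(dst, "wb") as out:
        out.write(resp.content)
```

For example, `get_json("/api/health")` checks model status and `get_json(voices_path("es"))` lists the Spanish voices. The read-only calls use the standard library so they work without extra dependencies.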
Built With This API
| Project | Developer | Description |
|---|---|---|
| Audio Translator App | @kaunghtetsan11 | Mobile app built using this API |
Want your project featured here? Open a discussion or PR!
Architecture
```
Audio Input (any language)
        ↓
Whisper ASR (transcription + language detection)
        ↓
NLLB Translation (to target language)
        ↓
Edge-TTS (neural speech synthesis)
        ↓
Audio Output + Text Display
```
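The diagram is a three-stage chain, and one detail it hides is that Whisper reports languages as short codes (`en`, `es`, ...) while NLLB expects FLORES-200 codes (`eng_Latn`, `spa_Latn`, ...). The sketch below wires the stages together with that mapping for the 15 listed languages. It is an illustration under assumptions, not the Space's code: the stages are passed in as plain callables, and `run_pipeline`, `PipelineResult`, and the stage signatures are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Whisper language codes -> NLLB-200 (FLORES-200) codes for the 15 supported
# languages. Assumption: the Space's actual mapping is not shown in the README.
WHISPER_TO_NLLB = {
    "en": "eng_Latn", "es": "spa_Latn", "fr": "fra_Latn", "de": "deu_Latn",
    "zh": "zho_Hans", "ar": "arb_Arab", "hi": "hin_Deva", "ja": "jpn_Jpan",
    "ko": "kor_Hang", "pt": "por_Latn", "ru": "rus_Cyrl", "it": "ita_Latn",
    "nl": "nld_Latn", "pl": "pol_Latn", "tr": "tur_Latn",
}

# Hypothetical stage signatures.
Transcriber = Callable[[bytes], Tuple[str, str]]  # audio -> (text, detected language)
Translator = Callable[[str, str, str], str]       # (text, src NLLB code, tgt NLLB code) -> text
Synthesizer = Callable[[str, str], bytes]         # (text, language) -> audio bytes


@dataclass
class PipelineResult:
    original_text: str
    detected_language: str
    translated_text: str
    audio: bytes


def run_pipeline(audio: bytes, target_language: str, transcribe: Transcriber,
                 translate: Translator, synthesize: Synthesizer) -> PipelineResult:
    """ASR -> translation -> TTS, mirroring the architecture diagram."""
    if target_language not in WHISPER_TO_NLLB:
        raise ValueError(f"unsupported target language: {target_language!r}")
    # Stage 1: transcription plus automatic language detection.
    text, detected = transcribe(audio)
    if detected not in WHISPER_TO_NLLB:
        raise ValueError(f"detected unsupported language: {detected!r}")
    # Stage 2: translate using NLLB's language codes on both sides.
    translated = translate(text, WHISPER_TO_NLLB[detected], WHISPER_TO_NLLB[target_language])
    # Stage 3: synthesize speech in the target language.
    speech = synthesize(translated, target_language)
    return PipelineResult(text, detected, translated, speech)
```

Injecting the stages keeps the orchestration testable with stubs, so the chain can be checked without downloading any model.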
Technical Stack
| Component | Model | Parameters | Purpose |
|---|---|---|---|
| ASR | openai/whisper-small | 244M | Speech recognition with automatic language detection |
| Translation | facebook/nllb-200-distilled-600M | 615M | Multilingual neural machine translation |
| TTS | Microsoft Edge-TTS | - (hosted service) | High-quality neural text-to-speech |
| API | FastAPI | - | REST API endpoints |
| UI | Gradio | - | Interactive web interface |
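The sketch below shows one way to call the two local models in the table. Two assumptions should be stated plainly. First, it swaps in the `openai-whisper` package's `small` checkpoint, whose `transcribe()` result includes the detected `language`; the Space may instead load `openai/whisper-small` through `transformers`. Second, it uses the standard `transformers` NLLB recipe of setting the tokenizer's `src_lang` and forcing the target-language token with `forced_bos_token_id`. Models load lazily on first use.

```python
from functools import lru_cache

WHISPER_SIZE = "small"  # openai-whisper name for the 244M checkpoint
NLLB_MODEL = "facebook/nllb-200-distilled-600M"


@lru_cache(maxsize=1)
def _whisper():
    import whisper  # openai-whisper package; imported lazily because it is heavy

    return whisper.load_model(WHISPER_SIZE)


@lru_cache(maxsize=1)
def _nllb():
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(NLLB_MODEL)
    model = AutoModelForSeq2SeqLM.from_pretrained(NLLB_MODEL)
    return tokenizer, model


def transcribe(path: str) -> tuple[str, str]:
    """Return (text, detected Whisper language code) for an audio file."""
    result = _whisper().transcribe(path)
    return result["text"].strip(), result["language"]


def translate(text: str, src: str, tgt: str, max_new_tokens: int = 256) -> str:
    """Translate between FLORES-200 codes, e.g. "eng_Latn" -> "spa_Latn"."""
    tokenizer, model = _nllb()
    tokenizer.src_lang = src  # sets the source-language special token
    inputs = tokenizer(text, return_tensors="pt")
    output = model.generate(
        **inputs,
        # Force the first generated token to be the target-language code.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(output, skip_special_tokens=True)[0]
```

Caching each loader with `lru_cache` means the roughly 0.9B parameters across both models are loaded once per process, not once per request.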
Supported Languages
Tier 1: Multiple Voice Options (3 each)
- 🇺🇸 English (US/UK accents)
- 🇪🇸 Spanish (Spain/Mexico)
- 🇫🇷 French (France/Canada)
- 🇩🇪 German (Germany/Austria)
- 🇨🇳 Chinese (Mandarin)
Tier 2: Single High-Quality Voice
- 🇸🇦 Arabic, 🇮🇳 Hindi, 🇯🇵 Japanese, 🇰🇷 Korean, 🇧🇷 Portuguese
- 🇷🇺 Russian, 🇮🇹 Italian, 🇳🇱 Dutch, 🇵🇱 Polish, 🇹🇷 Turkish
Total: 15 languages, 25 voices
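The totals follow from the tiers: 5 languages × 3 voices + 10 languages × 1 voice = 25 voices. Below is an illustrative voice table with that shape, plus a minimal Edge-TTS call. The voice identifiers are real Microsoft neural voice names matching the listed regions, but the specific voices the Space uses are an assumption; `VOICES`, `pick_voice`, and `synthesize` are hypothetical names. `edge_tts.Communicate(text, voice).save(path)` is the `edge-tts` package's async API.

```python
# Illustrative voice table (assumption: not the Space's actual selection).
VOICES: dict[str, list[str]] = {
    # Tier 1: three voices each
    "en": ["en-US-AriaNeural", "en-US-GuyNeural", "en-GB-SoniaNeural"],
    "es": ["es-ES-ElviraNeural", "es-ES-AlvaroNeural", "es-MX-DaliaNeural"],
    "fr": ["fr-FR-DeniseNeural", "fr-FR-HenriNeural", "fr-CA-SylvieNeural"],
    "de": ["de-DE-KatjaNeural", "de-DE-ConradNeural", "de-AT-IngridNeural"],
    "zh": ["zh-CN-XiaoxiaoNeural", "zh-CN-YunxiNeural", "zh-CN-YunyangNeural"],
    # Tier 2: one voice each
    "ar": ["ar-SA-ZariyahNeural"],
    "hi": ["hi-IN-SwaraNeural"],
    "ja": ["ja-JP-NanamiNeural"],
    "ko": ["ko-KR-SunHiNeural"],
    "pt": ["pt-BR-FranciscaNeural"],
    "ru": ["ru-RU-SvetlanaNeural"],
    "it": ["it-IT-ElsaNeural"],
    "nl": ["nl-NL-ColetteNeural"],
    "pl": ["pl-PL-ZofiaNeural"],
    "tr": ["tr-TR-EmelNeural"],
}


def pick_voice(lang: str, index: int = 0) -> str:
    """Return the index-th voice for a language, validating both arguments."""
    voices = VOICES.get(lang)
    if voices is None:
        raise ValueError(f"unsupported language: {lang!r}")
    if not 0 <= index < len(voices):
        raise IndexError(f"{lang!r} has {len(voices)} voice(s); got index {index}")
    return voices[index]


async def synthesize(text: str, voice: str, out_path: str) -> None:
    """Synthesize speech with Edge-TTS; requires internet access."""
    import edge_tts  # imported lazily so the table is usable without the package

    await edge_tts.Communicate(text, voice).save(out_path)
```

Usage: `asyncio.run(synthesize("Hola", pick_voice("es", 2), "out.mp3"))` would render Spanish text with the Mexican voice in this table.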
Research Foundation
| Paper | Authors | Year | Contribution |
|---|---|---|---|
| Robust Speech Recognition via Large-Scale Weak Supervision | Radford et al. | 2022 | Whisper ASR model |
| No Language Left Behind | Costa-jussà et al. | 2022 | NLLB translation model |
Limitations
- Audio length: Optimized for clips under 30 seconds
- Internet required: Edge-TTS requires connectivity
- GPU recommended: CPU inference is significantly slower
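Whisper consumes audio as fixed 30-second windows of 16 kHz samples, which is a likely reason for the 30-second guidance above. A minimal sketch of splitting longer input into such windows follows. It assumes the audio is already decoded to a 16 kHz mono sample sequence; `split_into_windows` is a hypothetical helper, and a naive fixed split can cut words at window boundaries (overlapping windows are a common refinement).

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio
WINDOW_SECONDS = 30   # Whisper's fixed input window length


def split_into_windows(samples: Sequence[float],
                       sample_rate: int = SAMPLE_RATE,
                       window_seconds: float = WINDOW_SECONDS) -> List[Sequence[float]]:
    """Split a sample sequence into consecutive, non-overlapping windows.

    The last window may be shorter; Whisper pads short inputs to 30 s itself.
    """
    size = int(sample_rate * window_seconds)
    if size <= 0:
        raise ValueError("window must contain at least one sample")
    return [samples[i:i + size] for i in range(0, len(samples), size)]
```

For a 65-second clip at 16 kHz this yields three windows of 480,000, 480,000 and 80,000 samples, each of which could be transcribed separately.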
Development Challenges & Solutions
Challenge 1: Gradio 5.x/6.x Giant Audio Icons
Problem: In Gradio 5.x and 6.x, the Audio component's SVG icons rendered extremely large, filling the entire screen.
Attempted fixes that didn't work:
- Custom CSS targeting SVG elements
- Using the `elem_classes` and `scale` parameters
- Various Gradio version downgrades
Solution: Removed the custom CSS entirely and used Gradio's default, unstyled components. The issue was related to the Shadow DOM in newer Gradio versions blocking external CSS.
Challenge 2: Gradio 4.x + Python 3.13 Incompatibility
Problem: Older Gradio versions (4.x) failed to build because their `tokenizers` dependency, whose Python bindings are built with PyO3, did not support Python 3.13.
Error: `Python interpreter version (3.13) is newer than PyO3's maximum supported version (3.12)`
Solution: Used Gradio 6.x which has native Python 3.13 support.
Challenge 3: FastAPI + Gradio Mount Conflicts
Problem: Combining FastAPI API endpoints with the Gradio UI caused "Invalid port" errors and infinite request loops.
Error pattern:
```
Invalid port: '7861_appimmutablechunksD2RdMstj.js'
GET /_app/immutable/chunks/D2RdMstj.js HTTP/1.1" 404 Not Found
```
Root cause: Calling `demo.launch()` after `gr.mount_gradio_app()` created conflicting servers.
Solution:
- Created a separate `run.py` to handle the uvicorn server
- Used `gr.mount_gradio_app(api_app, demo, path="/")` without calling `demo.launch()`
- Let uvicorn serve the combined FastAPI + Gradio app
Challenge 4: HuggingFace Hub Compatibility
Problem: Older Gradio versions required older huggingface_hub versions, causing import errors.
Error: `ImportError: cannot import name 'HfFolder' from 'huggingface_hub'`
Solution: Removed version pins and let HuggingFace Spaces resolve compatible versions automatically.
Key Takeaways
- Version compatibility is critical when combining multiple frameworks
- Simpler is better: avoid custom CSS when possible
- Separate concerns: use `run.py` for server logic and `app.py` for the app definition
- Test incrementally: verify the UI works before adding API complexity
Author
Nav772: built as part of an AI Engineering portfolio demonstrating multimodal AI capabilities and REST API development.
Related Projects
License
MIT License