
title: Audio Language Translator
emoji: 🌍
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 6.5.1
app_file: run.py
pinned: false
license: mit
suggested_hardware: t4-small

🌍 Audio Language Translator

Translate spoken audio between 15 languages using a three-stage AI pipeline: speech recognition, machine translation, and speech synthesis.

🎯 What This Does

Upload or record audio in any supported language
Automatic detection of source language
Translation to your chosen target language
Speech synthesis in the target language with selectable voices

🔌 REST API

This translator is also available as a REST API for developers!

📚 Interactive API Docs: https://nav772-audio-language-translator.hf.space/docs

API Endpoints

Endpoint	Method	Description
`/api/health`	GET	Health check and model status
`/api/languages`	GET	List all 15 supported languages
`/api/voices/{lang}`	GET	Get available TTS voices for a language
`/api/transcribe`	POST	Transcribe audio only (no translation)
`/api/translate`	POST	Full pipeline (returns JSON)
`/api/translate/audio`	POST	Full pipeline (returns audio file)
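The read-only endpoints in the table can be reached with nothing beyond the Python standard library. The following is a minimal sketch: the helper names `endpoint_url`, `voices_url`, and `get_json` are illustrative and not part of the API, and because the shape of each JSON response is not documented here, `get_json` simply returns whatever the server sends.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base URL taken from the interactive docs link above.
BASE_URL = "https://nav772-audio-language-translator.hf.space"


def endpoint_url(path, **params):
    """Build a full URL for an API path, appending query parameters if any."""
    url = BASE_URL + path
    if params:
        url += "?" + urlencode(params)
    return url


def voices_url(lang):
    """URL for /api/voices/{lang}; the language code is percent-encoded."""
    return endpoint_url("/api/voices/" + quote(lang, safe=""))


def get_json(url, timeout=30):
    """GET a URL and decode its JSON body (hypothetical helper, needs network)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Building URLs needs no network access.
print(endpoint_url("/api/health"))
print(voices_url("es"))
```

Calling `get_json(endpoint_url("/api/languages"))` would then fetch the language list over the network.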

Quick Example (Python)

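import requests

API_URL = "https://nav772-audio-language-translator.hf.space/api/translate"

# Upload a local recording and request a Spanish translation
with open("input.wav", "rb") as f:
    response = requests.post(
        API_URL,
        files={"file": f},
        params={"target_language": "es"},
        timeout=120,  # generous limit; inference can be slow, especially on CPU
    )

# Stop on HTTP errors instead of trying to parse an error body as a result
response.raise_for_status()

result = response.json()
print(f"Original: {result['original_text']}")
print(f"Translated: {result['translated_text']}")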

Quick Example (cURL)

curl -X POST \
  "https://nav772-audio-language-translator.hf.space/api/translate?target_language=es" \
  -F "file=@input.wav"

🛠️ Built With This API

Project	Developer	Description
Audio Translator App	@kaunghtetsan11	Mobile app built using this API

Want your project featured here? Open a discussion or PR!

🏗️ Architecture

Audio Input (any language)
        ↓
Whisper ASR (transcription + language detection)
        ↓
NLLB Translation (to target language)
        ↓
Edge-TTS (neural speech synthesis)
        ↓
Audio Output + Text Display
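The flow in the diagram can be sketched as three functions chained together. This is an illustration of the data flow only, not the app's actual code: `run_pipeline`, `PipelineResult`, and the stub stages are hypothetical names, and the stubs stand in for Whisper, NLLB, and Edge-TTS so that the example runs without any models.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class PipelineResult:
    detected_language: str
    original_text: str
    translated_text: str
    audio: bytes


def run_pipeline(
    audio: bytes,
    target_language: str,
    transcribe: Callable[[bytes], Tuple[str, str]],
    translate: Callable[[str, str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> PipelineResult:
    """Chain ASR -> translation -> TTS, mirroring the diagram above."""
    # Stage 1: speech recognition also reports the detected source language.
    text, source_language = transcribe(audio)
    # Stage 2: translate from the detected language into the requested one.
    translated = translate(text, source_language, target_language)
    # Stage 3: synthesize speech for the translated text.
    speech = synthesize(translated, target_language)
    return PipelineResult(source_language, text, translated, speech)


# Stub stages for illustration; real ones would call the models.
def fake_transcribe(audio):
    return "hello", "en"


def fake_translate(text, source, target):
    return f"[{source}->{target}] {text}"


def fake_synthesize(text, language):
    return text.encode("utf-8")


result = run_pipeline(
    b"raw-audio", "es", fake_transcribe, fake_translate, fake_synthesize
)
print(result.translated_text)  # prints "[en->es] hello"
```

Passing the stages in as callables keeps each model replaceable and lets the chaining logic be tested without a GPU.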

🔧 Technical Stack

Component	Model	Parameters	Purpose
ASR	openai/whisper-small	244M	Speech recognition with automatic language detection
Translation	facebook/nllb-200-distilled-600M	615M	Multilingual neural machine translation
TTS	Microsoft Edge-TTS	API	High-quality neural text-to-speech
API	FastAPI	-	REST API endpoints
UI	Gradio	-	Interactive web interface

🌐 Supported Languages

Tier 1: Multiple Voice Options (3 each)

🇺🇸 English (US/UK accents)
🇪🇸 Spanish (Spain/Mexico)
🇫🇷 French (France/Canada)
🇩🇪 German (Germany/Austria)
🇨🇳 Chinese (Mandarin)

Tier 2: Single High-Quality Voice

🇸🇦 Arabic, 🇮🇳 Hindi, 🇯🇵 Japanese, 🇰🇷 Korean, 🇧🇷 Portuguese
🇷🇺 Russian, 🇮🇹 Italian, 🇳🇱 Dutch, 🇵🇱 Polish, 🇹🇷 Turkish

Total: 15 languages, 25 voices
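The API examples use short codes such as `es` for the target language, while NLLB-200 checkpoints identify languages by FLORES-200 codes such as `spa_Latn`, so a pipeline like this one needs some mapping between the two. The sketch below shows one possible mapping for the 15 languages listed above and re-derives the voice total. The names `NLLB_CODES`, `to_nllb`, and `VOICES_PER_LANGUAGE` are hypothetical, and the choice of Simplified Chinese (`zho_Hans`) and Modern Standard Arabic (`arb_Arab`) is an assumption rather than something taken from the app.

```python
# Short language code -> FLORES-200 code used by NLLB-200 (assumed mapping).
NLLB_CODES = {
    "en": "eng_Latn", "es": "spa_Latn", "fr": "fra_Latn",
    "de": "deu_Latn", "zh": "zho_Hans", "ar": "arb_Arab",
    "hi": "hin_Deva", "ja": "jpn_Jpan", "ko": "kor_Hang",
    "pt": "por_Latn", "ru": "rus_Cyrl", "it": "ita_Latn",
    "nl": "nld_Latn", "pl": "pol_Latn", "tr": "tur_Latn",
}

# Tier 1 languages have 3 voices each; every other language has 1.
TIER_1 = ("en", "es", "fr", "de", "zh")
VOICES_PER_LANGUAGE = {
    code: (3 if code in TIER_1 else 1) for code in NLLB_CODES
}


def to_nllb(code):
    """Translate a short code to its NLLB code, rejecting unsupported ones."""
    try:
        return NLLB_CODES[code]
    except KeyError:
        raise ValueError(f"unsupported language code: {code!r}") from None


print(len(NLLB_CODES), sum(VOICES_PER_LANGUAGE.values()))  # prints "15 25"
```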

📚 Research Foundation

Paper	Authors	Year	Contribution
Robust Speech Recognition via Large-Scale Weak Supervision	Radford et al.	2022	Whisper ASR model
No Language Left Behind	Costa-jussà et al.	2022	NLLB translation model

📝 Limitations

Audio length: Optimized for clips under 30 seconds
Internet required: Edge-TTS requires connectivity
GPU recommended: CPU inference is significantly slower
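Given the audio-length limitation, a client may want to check a clip's duration before uploading it. The following sketch does this for uncompressed WAV files using only the standard library. The function names are hypothetical, the 30-second threshold is taken from the note above, and how the server treats longer clips is not stated here.

```python
import io
import wave

MAX_SECONDS = 30.0  # from the "Audio length" limitation above


def wav_duration_seconds(source):
    """Duration of a WAV file, given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_within_limit(source, max_seconds=MAX_SECONDS):
    """True if the clip is shorter than the recommended maximum."""
    return wav_duration_seconds(source) < max_seconds


def make_silent_wav(seconds, sample_rate=16000):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


print(is_within_limit(make_silent_wav(2)))   # prints "True"
print(is_within_limit(make_silent_wav(31)))  # prints "False"
```

With a real recording, `is_within_limit("input.wav")` performs the same check on a file path.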

⚠️ Development Challenges & Solutions

Challenge 1: Gradio 5.x/6.x Giant Audio Icons

Problem: The Audio component's SVG icons rendered extremely large (filling the entire screen) in Gradio versions 5.x and 6.x.

Attempted fixes that didn't work:

Custom CSS targeting SVG elements
Using elem_classes and scale parameters
Various Gradio version downgrades

Solution: Removed the custom CSS entirely and used unmodified Gradio components. The issue appeared to be related to Shadow DOM in newer Gradio versions blocking external CSS.

Challenge 2: Gradio 4.x + Python 3.13 Incompatibility

Problem: Older Gradio versions (4.x) failed to build because the tokenizers and PyO3 versions they depended on did not support Python 3.13.

Error: Python interpreter version (3.13) is newer than PyO3's maximum supported version (3.12)

Solution: Used Gradio 6.x, which has native Python 3.13 support.

Challenge 3: FastAPI + Gradio Mount Conflicts

Problem: Combining FastAPI endpoints with the Gradio UI caused "Invalid port" errors and infinite request loops.

Error pattern:

Invalid port: '7861_appimmutablechunksD2RdMstj.js'
GET /_app/immutable/chunks/D2RdMstj.js HTTP/1.1" 404 Not Found

Root cause: Calling demo.launch() after gr.mount_gradio_app() started a second server that conflicted with the one serving the mounted app.

Solution:

Created a separate run.py to start the uvicorn server
Used gr.mount_gradio_app(api_app, demo, path="/") without calling demo.launch()
Let uvicorn serve the combined FastAPI + Gradio app
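The three steps above can be sketched as follows. This is a simplified, single-file illustration rather than the project's actual `run.py`: the health payload and the one-line UI are placeholders, `create_app` and `main` are hypothetical names, and port 7860 is assumed as the port a Space listens on by default. The API route is registered before Gradio is mounted at `/` as a precaution, so that the mount cannot shadow it.

```python
HOST = "0.0.0.0"
PORT = 7860  # assumption: default port expected by Hugging Face Spaces


def create_app():
    """Build one FastAPI app serving both the REST routes and the Gradio UI."""
    # Third-party imports live inside the function so this sketch can be
    # imported even where gradio and fastapi are not installed.
    import gradio as gr
    from fastapi import FastAPI

    api_app = FastAPI()

    # Register API routes before mounting Gradio at "/".
    @api_app.get("/api/health")
    def health():
        return {"status": "ok"}  # placeholder payload, not the real response

    with gr.Blocks() as demo:
        gr.Markdown("Audio Language Translator")  # placeholder UI

    # Mount the UI onto the FastAPI app; note there is no demo.launch() call.
    return gr.mount_gradio_app(api_app, demo, path="/")


def main():
    import uvicorn

    # uvicorn is the only server process; it serves the combined app.
    uvicorn.run(create_app(), host=HOST, port=PORT)
```

In a real entry point, `main()` would be called under an `if __name__ == "__main__":` guard at the bottom of the file.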

Challenge 4: HuggingFace Hub Compatibility

Problem: Older Gradio versions required older huggingface_hub versions, causing import errors.

Error: ImportError: cannot import name 'HfFolder' from 'huggingface_hub'

Solution: Removed version pins and let HuggingFace Spaces resolve compatible versions automatically.

Key Takeaways

Version compatibility is critical when combining multiple frameworks
Simpler is better — avoid custom CSS when possible
Separate concerns — use run.py for server logic, app.py for app definition
Test incrementally — verify UI works before adding API complexity

👤 Author

Nav772 — Built as part of an AI Engineering portfolio demonstrating multimodal AI capabilities and REST API development.

📚 Related Projects

📄 License

MIT License