---
title: Audio Language Translator
emoji: 🌍
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 6.5.1
app_file: run.py
pinned: false
license: mit
suggested_hardware: t4-small
---

# 🌍 Audio Language Translator

Translate spoken audio between 15 languages using a complete AI pipeline.

## 🎯 What This Does

1. **Upload or record** audio in any supported language
2. **Automatic detection** of source language
3. **Translation** to your chosen target language
4. **Speech synthesis** in the target language with selectable voices

## 🔌 REST API

This translator is also available as a REST API for developers!

**📚 Interactive API Docs:** [https://nav772-audio-language-translator.hf.space/docs](https://nav772-audio-language-translator.hf.space/docs)

### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/health` | GET | Health check and model status |
| `/api/languages` | GET | List all 15 supported languages |
| `/api/voices/{lang}` | GET | Get available TTS voices for a language |
| `/api/transcribe` | POST | Transcribe audio only (no translation) |
| `/api/translate` | POST | Full pipeline (returns JSON) |
| `/api/translate/audio` | POST | Full pipeline (returns audio file) |

### Quick Example (Python)

```python
import requests

# Translate audio to Spanish
with open("input.wav", "rb") as f:
    response = requests.post(
        "https://nav772-audio-language-translator.hf.space/api/translate",
        files={"file": f},
        params={"target_language": "es"}
    )

result = response.json()
print(f"Original: {result['original_text']}")
print(f"Translated: {result['translated_text']}")
```

### Quick Example (cURL)

```bash
curl -X POST \
  "https://nav772-audio-language-translator.hf.space/api/translate?target_language=es" \
  -F "file=@input.wav"
```

## 🛠️ Built With This API

| Project | Developer | Description |
|---------|-----------|-------------|
| [Audio Translator App](https://github.com/kaunghtetsan1101/audio_translator) | [@kaunghtetsan11](https://huggingface.co/kaunghtetsan11) | Mobile app built using this API |

*Want your project featured here? Open a discussion or PR!*

## 🏗️ Architecture

```
Audio Input (any language)
    ↓
Whisper ASR (transcription + language detection)
    ↓
NLLB Translation (to target language)
    ↓
Edge-TTS (neural speech synthesis)
    ↓
Audio Output + Text Display
```

## 🔧 Technical Stack

| Component | Model | Parameters | Purpose |
|-----------|-------|------------|---------|
| **ASR** | openai/whisper-small | 244M | Speech recognition with automatic language detection |
| **Translation** | facebook/nllb-200-distilled-600M | 615M | Multilingual neural machine translation |
| **TTS** | Microsoft Edge-TTS | API | High-quality neural text-to-speech |
| **API** | FastAPI | - | REST API endpoints |
| **UI** | Gradio | - | Interactive web interface |

## 🌐 Supported Languages

### Tier 1: Multiple Voice Options (3 each)

- 🇺🇸 English (US/UK accents)
- 🇪🇸 Spanish (Spain/Mexico)
- 🇫🇷 French (France/Canada)
- 🇩🇪 German (Germany/Austria)
- 🇨🇳 Chinese (Mandarin)

### Tier 2: Single High-Quality Voice

- 🇸🇦 Arabic, 🇮🇳 Hindi, 🇯🇵 Japanese, 🇰🇷 Korean, 🇧🇷 Portuguese
- 🇷🇺 Russian, 🇮🇹 Italian, 🇳🇱 Dutch, 🇵🇱 Polish, 🇹🇷 Turkish

**Total: 15 languages, 25 voices**

## 📚 Research Foundation

| Paper | Authors | Year | Contribution |
|-------|---------|------|--------------|
| [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356) | Radford et al. | 2022 | Whisper ASR model |
| [No Language Left Behind](https://arxiv.org/abs/2207.04672) | Costa-jussà et al. | 2022 | NLLB translation model |

## 📝 Limitations

- Audio length: Optimized for clips under 30 seconds
- Internet required: Edge-TTS requires connectivity
- GPU recommended: CPU inference is significantly slower

## ⚠️ Development Challenges & Solutions

### Challenge 1: Gradio 5.x/6.x Giant Audio Icons

**Problem:** Audio component SVG icons displayed extremely large (filling entire screen) in Gradio versions 5.x and 6.x.

**Attempted fixes that didn't work:**

- Custom CSS targeting SVG elements
- Using `elem_classes` and `scale` parameters
- Various Gradio version downgrades

**Solution:** Removed custom CSS entirely and used clean Gradio components. The issue was related to Shadow DOM in newer Gradio versions blocking external CSS.

### Challenge 2: Gradio 4.x + Python 3.13 Incompatibility

**Problem:** Older Gradio versions (4.x) failed to build due to `tokenizers` and `pyo3` not supporting Python 3.13.

**Error:** `Python interpreter version (3.13) is newer than PyO3's maximum supported version (3.12)`

**Solution:** Used Gradio 6.x which has native Python 3.13 support.

### Challenge 3: FastAPI + Gradio Mount Conflicts

**Problem:** Combining FastAPI API endpoints with Gradio UI caused "Invalid port" errors and infinite request loops.

**Error pattern:**

```
Invalid port: '7861_appimmutablechunksD2RdMstj.js'
GET /_app/immutable/chunks/D2RdMstj.js HTTP/1.1" 404 Not Found
```

**Root cause:** Using `demo.launch()` after `gr.mount_gradio_app()` created conflicting servers.

**Solution:**

1. Created separate `run.py` to handle uvicorn server
2. Used `gr.mount_gradio_app(api_app, demo, path="/")` without calling `demo.launch()`
3. Let uvicorn serve the combined FastAPI + Gradio app

### Challenge 4: HuggingFace Hub Compatibility

**Problem:** Older Gradio versions required older `huggingface_hub` versions, causing import errors.

**Error:** `ImportError: cannot import name 'HfFolder' from 'huggingface_hub'`

**Solution:** Removed version pins and let HuggingFace Spaces resolve compatible versions automatically.

### Key Takeaways

- **Version compatibility** is critical when combining multiple frameworks
- **Simpler is better**: avoid custom CSS when possible
- **Separate concerns**: use `run.py` for server logic, `app.py` for app definition
- **Test incrementally**: verify UI works before adding API complexity

## 👤 Author

**[Nav772](https://huggingface.co/Nav772)**: Built as part of an AI Engineering portfolio demonstrating multimodal AI capabilities and REST API development.

## 📚 Related Projects

- [LLM Evaluation Dashboard](https://huggingface.co/spaces/Nav772/llm-evaluation-dashboard)
- [RAG Document Q&A](https://huggingface.co/spaces/Nav772/rag-qa-document)
- [Movie Sentiment Analyzer](https://huggingface.co/spaces/Nav772/movie-sentiment-analyzer)

## 📄 License

MIT License
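
## 🧪 Appendix: Dependency-Free Client Sketch

The REST API quick example relies on `requests`. For environments where installing it is inconvenient, the same call to `/api/translate` can probably be made with the Python standard library alone. The sketch below is illustrative only. The endpoint, the `file` upload field, the `target_language` query parameter, and the response keys come from the examples in this README. The helper names (`build_translate_url`, `encode_multipart_file`, `translate_audio`), the `audio/wav` content type, and the 120-second timeout are assumptions made for this example and are not part of the API.

```python
import json
import urllib.parse
import urllib.request
import uuid

BASE_URL = "https://nav772-audio-language-translator.hf.space"


def build_translate_url(target_language, base_url=BASE_URL):
    """Return the /api/translate URL with the target language as a query parameter."""
    query = urllib.parse.urlencode({"target_language": target_language})
    return f"{base_url}/api/translate?{query}"


def encode_multipart_file(field_name, filename, data, content_type="audio/wav"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header). The audio/wav default is an
    assumption; adjust it to match the file being uploaded.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def translate_audio(path, target_language, timeout=120):
    """POST an audio file to /api/translate and return the parsed JSON response.

    The timeout value is an assumption, chosen generously in case the
    hosted models need time to respond.
    """
    with open(path, "rb") as f:
        body, content_type = encode_multipart_file("file", "input.wav", f.read())
    request = urllib.request.Request(
        build_translate_url(target_language),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request; requires input.wav in the working directory.
    result = translate_audio("input.wav", "es")
    print(f"Original: {result['original_text']}")
    print(f"Translated: {result['translated_text']}")
```

The URL and body helpers are kept separate from the network call so they can be checked offline. `urllib.request.urlopen` raises `urllib.error.HTTPError` on non-2xx responses, so callers may want to catch that around `translate_audio`.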