---
title: Audio Language Translator
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 6.5.1
app_file: run.py
pinned: false
license: mit
suggested_hardware: t4-small
---
# Audio Language Translator

Translate spoken audio between 15 languages using a three-stage pipeline: speech recognition, machine translation, and speech synthesis.

## What This Does

1. **Upload or record** audio in any supported language
2. **Automatic detection** of the source language
3. **Translation** to your chosen target language
4. **Speech synthesis** in the target language with selectable voices
## REST API

The translator is also available as a REST API.

**Interactive API Docs:** [https://nav772-audio-language-translator.hf.space/docs](https://nav772-audio-language-translator.hf.space/docs)

### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/health` | GET | Health check and model status |
| `/api/languages` | GET | List all 15 supported languages |
| `/api/voices/{lang}` | GET | Get available TTS voices for a language |
| `/api/transcribe` | POST | Transcribe audio only (no translation) |
| `/api/translate` | POST | Full pipeline (returns JSON) |
| `/api/translate/audio` | POST | Full pipeline (returns audio file) |
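
The endpoint paths in the table can be turned into full URLs with a small helper. This is a minimal sketch: the base URL comes from the docs link above, while `ENDPOINTS`, its keys, and `endpoint_url` are hypothetical names chosen for illustration.

```python
from urllib.parse import quote, urljoin

# Base URL taken from the interactive docs link above.
BASE_URL = "https://nav772-audio-language-translator.hf.space"

# Paths copied from the endpoint table; the dictionary keys are hypothetical labels.
ENDPOINTS = {
    "health": "/api/health",
    "languages": "/api/languages",
    "voices": "/api/voices/{lang}",
    "transcribe": "/api/transcribe",
    "translate": "/api/translate",
    "translate_audio": "/api/translate/audio",
}


def endpoint_url(name: str, base_url: str = BASE_URL, **path_params: str) -> str:
    """Build the full URL for a named endpoint, filling in path parameters."""
    # Percent-encode each path parameter so it is safe inside a URL path.
    encoded = {key: quote(value, safe="") for key, value in path_params.items()}
    path = ENDPOINTS[name].format(**encoded)
    return urljoin(base_url, path)


print(endpoint_url("health"))
print(endpoint_url("voices", lang="es"))
```

The GET endpoints can then be fetched with, for example, `requests.get(endpoint_url("languages"))`. The response schemas are not listed in this README, so check the interactive docs before relying on specific fields.
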
### Quick Example (Python)

```python
import requests

# Translate audio to Spanish
with open("input.wav", "rb") as f:
    response = requests.post(
        "https://nav772-audio-language-translator.hf.space/api/translate",
        files={"file": f},
        params={"target_language": "es"},
    )
response.raise_for_status()  # fail early on HTTP errors

result = response.json()
print(f"Original: {result['original_text']}")
print(f"Translated: {result['translated_text']}")
```
### Quick Example (cURL)

```bash
curl -X POST \
  "https://nav772-audio-language-translator.hf.space/api/translate?target_language=es" \
  -F "file=@input.wav"
```
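
For the `/api/translate/audio` endpoint, which returns an audio file rather than JSON, a client might look like the sketch below. It assumes the endpoint accepts the same `file` field and `target_language` parameter as `/api/translate`, which this README does not confirm. The function name and the injectable `post` argument are hypothetical; the latter only exists so the function can be exercised without network access.

```python
from pathlib import Path

API_URL = "https://nav772-audio-language-translator.hf.space/api/translate/audio"


def save_translated_audio(in_path, out_path, target_language, post=None):
    """Upload an audio file and write the returned audio bytes to out_path."""
    if post is None:
        # Imported lazily so the function can be exercised with a stub transport.
        import requests
        post = requests.post

    with open(in_path, "rb") as f:
        response = post(
            API_URL,
            files={"file": f},  # assumed: same field name as /api/translate
            params={"target_language": target_language},  # assumed: same parameter
            timeout=120,
        )
    response.raise_for_status()

    # The audio container (MP3, WAV, ...) is not stated in this README,
    # so the bytes are written unchanged to whatever path the caller chose.
    out = Path(out_path)
    out.write_bytes(response.content)
    return out
```

A call such as `save_translated_audio("input.wav", "translated_audio", "es")` would then store the synthesized speech locally. Pick the file extension after checking the response's content type in the interactive docs.
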
## Built With This API

| Project | Developer | Description |
|---------|-----------|-------------|
| [Audio Translator App](https://github.com/kaunghtetsan1101/audio_translator) | [@kaunghtetsan11](https://huggingface.co/kaunghtetsan11) | Mobile app built using this API |

*Want your project featured here? Open a discussion or PR!*
## Architecture

```
Audio Input (any language)
        ↓
Whisper ASR (transcription + language detection)
        ↓
NLLB Translation (to target language)
        ↓
Edge-TTS (neural speech synthesis)
        ↓
Audio Output + Text Display
```
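
The flow in the diagram can be sketched as three chained stages. This is an illustration of the data flow only, not the Space's actual code: `run_pipeline`, `TranslationResult`, and the stage signatures are hypothetical, and stub functions stand in for Whisper, NLLB, and Edge-TTS. The `original_text` and `translated_text` field names mirror the JSON keys used in the Python example above.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TranslationResult:
    detected_language: str
    original_text: str
    translated_text: str
    audio: bytes


def run_pipeline(
    audio: bytes,
    target_language: str,
    transcribe: Callable[[bytes], Tuple[str, str]],
    translate: Callable[[str, str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> TranslationResult:
    """Chain ASR -> translation -> TTS, mirroring the diagram."""
    # Stage 1: speech recognition also reports the detected source language.
    text, source_language = transcribe(audio)
    # Stage 2: translate from the detected language to the requested one.
    translated = translate(text, source_language, target_language)
    # Stage 3: synthesize speech for the translated text.
    speech = synthesize(translated, target_language)
    return TranslationResult(source_language, text, translated, speech)


# Stub stages so the sketch runs without any models installed.
result = run_pipeline(
    b"\x00\x01",
    "es",
    transcribe=lambda audio: ("hello", "en"),
    translate=lambda text, src, tgt: f"[{src}->{tgt}] {text}",
    synthesize=lambda text, lang: text.encode("utf-8"),
)
print(result.translated_text)  # [en->es] hello
```

Passing the stages in as callables keeps each model replaceable and lets the orchestration be tested without a GPU.
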
## Technical Stack

| Component | Model | Parameters | Purpose |
|-----------|-------|------------|---------|
| **ASR** | openai/whisper-small | 244M | Speech recognition with automatic language detection |
| **Translation** | facebook/nllb-200-distilled-600M | 615M | Multilingual neural machine translation |
| **TTS** | Microsoft Edge-TTS | API | High-quality neural text-to-speech |
| **API** | FastAPI | - | REST API endpoints |
| **UI** | Gradio | - | Interactive web interface |
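
NLLB identifies languages with its own codes (for example `spa_Latn`) rather than two-letter codes, so a translation step needs a mapping between the two. The sketch below shows one way to call the translation model from the table through the `transformers` pipeline. The two-letter keys are an assumption about how this Space names languages (only `es` appears in the API example), and `NLLB_CODES` and `translate` are hypothetical names.

```python
# ISO 639-1 code -> NLLB-200 language code for the 15 supported languages.
# The two-letter keys are an assumption; the NLLB codes are the model's own.
NLLB_CODES = {
    "en": "eng_Latn", "es": "spa_Latn", "fr": "fra_Latn", "de": "deu_Latn",
    "zh": "zho_Hans", "ar": "arb_Arab", "hi": "hin_Deva", "ja": "jpn_Jpan",
    "ko": "kor_Hang", "pt": "por_Latn", "ru": "rus_Cyrl", "it": "ita_Latn",
    "nl": "nld_Latn", "pl": "pol_Latn", "tr": "tur_Latn",
}


def translate(text: str, source: str, target: str) -> str:
    """Translate text with NLLB via the transformers translation pipeline."""
    # Imported lazily: loading transformers and the 615M-parameter model is slow.
    from transformers import pipeline

    translator = pipeline(
        "translation",
        model="facebook/nllb-200-distilled-600M",
        src_lang=NLLB_CODES[source],
        tgt_lang=NLLB_CODES[target],
    )
    return translator(text)[0]["translation_text"]
```

This version rebuilds the pipeline on every call to keep the sketch short. A long-running service would more likely load the model once at startup and reuse it.
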
## Supported Languages

### Tier 1: Multiple Voice Options (3 each)

- 🇺🇸 English (US/UK accents)
- 🇪🇸 Spanish (Spain/Mexico)
- 🇫🇷 French (France/Canada)
- 🇩🇪 German (Germany/Austria)
- 🇨🇳 Chinese (Mandarin)

### Tier 2: Single High-Quality Voice

- 🇸🇦 Arabic, 🇮🇳 Hindi, 🇯🇵 Japanese, 🇰🇷 Korean, 🇧🇷 Portuguese
- 🇷🇺 Russian, 🇮🇹 Italian, 🇳🇱 Dutch, 🇵🇱 Polish, 🇹🇷 Turkish

**Total: 15 languages, 25 voices**
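
The totals follow directly from the two tiers: five languages with three voices each plus ten languages with one voice each. A quick check, using hypothetical variable names and only the counts stated above:

```python
# Voices per language as listed above: Tier 1 has 3 voices each, Tier 2 has 1.
TIER_1 = ["English", "Spanish", "French", "German", "Chinese"]
TIER_2 = ["Arabic", "Hindi", "Japanese", "Korean", "Portuguese",
          "Russian", "Italian", "Dutch", "Polish", "Turkish"]

VOICE_COUNTS = {**{lang: 3 for lang in TIER_1}, **{lang: 1 for lang in TIER_2}}

total_languages = len(VOICE_COUNTS)
total_voices = sum(VOICE_COUNTS.values())
print(total_languages, total_voices)  # 15 25
```
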
## Research Foundation

| Paper | Authors | Year | Contribution |
|-------|---------|------|--------------|
| [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356) | Radford et al. | 2022 | Whisper ASR model |
| [No Language Left Behind](https://arxiv.org/abs/2207.04672) | Costa-jussà et al. | 2022 | NLLB translation model |
## Limitations

- Audio length: Optimized for clips under 30 seconds
- Internet required: Edge-TTS calls Microsoft's online speech service, so synthesis needs network access
- GPU recommended: CPU inference is significantly slower
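
For recordings longer than the 30-second target, one client-side workaround is to split the clip into windows and submit each one separately. The helper below only computes the window boundaries. It is a sketch under the assumption that fixed-length splitting is acceptable; this README does not say the Space does any chunking itself, and `split_into_windows` is a hypothetical name.

```python
def split_into_windows(duration_s: float, window_s: float = 30.0):
    """Return (start, end) times in seconds, each window at most window_s long."""
    duration_s = float(duration_s)
    if duration_s < 0 or window_s <= 0:
        raise ValueError("duration_s must be >= 0 and window_s must be > 0")
    windows = []
    start = 0.0
    while start < duration_s:
        # The final window is shortened so it ends exactly at the clip's end.
        end = min(start + window_s, duration_s)
        windows.append((start, end))
        start = end
    return windows


print(split_into_windows(75))  # [(0.0, 30.0), (30.0, 60.0), (60.0, 75.0)]
```

Fixed boundaries can cut a word in half, so splitting at pauses would likely give better transcripts than this simple approach.
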
## Development Challenges & Solutions

### Challenge 1: Gradio 5.x/6.x Giant Audio Icons

**Problem:** The Audio component's SVG icons rendered extremely large, filling the entire screen, in Gradio versions 5.x and 6.x.

**Attempted fixes that didn't work:**

- Custom CSS targeting SVG elements
- Using `elem_classes` and `scale` parameters
- Various Gradio version downgrades

**Solution:** Removed custom CSS entirely and used clean Gradio components. The issue was related to Shadow DOM in newer Gradio versions blocking external CSS.

### Challenge 2: Gradio 4.x + Python 3.13 Incompatibility

**Problem:** Older Gradio versions (4.x) failed to build because `tokenizers` and `pyo3` did not support Python 3.13.

**Error:** `Python interpreter version (3.13) is newer than PyO3's maximum supported version (3.12)`

**Solution:** Used Gradio 6.x, which supports Python 3.13.
### Challenge 3: FastAPI + Gradio Mount Conflicts

**Problem:** Combining FastAPI API endpoints with Gradio UI caused "Invalid port" errors and infinite request loops.

**Error pattern:**

```
Invalid port: '7861_appimmutablechunksD2RdMstj.js'
GET /_app/immutable/chunks/D2RdMstj.js HTTP/1.1" 404 Not Found
```

**Root cause:** Using `demo.launch()` after `gr.mount_gradio_app()` created conflicting servers.

**Solution:**

1. Created separate `run.py` to handle uvicorn server
2. Used `gr.mount_gradio_app(api_app, demo, path="/")` without calling `demo.launch()`
3. Let uvicorn serve the combined FastAPI + Gradio app
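
The three steps above can be sketched as a single `run.py`. This is a minimal illustration, not the Space's actual file: the health payload, the placeholder UI, and the `build_app` and `main` names are hypothetical, and port 7860 is assumed because it is the usual default on Hugging Face Spaces.

```python
HOST = "0.0.0.0"
PORT = 7860  # assumed: the port Hugging Face Spaces normally expects


def build_app():
    """Return one ASGI app that serves both the REST routes and the Gradio UI."""
    # Imported inside the function so that merely importing this module
    # does not require the web stack to be installed.
    import gradio as gr
    from fastapi import FastAPI

    api_app = FastAPI()

    @api_app.get("/api/health")
    def health():
        # Placeholder payload; the real response schema is not shown in this README.
        return {"status": "ok"}

    with gr.Blocks() as demo:
        gr.Markdown("Audio Language Translator")  # stand-in for the real UI

    # Mount the UI on the FastAPI app. Note that demo.launch() is never called.
    return gr.mount_gradio_app(api_app, demo, path="/")


def main():
    import uvicorn

    # uvicorn is the only server; it serves the combined FastAPI + Gradio app.
    uvicorn.run(build_app(), host=HOST, port=PORT)
```

In an actual `run.py`, `main()` would be called under an `if __name__ == "__main__":` guard. The API routes are registered before the Gradio mount at `/` so that the catch-all UI mount does not shadow them.
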
### Challenge 4: HuggingFace Hub Compatibility

**Problem:** Older Gradio versions required older `huggingface_hub` versions, causing import errors.

**Error:** `ImportError: cannot import name 'HfFolder' from 'huggingface_hub'`

**Solution:** Removed version pins and let HuggingFace Spaces resolve compatible versions automatically.

### Key Takeaways

- **Version compatibility** is critical when combining multiple frameworks
- **Simpler is better**: avoid custom CSS when possible
- **Separate concerns**: use `run.py` for server logic, `app.py` for app definition
- **Test incrementally**: verify UI works before adding API complexity
## Author

**[Nav772](https://huggingface.co/Nav772)**: Built as part of an AI Engineering portfolio demonstrating multimodal AI capabilities and REST API development.

## Related Projects

- [LLM Evaluation Dashboard](https://huggingface.co/spaces/Nav772/llm-evaluation-dashboard)
- [RAG Document Q&A](https://huggingface.co/spaces/Nav772/rag-qa-document)
- [Movie Sentiment Analyzer](https://huggingface.co/spaces/Nav772/movie-sentiment-analyzer)

## License

MIT License