---
title: Audio Language Translator
emoji: ๐
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 6.11.0
app_file: run.py
pinned: false
license: mit
suggested_hardware: t4-small
---
# Audio Language Translator
Translate spoken audio between 15 languages with an end-to-end pipeline of speech recognition, machine translation, and speech synthesis.
## What This Does
1. **Upload or record** audio in any supported language
2. **Automatic detection** of the source language
3. **Translation** to your chosen target language
4. **Speech synthesis** in the target language with selectable voices
## REST API
This translator is also available as a REST API for developers!
**Interactive API Docs:** [https://nav772-audio-language-translator.hf.space/docs](https://nav772-audio-language-translator.hf.space/docs)
### API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/health` | GET | Health check and model status |
| `/api/languages` | GET | List all 15 supported languages |
| `/api/voices/{lang}` | GET | Get available TTS voices for a language |
| `/api/transcribe` | POST | Transcribe audio only (no translation) |
| `/api/translate` | POST | Full pipeline (returns JSON) |
| `/api/translate/audio` | POST | Full pipeline (returns audio file) |
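The GET endpoints in the table above can be called with nothing beyond the Python standard library. The sketch below is a minimal client, assuming the paths listed above and that `/api/voices/{lang}` accepts the same language codes as `target_language` (e.g. `es`). The helper names are hypothetical, and because the response schemas are not documented here, the decoded JSON is returned as plain dicts or lists.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base URL taken from the examples in this README.
BASE_URL = "https://nav772-audio-language-translator.hf.space"


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join a base URL and an endpoint path such as "/api/health"."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def voices_url(lang: str, base_url: str = BASE_URL) -> str:
    """URL for /api/voices/{lang}; the language code is percent-encoded."""
    return endpoint_url("/api/voices/" + quote(lang, safe=""), base_url)


def get_json(url: str, timeout: float = 30.0):
    """GET a JSON endpoint and return the decoded body (dict or list).

    No response schema is assumed, so callers inspect the keys themselves.
    """
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `get_json(endpoint_url("/api/health"), timeout=120)` reports whether the Space is up and its models are loaded, and `get_json(voices_url("es"))` lists the Spanish voices. The long timeout is there because a sleeping Space has to start before it can answer.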
### Quick Example (Python)
```python
import requests

# Translate audio to Spanish
with open("input.wav", "rb") as f:
    response = requests.post(
        "https://nav772-audio-language-translator.hf.space/api/translate",
        files={"file": f},
        params={"target_language": "es"},
        timeout=300,  # a sleeping Space can take a while to wake up
    )
response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page

result = response.json()
print(f"Original: {result['original_text']}")
print(f"Translated: {result['translated_text']}")
```
### Quick Example (cURL)
```bash
curl -X POST \
  "https://nav772-audio-language-translator.hf.space/api/translate?target_language=es" \
  -F "file=@input.wav"
```
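The `/api/translate/audio` endpoint returns an audio file instead of JSON. Without `requests`, the same upload as the cURL command can be sketched with the standard library: `build_multipart` encodes the `file` field the way `curl -F "file=@input.wav"` does, and `translate_to_audio` (a hypothetical helper) writes the response bytes to disk. The output audio format is not stated in this README, so the function returns the response's `Content-Type` to help choose a file extension.

```python
import os
import uuid
from typing import Optional, Tuple
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://nav772-audio-language-translator.hf.space"


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "audio/wav",
                    boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode a single file upload as multipart/form-data (like curl -F)."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def translate_to_audio(in_path: str, out_path: str, target_language: str = "es",
                       timeout: float = 300.0) -> str:
    """POST a WAV file to /api/translate/audio and save the audio it returns.

    Hypothetical helper: returns the response Content-Type because the
    output audio format is not documented in the README.
    """
    with open(in_path, "rb") as f:
        body, content_type = build_multipart("file", os.path.basename(in_path), f.read())
    query = urlencode({"target_language": target_language})
    request = Request(f"{BASE_URL}/api/translate/audio?{query}", data=body,
                      headers={"Content-Type": content_type}, method="POST")
    with urlopen(request, timeout=timeout) as response:
        media_type = response.headers.get("Content-Type", "")
        audio = response.read()
    with open(out_path, "wb") as out:
        out.write(audio)
    return media_type
```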
## 🛠️ Built With This API
| Project | Developer | Description |
|---------|-----------|-------------|
| [Audio Translator App](https://github.com/kaunghtetsan1101/audio_translator) | [@kaunghtetsan11](https://huggingface.co/kaunghtetsan11) | Mobile app built using this API |
*Want your project featured here? Open a discussion or PR!*
## Architecture
```
Audio Input (any language)
↓
Whisper ASR (transcription + language detection)
↓
NLLB Translation (to target language)
↓
Edge-TTS (neural speech synthesis)
↓
Audio Output + Text Display
```
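The diagram above is a straight composition of three stages. The sketch below expresses it with each stage passed in as a function, so the glue logic can be exercised with stubs. The stage signatures and `run_pipeline` are assumptions for illustration, not the app's actual code, which wires in Whisper, NLLB, and Edge-TTS.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical stage signatures standing in for Whisper, NLLB and Edge-TTS.
Transcriber = Callable[[bytes], Tuple[str, str]]   # audio -> (text, detected language)
Translator = Callable[[str, str, str], str]        # (text, source, target) -> text
Synthesizer = Callable[[str, str], bytes]          # (text, voice) -> audio bytes


@dataclass
class PipelineResult:
    source_language: str
    original_text: str
    translated_text: str
    audio: bytes


def run_pipeline(audio: bytes, target_language: str, voice: str,
                 transcribe: Transcriber, translate: Translator,
                 synthesize: Synthesizer) -> PipelineResult:
    """Audio in -> ASR -> translation -> TTS -> audio plus text out."""
    # ASR stage: transcription plus automatic language detection.
    original_text, source_language = transcribe(audio)
    # Translation stage: detected source language -> requested target.
    translated_text = translate(original_text, source_language, target_language)
    # TTS stage: speak the translation with the selected voice.
    speech = synthesize(translated_text, voice)
    return PipelineResult(source_language, original_text, translated_text, speech)
```

Injecting the stages keeps the orchestration independent of the heavy models, so it can be unit-tested without a GPU or network access.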
## Technical Stack
| Component | Model / Framework | Parameters | Purpose |
|-----------|-------|------------|---------|
| **ASR** | openai/whisper-small | 244M | Speech recognition with automatic language detection |
| **Translation** | facebook/nllb-200-distilled-600M | 615M | Multilingual neural machine translation |
| **TTS** | Microsoft Edge-TTS | N/A (cloud service) | High-quality neural text-to-speech |
| **API** | FastAPI | - | REST API endpoints |
| **UI** | Gradio | - | Interactive web interface |
## Supported Languages
### Tier 1: Multiple Voice Options (3 each)
- 🇺🇸 English (US/UK accents)
- 🇪🇸 Spanish (Spain/Mexico)
- 🇫🇷 French (France/Canada)
- 🇩🇪 German (Germany/Austria)
- 🇨🇳 Chinese (Mandarin)
### Tier 2: Single High-Quality Voice
- 🇸🇦 Arabic, 🇮🇳 Hindi, 🇯🇵 Japanese, 🇰🇷 Korean, 🇧🇷 Portuguese
- 🇷🇺 Russian, 🇮🇹 Italian, 🇳🇱 Dutch, 🇵🇱 Polish, 🇹🇷 Turkish
**Total: 15 languages, 25 voices**
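Whisper reports the detected language as a short code such as `es`, while NLLB identifies languages by FLORES-200 codes such as `spa_Latn`, so a pipeline like this one needs a mapping between the two. A minimal sketch for the 15 languages above follows. The table is an assumption rather than the app's own; in particular, Chinese is mapped to Simplified script (`zho_Hans`) and Arabic to Modern Standard Arabic (`arb_Arab`). The voice counts mirror the tiers listed above.

```python
# Assumed mapping: short language code (as in target_language=es) ->
# (English name, NLLB / FLORES-200 code, number of TTS voices per the tiers above).
LANGUAGES = {
    "en": ("English", "eng_Latn", 3),
    "es": ("Spanish", "spa_Latn", 3),
    "fr": ("French", "fra_Latn", 3),
    "de": ("German", "deu_Latn", 3),
    "zh": ("Chinese", "zho_Hans", 3),   # Simplified script is an assumption
    "ar": ("Arabic", "arb_Arab", 1),    # Modern Standard Arabic is an assumption
    "hi": ("Hindi", "hin_Deva", 1),
    "ja": ("Japanese", "jpn_Jpan", 1),
    "ko": ("Korean", "kor_Hang", 1),
    "pt": ("Portuguese", "por_Latn", 1),
    "ru": ("Russian", "rus_Cyrl", 1),
    "it": ("Italian", "ita_Latn", 1),
    "nl": ("Dutch", "nld_Latn", 1),
    "pl": ("Polish", "pol_Latn", 1),
    "tr": ("Turkish", "tur_Latn", 1),
}


def nllb_code(iso_code: str) -> str:
    """Map a short code such as "es" to the NLLB code such as "spa_Latn"."""
    try:
        return LANGUAGES[iso_code.lower()][1]
    except KeyError:
        raise ValueError(f"unsupported language: {iso_code!r}") from None
```

With Hugging Face `transformers`, the target code is typically converted with `tokenizer.convert_tokens_to_ids(nllb_code("es"))` and passed to `model.generate` as `forced_bos_token_id`, while the source code goes to the tokenizer's `src_lang` argument.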
## ๐ Research Foundation
| Paper | Authors | Year | Contribution |
|-------|---------|------|--------------|
| [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356) | Radford et al. | 2022 | Whisper ASR model |
| [No Language Left Behind](https://arxiv.org/abs/2207.04672) | Costa-jussà et al. | 2022 | NLLB translation model |
## Limitations
- Audio length: Optimized for clips under 30 seconds
- Internet required: Edge-TTS synthesizes speech through Microsoft's online TTS service
- GPU recommended: CPU inference is significantly slower
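Whisper processes audio in 30-second windows, which is why short clips work best. One client-side workaround, sketched below under the assumption that the input is an uncompressed WAV file, is to check the duration and split longer recordings into chunks of at most 30 seconds, uploading each one separately. `wav_duration` and `split_wav` are hypothetical helpers built on Python's standard `wave` module. Fixed-length cuts can split a word in half, and each chunk is translated without the context of its neighbours.

```python
import io
import wave
from typing import List


def wav_duration(data: bytes) -> float:
    """Duration of an in-memory WAV file, in seconds."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return w.getnframes() / w.getframerate()


def split_wav(data: bytes, max_seconds: float = 30.0) -> List[bytes]:
    """Split a WAV file into consecutive chunks of at most max_seconds each."""
    chunks: List[bytes] = []
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(max_seconds * src.getframerate())
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setparams(params)     # header frame count is patched on close
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```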
## ⚠️ Development Challenges & Solutions
### Challenge 1: Gradio 5.x/6.x Giant Audio Icons
**Problem:** Audio component SVG icons rendered extremely large (filling the entire screen) in Gradio 5.x and 6.x.
**Attempted fixes that didn't work:**
- Custom CSS targeting SVG elements
- Using `elem_classes` and `scale` parameters
- Various Gradio version downgrades
**Solution:** Removed all custom CSS and used the default Gradio components. The issue was related to Shadow DOM in newer Gradio versions blocking external CSS.
### Challenge 2: Gradio 4.x + Python 3.13 Incompatibility
**Problem:** Older Gradio versions (4.x) failed to build because `tokenizers` and its `pyo3` bindings did not yet support Python 3.13.
**Error:** `Python interpreter version (3.13) is newer than PyO3's maximum supported version (3.12)`
**Solution:** Used Gradio 6.x, which supports Python 3.13 natively.
### Challenge 3: FastAPI + Gradio Mount Conflicts
**Problem:** Combining FastAPI API endpoints with Gradio UI caused "Invalid port" errors and infinite request loops.
**Error pattern:**
```
Invalid port: '7861_appimmutablechunksD2RdMstj.js'
GET /_app/immutable/chunks/D2RdMstj.js HTTP/1.1" 404 Not Found
```
**Root cause:** Using `demo.launch()` after `gr.mount_gradio_app()` created conflicting servers.
**Solution:**
1. Created separate `run.py` to handle uvicorn server
2. Used `gr.mount_gradio_app(api_app, demo, path="/")` without calling `demo.launch()`
3. Let uvicorn serve the combined FastAPI + Gradio app
### Challenge 4: Hugging Face Hub Compatibility
**Problem:** Older Gradio versions required older `huggingface_hub` versions, causing import errors.
**Error:** `ImportError: cannot import name 'HfFolder' from 'huggingface_hub'`
**Solution:** Removed version pins and let Hugging Face Spaces resolve compatible versions automatically.
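Without pins, the installed versions are whatever the Spaces builder resolves, so it helps to log them at startup. A minimal sketch using the standard `importlib.metadata` module follows; the package list is an assumption based on the stack described in this README.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distributions whose versions interact in this Space (assumed list).
DEFAULT_PACKAGES = ("gradio", "fastapi", "uvicorn", "huggingface_hub",
                    "transformers", "tokenizers", "torch", "edge-tts")


def installed_versions(packages: Iterable[str] = DEFAULT_PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if missing)."""
    versions: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Printing `installed_versions()` once at startup puts the resolved versions in the Space's logs, which makes failures such as the `HfFolder` import error quicker to trace.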
### Key Takeaways
- **Version compatibility** is critical when combining multiple frameworks
- **Simpler is better**: avoid custom CSS when possible
- **Separate concerns**: use `run.py` for server logic, `app.py` for app definition
- **Test incrementally**: verify the UI works before adding API complexity
## Author
**[Nav772](https://huggingface.co/Nav772)**: Built as part of an AI Engineering portfolio demonstrating multimodal AI capabilities and REST API development.
## Related Projects
- [LLM Evaluation Dashboard](https://huggingface.co/spaces/Nav772/llm-evaluation-dashboard)
- [RAG Document Q&A](https://huggingface.co/spaces/Nav772/rag-qa-document)
- [Movie Sentiment Analyzer](https://huggingface.co/spaces/Nav772/movie-sentiment-analyzer)
## License
MIT License