	---
	title: Audio Language Translator
	emoji: 🌍
	colorFrom: red
	colorTo: yellow
	sdk: gradio
	sdk_version: 6.5.1
	app_file: run.py
	pinned: false
	license: mit
	suggested_hardware: t4-small
	---

	# 🌍 Audio Language Translator

	Translate spoken audio between 15 languages using a pipeline of speech recognition, machine translation, and speech synthesis.

	## 🎯 What This Does

	1. Accepts uploaded or recorded audio in any supported language
	2. Transcribes the speech and detects the source language automatically
	3. Translates the text into your chosen target language
	4. Synthesizes speech in the target language with selectable voices

	## 🔌 REST API

	This translator is also available as a REST API for developers!

	📚 Interactive API Docs: [https://nav772-audio-language-translator.hf.space/docs](https://nav772-audio-language-translator.hf.space/docs)

	### API Endpoints

	| Endpoint | Method | Description |
	|----------|--------|-------------|
	| `/api/health` | GET | Health check and model status |
	| `/api/languages` | GET | List all 15 supported languages |
	| `/api/voices/{lang}` | GET | Get available TTS voices for a language |
	| `/api/transcribe` | POST | Transcribe audio only (no translation) |
	| `/api/translate` | POST | Full pipeline (returns JSON) |
	| `/api/translate/audio` | POST | Full pipeline (returns audio file) |
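
The read-only endpoints can be queried with nothing but the Python standard library. The sketch below assumes each GET endpoint returns JSON; the response shapes are not documented in this README, so it simply returns whatever comes back. `api_url` and `get_json` are illustrative helper names, not part of the API.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://nav772-audio-language-translator.hf.space"


def api_url(path: str, **params: str) -> str:
    """Join the Space's base URL with an endpoint path and optional query parameters."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def get_json(path: str, timeout: float = 30.0):
    """GET an endpoint and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(api_url(path), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `get_json("/api/health")`, `get_json("/api/languages")`, and `get_json("/api/voices/es")` cover the three GET endpoints in the table.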

	### Quick Example (Python)
	```python
	import requests

	# Translate audio to Spanish
	with open("input.wav", "rb") as f:
	    response = requests.post(
	        "https://nav772-audio-language-translator.hf.space/api/translate",
	        files={"file": f},  # multipart upload of the audio file
	        params={"target_language": "es"},  # sent as a query parameter
	    )

	# Fail loudly on HTTP errors instead of parsing an error body as a result
	response.raise_for_status()
	result = response.json()
	print(f"Original: {result['original_text']}")
	print(f"Translated: {result['translated_text']}")
	```

	### Quick Example (cURL)
	```bash
	curl -X POST \
	  "https://nav772-audio-language-translator.hf.space/api/translate?target_language=es" \
	  -F "file=@input.wav"
	```
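
For `/api/translate/audio`, the response body is the synthesized audio rather than JSON, so it should be written to disk as bytes. This is a minimal sketch, assuming the endpoint accepts the same `file` upload and `target_language` query parameter as `/api/translate`, and that the `Content-Type` header identifies the audio format. The helper names and the extension table are illustrative; the interactive docs are the place to confirm the actual contract.

```python
from pathlib import Path

API_URL = "https://nav772-audio-language-translator.hf.space/api/translate/audio"

# Assumed mapping from common audio MIME types to file extensions.
AUDIO_EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/wav": ".wav",
    "audio/x-wav": ".wav",
    "audio/ogg": ".ogg",
}


def extension_for(content_type: str, default: str = ".bin") -> str:
    """Pick a file extension from a Content-Type header value."""
    mime = content_type.split(";", 1)[0].strip().lower()
    return AUDIO_EXTENSIONS.get(mime, default)


def translate_to_audio(input_path: str, target_language: str,
                       out_stem: str = "translated") -> Path:
    """Upload an audio file and save the translated speech to disk."""
    import requests  # third-party; imported here so extension_for works without it

    with open(input_path, "rb") as f:
        response = requests.post(
            API_URL,
            files={"file": f},
            params={"target_language": target_language},  # assumed parameter
            timeout=120,
        )
    response.raise_for_status()

    suffix = extension_for(response.headers.get("Content-Type", ""))
    out_path = Path(out_stem + suffix)
    out_path.write_bytes(response.content)
    return out_path
```

Calling `translate_to_audio("input.wav", "es")` would then return the path of the saved file, with a `.bin` suffix if the format is not recognized.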

	## 🛠️ Built With This API

	| Project | Developer | Description |
	|---------|-----------|-------------|
	| [Audio Translator App](https://github.com/kaunghtetsan1101/audio_translator) | [@kaunghtetsan11](https://huggingface.co/kaunghtetsan11) | Mobile app built using this API |

	Want your project featured here? Open a discussion or PR!

	## 🏗️ Architecture
	```
	Audio Input (any language)
	↓
	Whisper ASR (transcription + language detection)
	↓
	NLLB Translation (to target language)
	↓
	Edge-TTS (neural speech synthesis)
	↓
	Audio Output + Text Display
	```
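
One detail the diagram glosses over is that the stages name languages differently: Whisper reports short codes such as `es`, while NLLB checkpoints use FLORES-200 codes such as `spa_Latn`. The sketch below shows how the translation and synthesis stages could be wired together with such a mapping. It is not the Space's actual code: the function names and the mapping table are assumptions, and the `transformers` and `edge-tts` packages must be installed before the two model-backed functions are called.

```python
import asyncio

# Short (Whisper-style) code -> NLLB FLORES-200 code for the 15 supported languages.
NLLB_CODES = {
    "en": "eng_Latn", "es": "spa_Latn", "fr": "fra_Latn", "de": "deu_Latn",
    "zh": "zho_Hans", "ar": "arb_Arab", "hi": "hin_Deva", "ja": "jpn_Jpan",
    "ko": "kor_Hang", "pt": "por_Latn", "ru": "rus_Cyrl", "it": "ita_Latn",
    "nl": "nld_Latn", "pl": "pol_Latn", "tr": "tur_Latn",
}


def to_nllb(code: str) -> str:
    """Convert a short language code into the code NLLB expects."""
    try:
        return NLLB_CODES[code.lower()]
    except KeyError:
        raise ValueError(f"Unsupported language code: {code!r}") from None


def translate_text(text: str, source: str, target: str) -> str:
    """Translate text with NLLB (downloads the model on first use)."""
    from transformers import pipeline  # heavy import, deferred until needed

    translator = pipeline(
        "translation",
        model="facebook/nllb-200-distilled-600M",
        src_lang=to_nllb(source),
        tgt_lang=to_nllb(target),
    )
    return translator(text)[0]["translation_text"]


def synthesize(text: str, voice: str, out_path: str) -> None:
    """Write synthesized speech to out_path using Edge-TTS (needs internet)."""
    import edge_tts  # third-party: pip install edge-tts

    asyncio.run(edge_tts.Communicate(text, voice).save(out_path))
```

For instance, `synthesize(translate_text("Hola, mundo", "es", "en"), "en-US-AriaNeural", "out.mp3")` would chain the two stages, assuming that voice name is available to Edge-TTS.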

	## 🔧 Technical Stack

	| Component | Model | Parameters | Purpose |
	|-----------|-------|------------|---------|
	| ASR | openai/whisper-small | 244M | Speech recognition with automatic language detection |
	| Translation | facebook/nllb-200-distilled-600M | 615M | Multilingual neural machine translation |
	| TTS | Microsoft Edge-TTS | API | High-quality neural text-to-speech |
	| API | FastAPI | - | REST API endpoints |
	| UI | Gradio | - | Interactive web interface |

	## 🌐 Supported Languages

	### Tier 1: Multiple Voice Options (3 each)
	- 🇺🇸 English (US/UK accents)
	- 🇪🇸 Spanish (Spain/Mexico)
	- 🇫🇷 French (France/Canada)
	- 🇩🇪 German (Germany/Austria)
	- 🇨🇳 Chinese (Mandarin)

	### Tier 2: Single High-Quality Voice
	- 🇸🇦 Arabic, 🇮🇳 Hindi, 🇯🇵 Japanese, 🇰🇷 Korean, 🇧🇷 Portuguese
	- 🇷🇺 Russian, 🇮🇹 Italian, 🇳🇱 Dutch, 🇵🇱 Polish, 🇹🇷 Turkish

	Total: 15 languages, 25 voices

	## 📚 Research Foundation

	| Paper | Authors | Year | Contribution |
	|-------|---------|------|--------------|
	| [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356) | Radford et al. | 2022 | Whisper ASR model |
	| [No Language Left Behind](https://arxiv.org/abs/2207.04672) | Costa-jussà et al. | 2022 | NLLB translation model |

	## 📝 Limitations

	- Audio length: Optimized for clips under 30 seconds
	- Internet required: Edge-TTS requires connectivity
	- GPU recommended: CPU inference is significantly slower
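
Since the pipeline is tuned for short clips, it can be worth checking a recording's length before uploading it. This is a minimal client-side sketch for uncompressed WAV files using only the standard library. The 30-second threshold comes from the limitation above; the function names are illustrative.

```python
import wave

MAX_SECONDS = 30.0  # recommended clip length from the limitations list


def wav_duration(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_short_enough(path: str, limit: float = MAX_SECONDS) -> bool:
    """True if the clip is under the recommended length."""
    return wav_duration(path) < limit
```

Longer recordings could be split or trimmed on the client before being sent; how the server handles clips over the limit is not specified here.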

	## ⚠️ Development Challenges & Solutions

	### Challenge 1: Gradio 5.x/6.x Giant Audio Icons
	Problem: The Audio component's SVG icons rendered at an enormous size (filling the entire screen) in Gradio versions 5.x and 6.x.

	Attempted fixes that didn't work:
	- Custom CSS targeting SVG elements
	- Using `elem_classes` and `scale` parameters
	- Various Gradio version downgrades

	Solution: Removed the custom CSS entirely and used Gradio components with their default styling. The issue was related to Shadow DOM in newer Gradio versions blocking external CSS.

	### Challenge 2: Gradio 4.x + Python 3.13 Incompatibility
	Problem: Older Gradio versions (4.x) failed to build due to `tokenizers` and `pyo3` not supporting Python 3.13.

	Error: `Python interpreter version (3.13) is newer than PyO3's maximum supported version (3.12)`

	Solution: Used Gradio 6.x, which supports Python 3.13.

	### Challenge 3: FastAPI + Gradio Mount Conflicts
	Problem: Combining FastAPI endpoints with the Gradio UI caused "Invalid port" errors and infinite request loops.

	Error pattern:
	```
	Invalid port: '7861_appimmutablechunksD2RdMstj.js'
	GET /_app/immutable/chunks/D2RdMstj.js HTTP/1.1" 404 Not Found
	```

	Root cause: Using `demo.launch()` after `gr.mount_gradio_app()` created conflicting servers.

	Solution:
	1. Created a separate `run.py` that starts the uvicorn server
	2. Used `gr.mount_gradio_app(api_app, demo, path="/")` without calling `demo.launch()`
	3. Let uvicorn serve the combined FastAPI + Gradio app as the only server process
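
The three steps above can be sketched roughly as follows. This illustrates the pattern under assumptions and is not the Space's actual files: the `build_app` factory, the placeholder health payload, and the placeholder UI are made up for the example, and only the `gr.mount_gradio_app(api_app, demo, path="/")` call is taken from the text. Port 7860 is assumed as the usual Spaces default.

```python
def build_app():
    """Create the FastAPI app, register API routes, then mount the Gradio UI."""
    # Third-party imports are kept local so nothing heavy loads until the app is built.
    import gradio as gr
    from fastapi import FastAPI

    api_app = FastAPI(title="Audio Language Translator API")

    # Register API routes BEFORE mounting Gradio at "/", so the catch-all
    # mount does not shadow them.
    @api_app.get("/api/health")
    def health():
        return {"status": "ok"}  # placeholder payload

    with gr.Blocks() as demo:
        gr.Markdown("# Audio Language Translator")  # placeholder UI

    # No demo.launch() anywhere: uvicorn is the only server.
    return gr.mount_gradio_app(api_app, demo, path="/")


def main() -> None:
    """Entry point for run.py: serve the combined app with uvicorn."""
    import uvicorn

    uvicorn.run(build_app(), host="0.0.0.0", port=7860)
```

Calling `main()` at the bottom of `run.py` (the `app_file` named in the front matter) would start the single combined server.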

	### Challenge 4: HuggingFace Hub Compatibility
	Problem: Older Gradio versions required older `huggingface_hub` versions, causing import errors.

	Error: `ImportError: cannot import name 'HfFolder' from 'huggingface_hub'`

	Solution: Removed version pins and let HuggingFace Spaces resolve compatible versions automatically.
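
With pins removed, it helps to log which versions were actually resolved, so that a later breakage can be traced to a specific upgrade. This is a small sketch using the standard library's `importlib.metadata`; the package list is an assumption based on the stack described in this README.

```python
from importlib import metadata

# Assumed package list, based on the components named in this README.
PACKAGES = ["gradio", "huggingface-hub", "transformers", "tokenizers", "fastapi", "uvicorn"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def report(names=PACKAGES) -> str:
    """Format the resolved versions as one line per package."""
    lines = []
    for name, version in installed_versions(names).items():
        lines.append(f"{name}=={version}" if version else f"{name} (not installed)")
    return "\n".join(lines)
```

Printing `report()` at startup leaves a record of the resolved environment in the Space's logs.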

	### Key Takeaways
	- Version compatibility is critical when combining multiple frameworks
	- Simpler is better — avoid custom CSS when possible
	- Separate concerns — use `run.py` for server logic, `app.py` for app definition
	- Test incrementally — verify UI works before adding API complexity

	## 👤 Author

	[Nav772](https://huggingface.co/Nav772) — Built as part of an AI Engineering portfolio demonstrating multimodal AI capabilities and REST API development.

	## 📚 Related Projects

	- [LLM Evaluation Dashboard](https://huggingface.co/spaces/Nav772/llm-evaluation-dashboard)
	- [RAG Document Q&A](https://huggingface.co/spaces/Nav772/rag-qa-document)
	- [Movie Sentiment Analyzer](https://huggingface.co/spaces/Nav772/movie-sentiment-analyzer)

	## 📄 License

	MIT License