---
title: IndicConformer STT API
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---

# IndicConformer Speech-to-Text API 🎙️

Fast and accurate Speech-to-Text API for 22 Indian languages, powered by AI4Bharat's IndicConformer model.

## 🌟 Features

- 22 Indian Languages Supported: Hindi, Telugu, Bengali, Tamil, and 18 more
- Long Audio Support: up to 30 minutes of audio per request
- Parallel Processing: long audio is split into chunks that are transcribed in parallel
- Multiple Formats: WAV, MP3, FLAC, and M4A
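
A client can check the four formats listed above before uploading, so an unsupported file fails fast instead of after a network round trip. This is a minimal sketch that looks only at the file extension; `SUPPORTED_EXTENSIONS` and `is_supported_format` are hypothetical names, not part of the API, and the server presumably decides by decoding the content, so a mislabelled file can still be rejected there.

```python
from pathlib import Path

# Hypothetical client-side list mirroring the formats above;
# the server remains the final judge of what it can decode
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a"}


def is_supported_format(path: str) -> bool:
    """Return True if the file extension (case-insensitive) is a supported format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["meeting.WAV", "clip.m4a", "notes.ogg"]:
    print(name, is_supported_format(name))
```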

## 🚀 Quick Start

### API Endpoints

- Base URL: your Space URL (the examples below use the placeholder `https://your-space-url.hf.space`)
- Documentation: `/docs` (interactive Swagger UI)
- Transcribe: `POST /transcribe`
- Health Check: `GET /health`
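
Before sending a long file, a client can poll `GET /health` to confirm the Space has finished starting up. The response body of `/health` is not documented here, so this standard-library sketch assumes only that a healthy server answers with HTTP 200; `wait_until_healthy` and its retry parameters are hypothetical.

```python
import time
import urllib.request


def wait_until_healthy(base_url: str, attempts: int = 10, delay: float = 3.0) -> bool:
    """Poll GET {base_url}/health until it answers HTTP 200 or attempts run out.

    Assumption: a healthy server returns status 200; the response body is ignored.
    """
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError (e.g. 503 while the Space boots) and socket
            # timeouts are all OSError subclasses: treat them as "not ready yet"
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, `wait_until_healthy("https://your-space-url.hf.space")` returns `True` once the Space responds, or `False` after about 30 seconds of failed attempts with the default settings.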

### Example Usage

#### Using cURL

```bash
curl -X POST "https://your-space-url.hf.space/transcribe" \
  -F "file=@audio.wav" \
  -F "language=hi"
```

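#### Using Python

```python
import requests

url = "https://your-space-url.hf.space/transcribe"

# Use a context manager so the file handle is closed after the upload
with open("audio.wav", "rb") as f:
    files = {"file": f}
    data = {"language": "hi"}  # any code from the Supported Languages table
    # Long audio can take a while; adjust the timeout (seconds) as needed
    response = requests.post(url, files=files, data=data, timeout=300)

response.raise_for_status()  # raise an exception on HTTP 4xx/5xx errors
print(response.json())
```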

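#### Using JavaScript

```javascript
// `audioFile` is a File or Blob, e.g. from an <input type="file"> element.
// Run inside an async function or an ES module (top-level await).
const formData = new FormData();
formData.append('file', audioFile);
formData.append('language', 'hi');

const response = await fetch('https://your-space-url.hf.space/transcribe', {
  method: 'POST',
  body: formData
});

if (!response.ok) {
  throw new Error(`Transcription failed: HTTP ${response.status}`);
}

const result = await response.json();
console.log(result.transcription);
```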

## 🗣️ Supported Languages

| Code | Language | Code | Language |
|------|----------|------|----------|
| `hi` | Hindi | `te` | Telugu |
| `bn` | Bengali | `ta` | Tamil |
| `mr` | Marathi | `gu` | Gujarati |
| `kn` | Kannada | `ml` | Malayalam |
| `pa` | Punjabi | `or` | Odia |
| `as` | Assamese | `ur` | Urdu |
| `ne` | Nepali | `kok` | Konkani |
| `sd` | Sindhi | `doi` | Dogri |
| `brx` | Bodo | `mai` | Maithili |
| `mni` | Manipuri | `ks` | Kashmiri |
| `sa` | Sanskrit | `sat` | Santali |
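
A client can reject an unknown language code before uploading any audio. The mapping below simply mirrors the table above; `LANGUAGES` and `validate_language` are hypothetical helper names, and whether the server itself rejects other codes (and with which status code) is not documented here.

```python
# Client-side copy of the language table above; keep it in sync with the API
LANGUAGES = {
    "hi": "Hindi", "te": "Telugu", "bn": "Bengali", "ta": "Tamil",
    "mr": "Marathi", "gu": "Gujarati", "kn": "Kannada", "ml": "Malayalam",
    "pa": "Punjabi", "or": "Odia", "as": "Assamese", "ur": "Urdu",
    "ne": "Nepali", "kok": "Konkani", "sd": "Sindhi", "doi": "Dogri",
    "brx": "Bodo", "mai": "Maithili", "mni": "Manipuri", "ks": "Kashmiri",
    "sa": "Sanskrit", "sat": "Santali",
}


def validate_language(code: str) -> str:
    """Normalize a language code and raise ValueError if it is not in the table."""
    normalized = code.strip().lower()
    if normalized not in LANGUAGES:
        raise ValueError(
            f"Unsupported language code {code!r}; expected one of {sorted(LANGUAGES)}"
        )
    return normalized
```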

## 📊 Response Format

```json
{
  "success": true,
  "transcription": "आपका टेक्स्ट यहां",
  "metadata": {
    "audio_duration": 45.2,
    "audio_duration_minutes": 0.75,
    "inference_time": 2.1543,
    "rtf": 0.0476,
    "language": "hi",
    "decoder": "rnnt",
    "num_chunks": 2
  }
}
```
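
The `rtf` field is the real-time factor, conventionally processing time divided by audio duration, so values below 1 mean faster than real time. For the sample response, 2.1543 s / 45.2 s is roughly 0.048, in line with the reported `rtf`. The sketch below reads the documented `metadata` fields and derives the speed-up; `summarize` is a hypothetical helper, and the assumption that the API computes `rtf` exactly this way is ours.

```python
def summarize(body: dict) -> str:
    """Build a one-line summary from a /transcribe response body."""
    if not body.get("success"):
        # The error payload is not documented; surface it as-is
        raise RuntimeError(f"Transcription failed: {body}")
    meta = body["metadata"]
    # Assumed definition: RTF = inference time / audio duration
    rtf = meta["inference_time"] / meta["audio_duration"]
    speedup = 1 / rtf  # how many times faster than real time
    return (
        f"[{meta['language']}/{meta['decoder']}] "
        f"{meta['audio_duration']:.1f}s of audio in {meta['inference_time']:.2f}s "
        f"(~{speedup:.0f}x real time, {meta['num_chunks']} chunks)"
    )


# Values copied from the sample response above
SAMPLE_BODY = {
    "success": True,
    "transcription": "आपका टेक्स्ट यहां",
    "metadata": {
        "audio_duration": 45.2,
        "inference_time": 2.1543,
        "rtf": 0.0476,
        "language": "hi",
        "decoder": "rnnt",
        "num_chunks": 2,
    },
}

print(summarize(SAMPLE_BODY))  # → [hi/rnnt] 45.2s of audio in 2.15s (~21x real time, 2 chunks)
```

With a live API, pass `response.json()` instead of the sample dictionary.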

## ⚡ Performance

- Real-Time Factor (RTF): ~0.05 on GPU, i.e. about 20x faster than real time
- Max Audio Length: 30 minutes
- Chunk Processing: 30 s chunks with a 2 s overlap between neighbouring chunks, to reduce errors on words that fall on a chunk boundary
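
The chunking described above can be sketched as a sliding window in which each chunk starts 2 s before the previous one ended, so consecutive chunks share 2 s of audio. This illustrates that assumption rather than reproducing the Space's actual code; `chunk_boundaries` is a hypothetical name, and merging the overlapping transcripts afterwards is not covered. For the 45.2 s sample response it yields two chunks, matching `num_chunks: 2`.

```python
def chunk_boundaries(duration: float, chunk: float = 30.0, overlap: float = 2.0):
    """Split [0, duration] seconds into windows of `chunk` s overlapping by `overlap` s.

    Returns a list of (start, end) tuples; the last window may be shorter.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be in [0, chunk)")
    bounds = []
    start = 0.0
    while True:
        end = min(start + chunk, duration)
        bounds.append((start, end))
        if end >= duration:
            break
        start = end - overlap  # step back so adjacent chunks share `overlap` seconds
    return bounds


print(chunk_boundaries(45.2))          # → [(0.0, 30.0), (28.0, 45.2)]
print(len(chunk_boundaries(30 * 60)))  # → 65 chunks for the 30-minute maximum
```

Each window advances by `chunk - overlap` = 28 s, which is why the overlap must be smaller than the chunk length for the loop to terminate.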

## 🛠️ Model Information

- Model: [ai4bharat/indic-conformer-600m-multilingual](https://huggingface.co/ai4bharat/indic-conformer-600m-multilingual)
- Decoder: RNNT (Recurrent Neural Network Transducer)
- Architecture: Conformer (600M parameters)

## 📝 Notes

- Audio files are automatically resampled to 16 kHz mono
- Longer audio files are split into chunks for parallel processing
- GPU acceleration is used automatically when available
- Maximum audio duration is 30 minutes per request
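
Since a request may not exceed 30 minutes of audio, a client can check a file's length locally before uploading. This sketch uses Python's standard `wave` module, which reads only uncompressed PCM WAV headers, so MP3, FLAC, and M4A files would need a separate tool (for example ffprobe); `MAX_SECONDS`, `wav_duration_seconds`, and `check_duration` are hypothetical names.

```python
import wave

MAX_SECONDS = 30 * 60  # 30-minute per-request limit from the notes above


def wav_duration_seconds(path_or_file) -> float:
    """Return the duration of a PCM WAV file in seconds (reads the header only)."""
    with wave.open(path_or_file, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_duration(path_or_file) -> float:
    """Return the WAV duration, or raise ValueError if it exceeds the limit."""
    seconds = wav_duration_seconds(path_or_file)
    if seconds > MAX_SECONDS:
        raise ValueError(
            f"Audio is {seconds / 60:.1f} min; the limit is {MAX_SECONDS // 60} min"
        )
    return seconds
```

`wave.open` accepts either a path or a binary file object, so the same helpers work on a file on disk or on an in-memory upload buffer.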

## 🤝 Credits

Built with:
- [AI4Bharat IndicConformer](https://ai4bharat.iitm.ac.in/)
- [Hugging Face Transformers](https://huggingface.co/transformers)
- [FastAPI](https://fastapi.tiangolo.com/)

## 📄 License

MIT License