---
title: IndicConformer STT API
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---
# IndicConformer Speech-to-Text API 🎙️
A fast, accurate speech-to-text API for 22 Indian languages, powered by AI4Bharat's IndicConformer model.
## 🌟 Features
- **22 Indian Languages Supported**: Hindi, Telugu, Bengali, Tamil, and 18 more
- **Long Audio Support**: Process up to 30 minutes of audio
- **Parallel Processing**: Fast transcription with chunked inference
- **Multiple Formats**: Supports WAV, MP3, FLAC, M4A
## 🚀 Quick Start
### API Endpoints
- **Base URL**: your Space URL (the examples below use the placeholder `https://your-space-url.hf.space`)
- **Documentation**: `/docs` (Interactive Swagger UI)
- **Transcribe**: `POST /transcribe`
- **Health Check**: `GET /health`
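
Before uploading audio, it can help to confirm that the Space is reachable. The sketch below builds endpoint URLs from the base URL and polls `GET /health` using only the standard library. The helper names `endpoint_url` and `is_healthy` are hypothetical, and treating HTTP 200 as "healthy" is an assumption, since the body returned by `/health` is not documented here.

```python
import urllib.request
from urllib.parse import urljoin


def endpoint_url(base_url: str, path: str) -> str:
    """Join the Space base URL and an endpoint path with exactly one slash."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def is_healthy(base_url: str, timeout: float = 10.0,
               opener=urllib.request.urlopen) -> bool:
    """Return True if GET /health answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with opener(endpoint_url(base_url, "/health"), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and timeouts are OSError subclasses
        return False


# Usage (placeholder URL from this README; replace with your own Space URL):
# is_healthy("https://your-space-url.hf.space")
```

The `opener` parameter exists only so the check can be exercised without network access.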
### Example Usage
#### Using cURL
```bash
curl -X POST "https://your-space-url.hf.space/transcribe" \
  -F "file=@audio.wav" \
  -F "language=hi"
```
#### Using Python
```python
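import requests

url = "https://your-space-url.hf.space/transcribe"
data = {"language": "hi"}  # language code from the Supported Languages table

# Open the file in a context manager so the handle is closed after the upload
with open("audio.wav", "rb") as audio:
    response = requests.post(url, files={"file": audio}, data=data)

response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
print(response.json())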
```
#### Using JavaScript
```javascript
// audioFile is a File or Blob, e.g. taken from an <input type="file"> element
const formData = new FormData();
formData.append('file', audioFile);
formData.append('language', 'hi');

const response = await fetch('https://your-space-url.hf.space/transcribe', {
  method: 'POST',
  body: formData
});
const result = await response.json();
console.log(result.transcription);
```
## 🗣️ Supported Languages
| Code | Language | Code | Language |
|------|----------|------|----------|
| `hi` | Hindi | `te` | Telugu |
| `bn` | Bengali | `ta` | Tamil |
| `mr` | Marathi | `gu` | Gujarati |
| `kn` | Kannada | `ml` | Malayalam |
| `pa` | Punjabi | `or` | Odia |
| `as` | Assamese | `ur` | Urdu |
| `ne` | Nepali | `kok` | Konkani |
| `sd` | Sindhi | `doi` | Dogri |
| `brx` | Bodo | `mai` | Maithili |
| `mni` | Manipuri | `ks` | Kashmiri |
| `sa` | Sanskrit | `sat` | Santali |
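
A client can validate the `language` field against the table above before sending a request. The following is a minimal sketch: the mapping mirrors the table, while the `normalize_language` helper and the choice to lower-case and trim input on the client side are assumptions, not documented behaviour of the API.

```python
# Language codes copied from the Supported Languages table.
SUPPORTED_LANGUAGES = {
    "hi": "Hindi", "te": "Telugu", "bn": "Bengali", "ta": "Tamil",
    "mr": "Marathi", "gu": "Gujarati", "kn": "Kannada", "ml": "Malayalam",
    "pa": "Punjabi", "or": "Odia", "as": "Assamese", "ur": "Urdu",
    "ne": "Nepali", "kok": "Konkani", "sd": "Sindhi", "doi": "Dogri",
    "brx": "Bodo", "mai": "Maithili", "mni": "Manipuri", "ks": "Kashmiri",
    "sa": "Sanskrit", "sat": "Santali",
}


def normalize_language(code: str) -> str:
    """Trim and lower-case a language code, rejecting codes not in the table."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return normalized
```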
## 📊 Response Format
```json
{
  "success": true,
  "transcription": "आपका टेक्स्ट यहां",
  "metadata": {
    "audio_duration": 45.2,
    "audio_duration_minutes": 0.75,
    "inference_time": 2.1543,
    "rtf": 0.0476,
    "language": "hi",
    "decoder": "rnnt",
    "num_chunks": 2
  }
}
```
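
To illustrate how a client might consume this payload, the sketch below parses a trimmed copy of the sample response, extracts the transcription, and recomputes the real-time factor. It assumes `rtf` is `inference_time / audio_duration`, which matches the sample values (2.1543 / 45.2 ≈ 0.048), and that `success: false` signals a failure. The shape of an error response is not documented here, so the error handling is only a placeholder, and both function names are hypothetical.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE_RESPONSE = """
{
  "success": true,
  "transcription": "आपका टेक्स्ट यहां",
  "metadata": {
    "audio_duration": 45.2,
    "inference_time": 2.1543,
    "rtf": 0.0476,
    "language": "hi",
    "decoder": "rnnt",
    "num_chunks": 2
  }
}
"""


def extract_transcription(result: dict) -> str:
    """Return the transcription, or raise if the API did not report success."""
    if not result.get("success"):
        # Assumption: error details, if any, live elsewhere in the payload.
        raise RuntimeError("Transcription failed; inspect the raw response")
    return result["transcription"]


def real_time_factor(metadata: dict) -> float:
    """Processing time divided by audio duration (lower is faster)."""
    return metadata["inference_time"] / metadata["audio_duration"]


result = json.loads(SAMPLE_RESPONSE)
text = extract_transcription(result)
rtf = real_time_factor(result["metadata"])
print(text, round(rtf, 3))
```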
## ⚡ Performance
- **Real-Time Factor (RTF)**: ~0.05 on GPU, i.e. processing takes about 1/20 of the audio duration (roughly 20x faster than real time)
- **Max Audio Length**: 30 minutes
- **Chunk Processing**: 30s chunks with 2s overlap for optimal accuracy
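
The chunking scheme can be sketched as a sliding window: each 30 s chunk starts 28 s after the previous one, so neighbouring chunks share 2 s of audio. The function below is an illustration of that idea, not the service's actual implementation. The name `chunk_spans` and the handling of the final, shorter chunk are assumptions.

```python
def chunk_spans(duration: float, chunk: float = 30.0,
                overlap: float = 2.0) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for overlapping chunks."""
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be non-negative and smaller than chunk")
    if duration <= 0:
        return []
    stride = chunk - overlap  # distance between consecutive chunk starts
    spans = []
    start = 0.0
    while True:
        end = min(start + chunk, duration)  # last chunk may be shorter
        spans.append((start, end))
        if end >= duration:
            break
        start += stride
    return spans


print(chunk_spans(45.2))  # [(0.0, 30.0), (28.0, 45.2)]
```

Under these assumptions a 45.2 s clip yields two chunks, which agrees with `num_chunks: 2` in the sample response.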
## 🛠️ Model Information
- **Model**: [ai4bharat/indic-conformer-600m-multilingual](https://huggingface.co/ai4bharat/indic-conformer-600m-multilingual)
- **Decoder**: RNNT (Recurrent Neural Network Transducer)
- **Architecture**: Conformer (600M parameters)
## 📝 Notes
- Audio files are automatically resampled to 16 kHz mono
- Audio longer than a single 30 s chunk is split into overlapping chunks for parallel processing
- GPU acceleration is automatically used when available
- Maximum audio duration is 30 minutes per request
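
Because requests are capped at 30 minutes, a client may want to check the duration locally before uploading. The sketch below does this for WAV input with Python's standard `wave` module. The helper names are hypothetical, other formats (MP3, FLAC, M4A) would need a separate decoder, and how the server responds to over-long audio is not documented here.

```python
import wave

MAX_SECONDS = 30 * 60  # 30-minute limit stated in this README


def wav_duration_seconds(source) -> float:
    """Duration of a WAV file, given a path or a binary file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def within_limit(source) -> bool:
    """True if the WAV audio fits within the documented 30-minute limit."""
    return wav_duration_seconds(source) <= MAX_SECONDS


# Usage: within_limit("audio.wav")
```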
## 🤝 Credits
Built with:
- [AI4Bharat IndicConformer](https://ai4bharat.iitm.ac.in/)
- [Hugging Face Transformers](https://huggingface.co/transformers)
- [FastAPI](https://fastapi.tiangolo.com/)
## 📄 License
MIT License