---
title: IndicConformer STT API
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---
# IndicConformer Speech-to-Text API 🎙️
Fast and accurate Speech-to-Text API for 22 Indian languages, powered by AI4Bharat's IndicConformer model.
## 🌟 Features
- **22 Indian Languages Supported**: Hindi, Telugu, Bengali, Tamil, and 18 more
- **Long Audio Support**: Process up to 30 minutes of audio
- **Parallel Processing**: Long audio is split into chunks that are transcribed in parallel
- **Multiple Formats**: Supports WAV, MP3, FLAC, and M4A
## 🚀 Quick Start
### API Endpoints
- **Base URL**: Your Space URL
- **Documentation**: `/docs` (Interactive Swagger UI)
- **Transcribe**: `POST /transcribe`
- **Health Check**: `GET /health`
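
The endpoint paths above can be joined to a Space URL as in the minimal sketch below. `BASE_URL` is a placeholder, and `endpoint` and `is_healthy` are hypothetical helper names, not part of the API. The sketch assumes only that `GET /health` answers with HTTP 200 when the service is up; the body of the health response is not described in this README, so it is not parsed.

```python
from urllib.request import urlopen

# Placeholder: replace with your own Space URL.
BASE_URL = "https://your-space-url.hf.space"


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def is_healthy(base_url: str = BASE_URL, timeout: float = 10.0) -> bool:
    """Return True if GET /health answers with HTTP 200 (assumed success signal)."""
    try:
        with urlopen(endpoint("/health", base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

Calling `is_healthy()` before uploading a large file is a cheap way to notice that a Space is paused or still starting.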
### Example Usage
#### Using cURL
```bash
curl -X POST "https://your-space-url.hf.space/transcribe" \
  -F "file=@audio.wav" \
  -F "language=hi"
```
#### Using Python
```python
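import requests

url = "https://your-space-url.hf.space/transcribe"

# Open the file in a context manager so the handle is closed after the upload.
with open("audio.wav", "rb") as audio:
    response = requests.post(url, files={"file": audio}, data={"language": "hi"})

response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
print(response.json())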
```
#### Using JavaScript
```javascript
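// audioFile is a File or Blob, e.g. taken from an <input type="file"> element.
// Run this inside an async function or an ES module (it uses await).
const formData = new FormData();
formData.append('file', audioFile);
formData.append('language', 'hi');

const response = await fetch('https://your-space-url.hf.space/transcribe', {
  method: 'POST',
  body: formData
});
if (!response.ok) {
  throw new Error(`Transcription request failed: HTTP ${response.status}`);
}
const result = await response.json();
console.log(result.transcription);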
```
## 🗣️ Supported Languages
| Code | Language | Code | Language |
|------|----------|------|----------|
| `hi` | Hindi | `te` | Telugu |
| `bn` | Bengali | `ta` | Tamil |
| `mr` | Marathi | `gu` | Gujarati |
| `kn` | Kannada | `ml` | Malayalam |
| `pa` | Punjabi | `or` | Odia |
| `as` | Assamese | `ur` | Urdu |
| `ne` | Nepali | `kok` | Konkani |
| `sd` | Sindhi | `doi` | Dogri |
| `brx` | Bodo | `mai` | Maithili |
| `mni` | Manipuri | `ks` | Kashmiri |
| `sa` | Sanskrit | `sat` | Santali |
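
A client can check the `language` value against the table above before sending a request. The following is a sketch only: `SUPPORTED_LANGUAGES` and `validate_language` are hypothetical names, and lowercasing the code on the client side is an assumption, since this README does not say whether the server accepts other casings.

```python
# Language codes copied from the table above (22 entries).
SUPPORTED_LANGUAGES = {
    "hi": "Hindi", "te": "Telugu",
    "bn": "Bengali", "ta": "Tamil",
    "mr": "Marathi", "gu": "Gujarati",
    "kn": "Kannada", "ml": "Malayalam",
    "pa": "Punjabi", "or": "Odia",
    "as": "Assamese", "ur": "Urdu",
    "ne": "Nepali", "kok": "Konkani",
    "sd": "Sindhi", "doi": "Dogri",
    "brx": "Bodo", "mai": "Maithili",
    "mni": "Manipuri", "ks": "Kashmiri",
    "sa": "Sanskrit", "sat": "Santali",
}


def validate_language(code: str) -> str:
    """Return the normalized code, or raise ValueError if it is not in the table."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        known = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language code {code!r}; expected one of: {known}")
    return normalized
```

Failing early on the client avoids uploading a large audio file only to have the request rejected.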
## 📊 Response Format
```json
{
  "success": true,
  "transcription": "आपका टेक्स्ट यहां",
  "metadata": {
    "audio_duration": 45.2,
    "audio_duration_minutes": 0.75,
    "inference_time": 2.1543,
    "rtf": 0.0476,
    "language": "hi",
    "decoder": "rnnt",
    "num_chunks": 2
  }
}
```
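
As an illustration of how a client might read this response, the sketch below parses the sample and recomputes the real-time factor as `inference_time / audio_duration`, which is the usual definition of RTF. The helper names `real_time_factor` and `summarize` are hypothetical. The shape of an error response is not documented here, so the sketch only checks the `success` flag.

```python
import json

# The sample response from this README, used as input.
SAMPLE_RESPONSE = """
{
  "success": true,
  "transcription": "आपका टेक्स्ट यहां",
  "metadata": {
    "audio_duration": 45.2,
    "audio_duration_minutes": 0.75,
    "inference_time": 2.1543,
    "rtf": 0.0476,
    "language": "hi",
    "decoder": "rnnt",
    "num_chunks": 2
  }
}
"""


def real_time_factor(metadata: dict) -> float:
    """Processing time divided by audio length; below 1.0 is faster than real time."""
    return metadata["inference_time"] / metadata["audio_duration"]


def summarize(payload: dict) -> str:
    """Build a one-line summary from a successful response."""
    if not payload.get("success"):
        raise RuntimeError("The API reported a failed transcription")
    meta = payload["metadata"]
    rtf = real_time_factor(meta)
    return (
        f"{meta['language']}: {meta['audio_duration']:.1f} s of audio in "
        f"{meta['num_chunks']} chunk(s), RTF {rtf:.4f} ({1 / rtf:.1f}x real time)"
    )


payload = json.loads(SAMPLE_RESPONSE)
print(summarize(payload))
```

For the sample values the quotient is about 0.04766, which agrees with the reported `rtf` of 0.0476 up to rounding.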
## ⚡ Performance
- **Real-Time Factor (RTF)**: ~0.05 on GPU (about 20x faster than real time)
- **Max Audio Length**: 30 minutes
- **Chunk Processing**: 30 s chunks with a 2 s overlap, so speech at a chunk boundary is covered by both neighboring chunks
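
To make the chunking figures concrete, here is one plausible way to lay out 30 s windows with a 2 s overlap. The service's actual splitting code is not shown in this README, so `chunk_bounds` is a hypothetical helper and the fixed 28 s stride is an assumption. It is at least consistent with the sample response, where 45.2 s of audio yields 2 chunks.

```python
def chunk_bounds(duration: float, chunk_len: float = 30.0, overlap: float = 2.0):
    """Return (start, end) times in seconds for overlapping windows over the audio."""
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must be non-negative and shorter than chunk_len")
    if duration <= 0:
        return []

    step = chunk_len - overlap  # each window starts this far after the previous one
    bounds = []
    start = 0.0
    while True:
        end = min(start + chunk_len, duration)
        bounds.append((start, end))
        if end >= duration:  # the last window reaches the end of the audio
            break
        start += step
    return bounds


for start, end in chunk_bounds(45.2):
    print(f"{start:6.1f} s -> {end:6.1f} s")
```

Under these assumptions a full 30-minute file (1800 s) is covered by 65 windows.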
## 🛠️ Model Information
- **Model**: [ai4bharat/indic-conformer-600m-multilingual](https://huggingface.co/ai4bharat/indic-conformer-600m-multilingual)
- **Decoder**: RNNT (Recurrent Neural Network Transducer)
- **Architecture**: Conformer (600M parameters)
## 📝 Notes
- Audio files are automatically resampled to 16 kHz mono
- Longer audio files are split into chunks for parallel processing
- GPU acceleration is automatically used when available
- Maximum audio duration is 30 minutes per request
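
Because of the 30-minute limit, it can be worth checking a file's length before uploading it. The sketch below does this for PCM WAV files using only Python's standard `wave` module; MP3, FLAC, and M4A would need a third-party decoder and are not covered. `MAX_DURATION_S`, `wav_duration_seconds`, and `check_duration` are hypothetical names, and how the server responds to over-length audio is not stated in this README.

```python
import wave

MAX_DURATION_S = 30 * 60  # 30 minutes, the documented per-request limit


def wav_duration_seconds(source) -> float:
    """Duration of a PCM WAV file, given a path or a binary file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_duration(source) -> float:
    """Return the duration in seconds, or raise ValueError if it exceeds the limit."""
    duration = wav_duration_seconds(source)
    if duration > MAX_DURATION_S:
        raise ValueError(
            f"Audio is {duration / 60:.1f} min long; the limit is "
            f"{MAX_DURATION_S // 60} min per request"
        )
    return duration
```

A typical use is `check_duration("audio.wav")` right before the upload, so an over-length recording can be split or trimmed locally first.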
## 🤝 Credits
Built with:
- [AI4Bharat IndicConformer](https://ai4bharat.iitm.ac.in/)
- [Hugging Face Transformers](https://huggingface.co/transformers)
- [FastAPI](https://fastapi.tiangolo.com/)
## 📄 License
MIT License