---
title: IndicConformer STT API
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---

# IndicConformer Speech-to-Text API 🎙️

Fast and accurate Speech-to-Text API for 22 Indian languages powered by AI4Bharat's IndicConformer model.

## 🌟 Features

- **22 Indian Languages Supported**: Hindi, Telugu, Bengali, Tamil, and 18 more
- **Long Audio Support**: Process up to 30 minutes of audio per request
- **Parallel Processing**: Long audio is split into chunks that are transcribed in parallel
- **Multiple Formats**: Supports WAV, MP3, FLAC, M4A

## 🚀 Quick Start

### API Endpoints

- **Base URL**: Your Space URL
- **Documentation**: `/docs` (Interactive Swagger UI)
- **Transcribe**: `POST /transcribe`
- **Health Check**: `GET /health`

### Example Usage

#### Using cURL

```bash
curl -X POST "https://your-space-url.hf.space/transcribe" \
  -F "file=@audio.wav" \
  -F "language=hi"
```

#### Using Python

```python
import requests

url = "https://your-space-url.hf.space/transcribe"

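# Open the file in a context manager so the handle is closed after the upload
with open("audio.wav", "rb") as audio:
    response = requests.post(
        url,
        files={"file": audio},
        data={"language": "hi"},
        timeout=300,  # long audio can take a while to transcribe
    )

response.raise_for_status()  # raise on HTTP errors before parsing the body
print(response.json())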
```

#### Using JavaScript

```javascript
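// audioFile is a File or Blob, e.g. taken from an <input type="file"> element
const formData = new FormData();
formData.append('file', audioFile);
formData.append('language', 'hi');

const response = await fetch('https://your-space-url.hf.space/transcribe', {
  method: 'POST',
  body: formData
});

// fetch() does not reject on HTTP error statuses, so check explicitly
if (!response.ok) {
  throw new Error(`Transcription failed with status ${response.status}`);
}

const result = await response.json();
console.log(result.transcription);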
```

## 🗣️ Supported Languages

| Code | Language | Code | Language |
|------|----------|------|----------|
| `hi` | Hindi | `te` | Telugu |
| `bn` | Bengali | `ta` | Tamil |
| `mr` | Marathi | `gu` | Gujarati |
| `kn` | Kannada | `ml` | Malayalam |
| `pa` | Punjabi | `or` | Odia |
| `as` | Assamese | `ur` | Urdu |
| `ne` | Nepali | `kok` | Konkani |
| `sd` | Sindhi | `doi` | Dogri |
| `brx` | Bodo | `mai` | Maithili |
| `mni` | Manipuri | `ks` | Kashmiri |
| `sa` | Sanskrit | `sat` | Santali |
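
A client can check the `language` value before uploading, which avoids sending a large file only to have it rejected. The following is a minimal sketch: the codes are copied from the table above, while `SUPPORTED_LANGUAGES` and `validate_language` are hypothetical names that are not part of this API.

```python
# Language codes copied from the table above.
SUPPORTED_LANGUAGES = {
    "hi": "Hindi", "te": "Telugu", "bn": "Bengali", "ta": "Tamil",
    "mr": "Marathi", "gu": "Gujarati", "kn": "Kannada", "ml": "Malayalam",
    "pa": "Punjabi", "or": "Odia", "as": "Assamese", "ur": "Urdu",
    "ne": "Nepali", "kok": "Konkani", "sd": "Sindhi", "doi": "Dogri",
    "brx": "Bodo", "mai": "Maithili", "mni": "Manipuri", "ks": "Kashmiri",
    "sa": "Sanskrit", "sat": "Santali",
}


def validate_language(code: str) -> str:
    """Return the normalized code, or raise ValueError if it is not in the table."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return normalized
```

Normalizing to lowercase is a client-side convenience in this sketch. Whether the server itself accepts uppercase codes is not documented here.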

## 📊 Response Format

```json
{
  "success": true,
  "transcription": "आपका टेक्स्ट यहां",
  "metadata": {
    "audio_duration": 45.2,
    "audio_duration_minutes": 0.75,
    "inference_time": 2.1543,
    "rtf": 0.0476,
    "language": "hi",
    "decoder": "rnnt",
    "num_chunks": 2
  }
}
```
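
The sketch below shows one way a client might read this response. It assumes that `rtf` follows the usual definition of real-time factor, `inference_time / audio_duration`, with both values in seconds. The README does not state this, and `summarize` is a hypothetical helper name.

```python
# Sample payload mirroring the response format shown above.
sample = {
    "success": True,
    "transcription": "आपका टेक्स्ट यहां",
    "metadata": {
        "audio_duration": 45.2,
        "inference_time": 2.1543,
        "language": "hi",
        "num_chunks": 2,
    },
}


def summarize(result: dict) -> dict:
    """Extract the transcription and recompute the real-time factor."""
    if not result.get("success"):
        raise RuntimeError("Transcription request did not succeed")
    meta = result["metadata"]
    if meta["audio_duration"] <= 0:
        raise ValueError("audio_duration must be positive")
    # Assumed definition: seconds of compute per second of audio.
    rtf = meta["inference_time"] / meta["audio_duration"]
    return {
        "text": result["transcription"],
        "language": meta["language"],
        "rtf": round(rtf, 4),
        "speedup": round(1 / rtf, 1),
    }


summary = summarize(sample)
print(summary["text"], summary["rtf"], summary["speedup"])
```

An RTF below 1 means transcription finishes faster than the audio plays, and `speedup` is its reciprocal.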

## ⚡ Performance

- **Real-Time Factor (RTF)**: ~0.05 (20x faster than real-time on GPU)
- **Max Audio Length**: 30 minutes
- **Chunk Processing**: 30-second chunks, with a 2-second overlap between consecutive chunks
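
To illustrate the chunking arithmetic, here is a minimal sketch that computes chunk boundaries for a given duration. It assumes each chunk starts 28 seconds after the previous one (30 s length minus 2 s overlap) and that the last chunk is cut at the end of the audio. The server's actual splitting logic may differ, and `chunk_boundaries` is a hypothetical name.

```python
def chunk_boundaries(duration_s: float, chunk_s: float = 30.0,
                     overlap_s: float = 2.0) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for overlapping chunks."""
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap must be non-negative and shorter than the chunk")
    if duration_s <= 0:
        return []

    step = chunk_s - overlap_s  # distance between consecutive chunk starts
    boundaries = []
    start = 0.0
    while True:
        end = min(start + chunk_s, duration_s)
        boundaries.append((start, end))
        if end >= duration_s:  # this chunk reaches the end of the audio
            break
        start += step
    return boundaries


for start, end in chunk_boundaries(45.2):
    print(f"{start:.1f}s -> {end:.1f}s")
```

Under these assumptions a 45.2-second file yields two chunks, which is consistent with the `num_chunks` value in the sample response.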

## 🛠️ Model Information

- **Model**: [ai4bharat/indic-conformer-600m-multilingual](https://huggingface.co/ai4bharat/indic-conformer-600m-multilingual)
- **Decoder**: RNNT (Recurrent Neural Network Transducer)
- **Architecture**: Conformer (600M parameters)

## 📝 Notes

- Audio files are automatically resampled to 16 kHz mono
- Longer audio files are split into chunks for parallel processing
- GPU acceleration is automatically used when available
- Maximum audio duration is 30 minutes per request
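
Because of the 30-minute limit, it can be worth checking a file's duration before uploading it. The sketch below does this for WAV files using only Python's standard `wave` module. Other formats would need a different library, and the helper names are hypothetical.

```python
import wave

MAX_DURATION_S = 30 * 60  # 30-minute limit from the notes above


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file given a path string or binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def within_limit(source, max_seconds: float = MAX_DURATION_S) -> bool:
    """True if the WAV file is short enough to send in one request."""
    return wav_duration_seconds(source) <= max_seconds
```

For example, `within_limit("audio.wav")` returns `False` for a recording longer than 30 minutes, in which case the client could split the file before sending it.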

## 🤝 Credits

Built with:
- [AI4Bharat IndicConformer](https://ai4bharat.iitm.ac.in/)
- [Hugging Face Transformers](https://huggingface.co/transformers)
- [FastAPI](https://fastapi.tiangolo.com/)

## 📄 License

MIT License