---
title: IndicConformer STT API
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---

# IndicConformer Speech-to-Text API 🎙️

Fast and accurate Speech-to-Text API for 22 Indian languages powered by AI4Bharat's IndicConformer model.

## 🌟 Features

- **22 Indian Languages Supported**: Hindi, Telugu, Bengali, Tamil, and 18 more
- **Long Audio Support**: Process up to 30 minutes of audio
- **Parallel Processing**: Fast transcription with chunked inference
- **Multiple Formats**: Supports WAV, MP3, FLAC, M4A

## 🚀 Quick Start

### API Endpoints

- **Base URL**: Your Space URL (for example, `https://your-space-url.hf.space`)
- **Documentation**: `/docs` (Interactive Swagger UI)
- **Transcribe**: `POST /transcribe`
- **Health Check**: `GET /health`

### Example Usage

#### Using cURL

```bash
curl -X POST "https://your-space-url.hf.space/transcribe" \
  -F "file=@audio.wav" \
  -F "language=hi"
```

#### Using Python

```python
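import requests

url = "https://your-space-url.hf.space/transcribe"
data = {"language": "hi"}  # language code from the Supported Languages table

# Open the file in a context manager so the handle is closed after the upload.
# The timeout value is an arbitrary choice; long recordings may need more.
with open("audio.wav", "rb") as audio:
    response = requests.post(url, files={"file": audio}, data=data, timeout=600)

response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
print(response.json())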
```

#### Using JavaScript

```javascript
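// `audioFile` is a File or Blob, e.g. taken from an <input type="file"> element.
const formData = new FormData();
formData.append('file', audioFile);
formData.append('language', 'hi');

const response = await fetch('https://your-space-url.hf.space/transcribe', {
  method: 'POST',
  body: formData
});

// Stop early on HTTP errors rather than reading a missing field.
if (!response.ok) {
  throw new Error(`Transcription request failed: ${response.status}`);
}

const result = await response.json();
console.log(result.transcription);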
```

## 🗣️ Supported Languages

| Code | Language | Code | Language |
|------|----------|------|----------|
| `hi` | Hindi | `te` | Telugu |
| `bn` | Bengali | `ta` | Tamil |
| `mr` | Marathi | `gu` | Gujarati |
| `kn` | Kannada | `ml` | Malayalam |
| `pa` | Punjabi | `or` | Odia |
| `as` | Assamese | `ur` | Urdu |
| `ne` | Nepali | `kok` | Konkani |
| `sd` | Sindhi | `doi` | Dogri |
| `brx` | Bodo | `mai` | Maithili |
| `mni` | Manipuri | `ks` | Kashmiri |
| `sa` | Sanskrit | `sat` | Santali |
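
A client can reject an unsupported code before uploading any audio. The following is a minimal sketch: `SUPPORTED_LANGUAGES` and `normalize_language` are hypothetical names that are not part of the API, and the trimming and lowercasing step is an assumption about how lenient a client might want to be.

```python
# Language codes copied from the table above.
SUPPORTED_LANGUAGES = {
    "hi": "Hindi", "te": "Telugu", "bn": "Bengali", "ta": "Tamil",
    "mr": "Marathi", "gu": "Gujarati", "kn": "Kannada", "ml": "Malayalam",
    "pa": "Punjabi", "or": "Odia", "as": "Assamese", "ur": "Urdu",
    "ne": "Nepali", "kok": "Konkani", "sd": "Sindhi", "doi": "Dogri",
    "brx": "Bodo", "mai": "Maithili", "mni": "Manipuri", "ks": "Kashmiri",
    "sa": "Sanskrit", "sat": "Santali",
}


def normalize_language(code: str) -> str:
    """Return a trimmed, lowercased code, or raise ValueError if unsupported."""
    cleaned = code.strip().lower()
    if cleaned not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return cleaned
```

Calling `normalize_language(" HI ")` returns `"hi"`, while a code such as `"en"` raises `ValueError` without a network round trip.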

## 📊 Response Format

```json
{
  "success": true,
  "transcription": "आपका टेक्स्ट यहां",
  "metadata": {
    "audio_duration": 45.2,
    "audio_duration_minutes": 0.75,
    "inference_time": 2.1543,
    "rtf": 0.0476,
    "language": "hi",
    "decoder": "rnnt",
    "num_chunks": 2
  }
}
```
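
One way to consume this payload from Python is sketched below. `TranscriptionResult` and `parse_response` are hypothetical helpers, not part of the API. The shape of a failed response is not documented here, so the error branch is an assumption, as is the reading of `rtf` as inference time divided by audio duration (which roughly matches the sample values above).

```python
from dataclasses import dataclass


@dataclass
class TranscriptionResult:
    text: str
    language: str
    audio_duration: float  # seconds
    inference_time: float  # seconds
    num_chunks: int

    @property
    def rtf(self) -> float:
        """Real-time factor, assumed to be inference time / audio duration."""
        return self.inference_time / self.audio_duration


def parse_response(payload: dict) -> TranscriptionResult:
    """Convert the decoded JSON body into a typed result."""
    # Assumption: a failed request carries a falsy or missing "success" field.
    if not payload.get("success"):
        raise RuntimeError(f"Transcription failed: {payload}")
    meta = payload["metadata"]
    return TranscriptionResult(
        text=payload["transcription"],
        language=meta["language"],
        audio_duration=meta["audio_duration"],
        inference_time=meta["inference_time"],
        num_chunks=meta["num_chunks"],
    )
```

With the `requests` example from earlier, this would be called as `parse_response(response.json())`.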

## ⚡ Performance

- **Real-Time Factor (RTF)**: approximately 0.05 on GPU (about 20x faster than real time)
- **Max Audio Length**: 30 minutes
- **Chunk Processing**: 30-second chunks with a 2-second overlap between consecutive chunks, to preserve accuracy at chunk boundaries
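
To illustrate the chunking figures above, the sketch below computes chunk boundaries for a given duration. `chunk_boundaries` is a hypothetical helper; the server's actual splitting logic is not shown in this README and may differ. For the 45.2-second sample in the response format section it yields two chunks, which agrees with the `num_chunks` value shown there.

```python
def chunk_boundaries(duration, chunk_len=30.0, overlap=2.0):
    """Return (start, end) times in seconds for overlapping chunks."""
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must be non-negative and shorter than chunk_len")
    if duration <= 0:
        return []
    step = chunk_len - overlap  # each chunk starts this far after the previous one
    bounds = []
    start = 0.0
    while True:
        end = min(start + chunk_len, duration)
        bounds.append((start, end))
        if end >= duration:
            break
        start += step
    return bounds


for start, end in chunk_boundaries(45.2):
    print(f"{start:.1f}s to {end:.1f}s")
```

Under these assumptions, a full 30-minute file (1800 seconds) would be split into 65 chunks.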

## 🛠️ Model Information

- **Model**: [ai4bharat/indic-conformer-600m-multilingual](https://huggingface.co/ai4bharat/indic-conformer-600m-multilingual)
- **Decoder**: RNNT (Recurrent Neural Network Transducer)
- **Architecture**: Conformer (600M parameters)

## 📝 Notes

- Audio files are automatically resampled to 16 kHz mono
- Longer audio files are split into chunks for parallel processing
- GPU acceleration is automatically used when available
- Maximum audio duration is 30 minutes per request
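
Because of the 30-minute limit, a client may want to check a file's duration before uploading it. The sketch below does this for WAV files only, using Python's standard `wave` module; `wav_duration_seconds` and `check_duration` are hypothetical helpers, and other formats (MP3, FLAC, M4A) would need a different library. How the server responds to an over-long file is not documented here.

```python
import wave

MAX_DURATION_SECONDS = 30 * 60  # limit stated in the notes above


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file given a path or binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_duration(source) -> float:
    """Return the duration, or raise ValueError if it exceeds the limit."""
    duration = wav_duration_seconds(source)
    if duration > MAX_DURATION_SECONDS:
        raise ValueError(
            f"Audio is {duration / 60:.1f} minutes long; the limit is 30 minutes"
        )
    return duration
```

Calling `check_duration("audio.wav")` before the upload avoids sending a request that is expected to be rejected.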

## 🤝 Credits

Built with:
- [AI4Bharat IndicConformer](https://ai4bharat.iitm.ac.in/)
- [Hugging Face Transformers](https://huggingface.co/transformers)
- [FastAPI](https://fastapi.tiangolo.com/)

## 📄 License

MIT License