---
title: IndicConformer STT API
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---
# IndicConformer Speech-to-Text API 🎙️

Fast and accurate speech-to-text API for 22 Indian languages, powered by AI4Bharat's IndicConformer model.
## 🌟 Features

- **22 Indian Languages Supported**: Hindi, Telugu, Bengali, Tamil, and 18 more
- **Long Audio Support**: Process up to 30 minutes of audio per request
- **Parallel Processing**: Long audio is split into chunks that are transcribed in parallel
- **Multiple Formats**: Supports WAV, MP3, FLAC, and M4A
## 🚀 Quick Start

### API Endpoints

- **Base URL**: Your Space's URL (shown as `https://your-space-url.hf.space` in the examples below)
- **Documentation**: `/docs` (interactive Swagger UI)
- **Transcribe**: `POST /transcribe` (multipart form fields `file` and `language`)
- **Health Check**: `GET /health`
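
Before sending a long recording, a client may want to confirm the Space is up via `GET /health`. This README does not document the body that `/health` returns, so the minimal sketch below only checks for HTTP 200. `wait_until_healthy` is a hypothetical helper, and its `timeout`/`interval` defaults are arbitrary; it uses only the standard library (`urllib`) so no extra dependency is assumed.

```python
import time
import urllib.request


def wait_until_healthy(base_url: str, timeout: float = 120.0, interval: float = 5.0) -> bool:
    """Poll GET {base_url}/health until it answers HTTP 200 or `timeout` seconds pass.

    Only the status code is checked; the response body is not assumed.
    """
    deadline = time.monotonic() + timeout
    health_url = base_url.rstrip("/") + "/health"
    while True:
        try:
            with urllib.request.urlopen(health_url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError and socket timeouts are all OSError subclasses:
            # the Space is unreachable, still starting, or returned an error status.
            pass
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

For example, `if wait_until_healthy("https://your-space-url.hf.space"):` before calling `/transcribe`.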
### Example Usage

#### Using cURL

```bash
curl -X POST "https://your-space-url.hf.space/transcribe" \
  -F "file=@audio.wav" \
  -F "language=hi"
```
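#### Using Python

```python
import requests

url = "https://your-space-url.hf.space/transcribe"
data = {"language": "hi"}

# Use a context manager so the file handle is closed after the upload
with open("audio.wav", "rb") as f:
    # Long recordings can take a while; adjust the timeout (seconds) as needed
    response = requests.post(url, files={"file": f}, data=data, timeout=600)

response.raise_for_status()  # raise an exception on HTTP 4xx/5xx
print(response.json())
```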
#### Using JavaScript

```javascript
// audioFile is a File or Blob, e.g. from an <input type="file"> element.
// Run inside an async function or an ES module (top-level await).
const formData = new FormData();
formData.append('file', audioFile);
formData.append('language', 'hi');

const response = await fetch('https://your-space-url.hf.space/transcribe', {
  method: 'POST',
  body: formData
});
if (!response.ok) {
  throw new Error(`Transcription failed with HTTP ${response.status}`);
}
const result = await response.json();
console.log(result.transcription);
```
## 🗣️ Supported Languages

Pass one of these codes as the `language` form field.

| Code | Language | Code | Language |
|------|----------|------|----------|
| `hi` | Hindi | `te` | Telugu |
| `bn` | Bengali | `ta` | Tamil |
| `mr` | Marathi | `gu` | Gujarati |
| `kn` | Kannada | `ml` | Malayalam |
| `pa` | Punjabi | `or` | Odia |
| `as` | Assamese | `ur` | Urdu |
| `ne` | Nepali | `kok` | Konkani |
| `sd` | Sindhi | `doi` | Dogri |
| `brx` | Bodo | `mai` | Maithili |
| `mni` | Manipuri | `ks` | Kashmiri |
| `sa` | Sanskrit | `sat` | Santali |
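
To catch typos before uploading a large file, a client can check the `language` value against the table above. The sketch below is a client-side convenience only: `SUPPORTED_LANGUAGES` and `validate_language` are hypothetical names, and it makes no assumption about how the server itself responds to an unsupported code.

```python
from typing import Dict

# Codes copied from the "Supported Languages" table in this README
SUPPORTED_LANGUAGES: Dict[str, str] = {
    "hi": "Hindi", "te": "Telugu", "bn": "Bengali", "ta": "Tamil",
    "mr": "Marathi", "gu": "Gujarati", "kn": "Kannada", "ml": "Malayalam",
    "pa": "Punjabi", "or": "Odia", "as": "Assamese", "ur": "Urdu",
    "ne": "Nepali", "kok": "Konkani", "sd": "Sindhi", "doi": "Dogri",
    "brx": "Bodo", "mai": "Maithili", "mni": "Manipuri", "ks": "Kashmiri",
    "sa": "Sanskrit", "sat": "Santali",
}


def validate_language(code: str) -> str:
    """Return the normalized (trimmed, lowercase) code or raise ValueError."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        valid = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language code {code!r}; expected one of: {valid}")
    return normalized
```

Usage: `data = {"language": validate_language(user_input)}` in place of a hard-coded code.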
## 📊 Response Format

A successful request returns JSON like the following; `audio_duration` and `inference_time` are in seconds.

```json
{
  "success": true,
  "transcription": "आपका टेक्स्ट यहां",
  "metadata": {
    "audio_duration": 45.2,
    "audio_duration_minutes": 0.75,
    "inference_time": 2.1543,
    "rtf": 0.0476,
    "language": "hi",
    "decoder": "rnnt",
    "num_chunks": 2
  }
}
```
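
The response fields are documented here only by example, so a typed wrapper can make client code less error-prone. This is a sketch under stated assumptions: `TranscriptionResult`, `parse_response`, and `rtf_is_consistent` are hypothetical names; durations are treated as seconds (consistent with `audio_duration_minutes` = 45.2 / 60 ≈ 0.75); and because the shape of an error response is not documented, the parser simply raises when `success` is not `true`. The RTF check uses a tolerance because the example value is rounded.

```python
import json
from dataclasses import dataclass


@dataclass
class TranscriptionResult:
    text: str
    language: str
    decoder: str
    audio_duration: float  # seconds
    inference_time: float  # seconds
    rtf: float             # expected to be inference_time / audio_duration
    num_chunks: int


def parse_response(payload: dict) -> TranscriptionResult:
    """Turn the JSON body of POST /transcribe into a typed result."""
    if payload.get("success") is not True:
        # Error shape is undocumented, so surface the raw payload
        raise RuntimeError(f"Transcription failed: {payload!r}")
    meta = payload["metadata"]
    return TranscriptionResult(
        text=payload["transcription"],
        language=meta["language"],
        decoder=meta["decoder"],
        audio_duration=float(meta["audio_duration"]),
        inference_time=float(meta["inference_time"]),
        rtf=float(meta["rtf"]),
        num_chunks=int(meta["num_chunks"]),
    )


def rtf_is_consistent(result: TranscriptionResult, tol: float = 1e-3) -> bool:
    """Check that the reported RTF matches inference_time / audio_duration."""
    if result.audio_duration <= 0:
        return False
    return abs(result.inference_time / result.audio_duration - result.rtf) <= tol


# The example response from this README
example = json.loads(
    '{"success": true, "transcription": "आपका टेक्स्ट यहां",'
    ' "metadata": {"audio_duration": 45.2, "audio_duration_minutes": 0.75,'
    ' "inference_time": 2.1543, "rtf": 0.0476, "language": "hi",'
    ' "decoder": "rnnt", "num_chunks": 2}}'
)
result = parse_response(example)
print(result.text, result.num_chunks, rtf_is_consistent(result))
```

With `requests`, pass `response.json()` to `parse_response`.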
## ⚡ Performance

- **Real-Time Factor (RTF)**: ~0.05 on GPU, i.e. inference takes about 5% of the audio's duration (20x faster than real time)
- **Max Audio Length**: 30 minutes
- **Chunk Processing**: 30 s chunks with 2 s overlap, to reduce errors at chunk boundaries
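
The chunking parameters above determine how many pieces a file is split into. The README does not show the server's splitting code, so the following is only a sketch of one common layout, assuming consecutive 30 s chunks start 28 s apart (chunk length minus overlap) and the last chunk is cut at the end of the audio. `chunk_spans` is a hypothetical helper. Under this assumption the 45.2 s example yields two spans, (0, 30) and (28, 45.2), which matches `num_chunks: 2` in the example response. Merging the transcripts of overlapping regions is not covered.

```python
from typing import List, Tuple


def chunk_spans(duration_s: float, chunk_s: float = 30.0,
                overlap_s: float = 2.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for fixed-length overlapping chunks.

    Consecutive chunks start (chunk_s - overlap_s) seconds apart; the last
    chunk is truncated at the end of the audio.
    """
    if duration_s <= 0:
        return []
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap must be >= 0 and shorter than the chunk length")
    step = chunk_s - overlap_s
    spans: List[Tuple[float, float]] = []
    start = 0.0
    while True:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        if end >= duration_s:
            return spans
        start += step


print(chunk_spans(45.2))
print(len(chunk_spans(30 * 60)))  # chunk count for a maximum-length request
```

Under the same assumption, a full 30-minute file would be split into 65 chunks.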
## 🛠️ Model Information

- **Model**: [ai4bharat/indic-conformer-600m-multilingual](https://huggingface.co/ai4bharat/indic-conformer-600m-multilingual)
- **Decoder**: RNNT (Recurrent Neural Network Transducer)
- **Architecture**: Conformer (600M parameters)
## 📝 Notes

- Audio files are automatically resampled to 16 kHz mono
- Longer audio files are split into chunks for parallel processing
- GPU acceleration is used automatically when available
- Maximum audio duration is 30 minutes per request
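
Since the server resamples audio itself, a client does not need to convert files, but it can reject obviously unacceptable uploads before spending bandwidth on them. The sketch below assumes the format list from Features and the 30-minute limit from these notes; `preflight_check` is a hypothetical helper. It measures duration only for uncompressed WAV via the standard-library `wave` module (which may raise `wave.Error` for WAV encodings it does not support) and leaves MP3, FLAC, and M4A duration checks to the server.

```python
import os
import wave
from typing import Optional

# Formats listed under Features; limit taken from Notes (30 minutes per request)
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a"}
MAX_DURATION_S = 30 * 60


def preflight_check(path: str) -> Optional[float]:
    """Reject obviously unacceptable uploads before sending them.

    Returns the duration in seconds for WAV files, or None for the other
    formats, which the standard library cannot decode.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported format {ext!r}; expected one of {sorted(ALLOWED_EXTENSIONS)}")
    if ext != ".wav":
        return None  # duration check left to the server
    with wave.open(path, "rb") as wf:
        duration = wf.getnframes() / wf.getframerate()
    if duration > MAX_DURATION_S:
        raise ValueError(f"audio is {duration / 60:.1f} min; the limit is 30 min")
    return duration
```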
## 🤝 Credits

Built with:

- [AI4Bharat IndicConformer](https://ai4bharat.iitm.ac.in/)
- [Hugging Face Transformers](https://huggingface.co/transformers)
- [FastAPI](https://fastapi.tiangolo.com/)
## 📄 License

MIT License