---
title: IndicConformer STT API
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---

# IndicConformer Speech-to-Text API 🎙️

Fast and accurate Speech-to-Text API for 22 Indian languages, powered by AI4Bharat's IndicConformer model.

## 🌟 Features

- **22 Indian Languages Supported**: Hindi, Telugu, Bengali, Tamil, and 18 more
- **Long Audio Support**: Process up to 30 minutes of audio per request
- **Parallel Processing**: Fast transcription with chunked inference
- **Multiple Formats**: Supports WAV, MP3, FLAC, M4A

## 🚀 Quick Start

### API Endpoints

- **Base URL**: Your Space URL
- **Documentation**: `/docs` (Interactive Swagger UI)
- **Transcribe**: `POST /transcribe`
- **Health Check**: `GET /health`

### Example Usage

#### Using cURL

```bash
curl -X POST "https://your-space-url.hf.space/transcribe" \
  -F "file=@audio.wav" \
  -F "language=hi"
```

#### Using Python

```python
import requests

url = "https://your-space-url.hf.space/transcribe"
data = {"language": "hi"}

# Open the file in a context manager so the handle is closed after the upload
with open("audio.wav", "rb") as audio:
    response = requests.post(url, files={"file": audio}, data=data)

print(response.json())
```

#### Using JavaScript

```javascript
// audioFile is a File or Blob, e.g. taken from an <input type="file"> element
const formData = new FormData();
formData.append('file', audioFile);
formData.append('language', 'hi');

const response = await fetch('https://your-space-url.hf.space/transcribe', {
  method: 'POST',
  body: formData
});

const result = await response.json();
console.log(result.transcription);
```

## 🗣️ Supported Languages

| Code | Language | Code | Language |
|------|----------|------|----------|
| `hi` | Hindi | `te` | Telugu |
| `bn` | Bengali | `ta` | Tamil |
| `mr` | Marathi | `gu` | Gujarati |
| `kn` | Kannada | `ml` | Malayalam |
| `pa` | Punjabi | `or` | Odia |
| `as` | Assamese | `ur` | Urdu |
| `ne` | Nepali | `kok` | Konkani |
| `sd` | Sindhi | `doi` | Dogri |
| `brx` | Bodo | `mai` | Maithili |
| `mni` | Manipuri | `ks` | Kashmiri |
| `sa` | Sanskrit | `sat` | Santali |

## 📊 Response Format

```json
{
  "success": true,
  "transcription": "आपका टेक्स्ट यहां",
  "metadata": {
    "audio_duration": 45.2,
    "audio_duration_minutes": 0.75,
    "inference_time": 2.1543,
    "rtf": 0.0476,
    "language": "hi",
    "decoder": "rnnt",
    "num_chunks": 2
  }
}
```

## ⚡ Performance

- **Real-Time Factor (RTF)**: ~0.05 (20x faster than real-time on GPU)
- **Max Audio Length**: 30 minutes
- **Chunk Processing**: 30s chunks with 2s overlap for optimal accuracy

## 🛠️ Model Information

- **Model**: [ai4bharat/indic-conformer-600m-multilingual](https://huggingface.co/ai4bharat/indic-conformer-600m-multilingual)
- **Decoder**: RNNT (Recurrent Neural Network Transducer)
- **Architecture**: Conformer (600M parameters)

## 📝 Notes

- Audio files are automatically resampled to 16 kHz mono
- Longer audio files are split into chunks for parallel processing
- GPU acceleration is used automatically when available
- Maximum audio duration is 30 minutes per request

## 🤝 Credits

Built with:

- [AI4Bharat IndicConformer](https://ai4bharat.iitm.ac.in/)
- [Hugging Face Transformers](https://huggingface.co/transformers)
- [FastAPI](https://fastapi.tiangolo.com/)

## 📄 License

MIT License