---
title: Ringg Parrot STT V1
emoji: 🦜
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: High-Accuracy Hindi Speech-to-Text System
tags:
  - speech-to-text
  - asr
  - bilingual
  - english
  - hindi
  - audio
  - transcription
  - ringg
  - real-time
---

# 🎙️ Ringg Parrot STT V1 🦜

**Bilingual Speech-to-Text for English & Hindi**

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/RinggAI/Ringg-STT-V0)
[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](https://opensource.org/licenses/Apache-2.0)

## 🌟 Overview

Ringg Parrot STT V1 is a speech-to-text system that provides real-time transcription for English and Hindi. On the benchmark below, the model ranks **2nd** by Indic-normalized WER, ahead of VakyanSh Wav2Vec2, OpenAI Whisper Large-v3, and Whisper Large-v2.

## 📊 Performance Benchmarks

| Model | Indic Norm WER ↓ | Whisper Norm WER ↓ |
|-------|------------------|---------------------|
| IndicWav2Vec (Winner) | 18.55% | 63.31% |
| **Ringg Parrot STT V1** | **21.03%** | **66.27%** |
| VakyanSh Wav2Vec2 | 24.06% | 66.34% |
| Whisper Large-v3 | 29.17% | 63.31% |
| Whisper Large-v2 | 37.50% | 66.27% |

**Lower WER (Word Error Rate) indicates better accuracy.** Ringg Parrot STT V1 achieves competitive performance while supporting bilingual transcription.

## ✨ Features

- 🌐 **Bilingual Support**: Native support for English and Hindi speech recognition
- ⚡ **Real-time Streaming**: Instant transcription as you speak
- 🎯 **High Accuracy**: 2nd place among top bilingual ASR models
- 📁 **File Upload**: Support for various audio formats (WAV, MP3, FLAC, M4A, etc.)
- 🚀 **Fast Processing**: Optimized for low-latency inference
- 💬 **Code-switching**: Handles mixed English-Hindi speech

## 🎯 Model Details

| Specification | Details |
|--------------|---------|
| **Model Name** | Ringg Parrot STT V1 |
| **Languages** | English (EN) & Hindi (HI) |
| **Performance** | 2nd place among top models |
| **Sample Rate** | 16 kHz |

## 🚀 Usage

### Real-time Streaming

1. Go to the **"Real-time Streaming"** tab
2. Allow microphone permissions when prompted
3. Start speaking in English or Hindi
4. Watch the transcription appear in real time

### File Upload

1. Go to the **"File Upload"** tab
2. Upload your audio file (WAV, MP3, FLAC, M4A, etc.)
3. Click **"Transcribe"**
4. View the transcription result

## 💡 Tips for Best Results

- **Audio Quality**: Use clear audio with minimal background noise
- **Speaking Style**: Speak naturally at a moderate pace
- **File Format**: A sample rate of 16 kHz or higher is recommended
- **Code-switching**: The model handles English-Hindi mixing, but accuracy is best when switches within a sentence are kept to a minimum

## 📊 Use Cases

- 🤖 Voice assistants and chatbots
- 📝 Meeting transcription
- 🎬 Content creation and subtitling
- ♿ Accessibility applications
- 🔍 Voice search and commands
- 📞 Call center automation
- 🎓 Educational tools
- 🌍 Multilingual communication

## 🔧 Technical Details

### Audio Processing

- **Input Format**: Mono audio, automatically resampled to 16 kHz
- **Processing**: Chunked streaming with 3-second buffers
- **Latency**: ~2-3 seconds for real-time streaming
- **GPU Acceleration**: CUDA-enabled for faster inference

### Supported Audio Formats

- WAV (PCM, 16-bit, 24-bit, 32-bit)
- MP3
- FLAC
- M4A
- OGG
- OPUS

## 📝 Limitations

- Works best with clear audio and minimal background noise
- Accuracy may vary with strong accents and dialects
- Code-switching within sentences may occasionally affect accuracy
- Very long audio files may take longer to process

## 📈 Performance

- **WER (Word Error Rate)**: Optimized for conversational
speech
- **RTF (Real-Time Factor)**: < 0.3 on GPU (faster than real-time)
- **Languages**: English & Hindi with native support

## 🔗 Links

- **Organization**: [RinggAI on Hugging Face](https://huggingface.co/RinggAI)
- **TTS Space**: [Ringg TTS V0](https://huggingface.co/spaces/RinggAI/Ringg-TTS-v0.0)

## 👥 Team

Made with ❤️ by the **RinggAI Team**

---

**Note**: This model is designed for research and development purposes. For production use, please ensure compliance with your local regulations regarding speech processing and data privacy.

| Dependency | Version |
|------------|---------|
| gradio | 5.49.1 |
| gradio-client | 1.13.3 |
| pandas | 2.3.3 |
| requests | 2.32.5 |
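
To make the WER metric used in the Performance Benchmarks table concrete, here is a minimal sketch of how a word error rate can be computed from a reference transcript and a model hypothesis. It assumes plain whitespace tokenization and does not reproduce the Indic or Whisper text normalization applied in the benchmark, so it will not match those figures exactly. The function name `word_error_rate` is illustrative and is not part of this Space's code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are split on whitespace, with no text normalization.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


# One substituted word out of five reference words.
wer_hindi = word_error_rate("मैं घर जा रहा हूँ", "मैं घर जा रही हूँ")
print(f"WER: {wer_hindi:.2%}")
```

Because WER divides by the reference length, it can exceed 100% when the hypothesis contains many inserted words.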
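
The Technical Details section mentions 3-second buffers at 16 kHz and a real-time factor below 0.3. The sketch below shows what those two numbers mean in practice: how a mono sample sequence splits into 3-second chunks, and how RTF is derived from processing time and audio duration. The helper names (`iter_chunks`, `real_time_factor`) are hypothetical, and this is not the Space's actual streaming code; in particular, whether the app pads or drops a short final chunk is not stated in this README.

```python
from typing import Iterator, Sequence

SAMPLE_RATE_HZ = 16_000  # sample rate stated in the README
CHUNK_SECONDS = 3        # buffer length stated in the README


def iter_chunks(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE_HZ,
    chunk_seconds: float = CHUNK_SECONDS,
) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-length chunks; the last one may be shorter."""
    chunk_len = int(sample_rate * chunk_seconds)
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Seven seconds of silence split into 3 s + 3 s + 1 s chunks.
silence = [0.0] * (SAMPLE_RATE_HZ * 7)
chunk_lengths = [len(chunk) for chunk in iter_chunks(silence)]
print(chunk_lengths)

# Hypothetical timing: 2.7 s of compute for 10 s of audio.
rtf = real_time_factor(2.7, 10.0)
print(f"RTF: {rtf:.2f}")
```

An RTF of 0.3 means that ten seconds of audio take about three seconds to transcribe.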