---
title: Ringg Parrot STT V1
emoji: 🦜
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: High-Accuracy Hindi Speech-to-Text System
tags:
- speech-to-text
- asr
- bilingual
- english
- hindi
- audio
- transcription
- ringg
- real-time
---
# Ringg Parrot STT V1 :parrot:
**Bilingual Speech-to-Text for English & Hindi**
[Hugging Face Space](https://huggingface.co/spaces/RinggAI/Ringg-STT-V0)
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
## Overview
Ringg Parrot STT V1 is a speech-to-text system that provides real-time transcription for English and Hindi. By Indic-normalized WER it ranks **2nd** of the five models in the benchmark table, behind IndicWav2Vec and ahead of OpenAI Whisper Large-v3 and Large-v2.
## Performance Benchmarks
| Model | Indic Norm WER ↓ | Whisper Norm WER ↓ |
|-------|------------------|---------------------|
| IndicWav2Vec (Winner) | 18.55% | 63.31% |
| **Ringg Parrot STT V1** | **21.03%** | **66.27%** |
| Vakyansh Wav2Vec2 | 24.06% | 66.34% |
| Whisper Large-v3 | 29.17% | 63.31% |
| Whisper Large-v2 | 37.50% | 66.27% |
**Lower WER (Word Error Rate) indicates better accuracy.** Ringg Parrot STT V1 achieves competitive performance while supporting bilingual transcription.
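
For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate is usually computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It is not the evaluation script behind the table above, and it applies no text normalization (the "Indic Norm" and "Whisper Norm" columns presumably differ in the normalization applied before scoring). The function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (all but the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    print(f"{word_error_rate('the cat sat on the mat', 'the cat sit on mat'):.2%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.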
## Features
- **Bilingual Support**: Native support for English and Hindi speech recognition
- **Real-time Streaming**: Transcription appears as you speak
- **High Accuracy**: 2nd of the five models in the benchmark table by Indic Norm WER (21.03%)
- **File Upload**: Support for various audio formats (WAV, MP3, FLAC, M4A, etc.)
- **Fast Processing**: Optimized for low-latency inference
- **Code-switching**: Handles mixed English-Hindi speech
## Model Details
| Specification | Details |
|--------------|---------|
| **Model Name** | Ringg Parrot STT V1 |
| **Languages** | English (EN) & Hindi (HI) |
| **Performance** | 2nd of five benchmarked models by Indic Norm WER (21.03%) |
| **Sample Rate** | 16kHz |
## Usage
### Real-time Streaming
1. Go to the **"Real-time Streaming"** tab
2. Allow microphone permissions when prompted
3. Start speaking in English or Hindi
4. See real-time transcription appear
### File Upload
1. Go to the **"File Upload"** tab
2. Upload your audio file (WAV, MP3, FLAC, M4A, etc.)
3. Click **"Transcribe"**
4. View the transcription result
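
The file-upload steps above can in principle also be driven from Python with `gradio_client`. The sketch below makes several assumptions: the Space ID is taken from the link at the top of this README, while the endpoint name `/transcribe` and its single audio-file argument are hypothetical and not confirmed by this document. Check the Space's actual endpoints with `Client(...).view_api()` before relying on it.

```python
from typing import Any

SPACE_ID = "RinggAI/Ringg-STT-V0"  # from the Space link in this README
API_NAME = "/transcribe"           # hypothetical endpoint name; verify with view_api()


def transcribe_file(audio_path: str,
                    space_id: str = SPACE_ID,
                    api_name: str = API_NAME) -> Any:
    """Send a local audio file to the Space and return whatever the endpoint returns."""
    # Imported lazily so this module loads even where gradio_client is not installed.
    from gradio_client import Client, handle_file

    client = Client(space_id)
    # handle_file wraps the local path so the client uploads the file to the Space.
    return client.predict(handle_file(audio_path), api_name=api_name)


if __name__ == "__main__":
    from gradio_client import Client

    # Lists the real endpoint names and parameters exposed by the Space.
    Client(SPACE_ID).view_api()
```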
## Tips for Best Results
- **Audio Quality**: Use clear audio with minimal background noise
- **Speaking Style**: Speak naturally at a moderate pace
- **Sample Rate**: 16kHz or higher recommended
- **Code-switching**: The model handles English-Hindi mixing, but accuracy is best when switches within a sentence are kept to a minimum
## Use Cases
- Voice assistants and chatbots
- Meeting transcription
- Content creation and subtitling
- Accessibility applications
- Voice search and commands
- Call center automation
- Educational tools
- Multilingual communication
## Technical Details
### Audio Processing
- **Input Format**: Mono audio, automatically resampled to 16kHz
- **Processing**: Chunked streaming with 3-second buffers
- **Latency**: ~2-3 seconds for real-time streaming
- **GPU Acceleration**: CUDA-enabled for faster inference
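
The input handling described above (mono downmix, resampling to 16kHz, 3-second buffers) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the app's actual code: the function names are made up, and the resampler uses simple linear interpolation with no anti-aliasing filter, whereas a real pipeline would normally use a dedicated audio library.

```python
from typing import Iterator, List, Sequence

TARGET_RATE = 16_000  # Hz, from the list above
CHUNK_SECONDS = 3     # buffer length, from the list above


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the channels of each frame into a single sample."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = TARGET_RATE) -> List[float]:
    """Resample by linear interpolation (illustrative only, no low-pass filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate            # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def chunk(samples: Sequence[float], rate: int = TARGET_RATE,
          seconds: int = CHUNK_SECONDS) -> Iterator[List[float]]:
    """Yield consecutive buffers of `seconds` length; the last one may be shorter."""
    size = rate * seconds
    for start in range(0, len(samples), size):
        yield list(samples[start:start + size])


if __name__ == "__main__":
    stereo_8k = [(0.5, -0.5)] * (8_000 * 7)  # 7 seconds of stereo audio at 8 kHz
    mono_16k = resample_linear(to_mono(stereo_8k), src_rate=8_000)
    print([len(buf) for buf in chunk(mono_16k)])
```

With 3-second buffers, a streaming transcript can lag the speaker by up to roughly one buffer length, which is consistent with the ~2-3 second latency quoted above.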
### Supported Audio Formats
- WAV (PCM, 16-bit, 24-bit, 32-bit)
- MP3
- FLAC
- M4A
- OGG
- OPUS
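
Before uploading a WAV file, its sample rate, channel count, and bit depth can be checked locally. The sketch below uses only Python's standard `wave` module, which reads uncompressed PCM WAV; the other formats listed above would need a separate decoder. The helper names and the `WavInfo` type are illustrative, and the 16kHz threshold comes from the recommendation in the tips section.

```python
import wave
from typing import NamedTuple


class WavInfo(NamedTuple):
    sample_rate: int         # frames per second (Hz)
    channels: int            # 1 = mono, 2 = stereo
    bit_depth: int           # bits per sample
    duration_seconds: float


def inspect_wav(path: str) -> WavInfo:
    """Read basic properties from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return WavInfo(
            sample_rate=rate,
            channels=wav.getnchannels(),
            bit_depth=wav.getsampwidth() * 8,  # sample width is reported in bytes
            duration_seconds=wav.getnframes() / rate,
        )


def meets_recommendation(info: WavInfo, min_rate: int = 16_000) -> bool:
    """True when the sample rate is at or above the recommended 16kHz."""
    return info.sample_rate >= min_rate
```

A file below the recommended rate is still accepted, since input is resampled automatically, but upsampling cannot restore detail that was never recorded.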
## Limitations
- Works best with clear audio and minimal background noise
- Accuracy may vary with strong accents and dialects
- Code-switching within sentences may occasionally affect accuracy
- Very long audio files may take longer to process
## Performance
- **WER (Word Error Rate)**: Optimized for conversational speech
- **RTF (Real-Time Factor)**: < 0.3 on GPU (faster than real-time)
- **Languages**: English & Hindi with native support
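
The real-time factor quoted above is the ratio of processing time to audio duration, so a value below 1.0 means the audio is transcribed faster than it plays back. Here is a minimal sketch for measuring it around any transcription call. The `transcribe` callable is a stand-in supplied by the caller, not part of this project's API.

```python
import time
from typing import Callable, Tuple


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent processing / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed_rtf(transcribe: Callable[[], str], audio_seconds: float) -> Tuple[str, float]:
    """Run a transcription callable and return its text together with the measured RTF."""
    start = time.perf_counter()
    text = transcribe()
    elapsed = time.perf_counter() - start
    return text, real_time_factor(elapsed, audio_seconds)
```

For example, an RTF of 0.3 corresponds to a 10-second clip being processed in 3 seconds.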
## Links
- **Organization**: [RinggAI on Hugging Face](https://huggingface.co/RinggAI)
- **TTS Space**: [Ringg TTS V0](https://huggingface.co/spaces/RinggAI/Ringg-TTS-v0.0)
## Team
Made with ❤️ by the **RinggAI Team**
---
**Note**: This model is designed for research and development purposes. For production use, please ensure compliance with your local regulations regarding speech processing and data privacy.
| Dependency | Version |
|------------|---------|
| gradio | 5.49.1 |
| gradio-client | 1.13.3 |
| pandas | 2.3.3 |
| requests | 2.32.5 |