---
title: Ringg Parrot STT V1
emoji: 🦜
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: High-Accuracy Hindi Speech-to-Text System
tags:
- speech-to-text
- asr
- bilingual
- english
- hindi
- audio
- transcription
- ringg
- real-time
---

# 🎙️ Ringg Parrot STT V1 :parrot:

**Bilingual Speech-to-Text for English & Hindi**

[Hugging Face Space](https://huggingface.co/spaces/RinggAI/Ringg-STT-V0)
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)

## Overview

Ringg Parrot STT V1 is a speech-to-text system that provides real-time transcription of English and Hindi speech. In the benchmark table in this README, it ranks **2nd** by Indic-normalized WER (21.03%), behind IndicWav2Vec (18.55%) and ahead of OpenAI Whisper Large-v3 (29.17%).

## Performance Benchmarks

| Model | Indic Norm WER ↓ | Whisper Norm WER ↓ |
|-------|------------------|---------------------|
| IndicWav2Vec (Winner) | 18.55% | 63.31% |
| **Ringg Parrot STT V1** | **21.03%** | **66.27%** |
| Vakyansh Wav2Vec2 | 24.06% | 66.34% |
| Whisper Large-v3 | 29.17% | 63.31% |
| Whisper Large-v2 | 37.50% | 66.27% |

**Lower WER (Word Error Rate) indicates better accuracy.** Ringg Parrot STT V1 has the second-lowest Indic-normalized WER in this comparison while supporting bilingual transcription.
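
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below illustrates that standard definition only. It does not apply the Indic or Whisper text normalization used for the table above, and the function name `word_error_rate` is chosen here for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.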

## ✨ Features

- **Bilingual Support**: Native support for English and Hindi speech recognition
- ⚡ **Real-time Streaming**: Transcription appears as you speak, with roughly 2-3 seconds of latency
- 🎯 **High Accuracy**: 2nd place by Indic-normalized WER among the benchmarked ASR models
- **File Upload**: Support for various audio formats (WAV, MP3, FLAC, M4A, etc.)
- **Fast Processing**: Optimized for low-latency inference
- 💬 **Code-switching**: Handles mixed English-Hindi speech

## 🎯 Model Details

| Specification | Details |
|--------------|---------|
| **Model Name** | Ringg Parrot STT V1 |
| **Languages** | English (EN) & Hindi (HI) |
| **Performance** | 2nd place among top models |
| **Sample Rate** | 16 kHz |

## Usage

### Real-time Streaming
1. Go to the **"Real-time Streaming"** tab
2. Allow microphone permissions when prompted
3. Start speaking in English or Hindi
4. See real-time transcription appear

### File Upload
1. Go to the **"File Upload"** tab
2. Upload your audio file (WAV, MP3, FLAC, M4A, etc.)
3. Click **"Transcribe"**
4. View the transcription result

## 💡 Tips for Best Results

- **Audio Quality**: Use clear audio with minimal background noise
- **Speaking Style**: Speak naturally at a moderate pace
- **File Format**: A sample rate of 16 kHz or higher is recommended
- **Code-switching**: The model handles English-Hindi mixing, but accuracy is best when switches within a sentence are kept to a minimum
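
To check a recording against the sample-rate tip before uploading, the WAV header can be inspected locally. This is a minimal sketch using Python's standard `wave` module, which reads PCM WAV files only; the helper name `check_wav` and the returned dictionary keys are illustrative and not part of this Space.

```python
import wave


def check_wav(path: str, min_rate_hz: int = 16000) -> dict:
    """Read a PCM WAV header and compare it with the recommended sample rate."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return {
        "sample_rate_hz": rate,
        "channels": channels,
        "duration_s": duration,
        # True when the file meets the 16 kHz-or-higher recommendation.
        "meets_recommendation": rate >= min_rate_hz,
    }
```

Other formats such as MP3 or FLAC need a third-party decoder and are not covered by this sketch.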

## Use Cases

- Voice assistants and chatbots
- Meeting transcription
- 🎬 Content creation and subtitling
- ♿ Accessibility applications
- Voice search and commands
- Call center automation
- Educational tools
- Multilingual communication

## 🔧 Technical Details

### Audio Processing
- **Input Format**: Mono audio, automatically resampled to 16 kHz
- **Processing**: Chunked streaming with 3-second buffers
- **Latency**: ~2-3 seconds for real-time streaming
- **GPU Acceleration**: CUDA-enabled for faster inference
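
The mono conversion and 3-second buffering described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the Space's actual pipeline: the input is assumed to be already at 16 kHz (resampling is omitted), and the names `to_mono` and `chunk_samples` are hypothetical.

```python
from typing import Iterator, List, Sequence

TARGET_RATE_HZ = 16000  # sample rate stated in this README
CHUNK_SECONDS = 3       # buffer length stated in this README


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the channels of each frame to obtain a mono signal."""
    return [sum(frame) / len(frame) for frame in frames]


def chunk_samples(
    samples: Sequence[float],
    sample_rate: int = TARGET_RATE_HZ,
    chunk_seconds: float = CHUNK_SECONDS,
) -> Iterator[List[float]]:
    """Yield consecutive fixed-length buffers; the last one may be shorter."""
    chunk_len = int(sample_rate * chunk_seconds)
    for start in range(0, len(samples), chunk_len):
        yield list(samples[start:start + chunk_len])


if __name__ == "__main__":
    stereo = [(0.2, 0.4)] * 100_000            # about 6.25 s of stereo audio
    buffers = list(chunk_samples(to_mono(stereo)))
    print([len(b) for b in buffers])           # buffer sizes in samples
```

With 3-second buffers, a transcript for a given stretch of speech can only be produced once its buffer is full, which is consistent with the stated streaming latency of a few seconds.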

### Supported Audio Formats
- WAV (PCM, 16-bit, 24-bit, 32-bit)
- MP3
- FLAC
- M4A
- OGG
- OPUS
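
A client-side pre-check against the list above might look like the following sketch. It only inspects the file extension, so it says nothing about the actual encoding, and it is not the validation performed by the Space; `SUPPORTED_EXTENSIONS` and `has_supported_extension` are illustrative names.

```python
from pathlib import Path

# Extensions taken from the list of supported formats above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg", ".opus"}


def has_supported_extension(path: str) -> bool:
    """Return True when the file extension matches a listed audio format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```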

## Limitations

- Works best with clear audio and minimal background noise
- Accuracy may vary with strong accents and dialects
- Code-switching within sentences may occasionally affect accuracy
- Very long audio files may take longer to process

## Performance

- **WER (Word Error Rate)**: Optimized for conversational speech
- **RTF (Real-Time Factor)**: < 0.3 on GPU (faster than real-time)
- **Languages**: English & Hindi with native support
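
RTF is processing time divided by audio duration, so an RTF below 0.3 means a 10-second clip is transcribed in under 3 seconds. The sketch below shows one way to measure it; `transcribe` stands for any callable that runs the model and is a placeholder, not an API of this Space.

```python
import time
from typing import Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent processing / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe: Callable[[object], str], audio: object,
                audio_seconds: float) -> float:
    """Time a single call to a (hypothetical) transcribe function."""
    start = time.perf_counter()
    transcribe(audio)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)
```

Values below 1.0 are faster than real time. A single timed call is noisy, so averaging over several clips gives a more stable figure.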

## Links

- **Organization**: [RinggAI on Hugging Face](https://huggingface.co/RinggAI)
- **TTS Space**: [Ringg TTS V0](https://huggingface.co/spaces/RinggAI/Ringg-TTS-v0.0)

## 👥 Team

Made with ❤️ by the **RinggAI Team**

---

**Note**: This model is designed for research and development purposes. For production use, please ensure compliance with your local regulations regarding speech processing and data privacy.

| Dependency | Version |
|------------|---------|
| gradio | 5.49.1 |
| gradio-client | 1.13.3 |
| pandas | 2.3.3 |
| requests | 2.32.5 |