---
title: Ringg Parrot STT V1
emoji: 🦜
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: High-Accuracy Hindi Speech-to-Text System
tags:
  - speech-to-text
  - asr
  - bilingual
  - english
  - hindi
  - audio
  - transcription
  - ringg
  - real-time
---

# 🎙️ Ringg Parrot STT V1 :parrot:

**Bilingual Speech-to-Text for English & Hindi**

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/RinggAI/Ringg-STT-V0)
[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](https://opensource.org/licenses/Apache-2.0)

## 🌟 Overview

Ringg Parrot STT V1 is a speech-to-text system that provides real-time transcription for English and Hindi. On the Indic-normalized WER benchmark below, it ranks **2nd** among the bilingual ASR models compared, behind IndicWav2Vec and ahead of OpenAI Whisper Large-v3.

## 📊 Performance Benchmarks

| Model | Indic Norm WER ↓ | Whisper Norm WER ↓ |
|-------|------------------|---------------------|
| IndicWav2Vec (Winner) | 18.55% | 63.31% |
| **Ringg Parrot STT V1** | **21.03%** | **66.27%** |
| Vakyansh Wav2Vec2 | 24.06% | 66.34% |
| Whisper Large-v3 | 29.17% | 63.31% |
| Whisper Large-v2 | 37.50% | 66.27% |

**Lower WER (Word Error Rate) indicates better accuracy.** Ringg Parrot STT V1 achieves competitive performance while supporting bilingual transcription.
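
As a rough illustration of the metric, here is a minimal sketch of WER computed as word-level edit distance divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the sketch applies no text normalization, so it will not reproduce the "Indic Norm" or "Whisper Norm" columns, which normalize the text before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("please transcribe this call", "please transcribe the call"))  # 0.25
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many inserted words.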

## ✨ Features

- 🌐 **Bilingual Support**: Native support for English and Hindi speech recognition
- ⚡ **Real-time Streaming**: Instant transcription as you speak
- 🎯 **High Accuracy**: 2nd place among top bilingual ASR models
- 📁 **File Upload**: Support for various audio formats (WAV, MP3, FLAC, M4A, etc.)
- 🚀 **Fast Processing**: Optimized for low-latency inference
- 💬 **Code-switching**: Handles mixed English-Hindi speech

## 🎯 Model Details

| Specification | Details |
|--------------|---------|
| **Model Name** | Ringg Parrot STT V1 |
| **Languages** | English (EN) & Hindi (HI) |
| **Performance** | 2nd place among top models |
| **Sample Rate** | 16kHz |


## 🚀 Usage

### Real-time Streaming
1. Go to the **"Real-time Streaming"** tab
2. Allow microphone permissions when prompted
3. Start speaking in English or Hindi
4. See real-time transcription appear

### File Upload
1. Go to the **"File Upload"** tab
2. Upload your audio file (WAV, MP3, FLAC, M4A, etc.)
3. Click **"Transcribe"**
4. View the transcription result

## 💡 Tips for Best Results

- **Audio Quality**: Use clear audio with minimal background noise
- **Speaking Style**: Speak naturally at a moderate pace
- **Sample Rate**: 16kHz or higher recommended
- **Code-switching**: Model handles English-Hindi mixing, but accuracy is best when minimizing switches within sentences
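
To check a WAV file against the sample-rate tip before uploading, a small sketch using only Python's standard `wave` module might look like this. The helper name `describe_wav` and the `MIN_SAMPLE_RATE_HZ` constant are illustrative and not part of this Space; other formats such as MP3 or FLAC would need a different reader.

```python
import io
import wave

MIN_SAMPLE_RATE_HZ = 16_000  # recommendation from the tips above


def describe_wav(source) -> dict:
    """Read basic properties from a WAV file path or binary file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
            "meets_recommendation": rate >= MIN_SAMPLE_RATE_HZ,
        }


# Demo: build one second of 8 kHz, 16-bit mono silence in memory and inspect it.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
    out.setframerate(8_000)
    out.writeframes(b"\x00\x00" * 8_000)
buffer.seek(0)
print(describe_wav(buffer))
```

In this demo the file falls below the recommended rate, so `meets_recommendation` is `False`.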

## 📊 Use Cases

- 🤖 Voice assistants and chatbots
- 📝 Meeting transcription
- 🎬 Content creation and subtitling
- ♿ Accessibility applications
- 🔍 Voice search and commands
- 📞 Call center automation
- 🎓 Educational tools
- 🌍 Multilingual communication

## 🔧 Technical Details

### Audio Processing
- **Input Format**: Mono audio, automatically resampled to 16kHz
- **Processing**: Chunked streaming with 3-second buffers
- **Latency**: ~2-3 seconds for real-time streaming
- **GPU Acceleration**: CUDA-enabled for faster inference
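
The mono downmix and 3-second buffering described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the Space's actual pipeline: the names `to_mono` and `iter_buffers` are hypothetical, the input is assumed to be at 16kHz already (resampling is omitted), and the final buffer is simply left shorter rather than padded.

```python
from typing import Iterator, List, Sequence

SAMPLE_RATE_HZ = 16_000  # target sample rate stated above
BUFFER_SECONDS = 3       # streaming buffer length stated above


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the channels of each frame to get one mono sample per frame."""
    return [sum(frame) / len(frame) for frame in frames]


def iter_buffers(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE_HZ,
    seconds: float = BUFFER_SECONDS,
) -> Iterator[List[float]]:
    """Yield consecutive fixed-length buffers; the last one may be shorter."""
    size = int(sample_rate * seconds)
    for start in range(0, len(samples), size):
        yield list(samples[start:start + size])


# Demo: 7 seconds of stereo audio becomes two full buffers and one partial buffer.
stereo = [(0.5, -0.5)] * (SAMPLE_RATE_HZ * 7)
mono = to_mono(stereo)
print([len(buf) for buf in iter_buffers(mono)])  # [48000, 48000, 16000]
```

With 3-second buffers, a word spoken at the start of a buffer cannot be transcribed until that buffer fills, which is consistent with the ~2-3 second latency listed above.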

### Supported Audio Formats
- WAV (PCM, 16-bit, 24-bit, 32-bit)
- MP3
- FLAC
- M4A
- OGG
- OPUS

## 📝 Limitations

- Works best with clear audio and minimal background noise
- Accuracy may vary with strong accents and dialects
- Code-switching within sentences may occasionally affect accuracy
- Very long audio files may take longer to process


## 📈 Performance

- **WER (Word Error Rate)**: Optimized for conversational speech
- **RTF (Real-Time Factor)**: < 0.3 on GPU (faster than real-time)
- **Languages**: English & Hindi with native support
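
RTF is processing time divided by audio duration, so an RTF of 0.3 means 10 seconds of audio takes about 3 seconds to transcribe. A minimal way to measure it is sketched below; `real_time_factor` and `fake_transcribe` are hypothetical stand-ins, since this README does not document a callable transcription API.

```python
import time
from typing import Callable


def real_time_factor(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Time one call to `transcribe` and divide by the audio duration in seconds."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    transcribe()
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


def fake_transcribe() -> str:
    """Placeholder for a real model call; sleeps briefly to simulate work."""
    time.sleep(0.05)
    return "hello"


rtf = real_time_factor(fake_transcribe, audio_seconds=10.0)
print(f"RTF = {rtf:.3f}")  # values below 1.0 mean faster than real time
```

For a stable figure, average over several clips and exclude the first call, which often includes model warm-up.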

## 🔗 Links

- **Organization**: [RinggAI on Hugging Face](https://huggingface.co/RinggAI)
- **TTS Space**: [Ringg TTS V0](https://huggingface.co/spaces/RinggAI/Ringg-TTS-v0.0)




## 👥 Team

Made with ❤️ by the **RinggAI Team**

---

**Note**: This model is designed for research and development purposes. For production use, please ensure compliance with your local regulations regarding speech processing and data privacy.

| Dependency | Version |
|------------|---------|
| gradio | 5.49.1 |
| gradio-client | 1.13.3 |
| pandas | 2.3.3 |
| requests | 2.32.5 |