---
title: Ringg Parrot STT V1
emoji: 🦜
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: High-Accuracy Hindi Speech-to-Text System
tags:
  - speech-to-text
  - asr
  - bilingual
  - english
  - hindi
  - audio
  - transcription
  - ringg
  - real-time
---

# 🎙️ Ringg Parrot STT V1 :parrot:

**Bilingual Speech-to-Text for English & Hindi**

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/RinggAI/Ringg-STT-V0)
[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](https://opensource.org/licenses/Apache-2.0)

## 🌟 Overview

Ringg Parrot STT V1 is a speech-to-text system that provides real-time transcription for English and Hindi. On the Indic-normalized WER benchmark below, it ranks **2nd** among the bilingual ASR models compared, ahead of OpenAI Whisper Large-v3 and Whisper Large-v2.

## 📊 Performance Benchmarks

| Model | Indic Norm WER ↓ | Whisper Norm WER ↓ |
|-------|------------------|---------------------|
| IndicWav2Vec (Winner) | 18.55% | 63.31% |
| **Ringg Parrot STT V1** | **21.03%** | **66.27%** |
| Vakyansh Wav2Vec2 | 24.06% | 66.34% |
| Whisper Large-v3 | 29.17% | 63.31% |
| Whisper Large-v2 | 37.50% | 66.27% |

**Lower WER (Word Error Rate) indicates better accuracy.** On Indic-normalized WER, Ringg Parrot STT V1 (21.03%) places second, behind IndicWav2Vec (18.55%), while supporting bilingual transcription.
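For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The sketch below illustrates that definition only. It is not the evaluation script behind the table, the function name `word_error_rate` is hypothetical, and the text normalization applied before scoring (the "Indic Norm" and "Whisper Norm" columns) is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("मेरा नाम राम है", "मेरा नाम श्याम है"))  # 0.25
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many inserted words.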

## ✨ Features

- 🌐 **Bilingual Support**: Native support for English and Hindi speech recognition
- ⚡ **Real-time Streaming**: Instant transcription as you speak
- 🎯 **High Accuracy**: Ranks 2nd on Indic-normalized WER among the models benchmarked above
- 📁 **File Upload**: Support for various audio formats (WAV, MP3, FLAC, M4A, etc.)
- 🚀 **Fast Processing**: Optimized for low-latency inference
- 💬 **Code-switching**: Handles mixed English-Hindi speech

## 🎯 Model Details

| Specification | Details |
|--------------|---------|
| **Model Name** | Ringg Parrot STT V1 |
| **Languages** | English (EN) & Hindi (HI) |
| **Performance** | 2nd place on Indic-normalized WER among the benchmarked models |
| **Sample Rate** | 16kHz |


## 🚀 Usage

### Real-time Streaming
1. Go to the **"Real-time Streaming"** tab
2. Allow microphone permissions when prompted
3. Start speaking in English or Hindi
4. See real-time transcription appear

### File Upload
1. Go to the **"File Upload"** tab
2. Upload your audio file (WAV, MP3, FLAC, M4A, etc.)
3. Click **"Transcribe"**
4. View the transcription result

## 💡 Tips for Best Results

- **Audio Quality**: Use clear audio with minimal background noise
- **Speaking Style**: Speak naturally at a moderate pace
- **Sample Rate**: Audio recorded at 16 kHz or higher is recommended
- **Code-switching**: The model handles English-Hindi mixing, but accuracy is best when switches within a sentence are kept to a minimum
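The sample-rate tip can be checked locally before uploading. The following is a minimal sketch using Python's standard `wave` module, which reads uncompressed PCM WAV files only (not MP3, FLAC, or M4A). The helper name `describe_wav` is hypothetical and is not part of the Space's code; the 16 kHz threshold comes from the tip above.

```python
import wave

RECOMMENDED_MIN_RATE_HZ = 16_000  # threshold taken from the tip above


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file and whether it meets the recommended rate."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
            "meets_recommended_rate": rate >= RECOMMENDED_MIN_RATE_HZ,
        }
```

A file that reports `meets_recommended_rate` as `False` (for example, 8 kHz telephone audio) can still be uploaded, but it carries less high-frequency detail than the recommended input.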

## 📊 Use Cases

- 🤖 Voice assistants and chatbots
- 📝 Meeting transcription
- 🎬 Content creation and subtitling
- ♿ Accessibility applications
- 🔍 Voice search and commands
- 📞 Call center automation
- 🎓 Educational tools
- 🌍 Multilingual communication

## 🔧 Technical Details

### Audio Processing
- **Input Format**: Mono audio, automatically resampled to 16kHz
- **Processing**: Chunked streaming with 3-second buffers
- **Latency**: ~2-3 seconds for real-time streaming
- **GPU Acceleration**: CUDA-enabled for faster inference
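The input pipeline described above (mono downmix, resampling to 16 kHz, 3-second buffers) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the Space's implementation: the function names are hypothetical, and the linear-interpolation resampler is a stand-in chosen for brevity. It applies no anti-aliasing filter, so a real pipeline would more likely use a dedicated audio library.

```python
from typing import Iterator, List, Sequence

TARGET_RATE_HZ = 16_000  # from the README
CHUNK_SECONDS = 3        # from the README


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the channels of each frame into a single sample."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples: Sequence[float], source_rate: int,
                    target_rate: int = TARGET_RATE_HZ) -> List[float]:
    """Resample by linear interpolation (illustrative; no anti-aliasing)."""
    if source_rate == target_rate or not samples:
        return list(samples)
    out_len = int(len(samples) * target_rate / source_rate)
    last = len(samples) - 1
    out = []
    for n in range(out_len):
        pos = n * source_rate / target_rate  # fractional index into the input
        i = min(int(pos), last)
        frac = pos - i
        nxt = samples[min(i + 1, last)]      # clamp at the final sample
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


def chunk(samples: Sequence[float], rate: int = TARGET_RATE_HZ,
          seconds: int = CHUNK_SECONDS) -> Iterator[List[float]]:
    """Yield consecutive fixed-length buffers; the last one may be shorter."""
    size = rate * seconds
    for start in range(0, len(samples), size):
        yield list(samples[start:start + size])
```

With 3-second buffers, a chunk cannot be transcribed until it has been fully captured, which is consistent with the stated streaming latency of roughly 2-3 seconds.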

### Supported Audio Formats
- WAV (PCM, 16-bit, 24-bit, 32-bit)
- MP3
- FLAC
- M4A
- OGG
- OPUS

## 📝 Limitations

- Works best with clear audio and minimal background noise
- Accuracy may vary with strong accents and dialects
- Code-switching within sentences may occasionally affect accuracy
- Very long audio files may take longer to process


## 📈 Performance

- **WER (Word Error Rate)**: Optimized for conversational speech
- **RTF (Real-Time Factor)**: < 0.3 on GPU (faster than real-time)
- **Languages**: English & Hindi with native support
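The real-time factor is the ratio of processing time to audio duration, so an RTF below 0.3 means that 10 seconds of audio is transcribed in under 3 seconds. A small sketch of how it could be measured is shown below; `real_time_factor` and `timed_rtf` are hypothetical helpers, and the `transcribe` callable is a placeholder for whatever inference call is being timed.

```python
import time
from typing import Callable, Tuple


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; values below 1.0 are faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed_rtf(transcribe: Callable[[], str], audio_seconds: float) -> Tuple[str, float]:
    """Run a transcription callable (placeholder) and return its text and measured RTF."""
    start = time.perf_counter()
    text = transcribe()
    elapsed = time.perf_counter() - start
    return text, real_time_factor(elapsed, audio_seconds)
```

Measured RTF depends on hardware, batch size, and audio length, so figures from a single run should be treated as indicative only.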

## 🔗 Links

- **Organization**: [RinggAI on Hugging Face](https://huggingface.co/RinggAI)
- **TTS Space**: [Ringg TTS V0](https://huggingface.co/spaces/RinggAI/Ringg-TTS-v0.0)




## 👥 Team

Made with ❤️ by the **RinggAI Team**

---

**Note**: This model is designed for research and development purposes. For production use, please ensure compliance with your local regulations regarding speech processing and data privacy.

| Dependency | Version |
|------------|---------|
| gradio | 5.49.1 |
| gradio-client | 1.13.3 |
| pandas | 2.3.3 |
| requests | 2.32.5 |