	---
	title: Ringg Parrot STT V1
	emoji: 🦜
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 5.49.1
	app_file: app.py
	pinned: false
	license: apache-2.0
	short_description: High-Accuracy Hindi Speech-to-Text System
	tags:
	- speech-to-text
	- asr
	- bilingual
	- english
	- hindi
	- audio
	- transcription
	- ringg
	- real-time
	---

	# 🎙️ Ringg Parrot STT V1 :parrot:

	Bilingual Speech-to-Text for English & Hindi

	[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/RinggAI/Ringg-STT-V0)
	[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](https://opensource.org/licenses/Apache-2.0)

	## 🌟 Overview

	Ringg Parrot STT V1 is a speech-to-text system that provides real-time transcription for English and Hindi. In the benchmark below it ranks 2nd among the compared ASR models on Indic-normalized WER, ahead of OpenAI Whisper Large-v3 and Whisper Large-v2.

	## 📊 Performance Benchmarks

	| Model | Indic Norm WER ↓ | Whisper Norm WER ↓ |
	|-------|------------------|--------------------|
	| IndicWav2Vec (Winner) | 18.55% | 63.31% |
	| Ringg Parrot STT V1 | 21.03% | 66.27% |
	| VakyanSh Wav2Vec2 | 24.06% | 66.34% |
	| Whisper Large-v3 | 29.17% | 63.31% |
	| Whisper Large-v2 | 37.50% | 66.27% |

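	Lower WER (Word Error Rate) indicates better accuracy. On Indic-normalized WER, Ringg Parrot STT V1 (21.03%) is second only to IndicWav2Vec (18.55%) and well ahead of Whisper Large-v3 (29.17%), while also supporting bilingual transcription. Under Whisper normalization, Whisper Large-v3 (63.31%) scores slightly lower than Ringg Parrot STT V1 (66.27%).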
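
For reference, WER counts the word substitutions (S), deletions (D) and insertions (I) needed to turn a transcript into the reference, divided by the number of reference words (N): WER = (S + D + I) / N. The sketch below computes it with a word-level edit distance. It splits on whitespace only and applies none of the text normalization that the "Indic Norm" and "Whisper Norm" column names presumably refer to, so it is not expected to reproduce the table's numbers; libraries such as `jiwer` provide the same metric with more options.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[-1] / len(ref)


# Code-switched example: one substituted word out of four
print(word_error_rate("meeting कल सुबह है", "meeting कल शाम है"))  # 0.25
```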

	## ✨ Features

	- 🌐 Bilingual Support: Native support for English and Hindi speech recognition
	- ⚡ Real-time Streaming: Transcription while you speak, with roughly 2-3 seconds of latency
	- 🎯 High Accuracy: 2nd place on Indic-normalized WER among the benchmarked ASR models
	- 📁 File Upload: Support for various audio formats (WAV, MP3, FLAC, M4A, etc.)
	- 🚀 Fast Processing: Optimized for low-latency inference
	- 💬 Code-switching: Handles mixed English-Hindi speech

	## 🎯 Model Details

	| Specification | Details |
	|---------------|---------|
	| Model Name | Ringg Parrot STT V1 |
	| Languages | English (EN) & Hindi (HI) |
	| Performance | 2nd place on Indic Norm WER (21.03%) among the benchmarked models |
	| Sample Rate | 16 kHz |


	## 🚀 Usage

	### Real-time Streaming
	1. Go to the "Real-time Streaming" tab
	2. Allow microphone permissions when prompted
	3. Start speaking in English or Hindi
	4. See real-time transcription appear

	### File Upload
	1. Go to the "File Upload" tab
	2. Upload your audio file (WAV, MP3, FLAC, M4A, etc.)
	3. Click "Transcribe"
	4. View the transcription result

	## 💡 Tips for Best Results

	- Audio Quality: Use clear audio with minimal background noise
	- Speaking Style: Speak naturally at a moderate pace
	- Sample Rate: Record at 16 kHz or higher (input is resampled to 16 kHz)
	- Code-switching: The model handles mixed English-Hindi speech, but accuracy is best with fewer switches within a sentence
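
To check a WAV file against these tips before uploading, a small helper built on Python's standard `wave` module can read its sample rate and channel count. `check_wav` is a hypothetical helper, not part of this Space. Note that `wave` only reads uncompressed integer PCM WAV files, so other formats need a different library.

```python
import wave

MIN_SAMPLE_RATE_HZ = 16_000  # recommendation from the tips above


def check_wav(path: str) -> list[str]:
    """Return human-readable warnings for a PCM WAV file (hypothetical helper)."""
    warnings: list[str] = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()

    if rate < MIN_SAMPLE_RATE_HZ:
        warnings.append(f"sample rate is {rate} Hz; 16 kHz or higher is recommended")
    if channels != 1:
        warnings.append(f"file has {channels} channels; the pipeline works on mono audio")
    return warnings
```

For example, `check_wav("recording.wav")` returns an empty list for a 16 kHz mono recording.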

	## 📊 Use Cases

	- 🤖 Voice assistants and chatbots
	- 📝 Meeting transcription
	- 🎬 Content creation and subtitling
	- ♿ Accessibility applications
	- 🔍 Voice search and commands
	- 📞 Call center automation
	- 🎓 Educational tools
	- 🌍 Multilingual communication

	## 🔧 Technical Details

	### Audio Processing
	- Input Format: Mono audio, automatically resampled to 16 kHz
	- Processing: Chunked streaming with 3-second buffers
	- Latency: ~2-3 seconds for real-time streaming
	- GPU Acceleration: CUDA-enabled for faster inference
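
The processing steps listed above (mono input, resampling to 16 kHz, 3-second streaming buffers) can be sketched with NumPy as below. This is an illustrative assumption of how such a pipeline might look, not the Space's actual `app.py`; all function and class names are hypothetical. The linear-interpolation resampler has no anti-aliasing filter, so a polyphase resampler such as `scipy.signal.resample_poly` is a better choice for real downsampling.

```python
import numpy as np

TARGET_SR = 16_000    # model sample rate stated in this README
CHUNK_SECONDS = 3.0   # streaming buffer length stated in this README


def to_mono(audio: np.ndarray) -> np.ndarray:
    """Average a (num_samples, num_channels) array down to one channel."""
    return audio if audio.ndim == 1 else audio.mean(axis=1)


def resample_linear(audio: np.ndarray, src_sr: int, dst_sr: int = TARGET_SR) -> np.ndarray:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    audio = audio.astype(np.float32)  # note: integer PCM is not rescaled to [-1, 1]
    if src_sr == dst_sr:
        return audio
    n_out = int(round(len(audio) * dst_sr / src_sr))
    src_t = np.arange(len(audio)) / src_sr  # timestamps of input samples (s)
    dst_t = np.arange(n_out) / dst_sr       # timestamps of output samples (s)
    return np.interp(dst_t, src_t, audio).astype(np.float32)


class StreamBuffer:
    """Accumulate incoming microphone frames and release fixed 3-second chunks."""

    def __init__(self, sr: int = TARGET_SR, seconds: float = CHUNK_SECONDS):
        self.chunk_size = int(sr * seconds)
        self._pending = np.zeros(0, dtype=np.float32)

    def push(self, frame: np.ndarray) -> list[np.ndarray]:
        """Append a frame and return every full chunk that is now available."""
        self._pending = np.concatenate([self._pending, frame.astype(np.float32)])
        ready = []
        while len(self._pending) >= self.chunk_size:
            ready.append(self._pending[: self.chunk_size])
            self._pending = self._pending[self.chunk_size :]
        return ready
```

Because `push` only returns a chunk once 3 s of audio has accumulated, text for a chunk cannot appear before that chunk has been captured, which is consistent with the few seconds of streaming latency noted above.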

	### Supported Audio Formats
	- WAV (PCM, 16-bit, 24-bit, 32-bit)
	- MP3
	- FLAC
	- M4A
	- OGG
	- OPUS
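
Clients that want to reject unsupported uploads early can sniff the container from the first bytes of a file. `sniff_audio_format` below is a hypothetical, heuristic helper (not part of this Space) based on well-known magic numbers; it does not validate the codec inside the container, and an MP4 video will also pass the M4A check.

```python
from typing import Optional


def sniff_audio_format(header: bytes) -> Optional[str]:
    """Guess the container of an audio file from its first bytes (heuristic)."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        # An Ogg stream carrying Opus starts with an "OpusHead" packet in its first page.
        return "opus" if b"OpusHead" in header[:64] else "ogg"
    if header[4:8] == b"ftyp":
        return "m4a"  # ISO base media file (MP4/M4A); also matches MP4 video
    if header[:3] == b"ID3":
        return "mp3"  # MP3 with a leading ID3v2 tag
    if (
        len(header) >= 2
        and header[0] == 0xFF
        and (header[1] & 0xE0) == 0xE0  # 11-bit MPEG audio frame sync
        and (header[1] & 0x06) != 0     # layer bits 00 (e.g. AAC ADTS) are excluded
    ):
        return "mp3"
    return None


# Usage: pass the first 64 bytes of a file, e.g. open(path, "rb").read(64)
print(sniff_audio_format(b"fLaC\x00\x00\x00\x22"))  # flac
```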

	## 📝 Limitations

	- Works best with clear audio and minimal background noise
	- Accuracy may vary with strong accents and dialects
	- Code-switching within sentences may occasionally affect accuracy
	- Processing time grows with audio length; at the GPU RTF of under 0.3 listed below, a 10-minute file takes at most about 3 minutes


	## 📈 Performance

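	- WER (Word Error Rate): 21.03% Indic-normalized WER in the benchmark above; optimized for conversational speech
	- RTF (Real-Time Factor): < 0.3 on GPU, i.e. transcription takes less than 30% of the audio's duration (faster than real time)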
	- Languages: English & Hindi with native support
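
RTF is processing time divided by audio duration, so an RTF below 0.3 means a minute of audio is transcribed in under 18 seconds. The sketch below computes it and times a single call; `transcribe` stands in for whatever (hypothetical) inference function is being measured.

```python
import time
from typing import Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Time one call to a (hypothetical) transcribe function and return its RTF."""
    start = time.perf_counter()
    transcribe()
    return real_time_factor(time.perf_counter() - start, audio_seconds)


# 60 s of audio processed in 18 s
print(real_time_factor(18.0, 60.0))  # 0.3
```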

	## 🔗 Links

	- Organization: [RinggAI on Hugging Face](https://huggingface.co/RinggAI)
	- TTS Space: [Ringg TTS V0](https://huggingface.co/spaces/RinggAI/Ringg-TTS-v0.0)




	## 👥 Team

	Made with ❤️ by the RinggAI Team

	---

	Note: This model is designed for research and development purposes. For production use, please ensure compliance with your local regulations regarding speech processing and data privacy.

	## 📦 Dependencies

	| Dependency | Version |
	|------------|---------|
	| gradio | 5.49.1 |
	| gradio-client | 1.13.3 |
	| pandas | 2.3.3 |
	| requests | 2.32.5 |