metadata
title: Parakeet ASR for Cadayn
emoji: 🦜
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
pinned: false
license: apache-2.0
hardware: zero-a10g
python_version: '3.10'
Parakeet TDT Dedicated GPU Space
Ultra-fast ASR using NVIDIA Parakeet TDT 0.6B on a dedicated A10G GPU.
Features
- ~3000x Real-Time Speed: Transcribes 1 hour of audio in roughly 1 second on GPU
- Word-Level Timestamps: Precise timing for every word
- Long Audio Support: Auto-chunking for files >10 minutes
- Always-On: Model loaded at startup, no cold start per request
- State-of-the-Art Accuracy: Best WER on LibriSpeech benchmark
- Multi-Format Support: MP3, WAV, M4A, MP4, WebM, OGG, FLAC
- No Quota Limits: Dedicated A10G GPU — no ZeroGPU quota constraints
API Usage
Full Transcription
from gradio_client import Client
import base64
client = Client("Cadayn/parakeet-zerogpu")
# From URL
result = client.predict(
    audio_url="https://example.com/audio.mp3",
    api_name="/api_transcribe"
)
print(result)
# {"success": True, "text": "...", "words": [{"start_s": 0.0, "end_s": 0.5, "text": "Hello"}, ...]}
# From file
with open("audio.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()
result = client.predict(
    audio_base64=audio_b64,
    api_name="/api_transcribe"
)
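The word-level timestamps in the response can be post-processed without any further API calls. The following is a minimal sketch, assuming the response is a dict shaped like the commented example above; `format_words` is a hypothetical helper name, and the error payload is not documented here, so the failure branch only raises a generic error.

```python
from typing import Any


def format_words(result: dict[str, Any]) -> list[str]:
    """Render each word as 'start-end text', assuming the documented response shape."""
    if not result.get("success"):
        # The failure payload is not described in this README, so only signal it.
        raise ValueError("transcription failed")
    return [
        f"{w['start_s']:.2f}-{w['end_s']:.2f} {w['text']}"
        for w in result.get("words", [])
    ]


# Sample payload mirroring the documented response; not real model output.
sample = {
    "success": True,
    "text": "Hello world",
    "words": [
        {"start_s": 0.0, "end_s": 0.5, "text": "Hello"},
        {"start_s": 0.6, "end_s": 1.0, "text": "world"},
    ],
}

for line in format_words(sample):
    print(line)
```

In real use, `sample` would be replaced by the value returned from `client.predict(...)`.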
Segment Transcription (for streaming)
# For real-time or segment-by-segment processing
with open("segment.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()
result = client.predict(
    audio_base64=audio_b64,
    start_s=30.0,  # Offset for timestamp alignment
    api_name="/api_transcribe_segment"
)
print(result)
# {"success": True, "words": [{"start_s": 30.5, "end_s": 31.0, "text": "word"}, ...]}
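Because the segment endpoint appears to return timestamps that already include the `start_s` offset (30.0 in, 30.5 out in the example above), stitching segments together can be a plain concatenation. The sketch below assumes fixed-length segments; `segment_starts`, `transcribe_segments`, and the `transcribe_one` callable are hypothetical names, and the stub stands in for a real `client.predict(..., api_name="/api_transcribe_segment")` call so the example runs offline.

```python
import math
from typing import Any, Callable


def segment_starts(total_s: float, segment_s: float) -> list[float]:
    """Start offsets of consecutive fixed-length segments covering total_s."""
    if segment_s <= 0:
        raise ValueError("segment_s must be positive")
    count = math.ceil(total_s / segment_s)
    return [i * segment_s for i in range(count)]


def transcribe_segments(
    starts: list[float],
    transcribe_one: Callable[[int, float], dict[str, Any]],
) -> list[dict[str, Any]]:
    """Call transcribe_one per segment and concatenate the returned words."""
    words: list[dict[str, Any]] = []
    for index, start_s in enumerate(starts):
        result = transcribe_one(index, start_s)
        if result.get("success"):
            # Timestamps are assumed to be already shifted by start_s.
            words.extend(result["words"])
    return words


def fake_transcribe(index: int, start_s: float) -> dict[str, Any]:
    """Offline stub; a real version would send the segment audio to the Space."""
    word = {"start_s": start_s + 0.5, "end_s": start_s + 1.0, "text": f"w{index}"}
    return {"success": True, "words": [word]}


starts = segment_starts(75.0, 30.0)
words = transcribe_segments(starts, fake_transcribe)
print(starts)
print([w["start_s"] for w in words])
```

Passing the transcription step in as a callable keeps the offset logic testable without network access or a GPU.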
Model
Uses nvidia/parakeet-tdt-0.6b-v3:
- 600M parameters
- TDT (Token-and-Duration Transducer) architecture
- 25 European languages with automatic language detection
- Long audio support (up to 24 min full attention, 3 hours with local attention)
- Runs on dedicated A10G Small (24 GB VRAM)
Performance
| Audio Length | Transcription Time |
|---|---|
| 1 minute | ~20ms |
| 10 minutes | ~200ms |
| 1 hour | ~1.2s |
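The table can be read as a real-time factor (audio duration divided by transcription time). A small worked calculation, using only the approximate figures from the table:

```python
# (audio seconds, transcription seconds), taken from the table above.
ROWS = {
    "1 minute": (60.0, 0.020),
    "10 minutes": (600.0, 0.200),
    "1 hour": (3600.0, 1.2),
}


def real_time_factor(audio_s: float, transcribe_s: float) -> float:
    """How many seconds of audio are processed per second of compute."""
    return audio_s / transcribe_s


for label, (audio_s, transcribe_s) in ROWS.items():
    print(f"{label}: ~{real_time_factor(audio_s, transcribe_s):.0f}x real time")
```

Each row works out to roughly 3000x real time, which suggests throughput stays about constant across these lengths.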
Integration with EagleEye
Set the following environment variable in EagleEye:
PARAKEET_HF_SPACE_URL=https://cadayn-parakeet-zerogpu.hf.space
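How EagleEye consumes this variable is not described here. As an illustration only, a client script could resolve and sanity-check the URL as sketched below; `resolve_space_url` is a hypothetical helper, and the HTTPS check is an assumption rather than a documented requirement.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "PARAKEET_HF_SPACE_URL"


def resolve_space_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the Space URL from the environment and normalise a trailing slash."""
    env = os.environ if env is None else env
    url = env.get(ENV_VAR, "").strip().rstrip("/")
    if not url.startswith("https://"):
        raise RuntimeError(f"{ENV_VAR} is unset or not an https URL")
    return url


# Example with an explicit mapping instead of the real process environment.
url = resolve_space_url({ENV_VAR: "https://cadayn-parakeet-zerogpu.hf.space/"})
print(url)
# The resolved URL could then be passed to gradio_client.Client(url),
# which accepts a full app URL as well as a "user/space" name.
```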
Limitations
- Dedicated A10G Small GPU ($1.00/hr, billed while the Space is running)
- Pause the Space from its Hugging Face settings when not in use to save credits
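To see what pausing saves, the hourly rate can be turned into a rough monthly estimate. This is a back-of-the-envelope sketch that assumes the $1.00/hr rate quoted above and billing strictly proportional to running hours; the 30-day month and the 8-hours-a-day, 22-working-days schedule are illustrative assumptions.

```python
HOURLY_RATE_USD = 1.00  # rate quoted in this README


def running_cost(hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Cost of keeping the Space running for the given number of hours."""
    return hours * rate


always_on = running_cost(24 * 30)     # running all month
working_hours = running_cost(8 * 22)  # paused outside working hours

print(f"always on: ${always_on:.2f}")
print(f"working hours only: ${working_hours:.2f}")
print(f"saved by pausing: ${always_on - working_hours:.2f}")
```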