
metadata

title: Parakeet ASR for Cadayn
emoji: 🦜
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
pinned: false
license: apache-2.0
hardware: zero-a10g
python_version: '3.10'

Parakeet TDT Dedicated GPU Space

Ultra-fast ASR using NVIDIA Parakeet TDT 0.6B on a dedicated A10G GPU.

Features

- 3000x Faster than Whisper: Transcribes 1 hour of audio in ~1.2 seconds on GPU (about 3000x real time)
- Word-Level Timestamps: Start and end times for every word
- Long Audio Support: Auto-chunking for files longer than 10 minutes
- Always-On: Model is loaded at startup, so there is no cold start per request
- State-of-the-Art Accuracy: Best WER on LibriSpeech benchmark
- Multi-Format Support: MP3, WAV, M4A, MP4, WebM, OGG, FLAC
- No Quota Limits: Dedicated A10G GPU, so no ZeroGPU quota constraints
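
The auto-chunking feature can be pictured with a small sketch. Only the 10-minute threshold comes from the list above; the function name `plan_chunks` and the use of fixed-size, non-overlapping chunks are assumptions for illustration, and the Space's `app.py` may split audio differently.

```python
def plan_chunks(duration_s: float, max_chunk_s: float = 600.0) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) spans of at most max_chunk_s.

    Hypothetical helper: the 600 s default mirrors the 10-minute threshold
    in the feature list, not a value read from the Space's source code.
    """
    if duration_s <= 0:
        return []
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        spans.append((start, end))
        start = end
    return spans


# A 25-minute file would be planned as three chunks: 10 min, 10 min, 5 min.
for span in plan_chunks(25 * 60):
    print(span)
```

Each chunk's start time can then serve as the offset that keeps word timestamps aligned with the original file.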

API Usage

Full Transcription

```python
import base64

from gradio_client import Client

client = Client("Cadayn/parakeet-zerogpu")

# From a URL
result = client.predict(
    audio_url="https://example.com/audio.mp3",
    api_name="/api_transcribe",
)
print(result)
# {"success": True, "text": "...", "words": [{"start_s": 0.0, "end_s": 0.5, "text": "Hello"}, ...]}

# From a local file, sent as base64
with open("audio.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

result = client.predict(
    audio_base64=audio_b64,
    api_name="/api_transcribe",
)
print(result)
```
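
Once a response arrives, the word-level timestamps can be regrouped into caption-style lines. The sketch below assumes only the response shape shown in the comment above (`start_s`, `end_s`, `text` per word); the helper name `words_to_lines` and the 0.8-second pause threshold are illustrative choices, not part of the Space's API.

```python
def words_to_lines(words, max_gap_s=0.8):
    """Group word dicts into lines, starting a new line after a pause longer than max_gap_s.

    Hypothetical helper: expects dicts with "start_s", "end_s" and "text" keys,
    as in the example response.
    """
    groups = []
    current = []
    for word in words:
        # A long silence between the previous word and this one closes the line.
        if current and word["start_s"] - current[-1]["end_s"] > max_gap_s:
            groups.append(current)
            current = []
        current.append(word)
    if current:
        groups.append(current)
    return [
        {
            "start_s": group[0]["start_s"],
            "end_s": group[-1]["end_s"],
            "text": " ".join(w["text"] for w in group),
        }
        for group in groups
    ]


sample_words = [
    {"start_s": 0.0, "end_s": 0.5, "text": "Hello"},
    {"start_s": 0.6, "end_s": 1.0, "text": "world"},
    {"start_s": 2.5, "end_s": 3.0, "text": "Again"},
]
for line in words_to_lines(sample_words):
    print(line)
```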

Segment Transcription (for streaming)

```python
import base64

from gradio_client import Client

client = Client("Cadayn/parakeet-zerogpu")

# For real-time or segment-by-segment processing
with open("segment.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

result = client.predict(
    audio_base64=audio_b64,
    start_s=30.0,  # Offset added to timestamps so they align with the full recording
    api_name="/api_transcribe_segment",
)
print(result)
# {"success": True, "words": [{"start_s": 30.5, "end_s": 31.0, "text": "word"}, ...]}
```
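
When a recording is sent as several segments, the per-segment results can be merged into one word list. This is a minimal sketch: `transcribe_in_segments` is a hypothetical helper, and the transcription call is injected as a function so the merging logic can be tried without a running Space. It assumes, as in the example above, that the endpoint already applies the `start_s` offset to the returned timestamps.

```python
def transcribe_in_segments(segments, transcribe_segment):
    """Transcribe (start_s, audio_b64) pairs and merge the words in time order.

    transcribe_segment is any callable accepting audio_base64= and start_s=
    and returning a dict shaped like the /api_transcribe_segment response.
    """
    all_words = []
    for start_s, audio_b64 in segments:
        result = transcribe_segment(audio_base64=audio_b64, start_s=start_s)
        if not result.get("success"):
            raise RuntimeError(f"Segment starting at {start_s}s failed: {result}")
        all_words.extend(result["words"])
    # Segments may arrive out of order, so sort by start time.
    all_words.sort(key=lambda w: w["start_s"])
    return all_words


def fake_transcribe(audio_base64, start_s):
    """Stand-in for the real endpoint, used only to demonstrate the merge."""
    return {
        "success": True,
        "words": [{"start_s": start_s + 0.5, "end_s": start_s + 1.0, "text": audio_base64}],
    }


merged = transcribe_in_segments([(30.0, "b"), (0.0, "a")], fake_transcribe)
print([w["text"] for w in merged])
```

With the real client, the callable could be `lambda **kw: client.predict(api_name="/api_transcribe_segment", **kw)`.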

Model

Uses nvidia/parakeet-tdt-0.6b-v3:

- 600M parameters
- TDT (Token-and-Duration Transducer) architecture
- 25 European languages with automatic language detection
- Long audio support (up to 24 minutes with full attention, up to 3 hours with local attention)
- Runs on a dedicated A10G Small (24 GB VRAM)
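
The two long-audio limits above suggest a simple decision rule based on duration. The sketch below encodes only the limits quoted in the list; the function name and the returned labels are hypothetical, and how the Space actually switches attention modes (if it does) is not documented here.

```python
FULL_ATTENTION_MAX_S = 24 * 60       # 24 minutes, from the list above
LOCAL_ATTENTION_MAX_S = 3 * 60 * 60  # 3 hours, from the list above


def choose_attention_mode(duration_s: float) -> str:
    """Return "full" or "local" depending on the audio duration in seconds.

    Hypothetical helper: the labels are illustrative, not NeMo identifiers.
    """
    if duration_s <= FULL_ATTENTION_MAX_S:
        return "full"
    if duration_s <= LOCAL_ATTENTION_MAX_S:
        return "local"
    raise ValueError("Audio longer than 3 hours should be split before transcription")


print(choose_attention_mode(10 * 60))
print(choose_attention_mode(90 * 60))
```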

Performance

| Audio Length | Transcription Time |
|--------------|--------------------|
| 1 minute     | ~20 ms             |
| 10 minutes   | ~200 ms            |
| 1 hour       | ~1.2 s             |
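
These timings all correspond to the same real-time factor, which a short calculation makes explicit. The figures are taken from the table above; the helper name `real_time_factor` is just for illustration.

```python
# (audio duration in seconds, transcription time in seconds), from the table above
timings = [
    (60, 0.020),
    (600, 0.200),
    (3600, 1.2),
]


def real_time_factor(audio_s: float, transcription_s: float) -> float:
    """Seconds of audio processed per second of compute."""
    return audio_s / transcription_s


for audio_s, transcription_s in timings:
    factor = real_time_factor(audio_s, transcription_s)
    print(f"{audio_s:>5} s of audio in {transcription_s} s: about {factor:.0f}x real time")
```

Every row works out to roughly 3000x real time, which suggests throughput scales linearly with audio length over this range.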

Integration with EagleEye

Set the following environment variable in EagleEye so it points at this Space:

PARAKEET_HF_SPACE_URL=https://cadayn-parakeet-zerogpu.hf.space
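
On the client side, the variable could be consumed roughly as follows. This is a sketch, not EagleEye's actual code: `resolve_space_src` and `connect` are hypothetical names, and the fallback to the Space ID is an assumption. It relies on `gradio_client.Client` accepting either a Space name or a full URL as its source.

```python
import os

DEFAULT_SPACE = "Cadayn/parakeet-zerogpu"  # Space ID used in the API examples


def resolve_space_src(env=os.environ) -> str:
    """Return the URL from PARAKEET_HF_SPACE_URL, or the default Space ID if unset."""
    url = env.get("PARAKEET_HF_SPACE_URL", "").strip()
    return url.rstrip("/") if url else DEFAULT_SPACE


def connect(env=os.environ):
    """Create a client for the configured Space (needs gradio_client and network access)."""
    from gradio_client import Client  # imported lazily so the resolver works without it

    return Client(resolve_space_src(env))


print(resolve_space_src({"PARAKEET_HF_SPACE_URL": "https://cadayn-parakeet-zerogpu.hf.space"}))
```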

Limitations

- Cost: the dedicated A10G Small GPU is billed at $1.00/hr for as long as the Space is running
- Pause the Space from its Hugging Face settings when it is not in use to save credits
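
To see what pausing saves, the hourly rate above can be turned into a quick estimate. The usage patterns below are made-up examples. The `pause_space` helper is a hedged sketch that assumes a recent `huggingface_hub` release and a token with write access to the Space; it is defined but not called here.

```python
HOURLY_RATE_USD = 1.00  # from the limitation above


def running_cost_usd(hours_running: float, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Cost of keeping the Space running for the given number of hours."""
    return hours_running * hourly_rate


def pause_space(repo_id: str = "Cadayn/parakeet-zerogpu"):
    """Pause the Space programmatically (assumes huggingface_hub and a write token)."""
    from huggingface_hub import HfApi  # imported lazily; not needed for the estimate

    return HfApi().pause_space(repo_id=repo_id)


always_on = running_cost_usd(24 * 30)   # running all month
work_hours = running_cost_usd(8 * 22)   # 8 hours on 22 working days (example pattern)
print(f"Always on: ${always_on:.2f}, working hours only: ${work_hours:.2f}")
```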