metadata
description: Check installed STT apps and suggest installations including local Whisper
tags:
- ai
- stt
- whisper
- speech-recognition
- audio
- project
- gitignored
You are helping the user set up speech-to-text applications including local Whisper.
Process
Check currently installed STT apps
- System packages:
    dpkg -l | grep -E "whisper|speech|voice"
- Python packages:
    pip list | grep -E "whisper|speech|vosk"
- Check ~/programs/ai-ml/ for installed apps
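The package queries in this step only cover dpkg and pip, so it can also help to check whether the command-line tools themselves are on PATH. A minimal sketch, assuming the executables are named `whisper`, `whisperx`, and `ffmpeg`; the helper name `check_cmd` is hypothetical:

```shell
#!/bin/sh
# check_cmd is a hypothetical helper: it reports whether a command is on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: not found"
    fi
}

# Assumed executable names; adjust to whatever the user actually runs.
for tool in whisper whisperx ffmpeg; do
    check_cmd "$tool"
done
```

A tool installed inside a conda environment is only reported as installed while that environment is active.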
Suggest STT installation candidates
Whisper (OpenAI) - Recommended:
- Best quality, local inference
- Multiple model sizes available
- Multilingual support
Other options:
- Vosk - Lightweight, offline
- Coqui STT - Fork of Mozilla's DeepSpeech
- SpeechNote - Simple GUI
- Subtitle Edit - Video subtitling
- Subtld - Automatic subtitles
Install Whisper (local)
Method 1: Using pip (simple)
    pip install openai-whisper
Method 2: Using conda (recommended)
    conda create -n whisper python=3.11 -y
    conda activate whisper
    pip install openai-whisper
Install dependencies:
    # For audio processing
    sudo apt install ffmpeg
    pip install setuptools-rust
Install faster-whisper (optimized)
    pip install faster-whisper
- Uses CTranslate2 for faster inference
- Lower VRAM usage
Install WhisperX (advanced)
    pip install whisperx
- Includes alignment and diarization
- Better timestamps
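After any of the three installs, a quick way to confirm which variants are usable is to ask Python whether the modules can be found. A minimal sketch, assuming the import names are `whisper`, `faster_whisper`, and `whisperx`, and that `python3` on PATH is the interpreter the packages were installed into; `py_module_status` is a hypothetical helper name:

```shell
#!/bin/sh
# py_module_status is a hypothetical helper. It uses importlib.util.find_spec,
# which locates a module without importing it (so PyTorch is not loaded).
py_module_status() {
    if python3 -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$1') else 1)" >/dev/null 2>&1; then
        echo "$1: available"
    else
        echo "$1: missing"
    fi
}

# Assumed import names for openai-whisper, faster-whisper, and WhisperX.
for module in whisper faster_whisper whisperx; do
    py_module_status "$module"
done
```

If `python3` itself is absent, every module is reported as missing, so run this inside the activated conda environment when Method 2 was used.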
Download Whisper models
- Models are downloaded automatically on first use
- Sizes: tiny, base, small, medium, large
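- Suggest based on VRAM:
  - < 4GB: tiny or base
  - 4-8GB: small or medium
  - 8GB+: large (openai-whisper lists roughly 10GB for large, so on an 8GB card prefer faster-whisper or fall back to medium)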
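The VRAM tiers can be turned into a small lookup. A minimal sketch, assuming an NVIDIA GPU queried through `nvidia-smi`, thresholds of 4096 and 8192 MiB for the 4GB and 8GB boundaries, and that exactly 8GB counts as the top tier; `suggest_model` is a hypothetical helper name:

```shell
#!/bin/sh
# suggest_model is a hypothetical helper: it maps VRAM in MiB to the tiers above.
suggest_model() {
    vram_mb=$1
    if [ "$vram_mb" -lt 4096 ]; then
        echo "tiny or base"
    elif [ "$vram_mb" -lt 8192 ]; then
        echo "small or medium"
    else
        echo "large"
    fi
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Total memory of the first GPU, in MiB, without header or units.
    vram=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
    # Treat empty or non-numeric output (for example, a driver error) as 0.
    case "$vram" in
        ''|*[!0-9]*) vram=0 ;;
    esac
    echo "Detected ${vram} MiB VRAM: suggest $(suggest_model "$vram")"
else
    echo "nvidia-smi not found: suggest $(suggest_model 0)"
fi
```

Some cards report slightly less than their marketed capacity, so a nominal 8GB GPU may land in the middle tier under these thresholds.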
Test installation
    whisper audio.mp3 --model base --language en
Install GUI options
Whisper Desktop:
- Check if available as AppImage or Flatpak
Subtitle Edit:
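    sudo apt install subtitleeditor
  (Note: the apt package subtitleeditor is a separate GTK application, not the Subtitle Edit program; check which one the user wants.)
Custom GUI:
- Suggest installing Gradio-based Whisper UIs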
Create helper script
- Offer to create ~/scripts/transcribe.sh:
    #!/bin/bash
    whisper "$1" --model medium --language en --output_format txt
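The one-line helper fails with an unhelpful error when called with no argument or a missing file. A slightly more defensive variant could look like the sketch below. The function name `transcribe`, the optional model and language arguments, and the exit codes are assumptions for illustration, not part of the original script:

```shell
#!/bin/sh
# Hypothetical hardened variant of transcribe.sh.
# Usage: transcribe <audio-file> [model] [language]
transcribe() {
    if [ "$#" -lt 1 ]; then
        echo "usage: transcribe <audio-file> [model] [language]" >&2
        return 2
    fi
    if [ ! -f "$1" ]; then
        echo "error: file not found: $1" >&2
        return 1
    fi
    if ! command -v whisper >/dev/null 2>&1; then
        echo "error: whisper is not on PATH" >&2
        return 127
    fi
    # Defaults match the original helper: medium model, English, plain text.
    whisper "$1" --model "${2:-medium}" --language "${3:-en}" --output_format txt
}
```

When saving this as ~/scripts/transcribe.sh, add `transcribe "$@"` as the last line and make the file executable with `chmod +x`.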
Suggest workflows
- Real-time transcription
- Batch processing
- Video subtitling
- Meeting transcription
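For the batch processing and video subtitling workflows, a loop over a directory is usually enough. A minimal sketch, assuming the media files sit directly in one directory, that the extensions of interest are mp3, wav, m4a, and mp4, and that SRT output in a `transcripts` subdirectory is wanted; `list_media` and `batch_transcribe` are hypothetical helper names:

```shell
#!/bin/sh
# list_media (hypothetical): print media files found directly inside directory $1.
list_media() {
    for f in "$1"/*.mp3 "$1"/*.wav "$1"/*.m4a "$1"/*.mp4; do
        # Unmatched globs stay literal, so keep only paths that are real files.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# batch_transcribe (hypothetical): write one .srt per file into $1/transcripts.
# $2 optionally overrides the model (default: base).
batch_transcribe() {
    list_media "$1" | while IFS= read -r f; do
        # stdin is redirected so the transcriber cannot consume the file list.
        whisper "$f" --model "${2:-base}" --output_format srt \
            --output_dir "$1/transcripts" </dev/null
    done
}
```

Calling `batch_transcribe ~/recordings small` would process every matching file in that directory with the small model.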
Output
Provide a summary showing:
- Currently installed STT applications
- Whisper installation status and model sizes
- GPU acceleration status
- Suggested models based on hardware
- Example commands for transcription
- GUI options available
- Helper scripts created
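The GPU acceleration line of the summary can be filled in with a small probe. A minimal sketch, assuming a PyTorch-based install (as with openai-whisper) and an NVIDIA GPU; `gpu_status` is a hypothetical helper name and the wording of its messages is illustrative:

```shell
#!/bin/sh
# gpu_status (hypothetical): report whether PyTorch can use CUDA.
gpu_status() {
    if python3 -c "import torch, sys; sys.exit(0 if torch.cuda.is_available() else 1)" >/dev/null 2>&1; then
        echo "GPU acceleration: available (PyTorch sees CUDA)"
    elif command -v nvidia-smi >/dev/null 2>&1; then
        echo "GPU acceleration: GPU present, but PyTorch CUDA not confirmed"
    else
        echo "GPU acceleration: not detected"
    fi
}

gpu_status
```

This does not cover non-NVIDIA GPUs, and faster-whisper uses CTranslate2 rather than PyTorch, so treat the result as a hint rather than a definitive answer.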