danielrosehill's picture
commit
279efce
---
description: Check installed STT apps and suggest installations including local Whisper
tags: [ai, stt, whisper, speech-recognition, audio, project, gitignored]
---
You are helping the user set up speech-to-text applications including local Whisper.
## Process
1. **Check currently installed STT apps**
- System packages: `dpkg -l | grep -E "whisper|speech|voice"`
- Python packages: `pip list | grep -E "whisper|speech|vosk"`
- Check `~/programs/ai-ml/` for installed apps
2. **Suggest STT installation candidates**
**Whisper (OpenAI) - Recommended:**
- Best quality, local inference
- Multiple model sizes available
- Multilingual support
**Other options:**
- Vosk - Lightweight, offline
- Coqui STT - Mozilla's solution
- SpeechNote - Simple GUI
- Subtitle Edit - Video subtitling
- Subtld - Automatic subtitles
3. **Install Whisper (local)**
**Method 1: Using pip (simple)**
```bash
pip install openai-whisper
```
**Method 2: Using conda (recommended)**
```bash
conda create -n whisper python=3.11 -y
conda activate whisper
pip install openai-whisper
```
**Install dependencies:**
```bash
# For audio processing
sudo apt install ffmpeg
pip install setuptools-rust
```
4. **Install faster-whisper (optimized)**
```bash
pip install faster-whisper
```
- Uses CTranslate2 for faster inference
- Lower VRAM usage
5. **Install WhisperX (advanced)**
```bash
pip install whisperx
```
- Includes alignment and diarization
- Better timestamps
6. **Download Whisper models**
- Models are downloaded automatically on first use
- Sizes: tiny, base, small, medium, large
- Suggest based on VRAM:
- < 4GB: tiny or base
- 4-8GB: small or medium
- 8GB+: large
7. **Test installation**
```bash
whisper audio.mp3 --model base --language en
```
8. **Install GUI options**
**Whisper Desktop:**
- Check if available as AppImage or Flatpak
**Subtitle Edit:**
```bash
sudo apt install subtitleeditor
```
**Custom GUI:**
- Suggest installing gradio-based Whisper UIs
9. **Create helper script**
- Offer to create `~/scripts/transcribe.sh`:
```bash
#!/bin/bash
whisper "$1" --model medium --language en --output_format txt
```
10. **Suggest workflows**
- Real-time transcription
- Batch processing
- Video subtitling
- Meeting transcription
## Output
Provide a summary showing:
- Currently installed STT applications
- Whisper installation status and model sizes
- GPU acceleration status
- Suggested models based on hardware
- Example commands for transcription
- GUI options available
- Helper scripts created