|
|
--- |
|
|
description: Check installed STT apps and suggest installations including local Whisper |
|
|
tags: [ai, stt, whisper, speech-recognition, audio, project, gitignored] |
|
|
--- |
|
|
|
|
|
You are helping the user set up speech-to-text applications including local Whisper. |
|
|
|
|
|
## Process |
|
|
|
|
|
1. **Check currently installed STT apps** |
|
|
- System packages: `dpkg -l | grep -E "whisper|speech|voice"` |
|
|
- Python packages: `pip list | grep -E "whisper|speech|vosk"` |
|
|
- Check `~/programs/ai-ml/` for installed apps |
|
|
|
|
|
2. **Suggest STT installation candidates** |
|
|
|
|
|
**Whisper (OpenAI) - Recommended:** |
|
|
- Best quality, local inference |
|
|
- Multiple model sizes available |
|
|
- Multilingual support |
|
|
|
|
|
**Other options:** |
|
|
- Vosk - Lightweight, offline |
|
|
- Coqui STT - Mozilla's solution |
|
|
- SpeechNote - Simple GUI |
|
|
- Subtitle Edit - Video subtitling |
|
|
- Subtld - Automatic subtitles |
|
|
|
|
|
3. **Install Whisper (local)** |
|
|
|
|
|
**Method 1: Using pip (simple)** |
|
|
```bash |
|
|
pip install openai-whisper |
|
|
``` |
|
|
|
|
|
**Method 2: Using conda (recommended)** |
|
|
```bash |
|
|
conda create -n whisper python=3.11 -y |
|
|
conda activate whisper |
|
|
pip install openai-whisper |
|
|
``` |
|
|
|
|
|
**Install dependencies:** |
|
|
```bash |
|
|
# For audio processing |
|
|
sudo apt install ffmpeg |
|
|
pip install setuptools-rust |
|
|
``` |
|
|
|
|
|
4. **Install faster-whisper (optimized)** |
|
|
```bash |
|
|
pip install faster-whisper |
|
|
``` |
|
|
- Uses CTranslate2 for faster inference |
|
|
- Lower VRAM usage |
|
|
|
|
|
5. **Install WhisperX (advanced)** |
|
|
```bash |
|
|
pip install whisperx |
|
|
``` |
|
|
- Includes alignment and diarization |
|
|
- Better timestamps |
|
|
|
|
|
6. **Download Whisper models** |
|
|
- Models are downloaded automatically on first use |
|
|
- Sizes: tiny, base, small, medium, large |
|
|
- Suggest based on VRAM: |
|
|
- < 4GB: tiny or base |
|
|
- 4-8GB: small or medium |
|
|
- 8GB+: large |
|
|
|
|
|
7. **Test installation** |
|
|
```bash |
|
|
whisper audio.mp3 --model base --language en |
|
|
``` |
|
|
|
|
|
8. **Install GUI options** |
|
|
|
|
|
**Whisper Desktop:** |
|
|
- Check if available as AppImage or Flatpak |
|
|
|
|
|
**Subtitle Edit:** |
|
|
```bash |
|
|
sudo apt install subtitleeditor |
|
|
``` |
|
|
|
|
|
**Custom GUI:** |
|
|
- Suggest installing gradio-based Whisper UIs |
|
|
|
|
|
9. **Create helper script** |
|
|
- Offer to create `~/scripts/transcribe.sh`: |
|
|
```bash |
|
|
#!/bin/bash |
|
|
whisper "$1" --model medium --language en --output_format txt |
|
|
``` |
|
|
|
|
|
10. **Suggest workflows** |
|
|
- Real-time transcription |
|
|
- Batch processing |
|
|
- Video subtitling |
|
|
- Meeting transcription |
|
|
|
|
|
## Output |
|
|
|
|
|
Provide a summary showing: |
|
|
- Currently installed STT applications |
|
|
- Whisper installation status and model sizes |
|
|
- GPU acceleration status |
|
|
- Suggested models based on hardware |
|
|
- Example commands for transcription |
|
|
- GUI options available |
|
|
- Helper scripts created |
|
|
|