---
title: WhisperSelf ML Service
emoji: 🎙️
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---
# WhisperSelf ML Service
This folder contains the ML service for speech transcription using faster-whisper and FastAPI.
## What This Project Includes
- ML API service: ml/serve.py
- Model config: ml/config.yaml
- Python dependencies: ml/requirements.txt
- Fine-tuning scripts: ml/finetune/
- Model download helper: scripts/download_model.py
- Docker files: docker/Dockerfile.ml and docker/docker-compose.yml
## 1) Run Locally (Python)
### Prerequisites
- Python 3.11+
- ffmpeg installed and available in PATH
### Setup
```powershell
cd Transcription
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install --upgrade pip
pip install -r ml\requirements.txt
```
### Download model weights
```powershell
python scripts\download_model.py --model large-v3 --output .\models
```
### Start ML server
```powershell
cd ml
uvicorn serve:app --host 0.0.0.0 --port 8000 --reload
```
### Health check
Open:
- http://localhost:8000/health
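For scripted checks, the same URL can be probed from Python. This is a minimal sketch using only the standard library; it treats any HTTP 200 response as healthy and does not assume anything about the response body, which this README does not document. The function names are illustrative and are not part of ml/serve.py.

```python
import urllib.request


def health_url(base_url):
    """Return the /health URL for a base URL, tolerating a trailing slash."""
    return f"{base_url.rstrip('/')}/health"


def is_healthy(base_url="http://localhost:8000", timeout=5.0):
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib.error.URLError and HTTPError both derive from OSError).
        return False
```

Calling is_healthy() before sending audio gives a quick signal that the server has finished starting.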
### Test transcription endpoint
```powershell
# PowerShell continues a line with a backtick (`), not the cmd.exe caret (^).
curl.exe -X POST "http://localhost:8000/transcribe" `
  -F "file=@C:\path\to\audio.wav" `
  -F "model=small" `
  -F "language=auto" `
  -F "task=transcribe"
```
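The same request can be sent from Python without extra dependencies. The sketch below builds the multipart/form-data body by hand, using the form field names from the curl command above (file, model, language, task). The helper names are hypothetical, and because the response schema is not documented here, the response is returned as raw text rather than parsed.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    parts.append(header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://localhost:8000", model="small",
               language="auto", task="transcribe", timeout=300):
    """POST an audio file to /transcribe and return the raw response text."""
    audio = Path(path)
    body, content_type = build_multipart(
        {"model": model, "language": language, "task": task},
        "file", audio.name, audio.read_bytes(),
    )
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, print(transcribe(r"C:\path\to\audio.wav")) mirrors the curl call. The 300-second timeout is an arbitrary choice for long recordings.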
## 2) Run With Docker
From the docker folder:
```powershell
cd Transcription\docker
docker compose up --build ml
```
The ML service will be available at:
- http://localhost:8000
## 3) Environment Variables (Important)
These can be set in your host environment or container:
- MODEL_PATH (default: ../models/large-v3)
- WHISPER_DEVICE (default: cpu)
- WHISPER_COMPUTE_TYPE (default: int8)
- WHISPER_LANGUAGE (default: en)
- WHISPER_TASK (default: transcribe)
- WHISPER_BEAM_SIZE (default: 1)
- WHISPER_BEST_OF (default: 1)
- WHISPER_VAD_FILTER (default: true)
- WHISPER_CONDITION_ON_PREVIOUS_TEXT (default: false)
- WHISPER_CPU_THREADS (default: number of CPUs)
- WHISPER_NUM_WORKERS (default: 1)
- JOB_RETENTION_SECONDS (default: 3600)
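As an illustration of how these variables and defaults fit together, the sketch below reads them into a plain dictionary. It is not the code in ml/serve.py: the load_settings name, the dictionary keys, and the set of strings accepted as "true" are assumptions made for this example.

```python
import os


def _as_bool(value):
    """Interpret common truthy strings; the accepted spellings are an assumption."""
    return str(value).strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None):
    """Read the documented variables from env, falling back to the listed defaults."""
    env = os.environ if env is None else env
    return {
        "model_path": env.get("MODEL_PATH", "../models/large-v3"),
        "device": env.get("WHISPER_DEVICE", "cpu"),
        "compute_type": env.get("WHISPER_COMPUTE_TYPE", "int8"),
        "language": env.get("WHISPER_LANGUAGE", "en"),
        "task": env.get("WHISPER_TASK", "transcribe"),
        "beam_size": int(env.get("WHISPER_BEAM_SIZE", "1")),
        "best_of": int(env.get("WHISPER_BEST_OF", "1")),
        "vad_filter": _as_bool(env.get("WHISPER_VAD_FILTER", "true")),
        "condition_on_previous_text": _as_bool(
            env.get("WHISPER_CONDITION_ON_PREVIOUS_TEXT", "false")
        ),
        # os.cpu_count() can return None, so fall back to a single thread.
        "cpu_threads": int(env.get("WHISPER_CPU_THREADS", os.cpu_count() or 1)),
        "num_workers": int(env.get("WHISPER_NUM_WORKERS", "1")),
        "job_retention_seconds": int(env.get("JOB_RETENTION_SECONDS", "3600")),
    }
```

Passing a dictionary instead of os.environ, for example load_settings({"WHISPER_BEAM_SIZE": "5"}), makes it easy to try an override without touching the real environment.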
## 4) Host On Hugging Face Space (Docker)
1. Create a new Hugging Face Space and choose Docker SDK.
2. Push this folder content to that Space repository.
3. Ensure there is a Dockerfile at repository root (Hugging Face builds from root).
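4. Expose port 8000 in the Dockerfile and set app_port: 8000 in the README front matter (Docker Spaces route traffic to port 7860 by default).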
5. Start command should run uvicorn serve:app --host 0.0.0.0 --port 8000.
If the Space is used only for inference and the model is large, download the weights at build or start time instead of committing them to the repository, or use a smaller model (small or base) to avoid storage and startup problems.
## 5) API Endpoints
- GET /health
- POST /transcribe
- POST /transcribe/jobs
- GET /transcribe/jobs/{job_id}
- DELETE /transcribe/jobs/{job_id}
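The job endpoints suggest a submit-then-poll workflow. The sketch below shows one way a client might poll GET /transcribe/jobs/{job_id} until the job finishes. The response format is not documented in this README, so the JSON body, the "status" field, and the "completed" and "failed" state names are all assumptions; adjust them to match what the service actually returns.

```python
import json
import time
import urllib.request


def job_url(base_url, job_id=None):
    """Build the jobs collection URL, or a single job URL when job_id is given."""
    root = f"{base_url.rstrip('/')}/transcribe/jobs"
    return root if job_id is None else f"{root}/{job_id}"


def http_json(method, url, timeout=30):
    """Send a body-less request and decode the response as JSON (assumed format)."""
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_job(base_url, job_id, fetch=http_json, interval=2.0, max_polls=150,
                 done_states=("completed", "failed")):
    """Poll a job until its (assumed) "status" field reaches a terminal state."""
    for _ in range(max_polls):
        job = fetch("GET", job_url(base_url, job_id))
        if job.get("status") in done_states:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Once a result has been read, http_json("DELETE", job_url(base_url, job_id)) would remove the job early instead of waiting for JOB_RETENTION_SECONDS to expire, again assuming the DELETE endpoint responds with JSON.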
## 6) Common Errors and Fixes
- Error: app.py not found on Hugging Face
  - Cause: Space configured as Gradio/Streamlit instead of Docker.
  - Fix: Use sdk: docker and provide a root Dockerfile.
- Error: No module named faster_whisper
  - Fix: pip install -r ml/requirements.txt
- Error: ffmpeg not found
  - Fix: Install ffmpeg on host or use Docker image that installs ffmpeg.
- Slow startup / memory issues
  - Fix: Use model=small and WHISPER_COMPUTE_TYPE=int8.
## 7) Quick Production Tips
- Use the small or medium model for free-tier hosting.
- Add request timeouts and upload-size limits in the reverse proxy.
- Keep health checks enabled on /health.
- Monitor disk usage when caching model weights.
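For the disk-usage tip, a small standard-library check may be enough to start with. This sketch reports the size of a model cache directory and the free space on its volume; the function names and the 2 GiB warning threshold are arbitrary choices for illustration.

```python
import shutil
from pathlib import Path


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root, recursively."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def disk_report(models_dir, min_free_bytes=2 * 1024**3):
    """Summarize cache size and free space on the volume holding models_dir."""
    usage = shutil.disk_usage(models_dir)
    return {
        "models_bytes": dir_size_bytes(models_dir),
        "free_bytes": usage.free,
        # True when free space drops below the chosen threshold.
        "low_space": usage.free < min_free_bytes,
    }
```

Running disk_report("./models") on a schedule and alerting when low_space is true is one simple way to catch a cache that is about to fill the disk.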