Shubham32142
metadata
title: WhisperSelf ML Service
emoji: 🎙️
colorFrom: blue
colorTo: green
sdk: docker
pinned: false

WhisperSelf ML Service

This folder contains the ML service for speech transcription using faster-whisper and FastAPI.

What This Project Includes

  • ML API service: ml/serve.py
  • Model config: ml/config.yaml
  • Python dependencies: ml/requirements.txt
  • Fine-tuning scripts: ml/finetune/
  • Model download helper: scripts/download_model.py
  • Docker files: docker/Dockerfile.ml and docker/docker-compose.yml

1) Run Locally (Python)

Prerequisites

  • Python 3.11+
  • ffmpeg installed and available in PATH

Setup

cd Transcription
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install --upgrade pip
pip install -r ml\requirements.txt

Download model weights

python scripts\download_model.py --model large-v3 --output .\models

Start ML server

cd ml
uvicorn serve:app --host 0.0.0.0 --port 8000 --reload

Health check

Open http://localhost:8000/health in a browser.

Test transcription endpoint

curl.exe -X POST "http://localhost:8000/transcribe" `
    -F "file=@C:\path\to\audio.wav" `
    -F "model=small" `
    -F "language=auto" `
    -F "task=transcribe"
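
The same request can be sent from Python. The following is a minimal sketch using only the standard library, not part of this repository: the helper names `build_multipart` and `transcribe` are hypothetical, the form fields mirror the curl command above, and the reply is assumed to be JSON (the response schema is not documented here).

```python
import io
import json
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    # Plain text fields: model, language, task.
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    # The uploaded audio file.
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'.encode()
    )
    buf.write(b"Content-Type: application/octet-stream\r\n\r\n")
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://localhost:8000", model="small",
               language="auto", task="transcribe", timeout=600):
    """POST an audio file to /transcribe; assumes the server replies with JSON."""
    with open(path, "rb") as fh:
        audio = fh.read()
    body, content_type = build_multipart(
        {"model": model, "language": language, "task": task},
        "file", os.path.basename(path), audio,
    )
    req = urllib.request.Request(
        f"{base_url}/transcribe", data=body,
        headers={"Content-Type": content_type}, method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `transcribe(r"C:\path\to\audio.wav")` would return whatever the endpoint sends back, decoded from JSON.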

2) Run With Docker

From the docker folder:

cd Transcription\docker
docker compose up --build ml

The ML service will be available on the host port published for the ml service in docker/docker-compose.yml.

3) Environment Variables (Important)

These can be set in your host environment or container:

  • MODEL_PATH (default: ../models/large-v3)
  • WHISPER_DEVICE (default: cpu)
  • WHISPER_COMPUTE_TYPE (default: int8)
  • WHISPER_LANGUAGE (default: en)
  • WHISPER_TASK (default: transcribe)
  • WHISPER_BEAM_SIZE (default: 1)
  • WHISPER_BEST_OF (default: 1)
  • WHISPER_VAD_FILTER (default: true)
  • WHISPER_CONDITION_ON_PREVIOUS_TEXT (default: false)
  • WHISPER_CPU_THREADS (default: number of CPUs)
  • WHISPER_NUM_WORKERS (default: 1)
  • JOB_RETENTION_SECONDS (default: 3600)
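
As an illustration of how these variables and their defaults fit together, here is a minimal sketch of a settings loader. It is an assumption about how such values might be read, not a copy of ml/serve.py; the names `WhisperSettings`, `load_settings`, and `_as_bool` are hypothetical, and only the variable names and defaults come from the list above.

```python
import os
from dataclasses import dataclass


def _as_bool(raw, default):
    """Interpret common truthy strings; fall back to the default when unset."""
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class WhisperSettings:
    model_path: str
    device: str
    compute_type: str
    language: str
    task: str
    beam_size: int
    best_of: int
    vad_filter: bool
    condition_on_previous_text: bool
    cpu_threads: int
    num_workers: int
    job_retention_seconds: int


def load_settings(env=None):
    """Build settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return WhisperSettings(
        model_path=env.get("MODEL_PATH", "../models/large-v3"),
        device=env.get("WHISPER_DEVICE", "cpu"),
        compute_type=env.get("WHISPER_COMPUTE_TYPE", "int8"),
        language=env.get("WHISPER_LANGUAGE", "en"),
        task=env.get("WHISPER_TASK", "transcribe"),
        beam_size=int(env.get("WHISPER_BEAM_SIZE", "1")),
        best_of=int(env.get("WHISPER_BEST_OF", "1")),
        vad_filter=_as_bool(env.get("WHISPER_VAD_FILTER"), True),
        condition_on_previous_text=_as_bool(
            env.get("WHISPER_CONDITION_ON_PREVIOUS_TEXT"), False
        ),
        # Default documented as "number of CPUs"; fall back to 1 if unknown.
        cpu_threads=int(env.get("WHISPER_CPU_THREADS", os.cpu_count() or 1)),
        num_workers=int(env.get("WHISPER_NUM_WORKERS", "1")),
        job_retention_seconds=int(env.get("JOB_RETENTION_SECONDS", "3600")),
    )
```

Passing a plain dict, for example `load_settings({"WHISPER_BEAM_SIZE": "5"})`, makes it easy to try overrides without touching the real environment.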

4) Host On Hugging Face Space (Docker)

  1. Create a new Hugging Face Space and choose Docker SDK.
  2. Push the contents of this folder to that Space repository.
  3. Ensure there is a Dockerfile at repository root (Hugging Face builds from root).
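  4. Expose port 8000 in the Dockerfile and set app_port: 8000 in the README metadata, because Docker Spaces route traffic to port 7860 by default.
  5. The start command should run uvicorn serve:app --host 0.0.0.0 --port 8000.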

If the Space is used only for inference and the model is large, download the weights at build or start time, or use a smaller model (small or base), to avoid storage and startup issues.

5) API Endpoints

  • GET /health
  • POST /transcribe
  • POST /transcribe/jobs
  • GET /transcribe/jobs/{job_id}
  • DELETE /transcribe/jobs/{job_id}
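
The job endpoints suggest a submit-then-poll workflow. The sketch below shows one way a client might poll GET /transcribe/jobs/{job_id}. The response schema is not documented in this README, so the `status` field and the `completed` / `failed` values are assumptions to be adjusted against the real service; `fetch_job` and `wait_for_job` are hypothetical helper names.

```python
import json
import time
import urllib.request


def fetch_job(job_id, base_url="http://localhost:8000", timeout=30):
    """GET /transcribe/jobs/{job_id}; assumes a JSON reply."""
    url = f"{base_url}/transcribe/jobs/{job_id}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_job(job_id, fetch=fetch_job, done_states=("completed", "failed"),
                 interval=2.0, max_wait=600.0, sleep=time.sleep):
    """Poll until the (assumed) "status" field reaches a done state.

    Raises TimeoutError if the job is still unfinished after max_wait seconds
    of sleeping between polls.
    """
    waited = 0.0
    while True:
        job = fetch(job_id)
        if job.get("status") in done_states:
            return job
        if waited >= max_wait:
            raise TimeoutError(f"job {job_id} not finished after {max_wait} s")
        sleep(interval)
        waited += interval
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without a running server. Finished jobs are kept for JOB_RETENTION_SECONDS, so polling should complete within that window.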

6) Common Errors and Fixes

  • Error: app.py not found on Hugging Face

    • Cause: Space configured as Gradio/Streamlit instead of Docker.
    • Fix: Use sdk: docker and provide a root Dockerfile.
  • Error: No module named faster_whisper

    • Fix: pip install -r ml/requirements.txt
  • Error: ffmpeg not found

    • Fix: Install ffmpeg on host or use Docker image that installs ffmpeg.
  • Slow startup / memory issues

    • Fix: Use model=small and WHISPER_COMPUTE_TYPE=int8.

7) Quick Production Tips

  • Use the small or medium model for free-tier hosting.
  • Add request-timeout and upload-size limits at the reverse proxy.
  • Keep health checks enabled on /health.
  • Monitor disk usage when caching model weights.
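
For the health-check tip, a small probe script can serve as a container or monitoring check. This is a minimal sketch assuming only that GET /health returns a 2xx status when the service is up; the function name `is_healthy` and the default base URL are illustrative.

```python
import http.client
import urllib.request


def is_healthy(base_url="http://localhost:8000", timeout=5.0):
    """Return True when GET /health answers with a 2xx status, else False."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts, and non-2xx HTTP errors
        # (urllib.error.URLError and HTTPError are OSError subclasses).
        return False


if __name__ == "__main__":
    # Exit code 0 when healthy, 1 otherwise, so the script can back a health check.
    raise SystemExit(0 if is_healthy() else 1)
```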