---
license: apache-2.0
title: all in one transcribe
sdk: docker
emoji: 🏆
colorFrom: gray
colorTo: red
short_description: multiple file transcription
---
# Quick-Start Medical Transcription (Whisper small) — Multi-file -> Merged DOCX
This repository provides a simple, user-friendly transcription service:
- Web UI for uploading multiple audio files
- Background model loading, so the server becomes responsive quickly (sketched below)
- Transcription with Whisper (default: `small`) plus light medical post-processing
- A single merged Word document (.docx) with a transcript section per uploaded file
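The exact implementation lives in `app.py` in this repository; the following is only a minimal sketch of that flow, assuming Flask, openai-whisper, and python-docx. The `/ready` and `/transcribe` routes match the commands shown below; everything else is illustrative.

```python
# Minimal sketch only -- the real app.py in this repository may differ.
import io
import os
import tempfile
import threading

import whisper
from docx import Document
from flask import Flask, request, send_file

app = Flask(__name__)
model = None  # loaded in a background thread so the server answers requests immediately


def load_model():
    global model
    model = whisper.load_model(os.environ.get("WHISPER_MODEL", "small"))


threading.Thread(target=load_model, daemon=True).start()


@app.route("/ready")
def ready():
    # 503 until the background thread has finished loading the model
    return ("ok", 200) if model is not None else ("loading", 503)


@app.route("/transcribe", methods=["POST"])
def transcribe():
    if model is None:
        return ("model still loading", 503)
    doc = Document()
    for upload in request.files.getlist("files"):
        suffix = os.path.splitext(upload.filename)[1]
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            upload.save(tmp.name)
        result = model.transcribe(tmp.name)
        os.remove(tmp.name)
        doc.add_heading(upload.filename, level=2)
        doc.add_paragraph(result["text"].strip())
    buf = io.BytesIO()
    doc.save(buf)
    buf.seek(0)
    return send_file(buf, as_attachment=True, download_name="merged_transcripts.docx")


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```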
## Prerequisites

- Docker + docker-compose, OR Python 3.10+ and pip
- CPU-only by default: this setup installs the CPU PyTorch wheel. If you have a GPU, swap the PyTorch wheel in `requirements.txt` for a CUDA-enabled build and set `WHISPER_MODEL` accordingly (see the device sketch below).
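For reference, device selection in Python typically looks like the snippet below; this is illustrative, and the actual app may handle it differently.

```python
# Illustrative: pick the GPU when available, otherwise fall back to CPU.
import torch
import whisper

device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("small", device=device)
```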
## Quick start: run with Docker Compose (recommended)
1) Build and start (no preloading, smaller image):

   docker-compose up --build

   The model is downloaded on the first container run. Check readiness:

   curl -i http://localhost:5000/ready   # returns 503 until the model is loaded
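If you script against the service, you can poll `/ready` until the model has loaded. A small example using `requests` (a client-side dependency assumed here, not necessarily part of this repo's requirements):

```python
# Poll /ready until the model has finished loading.
import time

import requests

while True:
    try:
        if requests.get("http://localhost:5000/ready", timeout=5).status_code == 200:
            break
    except requests.ConnectionError:
        pass  # the container may still be starting up
    time.sleep(5)
print("Service is ready.")
```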
2) Preload the model into the image (optional; faster startup, larger image):

   # Set PRELOAD_MODEL=true at build time to bake the model weights into the image:
   docker-compose build --build-arg PRELOAD_MODEL=true
   docker-compose up -d
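Conceptually, the preload step only needs to download the weights during the build so they are cached in the image. A hypothetical `preload.py` run by the Dockerfile when `PRELOAD_MODEL=true` could be as small as the sketch below (the repo's Dockerfile may wire this up differently):

```python
# preload.py -- hypothetical build-time helper.
# Downloading the weights here bakes them into the image layer, so the first
# container start does not have to fetch them.
import os

import whisper

whisper.load_model(os.environ.get("WHISPER_MODEL", "small"))
```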
3) Use the UI:

   Open http://localhost:5000 in your browser, upload multiple audio files, and click "Upload and Merge".

   Or call the API directly:

   curl -F "files=@a.wav" -F "files=@b.mp3" http://localhost:5000/transcribe --output merged_transcripts.docx
## Run locally without Docker

1) Create a virtual environment and install the dependencies:

   python -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt

2) Run the app:

   python app.py
## Notes and next steps

- The default model is `small` for fast startup; set the `WHISPER_MODEL` environment variable to `medium` for higher accuracy at the cost of slower loading and inference.
- For production / PHI handling:
  - Use TLS, authentication, and private networking.
  - Use a vetted PHI de-identification pipeline (medspaCy or transformer NER).
  - Consider preloading the model at build time to avoid long first-start delays.
- To improve accuracy for medical terms:
  - Add terms to, or expand, `medical_vocab.txt` (one lightweight biasing approach is sketched after this list).
  - Later, upgrade to a fine-tuned wav2vec2 model with a KenLM language model once you have collected labeled medical audio.
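The repo's own post-processing is not reproduced here, but one lightweight way to bias Whisper toward domain terms is to pass them as an `initial_prompt`. The sketch below assumes `medical_vocab.txt` contains one term per line (an assumption about its format) and uses a placeholder input file name:

```python
# Bias Whisper toward terms from medical_vocab.txt via the initial_prompt option.
# Assumes one term per line; only part of a very long prompt is used by the model.
import whisper

with open("medical_vocab.txt", encoding="utf-8") as fh:
    terms = [line.strip() for line in fh if line.strip()]

model = whisper.load_model("small")
# "visit_recording.wav" is a placeholder file name.
result = model.transcribe("visit_recording.wav", initial_prompt=", ".join(terms[:100]))
print(result["text"])
```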
Possible extensions:
- A variant using faster-whisper/ONNX for much faster CPU inference.
- medspaCy-based PHI redaction integrated into the pipeline.
- Speaker diarization (pyannote) with physician vs. patient sections labeled in the merged .docx.