Spaces:
Sleeping
title: Medscribe AI Backend
emoji: 🩺
colorFrom: blue
colorTo: pink
sdk: docker
app_file: app.py
pinned: false
Medscribe AI Backend
This repository contains the audio transcription backend for Medscribe AI, an AI medical scribe that records doctor-patient conversations, transcribes them, and generates structured follow-up summaries.
The backend is a Dockerized FastAPI service hosted on Hugging Face Spaces. It accepts audio uploads, improves audio quality with a denoising and normalization pipeline, sends the processed audio to Groq Whisper, and returns a typed JSON transcription response to the Next.js frontend.
Demo Video: YouTube walkthrough
Related Links
- Backend Space: huggingface.co/spaces/zeyadcode/medscribe-backend
- Production API: zeyadcode-medscribe-backend.hf.space
- Frontend App: medscribe-ai-lilac.vercel.app/dashboard
- Frontend Repository: zeyad-shaban/medscribe-ai-frontend
- Backend Repository: zeyad-shaban/medscribe-ai-backend
- Demo Video: YouTube walkthrough
Screenshots
Add backend screenshots or diagrams to docs/assets/ using these filenames. The README is already wired to render them once the files are added.
What It Does
- Exposes a FastAPI
/transcribeendpoint for audio upload. - Loads uploaded audio with
librosa. - Applies DeepFilterNet denoising before transcription.
- Normalizes RMS level and protects against clipping.
- Resamples audio to 16 kHz for the Groq transcription API.
- Sends cleaned audio to
whisper-large-v3-turbothrough Groq. - Returns a Pydantic-validated JSON response with transcript text, filename, and duration.
- Runs as a Docker container on Hugging Face Spaces.
- Uses GitHub Actions to sync the repository to Hugging Face Hub after pushes to
main.
Engineering Highlights
- Production-style API boundary between the web app and transcription service.
- Audio preprocessing pipeline designed to improve transcription quality before calling the speech-to-text model.
- Containerized deployment with a Hugging Face-compatible Dockerfile and port configuration.
- Typed response schema with Pydantic for predictable frontend integration.
- CI/CD workflow that pushes backend updates from GitHub to Hugging Face Hub.
- Built as part of a solo full-stack AI project covering frontend, backend, AI orchestration, deployment, and infrastructure.
Architecture
flowchart LR
Frontend[Next.js Frontend] --> Upload[POST /transcribe]
Upload --> FastAPI[FastAPI Service]
FastAPI --> Decode[librosa Audio Decode]
Decode --> Denoise[DeepFilterNet Denoising]
Denoise --> Normalize[RMS and Peak Normalization]
Normalize --> Resample[Resample to 16 kHz]
Resample --> Groq[Groq Whisper large-v3-turbo]
Groq --> Response[Pydantic TranscriptionResponse]
Response --> Frontend
Tech Stack
| Area | Technology |
|---|---|
| API framework | FastAPI |
| Runtime | Python 3.10 |
| Data validation | Pydantic |
| Audio loading/resampling | librosa, scipy |
| Audio denoising | DeepFilterNet |
| ML/audio runtime | PyTorch, torchaudio |
| Speech-to-text | Groq Whisper whisper-large-v3-turbo |
| Deployment | Docker on Hugging Face Spaces |
| CI/CD | GitHub Actions, huggingface/hub-sync |
API Reference
Health Check
GET /
Response:
{
"status": "ok"
}
Transcribe Audio
POST /transcribe
Content-Type: multipart/form-data
Form field:
| Name | Type | Description |
|---|---|---|
file |
audio file | Audio conversation file to clean and transcribe. |
Example response:
{
"transcript": "Patient reported...",
"filename": "consultation.wav",
"duration_seconds": 42.7
}
Local Development
Prerequisites
- Python 3.10
- ffmpeg
- Groq API key
Setup
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
uvicorn app:app --reload --host 127.0.0.1 --port 8000
On macOS/Linux, activate the virtual environment with:
source .venv/bin/activate
Create .env.local:
GROQ_SECRET_KEY=
Open http://127.0.0.1:8000/docs to test the FastAPI endpoint.
Docker
Build and run locally:
docker build -t medscribe-ai-backend .
docker run --env-file .env.local -p 7860:7860 medscribe-ai-backend
The container listens on port 7860, which is the expected port for Hugging Face Spaces Docker apps.
CI/CD
The repository includes a GitHub Actions workflow at .github/workflows/sync-to-hub.yml.
On every push to main, the workflow:
- Checks out the repository.
- Enables Git LFS support.
- Pushes the backend code to the Hugging Face Space using
huggingface/hub-sync.
Required GitHub secret:
HF_TOKEN=
Project Structure
.
app.py
Dockerfile
requirements.txt
config/
constants.py
settings.py
schemas/
transcription.py
utils/
audio_cleaning.py
notebooks/
eda.ipynb
deepfilternet_demo.ipynb
.github/
workflows/
sync-to-hub.yml
Roadmap
- Add speaker separation before or after transcription.
- Add automated tests before syncing to Hugging Face.
- Host more AI model logic directly on Hugging Face.
- Add automatic model switching and fallback behavior during API spikes.
- Expand observability around transcription latency and failure modes.
Clinical Safety Scope
Medscribe AI is built as an AI-assisted documentation workflow. The backend provides transcription support for clinician review and should not be treated as a medical device or official medical record system by itself.



