---
title: Medscribe AI Backend
emoji: "\U0001FA7A"
colorFrom: blue
colorTo: pink
sdk: docker
app_file: app.py
pinned: false
---

# Medscribe AI Backend

[![FastAPI](https://img.shields.io/badge/FastAPI-0.136-009688?logo=fastapi&logoColor=white)](https://fastapi.tiangolo.com/)
[![Python](https://img.shields.io/badge/Python-3.10-3776AB?logo=python&logoColor=white)](https://www.python.org/)
[![Docker](https://img.shields.io/badge/Dockerized-2496ED?logo=docker&logoColor=white)](https://www.docker.com/)
[![Hugging Face](https://img.shields.io/badge/Hosted_on-Hugging_Face-FFD21E?logo=huggingface&logoColor=black)](https://huggingface.co/spaces/zeyadcode/medscribe-backend)

This repository contains the audio transcription backend for Medscribe AI, an AI medical scribe that records doctor-patient conversations, transcribes them, and generates structured follow-up summaries.

The backend is a Dockerized FastAPI service hosted on Hugging Face Spaces. It accepts audio uploads, improves audio quality with a denoising and normalization pipeline, sends the processed audio to Groq Whisper, and returns a typed JSON transcription response to the Next.js frontend.

Demo Video: [YouTube walkthrough](https://youtu.be/qco3Urdr8m8)

<a href="https://youtu.be/qco3Urdr8m8">
  <img src="https://img.youtube.com/vi/qco3Urdr8m8/hqdefault.jpg" alt="Watch the Medscribe AI demo video" width="720">
</a>

## Related Links

- Backend Space: [huggingface.co/spaces/zeyadcode/medscribe-backend](https://huggingface.co/spaces/zeyadcode/medscribe-backend)
- Production API: [zeyadcode-medscribe-backend.hf.space](https://zeyadcode-medscribe-backend.hf.space)
- Frontend App: [medscribe-ai-lilac.vercel.app/dashboard](https://medscribe-ai-lilac.vercel.app/dashboard)
- Frontend Repository: [zeyad-shaban/medscribe-ai-frontend](https://github.com/zeyad-shaban/medscribe-ai-frontend)
- Backend Repository: [zeyad-shaban/medscribe-ai-backend](https://github.com/zeyad-shaban/medscribe-ai-backend)
- Demo Video: [YouTube walkthrough](https://youtu.be/qco3Urdr8m8)

<a href="https://youtu.be/qco3Urdr8m8">
  <img src="https://img.youtube.com/vi/JsxUC4rlH0s/hqdefault.jpg" alt="Watch the Medscribe AI demo video" width="720">
</a>

## Screenshots

Add backend screenshots or diagrams to `docs/assets/` using these filenames. The README is already wired to render them once the files are added.

| API Docs | Hugging Face Space |
| --- | --- |
| ![FastAPI docs for transcription endpoint](docs/assets/api-docs.png) | ![Hugging Face Space deployment](docs/assets/hugging-face-space.png) |

| Audio Pipeline | CI/CD |
| --- | --- |
| ![Audio preprocessing and transcription pipeline](docs/assets/audio-pipeline.png) | ![GitHub Actions sync to Hugging Face Hub](docs/assets/ci-cd.png) |

## What It Does

- Exposes a FastAPI `/transcribe` endpoint for audio upload.
- Loads uploaded audio with `librosa`.
- Applies DeepFilterNet denoising before transcription.
- Normalizes RMS level and protects against clipping.
- Resamples audio to 16 kHz for the Groq transcription API.
- Sends cleaned audio to `whisper-large-v3-turbo` through Groq.
- Returns a Pydantic-validated JSON response with transcript text, filename, and duration.
- Runs as a Docker container on Hugging Face Spaces.
- Uses GitHub Actions to sync the repository to Hugging Face Hub after pushes to `main`.

## Engineering Highlights

- Production-style API boundary between the web app and transcription service.
- Audio preprocessing pipeline designed to improve transcription quality before calling the speech-to-text model.
- Containerized deployment with a Hugging Face-compatible Dockerfile and port configuration.
- Typed response schema with Pydantic for predictable frontend integration.
- CI/CD workflow that pushes backend updates from GitHub to Hugging Face Hub.
- Built as part of a solo full-stack AI project covering frontend, backend, AI orchestration, deployment, and infrastructure.

## Architecture

```mermaid
flowchart LR
    Frontend[Next.js Frontend] --> Upload[POST /transcribe]
    Upload --> FastAPI[FastAPI Service]
    FastAPI --> Decode[librosa Audio Decode]
    Decode --> Denoise[DeepFilterNet Denoising]
    Denoise --> Normalize[RMS and Peak Normalization]
    Normalize --> Resample[Resample to 16 kHz]
    Resample --> Groq[Groq Whisper large-v3-turbo]
    Groq --> Response[Pydantic TranscriptionResponse]
    Response --> Frontend
```

## Tech Stack

| Area | Technology |
| --- | --- |
| API framework | FastAPI |
| Runtime | Python 3.10 |
| Data validation | Pydantic |
| Audio loading/resampling | librosa, scipy |
| Audio denoising | DeepFilterNet |
| ML/audio runtime | PyTorch, torchaudio |
| Speech-to-text | Groq Whisper `whisper-large-v3-turbo` |
| Deployment | Docker on Hugging Face Spaces |
| CI/CD | GitHub Actions, `huggingface/hub-sync` |

## API Reference

### Health Check

```http
GET /
```

Response:

```json
{
  "status": "ok"
}
```

### Transcribe Audio

```http
POST /transcribe
Content-Type: multipart/form-data
```

Form field:

| Name | Type | Description |
| --- | --- | --- |
| `file` | audio file | Audio conversation file to clean and transcribe. |

Example response:

```json
{
  "transcript": "Patient reported...",
  "filename": "consultation.wav",
  "duration_seconds": 42.7
}
```

## Local Development

### Prerequisites

- Python 3.10
- ffmpeg
- Groq API key

### Setup

```bash
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
uvicorn app:app --reload --host 127.0.0.1 --port 8000
```

On macOS/Linux, activate the virtual environment with:

```bash
source .venv/bin/activate
```

Create `.env.local`:

```bash
GROQ_SECRET_KEY=
```

Open [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs) to test the FastAPI endpoint.

## Docker

Build and run locally:

```bash
docker build -t medscribe-ai-backend .
docker run --env-file .env.local -p 7860:7860 medscribe-ai-backend
```

The container listens on port `7860`, which is the expected port for Hugging Face Spaces Docker apps.

## CI/CD

The repository includes a GitHub Actions workflow at `.github/workflows/sync-to-hub.yml`.

On every push to `main`, the workflow:

1. Checks out the repository.
2. Enables Git LFS support.
3. Pushes the backend code to the Hugging Face Space using `huggingface/hub-sync`.

Required GitHub secret:

```bash
HF_TOKEN=
```

## Project Structure

```text
.
  app.py
  Dockerfile
  requirements.txt
  config/
    constants.py
    settings.py
  schemas/
    transcription.py
  utils/
    audio_cleaning.py
  notebooks/
    eda.ipynb
    deepfilternet_demo.ipynb
  .github/
    workflows/
      sync-to-hub.yml
```

## Roadmap

- Add speaker separation before or after transcription.
- Add automated tests before syncing to Hugging Face.
- Host more AI model logic directly on Hugging Face.
- Add automatic model switching and fallback behavior during API spikes.
- Expand observability around transcription latency and failure modes.

## Clinical Safety Scope

Medscribe AI is built as an AI-assisted documentation workflow. The backend provides transcription support for clinician review and should not be treated as a medical device or official medical record system by itself.