metadata

title: Medscribe AI Backend
emoji: 🩺
colorFrom: blue
colorTo: pink
sdk: docker
app_file: app.py
pinned: false

Medscribe AI Backend

This repository contains the audio transcription backend for Medscribe AI, an AI medical scribe that records doctor-patient conversations, transcribes them, and generates structured follow-up summaries.

The backend is a Dockerized FastAPI service hosted on Hugging Face Spaces. It accepts audio uploads, improves audio quality with a denoising and normalization pipeline, sends the processed audio to Groq Whisper, and returns a typed JSON transcription response to the Next.js frontend.

Demo Video: YouTube walkthrough

Screenshots

Add backend screenshots or diagrams to docs/assets/ using these filenames. The README is already wired to render them once the files are added.

Image placeholders (files not yet added): API Docs, Hugging Face Space, Audio Pipeline, CI/CD

What It Does

Exposes a FastAPI /transcribe endpoint for audio upload.
Loads uploaded audio with librosa.
Applies DeepFilterNet denoising before transcription.
Normalizes RMS level and protects against clipping.
Resamples audio to 16 kHz for the Groq transcription API.
Sends cleaned audio to whisper-large-v3-turbo through Groq.
Returns a Pydantic-validated JSON response with transcript text, filename, and duration.
Runs as a Docker container on Hugging Face Spaces.
Uses GitHub Actions to sync the repository to Hugging Face Hub after pushes to main.
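The normalization step in the list above can be sketched in a few lines. This is a minimal illustration, not the code in utils/audio_cleaning.py: the function names, the target RMS of 0.1, and the 0.99 peak ceiling are assumptions chosen for the example, and it works on plain Python floats in the range [-1.0, 1.0] rather than the arrays a real pipeline would use.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a mono signal; 0.0 for an empty signal."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_rms(
    samples: Sequence[float],
    target_rms: float = 0.1,     # assumed target level, not taken from the repo
    peak_ceiling: float = 0.99,  # assumed headroom just below full scale (1.0)
) -> List[float]:
    """Scale a signal toward target_rms, backing off if the peak would clip."""
    current = rms(samples)
    if current == 0.0:
        return list(samples)  # silence: there is nothing to scale
    gain = target_rms / current
    peak = max(abs(s) for s in samples)
    # Clipping protection: cap the gain so the loudest sample stays at the ceiling.
    if peak * gain > peak_ceiling:
        gain = peak_ceiling / peak
    return [s * gain for s in samples]


# One second of a quiet 440 Hz tone sampled at 16 kHz.
quiet = [0.01 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
leveled = normalize_rms(quiet)
```

Capping the gain instead of hard-clipping keeps the waveform shape intact, at the cost of ending up below the target level when a recording contains a single loud transient.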

Engineering Highlights

Production-style API boundary between the web app and transcription service.
Audio preprocessing pipeline designed to improve transcription quality before calling the speech-to-text model.
Containerized deployment with a Hugging Face-compatible Dockerfile and port configuration.
Typed response schema with Pydantic for predictable frontend integration.
CI/CD workflow that pushes backend updates from GitHub to Hugging Face Hub.
Built as part of a solo full-stack AI project covering frontend, backend, AI orchestration, deployment, and infrastructure.

Architecture

flowchart LR
    Frontend[Next.js Frontend] --> Upload[POST /transcribe]
    Upload --> FastAPI[FastAPI Service]
    FastAPI --> Decode[librosa Audio Decode]
    Decode --> Denoise[DeepFilterNet Denoising]
    Denoise --> Normalize[RMS and Peak Normalization]
    Normalize --> Resample[Resample to 16 kHz]
    Resample --> Groq[Groq Whisper large-v3-turbo]
    Groq --> Response[Pydantic TranscriptionResponse]
    Response --> Frontend

Tech Stack

Area	Technology
API framework	FastAPI
Runtime	Python 3.10
Data validation	Pydantic
Audio loading/resampling	librosa, scipy
Audio denoising	DeepFilterNet
ML/audio runtime	PyTorch, torchaudio
Speech-to-text	Groq Whisper `whisper-large-v3-turbo`
Deployment	Docker on Hugging Face Spaces
CI/CD	GitHub Actions, `huggingface/hub-sync`

API Reference

Health Check

GET /

Response:

{
  "status": "ok"
}

Transcribe Audio

POST /transcribe
Content-Type: multipart/form-data

Form field:

Name	Type	Description
`file`	audio file	Audio conversation file to clean and transcribe.

Example response:

{
  "transcript": "Patient reported...",
  "filename": "consultation.wav",
  "duration_seconds": 42.7
}
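A response model matching the example above might look like the following sketch. The class name TranscriptionResponse comes from the architecture diagram and the field names from the example response; the field types, and whether schemas/transcription.py defines anything beyond these three fields, are assumptions.

```python
from pydantic import BaseModel


class TranscriptionResponse(BaseModel):
    """Assumed response shape, inferred from the example response."""

    transcript: str          # text returned by the speech-to-text model
    filename: str            # name of the uploaded audio file
    duration_seconds: float  # length of the audio in seconds


# Validate a payload shaped like the example response.
payload = {
    "transcript": "Patient reported...",
    "filename": "consultation.wav",
    "duration_seconds": 42.7,
}
response = TranscriptionResponse(**payload)
```

Constructing the model with a missing field raises a Pydantic validation error, so a malformed result fails inside the service instead of reaching the frontend.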

Local Development

Prerequisites

Python 3.10
ffmpeg
Groq API key

Setup

python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
uvicorn app:app --reload --host 127.0.0.1 --port 8000

The activation command above is for Windows. On macOS/Linux, activate the virtual environment with:

source .venv/bin/activate

Create .env.local and set your Groq API key:

GROQ_SECRET_KEY=
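One way the service could read this key at startup is sketched below using only the standard library. The helper names load_env_file and get_groq_key are hypothetical, and the repository's config/settings.py may load the value differently (for example through a settings library).

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env.local") -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_groq_key(path: str = ".env.local") -> str:
    """Return GROQ_SECRET_KEY, preferring the process environment over the file."""
    key = os.environ.get("GROQ_SECRET_KEY") or load_env_file(path).get(
        "GROQ_SECRET_KEY", ""
    )
    if not key:
        raise RuntimeError("GROQ_SECRET_KEY is not set")
    return key
```

Checking the process environment first lets the same code work when the key is injected by docker run --env-file or by a hosted secret, and raising on an empty value surfaces a missing key at startup instead of on the first transcription request.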

Open http://127.0.0.1:8000/docs to test the endpoints from FastAPI's interactive Swagger UI.
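To exercise the endpoint from a script instead of the browser, a standard-library client could look like this sketch. The helper names (build_multipart, transcribe, Transcription) are hypothetical; the form field name file, the /transcribe path, and the response keys are taken from the API Reference above. It assumes the server is running locally on port 8000.

```python
import json
import mimetypes
import uuid
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple
from urllib import request


@dataclass
class Transcription:
    """Client-side mirror of the documented response fields."""

    transcript: str
    filename: str
    duration_seconds: float


def build_multipart(field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path: str, base_url: str = "http://127.0.0.1:8000") -> Transcription:
    """POST an audio file to /transcribe and parse the JSON reply."""
    path = Path(audio_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    req = request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with request.urlopen(req) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return Transcription(
        transcript=payload["transcript"],
        filename=payload["filename"],
        duration_seconds=float(payload["duration_seconds"]),
    )
```

With the server running, transcribe("consultation.wav") would return a Transcription whose fields mirror the example response. Against the Docker container, pass base_url="http://127.0.0.1:7860" instead.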

Docker

Build and run locally:

docker build -t medscribe-ai-backend .
docker run --env-file .env.local -p 7860:7860 medscribe-ai-backend

The container listens on port 7860, the default port that Hugging Face Spaces expects for Docker apps.

CI/CD

The repository includes a GitHub Actions workflow at .github/workflows/sync-to-hub.yml.

On every push to main, the workflow:

Checks out the repository.
Enables Git LFS support.
Pushes the backend code to the Hugging Face Space using huggingface/hub-sync.

Required GitHub secret:

HF_TOKEN=

Project Structure

.
  app.py
  Dockerfile
  requirements.txt
  config/
    constants.py
    settings.py
  schemas/
    transcription.py
  utils/
    audio_cleaning.py
  notebooks/
    eda.ipynb
    deepfilternet_demo.ipynb
  .github/
    workflows/
      sync-to-hub.yml

Roadmap

Add speaker separation before or after transcription.
Add automated tests before syncing to Hugging Face.
Host more AI model logic directly on Hugging Face.
Add automatic model switching and fallback behavior during API spikes.
Expand observability around transcription latency and failure modes.

Clinical Safety Scope

Medscribe AI is built as an AI-assisted documentation workflow. The backend produces transcripts for clinician review; on its own, it should not be treated as a medical device or an official medical record system.