metadata
title: Medical Document Summarizer
emoji: 🔥
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.23.2
app_file: app.py
pinned: false
license: mit
short_description: Upload your files and get a brief summary!

Medical Document Summarizer

This project automatically extracts and summarizes key information from clinical trial documents (e.g., PDFs of research articles). The pipeline uses the BigBird-Pegasus model for long-document summarization, combined with content filtering, text cleaning, and post-processing, to produce concise bullet-point and paragraph summaries.

Features

Note: Upload the medical document to the file directory before running the model.

  • PDF Extraction: Reads and filters PDF files to capture only pages with core content (e.g., Abstract, Methods, Results, Conclusions).
  • Text Cleaning: Removes noisy metadata, citations, and excess whitespace.
  • Core Section Extraction: Attempts to identify and extract important sections using regex; falls back to header removal when sections are not detected.
  • Chunking & Summarization: Splits the text into manageable chunks and uses the BigBird-Pegasus summarization model for each chunk.
  • Post-Processing: Formats the final summary as bullet points and as a wrapped paragraph.
  • Modular and Extensible: Each step is modular, making it easy to adjust, extend, or integrate with other systems.
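
The cleaning, section extraction, chunking, and post-processing steps listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: every function name, the `SECTION_NAMES` tuple, the citation pattern, and the word-based chunk limit are assumptions made for the example. The summarizer is passed in as a plain callable so the sketch runs without downloading a model, and when no section headers are found it simply uses the whole text, which is simpler than the header-removal fallback described above.

```python
import re
import textwrap
from typing import Callable, Dict, List

# Hypothetical header names, taken from the examples in the feature list.
SECTION_NAMES = ("Abstract", "Methods", "Results", "Conclusions")

# A header is assumed to be a line holding only a section name (optional colon).
HEADER_RE = re.compile(
    r"^[ \t]*(" + "|".join(SECTION_NAMES) + r")[ \t]*:?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_core_sections(text: str) -> Dict[str, str]:
    """Map each detected section name to its body; empty dict if none found."""
    matches = list(HEADER_RE.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1).title()] = text[match.end():end].strip()
    return sections


def clean_text(text: str) -> str:
    """Drop bracketed numeric citations like [3] or [1, 2] and extra whitespace."""
    text = re.sub(r"\[\d+(?:\s*[,-]\s*\d+)*\]", "", text)
    text = re.sub(r"\s+([.,;:])", r"\1", text)  # no space before punctuation
    return re.sub(r"\s+", " ", text).strip()


def chunk_text(text: str, max_words: int = 400) -> List[str]:
    """Pack whole sentences into chunks of at most max_words words each.

    A single sentence longer than max_words still becomes its own chunk.
    """
    chunks, current, count = [], [], 0
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not sentence:
            continue
        n_words = len(sentence.split())
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


def format_summary(chunk_summaries: List[str], width: int = 80) -> Dict[str, str]:
    """Return the summaries both as a bullet list and as one wrapped paragraph."""
    parts = [s.strip() for s in chunk_summaries if s.strip()]
    return {
        "bullets": "\n".join(f"- {p}" for p in parts),
        "paragraph": textwrap.fill(" ".join(parts), width=width),
    }


def summarize_document(
    text: str, summarize: Callable[[str], str], max_words: int = 400
) -> Dict[str, str]:
    """Run the steps in order; `summarize` stands in for the model call."""
    sections = extract_core_sections(text)
    core = "\n".join(sections.values()) if sections else text
    chunks = chunk_text(clean_text(core), max_words)
    return format_summary([summarize(chunk) for chunk in chunks])
```

To try the flow without a model, pass any function that maps a string to a string, for example `summarize_document(text, lambda chunk: chunk.split(". ")[0])`. In a real run the callable would wrap the BigBird-Pegasus model, and the chunk limit would be chosen to fit that model's input length.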

Requirements

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/Medical_Doc_Summarization.git
    cd Medical_Doc_Summarization