Space: Kaiyeee/Medical_Document_Summarizer

Kaiyeee committed on Apr 1, 2025

Commit a628757 (verified) · 1 parent: 31ac0f5

Update README.md

README.md +35 -4

@@ -1,8 +1,8 @@
 ---
 title: Medical Document Summarizer
-emoji: 🦀
-colorFrom: indigo
-colorTo: red
+emoji: 🔥
+colorFrom: yellow
+colorTo: green
 sdk: gradio
 sdk_version: 5.23.2
 app_file: app.py
@@ -11,4 +11,35 @@ license: mit
 short_description: Upload your files and get a brief summary!
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Medical Document Summarizer
+This project automatically extracts and summarizes key information from clinical trial documents (e.g., PDFs of research articles). The pipeline uses the BigBird-Pegasus model for long-form summarization and applies content filtering, text cleaning, and post-processing to produce concise bullet-point and paragraph summaries.
+## Features
+*Note*: Upload the medical document to the file directory before running the model.
+- **PDF Extraction:** Reads and filters PDF files to capture only pages with core content (e.g., Abstract, Methods, Results, Conclusions).
+- **Text Cleaning:** Removes noisy metadata, citations, and excess whitespace.
+- **Core Section Extraction:** Attempts to identify and extract important sections using regex; falls back to header removal when sections are not detected.
+- **Chunking & Summarization:** Splits the text into manageable chunks and uses the BigBird-Pegasus summarization model for each chunk.
+- **Post-Processing:** Formats the final summary into bullet points and neatly wraps it into a paragraph.
+- **Modular and Extensible:** Each step is modular, making it easy to adjust, extend, or integrate with other systems.
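The text-cleaning, chunking, and post-processing steps listed above could be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the function names (`clean_text`, `chunk_text`, `summarize_document`), the regexes, and the 400-word chunk size are all assumptions, and the summarizer is passed in as a plain callable so the sketch runs without loading a model.

```python
import re
from typing import Callable, List

# Hypothetical patterns: bracketed numeric citations such as [3], [1, 2], [4-6],
# and a naive sentence boundary (whitespace after ., ! or ?).
CITATION_RE = re.compile(r"\[\d+(?:\s*[,-]\s*\d+)*\]")
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def clean_text(text: str) -> str:
    """Drop bracketed numeric citations and collapse runs of whitespace."""
    text = CITATION_RE.sub("", text)
    text = re.sub(r"\s+", " ", text)
    # Remove the space a deleted citation can leave before punctuation.
    text = re.sub(r"\s+([.,;:])", r"\1", text)
    return text.strip()


def chunk_text(text: str, max_words: int = 400) -> List[str]:
    """Group whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in SENTENCE_END_RE.split(text):
        n = len(sentence.split())
        if n == 0:
            continue
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_document(
    text: str, summarize: Callable[[str], str], max_words: int = 400
) -> str:
    """Clean, chunk, summarize each chunk, and format the results as bullets."""
    chunks = chunk_text(clean_text(text), max_words)
    summaries = [summarize(chunk).strip() for chunk in chunks]
    return "\n".join(f"- {s}" for s in summaries if s)
```

In a real run, `summarize` would wrap the BigBird-Pegasus model; here any function that maps a chunk of text to a shorter string will do, which also makes each step easy to test on its own.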
+## Requirements
+- Python 3.7+
+- [spaCy](https://spacy.io/) with the `en_core_web_sm` model
+- [NLTK](https://www.nltk.org/) (with the `punkt` tokenizer)
+- [Transformers](https://huggingface.co/transformers/)
+- [PyMuPDF](https://pymupdf.readthedocs.io/en/latest/)
+- [BeautifulSoup4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
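A quick way to confirm that the packages listed above are installed is to look up their import names without importing them. The sketch below is an assumption-laden helper, not part of the project: `missing_packages` and `REQUIRED_MODULES` are hypothetical names, and the mapping relies on the usual import names (`fitz` for PyMuPDF, `bs4` for BeautifulSoup4).

```python
from importlib.util import find_spec
from typing import Dict, List

# Package name as listed in the requirements -> module name used in `import`.
REQUIRED_MODULES: Dict[str, str] = {
    "spaCy": "spacy",
    "NLTK": "nltk",
    "Transformers": "transformers",
    "PyMuPDF": "fitz",
    "BeautifulSoup4": "bs4",
}


def missing_packages(required: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the listed packages whose import module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

This only checks that the Python packages are present; it does not verify that the `en_core_web_sm` model or the NLTK `punkt` data has been downloaded.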
+## Installation
+1. **Clone the repository:**
+   ```bash
+   git clone https://github.com/yourusername/Medical_Doc_Summarization.git
+   cd Medical_Doc_Summarization
+   ```