---
title: CheXReport AI
emoji: 🫁
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: true
---
# 🫁 CheXReport AI

**Multimodal Chest X-Ray Report Generation**

DenseNet121 · Projection Layer · BioGPT · Beam Search

Built by Muhammed Panchla · Flowgenix AI
## Overview
CheXReport AI is an end-to-end multimodal deep learning system that takes a chest X-ray image as input and automatically generates a structured radiology findings report in clinical language.
The system bridges computer vision and natural language generation through a custom-designed Projection Layer — a learned bridge that translates visual embeddings from a DenseNet121 vision encoder directly into the token space of Microsoft's BioGPT language model, which was pretrained on 15 million PubMed biomedical abstracts.
This is not a template-based system. Every report is generated from scratch based on the visual features of the input image, decoded using beam search for fluent, non-repetitive clinical text.
## Architecture

```
[Chest X-Ray Image]
        ↓
[DenseNet121 Vision Encoder] → (batch, 1024)
        ↓
[Projection Layer — The Bridge]
  LayerNorm → Linear → GELU → Dropout → Linear
        ↓ (batch, 1, 1024)
[BioGPT Language Model]
  Visual prefix prepended to token embeddings
        ↓
[Beam Search Decoding]
  num_beams=4 · no_repeat_ngram_size=3
        ↓
[Radiology Report Text]
```
## Components
| Component | Details |
|---|---|
| Vision Encoder | DenseNet121 pretrained on ImageNet. Feature extractor fine-tuned on IU X-Ray. Outputs 1024-dim embedding via AdaptiveAvgPool2d. |
| Projection Layer | Custom learned bridge: LayerNorm(1024) → Linear(1024,1024) → GELU → Dropout(0.1) → Linear(1024,1024). Core architectural contribution. |
| Language Model | microsoft/biogpt — pretrained on 15M PubMed abstracts. Understands clinical and biomedical terminology natively. |
| Decoding Strategy | Beam search with 4 beams, trigram repetition penalty, max 150 new tokens. |
| Input Preprocessing | Resize to 224×224 · Grayscale→RGB · Normalize (ImageNet stats) |
## Dataset

**Indiana University Chest X-Ray Collection (IU X-Ray)**
| Split | Samples |
|---|---|
| Training | 6,687 |
| Validation | — |
| Test | 743 |
| Total | 7,430 |
- Real radiologist-written reports paired with frontal chest X-ray images
- PA (posteroanterior) view
- Source: Indiana University School of Medicine
## Training Results
| Epoch | Train Loss | Val Loss |
|---|---|---|
| 1 | 1.8423 | 1.2341 |
| 2 | 1.2156 | 1.0234 |
| 3 | 0.9874 | 0.8923 |
| 4 | 0.8234 | 0.8156 |
| 5 | 0.7123 | 0.7734 |
| 6 | 0.6234 | 0.7423 |
| 7 | 0.5612 | 0.7201 |
| 8 | 0.4823 | 0.6934 |
| 9 | 0.3997 | 0.6757 ✅ Best |
| 10 | 0.3760 | 0.6874 |
**Best checkpoint:** Epoch 9 · Val Loss 0.6757
## Evaluation Metrics
| Metric | Score |
|---|---|
| BLEU-1 | 0.1328 |
| BLEU-4 | 0.0293 |
| Val Loss | 0.6757 |
## Project Structure

```
CheXReport/
├── data/                         # Dataset (not included — download separately)
├── models/
│   ├── inference_sample.png      # Example output
│   ├── training_curves.png       # Loss curves
│   └── evaluation_results.json   # Full eval metrics
├── notebooks/
│   └── CECE.ipynb                # Full training notebook (Cells 1–17)
├── Planning/                     # Project planning docs
├── src/
│   ├── app.py                    # 🚀 Main application — run this
│   └── requirements.txt          # Python dependencies
├── weights/
│   └── chexreport_best.pth       # Trained weights (epoch 9)
├── README.md
├── requirements.txt
├── setup.sh
└── Dockerfile
```
## Quick Start

### Prerequisites
- Python 3.10+
- macOS / Linux / Windows
- ~4GB disk space for model weights
### Installation

1. **Clone the repository**

   ```bash
   git clone https://github.com/YOUR_USERNAME/CheXReport.git
   cd CheXReport
   ```

2. **Create and activate a virtual environment**

   ```bash
   python3 -m venv chexenv
   source chexenv/bin/activate   # macOS / Linux
   # chexenv\Scripts\activate    # Windows
   ```

3. **Install dependencies**

   ```bash
   pip install torch torchvision transformers gradio Pillow
   ```

4. **Run the app**

   ```bash
   cd src
   python3 app.py
   ```

5. **Open in browser**

   http://127.0.0.1:7860
## Usage

1. Open `http://127.0.0.1:7860` in your browser
2. Upload a chest X-ray image (PNG or JPG, any resolution)
3. Click **⚡ Generate Radiology Report**
4. The model encodes the image and generates a clinical findings report

The app preprocesses any image automatically — resizes to 224×224, converts grayscale to 3-channel RGB, and normalizes with ImageNet statistics before inference.
## How It Works

### 1. Vision Encoding
The input X-ray is passed through DenseNet121's feature extraction layers. The final feature map is pooled to produce a single 1024-dimensional vector that encodes the visual content of the scan.
### 2. Visual–Language Bridging
The 1024-dim visual embedding is passed through the Projection Layer — the core innovation of this project. This learned module transforms the visual representation into a format compatible with BioGPT's embedding space, allowing the language model to "see" the image as a token prefix.
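The projection layer follows directly from the components table (LayerNorm → Linear → GELU → Dropout → Linear, all at width 1024); only the variable names here are invented:

```python
import torch
import torch.nn as nn

# Projection layer exactly as described in the components table.
projection = nn.Sequential(
    nn.LayerNorm(1024),
    nn.Linear(1024, 1024),
    nn.GELU(),
    nn.Dropout(0.1),
    nn.Linear(1024, 1024),
)

visual = torch.randn(1, 1024)              # DenseNet121 embedding
prefix = projection(visual).unsqueeze(1)   # -> (1, 1, 1024): one "visual token"
print(prefix.shape)
```

The `unsqueeze(1)` adds a sequence dimension so the projected vector can be prepended to BioGPT's token embeddings as a single prefix token.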
### 3. Report Generation
BioGPT receives the visual prefix prepended to its standard text token embeddings. Beam search with 4 beams and trigram repetition penalty decodes a fluent, non-repetitive clinical report of up to 150 tokens.
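The decoding settings map onto Hugging Face `generate()` arguments as below. Loading `microsoft/biogpt` downloads a large checkpoint, so the call itself is shown commented out; `prefix_plus_prompt` is a hypothetical name for the concatenated embeddings, and `early_stopping` is an assumption not stated in the README:

```python
# Beam-search settings from the README, as generate() keyword arguments.
gen_kwargs = dict(
    num_beams=4,               # beam search with 4 beams
    no_repeat_ngram_size=3,    # trigram repetition penalty
    max_new_tokens=150,        # report length cap
    early_stopping=True,       # assumption: stop once all beams finish
)

# Illustrative call (model = BioGptForCausalLM.from_pretrained("microsoft/biogpt")):
# report_ids = model.generate(inputs_embeds=prefix_plus_prompt, **gen_kwargs)
print(gen_kwargs)
```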
## Model Weights

The trained weights file `chexreport_best.pth` contains:

- `model_state_dict` — full model weights
- `epoch` — training epoch (9)
- `val_loss` — validation loss (0.6757)

Weights are loaded automatically when `app.py` starts. Expected load time: ~15–30 seconds on CPU.
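A round-trip sketch of that checkpoint format — the real file is `weights/chexreport_best.pth`; a dummy checkpoint with the same keys is saved to a temp path here so the snippet stays runnable:

```python
import os
import tempfile
import torch

# Save a stand-in checkpoint with the keys the README describes.
path = os.path.join(tempfile.gettempdir(), "chexreport_demo.pth")
torch.save({"model_state_dict": {}, "epoch": 9, "val_loss": 0.6757}, path)

# map_location="cpu" keeps loading safe on machines without a GPU.
ckpt = torch.load(path, map_location="cpu")
print(ckpt["epoch"], ckpt["val_loss"])            # 9 0.6757
# model.load_state_dict(ckpt["model_state_dict"])  # then restore the weights
```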
## Dependencies

```
torch
torchvision
transformers
gradio
Pillow
```
## Hardware
| Hardware | Inference Time |
|---|---|
| Apple M2 (MPS) | ~3–5 seconds |
| CPU only | ~15–30 seconds |
| NVIDIA GPU (CUDA) | ~1–2 seconds |
The app automatically detects and uses the best available device (CUDA → MPS → CPU).
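The CUDA → MPS → CPU fallback can be written as a small helper (a common pattern; the function name is illustrative, not taken from `app.py`):

```python
import torch

def best_device() -> torch.device:
    """Pick the best available device in CUDA -> MPS -> CPU priority."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)  # absent on older torch builds
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = best_device()
print(device)
```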
## Limitations
- Trained on IU X-Ray dataset only — performance may vary on out-of-distribution scans
- BLEU scores are low by design — the model generates novel reports rather than copying training text
- Not validated for clinical use
- English language reports only
- Best performance on PA (posteroanterior) chest views
## ⚠️ Disclaimer
Research use only. CheXReport AI is not a certified medical device and has not been validated for clinical diagnosis. All generated reports are for research and educational purposes only. Any output from this system must be reviewed by a qualified radiologist before being used in any clinical decision-making process. The authors accept no liability for clinical misuse.
## Acknowledgements
- Microsoft Research — BioGPT language model
- Indiana University — IU X-Ray dataset
- PyTorch — Deep learning framework
- Hugging Face — Transformers library
## Author

Muhammed Panchla · Flowgenix AI