---
title: Mammogram Analyzer
emoji: 🏥
colorFrom: pink
colorTo: red
sdk: docker
app_port: 7860
---
# Mammogram Inference Service
A FastAPI microservice that runs mammographic image analysis using the **SensiNet** dual-stream deep learning model. It accepts a public image URL or a direct file upload, runs Bayesian MC-Dropout inference, and returns a BI-RADS classification with confidence and malignancy probability scores.
This service is the AI backend for the [Blossom](../README.md) clinical radiology platform.
---
## Authorship and Attribution
### This service (the wrapper)
The FastAPI service, inference pipeline, training script, and Blossom integration code in this repository were written by the **Blossom team**. They are original works that integrate the SensiNet model.
### The AI model (SensiNet)
The neural network architecture (`app/architecture.py`) and pretrained weights (`weights/advanced_model_best.pth`) are the work of **Aredeksu** and the SensiNet-Mammography project.
> **Original model:** [Aredeksu/SensiNet-Mammography](https://huggingface.co/Aredeksu/SensiNet-Mammography) on Hugging Face
> **License:** Apache License 2.0
> **Trained on:** CBIS-DDSM mammography dataset
We are **integrators and licensees** of this model, not its authors or joint authors. Full credit for the model architecture, training methodology, and pretrained weights belongs to the original authors.
In compliance with the Apache 2.0 license:
- The original architecture is reproduced with attribution in `app/architecture.py`
- Modifications made by us: added the FastAPI wrapper, Bayesian MC-Dropout inference loop, BI-RADS mapping logic, and the training script for fine-tuning
- A copy of the Apache 2.0 license is included in `LICENSE`
### Training data
The model was trained on the **CBIS-DDSM** (Curated Breast Imaging Subset of DDSM) dataset, a publicly available mammography benchmark dataset from The Cancer Imaging Archive (TCIA).
> Lee RS, Gimenez F, Hoogi A, Miyake KK, Gorovoy M, Rubin DL. (2017). *A curated mammography data set for use in computer-aided detection and diagnosis research.* Scientific Data, 4, 170177. https://doi.org/10.1038/sdata.2017.177
---
## Architecture (SensiNet)
```
      Input Image (299×299 RGB)
                  │
             ┌────┴────┐
             ▼         ▼
         Xception   EfficientNet-B3
         (2048ch)      (1536ch)
             │         │
             ▼         ▼
         Proj→512   Proj→512
             └────┬────┘
                  ▼
           Concat (1024ch)
                  ▼
           CBAM Attention
         (channel + spatial)
                  ▼
GlobalAvgPool → Linear(1024→512) → BN → ReLU → Dropout(0.5) → Linear(512→1)
                  ▼
Sigmoid → malignancy probability → BI-RADS 1–5
```
Inference uses **Bayesian MC-Dropout** (10 stochastic forward passes) to estimate both the mean malignancy probability and prediction variance, which informs the confidence score.
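
The aggregation step of MC-Dropout can be illustrated without the real network. The sketch below is a toy that uses only the standard library: `toy_forward` is a hypothetical one-layer logistic model with dropout kept active at inference, not SensiNet, and the feature values, weights, and dropout rate are made up for illustration. It shows only how several stochastic passes are reduced to a mean and a variance; how the service turns that variance into its confidence score is not specified here and is not modeled. In a PyTorch model, the same effect is usually obtained by leaving the dropout layers in training mode while the rest of the network is in evaluation mode.

```python
import math
import random
import statistics


def toy_forward(features, weights, bias, drop_p, rng):
    """One stochastic pass: dropout stays active, then a sigmoid output."""
    keep = 1.0 - drop_p
    total = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:        # keep this input with probability `keep`
            total += (x / keep) * w    # inverted-dropout scaling
    return 1.0 / (1.0 + math.exp(-total))


def mc_dropout_predict(features, weights, bias, passes=10, drop_p=0.5, seed=0):
    """Return (mean, variance) of the probability over `passes` stochastic passes."""
    rng = random.Random(seed)
    probs = [toy_forward(features, weights, bias, drop_p, rng) for _ in range(passes)]
    return statistics.fmean(probs), statistics.pvariance(probs)


# Illustrative inputs only; these numbers have no clinical meaning.
mean_p, var_p = mc_dropout_predict([0.2, 1.5, -0.7, 0.9], [0.8, -0.4, 0.3, 1.1], bias=-0.5)
print(f"mean={mean_p:.3f} variance={var_p:.4f}")
```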
---
## Endpoints
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/health` | Service health check, model mode, version |
| `POST` | `/predict` | Multipart image upload → inference |
| `POST` | `/analyze` | JSON `{ image_url }` → download → inference |
### `/analyze` request
```json
{ "image_url": "https://your-storage.supabase.co/storage/v1/object/public/mammograms/..." }
```
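
A minimal client sketch for `/analyze`, using only the Python standard library. The base URL `http://localhost:8000` matches the local run command in this README; the helper name `build_analyze_request` and the example image URL are illustrative assumptions. The request is built but not sent, so the snippet runs without a live service.

```python
import json
import urllib.request


def build_analyze_request(base_url: str, image_url: str) -> urllib.request.Request:
    """Build a POST /analyze request carrying {"image_url": ...} as JSON."""
    body = json.dumps({"image_url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_analyze_request("http://localhost:8000", "https://example.com/mammogram.png")
print(req.get_method(), req.full_url)

# To send it against a running service:
# with urllib.request.urlopen(req, timeout=60) as resp:
#     result = json.load(resp)
```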
### Response shape
```json
{
"birads": 3,
"confidence": 0.82,
"malignancy_probability": 0.31,
"findings_text": "Model prediction: Benign (probability 31.0%). Probably benign appearance. Short-interval follow-up may be considered.",
"model_version": "sensinet-v1"
}
```
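
On the client side, the response can be parsed into a small typed record. This is a sketch, not part of the service: the `Prediction` class and its range checks are assumptions layered on the field names shown above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    birads: int
    confidence: float
    malignancy_probability: float
    findings_text: str
    model_version: str

    @classmethod
    def from_json(cls, raw: str) -> "Prediction":
        data = json.loads(raw)
        pred = cls(
            birads=int(data["birads"]),
            confidence=float(data["confidence"]),
            malignancy_probability=float(data["malignancy_probability"]),
            findings_text=str(data["findings_text"]),
            model_version=str(data["model_version"]),
        )
        # Assumed sanity checks, based on the documented BI-RADS 1-5 output.
        if not 1 <= pred.birads <= 5:
            raise ValueError(f"unexpected BI-RADS category: {pred.birads}")
        if not 0.0 <= pred.malignancy_probability <= 1.0:
            raise ValueError("malignancy_probability outside [0, 1]")
        return pred


sample = json.dumps({
    "birads": 3,
    "confidence": 0.82,
    "malignancy_probability": 0.31,
    "findings_text": "Probably benign appearance.",
    "model_version": "sensinet-v1",
})
pred = Prediction.from_json(sample)
print(pred.birads, pred.malignancy_probability, pred.model_version)
```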
BI-RADS mapping:
| Probability | BI-RADS | Interpretation |
|-------------|---------|---------------|
| < 10% | 1 | Negative |
| 10–24% | 2 | Benign |
| 25–49% | 3 | Probably benign |
| 50–74% | 4 | Suspicious |
| ≥ 75% | 5 | Highly suggestive of malignancy |
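
The mapping table translates directly into a threshold lookup. The table lists whole-percent ranges, so it does not say where a value such as 24.5% falls; the sketch below assumes each band is closed at its lower bound and open at its upper bound. The function name `birads_from_probability` is illustrative and may not match the service code.

```python
def birads_from_probability(p: float) -> int:
    """Map a malignancy probability in [0, 1] to a BI-RADS category (1-5)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {p}")
    # Lower bounds of categories 2..5 (assumed half-open bands).
    thresholds = (0.10, 0.25, 0.50, 0.75)
    # Each threshold reached moves the result up one category.
    return 1 + sum(p >= t for t in thresholds)


for p in (0.05, 0.10, 0.31, 0.50, 0.75):
    print(f"{p:.2f} -> BI-RADS {birads_from_probability(p)}")
```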
---
## Setup
```bash
cd mammogram-inference-service
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
### Model weights
Download the pretrained weights from [Aredeksu/SensiNet-Mammography](https://huggingface.co/Aredeksu/SensiNet-Mammography) on Hugging Face and place the file at:
```
weights/advanced_model_best.pth
```
If the weights file is absent, the service automatically falls back to **mock mode**: a deterministic predictor based on pixel statistics that returns consistent (but not clinically meaningful) results. This is useful for UI development without a GPU.
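
The fallback decision might look like the sketch below. It is an assumption about the logic, not the service's actual code: `resolve_mode` is a hypothetical helper that reads the `MODEL_MODE` and `MODEL_WEIGHTS` variables documented under Environment variables and returns `"mock"` when mock mode is forced or the weights file cannot be found.

```python
import os
from pathlib import Path

DEFAULT_WEIGHTS = "weights/advanced_model_best.pth"


def resolve_mode(env=None) -> str:
    """Return "real" or "mock" based on MODEL_MODE and the weights file."""
    env = os.environ if env is None else env
    if env.get("MODEL_MODE", "real").strip().lower() == "mock":
        return "mock"  # mock mode forced explicitly
    weights = Path(env.get("MODEL_WEIGHTS", DEFAULT_WEIGHTS))
    # Fall back to mock mode when the weights file is absent.
    return "real" if weights.is_file() else "mock"


print(resolve_mode({"MODEL_MODE": "mock"}))
print(resolve_mode({"MODEL_WEIGHTS": "/nonexistent/advanced_model_best.pth"}))
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in tests.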
### Environment variables
| Variable | Default | Description |
|----------|---------|-------------|
| `MODEL_MODE` | `real` | Set to `mock` to force mock mode |
| `MODEL_VERSION` | `sensinet-v1` | Version string returned in responses |
| `MODEL_WEIGHTS` | `weights/advanced_model_best.pth` | Path to weights file |
| `ALLOWED_IMAGE_HOSTS` | _(empty = allow all HTTPS)_ | Comma-separated allowlist of image hostnames (SSRF protection) |
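
The `ALLOWED_IMAGE_HOSTS` behavior described in the table can be sketched as a URL check run before any download. The helper names `parse_allowlist` and `is_allowed_image_url` are hypothetical, and the real service may validate differently. A hostname allowlist alone does not cover every SSRF vector (redirects, for example), so treat this as an illustration of the documented rule only.

```python
from urllib.parse import urlsplit


def parse_allowlist(raw: str) -> frozenset:
    """Split the comma-separated ALLOWED_IMAGE_HOSTS value into lowercase hostnames."""
    return frozenset(h.strip().lower() for h in raw.split(",") if h.strip())


def is_allowed_image_url(url: str, allowed_hosts: frozenset) -> bool:
    """Accept only HTTPS URLs; if an allowlist is set, the host must be on it."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    if not allowed_hosts:  # empty allowlist: any HTTPS host is accepted
        return True
    return parts.hostname.lower() in allowed_hosts


hosts = parse_allowlist("your-storage.supabase.co, cdn.example.org")
print(is_allowed_image_url("https://your-storage.supabase.co/a.png", hosts))
print(is_allowed_image_url("https://169.254.169.254/latest/meta-data", hosts))
```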
---
## Run
```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
- Swagger UI: http://localhost:8000/docs
- Health check: http://localhost:8000/health
In your Blossom `.env.local`:
```env
GCLOUD_MODEL_ENDPOINT=http://localhost:8000
```
---
## Training your own weights
If you have a local copy of the CBIS-DDSM dataset, you can fine-tune the model:
```bash
# Step 1: organise images into train/val folders
python prepare_data.py \
--images /path/to/raw/images \
--csv /path/to/labels.csv \
--output data
# Step 2: train (two-phase: frozen backbones → full fine-tune)
python train.py --data data --output weights/advanced_model_best.pth
```
The training script uses the same `AdvancedBreastCancerModel` architecture as the original SensiNet. Phase 1 trains only the projection layers and classifier head with frozen backbones (20 epochs). Phase 2 fine-tunes all layers at a lower learning rate (50 epochs). The best checkpoint is saved automatically.
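
The two-phase schedule comes down to toggling which parameters are trainable. The sketch below is not taken from `train.py`: `Phase` and `apply_phase` are hypothetical names, and `SimpleNamespace` objects stand in for `torch.nn.Parameter`, which exposes the same `requires_grad` attribute. The epoch counts come from the description above; the learning rates are not documented here, so none are set.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class Phase:
    name: str
    epochs: int
    train_backbones: bool


PHASES = (
    Phase("projection + head (backbones frozen)", 20, False),
    Phase("full fine-tune", 50, True),
)


def apply_phase(backbone_params, head_params, phase: Phase) -> int:
    """Set requires_grad for the phase; return the count of trainable parameter objects."""
    for p in backbone_params:
        p.requires_grad = phase.train_backbones
    for p in head_params:
        p.requires_grad = True
    return sum(p.requires_grad for p in (*backbone_params, *head_params))


# Stand-ins for parameter objects; the counts are arbitrary.
backbone = [SimpleNamespace(requires_grad=True) for _ in range(3)]
head = [SimpleNamespace(requires_grad=True) for _ in range(2)]
for phase in PHASES:
    trainable = apply_phase(backbone, head, phase)
    print(f"{phase.name}: {phase.epochs} epochs, {trainable} trainable")
```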
---
## Important notice
This service is intended for **research and development use only**. It has not been validated for clinical decision-making. Outputs must not be used to diagnose, treat, or manage patients without appropriate clinical oversight and regulatory approval. The BI-RADS scores produced are AI-generated estimates and do not replace radiologist interpretation.
---
## License
The **wrapper code** in this repository (FastAPI service, training scripts, integration layer) is original work by the Blossom team.
The **SensiNet model architecture and weights** are used under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). Copyright belongs to the original authors at [Aredeksu/SensiNet-Mammography](https://huggingface.co/Aredeksu/SensiNet-Mammography).