---
title: AMDRisk
emoji: 📉
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 6.14.0
python_version: '3.13'
app_file: app.py
pinned: false
license: mit
short_description: Predicting risk of late AMD using Deep Learning
---
# AMD Risk Prediction
PyTorch reimplementation of the DeepSeeNet-based AMD progression risk framework from Peng et al., *npj Digital Medicine* 2020, **“Predicting risk of late age-related macular degeneration using deep learning.”**
- Extracts DeepSeeNet hidden features from drusen and pigment abnormality models.
- Fits Cox proportional hazards models using image-derived features plus age and smoking status.
## Feature Extraction
- Input: paired baseline CFPs
  - `LE_PATHNAME`
  - `RE_PATHNAME`
- Models used:
  - `deepseenet/weights/drus.pt`
  - `deepseenet/weights/pig.pt`
- Image preprocessing:
  - RGB conversion
  - validation transform from `deepseenet/augmentations.py`
  - default input size: `1024 × 1024`
- Feature source:
  - penultimate layer of each DeepSeeNet classifier
  - final linear layer input captured by forward hook
- Feature layout:
```text
LE_DRUS_000 ... LE_DRUS_127
RE_DRUS_000 ... RE_DRUS_127
LE_PIG_000 ... LE_PIG_127
RE_PIG_000 ... RE_PIG_127
```
- Total feature dimension:
```text
128 × 2 models × 2 eyes = 512 features / patient
```
- Output:
```text
data/areds1_deepseenet_features.npz
```
- Stored arrays:
  - `features`: `(N, 512)`
  - `patids`: `(N,)`
  - `feature_names`: `(512,)`
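
The hook-based extraction described above can be sketched as follows. This is a minimal illustration, not the repository's code: `TinyClassifier`, its `fc` attribute, and `extract_hidden` are hypothetical stand-ins, and random tensors replace the preprocessed fundus images. It only shows how a forward hook on the final linear layer yields a 128-dimensional vector per eye and per model, and how the four blocks concatenate into the 512-feature layout.

```python
import numpy as np
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for a DeepSeeNet classifier (not the real architecture)."""

    def __init__(self, hidden_dim: int = 128, n_classes: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(8, hidden_dim),
            nn.ReLU(),
        )
        self.fc = nn.Linear(hidden_dim, n_classes)  # final linear layer

    def forward(self, x):
        return self.fc(self.backbone(x))


def extract_hidden(model: nn.Module, final_layer: nn.Module, image: torch.Tensor) -> np.ndarray:
    """Return the input to `final_layer` for one image of shape (3, H, W)."""
    captured = {}

    def hook(module, inputs, output):
        # `inputs` is the tuple of positional arguments passed to the layer.
        captured["x"] = inputs[0].detach().cpu()

    handle = final_layer.register_forward_hook(hook)
    try:
        model.eval()
        with torch.no_grad():
            model(image.unsqueeze(0))
    finally:
        handle.remove()  # always detach the hook, even if the forward pass fails
    return captured["x"].squeeze(0).numpy()


torch.manual_seed(0)
models = {"DRUS": TinyClassifier(), "PIG": TinyClassifier()}
# Small random tensors stand in for preprocessed left/right eye images.
eyes = {"LE": torch.rand(3, 64, 64), "RE": torch.rand(3, 64, 64)}

blocks, feature_names = [], []
for model_name, model in models.items():   # DRUS blocks first, then PIG
    for eye_name, image in eyes.items():   # LE before RE within each model
        vec = extract_hidden(model, model.fc, image)
        blocks.append(vec)
        feature_names += [f"{eye_name}_{model_name}_{i:03d}" for i in range(vec.size)]

patient_features = np.concatenate(blocks)
print(patient_features.shape, feature_names[0], feature_names[-1])
```

Stacking one such vector per patient gives the `(N, 512)` `features` array; `np.savez` with `features`, `patids`, and `feature_names` keyword arguments would produce an archive with the stored arrays listed above.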
## Cox Model Training
- Survival labels from endpoint-specific JSON files:
  - `Status_late_amd`
  - `Status_anyga`
  - `Status_nv`
- Time-to-event column:
  - `Survival_in_years`
- Predefined fold split:
  - train: folds `3, 4, 5, 6, 7, 8, 9`
  - validation: fold `2`
  - test: folds `0, 1`
- Stage 1 baseline:
  - structured grading features only
```text
LE_DRUS
RE_DRUS
LE_PIG
RE_PIG
age
smkever
```
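
As a rough sketch of the predefined fold split listed above, the snippet below partitions patient records by fold number. The `fold` key, the `split_by_fold` helper, and the toy records are assumptions made for illustration; the actual JSON schema may name or store the fold differently.

```python
TRAIN_FOLDS = {3, 4, 5, 6, 7, 8, 9}
VAL_FOLDS = {2}
TEST_FOLDS = {0, 1}


def split_by_fold(records, fold_key="fold"):
    """Partition patient records into train/val/test by their predefined fold."""
    splits = {"train": [], "val": [], "test": []}
    for rec in records:
        fold = rec[fold_key]
        if fold in TRAIN_FOLDS:
            splits["train"].append(rec)
        elif fold in VAL_FOLDS:
            splits["val"].append(rec)
        elif fold in TEST_FOLDS:
            splits["test"].append(rec)
        else:
            raise ValueError(f"unexpected fold {fold!r}")
    return splits


# Toy records with made-up values: one patient per fold.
records = [
    {"patid": i, "fold": i, "Survival_in_years": 5.0 + i, "Status_late_amd": i % 2}
    for i in range(10)
]
splits = split_by_fold(records)
print({name: len(rows) for name, rows in splits.items()})
# → {'train': 7, 'val': 1, 'test': 2}
```

Splitting at the patient level keeps both eyes of a patient in the same split, which matters because each feature row combines left-eye and right-eye features.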
- Stage 2 DeepSeeNet features:
  - selected hidden features from 512-dimensional feature vector
  - plus `age`
  - plus `smkever`
- Preprocessing:
  - low-variance feature filtering
  - train-only `StandardScaler`
  - same scaler applied to validation/test
- Cox model:
  - `lifelines.CoxPHFitter`
  - L2 penalization
  - default penalizer: `0.01`
- Feature-selection tweaks:
  - global top-k selection
    - rank features by univariate train-set concordance
    - examples: `--top-k 8`, `--top-k 16`
  - block-balanced top-k selection
    - select top-k features separately from each block:
```text
LE_DRUS_*
RE_DRUS_*
LE_PIG_*
RE_PIG_*
```
- Block-balanced example:
```text
--top-k-per-block 4
4 LE_DRUS features
4 RE_DRUS features
4 LE_PIG features
4 RE_PIG features
+ age
+ smkever
= 18 total features
```
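
Block-balanced selection can be illustrated with a small, dependency-free sketch. The `harrell_c` function below is a plain-Python stand-in for a concordance index (the repository may well use `lifelines.utils.concordance_index` or another routine), and ranking by `max(c, 1 - c)` is an assumption so that features informative in either direction score highly. All names and the toy data are hypothetical.

```python
from itertools import combinations

BLOCKS = ("LE_DRUS_", "RE_DRUS_", "LE_PIG_", "RE_PIG_")


def harrell_c(times, events, scores):
    """Fraction of comparable pairs in which the subject with the earlier
    observed event also has the higher score (score ties count as 0.5)."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped for simplicity
        first, later = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject is censored: pair is not comparable
        comparable += 1
        if scores[first] > scores[later]:
            concordant += 1.0
        elif scores[first] == scores[later]:
            concordant += 0.5
    return concordant / comparable if comparable else 0.5


def univariate_strength(values, times, events):
    """Direction-agnostic ranking score for a single feature."""
    c = harrell_c(times, events, values)
    return max(c, 1.0 - c)


def select_top_k_per_block(columns, times, events, k):
    """Keep the k strongest features from each of the four blocks."""
    selected = []
    for prefix in BLOCKS:
        names = [n for n in columns if n.startswith(prefix)]
        ranked = sorted(
            names,
            key=lambda n: univariate_strength(columns[n], times, events),
            reverse=True,
        )
        selected.extend(ranked[:k])
    return selected


# Toy training data: six patients, three candidate features per block.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 0, 1, 1, 0]
columns = {}
for prefix in BLOCKS:
    columns[prefix + "000"] = [0, 0, 0, 0, 0, 0]  # constant feature, c = 0.5
    columns[prefix + "001"] = [6, 5, 4, 3, 2, 1]  # higher value, earlier event
    columns[prefix + "002"] = [2, 1, 2, 1, 2, 1]  # weak signal

selected = select_top_k_per_block(columns, times, events, k=1)
final_columns = selected + ["age", "smkever"]
print(len(final_columns), final_columns)
```

With `k=1` this keeps one feature per block plus the two covariates; with `k=4` the same logic gives the 18-column layout shown above.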
- Rationale:
  - avoids one highly correlated feature block dominating global top-k
  - improves stability of Cox fitting
  - closer in spirit to grouped/correlation-aware feature selection