---
title: AMDRisk
emoji: 📉
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 6.14.0
python_version: '3.13'
app_file: app.py
pinned: false
license: mit
short_description: Predicting risk of late AMD using Deep Learning
---

# AMD Risk Prediction

PyTorch reimplementation of the DeepSeeNet-based AMD progression risk framework from Peng et al., *npj Digital Medicine* 2020, **“Predicting risk of late age-related macular degeneration using deep learning.”**

- Extracts DeepSeeNet hidden features from drusen and pigment abnormality models.
- Fits Cox proportional hazards models using image-derived features plus age and smoking status.

## Feature Extraction

- Input: paired baseline CFPs
  - `LE_PATHNAME`
  - `RE_PATHNAME`
- Models used:
  - `deepseenet/weights/drus.pt`
  - `deepseenet/weights/pig.pt`
- Image preprocessing:
  - RGB conversion
  - validation transform from `deepseenet/augmentations.py`
  - default input size: `1024 × 1024`
- Feature source:
  - penultimate layer of each DeepSeeNet classifier
  - final linear layer input captured by forward hook
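
The hook-based capture described above can be sketched as follows. `TinyClassifier` is a hypothetical stand-in, not the DeepSeeNet architecture, and the attribute name `fc` is an assumption. Only the pattern (a forward hook on the final linear layer that records that layer's input) reflects the description above.

```python
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for a classifier with a 128-d penultimate layer."""

    def __init__(self, hidden_dim: int = 128, num_classes: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(8, hidden_dim),
            nn.ReLU(),
        )
        self.fc = nn.Linear(hidden_dim, num_classes)  # final linear layer

    def forward(self, x):
        return self.fc(self.backbone(x))


def extract_penultimate(model, final_layer, images):
    """Run a forward pass and return the tensor fed into `final_layer`."""
    captured = []

    def hook(module, inputs, output):
        # `inputs` is a tuple of positional arguments passed to the layer.
        captured.append(inputs[0].detach().cpu())

    handle = final_layer.register_forward_hook(hook)
    try:
        model.eval()
        with torch.no_grad():
            model(images)
    finally:
        handle.remove()  # always detach the hook, even on error
    return captured[0]


model = TinyClassifier()
batch = torch.randn(2, 3, 64, 64)  # small toy images, not 1024 x 1024
features = extract_penultimate(model, model.fc, batch)
print(features.shape)  # torch.Size([2, 128])
```

Removing the hook in a `finally` block keeps repeated extraction calls from stacking hooks on the same layer.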

- Feature layout:
```text
LE_DRUS_000 ... LE_DRUS_127
RE_DRUS_000 ... RE_DRUS_127
LE_PIG_000  ... LE_PIG_127
RE_PIG_000  ... RE_PIG_127
```

- Total feature dimension:
```text
128 × 2 models × 2 eyes = 512 features / patient
```
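
As an illustration, the 512 column names can be generated from the block order and the three-digit index format shown in the layout above. The helper name `build_feature_names` is hypothetical.

```python
HIDDEN_DIM = 128
# Block order follows the feature layout listed above.
BLOCKS = ["LE_DRUS", "RE_DRUS", "LE_PIG", "RE_PIG"]


def build_feature_names(blocks=BLOCKS, hidden_dim=HIDDEN_DIM):
    """Return names such as LE_DRUS_000 ... RE_PIG_127, block by block."""
    return [f"{block}_{i:03d}" for block in blocks for i in range(hidden_dim)]


feature_names = build_feature_names()
print(len(feature_names))  # 512
print(feature_names[0], feature_names[127], feature_names[128], feature_names[-1])
# LE_DRUS_000 LE_DRUS_127 RE_DRUS_000 RE_PIG_127
```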

- Output:
```text
data/areds1_deepseenet_features.npz
```

- Stored arrays:
  - `features`: `(N, 512)`
  - `patids`: `(N,)`
  - `feature_names`: `(512,)`
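
A minimal round-trip sketch of an archive with these three arrays, using toy data and a temporary path instead of `data/areds1_deepseenet_features.npz`. The patient IDs and the zero-filled feature matrix are placeholders.

```python
import os
import tempfile

import numpy as np

BLOCKS = ["LE_DRUS", "RE_DRUS", "LE_PIG", "RE_PIG"]
feature_names = np.array([f"{b}_{i:03d}" for b in BLOCKS for i in range(128)])

# Toy stand-ins: 3 hypothetical patients with all-zero features.
features = np.zeros((3, 512), dtype=np.float32)
patids = np.array([1001, 1002, 1003])

path = os.path.join(tempfile.mkdtemp(), "toy_features.npz")
np.savez(path, features=features, patids=patids, feature_names=feature_names)

# Read the arrays back and check that the shapes match the layout above.
with np.load(path) as data:
    loaded = {key: data[key] for key in data.files}

print(loaded["features"].shape)       # (3, 512)
print(loaded["patids"].shape)         # (3,)
print(loaded["feature_names"].shape)  # (512,)
```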


## Cox Model Training

- Survival labels from endpoint-specific JSON files:
  - `Status_late_amd`
  - `Status_anyga`
  - `Status_nv`

- Time-to-event column:
  - `Survival_in_years`

- Predefined fold split:
  - train: folds `3, 4, 5, 6, 7, 8, 9`
  - validation: fold `2`
  - test: folds `0, 1`
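
The fold assignment above could be applied as in the sketch below. The record layout and the `fold` key are assumptions made for illustration; the actual JSON schema may differ.

```python
TRAIN_FOLDS = frozenset({3, 4, 5, 6, 7, 8, 9})
VAL_FOLDS = frozenset({2})
TEST_FOLDS = frozenset({0, 1})


def split_by_fold(records, fold_key="fold"):
    """Partition records into train/val/test using the predefined folds."""
    splits = {"train": [], "val": [], "test": []}
    for record in records:
        fold = record[fold_key]
        if fold in TRAIN_FOLDS:
            splits["train"].append(record)
        elif fold in VAL_FOLDS:
            splits["val"].append(record)
        elif fold in TEST_FOLDS:
            splits["test"].append(record)
        else:
            raise ValueError(f"unexpected fold: {fold!r}")
    return splits


# Toy records: 20 hypothetical patients, two per fold.
records = [{"patid": 1000 + i, "fold": i % 10} for i in range(20)]
splits = split_by_fold(records)
print({name: len(rows) for name, rows in splits.items()})
# {'train': 14, 'val': 2, 'test': 4}
```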

- Stage 1 baseline:
  - structured grading features only

```text
LE_DRUS
RE_DRUS
LE_PIG
RE_PIG
age
smkever
```

- Stage 2 DeepSeeNet features:
  - selected hidden features from the 512-dimensional feature vector
  - plus `age`
  - plus `smkever`

- Preprocessing:
  - low-variance feature filtering
  - train-only `StandardScaler`
  - same scaler applied to validation/test
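
A sketch of the preprocessing order, assuming scikit-learn's `VarianceThreshold` for the low-variance filter. The filter class and its threshold are assumptions; only `StandardScaler` is named above. Both steps are fit on the training split and then reused unchanged on validation and test.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 6))
X_val = rng.normal(size=(10, 6))
X_test = rng.normal(size=(10, 6))
X_train[:, 2] = 1.0  # a constant column that the filter should drop

var_filter = VarianceThreshold(threshold=1e-8)  # hypothetical threshold
scaler = StandardScaler()

# Fit on the training split only, so no validation/test statistics leak in.
X_train_s = scaler.fit_transform(var_filter.fit_transform(X_train))
X_val_s = scaler.transform(var_filter.transform(X_val))
X_test_s = scaler.transform(var_filter.transform(X_test))

print(X_train_s.shape, X_val_s.shape, X_test_s.shape)
# (50, 5) (10, 5) (10, 5)
```

Because the scaler statistics come from the training split, the validation and test columns are generally not exactly zero-mean after scaling, which is expected.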

- Cox model:
  - `lifelines.CoxPHFitter`
  - L2 penalization
  - default penalizer: `0.01`

- Feature-selection options:
  - global top-k selection
    - rank features by univariate train-set concordance
    - examples: `--top-k 8`, `--top-k 16`
  - block-balanced top-k selection
    - select the top-k features separately from each block:

```text
LE_DRUS_*
RE_DRUS_*
LE_PIG_*
RE_PIG_*
```

- Block-balanced example:

```text
--top-k-per-block 4

4 LE_DRUS features
4 RE_DRUS features
4 LE_PIG features
4 RE_PIG features
+ age
+ smkever
= 18 total features
```

- Rationale:
  - prevents a single highly correlated feature block from dominating the global top-k
  - improves stability of Cox fitting
  - closer in spirit to grouped/correlation-aware feature selection
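
The two selection modes can be contrasted with a small sketch. The scores below are made-up numbers standing in for univariate train-set concordance, and the function names are hypothetical. Only the selection logic is illustrated, not how the scores are computed.

```python
from collections import defaultdict

BLOCKS = ("LE_DRUS", "RE_DRUS", "LE_PIG", "RE_PIG")
CLINICAL = ["age", "smkever"]


def block_of(name):
    """Map 'LE_DRUS_003' to its block prefix 'LE_DRUS'."""
    return name.rsplit("_", 1)[0]


def select_global_top_k(scores, k):
    """Take the k best-scoring features regardless of block."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k] + CLINICAL


def select_top_k_per_block(scores, k):
    """Take the k best-scoring features from each block separately."""
    by_block = defaultdict(list)
    for name in scores:
        by_block[block_of(name)].append(name)
    selected = []
    for block in BLOCKS:
        ranked = sorted(by_block[block], key=scores.get, reverse=True)
        selected.extend(ranked[:k])
    return selected + CLINICAL


# Toy scores (8 features per block) in which LE_DRUS scores highest overall.
base = {"LE_DRUS": 0.70, "RE_DRUS": 0.65, "LE_PIG": 0.60, "RE_PIG": 0.55}
scores = {f"{b}_{i:03d}": base[b] + 0.001 * i for b in BLOCKS for i in range(8)}

global_sel = select_global_top_k(scores, k=8)
balanced_sel = select_top_k_per_block(scores, k=4)

print({block_of(n) for n in global_sel if n not in CLINICAL})  # {'LE_DRUS'}
print(len(balanced_sel))  # 18
```

In this toy setup the global top-8 comes entirely from one block, while the block-balanced selection keeps four features from each block plus the two clinical covariates.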