---
license: mit
tags:
  - ecg
  - ppg
  - jepa
  - self-supervised
  - cardiac
  - representation-learning
datasets:
  - lucky9-cyou/mimic-iv-aligned-ppg-ecg
  - physionet/ptb-xl
---

# PhysioJEPA

**Self-supervised ECG-PPG representation learning via Joint Embedding Predictive Architecture.**

## Key finding: mask ratio is the hidden lever

We find that unimodal ECG-JEPA (Weimann & Conrad, 2024) has a
**predictor shortcut vulnerability** at the standard 50% mask ratio: the
predictor learns local-interpolation shortcuts, and downstream
performance degrades as training progresses.

**Raising the mask ratio from 50% to 75% eliminates the shortcut** and recovers
full downstream performance, matching cross-modal JEPA:

| Model | Mask ratio | AF AUROC (epoch 25) |
|-------|------------|---------------------|
| Unimodal ECG-JEPA | 0.50 | 0.703 |
| **Unimodal ECG-JEPA** | **0.75** | **0.848** |
| Cross-modal ECG-PPG JEPA | -- | 0.847 |
| PhysioJEPA (cross-modal + Δt) | -- | 0.835 |

The mechanism: at 50% masking with contiguous blocks, the predictor sees 25
visible context patches and must predict 25 target patches. Early in training
it discovers a short-range interpolation shortcut (the ECG self-prediction
loss, L_self, dips at step ~1500). As the encoder refines and patches become
less linearly interpolatable, the shortcut fails (L_self spikes at step
~4675), and the encoder locks into a self-consistent but
downstream-uninformative optimum.

At 75% masking (roughly 12 visible and 37 target patches), no interpolation
path exists, so the predictor learns long-range structure from the start.

Cross-modal prediction avoids the shortcut for the same reason: none of the
PPG signal is visible as context, so no interpolation shortcut can form.
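
To make the context/target split concrete, here is a minimal sketch of contiguous block masking over a 1D patch sequence. It is an illustration only, not the code in `masking.py`: the function name `contiguous_block_mask`, the single-block simplification, and the 50-patch sequence length (inferred from the 25 + 25 split above) are all assumptions. Exact counts depend on the rounding and block sampling in the real implementation, so they may differ by a patch from the figures quoted above.

```python
import random


def contiguous_block_mask(num_patches, mask_ratio, rng):
    """Split patch indices into visible context and one contiguous target block.

    Hypothetical helper: the repository uses I-JEPA style multi-block masking,
    which this single-block version only approximates.
    """
    num_target = int(num_patches * mask_ratio)
    # Pick a random start so the whole block fits inside the sequence.
    start = rng.randrange(num_patches - num_target + 1)
    target = list(range(start, start + num_target))
    target_set = set(target)
    context = [i for i in range(num_patches) if i not in target_set]
    return context, target


if __name__ == "__main__":
    rng = random.Random(0)
    for ratio in (0.50, 0.75):
        context, target = contiguous_block_mask(50, ratio, rng)
        print(f"mask={ratio:.2f} visible={len(context)} target={len(target)} "
              f"block=[{target[0]}, {target[-1]}]")
```

At a higher mask ratio the target block grows and the visible context shrinks, which is the lever the table above varies.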

## Confirmed by 5 ablation arms

1. **Slow tau** (ema_end=0.999, warmup=60%): spike persists -> tau is NOT the cause
2. **Smaller predictor** (depth 4->1): spike persists -> capacity is NOT the cause
3. **Sinusoidal queries** (no learned embeddings): spike WORSENS
4. **Mask ratio 0.75**: spike ELIMINATED, AUROC recovers to 0.848
5. **Full data** (10x): spike delayed but present -> architectural, not data-scale

## Architecture

- ECG encoder: ViT-S (12 layers, d=256, 8 heads) on single-lead II @ 250 Hz
- PPG encoder: ViT-T (6 layers, d=256) on Pleth @ 125 Hz
- Predictor: 4-layer cross-attention transformer
- EMA target encoder (tau 0.996 -> 0.9999 cosine over 30% of training)
- Loss: L1 latent prediction (cross-modal) + 0.3 * L1 ECG self-prediction
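
The EMA schedule and the loss weighting listed above can be sketched in a few lines of plain Python. This is a sketch under stated assumptions, not the code in `ema.py` or `trainer.py`: the function names are hypothetical, "cosine over 30% of training" is read here as a cosine ramp over the first 30% of steps followed by a constant tau, and flat lists of floats stand in for latent tensors.

```python
import math


def ema_tau(step, total_steps, tau_start=0.996, tau_end=0.9999, ramp_frac=0.30):
    """Cosine ramp of the EMA momentum from tau_start to tau_end.

    Assumption: the ramp covers the first `ramp_frac` of training and tau
    stays at tau_end afterwards.
    """
    ramp_steps = max(1, round(total_steps * ramp_frac))
    progress = min(step / ramp_steps, 1.0)
    return tau_end - (tau_end - tau_start) * 0.5 * (1.0 + math.cos(math.pi * progress))


def ema_update(target_params, online_params, tau):
    """Element-wise target <- tau * target + (1 - tau) * online."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]


def l1(pred, target):
    """Mean absolute error between two equal-length latent vectors."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def total_loss(cross_pred, cross_target, self_pred, self_target, self_weight=0.3):
    """Cross-modal L1 plus the down-weighted ECG self-prediction L1."""
    return l1(cross_pred, cross_target) + self_weight * l1(self_pred, self_target)


if __name__ == "__main__":
    for step in (0, 150, 300, 1000):
        print(step, ema_tau(step, total_steps=1000))
```

Under this reading, the "slow tau" ablation would correspond to calling `ema_tau` with `tau_end=0.999` and `ramp_frac=0.60`.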

## Dataset

Training: [lucky9-cyou/mimic-iv-aligned-ppg-ecg](https://huggingface.co/datasets/lucky9-cyou/mimic-iv-aligned-ppg-ecg)
(MIMIC-IV ICU waveforms, ~814 hours, ~381 patients, sample-accurate ECG-PPG alignment)

Evaluation: PTB-XL (PhysioNet, 21.8k 12-lead ECGs, lead II resampled to 250 Hz)
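
For the training data, sample-accurate alignment means a given time span maps to index ranges in both signals, with twice as many ECG samples (250 Hz) as PPG samples (125 Hz). The sketch below shows that bookkeeping for sliding windows. It is not the code in `data.py`: the function name `aligned_windows`, the 10 s window, and the 5 s stride are assumptions chosen for illustration.

```python
ECG_HZ = 250  # ECG sampling rate listed under Architecture
PPG_HZ = 125  # PPG sampling rate listed under Architecture


def aligned_windows(duration_s, window_s=10.0, stride_s=5.0):
    """Return (ecg_slice, ppg_slice) pairs that cover the same time span.

    Hypothetical helper: window and stride lengths are illustrative defaults,
    not values taken from the repository.
    """
    windows = []
    n = 0
    while n * stride_s + window_s <= duration_s:
        start_s = n * stride_s
        end_s = start_s + window_s
        ecg = slice(round(start_s * ECG_HZ), round(end_s * ECG_HZ))
        ppg = slice(round(start_s * PPG_HZ), round(end_s * PPG_HZ))
        windows.append((ecg, ppg))
        n += 1
    return windows


if __name__ == "__main__":
    for ecg, ppg in aligned_windows(30.0):
        print(f"ECG[{ecg.start}:{ecg.stop}]  PPG[{ppg.start}:{ppg.stop}]")
```

The returned slices can be applied directly to the two waveform arrays of a recording, so each ECG window is paired with the PPG samples from the same seconds.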

## Usage

```bash
# Install
git clone https://huggingface.co/guychuk/PhysioJEPA
cd PhysioJEPA
uv sync

# Smoke test (CPU, random data)
PYTHONPATH=src uv run python scripts/smoke_test.py

# Train (requires GPU + MIMIC data)
PYTHONPATH=src uv run python scripts/train.py --config configs/base.yaml --model A --mask_ratio 0.75
```

## Repository structure

```
src/physiojepa/
  models.py      # 4 model variants (A=unimodal, B=cross-modal, C=InfoNCE, F=PhysioJEPA)
  vit.py         # ViT-1D encoder + cross-attention predictor
  data.py        # MIMIC dataset with sliding windows
  data_fast.py   # mmap-backed fast dataset for full-scale runs
  trainer.py     # shared training loop with WandB + collapse monitoring
  ema.py         # EMA with cosine tau schedule
  masking.py     # I-JEPA multi-block 1D masking
  probe.py       # linear probe evaluators
configs/
  base.yaml      # shared hyperparameters
docs/
  RESEARCH_LOG.md          # complete research narrative
  e2_e3_results.md         # K-gate results + ablation findings
  EXPERIMENT_TRACKING.md   # experiment matrix + post-hoc results
  RESEARCH_DEVELOPMENT.md  # full research development document
```

## Citation

```bibtex
@misc{physiojepa2026,
  title={PhysioJEPA: Mask Ratio as the Hidden Lever in Cardiac JEPA},
  author={Oz Labs},
  year={2026},
  url={https://huggingface.co/guychuk/PhysioJEPA}
}
```

## License

MIT