# PAPERS.md: PhysioJEPA Reference Index
*Oz Labs, April 2026*
*Covers every paper referenced across the full conversation and all project documents.*
---
## How to use this file
Three things per entry:
1. **What to use it for**: the specific task or decision the agent needs this paper for
2. **Key numbers**: exact figures the agent must not get wrong in code or prose
3. **Location**: where to fetch the PDF
Read the tier before writing any code in that tier's domain.
Do not cite a number that isn't in this file without fetching the source first.
---
## Tier 1: Implement from these
*Read before writing any training code. Contains exact equations, hyperparameters, architecture details.*
---
### [T1-1] Weimann & Conrad: ECG-JEPA
**arXiv**: 2410.13867 · `arxiv.org/pdf/2410.13867`
**Code**: `github.com/kweimann/ECG-JEPA` (fork this)
**Use for**: This is the codebase we fork. Before writing any encoder code, read Section 2 (architecture), Section 3 (data), Appendix A (hyperparameters).
- Patch tokenisation: 2D over (12 leads × time), patch size = 25 time steps at 500 Hz
- Masking: multi-block contiguous, 50% ratio, 4 target blocks
- EMA: τ starts at 0.996, cosine-annealed to 0.9999 over training
- Loss: L1 in latent space, no pixel decoder
- ViT-S: 12 layers, d=256, 8 heads, MLP ratio=4
**Key numbers**: PTB-XL all-statements AUC **0.945**; this is Baseline A in the experiment matrix. Training time ~26h on RTX 3090.
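
The EMA schedule listed above can be sketched as follows. This is a minimal illustration, not the fork's code: the half-cosine form and the names `ema_momentum` and `ema_update` are assumptions, so check Appendix A and the forked repository before relying on it.

```python
import math

TAU_START = 0.996   # initial EMA momentum (from the entry above)
TAU_END = 0.9999    # final EMA momentum (from the entry above)


def ema_momentum(step: int, total_steps: int) -> float:
    """Half-cosine interpolation from TAU_START to TAU_END (assumed form)."""
    progress = min(max(step / max(total_steps, 1), 0.0), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    return TAU_END - (TAU_END - TAU_START) * cosine


def ema_update(target: list[float], online: list[float], tau: float) -> list[float]:
    """Element-wise target <- tau * target + (1 - tau) * online."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]
```

In a training loop, `ema_momentum(step, total_steps)` would be evaluated once per optimiser step and passed to the target-encoder update.
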
---
### [T1-2] Assran et al.: I-JEPA
**arXiv**: 2301.08243 · `arxiv.org/pdf/2301.08243`
**Code**: `github.com/facebookresearch/ijepa`
**Use for**: The masking strategy foundation. Why multi-block contiguous > random masking (forces semantic prediction, not texture interpolation). The stop-gradient / EMA target encoder design justification. The predictor should be *narrower* than the encoder; this prevents shortcutting through the predictor.
**Key numbers**: ViT-H/14 ImageNet is a scale reference only, not a target for us.
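
To make "multi-block contiguous" concrete, here is a simplified 1D sketch over patch indices, using the 4-block, 50% setting listed under T1-1. The function name `multiblock_mask`, the equal block lengths, and the 1D simplification are assumptions; the real implementations mask in 2D and handle block overlap differently.

```python
import random


def multiblock_mask(n_patches: int, n_blocks: int = 4, ratio: float = 0.5,
                    seed: int = 0) -> list[bool]:
    """Return a boolean mask with n_blocks contiguous target blocks.

    Blocks may overlap, so the masked fraction is at most `ratio`.
    """
    rng = random.Random(seed)
    block_len = min(n_patches, max(1, round(ratio * n_patches / n_blocks)))
    mask = [False] * n_patches
    for _ in range(n_blocks):
        start = rng.randrange(0, n_patches - block_len + 1)
        for i in range(start, start + block_len):
            mask[i] = True  # patch i is a prediction target
    return mask


# 10 s at 500 Hz with 25-sample patches gives 200 patches per lead.
mask = multiblock_mask(n_patches=200)
```

Random masking would scatter the same number of targets across isolated patches, which neighbouring context can interpolate; contiguous blocks remove that shortcut.
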
---
### [T1-3] Bardes et al.: V-JEPA (Revisiting Feature Prediction)
**arXiv**: 2404.08471 · `arxiv.org/pdf/2404.08471`
**Use for**: Spatiotemporal tube masking: how to mask contiguous blocks across both spatial and temporal axes simultaneously. Template for PPG 1D+time representation. Two-encoder EMA recipe at scale. Why predicting in latent space beats pixel reconstruction for noisy signals, the core justification for JEPA over MAE.
**Key numbers**: SSv2 top-1 72.2% (the figure reported in the V-JEPA abstract). The 77.3% SSv2 figure belongs to V-JEPA 2 (see T2-4).
---
### [T1-4] Balestriero & LeCun: LeJEPA
**arXiv**: 2511.08544 · `arxiv.org/pdf/2511.08544`
**Use for**: Ablation A3 only (SIGReg). Do not implement SIGReg without reading this first.
- Theorem 1: isotropic Gaussian is the optimal JEPA embedding distribution
- SIGReg: K=128 random 1D projections w~N(0,I), KL(z·w || N(0,1)) per projection, sum. O(Kd).
- λ range: [0.01, 0.1]; start at 0.05
- Apply to *pooled global representation only*, not per-patch tokens
- ~50 lines of PyTorch
**Key numbers**: 79% ImageNet ViT-H/14 with only 2 loss terms.
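
The bullet on random 1D projections can be illustrated with the sketch below. It is not the paper's implementation: it stands in a moment-matched Gaussian KL for the per-projection divergence, normalises each direction to unit length, and uses plain Python instead of PyTorch. All three choices, and the name `sigreg_sketch`, are assumptions; read the paper for the actual statistic.

```python
import math
import random


def sigreg_sketch(z: list[list[float]], k: int = 128, seed: int = 0,
                  eps: float = 1e-6) -> float:
    """Sum over k random directions of KL(N(mu, var) || N(0, 1)).

    mu and var are the batch mean and variance of the projected embeddings.
    """
    rng = random.Random(seed)
    n, d = len(z), len(z[0])
    total = 0.0
    for _ in range(k):
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        w = [x / norm for x in w]  # unit-norm direction (assumption)
        proj = [sum(zi * wi for zi, wi in zip(row, w)) for row in z]
        mu = sum(proj) / n
        var = sum((p - mu) ** 2 for p in proj) / n + eps
        # Closed-form KL between N(mu, var) and N(0, 1).
        total += 0.5 * (var + mu * mu - 1.0 - math.log(var))
    return total
```

The regulariser would enter the objective as `prediction_loss + lam * sigreg_sketch(pooled)`, with `lam` taken from the λ range above and `pooled` being the pooled global representation.
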
---
### [T1-5] Kim: CroPA-ECG-JEPA
**arXiv**: 2410.08559 · `arxiv.org/pdf/2410.08559`
**Code**: `github.com/sehunfromdaegu/ECG_JEPA`
**Use for**: Second ECG-JEPA implementation for debugging. Cross-Pattern Attention (CroPA) = inter-lead masked attention = inspiration for cardiac phase encoding in ablation A2. Also: 1D PE for predictor vs 2D for encoders, which differs from Weimann; compare before finalising.
**Key numbers**: Recovers HR and QRS duration from frozen representations without supervised training; this is the target behaviour for PTT.
---
### [T1-6] Botman et al.: Laya (LeJEPA for EEG)
**arXiv**: 2603.16281 · `arxiv.org/pdf/2603.16281`
**Use for**: Most direct prior to PhysioJEPA. Read before implementing ablation A3.
- SIGReg with aggressive λ destabilises training on impulsive signals (QRS-like spikes in EEG)
- Mitigation: lower λ (0.001-0.01), aggressive gradient clipping, apply to pooled global rep only
- Latent prediction outperforms reconstruction on EEG clinical tasks
**Key numbers**: Outperforms reconstruction baselines on EEG-Bench with 10% of pretraining data.
---
## Tier 2: Baseline numbers and comparisons
*Read to correctly report comparison numbers. Getting baselines wrong is a rejection risk.*
---
### [T2-1] Nie et al.: AnyPPG
**arXiv**: 2511.01747 · `arxiv.org/pdf/2511.01747`
**Use for**: Primary contrastive baseline (Baseline C in experiment matrix).
- Exact loss: **symmetric InfoNCE** with learnable temperature τ
- **CRITICAL: ECGFounder encoder is FROZEN during AnyPPG training.** ECG is a fixed supervisory signal. AnyPPG is not a jointly trained dual-encoder model.
- Architecture: Net1D (PPG branch), ECGFounder frozen (ECG branch)
- Trained on >100,000 hours
**Key numbers**: PPG→ECG retrieval **R@1=0.736**, R@5=0.906, R@10=0.935. AF detection AUC ~0.90. Mean **9.1% AUC improvement** over non-ECG-guided baselines.
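
For orientation, a symmetric InfoNCE loss over a batch of paired embeddings looks roughly like the sketch below. It is written in plain Python for readability, the fixed `temperature=0.07` is an assumed placeholder (the entry above says the temperature is learnable), and the function names are illustrative. In the AnyPPG setup the ECG embeddings would come from the frozen encoder and receive no gradient.

```python
import math


def _cross_entropy_diag(logits: list[list[float]]) -> float:
    """Mean negative log-softmax of the diagonal (matched-pair) entries."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        total += lse - row[i]
    return total / len(logits)


def symmetric_info_nce(ppg: list[list[float]], ecg: list[list[float]],
                       temperature: float = 0.07) -> float:
    """Average of the PPG-to-ECG and ECG-to-PPG InfoNCE losses."""
    def normalise(v: list[float]) -> list[float]:
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]

    p = [normalise(v) for v in ppg]
    e = [normalise(v) for v in ecg]
    # Cosine similarity of every PPG row with every ECG row, scaled by 1/temperature.
    logits = [[sum(a * b for a, b in zip(pi, ej)) / temperature for ej in e]
              for pi in p]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))
```

Row `i` of `ppg` and row `i` of `ecg` are treated as the positive pair; every other row in the batch acts as a negative.
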
---
### [T2-2] Wagner et al.: PTB-XL
**arXiv**: 2004.13701 · `arxiv.org/pdf/2004.13701`
**Use for**: ECG evaluation benchmark. Task definitions, train/test/val splits, and label hierarchy. Must replicate Weimann's exact split for comparison.
**Key numbers**: Weimann ECG-JEPA AUC **0.945** all-statements = Baseline A target.
---
### [T2-3] Charlton et al.: Towards Ubiquitous BP Monitoring via PTT (review)
**URL**: `pmc.ncbi.nlm.nih.gov/articles/PMC4515215/`
**Use for**: Before writing E4 rollout coherence physiological consistency checks. PTT definition, normal range, PTT-BP and HR-PTT relationships. Per-patient calibration required for absolute BP; do not claim uncalibrated absolute BP from PTT.
**Key numbers**: Normal PTT **100-400 ms** (ICU adults). Within-patient tracking ~10 mmHg MAE with calibration.
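
A range check of the kind E4 needs could be sketched as below. The fiducial points (ECG R-peak to PPG pulse foot) and the function names are assumptions made for illustration; confirm the PTT definition against the review before using this in the harness.

```python
PTT_MIN_MS = 100.0  # lower bound of the normal range quoted above
PTT_MAX_MS = 400.0  # upper bound of the normal range quoted above


def ptt_ms(r_peak_s: float, ppg_foot_s: float) -> float:
    """Delay from an ECG R-peak to the following PPG pulse foot, in ms."""
    return (ppg_foot_s - r_peak_s) * 1000.0


def is_plausible_ptt(value_ms: float) -> bool:
    """True if the delay falls inside the quoted normal range."""
    return PTT_MIN_MS <= value_ms <= PTT_MAX_MS


def plausible_fraction(r_peaks_s: list[float], ppg_feet_s: list[float]) -> float:
    """Fraction of beat pairs whose delay is physiologically plausible."""
    pairs = list(zip(r_peaks_s, ppg_feet_s))
    if not pairs:
        return 0.0
    return sum(is_plausible_ptt(ptt_ms(r, f)) for r, f in pairs) / len(pairs)
```

This only flags implausible delays. It says nothing about absolute blood pressure, which still requires per-patient calibration.
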
---
### [T2-4] Assran et al.: V-JEPA 2 (including V-JEPA 2-AC)
**arXiv**: 2506.09985 · `arxiv.org/pdf/2506.09985`
**Use for**: Architecture D future work template. Two-stage recipe: action-free pretraining → action-conditioned fine-tuning with frozen encoder.
**Key numbers**: **<62 hours** of robot interaction data for Stage 2. SSv2 top-1 77.3%.
---
## Tier 3: Related work framing
*Read to correctly describe prior work and differentiate PhysioJEPA.*
---
### [T3-1] Sarkar & Etemad: CardioGAN
**arXiv**: 2010.00104 · `arxiv.org/pdf/2010.00104`
**Code**: `github.com/pritamqu/ppg2ecg-cardiogan`
**Use for**: First major cross-modal ECG-PPG paper (AAAI 2021).
- Uses **CycleGAN backbone** with attention-based generators and dual time/frequency discriminators
- **NOT reconstruction/L1, NOT InfoNCE**: adversarial + cycle consistency loss
- t=0 alignment, which discards lag. Do NOT call this "pixel reconstruction."
---
### [T3-2] Liu, Wang & Wang: TSTA-Net
**PMLR**: `proceedings.mlr.press/v278/liu25d.html`
**Use for**: Hierarchical contrastive ECG-PPG baseline (PMLR 2025).
- **Hierarchical contrastive learning**, NOT raw InfoNCE
- 9.3% higher AF F1 vs prior SSL methods
- Still t=0 aligned
---
### [T3-3] Fang et al.: PPGFlowECG
**arXiv**: 2509.19774 · `arxiv.org/pdf/2509.19774`
**Use for**: Two-stage generative translation baseline.
- Stage 1: **InfoNCE instance alignment** (CardioAlign encoder, shared weights)
- Stage 2: **rectified flow** generation from aligned latents
- Figure 1 explicitly shows ECG precedes PPG temporally but the architecture does not exploit this
- Do NOT describe as "rectified flow only": InfoNCE is in Stage 1
---
### [T3-4] Dong et al.: Brain-JEPA (NeurIPS 2024 Spotlight)
**arXiv**: 2409.19407 · `arxiv.org/pdf/2409.19407`
**Code**: `github.com/hzlab/2024_Dong_Li_NeurIPS_Brain-JEPA`
**Use for**: Cardiac phase encoding inspiration (ablation A2). Brain Gradient Positioning → our cardiac phase PE. Hard phase boundaries fail during AF; use soft Gaussian encoding over cardiac landmarks.
**Key numbers**: NeurIPS 2024 Spotlight. UK Biobank 40k patients.
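
One way to read "soft Gaussian encoding over cardiac landmarks" is sketched below. The landmark phases, `sigma`, and the function name are hypothetical placeholders chosen only to show the idea of replacing hard phase bins with smooth, wrap-around bumps; none of them comes from Brain-JEPA or from the project documents.

```python
import math

# Hypothetical landmark positions as fractions of the R-R interval (assumptions).
LANDMARKS = {"R": 0.0, "T": 0.35, "P": 0.85}


def soft_phase_encoding(phase: float, sigma: float = 0.08) -> list[float]:
    """Map a cardiac phase in [0, 1) to one Gaussian activation per landmark."""
    feats = []
    for centre in LANDMARKS.values():
        d = abs(phase - centre) % 1.0
        d = min(d, 1.0 - d)  # circular distance, so phase 0.99 is close to R at 0.0
        feats.append(math.exp(-0.5 * (d / sigma) ** 2))
    return feats
```

Because every landmark contributes a graded activation, a beat with irregular timing shifts the features smoothly instead of flipping a hard bin.
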
---
### [T3-5] Hojjati et al.: EEG-VJEPA
**arXiv**: 2507.03633 · `arxiv.org/pdf/2507.03633`
**Code**: `github.com/amir-hojjati/eeg-vjepa`
**Use for**: V-JEPA adapted to 1D physiological signal; the most direct predecessor. How to reshape multi-channel 1D signal into 3D tensor treated as "video." UMAP showing pathological clustering without labels.
**Key numbers**: TUH fine-tuned accuracy **85.8%**, AUROC **88.5%**. Frozen probe 83.3%.
---
### [T3-6] Munim et al.: EchoJEPA
**arXiv**: 2602.02603 · `arxiv.org/pdf/2602.02603`
**Use for**: Strongest empirical evidence that JEPA > MAE for noisy medical signals. Use in intro to justify JEPA over MAE.
**Key numbers**: JEPA degrades **2%** under perturbation vs **17%** for VideoMAE. **79%** accuracy at 1% labels. 20% LVEF improvement.
---
### [T3-7] Wu, Lei et al.: SurgMotion
**arXiv**: 2602.05638 · `arxiv.org/pdf/2602.05638`
**Use for**: One-sentence citation alongside EchoJEPA: "JEPA's noise rejection under clinical signal artifacts has been validated in echocardiography [EchoJEPA] and surgical video [SurgMotion]."
---
### [T3-8] LeCun: A Path Towards Autonomous Machine Intelligence (JEPA position paper)
**URL**: `openreview.net/pdf?id=BZ5a1r-kVsf`
**Use for**: One intro citation: "A world model should predict consequences of actions in abstract representation space [LeCun 2022]."
---
### [T3-9] Abbaspourazad et al.: Apple Heart Study Foundation Model
**arXiv**: 2312.05409 · `arxiv.org/pdf/2312.05409`
**Published**: ICLR 2024
**Use for**: Prior art on wearable-scale PPG+ECG foundation models. InfoNCE + KoLeo, participant-level positives, Apple Watch data. Shows ECG more discriminative than PPG, which is context for why cross-modal training helps PPG.
---
## Tier 4: Evaluation methodology and datasets
*Read when writing the evaluation harness code.*
---
### [T4-1] Pimentel et al.: BIDMC PPG and Respiration Dataset
**PhysioNet**: `physionet.org/content/bidmc/1.0.0/`
**Use for**: Fallback dataset if E0 fails.
- WFDB format, **53 recordings × 8 min**, **125 Hz**
- Signals: **Lead II ECG + fingertip PPG** + impedance respiration
- Labels: HR, RR, SpO2; **no AF labels** (use for HR probe only)
**Key numbers**: **53 patients**, ~7 hours total, **125 Hz**.
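
The figures above are consistent with each other, as the arithmetic below shows. The 10-second window and hop in the usage line are assumptions for illustration, not project settings.

```python
N_RECORDINGS = 53
MINUTES_PER_RECORDING = 8
FS_HZ = 125

samples_per_recording = MINUTES_PER_RECORDING * 60 * FS_HZ  # 60000 samples
total_hours = N_RECORDINGS * MINUTES_PER_RECORDING / 60     # about 7.07 hours


def n_windows(n_samples: int, window_s: float, hop_s: float, fs: int = FS_HZ) -> int:
    """Number of full windows that fit in a recording of n_samples."""
    win, hop = int(window_s * fs), int(hop_s * fs)
    if n_samples < win:
        return 0
    return (n_samples - win) // hop + 1


# Hypothetical choice: non-overlapping 10 s windows.
windows_per_recording = n_windows(samples_per_recording, window_s=10, hop_s=10)
```

With that hypothetical window choice, each recording yields 48 windows, so the whole dataset yields 53 × 48 = 2,544.
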
---
### [T4-2] Moody et al.: MIMIC-IV Waveform Database
**PhysioNet**: `physionet.org/content/mimic4wdb/0.1.0/`
**Use for**: Understanding HuggingFace mirror provenance.
- v0.1.0: **200 records from 198 patients**; upcoming release ~10,000 records
- MIMIC-IV-ECG module: **~800k ECGs across ~160k patients**, 500 Hz, 10s, 12-lead; AF label source candidate
---
### [T4-3] Kachuee et al.: Cuffless BP Estimation Dataset (UCI)
**UCI**: `archive.ics.uci.edu/dataset/340`
**Use for**: E5a PTT probe evaluation.
- 12,000 records, 942 patients; **patient ID removed**, so population-level evaluation only
- PPG + ABP at 125 Hz, derived from MIMIC-II
**Key numbers**: AAMI standard ≤5 mmHg mean ± 8 mmHg SD.
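
A population-level check against the thresholds above might look like this sketch. The function name `meets_aami` and the use of the sample standard deviation are assumptions, and the criterion covers only the two thresholds quoted here, not any other requirement of the standard.

```python
import statistics


def meets_aami(pred_mmhg: list[float], ref_mmhg: list[float],
               max_mean: float = 5.0, max_sd: float = 8.0) -> bool:
    """True if the mean error is within ±max_mean and the error SD within max_sd."""
    errors = [p - r for p, r in zip(pred_mmhg, ref_mmhg)]
    mean_err = statistics.fmean(errors)
    sd_err = statistics.stdev(errors)  # sample SD (assumption); needs >= 2 points
    return abs(mean_err) <= max_mean and sd_err <= max_sd
```

Because patient IDs are removed, this can only be computed over pooled records; it cannot show within-patient tracking.
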
---
### [T4-4] Goldberger et al.: PhysioBank, PhysioToolkit, PhysioNet
**DOI**: 10.1161/01.CIR.101.23.e215
**Use for**: Required citation whenever using BIDMC, MIMIC waveforms, or any PhysioNet dataset. One line in methods: "Data obtained from PhysioNet [Goldberger et al., 2000]."
---
## Tier 5: Context and intellectual lineage
*Do not read these to implement anything. One citation each.*
---
### [T5-1] Ha & Schmidhuber: World Models
**arXiv**: 1803.10122
**Use for**: Intro citation only. "World models learn a compressed latent representation and a transition function [Ha & Schmidhuber, 2018]."
---
### [T5-2] Bardes et al.: VICReg
**arXiv**: 2105.04906
**Use for**: Related work only. "VICReg requires hand-crafted augmentations that JEPA avoids."
---
### [T5-3] Ronan et al.: VICReg for Brugada ECG Detection
**DOI**: 10.1038/s41598-025-94130-x
**Use for**: One sentence. "VICReg-based SSL has been applied to ECG classification [Ronan et al., 2025] but requires augmentation engineering."
---
### [T5-4] Johnson et al.: MIMIC-IV (clinical database paper)
**DOI**: 10.1038/s41597-022-01899-x
**Use for**: Required data citation whenever using MIMIC-IV derived data. "MIMIC-IV [Johnson et al., 2023], a freely accessible EHR database."
---
### [T5-5] CLIMB multimodal clinical benchmark
**arXiv**: 2503.07667
**Use for**: ECG-JEPA performance in multimodal settings. "ECG-JEPA outperforms general time-series models like UniTS by 36.8% on ECG tasks [CLIMB, 2025]." One citation in intro.
---
## Quick reference: numbers the agent must not get wrong
| Claim | Correct value | Source |
|-------|--------------|--------|
| ECG-JEPA PTB-XL AUC | **0.945** all-statements | T1-1 Weimann |
| AnyPPG PPG→ECG R@1 | **0.736** | T2-1 Nie |
| AnyPPG AUC improvement | **9.1%** over non-ECG baselines | T2-1 Nie |
| AnyPPG ECGFounder | **FROZEN** during training | T2-1 Nie |
| EchoJEPA JEPA perturbation | **2%** degradation | T3-6 Munim |
| EchoJEPA MAE perturbation | **17%** degradation | T3-6 Munim |
| EchoJEPA 1% label accuracy | **79%** | T3-6 Munim |
| Normal PTT range (ICU) | **100-400 ms** | T2-3 Charlton |
| BIDMC size | **53 recordings × 8 min @ 125 Hz** | T4-1 Pimentel |
| V-JEPA 2-AC interaction data | **<62 hours** | T2-4 Assran |
| EEG-VJEPA TUH AUROC | **88.5%** fine-tuned | T3-5 Hojjati |
| CardioGAN objective | **CycleGAN adversarial**, not reconstruction | T3-1 Sarkar |
| TSTA-Net objective | **Hierarchical contrastive**, not raw InfoNCE | T3-2 Liu |
| PPGFlowECG Stage 1 | **InfoNCE alignment**, then rectified flow | T3-3 Fang |
| BP calibration requirement | **Per-patient calibration required** for absolute values | T2-3 Charlton |
---
## File locations in repo
```
docs/papers/*.pdf
```
---
*This is the complete reference index. Fetch from arXiv if a PDF is missing. Never cite a number not in this file without verifying the source first.*