| # PAPERS.md: PhysioJEPA Reference Index |
| *Oz Labs, April 2026* |
| *Covers every paper referenced across the full conversation and all project documents.* |
|
|
| --- |
|
|
| ## How to use this file |
|
|
| Three things per entry: |
| 1. **What to use it for**: the specific task or decision the agent needs this paper for |
| 2. **Key numbers**: exact figures the agent must not get wrong in code or prose |
| 3. **Location**: where to fetch the PDF |
|
|
| Read the tier before writing any code in that tier's domain. |
| Do not cite a number that isn't in this file without fetching the source first. |
|
|
| --- |
|
|
| ## Tier 1: Implement from these |
| *Read before writing any training code. Contains exact equations, hyperparameters, architecture details.* |
|
|
| --- |
|
|
| ### [T1-1] Weimann & Conrad: ECG-JEPA |
| **arXiv**: 2410.13867 · `arxiv.org/pdf/2410.13867` |
| **Code**: `github.com/kweimann/ECG-JEPA` (fork this) |
|
|
| **Use for**: This is the codebase we fork. Before writing any encoder code, read Section 2 (architecture), Section 3 (data), Appendix A (hyperparameters). |
| - Patch tokenisation: 2D over (12 leads × time), patch size = 25 time steps at 500 Hz |
| - Masking: multi-block contiguous, 50% ratio, 4 target blocks |
| - EMA: τ starts at 0.996, cosine-annealed to 0.9999 over training |
| - Loss: L1 in latent space, no pixel decoder |
| - ViT-S: 12 layers, d=256, 8 heads, MLP ratio=4 |
|
|
| **Key numbers**: PTB-XL all-statements AUC **0.945**; this is Baseline A in the experiment matrix. Training time ~26h on RTX 3090. |
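
The EMA bullet above can be sketched as a small schedule function. This is a minimal illustration, assuming "cosine-annealed" means a half-cosine ramp from 0.996 to 0.9999 over the full run; the function names `ema_momentum` and `ema_update` are hypothetical, and the exact schedule should be checked against Appendix A and the forked code before use.

```python
import math


def ema_momentum(step: int, total_steps: int,
                 tau_start: float = 0.996, tau_end: float = 0.9999) -> float:
    """Half-cosine ramp of the EMA momentum from tau_start to tau_end.

    Assumption: the ramp spans the whole training run (no warmup phase).
    """
    progress = min(max(step / max(total_steps, 1), 0.0), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # 1 at start, 0 at end
    return tau_end - (tau_end - tau_start) * cosine


def ema_update(target, online, tau):
    """Element-wise EMA: target <- tau * target + (1 - tau) * online."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


# Patch duration implied by the tokenisation bullet: 25 samples at 500 Hz.
PATCH_MS = 25 / 500 * 1000  # 50 ms per patch
```

In a real training loop the same momentum value would be applied to every parameter tensor of the target encoder once per optimiser step.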
|
|
| --- |
|
|
| ### [T1-2] Assran et al.: I-JEPA |
| **arXiv**: 2301.08243 · `arxiv.org/pdf/2301.08243` |
| **Code**: `github.com/facebookresearch/ijepa` |
|
|
| **Use for**: The masking strategy foundation. Why multi-block contiguous > random masking (forces semantic prediction, not texture interpolation). The stop-gradient / EMA target encoder design justification. The predictor should be *narrower* than the encoder; this prevents shortcutting through the predictor. |
|
|
| **Key numbers**: ViT-H/14 ImageNet: scale reference only, not a target for us. |
|
|
| --- |
|
|
| ### [T1-3] Bardes et al.: V-JEPA (Revisiting Feature Prediction) |
| **arXiv**: 2404.08471 · `arxiv.org/pdf/2404.08471` |
|
|
| **Use for**: Spatiotemporal tube masking: how to mask contiguous blocks across both spatial and temporal axes simultaneously. Template for PPG 1D+time representation. Two-encoder EMA recipe at scale. Why predicting in latent space beats pixel reconstruction for noisy signals, the core justification for JEPA over MAE. |
|
|
| **Key numbers**: SSv2 top-1 72.2% (ViT-H/16, frozen backbone). The 77.3% SSv2 figure belongs to V-JEPA 2 (T2-4), not this paper. |
|
|
| --- |
|
|
| ### [T1-4] Balestriero & LeCun: LeJEPA |
| **arXiv**: 2511.08544 · `arxiv.org/pdf/2511.08544` |
|
|
| **Use for**: Ablation A3 only (SIGReg). Do not implement SIGReg without reading this first. |
| - Theorem 1: isotropic Gaussian is the optimal JEPA embedding distribution |
| - SIGReg: K=128 random 1D projections w~N(0,I), KL(z·w || N(0,1)) per projection, sum. O(Kd). |
| - λ range: [0.01, 0.1]; start at 0.05 |
| - Apply to *pooled global representation only*, not per-patch tokens |
| - ~50 lines of PyTorch |
|
|
| **Key numbers**: 79% ImageNet ViT-H/14 with only 2 loss terms. |
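
The SIGReg bullets above can be illustrated with a rough stdlib sketch. It is not the paper's estimator: it assumes each projected sample can be summarised by its mean and variance and uses the closed-form Gaussian KL as a stand-in, and it normalises each direction to unit length so that an isotropic unit-variance input projects to N(0,1). The function name `sigreg_moment_sketch` is hypothetical; read the paper before implementing ablation A3.

```python
import math
import random


def sigreg_moment_sketch(z, num_projections=128, seed=0):
    """Sum over random 1D projections of KL(N(m, v) || N(0, 1)).

    z: list of pooled global representations, one row per sample.
    Assumption: moment-matched Gaussian KL replaces the paper's own
    per-projection divergence estimate.
    """
    rng = random.Random(seed)
    n, d = len(z), len(z[0])
    total = 0.0
    for _ in range(num_projections):
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]  # unit-norm direction (assumption)
        proj = [sum(zi * wi for zi, wi in zip(row, w)) for row in z]
        mean = sum(proj) / n
        var = max(sum((p - mean) ** 2 for p in proj) / n, 1e-8)
        # Closed-form KL between N(mean, var) and N(0, 1).
        total += 0.5 * (var + mean * mean - 1.0 - math.log(var))
    return total


# Hypothetical usage: total_loss = prediction_loss + 0.05 * sigreg_moment_sketch(pooled)
```

The cost is one dot product per sample per projection, which matches the O(Kd) figure in the bullet.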
|
|
| --- |
|
|
| ### [T1-5] Kim: CroPA-ECG-JEPA |
| **arXiv**: 2410.08559 · `arxiv.org/pdf/2410.08559` |
| **Code**: `github.com/sehunfromdaegu/ECG_JEPA` |
|
|
| **Use for**: Second ECG-JEPA implementation for debugging. Cross-Pattern Attention (CroPA) = inter-lead masked attention = inspiration for cardiac phase encoding in ablation A2. Also: 1D PE for predictor vs 2D for encoders, which differs from Weimann; compare before finalising. |
|
|
| **Key numbers**: Recovers HR and QRS duration from frozen representations without supervised training; this is the target behaviour for PTT. |
|
|
| --- |
|
|
| ### [T1-6] Botman et al.: Laya (LeJEPA for EEG) |
| **arXiv**: 2603.16281 · `arxiv.org/pdf/2603.16281` |
|
|
| **Use for**: Most direct prior to PhysioJEPA. Read before implementing ablation A3. |
| - SIGReg with aggressive λ destabilises training on impulsive signals (QRS-like spikes in EEG) |
| - Mitigation: lower λ (0.001–0.01), aggressive gradient clipping, apply to pooled global rep only |
| - Latent prediction outperforms reconstruction on EEG clinical tasks |
|
|
| **Key numbers**: Outperforms reconstruction baselines on EEG-Bench with 10% of pretraining data. |
|
|
| --- |
|
|
| ## Tier 2: Baseline numbers and comparisons |
| *Read to correctly report comparison numbers. Getting baselines wrong is a rejection risk.* |
|
|
| --- |
|
|
| ### [T2-1] Nie et al.: AnyPPG |
| **arXiv**: 2511.01747 · `arxiv.org/pdf/2511.01747` |
|
|
| **Use for**: Primary contrastive baseline (Baseline C in experiment matrix). |
| - Exact loss: **symmetric InfoNCE** with learnable temperature τ |
| - **CRITICAL: ECGFounder encoder is FROZEN during AnyPPG training.** ECG is a fixed supervisory signal. AnyPPG is not a jointly trained dual-encoder model. |
| - Architecture: Net1D (PPG branch), ECGFounder frozen (ECG branch) |
| - Trained on >100,000 hours |
|
|
| **Key numbers**: PPG→ECG retrieval **R@1=0.736**, R@5=0.906, R@10=0.935. AF detection AUC ~0.90. Mean **9.1% AUC improvement** over non-ECG-guided baselines. |
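
For Baseline C, the symmetric InfoNCE objective named above can be sketched in plain Python. This is an illustration under assumptions, not AnyPPG's code: embeddings are L2-normalised, the temperature is parameterised through its logarithm (a common choice, not confirmed by this index), and the ECG embeddings are treated as fixed inputs to mirror the frozen ECGFounder branch. The name `symmetric_info_nce` is hypothetical.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _diag_cross_entropy(logits):
    """Mean of -log softmax(row)[i], where entry i is the matching pair."""
    loss = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        loss += log_sum_exp - row[i]
    return loss / len(logits)


def symmetric_info_nce(ppg_emb, ecg_emb, log_tau):
    """Average of the PPG->ECG and ECG->PPG InfoNCE terms.

    ecg_emb plays the role of the frozen branch: in a framework with
    autograd it would be computed without gradients.
    """
    p = [_normalize(v) for v in ppg_emb]
    e = [_normalize(v) for v in ecg_emb]
    scale = math.exp(-log_tau)  # 1 / tau
    logits = [[scale * sum(a * b for a, b in zip(pi, ej)) for ej in e] for pi in p]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(logits_t))
```

Row `i` of both inputs is assumed to come from the same time-aligned PPG/ECG segment; every other row in the batch acts as a negative.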
|
|
| --- |
|
|
| ### [T2-2] Wagner et al.: PTB-XL |
| **arXiv**: 2004.13701 · `arxiv.org/pdf/2004.13701` |
|
|
| **Use for**: ECG evaluation benchmark. Task definitions, train/test/val splits, and label hierarchy. Must replicate Weimann's exact split for comparison. |
|
|
| **Key numbers**: Weimann ECG-JEPA AUC **0.945** all-statements = Baseline A target. |
|
|
| --- |
|
|
| ### [T2-3] Charlton et al.: Towards Ubiquitous BP Monitoring via PTT (review) |
| **URL**: `pmc.ncbi.nlm.nih.gov/articles/PMC4515215/` |
|
|
| **Use for**: Before writing E4 rollout coherence physiological consistency checks. PTT definition, normal range, PTT–BP and HR–PTT relationships. Per-patient calibration required for absolute BP; do not claim uncalibrated absolute BP from PTT. |
|
|
| **Key numbers**: Normal PTT **100–400 ms** (ICU adults). Within-patient tracking ~10 mmHg MAE with calibration. |
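
A physiological consistency check for E4 could start from the range above. The sketch below is a hypothetical helper (the name `ptt_out_of_range_fraction` and the idea of reporting a per-window fraction are assumptions, not part of the experiment spec); it only encodes the 100–400 ms bounds quoted in this entry.

```python
PTT_MIN_MS = 100.0  # lower bound from this index (ICU adults)
PTT_MAX_MS = 400.0  # upper bound from this index (ICU adults)


def ptt_out_of_range_fraction(ptt_values_ms):
    """Fraction of per-beat PTT estimates outside the plausible range.

    ptt_values_ms: iterable of PTT estimates in milliseconds.
    Returns 0.0 for an empty input so callers can treat it as 'no evidence'.
    """
    values = list(ptt_values_ms)
    if not values:
        return 0.0
    bad = sum(1 for v in values if not (PTT_MIN_MS <= v <= PTT_MAX_MS))
    return bad / len(values)
```

A rollout whose implied PTT drifts outside this window on most beats is a candidate failure case, independent of any BP claim.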
|
|
| --- |
|
|
| ### [T2-4] Assran et al.: V-JEPA 2 (including V-JEPA 2-AC) |
| **arXiv**: 2506.09985 · `arxiv.org/pdf/2506.09985` |
|
|
| **Use for**: Architecture D future work template. Two-stage recipe: action-free pretraining → action-conditioned fine-tuning with frozen encoder. |
|
|
| **Key numbers**: **<62 hours** of robot interaction data for Stage 2. SSv2 top-1 77.3%. |
|
|
| --- |
|
|
| ## Tier 3: Related work framing |
| *Read to correctly describe prior work and differentiate PhysioJEPA.* |
|
|
| --- |
|
|
| ### [T3-1] Sarkar & Etemad: CardioGAN |
| **arXiv**: 2010.00104 · `arxiv.org/pdf/2010.00104` |
| **Code**: `github.com/pritamqu/ppg2ecg-cardiogan` |
|
|
| **Use for**: First major cross-modal ECG-PPG paper (AAAI 2021). |
| - Uses **CycleGAN backbone** with attention-based generators and dual time/frequency discriminators |
| - **NOT reconstruction/L1, NOT InfoNCE**: adversarial + cycle consistency loss |
| - t=0 alignment, which discards lag. Do NOT call this "pixel reconstruction." |
|
|
| --- |
|
|
| ### [T3-2] Liu, Wang & Wang: TSTA-Net |
| **PMLR**: proceedings.mlr.press/v278/liu25d.html |
|
|
| **Use for**: Hierarchical contrastive ECG-PPG baseline (PMLR 2025). |
| - **Hierarchical contrastive learning**, NOT raw InfoNCE |
| - 9.3% higher AF F1 vs prior SSL methods |
| - Still t=0 aligned |
|
|
| --- |
|
|
| ### [T3-3] Fang et al.: PPGFlowECG |
| **arXiv**: 2509.19774 · `arxiv.org/pdf/2509.19774` |
|
|
| **Use for**: Two-stage generative translation baseline. |
| - Stage 1: **InfoNCE instance alignment** (CardioAlign encoder, shared weights) |
| - Stage 2: **rectified flow** generation from aligned latents |
| - Figure 1 explicitly shows ECG precedes PPG temporally but the architecture does not exploit this |
| - Do NOT describe as "rectified flow only"; InfoNCE is in Stage 1 |
|
|
| --- |
|
|
| ### [T3-4] Dong et al.: Brain-JEPA (NeurIPS 2024 Spotlight) |
| **arXiv**: 2409.19407 · `arxiv.org/pdf/2409.19407` |
| **Code**: `github.com/hzlab/2024_Dong_Li_NeurIPS_Brain-JEPA` |
|
|
| **Use for**: Cardiac phase encoding inspiration (ablation A2). Brain Gradient Positioning → our cardiac phase PE. Hard phase boundaries fail during AF; use soft Gaussian encoding over cardiac landmarks. |
|
|
| **Key numbers**: NeurIPS 2024 Spotlight. UK Biobank 40k patients. |
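
One way to read "soft Gaussian encoding over cardiac landmarks" for ablation A2 is sketched below. Everything specific here is an assumption for illustration: the cardiac phase is a number in [0, 1), the landmark positions and `sigma` are placeholders rather than values from any paper, and the name `soft_phase_encoding` is hypothetical.

```python
import math


def soft_phase_encoding(phase, landmarks=(0.0, 0.25, 0.5, 0.75), sigma=0.08):
    """Soft membership of a cardiac phase in [0, 1) to each landmark.

    Uses circular distance so that phase 0.99 is close to a landmark at 0.0,
    and normalises the Gaussian weights to sum to 1.
    """
    weights = []
    for centre in landmarks:
        dist = abs(phase - centre) % 1.0
        dist = min(dist, 1.0 - dist)  # wrap around the cardiac cycle
        weights.append(math.exp(-0.5 * (dist / sigma) ** 2))
    total = sum(weights)
    return [w / total for w in weights]
```

Because the weights vary smoothly with phase, a small error in the phase estimate (as expected during AF) shifts the encoding slightly instead of flipping it across a hard boundary.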
|
|
| --- |
|
|
| ### [T3-5] Hojjati et al.: EEG-VJEPA |
| **arXiv**: 2507.03633 · `arxiv.org/pdf/2507.03633` |
| **Code**: `github.com/amir-hojjati/eeg-vjepa` |
|
|
| **Use for**: V-JEPA adapted to 1D physiological signal; most direct predecessor. How to reshape multi-channel 1D signal into 3D tensor treated as "video." UMAP showing pathological clustering without labels. |
|
|
| **Key numbers**: TUH fine-tuned accuracy **85.8%**, AUROC **88.5%**. Frozen probe 83.3%. |
|
|
| --- |
|
|
| ### [T3-6] Munim et al.: EchoJEPA |
| **arXiv**: 2602.02603 · `arxiv.org/pdf/2602.02603` |
|
|
| **Use for**: Strongest empirical evidence that JEPA > MAE for noisy medical signals. Use in intro to justify JEPA over MAE. |
|
|
| **Key numbers**: JEPA degrades **2%** under perturbation vs **17%** for VideoMAE. **79%** accuracy at 1% labels. 20% LVEF improvement. |
|
|
| --- |
|
|
| ### [T3-7] Wu, Lei et al.: SurgMotion |
| **arXiv**: 2602.05638 · `arxiv.org/pdf/2602.05638` |
|
|
| **Use for**: One-sentence citation alongside EchoJEPA: "JEPA's noise rejection under clinical signal artifacts has been validated in echocardiography [EchoJEPA] and surgical video [SurgMotion]." |
|
|
| --- |
|
|
| ### [T3-8] LeCun: A Path Towards Autonomous Machine Intelligence (JEPA position paper) |
| **URL**: `openreview.net/pdf?id=BZ5a1r-kVsf` |
|
|
| **Use for**: One intro citation: "A world model should predict consequences of actions in abstract representation space [LeCun 2022]." |
|
|
| --- |
|
|
| ### [T3-9] Abbaspourazad et al.: Apple Heart Study Foundation Model |
| **arXiv**: 2312.05409 · `arxiv.org/pdf/2312.05409` |
| **Published**: ICLR 2024 |
|
|
| **Use for**: Prior art on wearable-scale PPG+ECG foundation models. InfoNCE + KoLeo, participant-level positives, Apple Watch data. Shows ECG more discriminative than PPG, which is context for why cross-modal training helps PPG. |
|
|
| --- |
|
|
| ## Tier 4: Evaluation methodology and datasets |
| *Read when writing the evaluation harness code.* |
|
|
| --- |
|
|
| ### [T4-1] Pimentel et al.: BIDMC PPG and Respiration Dataset |
| **PhysioNet**: `physionet.org/content/bidmc/1.0.0/` |
|
|
| **Use for**: Fallback dataset if E0 fails. |
| - WFDB format, **53 recordings × 8 min**, **125 Hz** |
| - Signals: **Lead II ECG + fingertip PPG** + impedance respiration |
| - Labels: HR, RR, SpO2; **no AF labels** (use for HR probe only) |
|
|
| **Key numbers**: **53 patients**, ~7 hours total, **125 Hz**. |
|
|
| --- |
|
|
| ### [T4-2] Moody et al.: MIMIC-IV Waveform Database |
| **PhysioNet**: `physionet.org/content/mimic4wdb/0.1.0/` |
|
|
| **Use for**: Understanding HuggingFace mirror provenance. |
| - v0.1.0: **200 records from 198 patients**; upcoming release ~10,000 records |
| - MIMIC-IV-ECG module: **~800k ECGs across ~160k patients**, 500 Hz, 10s, 12-lead; AF label source candidate |
|
|
| --- |
|
|
| ### [T4-3] Kachuee et al.: Cuffless BP Estimation Dataset (UCI) |
| **UCI**: `archive.ics.uci.edu/dataset/340` |
|
|
| **Use for**: E5a PTT probe evaluation. |
| - 12,000 records, 942 patients; **patient ID removed**, so population-level evaluation only |
| - PPG + ABP at 125 Hz, derived from MIMIC-II |
|
|
| **Key numbers**: AAMI standard ≤5 mmHg mean ± 8 mmHg SD. |
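
The AAMI criterion quoted above can be turned into a small check for the E5a harness. This is a sketch under assumptions: the mean bound is applied to the absolute value of the mean signed error, the SD is the sample standard deviation of the errors, and the helper name `meets_aami` is hypothetical.

```python
import statistics


def meets_aami(predicted_mmhg, reference_mmhg, max_mean=5.0, max_sd=8.0):
    """Return (passes, mean_error, sd_error) for paired BP values in mmHg.

    Assumption: 'mean' is the mean signed error (bias) and its magnitude
    is compared against max_mean. Needs at least two paired values.
    """
    errors = [p - r for p, r in zip(predicted_mmhg, reference_mmhg)]
    mean_error = statistics.fmean(errors)
    sd_error = statistics.stdev(errors)
    passes = abs(mean_error) <= max_mean and sd_error <= max_sd
    return passes, mean_error, sd_error
```

Since patient IDs are removed from this dataset, the check is necessarily population-level, as the entry notes.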
|
|
| --- |
|
|
| ### [T4-4] Goldberger et al.: PhysioBank, PhysioToolkit, PhysioNet |
| **DOI**: 10.1161/01.CIR.101.23.e215 |
|
|
| **Use for**: Required citation whenever using BIDMC, MIMIC waveforms, or any PhysioNet dataset. One line in methods: "Data obtained from PhysioNet [Goldberger et al., 2000]." |
|
|
| --- |
|
|
| ## Tier 5: Context and intellectual lineage |
| *Do not read these to implement anything. One citation each.* |
|
|
| --- |
|
|
| ### [T5-1] Ha & Schmidhuber: World Models |
| **arXiv**: 1803.10122 |
|
|
| **Use for**: Intro citation only. "World models learn a compressed latent representation and a transition function [Ha & Schmidhuber, 2018]." |
|
|
| --- |
|
|
| ### [T5-2] Bardes et al.: VICReg |
| **arXiv**: 2105.04906 |
|
|
| **Use for**: Related work only. "VICReg requires hand-crafted augmentations that JEPA avoids." |
|
|
| --- |
|
|
| ### [T5-3] Ronan et al.: VICReg for Brugada ECG Detection |
| **DOI**: 10.1038/s41598-025-94130-x |
|
|
| **Use for**: One sentence. "VICReg-based SSL has been applied to ECG classification [Ronan et al., 2025] but requires augmentation engineering." |
|
|
| --- |
|
|
| ### [T5-4] Johnson et al.: MIMIC-IV (clinical database paper) |
| **DOI**: 10.1038/s41597-022-01899-x |
|
|
| **Use for**: Required data citation whenever using MIMIC-IV derived data. "MIMIC-IV [Johnson et al., 2023], a freely accessible EHR database." |
|
|
| --- |
|
|
| ### [T5-5] CLIMB multimodal clinical benchmark |
| **arXiv**: 2503.07667 |
|
|
| **Use for**: ECG-JEPA performance in multimodal settings. "ECG-JEPA outperforms general time-series models like UniTS by 36.8% on ECG tasks [CLIMB, 2025]." One citation in intro. |
|
|
| --- |
|
|
| ## Quick reference: numbers the agent must not get wrong |
|
|
| | Claim | Correct value | Source | |
| |-------|--------------|--------| |
| | ECG-JEPA PTB-XL AUC | **0.945** all-statements | T1-1 Weimann | |
| | AnyPPG PPG→ECG R@1 | **0.736** | T2-1 Nie | |
| | AnyPPG AUC improvement | **9.1%** over non-ECG baselines | T2-1 Nie | |
| | AnyPPG ECGFounder | **FROZEN** during training | T2-1 Nie | |
| | EchoJEPA JEPA perturbation | **2%** degradation | T3-6 Munim | |
| | EchoJEPA MAE perturbation | **17%** degradation | T3-6 Munim | |
| | EchoJEPA 1% label accuracy | **79%** | T3-6 Munim | |
| | Normal PTT range (ICU) | **100–400 ms** | T2-3 Charlton | |
| | BIDMC size | **53 recordings × 8 min @ 125 Hz** | T4-1 Pimentel | |
| | V-JEPA 2-AC interaction data | **<62 hours** | T2-4 Assran | |
| | EEG-VJEPA TUH AUROC | **88.5%** fine-tuned | T3-5 Hojjati | |
| | CardioGAN objective | **CycleGAN adversarial**, not reconstruction | T3-1 Sarkar | |
| | TSTA-Net objective | **Hierarchical contrastive**, not raw InfoNCE | T3-2 Liu | |
| | PPGFlowECG Stage 1 | **InfoNCE alignment**, then rectified flow | T3-3 Fang | |
| | BP calibration requirement | **Per-patient calibration required** for absolute values | T2-3 Charlton | |
|
|
| --- |
|
|
| ## File locations in repo |
|
|
| ``` |
| docs/papers/*.pdf |
| ``` |
|
|
| --- |
|
|
| *This is the complete reference index. Fetch from arXiv if a PDF is missing. Never cite a number not in this file without verifying the source first.* |