	# PAPERS.md — PhysioJEPA Reference Index
	Oz Labs — April 2026
	Covers every paper referenced across the full conversation and all project documents.

	---

	## How to use this file

	Three things per entry:
	1. What to use it for — the specific task or decision the agent needs this paper for
	2. Key numbers — exact figures the agent must not get wrong in code or prose
	3. Location — where to fetch the PDF

	Read the tier before writing any code in that tier's domain.
	Do not cite a number that isn't in this file without fetching the source first.

	---

	## Tier 1 — Implement from these
	Read before writing any training code. Contains exact equations, hyperparameters, architecture details.

	---

	### [T1-1] Weimann & Conrad — ECG-JEPA
	arXiv: 2410.13867 · `arxiv.org/pdf/2410.13867`
	Code: `github.com/kweimann/ECG-JEPA` ← fork this

	Use for: This is the codebase we fork. Before writing any encoder code, read Section 2 (architecture), Section 3 (data), Appendix A (hyperparameters).
	- Patch tokenisation: 2D over (12 leads × time), patch size = 25 time steps at 500 Hz
	- Masking: multi-block contiguous, 50% ratio, 4 target blocks
	- EMA: τ starts 0.996, cosine-annealed to 0.9999 over training
	- Loss: L1 in latent space — no pixel decoder
	- ViT-S: 12 layers, d=256, 8 heads, MLP ratio=4

	Key numbers: PTB-XL all-statements AUC 0.945 — this is Baseline A in the experiment matrix. Training time ~26h on RTX 3090.
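
This is a minimal sketch of the EMA target-encoder update. It assumes the cosine schedule takes the BYOL-style form below, so confirm the exact form in the ECG-JEPA code before reusing it. Weights are plain lists of floats so the example runs without PyTorch. `ema_momentum` and `ema_update` are hypothetical helper names.

```python
import math
from typing import List


def ema_momentum(step: int, total_steps: int,
                 tau_start: float = 0.996, tau_end: float = 0.9999) -> float:
    """Cosine-anneal the EMA momentum from tau_start (step 0) to tau_end (final step)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    # (cos(pi * p) + 1) / 2 falls from 1 at p = 0 to 0 at p = 1
    return tau_end - (tau_end - tau_start) * (math.cos(math.pi * progress) + 1.0) / 2.0


def ema_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Target weights track the online (context) encoder: t <- tau * t + (1 - tau) * o."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


if __name__ == "__main__":
    total = 100_000
    for step in (0, total // 2, total):
        print(step, round(ema_momentum(step, total), 6))
```

In a training loop, `tau = ema_momentum(step, total_steps)` would be computed once per step. It is then applied to every target-encoder parameter after the optimizer step. The target encoder itself never receives gradients.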

	---

	### [T1-2] Assran et al. — I-JEPA
	arXiv: 2301.08243 · `arxiv.org/pdf/2301.08243`
	Code: `github.com/facebookresearch/ijepa`

	Use for: The masking strategy foundation. Why multi-block contiguous > random masking (forces semantic prediction, not texture interpolation). The stop-gradient / EMA target encoder design justification. The predictor should be narrower than the encoder — this prevents shortcutting through the predictor.

	Key numbers: ViT-H/14 ImageNet — scale reference only, not a target for us.
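
The masking rule can be sketched on a 1D sequence of time patches. Sample several contiguous target blocks and let the context be whatever they leave uncovered. This is a simplified sketch, not I-JEPA's sampler: I-JEPA draws a separate large context block and then removes any overlap with the targets. The defaults of 4 blocks and a 50% target budget come from the ECG-JEPA settings in T1-1. `sample_multiblock_mask` is a hypothetical name.

```python
import random
from typing import List, Optional, Tuple


def sample_multiblock_mask(
    num_patches: int,
    num_blocks: int = 4,
    target_ratio: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[List[int]]]:
    """Sample contiguous target blocks on a 1D patch sequence; context = the rest.

    Each block has length num_patches * target_ratio / num_blocks. Blocks may
    overlap, so their union can cover less than target_ratio of the sequence.
    """
    rng = rng or random.Random()
    block_len = max(1, round(num_patches * target_ratio / num_blocks))
    target_blocks = []
    for _ in range(num_blocks):
        start = rng.randrange(0, num_patches - block_len + 1)
        target_blocks.append(list(range(start, start + block_len)))
    masked = {i for block in target_blocks for i in block}
    # The context encoder only sees patches that no target block covers.
    context_idx = [i for i in range(num_patches) if i not in masked]
    return context_idx, target_blocks


if __name__ == "__main__":
    # Assumption: 10 s at 500 Hz with 25-sample patches -> 200 time patches per lead
    context, targets = sample_multiblock_mask(200, rng=random.Random(0))
    print(len(context), [(b[0], b[-1]) for b in targets])
```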

	---

	### [T1-3] Bardes et al. — V-JEPA (Revisiting Feature Prediction)
	arXiv: 2404.08471 · `arxiv.org/pdf/2404.08471`

	Use for: Spatiotemporal tube masking — how to mask contiguous blocks across both spatial and temporal axes simultaneously. Template for PPG 1D+time representation. Two-encoder EMA recipe at scale. Why predicting in latent space beats pixel reconstruction for noisy signals — core justification for JEPA over MAE.

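	Key numbers: SSv2 top-1 72.2% (largest ViT-H/16, frozen backbone). The 77.3% SSv2 figure belongs to V-JEPA 2 (T2-4), not this paper.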
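
Tube masking hides the same spatial block in every frame. A masked region therefore cannot be recovered by copying it from a neighbouring frame. Below is a minimal sketch over a (frames × height × width) token grid. The single rectangular block and the grid sizes are illustrative assumptions; V-JEPA takes the union of several such blocks, sampled at short- and long-range scales. How PPG tokens map onto the spatial axes is our own design choice and is not shown here.

```python
import random
from typing import Optional, Set, Tuple


def sample_tube_mask(frames: int, height: int, width: int,
                     block_h: int, block_w: int,
                     rng: Optional[random.Random] = None) -> Set[Tuple[int, int, int]]:
    """Mask one spatial rectangle and repeat it across all frames (a 'tube')."""
    rng = rng or random.Random()
    top = rng.randrange(0, height - block_h + 1)
    left = rng.randrange(0, width - block_w + 1)
    return {
        (t, h, w)
        for t in range(frames)              # identical spatial block in every frame
        for h in range(top, top + block_h)
        for w in range(left, left + block_w)
    }
```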

	---

	### [T1-4] Balestriero & LeCun — LeJEPA
	arXiv: 2511.08544 · `arxiv.org/pdf/2511.08544`

	Use for: Ablation A3 only (SIGReg). Do not implement SIGReg without reading this first.
	- Theorem 1: isotropic Gaussian is the optimal JEPA embedding distribution
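	- SIGReg: K=128 random 1D projections w ~ N(0, I), normalised to unit length. For each projection, a univariate test statistic compares the distribution of z·w with N(0, 1), and the statistics are aggregated over projections. O(Kd). The paper's statistic is the Epps–Pulley characteristic-function test, not a KL divergence.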
	- λ range: [0.01, 0.1]; start at 0.05
	- Apply to pooled global representation only — not per-patch tokens
	- ~50 lines of PyTorch

	Key numbers: 79% ImageNet ViT-H/14 with only 2 loss terms.
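
This is a dependency-free sketch of SIGReg as we read it. It takes random unit-norm projections of the pooled embeddings and computes an Epps–Pulley-style characteristic-function statistic per projection (17 grid points on [-5, 5], trapezoid rule), then averages over slices. The grid and weighting are our assumption from the paper's reference code, so check them against the PDF before use. A training implementation would vectorise this in PyTorch.

```python
import cmath
import math
import random
from typing import List, Sequence


def epps_pulley(samples: Sequence[float], num_points: int = 17, t_max: float = 5.0) -> float:
    """Weighted squared gap between the empirical characteristic function (ECF) of
    `samples` and the characteristic function of N(0, 1), integrated on [-t_max, t_max]
    with the trapezoid rule and scaled by the sample count."""
    n = len(samples)
    ts = [-t_max + 2.0 * t_max * k / (num_points - 1) for k in range(num_points)]
    values = []
    for t in ts:
        ecf = sum(cmath.exp(1j * t * x) for x in samples) / n
        gauss_cf = math.exp(-0.5 * t * t)   # CF of N(0, 1); also used as the weight
        values.append(abs(ecf - gauss_cf) ** 2 * gauss_cf)
    step = ts[1] - ts[0]
    integral = step * (sum(values) - 0.5 * (values[0] + values[-1]))
    return n * integral


def sigreg(embeddings: List[List[float]], num_slices: int = 128, seed: int = 0) -> float:
    """Average Epps-Pulley statistic over random unit-norm 1D projections of the batch.

    `embeddings` is (batch, dim): one pooled global vector per sample, not per-patch tokens.
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    total = 0.0
    for _ in range(num_slices):
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]           # unit-norm projection direction
        projected = [sum(z_i * w_i for z_i, w_i in zip(z, w)) for z in embeddings]
        total += epps_pulley(projected)
    return total / num_slices
```

Usage would be `loss = jepa_loss + lam * sigreg(pooled)` with `lam = 0.05`. An isotropic Gaussian batch scores far lower than a collapsed or over-dispersed batch, which is the pressure the regulariser is meant to apply.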

	---

	### [T1-5] Kim — CroPA-ECG-JEPA
	arXiv: 2410.08559 · `arxiv.org/pdf/2410.08559`
	Code: `github.com/sehunfromdaegu/ECG_JEPA`

	Use for: Second ECG-JEPA implementation for debugging. Cross-Pattern Attention (CroPA) = inter-lead masked attention = inspiration for cardiac phase encoding in ablation A2. Also: 1D PE for predictor vs 2D for encoders — different from Weimann, compare before finalising.

	Key numbers: Recovers HR and QRS duration from frozen representations without supervised training — target behaviour for PTT.

	---

	### [T1-6] Botman et al. — Laya (LeJEPA for EEG)
	arXiv: 2603.16281 · `arxiv.org/pdf/2603.16281`

	Use for: Most direct prior to PhysioJEPA. Read before implementing ablation A3.
	- SIGReg with aggressive λ destabilises training on impulsive signals (QRS-like spikes in EEG)
	- Mitigation: lower λ (0.001–0.01), aggressive gradient clipping, apply to pooled global rep only
	- Latent prediction outperforms reconstruction on EEG clinical tasks

	Key numbers: Outperforms reconstruction baselines on EEG-Bench with 10% of pretraining data.
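
In PyTorch the clipping step is `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)`. The dependency-free sketch below shows the same global-norm rule on plain lists. It also shows a loss combination that keeps λ inside the 0.001–0.01 band. The λ = 0.005 default and any particular `max_norm` are illustrative assumptions, not values reported by Laya. Both helper names are hypothetical.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> List[List[float]]:
    """Scale all gradient tensors by one common factor so their joint L2 norm <= max_norm."""
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if total_norm <= max_norm:
        return [list(tensor) for tensor in grads]
    scale = max_norm / total_norm
    return [[g * scale for g in tensor] for tensor in grads]


def total_loss(latent_pred_loss: float, sigreg_value: float, lam: float = 0.005) -> float:
    """Latent-prediction loss plus a small SIGReg weight (Laya-style mitigation)."""
    if not 0.001 <= lam <= 0.01:
        raise ValueError("lam outside the 0.001-0.01 band suggested for impulsive signals")
    return latent_pred_loss + lam * sigreg_value
```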

	---

	## Tier 2 — Baseline numbers and comparisons
	Read to correctly report comparison numbers. Getting baselines wrong is a rejection risk.

	---

	### [T2-1] Nie et al. — AnyPPG
	arXiv: 2511.01747 · `arxiv.org/pdf/2511.01747`

	Use for: Primary contrastive baseline (Baseline C in experiment matrix).
	- Exact loss: symmetric InfoNCE with learnable temperature τ
	- CRITICAL: ECGFounder encoder is FROZEN during AnyPPG training. ECG is a fixed supervisory signal. AnyPPG is not a jointly trained dual-encoder model.
	- Architecture: Net1D (PPG branch), ECGFounder frozen (ECG branch)
	- Trained on >100,000 hours

	Key numbers: PPG→ECG retrieval R@1=0.736, R@5=0.906, R@10=0.935. AF detection AUC ~0.90. Mean 9.1% AUC improvement over non-ECG-guided baselines.
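
This is a dependency-free sketch of a symmetric InfoNCE objective. Within a batch of paired windows, PPG_i and ECG_i form the positive pair and every other pairing is a negative. The loss averages the PPG→ECG and ECG→PPG cross-entropies. Two details are our assumptions, not confirmed details of AnyPPG: cosine similarity, and the CLIP-style parameterisation τ = exp(−s) for a learned scalar s. Because ECGFounder is frozen, only the PPG encoder and s would receive gradients during training. `recall_at_k` shows how R@1/R@5/R@10 retrieval figures are computed.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where row i's correct class is column i (log-sum-exp stabilised)."""
    loss = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        loss += log_sum_exp - row[i]
    return loss / len(logits)


def symmetric_infonce(ppg: List[List[float]], ecg: List[List[float]], s: float) -> float:
    """CLIP-style loss over a batch; s is the learnable log inverse temperature."""
    p = [_normalize(v) for v in ppg]
    e = [_normalize(v) for v in ecg]
    scale = math.exp(s)                     # 1 / tau
    logits = [[scale * sum(a * b for a, b in zip(pi, ej)) for ej in e] for pi in p]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))


def recall_at_k(ppg: List[List[float]], ecg: List[List[float]], k: int) -> float:
    """Fraction of PPG queries whose paired ECG is among the k most similar ECGs."""
    p = [_normalize(v) for v in ppg]
    e = [_normalize(v) for v in ecg]
    hits = 0
    for i, pi in enumerate(p):
        sims = [sum(a * b for a, b in zip(pi, ej)) for ej in e]
        top_k = sorted(range(len(e)), key=lambda j: sims[j], reverse=True)[:k]
        hits += i in top_k
    return hits / len(p)
```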

	---

	### [T2-2] Wagner et al. — PTB-XL
	arXiv: 2004.13701 · `arxiv.org/pdf/2004.13701`

	Use for: ECG evaluation benchmark. Task definitions, train/test/val splits, and label hierarchy. Must replicate Weimann's exact split for comparison.

	Key numbers: Weimann ECG-JEPA AUC 0.945 all-statements = Baseline A target.
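
PTB-XL ships a `strat_fold` column (1–10) in `ptbxl_database.csv`. The recommended benchmark split uses folds 1–8 for training, fold 9 for validation and fold 10 for test. The sketch below assumes Weimann follows that split, which must be confirmed in their repository before reporting Baseline A. `ptbxl_split` and `load_ptbxl_split` are hypothetical helpers, and the CSV path is an assumption about the local layout.

```python
import csv
from typing import Dict, Iterable, List, Mapping


def ptbxl_split(rows: Iterable[Mapping[str, str]]) -> Dict[str, List[int]]:
    """Group ecg_ids by the recommended stratified folds: 1-8 train, 9 val, 10 test."""
    splits: Dict[str, List[int]] = {"train": [], "val": [], "test": []}
    for row in rows:
        fold = int(row["strat_fold"])
        ecg_id = int(row["ecg_id"])
        if fold == 10:
            splits["test"].append(ecg_id)
        elif fold == 9:
            splits["val"].append(ecg_id)
        else:
            splits["train"].append(ecg_id)
    return splits


def load_ptbxl_split(csv_path: str) -> Dict[str, List[int]]:
    """Read ptbxl_database.csv and return the split."""
    with open(csv_path, newline="") as f:
        return ptbxl_split(csv.DictReader(f))
```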

	---

	### [T2-3] Charlton et al. — Towards Ubiquitous BP Monitoring via PTT (review)
	URL: `pmc.ncbi.nlm.nih.gov/articles/PMC4515215/`

	Use for: Read before writing the E4 rollout-coherence (physiological consistency) checks. Covers the PTT definition, normal range, and the PTT–BP and HR–PTT relationships. Per-patient calibration is required for absolute BP: do not claim uncalibrated absolute BP from PTT.

	Key numbers: Normal PTT 100–400 ms (ICU adults). Within-patient tracking ~10 mmHg MAE with calibration.
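
This is a minimal sketch of beat-wise timing for the E4 consistency checks. It assumes R-peak times and PPG foot times (in seconds) are already detected. Timing from the ECG R-peak strictly measures pulse arrival time, which adds the pre-ejection period to PTT. The review discusses this distinction, so check which quantity the 100–400 ms range refers to before using it as a pass/fail threshold. The 0.6 s pairing window and both helper names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def pulse_transit_times(r_peaks_s: Sequence[float], ppg_feet_s: Sequence[float],
                        max_ptt_s: float = 0.6) -> List[float]:
    """Pair each R-peak with the first PPG foot after it; return per-beat delays in seconds.

    Beats with no PPG foot within max_ptt_s are skipped.
    """
    feet = sorted(ppg_feet_s)
    delays = []
    for r in r_peaks_s:
        idx = bisect_right(feet, r)         # first foot strictly after the R-peak
        if idx < len(feet) and feet[idx] - r <= max_ptt_s:
            delays.append(feet[idx] - r)
    return delays


def fraction_in_range(delays_s: Sequence[float], low_ms: float = 100.0,
                      high_ms: float = 400.0) -> float:
    """Share of beats whose delay lies inside [low_ms, high_ms]."""
    if not delays_s:
        return 0.0
    inside = sum(low_ms <= 1000.0 * d <= high_ms for d in delays_s)
    return inside / len(delays_s)
```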

	---

	### [T2-4] Assran et al. — V-JEPA 2 (including V-JEPA 2-AC)
	arXiv: 2506.09985 · `arxiv.org/pdf/2506.09985`

	Use for: Architecture D future work template. Two-stage recipe: action-free pretraining → action-conditioned fine-tuning with frozen encoder.

	Key numbers: <62 hours of robot interaction data for Stage 2. SSv2 top-1 77.3%.

	---

	## Tier 3 — Related work framing
	Read to correctly describe prior work and differentiate PhysioJEPA.

	---

	### [T3-1] Sarkar & Etemad — CardioGAN
	arXiv: 2010.00104 · `arxiv.org/pdf/2010.00104`
	Code: `github.com/pritamqu/ppg2ecg-cardiogan`

	Use for: First major cross-modal ECG-PPG paper (AAAI 2021).
	- Uses CycleGAN backbone with attention-based generators and dual time/frequency discriminators
	- NOT reconstruction/L1, NOT InfoNCE — adversarial + cycle consistency loss
	- Assumes zero-lag (t=0) ECG–PPG alignment, discarding the physiological delay between the two signals. Do NOT call this "pixel reconstruction."

	---

	### [T3-2] Liu, Wang & Wang — TSTA-Net
	PMLR: proceedings.mlr.press/v278/liu25d.html

	Use for: Hierarchical contrastive ECG-PPG baseline (PMLR 2025).
	- Hierarchical contrastive learning — NOT raw InfoNCE
	- 9.3% higher AF F1 vs prior SSL methods
	- Still t=0 aligned

	---

	### [T3-3] Fang et al. — PPGFlowECG
	arXiv: 2509.19774 · `arxiv.org/pdf/2509.19774`

	Use for: Two-stage generative translation baseline.
	- Stage 1: InfoNCE instance alignment (CardioAlign encoder, shared weights)
	- Stage 2: rectified flow generation from aligned latents
	- Figure 1 explicitly shows ECG precedes PPG temporally but the architecture does not exploit this
	- Do NOT describe as "rectified flow only" — InfoNCE is in Stage 1

	---

	### [T3-4] Dong et al. — Brain-JEPA (NeurIPS 2024 Spotlight)
	arXiv: 2409.19407 · `arxiv.org/pdf/2409.19407`
	Code: `github.com/hzlab/2024_Dong_Li_NeurIPS_Brain-JEPA`

	Use for: Cardiac phase encoding inspiration (ablation A2). Brain Gradient Positioning → our cardiac phase PE. Hard phase boundaries fail during AF — use soft Gaussian encoding over cardiac landmarks.

	Key numbers: NeurIPS 2024 Spotlight. Pretrained on UK Biobank fMRI from ~40k participants.
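
This is a minimal sketch of the soft alternative. It assumes landmark times (e.g. R-peaks, P-wave onsets) come from an upstream delineator. Each token gets one Gaussian response per landmark type, based on its distance to the nearest landmark of that type. The landmark set, σ = 40 ms and the nearest-landmark rule are illustrative assumptions, not Brain-JEPA's gradient positioning itself. `soft_phase_encoding` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence


def soft_phase_encoding(token_times_s: Sequence[float],
                        landmarks_s: Dict[str, Sequence[float]],
                        sigma_s: float = 0.04) -> List[List[float]]:
    """For each token time, one feature per landmark type (sorted by name):
    exp(-d^2 / (2 sigma^2)), where d is the distance to the nearest landmark of that type.

    Features decay smoothly instead of switching at hard phase boundaries, so an
    irregular rhythm shifts the encoding gradually rather than breaking it.
    """
    names = sorted(landmarks_s)
    encoding = []
    for t in token_times_s:
        row = []
        for name in names:
            points = landmarks_s[name]
            if not points:
                row.append(0.0)             # landmark type absent, e.g. no P waves in AF
                continue
            d = min(abs(t - p) for p in points)
            row.append(math.exp(-d * d / (2.0 * sigma_s ** 2)))
        encoding.append(row)
    return encoding
```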

	---

	### [T3-5] Hojjati et al. — EEG-VJEPA
	arXiv: 2507.03633 · `arxiv.org/pdf/2507.03633`
	Code: `github.com/amir-hojjati/eeg-vjepa`

	Use for: V-JEPA adapted to 1D physiological signal — most direct predecessor. How to reshape multi-channel 1D signal into 3D tensor treated as "video." UMAP showing pathological clustering without labels.

	Key numbers: TUH fine-tuned accuracy 85.8%, AUROC 88.5%. Frozen probe 83.3%.

	---

	### [T3-6] Munim et al. — EchoJEPA
	arXiv: 2602.02603 · `arxiv.org/pdf/2602.02603`

	Use for: Strongest empirical evidence that JEPA > MAE for noisy medical signals. Use in intro to justify JEPA over MAE.

	Key numbers: JEPA degrades 2% under perturbation vs 17% for VideoMAE. 79% accuracy at 1% labels. 20% LVEF improvement.

	---

	### [T3-7] Wu, Lei et al. — SurgMotion
	arXiv: 2602.05638 · `arxiv.org/pdf/2602.05638`

	Use for: One-sentence citation alongside EchoJEPA: "JEPA's noise rejection under clinical signal artifacts has been validated in echocardiography [EchoJEPA] and surgical video [SurgMotion]."

	---

	### [T3-8] LeCun — A Path Towards Autonomous Machine Intelligence (JEPA position paper)
	URL: `openreview.net/pdf?id=BZ5a1r-kVsf`

	Use for: One intro citation: "A world model should predict consequences of actions in abstract representation space [LeCun 2022]."

	---

	### [T3-9] Abbaspourazad et al. — Apple Heart Study Foundation Model
	arXiv: 2312.05409 · `arxiv.org/pdf/2312.05409`
	Published: ICLR 2024

	Use for: Prior art on wearable-scale PPG+ECG foundation models. InfoNCE + KoLeo, participant-level positives, Apple Watch data. Shows ECG more discriminative than PPG — context for why cross-modal training helps PPG.

	---

	## Tier 4 — Evaluation methodology and datasets
	Read when writing the evaluation harness code.

	---

	### [T4-1] Pimentel et al. — BIDMC PPG and Respiration Dataset
	PhysioNet: `physionet.org/content/bidmc/1.0.0/`

	Use for: Fallback dataset if E0 fails.
	- WFDB format, 53 recordings × 8 min, 125 Hz
	- Signals: Lead II ECG + fingertip PPG + impedance respiration
	- Labels: HR, RR, SpO2 — no AF labels (use for HR probe only)

	Key numbers: 53 patients, ~7 hours total, 125 Hz.
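
The BIDMC figures can be sanity-checked, and converted into a window count for the HR probe, with a few lines of arithmetic. The 10 s window and stride are illustrative assumptions, not part of the dataset. `windows_per_record` is a hypothetical helper.

```python
FS_HZ = 125          # BIDMC sampling rate (Hz)
RECORD_MIN = 8       # length of each recording (minutes)
N_RECORDINGS = 53

samples_per_record = RECORD_MIN * 60 * FS_HZ        # 60,000 samples per signal
total_hours = N_RECORDINGS * RECORD_MIN / 60        # 424 min, about 7.07 h


def windows_per_record(window_s: float, stride_s: float,
                       fs_hz: int = FS_HZ, record_min: float = RECORD_MIN) -> int:
    """Count the full windows of window_s seconds, stepped by stride_s, in one recording."""
    n_samples = int(record_min * 60 * fs_hz)
    win = int(round(window_s * fs_hz))
    hop = int(round(stride_s * fs_hz))
    if win > n_samples:
        return 0
    return 1 + (n_samples - win) // hop


if __name__ == "__main__":
    per_record = windows_per_record(10, 10)         # non-overlapping 10 s windows
    print(samples_per_record, round(total_hours, 2), per_record, per_record * N_RECORDINGS)
```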

	---

	### [T4-2] Moody et al. — MIMIC-IV Waveform Database
	PhysioNet: `physionet.org/content/mimic4wdb/0.1.0/`

	Use for: Understanding HuggingFace mirror provenance.
	- v0.1.0: 200 records from 198 patients; upcoming release ~10,000 records
	- MIMIC-IV-ECG module: ~800k ECGs across ~160k patients, 500 Hz, 10s, 12-lead — AF label source candidate

	---

	### [T4-3] Kachuee et al. — Cuffless BP Estimation Dataset (UCI)
	UCI: `archive.ics.uci.edu/dataset/340`

	Use for: E5a PTT probe evaluation.
	- 12,000 records from 942 patients; patient IDs were removed, so only population-level evaluation is possible
	- PPG, ABP, and ECG (lead II), all at 125 Hz, derived from MIMIC-II

	Key numbers: AAMI criterion is a mean error ≤ 5 mmHg with an error SD ≤ 8 mmHg.
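
This is a minimal check of the AAMI accuracy criterion on pooled errors, matching the population-level evaluation this dataset allows. The full standard also sets minimum subject counts and a measurement protocol, which this sketch does not model. `aami_pass` is a hypothetical helper.

```python
import statistics
from typing import Sequence


def aami_pass(predicted_mmhg: Sequence[float], reference_mmhg: Sequence[float]) -> bool:
    """AAMI accuracy check: |mean error| <= 5 mmHg and error SD <= 8 mmHg."""
    errors = [p - r for p, r in zip(predicted_mmhg, reference_mmhg)]
    mean_err = statistics.mean(errors)
    sd_err = statistics.stdev(errors)       # sample standard deviation (needs >= 2 errors)
    return abs(mean_err) <= 5.0 and sd_err <= 8.0
```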

	---

	### [T4-4] Goldberger et al. — PhysioBank, PhysioToolkit, PhysioNet
	DOI: 10.1161/01.CIR.101.23.e215

	Use for: Required citation whenever using BIDMC, MIMIC waveforms, or any PhysioNet dataset. One line in methods: "Data obtained from PhysioNet [Goldberger et al., 2000]."

	---

	## Tier 5 — Context and intellectual lineage
	Do not read these to implement anything. One citation each.

	---

	### [T5-1] Ha & Schmidhuber — World Models
	arXiv: 1803.10122

	Use for: Intro citation only. "World models learn a compressed latent representation and a transition function [Ha & Schmidhuber, 2018]."

	---

	### [T5-2] Bardes et al. — VICReg
	arXiv: 2105.04906

	Use for: Related work only. "VICReg requires hand-crafted augmentations that JEPA avoids."

	---

	### [T5-3] Ronan et al. — VICReg for Brugada ECG Detection
	DOI: 10.1038/s41598-025-94130-x

	Use for: One sentence. "VICReg-based SSL has been applied to ECG classification [Ronan et al., 2025] but requires augmentation engineering."

	---

	### [T5-4] Johnson et al. — MIMIC-IV (clinical database paper)
	DOI: 10.1038/s41597-022-01899-x

	Use for: Required data citation whenever using MIMIC-IV derived data. "MIMIC-IV [Johnson et al., 2023], a freely accessible EHR database."

	---

	### [T5-5] CLIMB multimodal clinical benchmark
	arXiv: 2503.07667

	Use for: ECG-JEPA performance in multimodal settings. "ECG-JEPA outperforms general time-series models like UniTS by 36.8% on ECG tasks [CLIMB, 2025]." One citation in intro.

	---

	## Quick reference: numbers the agent must not get wrong

	| Claim | Correct value | Source |
	|-------|---------------|--------|
	| ECG-JEPA PTB-XL AUC | 0.945 all-statements | T1-1 Weimann |
	| AnyPPG PPG→ECG R@1 | 0.736 | T2-1 Nie |
	| AnyPPG AUC improvement | 9.1% over non-ECG baselines | T2-1 Nie |
	| AnyPPG ECGFounder | FROZEN during training | T2-1 Nie |
	| EchoJEPA JEPA perturbation | 2% degradation | T3-6 Munim |
	| EchoJEPA MAE perturbation | 17% degradation | T3-6 Munim |
	| EchoJEPA 1% label accuracy | 79% | T3-6 Munim |
	| Normal PTT range (ICU) | 100–400 ms | T2-3 Charlton |
	| BIDMC size | 53 recordings × 8 min @ 125 Hz | T4-1 Pimentel |
	| V-JEPA 2-AC interaction data | <62 hours | T2-4 Assran |
	| EEG-VJEPA TUH AUROC | 88.5% fine-tuned | T3-5 Hojjati |
	| CardioGAN objective | CycleGAN adversarial, not reconstruction | T3-1 Sarkar |
	| TSTA-Net objective | Hierarchical contrastive, not raw InfoNCE | T3-2 Liu |
	| PPGFlowECG Stage 1 | InfoNCE alignment, then rectified flow | T3-3 Fang |
	| BP calibration requirement | Per-patient calibration required for absolute values | T2-3 Charlton |

	---

	## File locations in repo

	```
	docs/papers/*.pdf
	```

	---

	This is the complete reference index. Fetch from arXiv if a PDF is missing. Never cite a number not in this file without verifying the source first.