# PAPERS.md: PhysioJEPA Reference Index
*Oz Labs, April 2026*
*Covers every paper referenced across the full conversation and all project documents.*

---

## How to use this file

Three things per entry:
1. **What to use it for**: the specific task or decision the agent needs this paper for
2. **Key numbers**: exact figures the agent must not get wrong in code or prose
3. **Location**: where to fetch the PDF

Read the tier before writing any code in that tier's domain.
Do not cite a number that isn't in this file without fetching the source first.

---

## Tier 1: Implement from these
*Read before writing any training code. Contains exact equations, hyperparameters, architecture details.*

---

### [T1-1] Weimann & Conrad: ECG-JEPA
**arXiv**: 2410.13867 · `arxiv.org/pdf/2410.13867`
**Code**: `github.com/kweimann/ECG-JEPA` ← fork this

**Use for**: This is the codebase we fork. Before writing any encoder code, read Section 2 (architecture), Section 3 (data), Appendix A (hyperparameters).
- Patch tokenisation: 2D over (12 leads × time), patch size = 25 time steps at 500 Hz
- Masking: multi-block contiguous, 50% ratio, 4 target blocks
- EMA: τ starts at 0.996, cosine-annealed to 0.9999 over training
- Loss: L1 in latent space, with no pixel decoder
- ViT-S: 12 layers, d=256, 8 heads, MLP ratio=4

**Key numbers**: PTB-XL all-statements AUC **0.945**; this is Baseline A in the experiment matrix. Training time ~26h on RTX 3090.
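
The EMA schedule listed above can be sketched as follows. This is a minimal illustration, assuming the common half-cosine ramp from 0.996 to 0.9999; the exact functional form used in the ECG-JEPA codebase should be checked against the fork before relying on it. The function names `ema_momentum` and `ema_update` are hypothetical.

```python
import math


def ema_momentum(step: int, total_steps: int,
                 tau_start: float = 0.996, tau_end: float = 0.9999) -> float:
    """Cosine-annealed EMA momentum rising from tau_start to tau_end.

    Assumption: a half-cosine ramp; the forked codebase may use a different form.
    """
    progress = min(max(step / max(total_steps, 1), 0.0), 1.0)
    # The cosine factor goes from 1 to 0 as progress goes from 0 to 1.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return tau_end - (tau_end - tau_start) * cosine


def ema_update(target: list, online: list, tau: float) -> list:
    """Element-wise target <- tau * target + (1 - tau) * online."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]
```

In a training loop, `ema_momentum(step, total_steps)` would be evaluated once per optimiser step and passed to the target-encoder update.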

---

### [T1-2] Assran et al.: I-JEPA
**arXiv**: 2301.08243 · `arxiv.org/pdf/2301.08243`
**Code**: `github.com/facebookresearch/ijepa`

**Use for**: The masking strategy foundation. Why multi-block contiguous > random masking (forces semantic prediction, not texture interpolation). The stop-gradient / EMA target encoder design justification. The predictor should be *narrower* than the encoder, which prevents shortcutting through the predictor.

**Key numbers**: ViT-H/14 ImageNet is a scale reference only, not a target for us.
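
Multi-block contiguous masking can be sketched on a 1D patch sequence as below. This is an illustration only: the function name, the equal block lengths, and the uniform start positions are assumptions, not the I-JEPA or ECG-JEPA sampler. The defaults mirror the 50% ratio and 4 target blocks listed under T1-1.

```python
import random


def sample_multiblock_mask(num_patches, mask_ratio=0.5, num_blocks=4, rng=None):
    """Sample contiguous target blocks and return (blocks, context_indices).

    Assumption: all blocks share one length and may overlap, so the
    effective masked fraction can fall below mask_ratio.
    """
    rng = rng or random.Random()
    block_len = max(1, round(mask_ratio * num_patches / num_blocks))
    blocks = []
    target = set()
    for _ in range(num_blocks):
        # Uniform start position such that the block fits in the sequence.
        start = rng.randrange(0, num_patches - block_len + 1)
        block = list(range(start, start + block_len))
        blocks.append(block)
        target.update(block)
    # Context is every patch that is not inside any target block.
    context = [i for i in range(num_patches) if i not in target]
    return blocks, context
```

Random masking would instead scatter isolated patches, which a model can fill in by interpolating from neighbours; contiguous blocks remove that shortcut.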

---

### [T1-3] Bardes et al.: V-JEPA (Revisiting Feature Prediction)
**arXiv**: 2404.08471 · `arxiv.org/pdf/2404.08471`

**Use for**: Spatiotemporal tube masking: how to mask contiguous blocks across both spatial and temporal axes simultaneously. Template for PPG 1D+time representation. Two-encoder EMA recipe at scale. Why predicting in latent space beats pixel reconstruction for noisy signals, which is the core justification for JEPA over MAE.

**Key numbers**: SSv2 top-1 72.2% (the 77.3% figure belongs to V-JEPA 2, see T2-4).

---

### [T1-4] Balestriero & LeCun: LeJEPA
**arXiv**: 2511.08544 · `arxiv.org/pdf/2511.08544`

**Use for**: Ablation A3 only (SIGReg). Do not implement SIGReg without reading this first.
- Theorem 1: isotropic Gaussian is the optimal JEPA embedding distribution
- SIGReg: K=128 random 1D projections w~N(0,I), KL(z·w || N(0,1)) per projection, sum. O(Kd).
- λ range: [0.01, 0.1]; start at 0.05
- Apply to *pooled global representation only*, not per-patch tokens
- ~50 lines of PyTorch

**Key numbers**: 79% ImageNet ViT-H/14 with only 2 loss terms.
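
The SIGReg bullet above can be illustrated with a rough sketch. Two loud assumptions: each projection direction is normalised to unit length so that an isotropic Gaussian input projects to N(0,1), and the per-projection divergence is approximated by the closed-form KL between a moment-matched Gaussian and N(0,1). The statistic used in the paper may differ, so read the paper before implementing ablation A3; `sigreg_penalty` is a hypothetical name.

```python
import math
import random


def sigreg_penalty(embeddings, num_projections=128, rng=None, eps=1e-8):
    """Sum over random 1D projections of KL(N(mean, var) || N(0, 1)).

    Assumption: moment-matched Gaussian KL stands in for the paper's
    per-projection divergence; directions are unit-normalised.
    """
    rng = rng or random.Random(0)
    n, d = len(embeddings), len(embeddings[0])
    total = 0.0
    for _ in range(num_projections):
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        w = [x / norm for x in w]
        # Project every pooled embedding onto the direction w.
        proj = [sum(zi * wi for zi, wi in zip(z, w)) for z in embeddings]
        mean = sum(proj) / n
        var = sum((p - mean) ** 2 for p in proj) / n + eps
        total += 0.5 * (var + mean * mean - 1.0 - math.log(var))
    return total
```

The penalty is near zero for embeddings that already look isotropic Gaussian and grows when the representation collapses or stretches along any sampled direction. In training it would be scaled by λ and added to the prediction loss.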

---

### [T1-5] Kim: CroPA-ECG-JEPA
**arXiv**: 2410.08559 · `arxiv.org/pdf/2410.08559`
**Code**: `github.com/sehunfromdaegu/ECG_JEPA`

**Use for**: Second ECG-JEPA implementation for debugging. Cross-Pattern Attention (CroPA) = inter-lead masked attention = inspiration for cardiac phase encoding in ablation A2. Also: 1D PE for predictor vs 2D for encoders, which differs from Weimann; compare before finalising.

**Key numbers**: Recovers HR and QRS duration from frozen representations without supervised training; this is the target behaviour for PTT.

---

### [T1-6] Botman et al.: Laya (LeJEPA for EEG)
**arXiv**: 2603.16281 · `arxiv.org/pdf/2603.16281`

**Use for**: Most direct prior to PhysioJEPA. Read before implementing ablation A3.
- SIGReg with aggressive λ destabilises training on impulsive signals (QRS-like spikes in EEG)
- Mitigation: lower λ (0.001–0.01), aggressive gradient clipping, apply to pooled global rep only
- Latent prediction outperforms reconstruction on EEG clinical tasks

**Key numbers**: Outperforms reconstruction baselines on EEG-Bench with 10% of pretraining data.
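
The gradient-clipping mitigation above can be illustrated with a global-norm clip over a flat list of gradient values. This is a sketch of the general technique, not Laya's code, and the threshold is a hypothetical choice. In PyTorch the corresponding utility is `torch.nn.utils.clip_grad_norm_`.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm.

    Returns (clipped_grads, original_norm). The threshold is a tuning
    choice; no specific value is taken from the paper.
    """
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm or total == 0.0:
        return list(grads), total
    scale = max_norm / total
    return [g * scale for g in grads], total
```

Logging the returned pre-clip norm is a cheap way to see whether impulsive segments are producing the gradient spikes described above.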

---

## Tier 2: Baseline numbers and comparisons
*Read to correctly report comparison numbers. Getting baselines wrong is a rejection risk.*

---

### [T2-1] Nie et al.: AnyPPG
**arXiv**: 2511.01747 · `arxiv.org/pdf/2511.01747`

**Use for**: Primary contrastive baseline (Baseline C in experiment matrix).
- Exact loss: **symmetric InfoNCE** with learnable temperature τ
- **CRITICAL: ECGFounder encoder is FROZEN during AnyPPG training.** ECG is a fixed supervisory signal. AnyPPG is not a jointly trained dual-encoder model.
- Architecture: Net1D (PPG branch), ECGFounder frozen (ECG branch)
- Trained on >100,000 hours

**Key numbers**: PPG→ECG retrieval **R@1=0.736**, R@5=0.906, R@10=0.935. AF detection AUC ~0.90. Mean **9.1% AUC improvement** over non-ECG-guided baselines.
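
For orientation, a symmetric InfoNCE loss and the R@k retrieval metric can be sketched on a precomputed similarity matrix, where `sim[i][j]` is the similarity between PPG sample `i` and ECG sample `j` and matched pairs sit on the diagonal. This is a generic illustration, not AnyPPG's code: the temperature is a fixed float here (the paper learns it), and the default 0.07 is an assumed placeholder.

```python
import math


def _diag_cross_entropy(sim, temperature):
    """Mean negative log-softmax of the diagonal entry in each row."""
    loss = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]
    return loss / len(sim)


def symmetric_info_nce(sim, temperature=0.07):
    """Average of the row-wise (PPG to ECG) and column-wise (ECG to PPG) losses."""
    transposed = [list(col) for col in zip(*sim)]
    return 0.5 * (_diag_cross_entropy(sim, temperature)
                  + _diag_cross_entropy(transposed, temperature))


def recall_at_k(sim, k):
    """Fraction of rows whose matched column is among the top-k similarities."""
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        hits += i in ranked[:k]
    return hits / len(sim)
```

When reproducing Baseline C, remember the constraint above: only the PPG branch receives gradients, because the ECGFounder branch is frozen.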

---

### [T2-2] Wagner et al.: PTB-XL
**arXiv**: 2004.13701 · `arxiv.org/pdf/2004.13701`

**Use for**: ECG evaluation benchmark. Task definitions, train/test/val splits, and label hierarchy. Must replicate Weimann's exact split for comparison.

**Key numbers**: Weimann ECG-JEPA AUC **0.945** all-statements = Baseline A target.

---

### [T2-3] Charlton et al.: Towards Ubiquitous BP Monitoring via PTT (review)
**URL**: `pmc.ncbi.nlm.nih.gov/articles/PMC4515215/`

**Use for**: Read before writing the E4 rollout-coherence physiological consistency checks. Covers the PTT definition, normal range, and the PTT–BP and HR–PTT relationships. Per-patient calibration is required for absolute BP; do not claim uncalibrated absolute BP from PTT.

**Key numbers**: Normal PTT **100–400 ms** (ICU adults). Within-patient tracking ~10 mmHg MAE with calibration.
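
A consistency check against the PTT range above might look like the sketch below. It assumes PTT is measured as the delay from each ECG R-peak to the next PPG foot (some authors call this interval pulse arrival time) and that peak and foot times have already been detected. Both function names are hypothetical.

```python
def pulse_transit_times_ms(r_peak_times_s, ppg_foot_times_s):
    """Pair each R-peak with the first PPG foot after it; return delays in ms.

    Assumption: both inputs are sorted, in seconds. A missed foot makes
    two R-peaks share one foot, which shows up as an out-of-range value.
    """
    ptts = []
    j = 0
    for r in r_peak_times_s:
        while j < len(ppg_foot_times_s) and ppg_foot_times_s[j] <= r:
            j += 1
        if j == len(ppg_foot_times_s):
            break
        ptts.append((ppg_foot_times_s[j] - r) * 1000.0)
    return ptts


def fraction_plausible(ptts_ms, low_ms=100.0, high_ms=400.0):
    """Share of beats whose PTT lies inside the normal range listed above."""
    if not ptts_ms:
        return 0.0
    return sum(low_ms <= p <= high_ms for p in ptts_ms) / len(ptts_ms)
```

A low plausible fraction on a rollout suggests the predicted PPG has drifted out of physiological alignment with the ECG; it says nothing about absolute BP.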

---

### [T2-4] Assran et al.: V-JEPA 2 (including V-JEPA 2-AC)
**arXiv**: 2506.09985 · `arxiv.org/pdf/2506.09985`

**Use for**: Architecture D future work template. Two-stage recipe: action-free pretraining → action-conditioned fine-tuning with frozen encoder.

**Key numbers**: **<62 hours** of robot interaction data for Stage 2. SSv2 top-1 77.3%.

---

## Tier 3: Related work framing
*Read to correctly describe prior work and differentiate PhysioJEPA.*

---

### [T3-1] Sarkar & Etemad: CardioGAN
**arXiv**: 2010.00104 · `arxiv.org/pdf/2010.00104`
**Code**: `github.com/pritamqu/ppg2ecg-cardiogan`

**Use for**: First major cross-modal ECG-PPG paper (AAAI 2021).
- Uses **CycleGAN backbone** with attention-based generators and dual time/frequency discriminators
- **NOT reconstruction/L1, NOT InfoNCE**: adversarial + cycle consistency loss
- t=0 alignment, which discards lag. Do NOT call this "pixel reconstruction."

---

### [T3-2] Liu, Wang & Wang: TSTA-Net
**PMLR**: proceedings.mlr.press/v278/liu25d.html

**Use for**: Hierarchical contrastive ECG-PPG baseline (PMLR 2025).
- **Hierarchical contrastive learning**, NOT raw InfoNCE
- 9.3% higher AF F1 vs prior SSL methods
- Still t=0 aligned

---

### [T3-3] Fang et al.: PPGFlowECG
**arXiv**: 2509.19774 · `arxiv.org/pdf/2509.19774`

**Use for**: Two-stage generative translation baseline.
- Stage 1: **InfoNCE instance alignment** (CardioAlign encoder, shared weights)
- Stage 2: **rectified flow** generation from aligned latents
- Figure 1 explicitly shows ECG precedes PPG temporally but the architecture does not exploit this
- Do NOT describe as "rectified flow only"; InfoNCE is in Stage 1

---

### [T3-4] Dong et al.: Brain-JEPA (NeurIPS 2024 Spotlight)
**arXiv**: 2409.19407 · `arxiv.org/pdf/2409.19407`
**Code**: `github.com/hzlab/2024_Dong_Li_NeurIPS_Brain-JEPA`

**Use for**: Cardiac phase encoding inspiration (ablation A2). Brain Gradient Positioning → our cardiac phase PE. Hard phase boundaries fail during AF; use soft Gaussian encoding over cardiac landmarks.

**Key numbers**: NeurIPS 2024 Spotlight. UK Biobank 40k patients.
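
A soft Gaussian phase encoding of the kind suggested above can be sketched as one Gaussian bump per cardiac landmark on a circular phase axis. The landmark positions and the width `sigma` below are hypothetical placeholders, not values from Brain-JEPA or from the project documents.

```python
import math


def soft_phase_encoding(phase, landmarks=(0.0, 0.35, 0.6), sigma=0.08):
    """Encode a cardiac phase in [0, 1) as soft memberships to landmarks.

    Assumptions: phase 0.0 marks the R-peak; the other landmark positions
    and sigma are illustrative placeholders.
    """
    features = []
    for mu in landmarks:
        diff = abs(phase - mu) % 1.0
        dist = min(diff, 1.0 - diff)  # circular distance on the beat cycle
        features.append(math.exp(-0.5 * (dist / sigma) ** 2))
    return features
```

Because each feature decays smoothly with distance, a beat with an irregular R-R interval gets a blurred encoding rather than a hard jump between phase bins, which is the failure mode noted above for AF.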

---

### [T3-5] Hojjati et al.: EEG-VJEPA
**arXiv**: 2507.03633 · `arxiv.org/pdf/2507.03633`
**Code**: `github.com/amir-hojjati/eeg-vjepa`

**Use for**: V-JEPA adapted to a 1D physiological signal; the most direct predecessor. How to reshape a multi-channel 1D signal into a 3D tensor treated as "video." UMAP showing pathological clustering without labels.

**Key numbers**: TUH fine-tuned accuracy **85.8%**, AUROC **88.5%**. Frozen probe 83.3%.

---

### [T3-6] Munim et al.: EchoJEPA
**arXiv**: 2602.02603 · `arxiv.org/pdf/2602.02603`

**Use for**: Strongest empirical evidence that JEPA > MAE for noisy medical signals. Use in intro to justify JEPA over MAE.

**Key numbers**: JEPA degrades **2%** under perturbation vs **17%** for VideoMAE. **79%** accuracy at 1% labels. 20% LVEF improvement.

---

### [T3-7] Wu, Lei et al.: SurgMotion
**arXiv**: 2602.05638 · `arxiv.org/pdf/2602.05638`

**Use for**: One-sentence citation alongside EchoJEPA: "JEPA's noise rejection under clinical signal artifacts has been validated in echocardiography [EchoJEPA] and surgical video [SurgMotion]."

---

### [T3-8] LeCun: A Path Towards Autonomous Machine Intelligence (JEPA position paper)
**URL**: `openreview.net/pdf?id=BZ5a1r-kVsf`

**Use for**: One intro citation: "A world model should predict consequences of actions in abstract representation space [LeCun 2022]."

---

### [T3-9] Abbaspourazad et al.: Apple Heart Study Foundation Model
**arXiv**: 2312.05409 · `arxiv.org/pdf/2312.05409`
**Published**: ICLR 2024

**Use for**: Prior art on wearable-scale PPG+ECG foundation models. InfoNCE + KoLeo, participant-level positives, Apple Watch data. Shows ECG is more discriminative than PPG, which is context for why cross-modal training helps PPG.

---

## Tier 4: Evaluation methodology and datasets
*Read when writing the evaluation harness code.*

---

### [T4-1] Pimentel et al.: BIDMC PPG and Respiration Dataset
**PhysioNet**: `physionet.org/content/bidmc/1.0.0/`

**Use for**: Fallback dataset if E0 fails.
- WFDB format, **53 recordings × 8 min**, **125 Hz**
- Signals: **Lead II ECG + fingertip PPG** + impedance respiration
- Labels: HR, RR, SpO2; **no AF labels** (use for HR probe only)

**Key numbers**: **53 patients**, ~7 hours total, **125 Hz**.
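
The size figures above follow from simple arithmetic, which also gives a quick estimate of how many training windows the fallback dataset yields. The 10-second window and stride in the usage note are hypothetical choices, not project settings.

```python
RECORDINGS = 53
MINUTES_PER_RECORDING = 8
SAMPLE_RATE_HZ = 125


def samples_per_recording():
    """Samples in one 8-minute recording at 125 Hz."""
    return MINUTES_PER_RECORDING * 60 * SAMPLE_RATE_HZ


def total_hours():
    """Total duration across all recordings, in hours."""
    return RECORDINGS * MINUTES_PER_RECORDING / 60


def total_windows(window_s, stride_s):
    """Number of full windows across the dataset (window and stride in seconds)."""
    seconds = MINUTES_PER_RECORDING * 60
    per_recording = (seconds - window_s) // stride_s + 1
    return RECORDINGS * per_recording
```

With non-overlapping 10 s windows, `total_windows(10, 10)` gives 2,544 windows, which is small enough that BIDMC is better suited to probing than to pretraining.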

---

### [T4-2] Moody et al.: MIMIC-IV Waveform Database
**PhysioNet**: `physionet.org/content/mimic4wdb/0.1.0/`

**Use for**: Understanding HuggingFace mirror provenance.
- v0.1.0: **200 records from 198 patients**; upcoming release ~10,000 records
- MIMIC-IV-ECG module: **~800k ECGs across ~160k patients**, 500 Hz, 10s, 12-lead; AF label source candidate

---

### [T4-3] Kachuee et al.: Cuffless BP Estimation Dataset (UCI)
**UCI**: `archive.ics.uci.edu/dataset/340`

**Use for**: E5a PTT probe evaluation.
- 12,000 records, 942 patients; **patient ID removed**, so population-level evaluation only
- PPG + ABP at 125 Hz, derived from MIMIC-II

**Key numbers**: AAMI standard ≤5 mmHg mean ± 8 mmHg SD.
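
The AAMI criterion above can be checked with a small helper for the E5a probe. This sketch assumes the criterion applies to the signed error (predicted minus reference) and uses the sample standard deviation; confirm both conventions against the standard before reporting. The function name is hypothetical.

```python
import statistics


def meets_aami(predicted_mmhg, reference_mmhg, max_mean=5.0, max_sd=8.0):
    """Return (passes, mean_error, sd_error) for paired BP estimates.

    Assumptions: signed error, sample standard deviation, and at least
    two paired measurements.
    """
    errors = [p - r for p, r in zip(predicted_mmhg, reference_mmhg)]
    mean_error = statistics.fmean(errors)
    sd_error = statistics.stdev(errors)
    passes = abs(mean_error) <= max_mean and sd_error <= max_sd
    return passes, mean_error, sd_error
```

Because patient IDs are removed in this dataset, the check can only be run at population level, as noted above.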

---

### [T4-4] Goldberger et al.: PhysioBank, PhysioToolkit, PhysioNet
**DOI**: 10.1161/01.CIR.101.23.e215

**Use for**: Required citation whenever using BIDMC, MIMIC waveforms, or any PhysioNet dataset. One line in methods: "Data obtained from PhysioNet [Goldberger et al., 2000]."

---

## Tier 5: Context and intellectual lineage
*Do not read these to implement anything. One citation each.*

---

### [T5-1] Ha & Schmidhuber: World Models
**arXiv**: 1803.10122

**Use for**: Intro citation only. "World models learn a compressed latent representation and a transition function [Ha & Schmidhuber, 2018]."

---

### [T5-2] Bardes et al.: VICReg
**arXiv**: 2105.04906

**Use for**: Related work only. "VICReg requires hand-crafted augmentations that JEPA avoids."

---

### [T5-3] Ronan et al.: VICReg for Brugada ECG Detection
**DOI**: 10.1038/s41598-025-94130-x

**Use for**: One sentence. "VICReg-based SSL has been applied to ECG classification [Ronan et al., 2025] but requires augmentation engineering."

---

### [T5-4] Johnson et al.: MIMIC-IV (clinical database paper)
**DOI**: 10.1038/s41597-022-01899-x

**Use for**: Required data citation whenever using MIMIC-IV derived data. "MIMIC-IV [Johnson et al., 2023], a freely accessible EHR database."

---

### [T5-5] CLIMB multimodal clinical benchmark
**arXiv**: 2503.07667

**Use for**: ECG-JEPA performance in multimodal settings. "ECG-JEPA outperforms general time-series models like UniTS by 36.8% on ECG tasks [CLIMB, 2025]." One citation in intro.

---

## Quick reference: numbers the agent must not get wrong

| Claim | Correct value | Source |
|-------|--------------|--------|
| ECG-JEPA PTB-XL AUC | **0.945** all-statements | T1-1 Weimann |
| AnyPPG PPG→ECG R@1 | **0.736** | T2-1 Nie |
| AnyPPG AUC improvement | **9.1%** over non-ECG baselines | T2-1 Nie |
| AnyPPG ECGFounder | **FROZEN** during training | T2-1 Nie |
| EchoJEPA JEPA perturbation | **2%** degradation | T3-6 Munim |
| EchoJEPA MAE perturbation | **17%** degradation | T3-6 Munim |
| EchoJEPA 1% label accuracy | **79%** | T3-6 Munim |
| Normal PTT range (ICU) | **100–400 ms** | T2-3 Charlton |
| BIDMC size | **53 recordings × 8 min @ 125 Hz** | T4-1 Pimentel |
| V-JEPA 2-AC interaction data | **<62 hours** | T2-4 Assran |
| EEG-VJEPA TUH AUROC | **88.5%** fine-tuned | T3-5 Hojjati |
| CardioGAN objective | **CycleGAN adversarial**, not reconstruction | T3-1 Sarkar |
| TSTA-Net objective | **Hierarchical contrastive**, not raw InfoNCE | T3-2 Liu |
| PPGFlowECG Stage 1 | **InfoNCE alignment**, then rectified flow | T3-3 Fang |
| BP calibration requirement | **Per-patient calibration required** for absolute values | T2-3 Charlton |

---

## File locations in repo

```
docs/papers/*.pdf
```

---

*This is the complete reference index. Fetch from arXiv if a PDF is missing. Never cite a number not in this file without verifying the source first.*