# PhysioJEPA: Learning Cardiovascular Dynamics via Time-Shifted Cross-Modal Prediction
*Oz Labs · Full Research Development Document · April 2026*
*Revision 2: post-reviewer critique. Replaces causalcardio_jepa_full.md*

---

## Change log from revision 2 (post-E0 audit, 2026-04-14)

- ECG input revised from 12-lead @ 500 Hz to **single lead II @ 250 Hz** (lead II present in 93.7% of HF-mirror segments; 12-lead not available in this dataset)
- ECG patch size revised: 200 ms = **50 samples @ 250 Hz**, 1D over single lead (was 2D (12, 25) @ 500 Hz)
- AF label source locked to **PTB-XL** (see `docs/af_label_decision.md`): MIMIC-IV-ECG path blocked by (a) ~381-patient cohort yielding <100 AF-positive, (b) missing PhysioNet credentialing. Paper now frames AF eval as a transfer claim
- PPG encoding locked to **raw patches** for v1 per the E1 Stage-1 result (morphological extraction rate 98.6%, but the Stage-2 probe is deferred to ablation A1, to be run once AF labels are integrated)
- Baseline A (ECG-JEPA) cannot load Weimann's 12-lead PTB-XL checkpoints; must retrain from scratch on single-lead II to be an honest comparison

## Change log from revision 1

- Renamed throughout from CausalCardio-JEPA → PhysioJEPA
- Core claim simplified to one sentence; PTT demoted from contribution to validation signal
- v1 architecture stripped to minimum: raw PPG patches, EMA only, no cardiac phase encoding, no SIGReg
- Morphological encoding, cardiac phase encoding, SIGReg moved to labelled ablations
- "Causal" language replaced throughout with "physiologically informed asymmetry" or "directional asymmetry"
- AnyPPG characterisation corrected: ECGFounder encoder is frozen during AnyPPG training
- Venue targets corrected to reflect actual 2026 deadlines
- PTT head reframed: validation signal, not contribution

---

## 1. The Hypothesis

**Core claim, in one sentence:**

> Predicting PPG at a variable time offset Δt from ECG produces cardiovascular representations that encode vascular timing structure, while contrastive alignment at t=0 and predictive alignment at t=0 both destroy this structure.

**What this means concretely:**
After self-supervised pretraining on synchronized ECG+PPG without labels, the model should:

1. Predict PPG windows N beats ahead from ECG context with lower error than predicting mean PPG, i.e. the model is actually learning something
2. Outperform a symmetric JEPA trained at Δt=0 on downstream cardiovascular tasks, i.e. the temporal offset matters
3. Produce latent embeddings where PTT (measured post-hoc from the latent's optimal Δt) correlates with ground-truth PTT from peak detection, i.e. PTT is implicitly encoded
4. Show physiologically consistent rollout: predicted optimal Δt varies inversely with heart rate and inversely with blood pressure category (higher pressure, stiffer arteries, shorter transit)

Points 1 and 2 are the paper. Points 3 and 4 are the supporting evidence.

**Why this is different from existing methods:**

Every prior cross-modal ECG-PPG method treats the two modalities as symmetric windows on the same cardiac state at the same moment:

- **AnyPPG** (Nie et al., 2511.01747): symmetric InfoNCE at t=0. Important nuance: the ECGFounder encoder is *frozen* during AnyPPG training, so it functions as a fixed supervisory signal, not a jointly-learned representation. This means AnyPPG is not even learning a shared representation; it is distilling a frozen ECG model into a PPG encoder. Same-time alignment still applies.
- **TSTA-Net** (Liu et al., PMLR 2025): hierarchical contrastive learning with spatiotemporal alignment of ECG and PPG. Same-time alignment.
- **PPGFlowECG** (Fang et al., 2509.19774): uses InfoNCE instance alignment internally in Stage 1, then rectified flow generation in Stage 2. Both stages operate at t=0 alignment.
- **CardioGAN** (Sarkar & Etemad, AAAI 2021): CycleGAN-based adversarial waveform synthesis. Pixel-space signal translation, not representation learning. t=0.

All of them discard the ECG→PPG lag. The lag is the measurement: PTT (≈100–400 ms) is set by pulse wave velocity, which the Moens-Korteweg equation ties to arterial wall stiffness; because stiffness rises with distending pressure, PTT in turn tracks blood pressure. PPGFlowECG even acknowledges this in Figure 1 ("ventricular electrical activation precedes the peripheral pulse"), but their architecture doesn't use it.
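For orientation, the textbook relations behind this chain are written out below. The symbols are introduced here for illustration only and are not quantities the pipeline measures: E is the incremental elastic modulus of the arterial wall, h the wall thickness, d the lumen diameter, ρ the blood density, and L the arterial path length from heart to fingertip.

```latex
\mathrm{PWV} = \sqrt{\frac{E\,h}{\rho\,d}}
\qquad\Longrightarrow\qquad
\mathrm{PTT} = \frac{L}{\mathrm{PWV}} = L\,\sqrt{\frac{\rho\,d}{E\,h}}
```

Because E increases with distending pressure, higher blood pressure gives a higher PWV and therefore a shorter PTT. Strictly, a lag measured from the ECG R-peak also includes the pre-ejection period, so it is a pulse arrival time rather than a pure arterial transit time.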

**Why JEPA specifically:**

JEPA's implicit bias (shown formally by Balestriero & LeCun, LeJEPA 2511.08544, and empirically by Weimann & Conrad, 2410.13867) is toward high-influence, predictable features. In a cardiac signal, the most stable and predictable cross-modal feature is the time-shifted PPG peak following the QRS complex. JEPA will naturally attend to this; symmetric InfoNCE cannot, because it penalises the model for not aligning ECG(t) with PPG(t), actively destroying the lag information in order to minimise the contrastive loss.

---

## 2. Architecture

### v1 (what runs in the experiment matrix)

The minimum architecture needed to test the core claim. No unnecessary complexity.

```
INPUT  (revised post-E0, 2026-04-14)
───────────────────────────────────────────────────────
ECG:  [B, 1, 2500]   - lead II, 10s @ 250Hz (native HF-mirror rate)
PPG:  [B, 1, 1250]   - fingertip PPG (Pleth), 10s @ 125Hz (native)
Temporal alignment: sample-accurate (shared segment clock per HF record)

PREPROCESSING
───────────────────────────────────────────────────────
ECG:  bandpass 0.5–40 Hz → z-score normalisation per window
      R-peak detection (Pan-Tompkins) only used for PTT ground truth,
      not consumed by the encoder

PPG:  bandpass 0.5–8 Hz → z-score normalisation
      [v1: raw patches only β€” no morphological extraction]

Segments without lead II (~6.3%) are dropped.

TOKENISATION
───────────────────────────────────────────────────────
ECG context encoder:
  - 1D patch: 50 samples = 200ms @ 250Hz
  - 50 patches per 10s window
  - Linear projection → d=256
  - 1D sinusoidal positional encoding (time)
  [v1: single-lead; multi-lead 2D is deferred β€” only II/V/aVR consistently
   present, and the Δt claim is lead-agnostic]

PPG target encoder:
  - 1D patch: 25 samples = 200ms per patch
  - 50 patches per 10s window (1250 samples / 25)
  - Linear projection → d=256
  - 1D sinusoidal positional encoding
  [v1: raw patches β€” not morphological tokens]

ECG CONTEXT ENCODER  E_e
───────────────────────────────────────────────────────
ViT-S (adapted from Weimann & Conrad ECG-JEPA, 1D instead of 2D)
  12 transformer layers, d=256, 8 heads, MLP ratio=4
  I-JEPA masking within ECG (multi-block, 50% ratio) for auxiliary loss
  EMA updated: τ annealed 0.996→0.9999 over first 30% of training
  Note: cannot load Weimann's published 12-lead checkpoints directly;
  Baseline A retrains from scratch on single-lead II for fair comparison

PPG TARGET ENCODER  E_p  [EMA updated]
───────────────────────────────────────────────────────
ViT-T (lighter: 6 layers, d=256)
  No masking β€” encodes full PPG window as target
  EMA updated: same τ schedule as E_e
  [v1: EMA only; SIGReg is an ablation, not v1]

Δt EMBEDDING
───────────────────────────────────────────────────────
Scalar Δt ∈ [50ms, 500ms] → sinusoidal encoding → R^64
Linear projection → R^256
Added to predictor as conditioning token

CAUSAL PREDICTOR  P
───────────────────────────────────────────────────────
4-layer cross-attention transformer
  Query:    positional tokens for target PPG window positions
  Key/Val:  ECG context latents z_e + Δt conditioning token
  Output:   predicted PPG latent ẑ_p(t+Δt)

The predictor sees no PPG input, only ECG latents + Δt.
This is the architectural enforcement of directional asymmetry.

LOSS FUNCTION (v1)
───────────────────────────────────────────────────────
L_total = L_cross + 0.3 * L_self

L_cross = L1(ẑ_p(t+Δt),  z_p(t+Δt))   ← main prediction loss
L_self  = L1(ẑ_e_masked, z_e_target)   ← auxiliary ECG self-prediction

[v1: no SIGReg, no PTT head in training loop]

Δt SAMPLING
───────────────────────────────────────────────────────
Per batch:
  60% log-uniform in [50ms, 500ms]
  40% ground-truth PTT measured from aligned dataset
```
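The tokenisation arithmetic, Δt embedding, EMA schedule and loss in the block above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: all function names are hypothetical, the transformer-style sinusoidal form (base 10,000) and the linear shape of the τ anneal are assumptions because the diagram fixes only the endpoints, and the learned projections to d=256 are omitted.

```python
import math

ECG_FS, PPG_FS = 250, 125   # Hz, native rates from the INPUT block
WINDOW_S = 10.0             # seconds per window
PATCH_MS = 200              # patch duration for both modalities


def patch_len(fs_hz, patch_ms=PATCH_MS):
    """Samples per patch at a given sampling rate."""
    return int(round(fs_hz * patch_ms / 1000))


def patchify(signal, plen):
    """Split a 1-D window into non-overlapping patches of plen samples."""
    if len(signal) % plen:
        raise ValueError("window length must be a multiple of the patch length")
    return [signal[i:i + plen] for i in range(0, len(signal), plen)]


def sinusoidal_dt_encoding(dt_ms, dim=64, max_period=10_000.0):
    """Sin/cos features of a scalar offset (assumed transformer-style form)."""
    half = dim // 2
    freqs = [max_period ** (-i / half) for i in range(half)]
    return [math.sin(dt_ms * f) for f in freqs] + [math.cos(dt_ms * f) for f in freqs]


def ema_tau(step, total_steps, tau_start=0.996, tau_end=0.9999, warm_frac=0.30):
    """Linear anneal tau_start -> tau_end over the first warm_frac of steps, then hold."""
    ramp = max(1, int(warm_frac * total_steps))
    return tau_start + min(step / ramp, 1.0) * (tau_end - tau_start)


def ema_update(target, online, tau):
    """target <- tau * target + (1 - tau) * online, over flat parameter lists."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


def l1(pred, target):
    """Mean absolute error between two equal-length vectors."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def total_loss(zhat_p, z_p, zhat_e_masked, z_e_target, self_weight=0.3):
    """L_total = L_cross + 0.3 * L_self, as in the LOSS FUNCTION block."""
    return l1(zhat_p, z_p) + self_weight * l1(zhat_e_masked, z_e_target)


ecg_tokens = patchify([0.0] * int(WINDOW_S * ECG_FS), patch_len(ECG_FS))  # 50 x 50 samples
ppg_tokens = patchify([0.0] * int(WINDOW_S * PPG_FS), patch_len(PPG_FS))  # 50 x 25 samples
```

Both modalities therefore yield 50 tokens per 10 s window, which keeps ECG and PPG token indices on the same 200 ms grid. The diagram does not say which online weights the PPG target encoder tracks, so `ema_update` only shows the update arithmetic.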

### Ablations (not v1; run after E3 passes K2)

| Ablation | What changes | What it tests |
|----------|-------------|---------------|
| A1: Morphological PPG | PPG target encoder uses morphological tokens instead of raw patches | Does structured PPG encoding improve latent quality? |
| A2: Cardiac phase encoding | Add beat-phase positional encoding (P/QRS/ST/T) to ECG encoder | Does phase-aware PE beat standard 1D sinusoidal? |
| A3: SIGReg instead of EMA | Replace EMA with SIGReg (Balestriero & LeCun 2511.08544) | Is SIGReg more stable than EMA on cardiac signals? |
| A4: Joint PTT head | Add PTT regression MLP head to training loss (γ=0.1) | Does supervised PTT signal improve latent vascular encoding? |
| A5: Curriculum Δt | Start with ground-truth PTT only, introduce log-uniform Δt after 30% training | Does curriculum scheduling improve PTT coherence? |
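A possible Δt sampler for the v1 60/40 mix and for ablation A5 is sketched below. It assumes the "Per batch" split is applied per example in expectation (a Bernoulli draw with p=0.6), that measured PTTs are clamped into the embedding range [50 ms, 500 ms], and that after the A5 switch point the v1 mix applies; these choices and all names are illustrative, not taken from the codebase.

```python
import math
import random

DT_MIN_MS, DT_MAX_MS = 50.0, 500.0   # range of the Δt embedding


def sample_log_uniform(rng, lo=DT_MIN_MS, hi=DT_MAX_MS):
    """Log-uniform draw on [lo, hi]; the clamp only guards float rounding at the ends."""
    x = math.exp(rng.uniform(math.log(lo), math.log(hi)))
    return min(max(x, lo), hi)


def sample_dt_batch(measured_ptt_ms, rng, p_log_uniform=0.6):
    """One Δt per example: log-uniform with prob p_log_uniform, else that example's measured PTT."""
    dts = []
    for ptt in measured_ptt_ms:
        if rng.random() < p_log_uniform:
            dts.append(sample_log_uniform(rng))
        else:
            dts.append(min(max(ptt, DT_MIN_MS), DT_MAX_MS))  # assumed clamp into range
    return dts


def curriculum_p_log_uniform(progress, switch_at=0.30, p_after=0.6):
    """A5 sketch: ground-truth PTT only before switch_at of training, then the v1 mix."""
    return 0.0 if progress < switch_at else p_after


rng = random.Random(0)
v1_dts = sample_dt_batch([200.0] * 1000, rng)
early_a5_dts = sample_dt_batch([200.0] * 100, rng, curriculum_p_log_uniform(0.1))
```

Sampling log-uniformly rather than uniformly spends proportionally more draws at short offsets, where a fixed absolute error in Δt is a larger relative error.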

---

## 3. Required Resources

### Compute
- **E0–E2 (baseline suite)**: ~10 GPU-hours (3 baselines × 20 epochs × small data)
- **E3 (full training)**: ~48–72 hours on A100/H100 for 100 epochs
- **E4–E6**: ~10 GPU-hours (frozen encoder probes + ablations)
- **Full ablation suite (A1–A5)**: ~5 × 24h = 120 hours
- **Total to paper-ready**: ~200 GPU-hours ≈ $500 on Runpod H100
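As a sanity check, the line items above can be summed against the stated total. E3's wall-clock hours are assumed to run on a single GPU, and the hourly rate below is only what "~200 GPU-hours ≈ $500" implies, not a quoted Runpod price.

```python
# GPU-hour line items from the Compute list, as (low, high) ranges.
budget_hours = {
    "E0-E2 baseline suite": (10, 10),
    "E3 full training (assumed single GPU)": (48, 72),
    "E4-E6 probes + ablations": (10, 10),
    "A1-A5 ablation suite": (5 * 24, 5 * 24),
}

low = sum(lo for lo, _ in budget_hours.values())
high = sum(hi for _, hi in budget_hours.values())
implied_usd_per_hour = 500 / 200  # implied by the stated totals, not a quoted price

print(f"total: {low}-{high} GPU-hours, implied rate ~${implied_usd_per_hour:.2f}/GPU-hour")
# prints: total: 188-212 GPU-hours, implied rate ~$2.50/GPU-hour
```

The 188–212 GPU-hour range brackets the ~200 figure, so the total is consistent with its parts.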

### Data
Primary: `lucky9-cyou/mimic-iv-aligned-ppg-ecg` (HuggingFace, instant)
Fallback (if E0 fails): PhysioNet BIDMC (ECG+PPG, documented alignment, open access)
PTT validation: MIMIC-BP curated dataset (UCL/UCI, 1,524 patients)

### Software
- Base codebase: `kweimann/ECG-JEPA` (MIT licence)
- PPG peak detection: `wfdb` + `scipy.signal`
- SIGReg (ablation A3): ~50 lines PyTorch, implement from Balestriero & LeCun 2511.08544
- Evaluation: `sklearn` linear probe + custom rollout harness

### People and timeline
- Guy: architecture, training loop, paper
- Zack: data pipeline, PPG encoder, evaluation harness
- Weeks 1–2: E0→E3 (go/no-go on K2)
- Weeks 3–4: E4→E6 + ablations (if green)
- Weeks 5–8: writing

---

## 4. Execution plan

See the experiment matrix document (`physiojep_experiment_matrix.md`) for day-by-day detail. Summary:

| Days | Task | Gate |
|------|------|------|
| 1–2 | E0: data audit | Dataset go/no-go |
| 3 | E1: PPG encoding decision | Architecture lock |
| 4–5 | E2: baseline suite | Floor + ceiling |
| 6–8 | E3: PhysioJEPA v1 | K1/K2/K3 at epoch 25 |
| 9–10 | E4: rollout coherence | World model evidence |
| 11–12 | E5: downstream probes | PTT/AF/HR numbers |
| 13–14 | E6: decisive ablation (Δt vs Δt=0) | Table 1 of paper |
| 15 | Green/yellow/red decision | What gets written |

---

## 5. Pitfalls and Failure Modes

### Pitfall 1: Dataset alignment coarser than 50ms
**Probability**: Medium. HuggingFace mirror is undocumented.
**Symptom**: PTT ground-truth variance >100ms within-patient
**Response**: Pivot to PhysioNet BIDMC immediately (2-day delay)
**Impact on claim**: Architecture identical; only provenance label changes
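One way to run the Pitfall 1 check is to compute each patient's spread of beat-level PTTs and flag patients above 100 ms. This sketch reads "variance >100ms" as a standard deviation in milliseconds (an assumption, since a variance would be in ms²) and takes hypothetical (patient_id, ptt_ms) pairs as input.

```python
import statistics
from collections import defaultdict


def within_patient_ptt_sd(records):
    """records: iterable of (patient_id, ptt_ms). Returns {patient_id: sample SD in ms}."""
    by_patient = defaultdict(list)
    for pid, ptt in records:
        by_patient[pid].append(ptt)
    # statistics.stdev needs at least two values, so single-beat patients are skipped
    return {pid: statistics.stdev(v) for pid, v in by_patient.items() if len(v) >= 2}


def flag_coarse_alignment(records, sd_threshold_ms=100.0):
    """Patients whose beat-level PTT spread exceeds the Pitfall 1 threshold."""
    spreads = within_patient_ptt_sd(records)
    return sorted(pid for pid, sd in spreads.items() if sd > sd_threshold_ms)


records = [("p1", 200.0), ("p1", 210.0), ("p1", 190.0),
           ("p2", 100.0), ("p2", 400.0), ("p2", 250.0)]
flagged = flag_coarse_alignment(records)  # ["p2"]
```

The fraction of flagged patients then gives a single number to compare against the C1 gate.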

### Pitfall 2: Morphological PPG feature extraction unreliable
**Note**: This is now an ablation (A1), not v1. If E1 shows morphological encoding is unreliable, we simply don't run A1. This is no longer a project-killing risk.

### Pitfall 3: EMA collapse
**Probability**: Low. ECG-JEPA with EMA is validated at scale.
**Symptom**: Mean cosine sim >0.99 for 500 consecutive steps
**Response**: Reduce τ start to 0.99, check batch size; add SIGReg (ablation A3) earlier
**Monitoring**: Log every 100 steps from epoch 1
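The collapse criterion can be tracked with a small helper like the one below. It assumes "mean cosine sim" means the mean pairwise cosine similarity between embeddings of different samples in a batch; because logging happens every 100 steps, it measures the high-similarity run in step indices rather than in calls. Names are hypothetical.

```python
import math


def mean_pairwise_cosine(embeddings):
    """Mean cosine similarity over all distinct pairs of vectors in a batch."""
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings")
    units = []
    for v in embeddings:
        n = math.sqrt(sum(x * x for x in v)) or 1e-12  # avoid dividing by zero
        units.append([x / n for x in v])
    total, pairs = 0.0, 0
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            total += sum(a * b for a, b in zip(units[i], units[j]))
            pairs += 1
    return total / pairs


class CollapseMonitor:
    """Flags collapse once mean cosine stays above threshold for `patience` steps."""

    def __init__(self, threshold=0.99, patience=500):
        self.threshold = threshold
        self.patience = patience
        self.first_high_step = None  # step at which the current high run began

    def update(self, step, mean_cos):
        if mean_cos > self.threshold:
            if self.first_high_step is None:
                self.first_high_step = step
            # compare step indices, so sparse logging (every 100 steps) still works
            return step - self.first_high_step >= self.patience
        self.first_high_step = None
        return False
```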

### Pitfall 4: Cross-modal loss never beats mean baseline (K1)
**Probability**: Low-medium. Depends on dataset quality.
**Symptom**: L_cross plateau above 0.85× mean-PPG-latent baseline
**Response**: Check data quality, increase window overlap, verify EMA schedule
**Nuclear option**: Pivot to Architecture A (temporal ECG-JEPA, unimodal), which reuses all code
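The K1 ratio compares the model's L_cross with the L1 error of always predicting the mean PPG target latent. A minimal sketch, assuming the mean is taken over training-set target latents and that both errors use the same target-encoder snapshot (the EMA targets drift during training, so mixing snapshots would bias the ratio); names are hypothetical.

```python
def l1(pred, target):
    """Mean absolute error between two equal-length vectors."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mean_latent(train_targets):
    """Elementwise mean of training-set PPG target latents."""
    n = len(train_targets)
    return [sum(col) / n for col in zip(*train_targets)]


def k1_ratio(predictions, targets, train_targets):
    """Model L_cross divided by the L1 error of always predicting the mean latent."""
    mu = mean_latent(train_targets)
    model = sum(l1(p, t) for p, t in zip(predictions, targets)) / len(targets)
    baseline = sum(l1(mu, t) for t in targets) / len(targets)
    return model / baseline


def k1_passes(predictions, targets, train_targets, margin=0.85):
    """K1 / C4: L_cross must fall below 0.85x the mean-latent baseline."""
    return k1_ratio(predictions, targets, train_targets) < margin
```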

### Pitfall 5 (critical): Δt-aware ≈ t-aligned (K2)
**Probability**: Unknown; this is the central empirical question.
**Symptom**: E3 AUROC ≈ Baseline B AUROC (within 0.02)
**Response**: This is the K2 failure mode. The core claim is wrong on this data at this scale.
**Pivot options**: Architecture A, Study 4 (anomaly detection), or re-run on BIDMC

### Pitfall 6: Shortcut learning
**Probability**: Medium, especially early in training.
**Symptom**: Model predicts mean PPG morphology for all inputs; L_cross decreases but predictions are identical regardless of ECG input
**Detection**: Compute per-patient prediction variance β€” if near zero, shortcut is occurring
**Response**: Increase batch diversity, add within-patient hard negatives to Δt sampling
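A sketch of the per-patient variance check follows. The absolute threshold `eps` is a placeholder assumption, since "near zero" is not defined here; comparing against the variance of the true targets z_p for the same patient would likely be more robust. Input format and names are hypothetical.

```python
import statistics
from collections import defaultdict


def per_patient_prediction_variance(pids, predictions):
    """Per patient: mean over latent dims of the variance of predictions across windows."""
    groups = defaultdict(list)
    for pid, pred in zip(pids, predictions):
        groups[pid].append(pred)
    out = {}
    for pid, preds in groups.items():
        if len(preds) < 2:
            continue  # variance is undefined for a single window
        dims = list(zip(*preds))
        out[pid] = sum(statistics.pvariance(d) for d in dims) / len(dims)
    return out


def shortcut_suspected(pids, predictions, eps=1e-4):
    """True if the median patient's prediction variance is below eps."""
    variances = per_patient_prediction_variance(pids, predictions)
    return bool(variances) and statistics.median(variances.values()) < eps
```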

### Pitfall 7: PTT coherence fails (E4 passes but PTT probe fails)
**Probability**: Low-medium.
**Implication**: The temporal structure may be encoded nonlinearly. Try a 3-layer MLP probe instead of a linear one. If that fails, this is a limitation: remove the PTT probe from paper claims but keep the E4 rollout coherence evidence.

---

## 6. Checkpoints

| # | When | Pass criterion | Fail action |
|---|------|----------------|-------------|
| C1 | Day 2 | Alignment ≤50ms; ≥500 patients; missing ≤20% | Pivot to BIDMC |
| C2 | Day 3 | E1 decision made and committed | Block on architecture |
| C3 | Day 5 | Baseline B training stable (no collapse) | Add SIGReg to E3 from start |
| C4 | Day 8 (epoch 25) | K1: L_cross < 0.85× mean baseline | Fix or exit |
| C5 | Day 8 (epoch 25) | K2: E3 AUROC > Baseline B + 0.02 | Paper doesn't exist |
| C6 | Day 8 (epoch 25) | K3: E3 AUROC ≥ Baseline A − 0.01 | Reduce PPG encoder capacity |
| C7 | Day 10 | E4: Spearman(optimal Δt, ground-truth PTT) > 0.30 | Keep as limitation |
| C8 | Day 12 | E5: PTT probe MAE < naive by 20% | 3-layer MLP probe fallback |
| C9 | Day 14 | E6: Δt>0 > Δt=0 on ≥2 of 3 metrics | Re-examine K2 |
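The numeric gates C5, C6 and C9 reduce to simple comparisons, sketched below. Which three metrics C9 uses is not fixed in this table, so the caller supplies the metric names and whether higher is better; function names are hypothetical.

```python
def k2_pass(auroc_e3, auroc_baseline_b, margin=0.02):
    """C5 / K2: PhysioJEPA must beat the Δt=0 baseline by more than margin."""
    return auroc_e3 > auroc_baseline_b + margin


def k3_pass(auroc_e3, auroc_baseline_a, tolerance=0.01):
    """C6 / K3: PhysioJEPA may trail ECG-only ECG-JEPA by at most tolerance."""
    return auroc_e3 >= auroc_baseline_a - tolerance


def c9_pass(dt_pos_scores, dt_zero_scores, higher_is_better, needed=2):
    """C9: Δt>0 must beat Δt=0 on at least `needed` of the supplied metrics."""
    wins = 0
    for name, better_high in higher_is_better.items():
        a, b = dt_pos_scores[name], dt_zero_scores[name]
        wins += (a > b) if better_high else (a < b)
    return wins >= needed
```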

---

## 7. Evaluation Protocol

### Primary metrics (determine the paper)

**E3 / E6: Core claim test**

| Metric | What it tests | Baseline |
|--------|--------------|---------|
| AF detection AUROC (linear probe, frozen) | Representation quality | ECG-JEPA: 0.945 (Weimann 2410.13867) |
| HR regression R² (linear probe, frozen) | Cardiovascular signal content | RR-interval baseline |
| ECG-PPG retrieval R@1 | Cross-modal alignment | AnyPPG: 0.736 |

**E4: World model evidence (rollout coherence)**

| Check | Pass criterion |
|-------|---------------|
| Spearman(optimal Δt, measured PTT) | > 0.30 |
| HR-PTT inverse ordering | Significant, p < 0.05 |
| U-shaped prediction error curve | ≥60% of patients |

**E5 β€” Downstream validation**

| Task | Metric | Framing |
|------|--------|---------|
| PTT regression (linear probe) | MAE (ms) vs naive | Validation only, not the contribution |
| AF sample efficiency | AUROC at 1/5/10/100% labels | JEPA sample efficiency advantage |
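For E5, a naive PTT baseline with the C8 "20% below naive" test, plus a label-fraction subsampler for the 1/5/10/100% AF runs, might look like this. Treating "naive" as the training-set mean PTT and stratifying by class (keeping at least one example per class) are assumptions; names are hypothetical.

```python
import random


def mae(pred, true):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def naive_ptt_mae(train_ptt, test_ptt):
    """Naive baseline: always predict the training-set mean PTT."""
    mean = sum(train_ptt) / len(train_ptt)
    return mae([mean] * len(test_ptt), test_ptt)


def c8_pass(probe_mae, naive_mae, improvement=0.20):
    """C8: probe MAE must be at least 20% below the naive MAE."""
    return probe_mae < (1.0 - improvement) * naive_mae


def stratified_fraction(labels, fraction, seed=0):
    """Indices of a class-stratified subsample, keeping at least one example per class."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, max(1, round(fraction * len(idx)))))
    return sorted(keep)
```

Fixing the seed per fraction keeps the subsamples reproducible across the baselines being compared.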

### Evaluation philosophy

Table 1 of the paper (from E6): a 4-row × 4-column table showing Baseline A (ECG-JEPA), Baseline B (Δt=0), Baseline C (InfoNCE), and PhysioJEPA across AF AUROC, HR R², PTT correlation, and retrieval R@1. If rows 3 and 4 are clearly separated, the paper exists.

The PTT probe and rollout coherence are supporting figures. They interpret why the representation quality is better. They do not constitute the primary claim.

---

## 8. Critic: Strongest Arguments Against

### Critic 1: PTT can be computed with peak detection in 10 lines of code

**Correct.** That is exactly why PTT is a *validation signal*, not the contribution. We are not claiming novelty in PTT computation. We are claiming that a model trained on the Δt prediction objective implicitly encodes PTT in its latent space, which is evidence that the latent captures vascular dynamics rather than just cardiac rhythm. If the same latent did *not* encode PTT, we would doubt that it learned anything physiologically meaningful.
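To make the critic's point concrete, pairing R-peaks with PPG fiducials really is only a few lines. The sketch assumes peak indices are already detected (e.g. Pan-Tompkins for ECG and `scipy.signal` for PPG, as listed under Software), that both streams share the segment clock, and that each R-peak pairs with the first unused PPG fiducial 50–500 ms later; the function name is hypothetical.

```python
ECG_FS, PPG_FS = 250, 125  # Hz, native rates from the INPUT block


def ptt_from_peaks(r_peaks, ppg_peaks, min_ms=50.0, max_ms=500.0):
    """Pair each R-peak with the first PPG fiducial min_ms..max_ms later; return PTTs in ms.

    r_peaks: sample indices at ECG_FS; ppg_peaks: sample indices at PPG_FS.
    """
    ppg_ms = sorted(1000.0 * p / PPG_FS for p in ppg_peaks)
    ptts, j = [], 0
    for r in sorted(r_peaks):
        r_ms = 1000.0 * r / ECG_FS
        while j < len(ppg_ms) and ppg_ms[j] < r_ms + min_ms:
            j += 1  # skip fiducials too early to belong to this beat
        if j < len(ppg_ms) and ppg_ms[j] <= r_ms + max_ms:
            ptts.append(ppg_ms[j] - r_ms)
            j += 1  # each PPG fiducial is used for at most one beat
    return ptts
```

Beats with no fiducial in the window (e.g. a missed PPG peak) are simply skipped rather than paired with the next beat's pulse.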

### Critic 2: Small dataset vs AnyPPG's 100k+ hours

**Conceded.** We are not competing at scale. The comparison is controlled: PhysioJEPA vs Baseline C (InfoNCE) trained on the same N hours. The architectural claim is about inductive bias on fixed data, not about scale. We report this comparison explicitly.

### Critic 3: "Physiological asymmetry" is just an architectural choice, not a principled claim

**Partially conceded.** The architecture encodes a *hypothesis* about the direction of information flow (ECG→PPG). Baseline B (Δt=0) isolates the offset, not the direction; if a symmetric variant at Δt>0 (PPG also predicting ECG at the same lag, see Q2) performs identically to PhysioJEPA, the asymmetry contributed nothing and we remove it from the contribution list. That ablation is the test.

### Critic 4: The Δt sampling mixing ratio (60/40) is a hyperparameter

**Correct.** Ablation A5 (curriculum Δt) tests whether this specific ratio matters. For v1 we use 60/40 pragmatically; if A5 shows a different schedule is better, we adopt it. This is not a fundamental weakness; it is a hyperparameter like any other.

### Critic 5: Shortcut β€” the model predicts mean PPG for all inputs

**Real risk.** Explicitly monitored via per-patient prediction variance (Pitfall 6). If detected, addressed before any results are reported.

---

## 9. Reviewer Critiques (updated post-feedback)

The reviewer critique document (provided separately) raised five structural issues. Status of each:

| Issue | Status | Resolution |
|-------|--------|-----------|
| 3 contributions in 1 paper | Fixed | Core claim reduced to one sentence; PTT and morphology are evidence/ablations |
| PTT head framing backwards | Fixed | PTT is validation signal; cross-modal Δt prediction is the claim |
| Morphological encoding = #1 technical risk | Fixed | Moved to ablation A1; not in v1 |
| "Causal" overclaimed | Fixed | Renamed to PhysioJEPA; language changed to "directional asymmetry" / "physiologically informed" |
| Core idea not isolated | Fixed | E3 vs Baseline B (Δt=0) is the controlled isolation; both are identical except Δt |
| Baselines needed from Week 1 | Fixed | E2 baseline suite runs days 4–5, before E3 |
| "World model" evaluation missing | Fixed | E4 rollout coherence is explicit and uses physiological consistency checks |

---

## 10. Open Questions

**Q1: How well is the MIMIC-IV aligned PPG-ECG dataset actually aligned?**
Unknown until E0. The most important unanswered question. Answer by Day 2.

**Q2: Does the asymmetric architecture (ECG predicts PPG, not PPG predicts ECG) outperform the symmetric version?**
None of the planned ablations (A1–A5) tests directionality. Baseline B isolates Δt but not directionality; if we add a symmetric Δt>0 variant (PPG predicts ECG with the same lag), we can test this separately. Lower priority; add if time permits.

**Q3: Does the cross-modal training improve the ECG encoder relative to ECG-only training?**
K3 tests this: E3 AUROC should match Baseline A (ECG-JEPA alone). If it's worse, the cross-modal objective is hurting the ECG representation. This would be a significant negative result worth reporting.

**Q4: How does the model behave during AF?**
AF removes the periodic P-wave and makes RR intervals irregular. The Δt sampling may fail to find a meaningful optimum during AF episodes. This is actually interesting: the model's inability to predict a stable optimal Δt during AF could itself be a detection signal. Monitor in E4.

**Q5: Is MIMIC-BP the right held-out dataset for PTT validation?**
MIMIC-BP (Kachuee et al.) is derived from MIMIC-III; the training data is MIMIC-IV-derived. Same institution (BIDMC), no patient overlap, but similar population. This is a reasonable evaluation setup but should be documented carefully to pre-empt reviewer concerns about distribution leakage.

---

## 11. Paper Identity and Venues

**Title**: *PhysioJEPA: Learning Cardiovascular Dynamics via Time-Shifted Cross-Modal Prediction*

**One-paragraph abstract (draft)**:
Contrastive self-supervised methods for ECG-PPG representation learning align same-time signals in a shared embedding space, discarding the physiological lag between cardiac electrical activation and peripheral perfusion. This lag, the pulse transit time (PTT), encodes arterial stiffness and correlates with blood pressure. We introduce PhysioJEPA, a JEPA-based world model that instead trains an ECG encoder to predict PPG latents at a variable time offset Δt, preserving and exploiting the directional temporal structure that contrastive methods destroy. We show that Δt-aware prediction produces cardiovascular representations that (1) outperform same-time contrastive alignment on AF detection sample efficiency, (2) implicitly encode PTT without label supervision, as demonstrated via rollout coherence tests and linear probing, and (3) transfer more efficiently from limited labelled data than InfoNCE-trained baselines. Code and models are released under an open licence.

**Venue targets (updated with real 2026 deadlines)**:

| Venue | Deadline | Type | Fit |
|-------|----------|------|-----|
| NeurIPS 2026 workshops (TS4H, BrainBodyFM) | ~August 2026 | Workshop (non-archival) | Strong: 4-page format, time series + health |
| ML4H 2026 | ~September 2026 (estimated from 2025 pattern) | Symposium (archival proceedings track) | Strong: healthcare ML focus, 8 pages |
| ICLR 2027 | ~October 2026 | Conference (archival) | Stretch: needs clean ablations and strong Table 1 |
| NeurIPS 2026 main | May 6, 2026 | Conference (archival) | Too soon: experiment matrix runs through mid-May |

**Realistic path**: NeurIPS 2026 workshop (TS4H) as the first landing point (~August deadline, results from experiment matrix available by then); ML4H 2026 as the archival target; ICLR 2027 as stretch if the rollout coherence result is strong.

---

*Document revision 2, April 2026*
*All "CausalCardio-JEPA" references replaced. Reviewer feedback incorporated.*
*Active documents: this file + physiojep_experiment_matrix.md*