E0 — Data audit: `lucky9-cyou/mimic-iv-aligned-ppg-ecg`

PhysioJEPA — Oz Labs — 2026-04-14

Audit scripts: scripts/e0_audit_v2.py, scripts/e0_alignment_check.py
Raw JSON: docs/e0_report.json, docs/e0_alignment.json
Figures: docs/figures/ptt_histogram.png, docs/figures/ptt_histogram_foot.png, docs/figures/sanity_check.png

Decision

GO, with one caveat: the ≥500-patient gate is borderline (~381 extrapolated). Proceeding on the MIMIC-IV HF mirror. BIDMC remains the documented dataset fallback; if the downstream AF label yield is insufficient, the fallback is PTB-XL, reframed as a transfer-learning eval.

See the gate table below for the full reasoning.

Dataset layout

412 HF save_to_disk shard folders. Each shard ≈ 120 segments (14,371 segments over the 120 sampled shards) ≈ 1 MIMIC-IV waveform record ≈ 1 patient.
Schema per row (verified against shard_00000/dataset_info.json):
- record_name (str, e.g. p100/p10014354/81739927/81739927_0002_seg0000)
- ecg_fs (float, Hz), ecg_siglen (int), ecg_names (list[str]), ecg_time_s (list[float]), ecg (list[list[float]], shape [leads, time])
- ppg_fs, ppg_siglen, ppg_names (["Pleth"]), ppg_time_s, ppg (shape [1, time])
- segment_start_sec, segment_duration_sec
The default HF "train" split contains only summary metadata; the real data must be pulled via snapshot_download + load_from_disk, one shard at a time.
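The per-shard loading path can be sketched as follows. This is a minimal sketch, not the audit script itself: `find_shard_dirs`, `load_shards` and `max_shards` are hypothetical names, and it assumes each shard folder is a plain `save_to_disk` Dataset identified by its dataset_info.json. Note that `snapshot_download` without filters fetches the whole repository.

```python
from pathlib import Path


def find_shard_dirs(root):
    """Return sorted shard folders under root (any folder holding dataset_info.json)."""
    return sorted(p.parent for p in Path(root).rglob("dataset_info.json"))


def load_shards(repo_id="lucky9-cyou/mimic-iv-aligned-ppg-ecg", max_shards=2):
    """Yield (shard_name, dataset) pairs for the first few shards.

    Assumption: every shard folder was written by Dataset.save_to_disk.
    """
    # Imported lazily so find_shard_dirs works without these packages installed.
    from huggingface_hub import snapshot_download
    from datasets import load_from_disk

    local_root = snapshot_download(repo_id=repo_id, repo_type="dataset")
    for shard_dir in find_shard_dirs(local_root)[:max_shards]:
        yield shard_dir.name, load_from_disk(str(shard_dir))
```

Iterating `for name, ds in load_shards(): print(name, ds.column_names)` would then list the schema fields for each loaded shard.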
Example record: 3-lead ECG [3, 3200] @ 249.89 Hz, PPG [1, 1600] @ 124.945 Hz, ~12.8 s duration.
ECG/PPG time vectors share the same segment-relative clock and start within 1/fs_ecg of each other (one ECG sample period, ≈4 ms at 249.89 Hz) → the mirror is sample-accurately aligned by construction (both signals come from the same underlying WFDB record).
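The start-offset criterion can be expressed as a small check on the two time vectors. This is an illustrative sketch only: `start_offset_ms` and `is_sample_aligned` are hypothetical helpers, and the synthetic time vectors below stand in for the `ecg_time_s` / `ppg_time_s` fields of a real row.

```python
import numpy as np


def start_offset_ms(ecg_time_s, ppg_time_s):
    """Absolute difference between the first ECG and first PPG timestamp, in ms."""
    return abs(float(ecg_time_s[0]) - float(ppg_time_s[0])) * 1000.0


def is_sample_aligned(ecg_time_s, ppg_time_s, ecg_fs):
    """True if both streams start within one ECG sample period of each other."""
    return start_offset_ms(ecg_time_s, ppg_time_s) <= 1000.0 / ecg_fs


# Synthetic stand-in for one segment: PPG starts 2 ms after the ECG.
ecg_t = np.arange(3200) / 249.89
ppg_t = np.arange(1600) / 124.945 + 0.002
aligned = is_sample_aligned(ecg_t, ppg_t, ecg_fs=249.89)
```

A check of this kind only compares segment start times; it says nothing about drift within a segment, which the shared clock is assumed to rule out.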

Numbers (from 120 randomly sampled shards, seed 42)

Quantity	Value
Segments scanned (metadata)	14,371
Unique patients observed	111
Patients extrapolated to full dataset	~381
Total duration sampled	237.0 h
Total duration extrapolated	~814 h
ECG sampling rate (median)	249.89 Hz
PPG sampling rate (median)	124.95 Hz
ECG siglen (median)	14,994 samples (≈60.0 s)
PPG siglen (median)	7,497 samples (≈60.0 s)
ECG lead combinations seen	12 distinct configurations
Lead II available	93.7% of segments
PPG channel	`Pleth` (100%)
Missing-value rate (NaN)	0.000% on ECG, 0.000% on PPG
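The extrapolated rows follow from linear scaling of the sampled counts by 412 / 120. A minimal sketch of that arithmetic (`extrapolate` is a hypothetical helper). Linear scaling assumes the sampled shards are representative of the rest, so the results are rough estimates.

```python
def extrapolate(observed, sampled_shards, total_shards):
    """Scale a quantity observed on a shard sample up to the full shard count."""
    return observed * total_shards / sampled_shards


patients_full = extrapolate(111, sampled_shards=120, total_shards=412)
hours_full = extrapolate(237.0, sampled_shards=120, total_shards=412)

print(round(patients_full))  # 381
print(round(hours_full))     # 814
```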

ECG lead prevalence (most frequent leads, count out of 14,371 segments)

II     13,471 (93.7%)
V      12,326 (85.8%)
aVR    11,218 (78.1%)
III     1,748 (12.2%)
aVF       399 (2.8%)
V2        221 (1.5%)
V5        221 (1.5%)
I          82 (0.6%)
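A tally of this kind can be produced by counting lead names across the `ecg_names` field of each segment. The sketch below is an assumption about how such a count could be done, not the audit script. `lead_prevalence` is a hypothetical helper and the three-segment input is toy data.

```python
from collections import Counter


def lead_prevalence(ecg_names_per_segment):
    """Return (lead, count, percent of segments) tuples, most frequent first."""
    counts = Counter()
    for names in ecg_names_per_segment:
        counts.update(set(names))  # count each lead at most once per segment
    n = len(ecg_names_per_segment)
    return [(lead, c, 100.0 * c / n) for lead, c in counts.most_common()]


# Toy input: one ecg_names list per segment.
toy = [["II", "V", "aVR"], ["II", "V"], ["III"]]
table = lead_prevalence(toy)
```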

PTT sanity (ECG R-peak → nearest PPG peak in [50, 500] ms, 1-to-1 only)

Metric	Peak-based (v1)	Foot-based (v2)
Clean beats	10,193	6,295
Good segments (≥3 clean beats)	150 / 158 attempted (95%)	100 / 100
PTT median	276 ms	288 ms
PTT P5 / P95	92 / 448 ms	144 / 476 ms
Within-segment std, median	107 ms	104 ms

Both histograms are multimodal, with satellite peaks separated by fractions of the RR interval. This pattern points to peak-matching ambiguity, not dataset misalignment: matching an R-peak to the PPG peak of a neighbouring beat shifts the estimate by ±200–300 ms, which is enough to account for the ~100 ms within-segment std (107 ms peak-based, 104 ms foot-based).
The aligned 60-s ECG + PPG traces in sanity_check.png are visually locked beat-for-beat, and the PTT medians (276 ms peak-based, 288 ms foot-based) are physiologically plausible.
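One reading of the matching rule (R-peak → PPG peak inside a [50, 500] ms window, 1-to-1 only) is sketched below. It is an assumption about the rule, not the code in scripts/e0_audit_v2.py. `match_ptt_ms` is a hypothetical helper that takes already-detected peak times in seconds, picks the earliest PPG peak inside each R-peak's window, and discards any PPG peak claimed by more than one R-peak.

```python
import numpy as np


def match_ptt_ms(r_times_s, ppg_times_s, lo_ms=50.0, hi_ms=500.0):
    """Return PTT values (ms) for R-peaks that map 1-to-1 onto a PPG peak."""
    r = np.asarray(r_times_s, dtype=float)
    p = np.sort(np.asarray(ppg_times_s, dtype=float))
    claims = {}  # PPG peak index -> list of R-peak indices that selected it
    for i, t in enumerate(r):
        # Earliest PPG peak at least lo_ms after the R-peak.
        j = int(np.searchsorted(p, t + lo_ms / 1000.0, side="left"))
        if j < len(p) and p[j] - t <= hi_ms / 1000.0:
            claims.setdefault(j, []).append(i)
    # Keep only PPG peaks claimed by exactly one R-peak.
    return np.array([(p[j] - r[idx[0]]) * 1000.0
                     for j, idx in sorted(claims.items()) if len(idx) == 1])


clean = match_ptt_ms([0.0, 0.8, 1.6], [0.28, 1.08, 1.88])  # three beats, ~280 ms each
ambiguous = match_ptt_ms([0.0, 0.3], [0.4])                # both R-peaks claim one PPG peak
```

With short RR intervals the 500 ms window of one beat overlaps the next beat, which is where the mispicks described above would arise. The 1-to-1 filter removes only the cases where two R-peaks select the same PPG peak.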

Gate check (from `EXPERIMENT_TRACKING.md` E0)

Gate	Target	Observed	Status
Median alignment ≤ 50 ms	≤ 50 ms	Sub-sample alignment (shared clock); PTT median 276 ms is physiological, not a drift	PASS (data-side); the 107 ms within-segment std is an artefact of the crude R→PPG nearest-peak estimator, not temporal misalignment
PTT within-patient std ≤ 80 ms	≤ 80 ms	Cannot be assessed cleanly with current peak detector — need `neurokit2`-grade PPG foot detector to disambiguate mispicks	DEFERRED — revisit in E1 with better PPG detector; not a blocker for v1 (model sees raw patches)
Patients ≥ 500	≥ 500	~381 extrapolated (111 confirmed in 120/412 shards)	FAIL (marginal)
Missing rate ≤ 20% after windowing	≤ 20%	0.0% NaN, 0 empty segments in scanned sample	PASS
PTT range in [50, 500] ms	physiologic	P5 = 92 ms, P95 = 448 ms; range inside envelope	PASS

Interpretation of the patient-count "fail"

The research plan's ≥500-patient threshold was set before we knew the HF mirror's exact population. ~381 patients over ~814 h is:

- Plenty of hours for JEPA pretraining. AnyPPG trained on 100k+ h and ECG-JEPA on 1M+ records, but Weimann's public checkpoints achieve 0.945 AUC with much less, and PhysioJEPA's architectural claim is about inductive bias on fixed data, not scale (explicitly acknowledged in RESEARCH_DEVELOPMENT.md §8 Critic 2).
- Marginal for AF sample-efficiency (E5b). The linear probe needs ≥100 AF-positive and ≥100 AF-negative patients. With ~381 patients, an AF prevalence of ~10–20% (typical for MIMIC-IV ICU) yields only ~38–76 AF-positive patients; reaching 100 requires a prevalence of at least ~26%, so the AF-positive count must be verified before E5b.
- Below threshold for population generalization. We should pre-emptively state the paper's N-scale caveat explicitly (expected reviewer pushback).
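The AF yield can be checked with a quick back-of-envelope calculation. This sketch uses only the figures quoted in this report (~381 patients, 10–20% prevalence, a 100-patient minimum). The helper names are hypothetical and the prevalence range is an assumption, not a measured value.

```python
def expected_positives(n_patients, prevalence):
    """Expected number of AF-positive patients at a given prevalence."""
    return n_patients * prevalence


def required_prevalence(n_patients, min_positives):
    """Minimum prevalence needed to reach min_positives AF-positive patients."""
    return min_positives / n_patients


low = round(expected_positives(381, 0.10), 1)   # 38.1
high = round(expected_positives(381, 0.20), 1)  # 76.2
needed = round(required_prevalence(381, 100), 3)  # 0.262
```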

Action

Proceed with E1 and E2 on this dataset. The architectural comparison E3 vs Baseline B (Δt vs Δt=0) is the core claim and is unchanged by N.
Before E5b, decide AF label source (EXPERIMENT_TRACKING.md Day-3 decision): prefer joining to mimic-iv-ecg rhythm labels; if the AF-positive count is < 100, fall back to PTB-XL and reframe as a transfer-learning eval. This decision is now urgent.
Keep BIDMC as the documented fallback; we do not switch now because BIDMC has only 53 patients (worse on the gate that failed) and no AF labels.

Architectural implications for v1 (RESEARCH_DEVELOPMENT.md §2)

The spec assumed 12-lead ECG @ 500 Hz. The HF mirror is 3-lead (primarily II/V/aVR) @ 250 Hz. Required revisions, staged for Day 3 architecture lock:

ECG encoder input: single-lead II (93.7% coverage; drop records without it). Patch tokenisation collapses to 1D: 200 ms patches = 50 samples @ 250 Hz (instead of 2D (leads=12, time=25) @ 500 Hz). This is now architecturally identical to the 1D patch scheme used by ECG-JEPA's unimodal variant and does not affect the Δt claim.
PPG encoder input: already 1D single-channel at 125 Hz → 200 ms patches = 25 samples, exactly as specified.
Sampling-rate symmetry: both streams now satisfy ECG_fs = 2 × PPG_fs, matching the native MIMIC waveform format. No resampling needed.
Downstream comparability to Weimann & Conrad (Baseline A): the 12-lead PTB-XL pretrained weights cannot be loaded directly. Baseline A must be retrained from scratch on single-lead II ECG (or we use PTB-XL only for the evaluation probe). Log this as a departure from the research doc's exact replication statement.
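The 1D patching described above can be sketched with a plain reshape. This is a minimal sketch under stated assumptions: `patchify_1d` is a hypothetical helper, trailing samples that do not fill a patch are dropped, and the zero arrays stand in for a median-length segment (14,994 ECG samples, 7,497 PPG samples).

```python
import numpy as np


def patchify_1d(signal, fs_hz, patch_ms=200.0):
    """Split a 1D signal into non-overlapping patches of about patch_ms each."""
    patch_len = int(round(fs_hz * patch_ms / 1000.0))  # 50 @ ~250 Hz, 25 @ ~125 Hz
    n_patches = len(signal) // patch_len               # drop the incomplete tail
    trimmed = np.asarray(signal)[: n_patches * patch_len]
    return trimmed.reshape(n_patches, patch_len)


ecg_patches = patchify_1d(np.zeros(14994), fs_hz=249.89)   # lead II stand-in
ppg_patches = patchify_1d(np.zeros(7497), fs_hz=124.945)   # Pleth stand-in
print(ecg_patches.shape, ppg_patches.shape)  # (299, 50) (299, 25)
```

Because ECG_fs = 2 × PPG_fs, both streams yield the same number of patches per segment, so ECG patch k and PPG patch k cover the same time span.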

Files written

docs/e0_report.json — raw numbers
docs/e0_alignment.json — foot-based alignment check numbers
docs/figures/ptt_histogram.png — peak-based PTT (v1)
docs/figures/ptt_histogram_foot.png — foot-based PTT (v2)
docs/figures/sanity_check.png — 5 random 60-s aligned ECG+PPG overlays
scripts/e0_peek.py, scripts/e0_audit.py, scripts/e0_audit_v2.py, scripts/e0_alignment_check.py

Open follow-ups before E1 starts

Verify AF-positive count after joining to mimic-iv-ecg (Zack, Day 3 gate).
Swap PPG peak detector for neurokit2.ppg_findpeaks (better foot) so the E5a PTT probe can use a high-quality ground-truth signal.
Commit an architectural-revision note to RESEARCH_DEVELOPMENT.md §2 and ARCHITECTURES_EXPLORATION.md Architecture F §v1 — single-lead ECG, 250 Hz, 50-sample patches.
