---
license: mit
tags:
- ecg
- ppg
- jepa
- self-supervised
- cardiac
- representation-learning
datasets:
- lucky9-cyou/mimic-iv-aligned-ppg-ecg
- physionet/ptb-xl
---

# PhysioJEPA

**Self-supervised ECG-PPG representation learning via Joint Embedding Predictive Architecture.**

## Key finding: mask ratio is the hidden lever

We discovered that unimodal ECG-JEPA (Weimann & Conrad, 2024) has a **predictor shortcut vulnerability** at the standard 50% mask ratio: the predictor learns local-interpolation shortcuts that degrade downstream performance as training progresses.

**Raising mask ratio from 50% to 75% eliminates the shortcut** and recovers full downstream performance, matching cross-modal JEPA:

| Model | Mask | AF AUROC (ep25) |
|-------|------|-----------------|
| Unimodal ECG-JEPA | 0.50 | 0.703 |
| **Unimodal ECG-JEPA** | **0.75** | **0.848** |
| Cross-modal ECG-PPG JEPA | -- | 0.847 |
| PhysioJEPA (cross-modal + Δt) | -- | 0.835 |

The mechanism: at 50% masking with contiguous blocks, the predictor has 25 visible context patches and 25 target patches. It discovers a short-range interpolation shortcut early in training (L_self dips at step ~1500). As the encoder refines and patches become less linearly interpolatable, the shortcut fails (L_self spikes at step ~4675). The encoder locks into a self-consistent but downstream-uninformative optimum.

At 75% masking (12 visible, 37 target), no interpolation path exists. The predictor learns long-range structure from the start. Cross-modal prediction works by the same mechanism: 0% of PPG is visible as context, so no interpolation shortcut can form.

## Confirmed by 5 ablation arms

1. **Slow tau** (ema_end=0.999, warmup=60%): spike persists -> tau is NOT the cause
2. **Smaller predictor** (depth 4->1): spike persists -> capacity is NOT the cause
3. **Sinusoidal queries** (no learned embeddings): spike WORSENS
4. **Mask ratio 0.75**: spike ELIMINATED, AUROC recovers to 0.848
5. **Full data** (10x): spike delayed but present -> architectural, not data-scale

## Architecture

- ECG encoder: ViT-S (12 layers, d=256, 8 heads) on single-lead II @ 250 Hz
- PPG encoder: ViT-T (6 layers, d=256) on Pleth @ 125 Hz
- Predictor: 4-layer cross-attention transformer
- EMA target encoder (tau 0.996 -> 0.9999 cosine over 30% of training)
- Loss: L1 latent prediction (cross-modal) + 0.3 * L1 ECG self-prediction

## Dataset

Training: [lucky9-cyou/mimic-iv-aligned-ppg-ecg](https://huggingface.co/datasets/lucky9-cyou/mimic-iv-aligned-ppg-ecg) (MIMIC-IV ICU waveforms, ~814 hours, ~381 patients, sample-accurate ECG-PPG alignment)

Evaluation: PTB-XL (PhysioNet, 21.8k 12-lead ECGs, lead II resampled to 250 Hz)

## Usage

```bash
# Install
git clone https://huggingface.co/guychuk/PhysioJEPA
cd PhysioJEPA
uv sync

# Smoke test (CPU, random data)
PYTHONPATH=src uv run python scripts/smoke_test.py

# Train (requires GPU + MIMIC data)
PYTHONPATH=src uv run python scripts/train.py --config configs/base.yaml --model A --mask_ratio 0.75
```

## Repository structure

```
src/physiojepa/
  models.py                # 4 model variants (A=unimodal, B=cross-modal, C=InfoNCE, F=PhysioJEPA)
  vit.py                   # ViT-1D encoder + cross-attention predictor
  data.py                  # MIMIC dataset with sliding windows
  data_fast.py             # mmap-backed fast dataset for full-scale runs
  trainer.py               # shared training loop with WandB + collapse monitoring
  ema.py                   # EMA with cosine tau schedule
  masking.py               # I-JEPA multi-block 1D masking
  probe.py                 # linear probe evaluators
configs/
  base.yaml                # shared hyperparameters
docs/
  RESEARCH_LOG.md          # complete research narrative
  e2_e3_results.md         # K-gate results + ablation findings
  EXPERIMENT_TRACKING.md   # experiment matrix + post-hoc results
  RESEARCH_DEVELOPMENT.md  # full research development document
```

## Citation

```
@misc{physiojepa2026,
  title={PhysioJEPA: Mask Ratio as the Hidden Lever in Cardiac JEPA},
  author={Oz Labs},
  year={2026},
  url={https://huggingface.co/guychuk/PhysioJEPA}
}
```

## License

MIT
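
## Appendix: EMA tau schedule sketch

The Architecture section describes an EMA target encoder whose momentum follows a cosine schedule from 0.996 to 0.9999 over 30% of training. The sketch below illustrates one plausible reading of that schedule; it is not the code in `ema.py`. The function names `ema_tau` and `ema_update`, the `ramp_frac` parameter, and the choice to hold tau constant after the ramp are all assumptions made for illustration, and plain Python lists stand in for parameter tensors.

```python
import math


def ema_tau(step, total_steps, tau_start=0.996, tau_end=0.9999, ramp_frac=0.30):
    """Cosine ramp of the EMA momentum (hypothetical helper).

    Assumption: tau rises from tau_start to tau_end over the first
    ramp_frac of training and is then held at tau_end.
    """
    ramp_steps = max(1, int(total_steps * ramp_frac))
    progress = min(step, ramp_steps) / ramp_steps
    # cos goes from 1 to -1 as progress goes from 0 to 1,
    # so weight goes smoothly from 0 to 1.
    weight = 0.5 * (1.0 - math.cos(math.pi * progress))
    return tau_start + (tau_end - tau_start) * weight


def ema_update(target, online, tau):
    """In-place EMA step: target <- tau * target + (1 - tau) * online.

    Lists of floats stand in for encoder parameters here.
    """
    for i, (t, o) in enumerate(zip(target, online)):
        target[i] = tau * t + (1.0 - tau) * o
    return target


if __name__ == "__main__":
    total = 10_000
    for step in (0, 1_500, 3_000, 10_000):
        print(step, ema_tau(step, total))
```

Under this reading, the slow-tau ablation arm (ema_end=0.999, warmup=60%) would correspond to calling `ema_tau(step, total, tau_end=0.999, ramp_frac=0.60)`.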