# E1 — PPG encoding decision
*PhysioJEPA — Oz Labs — 2026-04-14*

Script: `scripts/e1_ppg_encoding.py`
Raw JSON: `docs/e1_stage1_report.json`

---

## Decision

**v1 uses raw 200 ms PPG patches (25 samples @ 125 Hz) → linear projection → d=256 tokens.**

Morphological encoding is *viable* on this data but is held as ablation **A1** per the research plan (`RESEARCH_DEVELOPMENT.md` §2 v1 spec, §Change-log bullet 3). The Stage-2 linear-probe comparison that would justify switching to morphology cannot run until AF labels are in place; it runs as part of A1 after E5a.

## Numbers (Stage 1, neurokit2 v5 on 500 random segments)

| Metric | Value |
|---|---|
| Segments attempted | 500 |
| Segments non-empty | 500 |
| Segments where morphology extraction was valid (detected/expected in [0.70, 1.30]) | **493 (98.6%)** |
| Median beats detected per ~60-s segment | 76 |
| Mean beats detected per ~60-s segment | 76.6 |

Extraction rate 0.986 (493/500) ≫ 0.70 threshold → Stage 1 pass → the rule routes to the Stage 2 comparison.
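The Stage 1 gate can be sketched as below. This is a minimal illustration, not the contents of `scripts/e1_ppg_encoding.py`: the function names are hypothetical, treating detected/expected as per-segment beat counts is an assumption, and so is the inclusive comparison at the 0.70 threshold.

```python
from typing import Iterable, Tuple

RATIO_LOW, RATIO_HIGH = 0.70, 1.30  # validity band for detected/expected
RATE_THRESHOLD = 0.70               # Stage 1 pass threshold


def segment_is_valid(detected: int, expected: float) -> bool:
    """True if detected/expected lies inside the validity band.

    Assumption: both arguments are beat counts for a single segment.
    """
    if expected <= 0:
        return False
    return RATIO_LOW <= detected / expected <= RATIO_HIGH


def extraction_rate(pairs: Iterable[Tuple[int, float]]) -> float:
    """Fraction of (detected, expected) pairs that are valid."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    valid = sum(segment_is_valid(d, e) for d, e in pairs)
    return valid / len(pairs)


def stage1_passes(rate: float) -> bool:
    # Assumption: the threshold is inclusive; the source does not say.
    return rate >= RATE_THRESHOLD
```

With the reported counts, 493 valid segments out of 500 give a rate of 0.986, which `stage1_passes` accepts.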

## Why we still pick raw patches for v1

1. **Spec alignment.** `RESEARCH_DEVELOPMENT.md` §2 v1 locks raw patches. Morphology is explicitly called out as ablation A1. Changing v1 silently would contradict the revision-2 change log.
2. **Stage 2 is blocked on AF labels.** The deciding comparison (`morph_AUROC > raw + 0.02`) requires the frozen-encoder AF probe that depends on AF labels. That decision arrives post-E5a.
3. **Minimise moving parts in v1.** The core claim is about Δt — not about PPG feature engineering. Raw patches remove a failure surface from the Day-6–8 E3 run.
4. **Stage 2 still runs.** Ablation A1 is the formal Stage 2 comparison; it executes once E3 passes K2 and AF labels are available. If A1 wins by ≥0.02 AUROC, we adopt morphology for the camera-ready run.
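The reasoning above amounts to a small two-stage rule, sketched here under stated assumptions. The function name and signature are hypothetical, `None` stands in for "AF probe not yet run", and the strict `>` follows the form `morph_AUROC > raw + 0.02` quoted in point 2.

```python
from typing import Optional

RATE_THRESHOLD = 0.70  # Stage 1 extraction-rate threshold
AUROC_MARGIN = 0.02    # Stage 2 margin morphology must clear


def choose_ppg_encoding(
    extraction_rate: float,
    morph_auroc: Optional[float] = None,
    raw_auroc: Optional[float] = None,
) -> str:
    """Return "raw" or "morphology" for the PPG encoder input.

    Pass None for the AUROCs while the AF-label probe is unavailable.
    """
    # Stage 1: morphology must be extractable on enough segments.
    if extraction_rate < RATE_THRESHOLD:
        return "raw"
    # Stage 2 blocked: fall back to the v1 default of raw patches.
    if morph_auroc is None or raw_auroc is None:
        return "raw"
    # Stage 2: morphology must beat raw patches by the margin.
    if morph_auroc > raw_auroc + AUROC_MARGIN:
        return "morphology"
    return "raw"
```

Today the call is `choose_ppg_encoding(0.986)`, which returns `"raw"` because no AUROCs exist yet; the outcome can only change once A1 supplies both probe scores.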

## Implementation

- `src/physiojepa/ppg_encoder.py`: `PPGPatchTokeniser(patch_size=25, d_model=256)`.
- `src/physiojepa/ecg_encoder.py`: `ECGPatchTokeniser(patch_size=50, d_model=256)` for single-lead II @ 250 Hz (paired change; see `docs/e0_data_card.md` architectural implications).
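To make the patch arithmetic concrete, here is a dependency-free sketch of a patch tokeniser. It is not the code in `src/physiojepa/`: the class `PatchTokeniserSketch`, its random initialisation, and the choice to drop a trailing partial patch are assumptions, and the real modules presumably use a tensor library rather than Python lists.

```python
import random
from typing import List, Sequence


class PatchTokeniserSketch:
    """Non-overlapping patches followed by one shared linear projection."""

    def __init__(self, patch_size: int, d_model: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.patch_size = patch_size
        self.d_model = d_model
        scale = patch_size ** -0.5
        # weight[j][i] maps input sample i of a patch to output feature j.
        self.weight = [
            [rng.uniform(-scale, scale) for _ in range(patch_size)]
            for _ in range(d_model)
        ]
        self.bias = [0.0] * d_model

    def patchify(self, signal: Sequence[float]) -> List[Sequence[float]]:
        p = self.patch_size
        n_patches = len(signal) // p  # assumption: drop any partial patch
        return [signal[k * p:(k + 1) * p] for k in range(n_patches)]

    def __call__(self, signal: Sequence[float]) -> List[List[float]]:
        # One d_model-dimensional token per patch.
        return [
            [
                sum(w * x for w, x in zip(row, patch)) + b
                for row, b in zip(self.weight, self.bias)
            ]
            for patch in self.patchify(signal)
        ]


def patch_duration_ms(patch_size: int, fs_hz: float) -> float:
    """Duration covered by one patch, in milliseconds."""
    return 1000.0 * patch_size / fs_hz
```

Both configurations cover 200 ms per token: 25 samples at 125 Hz for PPG and 50 samples at 250 Hz for ECG. For a nominal 60 s segment that is 7500 / 25 = 300 PPG tokens and 15000 / 50 = 300 ECG tokens, so the two streams stay aligned token for token.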

## Follow-ups

- A1 (morphology probe) is scheduled for Weeks 3–4 after E3 passes K2.
- The 1.4% of segments (7 of 500) where neurokit2 fails extraction will be filtered out of A1 but kept for raw-patch training; raw patches need no PPG feature engineering, so these segments remain usable.