TomNotch committed · Commit c1a6da2 · verified · 1 Parent(s): b9a21d6

Refresh model card with accurate shipped-checkpoint metrics and expanded usage

Files changed (1):
  1. README.md +42 -29
README.md CHANGED
@@ -23,6 +23,13 @@ Intended primarily as the **conditioning-energy OOD-detection backend** for
  robotic-policy gating, exposed through the
  [familiarity-planner](https://github.com/tomnotch/familiarity-planner) package.

+ **This checkpoint comes from a 150,000-step extended-training study**
+ that explored flow / OOD-separation dynamics well past the conventional
+ convergence point. See
+ [`docs/long_run_analysis.md`](https://github.com/Finding-Familiarity/Familiarity-Flow/blob/main/docs/long_run_analysis.md)
+ in the repo for the full write-up (multi-descent behaviour observed, not
+ the monotone-plateau or terminal-collapse initially hypothesised).
+
  ---

  ## Checkpoint summary
@@ -34,21 +41,46 @@ robotic-policy gating, exposed through the
  | Action space | ℝ³ (3-DoF grasp offset) |
  | Time sampling | Beta(1.5, 1) (π₀ schedule) |
  | Training data | OneBox (synthetic Isaac Sim, ZED-Mini stereo) |
- | Training steps | 25,000 |
- | Best val_loss | **0.0726** |
+ | Training steps | 128,250 (best val_loss checkpoint of 150k-step run) |
+ | Best val_loss | **0.0639** |
+ | Best val L2 error | **0.1462** |
  | Parameters | 244 M total, 35.6 M trainable (encoder frozen) |
  | License | MIT |

- ### OOD-separation at this checkpoint (step 21,850)
+ ### OOD-separation at this checkpoint (step 128,250)

  | Metric | ID | OOD (clutter) | WILD (real) | OOD/ID | WILD/ID |
  |---|---|---|---|---|---|
- | CE | 0.225 | 1.197 | 0.629 | **5.31×** | 2.79× |
- | DCE | 0.028 | 0.121 | 0.067 | **4.32×** | 2.41× |
+ | CE | 0.642 | 3.341 | 2.077 | **5.20×** | 3.23× |
+ | DCE | 0.062 | 0.303 | 0.186 | **4.87×** | 2.99× |
+
+ AUROC(ID vs OOD) and AUROC(ID vs WILD) are both **1.000** (rank-based
+ separation is perfect and has been since step ≈ 8k).

- Reported directly from the training log (`outputs/csv/onebox/version_14`).
- All ID vs OOD AUROCs 1.0 at this checkpoint (rank-based separation is
- saturated; magnitude ratios are what vary with training).
+ Reported directly from the training log at
+ `outputs/csv/onebox/version_15` in the repo.
+
+ ### vs the previous checkpoint (step 21,850, val_loss 0.0726)
+
+ Better on every metric we measured, with one negligible exception:
+
+ | | Previous | This checkpoint | Δ |
+ |---|---|---|---|
+ | val/loss | 0.0726 | **0.0639** | −12.0% |
+ | val/l2_error | 0.1755 | **0.1462** | −16.7% |
+ | ood/loss | 4.414 | 4.241 | −3.9% |
+ | ood/l2_error | 1.371 | 1.271 | −7.3% |
+ | CE WILD/ID | 2.79× | **3.23×** | +15.8% |
+ | DCE OOD/ID | 4.32× | **4.87×** | +12.7% |
+ | DCE WILD/ID | 2.41× | **2.99×** | +24.1% |
+
+ (CE OOD/ID drifted −2.1%, well inside the run-to-run variance observed
+ during the extended run.)

+ > **Threshold-shift note**: absolute CE/DCE values in this checkpoint
+ > are ~3× larger than in the previous one (CE_ID 0.225 → 0.642). A
+ > downstream OOD detector using an absolute threshold needs to be
+ > re-calibrated — ratios are preserved but the raw scale is not.

  ---
@@ -110,27 +142,6 @@ the full ODE map. Both scale as out-of-distribution inputs excite the
  learned velocity field's sensitivity to conditioning — a signal that
  falls out of the geometry of the flow without any auxiliary classifier.

- See the companion repo for the full derivation, training recipe, and the
- ongoing empirical study of how CE/DCE separation evolves past the
- conventional convergence point (multi-descent dynamics observed under
- extended training).
-
- ---
-
- ## Training recipe (reproduce)
-
- ```Shell
- git clone https://github.com/Finding-Familiarity/Familiarity-Flow.git
- cd Familiarity-Flow
- conda env create -f environment.yml
- conda activate familiarity-flow
- uv pip install -e .
- train dataset=onebox # 25k steps, ≈ 2 h on one H200
- ```
-
- Hardware used for this checkpoint: single NVIDIA H200, 16-mixed precision,
- batch size 16. Deterministic with `familiarity_flow.utils.seed.fixed_seed`.
-
  ---

  ## Limitations
@@ -159,6 +170,8 @@ batch size 16. Deterministic with `familiarity_flow.utils.seed.fixed_seed`.
  ([arXiv:1806.07366](https://arxiv.org/abs/1806.07366))
  - Liu et al., *Simple and Principled Uncertainty Estimation (SNGP)*,
  NeurIPS 2020 ([arXiv:2006.10108](https://arxiv.org/abs/2006.10108))
+ - Nakkiran et al., *Deep Double Descent*, ICLR 2020
+ ([arXiv:1912.02292](https://arxiv.org/abs/1912.02292))

  ---
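The card's claim that rank-based separation saturates while magnitude ratios keep moving can be illustrated with a self-contained sketch. The scores below are synthetic (not the repo's data, and `auroc` is a hypothetical helper, not a `familiarity_flow` API): AUROC only measures rank order, so it pins at 1.0 as soon as every OOD score exceeds every ID score, while the mean OOD/ID ratio still carries a graded signal.

```python
import random

def auroc(id_scores, ood_scores):
    """Probability that a random OOD score outranks a random ID score
    (exhaustive pairwise form of the Mann-Whitney U statistic)."""
    pairs = [(i, o) for i in id_scores for o in ood_scores]
    wins = sum(1.0 if o > i else 0.5 if o == i else 0.0 for i, o in pairs)
    return wins / len(pairs)

random.seed(0)
# Stand-ins for CE on in-distribution vs out-of-distribution inputs.
id_scores = [random.uniform(0.5, 0.8) for _ in range(200)]
ood_scores = [random.uniform(2.5, 4.0) for _ in range(200)]

ratio = (sum(ood_scores) / len(ood_scores)) / (sum(id_scores) / len(id_scores))
print(auroc(id_scores, ood_scores))  # 1.0: the two score ranges are disjoint
print(round(ratio, 2))               # mean OOD/ID ratio is still informative
```

Because the ranges are disjoint, AUROC is exactly 1.0 for any further widening of the gap; the ratio is the quantity that keeps tracking training dynamics, which is why the card reports both.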
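The threshold-shift note can be made concrete. A minimal sketch of the re-calibration it calls for, under the assumption (stated in the card) that OOD/ID ratios are preserved while the raw scale shifts: express the threshold as a multiple of the ID mean and re-anchor it. The function name and the 0.6 cutoff are hypothetical, not part of the repo.

```python
def recalibrate_threshold(old_threshold: float,
                          old_id_mean: float,
                          new_id_mean: float) -> float:
    """Rescale an absolute OOD threshold when the score scale shifts.

    If ratios are preserved across checkpoints, a threshold defined as a
    multiple of the ID mean transfers: hold old_threshold / old_id_mean
    constant and re-anchor it to the new checkpoint's ID mean.
    """
    return old_threshold * (new_id_mean / old_id_mean)

# Figures from the card: CE_ID went 0.225 -> 0.642 between checkpoints.
# Suppose a downstream detector flagged inputs at CE > 0.6 (hypothetical,
# roughly midway between the old ID CE 0.225 and old OOD CE 1.197).
new_threshold = recalibrate_threshold(0.6, 0.225, 0.642)
print(round(new_threshold, 3))  # 1.712: same ID-relative margin at the new scale
```

A detector that instead thresholds the ratio `score / id_mean` directly would sidestep this re-calibration entirely, at the cost of having to estimate the ID mean per checkpoint.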