TomNotch committed · Commit c1a6da2 · verified · 1 Parent(s): b9a21d6

Refresh model card with accurate shipped-checkpoint metrics and expanded usage

Files changed (1):
  1. README.md +42 -29
README.md CHANGED
@@ -23,6 +23,13 @@ Intended primarily as the **conditioning-energy OOD-detection backend** for
  robotic-policy gating, exposed through the
  [familiarity-planner](https://github.com/tomnotch/familiarity-planner) package.

+ **This checkpoint comes from a 150,000-step extended-training study**
+ that explored flow / OOD-separation dynamics well past the conventional
+ convergence point. See
+ [`docs/long_run_analysis.md`](https://github.com/Finding-Familiarity/Familiarity-Flow/blob/main/docs/long_run_analysis.md)
+ in the repo for the full write-up (multi-descent behaviour observed, not
+ the monotone-plateau or terminal-collapse initially hypothesised).
+
  ---

  ## Checkpoint summary
@@ -34,21 +41,46 @@ robotic-policy gating, exposed through the
  | Action space | ℝ³ (3-DoF grasp offset) |
  | Time sampling | Beta(1.5, 1) (π₀ schedule) |
  | Training data | OneBox (synthetic Isaac Sim, ZED-Mini stereo) |
- | Training steps | 25,000 |
- | Best val_loss | **0.0726** |
+ | Training steps | 128,250 (best val_loss checkpoint of 150k-step run) |
+ | Best val_loss | **0.0639** |
+ | Best val L2 error | **0.1462** |
  | Parameters | 244 M total, 35.6 M trainable (encoder frozen) |
  | License | MIT |

- ### OOD-separation at this checkpoint (step 21,850)
+ ### OOD-separation at this checkpoint (step 128,250)

  | Metric | ID | OOD (clutter) | WILD (real) | OOD/ID | WILD/ID |
  |---|---|---|---|---|---|
- | CE | 0.225 | 1.197 | 0.629 | **5.31×** | 2.79× |
- | DCE | 0.028 | 0.121 | 0.067 | **4.32×** | 2.41× |
+ | CE | 0.642 | 3.341 | 2.077 | **5.20×** | 3.23× |
+ | DCE | 0.062 | 0.303 | 0.186 | **4.87×** | 2.99× |
+
+ AUROC(ID vs OOD) and AUROC(ID vs WILD) are both **1.000** (rank-based
+ separation is perfect and has been since step ≈ 8k).

- Reported directly from the training log (`outputs/csv/onebox/version_14`).
- All ID vs OOD AUROCs 1.0 at this checkpoint (rank-based separation is
- saturated; magnitude ratios are what vary with training).
+ Reported directly from the training log at
+ `outputs/csv/onebox/version_15` in the repo.
+
+ ### vs the previous checkpoint (step 21,850, val_loss 0.0726)
+
+ Better on every metric we measured, with one negligible exception:
+
+ | | Previous | This checkpoint | Δ |
+ |---|---|---|---|
+ | val/loss | 0.0726 | **0.0639** | −12.0% |
+ | val/l2_error | 0.1755 | **0.1462** | −16.7% |
+ | ood/loss | 4.414 | 4.241 | −3.9% |
+ | ood/l2_error | 1.371 | 1.271 | −7.3% |
+ | CE WILD/ID | 2.79× | **3.23×** | +15.8% |
+ | DCE OOD/ID | 4.32× | **4.87×** | +12.7% |
+ | DCE WILD/ID | 2.41× | **2.99×** | +24.1% |
+
+ (CE OOD/ID drifted −2.1%, well inside the run-to-run variance observed
+ during the extended run.)

+ > **Threshold-shift note**: absolute CE/DCE values in this checkpoint
+ > are ~3× larger than in the previous one (CE_ID 0.225 → 0.642). A
+ > downstream OOD detector using an absolute threshold needs to be
+ > re-calibrated — ratios are preserved but the raw scale is not.

  ---
@@ -110,27 +142,6 @@ the full ODE map. Both scale as out-of-distribution inputs excite the
  learned velocity field's sensitivity to conditioning — a signal that
  falls out of the geometry of the flow without any auxiliary classifier.

- See the companion repo for the full derivation, training recipe, and the
- ongoing empirical study of how CE/DCE separation evolves past the
- conventional convergence point (multi-descent dynamics observed under
- extended training).
-
- ---
-
- ## Training recipe (reproduce)
-
- ```Shell
- git clone https://github.com/Finding-Familiarity/Familiarity-Flow.git
- cd Familiarity-Flow
- conda env create -f environment.yml
- conda activate familiarity-flow
- uv pip install -e .
- train dataset=onebox # 25k steps, ≈ 2 h on one H200
- ```
-
- Hardware used for this checkpoint: single NVIDIA H200, 16-mixed precision,
- batch size 16. Deterministic with `familiarity_flow.utils.seed.fixed_seed`.
-
  ---

  ## Limitations
@@ -159,6 +170,8 @@ batch size 16. Deterministic with `familiarity_flow.utils.seed.fixed_seed`.
  ([arXiv:1806.07366](https://arxiv.org/abs/1806.07366))
  - Liu et al., *Simple and Principled Uncertainty Estimation (SNGP)*,
  NeurIPS 2020 ([arXiv:2006.10108](https://arxiv.org/abs/2006.10108))
+ - Nakkiran et al., *Deep Double Descent*, ICLR 2020
+ ([arXiv:1912.02292](https://arxiv.org/abs/1912.02292))

  ---
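The card's claim that rank-based separation saturates while magnitude ratios keep moving can be illustrated with a self-contained sketch. The scores below are synthetic (not the repo's data, and `auroc` is a hypothetical helper, not a `familiarity_flow` API): AUROC only measures rank order, so it pins at 1.0 as soon as every OOD score exceeds every ID score, while the mean OOD/ID ratio still carries a graded signal.

```python
import random

def auroc(id_scores, ood_scores):
    """Probability that a random OOD score outranks a random ID score
    (exhaustive pairwise form of the Mann-Whitney U statistic)."""
    pairs = [(i, o) for i in id_scores for o in ood_scores]
    wins = sum(1.0 if o > i else 0.5 if o == i else 0.0 for i, o in pairs)
    return wins / len(pairs)

random.seed(0)
# Stand-ins for CE on in-distribution vs out-of-distribution inputs.
id_scores = [random.uniform(0.5, 0.8) for _ in range(200)]
ood_scores = [random.uniform(2.5, 4.0) for _ in range(200)]

ratio = (sum(ood_scores) / len(ood_scores)) / (sum(id_scores) / len(id_scores))
print(auroc(id_scores, ood_scores))  # 1.0: the two score ranges are disjoint
print(round(ratio, 2))               # mean OOD/ID ratio is still informative
```

Because the ranges are disjoint, AUROC is exactly 1.0 for any further widening of the gap; the ratio is the quantity that keeps tracking training dynamics, which is why the card reports both.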
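The threshold-shift note can be made concrete. A minimal sketch of the re-calibration it calls for, under the assumption (stated in the card) that OOD/ID ratios are preserved while the raw scale shifts: express the threshold as a multiple of the ID mean and re-anchor it. The function name and the 0.6 cutoff are hypothetical, not part of the repo.

```python
def recalibrate_threshold(old_threshold: float,
                          old_id_mean: float,
                          new_id_mean: float) -> float:
    """Rescale an absolute OOD threshold when the score scale shifts.

    If ratios are preserved across checkpoints, a threshold defined as a
    multiple of the ID mean transfers: hold old_threshold / old_id_mean
    constant and re-anchor it to the new checkpoint's ID mean.
    """
    return old_threshold * (new_id_mean / old_id_mean)

# Figures from the card: CE_ID went 0.225 -> 0.642 between checkpoints.
# Suppose a downstream detector flagged inputs at CE > 0.6 (hypothetical,
# roughly midway between the old ID CE 0.225 and old OOD CE 1.197).
new_threshold = recalibrate_threshold(0.6, 0.225, 0.642)
print(round(new_threshold, 3))  # 1.712: same ID-relative margin at the new scale
```

A detector that instead thresholds the ratio `score / id_mean` directly would sidestep this re-calibration entirely, at the cost of having to estimate the ID mean per checkpoint.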