Refresh model card with accurate shipped-checkpoint metrics and expanded usage
Browse files
README.md
CHANGED
|
@@ -23,6 +23,13 @@ Intended primarily as the **conditioning-energy OOD-detection backend** for
|
|
| 23 |
robotic-policy gating, exposed through the
|
| 24 |
[familiarity-planner](https://github.com/tomnotch/familiarity-planner) package.
|
| 25 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 26 |
---
|
| 27 |
|
| 28 |
## Checkpoint summary
|
|
@@ -34,21 +41,46 @@ robotic-policy gating, exposed through the
|
|
| 34 |
| Action space | ℝ³ (3-DoF grasp offset) |
|
| 35 |
| Time sampling | Beta(1.5, 1) (π₀ schedule) |
|
| 36 |
| Training data | OneBox (synthetic Isaac Sim, ZED-Mini stereo) |
|
| 37 |
-
| Training steps |
|
| 38 |
-
| Best val_loss | **0.
|
|
|
|
| 39 |
| Parameters | 244 M total, 35.6 M trainable (encoder frozen) |
|
| 40 |
| License | MIT |
|
| 41 |
|
| 42 |
-
### OOD-separation at this checkpoint (step
|
| 43 |
|
| 44 |
| Metric | ID | OOD (clutter) | WILD (real) | OOD/ID | WILD/ID |
|
| 45 |
|---|---|---|---|---|---|
|
| 46 |
-
| CE | 0.
|
| 47 |
-
| DCE | 0.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 48 |
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
|
|
|
|
| 52 |
|
| 53 |
---
|
| 54 |
|
|
@@ -110,27 +142,6 @@ the full ODE map. Both scale as out-of-distribution inputs excite the
|
|
| 110 |
learned velocity field's sensitivity to conditioning — a signal that
|
| 111 |
falls out of the geometry of the flow without any auxiliary classifier.
|
| 112 |
|
| 113 |
-
See the companion repo for the full derivation, training recipe, and the
|
| 114 |
-
ongoing empirical study of how CE/DCE separation evolves past the
|
| 115 |
-
conventional convergence point (multi-descent dynamics observed under
|
| 116 |
-
extended training).
|
| 117 |
-
|
| 118 |
-
---
|
| 119 |
-
|
| 120 |
-
## Training recipe (reproduce)
|
| 121 |
-
|
| 122 |
-
```Shell
|
| 123 |
-
git clone https://github.com/Finding-Familiarity/Familiarity-Flow.git
|
| 124 |
-
cd Familiarity-Flow
|
| 125 |
-
conda env create -f environment.yml
|
| 126 |
-
conda activate familiarity-flow
|
| 127 |
-
uv pip install -e .
|
| 128 |
-
train dataset=onebox # 25k steps, ≈ 2 h on one H200
|
| 129 |
-
```
|
| 130 |
-
|
| 131 |
-
Hardware used for this checkpoint: single NVIDIA H200, 16-mixed precision,
|
| 132 |
-
batch size 16. Deterministic with `familiarity_flow.utils.seed.fixed_seed`.
|
| 133 |
-
|
| 134 |
---
|
| 135 |
|
| 136 |
## Limitations
|
|
@@ -159,6 +170,8 @@ batch size 16. Deterministic with `familiarity_flow.utils.seed.fixed_seed`.
|
|
| 159 |
([arXiv:1806.07366](https://arxiv.org/abs/1806.07366))
|
| 160 |
- Liu et al., *Simple and Principled Uncertainty Estimation (SNGP)*,
|
| 161 |
NeurIPS 2020 ([arXiv:2006.10108](https://arxiv.org/abs/2006.10108))
|
|
|
|
|
|
|
| 162 |
|
| 163 |
---
|
| 164 |
|
|
|
|
| 23 |
robotic-policy gating, exposed through the
|
| 24 |
[familiarity-planner](https://github.com/tomnotch/familiarity-planner) package.
|
| 25 |
|
| 26 |
+
**This checkpoint comes from a 150,000-step extended-training study**
|
| 27 |
+
that explored flow / OOD-separation dynamics well past the conventional
|
| 28 |
+
convergence point. See
|
| 29 |
+
[`docs/long_run_analysis.md`](https://github.com/Finding-Familiarity/Familiarity-Flow/blob/main/docs/long_run_analysis.md)
|
| 30 |
+
in the repo for the full write-up (multi-descent behaviour observed, not
|
| 31 |
+
the monotone-plateau or terminal-collapse initially hypothesised).
|
| 32 |
+
|
| 33 |
---
|
| 34 |
|
| 35 |
## Checkpoint summary
|
|
|
|
| 41 |
| Action space | ℝ³ (3-DoF grasp offset) |
|
| 42 |
| Time sampling | Beta(1.5, 1) (π₀ schedule) |
|
| 43 |
| Training data | OneBox (synthetic Isaac Sim, ZED-Mini stereo) |
|
| 44 |
+
| Training steps | 128,250 (best val_loss checkpoint of 150k-step run) |
|
| 45 |
+
| Best val_loss | **0.0639** |
|
| 46 |
+
| Best val L2 error | **0.1462** |
|
| 47 |
| Parameters | 244 M total, 35.6 M trainable (encoder frozen) |
|
| 48 |
| License | MIT |
|
| 49 |
|
| 50 |
+
### OOD-separation at this checkpoint (step 128,250)
|
| 51 |
|
| 52 |
| Metric | ID | OOD (clutter) | WILD (real) | OOD/ID | WILD/ID |
|
| 53 |
|---|---|---|---|---|---|
|
| 54 |
+
| CE | 0.642 | 3.341 | 2.077 | **5.20×** | 3.23× |
|
| 55 |
+
| DCE | 0.062 | 0.303 | 0.186 | **4.87×** | 2.99× |
|
| 56 |
+
|
| 57 |
+
AUROC(ID vs OOD) and AUROC(ID vs WILD) are both **1.000** (rank-based
|
| 58 |
+
separation is perfect and has been since step ≈ 8k).
|
| 59 |
+
|
| 60 |
+
Reported directly from the training log at
|
| 61 |
+
`outputs/csv/onebox/version_15` in the repo.
|
| 62 |
+
|
| 63 |
+
### vs the previous checkpoint (step 21,850, val_loss 0.0726)
|
| 64 |
+
|
| 65 |
+
Strictly better or tied on every metric we measured:
|
| 66 |
+
|
| 67 |
+
| | Previous | This checkpoint | Δ |
|
| 68 |
+
|---|---|---|---|
|
| 69 |
+
| val/loss | 0.0726 | **0.0639** | −12.0% |
|
| 70 |
+
| val/l2_error | 0.1755 | **0.1462** | −16.7% |
|
| 71 |
+
| ood/loss | 4.414 | 4.241 | −3.9% |
|
| 72 |
+
| ood/l2_error | 1.371 | 1.271 | −7.3% |
|
| 73 |
+
| CE WILD/ID | 2.79× | **3.23×** | +15.8% |
|
| 74 |
+
| DCE OOD/ID | 4.32× | **4.87×** | +12.7% |
|
| 75 |
+
| DCE WILD/ID | 2.41× | **2.99×** | +24.1% |
|
| 76 |
+
|
| 77 |
+
(CE OOD/ID drifted −2.1%, well inside the run-to-run variance observed
|
| 78 |
+
during the extended run.)
|
| 79 |
|
| 80 |
+
> **Threshold-shift note**: absolute CE/DCE values in this checkpoint
|
| 81 |
+
> are ~3× larger than in the previous one (CE_ID 0.225 → 0.642). A
|
| 82 |
+
> downstream OOD detector using an absolute threshold needs to be
|
| 83 |
+
> re-calibrated — ratios are preserved but the raw scale is not.
|
| 84 |
|
| 85 |
---
|
| 86 |
|
|
|
|
| 142 |
learned velocity field's sensitivity to conditioning — a signal that
|
| 143 |
falls out of the geometry of the flow without any auxiliary classifier.
|
| 144 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 145 |
---
|
| 146 |
|
| 147 |
## Limitations
|
|
|
|
| 170 |
([arXiv:1806.07366](https://arxiv.org/abs/1806.07366))
|
| 171 |
- Liu et al., *Simple and Principled Uncertainty Estimation (SNGP)*,
|
| 172 |
NeurIPS 2020 ([arXiv:2006.10108](https://arxiv.org/abs/2006.10108))
|
| 173 |
+
- Nakkiran et al., *Deep Double Descent*, ICLR 2020
|
| 174 |
+
([arXiv:1912.02292](https://arxiv.org/abs/1912.02292))
|
| 175 |
|
| 176 |
---
|
| 177 |
|