---
license: mit
tags:
- robotics
- flow-matching
- ood-detection
- visual-servoing
- conditioning-energy
- uncertainty-quantification
pipeline_tag: robotics
library_name: pytorch
---
# Familiarity-Flow OneBox 8-Layer
Flow-matching policy for stereo-image-conditioned 3D grasp-offset prediction,
trained on the **OneBox** synthetic Isaac-Sim dataset. The full learning
dynamics — value of the prediction, geometry of the flow, and
Jacobian-of-conditioning OOD signal — are studied in the
[Familiarity-Flow repo](https://github.com/Finding-Familiarity/Familiarity-Flow).
Intended primarily as the **conditioning-energy OOD-detection backend** for
robotic-policy gating, exposed through the
[familiarity-planner](https://github.com/tomnotch/familiarity-planner) package.
**This checkpoint comes from a 150,000-step extended-training study**
that explored flow / OOD-separation dynamics well past the conventional
convergence point. See
[`docs/long_run_analysis.md`](https://github.com/Finding-Familiarity/Familiarity-Flow/blob/main/docs/long_run_analysis.md)
in the repo for the full write-up (multi-descent behaviour observed, not
the monotone-plateau or terminal-collapse initially hypothesised).
---
## Checkpoint summary
| Field | Value |
|---|---|
| Architecture | `FlowMatchingPolicy`, 8 cross-attention layers |
| Vision encoder | DINOv2-B (ViT-B/14, frozen) |
| Action space | ℝ³ (3-DoF grasp offset) |
| Time sampling | Beta(1.5, 1) (π₀ schedule) |
| Training data | OneBox (synthetic Isaac Sim, ZED-Mini stereo) |
| Training steps | 128,250 (best val_loss checkpoint of 150k-step run) |
| Best val_loss | **0.0639** |
| Best val L2 error | **0.1462** |
| Parameters | 244 M total, 35.6 M trainable (encoder frozen) |
| License | MIT |
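The Beta(1.5, 1) time distribution in the table biases sampled timesteps toward the noise end `t = 1`. A minimal sketch of sampling it (simplified to the unit interval; the exact π₀ parameterisation, including any shift or truncation, follows the repo, not this snippet):

```python
import torch

# Beta(1.5, 1): density ∝ t^0.5 on [0, 1], so mass is biased toward t = 1.
def sample_t(batch_size):
    return torch.distributions.Beta(1.5, 1.0).sample((batch_size,))

t = sample_t(4096)
print(float(t.mean()))  # ≈ 0.6, the Beta(1.5, 1) mean 1.5 / (1.5 + 1)
```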
### OOD-separation at this checkpoint (step 128,250)
| Metric | ID | OOD (clutter) | WILD (real) | OOD/ID | WILD/ID |
|---|---|---|---|---|---|
| CE | 0.642 | 3.341 | 2.077 | **5.20×** | 3.23× |
| DCE | 0.062 | 0.303 | 0.186 | **4.87×** | 2.99× |
AUROC(ID vs OOD) and AUROC(ID vs WILD) are both **1.000** (rank-based
separation is perfect and has been since step ≈ 8k).
Reported directly from the training log at
`outputs/csv/onebox/version_15` in the repo.
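Rank-based AUROC of this kind can be reproduced from raw score arrays via the Mann–Whitney U statistic. An illustrative sketch (not the repo's evaluation code; the toy scores below are made up):

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """AUROC for 'higher score = more OOD' via the rank-sum (Mann-Whitney U) statistic."""
    scores = np.concatenate([id_scores, ood_scores])
    ranks = scores.argsort().argsort() + 1  # 1-based ranks (ties ignored for simplicity)
    n_id, n_ood = len(id_scores), len(ood_scores)
    u = ranks[n_id:].sum() - n_ood * (n_ood + 1) / 2
    return u / (n_id * n_ood)

# Perfect rank separation (every OOD score above every ID score) gives 1.0:
print(auroc(np.array([0.5, 0.7, 0.6]), np.array([3.1, 3.4])))  # 1.0
```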
### vs the previous checkpoint (step 21,850, val_loss 0.0726)
Better on all but one of the metrics we measured:
| | Previous | This checkpoint | Δ |
|---|---|---|---|
| val/loss | 0.0726 | **0.0639** | −12.0% |
| val/l2_error | 0.1755 | **0.1462** | −16.7% |
| ood/loss | 4.414 | 4.241 | −3.9% |
| ood/l2_error | 1.371 | 1.271 | −7.3% |
| CE WILD/ID | 2.79× | **3.23×** | +15.8% |
| DCE OOD/ID | 4.32× | **4.87×** | +12.7% |
| DCE WILD/ID | 2.41× | **2.99×** | +24.1% |
(CE OOD/ID drifted −2.1%, well inside the run-to-run variance observed
during the extended run.)
> **Threshold-shift note**: absolute CE/DCE values in this checkpoint
> are ~3× larger than in the previous one (CE_ID 0.225 → 0.642). A
> downstream OOD detector using an absolute threshold needs to be
> re-calibrated — ratios are preserved but the raw scale is not.
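One way to re-calibrate is to re-fit the threshold from ID CE scores collected with the new checkpoint, e.g. a high ID quantile times a safety margin. A sketch; the quantile, margin, and toy spread below are illustrative choices, not values from the repo:

```python
import numpy as np

def calibrate_threshold(id_scores, quantile=0.99, margin=1.5):
    """Flag inputs whose CE exceeds a high ID quantile times a safety margin.

    CE/DCE *ratios* are preserved across checkpoints but the raw scale is
    not, so the threshold must be re-fit per checkpoint from ID data.
    """
    return margin * np.quantile(id_scores, quantile)

# Toy ID scores centred on the reported CE_ID of this checkpoint (0.642):
rng = np.random.default_rng(0)
id_ce = rng.normal(0.642, 0.05, size=1000)
threshold = calibrate_threshold(id_ce)
# The resulting threshold sits above the ID mean (0.642) and well below
# the reported OOD mean (3.341) for this toy spread.
```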
---
## Usage
### Download
```python
from huggingface_hub import hf_hub_download
ckpt_path = hf_hub_download(
    repo_id="TomNotch/familiarity-flow-onebox-8L",
    filename="onebox_8L.ckpt",
)
```
### Load directly (Familiarity-Flow must be installed)
```python
from familiarity_flow.lightning.module import FlowMatchingModule
module = FlowMatchingModule.load_from_checkpoint(ckpt_path, map_location="cuda")
module.eval()
policy = module.ema_policy # EMA-averaged weights used for inference
```
### Score a batch for OOD-ness
```python
# images: list of stereo image tensors, each shaped (B, 3, 224, 224)
ce = policy.ood_score(images, num_steps=10) # shape: (B,)
# Higher CE = more OOD
```
### Via familiarity-planner
```python
from familiarity_planner.familiarity import Familiarity
fam = Familiarity(
    "conditioning_energy",
    checkpoint_path="TomNotch/familiarity-flow-onebox-8L",  # auto-downloaded
)
score = fam(stereo_observation) # smaller = more familiar
```
---
## Method
Conditional flow matching with linear interpolation and independent coupling
(Lipman et al., *ICLR 2023*). The **conditioning energy**
$$\mathrm{CE}(c) = \int_0^1 \left\lVert \frac{\partial v_\theta}{\partial c}(x_t, t, c) \right\rVert_F^2 \, \mathrm{d}t$$
is measured along the deterministic Euler ODE trajectory from noise
(`x_1 ∼ N(0, I)`) to the predicted action (`x_0`). Its endpoint-Jacobian
cousin DCE measures the squared Frobenius norm of `∂φ/∂c` where `φ` is
the full ODE map. Both scale as out-of-distribution inputs excite the
learned velocity field's sensitivity to conditioning — a signal that
falls out of the geometry of the flow without any auxiliary classifier.
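The CE integral above can be sketched with a toy velocity field: Euler-integrate from noise to the action while accumulating the squared Frobenius norm of `∂v/∂c` at each step. A minimal sketch only; the network, dimensions, and `num_steps` are illustrative and the real API lives in the repo:

```python
import torch

torch.manual_seed(0)

# Toy velocity field v_theta(x_t, t, c): action dim 3, conditioning dim 8.
net = torch.nn.Sequential(torch.nn.Linear(3 + 1 + 8, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 3))

def velocity(x, t, c):
    return net(torch.cat([x, t.expand(x.shape[0], 1), c], dim=-1))

def conditioning_energy(c, num_steps=10):
    """Euler ODE from x_1 ~ N(0, I) toward x_0, accumulating ||dv/dc||_F^2 dt."""
    x = torch.randn(c.shape[0], 3)          # x_1: start from noise
    dt = 1.0 / num_steps
    ce = torch.zeros(c.shape[0])
    for i in range(num_steps):
        t = torch.tensor([1.0 - i * dt])
        # Full Jacobian of v w.r.t. c; summing over the batch in the output is
        # safe because each element's velocity depends only on its own c row.
        jac = torch.autograd.functional.jacobian(
            lambda cc: velocity(x, t, cc).sum(0), c)   # shape (3, B, 8)
        ce += (jac ** 2).sum(dim=(0, 2)) * dt          # squared Frobenius norm
        with torch.no_grad():
            x = x - dt * velocity(x, t, c)             # Euler step toward x_0
    return ce  # shape (B,); higher = more OOD

c = torch.randn(4, 8)
ce = conditioning_energy(c)
print(ce.shape)  # torch.Size([4])
```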
---
## Limitations
- Trained on a **single synthetic domain** (OneBox Isaac Sim renderings).
Generalisation across robots, object sets, or camera rigs is **not**
claimed.
- Action head predicts only a 3-DoF grasp offset; not a full pose or
trajectory.
- OOD-detection quality (CE/DCE) is strong on the OneBox `clutter` and
`wild` eval sets used during training — behaviour on arbitrary
out-of-domain inputs is untested.
- **Not for deployment on physical robots** without independent
validation. Intended as a research artefact and as a concrete
backend for methodology study.
---
## Related work
- Lipman et al., *Flow Matching for Generative Modeling*, ICLR 2023
([arXiv:2210.02747](https://arxiv.org/abs/2210.02747))
- Black et al., *π₀: A Vision-Language-Action Flow Model for General
Robot Control* ([arXiv:2410.24164](https://arxiv.org/abs/2410.24164))
- Chen et al., *Neural Ordinary Differential Equations*, NeurIPS 2018
([arXiv:1806.07366](https://arxiv.org/abs/1806.07366))
- Liu et al., *Simple and Principled Uncertainty Estimation (SNGP)*,
NeurIPS 2020 ([arXiv:2006.10108](https://arxiv.org/abs/2006.10108))
- Nakkiran et al., *Deep Double Descent*, ICLR 2020
([arXiv:1912.02292](https://arxiv.org/abs/1912.02292))
---
## Author
Mukai (Tom Notch) Yu — Carnegie Mellon University, Robotics Institute.
Course project for 16-832 / 16-761 (Spring 2026).