---
license: mit
tags:
- robotics
- flow-matching
- ood-detection
- visual-servoing
- conditioning-energy
- uncertainty-quantification
pipeline_tag: robotics
library_name: pytorch
---

# Familiarity-Flow OneBox 8-Layer

Flow-matching policy for stereo-image-conditioned 3D grasp-offset prediction,
trained on the **OneBox** synthetic Isaac Sim dataset. The full learning
dynamics (prediction accuracy, flow geometry, and the
Jacobian-of-conditioning OOD signal) are studied in the
[Familiarity-Flow repo](https://github.com/Finding-Familiarity/Familiarity-Flow).

Intended primarily as the **conditioning-energy OOD-detection backend** for
robotic-policy gating, exposed through the
[familiarity-planner](https://github.com/tomnotch/familiarity-planner) package.

**This checkpoint comes from a 150,000-step extended-training study** that
explored flow and OOD-separation dynamics well past the conventional
convergence point. See
[`docs/long_run_analysis.md`](https://github.com/Finding-Familiarity/Familiarity-Flow/blob/main/docs/long_run_analysis.md)
in the repo for the full write-up (multi-descent behaviour was observed,
not the monotone plateau or terminal collapse initially hypothesised).

---

## Checkpoint summary

| Field | Value |
|---|---|
| Architecture | `FlowMatchingPolicy`, 8 cross-attention layers |
| Vision encoder | DINOv2-B (ViT-B/14, frozen) |
| Action space | ℝ³ (3-DoF grasp offset) |
| Time sampling | Beta(1.5, 1) (π₀ schedule) |
| Training data | OneBox (synthetic Isaac Sim, ZED-Mini stereo) |
| Training steps | 128,250 (best val_loss checkpoint of the 150k-step run) |
| Best val_loss | **0.0639** |
| Best val L2 error | **0.1462** |
| Parameters | 244 M total, 35.6 M trainable (encoder frozen) |
| License | MIT |
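
The Beta(1.5, 1) entry in the table can be made concrete: that schedule draws interpolation times with density p(t) ∝ √t, i.e. biased toward t = 1 (the noise end in this card's convention), with mean 1.5 / (1.5 + 1) = 0.6. A quick NumPy check, illustrative only and not tied to the repo's code:

```python
import numpy as np

# Draw flow-matching interpolation times from the Beta(1.5, 1) schedule.
# Density p(t) ~ t^0.5, so mass concentrates toward t = 1; E[t] = 0.6.
rng = np.random.default_rng(0)
t = rng.beta(1.5, 1.0, size=100_000)
print(round(t.mean(), 2))  # close to the analytic mean 0.6
```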

### OOD-separation at this checkpoint (step 128,250)

| Metric | ID | OOD (clutter) | WILD (real) | OOD/ID | WILD/ID |
|---|---|---|---|---|---|
| CE | 0.642 | 3.341 | 2.077 | **5.20×** | 3.23× |
| DCE | 0.062 | 0.303 | 0.186 | **4.87×** | 2.99× |

AUROC(ID vs OOD) and AUROC(ID vs WILD) are both **1.000** (rank-based
separation is perfect and has been since step ≈ 8k).

Reported directly from the training log at
`outputs/csv/onebox/version_15` in the repo.

### vs. the previous checkpoint (step 21,850, val_loss 0.0726)

Better than, or tied with, the previous checkpoint on every metric we measured:

| | Previous | This checkpoint | Δ |
|---|---|---|---|
| val/loss | 0.0726 | **0.0639** | −12.0% |
| val/l2_error | 0.1755 | **0.1462** | −16.7% |
| ood/loss | 4.414 | 4.241 | −3.9% |
| ood/l2_error | 1.371 | 1.271 | −7.3% |
| CE WILD/ID | 2.79× | **3.23×** | +15.8% |
| DCE OOD/ID | 4.32× | **4.87×** | +12.7% |
| DCE WILD/ID | 2.41× | **2.99×** | +24.1% |

(CE OOD/ID drifted −2.1%, well inside the run-to-run variance observed
during the extended run.)

> **Threshold-shift note**: absolute CE/DCE values at this checkpoint
> are ~3× larger than at the previous one (CE_ID 0.225 → 0.642). A
> downstream OOD detector using an absolute threshold must be
> re-calibrated: the ratios are preserved, but the raw scale is not.
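
The threshold-shift note implies a simple re-calibration recipe: derive the absolute cutoff from a small held-out ID set scored with the checkpoint actually being deployed, rather than carrying a threshold over from an older checkpoint. A minimal NumPy sketch; the ID mean 0.642 comes from the table above, while the spread, sample size, and 99th-percentile cutoff are assumptions for illustration:

```python
import numpy as np

# Hypothetical calibration: score a held-out ID set with the NEW checkpoint
# and set the absolute CE threshold from those scores (the raw CE scale
# shifted ~3x between checkpoints, so old thresholds are stale).
rng = np.random.default_rng(0)
id_scores = rng.normal(0.642, 0.05, size=256)    # stand-in for new-checkpoint ID CE
threshold = float(np.quantile(id_scores, 0.99))  # flag anything above the ID 99th pct.

def is_ood(ce: float) -> bool:
    return ce > threshold

print(is_ood(3.341))  # a clutter-like CE, far above the ID range -> True
print(is_ood(0.642))  # the ID mean itself -> False
```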

---

## Usage

### Download

```python
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="TomNotch/familiarity-flow-onebox-8L",
    filename="onebox_8L.ckpt",
)
```

### Load directly (Familiarity-Flow must be installed)

```python
from familiarity_flow.lightning.module import FlowMatchingModule

module = FlowMatchingModule.load_from_checkpoint(ckpt_path, map_location="cuda")
module.eval()
policy = module.ema_policy  # EMA-averaged weights used for inference
```

### Score a batch for OOD-ness

| ```python |
| # images: list of stereo image tensors, each shaped (B, 3, 224, 224) |
| ce = policy.ood_score(images, num_steps=10) # shape: (B,) |
| # Higher CE = more OOD |
| ``` |

### Via familiarity-planner

```python
from familiarity_planner.familiarity import Familiarity

fam = Familiarity(
    "conditioning_energy",
    checkpoint_path="TomNotch/familiarity-flow-onebox-8L",  # auto-downloaded
)
score = fam(stereo_observation)  # smaller = more familiar
```

---

## Method

Conditional flow matching with linear interpolation and independent coupling
(Lipman et al., *ICLR 2023*). The **conditioning energy**

$$\mathrm{CE}(c) = \int_0^1 \left\lVert \frac{\partial v_\theta}{\partial c}(x_t, t, c) \right\rVert_F^2 \, \mathrm{d}t$$

is accumulated along the deterministic Euler ODE trajectory from noise
(`x_1 ∼ N(0, I)`) to the predicted action (`x_0`). Its endpoint-Jacobian
cousin, DCE, measures the squared Frobenius norm of `∂φ/∂c`, where `φ` is
the full ODE map. Both grow when out-of-distribution inputs excite the
learned velocity field's sensitivity to the conditioning: a signal that
falls out of the geometry of the flow without any auxiliary classifier.
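
As a toy illustration of the integral (not the repo's implementation): for a linear velocity field v(x, t, c) = Ax + Bc, the conditioning Jacobian ∂v/∂c is the constant matrix B, so CE reduces analytically to ‖B‖_F². The sketch below recovers that value numerically with finite differences along a 10-step Euler trajectory; all names, shapes, and step counts here are invented for the example:

```python
import numpy as np

# Toy linear velocity field: v(x, t, c) = A @ x + B @ c, so dv/dc = B
# and CE = integral over [0, 1] of ||B||_F^2 dt = ||B||_F^2.
rng = np.random.default_rng(0)
A, B = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
v = lambda x, t, c: A @ x + B @ c

def conditioning_energy(c, num_steps=10, eps=1e-5):
    x = rng.normal(size=3)  # x_1 ~ N(0, I), the noise end of the trajectory
    dt = 1.0 / num_steps
    ce = 0.0
    for k in range(num_steps):
        t = 1.0 - k * dt
        # Finite-difference Jacobian of v w.r.t. the conditioning c
        J = np.stack([(v(x, t, c + eps * np.eye(3)[j]) - v(x, t, c)) / eps
                      for j in range(3)], axis=1)
        ce += np.sum(J ** 2) * dt   # accumulate ||dv/dc||_F^2 dt
        x = x - dt * v(x, t, c)     # Euler step from t = 1 toward t = 0
    return ce

ce = conditioning_energy(rng.normal(size=3))
print(np.isclose(ce, np.sum(B ** 2), rtol=1e-3))  # → True
```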

---

## Limitations

- Trained on a **single synthetic domain** (OneBox Isaac Sim renderings).
  Generalisation across robots, object sets, or camera rigs is **not**
  claimed.
- The action head predicts only a 3-DoF grasp offset, not a full pose or
  trajectory.
- OOD-detection quality (CE/DCE) is strong on the OneBox `clutter` and
  `wild` eval sets used during training; behaviour on arbitrary
  out-of-domain inputs is untested.
- **Not for deployment on physical robots** without independent
  validation. Intended as a research artefact and a concrete backend for
  methodology study.

---

## Related work

- Lipman et al., *Flow Matching for Generative Modeling*, ICLR 2023
  ([arXiv:2210.02747](https://arxiv.org/abs/2210.02747))
- Black et al., *π₀: A Vision-Language-Action Flow Model for General
  Robot Control* ([arXiv:2410.24164](https://arxiv.org/abs/2410.24164))
- Chen et al., *Neural Ordinary Differential Equations*, NeurIPS 2018
  ([arXiv:1806.07366](https://arxiv.org/abs/1806.07366))
- Liu et al., *Simple and Principled Uncertainty Estimation (SNGP)*,
  NeurIPS 2020 ([arXiv:2006.10108](https://arxiv.org/abs/2006.10108))
- Nakkiran et al., *Deep Double Descent*, ICLR 2020
  ([arXiv:1912.02292](https://arxiv.org/abs/1912.02292))

---

## Author

Mukai (Tom Notch) Yu — Carnegie Mellon University, Robotics Institute.
Course project for 16-832 / 16-761 (Spring 2026).
|