---
license: mit
tags:
- robotics
- flow-matching
- ood-detection
- visual-servoing
- conditioning-energy
- uncertainty-quantification
pipeline_tag: robotics
library_name: pytorch
---

# Familiarity-Flow OneBox 8-Layer

Flow-matching policy for stereo-image-conditioned 3D grasp-offset prediction, trained on the **OneBox** synthetic Isaac-Sim dataset. The full learning dynamics — value of the prediction, geometry of the flow, and Jacobian-of-conditioning OOD signal — are studied in the [Familiarity-Flow repo](https://github.com/Finding-Familiarity/Familiarity-Flow).

Intended primarily as the **conditioning-energy OOD-detection backend** for robotic-policy gating, exposed through the [familiarity-planner](https://github.com/tomnotch/familiarity-planner) package.

**This checkpoint comes from a 150,000-step extended-training study** that explored flow / OOD-separation dynamics well past the conventional convergence point. See [`docs/long_run_analysis.md`](https://github.com/Finding-Familiarity/Familiarity-Flow/blob/main/docs/long_run_analysis.md) in the repo for the full write-up (multi-descent behaviour was observed, not the monotone plateau or terminal collapse initially hypothesised).
---

## Checkpoint summary

| Field | Value |
|---|---|
| Architecture | `FlowMatchingPolicy`, 8 cross-attention layers |
| Vision encoder | DINOv2-B (ViT-B/14, frozen) |
| Action space | ℝ³ (3-DoF grasp offset) |
| Time sampling | Beta(1.5, 1) (π₀ schedule) |
| Training data | OneBox (synthetic Isaac Sim, ZED-Mini stereo) |
| Training steps | 128,250 (best val_loss checkpoint of 150k-step run) |
| Best val_loss | **0.0639** |
| Best val L2 error | **0.1462** |
| Parameters | 244 M total, 35.6 M trainable (encoder frozen) |
| License | MIT |

### OOD-separation at this checkpoint (step 128,250)

| Metric | ID | OOD (clutter) | WILD (real) | OOD/ID | WILD/ID |
|---|---|---|---|---|---|
| CE | 0.642 | 3.341 | 2.077 | **5.20×** | 3.23× |
| DCE | 0.062 | 0.303 | 0.186 | **4.87×** | 2.99× |

AUROC(ID vs OOD) and AUROC(ID vs WILD) are both **1.000**: rank-based separation is perfect, and has been since step ≈ 8k. All numbers are reported directly from the training log at `outputs/csv/onebox/version_15` in the repo.

### vs the previous checkpoint (step 21,850, val_loss 0.0726)

Better or tied on every metric we measured, with one minor exception noted below the table:

| | Previous | This checkpoint | Δ |
|---|---|---|---|
| val/loss | 0.0726 | **0.0639** | −12.0% |
| val/l2_error | 0.1755 | **0.1462** | −16.7% |
| ood/loss | 4.414 | 4.241 | −3.9% |
| ood/l2_error | 1.371 | 1.271 | −7.3% |
| CE WILD/ID | 2.79× | **3.23×** | +15.8% |
| DCE OOD/ID | 4.32× | **4.87×** | +12.7% |
| DCE WILD/ID | 2.41× | **2.99×** | +24.1% |

(CE OOD/ID drifted −2.1%, well inside the run-to-run variance observed during the extended run.)

> **Threshold-shift note**: absolute CE/DCE values in this checkpoint
> are ~3× larger than in the previous one (CE_ID 0.225 → 0.642). A
> downstream OOD detector using an absolute threshold needs to be
> re-calibrated — ratios are preserved but the raw scale is not.
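The threshold-shift note implies a concrete recipe: re-fit any absolute CE threshold from ID validation scores whenever the checkpoint changes, and use ratios only for cross-checkpoint comparison. A minimal sketch of percentile-based re-calibration — the `calibrate_threshold` helper, the percentile choice, and the synthetic score distributions are illustrative, not part of the repo's API (only the CE means 0.642 / 3.341 come from the table above):

```python
import numpy as np

def calibrate_threshold(id_scores: np.ndarray, percentile: float = 99.0) -> float:
    """Pick an absolute CE threshold as a high percentile of ID validation scores.

    Re-run on every new checkpoint: OOD/ID ratios are stable across
    checkpoints, but the raw CE scale is not (here, CE_ID 0.225 -> 0.642).
    """
    return float(np.percentile(id_scores, percentile))

# Synthetic scores centred at the CE means reported for this checkpoint;
# the spreads are made up for illustration.
rng = np.random.default_rng(0)
id_scores = rng.normal(0.642, 0.05, size=1000)   # ID CE ~ 0.642
ood_scores = rng.normal(3.341, 0.30, size=1000)  # OOD (clutter) CE ~ 3.341

tau = calibrate_threshold(id_scores)
flagged_ood = float((ood_scores > tau).mean())   # fraction of OOD inputs flagged
```

With well-separated distributions like these, nearly all OOD inputs land above the ID-derived threshold; the point is that `tau` is recomputed per checkpoint rather than hard-coded.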
---

## Usage

### Download

```python
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="TomNotch/familiarity-flow-onebox-8L",
    filename="onebox_8L.ckpt",
)
```

### Load directly (Familiarity-Flow must be installed)

```python
from familiarity_flow.lightning.module import FlowMatchingModule

module = FlowMatchingModule.load_from_checkpoint(ckpt_path, map_location="cuda")
module.eval()
policy = module.ema_policy  # EMA-averaged weights used for inference
```

### Score a batch for OOD-ness

```python
# images: list of stereo image tensors, each shaped (B, 3, 224, 224)
ce = policy.ood_score(images, num_steps=10)  # shape: (B,)
# Higher CE = more OOD
```

### Via familiarity-planner

```python
from familiarity_planner.familiarity import Familiarity

fam = Familiarity(
    "conditioning_energy",
    checkpoint_path="TomNotch/familiarity-flow-onebox-8L",  # auto-downloaded
)
score = fam(stereo_observation)  # smaller = more familiar
```

---

## Method

Conditional flow matching with linear interpolation and independent coupling (Lipman et al., *ICLR 2023*). The **conditioning energy**

$$\mathrm{CE}(c) = \int_0^1 \left\lVert \frac{\partial v_\theta}{\partial c}(x_t, t, c) \right\rVert_F^2 \, \mathrm{d}t$$

is measured along the deterministic Euler ODE trajectory from noise (`x_1 ∼ N(0, I)`) to the predicted action (`x_0`). Its endpoint-Jacobian cousin, DCE, measures the squared Frobenius norm of `∂φ/∂c`, where `φ` is the full ODE map. Both grow when out-of-distribution inputs excite the learned velocity field's sensitivity to conditioning — a signal that falls out of the geometry of the flow without any auxiliary classifier.

---

## Limitations

- Trained on a **single synthetic domain** (OneBox Isaac Sim renderings). Generalisation across robots, object sets, or camera rigs is **not** claimed.
- Action head predicts only a 3-DoF grasp offset, not a full pose or trajectory.
- OOD-detection quality (CE/DCE) is strong on the OneBox `clutter` and `wild` eval sets used during training — behaviour on arbitrary out-of-domain inputs is untested.
- **Not for deployment on physical robots** without independent validation. Intended as a research artefact and as a concrete backend for methodology study.

---

## Related work

- Lipman et al., *Flow Matching for Generative Modeling*, ICLR 2023 ([arXiv:2210.02747](https://arxiv.org/abs/2210.02747))
- Black et al., *π₀: A Vision-Language-Action Flow Model for General Robot Control* ([arXiv:2410.24164](https://arxiv.org/abs/2410.24164))
- Chen et al., *Neural Ordinary Differential Equations*, NeurIPS 2018 ([arXiv:1806.07366](https://arxiv.org/abs/1806.07366))
- Liu et al., *Simple and Principled Uncertainty Estimation (SNGP)*, NeurIPS 2020 ([arXiv:2006.10108](https://arxiv.org/abs/2006.10108))
- Nakkiran et al., *Deep Double Descent*, ICLR 2020 ([arXiv:1912.02292](https://arxiv.org/abs/1912.02292))

---

## Author

Mukai (Tom Notch) Yu — Carnegie Mellon University, Robotics Institute. Course project for 16-832 / 16-761 (Spring 2026).
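For readers implementing their own scorer, the conditioning-energy integral from the Method section can be approximated with the same Euler discretisation used for sampling: accumulate the squared Frobenius norm of `∂v/∂c` at each step along the trajectory from `x_1` to `x_0`. The sketch below is self-contained and uses a toy linear velocity field in place of the trained `v_θ`; the `conditioning_energy` helper, shapes, and step count are illustrative only (the repo's real entry point is `policy.ood_score`):

```python
import torch

def conditioning_energy(velocity, x1, c, num_steps=10):
    """Approximate CE(c) = ∫ ||∂v/∂c||_F² dt along the Euler trajectory x1 -> x0."""
    x, dt = x1, 1.0 / num_steps
    ce = torch.zeros(())
    for k in range(num_steps):
        t = torch.tensor(1.0 - k * dt)  # walk t from 1 (noise) down to 0 (action)
        # Jacobian of the velocity w.r.t. the conditioning vector only.
        J = torch.autograd.functional.jacobian(lambda c_: velocity(x, t, c_), c)
        ce = ce + J.pow(2).sum() * dt   # squared Frobenius norm, Euler quadrature
        with torch.no_grad():
            x = x - velocity(x, t, c) * dt  # Euler step toward the action x0
    return ce

# Toy linear velocity field standing in for the trained v_theta (illustrative):
# v(x, t, c) = t·x + W c, so ∂v/∂c = W at every step and CE(c) = ||W||_F².
torch.manual_seed(0)
W = torch.randn(3, 8)
velocity = lambda x, t, c: x * t + c @ W.T  # (3,), scalar t, (8,) -> (3,)

x1 = torch.randn(3)  # noise sample
c = torch.randn(8)   # conditioning embedding
ce = conditioning_energy(velocity, x1, c)
```

Because the toy field is linear in `c`, the quadrature recovers `||W||_F²` exactly, which makes the sketch easy to sanity-check before swapping in a real network.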