---
license: mit
tags:
- robotics
- flow-matching
- ood-detection
- visual-servoing
- conditioning-energy
- uncertainty-quantification
pipeline_tag: robotics
library_name: pytorch
---
# Familiarity-Flow OneBox 8-Layer
Flow-matching policy for stereo-image-conditioned 3D grasp-offset prediction,
trained on the **OneBox** synthetic Isaac-Sim dataset. The full learning
dynamics (prediction accuracy, flow geometry, and the
Jacobian-of-conditioning OOD signal) are studied in the
[Familiarity-Flow repo](https://github.com/Finding-Familiarity/Familiarity-Flow).
Intended primarily as the **conditioning-energy OOD-detection backend** for
robotic-policy gating, exposed through the
[familiarity-planner](https://github.com/tomnotch/familiarity-planner) package.
**This checkpoint comes from a 150,000-step extended-training study**
that explored flow / OOD-separation dynamics well past the conventional
convergence point. See
[`docs/long_run_analysis.md`](https://github.com/Finding-Familiarity/Familiarity-Flow/blob/main/docs/long_run_analysis.md)
in the repo for the full write-up (multi-descent behaviour observed, not
the monotone-plateau or terminal-collapse initially hypothesised).
---
## Checkpoint summary
| Field | Value |
|---|---|
| Architecture | `FlowMatchingPolicy`, 8 cross-attention layers |
| Vision encoder | DINOv2-B (ViT-B/14, frozen) |
| Action space | ℝ³ (3-DoF grasp offset) |
| Time sampling | Beta(1.5, 1) (π₀ schedule) |
| Training data | OneBox (synthetic Isaac Sim, ZED-Mini stereo) |
| Training steps | 128,250 (best val_loss checkpoint of 150k-step run) |
| Best val_loss | **0.0639** |
| Best val L2 error | **0.1462** |
| Parameters | 244 M total, 35.6 M trainable (encoder frozen) |
| License | MIT |
### OOD-separation at this checkpoint (step 128,250)
| Metric | ID | OOD (clutter) | WILD (real) | OOD/ID | WILD/ID |
|---|---|---|---|---|---|
| CE | 0.642 | 3.341 | 2.077 | **5.20×** | 3.23× |
| DCE | 0.062 | 0.303 | 0.186 | **4.87×** | 2.99× |
AUROC(ID vs OOD) and AUROC(ID vs WILD) are both **1.000** (rank-based
separation is perfect and has been since step ≈ 8k).
Reported directly from the training log at
`outputs/csv/onebox/version_15` in the repo.
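The AUROC quoted above is rank-based, so it can be reproduced from raw CE
scores with a plain Mann-Whitney count. A minimal sketch (the score lists
below are illustrative, not values from the training log):

```python
def auroc(id_scores, ood_scores):
    """Rank-based AUROC: P(OOD score > ID score), ties counted as 0.5."""
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            wins += 1.0 if o > i else 0.5 if o == i else 0.0
    return wins / (len(id_scores) * len(ood_scores))

# Well-separated scores (as in the CE table) give a perfect 1.000.
print(auroc([0.6, 0.7], [3.2, 3.5]))  # 1.0
```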
### vs the previous checkpoint (step 21,850, val_loss 0.0726)
Better or tied on all but one of the metrics we measured:
| | Previous | This checkpoint | Δ |
|---|---|---|---|
| val/loss | 0.0726 | **0.0639** | −12.0% |
| val/l2_error | 0.1755 | **0.1462** | −16.7% |
| ood/loss | 4.414 | 4.241 | −3.9% |
| ood/l2_error | 1.371 | 1.271 | −7.3% |
| CE WILD/ID | 2.79× | **3.23×** | +15.8% |
| DCE OOD/ID | 4.32× | **4.87×** | +12.7% |
| DCE WILD/ID | 2.41× | **2.99×** | +24.1% |
(CE OOD/ID drifted −2.1%, well inside the run-to-run variance observed
during the extended run.)
> **Threshold-shift note**: absolute CE/DCE values in this checkpoint
> are ~3× larger than in the previous one (CE_ID 0.225 → 0.642). A
> downstream OOD detector using an absolute threshold needs to be
> re-calibrated — ratios are preserved but the raw scale is not.
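One way to re-calibrate, sketched under the assumption that the scale shift
is ratio-preserving: rescale the deployed absolute threshold by the change
in the ID mean. The ID means are the CE_ID figures quoted above;
`old_threshold` is a hypothetical deployment setting, not a repo default.

```python
# Ratio-preserving re-calibration of an absolute CE threshold across checkpoints.
OLD_CE_ID = 0.225    # ID mean CE, previous checkpoint (from the note above)
NEW_CE_ID = 0.642    # ID mean CE, this checkpoint
old_threshold = 2.0  # hypothetical absolute CE threshold used downstream

new_threshold = old_threshold * (NEW_CE_ID / OLD_CE_ID)
print(round(new_threshold, 3))  # 5.707
```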
---
## Usage
### Download
```python
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="TomNotch/familiarity-flow-onebox-8L",
    filename="onebox_8L.ckpt",
)
```
### Load directly (Familiarity-Flow must be installed)
```python
from familiarity_flow.lightning.module import FlowMatchingModule
module = FlowMatchingModule.load_from_checkpoint(ckpt_path, map_location="cuda")
module.eval()
policy = module.ema_policy # EMA-averaged weights used for inference
```
### Score a batch for OOD-ness
```python
# images: list of stereo image tensors, each shaped (B, 3, 224, 224)
ce = policy.ood_score(images, num_steps=10) # shape: (B,)
# Higher CE = more OOD
```
### Via familiarity-planner
```python
from familiarity_planner.familiarity import Familiarity

fam = Familiarity(
    "conditioning_energy",
    checkpoint_path="TomNotch/familiarity-flow-onebox-8L",  # auto-downloaded
)
score = fam(stereo_observation) # smaller = more familiar
```
---
## Method
Conditional flow matching with linear interpolation and independent coupling
(Lipman et al., *ICLR 2023*). The **conditioning energy**
$$\mathrm{CE}(c) = \int_0^1 \left\lVert \frac{\partial v_\theta}{\partial c}(x_t, t, c) \right\rVert_F^2 \, \mathrm{d}t$$
is measured along the deterministic Euler ODE trajectory from noise
(`x_1 ∼ N(0, I)`) to the predicted action (`x_0`). Its endpoint-Jacobian
cousin DCE measures the squared Frobenius norm of `∂φ/∂c` where `φ` is
the full ODE map. Both scale as out-of-distribution inputs excite the
learned velocity field's sensitivity to conditioning — a signal that
falls out of the geometry of the flow without any auxiliary classifier.
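As a concrete illustration, the integral can be approximated along the Euler
rollout with one vector-Jacobian product per output dimension. This is a
hedged sketch on a toy velocity network, not the repo's `FlowMatchingPolicy`
API; `ToyVelocity`, the dimensions, and `num_steps` are all illustrative:

```python
import torch

class ToyVelocity(torch.nn.Module):
    """Stand-in for the flow-matching velocity network v_theta(x_t, t, c)."""
    def __init__(self, x_dim=3, c_dim=8):
        super().__init__()
        self.net = torch.nn.Linear(x_dim + 1 + c_dim, x_dim)

    def forward(self, x, t, c):
        return self.net(torch.cat([x, t, c], dim=-1))

def conditioning_energy(v, c, x_dim=3, num_steps=10):
    """CE(c) ~ sum over Euler steps of ||dv/dc||_F^2 * dt along x_1 -> x_0."""
    dt = 1.0 / num_steps
    x = torch.randn(c.shape[0], x_dim)           # x_1 ~ N(0, I)
    ce = torch.zeros(c.shape[0])
    for k in range(num_steps):
        t = torch.full((c.shape[0], 1), 1.0 - k * dt)
        c_req = c.detach().requires_grad_(True)
        vel = v(x, t, c_req)
        # Squared Frobenius norm of the per-sample Jacobian dv/dc,
        # accumulated one output dimension at a time via autograd.
        sq = torch.zeros(c.shape[0])
        for i in range(vel.shape[-1]):
            (g,) = torch.autograd.grad(vel[:, i].sum(), c_req, retain_graph=True)
            sq += (g ** 2).sum(dim=-1)
        ce += sq * dt
        x = (x - dt * vel).detach()              # Euler step toward x_0
    return ce
```

Because the integrand is a squared norm, the score is non-negative by
construction; OOD inputs show up as larger accumulated sensitivity.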
---
## Limitations
- Trained on a **single synthetic domain** (OneBox Isaac Sim renderings).
Generalisation across robots, object sets, or camera rigs is **not**
claimed.
- Action head predicts only a 3-DoF grasp offset; not a full pose or
trajectory.
- OOD-detection quality (CE/DCE) is strong on the OneBox `clutter` and
`wild` eval sets used during training — behaviour on arbitrary
out-of-domain inputs is untested.
- **Not for deployment on physical robots** without independent
validation. Intended as a research artefact and as a concrete
backend for methodology study.
---
## Related work
- Lipman et al., *Flow Matching for Generative Modeling*, ICLR 2023
([arXiv:2210.02747](https://arxiv.org/abs/2210.02747))
- Black et al., *π₀: A Vision-Language-Action Flow Model for General
Robot Control* ([arXiv:2410.24164](https://arxiv.org/abs/2410.24164))
- Chen et al., *Neural Ordinary Differential Equations*, NeurIPS 2018
([arXiv:1806.07366](https://arxiv.org/abs/1806.07366))
- Liu et al., *Simple and Principled Uncertainty Estimation with
  Deterministic Deep Learning via Distance Awareness* (SNGP),
  NeurIPS 2020 ([arXiv:2006.10108](https://arxiv.org/abs/2006.10108))
- Nakkiran et al., *Deep Double Descent*, ICLR 2020
([arXiv:1912.02292](https://arxiv.org/abs/1912.02292))
---
## Author
Mukai (Tom Notch) Yu — Carnegie Mellon University, Robotics Institute.
Course project for 16-832 / 16-761 (Spring 2026).