---
license: mit
tags:
- robotics
- flow-matching
- ood-detection
- visual-servoing
- conditioning-energy
- uncertainty-quantification
pipeline_tag: robotics
library_name: pytorch
---

# Familiarity-Flow OneBox 8-Layer

Flow-matching policy for stereo-image-conditioned 3D grasp-offset prediction, trained on the **OneBox** synthetic Isaac-Sim dataset. The full learning dynamics — value of the prediction, geometry of the flow, and Jacobian-of-conditioning OOD signal — are studied in the [Familiarity-Flow repo](https://github.com/Finding-Familiarity/Familiarity-Flow).

Intended primarily as the **conditioning-energy OOD-detection backend** for robotic-policy gating, exposed through the [familiarity-planner](https://github.com/tomnotch/familiarity-planner) package.

**This checkpoint comes from a 150,000-step extended-training study** that explored flow / OOD-separation dynamics well past the conventional convergence point. See [`docs/long_run_analysis.md`](https://github.com/Finding-Familiarity/Familiarity-Flow/blob/main/docs/long_run_analysis.md) in the repo for the full write-up (multi-descent behaviour was observed, not the monotone plateau or terminal collapse initially hypothesised).
---

## Checkpoint summary

| Field | Value |
|---|---|
| Architecture | `FlowMatchingPolicy`, 8 cross-attention layers |
| Vision encoder | DINOv2-B (ViT-B/14, frozen) |
| Action space | ℝ³ (3-DoF grasp offset) |
| Time sampling | Beta(1.5, 1) (π₀ schedule) |
| Training data | OneBox (synthetic Isaac Sim, ZED-Mini stereo) |
| Training steps | 128,250 (best val_loss checkpoint of 150k-step run) |
| Best val_loss | **0.0639** |
| Best val L2 error | **0.1462** |
| Parameters | 244 M total, 35.6 M trainable (encoder frozen) |
| License | MIT |

### OOD-separation at this checkpoint (step 128,250)

| Metric | ID | OOD (clutter) | WILD (real) | OOD/ID | WILD/ID |
|---|---|---|---|---|---|
| CE | 0.642 | 3.341 | 2.077 | **5.20×** | 3.23× |
| DCE | 0.062 | 0.303 | 0.186 | **4.87×** | 2.99× |

AUROC(ID vs OOD) and AUROC(ID vs WILD) are both **1.000**: rank-based separation is perfect, and has been since step ≈ 8k. All numbers are reported directly from the training log at `outputs/csv/onebox/version_15` in the repo.

### vs the previous checkpoint (step 21,850, val_loss 0.0726)

Better or tied on every metric we measured, with one minor exception noted below the table:

| | Previous | This checkpoint | Δ |
|---|---|---|---|
| val/loss | 0.0726 | **0.0639** | −12.0% |
| val/l2_error | 0.1755 | **0.1462** | −16.7% |
| ood/loss | 4.414 | 4.241 | −3.9% |
| ood/l2_error | 1.371 | 1.271 | −7.3% |
| CE WILD/ID | 2.79× | **3.23×** | +15.8% |
| DCE OOD/ID | 4.32× | **4.87×** | +12.7% |
| DCE WILD/ID | 2.41× | **2.99×** | +24.1% |

(CE OOD/ID drifted −2.1%, well inside the run-to-run variance observed during the extended run.)

> **Threshold-shift note**: absolute CE/DCE values in this checkpoint
> are ~3× larger than in the previous one (CE_ID 0.225 → 0.642). A
> downstream OOD detector using an absolute threshold needs to be
> re-calibrated — ratios are preserved but the raw scale is not.
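The threshold-shift note implies a concrete recipe: re-fit any absolute CE threshold from ID validation scores whenever the checkpoint changes, and use ratios only for cross-checkpoint comparison. A minimal sketch of percentile-based re-calibration — the `calibrate_threshold` helper, the percentile choice, and the synthetic score distributions are illustrative, not part of the repo's API (only the CE means 0.642 / 3.341 come from the table above):

```python
import numpy as np

def calibrate_threshold(id_scores: np.ndarray, percentile: float = 99.0) -> float:
    """Pick an absolute CE threshold as a high percentile of ID validation scores.

    Re-run on every new checkpoint: OOD/ID ratios are stable across
    checkpoints, but the raw CE scale is not (here, CE_ID 0.225 -> 0.642).
    """
    return float(np.percentile(id_scores, percentile))

# Synthetic scores centred at the CE means reported for this checkpoint;
# the spreads are made up for illustration.
rng = np.random.default_rng(0)
id_scores = rng.normal(0.642, 0.05, size=1000)   # ID CE ~ 0.642
ood_scores = rng.normal(3.341, 0.30, size=1000)  # OOD (clutter) CE ~ 3.341

tau = calibrate_threshold(id_scores)
flagged_ood = float((ood_scores > tau).mean())   # fraction of OOD inputs flagged
```

With well-separated distributions like these, nearly all OOD inputs land above the ID-derived threshold; the point is that `tau` is recomputed per checkpoint rather than hard-coded.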
---

## Usage

### Download

```python
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="TomNotch/familiarity-flow-onebox-8L",
    filename="onebox_8L.ckpt",
)
```

### Load directly (Familiarity-Flow must be installed)

```python
from familiarity_flow.lightning.module import FlowMatchingModule

module = FlowMatchingModule.load_from_checkpoint(ckpt_path, map_location="cuda")
module.eval()
policy = module.ema_policy  # EMA-averaged weights used for inference
```

### Score a batch for OOD-ness

```python
# images: list of stereo image tensors, each shaped (B, 3, 224, 224)
ce = policy.ood_score(images, num_steps=10)  # shape: (B,)
# Higher CE = more OOD
```

### Via familiarity-planner

```python
from familiarity_planner.familiarity import Familiarity

fam = Familiarity(
    "conditioning_energy",
    checkpoint_path="TomNotch/familiarity-flow-onebox-8L",  # auto-downloaded
)
score = fam(stereo_observation)  # smaller = more familiar
```

---

## Method

Conditional flow matching with linear interpolation and independent coupling (Lipman et al., *ICLR 2023*). The **conditioning energy**

$$\mathrm{CE}(c) = \int_0^1 \left\lVert \frac{\partial v_\theta}{\partial c}(x_t, t, c) \right\rVert_F^2 \, \mathrm{d}t$$

is measured along the deterministic Euler ODE trajectory from noise (`x_1 ∼ N(0, I)`) to the predicted action (`x_0`). Its endpoint-Jacobian cousin, DCE, measures the squared Frobenius norm of `∂φ/∂c`, where `φ` is the full ODE map. Both grow when out-of-distribution inputs excite the learned velocity field's sensitivity to conditioning — a signal that falls out of the geometry of the flow without any auxiliary classifier.

---

## Limitations

- Trained on a **single synthetic domain** (OneBox Isaac Sim renderings). Generalisation across robots, object sets, or camera rigs is **not** claimed.
- Action head predicts only a 3-DoF grasp offset, not a full pose or trajectory.
- OOD-detection quality (CE/DCE) is strong on the OneBox `clutter` and `wild` eval sets used during training — behaviour on arbitrary out-of-domain inputs is untested.
- **Not for deployment on physical robots** without independent validation. Intended as a research artefact and as a concrete backend for methodology study.

---

## Related work

- Lipman et al., *Flow Matching for Generative Modeling*, ICLR 2023 ([arXiv:2210.02747](https://arxiv.org/abs/2210.02747))
- Black et al., *π₀: A Vision-Language-Action Flow Model for General Robot Control* ([arXiv:2410.24164](https://arxiv.org/abs/2410.24164))
- Chen et al., *Neural Ordinary Differential Equations*, NeurIPS 2018 ([arXiv:1806.07366](https://arxiv.org/abs/1806.07366))
- Liu et al., *Simple and Principled Uncertainty Estimation (SNGP)*, NeurIPS 2020 ([arXiv:2006.10108](https://arxiv.org/abs/2006.10108))
- Nakkiran et al., *Deep Double Descent*, ICLR 2020 ([arXiv:1912.02292](https://arxiv.org/abs/1912.02292))

---

## Author

Mukai (Tom Notch) Yu — Carnegie Mellon University, Robotics Institute. Course project for 16-832 / 16-761 (Spring 2026).
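For readers implementing their own scorer, the conditioning-energy integral from the Method section can be approximated with the same Euler discretisation used for sampling: accumulate the squared Frobenius norm of `∂v/∂c` at each step along the trajectory from `x_1` to `x_0`. The sketch below is self-contained and uses a toy linear velocity field in place of the trained `v_θ`; the `conditioning_energy` helper, shapes, and step count are illustrative only (the repo's real entry point is `policy.ood_score`):

```python
import torch

def conditioning_energy(velocity, x1, c, num_steps=10):
    """Approximate CE(c) = ∫ ||∂v/∂c||_F² dt along the Euler trajectory x1 -> x0."""
    x, dt = x1, 1.0 / num_steps
    ce = torch.zeros(())
    for k in range(num_steps):
        t = torch.tensor(1.0 - k * dt)  # walk t from 1 (noise) down to 0 (action)
        # Jacobian of the velocity w.r.t. the conditioning vector only.
        J = torch.autograd.functional.jacobian(lambda c_: velocity(x, t, c_), c)
        ce = ce + J.pow(2).sum() * dt   # squared Frobenius norm, Euler quadrature
        with torch.no_grad():
            x = x - velocity(x, t, c) * dt  # Euler step toward the action x0
    return ce

# Toy linear velocity field standing in for the trained v_theta (illustrative):
# v(x, t, c) = t·x + W c, so ∂v/∂c = W at every step and CE(c) = ||W||_F².
torch.manual_seed(0)
W = torch.randn(3, 8)
velocity = lambda x, t, c: x * t + c @ W.T  # (3,), scalar t, (8,) -> (3,)

x1 = torch.randn(3)  # noise sample
c = torch.randn(8)   # conditioning embedding
ce = conditioning_energy(velocity, x1, c)
```

Because the toy field is linear in `c`, the quadrature recovers `||W||_F²` exactly, which makes the sketch easy to sanity-check before swapping in a real network.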