---
license: mit
tags:
- robotics
- flow-matching
- ood-detection
- visual-servoing
- conditioning-energy
- uncertainty-quantification
pipeline_tag: robotics
library_name: pytorch
---
# Familiarity-Flow OneBox 8-Layer
Flow-matching policy for stereo-image-conditioned 3D grasp-offset prediction,
trained on the **OneBox** synthetic Isaac-Sim dataset. The full learning
dynamics (prediction accuracy, flow geometry, and the
Jacobian-of-conditioning OOD signal) are studied in the
[Familiarity-Flow repo](https://github.com/Finding-Familiarity/Familiarity-Flow).
Intended primarily as the **conditioning-energy OOD-detection backend** for
robotic-policy gating, exposed through the
[familiarity-planner](https://github.com/tomnotch/familiarity-planner) package.
**This checkpoint comes from a 150,000-step extended-training study**
that explored flow / OOD-separation dynamics well past the conventional
convergence point. See
[`docs/long_run_analysis.md`](https://github.com/Finding-Familiarity/Familiarity-Flow/blob/main/docs/long_run_analysis.md)
in the repo for the full write-up (multi-descent behaviour observed, not
the monotone-plateau or terminal-collapse initially hypothesised).
---
## Checkpoint summary
| Field | Value |
|---|---|
| Architecture | `FlowMatchingPolicy`, 8 cross-attention layers |
| Vision encoder | DINOv2-B (ViT-B/14, frozen) |
| Action space | ℝ³ (3-DoF grasp offset) |
| Time sampling | Beta(1.5, 1) (π₀ schedule) |
| Training data | OneBox (synthetic Isaac Sim, ZED-Mini stereo) |
| Training steps | 128,250 (best val_loss checkpoint of 150k-step run) |
| Best val_loss | **0.0639** |
| Best val L2 error | **0.1462** |
| Parameters | 244 M total, 35.6 M trainable (encoder frozen) |
| License | MIT |
### OOD-separation at this checkpoint (step 128,250)
| Metric | ID | OOD (clutter) | WILD (real) | OOD/ID | WILD/ID |
|---|---|---|---|---|---|
| CE | 0.642 | 3.341 | 2.077 | **5.20×** | 3.23× |
| DCE | 0.062 | 0.303 | 0.186 | **4.87×** | 2.99× |
AUROC(ID vs OOD) and AUROC(ID vs WILD) are both **1.000** (rank-based
separation is perfect and has been since step ≈ 8k).
Reported directly from the training log at
`outputs/csv/onebox/version_15` in the repo.
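The AUROC quoted above is rank-based, so it can be reproduced from raw CE
scores with a plain Mann-Whitney count. A minimal sketch (the score lists
below are illustrative, not values from the training log):

```python
def auroc(id_scores, ood_scores):
    """Rank-based AUROC: P(OOD score > ID score), ties counted as 0.5."""
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            wins += 1.0 if o > i else 0.5 if o == i else 0.0
    return wins / (len(id_scores) * len(ood_scores))

# Well-separated scores (as in the CE table) give a perfect 1.000.
print(auroc([0.6, 0.7], [3.2, 3.5]))  # 1.0
```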
### vs the previous checkpoint (step 21,850, val_loss 0.0726)
Better or tied on all but one of the metrics we measured:
| | Previous | This checkpoint | Δ |
|---|---|---|---|
| val/loss | 0.0726 | **0.0639** | −12.0% |
| val/l2_error | 0.1755 | **0.1462** | −16.7% |
| ood/loss | 4.414 | 4.241 | −3.9% |
| ood/l2_error | 1.371 | 1.271 | −7.3% |
| CE WILD/ID | 2.79× | **3.23×** | +15.8% |
| DCE OOD/ID | 4.32× | **4.87×** | +12.7% |
| DCE WILD/ID | 2.41× | **2.99×** | +24.1% |
(CE OOD/ID drifted −2.1%, well inside the run-to-run variance observed
during the extended run.)
> **Threshold-shift note**: absolute CE/DCE values in this checkpoint
> are ~3× larger than in the previous one (CE_ID 0.225 → 0.642). A
> downstream OOD detector using an absolute threshold needs to be
> re-calibrated — ratios are preserved but the raw scale is not.
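One way to re-calibrate, sketched under the assumption that the scale shift
is ratio-preserving: rescale the deployed absolute threshold by the change
in the ID mean. The ID means are the CE_ID figures quoted above;
`old_threshold` is a hypothetical deployment setting, not a repo default.

```python
# Ratio-preserving re-calibration of an absolute CE threshold across checkpoints.
OLD_CE_ID = 0.225    # ID mean CE, previous checkpoint (from the note above)
NEW_CE_ID = 0.642    # ID mean CE, this checkpoint
old_threshold = 2.0  # hypothetical absolute CE threshold used downstream

new_threshold = old_threshold * (NEW_CE_ID / OLD_CE_ID)
print(round(new_threshold, 3))  # 5.707
```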
---
## Usage
### Download
```python
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="TomNotch/familiarity-flow-onebox-8L",
    filename="onebox_8L.ckpt",
)
```
### Load directly (Familiarity-Flow must be installed)
```python
from familiarity_flow.lightning.module import FlowMatchingModule
module = FlowMatchingModule.load_from_checkpoint(ckpt_path, map_location="cuda")
module.eval()
policy = module.ema_policy # EMA-averaged weights used for inference
```
### Score a batch for OOD-ness
```python
# images: list of stereo image tensors, each shaped (B, 3, 224, 224)
ce = policy.ood_score(images, num_steps=10) # shape: (B,)
# Higher CE = more OOD
```
### Via familiarity-planner
```python
from familiarity_planner.familiarity import Familiarity

fam = Familiarity(
    "conditioning_energy",
    checkpoint_path="TomNotch/familiarity-flow-onebox-8L",  # auto-downloaded
)
score = fam(stereo_observation) # smaller = more familiar
```
---
## Method
Conditional flow matching with linear interpolation and independent coupling
(Lipman et al., *ICLR 2023*). The **conditioning energy**
$$\mathrm{CE}(c) = \int_0^1 \left\lVert \frac{\partial v_\theta}{\partial c}(x_t, t, c) \right\rVert_F^2 \, \mathrm{d}t$$
is measured along the deterministic Euler ODE trajectory from noise
(`x_1 ∼ N(0, I)`) to the predicted action (`x_0`). Its endpoint-Jacobian
cousin DCE measures the squared Frobenius norm of `∂φ/∂c` where `φ` is
the full ODE map. Both scale as out-of-distribution inputs excite the
learned velocity field's sensitivity to conditioning — a signal that
falls out of the geometry of the flow without any auxiliary classifier.
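As a concrete illustration, the integral can be approximated along the Euler
rollout with one vector-Jacobian product per output dimension. This is a
hedged sketch on a toy velocity network, not the repo's `FlowMatchingPolicy`
API; `ToyVelocity`, the dimensions, and `num_steps` are all illustrative:

```python
import torch

class ToyVelocity(torch.nn.Module):
    """Stand-in for the flow-matching velocity network v_theta(x_t, t, c)."""
    def __init__(self, x_dim=3, c_dim=8):
        super().__init__()
        self.net = torch.nn.Linear(x_dim + 1 + c_dim, x_dim)

    def forward(self, x, t, c):
        return self.net(torch.cat([x, t, c], dim=-1))

def conditioning_energy(v, c, x_dim=3, num_steps=10):
    """CE(c) ~ sum over Euler steps of ||dv/dc||_F^2 * dt along x_1 -> x_0."""
    dt = 1.0 / num_steps
    x = torch.randn(c.shape[0], x_dim)           # x_1 ~ N(0, I)
    ce = torch.zeros(c.shape[0])
    for k in range(num_steps):
        t = torch.full((c.shape[0], 1), 1.0 - k * dt)
        c_req = c.detach().requires_grad_(True)
        vel = v(x, t, c_req)
        # Squared Frobenius norm of the per-sample Jacobian dv/dc,
        # accumulated one output dimension at a time via autograd.
        sq = torch.zeros(c.shape[0])
        for i in range(vel.shape[-1]):
            (g,) = torch.autograd.grad(vel[:, i].sum(), c_req, retain_graph=True)
            sq += (g ** 2).sum(dim=-1)
        ce += sq * dt
        x = (x - dt * vel).detach()              # Euler step toward x_0
    return ce
```

Because the integrand is a squared norm, the score is non-negative by
construction; OOD inputs show up as larger accumulated sensitivity.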
---
## Limitations
- Trained on a **single synthetic domain** (OneBox Isaac Sim renderings).
Generalisation across robots, object sets, or camera rigs is **not**
claimed.
- Action head predicts only a 3-DoF grasp offset; not a full pose or
trajectory.
- OOD-detection quality (CE/DCE) is strong on the OneBox `clutter` and
`wild` eval sets used during training — behaviour on arbitrary
out-of-domain inputs is untested.
- **Not for deployment on physical robots** without independent
validation. Intended as a research artefact and as a concrete
backend for methodology study.
---
## Related work
- Lipman et al., *Flow Matching for Generative Modeling*, ICLR 2023
([arXiv:2210.02747](https://arxiv.org/abs/2210.02747))
- Black et al., *π₀: A Vision-Language-Action Flow Model for General
Robot Control* ([arXiv:2410.24164](https://arxiv.org/abs/2410.24164))
- Chen et al., *Neural Ordinary Differential Equations*, NeurIPS 2018
([arXiv:1806.07366](https://arxiv.org/abs/1806.07366))
- Liu et al., *Simple and Principled Uncertainty Estimation with
  Deterministic Deep Learning via Distance Awareness* (SNGP),
  NeurIPS 2020 ([arXiv:2006.10108](https://arxiv.org/abs/2006.10108))
- Nakkiran et al., *Deep Double Descent*, ICLR 2020
([arXiv:1912.02292](https://arxiv.org/abs/1912.02292))
---
## Author
Mukai (Tom Notch) Yu — Carnegie Mellon University, Robotics Institute.
Course project for 16-832 / 16-761 (Spring 2026).