NF4 pre-quantized Robometer-4B reward rSkill (ADR-0057): bit-identical, meta-loadable

	---
	license: apache-2.0
	base_model: robometer/Robometer-4B
	tags:
	- reward-model
	- robotics
	- vision-language-action
	- openral
	- bitsandbytes
	- nf4
	library_name: openral
	pipeline_tag: robotics
	---

	# rskill-robometer-4b-nf4

	Pre-quantized NF4 build of [`robometer/Robometer-4B`](https://huggingface.co/robometer/Robometer-4B)
	(a Qwen3-VL-4B robotic reward foundation model, arXiv 2603.02115), packaged as an
	[OpenRAL](https://github.com/OpenRAL/openral) `kind: reward` rSkill (ADR-0057).

	It runs in parallel with a VLA policy and scores the live rollout: given the
	robot's camera frames and the task instruction, it emits **per-frame normalized
	progress (0–1) and per-frame success probability**. The OpenRAL reasoner polls
	it on demand through the read-only `query_task_progress` tool to decide whether to
	continue, advance, or replan. Its output is advisory only and never sits on the control path.
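
As a rough illustration of how a consumer might turn the two per-frame signals into a continue / advance / replan choice, here is a minimal sketch. The function name, the thresholds, and the stall heuristic are all hypothetical and are not taken from OpenRAL; the actual reasoner may weigh these signals quite differently.

```python
from enum import Enum
from typing import Sequence


class Decision(Enum):
    CONTINUE = "continue"
    ADVANCE = "advance"
    REPLAN = "replan"


def decide(
    progress: Sequence[float],
    success: Sequence[float],
    success_threshold: float = 0.9,  # hypothetical value
    stall_window: int = 8,           # hypothetical value, in frames
    min_gain: float = 0.02,          # hypothetical value
) -> Decision:
    """Map per-frame progress (0-1) and success probability to an advisory decision."""
    if not progress or len(progress) != len(success):
        raise ValueError("progress and success must be non-empty and equally long")

    # The latest frame looks successful: move on to the next subtask.
    if success[-1] >= success_threshold:
        return Decision.ADVANCE

    # Progress has barely moved over the last `stall_window` frames: suggest a replan.
    if len(progress) >= stall_window:
        gain = progress[-1] - progress[-stall_window]
        if gain < min_gain:
            return Decision.REPLAN

    return Decision.CONTINUE
```

Because the result is only advisory, a caller would treat it as a hint to the reasoner rather than feed it into the control loop.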

	## What's in this repo

	A self-contained checkpoint that the OpenRAL reward sidecar loads **directly as
	4-bit**, with no bf16 materialization and no requantization:

	- `model.safetensors` — 236 `Linear` modules packed to bitsandbytes NF4 (~3.32 GB
	resident), plus the folded non-persistent rotary `inv_freq` buffers.
	- `config.json` — model config (resized vocab 151674).
	- `config.yaml` — the `robometer` `ExperimentConfig` (lets the sidecar rebuild the
	`RBM` graph offline).
	- tokenizer / processor files (incl. `added_tokens.json` — the model's added
	progress token).
	- `quantization_metadata.json` — provenance.
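
To check what a downloaded `model.safetensors` actually contains without loading any weights, the file header can be read with the standard library alone: a safetensors file starts with an 8-byte little-endian length followed by a JSON header. The sketch below is illustrative only. The helper names are hypothetical, and the `.quant_state.bitsandbytes__nf4` key suffix is an assumption about how bitsandbytes serializes packed NF4 layers; verify it against the real file before relying on the count.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path: str) -> dict:
    """Return the tensor index of a safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte little-endian size
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def dtype_histogram(header: dict) -> dict:
    """Count tensors per stored dtype (packed 4-bit weights are stored as U8)."""
    return dict(Counter(info["dtype"] for info in header.values()))


def count_nf4_modules(header: dict) -> int:
    """Count layers carrying an NF4 quant-state entry (key suffix is an assumption)."""
    suffix = ".quant_state.bitsandbytes__nf4"
    return sum(1 for name in header if name.endswith(suffix))
```

If the key-naming assumption holds, `count_nf4_modules(read_safetensors_header("model.safetensors"))` should match the 236 packed `Linear` modules listed above.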

	> The model class is `RBM` (`robometer.models.rbm`) — the upstream
	> `config.json` advertises `architectures: ["RFM"]` with no `auto_map`, so
	> vanilla `transformers.AutoModel` cannot load it. The OpenRAL sidecar installs the
	> pinned `robometer` package (commit `a669dffc`) with `transformers==4.57.1` in
	> an isolated venv, builds the skeleton on the `meta` device, and then loads these
	> packed NF4 weights via `Params4bit.from_prequantized`.

	## Provenance & verification

	- Source: `robometer/Robometer-4B` @ `beef63bc914c5c189329d49c6d712d96d632aa34` (Apache-2.0).
	- Quantization: bitsandbytes NF4 (double-quant), compute dtype bf16, following the OpenRAL
	rule `nn.Linear.numel ≥ 4e6 → Linear4bit`. Built by
	[`tools/build_robometer_nf4_checkpoint.py`](https://github.com/OpenRAL/openral/blob/master/tools/build_robometer_nf4_checkpoint.py).
	- Bit-identical to loading the upstream bf16 weights and quantizing in place:
	same-process forward `max|Δ| = 0`; 4-bit dequant round-trip difference `0`. For a byte-stable
	reward ramp across process launches, the sidecar pins the math SDP kernel and sets
	`use_deterministic_algorithms(True)`, `CUBLAS_WORKSPACE_CONFIG=:4096:8`, and
	`cudnn.allow_tf32=False`.
	- Footprint: ~3.32 GB resident on an 8 GB GPU; co-resident with the sim (and a
	small NF4 VLA). The reward forward pass subsamples the frame window to bound activation memory.
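
The size-based selection rule from the quantization bullet can be sketched in a few lines. This is not the code in `tools/build_robometer_nf4_checkpoint.py`: the class and function names are hypothetical, "numel" is assumed to mean the weight matrix only (bias excluded), and the storage estimate assumes block sizes of 64 for the first-level scales and 256 for the nested scales, ignoring small fixed-size codebooks.

```python
from dataclasses import dataclass

NUMEL_THRESHOLD = 4_000_000  # the "numel ≥ 4e6" cut-off from the rule


@dataclass(frozen=True)
class LinearShape:
    name: str  # hypothetical module path, for illustration only
    in_features: int
    out_features: int

    @property
    def weight_numel(self) -> int:
        # Assumption: only the weight matrix counts toward "numel".
        return self.in_features * self.out_features


def should_quantize(layer: LinearShape, threshold: int = NUMEL_THRESHOLD) -> bool:
    """True if the layer is large enough to be swapped for a 4-bit layer."""
    return layer.weight_numel >= threshold


def nf4_weight_bytes(numel: int, blocksize: int = 64, nested_blocksize: int = 256) -> float:
    """Rough storage estimate for one NF4 weight with double quantization."""
    packed = numel / 2                                   # two 4-bit codes per byte
    absmax_8bit = numel / blocksize                      # one 8-bit scale per block
    nested = 4 * numel / (blocksize * nested_blocksize)  # one fp32 scale per 256 blocks
    return packed + absmax_8bit + nested
```

Under these assumptions the estimate comes to roughly 4.13 bits per quantized weight. Layers below the threshold stay in their original dtype, which is one reason the resident footprint is larger than a naive "parameters × 0.5 bytes" figure.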

	## Usage (OpenRAL)

	This is consumed by OpenRAL, not loaded standalone. The `kind: reward` manifest
	points `weights_uri` here:

	```yaml
	weights_uri: "hf://OpenRAL/rskill-robometer-4b-nf4"
	```

	and in deploy-sim:

	```bash
	openral deploy sim --config scenes/deploy/<scene>.yaml --enable-reward-monitor
	```

	brings up the reward monitor in parallel with the VLA and lets the reasoner poll
	`/openral/perception/query_task_progress`. See
	[ADR-0057](https://github.com/OpenRAL/openral/blob/master/docs/adr/0057-robometer-reward-rskill.md).

	## License

	Apache-2.0, inherited from the upstream `robometer/Robometer-4B`. See `LICENSE`.
	The upstream `robometer` package is pinned by commit and executed only in an
	isolated sidecar venv (`robometer` is not an OpenRAL-trusted org).