license: apache-2.0
base_model: robometer/Robometer-4B
tags:
  - reward-model
  - robotics
  - vision-language-action
  - openral
  - bitsandbytes
  - nf4
library_name: openral
pipeline_tag: robotics

rskill-robometer-4b-nf4

Pre-quantized NF4 build of robometer/Robometer-4B (a Qwen3-VL-4B robotic reward foundation model, arXiv 2603.02115), packaged as an OpenRAL kind: reward rSkill (ADR-0057).

It runs in parallel with a VLA policy and scores the live rollout: given the robot's camera frames and the task instruction, it emits a per-frame normalized progress value (0–1) and a per-frame success probability. The OpenRAL reasoner polls it on demand through the read-only query_task_progress tool to decide whether to continue, advance, or replan. Its output is advisory only and never sits on the control path.
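As an illustration of how a consumer might turn the two per-frame signals into one of the three advisory outcomes, here is a minimal sketch. It is not the OpenRAL reasoner's logic: the RewardReading type, the advise function, and all thresholds (success_threshold, stall_window, min_gain) are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardReading:
    """Hypothetical container for one poll of the reward model."""
    progress: Sequence[float]  # per-frame normalized progress in [0, 1]
    success: Sequence[float]   # per-frame success probability in [0, 1]


def advise(
    reading: RewardReading,
    success_threshold: float = 0.9,  # assumed value, not from OpenRAL
    stall_window: int = 8,           # assumed value, not from OpenRAL
    min_gain: float = 0.02,          # assumed value, not from OpenRAL
) -> str:
    """Return "advance", "replan", or "continue" as an advisory hint."""
    if not reading.progress or len(reading.progress) != len(reading.success):
        raise ValueError("progress and success must be non-empty and equal length")

    # The latest frame looks successful: suggest moving to the next step.
    if reading.success[-1] >= success_threshold:
        return "advance"

    # Progress has barely moved over a full window: suggest replanning.
    recent = reading.progress[-stall_window:]
    if len(recent) == stall_window and recent[-1] - recent[0] < min_gain:
        return "replan"

    # Otherwise let the policy keep going.
    return "continue"
```

The function only returns a label and has no side effects, which mirrors the advisory-only role described above: whatever consumes the label stays responsible for acting on it.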

What's in this repo

A self-contained checkpoint that the OpenRAL reward sidecar loads directly in 4-bit, with no bf16 materialization and no requantization:

model.safetensors — 236 Linear modules packed to bitsandbytes NF4 (~3.32 GB resident), plus the folded non-persistent rotary inv_freq buffers.
config.json — model config (resized vocab 151674).
config.yaml — the robometer ExperimentConfig (lets the sidecar rebuild the RBM graph offline).
tokenizer and processor files, including added_tokens.json (the model's added progress token).
quantization_metadata.json — provenance.

The model class is RBM (robometer.models.rbm). The upstream config.json advertises architectures: ["RFM"] with no auto_map, so vanilla transformers.AutoModel cannot load it. The OpenRAL sidecar installs the pinned robometer package (commit a669dffc) with transformers==4.57.1 in an isolated venv, builds the model skeleton on the meta device, and then installs these packed NF4 weights via Params4bit.from_prequantized.

Provenance & verification

Source: robometer/Robometer-4B @ beef63bc914c5c189329d49c6d712d96d632aa34 (Apache-2.0).
Quantization: bitsandbytes NF4 with double quantization and bf16 compute dtype, applied under the OpenRAL rule nn.Linear.numel ≥ 4e6 → Linear4bit. Built by tools/build_robometer_nf4_checkpoint.py.
Bit-identical to loading the upstream bf16 weights and quantizing in place: the same-process forward difference is max|Δ| = 0, and the 4-bit dequantization round-trip difference is 0. For a byte-stable reward ramp across process launches, the sidecar pins the math SDP kernel and sets use_deterministic_algorithms(True), CUBLAS_WORKSPACE_CONFIG=:4096:8, and cudnn.allow_tf32=False.
Footprint: ~3.32 GB resident, sized to fit on an 8 GB GPU alongside the sim (and a small NF4 VLA). The reward forward pass subsamples the frame window to bound activation memory.
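The size-based selection rule in the Quantization entry can be sketched in a few lines. This is an illustration only, not the code in tools/build_robometer_nf4_checkpoint.py: it assumes "numel" means the element count of the Linear weight matrix (bias ignored), and the function names, layer names, and shapes below are hypothetical rather than read from the checkpoint.

```python
from typing import Dict, Tuple

# The "4e6" threshold from the rule above, as an integer element count.
NUMEL_THRESHOLD = 4_000_000


def weight_numel(shape: Tuple[int, int]) -> int:
    """Element count of a Linear weight of shape (out_features, in_features)."""
    out_features, in_features = shape
    return out_features * in_features


def select_for_nf4(
    linear_shapes: Dict[str, Tuple[int, int]],
    threshold: int = NUMEL_THRESHOLD,
) -> Dict[str, bool]:
    """Map each layer name to True if it would be packed to 4-bit under the rule."""
    return {
        name: weight_numel(shape) >= threshold
        for name, shape in linear_shapes.items()
    }


# Hypothetical layers, chosen only to exercise both sides of the threshold.
example_shapes = {
    "large_proj": (2560, 2560),     # 6,553,600 elements: at or above threshold
    "small_proj": (1024, 2560),     # 2,621,440 elements: below threshold
    "boundary_proj": (2000, 2000),  # exactly 4,000,000 elements: included
}
selection = select_for_nf4(example_shapes)
```

Because the comparison is inclusive, a layer with exactly 4,000,000 weight elements is selected, while smaller layers would stay in their original dtype under this reading of the rule.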

Usage (OpenRAL)

This is consumed by OpenRAL, not loaded standalone. The kind: reward manifest points weights_uri here:

weights_uri: "hf://OpenRAL/rskill-robometer-4b-nf4"

and in deploy-sim:

openral deploy sim --config scenes/deploy/<scene>.yaml --enable-reward-monitor

brings up the reward monitor in parallel with the VLA and lets the reasoner poll /openral/perception/query_task_progress. See ADR-0057.

License

Apache-2.0, inherited from the upstream robometer/Robometer-4B. See LICENSE. The upstream robometer package is pinned by commit and executed only in an isolated sidecar venv, because robometer is not an OpenRAL-trusted org.