---
license: apache-2.0
base_model: robometer/Robometer-4B
tags:
  - reward-model
  - robotics
  - vision-language-action
  - openral
  - bitsandbytes
  - nf4
library_name: openral
pipeline_tag: robotics
---

# rskill-robometer-4b-nf4

Pre-quantized **NF4** build of [`robometer/Robometer-4B`](https://huggingface.co/robometer/Robometer-4B)
(a Qwen3-VL-4B robotic **reward foundation model**, arXiv 2603.02115), packaged as an
[OpenRAL](https://github.com/OpenRAL/openral) **`kind: reward`** rSkill (ADR-0057).

It runs **in parallel with a VLA policy** and scores the live rollout: given the
robot's camera frames + the task instruction, it emits **per-frame normalized
progress (0–1)** and **per-frame success probability**. The OpenRAL reasoner polls
it on demand (read-only `query_task_progress` tool) to decide whether to continue,
advance, or replan — **advisory only**, never on the control path.
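
As a rough illustration of how a caller might turn the two per-frame signals into a continue / advance / replan hint, here is a minimal sketch. The `RewardReading` type, the `advise` function, and every threshold below are hypothetical and are not part of OpenRAL or `robometer`; the real reasoner's policy may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardReading:
    """Hypothetical container for one poll of the reward monitor."""
    progress: Sequence[float]  # per-frame normalized progress in [0, 1]
    success: Sequence[float]   # per-frame success probability in [0, 1]


def advise(
    reading: RewardReading,
    success_threshold: float = 0.9,  # assumed value, tune per task
    stall_window: int = 8,           # assumed number of frames to inspect
    stall_epsilon: float = 0.02,     # assumed minimum progress gain
) -> str:
    """Return an advisory hint: 'advance', 'replan', or 'continue'."""
    if not reading.progress or len(reading.progress) != len(reading.success):
        raise ValueError("progress and success must be non-empty and equal length")

    # The latest frame looks successful: the current step can be closed out.
    if reading.success[-1] >= success_threshold:
        return "advance"

    # A full window with almost no progress gain is treated as a stall.
    recent = reading.progress[-stall_window:]
    if len(recent) == stall_window and recent[-1] - recent[0] < stall_epsilon:
        return "replan"

    return "continue"
```

The function only returns a string and never issues a command, which mirrors the advisory-only role described above.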

## What's in this repo

A self-contained checkpoint that the OpenRAL reward sidecar loads **directly as
4-bit**, with no bf16 materialization and no requantization:

- `model.safetensors` — 236 `Linear` modules packed to bitsandbytes NF4 (~3.32 GB
  resident), plus the folded non-persistent rotary `inv_freq` buffers.
- `config.json` — model config (resized vocab 151674).
- `config.yaml` — the `robometer` `ExperimentConfig` (lets the sidecar rebuild the
  `RBM` graph offline).
- tokenizer / processor files (incl. `added_tokens.json` — the model's added
  progress token).
- `quantization_metadata.json` — provenance.
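
To sanity-check the checkpoint without loading any weights, the safetensors header can be read on its own: the file starts with an 8-byte little-endian length followed by a JSON header. The sketch below counts tensors and packed NF4 modules. The key suffix `.weight.quant_state.bitsandbytes__nf4` is an assumption based on the naming commonly seen in bitsandbytes 4-bit checkpoints; this repo's actual key layout may differ, and the helper names are hypothetical.

```python
import json
import struct
from typing import BinaryIO

# Assumed key suffix for a packed NF4 module; verify against the real file.
NF4_SUFFIX = ".weight.quant_state.bitsandbytes__nf4"


def read_safetensors_header(stream: BinaryIO) -> dict:
    """Parse only the JSON header of a safetensors file."""
    prefix = stream.read(8)
    if len(prefix) != 8:
        raise ValueError("file too short to contain a safetensors header")
    (header_len,) = struct.unpack("<Q", prefix)  # little-endian uint64
    return json.loads(stream.read(header_len).decode("utf-8"))


def summarize_header(header: dict) -> dict:
    """Count tensor entries and entries that look like packed NF4 modules."""
    tensors = {k: v for k, v in header.items() if k != "__metadata__"}
    nf4 = [k for k in tensors if k.endswith(NF4_SUFFIX)]
    return {"tensors": len(tensors), "nf4_modules": len(nf4)}
```

Usage would look like `summarize_header(read_safetensors_header(open("model.safetensors", "rb")))`; if the suffix assumption holds, `nf4_modules` should match the module count listed above.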

> The model **class is `RBM`** (`robometer.models.rbm`). The upstream
> `config.json` advertises `architectures: ["RFM"]` with **no `auto_map`**, so
> vanilla `transformers.AutoModel` cannot load it. Instead, the OpenRAL sidecar installs the
> pinned `robometer` package (commit `a669dffc`) with **`transformers==4.57.1`** in
> an isolated venv, builds the model skeleton on the `meta` device, and then loads these
> packed NF4 weights via `Params4bit.from_prequantized`.

## Provenance & verification

- **Source:** `robometer/Robometer-4B` @ `beef63bc914c5c189329d49c6d712d96d632aa34` (Apache-2.0).
- **Quantization:** bitsandbytes NF4 (double-quant), compute dtype bf16, the OpenRAL
  rule `nn.Linear.numel ≥ 4e6 → Linear4bit`. Built by
  [`tools/build_robometer_nf4_checkpoint.py`](https://github.com/OpenRAL/openral/blob/master/tools/build_robometer_nf4_checkpoint.py).
- **Bit-identical** to loading the upstream bf16 weights and quantizing in place:
  same-process forward `max|Δ| = 0`, and the 4-bit dequant round-trip difference is `0`. For a byte-stable
  reward ramp across process launches, the sidecar pins the math SDP kernel and sets
  `torch.use_deterministic_algorithms(True)`, `CUBLAS_WORKSPACE_CONFIG=:4096:8`, and
  `torch.backends.cudnn.allow_tf32=False`.
- **Footprint:** ~3.32 GB resident, which fits on an 8 GB GPU alongside the sim (and a
  small NF4 VLA). The reward forward pass subsamples the frame window to bound activation memory.
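
The quantization rule above can be sketched as a simple size filter. This is an illustration only, not the code in the build tool: it assumes `numel` means the element count of the layer's weight matrix (bias excluded), and the `LinearSpec` type, function names, and example shapes are hypothetical.

```python
from typing import Iterable, List, NamedTuple

# Threshold from the rule above: weights with at least 4e6 elements go to 4-bit.
QUANT_NUMEL_THRESHOLD = 4_000_000


class LinearSpec(NamedTuple):
    """Hypothetical description of one linear layer."""
    name: str
    in_features: int
    out_features: int


def should_quantize(spec: LinearSpec) -> bool:
    """True when the weight element count meets the threshold (inclusive)."""
    return spec.in_features * spec.out_features >= QUANT_NUMEL_THRESHOLD


def select_for_nf4(specs: Iterable[LinearSpec]) -> List[str]:
    """Names of layers that the rule would convert to 4-bit."""
    return [s.name for s in specs if should_quantize(s)]
```

Under this reading, large projection matrices are packed to NF4 while small layers such as narrow output heads stay in their original dtype.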

## Usage (OpenRAL)

This is consumed by OpenRAL, not loaded standalone. The `kind: reward` manifest
points `weights_uri` here:

```yaml
weights_uri: "hf://OpenRAL/rskill-robometer-4b-nf4"
```

and in deploy-sim:

```bash
openral deploy sim --config scenes/deploy/<scene>.yaml --enable-reward-monitor
```

brings up the reward monitor in parallel with the VLA and lets the reasoner poll
`/openral/perception/query_task_progress`. See
[ADR-0057](https://github.com/OpenRAL/openral/blob/master/docs/adr/0057-robometer-reward-rskill.md).

## License

Apache-2.0, inherited from the upstream `robometer/Robometer-4B`. See `LICENSE`.
The upstream `robometer` package is pinned by commit and executed only in an
isolated sidecar venv, because `robometer` is not an OpenRAL-trusted org.