---
license: apache-2.0
base_model: robometer/Robometer-4B
tags:
- reward-model
- robotics
- vision-language-action
- openral
- bitsandbytes
- nf4
library_name: openral
pipeline_tag: robotics
---

# rskill-robometer-4b-nf4

Pre-quantized **NF4** build of [`robometer/Robometer-4B`](https://huggingface.co/robometer/Robometer-4B) (a Qwen3-VL-4B robotic **reward foundation model**, arXiv 2603.02115), packaged as an [OpenRAL](https://github.com/OpenRAL/openral) **`kind: reward`** rSkill (ADR-0057).

It runs **in parallel with a VLA policy** and scores the live rollout: given the robot's camera frames + the task instruction, it emits **per-frame normalized progress (0–1)** and **per-frame success probability**. The OpenRAL reasoner polls it on demand (read-only `query_task_progress` tool) to decide whether to continue, advance, or replan. It is **advisory only**, never on the control path.

## What's in this repo

A self-contained checkpoint that the OpenRAL reward sidecar loads **directly as 4-bit**, with no bf16 materialization and no requantization:

- `model.safetensors`: 236 `Linear` modules packed to bitsandbytes NF4 (~3.32 GB resident), plus the folded non-persistent rotary `inv_freq` buffers.
- `config.json`: model config (resized vocab 151674).
- `config.yaml`: the `robometer` `ExperimentConfig` (lets the sidecar rebuild the `RBM` graph offline).
- tokenizer / processor files (incl. `added_tokens.json`, the model's added progress token).
- `quantization_metadata.json`: provenance.

> The model **class is `RBM`** (`robometer.models.rbm`). The upstream
> `config.json` advertises `architectures: ["RFM"]` with **no `auto_map`**, so
> vanilla `transformers.AutoModel` cannot load it. The OpenRAL sidecar installs the
> pinned `robometer` package (commit `a669dffc`) with **`transformers==4.57.1`** in
> an isolated venv and builds the skeleton on the `meta` device, then installs these
> packed NF4 weights via `Params4bit.from_prequantized`.

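
As a quick sanity check on a local copy of the files listed in this section, a minimal sketch along the following lines could be used. It is not part of OpenRAL: `check_snapshot` and `find_vocab_size` are hypothetical helper names, and the location of the `vocab_size` key inside `config.json` (top level or nested under `text_config`) is an assumption, since this card only states the resized vocabulary figure.

```python
import json
from pathlib import Path

# File names explicitly listed in this card. The tokenizer / processor files
# are not enumerated here, so only the named ones are checked.
EXPECTED_FILES = (
    "model.safetensors",
    "config.json",
    "config.yaml",
    "added_tokens.json",
    "quantization_metadata.json",
)
EXPECTED_VOCAB_SIZE = 151674  # "resized vocab" figure quoted in this card


def find_vocab_size(config: dict):
    """Return vocab_size from the top level or from a nested `text_config`.

    Assumption: the key is called `vocab_size`; where it lives in this
    checkpoint's config.json is not stated in the card.
    """
    if "vocab_size" in config:
        return config["vocab_size"]
    text_config = config.get("text_config")
    if isinstance(text_config, dict):
        return text_config.get("vocab_size")
    return None


def check_snapshot(snapshot_dir) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    root = Path(snapshot_dir)
    problems = [
        f"missing file: {name}"
        for name in EXPECTED_FILES
        if not (root / name).is_file()
    ]
    config_path = root / "config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        vocab = find_vocab_size(config)
        if vocab != EXPECTED_VOCAB_SIZE:
            problems.append(
                f"vocab_size is {vocab!r}, expected {EXPECTED_VOCAB_SIZE}"
            )
    return problems
```

Calling `check_snapshot("path/to/local/snapshot")` returns an empty list when every named file is present and the vocabulary size matches. It does not open `model.safetensors` or validate the packed NF4 tensors.
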
## Provenance & verification

- **Source:** `robometer/Robometer-4B` @ `beef63bc914c5c189329d49c6d712d96d632aa34` (Apache-2.0).
- **Quantization:** bitsandbytes NF4 (double-quant), compute dtype bf16, using the OpenRAL rule `nn.Linear.numel ≥ 4e6 → Linear4bit`. Built by [`tools/build_robometer_nf4_checkpoint.py`](https://github.com/OpenRAL/openral/blob/master/tools/build_robometer_nf4_checkpoint.py).
- **Bit-identical** to loading the upstream bf16 weights and quantizing in place: same-process forward `max|Δ| = 0`; 4-bit dequant round-trip `0`. For a byte-stable reward ramp across process launches, the sidecar pins the math SDP kernel + `use_deterministic_algorithms(True)` + `CUBLAS_WORKSPACE_CONFIG=:4096:8` + `cudnn.allow_tf32=False`.
- **Footprint:** ~3.32 GB resident on an 8 GB GPU; co-resident with the sim (and a small NF4 VLA). The reward forward subsamples the frame window to bound activation memory.

## Usage (OpenRAL)

This is consumed by OpenRAL, not loaded standalone. The `kind: reward` manifest points `weights_uri` here:

```yaml
weights_uri: "hf://OpenRAL/rskill-robometer-4b-nf4"
```

In deploy-sim, the following command brings up the reward monitor in parallel with the VLA and lets the reasoner poll `/openral/perception/query_task_progress`:

```bash
openral deploy sim --config scenes/deploy/.yaml --enable-reward-monitor
```

See [ADR-0057](https://github.com/OpenRAL/openral/blob/master/docs/adr/0057-robometer-reward-rskill.md).

## License

Apache-2.0, inherited from the upstream `robometer/Robometer-4B`. See `LICENSE`. The upstream `robometer` package is pinned by commit and executed only in an isolated sidecar venv (it is not an OpenRAL-trusted org).
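
## Appendix: quantization rule sketch

The size rule quoted under "Provenance & verification" (`nn.Linear.numel ≥ 4e6 → Linear4bit`) can be illustrated with a small, dependency-free sketch. This is not the code in `tools/build_robometer_nf4_checkpoint.py`. The function names, layer names, and shapes below are hypothetical, and it assumes that "numel" counts the weight matrix only (bias excluded).

```python
from typing import Dict, Tuple

NUMEL_THRESHOLD = 4_000_000  # the "4e6" in the rule quoted in this card


def should_quantize(in_features: int, out_features: int) -> bool:
    """Apply the size rule: weight element count >= 4e6 means pack as 4-bit.

    Assumption: only the weight matrix is counted, not the bias.
    """
    return in_features * out_features >= NUMEL_THRESHOLD


def plan_quantization(layers: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """Map each layer name to the module type the rule would select."""
    return {
        name: "Linear4bit" if should_quantize(i, o) else "Linear"
        for name, (i, o) in layers.items()
    }


# Hypothetical names and shapes, chosen to land on both sides of the threshold.
example_layers = {
    "mlp.up_proj": (2560, 9728),   # 24,903,680 elements: above the threshold
    "attn.k_proj": (2560, 1024),   # 2,621,440 elements: below the threshold
    "boundary": (2000, 2000),      # exactly 4,000,000: included because of ">="
}
plan = plan_quantization(example_layers)
```

Under these assumptions, small projections stay as regular `Linear` modules while large ones are packed to NF4, which would be consistent with the card reporting a specific count of packed modules (236) rather than every `Linear` in the model.
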