---
license: apache-2.0
base_model: robometer/Robometer-4B
tags:
- reward-model
- robotics
- vision-language-action
- openral
- bitsandbytes
- nf4
library_name: openral
pipeline_tag: robotics
---

# rskill-robometer-4b-nf4

Pre-quantized **NF4** build of [`robometer/Robometer-4B`](https://huggingface.co/robometer/Robometer-4B)
(a Qwen3-VL-4B robotic **reward foundation model**, arXiv 2603.02115), packaged as an
[OpenRAL](https://github.com/OpenRAL/openral) **`kind: reward`** rSkill (ADR-0057).

It runs **in parallel with a VLA policy** and scores the live rollout: given the
robot's camera frames + the task instruction, it emits **per-frame normalized
progress (0–1)** and **per-frame success probability**. The OpenRAL reasoner polls
it on demand (read-only `query_task_progress` tool) to decide whether to continue,
advance, or replan. Its output is **advisory only** and never on the control path.
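
To make the continue / advance / replan decision concrete, here is a minimal sketch of how a caller might turn the two per-frame signals into advice. It is not the OpenRAL reasoner's actual logic: the `RewardReading` type, the `advise` function, and all three thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardReading:
    """Per-frame reward outputs (hypothetical container, not an OpenRAL type)."""
    progress: Sequence[float]      # normalized progress per frame, in [0, 1]
    success_prob: Sequence[float]  # success probability per frame, in [0, 1]


def advise(reading: RewardReading,
           success_threshold: float = 0.9,  # assumed value
           stall_window: int = 5,           # assumed value
           min_gain: float = 0.02) -> str:  # assumed value
    """Return 'advance', 'replan', or 'continue' from the latest readings."""
    if not reading.progress or not reading.success_prob:
        return "continue"  # nothing scored yet, keep the current plan
    if reading.success_prob[-1] >= success_threshold:
        return "advance"   # latest frame looks successful
    recent = reading.progress[-stall_window:]
    # Progress gained over a full window is below min_gain: treat as stalled.
    if len(recent) == stall_window and recent[-1] - recent[0] < min_gain:
        return "replan"
    return "continue"
```

Because the result is only a string handed back to the caller, nothing here touches the control path, which matches the advisory-only role described above.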

## What's in this repo

A self-contained checkpoint that the OpenRAL reward sidecar loads **directly as
4-bit**, with no bf16 materialization and no requantization:

- `model.safetensors`: 236 `Linear` modules packed to bitsandbytes NF4 (~3.32 GB
  resident), plus the folded non-persistent rotary `inv_freq` buffers.
- `config.json`: model config (resized vocab 151674).
- `config.yaml`: the `robometer` `ExperimentConfig` (lets the sidecar rebuild the
  `RBM` graph offline).
- tokenizer / processor files (incl. `added_tokens.json`, the model's added
  progress token).
- `quantization_metadata.json`: provenance.
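
As a quick sanity check on a downloaded copy, the sketch below reports which of the files named in the list above are absent from a local directory. The helper name `missing_files` is hypothetical, and the remaining tokenizer / processor files are not checked because the list does not name them individually.

```python
from pathlib import Path

# File names taken from the list above; other tokenizer/processor files are
# not enumerated there, so they are deliberately left out of this check.
EXPECTED_FILES = (
    "model.safetensors",
    "config.json",
    "config.yaml",
    "added_tokens.json",
    "quantization_metadata.json",
)


def missing_files(snapshot_dir: str) -> list[str]:
    """Return the expected file names that are not present in snapshot_dir."""
    root = Path(snapshot_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty return value means every named file was found. It says nothing about whether the contents are valid.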

> The model **class is `RBM`** (`robometer.models.rbm`). The upstream
> `config.json` advertises `architectures: ["RFM"]` with **no `auto_map`**, so
> vanilla `transformers.AutoModel` cannot load it. The OpenRAL sidecar installs the
> pinned `robometer` package (commit `a669dffc`) with **`transformers==4.57.1`** in
> an isolated venv, builds the skeleton on the `meta` device, and then installs these
> packed NF4 weights via `Params4bit.from_prequantized`.

## Provenance & verification

- **Source:** `robometer/Robometer-4B` @ `beef63bc914c5c189329d49c6d712d96d632aa34` (Apache-2.0).
- **Quantization:** bitsandbytes NF4 (double-quant), compute dtype bf16, using the OpenRAL
  rule `nn.Linear.numel ≥ 4e6 → Linear4bit`. Built by
  [`tools/build_robometer_nf4_checkpoint.py`](https://github.com/OpenRAL/openral/blob/master/tools/build_robometer_nf4_checkpoint.py).
- **Bit-identical** to loading the upstream bf16 weights and quantizing in place:
  same-process forward `max|Δ| = 0`; 4-bit dequant round-trip `0`. For a byte-stable
  reward ramp across process launches, the sidecar pins the math SDP kernel +
  `use_deterministic_algorithms(True)` + `CUBLAS_WORKSPACE_CONFIG=:4096:8` +
  `cudnn.allow_tf32=False`.
- **Footprint:** ~3.32 GB resident on an 8 GB GPU; co-resident with the sim (and a
  small NF4 VLA). The reward forward subsamples the frame window to bound activation memory.
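
The size-based quantization rule can be illustrated without any GPU libraries. The sketch below assumes that `numel` means the element count of the weight matrix only (`in_features * out_features`, bias excluded); the function names and the example layer shapes are hypothetical and are not taken from the build script.

```python
QUANT_THRESHOLD = 4_000_000  # the 4e6 element threshold from the rule above


def should_quantize(in_features: int, out_features: int,
                    threshold: int = QUANT_THRESHOLD) -> bool:
    """True if a Linear layer of this shape meets the size threshold.

    Assumption: only weight elements are counted; the bias is ignored.
    """
    return in_features * out_features >= threshold


def plan_layers(shapes: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Map each layer name to 'Linear4bit' or 'Linear' under the rule."""
    return {
        name: "Linear4bit" if should_quantize(*shape) else "Linear"
        for name, shape in shapes.items()
    }


# Hypothetical shapes, for illustration only.
example = plan_layers({
    "big_projection": (2560, 9728),  # 24,903,680 elements
    "small_head": (2560, 1),         # 2,560 elements
})
```

Under these assumptions, `big_projection` is routed to `Linear4bit` while `small_head` stays a plain `Linear`, which is how small heads would keep full precision under such a rule.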

## Usage (OpenRAL)

This is consumed by OpenRAL, not loaded standalone. The `kind: reward` manifest
points `weights_uri` here:

```yaml
weights_uri: "hf://OpenRAL/rskill-robometer-4b-nf4"
```

In deploy-sim, the following command brings up the reward monitor in parallel with
the VLA and lets the reasoner poll `/openral/perception/query_task_progress`:

```bash
openral deploy sim --config scenes/deploy/<scene>.yaml --enable-reward-monitor
```

See
[ADR-0057](https://github.com/OpenRAL/openral/blob/master/docs/adr/0057-robometer-reward-rskill.md).
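
For readers wondering how an `hf://` value such as the `weights_uri` above maps to a Hub repository, here is a minimal sketch that extracts the `org/name` repo id. The `parse_hf_uri` helper is hypothetical; how OpenRAL itself resolves the URI is not shown in this card.

```python
def parse_hf_uri(uri: str) -> str:
    """Extract an 'org/name' repo id from an 'hf://org/name' URI.

    Hypothetical helper: it only handles the two-segment form used above.
    """
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {uri!r}")
    repo_id = uri[len(prefix):].strip("/")
    parts = repo_id.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected hf://<org>/<name>, got {uri!r}")
    return repo_id


repo_id = parse_hf_uri("hf://OpenRAL/rskill-robometer-4b-nf4")
```

The resulting repo id is in the form that `huggingface_hub.snapshot_download(repo_id=...)` accepts, should a local copy be needed for inspection.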

## License

Apache-2.0, inherited from the upstream `robometer/Robometer-4B`. See `LICENSE`.
The upstream `robometer` package is pinned by commit and executed only in an
isolated sidecar venv (it is not an OpenRAL-trusted org).