---
license: apache-2.0
base_model: robometer/Robometer-4B
tags:
- reward-model
- robotics
- vision-language-action
- openral
- bitsandbytes
- nf4
library_name: openral
pipeline_tag: robotics
---
# rskill-robometer-4b-nf4
Pre-quantized **NF4** build of [`robometer/Robometer-4B`](https://huggingface.co/robometer/Robometer-4B)
(a Qwen3-VL-4B robotic **reward foundation model**, arXiv 2603.02115), packaged as an
[OpenRAL](https://github.com/OpenRAL/openral) **`kind: reward`** rSkill (ADR-0057).
It runs **in parallel with a VLA policy** and scores the live rollout: given the
robot's camera frames + the task instruction, it emits **per-frame normalized
progress (0–1)** and **per-frame success probability**. The OpenRAL reasoner polls
it on demand (read-only `query_task_progress` tool) to decide whether to continue,
advance, or replan. Its output is **advisory only**, never on the control path.
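
As a rough illustration of how a consumer might act on the two per-frame signals, here is a minimal Python sketch. The function name `advise`, the thresholds, and the stall window are hypothetical and not taken from OpenRAL; the actual reasoner logic may differ.

```python
from typing import Sequence


def advise(progress: Sequence[float], success: Sequence[float],
           success_threshold: float = 0.9, stall_window: int = 5,
           min_gain: float = 0.02) -> str:
    """Return 'advance', 'replan', or 'continue' from per-frame reward outputs.

    All thresholds are illustrative assumptions, not OpenRAL defaults.
    """
    if not progress or len(progress) != len(success):
        raise ValueError("progress and success must be non-empty and equal length")
    # The task looks done: the latest success probability is high.
    if success[-1] >= success_threshold:
        return "advance"
    # Progress has stalled or regressed over the last `stall_window` frames.
    if len(progress) > stall_window:
        gain = progress[-1] - progress[-1 - stall_window]
        if gain < min_gain:
            return "replan"
    return "continue"
```

Because the output is only a suggestion string, a sketch like this stays consistent with the advisory role described above: nothing in it touches the control path.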
## What's in this repo
A self-contained checkpoint that the OpenRAL reward sidecar loads **directly as
4-bit**, with no bf16 materialization and no requantization:
- `model.safetensors`: 236 `Linear` modules packed to bitsandbytes NF4 (~3.32 GB
resident), plus the folded non-persistent rotary `inv_freq` buffers.
- `config.json`: model config (resized vocab 151674).
- `config.yaml`: the `robometer` `ExperimentConfig` (lets the sidecar rebuild the
`RBM` graph offline).
- tokenizer / processor files (incl. `added_tokens.json`, which defines the model's
added progress token).
- `quantization_metadata.json`: provenance.
> The model **class is `RBM`** (`robometer.models.rbm`), but the upstream
> `config.json` advertises `architectures: ["RFM"]` with **no `auto_map`**, so
> vanilla `transformers.AutoModel` cannot load it. The OpenRAL sidecar installs the
> pinned `robometer` package (commit `a669dffc`) with **`transformers==4.57.1`** in
> an isolated venv, builds the skeleton on the `meta` device, and then installs these
> packed NF4 weights via `Params4bit.from_prequantized`.
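
To check what a downloaded `model.safetensors` contains without loading any tensors, the file header can be read with the standard library alone. The sketch below relies on the safetensors layout (an 8-byte little-endian header length followed by a JSON header). It also assumes, without confirmation from this repo, that packed layers use the bitsandbytes-style `...quant_state.bitsandbytes__nf4` key suffix. The helper names are hypothetical.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path: str) -> dict:
    """Return the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian uint64 giving the header length in bytes.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form string metadata
    return header


def summarize(header: dict) -> dict:
    """Count tensors by dtype and count NF4 quant-state entries."""
    dtypes = Counter(entry["dtype"] for entry in header.values())
    # Assumption: each packed layer has a companion tensor whose name ends with
    # "quant_state.bitsandbytes__nf4" (the usual bitsandbytes serialization).
    nf4 = sum(1 for name in header if name.endswith("quant_state.bitsandbytes__nf4"))
    return {"tensors": len(header), "dtypes": dict(dtypes), "nf4_layers": nf4}
```

If the key-naming assumption holds for this checkpoint, `summarize(read_safetensors_header("model.safetensors"))["nf4_layers"]` should match the 236 packed `Linear` modules listed above.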
## Provenance & verification
- **Source:** `robometer/Robometer-4B` @ `beef63bc914c5c189329d49c6d712d96d632aa34` (Apache-2.0).
- **Quantization:** bitsandbytes NF4 (double-quant), compute dtype bf16, the OpenRAL
rule `nn.Linear.numel ≥ 4e6 → Linear4bit`. Built by
[`tools/build_robometer_nf4_checkpoint.py`](https://github.com/OpenRAL/openral/blob/master/tools/build_robometer_nf4_checkpoint.py).
- **Bit-identical** to loading the upstream bf16 weights and quantizing in place:
same-process forward `max|Δ| = 0`; 4-bit dequant round-trip `0`. For a byte-stable
reward ramp across process launches, the sidecar pins the math SDP kernel +
`use_deterministic_algorithms(True)` + `CUBLAS_WORKSPACE_CONFIG=:4096:8` +
`cudnn.allow_tf32=False`.
- **Footprint:** ~3.32 GB resident on an 8 GB GPU; co-resident with the sim (and a
small NF4 VLA). The reward forward subsamples the frame window to bound activation memory.
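
The size-based selection rule in the quantization bullet can be sketched in plain Python. This is an illustration only, not the logic of the build script: it assumes `numel` means the weight's element count (`in_features * out_features`, bias ignored), and the function name and layer shapes are hypothetical.

```python
from typing import Iterable, List, Tuple

NUMEL_THRESHOLD = 4_000_000  # from the stated rule: numel >= 4e6


def select_for_nf4(layers: Iterable[Tuple[str, int, int]],
                   threshold: int = NUMEL_THRESHOLD) -> List[str]:
    """Return names of linear layers whose weight has at least `threshold` elements.

    `layers` yields (name, in_features, out_features). The weight numel is taken
    as in_features * out_features; ignoring the bias is an assumption of this sketch.
    """
    return [name for name, fin, fout in layers if fin * fout >= threshold]


# Hypothetical shapes, chosen only to exercise both sides of the threshold.
example = [
    ("mlp.up_proj", 2560, 9728),  # about 24.9M elements, would be packed to 4-bit
    ("small_head", 2560, 1),      # 2,560 elements, would stay in bf16
    ("square", 2000, 2000),       # exactly 4,000,000 elements, included by >=
]
selected = select_for_nf4(example)
```

A threshold on element count keeps small projection heads in higher precision while still packing the large matrices that dominate memory.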
## Usage (OpenRAL)
This is consumed by OpenRAL, not loaded standalone. The `kind: reward` manifest
points `weights_uri` here:
```yaml
weights_uri: "hf://OpenRAL/rskill-robometer-4b-nf4"
```
and in deploy-sim:
```bash
openral deploy sim --config scenes/deploy/<scene>.yaml --enable-reward-monitor
```
brings up the reward monitor in parallel with the VLA and lets the reasoner poll
`/openral/perception/query_task_progress`. See
[ADR-0057](https://github.com/OpenRAL/openral/blob/master/docs/adr/0057-robometer-reward-rskill.md).
## License
Apache-2.0, inherited from the upstream `robometer/Robometer-4B`. See `LICENSE`.
The upstream `robometer` package is pinned by commit and executed only in an
isolated sidecar venv (it is not an OpenRAL-trusted org).