+---
+license: apache-2.0
+language:
+- en
+library_name: peft
+base_model: Qwen/Qwen3-VL-4B-Instruct
+pipeline_tag: image-text-to-text
+tags:
+- vision-language
+- autonomous-driving
+- faithfulness
+- critic
+- lora
+- grpo-reward
+- waypoint-prediction
+---
+# FaithfulnessCritic
+LoRA adapters over **Qwen3-VL-4B-Instruct** that score whether a vision-language driving planner's **reasoning (R)**, **meta-action (A)**, and **24-step waypoint plan (W)** are mutually self-consistent given the camera scene.
+The critic emits a single token directly after a forced `<verdict>` prefix; the score `P(CONSISTENT) ∈ (0,1)` is recovered by softmaxing the logits over the two single-token verdict words `CONSISTENT` and `INCONSISTENT`. The model is intended as a frozen reward signal during GRPO planner training and as a faithfulness-auditing tool offline.
+## Variants
+The repo contains four adapter checkpoints under separate subfolders. They differ in (i) which **input class** the critic sees and (ii) which **counterfactual augmentation** strategies were used to construct the negative training examples.
+| Subfolder | Input class | Negative strategies | Notes |
+|---|---|---|---|
+| `GB-S12`  | BEV plot + speed profile          | S1, S2 | Lighter — no scene-description corruption. |
+| `GB-S123` | BEV plot + speed profile          | S1, S2, S3 | All three failure modes. |
+| `GP-S12`  | Forward camera overlay + speed    | S1, S2 | First-person view; uses calibration parquets. |
+| `GP-S123` | Forward camera overlay + speed    | S1, S2, S3 | All three failure modes. |
+Where:
+- **GB** = Gemini-curated dataset, **B**EV input.
+- **GP** = Gemini-curated dataset, first-**P**erson input.
+- **S1** — waypoint substitution: `W` replaced with geometrically incompatible donor waypoints.
+- **S2** — move-justification substitution: only `R.move_justification` is swapped from a donor.
+- **S3** — scene description substitution: `R.scene` is swapped from a different scene.
+Validation sets always include all three strategies in equal proportions, regardless of training mix, so the variants are directly comparable on the same benchmark.
+## Quick start
+Each subfolder is a standalone PEFT adapter. Load it on top of the base VLM:
+```python
+import torch
+from peft import PeftModel
+from transformers import AutoModelForImageTextToText, AutoProcessor
+BASE = "Qwen/Qwen3-VL-4B-Instruct"
+ADAPTER = "mjf-su/FaithfulnessCritic"
+SUBFOLDER = "GB-S12"  # or GB-S123, GP-S12, GP-S123
+processor = AutoProcessor.from_pretrained(BASE, trust_remote_code=True)
+processor.tokenizer.padding_side = "left"
+base = AutoModelForImageTextToText.from_pretrained(
+    BASE, dtype=torch.bfloat16, trust_remote_code=True,
+)
+model = PeftModel.from_pretrained(base, ADAPTER, subfolder=SUBFOLDER)
+model.eval().to("cuda")
+# Build the chat-template prompt with image(s) + text and append "<verdict>"
+# at the end so the next-token logits are over CONSISTENT / INCONSISTENT.
+# See `critic_rewards.py:CriticRewardBase._build_prompt` for the full template
+# and `_score_logit_mode` for the scoring path used to produce P(CONSISTENT).
+```
+The reference end-to-end pipeline lives at https://github.com/mjf-su/fms4navigation under `critic_library/Gemini_samples/{BEV,fPOV}/`.
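The `logit` scoring step mentioned in the comments above reduces to a two-way softmax over the verdict-token logits. The sketch below shows only that arithmetic in plain Python; the function name `p_consistent` is hypothetical, and it assumes the two logits have already been read from the model output at the prompt's last position (the actual extraction lives in `_score_logit_mode`, which is not reproduced here).

```python
import math


def p_consistent(logit_consistent: float, logit_inconsistent: float) -> float:
    """Two-way softmax restricted to the CONSISTENT / INCONSISTENT logits."""
    # A softmax over two logits equals a sigmoid of their difference.
    diff = logit_consistent - logit_inconsistent
    # Branch on the sign so math.exp never receives a large positive argument.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Illustrative logits only, not real model outputs.
    print(p_consistent(2.0, 0.0))
    print(p_consistent(0.0, 2.0))
```

Because only the difference of the two logits matters, the rest of the vocabulary does not affect the score.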
+## Inputs
+A single image plus an `(R, A, W)` triplet:
+- **Image** — forward-facing camera frame of the driving scene.
+  - `GB-*` adapters consume a BEV trajectory plot + a speed-vs-time strip rendered purely from `W`.
+  - `GP-*` adapters consume the camera frame with `W` projected as a teal polyline (full calibration + egomotion required) plus the same speed strip.
+- **R** — `<think>{ "scene": ..., "move_justification": ... }</think>`.
+- **A** — `<action> Longitudinal: <label> | Lateral: <label> </action>` from the canonical 7-longitudinal × 11-lateral vocabulary.
+- **W** — 24 lines of `<wp>[x, y, θ]</wp>`, vehicle-relative, 0.25 s spacing, 6 s horizon.
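As an illustration of the `W` format described above, the following sketch parses a block of `<wp>[x, y, θ]</wp>` lines into tuples and checks the 24-step length. The helper names `parse_waypoints` and `timestamps` are hypothetical, and the assumption that the first waypoint sits at t = 0.25 s (so the last lands on the 6 s horizon) is not stated in this card.

```python
import re

# One waypoint per match: three comma-separated numbers inside <wp>[...]</wp>.
WP_RE = re.compile(r"<wp>\[\s*([^,\]]+),\s*([^,\]]+),\s*([^,\]]+)\]</wp>")


def parse_waypoints(text: str, expected: int = 24) -> list:
    """Return a list of (x, y, theta) float tuples parsed from `text`."""
    wps = [tuple(float(v) for v in m.groups()) for m in WP_RE.finditer(text)]
    if len(wps) != expected:
        raise ValueError(f"expected {expected} waypoints, found {len(wps)}")
    return wps


def timestamps(n: int = 24, dt: float = 0.25) -> list:
    """Time of each waypoint, assuming the first one is at t = dt."""
    return [dt * (i + 1) for i in range(n)]


if __name__ == "__main__":
    # Synthetic straight-line plan, for illustration only.
    demo = "\n".join(f"<wp>[{0.5 * (i + 1)}, 0.0, 0.0]</wp>" for i in range(24))
    plan = parse_waypoints(demo)
    print(plan[0], plan[-1], timestamps()[-1])
```

Rejecting plans that do not contain exactly 24 waypoints up front keeps malformed planner outputs from reaching the renderer.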
+## Output
+The critic emits a single token after a forced `<verdict>` prefix. Two scoring paths are supported:
+| Mode | What it does | Range |
+|---|---|---|
+| `logit` (default) | Softmax over the two single-token verdict ids at the prompt's last position. | `P(CONSISTENT) ∈ (0,1)` |
+| `generate` | Greedy-decode 8 tokens, regex-parse `CONSISTENT` / `INCONSISTENT`. | `{0.0, 0.5, 1.0}` |
+Use `logit` mode for reward signals (smooth) and `generate` mode for human-readable verdicts.
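For `generate` mode, a minimal sketch of the regex-parse step might look like the following. The function name `verdict_to_score` is hypothetical, and the mapping of the three values (1.0 for `CONSISTENT`, 0.0 for `INCONSISTENT`, 0.5 when no verdict can be parsed) is an assumption; the card lists the range `{0.0, 0.5, 1.0}` without spelling out the mapping.

```python
import re

# INCONSISTENT is listed first and word boundaries are used, because
# "CONSISTENT" is a substring of "INCONSISTENT".
VERDICT_RE = re.compile(r"\b(INCONSISTENT|CONSISTENT)\b")


def verdict_to_score(decoded: str) -> float:
    """Map decoded verdict text to a score in {0.0, 0.5, 1.0}."""
    match = VERDICT_RE.search(decoded)
    if match is None:
        return 0.5  # assumed fallback when no verdict word is found
    return 1.0 if match.group(1) == "CONSISTENT" else 0.0


if __name__ == "__main__":
    for text in ["CONSISTENT", "INCONSISTENT", "not sure"]:
        print(text, verdict_to_score(text))
```

A naive substring test for `CONSISTENT` would score every `INCONSISTENT` verdict as consistent, which is why the pattern anchors on word boundaries.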
+## Training
+- **Base**: Qwen/Qwen3-VL-4B-Instruct (frozen).
+- **Adaptation**: LoRA with rank `r=256`, trained with learning rate `lr=1e-4`.
+- **Loss**: standard SFT next-token cross-entropy, supervising only the `CONSISTENT` / `INCONSISTENT` verdict token.
+- **Positives**: ground-truth `(R, A, W)` triplets from a Gemini-curated subset of [PhysicalAI-Reason-US](https://huggingface.co/datasets/mjf-su/PhysicalAI-Reason-US).
+- **Negatives**: counterfactual triplets built per strategy; donor eligibility requires both action axes to differ, different `scene_id`, same train/val split.
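The donor-eligibility rule for negatives can be sketched as a simple predicate. The record layout (dicts with `longitudinal`, `lateral`, `scene_id`, and `split` keys), the function name `is_eligible_donor`, and the action labels in the demo are all hypothetical; only the three conditions themselves come from the description above.

```python
def is_eligible_donor(target: dict, donor: dict) -> bool:
    """True if `donor` may supply counterfactual content for `target`."""
    return (
        # Both action axes must differ between donor and target.
        donor["longitudinal"] != target["longitudinal"]
        and donor["lateral"] != target["lateral"]
        # The donor must come from a different scene...
        and donor["scene_id"] != target["scene_id"]
        # ...but from the same train/val split, so validation data never
        # borrows from training data or vice versa.
        and donor["split"] == target["split"]
    )


if __name__ == "__main__":
    # Placeholder labels, not the card's canonical action vocabulary.
    target = {"longitudinal": "accelerate", "lateral": "keep_lane",
              "scene_id": "scene_a", "split": "train"}
    donor = {"longitudinal": "brake", "lateral": "turn_left",
             "scene_id": "scene_b", "split": "train"}
    print(is_eligible_donor(target, donor))
```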
+## Evaluation
+Each variant scored 125 randomly drawn (`seed=42`) planner outputs from two driving VLM planners, with `gemini-3-pro-preview` (few-shot, system-prompt + 6 worked examples) used as the LLM judge. Per-axis verdicts are aggregated to a single `overall ∈ {CONSISTENT, INCONSISTENT, AMBIGUOUS}`. **Agreement = accuracy treating Gemini's `overall` as ground truth**, computed on the subset where both Gemini and the critic returned a non-null verdict (Gemini parse failures and `AMBIGUOUS` are skipped).
+```
+Planner          Critic     Agreement   P      R     F1    μP|C   μP|IC
+─────────────────────────────────────────────────────────────────────────
+MetaAction-1e    GB-S12       0.764    0.763  0.750  0.756  0.750  0.222
+MetaAction-1e    GB-S123      0.724    0.732  0.683  0.707  0.683  0.238
+MetaAction-1e    GP-S12       0.732    0.729  0.717  0.723  0.717  0.254
+MetaAction-1e    GP-S123      0.732    0.737  0.700  0.718  0.700  0.238
+ADEnReward       GB-S12       0.694    0.672  0.717  0.694  0.717  0.328
+ADEnReward       GB-S123      0.653    0.644  0.633  0.639  0.633  0.328
+ADEnReward       GP-S12       0.734    0.714  0.750  0.732  0.750  0.281
+ADEnReward       GP-S123      0.694    0.696  0.650  0.672  0.650  0.266
+```
+- **P / R / F1** treat `CONSISTENT` as the positive class.
+- **μP\|C**: mean critic `P(CONSISTENT)` on Gemini-CONSISTENT records (higher is better).
+- **μP\|IC**: mean critic `P(CONSISTENT)` on Gemini-INCONSISTENT records (lower is better). The spread `μP|C − μP|IC` ranges from about 0.31 to 0.53 across the eight planner-critic pairs in the table, so the critic separates the two classes on average despite a non-trivial error rate at the decision boundary.
+Best per planner: `GB-S12` for MetaAction-1e (0.764), `GP-S12` for ADEnReward (0.734). Adding S3 (scene-description corruption) to the training mix did not improve agreement on either planner in this benchmark.
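The metrics in the table can be reproduced from per-record judge verdicts and critic scores along the following lines. This is a sketch under stated assumptions: the function name `critic_metrics` is hypothetical, critic failures are represented as `None`, and the critic score is binarised at a 0.5 threshold, which the card does not state explicitly.

```python
def critic_metrics(judge: list, scores: list, threshold: float = 0.5) -> dict:
    """Agreement, P/R/F1 (CONSISTENT = positive) and class-conditional means."""
    # Keep only records where the judge gave a binary verdict and the
    # critic returned a score (AMBIGUOUS and null records are skipped).
    pairs = [(j, s) for j, s in zip(judge, scores)
             if j in ("CONSISTENT", "INCONSISTENT") and s is not None]
    tp = sum(j == "CONSISTENT" and s >= threshold for j, s in pairs)
    fp = sum(j == "INCONSISTENT" and s >= threshold for j, s in pairs)
    fn = sum(j == "CONSISTENT" and s < threshold for j, s in pairs)
    tn = sum(j == "INCONSISTENT" and s < threshold for j, s in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    pos = [s for j, s in pairs if j == "CONSISTENT"]
    neg = [s for j, s in pairs if j == "INCONSISTENT"]
    return {
        "agreement": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mu_p_c": sum(pos) / len(pos) if pos else 0.0,
        "mu_p_ic": sum(neg) / len(neg) if neg else 0.0,
    }


if __name__ == "__main__":
    # Toy records for illustration; not taken from the benchmark.
    judge = ["CONSISTENT", "CONSISTENT", "INCONSISTENT", "INCONSISTENT", "AMBIGUOUS"]
    scores = [0.9, 0.4, 0.2, 0.7, 0.8]
    print(critic_metrics(judge, scores))
```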
+## Intended use
+- Frozen reward model in GRPO/PPO planner fine-tuning where faithfulness of the (R, A, W) chain matters.
+- Offline auditing of candidate planner outputs.
+- Counterfactual-failure-mode analysis when paired with the variant ablation (S12 vs S123).
+## Out-of-scope use
+- The critic is **not** a safety verifier. A `CONSISTENT` verdict means R/A/W are mutually self-consistent and consistent with the scene; it does **not** mean the trajectory is collision-free, comfortable, or legally compliant.
+- The critic was trained on a US-centric driving dataset; performance on non-US driving cultures, weather conditions, or sensor configurations not present in the training set is unverified.
+- Single-camera, single-frame input only — no temporal stack, no surround views.
+## Limitations
+- Greedy decoding only in `generate` mode; the reward signal is best read via `logit` mode.
+- The critic occasionally produces `null` (parse / render failure) when calibration parquets or camera frames are missing — see `n_critic_failure` in the eval summaries.
+- Like the judge it's evaluated against, the critic can be confidently wrong on edge cases involving rare action combinations (lane-change-during-pull-over, etc.).
+## Files
+```
+mjf-su/FaithfulnessCritic/
+├── GB-S12/      adapter_config.json + adapter_model.safetensors
+├── GB-S123/     ...
+├── GP-S12/      ...
+└── GP-S123/     ...
+```
+## Citation
+If you use this model, please cite this repository, along with the upstream dataset and base model where relevant:
+```bibtex
+@misc{foutter_faithfulnesscritic_2026,
+  title  = {FaithfulnessCritic: counterfactual-trained R/A/W consistency critics for vision-based driving planners},
+  author = {Foutter, Matthew and Cercola, Marco and Gammelli, Daniele},
+  year   = {2026},
+  howpublished = {\url{https://huggingface.co/mjf-su/FaithfulnessCritic}},
+}
+```