# FaithfulnessCritic
LoRA adapters over Qwen3-VL-4B-Instruct that score whether a vision-language driving planner's reasoning (R), meta-action (A), and 24-step waypoint plan (W) are mutually self-consistent given the camera scene.
The critic emits a single token directly after a forced `<verdict>` prefix; the score P(CONSISTENT) ∈ (0, 1) is recovered by softmaxing the logits over the two single-token verdict words CONSISTENT and INCONSISTENT. The model is intended as a frozen reward signal during GRPO planner training and as a faithfulness-auditing tool offline.
## Variants
The repo contains four adapter checkpoints under separate subfolders. They differ in (i) which input class the critic sees and (ii) which counterfactual augmentation strategies were used to construct the negative training examples.
| Subfolder | Input class | Negative strategies | Notes |
|---|---|---|---|
| `GB-S12` | BEV plot + speed profile | S1, S2 | Lighter variant; no scene-description corruption. |
| `GB-S123` | BEV plot + speed profile | S1, S2, S3 | All three failure modes. |
| `GP-S12` | Forward camera overlay + speed | S1, S2 | First-person view; uses calibration parquets. |
| `GP-S123` | Forward camera overlay + speed | S1, S2, S3 | All three failure modes. |
Where:

- GB = Gemini-curated dataset, BEV input.
- GP = Gemini-curated dataset, first-person input.
- S1 – waypoint substitution: `W` replaced with geometrically incompatible donor waypoints.
- S2 – move-justification substitution: only `R.move_justification` is swapped from a donor.
- S3 – scene-description substitution: `R.scene` is swapped from a different scene.
Validation sets always include all three strategies in equal proportions, regardless of training mix, so the variants are directly comparable on the same benchmark.
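One way to keep the three strategies in equal proportion, regardless of the training mix, is a simple round-robin assignment. This is a hypothetical sketch (the helper `assign_strategies` is not part of the repo), just to make the "equal proportions" construction concrete:

```python
from collections import Counter
from itertools import cycle

def assign_strategies(record_ids, strategies=("S1", "S2", "S3")):
    """Round-robin the negative strategies over validation records so each
    strategy appears in (near-)equal proportion, whatever the training mix."""
    strat = cycle(strategies)
    return {rid: next(strat) for rid in record_ids}

mix = assign_strategies([f"rec{i}" for i in range(9)])
counts = Counter(mix.values())
print(counts)  # each of S1 / S2 / S3 assigned to 3 of the 9 records
```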
## Quick start
Each subfolder is a standalone PEFT adapter. Load it on top of the base VLM:
```python
import torch
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

BASE = "Qwen/Qwen3-VL-4B-Instruct"
ADAPTER = "mjf-su/FaithfulnessCritic"
SUBFOLDER = "GB-S12"  # or GB-S123, GP-S12, GP-S123

processor = AutoProcessor.from_pretrained(BASE, trust_remote_code=True)
processor.tokenizer.padding_side = "left"

base = AutoModelForImageTextToText.from_pretrained(
    BASE, dtype=torch.bfloat16, trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, ADAPTER, subfolder=SUBFOLDER)
model.eval().to("cuda")

# Build the chat-template prompt with image(s) + text and append "<verdict>"
# at the end so the next-token logits are over CONSISTENT / INCONSISTENT.
# See `critic_rewards.py:CriticRewardBase._build_prompt` for the full template
# and `_score_logit_mode` for the scoring path used to produce P(CONSISTENT).
```
The reference end-to-end pipeline lives at https://github.com/mjf-su/fms4navigation under critic_library/Gemini_samples/{BEV,fPOV}/.
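The two-way softmax that recovers P(CONSISTENT) from the next-token logits can be sketched as a standalone function. This is a minimal illustration, not the repo's implementation (that lives in `_score_logit_mode`); the helper name and the token ids below are placeholders:

```python
import math

def p_consistent(last_logits, id_consistent, id_inconsistent):
    """Softmax restricted to the two single-token verdict ids.

    last_logits: mapping from token id -> logit at the position immediately
    after the forced "<verdict>" prefix.
    """
    z_c = last_logits[id_consistent]
    z_i = last_logits[id_inconsistent]
    m = max(z_c, z_i)  # subtract the max for numerical stability
    e_c = math.exp(z_c - m)
    e_i = math.exp(z_i - m)
    return e_c / (e_c + e_i)

# Equal logits -> the critic is maximally unsure: P(CONSISTENT) = 0.5.
score = p_consistent({7: 2.0, 9: 2.0}, id_consistent=7, id_inconsistent=9)
print(score)
```

Because the probability is normalized over only the two verdict tokens, the score is well defined even when neither verdict is the argmax of the full vocabulary.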
## Inputs
A single input tuple (Image, R, A, W):

- Image – forward-facing camera frame of the driving scene.
  - `GB-*` adapters consume a BEV trajectory plot + a speed-vs-time strip rendered purely from `W`.
  - `GP-*` adapters consume the camera frame with `W` projected as a teal polyline (full calibration + egomotion required) plus the same speed strip.
- R – `<think>{ "scene": ..., "move_justification": ... }</think>`.
- A – `<action> Longitudinal: <label> | Lateral: <label> </action>` from the canonical 7-longitudinal × 11-lateral vocabulary.
- W – 24 lines of `<wp>[x, y, θ]</wp>`, vehicle-relative, 0.25 s spacing, 6 s horizon.
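A minimal parser for the `W` block, just to make the 24-line / 0.25 s / 6 s contract concrete. The helper and its regex are illustrative (not part of the repo), and it assumes the first waypoint sits at t = 0.25 s:

```python
import re

WP_RE = re.compile(r"<wp>\[\s*([-\d.]+)\s*,\s*([-\d.]+)\s*,\s*([-\d.]+)\s*\]</wp>")

def parse_waypoints(w_text, n_expected=24, dt=0.25):
    """Parse n_expected <wp>[x, y, theta]</wp> lines into (t, x, y, theta)."""
    wps = []
    for i, line in enumerate(w_text.strip().splitlines()):
        m = WP_RE.fullmatch(line.strip())
        if m is None:
            raise ValueError(f"bad waypoint line: {line!r}")
        x, y, theta = map(float, m.groups())
        wps.append((round((i + 1) * dt, 2), x, y, theta))
    if len(wps) != n_expected:
        raise ValueError(f"expected {n_expected} waypoints, got {len(wps)}")
    return wps

w = "\n".join(f"<wp>[{0.5 * k}, 0.0, 0.0]</wp>" for k in range(1, 25))
wps = parse_waypoints(w)
print(len(wps), wps[-1][0])  # 24 waypoints, last timestamp 6.0 s
```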
## Output
The critic emits a single token after a forced `<verdict>` prefix. Two scoring paths are supported:

| Mode | What it does | Range |
|---|---|---|
| `logit` (default) | Softmax over the two single-token verdict ids at the prompt's last position. | P(CONSISTENT) ∈ (0, 1) |
| `generate` | Greedy-decode 8 tokens, regex-parse CONSISTENT / INCONSISTENT. | {0.0, 0.5, 1.0} |
Use logit mode for reward signals (smooth) and generate mode for human-readable verdicts.
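The `generate`-mode mapping onto {0.0, 0.5, 1.0} can be sketched as follows (a hypothetical helper; the repo's actual regex may differ). Word boundaries keep CONSISTENT from matching inside INCONSISTENT, and checking INCONSISTENT first is a defensive extra:

```python
import re

def parse_verdict(decoded: str) -> float:
    """Map greedy-decoded verdict text to a coarse score."""
    # Check INCONSISTENT first; \b also prevents a substring match of
    # "CONSISTENT" inside "INCONSISTENT".
    if re.search(r"\bINCONSISTENT\b", decoded):
        return 0.0
    if re.search(r"\bCONSISTENT\b", decoded):
        return 1.0
    return 0.5  # unparseable verdict

print(parse_verdict("<verdict> CONSISTENT"))    # 1.0
print(parse_verdict("<verdict> INCONSISTENT"))  # 0.0
print(parse_verdict("garbled output"))          # 0.5
```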
## Training
- Base: Qwen/Qwen3-VL-4B-Instruct (frozen).
- Adaptation: LoRA (`r=256`, `lr=1e-4`).
- Loss: standard SFT next-token cross-entropy, supervising only the `CONSISTENT` / `INCONSISTENT` verdict token.
- Positives: ground-truth `(R, A, W)` triplets from a Gemini-curated subset of PhysicalAI-Reason-US.
- Negatives: counterfactual triplets built per strategy; donor eligibility requires both action axes to differ, a different `scene_id`, and the same train/val split.
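The donor-eligibility rule from the Negatives bullet can be written as a predicate. This is a sketch with hypothetical field names (`longitudinal`, `lateral`, `scene_id`, `split` are illustrative, not the repo's schema):

```python
def is_eligible_donor(target: dict, donor: dict) -> bool:
    """A donor record may supply counterfactual content only if:
    - BOTH meta-action axes differ (longitudinal AND lateral),
    - it comes from a different scene, and
    - it stays within the same train/val split.
    """
    return (
        donor["longitudinal"] != target["longitudinal"]
        and donor["lateral"] != target["lateral"]
        and donor["scene_id"] != target["scene_id"]
        and donor["split"] == target["split"]
    )

target = {"longitudinal": "accelerate", "lateral": "straight",
          "scene_id": "s001", "split": "train"}
donor = {"longitudinal": "stop", "lateral": "left-turn",
         "scene_id": "s042", "split": "train"}
print(is_eligible_donor(target, donor))  # True
```

Requiring both axes to differ keeps the negatives unambiguous: a donor that shares either the longitudinal or lateral label could still be partially consistent with the target scene.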
## Evaluation
Each variant scored 125 randomly drawn (seed=42) planner outputs from two driving VLM planners, with gemini-3-pro-preview (few-shot, system prompt + 6 worked examples) used as the LLM judge. Per-axis verdicts are aggregated to a single overall verdict ∈ {CONSISTENT, INCONSISTENT, AMBIGUOUS}. Agreement = accuracy treating Gemini's overall verdict as ground truth, computed on the subset where both Gemini and the critic returned a non-null verdict (Gemini parse failures and AMBIGUOUS are skipped).
| Planner | Critic | Agreement | P | R | F1 | μP\|C | μP\|IC |
|---|---|---|---|---|---|---|---|
| MetaAction-1e | GB-S12 | 0.764 | 0.763 | 0.750 | 0.756 | 0.750 | 0.222 |
| MetaAction-1e | GB-S123 | 0.724 | 0.732 | 0.683 | 0.707 | 0.683 | 0.238 |
| MetaAction-1e | GP-S12 | 0.732 | 0.729 | 0.717 | 0.723 | 0.717 | 0.254 |
| MetaAction-1e | GP-S123 | 0.732 | 0.737 | 0.700 | 0.718 | 0.700 | 0.238 |
| ADEnReward | GB-S12 | 0.694 | 0.672 | 0.717 | 0.694 | 0.717 | 0.328 |
| ADEnReward | GB-S123 | 0.653 | 0.644 | 0.633 | 0.639 | 0.633 | 0.328 |
| ADEnReward | GP-S12 | 0.734 | 0.714 | 0.750 | 0.732 | 0.750 | 0.281 |
| ADEnReward | GP-S123 | 0.694 | 0.696 | 0.650 | 0.672 | 0.650 | 0.266 |
- P / R / F1 treat `CONSISTENT` as the positive class.
- μP|C – mean critic P(CONSISTENT) on Gemini-CONSISTENT records (higher is better).
- μP|IC – mean critic P(CONSISTENT) on Gemini-INCONSISTENT records (lower is better; the spread μP|C − μP|IC ≈ 0.45–0.53 across variants indicates the critic is well-discriminating despite a non-trivial decision-boundary error rate).
Best per planner: GB-S12 for MetaAction-1e (0.764), GP-S12 for ADEnReward (0.734). Adding S3 (scene-description corruption) to the training mix did not improve agreement on either planner in this benchmark.
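The metric definitions above can be reproduced with a small helper (a sketch, not the repo's eval code). It assumes the inputs are already filtered to records where both judges returned a non-null, non-AMBIGUOUS verdict, and treats CONSISTENT as the positive class:

```python
def classification_metrics(gemini, critic):
    """Agreement + precision/recall/F1 against Gemini's overall verdict.

    gemini, critic: parallel lists of "CONSISTENT" / "INCONSISTENT" strings.
    """
    pairs = list(zip(gemini, critic))
    tp = sum(g == c == "CONSISTENT" for g, c in pairs)
    fp = sum(g == "INCONSISTENT" and c == "CONSISTENT" for g, c in pairs)
    fn = sum(g == "CONSISTENT" and c == "INCONSISTENT" for g, c in pairs)
    agree = sum(g == c for g, c in pairs) / len(pairs)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return {"agreement": agree, "precision": p, "recall": r, "f1": f1}

g = ["CONSISTENT", "CONSISTENT", "INCONSISTENT", "INCONSISTENT"]
c = ["CONSISTENT", "INCONSISTENT", "INCONSISTENT", "CONSISTENT"]
m = classification_metrics(g, c)
print(m)  # agreement 0.5, precision 0.5, recall 0.5, f1 0.5
```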
## Intended use
- Frozen reward model in GRPO/PPO planner fine-tuning where faithfulness of the (R, A, W) chain matters.
- Offline auditing of candidate planner outputs.
- Counterfactual-failure-mode analysis when paired with the variant ablation (S12 vs S123).
## Out-of-scope use
- The critic is not a safety verifier. A `CONSISTENT` verdict means R/A/W are mutually self-consistent and consistent with the scene; it does not mean the trajectory is collision-free, comfortable, or legally compliant.
- The critic was trained on a US-centric driving dataset; performance on non-US driving cultures, weather conditions, or sensor configurations not present in the training set is unverified.
- Single-camera, single-frame input only: no temporal stack, no surround views.
## Limitations
- Greedy decoding only in `generate` mode; the reward signal is best read via `logit` mode.
- The critic occasionally produces `null` (parse / render failure) when calibration parquets or camera frames are missing; see `n_critic_failure` in the eval summaries.
- Like the judge it is evaluated against, the critic can be confidently wrong on edge cases involving rare action combinations (lane-change-during-pull-over, etc.).
## Files
```
mjf-su/FaithfulnessCritic/
├── GB-S12/    adapter_config.json + adapter_model.safetensors
├── GB-S123/   ...
├── GP-S12/    ...
└── GP-S123/   ...
```