FaithfulnessCritic

LoRA adapters over Qwen3-VL-4B-Instruct that score whether a vision-language driving planner's reasoning (R), meta-action (A), and 24-step waypoint plan (W) are mutually self-consistent given the camera scene.

The critic emits a single token directly after a forced <verdict> prefix; the score P(CONSISTENT) ∈ (0,1) is recovered by softmaxing the logits over the two single-token verdict words, CONSISTENT and INCONSISTENT. The model is intended as a frozen reward signal during GRPO planner training and as an offline faithfulness-auditing tool.
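
Equivalently, writing z_C and z_IC for the next-token logits of the two verdict ids:

P(CONSISTENT) = exp(z_C) / (exp(z_C) + exp(z_IC))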

Variants

The repo contains four adapter checkpoints under separate subfolders. They differ in (i) which input class the critic sees and (ii) which counterfactual augmentation strategies were used to construct the negative training examples.

Subfolder   Input class                      Negative strategies   Notes
─────────────────────────────────────────────────────────────────────────────────────────
GB-S12      BEV plot + speed profile         S1, S2                Lighter: no scene-description corruption.
GB-S123     BEV plot + speed profile         S1, S2, S3            All three failure modes.
GP-S12      Forward camera overlay + speed   S1, S2                First-person view; uses calibration parquets.
GP-S123     Forward camera overlay + speed   S1, S2, S3            All three failure modes.

Where:

  • GB = Gemini-curated dataset, BEV input.
  • GP = Gemini-curated dataset, first-person input.
  • S1 – waypoint substitution: W is replaced with geometrically incompatible donor waypoints.
  • S2 – move-justification substitution: only R.move_justification is swapped in from a donor.
  • S3 – scene-description substitution: R.scene is swapped in from a different scene.

Validation sets always include all three strategies in equal proportions, regardless of training mix, so the variants are directly comparable on the same benchmark.

Quick start

Each subfolder is a standalone PEFT adapter. Load it on top of the base VLM:

import torch
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

BASE = "Qwen/Qwen3-VL-4B-Instruct"
ADAPTER = "mjf-su/FaithfulnessCritic"
SUBFOLDER = "GB-S12"  # or GB-S123, GP-S12, GP-S123

processor = AutoProcessor.from_pretrained(BASE, trust_remote_code=True)
processor.tokenizer.padding_side = "left"

base = AutoModelForImageTextToText.from_pretrained(
    BASE, dtype=torch.bfloat16, trust_remote_code=True,  # older transformers: torch_dtype=
)
model = PeftModel.from_pretrained(base, ADAPTER, subfolder=SUBFOLDER)
model.eval().to("cuda")

# Build the chat-template prompt with image(s) + text and append "<verdict>"
# at the end so the next-token logits are over CONSISTENT / INCONSISTENT.
# See `critic_rewards.py:CriticRewardBase._build_prompt` for the full template
# and `_score_logit_mode` for the scoring path used to produce P(CONSISTENT).

The reference end-to-end pipeline lives at https://github.com/mjf-su/fms4navigation under critic_library/Gemini_samples/{BEV,fPOV}/.
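
A minimal sketch of the logit-mode scoring path follows. The authoritative prompt template is the one in critic_rewards.py; `prompt_text` and `image` below are placeholders standing in for that template's output, not names from the pipeline:

# Verdict token ids; the card states both words map to single tokens.
verdict_ids = [
    processor.tokenizer.encode(w, add_special_tokens=False)[0]
    for w in ("CONSISTENT", "INCONSISTENT")
]

messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": prompt_text},
]}]
chat = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
) + "<verdict>"  # force the prefix so the very next token is the verdict

inputs = processor(text=[chat], images=[image], return_tensors="pt").to(model.device)
with torch.no_grad():
    next_logits = model(**inputs).logits[0, -1]  # logits at the prompt's last position
p_consistent = torch.softmax(next_logits[verdict_ids], dim=-1)[0].item()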

Inputs

A single tuple (Image, R, A, W):

  • Image – the visual rendering of the scene and plan, which depends on the variant:
    • GB-* adapters consume a BEV trajectory plot plus a speed-vs-time strip, both rendered purely from W.
    • GP-* adapters consume the forward-facing camera frame with W projected as a teal polyline (full calibration + egomotion required), plus the same speed strip.
  • R – <think>{ "scene": ..., "move_justification": ... }</think>.
  • A – <action> Longitudinal: <label> | Lateral: <label> </action> from the canonical 7-longitudinal × 11-lateral vocabulary.
  • W – 24 lines of <wp>[x, y, θ]</wp>, vehicle-relative, 0.25 s spacing, 6 s horizon.
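
For illustration only, a serialized (R, A, W) block might look like the following; the scene text, action labels, and waypoint numbers are made up, and the 24 <wp> lines are abbreviated:

<think>{ "scene": "four-way intersection, light traffic, clear day", "move_justification": "lead vehicle is braking; decelerate and hold the lane" }</think>
<action> Longitudinal: decelerate | Lateral: keep lane </action>
<wp>[0.00, 0.00, 0.00]</wp>
<wp>[1.85, 0.01, 0.00]</wp>
...22 more <wp> lines, one per 0.25 s step out to the 6 s horizon...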

Output

The critic emits a single token after a forced <verdict> prefix. Two scoring paths are supported:

Mode             What it does                                                      Range
───────────────────────────────────────────────────────────────────────────────────────────────────
logit (default)  Softmax over the two single-token verdict ids at the prompt's     P(CONSISTENT) ∈ (0,1)
                 last position.
generate         Greedy-decode 8 tokens; regex-parse CONSISTENT / INCONSISTENT.    {0.0, 0.5, 1.0}

Use logit mode for reward signals (smooth) and generate mode for human-readable verdicts.
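
A hedged sketch of the generate path (the reference implementation lives alongside _score_logit_mode in critic_rewards.py; the 0.5 fallback for unparseable output is an assumption inferred from the {0.0, 0.5, 1.0} range):

import re

def generate_verdict(model, processor, inputs):
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
    new_tokens = out[0, inputs["input_ids"].shape[1]:]  # strip the prompt
    text = processor.tokenizer.decode(new_tokens, skip_special_tokens=True)
    # INCONSISTENT must be tried first: CONSISTENT is a substring of it.
    m = re.search(r"INCONSISTENT|CONSISTENT", text)
    if m is None:
        return 0.5  # assumed fallback for unparseable output
    return 0.0 if m.group(0) == "INCONSISTENT" else 1.0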

Training

  • Base: Qwen/Qwen3-VL-4B-Instruct (frozen).
  • Adaptation: LoRA (r=256, lr=1e-4).
  • Loss: standard SFT next-token cross-entropy, supervising only the CONSISTENT / INCONSISTENT verdict token.
  • Positives: ground-truth (R, A, W) triplets from a Gemini-curated subset of PhysicalAI-Reason-US.
  • Negatives: counterfactual triplets built per strategy; donor eligibility requires both action axes to differ, a different scene_id, and the same train/val split (sketched below).
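
A minimal sketch of that donor-eligibility filter; the record field names (long_action, lat_action, scene_id, split) are assumptions for illustration, not the pipeline's actual schema:

def eligible_donors(anchor, pool):
    return [
        d for d in pool
        if d["long_action"] != anchor["long_action"]  # both action axes must differ
        and d["lat_action"] != anchor["lat_action"]
        and d["scene_id"] != anchor["scene_id"]       # never the same scene
        and d["split"] == anchor["split"]             # stay within the train/val split
    ]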

Evaluation

Each variant scored 125 randomly drawn (seed=42) outputs from two driving VLM planners, with gemini-3-pro-preview (few-shot: system prompt + 6 worked examples) as the LLM judge. Per-axis verdicts are aggregated into a single overall verdict ∈ {CONSISTENT, INCONSISTENT, AMBIGUOUS}. Agreement is accuracy treating Gemini's overall verdict as ground truth, computed on the subset where both Gemini and the critic returned a non-null verdict (Gemini parse failures and AMBIGUOUS records are skipped).

Planner          Critic     Agreement   P      R     F1    μP|C   μP|IC
─────────────────────────────────────────────────────────────────────────
MetaAction-1e    GB-S12       0.764    0.763  0.750  0.756  0.750  0.222
MetaAction-1e    GB-S123      0.724    0.732  0.683  0.707  0.683  0.238
MetaAction-1e    GP-S12       0.732    0.729  0.717  0.723  0.717  0.254
MetaAction-1e    GP-S123      0.732    0.737  0.700  0.718  0.700  0.238
ADEnReward       GB-S12       0.694    0.672  0.717  0.694  0.717  0.328
ADEnReward       GB-S123      0.653    0.644  0.633  0.639  0.633  0.328
ADEnReward       GP-S12       0.734    0.714  0.750  0.732  0.750  0.281
ADEnReward       GP-S123      0.694    0.696  0.650  0.672  0.650  0.266
  • P / R / F1 treat CONSISTENT as the positive class.
  • μP|C – mean critic P(CONSISTENT) on Gemini-CONSISTENT records (higher is better).
  • μP|IC – mean critic P(CONSISTENT) on Gemini-INCONSISTENT records (lower is better; the spread μP|C − μP|IC runs from ≈ 0.31 to ≈ 0.53 across planner/variant pairs, indicating the critic discriminates well despite a non-trivial decision-boundary error rate).

Best per planner: GB-S12 for MetaAction-1e (0.764), GP-S12 for ADEnReward (0.734). Adding S3 (scene-description corruption) to the training mix did not improve agreement on either planner in this benchmark.

Intended use

  • Frozen reward model in GRPO/PPO planner fine-tuning where faithfulness of the (R, A, W) chain matters.
  • Offline auditing of candidate planner outputs.
  • Counterfactual-failure-mode analysis when paired with the variant ablation (S12 vs S123).

Out-of-scope use

  • The critic is not a safety verifier. A CONSISTENT verdict means R/A/W are mutually self-consistent and consistent with the scene; it does not mean the trajectory is collision-free, comfortable, or legally compliant.
  • The critic was trained on a US-centric driving dataset; performance on non-US driving cultures, weather conditions, or sensor configurations not present in the training set is unverified.
  • Single-camera, single-frame input only: no temporal stack, no surround views.

Limitations

  • Greedy decoding only in generate mode; the reward signal is best read via logit mode.
  • The critic occasionally produces null (parse / render failure) when calibration parquets or camera frames are missing; see n_critic_failure in the eval summaries.
  • Like the judge it's evaluated against, the critic can be confidently wrong on edge cases involving rare action combinations (lane-change-during-pull-over, etc.).

Files

mjf-su/FaithfulnessCritic/
├── GB-S12/      adapter_config.json + adapter_model.safetensors
├── GB-S123/     ...
├── GP-S12/      ...
└── GP-S123/     ...