| license: apache-2.0 | |
| language: | |
| - en | |
| library_name: peft | |
| base_model: Qwen/Qwen3-VL-4B-Instruct | |
| pipeline_tag: image-text-to-text | |
| tags: | |
| - vision-language | |
| - autonomous-driving | |
| - faithfulness | |
| - critic | |
| - lora | |
| - grpo-reward | |
| - waypoint-prediction | |
| # FaithfulnessCritic | |
| LoRA adapters over **Qwen3-VL-4B-Instruct** that score whether a vision-language driving planner's **reasoning (R)**, **meta-action (A)**, and **24-step waypoint plan (W)** are mutually self-consistent given the camera scene. | |
| The critic emits a single token directly after a forced `<verdict>` prefix; the score `P(CONSISTENT) ∈ (0,1)` is recovered by softmaxing the logits over the two single-token verdict words `CONSISTENT` and `INCONSISTENT`. The model is intended as a frozen reward signal during GRPO planner training and as a faithfulness-auditing tool offline. | |
| ## Variants | |
| The repo contains four adapter checkpoints under separate subfolders. They differ in (i) which **input class** the critic sees and (ii) which **counterfactual augmentation** strategies were used to construct the negative training examples. | |
| | Subfolder | Input class | Negative strategies | Notes | | |
| |---|---|---|---| | |
| | `GB-S12` | BEV plot + speed profile | S1, S2 | Lighter: no scene-description corruption. | |
| | `GB-S123` | BEV plot + speed profile | S1, S2, S3 | All three failure modes. | | |
| | `GP-S12` | Forward camera overlay + speed | S1, S2 | First-person view; uses calibration parquets. | | |
| | `GP-S123` | Forward camera overlay + speed | S1, S2, S3 | All three failure modes. | | |
| Where: | |
| - **GB** = Gemini-curated dataset, **B**EV input. | |
| - **GP** = Gemini-curated dataset, first-**P**erson input. | |
| - **S1**: waypoint substitution. `W` is replaced with geometrically incompatible donor waypoints. | |
| - **S2**: move-justification substitution. Only `R.move_justification` is swapped from a donor. | |
| - **S3**: scene-description substitution. `R.scene` is swapped from a different scene. | |
| Validation sets always include all three strategies in equal proportions, regardless of training mix, so the variants are directly comparable on the same benchmark. | |
| ## Quick start | |
| Each subfolder is a standalone PEFT adapter. Load it on top of the base VLM: | |
| ```python | |
| import torch | |
| from peft import PeftModel | |
| from transformers import AutoModelForImageTextToText, AutoProcessor | |
| BASE = "Qwen/Qwen3-VL-4B-Instruct" | |
| ADAPTER = "mjf-su/FaithfulnessCritic" | |
| SUBFOLDER = "GB-S12" # or GB-S123, GP-S12, GP-S123 | |
| processor = AutoProcessor.from_pretrained(BASE, trust_remote_code=True) | |
| processor.tokenizer.padding_side = "left" | |
| base = AutoModelForImageTextToText.from_pretrained( | |
| BASE, dtype=torch.bfloat16, trust_remote_code=True, | |
| ) | |
| model = PeftModel.from_pretrained(base, ADAPTER, subfolder=SUBFOLDER) | |
| model.eval().to("cuda") | |
| # Build the chat-template prompt with image(s) + text and append "<verdict>" | |
| # at the end so the next-token logits are over CONSISTENT / INCONSISTENT. | |
| # See `critic_rewards.py:CriticRewardBase._build_prompt` for the full template | |
| # and `_score_logit_mode` for the scoring path used to produce P(CONSISTENT). | |
| ``` | |
| The reference end-to-end pipeline lives at https://github.com/mjf-su/fms4navigation under `critic_library/Gemini_samples/{BEV,fPOV}/`. | |
| ## Inputs | |
| Each input is an image plus an `(R, A, W)` triplet: | |
| - **Image**: forward-facing camera frame of the driving scene. | |
|   - `GB-*` adapters consume a BEV trajectory plot + a speed-vs-time strip rendered purely from `W`. | |
|   - `GP-*` adapters consume the camera frame with `W` projected as a teal polyline (full calibration + egomotion required) plus the same speed strip. | |
| - **R**: `<think>{ "scene": ..., "move_justification": ... }</think>`. | |
| - **A**: `<action> Longitudinal: <label> | Lateral: <label> </action>` from the canonical 7-longitudinal × 11-lateral vocabulary. | |
| - **W**: 24 lines of `<wp>[x, y, θ]</wp>`, vehicle-relative, 0.25 s spacing, 6 s horizon. | |
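The `A` and `W` formats above can be read back with a couple of regular expressions. The following is a minimal sketch, not the repository's parser: the function names are hypothetical, the action labels in the usage note are placeholders rather than entries from the real vocabulary, and the timestamps assume the first waypoint sits at t = 0.25 s so that the 24th lands on the 6 s horizon.

```python
import re

# Hypothetical helpers; the patterns follow the tag formats listed above.
WP_RE = re.compile(r"<wp>\[\s*([^,\]]+),\s*([^,\]]+),\s*([^,\]]+)\]</wp>")
ACTION_RE = re.compile(
    r"<action>\s*Longitudinal:\s*(.+?)\s*\|\s*Lateral:\s*(.+?)\s*</action>"
)

N_WAYPOINTS = 24   # stated in the card
DT_SECONDS = 0.25  # stated in the card


def parse_waypoints(text):
    """Return [(t, x, y, theta), ...] for the <wp> lines found in `text`.

    Assumption: waypoint i (0-based) is stamped at (i + 1) * 0.25 s.
    """
    triples = [tuple(float(v) for v in m.groups()) for m in WP_RE.finditer(text)]
    if len(triples) != N_WAYPOINTS:
        raise ValueError(f"expected {N_WAYPOINTS} waypoints, found {len(triples)}")
    return [((i + 1) * DT_SECONDS, x, y, th) for i, (x, y, th) in enumerate(triples)]


def parse_action(text):
    """Return the (longitudinal, lateral) labels from an <action> block."""
    m = ACTION_RE.search(text)
    if m is None:
        raise ValueError("no <action> block found")
    return m.group(1), m.group(2)
```

For example, `parse_action("<action> Longitudinal: LON_LABEL | Lateral: LAT_LABEL </action>")` returns the two placeholder labels as a tuple, and `parse_waypoints` raises if a plan does not contain exactly 24 waypoints.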
| ## Output | |
| The critic emits a single token after a forced `<verdict>` prefix. Two scoring paths are supported: | |
| | Mode | What it does | Range | | |
| |---|---|---| | |
| | `logit` (default) | Softmax over the two single-token verdict ids at the prompt's last position. | `P(CONSISTENT) ∈ (0,1)` | |
| | `generate` | Greedy-decode 8 tokens, regex-parse `CONSISTENT` / `INCONSISTENT`. | `{0.0, 0.5, 1.0}` | | |
| Use `logit` mode for reward signals (smooth) and `generate` mode for human-readable verdicts. | |
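The arithmetic behind the two modes can be sketched in plain Python. This is an illustration under assumptions, not the code in `critic_rewards.py`: the function names are hypothetical, the two logits are assumed to have already been read from the model at the position after `<verdict>`, and mapping an unparseable generation to `0.5` is a guess at what the middle value of the `generate` range means.

```python
import math
import re


def p_consistent(logit_consistent, logit_inconsistent):
    """Two-way softmax over the verdict logits, shifted by the max for stability."""
    m = max(logit_consistent, logit_inconsistent)
    e_c = math.exp(logit_consistent - m)
    e_ic = math.exp(logit_inconsistent - m)
    return e_c / (e_c + e_ic)


# Word boundaries keep "CONSISTENT" from matching inside "INCONSISTENT".
VERDICT_RE = re.compile(r"\b(INCONSISTENT|CONSISTENT)\b")


def score_generated(text):
    """Map decoded text to a score in {0.0, 0.5, 1.0}."""
    m = VERDICT_RE.search(text)
    if m is None:
        return 0.5  # assumption: no parseable verdict maps to the midpoint
    return 1.0 if m.group(1) == "CONSISTENT" else 0.0
```

The two-way softmax is equivalent to a sigmoid of the logit difference, which is why `logit` mode varies smoothly with the model's confidence while `generate` mode only returns discrete values.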
| ## Training | |
| - **Base**: Qwen/Qwen3-VL-4B-Instruct (frozen). | |
| - **Adaptation**: LoRA (`r=256`, `lr=1e-4`). | |
| - **Loss**: standard SFT next-token cross-entropy, supervising only the `CONSISTENT` / `INCONSISTENT` verdict token. | |
| - **Positives**: ground-truth `(R, A, W)` triplets from a Gemini-curated subset of [PhysicalAI-Reason-US](https://huggingface.co/datasets/mjf-su/PhysicalAI-Reason-US). | |
| - **Negatives**: counterfactual triplets built per strategy; donor eligibility requires both action axes to differ, different `scene_id`, same train/val split. | |
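The donor rule and the three substitution strategies can be sketched as follows. The record layout (`scene_id`, `split`, `R`, `A`, `W`, `label` keys) and both function names are hypothetical, chosen only for illustration. The sketch also does not model the geometric-incompatibility check that S1 additionally requires of donor waypoints.

```python
import copy


def is_eligible_donor(target, donor):
    """Donor rule from the card: different scene, same split, both action axes differ."""
    return (
        donor["scene_id"] != target["scene_id"]
        and donor["split"] == target["split"]
        and donor["A"]["longitudinal"] != target["A"]["longitudinal"]
        and donor["A"]["lateral"] != target["A"]["lateral"]
    )


def make_negative(target, donor, strategy):
    """Build one counterfactual negative by swapping a single field from the donor."""
    if not is_eligible_donor(target, donor):
        raise ValueError("donor is not eligible for this target")
    neg = copy.deepcopy(target)  # leave the positive example untouched
    if strategy == "S1":    # waypoint substitution
        neg["W"] = copy.deepcopy(donor["W"])
    elif strategy == "S2":  # move-justification substitution
        neg["R"]["move_justification"] = donor["R"]["move_justification"]
    elif strategy == "S3":  # scene-description substitution
        neg["R"]["scene"] = donor["R"]["scene"]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    neg["label"] = "INCONSISTENT"
    return neg
```

Each strategy changes exactly one component, so a negative differs from its positive source only in the field that breaks consistency.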
| ## Evaluation | |
| Each variant scored 125 randomly drawn (`seed=42`) planner outputs from two driving VLM planners, with `gemini-3-pro-preview` (few-shot, system-prompt + 6 worked examples) used as the LLM judge. Per-axis verdicts are aggregated to a single `overall ∈ {CONSISTENT, INCONSISTENT, AMBIGUOUS}`. **Agreement = accuracy treating Gemini's `overall` as ground truth**, computed on the subset where both Gemini and the critic returned a non-null verdict (Gemini parse failures and `AMBIGUOUS` are skipped). | |
| ``` | |
| Planner        Critic    Agreement  P      R      F1     μP|C   μP|IC | |
| ───────────────────────────────────────────────────────────────────── | |
| MetaAction-1e  GB-S12    0.764      0.763  0.750  0.756  0.750  0.222 | |
| MetaAction-1e  GB-S123   0.724      0.732  0.683  0.707  0.683  0.238 | |
| MetaAction-1e  GP-S12    0.732      0.729  0.717  0.723  0.717  0.254 | |
| MetaAction-1e  GP-S123   0.732      0.737  0.700  0.718  0.700  0.238 | |
| ADEnReward     GB-S12    0.694      0.672  0.717  0.694  0.717  0.328 | |
| ADEnReward     GB-S123   0.653      0.644  0.633  0.639  0.633  0.328 | |
| ADEnReward     GP-S12    0.734      0.714  0.750  0.732  0.750  0.281 | |
| ADEnReward     GP-S123   0.694      0.696  0.650  0.672  0.650  0.266 | |
| ``` | |
| - **P / R / F1** treat `CONSISTENT` as the positive class. | |
| - **μP\|C**: mean critic `P(CONSISTENT)` on Gemini-CONSISTENT records (higher is better). | |
| - **μP\|IC**: mean critic `P(CONSISTENT)` on Gemini-INCONSISTENT records (lower is better). The spread `μP|C − μP|IC` is ≈ 0.45–0.53 on MetaAction-1e and ≈ 0.31–0.47 on ADEnReward, which indicates the critic separates the two classes despite a non-trivial decision-boundary error rate. | |
| Best per planner: `GB-S12` for MetaAction-1e (0.764), `GP-S12` for ADEnReward (0.734). Adding S3 (scene-description corruption) to the training mix did not improve agreement on either planner in this benchmark. | |
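The columns in the table can be recomputed from per-record outputs along the following lines. This is a sketch under assumptions: the record keys (`judge`, `p_consistent`) and the function name are hypothetical, and the critic's verdict is assumed to be `CONSISTENT` when `P(CONSISTENT) >= 0.5`, a threshold the card does not state.

```python
def summarize(records, threshold=0.5):
    """Compute agreement, P/R/F1 (CONSISTENT = positive) and the two class means."""
    # Skip AMBIGUOUS or failed judge verdicts and null critic scores.
    kept = [
        r for r in records
        if r["judge"] in ("CONSISTENT", "INCONSISTENT") and r["p_consistent"] is not None
    ]
    tp = fp = fn = tn = 0
    pos_scores, neg_scores = [], []
    for r in kept:
        gold_pos = r["judge"] == "CONSISTENT"
        pred_pos = r["p_consistent"] >= threshold  # assumed decision rule
        (pos_scores if gold_pos else neg_scores).append(r["p_consistent"])
        if gold_pos and pred_pos:
            tp += 1
        elif gold_pos:
            fn += 1
        elif pred_pos:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "n": len(kept),
        "agreement": ratio(tp + tn, len(kept)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "mean_p_given_c": ratio(sum(pos_scores), len(pos_scores)),
        "mean_p_given_ic": ratio(sum(neg_scores), len(neg_scores)),
    }
```

The spread discussed above is then `mean_p_given_c - mean_p_given_ic` for each planner and variant.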
| ## Intended use | |
| - Frozen reward model in GRPO/PPO planner fine-tuning where faithfulness of the (R, A, W) chain matters. | |
| - Offline auditing of candidate planner outputs. | |
| - Counterfactual-failure-mode analysis when paired with the variant ablation (S12 vs S123). | |
| ## Out-of-scope use | |
| - The critic is **not** a safety verifier. A `CONSISTENT` verdict means R/A/W are mutually self-consistent and consistent with the scene; it does **not** mean the trajectory is collision-free, comfortable, or legally compliant. | |
| - The critic was trained on a US-centric driving dataset; performance on non-US driving cultures, weather conditions, or sensor configurations not present in the training set is unverified. | |
| - Single-camera, single-frame input only: no temporal stack, no surround views. | |
| ## Limitations | |
| - Greedy decoding only in `generate` mode; the reward signal is best read via `logit` mode. | |
| - The critic occasionally produces `null` (parse / render failure) when calibration parquets or camera frames are missing; see `n_critic_failure` in the eval summaries. | |
| - Like the judge it's evaluated against, the critic can be confidently wrong on edge cases involving rare action combinations (lane-change-during-pull-over, etc.). | |
| ## Files | |
| ``` | |
| mjf-su/FaithfulnessCritic/ | |
| ├── GB-S12/ adapter_config.json + adapter_model.safetensors | |
| ├── GB-S123/ ... | |
| ├── GP-S12/ ... | |
| └── GP-S123/ ... | |
| ``` | |