Commit cab95b0 by mjf-su · verified · 1 Parent(s): 9be2be1

Create README.md

  1. README.md +165 -0
README.md ADDED
@@ -0,0 +1,165 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ library_name: peft
+ base_model: Qwen/Qwen3-VL-4B-Instruct
+ pipeline_tag: image-text-to-text
+ tags:
+ - vision-language
+ - autonomous-driving
+ - faithfulness
+ - critic
+ - lora
+ - grpo-reward
+ - waypoint-prediction
+ ---
+
+ # FaithfulnessCritic
+
+ LoRA adapters over **Qwen3-VL-4B-Instruct** that score whether a vision-language driving planner's **reasoning (R)**, **meta-action (A)**, and **24-step waypoint plan (W)** are mutually consistent given the camera scene.
+
+ The critic emits a single token directly after a forced `<verdict>` prefix; the score `P(CONSISTENT) ∈ (0,1)` is recovered by taking a softmax over the logits of the two single-token verdict words `CONSISTENT` and `INCONSISTENT`. The model is intended as a frozen reward signal during GRPO planner training and as an offline faithfulness-auditing tool.
+
+ ## Variants
+
+ The repo contains four adapter checkpoints under separate subfolders. They differ in (i) which **input class** the critic sees and (ii) which **counterfactual augmentation** strategies were used to construct the negative training examples.
+
+ | Subfolder | Input class | Negative strategies | Notes |
+ |---|---|---|---|
+ | `GB-S12` | BEV plot + speed profile | S1, S2 | Lighter: no scene-description corruption. |
+ | `GB-S123` | BEV plot + speed profile | S1, S2, S3 | All three failure modes. |
+ | `GP-S12` | Forward camera overlay + speed | S1, S2 | First-person view; uses calibration parquets. |
+ | `GP-S123` | Forward camera overlay + speed | S1, S2, S3 | All three failure modes. |
+
+ Where:
+ - **GB** = Gemini-curated dataset, **B**EV input.
+ - **GP** = Gemini-curated dataset, first-**P**erson input.
+ - **S1** (waypoint substitution): `W` is replaced with geometrically incompatible donor waypoints.
+ - **S2** (move-justification substitution): only `R.move_justification` is swapped from a donor.
+ - **S3** (scene-description substitution): `R.scene` is swapped from a different scene.
+
+ Validation sets always include all three strategies in equal proportions, regardless of training mix, so the variants are directly comparable on the same benchmark.
+
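The three substitution strategies can be illustrated with a minimal sketch. The dictionary layout (`R`, `A`, `W`, `label`) and the function name `make_negative` are hypothetical, chosen only for illustration; the geometric-incompatibility filter that S1 applies when choosing a donor is not modeled here.

```python
import copy


def make_negative(sample, donor, strategy):
    """Return a corrupted copy of `sample`, taking one field from `donor`.

    Field names are assumptions for this sketch, not the repository's schema.
    """
    neg = copy.deepcopy(sample)  # never mutate the positive example
    if strategy == "S1":
        # Waypoint substitution: the whole plan W comes from the donor.
        neg["W"] = copy.deepcopy(donor["W"])
    elif strategy == "S2":
        # Move-justification substitution: only R.move_justification changes.
        neg["R"]["move_justification"] = donor["R"]["move_justification"]
    elif strategy == "S3":
        # Scene-description substitution: only R.scene changes.
        neg["R"]["scene"] = donor["R"]["scene"]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    neg["label"] = "INCONSISTENT"
    return neg
```

Each strategy corrupts exactly one component and leaves the other two untouched, which is what lets the S12 and S123 variants be compared as an ablation.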
+ ## Quick start
+
+ Each subfolder is a standalone PEFT adapter. Load it on top of the base VLM:
+
+ ```python
+ import torch
+ from peft import PeftModel
+ from transformers import AutoModelForImageTextToText, AutoProcessor
+
+ BASE = "Qwen/Qwen3-VL-4B-Instruct"
+ ADAPTER = "mjf-su/FaithfulnessCritic"
+ SUBFOLDER = "GB-S12"  # or GB-S123, GP-S12, GP-S123
+
+ processor = AutoProcessor.from_pretrained(BASE, trust_remote_code=True)
+ processor.tokenizer.padding_side = "left"
+
+ base = AutoModelForImageTextToText.from_pretrained(
+     BASE, dtype=torch.bfloat16, trust_remote_code=True,
+ )
+ model = PeftModel.from_pretrained(base, ADAPTER, subfolder=SUBFOLDER)
+ model.eval().to("cuda")
+
+ # Build the chat-template prompt with image(s) + text and append "<verdict>"
+ # at the end so the next-token logits are over CONSISTENT / INCONSISTENT.
+ # See `critic_rewards.py:CriticRewardBase._build_prompt` for the full template
+ # and `_score_logit_mode` for the scoring path used to produce P(CONSISTENT).
+ ```
+
+ The reference end-to-end pipeline lives at https://github.com/mjf-su/fms4navigation under `critic_library/Gemini_samples/{BEV,fPOV}/`.
+
+ ## Inputs
+
+ A single tuple `(Image, R, A, W)`:
+ - **Image**: forward-facing camera frame of the driving scene.
+   - `GB-*` adapters consume a BEV trajectory plot + a speed-vs-time strip rendered purely from `W`.
+   - `GP-*` adapters consume the camera frame with `W` projected as a teal polyline (full calibration + egomotion required) plus the same speed strip.
+ - **R**: `<think>{ "scene": ..., "move_justification": ... }</think>`.
+ - **A**: `<action> Longitudinal: <label> | Lateral: <label> </action>` from the canonical 7-longitudinal × 11-lateral vocabulary.
+ - **W**: 24 lines of `<wp>[x, y, θ]</wp>`, vehicle-relative, 0.25 s spacing, 6 s horizon.
+
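As a rough illustration of the `A` and `W` formats listed above, the sketch below parses both tags with regular expressions. The helper names and the exact whitespace tolerance are assumptions, and placing the first waypoint at t = 0.25 s (so that the 24th lands on the 6 s horizon) is an inference from the stated spacing, not something the card specifies.

```python
import re

# Patterns follow the tag shapes given in the Inputs list; tolerance for
# extra whitespace is an assumption of this sketch.
WP_RE = re.compile(r"<wp>\[\s*([^,\]]+),\s*([^,\]]+),\s*([^,\]]+)\]</wp>")
ACTION_RE = re.compile(
    r"<action>\s*Longitudinal:\s*(.+?)\s*\|\s*Lateral:\s*(.+?)\s*</action>"
)

N_WAYPOINTS = 24  # 24-step plan
DT_S = 0.25       # seconds between consecutive waypoints


def parse_waypoints(text):
    """Extract (x, y, theta) tuples and check that exactly 24 are present."""
    wps = [tuple(float(v) for v in m.groups()) for m in WP_RE.finditer(text)]
    if len(wps) != N_WAYPOINTS:
        raise ValueError(f"expected {N_WAYPOINTS} waypoints, found {len(wps)}")
    return wps


def parse_action(text):
    """Return the (longitudinal, lateral) labels from an <action> tag."""
    m = ACTION_RE.search(text)
    if m is None:
        raise ValueError("no <action> tag found")
    return m.group(1), m.group(2)


def waypoint_times():
    """Timestamps in seconds, assuming the first waypoint sits at t = DT_S."""
    return [DT_S * (i + 1) for i in range(N_WAYPOINTS)]
```

Rejecting plans that do not contain exactly 24 waypoints is a design choice of this sketch; it catches truncated generations before they reach the renderer.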
+ ## Output
+
+ The critic emits a single token after a forced `<verdict>` prefix. Two scoring paths are supported:
+
+ | Mode | What it does | Range |
+ |---|---|---|
+ | `logit` (default) | Softmax over the two single-token verdict ids at the prompt's last position. | `P(CONSISTENT) ∈ (0,1)` |
+ | `generate` | Greedy-decode 8 tokens, regex-parse `CONSISTENT` / `INCONSISTENT`. | `{0.0, 0.5, 1.0}` |
+
+ Use `logit` mode for reward signals (smooth) and `generate` mode for human-readable verdicts.
+
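The two scoring paths can be sketched in plain Python, independent of the model. This is not the repository's `_score_logit_mode`; the function names are hypothetical, and the mapping of `generate` outputs to numbers (1.0 for `CONSISTENT`, 0.0 for `INCONSISTENT`, 0.5 when nothing parses) is an assumption consistent with the stated range.

```python
import math
import re


def score_logit(logit_consistent, logit_inconsistent):
    """Two-way softmax over the verdict-token logits, returning P(CONSISTENT)."""
    m = max(logit_consistent, logit_inconsistent)  # subtract max for stability
    e_c = math.exp(logit_consistent - m)
    e_ic = math.exp(logit_inconsistent - m)
    return e_c / (e_c + e_ic)


# Word boundaries matter: "INCONSISTENT" contains "CONSISTENT" as a substring.
VERDICT_RE = re.compile(r"\b(INCONSISTENT|CONSISTENT)\b")


def score_generate(decoded_text):
    """Map greedy-decoded text to a score in {0.0, 0.5, 1.0}.

    Assumed mapping: CONSISTENT -> 1.0, INCONSISTENT -> 0.0, unparseable -> 0.5.
    """
    m = VERDICT_RE.search(decoded_text)
    if m is None:
        return 0.5
    return 1.0 if m.group(1) == "CONSISTENT" else 0.0
```

Restricting the softmax to the two verdict ids ignores probability mass on every other token, so the score stays a smooth function of the two logits alone, which is what makes it usable as a reward.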
+ ## Training
+
+ - **Base**: Qwen/Qwen3-VL-4B-Instruct (frozen).
+ - **Adaptation**: LoRA (`r=256`, `lr=1e-4`).
+ - **Loss**: standard SFT next-token cross-entropy, supervising only the `CONSISTENT` / `INCONSISTENT` verdict token.
+ - **Positives**: ground-truth `(R, A, W)` triplets from a Gemini-curated subset of [PhysicalAI-Reason-US](https://huggingface.co/datasets/mjf-su/PhysicalAI-Reason-US).
+ - **Negatives**: counterfactual triplets built per strategy; donor eligibility requires both action axes to differ, different `scene_id`, same train/val split.
+
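The donor-eligibility rule for negatives can be written as a small predicate. The record keys (`longitudinal`, `lateral`, `scene_id`, `split`) are hypothetical names for the quantities mentioned in the rule; the actual dataset schema may differ.

```python
def is_eligible_donor(sample, donor):
    """Check the three stated conditions for using `donor` to corrupt `sample`.

    Keys are assumptions for this sketch, not the dataset's real column names.
    """
    return (
        donor["longitudinal"] != sample["longitudinal"]  # both action axes
        and donor["lateral"] != sample["lateral"]        # must differ
        and donor["scene_id"] != sample["scene_id"]      # different scene
        and donor["split"] == sample["split"]            # no train/val leakage
    )


def eligible_donors(sample, pool):
    """Filter a candidate pool down to the donors allowed for `sample`."""
    return [d for d in pool if is_eligible_donor(sample, d)]
```

Requiring both axes to differ, rather than either one, makes it less likely that a substituted component happens to remain consistent with the rest of the triplet.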
+ ## Evaluation
+
+ Each variant scored 125 randomly drawn (`seed=42`) planner outputs from two driving VLM planners, with `gemini-3-pro-preview` (few-shot, system-prompt + 6 worked examples) used as the LLM judge. Per-axis verdicts are aggregated to a single `overall ∈ {CONSISTENT, INCONSISTENT, AMBIGUOUS}`. **Agreement = accuracy treating Gemini's `overall` as ground truth**, computed on the subset where both Gemini and the critic returned a non-null verdict (Gemini parse failures and `AMBIGUOUS` are skipped).
+
+ ```
+ Planner        Critic   Agreement  P      R      F1     μP|C   μP|IC
+ ─────────────────────────────────────────────────────────────────────
+ MetaAction-1e  GB-S12   0.764      0.763  0.750  0.756  0.750  0.222
+ MetaAction-1e  GB-S123  0.724      0.732  0.683  0.707  0.683  0.238
+ MetaAction-1e  GP-S12   0.732      0.729  0.717  0.723  0.717  0.254
+ MetaAction-1e  GP-S123  0.732      0.737  0.700  0.718  0.700  0.238
+ ADEnReward     GB-S12   0.694      0.672  0.717  0.694  0.717  0.328
+ ADEnReward     GB-S123  0.653      0.644  0.633  0.639  0.633  0.328
+ ADEnReward     GP-S12   0.734      0.714  0.750  0.732  0.750  0.281
+ ADEnReward     GP-S123  0.694      0.696  0.650  0.672  0.650  0.266
+ ```
+
+ - **P / R / F1** treat `CONSISTENT` as the positive class.
+ - **μP\|C**: mean critic `P(CONSISTENT)` on Gemini-CONSISTENT records (higher is better).
+ - **μP\|IC**: mean critic `P(CONSISTENT)` on Gemini-INCONSISTENT records (lower is better). The spread `μP|C − μP|IC` ranges from about 0.31 to 0.53 across the eight rows (0.45–0.53 on MetaAction-1e), indicating that the critic separates the two classes on average despite a non-trivial decision-boundary error rate.
+
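The metrics in the table can be reproduced from per-record judge verdicts and critic scores along the following lines. This is a sketch, not the evaluation script: the function name and the 0.5 decision threshold for turning `P(CONSISTENT)` into a verdict are assumptions.

```python
def evaluate(judge, critic_p, threshold=0.5):
    """Agreement, P/R/F1 (CONSISTENT = positive) and the two conditional means.

    `judge` holds "CONSISTENT" / "INCONSISTENT" / "AMBIGUOUS" / None and
    `critic_p` holds P(CONSISTENT) or None. Records where either side is
    missing or AMBIGUOUS are skipped. The 0.5 threshold is an assumption.
    """
    pairs = [
        (j, p) for j, p in zip(judge, critic_p)
        if j in ("CONSISTENT", "INCONSISTENT") and p is not None
    ]
    tp = fp = fn = tn = 0
    for j, p in pairs:
        pred_pos = p >= threshold
        if j == "CONSISTENT":
            tp += pred_pos
            fn += not pred_pos
        else:
            fp += pred_pos
            tn += not pred_pos

    def ratio(num, den):
        return num / den if den else float("nan")

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    p_c = [p for j, p in pairs if j == "CONSISTENT"]
    p_ic = [p for j, p in pairs if j == "INCONSISTENT"]
    return {
        "n": len(pairs),
        "agreement": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "mu_p_c": ratio(sum(p_c), len(p_c)),
        "mu_p_ic": ratio(sum(p_ic), len(p_ic)),
    }
```

Because skipped records shrink the denominator, `n` is returned alongside the metrics so that agreement figures from different variants can be compared on their actual sample sizes.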
+ Best per planner: `GB-S12` for MetaAction-1e (0.764), `GP-S12` for ADEnReward (0.734). Adding S3 (scene-description corruption) to the training mix did not improve agreement on either planner in this benchmark.
+
+ ## Intended use
+
+ - Frozen reward model in GRPO/PPO planner fine-tuning where faithfulness of the (R, A, W) chain matters.
+ - Offline auditing of candidate planner outputs.
+ - Counterfactual-failure-mode analysis when paired with the variant ablation (S12 vs S123).
+
+ ## Out-of-scope use
+
+ - The critic is **not** a safety verifier. A `CONSISTENT` verdict means R/A/W are mutually consistent and consistent with the scene; it does **not** mean the trajectory is collision-free, comfortable, or legally compliant.
+ - The critic was trained on a US-centric driving dataset; performance on non-US driving cultures, weather conditions, or sensor configurations not present in the training set is unverified.
+ - Single-camera, single-frame input only: no temporal stack, no surround views.
+
+ ## Limitations
+
+ - Greedy decoding only in `generate` mode; the reward signal is best read via `logit` mode.
+ - The critic occasionally produces `null` (parse / render failure) when calibration parquets or camera frames are missing; see `n_critic_failure` in the eval summaries.
+ - Like the judge it's evaluated against, the critic can be confidently wrong on edge cases involving rare action combinations (lane-change-during-pull-over, etc.).
+
+ ## Files
+
+ ```
+ mjf-su/FaithfulnessCritic/
+ ├── GB-S12/    adapter_config.json + adapter_model.safetensors
+ ├── GB-S123/   ...
+ ├── GP-S12/    ...
+ └── GP-S123/   ...
+ ```
+
+ ## Citation
+
+ If you use this model, please cite it (and, where relevant, the upstream dataset and base model):
+
+ ```bibtex
+ @misc{foutter_faithfulnesscritic_2026,
+   title = {FaithfulnessCritic: counterfactual-trained R/A/W consistency critics for vision-based driving planners},
+   author = {Foutter, Matthew and Cercola, Marco and Gammelli, Daniele},
+   year = {2026},
+   howpublished = {\url{https://huggingface.co/mjf-su/FaithfulnessCritic}},
+ }
+ ```