ConorWang committed on
Commit 6311fba · verified · 1 Parent(s): 1d93299

Upload evidence binding adapter artifacts to evidence_adapter/

.gitattributes CHANGED
@@ -36,3 +36,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
  toolspec_adapter/tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
  uncertainty_adapter/tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ evidence_adapter/tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
evidence_adapter/adapter/README.md ADDED
@@ -0,0 +1,203 @@
+ ---
+ library_name: peft
+ tags:
+ - lora
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.19.0
evidence_adapter/adapter/adapter_config.json ADDED
@@ -0,0 +1,54 @@
+ {
+ "alora_invocation_tokens": null,
+ "alpha_pattern": {},
+ "arrow_config": null,
+ "auto_mapping": null,
+ "base_model_name_or_path": null,
+ "bias": "none",
+ "corda_config": null,
+ "ensure_weight_tying": false,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 16,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "lora_ga_config": null,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "peft_version": "0.19.0",
+ "qalora_group_size": 16,
+ "r": 8,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "surface_host.evidence_binding.adapter",
+ "surface_host.tool_receipt_binding.adapter",
+ "surface_host.citation_binding.adapter",
+ "surface_host.reverse_engineering_binding.adapter",
+ "surface_host.runtime_binding.adapter",
+ "surface_host.validator_receipt_bridge.adapter",
+ "surface_host.selfcheck_binding.adapter",
+ "surface_host.execution_binding.adapter",
+ "surface_host.patch_binding.adapter",
+ "surface_host.proof_carrying_hints.bridge",
+ "surface_host.provenance_binding.adapter",
+ "surface_host.worktree_binding.adapter",
+ "surface_host.claim_extractor.adapter"
+ ],
+ "target_parameters": null,
+ "task_type": "FEATURE_EXTRACTION",
+ "trainable_token_indices": null,
+ "use_bdlora": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
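A quick sanity check on the adapter_config.json above: with `use_rslora` disabled, standard LoRA scales the low-rank update by `lora_alpha / r`, which is 16 / 8 = 2.0 for this adapter. A minimal sketch using only the stdlib (no PEFT install required):

```python
import json

# Minimal subset of the adapter_config.json shown above.
config = json.loads("""{
  "peft_type": "LORA",
  "r": 8,
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "use_rslora": false
}""")

# Standard LoRA applies delta_W = (lora_alpha / r) * B @ A,
# so the effective scaling for this config is 16 / 8 = 2.0.
scaling = config["lora_alpha"] / config["r"]
print(scaling)  # 2.0
```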
evidence_adapter/adapter/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6d2e07f819fdde8c0954f3f6d21bd5877e26627d145d125a959669cfc6f9f38e
+ size 855584
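The three lines above are a Git LFS pointer file, not the safetensors binary itself: the real 855,584-byte weight file is fetched by its sha256 OID on checkout. A minimal parser sketch for the `key value` pointer format (real LFS clients do more validation than this):

```python
# Pointer text copied from the adapter_model.safetensors hunk above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:6d2e07f819fdde8c0954f3f6d21bd5877e26627d145d125a959669cfc6f9f38e
size 855584
"""

def parse_lfs_pointer(text: str) -> dict:
    # Each line is "<key> <value>"; split on the first space only,
    # then split the oid into hash algorithm and hex digest.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}

info = parse_lfs_pointer(pointer_text)
print(info["size"])  # 855584
```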
evidence_adapter/best_checkpoint_manifest.json ADDED
@@ -0,0 +1,29 @@
+ {
+ "best_epoch": 4,
+ "best_quality_score": 0.5592996196176252,
+ "eval_metrics": {
+ "avg_binary_accuracy": 0.8444444444444446,
+ "citation_binding_required_accuracy": 1.0,
+ "contradiction_visible_accuracy": 0.7692307692307693,
+ "count": 65,
+ "eval_batches": 65,
+ "eval_loss": 4.7871557712554935,
+ "execution_needed_accuracy": 0.8461538461538461,
+ "mode_accuracy": 0.5692307692307692,
+ "next_action_accuracy": 0.5538461538461539,
+ "patch_continuity_accuracy": 0.6461538461538462,
+ "proof_carrying_compatible_accuracy": 0.8,
+ "provenance_accuracy": 0.6461538461538462,
+ "quality_score": 0.5592996196176252,
+ "reverse_engineering_ready_accuracy": 0.8461538461538461,
+ "tool_selfcheck_needed_accuracy": 0.7692307692307693,
+ "validator_required_accuracy": 1.0,
+ "verdict_accuracy": 0.6307692307692307,
+ "worktree_safe_accuracy": 0.9230769230769231
+ },
+ "train_metrics": {
+ "loss": 0.23536107725985758,
+ "micro_batches": 182,
+ "optimizer_steps": 12
+ }
+ }
evidence_adapter/epoch_history.json ADDED
@@ -0,0 +1,124 @@
+ {
+ "epochs": [
+ {
+ "epoch": 1,
+ "eval_metrics": {
+ "avg_binary_accuracy": 0.823931623931624,
+ "citation_binding_required_accuracy": 1.0,
+ "contradiction_visible_accuracy": 0.6461538461538462,
+ "count": 65,
+ "eval_batches": 65,
+ "eval_loss": 5.821129791553204,
+ "execution_needed_accuracy": 0.8461538461538461,
+ "mode_accuracy": 0.4461538461538462,
+ "next_action_accuracy": 0.4307692307692308,
+ "patch_continuity_accuracy": 0.6153846153846154,
+ "proof_carrying_compatible_accuracy": 0.7692307692307693,
+ "provenance_accuracy": 0.6153846153846154,
+ "quality_score": 0.4452868058783377,
+ "reverse_engineering_ready_accuracy": 0.8461538461538461,
+ "tool_selfcheck_needed_accuracy": 0.7692307692307693,
+ "validator_required_accuracy": 1.0,
+ "verdict_accuracy": 0.47692307692307695,
+ "worktree_safe_accuracy": 0.9230769230769231
+ },
+ "improved": true,
+ "quality_score": 0.4452868058783377,
+ "train_metrics": {
+ "loss": 0.4255874376375597,
+ "micro_batches": 182,
+ "optimizer_steps": 12
+ }
+ },
+ {
+ "epoch": 2,
+ "eval_metrics": {
+ "avg_binary_accuracy": 0.8376068376068376,
+ "citation_binding_required_accuracy": 1.0,
+ "contradiction_visible_accuracy": 0.7384615384615385,
+ "count": 65,
+ "eval_batches": 65,
+ "eval_loss": 4.975605465815618,
+ "execution_needed_accuracy": 0.8461538461538461,
+ "mode_accuracy": 0.5538461538461539,
+ "next_action_accuracy": 0.5538461538461539,
+ "patch_continuity_accuracy": 0.6461538461538462,
+ "proof_carrying_compatible_accuracy": 0.7692307692307693,
+ "provenance_accuracy": 0.6,
+ "quality_score": 0.5249323351281322,
+ "reverse_engineering_ready_accuracy": 0.8461538461538461,
+ "tool_selfcheck_needed_accuracy": 0.7692307692307693,
+ "validator_required_accuracy": 1.0,
+ "verdict_accuracy": 0.5692307692307692,
+ "worktree_safe_accuracy": 0.9230769230769231
+ },
+ "improved": true,
+ "quality_score": 0.5249323351281322,
+ "train_metrics": {
+ "loss": 0.28522092685267164,
+ "micro_batches": 182,
+ "optimizer_steps": 12
+ }
+ },
+ {
+ "epoch": 3,
+ "eval_metrics": {
+ "avg_binary_accuracy": 0.8444444444444446,
+ "citation_binding_required_accuracy": 1.0,
+ "contradiction_visible_accuracy": 0.7692307692307693,
+ "count": 65,
+ "eval_batches": 65,
+ "eval_loss": 4.801326029117291,
+ "execution_needed_accuracy": 0.8461538461538461,
+ "mode_accuracy": 0.5538461538461539,
+ "next_action_accuracy": 0.5538461538461539,
+ "patch_continuity_accuracy": 0.6461538461538462,
+ "proof_carrying_compatible_accuracy": 0.8,
+ "provenance_accuracy": 0.6307692307692307,
+ "quality_score": 0.5544008298450047,
+ "reverse_engineering_ready_accuracy": 0.8461538461538461,
+ "tool_selfcheck_needed_accuracy": 0.7692307692307693,
+ "validator_required_accuracy": 1.0,
+ "verdict_accuracy": 0.6307692307692307,
+ "worktree_safe_accuracy": 0.9230769230769231
+ },
+ "improved": true,
+ "quality_score": 0.5544008298450047,
+ "train_metrics": {
+ "loss": 0.24240652721498038,
+ "micro_batches": 182,
+ "optimizer_steps": 12
+ }
+ },
+ {
+ "epoch": 4,
+ "eval_metrics": {
+ "avg_binary_accuracy": 0.8444444444444446,
+ "citation_binding_required_accuracy": 1.0,
+ "contradiction_visible_accuracy": 0.7692307692307693,
+ "count": 65,
+ "eval_batches": 65,
+ "eval_loss": 4.7871557712554935,
+ "execution_needed_accuracy": 0.8461538461538461,
+ "mode_accuracy": 0.5692307692307692,
+ "next_action_accuracy": 0.5538461538461539,
+ "patch_continuity_accuracy": 0.6461538461538462,
+ "proof_carrying_compatible_accuracy": 0.8,
+ "provenance_accuracy": 0.6461538461538462,
+ "quality_score": 0.5592996196176252,
+ "reverse_engineering_ready_accuracy": 0.8461538461538461,
+ "tool_selfcheck_needed_accuracy": 0.7692307692307693,
+ "validator_required_accuracy": 1.0,
+ "verdict_accuracy": 0.6307692307692307,
+ "worktree_safe_accuracy": 0.9230769230769231
+ },
+ "improved": true,
+ "quality_score": 0.5592996196176252,
+ "train_metrics": {
+ "loss": 0.23536107725985758,
+ "micro_batches": 182,
+ "optimizer_steps": 12
+ }
+ }
+ ]
+ }
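The history above marks every epoch `"improved": true`, and the best_checkpoint_manifest.json records `best_epoch: 4`; both are consistent with keeping the checkpoint with the highest `quality_score`. A sketch of that selection (the exact checkpointing rule of the trainer is an assumption; only the numbers are from the file):

```python
# Quality scores per epoch, copied from epoch_history.json above.
history = [
    {"epoch": 1, "quality_score": 0.4452868058783377},
    {"epoch": 2, "quality_score": 0.5249323351281322},
    {"epoch": 3, "quality_score": 0.5544008298450047},
    {"epoch": 4, "quality_score": 0.5592996196176252},
]

# Selecting the argmax over quality_score reproduces the manifest's
# best_epoch = 4 / best_quality_score = 0.5592996196176252.
best = max(history, key=lambda e: e["quality_score"])
print(best["epoch"])  # 4
```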
evidence_adapter/evidence_binding_adapter_plan.json ADDED
@@ -0,0 +1,223 @@
+ {
+ "backbone": "/public/wang_libo/veriloop_coder_e1/model",
+ "dataset_summary": {
+ "eval_size": 65,
+ "mode_vocab": [
+ "direct_support",
+ "multi_support",
+ "conflict_visible",
+ "evidence_gap",
+ "execution_needed",
+ "high_risk_unbound",
+ "validator_negation",
+ "patch_regression",
+ "worktree_conflict",
+ "tool_selfcheck_confirmed",
+ "tool_selfcheck_negated",
+ "reverse_engineering_bindable",
+ "reverse_engineering_gap"
+ ],
+ "modes": [
+ "conflict_visible",
+ "direct_support",
+ "evidence_gap",
+ "execution_needed",
+ "high_risk_unbound",
+ "multi_support",
+ "patch_regression",
+ "reverse_engineering_bindable",
+ "reverse_engineering_gap",
+ "tool_selfcheck_confirmed",
+ "tool_selfcheck_negated",
+ "validator_negation",
+ "worktree_conflict"
+ ],
+ "next_action_vocab": [
+ "none",
+ "validator_review",
+ "sandbox_exec",
+ "selfcheck_exec",
+ "bounded_observation",
+ "fail_closed",
+ "worktree_reconcile"
+ ],
+ "provenance_vocab": [
+ "inadequate",
+ "partial",
+ "adequate"
+ ],
+ "train_size": 182,
+ "verdict_vocab": [
+ "supported",
+ "conflicted",
+ "insufficient",
+ "execution_required"
+ ]
+ },
+ "excluded_surfaces": [
+ "(^|\\.)lm_head($|\\.)::Do not retune final token head; too broad and evaluation-heavy.",
+ "(^|\\.)embed_tokens($|\\.)::Embedding surgery risks broad semantic drift.",
+ "(^|\\.)norm($|\\.)::Global norm tuning can destabilize calibration across scenes.",
+ "attnres|attention_residual::Block AttnRes may be mounted structurally but is never a PEFT target.",
+ "dualpath::DualPath is serving/runtime infrastructure only.",
+ "mhc|hyper[-_]?connection::mHC-inspired stability hooks remain structural, not PEFT surfaces.",
+ "rope|rotary::RoPE/context surgery is handled architecturally, not by narrow PEFT here.",
+ "kvcache|kv_cache::KV-cache runtime surfaces are not PEFT targets.",
+ "(^|\\.)memory(_store|_bank)?($|\\.)::Persistent memory stores are harness/runtime policy surfaces, not PEFT targets."
+ ],
+ "notes": [
+ "Primary route is host-surface-first evidence-binding training.",
+ "Claim↔evidence fidelity, contradiction visibility, provenance discipline, validator receipts, execution-needed escalation, tool self-check compatibility, reverse-engineering boundedness, and proof-carrying hint obedience are first-class signals.",
+ "DualPath, Block AttnRes, mHC hooks, visual branches, and MoE routers/experts remain structurally excluded.",
+ "This adapter should improve evidence-gate obedience, not broad free-form coding behavior.",
+ "Target coverage is rooted in the full evidence-binding decision graph rather than the selector-only subset, so execution/tool/citation/reverse-engineering surfaces are not silently left untuned."
+ ],
+ "peft_method": "lora_narrow",
+ "product_line": "veriloop_coder",
+ "recipe": {
+ "adapter_family": "evidence_binding",
+ "backbone": "/public/wang_libo/veriloop_coder_e1/model",
+ "backbone_family": "qwen_dense",
+ "excluded_patterns": [
+ "(?i)\\bdualpath\\b",
+ "(?i)\\bmhc\\b",
+ "(?i)\\bfull[_\\- ]?attnres\\b",
+ "(?i)\\battnres(_full)?\\b",
+ "(?i)\\brouter\\b",
+ "(?i)\\bexperts?\\b",
+ "(?i)\\bmoe\\b.*\\b(gate|router|expert)\\b",
+ "(?i)\\brope\\b.*\\b(freq|inv_freq|theta|rotary)\\b",
+ "(?i)\\bkvcache\\b",
+ "(?i)\\bposition_embedding\\b",
+ "(?i)\\bembed(tokens|ding)?\\b",
+ "(?i)\\blm_head\\b"
+ ],
+ "harness_constraints": [
+ "Harness Engineering remains the primary convergence layer.",
+ "Adapter must not bypass runtime orchestrator / validator / rollback loops.",
+ "Adapter outputs remain subordinate to VeriLoop control-plane decisions.",
+ "Adapter must not create hidden prompt-style memory authority.",
+ "Adapter must support claim-evidence binding rather than generic retrieval verbosity.",
+ "Unbound claims must remain rejectable or demotable."
+ ],
+ "hyperparams": {
+ "alpha": 16,
+ "bias": "none",
+ "dropout": 0.05,
+ "fan_in_fan_out": false,
+ "modules_to_save": [],
+ "r": 8,
+ "task_type": "CAUSAL_LM"
+ },
+ "merge_policy": "merge_after_guard",
+ "metadata": {
+ "allow_backbone_bridge": false,
+ "allow_vla_action_expert": false,
+ "evidence_binding_training": true,
+ "harness_first": true,
+ "policy_target_floor_applied": true,
+ "prefer_explicit_heads": true,
+ "prefer_qlora_for_backbone_bridge": false,
+ "require_harness_first": true,
+ "reverse_engineering_readiness": true,
+ "selector_group_count": 1,
+ "strict_narrow_scope": true,
+ "tool_selfcheck_readiness": true,
+ "trainer": "veriloop.evidence_binding_adapter_trainer.v9.qwen36"
+ },
+ "notes": [
+ "Backbone bridge tuning disabled explicitly; selector stays on custom surfaces or no-op.",
+ "Backbone family inferred as qwen_dense.",
+ "PEFT method resolved as lora_narrow.",
+ "Recipe is harness-first: runtime convergence remains in VeriLoop control-plane + harness, not in broad weight surgery.",
+ "Block AttnRes, DualPath, mHC hooks, RoPE, KV-cache, and broad MoE routing remain structurally excluded."
+ ],
+ "peft_method": "lora_narrow",
+ "precision_policy": "auto",
+ "product_line": "veriloop_coder",
+ "regression_requirements": [
+ "Must pass PEFT regression guard structural policy checks.",
+ "Must not introduce forbidden backbone/serving structural targets.",
+ "Must preserve harness regression envelope for the selected product line.",
+ "Evidence-conclusion alignment must not regress.",
+ "High-risk fabrication rate must not increase."
+ ],
+ "target_groups": [
+ {
+ "alpha": 16,
+ "dropout": 0.0,
+ "name": "group_1_custom_control_head",
+ "rank": 8,
+ "rationale": "Evidence alignment should land on explicit binding surfaces first.",
+ "surface": "custom_control_head",
+ "target_modules": [
+ "claim_extractor.adapter",
+ "evidence_binding.adapter",
+ "proof_carrying_hints.bridge"
+ ]
+ },
+ {
+ "alpha": 16,
+ "dropout": 0.0,
+ "name": "group_policy_expanded_evidence_binding_surface_set",
+ "rank": 8,
+ "rationale": "Expand selector-narrow targets to the full host-side evidence-binding decision graph used by verdict/provenance/next-action heads.",
+ "surface": "policy_expanded_evidence_binding_surface_set",
+ "target_modules": [
+ "provenance_binding.adapter",
+ "validator_receipt_bridge.adapter",
+ "tool_receipt_binding.adapter",
+ "execution_binding.adapter",
+ "citation_binding.adapter",
+ "runtime_binding.adapter",
+ "selfcheck_binding.adapter",
+ "reverse_engineering_binding.adapter",
+ "patch_binding.adapter",
+ "worktree_binding.adapter"
+ ]
+ }
+ ],
+ "target_modules": [
+ "claim_extractor.adapter",
+ "evidence_binding.adapter",
+ "proof_carrying_hints.bridge",
+ "provenance_binding.adapter",
+ "validator_receipt_bridge.adapter",
+ "tool_receipt_binding.adapter",
+ "execution_binding.adapter",
+ "citation_binding.adapter",
+ "runtime_binding.adapter",
+ "selfcheck_binding.adapter",
+ "reverse_engineering_binding.adapter",
+ "patch_binding.adapter",
+ "worktree_binding.adapter"
+ ],
+ "version": "veriloop.lora_recipe_veriloop.v2"
+ },
+ "selected_surfaces": [
+ "custom_control_head",
+ "policy_expanded_evidence_binding_surface_set"
+ ],
+ "selected_target_modules": [
+ "claim_extractor.adapter",
+ "evidence_binding.adapter",
+ "proof_carrying_hints.bridge",
+ "provenance_binding.adapter",
+ "validator_receipt_bridge.adapter",
+ "tool_receipt_binding.adapter",
+ "execution_binding.adapter",
+ "citation_binding.adapter",
+ "runtime_binding.adapter",
+ "selfcheck_binding.adapter",
+ "reverse_engineering_binding.adapter",
+ "patch_binding.adapter",
+ "worktree_binding.adapter"
+ ],
+ "selection_mode": "minimal",
+ "version": "veriloop.evidence_binding_adapter_trainer.v9.qwen36",
+ "warnings": [
+ "Harness Engineering is primary; PEFT is limited to obedience-facing, interface-facing support surfaces.",
+ "Backbone bridge tuning disabled explicitly; selector stays on custom surfaces or no-op.",
+ "Selector target set was narrower than the evidence-binding decision graph; host-side policy floor expanded the PEFT targets."
+ ]
+ }
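Each `excluded_surfaces` entry in the plan packs a regex and its rationale into one `pattern::reason` string. A sketch of how such entries could be split and applied to module names; the trainer's exact matching semantics are an assumption, but the patterns themselves are copied from the plan:

```python
import re

# Two entries copied from excluded_surfaces above; the Python string
# escapes produce the same regexes the JSON encodes.
entries = [
    "(^|\\.)lm_head($|\\.)::Do not retune final token head; too broad and evaluation-heavy.",
    "kvcache|kv_cache::KV-cache runtime surfaces are not PEFT targets.",
]

# Split "pattern::reason" on the first "::" and precompile the patterns.
rules = [(re.compile(pattern), reason)
         for pattern, reason in (e.split("::", 1) for e in entries)]

def is_excluded(module_name: str) -> bool:
    # A module is excluded if any pattern matches anywhere in its name.
    return any(rx.search(module_name) for rx, _ in rules)

print(is_excluded("model.lm_head"))                          # True
print(is_excluded("surface_host.evidence_binding.adapter"))  # False
```

Note how `(^|\.)lm_head($|\.)` anchors on dotted-path boundaries, so a module like `my_lm_head_proxy` would not be caught, while `model.lm_head` is.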
evidence_adapter/evidence_binding_adapter_train_result.json ADDED
@@ -0,0 +1,338 @@
+ {
+ "artifacts": {
+ "adapter_dir": "/private/wang_libo/veriloop_coder_e1/outputs/evidence_binding_qwen36_run1/adapter",
+ "best_checkpoint_manifest": "/private/wang_libo/veriloop_coder_e1/outputs/evidence_binding_qwen36_run1/best_checkpoint_manifest.json",
+ "epoch_history": "/private/wang_libo/veriloop_coder_e1/outputs/evidence_binding_qwen36_run1/epoch_history.json",
+ "eval_jsonl": "/private/wang_libo/veriloop_coder_e1/outputs/evidence_binding_qwen36_run1/evidence_binding_eval.jsonl",
+ "evidence_binding_head": "/private/wang_libo/veriloop_coder_e1/outputs/evidence_binding_qwen36_run1/evidence_binding_head.pt",
+ "host_manifest": "/private/wang_libo/veriloop_coder_e1/outputs/evidence_binding_qwen36_run1/host_manifest.json",
+ "plan_json": "/private/wang_libo/veriloop_coder_e1/outputs/evidence_binding_qwen36_run1/evidence_binding_adapter_plan.json",
+ "tokenizer_dir": "/private/wang_libo/veriloop_coder_e1/outputs/evidence_binding_qwen36_run1/tokenizer",
+ "train_jsonl": "/private/wang_libo/veriloop_coder_e1/outputs/evidence_binding_qwen36_run1/evidence_binding_train.jsonl",
+ "training_manifest": "/private/wang_libo/veriloop_coder_e1/outputs/evidence_binding_qwen36_run1/evidence_binding_training_manifest.json"
+ },
+ "dataset": {
+ "eval_size": 65,
+ "mode_vocab": [
+ "direct_support",
+ "multi_support",
+ "conflict_visible",
+ "evidence_gap",
+ "execution_needed",
+ "high_risk_unbound",
+ "validator_negation",
+ "patch_regression",
+ "worktree_conflict",
+ "tool_selfcheck_confirmed",
+ "tool_selfcheck_negated",
+ "reverse_engineering_bindable",
+ "reverse_engineering_gap"
+ ],
+ "modes": [
+ "conflict_visible",
+ "direct_support",
+ "evidence_gap",
+ "execution_needed",
+ "high_risk_unbound",
+ "multi_support",
+ "patch_regression",
+ "reverse_engineering_bindable",
+ "reverse_engineering_gap",
+ "tool_selfcheck_confirmed",
+ "tool_selfcheck_negated",
+ "validator_negation",
+ "worktree_conflict"
+ ],
+ "next_action_vocab": [
+ "none",
+ "validator_review",
+ "sandbox_exec",
+ "selfcheck_exec",
+ "bounded_observation",
+ "fail_closed",
+ "worktree_reconcile"
+ ],
+ "provenance_vocab": [
+ "inadequate",
+ "partial",
+ "adequate"
+ ],
+ "train_size": 182,
+ "verdict_vocab": [
+ "supported",
+ "conflicted",
+ "insufficient",
+ "execution_required"
+ ]
+ },
+ "eval_metrics": {
+ "adapter_exported": true,
+ "auto_lora_from_ia3": false,
+ "avg_binary_accuracy": 0.8444444444444446,
+ "best_epoch": 4,
+ "best_quality_score": 0.5592996196176252,
+ "citation_binding_required_accuracy": 1.0,
+ "contradiction_visible_accuracy": 0.7692307692307693,
+ "count": 65,
+ "eval_batches": 65,
+ "eval_loss": 4.7871557712554935,
+ "execution_needed_accuracy": 0.8461538461538461,
+ "mode_accuracy": 0.5692307692307692,
+ "next_action_accuracy": 0.5538461538461539,
+ "patch_continuity_accuracy": 0.6461538461538462,
+ "peft_method": "lora_narrow",
+ "proof_carrying_compatible_accuracy": 0.8,
+ "provenance_accuracy": 0.6461538461538462,
+ "quality_score": 0.5592996196176252,
+ "reverse_engineering_ready_accuracy": 0.8461538461538461,
+ "tool_selfcheck_needed_accuracy": 0.7692307692307693,
+ "used_peft": true,
+ "validator_required_accuracy": 1.0,
+ "verdict_accuracy": 0.6307692307692307,
+ "worktree_safe_accuracy": 0.9230769230769231
+ },
+ "plan": {
+ "backbone": "/public/wang_libo/veriloop_coder_e1/model",
+ "dataset_summary": {
+ "eval_size": 65,
+ "mode_vocab": [
+ "direct_support",
+ "multi_support",
+ "conflict_visible",
+ "evidence_gap",
+ "execution_needed",
+ "high_risk_unbound",
+ "validator_negation",
+ "patch_regression",
+ "worktree_conflict",
+ "tool_selfcheck_confirmed",
+ "tool_selfcheck_negated",
+ "reverse_engineering_bindable",
+ "reverse_engineering_gap"
+ ],
+ "modes": [
+ "conflict_visible",
+ "direct_support",
+ "evidence_gap",
+ "execution_needed",
+ "high_risk_unbound",
+ "multi_support",
+ "patch_regression",
+ "reverse_engineering_bindable",
+ "reverse_engineering_gap",
+ "tool_selfcheck_confirmed",
+ "tool_selfcheck_negated",
+ "validator_negation",
+ "worktree_conflict"
+ ],
+ "next_action_vocab": [
+ "none",
+ "validator_review",
+ "sandbox_exec",
+ "selfcheck_exec",
+ "bounded_observation",
+ "fail_closed",
+ "worktree_reconcile"
+ ],
+ "provenance_vocab": [
+ "inadequate",
+ "partial",
+ "adequate"
+ ],
+ "train_size": 182,
+ "verdict_vocab": [
+ "supported",
+ "conflicted",
+ "insufficient",
+ "execution_required"
+ ]
+ },
+ "excluded_surfaces": [
+ "(^|\\.)lm_head($|\\.)::Do not retune final token head; too broad and evaluation-heavy.",
+ "(^|\\.)embed_tokens($|\\.)::Embedding surgery risks broad semantic drift.",
+ "(^|\\.)norm($|\\.)::Global norm tuning can destabilize calibration across scenes.",
+ "attnres|attention_residual::Block AttnRes may be mounted structurally but is never a PEFT target.",
+ "dualpath::DualPath is serving/runtime infrastructure only.",
+ "mhc|hyper[-_]?connection::mHC-inspired stability hooks remain structural, not PEFT surfaces.",
+ "rope|rotary::RoPE/context surgery is handled architecturally, not by narrow PEFT here.",
+ "kvcache|kv_cache::KV-cache runtime surfaces are not PEFT targets.",
+ "(^|\\.)memory(_store|_bank)?($|\\.)::Persistent memory stores are harness/runtime policy surfaces, not PEFT targets."
+ ],
+ "notes": [
+ "Primary route is host-surface-first evidence-binding training.",
+ "Claim↔evidence fidelity, contradiction visibility, provenance discipline, validator receipts, execution-needed escalation, tool self-check compatibility, reverse-engineering boundedness, and proof-carrying hint obedience are first-class signals.",
+ "DualPath, Block AttnRes, mHC hooks, visual branches, and MoE routers/experts remain structurally excluded.",
+ "This adapter should improve evidence-gate obedience, not broad free-form coding behavior.",
+ "Target coverage is rooted in the full evidence-binding decision graph rather than the selector-only subset, so execution/tool/citation/reverse-engineering surfaces are not silently left untuned."
+ ],
+ "peft_method": "lora_narrow",
+ "product_line": "veriloop_coder",
+ "recipe": {
+ "adapter_family": "evidence_binding",
+ "backbone": "/public/wang_libo/veriloop_coder_e1/model",
+ "backbone_family": "qwen_dense",
+ "excluded_patterns": [
+ "(?i)\\bdualpath\\b",
+ "(?i)\\bmhc\\b",
+ "(?i)\\bfull[_\\- ]?attnres\\b",
+ "(?i)\\battnres(_full)?\\b",
+ "(?i)\\brouter\\b",
+ "(?i)\\bexperts?\\b",
+ "(?i)\\bmoe\\b.*\\b(gate|router|expert)\\b",
+ "(?i)\\brope\\b.*\\b(freq|inv_freq|theta|rotary)\\b",
+ "(?i)\\bkvcache\\b",
+ "(?i)\\bposition_embedding\\b",
+ "(?i)\\bembed(tokens|ding)?\\b",
+ "(?i)\\blm_head\\b"
+ ],
+ "harness_constraints": [
+ "Harness Engineering remains the primary convergence layer.",
+ "Adapter must not bypass runtime orchestrator / validator / rollback loops.",
+ "Adapter outputs remain subordinate to VeriLoop control-plane decisions.",
+ "Adapter must not create hidden prompt-style memory authority.",
+ "Adapter must support claim-evidence binding rather than generic retrieval verbosity.",
+ "Unbound claims must remain rejectable or demotable."
+ ],
+ "hyperparams": {
+ "alpha": 16,
+ "bias": "none",
+ "dropout": 0.05,
+ "fan_in_fan_out": false,
+ "modules_to_save": [],
+ "r": 8,
+ "task_type": "CAUSAL_LM"
+ },
+ "merge_policy": "merge_after_guard",
+ "metadata": {
+ "allow_backbone_bridge": false,
+ "allow_vla_action_expert": false,
+ "evidence_binding_training": true,
+ "harness_first": true,
+ "policy_target_floor_applied": true,
+ "prefer_explicit_heads": true,
+ "prefer_qlora_for_backbone_bridge": false,
+ "require_harness_first": true,
+ "reverse_engineering_readiness": true,
+ "selector_group_count": 1,
+ "strict_narrow_scope": true,
+ "tool_selfcheck_readiness": true,
+ "trainer": "veriloop.evidence_binding_adapter_trainer.v9.qwen36"
+ },
+ "notes": [
+ "Backbone bridge tuning disabled explicitly; selector stays on custom surfaces or no-op.",
+ "Backbone family inferred as qwen_dense.",
224
+ "PEFT method resolved as lora_narrow.",
225
+ "Recipe is harness-first: runtime convergence remains in VeriLoop control-plane + harness, not in broad weight surgery.",
226
+ "Block AttnRes, DualPath, mHC hooks, RoPE, KV-cache, and broad MoE routing remain structurally excluded."
227
+ ],
228
+ "peft_method": "lora_narrow",
229
+ "precision_policy": "auto",
230
+ "product_line": "veriloop_coder",
231
+ "regression_requirements": [
232
+ "Must pass PEFT regression guard structural policy checks.",
233
+ "Must not introduce forbidden backbone/serving structural targets.",
234
+ "Must preserve harness regression envelope for the selected product line.",
235
+ "Evidence-conclusion alignment must not regress.",
236
+ "High-risk fabrication rate must not increase."
237
+ ],
238
+ "target_groups": [
239
+ {
240
+ "alpha": 16,
241
+ "dropout": 0.0,
242
+ "name": "group_1_custom_control_head",
243
+ "rank": 8,
244
+ "rationale": "Evidence alignment should land on explicit binding surfaces first.",
245
+ "surface": "custom_control_head",
246
+ "target_modules": [
247
+ "claim_extractor.adapter",
248
+ "evidence_binding.adapter",
249
+ "proof_carrying_hints.bridge"
250
+ ]
251
+ },
252
+ {
253
+ "alpha": 16,
254
+ "dropout": 0.0,
255
+ "name": "group_policy_expanded_evidence_binding_surface_set",
256
+ "rank": 8,
257
+ "rationale": "Expand selector-narrow targets to the full host-side evidence-binding decision graph used by verdict/provenance/next-action heads.",
258
+ "surface": "policy_expanded_evidence_binding_surface_set",
259
+ "target_modules": [
260
+ "provenance_binding.adapter",
261
+ "validator_receipt_bridge.adapter",
262
+ "tool_receipt_binding.adapter",
263
+ "execution_binding.adapter",
264
+ "citation_binding.adapter",
265
+ "runtime_binding.adapter",
266
+ "selfcheck_binding.adapter",
267
+ "reverse_engineering_binding.adapter",
268
+ "patch_binding.adapter",
269
+ "worktree_binding.adapter"
270
+ ]
271
+ }
272
+ ],
273
+ "target_modules": [
274
+ "claim_extractor.adapter",
275
+ "evidence_binding.adapter",
276
+ "proof_carrying_hints.bridge",
277
+ "provenance_binding.adapter",
278
+ "validator_receipt_bridge.adapter",
279
+ "tool_receipt_binding.adapter",
280
+ "execution_binding.adapter",
281
+ "citation_binding.adapter",
282
+ "runtime_binding.adapter",
283
+ "selfcheck_binding.adapter",
284
+ "reverse_engineering_binding.adapter",
285
+ "patch_binding.adapter",
286
+ "worktree_binding.adapter"
287
+ ],
288
+ "version": "veriloop.lora_recipe_veriloop.v2"
289
+ },
290
+ "selected_surfaces": [
291
+ "custom_control_head",
292
+ "policy_expanded_evidence_binding_surface_set"
293
+ ],
294
+ "selected_target_modules": [
295
+ "claim_extractor.adapter",
296
+ "evidence_binding.adapter",
297
+ "proof_carrying_hints.bridge",
298
+ "provenance_binding.adapter",
299
+ "validator_receipt_bridge.adapter",
300
+ "tool_receipt_binding.adapter",
301
+ "execution_binding.adapter",
302
+ "citation_binding.adapter",
303
+ "runtime_binding.adapter",
304
+ "selfcheck_binding.adapter",
305
+ "reverse_engineering_binding.adapter",
306
+ "patch_binding.adapter",
307
+ "worktree_binding.adapter"
308
+ ],
309
+ "selection_mode": "minimal",
310
+ "version": "veriloop.evidence_binding_adapter_trainer.v9.qwen36",
311
+ "warnings": [
312
+ "Harness Engineering is primary; PEFT is limited to obedience-facing, interface-facing support surfaces.",
313
+ "Backbone bridge tuning disabled explicitly; selector stays on custom surfaces or no-op.",
314
+ "Selector target set was narrower than the evidence-binding decision graph; host-side policy floor expanded the PEFT targets."
315
+ ]
316
+ },
317
+ "status": "trained",
318
+ "train_metrics": {
319
+ "adapter_exported": true,
320
+ "auto_lora_from_ia3": false,
321
+ "best_epoch": 4,
322
+ "best_quality_score": 0.5592996196176252,
323
+ "epochs_completed": 4,
324
+ "loss": 0.23536107725985758,
325
+ "micro_batches": 182,
326
+ "micro_batches_total": 728,
327
+ "optimizer_steps": 12,
328
+ "optimizer_steps_total": 48,
329
+ "peft_method": "lora_narrow",
330
+ "used_peft": true
331
+ },
332
+ "version": "veriloop.evidence_binding_adapter_trainer.v9.qwen36",
333
+ "warnings": [
334
+ "Harness Engineering is primary; PEFT is limited to obedience-facing, interface-facing support surfaces.",
335
+ "Backbone bridge tuning disabled explicitly; selector stays on custom surfaces or no-op.",
336
+ "Selector target set was narrower than the evidence-binding decision graph; host-side policy floor expanded the PEFT targets."
337
+ ]
338
+ }
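Each `excluded_surfaces` entry above packs a regex and its rationale into one string separated by `::`. A minimal sketch of how such a denylist could be enforced against candidate module names, assuming the host checks names with Python `re.search` (an assumption; the manifest does not specify the matcher):

```python
import re
from typing import Optional

# Regex::rationale pairs copied from the manifest's "excluded_surfaces" list (subset).
EXCLUDED_SURFACES = [
    r"(^|\.)lm_head($|\.)::Do not retune final token head; too broad and evaluation-heavy.",
    r"(^|\.)embed_tokens($|\.)::Embedding surgery risks broad semantic drift.",
    r"dualpath::DualPath is serving/runtime infrastructure only.",
]

def exclusion_reason(module_name: str) -> Optional[str]:
    """Return the rationale of the first matching exclusion pattern, else None."""
    for entry in EXCLUDED_SURFACES:
        pattern, rationale = entry.split("::", 1)
        if re.search(pattern, module_name):
            return rationale
    return None

print(exclusion_reason("lm_head"))                  # excluded, with rationale
print(exclusion_reason("claim_extractor.adapter"))  # None -> allowed PEFT target
```

The `(^|\.)…($|\.)` anchoring means `lm_head` is blocked whether it appears bare or dotted (e.g. `model.lm_head.weight`), while binding-head names like `claim_extractor.adapter` pass through.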
evidence_adapter/evidence_binding_eval.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
evidence_adapter/evidence_binding_head.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e85ad23a3e89f3d07cbbe6352f185277569160e403a41acdbf36f3fb1e182bc0
+ size 340358929
evidence_adapter/evidence_binding_train.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
evidence_adapter/evidence_binding_training_manifest.json ADDED
@@ -0,0 +1,120 @@
+ {
+ "adapter_exported": true,
+ "dataset_summary": {
+ "eval_size": 65,
+ "mode_vocab": [
+ "direct_support",
+ "multi_support",
+ "conflict_visible",
+ "evidence_gap",
+ "execution_needed",
+ "high_risk_unbound",
+ "validator_negation",
+ "patch_regression",
+ "worktree_conflict",
+ "tool_selfcheck_confirmed",
+ "tool_selfcheck_negated",
+ "reverse_engineering_bindable",
+ "reverse_engineering_gap"
+ ],
+ "modes": [
+ "conflict_visible",
+ "direct_support",
+ "evidence_gap",
+ "execution_needed",
+ "high_risk_unbound",
+ "multi_support",
+ "patch_regression",
+ "reverse_engineering_bindable",
+ "reverse_engineering_gap",
+ "tool_selfcheck_confirmed",
+ "tool_selfcheck_negated",
+ "validator_negation",
+ "worktree_conflict"
+ ],
+ "next_action_vocab": [
+ "none",
+ "validator_review",
+ "sandbox_exec",
+ "selfcheck_exec",
+ "bounded_observation",
+ "fail_closed",
+ "worktree_reconcile"
+ ],
+ "provenance_vocab": [
+ "inadequate",
+ "partial",
+ "adequate"
+ ],
+ "train_size": 182,
+ "verdict_vocab": [
+ "supported",
+ "conflicted",
+ "insufficient",
+ "execution_required"
+ ]
+ },
+ "eval_metrics": {
+ "adapter_exported": true,
+ "auto_lora_from_ia3": false,
+ "avg_binary_accuracy": 0.8444444444444446,
+ "best_epoch": 4,
+ "best_quality_score": 0.5592996196176252,
+ "citation_binding_required_accuracy": 1.0,
+ "contradiction_visible_accuracy": 0.7692307692307693,
+ "count": 65,
+ "eval_batches": 65,
+ "eval_loss": 4.7871557712554935,
+ "execution_needed_accuracy": 0.8461538461538461,
+ "mode_accuracy": 0.5692307692307692,
+ "next_action_accuracy": 0.5538461538461539,
+ "patch_continuity_accuracy": 0.6461538461538462,
+ "peft_method": "lora_narrow",
+ "proof_carrying_compatible_accuracy": 0.8,
+ "provenance_accuracy": 0.6461538461538462,
+ "quality_score": 0.5592996196176252,
+ "reverse_engineering_ready_accuracy": 0.8461538461538461,
+ "tool_selfcheck_needed_accuracy": 0.7692307692307693,
+ "used_peft": true,
+ "validator_required_accuracy": 1.0,
+ "verdict_accuracy": 0.6307692307692307,
+ "worktree_safe_accuracy": 0.9230769230769231
+ },
+ "load_meta": {
+ "chosen_class": "AutoModelForCausalLM",
+ "hidden_size": 2048,
+ "quantization_mode": "4bit"
+ },
+ "peft_method": "lora_narrow",
+ "selected_target_modules": [
+ "claim_extractor.adapter",
+ "evidence_binding.adapter",
+ "proof_carrying_hints.bridge",
+ "provenance_binding.adapter",
+ "validator_receipt_bridge.adapter",
+ "tool_receipt_binding.adapter",
+ "execution_binding.adapter",
+ "citation_binding.adapter",
+ "runtime_binding.adapter",
+ "selfcheck_binding.adapter",
+ "reverse_engineering_binding.adapter",
+ "patch_binding.adapter",
+ "worktree_binding.adapter"
+ ],
+ "status": "trained",
+ "train_metrics": {
+ "adapter_exported": true,
+ "auto_lora_from_ia3": false,
+ "best_epoch": 4,
+ "best_quality_score": 0.5592996196176252,
+ "epochs_completed": 4,
+ "loss": 0.23536107725985758,
+ "micro_batches": 182,
+ "micro_batches_total": 728,
+ "optimizer_steps": 12,
+ "optimizer_steps_total": 48,
+ "peft_method": "lora_narrow",
+ "used_peft": true
+ },
+ "used_peft": true
+ }
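The `avg_binary_accuracy` in `eval_metrics` appears to be the plain mean of the nine binary-head accuracies reported alongside it (my reading of the metric names; the manifest does not document the formula). A quick arithmetic check:

```python
# Binary-head accuracies copied from the eval_metrics block above.
binary_heads = {
    "citation_binding_required_accuracy": 1.0,
    "contradiction_visible_accuracy": 0.7692307692307693,
    "execution_needed_accuracy": 0.8461538461538461,
    "patch_continuity_accuracy": 0.6461538461538462,
    "proof_carrying_compatible_accuracy": 0.8,
    "reverse_engineering_ready_accuracy": 0.8461538461538461,
    "tool_selfcheck_needed_accuracy": 0.7692307692307693,
    "validator_required_accuracy": 1.0,
    "worktree_safe_accuracy": 0.9230769230769231,
}

# Unweighted mean over the nine heads; 7.6 / 9 = 0.8444...
avg = sum(binary_heads.values()) / len(binary_heads)
print(avg)  # matches the reported avg_binary_accuracy of 0.8444444444444446
```

The multi-class heads (`mode_accuracy`, `verdict_accuracy`, `provenance_accuracy`, `next_action_accuracy`) are reported separately and do not enter this mean.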
evidence_adapter/host_manifest.json ADDED
@@ -0,0 +1,268 @@
+ {
+ "hidden_size": 2048,
+ "host_config": {
+ "attn_implementation": null,
+ "backbone_name_or_path": "/public/wang_libo/veriloop_coder_e1/model",
+ "device_map": null,
+ "dtype": null,
+ "evidence_rank_hint": 8,
+ "expose_backbone_inventory": false,
+ "freeze_backbone": true,
+ "hidden_size_override": 2048,
+ "host_dropout": 0.0,
+ "identity_rank_hint": 8,
+ "load_backbone_weights": false,
+ "local_files_only": true,
+ "low_cpu_mem_usage": true,
+ "memory_rank_hint": 4,
+ "rollback_rank_hint": 8,
+ "runtime_rank_hint": 8,
+ "toolspec_rank_hint": 8,
+ "trust_remote_code": true,
+ "uncertainty_rank_hint": 8,
+ "use_safetensors": null,
+ "validator_rank_hint": 8
+ },
+ "load_result": {
+ "has_base_config": true,
+ "has_base_model": true,
+ "hidden_size": 2048,
+ "notes": [
+ "class=AutoModelForCausalLM",
+ "quant=4bit"
+ ],
+ "source": "trainer_qwen36_loader"
+ },
+ "peft_named_modules": [
+ "citation_binding",
+ "citation_binding.adapter",
+ "citation_binding.adapter.base_layer",
+ "citation_binding.adapter.lora_A",
+ "citation_binding.adapter.lora_A.default",
+ "citation_binding.adapter.lora_B",
+ "citation_binding.adapter.lora_B.default",
+ "citation_binding.adapter.lora_dropout",
+ "citation_binding.adapter.lora_dropout.default",
+ "citation_binding.adapter.lora_embedding_A",
+ "citation_binding.adapter.lora_embedding_B",
+ "citation_binding.adapter.lora_magnitude_vector",
+ "claim_extractor",
+ "claim_extractor.adapter",
+ "claim_extractor.adapter.base_layer",
+ "claim_extractor.adapter.lora_A",
+ "claim_extractor.adapter.lora_A.default",
+ "claim_extractor.adapter.lora_B",
+ "claim_extractor.adapter.lora_B.default",
+ "claim_extractor.adapter.lora_dropout",
+ "claim_extractor.adapter.lora_dropout.default",
+ "claim_extractor.adapter.lora_embedding_A",
+ "claim_extractor.adapter.lora_embedding_B",
+ "claim_extractor.adapter.lora_magnitude_vector",
+ "dropout",
+ "episodic_memory",
+ "episodic_memory.adapter",
+ "evidence_binding",
+ "evidence_binding.adapter",
+ "evidence_binding.adapter.base_layer",
+ "evidence_binding.adapter.lora_A",
+ "evidence_binding.adapter.lora_A.default",
+ "evidence_binding.adapter.lora_B",
+ "evidence_binding.adapter.lora_B.default",
+ "evidence_binding.adapter.lora_dropout",
+ "evidence_binding.adapter.lora_dropout.default",
+ "evidence_binding.adapter.lora_embedding_A",
+ "evidence_binding.adapter.lora_embedding_B",
+ "evidence_binding.adapter.lora_magnitude_vector",
+ "execution_binding",
+ "execution_binding.adapter",
+ "execution_binding.adapter.base_layer",
+ "execution_binding.adapter.lora_A",
+ "execution_binding.adapter.lora_A.default",
+ "execution_binding.adapter.lora_B",
+ "execution_binding.adapter.lora_B.default",
+ "execution_binding.adapter.lora_dropout",
+ "execution_binding.adapter.lora_dropout.default",
+ "execution_binding.adapter.lora_embedding_A",
+ "execution_binding.adapter.lora_embedding_B",
+ "execution_binding.adapter.lora_magnitude_vector",
+ "failure_signal_bridge",
+ "failure_signal_bridge.rollback_bridge",
+ "identity_adapter",
+ "identity_adapter.bridge",
+ "identity_guard",
+ "identity_guard.adapter",
+ "input_norm",
+ "memory_boundary_guard",
+ "memory_boundary_guard.adapter",
+ "memory_boundary_guard.rollback_filter",
+ "patch_binding",
+ "patch_binding.adapter",
+ "patch_binding.adapter.base_layer",
+ "patch_binding.adapter.lora_A",
+ "patch_binding.adapter.lora_A.default",
+ "patch_binding.adapter.lora_B",
+ "patch_binding.adapter.lora_B.default",
+ "patch_binding.adapter.lora_dropout",
+ "patch_binding.adapter.lora_dropout.default",
+ "patch_binding.adapter.lora_embedding_A",
+ "patch_binding.adapter.lora_embedding_B",
+ "patch_binding.adapter.lora_magnitude_vector",
+ "permission_context_manager",
+ "permission_context_manager.adapter",
+ "progress_state_tracker",
+ "progress_state_tracker.adapter",
+ "progress_state_tracker.rollback_memory",
+ "proof_carrying_hints",
+ "proof_carrying_hints.bridge",
+ "proof_carrying_hints.bridge.base_layer",
+ "proof_carrying_hints.bridge.lora_A",
+ "proof_carrying_hints.bridge.lora_A.default",
+ "proof_carrying_hints.bridge.lora_B",
+ "proof_carrying_hints.bridge.lora_B.default",
+ "proof_carrying_hints.bridge.lora_dropout",
+ "proof_carrying_hints.bridge.lora_dropout.default",
+ "proof_carrying_hints.bridge.lora_embedding_A",
+ "proof_carrying_hints.bridge.lora_embedding_B",
+ "proof_carrying_hints.bridge.lora_magnitude_vector",
+ "provenance_binding",
+ "provenance_binding.adapter",
+ "provenance_binding.adapter.base_layer",
+ "provenance_binding.adapter.lora_A",
+ "provenance_binding.adapter.lora_A.default",
+ "provenance_binding.adapter.lora_B",
+ "provenance_binding.adapter.lora_B.default",
+ "provenance_binding.adapter.lora_dropout",
+ "provenance_binding.adapter.lora_dropout.default",
+ "provenance_binding.adapter.lora_embedding_A",
+ "provenance_binding.adapter.lora_embedding_B",
+ "provenance_binding.adapter.lora_magnitude_vector",
+ "public_identity_head",
+ "public_identity_head.proj",
+ "query_runtime_engine",
+ "query_runtime_engine.adapter",
+ "request_normalizer",
+ "request_normalizer.adapter",
+ "reverse_engineering_binding",
+ "reverse_engineering_binding.adapter",
+ "reverse_engineering_binding.adapter.base_layer",
+ "reverse_engineering_binding.adapter.lora_A",
+ "reverse_engineering_binding.adapter.lora_A.default",
+ "reverse_engineering_binding.adapter.lora_B",
+ "reverse_engineering_binding.adapter.lora_B.default",
+ "reverse_engineering_binding.adapter.lora_dropout",
+ "reverse_engineering_binding.adapter.lora_dropout.default",
+ "reverse_engineering_binding.adapter.lora_embedding_A",
+ "reverse_engineering_binding.adapter.lora_embedding_B",
+ "reverse_engineering_binding.adapter.lora_magnitude_vector",
+ "rollback_adapter",
+ "rollback_adapter.head",
+ "rollback_engine",
+ "rollback_engine.adapter",
+ "runtime_binding",
+ "runtime_binding.adapter",
+ "runtime_binding.adapter.base_layer",
+ "runtime_binding.adapter.lora_A",
+ "runtime_binding.adapter.lora_A.default",
+ "runtime_binding.adapter.lora_B",
+ "runtime_binding.adapter.lora_B.default",
+ "runtime_binding.adapter.lora_dropout",
+ "runtime_binding.adapter.lora_dropout.default",
+ "runtime_binding.adapter.lora_embedding_A",
+ "runtime_binding.adapter.lora_embedding_B",
+ "runtime_binding.adapter.lora_magnitude_vector",
+ "runtime_harness_adapter",
+ "runtime_harness_adapter.bridge",
+ "runtime_harness_uncertainty_bridge",
+ "runtime_harness_uncertainty_bridge.adapter",
+ "sandbox_rollback_bridge",
+ "sandbox_rollback_bridge.adapter",
+ "selfcheck_binding",
+ "selfcheck_binding.adapter",
+ "selfcheck_binding.adapter.base_layer",
+ "selfcheck_binding.adapter.lora_A",
+ "selfcheck_binding.adapter.lora_A.default",
+ "selfcheck_binding.adapter.lora_B",
+ "selfcheck_binding.adapter.lora_B.default",
+ "selfcheck_binding.adapter.lora_dropout",
+ "selfcheck_binding.adapter.lora_dropout.default",
+ "selfcheck_binding.adapter.lora_embedding_A",
+ "selfcheck_binding.adapter.lora_embedding_B",
+ "selfcheck_binding.adapter.lora_magnitude_vector",
+ "session_compactor",
+ "session_compactor.adapter",
+ "session_state_manager",
+ "session_state_manager.adapter",
+ "session_state_manager.rollback_state",
+ "tool_protocol_adapter",
+ "tool_protocol_adapter.bridge",
+ "tool_receipt_binding",
+ "tool_receipt_binding.adapter",
+ "tool_receipt_binding.adapter.base_layer",
+ "tool_receipt_binding.adapter.lora_A",
+ "tool_receipt_binding.adapter.lora_A.default",
+ "tool_receipt_binding.adapter.lora_B",
+ "tool_receipt_binding.adapter.lora_B.default",
+ "tool_receipt_binding.adapter.lora_dropout",
+ "tool_receipt_binding.adapter.lora_dropout.default",
+ "tool_receipt_binding.adapter.lora_embedding_A",
+ "tool_receipt_binding.adapter.lora_embedding_B",
+ "tool_receipt_binding.adapter.lora_magnitude_vector",
+ "toolspec_bridge",
+ "toolspec_bridge.adapter",
+ "toolspec_head",
+ "toolspec_head.param_schema_adapter",
+ "toolspec_head.postcondition_adapter",
+ "toolspec_head.precondition_adapter",
+ "toolspec_head.receipt_formatter",
+ "toolspec_head.trigger_gate",
+ "toolspec_head.validator_gate",
+ "uncertainty_head",
+ "uncertainty_head.calibration_mlp",
+ "uncertainty_head.proj",
+ "validator_feedback_bridge",
+ "validator_feedback_bridge.adapter",
+ "validator_feedback_loop",
+ "validator_feedback_loop.rollback_adapter",
+ "validator_receipt_bridge",
+ "validator_receipt_bridge.adapter",
+ "validator_receipt_bridge.adapter.base_layer",
+ "validator_receipt_bridge.adapter.lora_A",
+ "validator_receipt_bridge.adapter.lora_A.default",
+ "validator_receipt_bridge.adapter.lora_B",
+ "validator_receipt_bridge.adapter.lora_B.default",
+ "validator_receipt_bridge.adapter.lora_dropout",
+ "validator_receipt_bridge.adapter.lora_dropout.default",
+ "validator_receipt_bridge.adapter.lora_embedding_A",
+ "validator_receipt_bridge.adapter.lora_embedding_B",
+ "validator_receipt_bridge.adapter.lora_magnitude_vector",
+ "validator_uncertainty_bridge",
+ "validator_uncertainty_bridge.adapter",
+ "workspace_snapshot_manager",
+ "workspace_snapshot_manager.rollback_context",
+ "worktree_binding",
+ "worktree_binding.adapter",
+ "worktree_binding.adapter.base_layer",
+ "worktree_binding.adapter.lora_A",
+ "worktree_binding.adapter.lora_A.default",
+ "worktree_binding.adapter.lora_B",
+ "worktree_binding.adapter.lora_B.default",
+ "worktree_binding.adapter.lora_dropout",
+ "worktree_binding.adapter.lora_dropout.default",
+ "worktree_binding.adapter.lora_embedding_A",
+ "worktree_binding.adapter.lora_embedding_B",
+ "worktree_binding.adapter.lora_magnitude_vector",
+ "worktree_manager",
+ "worktree_manager.adapter"
+ ],
+ "trainable_parameter_report": {
+ "backbone_frozen": true,
+ "backbone_present": true,
+ "hidden_size": 2048,
+ "host_parameters": 197668869,
+ "host_trainable_parameters": 425984,
+ "total_parameters": 34153726597,
+ "trainable_parameters": 425984,
+ "version": "veriloop.coder_peft_host.v1"
+ },
+ "version": "veriloop.coder_peft_host.v1"
+ }
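The trainable-parameter count in the report is consistent with rank-8 LoRA pairs mounted on the thirteen selected target modules, each operating at the host's 2048-dim hidden size: every LoRA pair adds an A matrix (r × hidden) plus a B matrix (hidden × r). A sanity check, assuming square 2048→2048 adapter surfaces (which the manifest implies but does not state explicitly):

```python
hidden_size = 2048   # "hidden_size" from host_manifest.json
rank = 8             # LoRA "r" from the recipe hyperparams
num_targets = 13     # length of selected_target_modules

# Each LoRA pair contributes A (rank x hidden) plus B (hidden x rank) parameters.
per_module = rank * hidden_size + hidden_size * rank  # 32768
total_trainable = per_module * num_targets
print(total_trainable)  # 425984, matching "trainable_parameters"

# Fraction of the full 34B-parameter host that is actually trained.
fraction = total_trainable / 34153726597  # "total_parameters" from the report
print(f"{fraction:.2e}")
```

So roughly one part in eighty thousand of the model is trainable, which matches the recipe's "narrow scope" framing.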
evidence_adapter/tokenizer/chat_template.jinja ADDED
@@ -0,0 +1,154 @@
+ {%- set image_count = namespace(value=0) %}
+ {%- set video_count = namespace(value=0) %}
+ {%- macro render_content(content, do_vision_count, is_system_content=false) %}
+ {%- if content is string %}
+ {{- content }}
+ {%- elif content is iterable and content is not mapping %}
+ {%- for item in content %}
+ {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
+ {%- if is_system_content %}
+ {{- raise_exception('System message cannot contain images.') }}
+ {%- endif %}
+ {%- if do_vision_count %}
+ {%- set image_count.value = image_count.value + 1 %}
+ {%- endif %}
+ {%- if add_vision_id %}
+ {{- 'Picture ' ~ image_count.value ~ ': ' }}
+ {%- endif %}
+ {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
+ {%- elif 'video' in item or item.type == 'video' %}
+ {%- if is_system_content %}
+ {{- raise_exception('System message cannot contain videos.') }}
+ {%- endif %}
+ {%- if do_vision_count %}
+ {%- set video_count.value = video_count.value + 1 %}
+ {%- endif %}
+ {%- if add_vision_id %}
+ {{- 'Video ' ~ video_count.value ~ ': ' }}
+ {%- endif %}
+ {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
+ {%- elif 'text' in item %}
+ {{- item.text }}
+ {%- else %}
+ {{- raise_exception('Unexpected item type in content.') }}
+ {%- endif %}
+ {%- endfor %}
+ {%- elif content is none or content is undefined %}
+ {{- '' }}
+ {%- else %}
+ {{- raise_exception('Unexpected content type.') }}
+ {%- endif %}
+ {%- endmacro %}
+ {%- if not messages %}
+ {{- raise_exception('No messages provided.') }}
+ {%- endif %}
+ {%- if tools and tools is iterable and tools is not mapping %}
+ {{- '<|im_start|>system\n' }}
+ {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>" }}
+ {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+ {%- if messages[0].role == 'system' %}
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
+ {%- if content %}
+ {{- '\n\n' + content }}
+ {%- endif %}
+ {%- endif %}
+ {{- '<|im_end|>\n' }}
+ {%- else %}
+ {%- if messages[0].role == 'system' %}
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
+ {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+ {%- for message in messages[::-1] %}
+ {%- set index = (messages|length - 1) - loop.index0 %}
+ {%- if ns.multi_step_tool and message.role == "user" %}
+ {%- set content = render_content(message.content, false)|trim %}
+ {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
+ {%- set ns.multi_step_tool = false %}
+ {%- set ns.last_query_index = index %}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if ns.multi_step_tool %}
+ {{- raise_exception('No user query found in messages.') }}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- set content = render_content(message.content, true)|trim %}
+ {%- if message.role == "system" %}
+ {%- if not loop.first %}
+ {{- raise_exception('System message must be at the beginning.') }}
+ {%- endif %}
+ {%- elif message.role == "user" %}
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {%- set reasoning_content = '' %}
+ {%- if message.reasoning_content is string %}
+ {%- set reasoning_content = message.reasoning_content %}
+ {%- else %}
+ {%- if '</think>' in content %}
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+ {%- endif %}
+ {%- endif %}
+ {%- set reasoning_content = reasoning_content|trim %}
+ {%- if (preserve_thinking is defined and preserve_thinking is true) or (loop.index0 > ns.last_query_index) %}
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
+ {%- else %}
+ {{- '<|im_start|>' + message.role + '\n' + content }}
+ {%- endif %}
+ {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {%- if loop.first %}
+ {%- if content|trim %}
+ {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- else %}
+ {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- endif %}
+ {%- else %}
+ {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- endif %}
+ {%- if tool_call.arguments is defined %}
+ {%- for args_name, args_value in tool_call.arguments|items %}
+ {{- '<parameter=' + args_name + '>\n' }}
+ {%- set args_value = args_value | string if args_value is string else args_value | tojson | safe %}
+ {{- args_value }}
+ {{- '\n</parameter>\n' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '</function>\n</tool_call>' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if loop.previtem and loop.previtem.role != "tool" %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- content }}
+ {{- '\n</tool_response>' }}
+ {%- if not loop.last and loop.nextitem.role != "tool" %}
+ {{- '<|im_end|>\n' }}
+ {%- elif loop.last %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- else %}
+ {{- raise_exception('Unexpected message role.') }}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- if enable_thinking is defined and enable_thinking is false %}
+ {{- '<think>\n\n</think>\n\n' }}
+ {%- else %}
+ {{- '<think>\n' }}
+ {%- endif %}
+ {%- endif %}
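For a plain text-only system+user conversation with a generation prompt, the template above reduces to a simple ChatML-style concatenation. A hand-traced sketch of that one path (traced from the template by eye, not produced by the tokenizer itself; the tool, vision, and multi-turn branches are omitted):

```python
def render_simple(system: str, user: str, enable_thinking: bool = True) -> str:
    """Hand-traced rendering of the template's text-only system+user path."""
    out = f"<|im_start|>system\n{system}<|im_end|>\n"
    out += f"<|im_start|>user\n{user}<|im_end|>\n"
    # add_generation_prompt branch at the end of the template:
    # thinking enabled leaves <think> open; disabled emits an empty think block.
    out += "<|im_start|>assistant\n"
    out += "<think>\n" if enable_thinking else "<think>\n\n</think>\n\n"
    return out

print(render_simple("You are a careful coder.", "Bind claims to evidence."))
```

The real rendering should go through `apply_chat_template` on the packaged tokenizer; this sketch only makes the token layout visible.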
evidence_adapter/tokenizer/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ea2e66b594a0906e9a547c9e6ff9e5fb8a8198439c8cf7d6dc48f23529161223
+ size 19989442
evidence_adapter/tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "add_prefix_space": false,
+ "audio_bos_token": "<|audio_start|>",
+ "audio_eos_token": "<|audio_end|>",
+ "audio_token": "<|audio_pad|>",
+ "backend": "tokenizers",
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "image_token": "<|image_pad|>",
+ "is_local": true,
+ "model_max_length": 262144,
+ "model_specific_special_tokens": {
+ "audio_bos_token": "<|audio_start|>",
+ "audio_eos_token": "<|audio_end|>",
+ "audio_token": "<|audio_pad|>",
+ "image_token": "<|image_pad|>",
+ "video_token": "<|video_pad|>",
+ "vision_bos_token": "<|vision_start|>",
+ "vision_eos_token": "<|vision_end|>"
+ },
+ "pad_token": "<|endoftext|>",
+ "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
+ "split_special_tokens": false,
+ "tokenizer_class": "TokenizersBackend",
+ "unk_token": null,
+ "video_token": "<|video_pad|>",
+ "vision_bos_token": "<|vision_start|>",
+ "vision_eos_token": "<|vision_end|>"
+ }