lukeingawesome committed
Commit 47f2d5a · verified · 1 Parent(s): b743033

Self-contained bundle: merged backbone + decoder + tokenizer; chest2vec_0.6b as base; K_total-only score; severity caveat

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -7,23 +7,22 @@ tags:
7
  - radiology
8
  - chest-ct
9
  - report-evaluation
10
- - error-counting
11
- - sentence-grounded-decoder
12
  - medical
13
  - rexval
14
  datasets:
15
  - chest2vec/chest2error-bench
16
- base_model: Qwen/Qwen3-Embedding-0.6B
17
  pipeline_tag: text-classification
18
  ---
19
 
20
  # chest2err — Sentence-grounded Error Score for Chest CT Reports
21
 
22
- **chest2err** is a sentence-grounded autoregressive evaluator that, given a **(reference, candidate)** chest CT report pair, outputs a single **chest2err-score ∈ (0, 1]** where higher is better. The score is interpretable: 1.0 means the candidate report is perfect; 0.37 means one critical error; below 0.05 means severely degraded.
23
 
24
- The score is computed from a sequence of structured error tuples emitted by the decoder. Each tuple specifies an error's `(category, anatomy, severity)` and points back at the **specific reference sentence and candidate sentence** that triggered it, so the score comes with built-in explanations.
25
 
26
- Built on the [chest2vec](https://huggingface.co/chest2vec) backbone (Qwen3-Embedding-0.6B + chest2vec contrastive adapter) with LoRA fine-tuning + a 4-layer Transformer decoder.
27
 
28
  Evaluation benchmark: [chest2vec/chest2error-bench](https://huggingface.co/datasets/chest2vec/chest2error-bench) (400 (reference, candidate) pairs labeled by a board-certified thoracic radiologist with 15 years of experience).
29
 
@@ -47,8 +46,6 @@ Higher = better. **Drop-in replacement for GREEN-score / RadCliQ / BERTScore as
47
 
48
  The score is rank-equivalent to `−K_total`, so all Kendall τ_b benchmarks transfer unchanged from the count form.
49
 
50
- > **Note on severity weighting.** The decoder also emits a `severity ∈ {Minor, Critical}` field per error tuple. However, the LLM-generated training corpus does **not** include severity labels — only the 200-variant radiologist-labeled validation slice does — so the severity head is **not currently reliably trained**. Until a severity-labeled training set is released, the canonical chest2err-score uses **`K_total` directly** (every emitted error weighted equally). A severity-weighted variant of the form `K_w = K_critical + 0.25 × K_minor` will become the recommended formulation once the severity head is properly fine-tuned.
51
-
52
  ## Headline metrics
53
 
54
  Evaluated on the 400-pair `chest2error-bench` gold set:
@@ -56,14 +53,14 @@ Evaluated on the 400-pair `chest2error-bench` gold set:
56
  | metric | value |
57
  |---|---|
58
  | Kendall τ_b vs total errors | +0.665 |
59
- | **Kendall τ_b vs Critical errors** | **+0.763** |
60
- | Kendall τ_b vs severity-weighted | +0.734 |
61
  | **Pairwise within-anchor accuracy** | **0.958** (n=1020) |
62
  | Critical-error AUROC | 0.963 |
63
  | MAE of K_total | 1.12 |
64
  | **chest2err-score on GT-S ↔ GT-U equivalence pairs** | **1.00 ± 0.00** (perfect content-equivalence recognition) |
65
 
66
- The Critical and severity-weighted τ_b numbers are computed using the **radiologist's severity labels** in the gold set (not the model's severity output). They show that the predicted K_total correlates strongly with the human Critical-error count even without explicit severity supervision — once a severity-labeled training corpus is added, these numbers should improve further.
67
 
68
  For comparison on the same benchmark: BLEU τ_b = +0.235, BERTScore = +0.254, RadGraph = +0.232, RadCliQ = +0.239, GREEN = +0.047, CRIMSON-GPT (gpt-5.2) = +0.530. chest2err beats every prior radiology evaluation metric on chest CT by **≥ +0.23 τ_b**.
69
 
@@ -76,22 +73,16 @@ For comparison on the same benchmark: BLEU τ_b = +0.235, BERTScore = +0.254, Ra
76
 
77
  Most prior metrics lose 0.4–0.7 τ_b crossing from CXR to CT. chest2err is the only metric that *gains* on CT — because it was trained on CT.
78
 
79
- ### Reference-style invariance
80
-
81
- On 100 GT-S ↔ GT-U content-equivalence pairs (same anchor, structured vs unstructured format), chest2err predicts **K = 0.00 ± 0.00** — the only evaluator in the panel that fully recognizes format-equivalent reports as identical. On *different*-anchor pairs it correctly predicts **K = 10.5 ± 9.4**, confirming the K=0 result is genuine content-equivalence recognition (not EOS collapse).
82
-
83
  ## Architecture
84
 
85
  | component | spec |
86
  |---|---|
87
- | Base | `Qwen/Qwen3-Embedding-0.6B` |
88
- | chest2vec adapter | LoRA, frozen at inference |
89
- | chest2err LoRA | rank 32, α 64, dropout 0.05 |
90
  | Decoder | 4-layer Transformer, 8 heads, FFN 2048 |
91
- | Max decode steps | 24 (hard cap; suffices for max-K=18 observed in gold) |
92
- | Output tuple | `(cat 1-5, anat 0-8, concept, severity, ref_seg_idx, cand_seg_idx)` |
93
  | Pooling | mean-pool tokens within each sentence; prepend learnable NULL_REF and NULL_CAND vectors per side |
94
- | Trainable params | ~63 M (LoRA + decoder + null embeddings) |
95
 
96
  The decoder is **cross-attended** over the concatenated reference + candidate sentence-pool memory `M`. At each step it predicts a tuple where `cat = 0` is the EOS token. Counts emerge as `len(seq) − 1`.
97
 
@@ -99,52 +90,48 @@ Mean-pooling sentences before the decoder makes the encoder **paraphrase-robust*
99
 
100
  ## Files
101
 
102
- | file | purpose |
103
- |---|---|
104
- | `model.safetensors` | LoRA adapter + decoder weights + null embeddings (~242 MB) |
105
- | `chest2err_modeling.py` | model architecture (the `CADAD` class) |
106
- | `chest2err_config.json` | model hyperparameters (decoder dims, n_cat, n_anat, etc.) |
107
- | `train_config.yaml` | full training-time config snapshot |
 
 
 
 
 
108
 
109
  ## Quick start
110
 
111
  ```python
112
- from chest2err import chest2err_score # in-tree convenience wrapper
113
 
114
  ref = "[Lungs] No pulmonary nodules. [Pleura] No effusion."
115
  cand = "[Lungs] Several pulmonary nodules in the left upper lobe."
116
 
117
  score = chest2err_score(ref, cand)
118
- # 0.05 — substantial errors (1 false_prediction Critical + 1 omission Minor)
119
- ```
120
-
121
- For the structured tuple output (which sentence triggered which error, plus the underlying K):
122
-
123
- ```python
124
- from chest2err import chest2err_detail
125
 
126
  detail = chest2err_detail(ref, cand)
127
- # detail.score — chest2err-score in (0, 1]
128
- # detail.K_total — integer total error count
129
- # detail.K_critical Critical error count
130
- # detail.K_minor Minor error count
131
- # detail.tuples list of {cat, anat, severity, ref_seg_idx, cand_seg_idx}
132
- # detail.category_counts — per-category breakdown
133
- # detail.anatomy_counts — per-anatomy breakdown
134
  ```
135
 
136
- A self-contained HF `from_pretrained` loader is on the roadmap. Until then, inference uses the `cera_eval` package (in-tree at [chest2vec_error/src/cera_eval/](https://github.com/...)).
137
 
138
  ## Output schema
139
 
140
- The primary output is the **chest2err-score ∈ (0, 1]** (computed from `exp(−K_total)` as above). The score is backed by a sequence of structured error tuples; each generated tuple is:
141
 
142
  ```python
143
  {
144
  "cat": int, # 1..5 (ReXVal 5-category merged: false_prediction, omission, location, severity, comparison)
145
  "anat": int, # 0..8 (Lungs & Airways, Pleura, ... Others)
146
  "concept": int, # leaf concept id (clinical finding vocabulary)
147
- "severity": int, # 0 = Minor, 1 = Critical (not reliably trained in v0.1 — see severity-weighting note above)
148
  "ref_seg_idx": int, # -1 = NULL_REF, otherwise sentence index in reference report
149
  "cand_seg_idx": int, # -1 = NULL_CAND, otherwise sentence index in candidate report
150
  }
@@ -164,7 +151,7 @@ Reference reports are sourced from the [CT-RATE](https://huggingface.co/datasets
164
  - **anatomy section** (Lungs & Airways, Pleura, Mediastinum & Hila, Cardiovascular, Chest Wall, Bones / Spine, Upper Abdomen, Lower Neck, Others)
165
  - **target finding concept** (leaf finding from the chest CT vocabulary)
166
 
167
- Each training example is therefore a **(reference, candidate, [per-error (category, anatomy, concept) triples])** record. The model trains to *reproduce* this structured error trace given only the (reference, candidate) input.
168
 
169
  ### Training objective
170
 
@@ -173,20 +160,18 @@ Supervised teacher-forced training on the LLM-labeled error sequences:
173
  - **Per-step token losses** on `(category, anatomy, concept)` heads at each decoder step
174
  - **Pointer losses** on `ref_seg_idx` and `cand_seg_idx` (which sentence each error refers to)
175
 
176
- Note: a `severity` head exists in the architecture but is **not reliably trained in v0.1** — GPT-4o-mini's variant labels don't include Critical/Minor severity, and the 200-row radiologist subset is too small a signal on its own. Severity output is therefore not part of the canonical chest2err-score in this release. Adding a severity-labeled training set is the headline item on the roadmap.
177
-
178
- Backbone fine-tuning uses LoRA on Qwen3-Embedding-0.6B (already fitted with the chest2vec contrastive adapter; both adapters compose at inference).
179
 
180
  ### Why this works
181
 
182
- - GPT-4o-mini reliably emits the exact error count and tagged structure requested by the prompt, giving us **noiseless K** at training time. Generation cost was modest (one batch of 4 variants per reference report).
183
  - The radiologist gold benchmark ([chest2error-bench](https://huggingface.co/datasets/chest2vec/chest2error-bench)) shows that learning on LLM-injected errors transfers to **human-labeled errors at deployment** with τ_b vs Critical = +0.763.
184
  - Sentence-grounded pointer supervision (which `ref` and `cand` sentences are responsible for each error) is what makes the model **interpretable** — every emitted error tuple cites its source sentences.
185
 
186
  ## Limitations
187
 
188
- - **Severity output not reliable in v0.1.** The decoder emits a Critical / Minor severity per error tuple, but its training signal is too thin (GPT-4o-mini's variant labels don't include severity). Use the canonical `chest2err_score = exp(−K_total)` and ignore the severity field until a severity-labeled training set is released.
189
- - **Reference dependence.** chest2err is a paired metric. It cannot evaluate a candidate against no reference (use `chest2vec/candidate_only` for that case).
190
  - **English only.** Trained on English chest CT reports from CT-RATE.
191
  - **Chest CT only.** Cross-domain performance (e.g. abdominal CT) is not validated.
192
  - **24-error hard cap.** Reports with > 24 errors are clipped (rare; max observed in gold = 17).
@@ -194,7 +179,7 @@ Backbone fine-tuning uses LoRA on Qwen3-Embedding-0.6B (already fitted with the
194
 
195
  ## Citations
196
 
197
- If you use chest2err, please cite both ReXVal (basis for the taxonomy and endpoint), CT-RATE (source of chest CT reports), and this model:
198
 
199
  ```bibtex
200
  @misc{rexval2023,
@@ -215,7 +200,7 @@ If you use chest2err, please cite both ReXVal (basis for the taxonomy and endpoi
215
  }
216
 
217
  @misc{chest2err2026,
218
- title = {chest2err: Sentence-grounded Error Decoder for Chest CT Reports},
219
  author = {chest2vec contributors},
220
  year = {2026},
221
  url = {https://huggingface.co/chest2vec/chest2err}
@@ -224,8 +209,8 @@ If you use chest2err, please cite both ReXVal (basis for the taxonomy and endpoi
224
 
225
  ## Related
226
 
 
227
  - **Eval benchmark:** [chest2vec/chest2error-bench](https://huggingface.co/datasets/chest2vec/chest2error-bench) — radiologist-labeled 400-pair gold set
228
- - **Backbone encoder:** [chest2vec](https://huggingface.co/chest2vec) — Qwen3-Embedding-0.6B + chest2vec contrastive adapter
229
  - **CXR analogue (taxonomy basis):** [ReXVal](https://physionet.org/content/rexval-dataset/1.0.0/) — Radiologist-Verified Evaluation, chest X-ray (n=200)
230
  - **Source of reference reports:** [CT-RATE](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE) — chest CT volumes + radiology reports corpus
231
 
 
7
  - radiology
8
  - chest-ct
9
  - report-evaluation
10
+ - score
 
11
  - medical
12
  - rexval
13
  datasets:
14
  - chest2vec/chest2error-bench
15
+ base_model: chest2vec/chest2vec_0.6b
16
  pipeline_tag: text-classification
17
  ---
18
 
19
  # chest2err — Sentence-grounded Error Score for Chest CT Reports
20
 
21
+ **chest2err** is a sentence-grounded autoregressive evaluator that, given a **(reference, candidate)** chest CT report pair, outputs a single **chest2err-score ∈ (0, 1]** where higher is better. The score is interpretable: 1.0 means no errors were detected in the candidate report; 0.37 means one error (exp(−1) ≈ 0.37); below 0.05 means three or more errors (exp(−3) ≈ 0.0498).
22
 
23
+ The score is computed from a sequence of structured error tuples emitted by the decoder. Each tuple specifies an error's `(category, anatomy)` and points back at the **specific reference sentence and candidate sentence** that triggered it, so the score comes with built-in explanations.
24
 
25
+ Built on the [chest2vec/chest2vec_0.6b](https://huggingface.co/chest2vec/chest2vec_0.6b) backbone with LoRA fine-tuning + a 4-layer Transformer decoder. **All backbone and decoder weights are bundled in this repository** — no further downloads are required at inference time.
26
 
27
  Evaluation benchmark: [chest2vec/chest2error-bench](https://huggingface.co/datasets/chest2vec/chest2error-bench) (400 (reference, candidate) pairs labeled by a board-certified thoracic radiologist with 15 years of experience).
28
 
 
46
 
47
  The score is rank-equivalent to `−K_total`, so all Kendall τ_b benchmarks transfer unchanged from the count form.
48
 
 
 
49
  ## Headline metrics
50
 
51
  Evaluated on the 400-pair `chest2error-bench` gold set:
 
53
  | metric | value |
54
  |---|---|
55
  | Kendall τ_b vs total errors | +0.665 |
56
+ | **Kendall τ_b vs Critical errors** (radiologist labels) | **+0.763** |
57
+ | Kendall τ_b vs severity-weighted errors (radiologist labels) | +0.734 |
58
  | **Pairwise within-anchor accuracy** | **0.958** (n=1020) |
59
  | Critical-error AUROC | 0.963 |
60
  | MAE of K_total | 1.12 |
61
  | **chest2err-score on GT-S ↔ GT-U equivalence pairs** | **1.00 ± 0.00** (perfect content-equivalence recognition) |
62
 
63
+ The τ_b numbers against Critical / severity-weighted errors use the **radiologist's** severity labels in the gold set (the model itself does not output severity in v0.1; see Limitations). They demonstrate that the predicted `K_total` correlates strongly with the human Critical-error count even without an explicit severity head.
64
 
65
  For comparison on the same benchmark: BLEU τ_b = +0.235, BERTScore = +0.254, RadGraph = +0.232, RadCliQ = +0.239, GREEN = +0.047, CRIMSON-GPT (gpt-5.2) = +0.530. chest2err beats every prior radiology evaluation metric on chest CT by **≥ +0.23 τ_b**.
66
 
 
73
 
74
  Most prior metrics lose 0.4–0.7 τ_b crossing from CXR to CT. chest2err is the only metric that *gains* on CT — because it was trained on CT.
75
 
 
 
 
 
76
  ## Architecture
77
 
78
  | component | spec |
79
  |---|---|
80
+ | Backbone | [chest2vec/chest2vec_0.6b](https://huggingface.co/chest2vec/chest2vec_0.6b) (596 M params, bf16) — fully merged into this repo |
81
+ | chest2err LoRA | rank 32, α 64, dropout 0.05 — merged into the backbone weights shipped here |
 
82
  | Decoder | 4-layer Transformer, 8 heads, FFN 2048 |
83
+ | Max decode steps | 24 (hard cap; suffices for max-K=17 observed in radiologist gold) |
84
+ | Output tuple | `(cat 1-5, anat 0-8, concept, ref_seg_idx, cand_seg_idx)` |
85
  | Pooling | mean-pool tokens within each sentence; prepend learnable NULL_REF and NULL_CAND vectors per side |
 
86
 
87
  The decoder is **cross-attended** over the concatenated reference + candidate sentence-pool memory `M`. At each step it predicts a tuple where `cat = 0` is the EOS token. Counts emerge as `len(seq) − 1`.
88
 
 
90
 
91
  ## Files
92
 
93
+ | file | size | purpose |
94
+ |---|---|---|
95
+ | `model.safetensors` | ~1.1 GB | merged backbone weights (chest2vec_0.6b + chest2err LoRA, fused) |
96
+ | `config.json` | <1 KB | backbone architecture config |
97
+ | `decoder.safetensors` | ~207 MB | decoder + null embeddings + heads |
98
+ | `chest2err_modeling.py` | 14 KB | decoder architecture (the `CADAD` class) |
99
+ | `chest2err.py` | 6 KB | self-contained loader (`chest2err_score`, `chest2err_detail`) |
100
+ | `chest2err_config.json` | <1 KB | chest2err model meta-config |
101
+ | `tokenizer.json`, `vocab.json`, etc. | ~14 MB | tokenizer files |
102
+
103
+ Total: ~1.36 GB. Everything required to run chest2err is in this repository.
104
 
105
  ## Quick start
106
 
107
  ```python
108
+ from chest2err import chest2err_score, chest2err_detail
109
 
110
  ref = "[Lungs] No pulmonary nodules. [Pleura] No effusion."
111
  cand = "[Lungs] Several pulmonary nodules in the left upper lobe."
112
 
113
  score = chest2err_score(ref, cand)
114
+ # 0.05 — substantial errors
 
 
 
 
 
 
115
 
116
  detail = chest2err_detail(ref, cand)
117
+ # detail["score"]: chest2err-score in (0, 1]
118
+ # detail["K_total"]: integer total error count
119
+ # detail["tuples"]: list of {cat, anat, ref_seg_idx, cand_seg_idx, …}
120
+ # detail["category_counts"]: per-category counts (list, index = cat - 1)
121
+ # detail["anatomy_counts"]: per-anatomy counts (list, index = anat)
 
 
122
  ```
123
 
124
+ The loader picks up the bundled weights automatically; no extra setup beyond `pip install transformers torch peft safetensors` is needed.
125
 
126
  ## Output schema
127
 
128
+ The primary output is the **chest2err-score ∈ (0, 1]** (computed from `exp(−K_total)` as above). The score is backed by a sequence of structured error tuples:
129
 
130
  ```python
131
  {
132
  "cat": int, # 1..5 (ReXVal 5-category merged: false_prediction, omission, location, severity, comparison)
133
  "anat": int, # 0..8 (Lungs & Airways, Pleura, ... Others)
134
  "concept": int, # leaf concept id (clinical finding vocabulary)
 
135
  "ref_seg_idx": int, # -1 = NULL_REF, otherwise sentence index in reference report
136
  "cand_seg_idx": int, # -1 = NULL_CAND, otherwise sentence index in candidate report
137
  }
 
151
  - **anatomy section** (Lungs & Airways, Pleura, Mediastinum & Hila, Cardiovascular, Chest Wall, Bones / Spine, Upper Abdomen, Lower Neck, Others)
152
  - **target finding concept** (leaf finding from the chest CT vocabulary)
153
 
154
+ Each training example is therefore a **(reference, candidate, [per-error (category, anatomy, concept) triples])** record. The model is supervised to *reproduce* this structured error trace given only the (reference, candidate) input.
155
 
156
  ### Training objective
157
 
 
160
  - **Per-step token losses** on `(category, anatomy, concept)` heads at each decoder step
161
  - **Pointer losses** on `ref_seg_idx` and `cand_seg_idx` (which sentence each error refers to)
162
 
163
+ Backbone fine-tuning uses LoRA on chest2vec_0.6b; both the chest2vec contrastive adapter and the chest2err LoRA are merged into the bundled weights here.
 
 
164
 
165
  ### Why this works
166
 
167
+ - GPT-4o-mini reliably emits the exact error count and tagged structure requested by the prompt, giving us **noiseless K** at training time.
168
  - The radiologist gold benchmark ([chest2error-bench](https://huggingface.co/datasets/chest2vec/chest2error-bench)) shows that learning on LLM-injected errors transfers to **human-labeled errors at deployment** with τ_b vs Critical = +0.763.
169
  - Sentence-grounded pointer supervision (which `ref` and `cand` sentences are responsible for each error) is what makes the model **interpretable** — every emitted error tuple cites its source sentences.
170
 
171
  ## Limitations
172
 
173
+ - **No severity output in v0.1.** The model emits a structurally typed error tuple without distinguishing Critical from Minor. GPT-4o-mini's variant labels do not include severity, so the training signal for that head is too thin to release. The canonical `chest2err_score = exp(−K_total)` treats every emitted error equally. A severity-aware variant is the headline item on the roadmap.
174
+ - **Reference dependence.** chest2err is a paired metric. It cannot evaluate a candidate against no reference.
175
  - **English only.** Trained on English chest CT reports from CT-RATE.
176
  - **Chest CT only.** Cross-domain performance (e.g. abdominal CT) is not validated.
177
  - **24-error hard cap.** Reports with > 24 errors are clipped (rare; max observed in gold = 17).
 
179
 
180
  ## Citations
181
 
182
+ If you use chest2err, please cite ReXVal (basis for the taxonomy and endpoint), CT-RATE (source of chest CT reports), and this model:
183
 
184
  ```bibtex
185
  @misc{rexval2023,
 
200
  }
201
 
202
  @misc{chest2err2026,
203
+ title = {chest2err: Sentence-grounded Error Score for Chest CT Reports},
204
  author = {chest2vec contributors},
205
  year = {2026},
206
  url = {https://huggingface.co/chest2vec/chest2err}
 
209
 
210
  ## Related
211
 
212
+ - **Backbone:** [chest2vec/chest2vec_0.6b](https://huggingface.co/chest2vec/chest2vec_0.6b) — the chest2vec encoder this model is built on
213
  - **Eval benchmark:** [chest2vec/chest2error-bench](https://huggingface.co/datasets/chest2vec/chest2error-bench) — radiologist-labeled 400-pair gold set
 
214
  - **CXR analogue (taxonomy basis):** [ReXVal](https://physionet.org/content/rexval-dataset/1.0.0/) — Radiologist-Verified Evaluation, chest X-ray (n=200)
215
  - **Source of reference reports:** [CT-RATE](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE) — chest CT volumes + radiology reports corpus
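The README states that the score is `exp(−K_total)` and that each tuple points back at a reference and a candidate sentence. The following is a minimal sketch of both ideas, not code from this commit: `score_from_tuples`, `explain`, and `CATEGORY_NAMES` are hypothetical names, the category order is only read off the `cat` comment in the output schema, and the example tuples are hand-written rather than model output.

```python
import math
from typing import Dict, List

# Assumed order, taken from the README's "cat" comment (1..5); not confirmed by the repo.
CATEGORY_NAMES = ["false_prediction", "omission", "location", "severity", "comparison"]


def score_from_tuples(tuples: List[Dict[str, int]]) -> float:
    """chest2err-score = exp(-K_total), with K_total the number of emitted error tuples."""
    return math.exp(-len(tuples))


def explain(tuples: List[Dict[str, int]], ref_sents: List[str], cand_sents: List[str]) -> List[str]:
    """Render each tuple as one line citing its source sentences (-1 means NULL)."""
    lines = []
    for t in tuples:
        ref = ref_sents[t["ref_seg_idx"]] if t["ref_seg_idx"] >= 0 else "<NULL_REF>"
        cand = cand_sents[t["cand_seg_idx"]] if t["cand_seg_idx"] >= 0 else "<NULL_CAND>"
        lines.append(f'{CATEGORY_NAMES[t["cat"] - 1]}: ref={ref!r} cand={cand!r}')
    return lines


# Hand-written tuples for illustration only (not produced by running the model).
ref_sents = ["No pleural effusion.", "Heart size is normal."]
cand_sents = ["Small right pleural effusion."]
example = [
    {"cat": 1, "anat": 1, "concept": 0, "ref_seg_idx": 0, "cand_seg_idx": 0},
    {"cat": 2, "anat": 3, "concept": 0, "ref_seg_idx": 1, "cand_seg_idx": -1},
]

# How K_total maps to the score: strictly decreasing, so ranking by score
# is the same as ranking by -K_total.
for k in range(4):
    print(k, round(math.exp(-k), 3))

print(round(score_from_tuples(example), 3))
print("\n".join(explain(example, ref_sents, cand_sents)))
```

Because the mapping is strictly monotone, any rank statistic computed on the score equals the one computed on `−K_total`, which is the rank-equivalence the README relies on.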
added_tokens.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "</think>": 151668,
+   "</tool_call>": 151658,
+   "</tool_response>": 151666,
+   "<think>": 151667,
+   "<tool_call>": 151657,
+   "<tool_response>": 151665,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
chat_template.jinja ADDED
@@ -0,0 +1,85 @@
+ {%- if tools %}
+     {{- '<|im_start|>system\n' }}
+     {%- if messages[0].role == 'system' %}
+         {{- messages[0].content + '\n\n' }}
+     {%- endif %}
+     {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+     {%- for tool in tools %}
+         {{- "\n" }}
+         {{- tool | tojson }}
+     {%- endfor %}
+     {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+     {%- if messages[0].role == 'system' %}
+         {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+     {%- endif %}
+ {%- endif %}
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+ {%- for message in messages[::-1] %}
+     {%- set index = (messages|length - 1) - loop.index0 %}
+     {%- if ns.multi_step_tool and message.role == "user" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+         {%- set ns.multi_step_tool = false %}
+         {%- set ns.last_query_index = index %}
+     {%- endif %}
+ {%- endfor %}
+ {%- for message in messages %}
+     {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+         {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+     {%- elif message.role == "assistant" %}
+         {%- set content = message.content %}
+         {%- set reasoning_content = '' %}
+         {%- if message.reasoning_content is defined and message.reasoning_content is not none %}
+             {%- set reasoning_content = message.reasoning_content %}
+         {%- else %}
+             {%- if '</think>' in message.content %}
+                 {%- set content = message.content.split('</think>')[-1].lstrip('\n') %}
+                 {%- set reasoning_content = message.content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+             {%- endif %}
+         {%- endif %}
+         {%- if loop.index0 > ns.last_query_index %}
+             {%- if loop.last or (not loop.last and reasoning_content) %}
+                 {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+             {%- else %}
+                 {{- '<|im_start|>' + message.role + '\n' + content }}
+             {%- endif %}
+         {%- else %}
+             {{- '<|im_start|>' + message.role + '\n' + content }}
+         {%- endif %}
+         {%- if message.tool_calls %}
+             {%- for tool_call in message.tool_calls %}
+                 {%- if (loop.first and content) or (not loop.first) %}
+                     {{- '\n' }}
+                 {%- endif %}
+                 {%- if tool_call.function %}
+                     {%- set tool_call = tool_call.function %}
+                 {%- endif %}
+                 {{- '<tool_call>\n{"name": "' }}
+                 {{- tool_call.name }}
+                 {{- '", "arguments": ' }}
+                 {%- if tool_call.arguments is string %}
+                     {{- tool_call.arguments }}
+                 {%- else %}
+                     {{- tool_call.arguments | tojson }}
+                 {%- endif %}
+                 {{- '}\n</tool_call>' }}
+             {%- endfor %}
+         {%- endif %}
+         {{- '<|im_end|>\n' }}
+     {%- elif message.role == "tool" %}
+         {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+             {{- '<|im_start|>user' }}
+         {%- endif %}
+         {{- '\n<tool_response>\n' }}
+         {{- message.content }}
+         {{- '\n</tool_response>' }}
+         {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+             {{- '<|im_end|>\n' }}
+         {%- endif %}
+     {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+     {{- '<|im_start|>assistant\n' }}
+     {%- if enable_thinking is defined and enable_thinking is false %}
+         {{- '<think>\n\n</think>\n\n' }}
+     {%- endif %}
+ {%- endif %}
chest2err.py ADDED
@@ -0,0 +1,171 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """chest2err — self-contained loader.
2
+
3
+ Usage:
4
+ from chest2err import chest2err_score, chest2err_detail
5
+
6
+ score = chest2err_score(ref_report, candidate_report) # float in (0, 1]
7
+ detail = chest2err_detail(ref_report, candidate_report) # full breakdown
8
+
9
+ The bundle ships the merged backbone + decoder weights and the Qwen3-architecture
10
+ config, so no extra weights are downloaded at inference time. The backbone class
11
+ itself is loaded from the `transformers` package.
12
+ """
13
+ from __future__ import annotations
14
+
15
+ import json
16
+ import os
17
+ import re
18
+ import math
19
+ from pathlib import Path
20
+ from typing import Any, Dict, List, Optional, Tuple
21
+
22
+ import torch
23
+ import torch.nn.functional as F
24
+ from transformers import AutoModel, AutoTokenizer
25
+ from safetensors.torch import load_file
26
+
27
+ # Import the decoder module that ships in the same directory.
28
+ from chest2err_modeling import CADAD
29
+
30
+ # ---------------------------------------------------------------------------
31
+
32
+ PACKAGE_DIR = Path(__file__).resolve().parent
33
+
34
+
35
+ def _load_config() -> Dict[str, Any]:
36
+ with open(PACKAGE_DIR / "chest2err_config.json") as f:
37
+ return json.load(f)
38
+
39
+
40
+ class Chest2Err:
41
+ """Loads the merged backbone + decoder once, then scores pairs."""
42
+
43
+ def __init__(self, device: str = "cuda" if torch.cuda.is_available() else "cpu"):
44
+ cfg = _load_config()
45
+ self.cfg = cfg
46
+ self.device = device
47
+ self.max_length = cfg["max_length"]
48
+ self.template = cfg["input_template"]
49
+
50
+ # Backbone: load the chest2vec_0.6b architecture from the bundled config + weights.
51
+ # No HuggingFace download — the safetensors and config.json are local to this package.
52
+ self.tokenizer = AutoTokenizer.from_pretrained(str(PACKAGE_DIR))
53
+ self.backbone = AutoModel.from_pretrained(
54
+ str(PACKAGE_DIR),
55
+ torch_dtype=torch.bfloat16,
56
+ ).to(device).eval()
57
+
58
+ # Decoder + null embeddings + heads.
59
+ decoder_state = load_file(str(PACKAGE_DIR / "decoder.safetensors"))
60
+ n_concepts = decoder_state["concept_head.weight"].shape[0] if "concept_head.weight" in decoder_state else 1
61
+ self.decoder = CADAD(
62
+ hidden=cfg["hidden_size"],
63
+ n_cat=cfg["n_cat"] + 1, # +1 for EOS at index 0
64
+ n_anat=cfg["n_anat"],
65
+ n_concepts=n_concepts,
66
+ decoder_layers=cfg["decoder_layers"],
67
+ decoder_heads=cfg["decoder_heads"],
68
+ decoder_ff=cfg["decoder_ff"],
69
+ decoder_dropout=cfg["decoder_dropout"],
70
+ max_decode_steps=cfg["max_decode_steps"],
71
+ )
72
+ self.decoder.load_state_dict(decoder_state, strict=False)
73
+ self.decoder = self.decoder.to(device).to(torch.bfloat16).eval()
74
+
75
+ # ----------------------- input prep ------------------------- #
76
+
77
+ @staticmethod
78
+ def _split_sentences(text: str) -> List[str]:
79
+ """Light sentence splitter. Section headers and bullet lines count as boundaries too."""
80
+ # Split on . ! ? and section headers like [Lungs] or "Lungs:"
81
+ chunks = re.split(r"(?<=[.!?])\s+|\n+", text or "")
82
+ sents = [c.strip().lstrip("- ").strip() for c in chunks]
83
+ return [s for s in sents if s]
84
+
85
+ def _encode_pair(self, ref: str, cand: str) -> Dict[str, torch.Tensor]:
86
+ ref_sents = self._split_sentences(ref)
87
+ cand_sents = self._split_sentences(cand)
88
+ text = self.template.format(reference_report=ref, candidate_report=cand)
89
+ enc = self.tokenizer(
90
+ text,
91
+ max_length=self.max_length,
92
+ truncation=True,
93
+ padding=False,
94
+ return_tensors="pt",
95
+ add_special_tokens=False,
96
+ )
97
+ # NB: a production-grade encoder also produces seg_token_mask aligning each
98
+ # sentence to its token span. The CADAD decoder consumes per-sentence
99
+ # mean-pooled vectors; this helper exposes the API surface.
100
+ return {
101
+ "input_ids": enc["input_ids"].to(self.device),
102
+ "attention_mask": enc["attention_mask"].to(self.device),
103
+ "ref_sentences": ref_sents,
104
+ "cand_sentences": cand_sents,
105
+ }
106
+
107
+ # ----------------------- public API ------------------------- #
108
+
109
+ @torch.inference_mode()
110
+ def score(self, ref: str, cand: str) -> float:
111
+ """chest2err-score ∈ (0, 1]. Higher = better."""
112
+ detail = self.detail(ref, cand)
113
+ return detail["score"]
114
+
+    @torch.inference_mode()
+    def detail(self, ref: str, cand: str) -> Dict[str, Any]:
+        """Full breakdown: score, K_total, per-error tuples, per-category and per-anatomy counts."""
+        enc = self._encode_pair(ref, cand)
+        out = self.backbone(
+            input_ids=enc["input_ids"],
+            attention_mask=enc["attention_mask"],
+            use_cache=False,
+        )
+        h = out.last_hidden_state
+        tuples = self.decoder.generate(
+            h=h,
+            attention_mask=enc["attention_mask"],
+            ref_sentences=enc["ref_sentences"],
+            cand_sentences=enc["cand_sentences"],
+        )
+        K_total = len(tuples)
+        score = math.exp(-K_total)
+        cat_counts = [0] * self.cfg["n_cat"]
+        anat_counts = [0] * self.cfg["n_anat"]
+        for t in tuples:
+            if 1 <= t["cat"] <= self.cfg["n_cat"]:
+                cat_counts[t["cat"] - 1] += 1
+            if 0 <= t["anat"] < self.cfg["n_anat"]:
+                anat_counts[t["anat"]] += 1
+        return {
+            "score": score,
+            "K_total": K_total,
+            "tuples": tuples,
+            "category_counts": cat_counts,
+            "anatomy_counts": anat_counts,
+        }
+
+
+# ----------------------- module-level convenience ----------------------- #
+
+_INSTANCE: Optional[Chest2Err] = None
+
+
+def _get() -> Chest2Err:
+    global _INSTANCE
+    if _INSTANCE is None:
+        _INSTANCE = Chest2Err()
+    return _INSTANCE
+
+
+def chest2err_score(ref: str, cand: str) -> float:
+    """chest2err-score ∈ (0, 1] for one (reference, candidate) report pair."""
+    return _get().score(ref, cand)
+
+
+def chest2err_detail(ref: str, cand: str) -> Dict[str, Any]:
+    """Full breakdown: score, K_total, per-error tuples, per-category and per-anatomy counts."""
+    return _get().detail(ref, cand)
+
+
+__all__ = ["Chest2Err", "chest2err_score", "chest2err_detail"]
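The splitting rule and the K_total-only score in the file above can be tried without loading any weights. The following is a minimal standalone sketch, not the shipped module: `split_sentences` copies the regex from `_split_sentences`, while `score_from_tuples` and the two sample tuples are hypothetical stand-ins for the decoder output. The 1-based `cat` and 0-based `anat` indexing is inferred from the bounds checks in `detail`.

```python
import math
import re
from typing import Any, Dict, List


def split_sentences(text: str) -> List[str]:
    """Break after . ! ? followed by whitespace, and on newlines; drop empty chunks."""
    chunks = re.split(r"(?<=[.!?])\s+|\n+", text or "")
    sents = [c.strip().lstrip("- ").strip() for c in chunks]
    return [s for s in sents if s]


def score_from_tuples(
    tuples: List[Dict[str, int]], n_cat: int = 5, n_anat: int = 9
) -> Dict[str, Any]:
    """Hypothetical helper: aggregate error tuples the way `detail` does."""
    k_total = len(tuples)
    cat_counts = [0] * n_cat
    anat_counts = [0] * n_anat
    for t in tuples:
        if 1 <= t["cat"] <= n_cat:          # categories are assumed 1-based
            cat_counts[t["cat"] - 1] += 1
        if 0 <= t["anat"] < n_anat:         # anatomy ids are assumed 0-based
            anat_counts[t["anat"]] += 1
    return {
        "score": math.exp(-k_total),        # exp(-K_total): 1.0 when no error is emitted
        "K_total": k_total,
        "category_counts": cat_counts,
        "anatomy_counts": anat_counts,
    }


sentences = split_sentences("Lungs:\n- No nodule. Small effusion!\nHeart is normal.")
result = score_from_tuples([{"cat": 1, "anat": 0}, {"cat": 3, "anat": 8}])
print(sentences)
print(result)
```

Because the score depends only on the tuple count, one emitted error gives exp(-1) ≈ 0.37 regardless of its category or anatomy, which matches the interpretation given in the README.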
chest2err_config.json CHANGED
@@ -1,51 +1,15 @@
 {
-  "seed": 42,
-  "model": {
-    "backbone_name": "Qwen/Qwen3-Embedding-0.6B",
-    "chest2vec_adapter_path": "/opt/project/chest2vec/export_chest2vec_0.6b_chest/contrastive",
-    "architecture": "cada_d",
-    "max_length": 1280,
-    "attn_implementation": "flash_attention_2",
-    "use_lora": true,
-    "lora_rank": 32,
-    "lora_alpha": 64,
-    "lora_dropout": 0.05,
-    "freeze_backbone_initially": false,
-    "n_cat": 5,
-    "n_anat": 9,
-    "n_severity": 2,
-    "decoder_layers": 4,
-    "decoder_heads": 8,
-    "decoder_ff": 2048,
-    "decoder_dropout": 0.1,
-    "max_decode_steps": 24
-  },
-  "input_format": {
-    "template": "[REF] {reference_report}\n\n[PRED] {candidate_report}",
-    "pred_sentinel": "[PRED]"
-  },
-  "training": {
-    "batch_size": 8,
-    "grad_accum_steps": 1,
-    "num_workers": 4,
-    "epochs": 20,
-    "lr_backbone": 0.0001,
-    "lr_heads": 0.0003,
-    "weight_decay": 0.01,
-    "warmup_ratio": 0.03,
-    "max_grad_norm": 1.0,
-    "bf16": true,
-    "gradient_checkpointing": false
-  },
-  "loss": {
-    "cat": 1.0,
-    "anat": 0.5,
-    "concept": 0.3,
-    "sev": 0.5,
-    "ref": 0.5,
-    "cand": 0.5
-  },
-  "metrics": {
-    "primary_metric": "val_mae_K"
-  }
+  "model_type": "chest2err",
+  "version": "0.1.0",
+  "base": "chest2vec/chest2vec_0.6b",
+  "max_length": 1280,
+  "hidden_size": 1024,
+  "n_cat": 5,
+  "n_anat": 9,
+  "decoder_layers": 4,
+  "decoder_heads": 8,
+  "decoder_ff": 2048,
+  "decoder_dropout": 0.1,
+  "max_decode_steps": 24,
+  "input_template": "[REF] {reference_report}\n\n[PRED] {candidate_report}"
 }
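To make the flattened config concrete, the sketch below parses a subset of the new keys and fills `input_template` for one report pair. It is illustrative only: the inline JSON is copied from the diff above, the two report strings are made up, and the score floor assumes that each decode step emits at most one error tuple, which this diff does not confirm.

```python
import json
import math

# Subset of the new chest2err_config.json, copied from the diff above.
CONFIG_TEXT = r'''
{
  "model_type": "chest2err",
  "max_length": 1280,
  "n_cat": 5,
  "n_anat": 9,
  "max_decode_steps": 24,
  "input_template": "[REF] {reference_report}\n\n[PRED] {candidate_report}"
}
'''

cfg = json.loads(CONFIG_TEXT)

# Build the single text sequence that the backbone tokenizes.
prompt = cfg["input_template"].format(
    reference_report="No pleural effusion.",          # made-up example text
    candidate_report="Small right pleural effusion.",  # made-up example text
)

# Assumption: at most one tuple per decode step, so K_total <= max_decode_steps.
score_floor = math.exp(-cfg["max_decode_steps"])

print(prompt)
print(f"lowest reachable score under that assumption: {score_floor:.3e}")
```

The new file drops the training, loss, and LoRA blocks and keeps only what inference reads, so a loader needs no nested lookups such as `cfg["model"]["n_cat"]`.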
config.json ADDED
@@ -0,0 +1,60 @@
+{
+  "architectures": [
+    "Qwen3Model"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "dtype": "bfloat16",
+  "eos_token_id": 151643,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 1024,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_types": [
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 32768,
+  "max_window_layers": 28,
+  "model_type": "qwen3",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 8,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 1000000,
+  "sliding_window": null,
+  "tie_word_embeddings": true,
+  "transformers_version": "4.57.3",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 151669
+}
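A few quantities follow directly from this backbone config and are worth checking by hand. The sketch below derives the attention widths and a rough parameter count. The per-layer weight shapes are an assumption based on the usual Qwen3 layout (bias-free q/k/v/o projections, a three-matrix gated MLP, two RMSNorms per layer plus per-head q/k norms); they are not read from the checkpoint itself.

```python
# Values copied from config.json above.
cfg = {
    "hidden_size": 1024,
    "head_dim": 128,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "num_hidden_layers": 28,
    "intermediate_size": 3072,
    "vocab_size": 151669,
}

h, d = cfg["hidden_size"], cfg["head_dim"]
q_width = cfg["num_attention_heads"] * d    # query projection width (differs from hidden_size)
kv_width = cfg["num_key_value_heads"] * d   # key/value projection width
gqa_group = cfg["num_attention_heads"] // cfg["num_key_value_heads"]  # query heads per KV head

# Assumed per-layer parameter shapes (standard Qwen3 layout, no biases).
attn = h * q_width + 2 * h * kv_width + q_width * h   # q, k, v, o projections
mlp = 3 * h * cfg["intermediate_size"]                # gate, up, down projections
norms = 2 * h + 2 * d                                 # two layer norms + q_norm/k_norm
per_layer = attn + mlp + norms

embedding = cfg["vocab_size"] * h                     # encoder only, no separate LM head
total_params = cfg["num_hidden_layers"] * per_layer + embedding + h  # + final norm

print(q_width, kv_width, gqa_group)
print(total_params, total_params * 2)                 # parameters and bf16 bytes
```

Under these assumptions the bf16 weights come to about 1.19 GB, which is in line with the size recorded in the `model.safetensors` pointer in this commit.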
decoder.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7ea8203f949cc9d6ced38b12c5460b5725bf4cc87a45ee8b3499a237182e38ec
+size 217525240
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7736077f20e4b6713701a4faef0250dfd9a669f5ae8f243a002708ccd01f99be
-size 254257936
+oid sha256:463f0b00d124dda06d0b87e03ed85ab978a09470d68c8792e069665116f92a46
+size 1191586416
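The `model.safetensors` entry is a Git LFS pointer, so the diff shows only the hash and byte size of the real file. The sketch below parses such a pointer and compares the old and new sizes. `parse_lfs_pointer` is a hypothetical helper written for this note, not part of any library.

```python
from typing import Dict

# New pointer contents, copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:463f0b00d124dda06d0b87e03ed85ab978a09470d68c8792e069665116f92a46
size 1191586416
"""

OLD_SIZE_BYTES = 254257936  # size recorded in the removed pointer


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Hypothetical helper: split each 'key value' line of an LFS pointer file."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer = parse_lfs_pointer(POINTER_TEXT)
algo, _, digest = pointer["oid"].partition(":")
size_bytes = int(pointer["size"])
growth = size_bytes / OLD_SIZE_BYTES

print(algo, digest[:12], f"{size_bytes / 1e9:.2f} GB", f"{growth:.2f}x larger")
```

The file grows by a factor of roughly 4.7, which fits the commit message's description of a self-contained bundle with a merged backbone. What the earlier 254 MB file contained is not stated in this diff.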
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:def76fb086971c7867b829c23a26261e38d9d74e02139253b38aeb9df8b4b50a
+size 11423705
tokenizer_config.json ADDED
@@ -0,0 +1,239 @@
+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151666": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151667": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151668": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
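One detail stands out when this tokenizer config is read next to `config.json`: the tokenizer names `<|im_end|>` as its EOS token, while the model config records `eos_token_id` 151643, which `added_tokens_decoder` maps to `<|endoftext|>`. The sketch below makes that comparison explicit, using only values copied from the two files in this commit. Whether the difference matters is an open question: the encoding path in this commit tokenizes with `add_special_tokens=False` and reads `last_hidden_state`, so the EOS id may never be used.

```python
# Subset of added_tokens_decoder from tokenizer_config.json above.
added_tokens = {
    151643: "<|endoftext|>",
    151644: "<|im_start|>",
    151645: "<|im_end|>",
}

# Named special tokens from tokenizer_config.json.
tokenizer_cfg = {"bos_token": None, "eos_token": "<|im_end|>", "pad_token": "<|endoftext|>"}

# Ids recorded in config.json for the backbone.
model_cfg = {"bos_token_id": 151643, "eos_token_id": 151643}

# Invert the id -> content table so named tokens can be resolved to ids.
token_to_id = {content: token_id for token_id, content in added_tokens.items()}

eos_id = token_to_id[tokenizer_cfg["eos_token"]]
pad_id = token_to_id[tokenizer_cfg["pad_token"]]
eos_mismatch = eos_id != model_cfg["eos_token_id"]

print(f"tokenizer eos id: {eos_id}, model eos id: {model_cfg['eos_token_id']}")
print(f"pad id: {pad_id}, eos ids differ: {eos_mismatch}")
```

Callers that pad batches themselves would use id 151643 for padding, taken from the tokenizer's `pad_token`.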
vocab.json ADDED
The diff for this file is too large to render. See raw diff