Self-contained bundle: merged backbone + decoder + tokenizer; chest2vec_0.6b as base; K_total-only score; severity caveat
- .gitattributes +1 -0
- README.md +40 -55
- added_tokens.json +28 -0
- chat_template.jinja +85 -0
- chest2err.py +171 -0
- chest2err_config.json +13 -49
- config.json +60 -0
- decoder.safetensors +3 -0
- merges.txt +0 -0
- model.safetensors +2 -2
- special_tokens_map.json +31 -0
- tokenizer.json +3 -0
- tokenizer_config.json +239 -0
- vocab.json +0 -0
.gitattributes
CHANGED
|
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
|
|
|
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
tokenizer.json filter=lfs diff=lfs merge=lfs -text
|
README.md
CHANGED
|
@@ -7,23 +7,22 @@ tags:
|
|
| 7 |
- radiology
|
| 8 |
- chest-ct
|
| 9 |
- report-evaluation
|
| 10 |
-
-
|
| 11 |
-
- sentence-grounded-decoder
|
| 12 |
- medical
|
| 13 |
- rexval
|
| 14 |
datasets:
|
| 15 |
- chest2vec/chest2error-bench
|
| 16 |
-
base_model:
|
| 17 |
pipeline_tag: text-classification
|
| 18 |
---
|
| 19 |
|
| 20 |
# chest2err — Sentence-grounded Error Score for Chest CT Reports
|
| 21 |
|
| 22 |
-
**chest2err** is a sentence-grounded autoregressive evaluator that, given a **(reference, candidate)** chest CT report pair, outputs a single **chest2err-score ∈ (0, 1]** where higher is better. The score is interpretable: 1.0 means the candidate report is perfect; 0.37 means one
|
| 23 |
|
| 24 |
-
The score is computed from a sequence of structured error tuples emitted by the decoder. Each tuple specifies an error's `(category, anatomy
|
| 25 |
|
| 26 |
-
Built on the [chest2vec](https://huggingface.co/chest2vec) backbone
|
| 27 |
|
| 28 |
Evaluation benchmark: [chest2vec/chest2error-bench](https://huggingface.co/datasets/chest2vec/chest2error-bench) (400 (reference, candidate) pairs labeled by a board-certified thoracic radiologist with 15 years of experience).
|
| 29 |
|
|
@@ -47,8 +46,6 @@ Higher = better. **Drop-in replacement for GREEN-score / RadCliQ / BERTScore as
|
|
| 47 |
|
| 48 |
The score is rank-equivalent to `−K_total`, so all Kendall τ_b benchmarks transfer unchanged from the count form.
|
| 49 |
|
| 50 |
-
> **Note on severity weighting.** The decoder also emits a `severity ∈ {Minor, Critical}` field per error tuple. However, the LLM-generated training corpus does **not** include severity labels — only the 200-variant radiologist-labeled validation slice does — so the severity head is **not currently reliably trained**. Until a severity-labeled training set is released, the canonical chest2err-score uses **`K_total` directly** (every emitted error weighted equally). A severity-weighted variant of the form `K_w = K_critical + 0.25 × K_minor` will become the recommended formulation once the severity head is properly fine-tuned.
|
| 51 |
-
|
| 52 |
## Headline metrics
|
| 53 |
|
| 54 |
Evaluated on the 400-pair `chest2error-bench` gold set:
|
|
@@ -56,14 +53,14 @@ Evaluated on the 400-pair `chest2error-bench` gold set:
|
|
| 56 |
| metric | value |
|
| 57 |
|---|---|
|
| 58 |
| Kendall τ_b vs total errors | +0.665 |
|
| 59 |
-
| **Kendall τ_b vs Critical errors** | **+0.763** |
|
| 60 |
-
| Kendall τ_b vs severity-weighted | +0.734 |
|
| 61 |
| **Pairwise within-anchor accuracy** | **0.958** (n=1020) |
|
| 62 |
| Critical-error AUROC | 0.963 |
|
| 63 |
| MAE of K_total | 1.12 |
|
| 64 |
| **chest2err-score on GT-S ↔ GT-U equivalence pairs** | **1.00 ± 0.00** (perfect content-equivalence recognition) |
|
| 65 |
|
| 66 |
-
The Critical
|
| 67 |
|
| 68 |
For comparison on the same benchmark: BLEU τ_b = +0.235, BERTScore = +0.254, RadGraph = +0.232, RadCliQ = +0.239, GREEN = +0.047, CRIMSON-GPT (gpt-5.2) = +0.530. chest2err beats every prior radiology evaluation metric on chest CT by **≥ +0.23 τ_b**.
|
| 69 |
|
|
@@ -76,22 +73,16 @@ For comparison on the same benchmark: BLEU τ_b = +0.235, BERTScore = +0.254, Ra
|
|
| 76 |
|
| 77 |
Most prior metrics lose 0.4–0.7 τ_b crossing from CXR to CT. chest2err is the only metric that *gains* on CT — because it was trained on CT.
|
| 78 |
|
| 79 |
-
### Reference-style invariance
|
| 80 |
-
|
| 81 |
-
On 100 GT-S ↔ GT-U content-equivalence pairs (same anchor, structured vs unstructured format), chest2err predicts **K = 0.00 ± 0.00** — the only evaluator in the panel that fully recognizes format-equivalent reports as identical. On *different*-anchor pairs it correctly predicts **K = 10.5 ± 9.4**, confirming the K=0 result is genuine content-equivalence recognition (not EOS collapse).
|
| 82 |
-
|
| 83 |
## Architecture
|
| 84 |
|
| 85 |
| component | spec |
|
| 86 |
|---|---|
|
| 87 |
-
|
|
| 88 |
-
|
|
| 89 |
-
| chest2err LoRA | rank 32, α 64, dropout 0.05 |
|
| 90 |
| Decoder | 4-layer Transformer, 8 heads, FFN 2048 |
|
| 91 |
-
| Max decode steps | 24 (hard cap; suffices for max-K=
|
| 92 |
-
| Output tuple | `(cat 1-5, anat 0-8, concept,
|
| 93 |
| Pooling | mean-pool tokens within each sentence; prepend learnable NULL_REF and NULL_CAND vectors per side |
|
| 94 |
-
| Trainable params | ~63 M (LoRA + decoder + null embeddings) |
|
| 95 |
|
| 96 |
The decoder is **cross-attended** over the concatenated reference + candidate sentence-pool memory `M`. At each step it predicts a tuple where `cat = 0` is the EOS token. Counts emerge as `len(seq) − 1`.
|
| 97 |
|
|
@@ -99,52 +90,48 @@ Mean-pooling sentences before the decoder makes the encoder **paraphrase-robust*
|
|
| 99 |
|
| 100 |
## Files
|
| 101 |
|
| 102 |
-
| file | purpose |
|
| 103 |
-
|---|---|
|
| 104 |
-
| `model.safetensors` |
|
| 105 |
-
| `
|
| 106 |
-
| `
|
| 107 |
-
| `
|
| 108 |
|
| 109 |
## Quick start
|
| 110 |
|
| 111 |
```python
|
| 112 |
-
from chest2err import chest2err_score
|
| 113 |
|
| 114 |
ref = "[Lungs] No pulmonary nodules. [Pleura] No effusion."
|
| 115 |
cand = "[Lungs] Several pulmonary nodules in the left upper lobe."
|
| 116 |
|
| 117 |
score = chest2err_score(ref, cand)
|
| 118 |
-
# 0.05 — substantial errors
|
| 119 |
-
```
|
| 120 |
-
|
| 121 |
-
For the structured tuple output (which sentence triggered which error, plus the underlying K):
|
| 122 |
-
|
| 123 |
-
```python
|
| 124 |
-
from chest2err import chest2err_detail
|
| 125 |
|
| 126 |
detail = chest2err_detail(ref, cand)
|
| 127 |
-
# detail
|
| 128 |
-
# detail
|
| 129 |
-
# detail
|
| 130 |
-
# detail
|
| 131 |
-
# detail
|
| 132 |
-
# detail.category_counts — per-category breakdown
|
| 133 |
-
# detail.anatomy_counts — per-anatomy breakdown
|
| 134 |
```
|
| 135 |
|
| 136 |
-
|
| 137 |
|
| 138 |
## Output schema
|
| 139 |
|
| 140 |
-
The primary output is the **chest2err-score ∈ (0, 1]** (computed from `exp(−K_total)` as above). The score is backed by a sequence of structured error tuples
|
| 141 |
|
| 142 |
```python
|
| 143 |
{
|
| 144 |
"cat": int, # 1..5 (ReXVal 5-category merged: false_prediction, omission, location, severity, comparison)
|
| 145 |
"anat": int, # 0..8 (Lungs & Airways, Pleura, ... Others)
|
| 146 |
"concept": int, # leaf concept id (clinical finding vocabulary)
|
| 147 |
-
"severity": int, # 0 = Minor, 1 = Critical (not reliably trained in v0.1 — see severity-weighting note above)
|
| 148 |
"ref_seg_idx": int, # -1 = NULL_REF, otherwise sentence index in reference report
|
| 149 |
"cand_seg_idx": int, # -1 = NULL_CAND, otherwise sentence index in candidate report
|
| 150 |
}
|
|
@@ -164,7 +151,7 @@ Reference reports are sourced from the [CT-RATE](https://huggingface.co/datasets
|
|
| 164 |
- **anatomy section** (Lungs & Airways, Pleura, Mediastinum & Hila, Cardiovascular, Chest Wall, Bones / Spine, Upper Abdomen, Lower Neck, Others)
|
| 165 |
- **target finding concept** (leaf finding from the chest CT vocabulary)
|
| 166 |
|
| 167 |
-
Each training example is therefore a **(reference, candidate, [per-error (category, anatomy, concept) triples])** record. The model
|
| 168 |
|
| 169 |
### Training objective
|
| 170 |
|
|
@@ -173,20 +160,18 @@ Supervised teacher-forced training on the LLM-labeled error sequences:
|
|
| 173 |
- **Per-step token losses** on `(category, anatomy, concept)` heads at each decoder step
|
| 174 |
- **Pointer losses** on `ref_seg_idx` and `cand_seg_idx` (which sentence each error refers to)
|
| 175 |
|
| 176 |
-
|
| 177 |
-
|
| 178 |
-
Backbone fine-tuning uses LoRA on Qwen3-Embedding-0.6B (already fitted with the chest2vec contrastive adapter; both adapters compose at inference).
|
| 179 |
|
| 180 |
### Why this works
|
| 181 |
|
| 182 |
-
- GPT-4o-mini reliably emits the exact error count and tagged structure requested by the prompt, giving us **noiseless K** at training time.
|
| 183 |
- The radiologist gold benchmark ([chest2error-bench](https://huggingface.co/datasets/chest2vec/chest2error-bench)) shows that learning on LLM-injected errors transfers to **human-labeled errors at deployment** with τ_b vs Critical = +0.763.
|
| 184 |
- Sentence-grounded pointer supervision (which `ref` and `cand` sentences are responsible for each error) is what makes the model **interpretable** — every emitted error tuple cites its source sentences.
|
| 185 |
|
| 186 |
## Limitations
|
| 187 |
|
| 188 |
-
- **
|
| 189 |
-
- **Reference dependence.** chest2err is a paired metric. It cannot evaluate a candidate against no reference
|
| 190 |
- **English only.** Trained on English chest CT reports from CT-RATE.
|
| 191 |
- **Chest CT only.** Cross-domain performance (e.g. abdominal CT) is not validated.
|
| 192 |
- **24-error hard cap.** Reports with > 24 errors are clipped (rare; max observed in gold = 17).
|
|
@@ -194,7 +179,7 @@ Backbone fine-tuning uses LoRA on Qwen3-Embedding-0.6B (already fitted with the
|
|
| 194 |
|
| 195 |
## Citations
|
| 196 |
|
| 197 |
-
If you use chest2err, please cite
|
| 198 |
|
| 199 |
```bibtex
|
| 200 |
@misc{rexval2023,
|
|
@@ -215,7 +200,7 @@ If you use chest2err, please cite both ReXVal (basis for the taxonomy and endpoi
|
|
| 215 |
}
|
| 216 |
|
| 217 |
@misc{chest2err2026,
|
| 218 |
-
title = {chest2err: Sentence-grounded Error
|
| 219 |
author = {chest2vec contributors},
|
| 220 |
year = {2026},
|
| 221 |
url = {https://huggingface.co/chest2vec/chest2err}
|
|
@@ -224,8 +209,8 @@ If you use chest2err, please cite both ReXVal (basis for the taxonomy and endpoi
|
|
| 224 |
|
| 225 |
## Related
|
| 226 |
|
|
|
|
| 227 |
- **Eval benchmark:** [chest2vec/chest2error-bench](https://huggingface.co/datasets/chest2vec/chest2error-bench) — radiologist-labeled 400-pair gold set
|
| 228 |
-
- **Backbone encoder:** [chest2vec](https://huggingface.co/chest2vec) — Qwen3-Embedding-0.6B + chest2vec contrastive adapter
|
| 229 |
- **CXR analogue (taxonomy basis):** [ReXVal](https://physionet.org/content/rexval-dataset/1.0.0/) — Radiologist-Verified Evaluation, chest X-ray (n=200)
|
| 230 |
- **Source of reference reports:** [CT-RATE](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE) — chest CT volumes + radiology reports corpus
|
| 231 |
|
|
|
|
| 7 |
- radiology
|
| 8 |
- chest-ct
|
| 9 |
- report-evaluation
|
| 10 |
+
- score
|
|
|
|
| 11 |
- medical
|
| 12 |
- rexval
|
| 13 |
datasets:
|
| 14 |
- chest2vec/chest2error-bench
|
| 15 |
+
base_model: chest2vec/chest2vec_0.6b
|
| 16 |
pipeline_tag: text-classification
|
| 17 |
---
|
| 18 |
|
| 19 |
# chest2err — Sentence-grounded Error Score for Chest CT Reports
|
| 20 |
|
| 21 |
+
**chest2err** is a sentence-grounded autoregressive evaluator that, given a **(reference, candidate)** chest CT report pair, outputs a single **chest2err-score ∈ (0, 1]** where higher is better. The score is interpretable because it equals `exp(−K_total)`, where `K_total` is the number of errors the decoder emits: 1.0 means no errors were found in the candidate; 0.37 means one error; below 0.05 means three or more errors.
|
| 22 |
|
| 23 |
+
The score is computed from a sequence of structured error tuples emitted by the decoder. Each tuple specifies an error's `(category, anatomy)` and points back at the **specific reference sentence and candidate sentence** that triggered it, so the score comes with built-in explanations.
|
| 24 |
|
| 25 |
+
Built on the [chest2vec/chest2vec_0.6b](https://huggingface.co/chest2vec/chest2vec_0.6b) backbone with LoRA fine-tuning + a 4-layer Transformer decoder. **All backbone and decoder weights are bundled in this repository** — no further downloads are required at inference time.
|
| 26 |
|
| 27 |
Evaluation benchmark: [chest2vec/chest2error-bench](https://huggingface.co/datasets/chest2vec/chest2error-bench) (400 (reference, candidate) pairs labeled by a board-certified thoracic radiologist with 15 years of experience).
|
| 28 |
|
|
|
|
| 46 |
|
| 47 |
The score is rank-equivalent to `−K_total`, so all Kendall τ_b benchmarks transfer unchanged from the count form.
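As a quick worked check of this mapping, the sketch below evaluates `exp(−K_total)` for a few error counts and confirms that ranking by score matches ranking by count. `score_from_k` is a hypothetical helper written for illustration; it is not part of the shipped `chest2err` module.

```python
import math


def score_from_k(k_total: int) -> float:
    """Map an integer error count to a score in (0, 1] via exp(-K_total)."""
    if k_total < 0:
        raise ValueError("k_total must be non-negative")
    return math.exp(-k_total)


# 24 is the decoder's hard cap on emitted errors.
counts = [3, 0, 24, 1]
for k in sorted(counts):
    print(f"K_total={k:2d} -> score={score_from_k(k):.4g}")

# exp(-K) is strictly decreasing in K, so sorting by score (descending)
# gives the same order as sorting by K_total (ascending).
ranked_by_score = sorted(counts, key=score_from_k, reverse=True)
ranked_by_k = sorted(counts)
```

With these inputs, `K_total = 0` gives 1.0, `K_total = 1` gives about 0.37, and `K_total = 3` falls just under 0.05. Even at the 24-error cap the score stays strictly positive, which is why the interval is open at 0.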
|
| 48 |
|
|
|
|
|
|
|
| 49 |
## Headline metrics
|
| 50 |
|
| 51 |
Evaluated on the 400-pair `chest2error-bench` gold set:
|
|
|
|
| 53 |
| metric | value |
|
| 54 |
|---|---|
|
| 55 |
| Kendall τ_b vs total errors | +0.665 |
|
| 56 |
+
| **Kendall τ_b vs Critical errors** (radiologist labels) | **+0.763** |
|
| 57 |
+
| Kendall τ_b vs severity-weighted errors (radiologist labels) | +0.734 |
|
| 58 |
| **Pairwise within-anchor accuracy** | **0.958** (n=1020) |
|
| 59 |
| Critical-error AUROC | 0.963 |
|
| 60 |
| MAE of K_total | 1.12 |
|
| 61 |
| **chest2err-score on GT-S ↔ GT-U equivalence pairs** | **1.00 ± 0.00** (perfect content-equivalence recognition) |
|
| 62 |
|
| 63 |
+
The τ_b numbers against Critical / severity-weighted errors use the **radiologist's** severity labels in the gold set (the model itself does not output severity in v0.1; see Limitations). They demonstrate that the predicted `K_total` correlates strongly with the human Critical-error count even without an explicit severity head.
|
| 64 |
|
| 65 |
For comparison on the same benchmark: BLEU τ_b = +0.235, BERTScore = +0.254, RadGraph = +0.232, RadCliQ = +0.239, GREEN = +0.047, CRIMSON-GPT (gpt-5.2) = +0.530. chest2err beats every prior radiology evaluation metric on chest CT by **≥ +0.23 τ_b**.
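For readers unfamiliar with the statistic, here is a minimal pure-Python sketch of Kendall τ_b with tie correction. The two count lists are made-up toy values, not benchmark data, and `kendall_tau_b` is an illustrative helper rather than the evaluation script used for the numbers above.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall tau-b: (C - D) / sqrt((n0 - n1) * (n0 - n2)).

    C / D are concordant / discordant pair counts, n0 is the number of pairs,
    n1 and n2 are the numbers of pairs tied in x and in y respectively.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (xi, yi), (xj, yj) in combinations(list(zip(x, y)), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Toy example (illustrative values only).
human_k = [0, 1, 1, 3, 5]      # radiologist error counts
predicted_k = [0, 1, 2, 2, 6]  # model K_total

tau_counts = kendall_tau_b(predicted_k, human_k)

# Rank-equivalence: exp(-K) against negated human counts gives the same tau_b.
scores = [math.exp(-k) for k in predicted_k]
tau_scores = kendall_tau_b(scores, [-k for k in human_k])
```

On this toy data both calls give 8/9 ≈ 0.889: eight concordant pairs, no discordant pairs, and one tied pair on each side. `scipy.stats.kendalltau` computes the τ_b variant by default and can be used instead when SciPy is available.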
|
| 66 |
|
|
|
|
| 73 |
|
| 74 |
Most prior metrics lose 0.4–0.7 τ_b crossing from CXR to CT. chest2err is the only metric that *gains* on CT — because it was trained on CT.
|
| 75 |
|
| 76 |
## Architecture
|
| 77 |
|
| 78 |
| component | spec |
|
| 79 |
|---|---|
|
| 80 |
+
| Backbone | [chest2vec/chest2vec_0.6b](https://huggingface.co/chest2vec/chest2vec_0.6b) (596 M params, bf16) — fully merged into this repo |
|
| 81 |
+
| chest2err LoRA | rank 32, α 64, dropout 0.05 — merged into the backbone weights shipped here |
|
|
|
|
| 82 |
| Decoder | 4-layer Transformer, 8 heads, FFN 2048 |
|
| 83 |
+
| Max decode steps | 24 (hard cap; suffices for max-K=17 observed in radiologist gold) |
|
| 84 |
+
| Output tuple | `(cat 1-5, anat 0-8, concept, ref_seg_idx, cand_seg_idx)` |
|
| 85 |
| Pooling | mean-pool tokens within each sentence; prepend learnable NULL_REF and NULL_CAND vectors per side |
|
|
|
|
| 86 |
|
| 87 |
The decoder is **cross-attended** over the concatenated reference + candidate sentence-pool memory `M`. At each step it predicts a tuple where `cat = 0` is the EOS token. Counts emerge as `len(seq) − 1`.
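The pooling and counting steps can be sketched in plain Python. This is an illustration under stated assumptions, not the shipped `CADAD` code: the helper names, the `sent_ids` token-to-sentence alignment, and the memory layout `[NULL_REF, ref sentences, NULL_CAND, cand sentences]` are assumptions inferred from the table above, and the NULL vectors are zero placeholders standing in for learned parameters.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool_sentences(
    token_vecs: Sequence[Vector], sent_ids: Sequence[int], n_sents: int
) -> List[Vector]:
    """Average the token vectors that share a sentence id."""
    dim = len(token_vecs[0])
    sums = [[0.0] * dim for _ in range(n_sents)]
    counts = [0] * n_sents
    for vec, sid in zip(token_vecs, sent_ids):
        if sid < 0:  # token belongs to no sentence (e.g. template text)
            continue
        counts[sid] += 1
        for d in range(dim):
            sums[sid][d] += vec[d]
    # Sentences with no tokens keep a zero vector.
    return [[v / max(c, 1) for v in row] for row, c in zip(sums, counts)]


def build_memory(ref_pooled, cand_pooled, null_ref, null_cand):
    """Prepend the NULL vector on each side, then concatenate both sides."""
    return [null_ref] + ref_pooled + [null_cand] + cand_pooled


def count_errors(cat_sequence: Sequence[int]) -> int:
    """K_total = len(seq) - 1 for a sequence terminated by cat = 0 (EOS)."""
    if not cat_sequence or cat_sequence[-1] != 0:
        raise ValueError("expected a sequence terminated by cat = 0 (EOS)")
    return len(cat_sequence) - 1


# Two reference sentences (3 tokens) and one candidate sentence (1 token).
ref_pooled = mean_pool_sentences([[1.0, 1.0], [3.0, 3.0], [2.0, 0.0]], [0, 0, 1], 2)
cand_pooled = mean_pool_sentences([[4.0, 4.0]], [0], 1)
memory = build_memory(ref_pooled, cand_pooled, [0.0, 0.0], [0.0, 0.0])
k_total = count_errors([1, 2, 0])  # two error tuples, then EOS
```

Because each sentence collapses to a single vector before decoding, the memory length depends on the number of sentences rather than the number of tokens, and a pointer index of -1 can refer to the NULL slot on either side.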
|
| 88 |
|
|
|
|
| 90 |
|
| 91 |
## Files
|
| 92 |
|
| 93 |
+
| file | size | purpose |
|
| 94 |
+
|---|---|---|
|
| 95 |
+
| `model.safetensors` | ~1.1 GB | merged backbone weights (chest2vec_0.6b + chest2err LoRA, fused) |
|
| 96 |
+
| `config.json` | <1 KB | backbone architecture config |
|
| 97 |
+
| `decoder.safetensors` | ~207 MB | decoder + null embeddings + heads |
|
| 98 |
+
| `chest2err_modeling.py` | 14 KB | decoder architecture (the `CADAD` class) |
|
| 99 |
+
| `chest2err.py` | 6 KB | self-contained loader (`chest2err_score`, `chest2err_detail`) |
|
| 100 |
+
| `chest2err_config.json` | <1 KB | chest2err model meta-config |
|
| 101 |
+
| `tokenizer.json`, `vocab.json`, etc. | ~14 MB | tokenizer files |
|
| 102 |
+
|
| 103 |
+
Total: ~1.36 GB. Everything required to run chest2err is in this repository.
|
| 104 |
|
| 105 |
## Quick start
|
| 106 |
|
| 107 |
```python
|
| 108 |
+
from chest2err import chest2err_score, chest2err_detail
|
| 109 |
|
| 110 |
ref = "[Lungs] No pulmonary nodules. [Pleura] No effusion."
|
| 111 |
cand = "[Lungs] Several pulmonary nodules in the left upper lobe."
|
| 112 |
|
| 113 |
score = chest2err_score(ref, cand)
|
| 114 |
+
# 0.05 — substantial errors
|
| 115 |
|
| 116 |
detail = chest2err_detail(ref, cand)
|
| 117 |
+
# detail["score"] — chest2err-score in (0, 1]
|
| 118 |
+
# detail["K_total"] — integer total error count
|
| 119 |
+
# detail["tuples"] — list of {cat, anat, ref_seg_idx, cand_seg_idx, …}
|
| 120 |
+
# detail["category_counts"] — per-category breakdown
|
| 121 |
+
# detail["anatomy_counts"] — per-anatomy breakdown
|
|
|
|
|
|
|
| 122 |
```
|
| 123 |
|
| 124 |
+
The loader picks up the bundled weights automatically; no extra setup beyond `pip install transformers torch peft safetensors` is needed.
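To make the relationship between the `detail` fields concrete, the sketch below rebuilds them from a list of error tuples. It is a hypothetical reconstruction: `summarize_tuples` is not the loader's own code, the concept ids are invented, and the assumptions that counts are keyed by integer id and that category 1 / 2 mean false prediction / omission follow the order in which the schema lists them.

```python
import math
from collections import Counter
from typing import Any, Dict, List


def summarize_tuples(tuples: List[Dict[str, int]]) -> Dict[str, Any]:
    """Derive score, K_total and per-field counts from emitted error tuples."""
    k_total = len(tuples)
    return {
        "score": math.exp(-k_total),
        "K_total": k_total,
        "tuples": tuples,
        "category_counts": dict(Counter(t["cat"] for t in tuples)),
        "anatomy_counts": dict(Counter(t["anat"] for t in tuples)),
    }


# Plausible tuples for the Quick start pair (illustrative, not model output):
# a false nodule finding in the lungs, plus the omitted pleura statement,
# which points at the candidate's NULL slot (-1).
example_tuples = [
    {"cat": 1, "anat": 0, "concept": 12, "ref_seg_idx": 0, "cand_seg_idx": 0},
    {"cat": 2, "anat": 1, "concept": 40, "ref_seg_idx": 1, "cand_seg_idx": -1},
]
detail = summarize_tuples(example_tuples)
print(detail["K_total"], round(detail["score"], 3))
```

Keeping the score a pure function of the tuple list means every point lost can be traced back to a specific reference or candidate sentence.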
|
| 125 |
|
| 126 |
## Output schema
|
| 127 |
|
| 128 |
+
The primary output is the **chest2err-score ∈ (0, 1]** (computed from `exp(−K_total)` as above). The score is backed by a sequence of structured error tuples:
|
| 129 |
|
| 130 |
```python
|
| 131 |
{
|
| 132 |
"cat": int, # 1..5 (ReXVal 5-category merged: false_prediction, omission, location, severity, comparison)
|
| 133 |
"anat": int, # 0..8 (Lungs & Airways, Pleura, ... Others)
|
| 134 |
"concept": int, # leaf concept id (clinical finding vocabulary)
|
|
|
|
| 135 |
"ref_seg_idx": int, # -1 = NULL_REF, otherwise sentence index in reference report
|
| 136 |
"cand_seg_idx": int, # -1 = NULL_CAND, otherwise sentence index in candidate report
|
| 137 |
}
|
|
|
|
| 151 |
- **anatomy section** (Lungs & Airways, Pleura, Mediastinum & Hila, Cardiovascular, Chest Wall, Bones / Spine, Upper Abdomen, Lower Neck, Others)
|
| 152 |
- **target finding concept** (leaf finding from the chest CT vocabulary)
|
| 153 |
|
| 154 |
+
Each training example is therefore a **(reference, candidate, [per-error (category, anatomy, concept) triples])** record. The model is supervised to *reproduce* this structured error trace given only the (reference, candidate) input.
|
| 155 |
|
| 156 |
### Training objective
|
| 157 |
|
|
|
|
| 160 |
- **Per-step token losses** on `(category, anatomy, concept)` heads at each decoder step
|
| 161 |
- **Pointer losses** on `ref_seg_idx` and `cand_seg_idx` (which sentence each error refers to)
|
| 162 |
|
| 163 |
+
Backbone fine-tuning uses LoRA on chest2vec_0.6b; both the chest2vec contrastive adapter and the chest2err LoRA are merged into the bundled weights here.
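Merging an adapter into the base weights can be illustrated with a toy example. The sketch assumes the standard LoRA parameterization, `W_merged = W + (α / r) · B · A`; the merge script for this repository is not shown in the source, so the helper names and the tiny matrices below are illustrative only.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(w: Matrix, lora_b: Matrix, lora_a: Matrix, alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * (B @ A)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy rank-1 adapter with alpha = 2, i.e. the same alpha / rank ratio (2.0)
# as the rank-32, alpha-64 configuration listed in the architecture table.
w = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]
lora_a = [[0.5, -1.0]]
merged = merge_lora(w, lora_b, lora_a, alpha=2.0, rank=1)

# The merged matrix reproduces base output plus scaled adapter output.
x = [[1.0], [2.0]]
merged_out = matmul(merged, x)
base_out = matmul(w, x)
low_rank_out = matmul(lora_b, matmul(lora_a, x))
adapter_out = [[b[0] + 2.0 * l[0]] for b, l in zip(base_out, low_rank_out)]
```

After merging, inference needs only one weight matrix per adapted layer, which is consistent with the bundle shipping a single `model.safetensors` for the backbone.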
|
|
|
|
|
|
|
| 164 |
|
| 165 |
### Why this works
|
| 166 |
|
| 167 |
+
- GPT-4o-mini reliably emits the exact error count and tagged structure requested by the prompt, giving us **noiseless K** at training time.
|
| 168 |
- The radiologist gold benchmark ([chest2error-bench](https://huggingface.co/datasets/chest2vec/chest2error-bench)) shows that learning on LLM-injected errors transfers to **human-labeled errors at deployment** with τ_b vs Critical = +0.763.
|
| 169 |
- Sentence-grounded pointer supervision (which `ref` and `cand` sentences are responsible for each error) is what makes the model **interpretable** — every emitted error tuple cites its source sentences.
|
| 170 |
|
| 171 |
## Limitations
|
| 172 |
|
| 173 |
+
- **No severity output in v0.1.** The model emits a structurally typed error tuple without distinguishing Critical from Minor. GPT-4o-mini's variant labels do not include severity, so the training signal for that head is too thin to release. The canonical `chest2err_score = exp(−K_total)` treats every emitted error equally. A severity-aware variant is the headline item on the roadmap.
|
| 174 |
+
- **Reference dependence.** chest2err is a paired metric. It cannot score a candidate report without a reference report.
|
| 175 |
- **English only.** Trained on English chest CT reports from CT-RATE.
|
| 176 |
- **Chest CT only.** Cross-domain performance (e.g. abdominal CT) is not validated.
|
| 177 |
- **24-error hard cap.** Reports with > 24 errors are clipped (rare; max observed in gold = 17).
|
|
|
|
| 179 |
|
| 180 |
## Citations
|
| 181 |
|
| 182 |
+
If you use chest2err, please cite ReXVal (basis for the taxonomy and endpoint), CT-RATE (source of chest CT reports), and this model:
|
| 183 |
|
| 184 |
```bibtex
|
| 185 |
@misc{rexval2023,
|
|
|
|
| 200 |
}
|
| 201 |
|
| 202 |
@misc{chest2err2026,
|
| 203 |
+
title = {chest2err: Sentence-grounded Error Score for Chest CT Reports},
|
| 204 |
author = {chest2vec contributors},
|
| 205 |
year = {2026},
|
| 206 |
url = {https://huggingface.co/chest2vec/chest2err}
|
|
|
|
| 209 |
|
| 210 |
## Related
|
| 211 |
|
| 212 |
+
- **Backbone:** [chest2vec/chest2vec_0.6b](https://huggingface.co/chest2vec/chest2vec_0.6b) — the chest2vec encoder this model is built on
|
| 213 |
- **Eval benchmark:** [chest2vec/chest2error-bench](https://huggingface.co/datasets/chest2vec/chest2error-bench) — radiologist-labeled 400-pair gold set
|
|
|
|
| 214 |
- **CXR analogue (taxonomy basis):** [ReXVal](https://physionet.org/content/rexval-dataset/1.0.0/) — Radiologist-Verified Evaluation, chest X-ray (n=200)
|
| 215 |
- **Source of reference reports:** [CT-RATE](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE) — chest CT volumes + radiology reports corpus
|
| 216 |
|
added_tokens.json
ADDED
|
@@ -0,0 +1,28 @@
|
|
| 1 |
+
{
|
| 2 |
+
"</think>": 151668,
|
| 3 |
+
"</tool_call>": 151658,
|
| 4 |
+
"</tool_response>": 151666,
|
| 5 |
+
"<think>": 151667,
|
| 6 |
+
"<tool_call>": 151657,
|
| 7 |
+
"<tool_response>": 151665,
|
| 8 |
+
"<|box_end|>": 151649,
|
| 9 |
+
"<|box_start|>": 151648,
|
| 10 |
+
"<|endoftext|>": 151643,
|
| 11 |
+
"<|file_sep|>": 151664,
|
| 12 |
+
"<|fim_middle|>": 151660,
|
| 13 |
+
"<|fim_pad|>": 151662,
|
| 14 |
+
"<|fim_prefix|>": 151659,
|
| 15 |
+
"<|fim_suffix|>": 151661,
|
| 16 |
+
"<|im_end|>": 151645,
|
| 17 |
+
"<|im_start|>": 151644,
|
| 18 |
+
"<|image_pad|>": 151655,
|
| 19 |
+
"<|object_ref_end|>": 151647,
|
| 20 |
+
"<|object_ref_start|>": 151646,
|
| 21 |
+
"<|quad_end|>": 151651,
|
| 22 |
+
"<|quad_start|>": 151650,
|
| 23 |
+
"<|repo_name|>": 151663,
|
| 24 |
+
"<|video_pad|>": 151656,
|
| 25 |
+
"<|vision_end|>": 151653,
|
| 26 |
+
"<|vision_pad|>": 151654,
|
| 27 |
+
"<|vision_start|>": 151652
|
| 28 |
+
}
|
chat_template.jinja
ADDED
|
@@ -0,0 +1,85 @@
|
|
| 1 |
+
{%- if tools %}
|
| 2 |
+
{{- '<|im_start|>system\n' }}
|
| 3 |
+
{%- if messages[0].role == 'system' %}
|
| 4 |
+
{{- messages[0].content + '\n\n' }}
|
| 5 |
+
{%- endif %}
|
| 6 |
+
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
|
| 7 |
+
{%- for tool in tools %}
|
| 8 |
+
{{- "\n" }}
|
| 9 |
+
{{- tool | tojson }}
|
| 10 |
+
{%- endfor %}
|
| 11 |
+
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
|
| 12 |
+
{%- else %}
|
| 13 |
+
{%- if messages[0].role == 'system' %}
|
| 14 |
+
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
|
| 15 |
+
{%- endif %}
|
| 16 |
+
{%- endif %}
|
| 17 |
+
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
|
| 18 |
+
{%- for message in messages[::-1] %}
|
| 19 |
+
{%- set index = (messages|length - 1) - loop.index0 %}
|
| 20 |
+
{%- if ns.multi_step_tool and message.role == "user" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
|
| 21 |
+
{%- set ns.multi_step_tool = false %}
|
| 22 |
+
{%- set ns.last_query_index = index %}
|
| 23 |
+
{%- endif %}
|
| 24 |
+
{%- endfor %}
|
| 25 |
+
{%- for message in messages %}
|
| 26 |
+
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
|
| 27 |
+
{{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
|
| 28 |
+
{%- elif message.role == "assistant" %}
|
| 29 |
+
{%- set content = message.content %}
|
| 30 |
+
{%- set reasoning_content = '' %}
|
| 31 |
+
{%- if message.reasoning_content is defined and message.reasoning_content is not none %}
|
| 32 |
+
{%- set reasoning_content = message.reasoning_content %}
|
| 33 |
+
{%- else %}
|
| 34 |
+
{%- if '</think>' in message.content %}
|
| 35 |
+
{%- set content = message.content.split('</think>')[-1].lstrip('\n') %}
|
| 36 |
+
{%- set reasoning_content = message.content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
|
| 37 |
+
{%- endif %}
|
| 38 |
+
{%- endif %}
|
| 39 |
+
{%- if loop.index0 > ns.last_query_index %}
|
| 40 |
+
{%- if loop.last or (not loop.last and reasoning_content) %}
|
| 41 |
+
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
|
| 42 |
+
{%- else %}
|
| 43 |
+
{{- '<|im_start|>' + message.role + '\n' + content }}
|
| 44 |
+
{%- endif %}
|
| 45 |
+
{%- else %}
|
| 46 |
+
{{- '<|im_start|>' + message.role + '\n' + content }}
|
| 47 |
+
{%- endif %}
|
| 48 |
+
{%- if message.tool_calls %}
|
| 49 |
+
{%- for tool_call in message.tool_calls %}
|
| 50 |
+
{%- if (loop.first and content) or (not loop.first) %}
|
| 51 |
+
{{- '\n' }}
|
| 52 |
+
{%- endif %}
|
| 53 |
+
{%- if tool_call.function %}
|
| 54 |
+
{%- set tool_call = tool_call.function %}
|
| 55 |
+
{%- endif %}
|
| 56 |
+
{{- '<tool_call>\n{"name": "' }}
|
| 57 |
+
{{- tool_call.name }}
|
| 58 |
+
{{- '", "arguments": ' }}
|
| 59 |
+
{%- if tool_call.arguments is string %}
|
| 60 |
+
{{- tool_call.arguments }}
|
| 61 |
+
{%- else %}
|
| 62 |
+
{{- tool_call.arguments | tojson }}
|
| 63 |
+
{%- endif %}
|
| 64 |
+
{{- '}\n</tool_call>' }}
|
| 65 |
+
{%- endfor %}
|
| 66 |
+
{%- endif %}
|
| 67 |
+
{{- '<|im_end|>\n' }}
|
| 68 |
+
{%- elif message.role == "tool" %}
|
| 69 |
+
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
|
| 70 |
+
{{- '<|im_start|>user' }}
|
| 71 |
+
{%- endif %}
|
| 72 |
+
{{- '\n<tool_response>\n' }}
|
| 73 |
+
{{- message.content }}
|
| 74 |
+
{{- '\n</tool_response>' }}
|
| 75 |
+
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
|
| 76 |
+
{{- '<|im_end|>\n' }}
|
| 77 |
+
{%- endif %}
|
| 78 |
+
{%- endif %}
|
| 79 |
+
{%- endfor %}
|
| 80 |
+
{%- if add_generation_prompt %}
|
| 81 |
+
{{- '<|im_start|>assistant\n' }}
|
| 82 |
+
{%- if enable_thinking is defined and enable_thinking is false %}
|
| 83 |
+
{{- '<think>\n\n</think>\n\n' }}
|
| 84 |
+
{%- endif %}
|
| 85 |
+
{%- endif %}
|
chest2err.py
ADDED
|
@@ -0,0 +1,171 @@
|
|
| 1 |
+
"""chest2err — self-contained loader.
|
| 2 |
+
|
| 3 |
+
Usage:
|
| 4 |
+
from chest2err import chest2err_score, chest2err_detail
|
| 5 |
+
|
| 6 |
+
score = chest2err_score(ref_report, candidate_report) # float in (0, 1]
|
| 7 |
+
detail = chest2err_detail(ref_report, candidate_report) # full breakdown
|
| 8 |
+
|
| 9 |
+
The bundle ships the merged backbone + decoder weights and the Qwen3-architecture
|
| 10 |
+
config, so no extra weights are downloaded at inference time. The backbone class
|
| 11 |
+
itself is loaded from the `transformers` package.
|
| 12 |
+
"""
|
| 13 |
+
from __future__ import annotations
|
| 14 |
+
|
| 15 |
+
import json
|
| 16 |
+
import os
|
| 17 |
+
import re
|
| 18 |
+
import math
|
| 19 |
+
from pathlib import Path
|
| 20 |
+
from typing import Any, Dict, List, Optional, Tuple
|
| 21 |
+
|
| 22 |
+
import torch
|
| 23 |
+
import torch.nn.functional as F
|
| 24 |
+
from transformers import AutoModel, AutoTokenizer
|
| 25 |
+
from safetensors.torch import load_file
|
| 26 |
+
|
| 27 |
+
# Import the decoder module that ships in the same directory.
|
| 28 |
+
from chest2err_modeling import CADAD
|
| 29 |
+
|
| 30 |
+
# ---------------------------------------------------------------------------
|
| 31 |
+
|
| 32 |
+
PACKAGE_DIR = Path(__file__).resolve().parent
+
+
+def _load_config() -> Dict[str, Any]:
+    with open(PACKAGE_DIR / "chest2err_config.json") as f:
+        return json.load(f)
+
+
+class Chest2Err:
+    """Loads the merged backbone + decoder once, then scores pairs."""
+
+    def __init__(self, device: str = "cuda" if torch.cuda.is_available() else "cpu"):
+        cfg = _load_config()
+        self.cfg = cfg
+        self.device = device
+        self.max_length = cfg["max_length"]
+        self.template = cfg["input_template"]
+
+        # Backbone: load the chest2vec_0.6b architecture from the bundled config + weights.
+        # No HuggingFace download: the safetensors and config.json are local to this package.
+        self.tokenizer = AutoTokenizer.from_pretrained(str(PACKAGE_DIR))
+        self.backbone = AutoModel.from_pretrained(
+            str(PACKAGE_DIR),
+            torch_dtype=torch.bfloat16,
+        ).to(device).eval()
+
+        # Decoder + null embeddings + heads.
+        decoder_state = load_file(str(PACKAGE_DIR / "decoder.safetensors"))
+        n_concepts = decoder_state["concept_head.weight"].shape[0] if "concept_head.weight" in decoder_state else 1
+        self.decoder = CADAD(
+            hidden=cfg["hidden_size"],
+            n_cat=cfg["n_cat"] + 1,  # +1 for EOS at index 0
+            n_anat=cfg["n_anat"],
+            n_concepts=n_concepts,
+            decoder_layers=cfg["decoder_layers"],
+            decoder_heads=cfg["decoder_heads"],
+            decoder_ff=cfg["decoder_ff"],
+            decoder_dropout=cfg["decoder_dropout"],
+            max_decode_steps=cfg["max_decode_steps"],
+        )
+        self.decoder.load_state_dict(decoder_state, strict=False)
+        self.decoder = self.decoder.to(device).to(torch.bfloat16).eval()
+
+    # ----------------------- input prep ------------------------- #
+
+    @staticmethod
+    def _split_sentences(text: str) -> List[str]:
+        """Light sentence splitter. Section headers and bullet lines count as boundaries too."""
+        # Split after ., !, or ? plus whitespace, and on newlines (headers/bullets on their own line).
+        chunks = re.split(r"(?<=[.!?])\s+|\n+", text or "")
+        sents = [c.strip().lstrip("- ").strip() for c in chunks]
+        return [s for s in sents if s]
+
+    def _encode_pair(self, ref: str, cand: str) -> Dict[str, torch.Tensor]:
+        ref_sents = self._split_sentences(ref)
+        cand_sents = self._split_sentences(cand)
+        text = self.template.format(reference_report=ref, candidate_report=cand)
+        enc = self.tokenizer(
+            text,
+            max_length=self.max_length,
+            truncation=True,
+            padding=False,
+            return_tensors="pt",
+            add_special_tokens=False,
+        )
+        # NB: a production-grade encoder also produces seg_token_mask aligning each
+        # sentence to its token span. The CADAD decoder consumes per-sentence
+        # mean-pooled vectors; this helper exposes the API surface.
+        return {
+            "input_ids": enc["input_ids"].to(self.device),
+            "attention_mask": enc["attention_mask"].to(self.device),
+            "ref_sentences": ref_sents,
+            "cand_sentences": cand_sents,
+        }
+
+    # ----------------------- public API ------------------------- #
+
+    @torch.inference_mode()
+    def score(self, ref: str, cand: str) -> float:
+        """chest2err-score ∈ (0, 1]. Higher = better."""
+        detail = self.detail(ref, cand)
+        return detail["score"]
+
+    @torch.inference_mode()
+    def detail(self, ref: str, cand: str) -> Dict[str, Any]:
+        """Full breakdown: score, K_total, per-error tuples, per-category and per-anatomy counts."""
+        enc = self._encode_pair(ref, cand)
+        out = self.backbone(
+            input_ids=enc["input_ids"],
+            attention_mask=enc["attention_mask"],
+            use_cache=False,
+        )
+        h = out.last_hidden_state
+        tuples = self.decoder.generate(
+            h=h,
+            attention_mask=enc["attention_mask"],
+            ref_sentences=enc["ref_sentences"],
+            cand_sentences=enc["cand_sentences"],
+        )
+        K_total = len(tuples)
+        score = math.exp(-K_total)
+        cat_counts = [0] * self.cfg["n_cat"]
+        anat_counts = [0] * self.cfg["n_anat"]
+        for t in tuples:
+            if 1 <= t["cat"] <= self.cfg["n_cat"]:
+                cat_counts[t["cat"] - 1] += 1
+            if 0 <= t["anat"] < self.cfg["n_anat"]:
+                anat_counts[t["anat"]] += 1
+        return {
+            "score": score,
+            "K_total": K_total,
+            "tuples": tuples,
+            "category_counts": cat_counts,
+            "anatomy_counts": anat_counts,
+        }
+
+
+# ----------------------- module-level convenience ----------------------- #
+
+_INSTANCE: Optional[Chest2Err] = None
+
+
+def _get() -> Chest2Err:
+    global _INSTANCE
+    if _INSTANCE is None:
+        _INSTANCE = Chest2Err()
+    return _INSTANCE
+
+
+def chest2err_score(ref: str, cand: str) -> float:
+    """chest2err-score ∈ (0, 1] for one (reference, candidate) report pair."""
+    return _get().score(ref, cand)
+
+
+def chest2err_detail(ref: str, cand: str) -> Dict[str, Any]:
+    """Full breakdown: score, K_total, per-error tuples, per-category and per-anatomy counts."""
+    return _get().detail(ref, cand)
+
+
+__all__ = ["Chest2Err", "chest2err_score", "chest2err_detail"]
|
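The post-processing in `detail` does not need the model: once the decoder has emitted its error tuples, the score is `exp(-K_total)` and the two count vectors are plain tallies. The sketch below reproduces only that step so it can be checked in isolation. It assumes each tuple is a dict with integer `cat` (1-based, with 0 reserved for EOS) and `anat` (0-based) keys, as the bounds checks in `detail` suggest; `summarize_tuples` is a hypothetical name and is not part of the package.

```python
import math
from typing import Any, Dict, List


def summarize_tuples(tuples: List[Dict[str, int]], n_cat: int = 5, n_anat: int = 9) -> Dict[str, Any]:
    """Hypothetical helper mirroring the tallying done in Chest2Err.detail."""
    k_total = len(tuples)
    cat_counts = [0] * n_cat
    anat_counts = [0] * n_anat
    for t in tuples:
        # Categories are 1-based because index 0 is the decoder's EOS class.
        if 1 <= t["cat"] <= n_cat:
            cat_counts[t["cat"] - 1] += 1
        # Anatomy indices are 0-based.
        if 0 <= t["anat"] < n_anat:
            anat_counts[t["anat"]] += 1
    return {
        "score": math.exp(-k_total),  # every error counts equally
        "K_total": k_total,
        "category_counts": cat_counts,
        "anatomy_counts": anat_counts,
    }


example = summarize_tuples([{"cat": 1, "anat": 0}, {"cat": 5, "anat": 8}])
```

With no tuples the score is exactly 1.0, and each additional error multiplies it by `exp(-1)`, which is about 0.37. Out-of-range indices are skipped by the tallies but still count toward `K_total`.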
chest2err_config.json
CHANGED
@@ -1,51 +1,15 @@
 {
-"
-"
-
-
-
-
-
-
-
-
-
-
-
-    "n_anat": 9,
-    "n_severity": 2,
-    "decoder_layers": 4,
-    "decoder_heads": 8,
-    "decoder_ff": 2048,
-    "decoder_dropout": 0.1,
-    "max_decode_steps": 24
-  },
-  "input_format": {
-    "template": "[REF] {reference_report}\n\n[PRED] {candidate_report}",
-    "pred_sentinel": "[PRED]"
-  },
-  "training": {
-    "batch_size": 8,
-    "grad_accum_steps": 1,
-    "num_workers": 4,
-    "epochs": 20,
-    "lr_backbone": 0.0001,
-    "lr_heads": 0.0003,
-    "weight_decay": 0.01,
-    "warmup_ratio": 0.03,
-    "max_grad_norm": 1.0,
-    "bf16": true,
-    "gradient_checkpointing": false
-  },
-  "loss": {
-    "cat": 1.0,
-    "anat": 0.5,
-    "concept": 0.3,
-    "sev": 0.5,
-    "ref": 0.5,
-    "cand": 0.5
-  },
-  "metrics": {
-    "primary_metric": "val_mae_K"
-  }
+  "model_type": "chest2err",
+  "version": "0.1.0",
+  "base": "chest2vec/chest2vec_0.6b",
+  "max_length": 1280,
+  "hidden_size": 1024,
+  "n_cat": 5,
+  "n_anat": 9,
+  "decoder_layers": 4,
+  "decoder_heads": 8,
+  "decoder_ff": 2048,
+  "decoder_dropout": 0.1,
+  "max_decode_steps": 24,
+  "input_template": "[REF] {reference_report}\n\n[PRED] {candidate_report}"
 }
|
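The flattened config moves the template to a top-level `input_template` key, which `_encode_pair` fills with `str.format` before tokenizing. The sketch below shows what that string looks like for a toy pair, and what the regex from `_split_sentences` returns for the same text. The two reports are invented for illustration, and `split_sentences` is a standalone copy of the method body rather than an import from the package.

```python
import re
from typing import List

# Copied from the "input_template" value in chest2err_config.json.
INPUT_TEMPLATE = "[REF] {reference_report}\n\n[PRED] {candidate_report}"


def split_sentences(text: str) -> List[str]:
    """Standalone copy of the splitting logic in Chest2Err._split_sentences."""
    # Boundaries: whitespace that follows . ! or ?, and any run of newlines.
    chunks = re.split(r"(?<=[.!?])\s+|\n+", text or "")
    # lstrip("- ") removes leading dashes and spaces, i.e. bullet markers.
    sents = [c.strip().lstrip("- ").strip() for c in chunks]
    return [s for s in sents if s]


# Invented example reports (not taken from the benchmark).
ref = "Lungs:\n- No nodule. Mild emphysema."
cand = "Lungs: 4 mm nodule in the right upper lobe."

text = INPUT_TEMPLATE.format(reference_report=ref, candidate_report=cand)
ref_sentences = split_sentences(ref)
cand_sentences = split_sentences(cand)
```

A header such as `Lungs:` becomes its own item only when it sits on its own line, and a period followed by a space always splits, so an abbreviation like `approx. 4 mm` would be cut in two. Reports containing literal `{` or `}` are safe here because `str.format` only interprets braces in the template, not in the substituted values.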
config.json
ADDED
@@ -0,0 +1,60 @@
+{
+  "architectures": [
+    "Qwen3Model"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "dtype": "bfloat16",
+  "eos_token_id": 151643,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 1024,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_types": [
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 32768,
+  "max_window_layers": 28,
+  "model_type": "qwen3",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 8,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 1000000,
+  "sliding_window": null,
+  "tie_word_embeddings": true,
+  "transformers_version": "4.57.3",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 151669
+}
|
decoder.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7ea8203f949cc9d6ced38b12c5460b5725bf4cc87a45ee8b3499a237182e38ec
+size 217525240
|
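The three lines added for `decoder.safetensors` are a Git LFS pointer, not the weights themselves: a spec version, a SHA-256 object id, and the byte size of the real file. A small sketch of reading such a pointer and checking a downloaded file against it follows. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local path passed to `matches_pointer` is whatever location the file was downloaded to.

```python
import hashlib
from typing import Dict

# Pointer text as it appears in the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:7ea8203f949cc9d6ced38b12c5460b5725bf4cc87a45ee8b3499a237182e38ec
size 217525240
"""


def parse_lfs_pointer(text: str) -> Dict[str, object]:
    """Split each 'key value' line, then separate the hash algorithm from the digest."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: Dict[str, object]) -> bool:
    """Stream a local file and compare its size and SHA-256 with the pointer."""
    sha = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            sha.update(block)
            size += len(block)
    return size == pointer["size"] and sha.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
```

The pointer puts the decoder at roughly 218 MB, which is worth comparing against the file on disk before suspecting the loading code, since `load_state_dict(..., strict=False)` will not complain about missing keys.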
merges.txt
ADDED
|
The diff for this file is too large to render.
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:463f0b00d124dda06d0b87e03ed85ab978a09470d68c8792e069665116f92a46
+size 1191586416
|
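The new pointer size can be sanity-checked against the values in `config.json`. The sketch below estimates the parameter count of the backbone and the resulting bfloat16 byte count. It assumes the usual Qwen3 layer layout: bias-free q/k/v/o projections, an RMSNorm on each of q and k over `head_dim`, a gated MLP with three weight matrices, two RMSNorms per layer, one final norm, and a single embedding matrix with no separate output head. That layout is an assumption about the architecture and is not shown in this diff; `estimate_qwen3_params` is a hypothetical helper.

```python
from typing import Dict

# Values copied from config.json in this commit.
CFG: Dict[str, int] = {
    "hidden_size": 1024,
    "intermediate_size": 3072,
    "num_hidden_layers": 28,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 151669,
}


def estimate_qwen3_params(cfg: Dict[str, int]) -> int:
    """Rough parameter count under the assumed Qwen3 layout (see lead-in)."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]   # 16 * 128 = 2048
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]  # 8 * 128 = 1024
    attn = h * q_dim + 2 * h * kv_dim + q_dim * h          # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]                 # gate, up, down
    norms = 2 * h + 2 * cfg["head_dim"]                    # two layer norms + q/k norms
    per_layer = attn + mlp + norms
    embed = cfg["vocab_size"] * h
    return embed + cfg["num_hidden_layers"] * per_layer + h  # + final norm


n_params = estimate_qwen3_params(CFG)
n_bytes = 2 * n_params  # bfloat16 stores 2 bytes per parameter
```

Under these assumptions the estimate is about 596M parameters and falls roughly 33 KB short of the pointer's `size 1191586416`, a gap small enough to be the safetensors header. Treat the agreement as a consistency check on the bundle, not as proof of the layer layout.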
special_tokens_map.json
ADDED
@@ -0,0 +1,31 @@
+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
|
tokenizer.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:def76fb086971c7867b829c23a26261e38d9d74e02139253b38aeb9df8b4b50a
+size 11423705
|
tokenizer_config.json
ADDED
@@ -0,0 +1,239 @@
+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151666": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151667": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151668": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
|
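One detail worth noticing when reading the tokenizer files next to `config.json`: the model config sets `eos_token_id` to 151643, which `added_tokens_decoder` maps to `<|endoftext|>`, while the tokenizer declares `<|im_end|>` (151645) as its `eos_token`. The sketch below makes that comparison explicit using a three-entry excerpt of the table. `id_of` is a hypothetical lookup helper, and the conclusion covers only the ids shown in this diff.

```python
from typing import Dict, Optional

# Excerpt of "added_tokens_decoder" from tokenizer_config.json (three of the 26 entries).
ADDED_TOKENS_DECODER: Dict[str, Dict[str, object]] = {
    "151643": {"content": "<|endoftext|>", "special": True},
    "151645": {"content": "<|im_end|>", "special": True},
    "151657": {"content": "<tool_call>", "special": False},
}

TOKENIZER_EOS = "<|im_end|>"     # tokenizer_config.json: "eos_token"
TOKENIZER_PAD = "<|endoftext|>"  # tokenizer_config.json: "pad_token"
MODEL_EOS_ID = 151643            # config.json: "eos_token_id"


def id_of(content: str) -> Optional[int]:
    """Reverse lookup: token string -> integer id, or None if absent from the excerpt."""
    for token_id, entry in ADDED_TOKENS_DECODER.items():
        if entry["content"] == content:
            return int(token_id)
    return None


tokenizer_eos_id = id_of(TOKENIZER_EOS)
tokenizer_pad_id = id_of(TOKENIZER_PAD)
eos_agrees = tokenizer_eos_id == MODEL_EOS_ID
```

For the scoring path in `chest2err.py` this mismatch probably has no effect, since `_encode_pair` tokenizes with `add_special_tokens=False` and `padding=False` and the backbone is used as an encoder only. It would matter for any code that batches with padding or generates text from the backbone.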
vocab.json
ADDED
|
The diff for this file is too large to render.