Text Classification
Transformers
Safetensors
English
chest2vec_labeler
feature-extraction
radiology
chest-ct
report-labeling
multi-label
ct-rate
chexbert-style-f1
custom_code
How to use chest2vec/chest2vec_labeler with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="chest2vec/chest2vec_labeler", trust_remote_code=True)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("chest2vec/chest2vec_labeler", trust_remote_code=True, dtype="auto")
```
Upload README.md with huggingface_hub
README.md
CHANGED
@@ -35,10 +35,21 @@ other (micro / macro / weighted F1) - useful for evaluating radiology report g

## Label space

The model head predicts **137 leaf labels**. They roll up through the chest-imaging hierarchy
into **38 upper/container groups** and **10 anatomy sections** (the `label_hierarchy` in
`config.json`), so predictions and report-comparison F1 can be reported at leaf, upper, or
anatomy granularity.

- The model outputs all **137** leaves. In the training data, **136** of them have at least one
  positive example; the single exception is **`IVC filter`** (kept for taxonomy completeness,
  but it had no positives, so the model effectively never predicts it).
- The exact label list is in `config.json` (`labels`). Full definitions and per-split counts are
  in the **[chest2vec/chest2vec_labels](https://huggingface.co/datasets/chest2vec/chest2vec_labels)**
  dataset's [`LABEL_HIERARCHY.md`](https://huggingface.co/datasets/chest2vec/chest2vec_labels/blob/main/LABEL_HIERARCHY.md).
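
The three granularities can be sanity-checked once the config is available as a plain dict. The sketch below is illustrative only: it assumes `label_hierarchy` is a nested mapping of anatomy section to upper group to list of leaf names, which this card does not confirm. It also uses a three-leaf toy config with invented group names rather than the real 137-label file. `count_levels` is a hypothetical helper, not part of the model code.

```python
# Toy stand-in for config.json; the real file lists 137 leaves.
# ASSUMPTION: label_hierarchy is {anatomy: {upper_group: [leaf, ...]}}.
# The anatomy and group names below are invented for illustration.
toy_config = {
    "labels": ["Pleural effusion", "Pneumothorax", "IVC filter"],
    "label_hierarchy": {
        "Pleura": {"Pleural space": ["Pleural effusion", "Pneumothorax"]},
        "Vessels": {"Vascular devices": ["IVC filter"]},
    },
}


def count_levels(config):
    """Return (n_leaf, n_upper, n_anatomy) for a config shaped as assumed above."""
    hierarchy = config["label_hierarchy"]
    n_anatomy = len(hierarchy)
    n_upper = sum(len(groups) for groups in hierarchy.values())
    leaves = [
        leaf
        for groups in hierarchy.values()
        for members in groups.values()
        for leaf in members
    ]
    # Every leaf in the hierarchy should also appear in the flat label list.
    missing = sorted(set(leaves) - set(config["labels"]))
    if missing:
        raise ValueError(f"leaves missing from labels: {missing}")
    return len(set(leaves)), n_upper, n_anatomy


print(count_levels(toy_config))  # (3, 2, 2)
```

If the assumed layout holds, running the same function on the real config would be expected to return `(137, 38, 10)`.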

This model was **trained and evaluated on the
[chest2vec/chest2vec_labels](https://huggingface.co/datasets/chest2vec/chest2vec_labels)
dataset** (revised CT-RATE, 137-leaf taxonomy).

**Ternary head** (`softmax(logits, dim=-1)` over class indices `[0, 1, 2]`):

@@ -80,15 +91,21 @@ as truth):

```python
res = model.score_reports(gt_reports, pred_reports, tokenizer=tok)  # equal-length lists
# scores are reported at three hierarchy levels:
for level in ("leaf", "upper", "anatomy"):
    b = res[level]
    print(level, b["n_labels"], b["micro"]["f1"], b["macro"]["f1"], b["weighted"]["f1"])
print(res["leaf"]["per_label"]["Pleural effusion"])  # {'precision':..,'recall':..,'f1':..,'support_gt':..}

# or one-liner that loads the model for you:
from modeling_chest2vec_labeler import report_f1
report_f1(gt_reports, pred_reports, tokenizer=tok)
```

Each level (`leaf` = 137 labels, `upper` = 38 container groups, `anatomy` = 10 sections) returns
`micro` / `macro` / `weighted` precision/recall/F1 plus `per_label`. Upper/anatomy scores are the
max-over-children roll-up of the leaf predictions (`model.aggregate_hierarchy(...)`). Coarser
levels are easier to match, so upper/anatomy F1 is typically higher than leaf F1.
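
The max-over-children roll-up can be pictured with a few lines of plain Python. This is a sketch of the general technique, not the body of `model.aggregate_hierarchy(...)`, whose signature and return type are not shown here. The `rollup_max` helper, the group names, and the probabilities are all made up for illustration.

```python
def rollup_max(leaf_scores, children):
    """Score each parent as the maximum score among its child leaves.

    leaf_scores: {leaf_name: positive-class probability}
    children:    {parent_name: [leaf_name, ...]}  (hypothetical layout)
    """
    return {
        parent: max(leaf_scores[leaf] for leaf in leaves)
        for parent, leaves in children.items()
    }


# Invented probabilities for a single report.
leaf_scores = {"Pleural effusion": 0.91, "Pneumothorax": 0.04, "IVC filter": 0.01}
# Invented grouping; the real one lives in config.json (label_hierarchy).
children = {
    "Pleural space": ["Pleural effusion", "Pneumothorax"],
    "Vascular devices": ["IVC filter"],
}

upper_scores = rollup_max(leaf_scores, children)
print(upper_scores)  # {'Pleural space': 0.91, 'Vascular devices': 0.01}
```

Under this scheme a parent is positive whenever any child is, which is one reason coarser levels are easier to match: confusing two sibling leaves lowers leaf F1 but leaves the parent prediction unchanged.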

## Inputs & conventions

@@ -100,18 +117,36 @@ Returns `micro`, `macro`, `weighted` precision/recall/F1 over the 137 labels, pl

## Evaluation

**How these numbers were produced:** `run_para_v2_eval.sh` runs the model in **direct-paragraph**
mode (full report, max_len 512) and writes `eval_ctrate_test_direct.json` (public) and
`eval_sample1000_private.json` (private). The metric is **macro-F1** of the **positive class** (softmax
probability of class 2) at **threshold 0.33**. Because the all-labels macro is dragged down by
sparse-tail labels, the **headline restricts to leaf labels with ≥30 positive examples** in that
eval set: **53 of the evaluated leaves on the public set, 29 on the private set**. Upper/anatomy
rows are the hierarchy roll-up.
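
The metric described above can be sketched in plain Python: take the softmax probability of class 2, binarize at 0.33, compute per-label F1, and average only over labels with enough positives. This is an illustrative reimplementation, not the contents of `run_para_v2_eval.sh`. The helper names and toy data are hypothetical, and `min_pos` is set to 2 instead of 30 only to keep the example small.

```python
import math


def positive_prob(logits):
    """Softmax probability of class index 2 for one label's three logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[2] / sum(exps)


def f1_binary(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true_by_label, prob_by_label, threshold=0.33, min_pos=30):
    """Unweighted mean of per-label F1 over labels with >= min_pos positives."""
    scores = []
    for label, y_true in y_true_by_label.items():
        if sum(y_true) < min_pos:
            continue  # sparse-tail label, left out of the headline number
        # ASSUMPTION: ">=" at the threshold; the card does not say ">=" vs ">".
        y_pred = [p >= threshold for p in prob_by_label[label]]
        scores.append(f1_binary(y_true, y_pred))
    return sum(scores) / len(scores) if scores else float("nan")


# Toy data: 4 reports, 2 labels (values invented).
y_true = {"Pleural effusion": [1, 1, 0, 0], "IVC filter": [0, 0, 0, 0]}
probs = {
    "Pleural effusion": [0.90, 0.40, 0.35, 0.05],
    "IVC filter": [0.01, 0.02, 0.01, 0.03],
}

print(round(positive_prob([0.0, 0.0, 0.0]), 3))  # 0.333
print(macro_f1(y_true, probs, threshold=0.33, min_pos=2))  # 0.8
```

In the toy data the label with no positives is skipped entirely, which mirrors why the restricted headline number sits above the all-labels macro.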

**CT-RATE revised test (public, 1,464 reports)**, from the `chest2vec/chest2vec_labels` test split:

| Level | # labels | macro-F1 @0.33 | macro-AUC |
|---|--:|--:|--:|
| leaf (≥30 positives) | 53 | **0.875** | 0.989 |
| leaf (all evaluated) | 131 | 0.749 | - |
| upper (≥30 positives) | 27 | 0.938 | 0.994 |
| anatomy | 10 | 0.956 | 0.993 |

**Private evaluation set (1,000 reports)**, a held-out internal set, not released:

| Level | # labels | macro-F1 @0.33 | macro-AUC |
|---|--:|--:|--:|
| leaf (≥30 positives) | 29 | **0.766** | 0.972 |
| leaf (all evaluated) | 60 | 0.731 | - |
| upper (≥30 positives) | 19 | 0.837 | - |
| anatomy | 10 | 0.869 | - |

Leaf macro-AUC barely moves from the public to the private set (**0.989 → 0.972**), i.e. label ranking transfers to
the unseen set; the F1 gap is mostly a matter of threshold and labeling convention, not a domain failure.
Separately, a radiologist spot-checked **966** reports of the public test labels (857 fully
accepted / 60 imperfect-but-acceptable / 49 failed; see the [dataset card](https://huggingface.co/datasets/chest2vec/chest2vec_labels)).
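
The AUC-versus-F1 argument above can be illustrated with a toy case: AUC depends only on how positives rank against negatives, while F1 at a fixed cut-off also depends on where the scores sit relative to that cut-off. The scores below are invented and are not taken from either evaluation set. `auc` and `f1_at` are hypothetical helpers written for this sketch.

```python
def auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_at(y_true, scores, threshold):
    pred = [s >= threshold for s in scores]
    tp = sum(1 for t, p in zip(y_true, pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


y_true = [1, 1, 1, 0, 0, 0]
set_a = [0.90, 0.70, 0.50, 0.20, 0.10, 0.05]  # positives sit above the 0.33 cut
set_b = [0.60, 0.30, 0.25, 0.20, 0.10, 0.05]  # same ordering, positives shifted down

for name, scores in (("A", set_a), ("B", set_b)):
    print(name, auc(y_true, scores), f1_at(y_true, scores, 0.33))
# A 1.0 1.0
# B 1.0 0.5
```

Both score sets rank every positive above every negative, so AUC is identical, yet the fixed 0.33 threshold misses two positives in set B.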

## Caveats
