Initial release: chest2err sentence-grounded error decoder (τ_b=+0.763, pairwise acc=0.958)

Files changed (5)

README.md +185 -0
chest2err_config.json +51 -0
chest2err_modeling.py +329 -0
model.safetensors +3 -0
train_config.yaml +51 -0

README.md ADDED

	@@ -0,0 +1,185 @@

+---
+license: cc-by-nc-4.0
+language:
+- en
+library_name: pytorch
+tags:
+- radiology
+- chest-ct
+- report-evaluation
+- error-counting
+- sentence-grounded-decoder
+- medical
+- rexval
+datasets:
+- chest2vec/chest2error-bench
+base_model: Qwen/Qwen3-Embedding-0.6B
+pipeline_tag: text-classification
+---
+# chest2err — Sentence-grounded Error Decoder for Chest CT Reports
+**chest2err** is a sentence-grounded autoregressive decoder that, given a **(reference, candidate)** chest CT report pair, emits a sequence of structured error tuples. Each tuple specifies an error's `(category, anatomy, concept, severity)` and points back to the **specific reference sentence and candidate sentence** that triggered it. The total error count `K` is the number of tuples emitted before the end-of-sequence (EOS) step.
+Built on top of the [chest2vec](https://huggingface.co/chest2vec) backbone (Qwen3-Embedding-0.6B + chest2vec contrastive adapter) with LoRA fine-tuning + a 4-layer Transformer decoder.
+Evaluation benchmark: [chest2vec/chest2error-bench](https://huggingface.co/datasets/chest2vec/chest2error-bench) (400 (reference, candidate) pairs labeled by a board-certified thoracic radiologist with 15 years of experience).
+## Headline metrics
+Evaluated on the 400-pair `chest2error-bench` gold set:
+| metric | value |
+|---|---|
+| **Kendall τ_b vs Critical errors** | **+0.763** |
+| Kendall τ_b vs total errors | +0.665 |
+| Kendall τ_b vs severity-weighted | +0.734 |
+| **Pairwise within-anchor accuracy** | **0.958** (n=1020) |
+| Critical-error AUROC | 0.963 |
+| MAE vs gold total K | 1.12 |
+For comparison, τ_b on the same benchmark: BLEU = +0.235, BERTScore = +0.254, RadGraph = +0.232, RadCliQ = +0.239, GREEN = +0.047, CRIMSON-GPT (gpt-5.2) = +0.530. chest2err's +0.763 exceeds the strongest of these baselines, CRIMSON-GPT, by **+0.233 τ_b**, and every other listed metric by more than +0.50.
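The rank metrics in the table can be computed from per-pair scores with a few lines of code. The sketch below is a dependency-free illustration only: `kendall_tau_b` follows the standard tie-corrected definition, while `pairwise_within_anchor_accuracy` is an assumed reading of the pairwise metric (pairs of candidates that share a reference, gold ties skipped, predicted ties counted as wrong). The benchmark's exact protocol is not stated here, and the toy numbers are invented.

```python
import math
from collections import defaultdict
from itertools import combinations


def kendall_tau_b(x, y):
    """Tie-corrected Kendall rank correlation (tau-b), O(n^2) reference version."""
    conc = disc = only_x_tied = only_y_tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both rankings: excluded from both denominator terms
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            conc += 1
        else:
            disc += 1
    denom = math.sqrt((conc + disc + only_x_tied) * (conc + disc + only_y_tied))
    return (conc - disc) / denom if denom else float("nan")


def pairwise_within_anchor_accuracy(anchors, pred, gold):
    """Fraction of same-anchor pairs whose predicted order matches the gold order."""
    groups = defaultdict(list)
    for a, p, g in zip(anchors, pred, gold):
        groups[a].append((p, g))
    correct = total = 0
    for items in groups.values():
        for (p1, g1), (p2, g2) in combinations(items, 2):
            if g1 == g2:
                continue  # assumption: pairs tied in gold are not scored
            total += 1
            correct += int((p1 - p2) * (g1 - g2) > 0)
    return correct / total if total else float("nan")


# Toy data (hypothetical): predicted K vs gold critical-error counts.
anchors = ["a", "a", "a", "b", "b"]
pred_k = [0, 2, 5, 1, 1]
gold_critical = [0, 1, 3, 0, 2]

tau = kendall_tau_b(pred_k, gold_critical)
acc = pairwise_within_anchor_accuracy(anchors, pred_k, gold_critical)
print(f"tau_b={tau:+.3f}  pairwise_acc={acc:.3f}")
```

For large evaluations a library routine such as SciPy's `kendalltau` would be the more practical choice; the quadratic loop is kept here only to make the tie handling visible.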
+### CXR/CT generalization
+| corpus | τ_b vs Critical |
+|---|---|
+| ReXVal (CXR, n=200) | +0.682 |
+| Chest CT (this benchmark, n=400) | **+0.763** |
+Most prior metrics lose 0.4–0.7 τ_b when moving from CXR to CT. chest2err is the only metric in this comparison that scores higher on CT, which is consistent with it having been trained on CT reports.
+### Reference-style invariance
+On 100 GT-S ↔ GT-U content-equivalence pairs (same anchor, structured vs unstructured format), chest2err predicts **K = 0.00 ± 0.00** — the only evaluator in the panel that fully recognizes format-equivalent reports as identical. On *different*-anchor pairs it correctly predicts **K = 10.5 ± 9.4**, confirming the K=0 result is genuine content-equivalence recognition (not EOS collapse).
+## Architecture
+| component | spec |
+|---|---|
+| Base | `Qwen/Qwen3-Embedding-0.6B` |
+| chest2vec adapter | LoRA, frozen at inference |
+| chest2err LoRA | rank 32, α 64, dropout 0.05 |
+| Decoder | 4-layer Transformer, 8 heads, FFN 2048 |
+| Max decode steps | 24 (hard cap; suffices for max-K=18 observed in gold) |
+| Output tuple | `(cat 1-5, anat 0-8, concept, severity, ref_seg_idx, cand_seg_idx)` |
+| Pooling | mean-pool tokens within each sentence; prepend learnable NULL_REF and NULL_CAND vectors per side |
+| Trainable params | ~63 M (LoRA + decoder + null embeddings) |
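+The decoder **cross-attends** over the concatenated reference + candidate sentence-pool memory `M`. At each step it predicts one tuple, and `cat = 0` is reserved for EOS. Counting the EOS step, the error count is `len(seq) - 1`; `decode_greedy` in `chest2err_modeling.py` does not append the EOS step, so on its output `K = len(tuples)`.
+Mean-pooling sentences before the decoder makes the sentence representations **paraphrase-robust** (inheriting chest2vec's contrastive properties). The decoder adds no positional encoding over the sentence memory, so its cross-attention is **order-agnostic**; the pooled vectors can still carry positional context from the backbone, so strict permutation invariance with respect to sentence order is not guaranteed.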
+## Files
+| file | purpose |
+|---|---|
+| `model.safetensors` | LoRA adapter + decoder weights + null embeddings (~254 MB, i.e. ~242 MiB) |
+| `chest2err_modeling.py` | model architecture (the `CADAD` class) |
+| `chest2err_config.json` | model hyperparameters (decoder dims, n_cat, n_anat, etc.) |
+| `train_config.yaml` | full training-time config snapshot |
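As a rough consistency check between the stated parameter count and the checkpoint size, the arithmetic can be sketched as follows. It assumes every tensor in `model.safetensors` is stored as 4-byte float32 and ignores the small safetensors header; neither assumption is confirmed by the repo. The byte count is the LFS pointer size recorded in this commit.

```python
size_bytes = 254_257_936           # from the model.safetensors LFS pointer
size_mb = size_bytes / 1e6         # decimal megabytes
size_mib = size_bytes / 2**20      # binary mebibytes
approx_params = size_bytes / 4     # assumption: float32 storage, header ignored

print(f"{size_mb:.1f} MB, {size_mib:.1f} MiB, ~{approx_params / 1e6:.1f} M params")
```

Under those assumptions the estimate lands close to the ~63 M trainable parameters quoted in the architecture table.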
+## Quick start
+Inference requires the cera_eval package (in-tree at [chest2vec_error/src/cera_eval/](https://github.com/...)). A standalone HF-Hub-loadable wrapper is on the roadmap; in the meantime:
+```python
+import torch
+from huggingface_hub import hf_hub_download
+from safetensors.torch import load_file
+from chest2err_modeling import CADAD  # downloaded from this repo
+# Plus the backbone loader from chest2vec:
+#   pip install transformers peft safetensors
+#   load Qwen/Qwen3-Embedding-0.6B + chest2vec adapter as in chest2vec repo
+# Load weights
+ckpt_path = hf_hub_download("chest2vec/chest2err", "model.safetensors")
+state = load_file(ckpt_path)
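+# Wire into your backbone + decoder construction.
+# `chest2vec_backbone` and `concept_vocab_size` are placeholders you must supply.
+model = CADAD(backbone=chest2vec_backbone, hidden_size=1024,
+              n_cat=5, n_anat=9, n_concept=concept_vocab_size,
+              n_severity=2, decoder_layers=4, decoder_heads=8, decoder_ff=2048,
+              max_decode_steps=24)
+# strict=False tolerates key mismatches; inspect the returned missing/unexpected keys.
+load_result = model.load_state_dict(state, strict=False)
+model.eval()
+# At inference, encode (ref, cand), build sentence segment masks,
+# then call model.decode_greedy(...), which returns one list of tuple dicts per pair.
+# The EOS step is not appended to that list, so K = len(tuples).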
+```
+A complete inference example (with sentence segmentation + tokenization) lives in [chest2vec_error/src/cera_eval/scorer.py](https://github.com/...).
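Building the per-sentence token masks that `CADAD.encode_memory` expects is the step that is easiest to get wrong, so a minimal sketch of one way to do it follows. It assumes a naive regex sentence splitter, and whitespace tokens with character offsets stand in for a real tokenizer's offset mapping; the actual segmentation in `cera_eval` may differ. `sentence_spans` and `segment_masks` are hypothetical helper names, and the template string is the one in `chest2err_config.json`.

```python
import re

# Template copied from chest2err_config.json ("input_format.template").
TEMPLATE = "[REF] {reference_report}\n\n[PRED] {candidate_report}"
# Naive splitter (assumption): a sentence runs up to the next '.', '!' or '?'.
SENTENCE_RE = re.compile(r"[^.!?\s][^.!?]*[.!?]?")


def sentence_spans(text, start, end):
    """Absolute (start, end) character spans of sentences inside text[start:end]."""
    return [(start + m.start(), start + m.end())
            for m in SENTENCE_RE.finditer(text[start:end])]


def segment_masks(token_offsets, spans):
    """One boolean row per sentence; True where a token overlaps that sentence."""
    return [[tok_s < span_e and tok_e > span_s for tok_s, tok_e in token_offsets]
            for span_s, span_e in spans]


ref = "No pleural effusion. Lungs are clear."
cand = "Small right pleural effusion."
text = TEMPLATE.format(reference_report=ref, candidate_report=cand)

# Split the formatted input into its reference and candidate regions.
pred_at = text.index("[PRED]")
ref_spans = sentence_spans(text, len("[REF] "), pred_at)
cand_spans = sentence_spans(text, pred_at + len("[PRED] "), len(text))

# Stand-in for a tokenizer's offset mapping: whitespace tokens with char offsets.
token_offsets = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]

ref_seg_token_mask = segment_masks(token_offsets, ref_spans)    # [S_ref][T]
cand_seg_token_mask = segment_masks(token_offsets, cand_spans)  # [S_cand][T]
print(len(ref_seg_token_mask), len(cand_seg_token_mask), len(token_offsets))
```

In practice the offsets would come from the backbone's tokenizer (Hugging Face fast tokenizers can return them via `return_offsets_mapping=True`), and the nested lists would be converted to boolean tensors of shape `[B, S, T]`, padded along `S` with all-False rows that `_segment_pool` then marks as invalid.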
+## Output schema
+Each tuple returned by `CADAD.decode_greedy` is a dict:
+```python
+{
+    "cat":          int,  # 1..5 (ReXVal 5-category merged: false_prediction, omission, location, severity, comparison)
+    "anat":         int,  # 0..8 (Lungs & Airways, Pleura, ... Others)
+    "concept_id":   int,  # leaf concept id (clinical finding vocabulary)
+    "severity":     int,  # 0 = Minor, 1 = Critical
+    "ref_seg_idx":  int,  # 0 = NULL_REF, otherwise 1 + sentence index in reference report
+    "cand_seg_idx": int,  # 0 = NULL_CAND, otherwise 1 + sentence index in candidate report
+}
+```
+`cat == 0` is the EOS marker; decoding stops for a pair when the model emits it, and the EOS step is not added to the returned list. Subtract 1 from the raw pointer indices to obtain the `-1 = NULL`, 0-based sentence convention.
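Downstream scores can be derived from the decoded tuples in plain Python. The sketch below assumes tuples shaped like the dicts that `decode_greedy` in `chest2err_modeling.py` emits. The weighting in which a Critical error counts double is purely illustrative and is not the weighting behind the severity-weighted τ_b reported in this README; the sample tuples are invented, and `summarize` is a hypothetical helper.

```python
from collections import Counter

# Invented example output for one (reference, candidate) pair.
tuples = [
    {"cat": 2, "anat": 0, "concept_id": 17, "severity": 1, "ref_seg_idx": 3, "cand_seg_idx": 0},
    {"cat": 1, "anat": 1, "concept_id": 42, "severity": 0, "ref_seg_idx": 0, "cand_seg_idx": 2},
    {"cat": 2, "anat": 0, "concept_id": 5, "severity": 0, "ref_seg_idx": 1, "cand_seg_idx": 0},
]


def summarize(tuples, critical_weight=2.0, minor_weight=1.0):
    """Collapse a tuple list into counts; the weights are illustrative assumptions."""
    k_total = len(tuples)                                  # EOS is not in the list
    k_critical = sum(t["severity"] == 1 for t in tuples)   # severity 1 = Critical
    weighted = critical_weight * k_critical + minor_weight * (k_total - k_critical)
    cells = Counter((t["cat"], t["anat"]) for t in tuples)  # (cat, anat) histogram
    return {"K": k_total, "K_critical": k_critical, "weighted": weighted, "cells": cells}


summary = summarize(tuples)
print(summary)
```

The `cells` histogram corresponds to the per-(category, anatomy) cell counts mentioned in the module docstring.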
+## Training data
+Trained on `chest2vec/chest2err-train` (in preparation): 53,881 (reference, candidate) pairs across 4 candidate styles (V1-V4) + a V5 high-error supplement. Validation: the 200-variant slice of [chest2vec/chest2error-bench](https://huggingface.co/datasets/chest2vec/chest2error-bench) (audited radiologist gold).
+The reference reports are sourced from the [CT-RATE](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE) chest CT corpus; candidate variants and seeded errors were generated by an LLM following the [ReXVal](https://physionet.org/content/rexval-dataset/1.0.0/) error taxonomy.
+## Limitations
+- **Reference dependence.** chest2err is a paired metric. It cannot evaluate a candidate without a reference (use `chest2vec/candidate_only` for that case).
+- **English only.** Trained on English chest CT reports from CT-RATE.
+- **Chest CT only.** Cross-domain performance (e.g. abdominal CT) is not validated.
+- **24-error hard cap.** Reports with > 24 errors are clipped (rare; max observed in gold = 17).
+- **Single-radiologist gold.** Inter-rater calibration is in progress.
+## Citations
+If you use chest2err, please cite ReXVal (basis for the taxonomy and endpoint), CT-RATE (source of chest CT reports), and this model:
+```bibtex
+@misc{rexval2023,
+  title     = {{ReXVal}: Radiologist-Verified Evaluation of Automated Radiology Report Metrics},
+  author    = {Yu, F. and Endo, M. and Krishnan, R. and others},
+  year      = {2023},
+  publisher = {PhysioNet},
+  url       = {https://physionet.org/content/rexval-dataset/1.0.0/}
+}
+@misc{hamamci2024ctrate,
+  title         = {A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities},
+  author        = {Hamamci, Ibrahim Ethem and Er, Sezgin and Almas, Furkan and others},
+  year          = {2024},
+  eprint        = {2403.17834},
+  archivePrefix = {arXiv},
+  url           = {https://huggingface.co/datasets/ibrahimhamamci/CT-RATE}
+}
+@misc{chest2err2026,
+  title  = {chest2err: Sentence-grounded Error Decoder for Chest CT Reports},
+  author = {chest2vec contributors},
+  year   = {2026},
+  url    = {https://huggingface.co/chest2vec/chest2err}
+}
+```
+## Related
+- **Eval benchmark:** [chest2vec/chest2error-bench](https://huggingface.co/datasets/chest2vec/chest2error-bench) — radiologist-labeled 400-pair gold set
+- **Backbone encoder:** [chest2vec](https://huggingface.co/chest2vec) — Qwen3-Embedding-0.6B + chest2vec contrastive adapter
+- **CXR analogue (taxonomy basis):** [ReXVal](https://physionet.org/content/rexval-dataset/1.0.0/) — Radiologist-Verified Evaluation, chest X-ray (n=200)
+- **Source of reference reports:** [CT-RATE](https://huggingface.co/datasets/ibrahimhamamci/CT-RATE) — chest CT volumes + radiology reports corpus
+## License
+CC-BY-NC-4.0. Released for research use.

chest2err_config.json ADDED

	@@ -0,0 +1,51 @@

+{
+  "seed": 42,
+  "model": {
+    "backbone_name": "Qwen/Qwen3-Embedding-0.6B",
+    "chest2vec_adapter_path": "/opt/project/chest2vec/export_chest2vec_0.6b_chest/contrastive",
+    "architecture": "cada_d",
+    "max_length": 1280,
+    "attn_implementation": "flash_attention_2",
+    "use_lora": true,
+    "lora_rank": 32,
+    "lora_alpha": 64,
+    "lora_dropout": 0.05,
+    "freeze_backbone_initially": false,
+    "n_cat": 5,
+    "n_anat": 9,
+    "n_severity": 2,
+    "decoder_layers": 4,
+    "decoder_heads": 8,
+    "decoder_ff": 2048,
+    "decoder_dropout": 0.1,
+    "max_decode_steps": 24
+  },
+  "input_format": {
+    "template": "[REF] {reference_report}\n\n[PRED] {candidate_report}",
+    "pred_sentinel": "[PRED]"
+  },
+  "training": {
+    "batch_size": 8,
+    "grad_accum_steps": 1,
+    "num_workers": 4,
+    "epochs": 20,
+    "lr_backbone": 0.0001,
+    "lr_heads": 0.0003,
+    "weight_decay": 0.01,
+    "warmup_ratio": 0.03,
+    "max_grad_norm": 1.0,
+    "bf16": true,
+    "gradient_checkpointing": false
+  },
+  "loss": {
+    "cat": 1.0,
+    "anat": 0.5,
+    "concept": 0.3,
+    "sev": 0.5,
+    "ref": 0.5,
+    "cand": 0.5
+  },
+  "metrics": {
+    "primary_metric": "val_mae_K"
+  }
+}

chest2err_modeling.py ADDED

	@@ -0,0 +1,329 @@

+"""CADA-D — sentence-grounded autoregressive error-tuple decoder.
+Architecture
+------------
+1. Encoder (reused from CADA): backbone produces [B, T, D] hidden states.
+2. Sentence pooling: mean-pool hidden states over per-segment token masks
+   on each side; prepend a learnable NULL_REF / NULL_CAND vector per side.
+3. Cross-attended decoder: TransformerDecoder over the concatenated
+   ref+cand segment pool. At each step it predicts a tuple
+       (cat, anat, concept, severity, ref_seg_idx, cand_seg_idx)
+   with cat=0 reserved for EOS.
+Counts emerge as `len(seq) - 1`, cell counts as a histogram over (cat, anat).
+The explanation IS the prediction — each emitted tuple points to a specific
+ref sentence (or NULL) and a specific cand sentence (or NULL).
+"""
+from __future__ import annotations
+import math
+from typing import Dict, Optional
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+def _segment_pool(hidden: torch.Tensor, seg_token_mask: torch.Tensor):
+    """Mean-pool tokens over per-segment masks.
+    hidden:         [B, T, D]
+    seg_token_mask: [B, S, T] bool   1 where token t belongs to segment s.
+    Returns
+        pool: [B, S, D]
+        valid: [B, S]   True where segment had at least 1 token.
+    """
+    m = seg_token_mask.to(hidden.dtype)
+    denom = m.sum(dim=-1, keepdim=True).clamp_min(1.0)
+    pool = (m @ hidden) / denom
+    valid = seg_token_mask.any(dim=-1)
+    return pool, valid
+class _TupleEmbedder(nn.Module):
+    """Sum of category/anatomy/concept/severity embeddings + segment embeddings,
+    then a small projection. Used to embed teacher-forced tuples back to D."""
+    def __init__(self, n_cat: int, n_anat: int, n_concept: int, n_sev: int,
+                 hidden_size: int):
+        super().__init__()
+        self.cat_emb = nn.Embedding(n_cat + 1, hidden_size)
+        self.anat_emb = nn.Embedding(n_anat, hidden_size)
+        self.concept_emb = nn.Embedding(n_concept, hidden_size)
+        self.sev_emb = nn.Embedding(n_sev, hidden_size)
+        self.proj = nn.Linear(hidden_size, hidden_size)
+    def forward(self, cat, anat, concept, sev, ref_emb, cand_emb):
+        e = (self.cat_emb(cat) + self.anat_emb(anat)
+             + self.concept_emb(concept) + self.sev_emb(sev)
+             + ref_emb + cand_emb)
+        return self.proj(e)
+class CADAD(nn.Module):
+    """Sentence-grounded autoregressive error-tuple decoder."""
+    EOS_CAT_IDX = 0  # special class in `cat` for end-of-sequence
+    def __init__(
+        self,
+        backbone,
+        hidden_size: int,
+        n_cat: int = 5,
+        n_anat: int = 9,
+        n_concept: int = 386,
+        n_severity: int = 2,
+        decoder_layers: int = 2,
+        decoder_heads: int = 8,
+        decoder_ff: int = 1024,
+        dropout: float = 0.1,
+        max_decode_steps: int = 24,
+    ):
+        super().__init__()
+        self.backbone = backbone
+        self.hidden_size = hidden_size
+        self.n_cat = n_cat
+        self.n_anat = n_anat
+        self.n_concept = n_concept
+        self.n_severity = n_severity
+        self.max_decode_steps = max_decode_steps
+        # Memory-side conditioning
+        self.mem_type_emb = nn.Embedding(2, hidden_size)  # 0=ref-side, 1=cand-side
+        self.null_ref = nn.Parameter(torch.randn(1, 1, hidden_size) * 0.02)
+        self.null_cand = nn.Parameter(torch.randn(1, 1, hidden_size) * 0.02)
+        self.bos_emb = nn.Parameter(torch.randn(1, 1, hidden_size) * 0.02)
+        self.tuple_emb = _TupleEmbedder(n_cat, n_anat, n_concept, n_severity, hidden_size)
+        layer = nn.TransformerDecoderLayer(
+            d_model=hidden_size, nhead=decoder_heads,
+            dim_feedforward=decoder_ff, dropout=dropout,
+            batch_first=True, activation="gelu", norm_first=True,
+        )
+        self.decoder = nn.TransformerDecoder(layer, num_layers=decoder_layers)
+        # Output heads
+        self.head_cat = nn.Linear(hidden_size, n_cat + 1)        # +1 for EOS at idx 0
+        self.head_anat = nn.Linear(hidden_size, n_anat)
+        self.head_concept = nn.Linear(hidden_size, n_concept)
+        self.head_severity = nn.Linear(hidden_size, n_severity)
+        self.proj_ref = nn.Linear(hidden_size, hidden_size)
+        self.proj_cand = nn.Linear(hidden_size, hidden_size)
+    def encode_memory(self, input_ids, attention_mask,
+                      ref_seg_token_mask, cand_seg_token_mask):
+        """Returns dict with ref_pool, cand_pool, memory, valid masks.
+        ref_pool/cand_pool include a leading NULL slot at index 0.
+        """
+        out = self.backbone(input_ids=input_ids,
+                            attention_mask=attention_mask,
+                            return_dict=True)
+        hidden = out.last_hidden_state                              # [B, T, D]
+        ref_pool, ref_valid = _segment_pool(hidden, ref_seg_token_mask)
+        cand_pool, cand_valid = _segment_pool(hidden, cand_seg_token_mask)
+        B = hidden.size(0)
+        device = hidden.device
+        zero_t = torch.zeros(B, 1, dtype=torch.long, device=device)
+        one_t = torch.ones(B, 1, dtype=torch.long, device=device)
+        # Prepend NULL slot at index 0 on each side.
+        null_r = self.null_ref.expand(B, 1, -1).to(hidden.dtype)
+        null_c = self.null_cand.expand(B, 1, -1).to(hidden.dtype)
+        ref_pool_full = torch.cat([null_r, ref_pool], dim=1)
+        cand_pool_full = torch.cat([null_c, cand_pool], dim=1)
+        # Side-type embeddings
+        side_ref = self.mem_type_emb(zero_t).to(hidden.dtype)
+        side_cand = self.mem_type_emb(one_t).to(hidden.dtype)
+        ref_pool_full = ref_pool_full + side_ref
+        cand_pool_full = cand_pool_full + side_cand
+        bool_one = torch.ones(B, 1, dtype=torch.bool, device=device)
+        ref_valid_full = torch.cat([bool_one, ref_valid], dim=1)
+        cand_valid_full = torch.cat([bool_one, cand_valid], dim=1)
+        memory = torch.cat([ref_pool_full, cand_pool_full], dim=1)  # [B, M, D]
+        memory_valid = torch.cat([ref_valid_full, cand_valid_full], dim=1)
+        return {
+            "ref_pool": ref_pool_full, "ref_valid": ref_valid_full,
+            "cand_pool": cand_pool_full, "cand_valid": cand_valid_full,
+            "memory": memory, "memory_valid": memory_valid,
+        }
+    def _gather_seg_emb(self, pool: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
+        """pool: [B, S, D], idx: [B, K] (≥0). Returns [B, K, D] via batched gather."""
+        B, K = idx.shape
+        # Row index for every (b, k), so the result at [b, k] is pool[b, idx[b, k]].
+        b_idx = torch.arange(B, device=pool.device).unsqueeze(1).expand(-1, K)
+        return pool[b_idx, idx]
+    def forward_train(
+        self,
+        input_ids, attention_mask,
+        ref_seg_token_mask, cand_seg_token_mask,
+        target_cat, target_anat, target_concept, target_sev,
+        target_ref, target_cand,
+    ):
+        """All targets are [B, K]. Padding & ignored positions are -100.
+        target_cat[b, k]==0 marks EOS at position k.
+        target_ref/target_cand are indices into ref_pool/cand_pool (incl. NULL=0).
+        """
+        enc = self.encode_memory(input_ids, attention_mask,
+                                 ref_seg_token_mask, cand_seg_token_mask)
+        memory = enc["memory"]
+        ref_pool, cand_pool = enc["ref_pool"], enc["cand_pool"]
+        B, K = target_cat.shape
+        # For teacher-forcing we need the segment embedding for each target step,
+        # using clamp_min(0) so PAD/IGNORE sites get NULL. Loss ignores them later.
+        ref_idx_safe = target_ref.clamp_min(0)
+        cand_idx_safe = target_cand.clamp_min(0)
+        ref_emb_per_t = self._gather_seg_emb(ref_pool, ref_idx_safe)
+        cand_emb_per_t = self._gather_seg_emb(cand_pool, cand_idx_safe)
+        tuple_emb_all = self.tuple_emb(
+            cat=target_cat.clamp_min(0),
+            anat=target_anat.clamp_min(0),
+            concept=target_concept.clamp_min(0),
+            sev=target_sev.clamp_min(0),
+            ref_emb=ref_emb_per_t,
+            cand_emb=cand_emb_per_t,
+        )
+        # Shift right with BOS
+        bos = self.bos_emb.expand(B, 1, -1).to(tuple_emb_all.dtype)
+        decoder_input = torch.cat([bos, tuple_emb_all[:, :-1, :]], dim=1)
+        causal_mask = nn.Transformer.generate_square_subsequent_mask(K).to(decoder_input.device)
+        mem_kp_mask = ~enc["memory_valid"]
+        out = self.decoder(
+            tgt=decoder_input,
+            memory=memory,
+            tgt_mask=causal_mask,
+            memory_key_padding_mask=mem_kp_mask,
+        )                                                            # [B, K, D]
+        logits_cat = self.head_cat(out)
+        logits_anat = self.head_anat(out)
+        logits_concept = self.head_concept(out)
+        logits_sev = self.head_severity(out)
+        scale = 1.0 / math.sqrt(self.hidden_size)
+        ref_q = self.proj_ref(out)
+        cand_q = self.proj_cand(out)
+        logits_ref = torch.einsum("bkd,bsd->bks", ref_q, ref_pool) * scale
+        logits_cand = torch.einsum("bkd,bsd->bks", cand_q, cand_pool) * scale
+        # Mask invalid pointer slots (padded segments) with a large negative logit (-1e4 rather than -inf)
+        logits_ref = logits_ref.masked_fill(~enc["ref_valid"][:, None, :], -1e4)
+        logits_cand = logits_cand.masked_fill(~enc["cand_valid"][:, None, :], -1e4)
+        return {
+            "logits_cat": logits_cat,
+            "logits_anat": logits_anat,
+            "logits_concept": logits_concept,
+            "logits_sev": logits_sev,
+            "logits_ref": logits_ref,
+            "logits_cand": logits_cand,
+            "memory": memory,
+        }
+    @torch.no_grad()
+    def decode_greedy(
+        self,
+        input_ids, attention_mask,
+        ref_seg_token_mask, cand_seg_token_mask,
+    ):
+        """Greedy autoregressive decoding. Returns list-of-list of dicts (per pair)."""
+        enc = self.encode_memory(input_ids, attention_mask,
+                                 ref_seg_token_mask, cand_seg_token_mask)
+        memory = enc["memory"]
+        ref_pool, cand_pool = enc["ref_pool"], enc["cand_pool"]
+        ref_valid, cand_valid = enc["ref_valid"], enc["cand_valid"]
+        mem_kp_mask = ~enc["memory_valid"]
+        B = input_ids.size(0)
+        device = input_ids.device
+        D = memory.size(-1)
+        bos = self.bos_emb.expand(B, 1, -1).to(memory.dtype)
+        prev_emb = bos
+        running = torch.ones(B, dtype=torch.bool, device=device)
+        out_seqs = [[] for _ in range(B)]
+        for step in range(self.max_decode_steps):
+            causal = nn.Transformer.generate_square_subsequent_mask(prev_emb.size(1)).to(device)
+            dec = self.decoder(prev_emb, memory, tgt_mask=causal, memory_key_padding_mask=mem_kp_mask)
+            last = dec[:, -1, :]                                     # [B, D]
+            # Greedy decoding: argmax over each head (no sampling)
+            cat_pred = self.head_cat(last).argmax(-1)                # [B]
+            anat_pred = self.head_anat(last).argmax(-1)
+            concept_pred = self.head_concept(last).argmax(-1)
+            sev_pred = self.head_severity(last).argmax(-1)
+            scale = 1.0 / math.sqrt(self.hidden_size)
+            ref_q = self.proj_ref(last)
+            cand_q = self.proj_cand(last)
+            ref_logit = (torch.einsum("bd,bsd->bs", ref_q, ref_pool) * scale).masked_fill(~ref_valid, -1e4)
+            cand_logit = (torch.einsum("bd,bsd->bs", cand_q, cand_pool) * scale).masked_fill(~cand_valid, -1e4)
+            ref_pred = ref_logit.argmax(-1)
+            cand_pred = cand_logit.argmax(-1)
+            for b in range(B):
+                if not running[b]:
+                    continue
+                if cat_pred[b].item() == self.EOS_CAT_IDX:
+                    running[b] = False
+                    continue
+                out_seqs[b].append({
+                    "cat": int(cat_pred[b]),
+                    "anat": int(anat_pred[b]),
+                    "concept_id": int(concept_pred[b]),
+                    "severity": int(sev_pred[b]),
+                    "ref_seg_idx": int(ref_pred[b]),
+                    "cand_seg_idx": int(cand_pred[b]),
+                })
+            if not running.any():
+                break
+            # Build next-step embedding from this step's predictions
+            ref_emb_step = ref_pool[torch.arange(B, device=device), ref_pred]
+            cand_emb_step = cand_pool[torch.arange(B, device=device), cand_pred]
+            next_emb = self.tuple_emb(
+                cat=cat_pred, anat=anat_pred,
+                concept=concept_pred, sev=sev_pred,
+                ref_emb=ref_emb_step, cand_emb=cand_emb_step,
+            ).unsqueeze(1)                                           # [B, 1, D]
+            prev_emb = torch.cat([prev_emb, next_emb], dim=1)
+        return out_seqs
+def cadad_loss(out: Dict[str, torch.Tensor],
+               target_cat, target_anat, target_concept, target_sev,
+               target_ref, target_cand,
+               weights: Optional[Dict[str, float]] = None) -> Dict[str, torch.Tensor]:
+    """Cross-entropy on every head. Pad/ignore positions = -100 in targets.
+    EOS positions only supervise `cat`; other heads should be -100 there.
+    """
+    w = {"cat": 1.0, "anat": 0.5, "concept": 0.3, "sev": 0.5,
+         "ref": 0.5, "cand": 0.5, **(weights or {})}
+    L_cat = F.cross_entropy(out["logits_cat"].reshape(-1, out["logits_cat"].size(-1)),
+                            target_cat.reshape(-1), ignore_index=-100)
+    L_anat = F.cross_entropy(out["logits_anat"].reshape(-1, out["logits_anat"].size(-1)),
+                             target_anat.reshape(-1), ignore_index=-100)
+    L_concept = F.cross_entropy(out["logits_concept"].reshape(-1, out["logits_concept"].size(-1)),
+                                target_concept.reshape(-1), ignore_index=-100)
+    L_sev = F.cross_entropy(out["logits_sev"].reshape(-1, out["logits_sev"].size(-1)),
+                            target_sev.reshape(-1), ignore_index=-100)
+    L_ref = F.cross_entropy(out["logits_ref"].reshape(-1, out["logits_ref"].size(-1)),
+                            target_ref.reshape(-1), ignore_index=-100)
+    L_cand = F.cross_entropy(out["logits_cand"].reshape(-1, out["logits_cand"].size(-1)),
+                             target_cand.reshape(-1), ignore_index=-100)
+    total = (w["cat"] * L_cat + w["anat"] * L_anat + w["concept"] * L_concept
+             + w["sev"] * L_sev + w["ref"] * L_ref + w["cand"] * L_cand)
+    return {"total": total, "cat": L_cat, "anat": L_anat, "concept": L_concept,
+            "sev": L_sev, "ref": L_ref, "cand": L_cand}
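To make the target layout that `forward_train` and `cadad_loss` expect concrete, here is a small sketch that packs variable-length gold tuple lists into padded per-head rows. It follows the conventions stated in the docstrings above (EOS is `cat == 0`, ignored positions are -100, pointer index 0 is the NULL slot). The helper names `to_pointer` and `pack_targets`, the gold-record keys `ref_sent` and `cand_sent`, and the choice to always reserve one step for EOS are assumptions made for illustration, not the project's actual data pipeline.

```python
IGNORE = -100   # ignore_index used by cadad_loss
EOS_CAT = 0     # CADAD.EOS_CAT_IDX
HEADS = ("cat", "anat", "concept", "sev", "ref", "cand")


def to_pointer(sentence_idx):
    """Map a 0-based sentence index (None = no sentence) to a pool index (0 = NULL)."""
    return 0 if sentence_idx is None else sentence_idx + 1


def pack_targets(batch, max_steps=24):
    """batch: non-empty list of per-pair gold tuple lists. Returns {head: [B][K] ints}."""
    # One extra step per sequence for EOS (assumption: EOS always gets a slot).
    K = min(max(len(seq) for seq in batch) + 1, max_steps)
    packed = {h: [] for h in HEADS}
    for seq in batch:
        seq = seq[:K - 1]
        rows = {h: [IGNORE] * K for h in HEADS}
        for k, t in enumerate(seq):
            rows["cat"][k] = t["cat"]
            rows["anat"][k] = t["anat"]
            rows["concept"][k] = t["concept"]
            rows["sev"][k] = t["severity"]
            rows["ref"][k] = to_pointer(t["ref_sent"])
            rows["cand"][k] = to_pointer(t["cand_sent"])
        rows["cat"][len(seq)] = EOS_CAT  # only `cat` is supervised at the EOS step
        for h in HEADS:
            packed[h].append(rows[h])
    return packed


gold = [
    [  # pair 0: two errors
        {"cat": 2, "anat": 0, "concept": 17, "severity": 1, "ref_sent": 3, "cand_sent": None},
        {"cat": 1, "anat": 1, "concept": 42, "severity": 0, "ref_sent": None, "cand_sent": 2},
    ],
    [],  # pair 1: no errors, so EOS sits at step 0
]
targets = pack_targets(gold)
print(targets["cat"], targets["ref"])
```

Each list converts directly with `torch.tensor(targets[head])` into a `[B, K]` long tensor. One thing worth guarding against in a training loop: if every position for some head is -100 in a batch (for example, a batch made up only of zero-error pairs), a mean-reduced cross-entropy has nothing to average over and can return NaN for that head.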

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7736077f20e4b6713701a4faef0250dfd9a669f5ae8f243a002708ccd01f99be
+size 254257936

train_config.yaml ADDED

	@@ -0,0 +1,51 @@

+input_format:
+  pred_sentinel: '[PRED]'
+  template: "[REF] {reference_report}\n\n[PRED] {candidate_report}"
+loss:
+  anat: 0.5
+  cand: 0.5
+  cat: 1.0
+  concept: 0.3
+  ref: 0.5
+  sev: 0.5
+metrics:
+  primary_metric: val_mae_K
+model:
+  architecture: cada_d
+  attn_implementation: flash_attention_2
+  backbone_name: Qwen/Qwen3-Embedding-0.6B
+  chest2vec_adapter_path: /opt/project/chest2vec/export_chest2vec_0.6b_chest/contrastive
+  decoder_dropout: 0.1
+  decoder_ff: 2048
+  decoder_heads: 8
+  decoder_layers: 4
+  freeze_backbone_initially: false
+  lora_alpha: 64
+  lora_dropout: 0.05
+  lora_rank: 32
+  max_decode_steps: 24
+  max_length: 1280
+  n_anat: 9
+  n_cat: 5
+  n_severity: 2
+  use_lora: true
+paths:
+  concept_vocab_path: /opt/project/chest2vec/chest2vec_error/artifacts_v5/concept2id.json
+  data_csv: /opt/project/chest2vec/create_labels/unified_variants_v5_merged.csv
+  output_dir: /opt/project/chest2vec/chest2vec_error/artifacts/cada_d_6gpu
+seed: 42
+training:
+  batch_size: 8
+  bf16: true
+  epochs: 20
+  grad_accum_steps: 1
+  gradient_checkpointing: false
+  lr_backbone: 0.0001
+  lr_heads: 0.0003
+  max_grad_norm: 1.0
+  num_workers: 4
+  warmup_ratio: 0.03
+  weight_decay: 0.01