newlantern-deberta-sk

DeBERTa-v3-large fine-tuned for radiology prior-study relevance classification. Given the descriptions of a current and a prior radiology study, it predicts whether the prior is relevant to the current read.

Built for the New Lantern challenge: relevant-priors-v1.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("sh3hryarkhan/newlantern-deberta-sk")
model = AutoModelForSequenceClassification.from_pretrained("sh3hryarkhan/newlantern-deberta-sk")
model.eval()

# text_a carries the current study plus the time delta; text_b carries the prior.
text_a = "delta: 365d (<= 1y) | cur: CT HEAD WITHOUT CONTRAST | norm: CT/HEAD/BILATERAL/WITHOUT"
text_b = "prior: MRI BRAIN ROUTINE | norm: MRI/HEAD/BILATERAL/WITHOUT"

# Encode as a sentence pair and take the softmax probability of class 1 (relevant).
enc = tokenizer(text_a, text_b, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    prob = torch.softmax(model(**enc).logits, dim=-1)[0, 1].item()

print(f"P(relevant) = {prob:.3f}")

Label 1 = relevant, 0 = not relevant. Decision threshold: 0.5.
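
Continuing the snippet above, the hard label follows directly from that threshold:

# Apply the 0.5 decision threshold: 1 = relevant, 0 = not relevant.
label = int(prob >= 0.5)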

Input Format

text_a: delta: {days}d ({bucket}) | cur: {current_description} | norm: {mod}/{region}/{lat}/{con}
text_b: prior: {prior_description} | norm: {mod}/{region}/{lat}/{con}

The norm field is a 4-tuple (modality, region, laterality, contrast) parsed from the raw description string. The delta bucket maps days to one of: same day, <= 1m, <= 3m, <= 6m, <= 1y, <= 2y, <= 5y, > 5y.
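
A minimal sketch of assembling these inputs. delta_bucket and build_pair are hypothetical helpers, not part of this repo; the exact day cutoff for each bucket is an assumption (only 365d falling in "<= 1y" is confirmed by the Usage example), and the norm strings are assumed to come from an upstream parser that this card does not include.

def delta_bucket(days: int) -> str:
    # Assumed cutoffs for the buckets listed above.
    if days == 0:
        return "same day"
    for limit, name in [(31, "<= 1m"), (92, "<= 3m"), (183, "<= 6m"),
                        (365, "<= 1y"), (730, "<= 2y"), (1826, "<= 5y")]:
        if days <= limit:
            return name
    return "> 5y"

def build_pair(days, cur_desc, norm_cur, prior_desc, norm_prior):
    # norm_cur / norm_prior: precomputed "{mod}/{region}/{lat}/{con}" strings.
    text_a = f"delta: {days}d ({delta_bucket(days)}) | cur: {cur_desc} | norm: {norm_cur}"
    text_b = f"prior: {prior_desc} | norm: {norm_prior}"
    return text_a, text_b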

Training

  • Base model: microsoft/deberta-v3-large
  • Task: binary sequence classification (relevant / not relevant)
  • Data: public split of the New Lantern relevant-priors-v1 challenge (~13k labeled pairs)
  • Epochs: 4 (early stopped; epoch 5 overfit)
  • Phase 1 (epochs 1-2): standard cross-entropy training; AdamW lr=2e-5, cosine schedule, fp16
  • Phase 2 (epochs 3-4): hard-negative mining via WeightedRandomSampler, 3x weight on misclassified samples (see the sketch after this list)
  • Hardware: A100
  • Best val accuracy: 96.44%
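
A minimal sketch of the Phase 2 sampling scheme, assuming the Phase 1 checkpoint has been evaluated on the training set to produce a misclassification mask. The dataset, mask, and batch size below are illustrative stand-ins; only the 3x upweighting and the use of WeightedRandomSampler come from the list above.

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Stand-ins: the real inputs are the tokenized training pairs and a boolean
# mask marking which of them the Phase 1 checkpoint got wrong.
train_dataset = TensorDataset(torch.randn(1000, 128), torch.randint(0, 2, (1000,)))
misclassified = torch.rand(1000) < 0.1

weights = torch.ones(len(train_dataset))
weights[misclassified] = 3.0  # 3x weight on Phase 1 mistakes

sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
loader = DataLoader(train_dataset, batch_size=16, sampler=sampler)

Sampling with replacement means hard examples show up roughly three times as often per epoch while easy ones are occasionally skipped, making this a resampling form of hard-negative mining rather than a loss reweighting.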

System Context

This model is Tier 2 in a three-tier cascade. Tier 1 is a lookup table over seen (cur, prior) description pairs with Laplace smoothing; the encoder only fires when the lookup abstains (novel pairs). In production on the full public split, ~5-10% of pairs reach the encoder. See the full system at sh3hryarkhan/newlantern-deberta-sk.
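
A minimal sketch of the routing between Tiers 1 and 2, under stated assumptions: the counts structure, the abstention rule (min_count), and the pseudocount alpha are illustrative, and Tier 3 is omitted; only the overall shape (Laplace-smoothed lookup first, encoder on novel pairs) comes from the paragraph above.

def predict_relevance(cur_desc, prior_desc, counts, encoder_fn,
                      alpha=1.0, min_count=1):
    # counts: {(cur_desc, prior_desc): (n_relevant, n_total)} over seen pairs.
    n_rel, n_total = counts.get((cur_desc, prior_desc), (0, 0))
    if n_total >= min_count:
        # Tier 1: Laplace-smoothed relevance rate for a seen pair.
        return (n_rel + alpha) / (n_total + 2 * alpha)
    # Tier 1 abstains on novel pairs; Tier 2 scores them with the encoder,
    # e.g. the Usage snippet above wrapped in a function.
    return encoder_fn(cur_desc, prior_desc)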
