BoilerplateChecker-1225

BoilerplateChecker-1225 is a binary text classifier that estimates whether a given patient summary indicates a history of conditions that may exclude the patient from a given clinical trial based on that trial's boilerplate exclusion criteria. "Boilerplate exclusions" are exclusion criteria that are not central to defining the target population for a specific trial, but that instead tend to exclude patients from many clinical trials in general. Examples of "boilerplate exclusions" include concepts like "uncontrolled brain metastases" or "history of pneumonitis." This model is fine-tuned from [answerdotai/ModernBERT-large] for sequence classification on pairs of (trial_boilerplate_text, patient_boilerplate_text). "Patient boilerplate text" represents a subsection of an overall patient summary that describes any history of such conditions.

This model is not intended to capture whether a patient is excluded from a clinical trial based on trial criteria central to defining the trial's target population, which include age, sex, cancer type, histology, cancer burden requirements, biomarker requirements, and treatment history requirements. These concepts are covered by the separate TrialChecker classification model.

Important: This is a research prototype for model development, not a medical device or approved clinical decision support tool. It is not intended for clinical decision-making.



Training summary

The classifier was trained with a script that:

  1. Loads three sources of annotated patient–trial pairs:
    • Pairs originating from space-specific eligibility checks
    • “Patient→top-cohorts” checks (rounds 1–3)
    • “Trial-space→top patients” checks (rounds 1–3)
  2. Deduplicates by ['patient_boilerplate_text', 'trial_boilerplate_text']
  3. Builds the final text input as:
text = "Patient history: " + patient_boilerplate_text + "\nTrial exclusions:" + trial_boilerplate_text
  4. Uses exclusion_result as the binary label (0/1)
  5. Fine-tunes ModernBERT-large (sequence classification, 2 labels) at max_length 3192
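Steps 3 and 4 above can be sketched as a small helper (an illustration only; `build_example` is a hypothetical name, not a function from the training script):

```python
def build_example(record):
    """Compose the classifier input string and binary label from one
    annotated patient-trial pair (a dict with the training columns)."""
    text = (
        "Patient history: " + record["patient_boilerplate_text"]
        + "\nTrial exclusions:" + record["trial_boilerplate_text"]
    )
    label = int(record["exclusion_result"])  # 0 = not excluded, 1 = excluded
    return {"text": text, "label": label}

example = build_example({
    "patient_boilerplate_text": "History of pneumonitis in 2021.",
    "trial_boilerplate_text": "Patients with a history of pneumonitis are excluded.",
    "exclusion_result": 1,
})
```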

Key hyperparameters from training (on H100 x 8)

  • Base model: answerdotai/ModernBERT-large
  • Max length: 3192
  • Optimizer settings: learning_rate=2e-5, weight_decay=0.01
  • Batch size: per_device_train_batch_size=8
  • Epochs: 2
  • Save strategy: epoch
  • Tokenizer: AutoTokenizer.from_pretrained("answerdotai/ModernBERT-large")
  • Data collator: DataCollatorWithPadding
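The hyperparameters above correspond roughly to a Trainer configuration like the following. This is a sketch, not the original training script; `output_dir` and the dataset variable are placeholders:

```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-large", num_labels=2)

args = TrainingArguments(
    output_dir="boilerplate-checker",   # placeholder
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=8,      # run on 8x H100
    num_train_epochs=2,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_train,      # placeholder: pre-tokenized dataset
    data_collator=DataCollatorWithPadding(tokenizer),
)
```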

Intended use

  • Input: a single string combining a patient's history of common "boilerplate exclusion conditions" (if any) with a clinical trial's "boilerplate exclusion criteria" (if any).
  • Output: probability that the patient is excluded from the trial based on the trial's "boilerplate exclusion criteria".
  • Use cases:
    • Deeper pre-screening of candidate patients for specific trials

Out of scope:

  • Confirming formal eligibility or safety
  • Formal (autonomous) medical record review, diagnosis, or treatment decision-making

Inference (Transformers)

Quick start (single example)

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
MODEL_REPO = "ksg-dfci/BoilerplateChecker-1225" 

tok = AutoTokenizer.from_pretrained(MODEL_REPO)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_REPO).to(device)
model.eval()

trial_boilerplate_text = (
 "Patients with uncontrolled brain metastases are excluded."
)

patient_boilerplate_text = (
 "New brain metastases identified 01/02/23, not yet treated."
)

text = "Patient history: " + patient_boilerplate_text + "\nTrial exclusions:" + trial_boilerplate_text

# Raw Transformers model
enc = tok(text, return_tensors="pt", truncation=True, max_length=3192).to(device)  # match the training max_length
with torch.no_grad():
 logits = model(**enc).logits
probs = logits.softmax(-1).squeeze(0)

# Label mapping was set in training: {0: "NEGATIVE", 1: "POSITIVE"}
p_positive = float(probs[1])
print(f"Exclusion probability: {p_positive:.3f}")

# Or pipeline API to get similar outputs
from transformers import pipeline
pipe = pipeline('text-classification', 'ksg-dfci/BoilerplateChecker-1225')
pipe([text])

Batched scoring

from typing import List
import torch

def score_pairs(trial_texts: List[str], patient_texts: List[str], tokenizer, model, max_length=3192, batch_size=8):
    assert len(trial_texts) == len(patient_texts)
    device = next(model.parameters()).device
    scores = []

    for i in range(0, len(trial_texts), batch_size):
        batch_trials = trial_texts[i:i+batch_size]
        batch_patients = patient_texts[i:i+batch_size]
        # Same input format as training
        texts = ["Patient history: " + p + "\nTrial exclusions:" + t for t, p in zip(batch_trials, batch_patients)]
        enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=max_length).to(device)
        with torch.no_grad():
            logits = model(**enc).logits
        probs = logits.softmax(-1)[:, 1]  # POSITIVE
        scores.extend(probs.detach().cpu().tolist())
    return scores

# Example
trial_exclusion_texts = [trial_boilerplate_text] * 3
patient_boilerplate_texts = [patient_boilerplate_text, "Different patient comorbidities 1...", "Different patient comorbidities 2..."]
scores = score_pairs(trial_exclusion_texts, patient_boilerplate_texts, tok, model)
print(scores)

Thresholding & calibration

  • Default decision: 0.5 on the POSITIVE probability.
  • For better calibration/operating points, tune the threshold on a validation set (e.g., maximize F1, optimize Youden’s J, or set to a desired precision).
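A minimal sketch of tuning the decision threshold on a labeled validation set, here maximizing F1 on the POSITIVE class (pure Python; function names are illustrative):

```python
def f1_at_threshold(probs, labels, thr):
    """F1 for the POSITIVE class when predicting 1 iff prob >= thr."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= thr and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= thr and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < thr and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def tune_threshold(probs, labels):
    """Sweep the observed scores as candidate thresholds; keep the best F1."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: f1_at_threshold(probs, labels, t))

# Toy validation set with well-separated scores
val_probs  = [0.1, 0.2, 0.35, 0.6, 0.8, 0.9]
val_labels = [0,   0,   0,    1,   1,   1]
best = tune_threshold(val_probs, val_labels)
```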

How to prepare inputs

  • Trial boilerplate text: as in the example above, a compact list of a trial's exclusion criteria that are not central to defining its target population.
  • Patient boilerplate text: as in the example above, a concise summary of any medical conditions that may meet common boilerplate exclusion criteria.

You can generate these inputs with your upstream LLM pipeline (e.g., gpt-oss-120b or our OncoReasoning-3B-1225 model for summarization and trial information extraction), but the classifier accepts any plain strings in the format shown above.


Reproducibility (high-level)

Below is the minimal structure used by the training script to build the dataset before tokenization:

# 1) Load and merge three labeled sources
#    - space_specific_eligibility_checks.parquet
#    - top_ten_cohorts_checked_round{1,2,3}.csv
#    - top_twenty_patients_checked_round{1,2,3}.csv

# 2) Deduplicate by ['patient_boilerplate_text','trial_boilerplate_text'] and keep:
#    - split, patient_boilerplate_text, trial_boilerplate_text, exclusion_result

# 3) Compose input text and label:
text  = "Patient history: " + patient_boilerplate_text + "\nTrial exclusions:" + trial_boilerplate_text
label = int(exclusion_result)  # 0 or 1

# 4) Tokenize with ModernBERT tokenizer (max_length=3192, truncation=True)
# 5) Train AutoModelForSequenceClassification, which then produces probabilities for the "POSITIVE" class (patient may be excluded) and for the "NEGATIVE" class (patient not predicted to be excluded)
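The deduplication in step 2 can be sketched in pure Python (the original script operates on merged dataframes; this stands in for a pandas-style drop-duplicates on the two text columns):

```python
def dedup_pairs(rows):
    """Keep the first row seen for each unique
    (patient_boilerplate_text, trial_boilerplate_text) pair."""
    seen = set()
    out = []
    for row in rows:
        key = (row["patient_boilerplate_text"], row["trial_boilerplate_text"])
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [
    {"patient_boilerplate_text": "p1", "trial_boilerplate_text": "t1", "exclusion_result": 1},
    {"patient_boilerplate_text": "p1", "trial_boilerplate_text": "t1", "exclusion_result": 1},  # duplicate
    {"patient_boilerplate_text": "p2", "trial_boilerplate_text": "t1", "exclusion_result": 0},
]
unique = dedup_pairs(rows)
```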

To reproduce exactly, consult and run the original training scripts at https://github.com/kenlkehl/matchminer-ai-training.


Limitations & ethical considerations

  • Outputs reflect training data and may contain biases or errors.
  • The model estimates probability of exclusion based on common boilerplate criteria, not formal eligibility screening.
  • Not validated for safety-critical use; do not use for diagnosis or treatment decisions.

Citation

If you use this model or parts of the pipeline, please cite this model card and the arXiv preprint (https://arxiv.org/abs/2412.17228) or the corresponding journal publication (pending).

