# RadLITE-Reranker

**Radiology Late Interaction Transformer Enhanced**: a cross-encoder reranker for radiology search.

A domain-specialized cross-encoder for reranking radiology search results. The model takes a query-document pair and predicts a relevance score, providing more accurate ranking than bi-encoder similarity alone.

**Recommended:** Use this reranker together with RadLITE-Encoder in a two-stage pipeline for optimal performance. The bi-encoder handles fast retrieval over large corpora; this cross-encoder then reranks the top candidates for precision. The combination achieves MRR 0.829 on radiology retrieval benchmarks.
## Model Description
| Property | Value |
|---|---|
| Model Type | Cross-Encoder (Reranker) |
| Base Model | ms-marco-MiniLM-L-12-v2 |
| Domain | Radiology / Medical Imaging |
| Hidden Size | 384 |
| Max Sequence Length | 512 tokens |
| Output | Single relevance score |
| License | Apache 2.0 |
## Why Use a Reranker?
Bi-encoders (like RadLITE-Encoder) are fast but encode query and document independently. Cross-encoders process them together, capturing fine-grained interactions:
| Approach | Speed | Accuracy | Use Case |
|---|---|---|---|
| Bi-Encoder | Fast (1000s docs/sec) | Good | First-stage retrieval |
| Cross-Encoder | Slow (10s docs/sec) | Excellent | Reranking top candidates |
**Two-stage pipeline:** Use the bi-encoder to retrieve the top 50-100 candidates, then rerank them with the cross-encoder for best results.
## Performance

### Impact on RadLIT-9 Benchmark
| Configuration | MRR | Improvement |
|---|---|---|
| Bi-Encoder only | 0.78 | baseline |
| Bi-Encoder + Reranker | 0.829 | +6.3% |
### ABR Core Exam (Board-Style Questions)

Comparing the two-stage pipeline (bi-encoder + reranker) against the bi-encoder alone:
| Dataset | Two-Stage MRR | Bi-Encoder Only | Improvement |
|---|---|---|---|
| Core Exam Chest | 0.533 | 0.409 | +30.3% |
| Core Exam Combined | 0.466 | 0.381 | +22.3% |
The reranker provides significant gains on complex, multi-part queries typical of board exam questions.
### Published Benchmark Results

From Matulich & Mason (2026):
| Benchmark | RadLIT Result | Key Finding |
|---|---|---|
| NFCorpus nDCG@10 | 0.268 | 17.9x improvement over RadBERT bi-encoder (0.015) |
| VQA-RAD MRR | 0.972 | Near-perfect retrieval on radiology Q&A |
| RadLIT-9 Thoracic | 0.736 nDCG@10 | Best-in-class (beat BGE-large, ColBERTv2) |
| RadLIT-9 Pediatric | 0.625 nDCG@10 | Best-in-class (beat BGE-large, ColBERTv2) |
| Zebra Test | 92% found rate | 2.1x improvement on rare conditions vs ColBERTv2 |
**Vocabulary Alignment Hypothesis:** Domain training provides a measurable advantage when queries use radiology-specific terminology that aligns with the training domain.
## Quick Start

### Installation

```bash
pip install "sentence-transformers>=2.2.0"
```

(The quotes keep the shell from treating `>` in the version constraint as a redirect.)
### Basic Usage

```python
from sentence_transformers import CrossEncoder

# Load the reranker
reranker = CrossEncoder("matulichpt/radlit-crossencoder", max_length=512)

# Query and candidate documents
query = "What are the imaging features of hepatocellular carcinoma?"
documents = [
    "HCC typically shows arterial enhancement with portal venous washout on CT.",
    "Fatty liver disease presents as decreased attenuation on non-contrast CT.",
    "Hepatic hemangiomas show peripheral nodular enhancement.",
]

# Create query-document pairs
pairs = [[query, doc] for doc in documents]

# Get relevance scores
scores = reranker.predict(pairs)

# Apply temperature calibration (recommended)
calibrated_scores = scores / 1.5

print("Scores:", calibrated_scores)
# The document about HCC will have the highest score
```
## Temperature Calibration

**Important:** This model outputs scores with high variance. Apply temperature scaling for better fusion with other signals:

```python
TEMPERATURE = 1.5  # Recommended value

def calibrated_predict(reranker, pairs):
    # Raw scores might be: [4.2, -1.5, 0.8]
    # After calibration:   [2.8, -1.0, 0.53]
    raw_scores = reranker.predict(pairs)
    return raw_scores / TEMPERATURE
```
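If you need bounded scores for thresholding or display, the calibrated logits can additionally be squashed through a sigmoid. This is a common convention for MS MARCO-style cross-encoders, but treating the result as a calibrated probability for this particular model is an assumption; a minimal sketch:

```python
import numpy as np

def to_pseudo_probability(raw_scores, temperature=1.5):
    """Map raw cross-encoder logits to (0, 1) via a temperature-scaled sigmoid."""
    scaled = np.asarray(raw_scores, dtype=float) / temperature
    return 1.0 / (1.0 + np.exp(-scaled))

probs = to_pseudo_probability([4.2, -1.5, 0.8])
# Order is preserved; only the scale changes.
```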
## Full Two-Stage Search Pipeline

```python
from sentence_transformers import SentenceTransformer, CrossEncoder
import numpy as np

class RadLITESearch:
    def __init__(self, device="cuda"):
        # Stage 1: fast bi-encoder
        self.encoder = SentenceTransformer(
            "matulichpt/radlit-biencoder",
            device=device,
        )
        # Stage 2: precise reranker
        self.reranker = CrossEncoder(
            "matulichpt/radlit-crossencoder",
            max_length=512,
            device=device,
        )
        self.temperature = 1.5
        self.corpus_embeddings = None
        self.corpus = None

    def index_corpus(self, documents: list):
        """Pre-compute embeddings for your corpus."""
        self.corpus = documents
        self.corpus_embeddings = self.encoder.encode(
            documents,
            normalize_embeddings=True,
            show_progress_bar=True,
            batch_size=32,
        )

    def search(self, query: str, top_k: int = 10, candidates: int = 50):
        """Two-stage search: retrieve, then rerank."""
        # Stage 1: bi-encoder retrieval
        query_emb = self.encoder.encode(query, normalize_embeddings=True)
        scores = query_emb @ self.corpus_embeddings.T
        top_indices = np.argsort(scores)[-candidates:][::-1]

        # Stage 2: cross-encoder reranking
        candidate_docs = [self.corpus[i] for i in top_indices]
        pairs = [[query, doc] for doc in candidate_docs]
        rerank_scores = self.reranker.predict(pairs) / self.temperature

        # Sort by reranked scores
        sorted_indices = np.argsort(rerank_scores)[::-1]
        results = []
        for idx in sorted_indices[:top_k]:
            results.append({
                "document": candidate_docs[idx],
                "corpus_index": int(top_indices[idx]),
                "score": float(rerank_scores[idx]),
                "biencoder_score": float(scores[top_indices[idx]]),
            })
        return results

# Usage
searcher = RadLITESearch()
searcher.index_corpus(your_radiology_documents)
results = searcher.search("pneumothorax CT findings")
```
## Integration with Any Corpus

### Radiopaedia / Educational Content

```python
import json

# Load your content (e.g., Radiopaedia articles)
with open("radiopaedia_articles.json") as f:
    articles = json.load(f)

corpus = [article["content"] for article in articles]

# Initialize search
searcher = RadLITESearch()
searcher.index_corpus(corpus)

# Search
results = searcher.search("classic findings of pulmonary embolism on CTPA")
for r in results[:5]:
    print(f"Score: {r['score']:.3f}")
    print(f"Content: {r['document'][:200]}...")
    print()
```
### Integration with Elasticsearch/OpenSearch

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("matulichpt/radlit-crossencoder", max_length=512)

def rerank_elasticsearch_results(query: str, es_results: list, top_k: int = 10):
    """Rerank Elasticsearch BM25 results."""
    documents = [hit["_source"]["content"] for hit in es_results]
    pairs = [[query, doc] for doc in documents]
    scores = reranker.predict(pairs) / 1.5  # Temperature calibration

    # Combine with ES scores (optional)
    for i, hit in enumerate(es_results):
        hit["rerank_score"] = float(scores[i])
        hit["combined_score"] = 0.3 * hit["_score"] + 0.7 * scores[i]

    # Sort by combined score
    reranked = sorted(es_results, key=lambda x: x["combined_score"], reverse=True)
    return reranked[:top_k]
```
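One caveat with a fixed 0.3/0.7 blend: Elasticsearch `_score` values are unbounded and corpus-dependent, so on some indexes BM25 can dominate the calibrated logits (or vice versa). A common remedy is to min-max normalize each signal over the candidate set before weighting; a minimal sketch (weights illustrative, not tuned):

```python
def min_max_normalize(values):
    """Rescale scores to [0, 1] over the current candidate set."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all candidates scored identically
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def combine_signals(bm25_scores, rerank_scores, w_bm25=0.3, w_rerank=0.7):
    """Weighted fusion of two per-candidate score lists after normalization."""
    b = min_max_normalize(bm25_scores)
    r = min_max_normalize(rerank_scores)
    return [w_bm25 * bs + w_rerank * rs for bs, rs in zip(b, r)]
```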
## Optimal Fusion Weights

When combining multiple signals (bi-encoder, cross-encoder, BM25), use these weights:

```python
# Optimal weights from grid search on RadLIT-9
FUSION_WEIGHTS = {
    "biencoder": 0.5,     # RadLITE-Encoder similarity
    "crossencoder": 0.2,  # RadLITE-Reranker (after temperature calibration)
    "bm25": 0.3,          # Lexical matching (if available)
}

def fused_score(bienc_score, ce_score, bm25_score=0):
    return (
        FUSION_WEIGHTS["biencoder"] * bienc_score
        + FUSION_WEIGHTS["crossencoder"] * ce_score
        + FUSION_WEIGHTS["bm25"] * bm25_score
    )
```
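A quick worked example of the fusion arithmetic, with `fused_score` restated so the snippet runs standalone; the per-document signals are invented for illustration and assumed to be pre-normalized to a comparable range:

```python
FUSION_WEIGHTS = {"biencoder": 0.5, "crossencoder": 0.2, "bm25": 0.3}

def fused_score(bienc_score, ce_score, bm25_score=0.0):
    return (FUSION_WEIGHTS["biencoder"] * bienc_score
            + FUSION_WEIGHTS["crossencoder"] * ce_score
            + FUSION_WEIGHTS["bm25"] * bm25_score)

# Hypothetical signals per document: (bi-encoder, cross-encoder, BM25)
candidates = {
    "doc_a": (0.82, 0.94, 0.71),
    "doc_b": (0.79, 0.27, 1.00),
    "doc_c": (0.61, 0.63, 0.35),
}

ranked = sorted(candidates, key=lambda d: fused_score(*candidates[d]), reverse=True)
```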
## Architecture

```
[Query] + [SEP] + [Document]
            |
            v
     [BERT Tokenizer]
            |
            v
  [MiniLM Encoder] (12 layers, 384 hidden)
            |
            v
  [Classification Head]
            |
            v
   Relevance Score (float)
```
## Training Details
- Base Model: ms-marco-MiniLM-L-12-v2 (trained on MS MARCO passage ranking)
- Fine-tuning: Radiology query-document relevance pairs
- Training Steps: 5,626
- Best Validation Loss: 0.691
- Learning Rate: 2e-5
- Batch Size: 32
- Category Weighting: Yes (balanced across radiology subspecialties)
## Best Practices

### 1. Always Use Temperature Calibration

Raw cross-encoder scores can be extreme. Temperature scaling (1.5) produces better fusion:

```python
calibrated = raw_score / 1.5
```
### 2. Limit Candidates for Reranking

Cross-encoders are slow. Only rerank the top 50-100 candidates from the bi-encoder:

```python
# Good: rerank the top 50
rerank_candidates = 50

# Bad: rerank the entire corpus
rerank_candidates = len(corpus)  # Too slow!
```
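To see why, a back-of-envelope latency estimate, assuming 30 pairs/second as a midpoint of the ~10-50 pairs/second range noted under Limitations:

```python
THROUGHPUT = 30.0  # assumed pairs/second (midpoint of the ~10-50 range)

def rerank_seconds(num_candidates, throughput=THROUGHPUT):
    """Rough wall-clock cost of reranking num_candidates pairs."""
    return num_candidates / throughput

for n in (50, 1_000, 100_000):
    print(f"{n:>7} candidates -> ~{rerank_seconds(n):,.0f} s")
```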
### 3. Batch Predictions

```python
# Efficient: single batched call
pairs = [[query, doc] for doc in candidates]
scores = reranker.predict(pairs, batch_size=32)

# Inefficient: individual calls
scores = [reranker.predict([[query, doc]])[0] for doc in candidates]
```
### 4. GPU Acceleration

```python
reranker = CrossEncoder(
    "matulichpt/radlit-crossencoder",
    max_length=512,
    device="cuda",  # Use GPU
)
```
## Limitations

- **English only:** Trained on English radiology text
- **Speed:** ~10-50 pairs/second (use for reranking, not the full corpus)
- **512-token limit:** Longer documents are truncated
- **Domain-specific:** Optimized for radiology; may underperform on general medical content
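For documents longer than 512 tokens, one common workaround is to score overlapping chunks and keep the best chunk score as the document score. A rough sketch, using word windows as a crude proxy for tokens; `score_fn` stands in for `reranker.predict`, and the helper names are illustrative:

```python
def chunk_words(text, window=350, stride=250):
    """Split text into overlapping word windows (a rough proxy for tokens)."""
    words = text.split()
    if len(words) <= window:
        return [text]
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break
    return chunks

def score_long_document(query, document, score_fn):
    """Max-pool chunk scores; score_fn takes a list of [query, chunk] pairs."""
    pairs = [[query, chunk] for chunk in chunk_words(document)]
    return max(score_fn(pairs))
```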
## Citation

If you use RadLITE in your work, please cite:

```bibtex
@article{matulich2026radlit,
  title   = {Late Interaction Retrieval Unlocks Domain Knowledge in Radiology Language Models},
  author  = {Matulich, Patrick and Mason, Dan},
  year    = {2026},
  journal = {Radiology: Artificial Intelligence},
  note    = {17.9x improvement over RadBERT; best-in-class on Thoracic/Pediatric subspecialties},
  url     = {https://huggingface.co/matulichpt/radlit-biencoder}
}
```
## Related Models
- RadLITE-Encoder - Bi-encoder for first-stage retrieval
- RadBERT-RoBERTa-4m - Base radiology language model
## License

Apache 2.0 - Free for commercial and research use.