---
license: apache-2.0
language:
- en
tags:
- cross-encoder
- reranker
- radiology
- medical
- retrieval
- sentence-similarity
- healthcare
- clinical
base_model: cross-encoder/ms-marco-MiniLM-L-12-v2
pipeline_tag: text-classification
library_name: sentence-transformers
datasets:
- radiology-education-corpus
metrics:
- mrr
- ndcg
model-index:
- name: RadLITE-Reranker
  results:
  - task:
      type: reranking
      name: Document Reranking
    dataset:
      name: RadLIT-9 (Radiology Retrieval Benchmark)
      type: radiology-retrieval
    metrics:
    - type: mrr
      value: 0.829
      name: MRR (with bi-encoder)
    - type: mrr_improvement
      value: 0.303
      name: MRR Improvement on ACR Core Exam (+30.3%)
---

# RadLITE-Reranker

**Radiology Late Interaction Transformer Enhanced - Cross-Encoder Reranker**

A domain-specialized cross-encoder for reranking radiology search results. The model takes a query-document pair and predicts a single relevance score, producing more accurate rankings than bi-encoder similarity alone.

> **Recommended:** Use this reranker together with [RadLITE-Encoder](https://huggingface.co/matulichpt/RadLITE-Encoder) in a two-stage pipeline for optimal performance. The bi-encoder handles fast retrieval over large corpora, then this cross-encoder reranks the top candidates for precision. This combination achieves **MRR 0.829** on radiology benchmarks (+30% on board exam questions).

## Model Description

| Property | Value |
|----------|-------|
| **Model Type** | Cross-Encoder (Reranker) |
| **Base Model** | [ms-marco-MiniLM-L-12-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-12-v2) |
| **Domain** | Radiology / Medical Imaging |
| **Hidden Size** | 384 |
| **Max Sequence Length** | 512 tokens |
| **Output** | Single relevance score |
| **License** | Apache 2.0 |

### Why Use a Reranker?

Bi-encoders (like RadLITE-Encoder) are fast but encode query and document independently. Cross-encoders process the pair jointly, capturing fine-grained interactions between query and document tokens:

| Approach | Speed | Accuracy | Use Case |
|----------|-------|----------|----------|
| Bi-Encoder | Fast (1000s of docs/sec) | Good | First-stage retrieval |
| Cross-Encoder | Slow (10s of docs/sec) | Excellent | Reranking top candidates |

**Two-stage pipeline**: Use the bi-encoder to retrieve the top 50-100 candidates, then rerank them with the cross-encoder for the best results.

## Performance

### Impact on RadLIT-9 Benchmark

| Configuration | MRR | Improvement |
|---------------|-----|-------------|
| Bi-Encoder only | 0.78 | baseline |
| **Bi-Encoder + Reranker** | **0.829** | **+6.3%** |

### ACR Core Exam (Board-Style Questions)

| Dataset | With Reranker | Without | Improvement |
|---------|---------------|---------|-------------|
| Core Exam Chest | 0.533 | 0.409 | **+30.3%** |
| Core Exam Combined | 0.466 | 0.381 | **+22.5%** |

The reranker is especially valuable for the complex, multi-part queries typical of board exam questions.

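For reference, Mean Reciprocal Rank (the metric reported above) averages the reciprocal rank of the first relevant document over all queries. A minimal reference implementation, independent of this repository:

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """MRR from 1-based ranks of the first relevant hit per query.

    A rank of None means no relevant document was retrieved,
    contributing 0 to the average.
    """
    reciprocal = [0.0 if r is None else 1.0 / r for r in first_relevant_ranks]
    return sum(reciprocal) / len(reciprocal)

# Example: first relevant hit at ranks 1, 2, and 4 across three queries
print(mean_reciprocal_rank([1, 2, 4]))  # 0.5833...
```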
## Quick Start

### Installation

```bash
pip install "sentence-transformers>=2.2.0"
```

(Quoting the requirement prevents the shell from treating `>` as output redirection.)

### Basic Usage

```python
from sentence_transformers import CrossEncoder

# Load the reranker
reranker = CrossEncoder("matulichpt/RadLITE-Reranker", max_length=512)

# Query and candidate documents
query = "What are the imaging features of hepatocellular carcinoma?"
documents = [
    "HCC typically shows arterial enhancement with portal venous washout on CT.",
    "Fatty liver disease presents as decreased attenuation on non-contrast CT.",
    "Hepatic hemangiomas show peripheral nodular enhancement.",
]

# Create query-document pairs
pairs = [[query, doc] for doc in documents]

# Get relevance scores
scores = reranker.predict(pairs)

# Apply temperature calibration (recommended; see below)
calibrated_scores = scores / 1.5

print("Scores:", calibrated_scores)
# The document about HCC should receive the highest score
```

### Temperature Calibration

**Important**: This model outputs scores with high variance. Apply temperature scaling so the scores fuse better with other signals:

```python
# Raw scores might be:  [4.2, -1.5, 0.8]
# After calibration:    [2.8, -1.0, 0.53]

TEMPERATURE = 1.5  # Recommended value

def calibrated_predict(reranker, pairs):
    raw_scores = reranker.predict(pairs)
    return raw_scores / TEMPERATURE
```

### Full Two-Stage Search Pipeline

```python
from sentence_transformers import SentenceTransformer, CrossEncoder
import numpy as np

class RadLITESearch:
    def __init__(self, device="cuda"):
        # Stage 1: Fast bi-encoder
        self.encoder = SentenceTransformer(
            "matulichpt/RadLITE-Encoder",
            device=device
        )
        # Stage 2: Precise reranker
        self.reranker = CrossEncoder(
            "matulichpt/RadLITE-Reranker",
            max_length=512,
            device=device
        )
        self.temperature = 1.5
        self.corpus_embeddings = None
        self.corpus = None

    def index_corpus(self, documents: list):
        """Pre-compute embeddings for your corpus."""
        self.corpus = documents
        self.corpus_embeddings = self.encoder.encode(
            documents,
            normalize_embeddings=True,
            show_progress_bar=True,
            batch_size=32
        )

    def search(self, query: str, top_k: int = 10, candidates: int = 50):
        """Two-stage search: retrieve, then rerank."""
        # Stage 1: Bi-encoder retrieval
        query_emb = self.encoder.encode(query, normalize_embeddings=True)
        scores = query_emb @ self.corpus_embeddings.T
        top_indices = np.argsort(scores)[-candidates:][::-1]

        # Stage 2: Cross-encoder reranking
        candidate_docs = [self.corpus[i] for i in top_indices]
        pairs = [[query, doc] for doc in candidate_docs]
        rerank_scores = self.reranker.predict(pairs) / self.temperature

        # Sort by reranked scores
        sorted_indices = np.argsort(rerank_scores)[::-1]

        results = []
        for idx in sorted_indices[:top_k]:
            results.append({
                "document": candidate_docs[idx],
                "corpus_index": int(top_indices[idx]),
                "score": float(rerank_scores[idx]),
                "biencoder_score": float(scores[top_indices[idx]])
            })
        return results

# Usage
searcher = RadLITESearch()
searcher.index_corpus(your_radiology_documents)
results = searcher.search("pneumothorax CT findings")
```

## Integration with Any Corpus

### Radiopaedia / Educational Content

```python
import json

# Load your content (e.g., Radiopaedia articles)
with open("radiopaedia_articles.json") as f:
    articles = json.load(f)

corpus = [article["content"] for article in articles]

# Initialize search
searcher = RadLITESearch()
searcher.index_corpus(corpus)

# Search
results = searcher.search("classic findings of pulmonary embolism on CTPA")

for r in results[:5]:
    print(f"Score: {r['score']:.3f}")
    print(f"Content: {r['document'][:200]}...")
    print()
```

### Integration with Elasticsearch/OpenSearch

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("matulichpt/RadLITE-Reranker", max_length=512)

def rerank_elasticsearch_results(query: str, es_results: list, top_k: int = 10):
    """Rerank Elasticsearch BM25 results."""
    documents = [hit["_source"]["content"] for hit in es_results]
    pairs = [[query, doc] for doc in documents]

    scores = reranker.predict(pairs) / 1.5  # Temperature calibration

    # Combine with ES scores (optional)
    for i, hit in enumerate(es_results):
        hit["rerank_score"] = float(scores[i])
        hit["combined_score"] = 0.3 * hit["_score"] + 0.7 * scores[i]

    # Sort by combined score
    reranked = sorted(es_results, key=lambda x: x["combined_score"], reverse=True)
    return reranked[:top_k]
```
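Note that BM25 `_score` values and cross-encoder outputs live on different scales, so the weighted sum above can be dominated by whichever signal happens to be numerically larger. One common remedy (a sketch, not part of the measured setup on this card) is to min-max normalize each signal within the result set before fusing:

```python
def min_max_normalize(values):
    """Scale a list of scores to [0, 1]; constant lists map to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def fuse(bm25_scores, rerank_scores, w_bm25=0.3, w_rerank=0.7):
    """Weighted fusion of two score lists after per-list normalization."""
    b = min_max_normalize(bm25_scores)
    r = min_max_normalize(rerank_scores)
    return [w_bm25 * bi + w_rerank * ri for bi, ri in zip(b, r)]
```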

## Optimal Fusion Weights

When combining multiple signals (bi-encoder, cross-encoder, BM25), use these weights:

```python
# Optimal weights from grid search on RadLIT-9
FUSION_WEIGHTS = {
    "biencoder": 0.5,     # RadLITE-Encoder similarity
    "crossencoder": 0.2,  # RadLITE-Reranker (after temperature calibration)
    "bm25": 0.3,          # Lexical matching (if available)
}

def fused_score(bienc_score, ce_score, bm25_score=0):
    return (
        FUSION_WEIGHTS["biencoder"] * bienc_score +
        FUSION_WEIGHTS["crossencoder"] * ce_score +
        FUSION_WEIGHTS["bm25"] * bm25_score
    )
```

## Architecture

```
[Query] + [SEP] + [Document]
            |
            v
     [BERT Tokenizer]
            |
            v
    [MiniLM Encoder] (12 layers, 384 hidden)
            |
            v
  [Classification Head]
            |
            v
   Relevance Score (float)
```
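The classification head emits a single unbounded score (the raw values shown earlier include negatives, i.e. it behaves like a logit). If you prefer scores in [0, 1] for thresholding or display, a temperature-scaled sigmoid is one convenient mapping; this is a post-hoc transform, not something the model itself applies:

```python
import math

def logit_to_probability(logit, temperature=1.5):
    """Map a raw cross-encoder logit to (0, 1) via a temperature-scaled sigmoid."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))

print(logit_to_probability(0.0))   # 0.5
print(logit_to_probability(4.2))   # high: strongly relevant
print(logit_to_probability(-1.5))  # low: likely irrelevant
```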

## Training Details

- **Base Model**: ms-marco-MiniLM-L-12-v2 (trained on MS MARCO passage ranking)
- **Fine-tuning**: Radiology query-document relevance pairs
- **Training Steps**: 5,626
- **Best Validation Loss**: 0.691
- **Learning Rate**: 2e-5
- **Batch Size**: 32
- **Category Weighting**: Yes (balanced across radiology subspecialties)

## Best Practices

### 1. Always Use Temperature Calibration

Raw cross-encoder scores can be extreme. Temperature scaling (T = 1.5) produces better fusion:

```python
calibrated = raw_score / 1.5
```

### 2. Limit Candidates for Reranking

Cross-encoders are slow. Only rerank the top 50-100 candidates from the bi-encoder:

```python
# Good: Rerank top 50
rerank_candidates = 50

# Bad: Rerank entire corpus
rerank_candidates = len(corpus)  # Too slow!
```

### 3. Batch Predictions

```python
# Efficient: Single batched call
pairs = [[query, doc] for doc in candidates]
scores = reranker.predict(pairs, batch_size=32)

# Inefficient: One call per pair
scores = [reranker.predict([[query, doc]])[0] for doc in candidates]
```

### 4. GPU Acceleration

```python
reranker = CrossEncoder(
    "matulichpt/RadLITE-Reranker",
    max_length=512,
    device="cuda"  # Use GPU
)
```

## Limitations

- **English only**: Trained on English radiology text
- **Speed**: ~10-50 pairs/second (use for reranking, not full-corpus scoring)
- **512 token limit**: Longer documents are truncated
- **Domain-specific**: Optimized for radiology; may underperform on general medical content
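Because inputs past 512 tokens are truncated, relevant content in a long document's tail can be missed. One common workaround (a sketch; the `chunk_words` values are illustrative, and `score_fn` stands in for a batched scorer such as `reranker.predict`) is to split the document into overlapping chunks, score each chunk against the query, and keep the maximum:

```python
def chunk_text(text, chunk_words=300, overlap_words=50):
    """Split text into overlapping windows of roughly chunk_words words."""
    words = text.split()
    if len(words) <= chunk_words:
        return [text]
    step = chunk_words - overlap_words
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + chunk_words]))
        if i + chunk_words >= len(words):
            break
    return chunks

def score_long_document(query, document, score_fn):
    """Max-pool chunk scores; score_fn takes a list of [query, chunk] pairs."""
    chunks = chunk_text(document)
    scores = score_fn([[query, c] for c in chunks])
    return max(scores)
```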

## Citation

If you use RadLITE in your work, please cite:

```bibtex
@software{radlite_2026,
  title  = {RadLITE: Calibrated Multi-Stage Retrieval for Radiology Education},
  author = {Grai Team},
  year   = {2026},
  month  = {January},
  url    = {https://huggingface.co/matulichpt/RadLITE-Reranker},
  note   = {+30% MRR improvement on ACR Core Exam questions}
}
```

## Related Models

- [RadLITE-Encoder](https://huggingface.co/matulichpt/RadLITE-Encoder) - Bi-encoder for first-stage retrieval
- [RadBERT-RoBERTa-4m](https://huggingface.co/zzxslp/RadBERT-RoBERTa-4m) - Base radiology language model

## License

Apache 2.0 - Free for commercial and research use.