File size: 12,169 Bytes

---

license: apache-2.0
language:
- en
tags:
- cross-encoder
- reranker
- radiology
- medical
- retrieval
- sentence-similarity
- healthcare
- clinical
base_model: cross-encoder/ms-marco-MiniLM-L-12-v2
pipeline_tag: text-classification
library_name: sentence-transformers
datasets:
- radiology-education-corpus
metrics:
- mrr
- ndcg
model-index:
- name: RadLITE-Reranker
  results:
  - task:
      type: reranking
      name: Document Reranking
    dataset:
      name: RadLIT-9 (Radiology Retrieval Benchmark)
      type: radiology-retrieval
    metrics:
    - type: mrr
      value: 0.829
      name: MRR (with bi-encoder)
    - type: mrr
      value: 0.533
      name: MRR on ABR Core Exam (Chest)
---


# RadLITE-Reranker

**Radiology Late Interaction Transformer Enhanced - Cross-Encoder Reranker**

A domain-specialized cross-encoder for reranking radiology search results. This model takes a query-document pair and predicts a relevance score, providing more accurate ranking than bi-encoder similarity alone.

> **Recommended:** Use this reranker together with [RadLITE-Encoder](https://huggingface.co/matulichpt/radlit-biencoder) in a two-stage pipeline for optimal performance. The bi-encoder handles fast retrieval over large corpora, then this cross-encoder reranks the top candidates for precision. This combination achieves **MRR 0.829** on radiology retrieval benchmarks.

## Model Description

| Property | Value |
|----------|-------|
| **Model Type** | Cross-Encoder (Reranker) |
| **Base Model** | [ms-marco-MiniLM-L-12-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-12-v2) |
| **Domain** | Radiology / Medical Imaging |
| **Hidden Size** | 384 |
| **Max Sequence Length** | 512 tokens |
| **Output** | Single relevance score |
| **License** | Apache 2.0 |

### Why Use a Reranker?

Bi-encoders (like RadLITE-Encoder) are fast but encode query and document independently. Cross-encoders process them together, capturing fine-grained interactions:

| Approach | Speed | Accuracy | Use Case |
|----------|-------|----------|----------|
| Bi-Encoder | Fast (1000s docs/sec) | Good | First-stage retrieval |
| Cross-Encoder | Slow (10s docs/sec) | Excellent | Reranking top candidates |

**Two-stage pipeline**: Use bi-encoder to get top 50-100 candidates, then rerank with cross-encoder for best results.

## Performance

### Impact on RadLIT-9 Benchmark

| Configuration | MRR | Improvement |
|---------------|-----|-------------|
| Bi-Encoder only | 0.78 | baseline |
| **Bi-Encoder + Reranker** | **0.829** | **+6.3%** |

### ABR Core Exam (Board-Style Questions)

Comparing two-stage pipeline (bi-encoder + reranker) vs bi-encoder alone:

| Dataset | Two-Stage MRR | Bi-Encoder Only | Improvement |
|---------|---------------|-----------------|-------------|
| Core Exam Chest | 0.533 | 0.409 | +30.3% |
| Core Exam Combined | 0.466 | 0.381 | +22.5% |

The reranker provides significant gains on complex, multi-part queries typical of board exam questions.

### Published Benchmark Results

From [Matulich & Mason, 2026](https://huggingface.co/matulichpt/radlit-biencoder):

| Benchmark | RadLIT Result | Key Finding |
|-----------|---------------|-------------|
| NFCorpus nDCG@10 | 0.268 | **17.9x improvement** over RadBERT bi-encoder (0.015) |
| VQA-RAD MRR | 0.972 | Near-perfect retrieval on radiology Q&A |
| RadLIT-9 Thoracic | 0.736 nDCG@10 | **Best-in-class** (beat BGE-large, ColBERTv2) |
| RadLIT-9 Pediatric | 0.625 nDCG@10 | **Best-in-class** (beat BGE-large, ColBERTv2) |
| Zebra Test | 92% found rate | 2.1x improvement on rare conditions vs ColBERTv2 |

**Vocabulary Alignment Hypothesis**: Domain training provides measurable advantage when queries use radiology-specific terminology that aligns with the training domain.

## Quick Start

### Installation

```bash

pip install sentence-transformers>=2.2.0

```

### Basic Usage

```python

from sentence_transformers import CrossEncoder



# Load the reranker

reranker = CrossEncoder("matulichpt/radlit-crossencoder", max_length=512)



# Query and candidate documents

query = "What are the imaging features of hepatocellular carcinoma?"

documents = [

    "HCC typically shows arterial enhancement with portal venous washout on CT.",

    "Fatty liver disease presents as decreased attenuation on non-contrast CT.",

    "Hepatic hemangiomas show peripheral nodular enhancement.",

]



# Create query-document pairs

pairs = [[query, doc] for doc in documents]



# Get relevance scores

scores = reranker.predict(pairs)



# Apply temperature calibration (RECOMMENDED)

calibrated_scores = scores / 1.5



print("Scores:", calibrated_scores)

# Document about HCC will have highest score

```

### Temperature Calibration

**Important**: This model outputs scores with high variance. Apply temperature scaling for better fusion with other signals:

```python

# Raw scores might be: [4.2, -1.5, 0.8]

# After calibration:   [2.8, -1.0, 0.53]



TEMPERATURE = 1.5  # Recommended value



def calibrated_predict(reranker, pairs):

    raw_scores = reranker.predict(pairs)

    return raw_scores / TEMPERATURE

```

### Full Two-Stage Search Pipeline

```python

from sentence_transformers import SentenceTransformer, CrossEncoder

import numpy as np



class RadLITESearch:

    def __init__(self, device="cuda"):

        # Stage 1: Fast bi-encoder

        self.encoder = SentenceTransformer(

            "matulichpt/radlit-biencoder",

            device=device

        )

        # Stage 2: Precise reranker

        self.reranker = CrossEncoder(

            "matulichpt/radlit-crossencoder",

            max_length=512,

            device=device

        )

        self.temperature = 1.5

        self.corpus_embeddings = None

        self.corpus = None



    def index_corpus(self, documents: list):

        """Pre-compute embeddings for your corpus."""

        self.corpus = documents

        self.corpus_embeddings = self.encoder.encode(

            documents,

            normalize_embeddings=True,

            show_progress_bar=True,

            batch_size=32

        )



    def search(self, query: str, top_k: int = 10, candidates: int = 50):

        """Two-stage search: retrieve then rerank."""



        # Stage 1: Bi-encoder retrieval

        query_emb = self.encoder.encode(query, normalize_embeddings=True)

        scores = query_emb @ self.corpus_embeddings.T

        top_indices = np.argsort(scores)[-candidates:][::-1]



        # Stage 2: Cross-encoder reranking

        candidate_docs = [self.corpus[i] for i in top_indices]

        pairs = [[query, doc] for doc in candidate_docs]

        rerank_scores = self.reranker.predict(pairs) / self.temperature



        # Sort by reranked scores

        sorted_indices = np.argsort(rerank_scores)[::-1]



        results = []

        for idx in sorted_indices[:top_k]:

            results.append({

                "document": candidate_docs[idx],

                "corpus_index": int(top_indices[idx]),

                "score": float(rerank_scores[idx]),

                "biencoder_score": float(scores[top_indices[idx]])

            })

        return results





# Usage

searcher = RadLITESearch()

searcher.index_corpus(your_radiology_documents)

results = searcher.search("pneumothorax CT findings")

```

## Integration with Any Corpus

### Radiopaedia / Educational Content

```python

import json



# Load your content (e.g., Radiopaedia articles)

with open("radiopaedia_articles.json") as f:

    articles = json.load(f)



corpus = [article["content"] for article in articles]



# Initialize search

searcher = RadLITESearch()

searcher.index_corpus(corpus)



# Search

results = searcher.search("classic findings of pulmonary embolism on CTPA")



for r in results[:5]:

    print(f"Score: {r['score']:.3f}")

    print(f"Content: {r['document'][:200]}...")

    print()

```

### Integration with Elasticsearch/OpenSearch

```python

from sentence_transformers import CrossEncoder



reranker = CrossEncoder("matulichpt/radlit-crossencoder", max_length=512)



def rerank_elasticsearch_results(query: str, es_results: list, top_k: int = 10):

    """Rerank Elasticsearch BM25 results."""

    documents = [hit["_source"]["content"] for hit in es_results]

    pairs = [[query, doc] for doc in documents]



    scores = reranker.predict(pairs) / 1.5  # Temperature calibration



    # Combine with ES scores (optional)

    for i, hit in enumerate(es_results):

        hit["rerank_score"] = float(scores[i])

        hit["combined_score"] = 0.3 * hit["_score"] + 0.7 * scores[i]



    # Sort by combined score

    reranked = sorted(es_results, key=lambda x: x["combined_score"], reverse=True)

    return reranked[:top_k]

```

## Optimal Fusion Weights

When combining multiple signals (bi-encoder, cross-encoder, BM25), use these weights:

```python

# Optimal weights from grid search on RadLIT-9

FUSION_WEIGHTS = {

    "biencoder": 0.5,    # RadLITE-Encoder similarity

    "crossencoder": 0.2, # RadLITE-Reranker (after temp calibration)

    "bm25": 0.3          # Lexical matching (if available)

}



def fused_score(bienc_score, ce_score, bm25_score=0):

    return (

        FUSION_WEIGHTS["biencoder"] * bienc_score +

        FUSION_WEIGHTS["crossencoder"] * ce_score +

        FUSION_WEIGHTS["bm25"] * bm25_score

    )

```

## Architecture

```

[Query] + [SEP] + [Document]

           |

           v

    [BERT Tokenizer]

           |

           v

    [MiniLM Encoder] (12 layers, 384 hidden)

           |

           v

    [Classification Head]

           |

           v

    Relevance Score (float)

```

## Training Details

- **Base Model**: ms-marco-MiniLM-L-12-v2 (trained on MS MARCO passage ranking)
- **Fine-tuning**: Radiology query-document relevance pairs
- **Training Steps**: 5,626
- **Best Validation Loss**: 0.691
- **Learning Rate**: 2e-5
- **Batch Size**: 32
- **Category Weighting**: Yes (balanced across radiology subspecialties)

## Best Practices

### 1. Always Use Temperature Calibration

Raw cross-encoder scores can be extreme. Temperature scaling (1.5) produces better fusion:

```python

calibrated = raw_score / 1.5

```

### 2. Limit Candidates for Reranking

Cross-encoders are slow. Only rerank top 50-100 candidates from bi-encoder:

```python

# Good: Rerank top 50

rerank_candidates = 50



# Bad: Rerank entire corpus

rerank_candidates = len(corpus)  # Too slow!

```

### 3. Batch Predictions

```python

# Efficient: Single batch call

pairs = [[query, doc] for doc in candidates]

scores = reranker.predict(pairs, batch_size=32)



# Inefficient: Individual calls

scores = [reranker.predict([[query, doc]])[0] for doc in candidates]

```

### 4. GPU Acceleration

```python

reranker = CrossEncoder(

    "matulichpt/radlit-crossencoder",

    max_length=512,

    device="cuda"  # Use GPU

)

```

## Limitations

- **English only**: Trained on English radiology text
- **Speed**: ~10-50 pairs/second (use for reranking, not full corpus)
- **512 token limit**: Long documents are truncated
- **Domain-specific**: Optimized for radiology, may underperform on general medical content

## Citation

If you use RadLITE in your work, please cite:

```bibtex

@article{matulich2026radlit,

    title = {Late Interaction Retrieval Unlocks Domain Knowledge in Radiology Language Models},

    author = {Matulich, Patrick and Mason, Dan},

    year = {2026},

    journal = {Radiology: Artificial Intelligence},

    note = {17.9x improvement over RadBERT; best-in-class on Thoracic/Pediatric subspecialties},

    url = {https://huggingface.co/matulichpt/radlit-biencoder}

}

```

## Related Models

- [RadLITE-Encoder](https://huggingface.co/matulichpt/radlit-biencoder) - Bi-encoder for first-stage retrieval
- [RadBERT-RoBERTa-4m](https://huggingface.co/zzxslp/RadBERT-RoBERTa-4m) - Base radiology language model

## License

Apache 2.0 - Free for commercial and research use.