---
license: apache-2.0
language:
- en
tags:
- cross-encoder
- reranker
- radiology
- medical
- retrieval
- sentence-similarity
- healthcare
- clinical
base_model: cross-encoder/ms-marco-MiniLM-L-12-v2
pipeline_tag: text-classification
library_name: sentence-transformers
datasets:
- radiology-education-corpus
metrics:
- mrr
- ndcg
model-index:
- name: RadLITE-Reranker
results:
- task:
type: reranking
name: Document Reranking
dataset:
name: RadLIT-9 (Radiology Retrieval Benchmark)
type: radiology-retrieval
metrics:
- type: mrr
value: 0.829
name: MRR (with bi-encoder)
- type: mrr_improvement
value: 0.303
name: MRR Improvement on ACR Core Exam (+30.3%)
---
# RadLITE-Reranker
**Radiology Late Interaction Transformer Enhanced - Cross-Encoder Reranker**
A domain-specialized cross-encoder for reranking radiology search results. This model takes a query-document pair and predicts a relevance score, providing more accurate ranking than bi-encoder similarity alone.
> **Recommended:** Use this reranker together with [RadLITE-Encoder](https://huggingface.co/matulichpt/RadLITE-Encoder) in a two-stage pipeline for optimal performance. The bi-encoder handles fast retrieval over large corpora, then this cross-encoder reranks the top candidates for precision. This combination achieves **MRR 0.829** on radiology benchmarks (+30% on board exam questions).
## Model Description
| Property | Value |
|----------|-------|
| **Model Type** | Cross-Encoder (Reranker) |
| **Base Model** | [ms-marco-MiniLM-L-12-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-12-v2) |
| **Domain** | Radiology / Medical Imaging |
| **Hidden Size** | 384 |
| **Max Sequence Length** | 512 tokens |
| **Output** | Single relevance score |
| **License** | Apache 2.0 |
### Why Use a Reranker?
Bi-encoders (like RadLITE-Encoder) are fast but encode query and document independently. Cross-encoders process them together, capturing fine-grained interactions:
| Approach | Speed | Accuracy | Use Case |
|----------|-------|----------|----------|
| Bi-Encoder | Fast (1,000s of docs/sec) | Good | First-stage retrieval |
| Cross-Encoder | Slow (10s of docs/sec) | Excellent | Reranking top candidates |
**Two-stage pipeline**: Use bi-encoder to get top 50-100 candidates, then rerank with cross-encoder for best results.
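The control flow above can be sketched in a few lines. Here `rerank_fn` is a stand-in for the cross-encoder call over the shortlisted documents (in practice it would wrap `CrossEncoder.predict`), so the sketch runs without loading any model:

```python
import numpy as np

def two_stage_search(query_emb, corpus_embs, rerank_fn, candidates=50, top_k=10):
    # Stage 1: cheap dot-product retrieval over normalized embeddings, all docs
    sims = corpus_embs @ query_emb
    cand = np.argsort(sims)[-candidates:][::-1]
    # Stage 2: expensive pairwise scoring, but only over the shortlist
    scores = np.asarray(rerank_fn(cand))
    return cand[np.argsort(scores)[::-1]][:top_k]
```

The full, model-backed version of this pattern is shown later in the Quick Start.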
## Performance
### Impact on RadLIT-9 Benchmark
| Configuration | MRR | Improvement |
|---------------|-----|-------------|
| Bi-Encoder only | 0.78 | baseline |
| **Bi-Encoder + Reranker** | **0.829** | **+6.3%** |
### ACR Core Exam (Board-Style Questions)
| Dataset | With Reranker | Without | Improvement |
|---------|---------------|---------|-------------|
| Core Exam Chest | 0.533 | 0.409 | **+30.3%** |
| Core Exam Combined | 0.466 | 0.381 | **+22.5%** |
The reranker is especially valuable for complex, multi-part queries typical of board exam questions.
## Quick Start
### Installation
```bash
pip install "sentence-transformers>=2.2.0"
```
### Basic Usage
```python
from sentence_transformers import CrossEncoder
# Load the reranker
reranker = CrossEncoder("matulichpt/RadLITE-Reranker", max_length=512)
# Query and candidate documents
query = "What are the imaging features of hepatocellular carcinoma?"
documents = [
"HCC typically shows arterial enhancement with portal venous washout on CT.",
"Fatty liver disease presents as decreased attenuation on non-contrast CT.",
"Hepatic hemangiomas show peripheral nodular enhancement.",
]
# Create query-document pairs
pairs = [[query, doc] for doc in documents]
# Get relevance scores
scores = reranker.predict(pairs)
# Apply temperature calibration (RECOMMENDED)
calibrated_scores = scores / 1.5
print("Scores:", calibrated_scores)
# Document about HCC will have highest score
```
### Temperature Calibration
**Important**: This model outputs scores with high variance. Apply temperature scaling for better fusion with other signals:
```python
# Raw scores might be: [4.2, -1.5, 0.8]
# After calibration: [2.8, -1.0, 0.53]
TEMPERATURE = 1.5 # Recommended value
def calibrated_predict(reranker, pairs):
raw_scores = reranker.predict(pairs)
return raw_scores / TEMPERATURE
```
### Full Two-Stage Search Pipeline
```python
from sentence_transformers import SentenceTransformer, CrossEncoder
import numpy as np
class RadLITESearch:
def __init__(self, device="cuda"):
# Stage 1: Fast bi-encoder
self.encoder = SentenceTransformer(
"matulichpt/RadLITE-Encoder",
device=device
)
# Stage 2: Precise reranker
self.reranker = CrossEncoder(
"matulichpt/RadLITE-Reranker",
max_length=512,
device=device
)
self.temperature = 1.5
self.corpus_embeddings = None
self.corpus = None
def index_corpus(self, documents: list):
"""Pre-compute embeddings for your corpus."""
self.corpus = documents
self.corpus_embeddings = self.encoder.encode(
documents,
normalize_embeddings=True,
show_progress_bar=True,
batch_size=32
)
def search(self, query: str, top_k: int = 10, candidates: int = 50):
"""Two-stage search: retrieve then rerank."""
# Stage 1: Bi-encoder retrieval
query_emb = self.encoder.encode(query, normalize_embeddings=True)
scores = query_emb @ self.corpus_embeddings.T
top_indices = np.argsort(scores)[-candidates:][::-1]
# Stage 2: Cross-encoder reranking
candidate_docs = [self.corpus[i] for i in top_indices]
pairs = [[query, doc] for doc in candidate_docs]
rerank_scores = self.reranker.predict(pairs) / self.temperature
# Sort by reranked scores
sorted_indices = np.argsort(rerank_scores)[::-1]
results = []
for idx in sorted_indices[:top_k]:
results.append({
"document": candidate_docs[idx],
"corpus_index": int(top_indices[idx]),
"score": float(rerank_scores[idx]),
"biencoder_score": float(scores[top_indices[idx]])
})
return results
# Usage
searcher = RadLITESearch()
searcher.index_corpus(your_radiology_documents)
results = searcher.search("pneumothorax CT findings")
```
## Integration with Any Corpus
### Radiopaedia / Educational Content
```python
import json
# Load your content (e.g., Radiopaedia articles)
with open("radiopaedia_articles.json") as f:
articles = json.load(f)
corpus = [article["content"] for article in articles]
# Initialize search
searcher = RadLITESearch()
searcher.index_corpus(corpus)
# Search
results = searcher.search("classic findings of pulmonary embolism on CTPA")
for r in results[:5]:
print(f"Score: {r['score']:.3f}")
print(f"Content: {r['document'][:200]}...")
print()
```
### Integration with Elasticsearch/OpenSearch
```python
from sentence_transformers import CrossEncoder
reranker = CrossEncoder("matulichpt/RadLITE-Reranker", max_length=512)
def rerank_elasticsearch_results(query: str, es_results: list, top_k: int = 10):
"""Rerank Elasticsearch BM25 results."""
documents = [hit["_source"]["content"] for hit in es_results]
pairs = [[query, doc] for doc in documents]
scores = reranker.predict(pairs) / 1.5 # Temperature calibration
# Combine with ES scores (optional)
for i, hit in enumerate(es_results):
hit["rerank_score"] = float(scores[i])
hit["combined_score"] = 0.3 * hit["_score"] + 0.7 * scores[i]
# Sort by combined score
reranked = sorted(es_results, key=lambda x: x["combined_score"], reverse=True)
return reranked[:top_k]
```
## Optimal Fusion Weights
When combining multiple signals (bi-encoder, cross-encoder, BM25), use these weights:
```python
# Optimal weights from grid search on RadLIT-9
FUSION_WEIGHTS = {
"biencoder": 0.5, # RadLITE-Encoder similarity
"crossencoder": 0.2, # RadLITE-Reranker (after temp calibration)
"bm25": 0.3 # Lexical matching (if available)
}
def fused_score(bienc_score, ce_score, bm25_score=0):
return (
FUSION_WEIGHTS["biencoder"] * bienc_score +
FUSION_WEIGHTS["crossencoder"] * ce_score +
FUSION_WEIGHTS["bm25"] * bm25_score
)
```
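The weights above assume the three signals are on comparable scales, but cosine similarity, calibrated cross-encoder logits, and BM25 scores are not. A common companion step (not part of the grid search above, but a standard practice) is per-query min-max normalization before applying the weights:

```python
import numpy as np

def minmax(x):
    # Rescale a score list to [0, 1] within a single query's candidate set
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def fused_scores(bienc, ce, bm25=None):
    # Same weights as FUSION_WEIGHTS above, applied to normalized signals
    parts = [0.5 * minmax(bienc), 0.2 * minmax(ce)]
    if bm25 is not None:
        parts.append(0.3 * minmax(bm25))
    return sum(parts)
```

Without a step like this, an unbounded BM25 score can dominate the fusion regardless of the weights.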
## Architecture
```
[Query] + [SEP] + [Document]
|
v
[BERT Tokenizer]
|
v
[MiniLM Encoder] (12 layers, 384 hidden)
|
v
[Classification Head]
|
v
Relevance Score (float)
```
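A back-of-envelope parameter count follows from the 12-layer, 384-hidden configuration. The vocab size (30,522, BERT-uncased), 512 positions, and FFN intermediate size (1,536) are assumptions about the MiniLM backbone rather than figures from this card, and biases/LayerNorm weights are ignored, so this slightly undercounts:

```python
# Rough parameter count for a MiniLM-L12-H384-style encoder (assumed config)
VOCAB, POSITIONS, HIDDEN, FFN, LAYERS = 30_522, 512, 384, 1_536, 12

embeddings = (VOCAB + POSITIONS + 2) * HIDDEN  # token + position + segment tables
attention = 4 * HIDDEN * HIDDEN                # Q, K, V, and output projections
ffn = 2 * HIDDEN * FFN                         # up- and down-projection
total = embeddings + LAYERS * (attention + ffn)

print(f"~{total / 1e6:.1f}M parameters")       # roughly 33M
```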
## Training Details
- **Base Model**: ms-marco-MiniLM-L-12-v2 (trained on MS MARCO passage ranking)
- **Fine-tuning**: Radiology query-document relevance pairs
- **Training Steps**: 5,626
- **Best Validation Loss**: 0.691
- **Learning Rate**: 2e-5
- **Batch Size**: 32
- **Category Weighting**: Yes (balanced across radiology subspecialties)
## Best Practices
### 1. Always Use Temperature Calibration
Raw cross-encoder scores can be extreme. Temperature scaling (1.5) produces better fusion:
```python
calibrated = raw_score / 1.5
```
### 2. Limit Candidates for Reranking
Cross-encoders are slow. Only rerank top 50-100 candidates from bi-encoder:
```python
# Good: Rerank top 50
rerank_candidates = 50
# Bad: Rerank entire corpus
rerank_candidates = len(corpus) # Too slow!
```
### 3. Batch Predictions
```python
# Efficient: Single batch call
pairs = [[query, doc] for doc in candidates]
scores = reranker.predict(pairs, batch_size=32)
# Inefficient: Individual calls
scores = [reranker.predict([[query, doc]])[0] for doc in candidates]
```
### 4. GPU Acceleration
```python
reranker = CrossEncoder(
"matulichpt/RadLITE-Reranker",
max_length=512,
device="cuda" # Use GPU
)
```
## Limitations
- **English only**: Trained on English radiology text
- **Speed**: ~10-50 pairs/second (use for reranking, not full corpus)
- **512 token limit**: Long documents are truncated
- **Domain-specific**: Optimized for radiology, may underperform on general medical content
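One common workaround for the 512-token limit (a general pattern, not a feature of this model) is to split long documents into overlapping chunks and keep the best chunk score. `predict_fn` below is any callable matching the `CrossEncoder.predict` interface (a list of `[query, chunk]` pairs in, a list of scores out), so a loaded reranker can be passed directly:

```python
def rerank_long_document(predict_fn, query, document, chunk_words=300, overlap=50):
    """Score a long document as the max over overlapping word-window chunks."""
    words = document.split()
    step = chunk_words - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    chunks = [" ".join(words[i:i + chunk_words]) for i in starts]
    scores = predict_fn([[query, c] for c in chunks])
    return max(scores)
```

Word-based windows are a rough proxy for tokens; chunking on section boundaries, where available, usually works better for structured articles.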
## Citation
If you use RadLITE in your work, please cite:
```bibtex
@software{radlite_2026,
title = {RadLITE: Calibrated Multi-Stage Retrieval for Radiology Education},
author = {Grai Team},
year = {2026},
month = {January},
url = {https://huggingface.co/matulichpt/RadLITE-Reranker},
note = {+30% MRR improvement on ACR Core Exam questions}
}
```
## Related Models
- [RadLITE-Encoder](https://huggingface.co/matulichpt/RadLITE-Encoder) - Bi-encoder for first-stage retrieval
- [RadBERT-RoBERTa-4m](https://huggingface.co/zzxslp/RadBERT-RoBERTa-4m) - Base radiology language model
## License
Apache 2.0 - Free for commercial and research use.