---
license: apache-2.0
language:
- en
tags:
- cross-encoder
- reranker
- radiology
- medical
- retrieval
- sentence-similarity
- healthcare
- clinical
base_model: cross-encoder/ms-marco-MiniLM-L-12-v2
pipeline_tag: text-classification
library_name: sentence-transformers
datasets:
- radiology-education-corpus
metrics:
- mrr
- ndcg
model-index:
- name: RadLITE-Reranker
  results:
  - task:
      type: reranking
      name: Document Reranking
    dataset:
      name: RadLIT-9 (Radiology Retrieval Benchmark)
      type: radiology-retrieval
    metrics:
    - type: mrr
      value: 0.829
      name: MRR (with bi-encoder)
    - type: mrr_improvement
      value: 0.303
      name: MRR Improvement on ACR Core Exam (+30.3%)
---

# RadLITE-Reranker

**Radiology Late Interaction Transformer Enhanced - Cross-Encoder Reranker**

A domain-specialized cross-encoder for reranking radiology search results. This model takes a query-document pair and predicts a relevance score, providing more accurate ranking than bi-encoder similarity alone.

> **Recommended:** Use this reranker together with [RadLITE-Encoder](https://huggingface.co/matulichpt/RadLITE-Encoder) in a two-stage pipeline for optimal performance. The bi-encoder handles fast retrieval over large corpora; this cross-encoder then reranks the top candidates for precision. This combination achieves **MRR 0.829** on radiology benchmarks (+30% on board exam questions).

## Model Description

| Property | Value |
|----------|-------|
| **Model Type** | Cross-Encoder (Reranker) |
| **Base Model** | [ms-marco-MiniLM-L-12-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-12-v2) |
| **Domain** | Radiology / Medical Imaging |
| **Hidden Size** | 384 |
| **Max Sequence Length** | 512 tokens |
| **Output** | Single relevance score |
| **License** | Apache 2.0 |

### Why Use a Reranker?

Bi-encoders (like RadLITE-Encoder) are fast but encode query and document independently.
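To make the difference concrete, here is a schematic mock of the two scoring paradigms. This is **not** the real models: `mock_encode`, `biencoder_score`, `crossencoder_score`, and the term-overlap proxy are invented stand-ins that only illustrate the data flow (independent vectors vs. joint scoring of the pair).

```python
import numpy as np

# Toy stand-ins, NOT the real models: they only illustrate the data flow.

def mock_encode(text: str) -> np.ndarray:
    """Stand-in for SentenceTransformer.encode: one fixed-size vector per text."""
    seed = sum(ord(c) for c in text)                      # deterministic toy "embedding"
    vec = np.random.default_rng(seed).normal(size=384)    # 384-dim, like MiniLM
    return vec / np.linalg.norm(vec)

def biencoder_score(query: str, doc: str) -> float:
    # Query and document are encoded INDEPENDENTLY; their only interaction
    # is the final dot product between the two pre-computed vectors.
    return float(mock_encode(query) @ mock_encode(doc))

def crossencoder_score(query: str, doc: str) -> float:
    # A cross-encoder sees the concatenated pair ("query [SEP] doc") and can
    # model token-level interactions; a crude proxy here is term overlap.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

print(biencoder_score("pneumothorax on CT", "CT findings of pneumothorax"))
print(crossencoder_score("pneumothorax on CT", "CT findings of pneumothorax"))
```

The point of the sketch: the bi-encoder can precompute document vectors offline (fast), while the cross-encoder must run once per query-document pair (slow but interaction-aware).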
Cross-encoders process them together, capturing fine-grained interactions:

| Approach | Speed | Accuracy | Use Case |
|----------|-------|----------|----------|
| Bi-Encoder | Fast (1000s docs/sec) | Good | First-stage retrieval |
| Cross-Encoder | Slow (10s docs/sec) | Excellent | Reranking top candidates |

**Two-stage pipeline**: Use the bi-encoder to get the top 50-100 candidates, then rerank with the cross-encoder for best results.

## Performance

### Impact on RadLIT-9 Benchmark

| Configuration | MRR | Improvement |
|---------------|-----|-------------|
| Bi-Encoder only | 0.78 | baseline |
| **Bi-Encoder + Reranker** | **0.829** | **+6.3%** |

### ACR Core Exam (Board-Style Questions)

| Dataset | With Reranker | Without | Improvement |
|---------|---------------|---------|-------------|
| Core Exam Chest | 0.533 | 0.409 | **+30.3%** |
| Core Exam Combined | 0.466 | 0.381 | **+22.5%** |

The reranker is especially valuable for the complex, multi-part queries typical of board exam questions.

## Quick Start

### Installation

```bash
pip install "sentence-transformers>=2.2.0"
```

### Basic Usage

```python
from sentence_transformers import CrossEncoder

# Load the reranker
reranker = CrossEncoder("matulichpt/RadLITE-Reranker", max_length=512)

# Query and candidate documents
query = "What are the imaging features of hepatocellular carcinoma?"
documents = [
    "HCC typically shows arterial enhancement with portal venous washout on CT.",
    "Fatty liver disease presents as decreased attenuation on non-contrast CT.",
    "Hepatic hemangiomas show peripheral nodular enhancement.",
]

# Create query-document pairs
pairs = [[query, doc] for doc in documents]

# Get relevance scores
scores = reranker.predict(pairs)

# Apply temperature calibration (RECOMMENDED)
calibrated_scores = scores / 1.5

print("Scores:", calibrated_scores)
# The document about HCC will have the highest score
```

### Temperature Calibration

**Important**: This model outputs scores with high variance.
Apply temperature scaling for better fusion with other signals:

```python
# Raw scores might be:  [4.2, -1.5, 0.8]
# After calibration:    [2.8, -1.0, 0.53]
TEMPERATURE = 1.5  # Recommended value

def calibrated_predict(reranker, pairs):
    raw_scores = reranker.predict(pairs)
    return raw_scores / TEMPERATURE
```

### Full Two-Stage Search Pipeline

```python
from sentence_transformers import SentenceTransformer, CrossEncoder
import numpy as np


class RadLITESearch:
    def __init__(self, device="cuda"):
        # Stage 1: Fast bi-encoder
        self.encoder = SentenceTransformer(
            "matulichpt/RadLITE-Encoder", device=device
        )
        # Stage 2: Precise reranker
        self.reranker = CrossEncoder(
            "matulichpt/RadLITE-Reranker", max_length=512, device=device
        )
        self.temperature = 1.5
        self.corpus_embeddings = None
        self.corpus = None

    def index_corpus(self, documents: list):
        """Pre-compute embeddings for your corpus."""
        self.corpus = documents
        self.corpus_embeddings = self.encoder.encode(
            documents,
            normalize_embeddings=True,
            show_progress_bar=True,
            batch_size=32,
        )

    def search(self, query: str, top_k: int = 10, candidates: int = 50):
        """Two-stage search: retrieve, then rerank."""
        # Stage 1: Bi-encoder retrieval
        query_emb = self.encoder.encode(query, normalize_embeddings=True)
        scores = query_emb @ self.corpus_embeddings.T
        top_indices = np.argsort(scores)[-candidates:][::-1]

        # Stage 2: Cross-encoder reranking
        candidate_docs = [self.corpus[i] for i in top_indices]
        pairs = [[query, doc] for doc in candidate_docs]
        rerank_scores = self.reranker.predict(pairs) / self.temperature

        # Sort by reranked scores
        sorted_indices = np.argsort(rerank_scores)[::-1]
        results = []
        for idx in sorted_indices[:top_k]:
            results.append({
                "document": candidate_docs[idx],
                "corpus_index": int(top_indices[idx]),
                "score": float(rerank_scores[idx]),
                "biencoder_score": float(scores[top_indices[idx]]),
            })
        return results


# Usage (your_radiology_documents is your own list of strings)
searcher = RadLITESearch()
searcher.index_corpus(your_radiology_documents)
results = searcher.search("pneumothorax CT findings")
```

## Integration with Any Corpus

### Radiopaedia / Educational Content

```python
import json

# Load your content (e.g., Radiopaedia articles)
with open("radiopaedia_articles.json") as f:
    articles = json.load(f)

corpus = [article["content"] for article in articles]

# Initialize search
searcher = RadLITESearch()
searcher.index_corpus(corpus)

# Search
results = searcher.search("classic findings of pulmonary embolism on CTPA")
for r in results[:5]:
    print(f"Score: {r['score']:.3f}")
    print(f"Content: {r['document'][:200]}...")
    print()
```

### Integration with Elasticsearch/OpenSearch

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("matulichpt/RadLITE-Reranker", max_length=512)


def rerank_elasticsearch_results(query: str, es_results: list, top_k: int = 10):
    """Rerank Elasticsearch BM25 results."""
    documents = [hit["_source"]["content"] for hit in es_results]
    pairs = [[query, doc] for doc in documents]
    scores = reranker.predict(pairs) / 1.5  # Temperature calibration

    # Combine with ES scores (optional)
    for i, hit in enumerate(es_results):
        hit["rerank_score"] = float(scores[i])
        hit["combined_score"] = 0.3 * hit["_score"] + 0.7 * scores[i]

    # Sort by combined score
    reranked = sorted(es_results, key=lambda x: x["combined_score"], reverse=True)
    return reranked[:top_k]
```

## Optimal Fusion Weights

When combining multiple signals (bi-encoder, cross-encoder, BM25), use these weights:

```python
# Optimal weights from grid search on RadLIT-9
FUSION_WEIGHTS = {
    "biencoder": 0.5,     # RadLITE-Encoder similarity
    "crossencoder": 0.2,  # RadLITE-Reranker (after temperature calibration)
    "bm25": 0.3,          # Lexical matching (if available)
}


def fused_score(bienc_score, ce_score, bm25_score=0):
    return (
        FUSION_WEIGHTS["biencoder"] * bienc_score
        + FUSION_WEIGHTS["crossencoder"] * ce_score
        + FUSION_WEIGHTS["bm25"] * bm25_score
    )
```

## Architecture

```
[Query] + [SEP] + [Document]
             |
             v
      [BERT Tokenizer]
             |
             v
      [MiniLM Encoder]  (12 layers, 384 hidden)
             |
             v
   [Classification Head]
             |
             v
   Relevance Score (float)
```

## Training Details

- **Base Model**: ms-marco-MiniLM-L-12-v2 (trained on MS MARCO passage ranking)
- **Fine-tuning**: Radiology query-document relevance pairs
- **Training Steps**: 5,626
- **Best Validation Loss**: 0.691
- **Learning Rate**: 2e-5
- **Batch Size**: 32
- **Category Weighting**: Yes (balanced across radiology subspecialties)

## Best Practices

### 1. Always Use Temperature Calibration

Raw cross-encoder scores can be extreme. Temperature scaling (1.5) produces better fusion:

```python
calibrated = raw_score / 1.5
```

### 2. Limit Candidates for Reranking

Cross-encoders are slow. Only rerank the top 50-100 candidates from the bi-encoder:

```python
# Good: Rerank top 50
rerank_candidates = 50

# Bad: Rerank the entire corpus
rerank_candidates = len(corpus)  # Too slow!
```

### 3. Batch Predictions

```python
# Efficient: Single batched call
pairs = [[query, doc] for doc in candidates]
scores = reranker.predict(pairs, batch_size=32)

# Inefficient: Individual calls
scores = [reranker.predict([[query, doc]])[0] for doc in candidates]
```

### 4. GPU Acceleration
```python
reranker = CrossEncoder(
    "matulichpt/RadLITE-Reranker",
    max_length=512,
    device="cuda",  # Use GPU
)
```

## Limitations

- **English only**: Trained on English radiology text
- **Speed**: ~10-50 pairs/second (use for reranking, not the full corpus)
- **512 token limit**: Longer documents are truncated
- **Domain-specific**: Optimized for radiology; may underperform on general medical content

## Citation

If you use RadLITE in your work, please cite:

```bibtex
@software{radlite_2026,
  title  = {RadLITE: Calibrated Multi-Stage Retrieval for Radiology Education},
  author = {Grai Team},
  year   = {2026},
  month  = {January},
  url    = {https://huggingface.co/matulichpt/RadLITE-Reranker},
  note   = {+30% MRR improvement on ACR Core Exam questions}
}
```

## Related Models

- [RadLITE-Encoder](https://huggingface.co/matulichpt/RadLITE-Encoder) - Bi-encoder for first-stage retrieval
- [RadBERT-RoBERTa-4m](https://huggingface.co/zzxslp/RadBERT-RoBERTa-4m) - Base radiology language model

## License

Apache 2.0 - Free for commercial and research use.