---
language:
- en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- radiology
- medical
- retrieval
- colbert
- late-interaction
datasets:
- custom
metrics:
- mrr
- recall
pipeline_tag: sentence-similarity
model-index:
- name: radlit-colbert
  results:
  - task:
      type: retrieval
      name: Radiology Document Retrieval
    dataset:
      type: custom
      name: RadLIT-9
      config: radlit9-v1.1-balanced
    metrics:
    - type: mrr
      value: 0.750
      name: MRR
    - type: recall@10
      value: 0.943
      name: Recall@10
    - type: ndcg@10
      value: 0.794
      name: nDCG@10
---

# RadLIT-ColBERT: Radiology Late Interaction Transformer

A ColBERT-style late-interaction model trained for radiology document retrieval. RadLIT uses token-level MaxSim scoring to provide more nuanced relevance matching than pooled single-vector embeddings.

## Model Description

RadLIT (Radiology Late Interaction Transformer) is a ColBERTv2-style model adapted for radiology retrieval. Unlike traditional bi-encoders that produce single-vector representations, RadLIT maintains per-token embeddings and computes relevance through late interaction (MaxSim scoring).

### Why Late Interaction?
Late interaction models offer advantages for medical terminology:

- **Precise term matching**: each query token finds its best-matching document token
- **Better handling of multi-word concepts**: the tokens of "hepatocellular carcinoma" can match independently
- **Implicit term weighting**: important query terms contribute more to the final score

### Architecture

- **Base Model**: RoBERTa-base with ColBERT adapter
- **Hidden Size**: 768
- **Output Dimension**: 128 (compressed for efficiency)
- **Layers**: 12
- **Attention Heads**: 12
- **Parameters**: ~125M
- **Max Sequence Length**: 512 tokens

### Training

The model was trained using the ColBERT framework with radiology-specific data:

- **Training Objective**: InfoNCE with in-batch negatives + hard negatives
- **Hard Negative Mining**: top-100 BM25 negatives per query
- **Training Epochs**: 4
- **Batch Size**: 32

**Note**: Training data sources are not disclosed due to variable licensing.

## Performance

### RadLIT-9 Benchmark

| Metric | Score |
|--------|-------|
| **MRR** | 0.750 |
| **nDCG@10** | 0.794 |
| **Recall@10** | 94.3% |
| **Recall@5** | 89.0% |
| **Recall@1** | 64.5% |
| **Latency** | ~5 ms |

### Subspecialty Performance

| Subspecialty | MRR | Recall@10 |
|--------------|-----|-----------|
| Thoracic | **0.958** | 98% |
| Pediatric | 0.882 | 100% |
| Cardiac | 0.754 | 98% |
| Breast | 0.740 | 100% |
| Neuroradiology | 0.729 | 90% |
| MSK | 0.706 | 87% |
| Physics | 0.699 | 93% |
| GI | 0.686 | 94% |
| GU | 0.578 | 90% |

### Comparison with Other Approaches

| Model | MRR | Latency |
|-------|-----|---------|
| **RadLIT-ColBERT** | 0.750 | 5 ms |
| RadLIT-BiEncoder | 0.703 | 5 ms |
| BM25 | ~0.55 | <1 ms |

## Usage

### Installation

```bash
pip install sentence-transformers colbert-ai
```

### Basic Usage with Sentence Transformers

```python
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer('matulichpt/radlit-colbert')

# Encode queries and documents
query = "What are the imaging features of hepatocellular carcinoma on MRI?"
documents = [
    "HCC typically shows arterial enhancement with washout...",
    "Breast cancer staging involves mammography and MRI..."
]

# These are pooled (sentence-level) embeddings; for true ColBERT scoring
# you need token-level embeddings and MaxSim, shown in the next section
query_emb = model.encode(query, convert_to_tensor=True)
doc_embs = [model.encode(d, convert_to_tensor=True) for d in documents]
```

### Late Interaction Scoring (MaxSim)

```python
import torch

def maxsim_score(query_emb, doc_emb):
    """
    Compute the MaxSim score between query and document token embeddings.

    For each query token, find the maximum similarity with any document
    token, then sum these maxima.
    """
    # query_emb: [num_query_tokens, dim]
    # doc_emb: [num_doc_tokens, dim]

    # All pairwise token similarities
    similarities = torch.matmul(query_emb, doc_emb.T)  # [q_tokens, d_tokens]

    # For each query token, take the max similarity across document tokens
    max_sims = similarities.max(dim=1).values  # [q_tokens]

    # Sum the per-token maxima
    return max_sims.sum().item()

# Usage: request token-level embeddings from the encoder
query_emb = model.encode(query, convert_to_tensor=True, output_value='token_embeddings')
doc_emb = model.encode(documents[0], convert_to_tensor=True, output_value='token_embeddings')
score = maxsim_score(query_emb, doc_emb)
```

### Integration with the RadLITE Pipeline

RadLIT-ColBERT is the first-stage retriever in the full RadLITE pipeline:

```
Query -> RadLIT-ColBERT (fast retrieval, top-50) -> CrossEncoder (reranking) -> Results
```

For best results, use the full RadLITE pipeline:

- [RadLIT-BiEncoder](https://huggingface.co/matulichpt/radlit-biencoder) - dense retrieval alternative
- [RadLIT-CrossEncoder](https://huggingface.co/matulichpt/radlit-crossencoder) - reranking stage

## Evolution: RadLIT to RadLITE

| Version | Model | MRR | Innovation |
|---------|-------|-----|------------|
| v1.0 | **RadLIT-ColBERT** (this model) | 0.750 | Late interaction |
| v1.5 | RadLITx | 0.782 | + Cross-encoder fusion |
| v2.0 | RadLITE | **0.829** | + Calibrated fusion |

## Intended Use

### Primary Use Cases

- Fast first-stage radiology retrieval
- Educational content search
- Medical imaging literature retrieval

### Out-of-Scope Uses

- Non-radiology content retrieval
- Clinical diagnosis
- Final relevance scoring (use the CrossEncoder for that)

## Limitations

1. **Subspecialty variance**: MRR ranges from 0.578 (GU) to 0.958 (Thoracic)
2. **Domain specificity**: optimized for radiology; limited generalization to other domains
3. **Late interaction overhead**: storing token-level embeddings increases index size

## Ethical Considerations

- Not a diagnostic tool
- Should be used to surface relevant educational content
- May reflect biases in the radiology literature

## Citation

```bibtex
@software{radlit_colbert_2026,
  title  = {RadLIT-ColBERT: Late Interaction for Radiology Retrieval},
  author = {Grai Team},
  year   = {2026},
  url    = {https://huggingface.co/matulichpt/radlit-colbert},
  note   = {MRR 0.750 on RadLIT-9 benchmark}
}
```

## Related Models

- [RadLIT-BiEncoder](https://huggingface.co/matulichpt/radlit-biencoder) - dense retrieval (RadLITE v2.0)
- [RadLIT-CrossEncoder](https://huggingface.co/matulichpt/radlit-crossencoder) - reranking

## License

Apache 2.0 - free for research and commercial use.
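## Appendix: Training Objective Sketch

The Training section names the objective (InfoNCE with in-batch negatives over late-interaction scores) but the card does not include training code. The following is a minimal illustrative sketch, not the released implementation: `maxsim`, `infonce_maxsim_loss`, the temperature value, and the toy tensors are all hypothetical, and hard-negative documents would simply be appended to the candidate pool.

```python
import torch
import torch.nn.functional as F

def maxsim(q_tokens, d_tokens):
    """Late-interaction score: for each query token, take the max
    similarity to any document token, then sum over query tokens."""
    # q_tokens: [q_len, dim], d_tokens: [d_len, dim]
    return (q_tokens @ d_tokens.T).max(dim=1).values.sum()

def infonce_maxsim_loss(query_batch, doc_batch, temperature=0.05):
    """InfoNCE over MaxSim scores: document i is the positive for
    query i; every other document in the batch is an in-batch negative."""
    # Score every query against every candidate document -> [batch, batch]
    scores = torch.stack([
        torch.stack([maxsim(q, d) for d in doc_batch])
        for q in query_batch
    ])
    labels = torch.arange(len(query_batch))  # positives on the diagonal
    return F.cross_entropy(scores / temperature, labels)

# Toy batch: 4 queries and 4 aligned documents with token-level embeddings
torch.manual_seed(0)
queries = [F.normalize(torch.randn(8, 128), dim=-1) for _ in range(4)]
docs = [F.normalize(q + 0.1 * torch.randn_like(q), dim=-1) for q in queries]
loss = infonce_maxsim_loss(queries, docs)
```

Because each document here is a noisy copy of its query, the diagonal scores dominate and the loss is near zero; shuffling the documents misaligns the positives and the loss rises accordingly.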