---
language:
- en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- radiology
- medical
- retrieval
- colbert
- late-interaction
datasets:
- custom
metrics:
- mrr
- recall
pipeline_tag: sentence-similarity
model-index:
- name: radlit-colbert
  results:
  - task:
      type: retrieval
      name: Radiology Document Retrieval
    dataset:
      type: custom
      name: RadLIT-9
      config: radlit9-v1.1-balanced
    metrics:
    - type: mrr
      value: 0.750
      name: MRR
    - type: recall@10
      value: 0.943
      name: Recall@10
    - type: ndcg@10
      value: 0.794
      name: nDCG@10
---

# RadLIT-ColBERT: Radiology Late Interaction Transformer

A ColBERT-style late-interaction model trained for radiology document retrieval. RadLIT uses token-level MaxSim scoring to provide more nuanced relevance matching than pooled single-vector embeddings.

## Model Description

RadLIT (Radiology Late Interaction Transformer) is a ColBERTv2-style model adapted for radiology retrieval. Unlike traditional bi-encoders that produce single-vector representations, RadLIT maintains per-token embeddings and computes relevance through late interaction (MaxSim scoring).

### Why Late Interaction?
Late interaction models offer advantages for medical terminology:

- **Precise term matching**: each query token finds its best-matching document token
- **Better handling of multi-word concepts**: the tokens of "hepatocellular carcinoma" can match independently
- **Implicit term weighting**: important query terms contribute more to the final score

### Architecture

- **Base Model**: RoBERTa-base with ColBERT adapter
- **Hidden Size**: 768
- **Output Dimension**: 128 (compressed for efficiency)
- **Layers**: 12
- **Attention Heads**: 12
- **Parameters**: ~125M
- **Max Sequence Length**: 512 tokens

### Training

The model was trained using the ColBERT framework with radiology-specific data:

- **Training Objective**: InfoNCE with in-batch negatives + hard negatives
- **Hard Negative Mining**: top-100 BM25 negatives per query
- **Training Epochs**: 4
- **Batch Size**: 32

**Note**: Training data sources are not disclosed due to variable licensing.

## Performance

### RadLIT-9 Benchmark

| Metric | Score |
|--------|-------|
| **MRR** | 0.750 |
| **nDCG@10** | 0.794 |
| **Recall@10** | 94.3% |
| **Recall@5** | 89.0% |
| **Recall@1** | 64.5% |
| **Latency** | ~5 ms |

### Subspecialty Performance

| Subspecialty | MRR | Recall@10 |
|--------------|-----|-----------|
| Thoracic | **0.958** | 98% |
| Pediatric | 0.882 | 100% |
| Cardiac | 0.754 | 98% |
| Breast | 0.740 | 100% |
| Neuroradiology | 0.729 | 90% |
| MSK | 0.706 | 87% |
| Physics | 0.699 | 93% |
| GI | 0.686 | 94% |
| GU | 0.578 | 90% |

### Comparison with Other Approaches

| Model | MRR | Latency |
|-------|-----|---------|
| **RadLIT-ColBERT** | 0.750 | 5 ms |
| RadLIT-BiEncoder | 0.703 | 5 ms |
| BM25 | ~0.55 | <1 ms |

## Usage

### Installation

```bash
pip install sentence-transformers colbert-ai
```

### Basic Usage with Sentence Transformers

```python
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer('matulichpt/radlit-colbert')

# Encode queries and documents
query = "What are the imaging features of hepatocellular carcinoma on MRI?"
documents = [
    "HCC typically shows arterial enhancement with washout...",
    "Breast cancer staging involves mammography and MRI..."
]

# These are pooled (sentence-level) embeddings; for true ColBERT scoring
# you need token-level embeddings and MaxSim, shown in the next section
query_emb = model.encode(query, convert_to_tensor=True)
doc_embs = [model.encode(d, convert_to_tensor=True) for d in documents]
```

### Late Interaction Scoring (MaxSim)

```python
import torch

def maxsim_score(query_emb, doc_emb):
    """
    Compute the MaxSim score between query and document token embeddings.

    For each query token, find the maximum similarity with any document
    token, then sum these maxima.
    """
    # query_emb: [num_query_tokens, dim]
    # doc_emb: [num_doc_tokens, dim]

    # All pairwise token similarities
    similarities = torch.matmul(query_emb, doc_emb.T)  # [q_tokens, d_tokens]

    # For each query token, take the max similarity across document tokens
    max_sims = similarities.max(dim=1).values  # [q_tokens]

    # Sum the per-token maxima
    return max_sims.sum().item()

# Usage: request token-level embeddings from the encoder
query_emb = model.encode(query, convert_to_tensor=True, output_value='token_embeddings')
doc_emb = model.encode(documents[0], convert_to_tensor=True, output_value='token_embeddings')
score = maxsim_score(query_emb, doc_emb)
```

### Integration with the RadLITE Pipeline

RadLIT-ColBERT is the first-stage retriever in the full RadLITE pipeline:

```
Query -> RadLIT-ColBERT (fast retrieval, top-50) -> CrossEncoder (reranking) -> Results
```

For best results, use the full RadLITE pipeline:

- [RadLIT-BiEncoder](https://huggingface.co/matulichpt/radlit-biencoder) - dense retrieval alternative
- [RadLIT-CrossEncoder](https://huggingface.co/matulichpt/radlit-crossencoder) - reranking stage

## Evolution: RadLIT to RadLITE

| Version | Model | MRR | Innovation |
|---------|-------|-----|------------|
| v1.0 | **RadLIT-ColBERT** (this model) | 0.750 | Late interaction |
| v1.5 | RadLITx | 0.782 | + Cross-encoder fusion |
| v2.0 | RadLITE | **0.829** | + Calibrated fusion |

## Intended Use

### Primary Use Cases

- Fast first-stage radiology retrieval
- Educational content search
- Medical imaging literature retrieval

### Out-of-Scope Uses

- Non-radiology content retrieval
- Clinical diagnosis
- Final relevance scoring (use the CrossEncoder for that)

## Limitations

1. **Subspecialty variance**: MRR ranges from 0.578 (GU) to 0.958 (Thoracic)
2. **Domain specificity**: optimized for radiology; limited generalization to other domains
3. **Late interaction overhead**: storing token-level embeddings increases index size

## Ethical Considerations

- Not a diagnostic tool
- Should be used to surface relevant educational content
- May reflect biases in the radiology literature

## Citation

```bibtex
@software{radlit_colbert_2026,
  title  = {RadLIT-ColBERT: Late Interaction for Radiology Retrieval},
  author = {Grai Team},
  year   = {2026},
  url    = {https://huggingface.co/matulichpt/radlit-colbert},
  note   = {MRR 0.750 on RadLIT-9 benchmark}
}
```

## Related Models

- [RadLIT-BiEncoder](https://huggingface.co/matulichpt/radlit-biencoder) - dense retrieval (RadLITE v2.0)
- [RadLIT-CrossEncoder](https://huggingface.co/matulichpt/radlit-crossencoder) - reranking

## License

Apache 2.0 - free for research and commercial use.
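## Appendix: Training Objective Sketch

The Training section names the objective (InfoNCE with in-batch negatives over late-interaction scores) but the card does not include training code. The following is a minimal illustrative sketch, not the released implementation: `maxsim`, `infonce_maxsim_loss`, the temperature value, and the toy tensors are all hypothetical, and hard-negative documents would simply be appended to the candidate pool.

```python
import torch
import torch.nn.functional as F

def maxsim(q_tokens, d_tokens):
    """Late-interaction score: for each query token, take the max
    similarity to any document token, then sum over query tokens."""
    # q_tokens: [q_len, dim], d_tokens: [d_len, dim]
    return (q_tokens @ d_tokens.T).max(dim=1).values.sum()

def infonce_maxsim_loss(query_batch, doc_batch, temperature=0.05):
    """InfoNCE over MaxSim scores: document i is the positive for
    query i; every other document in the batch is an in-batch negative."""
    # Score every query against every candidate document -> [batch, batch]
    scores = torch.stack([
        torch.stack([maxsim(q, d) for d in doc_batch])
        for q in query_batch
    ])
    labels = torch.arange(len(query_batch))  # positives on the diagonal
    return F.cross_entropy(scores / temperature, labels)

# Toy batch: 4 queries and 4 aligned documents with token-level embeddings
torch.manual_seed(0)
queries = [F.normalize(torch.randn(8, 128), dim=-1) for _ in range(4)]
docs = [F.normalize(q + 0.1 * torch.randn_like(q), dim=-1) for q in queries]
loss = infonce_maxsim_loss(queries, docs)
```

Because each document here is a noisy copy of its query, the diagonal scores dominate and the loss is near zero; shuffling the documents misaligns the positives and the loss rises accordingly.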