---
language:
- en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- radiology
- medical
- retrieval
- colbert
- late-interaction
datasets:
- custom
metrics:
- mrr
- recall
pipeline_tag: sentence-similarity
model-index:
- name: radlit-colbert
  results:
  - task:
      type: retrieval
      name: Radiology Document Retrieval
    dataset:
      type: custom
      name: RadLIT-9
      config: radlit9-v1.1-balanced
    metrics:
    - type: mrr
      value: 0.750
      name: MRR
    - type: recall@10
      value: 0.943
      name: Recall@10
    - type: ndcg@10
      value: 0.794
      name: nDCG@10
---

# RadLIT-ColBERT: Radiology Late Interaction Transformer

A ColBERT-style late interaction model trained for radiology document retrieval. RadLIT uses token-level MaxSim scoring to provide more nuanced relevance matching than pooled embeddings.

## Model Description

RadLIT (Radiology Late Interaction Transformer) is a ColBERT-v2 style model adapted for radiology retrieval. Unlike traditional bi-encoders that produce single-vector representations, RadLIT maintains per-token embeddings and computes relevance through late interaction (MaxSim scoring).

### Why Late Interaction?

Late interaction models offer advantages for medical terminology:

- **Precise term matching**: Each query token finds its best-matching document token
- **Better handling of multi-word concepts**: "hepatocellular carcinoma" tokens can independently match
- **Implicit term weighting**: Important query terms contribute more to the final score
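
To make the bullets above concrete, here is a toy MaxSim computation with hand-made 2-d vectors (illustrative stand-ins, not real model embeddings): each query token keeps only its best document-token match, and the matches are found independently.

```python
# Toy late-interaction scoring: each query token is compared against
# every document token, and only the best match per query token counts.
import torch

query = torch.tensor([[1.0, 0.0],   # query token: "hepatocellular"
                      [0.0, 1.0]])  # query token: "carcinoma"
doc = torch.tensor([[0.9, 0.1],     # doc token: "hepatocellular"
                    [0.2, 0.8],     # doc token: "carcinoma"
                    [0.5, 0.5]])    # doc token: "staging"

sims = query @ doc.T                     # [2 query tokens, 3 doc tokens]
per_token_best = sims.max(dim=1).values  # best doc match per query token
score = per_token_best.sum()
print(round(score.item(), 2))            # → 1.7
```

Note that each query token matched a different document token (0.9 and 0.8), which is the "independent matching" the multi-word-concept bullet describes.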

### Architecture

- **Base Model**: RoBERTa-base with ColBERT adapter
- **Hidden Size**: 768
- **Output Dimension**: 128 (compressed for efficiency)
- **Layers**: 12
- **Attention Heads**: 12
- **Parameters**: ~125M
- **Max Sequence Length**: 512 tokens
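
The 768-to-128 compression above can be sketched as a linear projection followed by L2 normalization, the usual ColBERT-style output head. The `Linear` layer here is an illustrative assumption, not the model's actual weights.

```python
# Sketch of a ColBERT-style output head: compress per-token hidden
# states to 128-d and L2-normalize, so token dot products are cosines.
import torch

hidden = torch.randn(1, 512, 768)             # [batch, tokens, hidden]
proj = torch.nn.Linear(768, 128, bias=False)  # compression head (assumption)
emb = torch.nn.functional.normalize(proj(hidden), dim=-1)
print(emb.shape)  # torch.Size([1, 512, 128])
```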

### Training

The model was trained using the ColBERT framework with radiology-specific data:

- **Training Objective**: InfoNCE with in-batch negatives + hard negatives
- **Hard Negative Mining**: Top-100 BM25 negatives per query
- **Training Epochs**: 4
- **Batch Size**: 32

**Note**: Training data sources are not disclosed due to variable licensing.
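
A minimal sketch of the InfoNCE objective with in-batch negatives, assuming a batch of MaxSim scores has already been computed (the real loop would also fold in the mined hard negatives and backpropagate through the encoder):

```python
# InfoNCE with in-batch negatives: the positive document for query i
# is row i of the batch; every other row serves as a negative.
import torch
import torch.nn.functional as F

batch = 4
scores = torch.randn(batch, batch)      # scores[i, j] = MaxSim(query_i, doc_j)
labels = torch.arange(batch)            # diagonal entries are the positives
loss = F.cross_entropy(scores, labels)  # softmax over docs, per query
```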

## Performance

### RadLIT-9 Benchmark

| Metric | Score |
|--------|-------|
| **MRR** | 0.750 |
| **nDCG@10** | 0.794 |
| **Recall@10** | 94.3% |
| **Recall@5** | 89.0% |
| **Recall@1** | 64.5% |
| **Latency** | ~5ms |
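
For reference, MRR is the mean of the reciprocal rank of the first relevant document per query. A sketch with hypothetical ranks:

```python
# MRR over four hypothetical queries whose first relevant document
# appeared at ranks 1, 2, 1, and 4 respectively.
ranks = [1, 2, 1, 4]
mrr = sum(1.0 / r for r in ranks) / len(ranks)
print(round(mrr, 4))  # → 0.6875
```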

### Subspecialty Performance

| Subspecialty | MRR | Recall@10 |
|--------------|-----|-----------|
| Thoracic | **0.958** | 98% |
| Pediatric | 0.882 | 100% |
| Cardiac | 0.754 | 98% |
| Breast | 0.740 | 100% |
| Neuroradiology | 0.729 | 90% |
| MSK | 0.706 | 87% |
| Physics | 0.699 | 93% |
| GI | 0.686 | 94% |
| GU | 0.578 | 90% |

### Comparison with Other Approaches

| Model | MRR | Latency |
|-------|-----|---------|
| **RadLIT-ColBERT** | 0.750 | 5ms |
| RadLIT-BiEncoder | 0.703 | 5ms |
| BM25 | ~0.55 | <1ms |

## Usage

### Installation

```bash
pip install sentence-transformers colbert-ai
```

### Basic Usage with Sentence Transformers

```python
from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer('matulichpt/radlit-colbert')

# Encode queries and documents
query = "What are the imaging features of hepatocellular carcinoma on MRI?"
documents = [
    "HCC typically shows arterial enhancement with washout...",
    "Breast cancer staging involves mammography and MRI..."
]

# Get token-level embeddings (ColBERT scores tokens, not a pooled vector,
# so request token embeddings rather than the default pooled output)
query_emb = model.encode(query, convert_to_tensor=True, output_value='token_embeddings')
doc_embs = [model.encode(d, convert_to_tensor=True, output_value='token_embeddings')
            for d in documents]

# For ColBERT relevance, compute token-level MaxSim between the query
# and each document; see the ColBERT documentation for a full implementation
```

### Late Interaction Scoring (MaxSim)

```python
import torch

def maxsim_score(query_emb, doc_emb):
    """
    Compute the MaxSim score between query and document token embeddings.

    For each query token, find the maximum similarity with any document token,
    then sum these maximum similarities. Embeddings should be L2-normalized
    so that dot products are cosine similarities.
    """
    # query_emb: [num_query_tokens, dim]
    # doc_emb:   [num_doc_tokens, dim]

    # Compute all pairwise token similarities
    similarities = torch.matmul(query_emb, doc_emb.T)  # [q_tokens, d_tokens]

    # For each query token, take the max similarity across all doc tokens
    max_sims = similarities.max(dim=1).values  # [q_tokens]

    # Sum the per-query-token maxima
    return max_sims.sum().item()

# Usage (query and documents as defined in the previous section)
query_emb = model.encode(query, convert_to_tensor=True, output_value='token_embeddings')
doc_emb = model.encode(documents[0], convert_to_tensor=True, output_value='token_embeddings')
score = maxsim_score(query_emb, doc_emb)
```
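
Ranking a candidate set follows directly from `maxsim_score`: score each document independently and sort. The random tensors below are stand-ins for real token embeddings, with the shapes and normalization used above.

```python
# Rank several documents by MaxSim against one query.
import torch

torch.manual_seed(0)
query_emb = torch.nn.functional.normalize(torch.randn(8, 128), dim=-1)
doc_embs = [torch.nn.functional.normalize(torch.randn(n, 128), dim=-1)
            for n in (40, 25, 60)]  # documents of varying token length

scores = [torch.matmul(query_emb, d.T).max(dim=1).values.sum().item()
          for d in doc_embs]
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # document indices, best first
```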

### Integration with RadLITE Pipeline

RadLIT-ColBERT is the first-stage retriever in the full RadLITE pipeline:

```
Query -> RadLIT-ColBERT (fast retrieval, top-50) -> CrossEncoder (reranking) -> Results
```

For best results, use the full RadLITE pipeline:
- [RadLIT-BiEncoder](https://huggingface.co/matulichpt/radlit-biencoder) - Dense retrieval alternative
- [RadLIT-CrossEncoder](https://huggingface.co/matulichpt/radlit-crossencoder) - Reranking stage
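
The two-stage flow can be sketched with pluggable scoring functions. The toy word-overlap scorer below is a placeholder standing in for both RadLIT-ColBERT and the cross-encoder; in practice each stage would call the respective model.

```python
# Retrieve-then-rerank: a fast scorer prunes the corpus to top-k,
# then a slower, more accurate scorer reorders the short list.
def retrieve_then_rerank(query, corpus, retrieve_score, rerank_score, k=50):
    # Stage 1: fast first-pass retrieval, keep top-k candidates
    candidates = sorted(corpus, key=lambda d: retrieve_score(query, d),
                        reverse=True)[:k]
    # Stage 2: rerank only the short list
    return sorted(candidates, key=lambda d: rerank_score(query, d),
                  reverse=True)

# Toy usage: word overlap stands in for both models
corpus = ["hcc arterial enhancement washout",
          "breast mammography",
          "hcc mri features"]
overlap = lambda q, d: len(set(q.split()) & set(d.split()))
print(retrieve_then_rerank("hcc mri", corpus, overlap, overlap, k=2))
# → ['hcc mri features', 'hcc arterial enhancement washout']
```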

## Evolution: RadLIT to RadLITE

| Version | Model | MRR | Innovation |
|---------|-------|-----|------------|
| v1.0 | **RadLIT-ColBERT** (this model) | 0.750 | Late interaction |
| v1.5 | RadLITx | 0.782 | + Cross-encoder fusion |
| v2.0 | RadLITE | **0.829** | + Calibrated fusion |

## Intended Use

### Primary Use Cases

- Fast first-stage radiology retrieval
- Educational content search
- Medical imaging literature retrieval

### Out-of-Scope Uses

- Non-radiology content retrieval
- Clinical diagnosis
- Final relevance scoring (use the CrossEncoder for that)

## Limitations

1. **Subspecialty variance**: Performance varies from 0.58 MRR (GU) to 0.96 (Thoracic)
2. **Domain specificity**: Optimized for radiology; limited generalization
3. **Late interaction overhead**: Token-level storage increases index size
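
On the index-size point, a back-of-envelope calculation (assuming float16 vectors and ~180 tokens per document; both numbers are illustrative, not measured):

```python
# Token-level index size: every document token stores a 128-d vector.
docs = 100_000
tokens_per_doc = 180       # illustrative average
dim = 128                  # output dimension from the Architecture section
bytes_per_value = 2        # float16
index_bytes = docs * tokens_per_doc * dim * bytes_per_value
print(round(index_bytes / 1e9, 1), "GB")  # → 4.6 GB
```

A single-vector bi-encoder index for the same corpus would store one vector per document instead of one per token, which is why late interaction trades index size for matching precision.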

## Ethical Considerations

- Not a diagnostic tool
- Should be used to surface relevant educational content
- May reflect biases in the radiology literature
## Citation

```bibtex
@software{radlit_colbert_2026,
  title  = {RadLIT-ColBERT: Late Interaction for Radiology Retrieval},
  author = {Grai Team},
  year   = {2026},
  url    = {https://huggingface.co/matulichpt/radlit-colbert},
  note   = {MRR 0.750 on RadLIT-9 benchmark}
}
```

## Related Models

- [RadLIT-BiEncoder](https://huggingface.co/matulichpt/radlit-biencoder) - Dense retrieval (RadLITE v2.0)
- [RadLIT-CrossEncoder](https://huggingface.co/matulichpt/radlit-crossencoder) - Reranking

## License

Apache 2.0 - Free for research and commercial use.
|