---
language:
  - en
license: apache-2.0
library_name: sentence-transformers
tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
  - radiology
  - medical
  - retrieval
  - embedding
datasets:
  - custom
metrics:
  - mrr
  - recall
pipeline_tag: sentence-similarity
model-index:
  - name: radlit-biencoder
    results:
      - task:
          type: retrieval
          name: Radiology Document Retrieval
        dataset:
          type: custom
          name: RadLIT-9
          config: radlit9-v1.1-balanced
        metrics:
          - type: mrr
            value: 0.698
            name: MRR (bi-encoder only)
          - type: recall@10
            value: 0.914
            name: Recall@10
          - type: ndcg@10
            value: 0.748
            name: nDCG@10
---

RadLIT-BiEncoder: Radiology Document Retrieval

A domain-specialized bi-encoder model for radiology document retrieval, trained to understand medical imaging terminology and radiology-specific queries.

Model Description

RadLIT-BiEncoder generates dense embeddings optimized for radiology content retrieval. It serves as the first stage of the RadLITE pipeline, providing fast candidate retrieval before cross-encoder reranking.

Architecture

  • Base Model: RoBERTa-base architecture
  • Hidden Size: 768
  • Layers: 12
  • Attention Heads: 12
  • Parameters: ~125M
  • Max Sequence Length: 512 tokens
  • Embedding Dimension: 768
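With sentence-transformers, inputs longer than the model's maximum sequence length (512 tokens per the list above) are truncated, so text past that point does not influence the embedding. A common workaround, which this card does not prescribe, is to split long documents into overlapping chunks and embed each one. The sketch below is a hypothetical helper (`chunk_words`) that uses whitespace-separated words as a rough stand-in for subword tokens; the 300-word window and 50-word overlap are illustrative assumptions, not tuned values.

```python
def chunk_words(text: str, max_words: int = 300, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of whitespace-separated words.

    Hypothetical helper: word counts only approximate subword tokens, and
    300 words is chosen to stay comfortably under a 512-token limit.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

Each chunk can then be passed to `model.encode`, and a document scored by, for example, its best-matching chunk; that aggregation rule is likewise an assumption.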

Training

The model was trained using contrastive learning with hard negative mining on radiology educational content:

  • Training Objective: Multiple Negatives Ranking Loss with hard negatives
  • Batch Size: 32
  • Learning Rate: 2e-5 with warmup
  • Training Epochs: 4

Note: Training data sources are not disclosed because they carry varying license terms. The model itself is released under Apache 2.0.
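The training objective above can be illustrated with a minimal NumPy version of Multiple Negatives Ranking Loss in its triplet form: each query is scored against every positive and every hard negative in the batch, and cross-entropy pushes its own positive to the top. The scale of 20 and cosine similarity mirror the defaults of sentence-transformers' `MultipleNegativesRankingLoss`; whether this model was trained with exactly those settings is an assumption, and `mnrl_loss` is an illustrative function, not a library API.

```python
import numpy as np


def mnrl_loss(queries: np.ndarray, positives: np.ndarray,
              hard_negatives: np.ndarray, scale: float = 20.0) -> float:
    """Multiple Negatives Ranking Loss with one hard negative per query.

    All inputs have shape (batch, dim). Query i is scored against all
    positives and all hard negatives in the batch; positive i is its target.
    """
    def normalize(x: np.ndarray) -> np.ndarray:
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    q = normalize(queries)
    candidates = normalize(np.concatenate([positives, hard_negatives], axis=0))
    logits = scale * q @ candidates.T  # (batch, 2 * batch) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    targets = np.arange(len(queries))
    return float(-log_probs[targets, targets].mean())


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
# Positives close to their queries give a small loss; in-batch and hard
# negatives that score highly would raise it.
print(mnrl_loss(q, q + 0.01 * rng.normal(size=(4, 8)), rng.normal(size=(4, 8))))
```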

Performance

RadLIT-9 Benchmark (Bi-Encoder Only)

Performance when using this bi-encoder alone for retrieval:

| Metric | Score |
|---|---|
| MRR | 0.698 |
| nDCG@10 | 0.748 |
| Recall@10 | 91.4% |
| Recall@5 | 86.9% |
| Recall@1 | 56.7% |
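For reference, metrics like these are computed per query from a ranked list of document IDs and the set of relevant IDs, then averaged over queries. The sketch below assumes binary relevance and uses hypothetical document IDs; the exact RadLIT-9 evaluation protocol (for instance, how many documents count as relevant per query) is not described in this card.

```python
import math


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG: DCG of the top k divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2) for i, d in enumerate(ranked[:k]) if d in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal


def evaluate(runs: list[tuple[list[str], set[str]]], k: int = 10) -> dict[str, float]:
    """Average each metric over queries; runs holds (ranked_ids, relevant_ids)."""
    n = len(runs)
    return {
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
        f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        f"ndcg@{k}": sum(ndcg_at_k(r, rel, k) for r, rel in runs) / n,
    }


runs = [(["a", "b", "c"], {"a"}), (["x", "y", "z"], {"z"})]
print(evaluate(runs))
```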

Comparison with General-Purpose Models

On the RadLIT-9 benchmark (bi-encoder retrieval only, no reranking):

| Model | MRR | nDCG@10 | Recall@10 |
|---|---|---|---|
| GTE-large | 0.843 | 0.873 | 97.1% |
| E5-large-v2 | 0.813 | 0.850 | 96.9% |
| BGE-large | 0.792 | 0.836 | 97.4% |
| RadLIT-BiEncoder | 0.698 | 0.748 | 91.4% |

Important: On its own, the bi-encoder underperforms these general-purpose models (MRR 0.698 vs. 0.792 to 0.843). The value of RadLIT comes from the full pipeline with cross-encoder reranking (see below).

Full RadLITE Pipeline Performance

When combined with RadLIT-CrossEncoder and BM25 fusion:

| Configuration | MRR | Relative improvement vs. bi-encoder only |
|---|---|---|
| Bi-encoder only | 0.698 | baseline |
| + Cross-encoder reranking | 0.782 | +12.0% |
| + BM25 fusion (RadLITE) | 0.829 | +18.8% |
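The improvement figures are relative gains over the bi-encoder baseline, not absolute differences: 0.782 is 0.084 MRR above 0.698, which is a 12.0% relative increase. A quick check of both rows (`relative_gain` is just an illustrative helper):

```python
def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


baseline_mrr = 0.698  # bi-encoder only
for name, mrr in [("+ Cross-encoder reranking", 0.782), ("+ BM25 fusion (RadLITE)", 0.829)]:
    print(f"{name}: {relative_gain(mrr, baseline_mrr):+.1f}%")
# prints "+ Cross-encoder reranking: +12.0%" then "+ BM25 fusion (RadLITE): +18.8%"
```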

The full RadLITE pipeline achieves 0.829 MRR, close to the best general-purpose bi-encoder in the comparison above (GTE-large, 0.843 MRR without reranking), while being optimized for radiology.
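This card does not specify how RadLITE fuses BM25 with the neural rankings. Reciprocal rank fusion (RRF) is one widely used option, shown below with the commonly used constant k = 60. Treat it as an illustration of rank fusion in general, not as RadLITE's actual fusion method; `reciprocal_rank_fusion` and the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    a larger k dampens the influence of top ranks from any single list.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


dense_ranking = ["d1", "d2", "d3"]  # e.g. bi-encoder or cross-encoder order
bm25_ranking = ["d3", "d1", "d4"]   # e.g. BM25 order
print(reciprocal_rank_fusion([dense_ranking, bm25_ranking]))  # ['d1', 'd3', 'd2', 'd4']
```

RRF only uses ranks, so it sidesteps the problem that BM25 and neural scores live on different scales; score-level fusion would instead require normalizing them first.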

Subspecialty Performance (Bi-Encoder Only)

| Subspecialty | MRR | Recall@10 |
|---|---|---|
| Physics/Nuclear | 0.790 | 100% |
| Pediatric | 0.827 | 92% |
| Thoracic | 0.828 | 94% |
| Cardiac | 0.778 | 98% |
| Neuroradiology | 0.731 | 88% |
| Gastrointestinal | 0.626 | 98% |
| Breast | 0.592 | 90% |
| Musculoskeletal | 0.598 | 78% |
| Genitourinary | 0.470 | 84% |
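Per-subspecialty scores like these are typically obtained by grouping queries by subspecialty and averaging reciprocal ranks within each group. The card does not give per-group query counts; if some groups contain few queries, their MRR will be noisier than the overall figure. A minimal sketch, where `mrr_by_group` and the sample data are hypothetical:

```python
from collections import defaultdict
from typing import Optional


def mrr_by_group(results: list[tuple[str, Optional[int]]]) -> dict[str, float]:
    """Mean reciprocal rank per group.

    results holds one (subspecialty, rank) pair per query, where rank is the
    1-based position of the first relevant document, or None if it was missed.
    """
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for group, rank in results:
        totals[group] += 1.0 / rank if rank else 0.0
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


sample = [("Breast", 1), ("Breast", None), ("Cardiac", 2), ("Cardiac", 4)]
print(mrr_by_group(sample))  # {'Breast': 0.5, 'Cardiac': 0.375}
```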

Usage

Installation

```bash
pip install sentence-transformers
```

Basic Usage

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Load model
model = SentenceTransformer('matulichpt/radlit-biencoder')

# Encode queries and documents
queries = [
    "What are the imaging features of hepatocellular carcinoma on MRI?",
    "How do you differentiate glioblastoma from metastasis?"
]
documents = [
    "HCC typically shows arterial enhancement with washout on portal venous phase...",
    "GBM and metastases can be differentiated by their location and multiplicity..."
]

query_embeddings = model.encode(queries, convert_to_tensor=True)
doc_embeddings = model.encode(documents, convert_to_tensor=True)

# Compute similarity (rows: queries, columns: documents)
similarities = cos_sim(query_embeddings, doc_embeddings)
print(similarities)
```
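`cos_sim` returns the matrix of cosine similarities between every query and every document. The NumPy sketch below shows the same computation on toy vectors (not real model embeddings) and why L2-normalizing embeddings once, for example with `normalize_embeddings=True` in `model.encode`, lets a plain dot product stand in for cosine similarity. `cosine_similarity_matrix` is an illustrative helper.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    # After normalization, the dot product of two rows is their cosine.
    return a_unit @ b_unit.T  # (n, m); entry [i, j] compares query i with document j


queries = np.array([[1.0, 0.0], [1.0, 1.0]])
documents = np.array([[2.0, 0.0], [0.0, 3.0]])
sims = cosine_similarity_matrix(queries, documents)
print(sims)
```

Vector length does not matter: the document `[2.0, 0.0]` scores a perfect 1.0 against the query `[1.0, 0.0]` because only direction is compared.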

For Retrieval Pipeline

```python
from sentence_transformers import SentenceTransformer, util
import torch

model = SentenceTransformer('matulichpt/radlit-biencoder')

# Pre-encode your document corpus (replace with your own list of strings)
corpus = ["document 1...", "document 2..."]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True, show_progress_bar=True)

# At query time
query = "What are the CT findings in pulmonary embolism?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Find the top-k most similar documents (k cannot exceed the corpus size)
cos_scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
top_results = torch.topk(cos_scores, k=min(10, len(corpus)))

for score, idx in zip(top_results.values, top_results.indices):
    print(f"Score: {score.item():.4f} - {corpus[int(idx)][:100]}...")
```
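For larger corpora whose embeddings you cache on disk, one option is to store L2-normalized embeddings as a NumPy array (for example, from `model.encode(corpus, normalize_embeddings=True)`) and select the top k with `np.argpartition`, which avoids fully sorting every score. sentence-transformers' `util.semantic_search` covers the tensor case if you prefer to stay in PyTorch. The sketch below uses random vectors as stand-ins for real embeddings; `top_k_dot` is a hypothetical helper.

```python
import numpy as np


def top_k_dot(query_vec: np.ndarray, corpus_matrix: np.ndarray, k: int = 10):
    """Return (indices, scores) of the k highest dot-product scores, best first.

    Assumes both inputs are already L2-normalized, so dot product == cosine.
    """
    scores = corpus_matrix @ query_vec  # (n_docs,)
    k = min(k, len(scores))
    candidates = np.argpartition(-scores, k - 1)[:k]  # top k, in arbitrary order
    order = candidates[np.argsort(-scores[candidates])]  # sort only those k
    return order, scores[order]


rng = np.random.default_rng(0)
corpus_matrix = rng.normal(size=(1000, 768))  # stand-in for 768-dim embeddings
corpus_matrix /= np.linalg.norm(corpus_matrix, axis=1, keepdims=True)
query_vec = corpus_matrix[42]  # a "query" identical to document 42
indices, scores = top_k_dot(query_vec, corpus_matrix, k=5)
print(indices[0], round(float(scores[0]), 4))  # 42 1.0
```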

Recommended: Full RadLITE Pipeline

For best results, use RadLIT-BiEncoder as the first stage followed by RadLIT-CrossEncoder for reranking:

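```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Stage 1: Bi-encoder retrieval (fast, gets candidates)
biencoder = SentenceTransformer('matulichpt/radlit-biencoder')

# Stage 2: Cross-encoder reranking (slower, more accurate)
crossencoder = CrossEncoder('matulichpt/radlit-crossencoder')


def retrieve_with_biencoder(query, corpus, model, top_k=50):
    """Illustrative helper (not part of any library): return the top_k corpus
    documents by cosine similarity. For a real corpus, encode it once and
    reuse the embeddings instead of re-encoding on every query."""
    corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
    query_embedding = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=top_k)[0]
    return [corpus[hit['corpus_id']] for hit in hits]


# Your document corpus (replace with your own list of strings)
corpus = ["document 1...", "document 2..."]

# Retrieve candidates
query = "What are the MRI findings in anterior cruciate ligament tear?"
candidates = retrieve_with_biencoder(query, corpus, biencoder, top_k=50)

# Rerank with cross-encoder
pairs = [[query, doc] for doc in candidates]
scores = crossencoder.predict(pairs)

# Apply temperature calibration (recommended: T=1.5).
# Dividing by a positive constant does not change the order; it only matters
# if the scores are later turned into probabilities or combined with other scores.
calibrated_scores = scores / 1.5

# Sort by calibrated scores
reranked = sorted(zip(candidates, calibrated_scores), key=lambda x: x[1], reverse=True)
```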
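Dividing every score by the same positive temperature T leaves their order unchanged, so the sorted list above would be identical without calibration. Temperature matters when scores are turned into probabilities, for example with a softmax over the candidates, or compared against scores from another system. The NumPy sketch below illustrates this with made-up logits; T = 1.5 is the value recommended above, and treating `scores` as raw logits is an assumption, since depending on the model configuration `CrossEncoder.predict` may already apply an activation such as a sigmoid.

```python
import numpy as np


def softmax_with_temperature(logits: np.ndarray, temperature: float = 1.5) -> np.ndarray:
    """Softmax over candidate scores after dividing by a temperature (> 0)."""
    z = logits / temperature
    z = z - z.max()  # numerical stability; does not change the result
    exp = np.exp(z)
    return exp / exp.sum()


logits = np.array([4.0, 2.5, -1.0])  # made-up cross-encoder scores for three candidates
p_raw = softmax_with_temperature(logits, temperature=1.0)
p_cal = softmax_with_temperature(logits, temperature=1.5)

# Same ranking either way, but T > 1 flattens the distribution,
# giving the top candidate a less extreme probability.
print(p_raw.round(3), p_cal.round(3))
```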

Intended Use

Primary Use Cases

  • First-stage candidate retrieval for radiology content
  • Medical imaging literature search
  • Radiology question-answering systems (retrieval component)

Out-of-Scope Uses

  • General web search
  • Non-medical document retrieval
  • Clinical diagnosis (this is a retrieval model, not a diagnostic tool)

Limitations

  1. Standalone Performance: The bi-encoder alone underperforms general-purpose embedding models on RadLIT-9; use it with cross-encoder reranking for best results
  2. Domain Specificity: Optimized for radiology; may underperform on general content
  3. Language: English only
  4. Subspecialty Variance: Performance varies by subspecialty (MRR from 0.470 in genitourinary to 0.828 in thoracic)

Ethical Considerations

  • This model should not be used as a sole source for clinical decision-making
  • Retrieved documents should be reviewed by qualified medical professionals
  • The model may reflect biases present in radiology educational literature

Citation

```bibtex
@software{radlit_biencoder_2026,
  title = {RadLIT-BiEncoder: Domain-Specialized Embeddings for Radiology Retrieval},
  author = {Matulich, P.},
  year = {2026},
  url = {https://huggingface.co/matulichpt/radlit-biencoder},
  note = {MRR 0.698 standalone, 0.829 with RadLITE pipeline}
}
```

Related Models

License

Apache 2.0 - Free for research and commercial use.