language:
  - en
license: apache-2.0
library_name: sentence-transformers
tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
  - radiology
  - medical
  - retrieval
  - embedding
datasets:
  - custom
metrics:
  - mrr
  - recall
pipeline_tag: sentence-similarity
model-index:
  - name: radlit-biencoder
    results:
      - task:
          type: retrieval
          name: Radiology Document Retrieval
        dataset:
          type: custom
          name: RadLIT-9
          config: radlit9-v1.1-balanced
        metrics:
          - type: mrr
            value: 0.698
            name: MRR (bi-encoder only)
          - type: recall@10
            value: 0.914
            name: Recall@10
          - type: ndcg@10
            value: 0.748
            name: nDCG@10

RadLIT-BiEncoder: Radiology Document Retrieval

A domain-specialized bi-encoder model for radiology document retrieval, trained to understand medical imaging terminology and radiology-specific queries.

Model Description

RadLIT-BiEncoder generates dense embeddings optimized for radiology content retrieval. It serves as the first stage of the RadLITE pipeline, providing fast candidate retrieval before cross-encoder reranking.

Architecture

Base Model: RoBERTa-base architecture
Hidden Size: 768
Layers: 12
Attention Heads: 12
Parameters: ~125M
Max Sequence Length: 512 tokens
Embedding Dimension: 768

Training

The model was trained using contrastive learning with hard negative mining on radiology educational content:

Training Objective: Multiple Negatives Ranking Loss with hard negatives
Batch Size: 32
Learning Rate: 2e-5 with warmup
Training Epochs: 4
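
To make the training objective concrete, the sketch below computes a Multiple Negatives Ranking Loss value over toy embeddings in plain Python. For each query, its paired positive is the target, and every other positive plus every hard negative in the batch acts as a negative. The toy vectors, the function names, and the `scale` value of 20 are illustrative assumptions; this is not the author's training code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(queries, positives, hard_negatives, scale=20.0):
    """Mean cross-entropy where query i must pick positive i among all candidates."""
    # Candidates: every positive in the batch, followed by every hard negative
    candidates = positives + hard_negatives
    total = 0.0
    for i, q in enumerate(queries):
        logits = [scale * cosine(q, c) for c in candidates]
        # Log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # negative log-probability of the true positive
    return total / len(queries)

# Hypothetical 2-dimensional embeddings, for illustration only
queries = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
hard_negatives = [[0.7, 0.7], [0.6, 0.8]]

good = mnr_loss(queries, positives, hard_negatives)
bad = mnr_loss(queries, positives[::-1], hard_negatives)  # positives deliberately mismatched
print(f"aligned pairs: {good:.4f}, swapped pairs: {bad:.4f}")
```

The loss is small when each query sits closest to its own positive and grows sharply when the pairs are mismatched, which is the signal that pushes hard negatives away during training.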

Note: Training data sources are not disclosed because their licensing terms vary. The model itself is released under Apache 2.0.

Performance

RadLIT-9 Benchmark (Bi-Encoder Only)

Performance when using this bi-encoder alone for retrieval:

Metric	Score
MRR	0.698
nDCG@10	0.748
Recall@10	91.4%
Recall@5	86.9%
Recall@1	56.7%
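
For readers who want to reproduce these metrics on their own data, the following is a minimal sketch of MRR, Recall@k, and nDCG@k. It assumes each query has exactly one relevant document, which the card does not state about RadLIT-9; the function names and toy rankings are hypothetical.

```python
import math

def reciprocal_rank(ranked_ids, relevant_id):
    """1 / rank of the relevant document, or 0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids, relevant_id, k):
    """1 if the relevant document appears in the top k, else 0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def ndcg_at_k(ranked_ids, relevant_id, k):
    """With a single relevant document the ideal DCG is 1, so nDCG = 1 / log2(rank + 1)."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0

# Toy runs: (ranked document ids, id of the single relevant document)
runs = [
    (["d1", "d2", "d3"], "d1"),  # relevant document at rank 1
    (["d4", "d5", "d6"], "d5"),  # relevant document at rank 2
    (["d7", "d8", "d9"], "d0"),  # relevant document not retrieved
]

mrr = sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
recall_at_2 = sum(recall_at_k(r, rel, 2) for r, rel in runs) / len(runs)
ndcg_at_3 = sum(ndcg_at_k(r, rel, 3) for r, rel in runs) / len(runs)
print(f"MRR={mrr:.3f} Recall@2={recall_at_2:.3f} nDCG@3={ndcg_at_3:.3f}")
```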

Comparison with General-Purpose Models

On the RadLIT-9 benchmark (bi-encoder retrieval only, no reranking):

Model	MRR	nDCG@10	Recall@10
GTE-large	0.843	0.873	97.1%
E5-large-v2	0.813	0.850	96.9%
BGE-large	0.792	0.836	97.4%
RadLIT-BiEncoder	0.698	0.748	91.4%

Important: The bi-encoder alone underperforms all three general-purpose models, trailing them by 0.094 to 0.145 MRR. The value of RadLIT comes from the full pipeline with cross-encoder reranking (see below).

Full RadLITE Pipeline Performance

When combined with RadLIT-CrossEncoder and BM25 fusion:

Configuration	MRR	Relative MRR improvement
Bi-encoder only	0.698	baseline
+ Cross-encoder reranking	0.782	+12.0%
+ BM25 fusion (RadLITE)	0.829	+18.8%

The full RadLITE pipeline achieves 0.829 MRR, which falls between the standalone scores of E5-large-v2 (0.813) and GTE-large (0.843), while being optimized for radiology.
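
The card does not specify how BM25 results are fused with the dense and reranked scores. As one common option, and purely as an assumption, the sketch below applies reciprocal rank fusion (RRF) to two ranked lists. The function `rrf_fuse`, the constant `k=60`, and the document ids are illustrative and not taken from the RadLITE implementation.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the output is deterministic
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical rankings for a single query
dense_ranking = ["d3", "d1", "d2"]  # e.g. reranked dense candidates
bm25_ranking = ["d1", "d4", "d3"]   # e.g. results from a BM25 index

fused = rrf_fuse([dense_ranking, bm25_ranking])
print(fused)
```

RRF works on ranks rather than raw scores, so it sidesteps the problem that BM25 and neural scores live on different scales. A weighted sum of normalized scores is another plausible design.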

Subspecialty Performance (Bi-Encoder Only)

Subspecialty	MRR	Recall@10
Physics/Nuclear	0.790	100%
Pediatric	0.827	92%
Thoracic	0.828	94%
Cardiac	0.778	98%
Neuroradiology	0.731	88%
Gastrointestinal	0.626	98%
Breast	0.592	90%
Musculoskeletal	0.598	78%
Genitourinary	0.470	84%
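
As a quick sanity check, the per-subspecialty MRR values in the table can be macro-averaged and compared with the overall figure. The sketch makes no assumption about how many queries each subspecialty contributes, which the card does not state, so a small gap from the overall 0.698 is to be expected.

```python
# Per-subspecialty MRR values copied from the table above
subspecialty_mrr = {
    "Physics/Nuclear": 0.790,
    "Pediatric": 0.827,
    "Thoracic": 0.828,
    "Cardiac": 0.778,
    "Neuroradiology": 0.731,
    "Gastrointestinal": 0.626,
    "Breast": 0.592,
    "Musculoskeletal": 0.598,
    "Genitourinary": 0.470,
}

# Unweighted (macro) average across the nine subspecialties
macro_mrr = sum(subspecialty_mrr.values()) / len(subspecialty_mrr)

# Identify the strongest and weakest subspecialties by MRR
best = max(subspecialty_mrr, key=subspecialty_mrr.get)
worst = min(subspecialty_mrr, key=subspecialty_mrr.get)

print(f"macro-average MRR: {macro_mrr:.3f}")
print(f"best: {best}, worst: {worst}")
```

The macro average comes to roughly 0.693, close to the reported overall MRR of 0.698.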

Usage

Installation

pip install sentence-transformers

Basic Usage

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Load model
model = SentenceTransformer('matulichpt/radlit-biencoder')

# Encode queries and documents
queries = [
    "What are the imaging features of hepatocellular carcinoma on MRI?",
    "How do you differentiate glioblastoma from metastasis?"
]
documents = [
    "HCC typically shows arterial enhancement with washout on portal venous phase...",
    "GBM and metastases can be differentiated by their location and multiplicity..."
]

query_embeddings = model.encode(queries, convert_to_tensor=True)
doc_embeddings = model.encode(documents, convert_to_tensor=True)

# Compute the query-by-document cosine similarity matrix (2 x 2 here)
similarities = cos_sim(query_embeddings, doc_embeddings)
print(similarities)

For Retrieval Pipeline

from sentence_transformers import SentenceTransformer, util
import torch

model = SentenceTransformer('matulichpt/radlit-biencoder')

# Pre-encode your document corpus (replace these placeholders with your own documents)
corpus = ["document 1...", "document 2..."]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True, show_progress_bar=True)

# At query time
query = "What are the CT findings in pulmonary embolism?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Find top-k similar documents (k must not exceed the corpus size)
cos_scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
top_results = torch.topk(cos_scores, k=min(10, len(corpus)))

for score, idx in zip(top_results.values, top_results.indices):
    print(f"Score: {score:.4f} - {corpus[int(idx)][:100]}...")

Recommended: Full RadLITE Pipeline

For best results, use RadLIT-BiEncoder as the first stage followed by RadLIT-CrossEncoder for reranking:

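from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Stage 1: Bi-encoder retrieval (fast, gets candidates)
biencoder = SentenceTransformer('matulichpt/radlit-biencoder')

# Stage 2: Cross-encoder reranking (slower, more accurate)
crossencoder = CrossEncoder('matulichpt/radlit-crossencoder')

def retrieve_with_biencoder(query, corpus, biencoder, top_k=50):
    # Illustrative helper (not part of sentence-transformers): returns the top_k
    # documents by cosine similarity. In practice, encode the corpus once and cache it.
    corpus_embeddings = biencoder.encode(corpus, convert_to_tensor=True)
    query_embedding = biencoder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=top_k)[0]
    return [corpus[hit['corpus_id']] for hit in hits]

# Placeholder corpus; replace with your own documents
corpus = ["document 1...", "document 2..."]

# Retrieve candidates
query = "What are the MRI findings in anterior cruciate ligament tear?"
candidates = retrieve_with_biencoder(query, corpus, biencoder, top_k=50)

# Rerank with cross-encoder
pairs = [[query, doc] for doc in candidates]
scores = crossencoder.predict(pairs)

# Apply temperature calibration (recommended: T=1.5)
# Dividing by T rescales the scores without changing their order; the rescaling
# matters when scores are thresholded or fused with other signals.
calibrated_scores = scores / 1.5

# Sort by calibrated scores
reranked = sorted(zip(candidates, calibrated_scores), key=lambda x: x[1], reverse=True)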
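
The effect of the temperature step can be illustrated without loading a model. The sketch below assumes the cross-encoder returns raw logits and that calibrated scores are converted to probabilities with a sigmoid; neither detail is stated in this card, and the logit values are made up.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def calibrate(logits, temperature=1.5):
    """Divide logits by the temperature, then map them to (0, 1) with a sigmoid."""
    return [sigmoid(z / temperature) for z in logits]

def ranking(values):
    """Indices sorted from highest to lowest value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)

raw_logits = [4.0, 1.0, -2.0]  # hypothetical cross-encoder outputs
uncalibrated = [sigmoid(z) for z in raw_logits]
calibrated = calibrate(raw_logits, temperature=1.5)

print("uncalibrated:", [round(p, 3) for p in uncalibrated])
print("calibrated:  ", [round(p, 3) for p in calibrated])
print("same ranking:", ranking(uncalibrated) == ranking(calibrated))
```

With T greater than 1, confident scores are pulled toward 0.5 while the ordering stays the same, so the calibration only changes results once the scores are compared against a threshold or combined with another signal such as BM25.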

Intended Use

Primary Use Cases

First-stage candidate retrieval for radiology content
Medical imaging literature search
Radiology question-answering systems (retrieval component)

Out-of-Scope Uses

General web search
Non-medical document retrieval
Clinical diagnosis (this is a retrieval model, not a diagnostic tool)

Limitations

Standalone Performance: The bi-encoder alone underperforms general-purpose models; use it with cross-encoder reranking for best results
Domain Specificity: Optimized for radiology; may underperform on general content
Language: English only
Subspecialty Variance: MRR ranges from 0.470 (Genitourinary) to 0.828 (Thoracic) across subspecialties

Ethical Considerations

This model should not be used as a sole source for clinical decision-making
Retrieved documents should be reviewed by qualified medical professionals
The model may reflect biases present in radiology educational literature

Citation

@software{radlit_biencoder_2026,
  title = {RadLIT-BiEncoder: Domain-Specialized Embeddings for Radiology Retrieval},
  author = {Matulich, P.},
  year = {2026},
  url = {https://huggingface.co/matulichpt/radlit-biencoder},
  note = {MRR 0.698 standalone, 0.829 with RadLITE pipeline}
}

Related Models

RadLIT-CrossEncoder - Second-stage reranking
RadLIT-ColBERT - Late interaction model

License

Apache 2.0 - Free for research and commercial use.