Initial model upload with benchmarks

Files changed (7)

README.md +288 -0
config.json +32 -0
model.safetensors +3 -0
special_tokens_map.json +37 -0
tokenizer.json +0 -0
tokenizer_config.json +58 -0
vocab.txt +0 -0

README.md ADDED

	@@ -0,0 +1,288 @@

+---
+language:
+- en
+license: apache-2.0
+library_name: sentence-transformers
+tags:
+- sentence-transformers
+- cross-encoder
+- text-classification
+- radiology
+- medical
+- reranking
+datasets:
+- custom
+metrics:
+- mrr
+- recall
+pipeline_tag: text-classification
+model-index:
+- name: radlit-crossencoder
+  results:
+  - task:
+      type: reranking
+      name: Radiology Document Reranking
+    dataset:
+      type: custom
+      name: RadLIT-9
+      config: radlit9-v1.1-balanced
+    metrics:
+    - type: mrr
+      value: 0.829
+      name: MRR (with bi-encoder)
+    - type: mrr_improvement
+      value: 0.30
+      name: MRR Improvement on Complex Queries
+---
+# RadLIT-CrossEncoder: Radiology Reranking Model
+A cross-encoder model fine-tuned to rerank radiology document retrieval results. It is designed as the second stage of the RadLITE pipeline, where it yields the largest gains on complex clinical queries (up to +30.3% on board exam questions).
+## Model Description
+RadLIT-CrossEncoder takes a query-document pair and outputs a relevance score. Unlike bi-encoders that encode queries and documents separately, cross-encoders process them jointly, enabling more nuanced relevance judgments at the cost of higher latency.
+### Architecture
+- **Base Model**: BERT architecture (medical-initialized)
+- **Hidden Size**: 384
+- **Layers**: 12
+- **Attention Heads**: 12
+- **Parameters**: ~33M (optimized for inference speed)
+- **Max Sequence Length**: 512 tokens
+- **Output**: Single relevance score (regression)
+### Training
+The model was fine-tuned on radiology query-document pairs with relevance labels:
+- **Training Objective**: Binary Cross-Entropy with soft labels
+- **Training Data**: Expert-labeled query-document pairs from radiology education
+- **Hard Negatives**: Mined from bi-encoder retrieval failures
+- **Batch Size**: 16
+- **Learning Rate**: 2e-5
+- **Epochs**: 3
+**Note**: Training data sources are not disclosed due to variable licensing. The model is released under Apache 2.0.
+## Performance
+### Impact on RadLITE Pipeline
+When combined with RadLIT-BiEncoder:
+| Configuration | MRR | Relative improvement |
+|---------------|-----|----------------------|
+| Bi-encoder only | 0.703 | baseline |
+| + Cross-encoder reranking | 0.741 | +5.4% |
+| + Calibrated fusion (RadLITE) | **0.829** | **+17.9%** |
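The improvement column above is consistent with relative gain over the bi-encoder baseline. A minimal sketch of that arithmetic, together with a plain MRR computation, is shown here. The function names and the toy ranks are illustrative assumptions, not part of the RadLIT-9 evaluation code.

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """MRR over queries.

    Each entry is the 1-based rank of the first relevant document for a
    query, or None if no relevant document was retrieved.
    """
    reciprocal = [0.0 if r is None else 1.0 / r for r in first_relevant_ranks]
    return sum(reciprocal) / len(reciprocal)


def relative_improvement(baseline, new):
    """Relative gain of `new` over `baseline` (0.054 means +5.4%)."""
    return (new - baseline) / baseline


# Toy ranks, made up for illustration (not RadLIT-9 data)
toy_mrr = mean_reciprocal_rank([1, 2, None, 4])

# MRR values taken from the table above
ce_gain = relative_improvement(0.703, 0.741)
fusion_gain = relative_improvement(0.703, 0.829)
print(f"{ce_gain:.1%}, {fusion_gain:.1%}")
```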
+### Performance on Complex Queries
+The cross-encoder shows the largest improvements on complex clinical reasoning queries:
+| Query Type | Improvement |
+|------------|-------------|
+| Board exam questions | **+30.3%** |
+| Differential diagnosis | +22.5% |
+| Staging/classification | +18.0% |
+| Simple factual | +5.0% |
+### Subspecialty Impact
+The greatest improvements are in subspecialties requiring clinical reasoning:
+| Subspecialty | Improvement with CE |
+|--------------|---------------------|
+| Physics | +33.9% |
+| Genitourinary | +20.1% |
+| Neuroradiology | +18.0% |
+| Gastrointestinal | +16.6% |
+## Usage
+### Installation
+```bash
+pip install sentence-transformers
+```
+### Basic Usage
+```python
+from sentence_transformers import CrossEncoder
+# Load model
+model = CrossEncoder('matulichpt/radlit-crossencoder')
+# Score query-document pairs
+pairs = [
+    ["What are the CT findings in pulmonary embolism?",
+     "CT pulmonary angiography shows filling defects in the pulmonary arteries..."],
+    ["What are the CT findings in pulmonary embolism?",
+     "MRI of the knee shows ACL tear with bone bruise pattern..."]
+]
+scores = model.predict(pairs)
+print(scores)  # Raw, unbounded scores (Identity activation); higher score = more relevant
+```
+### Reranking Pipeline
+```python
+import numpy as np
+import torch
+from sentence_transformers import SentenceTransformer, CrossEncoder, util
+# Load models
+biencoder = SentenceTransformer('matulichpt/radlit-biencoder')
+crossencoder = CrossEncoder('matulichpt/radlit-crossencoder')
+def retrieve_and_rerank(query, corpus, corpus_embeddings, top_k=10, rerank_k=50):
+    # Stage 1: Bi-encoder retrieval
+    query_embedding = biencoder.encode(query, convert_to_tensor=True)
+    cos_scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
+    # Guard against corpora smaller than rerank_k
+    top_indices = torch.topk(cos_scores, k=min(rerank_k, len(corpus)))[1].tolist()
+    # Stage 2: Cross-encoder reranking
+    candidates = [corpus[i] for i in top_indices]
+    pairs = [[query, doc] for doc in candidates]
+    ce_scores = crossencoder.predict(pairs)
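+    # Apply temperature calibration (T=1.5). Dividing by a constant does not change
+    # the ordering here; it matters when these scores are fused with other scores.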
+    calibrated_scores = ce_scores / 1.5
+    # Sort and return top-k
+    sorted_indices = np.argsort(calibrated_scores)[::-1][:top_k]
+    return [(candidates[i], calibrated_scores[i]) for i in sorted_indices]
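+# Example (assumes `corpus` is a list of document strings and
+# `corpus_embeddings = biencoder.encode(corpus, convert_to_tensor=True)`)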
+results = retrieve_and_rerank(
+    "What are the imaging features of hepatocellular carcinoma?",
+    corpus, corpus_embeddings
+)
+```
+### Temperature Calibration
+**Important**: For optimal performance in score fusion, apply temperature scaling:
+```python
+# Raw CE scores have higher variance than bi-encoder scores
+raw_scores = crossencoder.predict(pairs)
+# Temperature calibration aligns score distributions
+# T=1.5 found optimal through grid search
+calibrated_scores = raw_scores / 1.5
+```
+This is critical when combining cross-encoder scores with bi-encoder scores.
+### Full RadLITE Fusion
+```python
+from sentence_transformers import util
+def radlite_score(query, document, biencoder, crossencoder, bm25_score):
+    """
+    Full RadLITE scoring with optimal weights.
+    Optimal weights (found via grid search on RadLIT-9):
+    - Bi-encoder: 0.5
+    - Cross-encoder: 0.2
+    - BM25: 0.3
+    """
+    # Bi-encoder score
+    q_emb = biencoder.encode(query, convert_to_tensor=True)
+    d_emb = biencoder.encode(document, convert_to_tensor=True)
+    biencoder_score = float(util.cos_sim(q_emb, d_emb)[0][0])
+    # Cross-encoder score (calibrated)
+    ce_score = crossencoder.predict([[query, document]])[0] / 1.5
+    # Fusion
+    final_score = (
+        0.5 * biencoder_score +
+        0.2 * ce_score +
+        0.3 * bm25_score  # Normalized BM25
+    )
+    return final_score
+```
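The fusion above expects a normalized BM25 score, but the normalization method is not stated. The sketch below assumes min-max normalization over the candidate list and fuses precomputed scores with the stated weights. `min_max_normalize` and `fuse` are hypothetical helper names, and the input scores are made up for illustration.

```python
def min_max_normalize(scores):
    """Scale scores to [0, 1] over the candidate list (assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All candidates tie; no lexical signal to contribute
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(bi_scores, ce_scores, bm25_scores, temperature=1.5,
         weights=(0.5, 0.2, 0.3)):
    """Weighted fusion of per-candidate scores using the weights stated above."""
    w_bi, w_ce, w_bm25 = weights
    bm25_norm = min_max_normalize(bm25_scores)
    return [
        w_bi * bi + w_ce * (ce / temperature) + w_bm25 * bm
        for bi, ce, bm in zip(bi_scores, ce_scores, bm25_norm)
    ]


# Two candidates with illustrative scores
fused = fuse(bi_scores=[0.8, 0.6], ce_scores=[3.0, -1.5], bm25_scores=[12.0, 4.0])
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

Scoring whole candidate lists at once also avoids re-encoding the query for every document, which the per-pair function above does.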
+## Technical Details
+### Why Temperature Calibration?
+Cross-encoder scores tend to be more extreme than bi-encoder similarity scores:
+| Score Type | Typical Range | Variance |
+|------------|---------------|----------|
+| Bi-encoder cosine | [0.3, 0.9] | Low |
+| Raw CE score | [-2, 3] | High |
+| Calibrated CE (T=1.5) | [-1.3, 2] | Medium |
+Without calibration, the CE dominates the fusion and degrades overall performance. Temperature 1.5 achieves ~0.7 correlation between score distributions.
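As a small worked example of the table above: dividing by T = 1.5 maps the raw range [-2, 3] to roughly [-1.33, 2], shrinks the variance by a factor of T squared, and leaves the ordering of candidates unchanged. The scores below are illustrative, and `apply_temperature` is a hypothetical helper name.

```python
from statistics import pvariance


def apply_temperature(scores, temperature=1.5):
    """Divide raw cross-encoder scores by a temperature T > 0."""
    return [s / temperature for s in scores]


raw = [-2.0, 0.5, 3.0]  # illustrative raw scores spanning the typical range
calibrated = apply_temperature(raw)

# Variance shrinks by T**2 (2.25 for T=1.5)
variance_ratio = pvariance(raw) / pvariance(calibrated)

# Ordering of candidates is unchanged by a positive temperature
raw_order = sorted(range(len(raw)), key=raw.__getitem__)
calibrated_order = sorted(range(len(calibrated)), key=calibrated.__getitem__)
same_order = raw_order == calibrated_order
```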
+### Latency Considerations
+| Operation | Latency |
+|-----------|---------|
+| Single pair scoring | ~4ms |
+| 50 pairs (batch) | ~200-300ms |
+| Bi-encoder (50 docs) | ~80-120ms |
+For production use, consider:
+- Limiting rerank candidates (50 is optimal)
+- Batch processing
+- GPU acceleration
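A rough way to budget the number of rerank candidates is sketched below. It assumes reranking cost scales linearly with the number of pairs (4 to 6 ms per pair, derived from the 50-pair figure above) and treats the bi-encoder figure as a fixed first-stage cost; both are simplifying assumptions, and `estimate_pipeline_latency_ms` is a hypothetical helper.

```python
def estimate_pipeline_latency_ms(rerank_k,
                                 per_pair_ms=(4.0, 6.0),
                                 first_stage_ms=(80.0, 120.0)):
    """Return a (low, high) latency estimate in milliseconds.

    Assumes cross-encoder cost grows linearly with rerank_k and the
    first-stage cost is independent of rerank_k.
    """
    low = first_stage_ms[0] + rerank_k * per_pair_ms[0]
    high = first_stage_ms[1] + rerank_k * per_pair_ms[1]
    return low, high


low_ms, high_ms = estimate_pipeline_latency_ms(50)
print(f"rerank_k=50: {low_ms:.0f}-{high_ms:.0f} ms")
```

Measured latency depends on hardware, batch size, and sequence length, so these numbers are only a starting point for profiling.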
+## Intended Use
+### Primary Use Cases
+- Second-stage reranking for radiology retrieval
+- Relevance scoring for radiology Q&A
+- Fine-grained document ranking
+### Out-of-Scope Uses
+- First-stage retrieval (too slow for large corpora)
+- Non-radiology content
+- Clinical diagnosis
+## Limitations
+1. **Latency**: ~4ms per pair; not suitable for first-stage retrieval
+2. **Domain**: Optimized for radiology; limited generalization
+3. **Context Length**: 512 tokens max; long documents need truncation
+4. **Score Interpretation**: Requires calibration for fusion
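For the context-length limitation, one common alternative to plain truncation is to split a long document into overlapping windows, score each window, and keep the maximum. The sketch below is not part of the RadLITE pipeline. It splits on whitespace rather than tokenizer tokens, and `score_fn` is a hypothetical callable that takes a list of `[query, passage]` pairs and returns one score per pair (for example, a wrapper around `CrossEncoder.predict`).

```python
def split_into_windows(words, window=200, stride=100):
    """Split a list of words into overlapping windows."""
    if len(words) <= window:
        return [words]
    windows = []
    start = 0
    while True:
        windows.append(words[start:start + window])
        if start + window >= len(words):
            break
        start += stride
    return windows


def score_long_document(query, document, score_fn, window=200, stride=100):
    """Score each window against the query and return the best score."""
    words = document.split()
    passages = [" ".join(w) for w in split_into_windows(words, window, stride)]
    scores = score_fn([[query, passage] for passage in passages])
    return max(scores)


# Toy scorer for illustration: counts query words present in the passage
def toy_score_fn(pairs):
    return [sum(word in passage.split() for word in query.split())
            for query, passage in pairs]


doc = "a b c d e f g h liver lesion"
best = score_long_document("liver lesion", doc, toy_score_fn, window=4, stride=2)
```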
+## Ethical Considerations
+- Not a diagnostic tool
+- Should be used to surface relevant educational content, not replace clinical judgment
+- May reflect biases in radiology literature
+## Citation
+```bibtex
+@software{radlit_crossencoder_2026,
+  title = {RadLIT-CrossEncoder: Radiology Reranking Model},
+  author = {Grai Team},
+  year = {2026},
+  url = {https://huggingface.co/matulichpt/radlit-crossencoder},
+  note = {+30% improvement on complex radiology queries}
+}
+```
+## Related Models
+- [RadLIT-BiEncoder](https://huggingface.co/matulichpt/radlit-biencoder) - First-stage retrieval
+- RadLITE Pipeline - Full retrieval system documentation
+## License
+Apache 2.0 - Free for research and commercial use.
+## Contact
+For questions or collaboration: Open an issue on the model repository

config.json ADDED

	@@ -0,0 +1,32 @@

+{
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "dtype": "float32",
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 384,
+  "id2label": {
+    "0": "LABEL_0"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 1536,
+  "label2id": {
+    "LABEL_0": 0
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "sbert_ce_default_activation_function": "torch.nn.modules.linear.Identity",
+  "transformers_version": "4.56.0",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}
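As a sanity check on the "~33M" figure in the README, the parameter count can be estimated from the config values above. This is a sketch assuming the standard BERT layout (embeddings, encoder layers, pooler, single-output classifier) and float32 weights; `bert_param_count` is a hypothetical helper, not a library function.

```python
def bert_param_count(vocab_size, hidden, layers, intermediate,
                     max_positions, type_vocab, num_labels):
    """Count weights and biases for a standard BERT sequence classifier."""
    layer_norm = 2 * hidden  # scale + bias
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    attention = 4 * (hidden * hidden + hidden) + layer_norm  # Q, K, V, output
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden) + layer_norm)
    pooler = hidden * hidden + hidden
    classifier = hidden * num_labels + num_labels
    return embeddings + layers * (attention + feed_forward) + pooler + classifier


# Values from config.json above
total = bert_param_count(vocab_size=30522, hidden=384, layers=12,
                         intermediate=1536, max_positions=512,
                         type_vocab=2, num_labels=1)
approx_bytes = total * 4  # float32
print(total, approx_bytes)
```

The estimate of about 33.4M parameters (roughly 133 MB in float32) is close to the `model.safetensors` size listed in this commit; the small remainder would be file metadata.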

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3dfc8832e0d99ed4c39d357bd5be9ea2552eab7107daa09b30db39a43f741a73
+size 133464836

special_tokens_map.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,58 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_basic_tokenize": true,
+  "do_lower_case": true,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "never_split": null,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}

vocab.txt ADDED

The diff for this file is too large to render. See raw diff