matulichpt committed on
Commit 88e135d · verified · 1 Parent(s): 812c876

Fix metrics: show bi-encoder standalone performance (0.698 MRR), not full pipeline

Files changed (1)
  1. README.md +257 -239
README.md CHANGED
@@ -1,239 +1,257 @@
1
- ---
2
- language:
3
- - en
4
- license: apache-2.0
5
- library_name: sentence-transformers
6
- tags:
7
- - sentence-transformers
8
- - feature-extraction
9
- - sentence-similarity
10
- - radiology
11
- - medical
12
- - retrieval
13
- - embedding
14
- datasets:
15
- - custom
16
- pipeline_tag: sentence-similarity
17
- model-index:
18
- - name: radlit-biencoder
19
-   results:
20
-   - task:
21
-       type: retrieval
22
-       name: Radiology Document Retrieval
23
-     dataset:
24
-       type: custom
25
-       name: RadLIT-9
26
-       config: radlit9-v1.1-balanced
27
-     metrics:
28
-     - type: mrr
29
-       value: 0.829
30
-       name: MRR
31
-     - type: recall@10
32
-       value: 0.971
33
-       name: Recall@10
34
-     - type: ndcg@10
35
-       value: 0.863
36
-       name: nDCG@10
37
- ---
38
-
39
- # RadLIT-BiEncoder: Radiology Late Interaction Transformer
40
-
41
- A domain-specialized bi-encoder model for radiology document retrieval, trained to understand medical imaging terminology, clinical reasoning patterns, and radiology-specific queries.
42
-
43
- ## Model Description
44
-
45
- RadLIT-BiEncoder is the first stage of the RadLITE retrieval pipeline. It generates dense embeddings optimized for radiology content retrieval, significantly outperforming general-purpose embedding models on radiology-specific queries.
46
-
47
- ### Architecture
48
-
49
- - **Base Model**: RoBERTa-base architecture
50
- - **Hidden Size**: 768
51
- - **Layers**: 12
52
- - **Attention Heads**: 12
53
- - **Parameters**: ~125M
54
- - **Max Sequence Length**: 512 tokens
55
- - **Embedding Dimension**: 768
56
-
57
- ### Training
58
-
59
- The model was trained using contrastive learning with hard negative mining on a large corpus of radiology educational content. Training details:
60
-
61
- - **Training Objective**: Multiple Negatives Ranking Loss with hard negatives
62
- - **Batch Size**: 32
63
- - **Learning Rate**: 2e-5 with warmup
64
- - **Training Epochs**: 4
65
- - **Hard Negatives**: Mined from top-k retrieval failures
66
-
67
- **Note**: Training data consisted of radiology educational materials. Specific sources are not disclosed due to variable licensing, but the model is released under Apache 2.0 for research and commercial use.
68
-
69
- ## Performance
70
-
71
- ### RadLIT-9 Benchmark
72
-
73
- RadLIT-9 is a comprehensive radiology retrieval benchmark covering 9 subspecialties:
74
-
75
- | Metric | Score |
76
- |--------|-------|
77
- | **MRR** | 0.829 |
78
- | **nDCG@10** | 0.863 |
79
- | **Recall@10** | 97.1% |
80
- | **Recall@5** | 93.8% |
81
- | **Recall@1** | 74.3% |
82
-
83
- ### Subspecialty Performance
84
-
85
- | Subspecialty | MRR | Recall@10 |
86
- |--------------|-----|-----------|
87
- | Physics/Nuclear | 0.936 | 100% |
88
- | Pediatric | 0.931 | 100% |
89
- | Thoracic | 0.913 | 98% |
90
- | Cardiac | 0.862 | 98% |
91
- | Neuroradiology | 0.860 | 98% |
92
- | Gastrointestinal | 0.800 | 96% |
93
- | Breast | 0.722 | 93% |
94
- | Musculoskeletal | 0.695 | 89% |
95
- | Genitourinary | 0.694 | 100% |
96
-
97
- ### Comparison with Baselines
98
-
99
- | Model | MRR | vs RadLIT |
100
- |-------|-----|-----------|
101
- | **RadLIT-BiEncoder** | **0.829** | -- |
102
- | ColBERT-v2 | 0.750 | -9.5% |
103
- | General bi-encoder | 0.703 | -15.2% |
104
- | BM25 | ~0.55 | -33.6% |
105
-
106
- ## Usage
107
-
108
- ### Installation
109
-
110
- ```bash
111
- pip install sentence-transformers
112
- ```
113
-
114
- ### Basic Usage
115
-
116
- ```python
117
- from sentence_transformers import SentenceTransformer
118
-
119
- # Load model
120
- model = SentenceTransformer('matulichpt/radlit-biencoder')
121
-
122
- # Encode queries and documents
123
- queries = [
124
-     "What are the imaging features of hepatocellular carcinoma on MRI?",
125
-     "How do you differentiate glioblastoma from metastasis?"
126
- ]
127
- documents = [
128
-     "HCC typically shows arterial enhancement with washout on portal venous phase...",
129
-     "GBM and metastases can be differentiated by their location and multiplicity..."
130
- ]
131
-
132
- query_embeddings = model.encode(queries, convert_to_tensor=True)
133
- doc_embeddings = model.encode(documents, convert_to_tensor=True)
134
-
135
- # Compute similarity
136
- from sentence_transformers.util import cos_sim
137
- similarities = cos_sim(query_embeddings, doc_embeddings)
138
- print(similarities)
139
- ```
140
-
141
- ### For Retrieval Pipeline
142
-
143
- ```python
144
- from sentence_transformers import SentenceTransformer, util
145
- import torch
146
-
147
- model = SentenceTransformer('matulichpt/radlit-biencoder')
148
-
149
- # Pre-encode your document corpus
150
- corpus = ["document 1...", "document 2..."]  # extend with your own documents
151
- corpus_embeddings = model.encode(corpus, convert_to_tensor=True, show_progress_bar=True)
152
-
153
- # At query time
154
- query = "What are the CT findings in pulmonary embolism?"
155
- query_embedding = model.encode(query, convert_to_tensor=True)
156
-
157
- # Find top-k similar documents
158
- cos_scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
159
- top_results = torch.topk(cos_scores, k=min(10, len(corpus)))  # k must not exceed corpus size
160
-
161
- for score, idx in zip(top_results[0], top_results[1]):
162
-     print(f"Score: {score:.4f} - {corpus[idx][:100]}...")
163
- ```
164
-
165
- ## Recommended: Full RadLITE Pipeline
166
-
167
- For best results, use RadLIT-BiEncoder as the first stage followed by RadLIT-CrossEncoder for reranking:
168
-
169
- ```python
170
- from sentence_transformers import SentenceTransformer, CrossEncoder
171
-
172
- # Stage 1: Bi-encoder retrieval
173
- biencoder = SentenceTransformer('grai-rad/radlit-biencoder')
174
-
175
- # Stage 2: Cross-encoder reranking
176
- crossencoder = CrossEncoder('matulichpt/radlit-crossencoder')
177
-
178
- # Retrieve candidates
179
- query = "What are the MRI findings in anterior cruciate ligament tear?"
180
- candidates = retrieve_with_biencoder(query, corpus, biencoder, top_k=50)
181
-
182
- # Rerank with cross-encoder
183
- pairs = [[query, doc] for doc in candidates]
184
- scores = crossencoder.predict(pairs)
185
-
186
- # Apply temperature calibration (recommended: T=1.5)
187
- calibrated_scores = scores / 1.5
188
-
189
- # Sort by calibrated scores
190
- reranked = sorted(zip(candidates, calibrated_scores), key=lambda x: x[1], reverse=True)
191
- ```
192
-
193
- ## Intended Use
194
-
195
- ### Primary Use Cases
196
-
197
- - Radiology educational content retrieval
198
- - Medical imaging literature search
199
- - Clinical decision support (retrieval component)
200
- - Radiology question-answering systems
201
-
202
- ### Out-of-Scope Uses
203
-
204
- - General web search
205
- - Non-medical document retrieval
206
- - Clinical diagnosis (this is a retrieval model, not a diagnostic tool)
207
-
208
- ## Limitations
209
-
210
- 1. **Domain Specificity**: Optimized for radiology; may underperform on general medical or non-medical content
211
- 2. **Language**: English only
212
- 3. **Subspecialty Variance**: Performance varies by subspecialty (0.69-0.94 MRR range)
213
- 4. **Not a Diagnostic Tool**: This model retrieves relevant documents; it does not provide medical diagnoses
214
-
215
- ## Ethical Considerations
216
-
217
- - This model should not be used as a sole source for clinical decision-making
218
- - Retrieved documents should be reviewed by qualified medical professionals
219
- - The model may reflect biases present in radiology educational literature
220
-
221
- ## Citation
222
-
223
- ```bibtex
224
- @software{radlit_biencoder_2026,
225
- title = {RadLIT-BiEncoder: Domain-Specialized Embeddings for Radiology Retrieval},
226
- author = {Grai Team},
227
- year = {2026},
228
- url = {https://huggingface.co/matulichpt/radlit-biencoder},
229
- note = {MRR 0.829 on RadLIT-9 benchmark}
230
- }
231
- ```
232
-
233
- ## License
234
-
235
- Apache 2.0 - Free for research and commercial use.
236
-
237
- ## Contact
238
-
239
- For questions or collaboration: Open an issue on the model repository
1
+ ---
2
+ language:
3
+ - en
4
+ license: apache-2.0
5
+ library_name: sentence-transformers
6
+ tags:
7
+ - sentence-transformers
8
+ - feature-extraction
9
+ - sentence-similarity
10
+ - radiology
11
+ - medical
12
+ - retrieval
13
+ - embedding
14
+ datasets:
15
+ - custom
16
+ metrics:
17
+ - mrr
18
+ - recall
19
+ pipeline_tag: sentence-similarity
20
+ model-index:
21
+ - name: radlit-biencoder
22
+   results:
23
+   - task:
24
+       type: retrieval
25
+       name: Radiology Document Retrieval
26
+     dataset:
27
+       type: custom
28
+       name: RadLIT-9
29
+       config: radlit9-v1.1-balanced
30
+     metrics:
31
+     - type: mrr
32
+       value: 0.698
33
+       name: MRR (bi-encoder only)
34
+     - type: recall@10
35
+       value: 0.914
36
+       name: Recall@10
37
+     - type: ndcg@10
38
+       value: 0.748
39
+       name: nDCG@10
40
+ ---
41
+
42
+ # RadLIT-BiEncoder: Radiology Document Retrieval
43
+
44
+ A domain-specialized bi-encoder model for radiology document retrieval, trained to understand medical imaging terminology and radiology-specific queries.
45
+
46
+ ## Model Description
47
+
48
+ RadLIT-BiEncoder generates dense embeddings optimized for radiology content retrieval. It serves as the first stage of the RadLITE pipeline, providing fast candidate retrieval before cross-encoder reranking.
49
+
50
+ ### Architecture
51
+
52
+ - **Base Model**: RoBERTa-base architecture
53
+ - **Hidden Size**: 768
54
+ - **Layers**: 12
55
+ - **Attention Heads**: 12
56
+ - **Parameters**: ~125M
57
+ - **Max Sequence Length**: 512 tokens
58
+ - **Embedding Dimension**: 768
59
+
60
+ ### Training
61
+
62
+ The model was trained using contrastive learning with hard negative mining on radiology educational content:
63
+
64
+ - **Training Objective**: Multiple Negatives Ranking Loss with hard negatives
65
+ - **Batch Size**: 32
66
+ - **Learning Rate**: 2e-5 with warmup
67
+ - **Training Epochs**: 4
68
+
69
+ **Note**: Training data sources are not disclosed due to variable licensing. The model is released under Apache 2.0.
70
+
71
+ ## Performance
72
+
73
+ ### RadLIT-9 Benchmark (Bi-Encoder Only)
74
+
75
+ Performance when using this bi-encoder alone for retrieval:
76
+
77
+ | Metric | Score |
78
+ |--------|-------|
79
+ | **MRR** | 0.698 |
80
+ | **nDCG@10** | 0.748 |
81
+ | **Recall@10** | 91.4% |
82
+ | **Recall@5** | 86.9% |
83
+ | **Recall@1** | 56.7% |
84
+
85
+ ### Comparison with General-Purpose Models
86
+
87
+ On RadLIT-9 benchmark (bi-encoder retrieval only, no reranking):
88
+
89
+ | Model | MRR | nDCG@10 | Recall@10 |
90
+ |-------|-----|---------|-----------|
91
+ | GTE-large | 0.843 | 0.873 | 97.1% |
92
+ | E5-large-v2 | 0.813 | 0.850 | 96.9% |
93
+ | BGE-large | 0.792 | 0.836 | 97.4% |
94
+ | **RadLIT-BiEncoder** | **0.698** | **0.748** | **91.4%** |
95
+
96
+ **Important**: The bi-encoder alone underperforms general-purpose models. The value of RadLIT comes from the full pipeline with cross-encoder reranking (see below).
97
+
98
+ ### Full RadLITE Pipeline Performance
99
+
100
+ When combined with RadLIT-CrossEncoder and BM25 fusion:
101
+
102
+ | Configuration | MRR | Improvement |
103
+ |---------------|-----|-------------|
104
+ | Bi-encoder only | 0.698 | baseline |
105
+ | + Cross-encoder reranking | 0.782 | +12.0% |
106
+ | + BM25 fusion (RadLITE) | **0.829** | **+18.8%** |
107
+
108
+ The full RadLITE pipeline achieves **0.829 MRR**, competitive with the best general-purpose models while being optimized for radiology.
109
+
110
+ ### Subspecialty Performance (Bi-Encoder Only)
111
+
112
+ | Subspecialty | MRR | Recall@10 |
113
+ |--------------|-----|-----------|
114
+ | Physics/Nuclear | 0.790 | 100% |
115
+ | Pediatric | 0.827 | 92% |
116
+ | Thoracic | 0.828 | 94% |
117
+ | Cardiac | 0.778 | 98% |
118
+ | Neuroradiology | 0.731 | 88% |
119
+ | Gastrointestinal | 0.626 | 98% |
120
+ | Breast | 0.592 | 90% |
121
+ | Musculoskeletal | 0.598 | 78% |
122
+ | Genitourinary | 0.470 | 84% |
123
+
124
+ ## Usage
125
+
126
+ ### Installation
127
+
128
+ ```bash
129
+ pip install sentence-transformers
130
+ ```
131
+
132
+ ### Basic Usage
133
+
134
+ ```python
135
+ from sentence_transformers import SentenceTransformer
136
+
137
+ # Load model
138
+ model = SentenceTransformer('matulichpt/radlit-biencoder')
139
+
140
+ # Encode queries and documents
141
+ queries = [
142
+     "What are the imaging features of hepatocellular carcinoma on MRI?",
143
+     "How do you differentiate glioblastoma from metastasis?"
144
+ ]
145
+ documents = [
146
+     "HCC typically shows arterial enhancement with washout on portal venous phase...",
147
+     "GBM and metastases can be differentiated by their location and multiplicity..."
148
+ ]
149
+
150
+ query_embeddings = model.encode(queries, convert_to_tensor=True)
151
+ doc_embeddings = model.encode(documents, convert_to_tensor=True)
152
+
153
+ # Compute similarity
154
+ from sentence_transformers.util import cos_sim
155
+ similarities = cos_sim(query_embeddings, doc_embeddings)
156
+ print(similarities)
157
+ ```
158
+
159
+ ### For Retrieval Pipeline
160
+
161
+ ```python
162
+ from sentence_transformers import SentenceTransformer, util
163
+ import torch
164
+
165
+ model = SentenceTransformer('matulichpt/radlit-biencoder')
166
+
167
+ # Pre-encode your document corpus
168
+ corpus = ["document 1...", "document 2..."]  # extend with your own documents
169
+ corpus_embeddings = model.encode(corpus, convert_to_tensor=True, show_progress_bar=True)
170
+
171
+ # At query time
172
+ query = "What are the CT findings in pulmonary embolism?"
173
+ query_embedding = model.encode(query, convert_to_tensor=True)
174
+
175
+ # Find top-k similar documents
176
+ cos_scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
177
+ top_results = torch.topk(cos_scores, k=min(10, len(corpus)))  # k must not exceed corpus size
178
+
179
+ for score, idx in zip(top_results[0], top_results[1]):
180
+     print(f"Score: {score:.4f} - {corpus[idx][:100]}...")
181
+ ```
182
+
183
+ ## Recommended: Full RadLITE Pipeline
184
+
185
+ For best results, use RadLIT-BiEncoder as the first stage followed by RadLIT-CrossEncoder for reranking:
186
+
187
+ ```python
188
+ from sentence_transformers import SentenceTransformer, CrossEncoder
189
+
190
+ # Stage 1: Bi-encoder retrieval (fast, gets candidates)
191
+ biencoder = SentenceTransformer('matulichpt/radlit-biencoder')
192
+
193
+ # Stage 2: Cross-encoder reranking (slower, more accurate)
194
+ crossencoder = CrossEncoder('matulichpt/radlit-crossencoder')
195
+
196
+ # Retrieve candidates
197
+ query = "What are the MRI findings in anterior cruciate ligament tear?"
198
+ candidates = retrieve_with_biencoder(query, corpus, biencoder, top_k=50)
199
+
200
+ # Rerank with cross-encoder
201
+ pairs = [[query, doc] for doc in candidates]
202
+ scores = crossencoder.predict(pairs)
203
+
204
+ # Apply temperature calibration (recommended: T=1.5)
205
+ calibrated_scores = scores / 1.5
206
+
207
+ # Sort by calibrated scores
208
+ reranked = sorted(zip(candidates, calibrated_scores), key=lambda x: x[1], reverse=True)
209
+ ```
210
+
211
+ ## Intended Use
212
+
213
+ ### Primary Use Cases
214
+
215
+ - First-stage candidate retrieval for radiology content
216
+ - Medical imaging literature search
217
+ - Radiology question-answering systems (retrieval component)
218
+
219
+ ### Out-of-Scope Uses
220
+
221
+ - General web search
222
+ - Non-medical document retrieval
223
+ - Clinical diagnosis (this is a retrieval model, not a diagnostic tool)
224
+
225
+ ## Limitations
226
+
227
+ 1. **Bi-encoder alone underperforms**: Use with cross-encoder reranking for best results
228
+ 2. **Domain Specificity**: Optimized for radiology; may underperform on general content
229
+ 3. **Language**: English only
230
+ 4. **Subspecialty Variance**: Performance varies by subspecialty (0.47-0.83 MRR range)
231
+
232
+ ## Ethical Considerations
233
+
234
+ - This model should not be used as a sole source for clinical decision-making
235
+ - Retrieved documents should be reviewed by qualified medical professionals
236
+ - The model may reflect biases present in radiology educational literature
237
+
238
+ ## Citation
239
+
240
+ ```bibtex
241
+ @software{radlit_biencoder_2026,
242
+ title = {RadLIT-BiEncoder: Domain-Specialized Embeddings for Radiology Retrieval},
243
+ author = {Matulich, P.},
244
+ year = {2026},
245
+ url = {https://huggingface.co/matulichpt/radlit-biencoder},
246
+ note = {MRR 0.698 standalone, 0.829 with RadLITE pipeline}
247
+ }
248
+ ```
249
+
250
+ ## Related Models
251
+
252
+ - [RadLIT-CrossEncoder](https://huggingface.co/matulichpt/radlit-crossencoder) - Second-stage reranking
253
+ - [RadLIT-ColBERT](https://huggingface.co/matulichpt/radlit-colbert) - Late interaction model
254
+
255
+ ## License
256
+
257
+ Apache 2.0 - Free for research and commercial use.
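
The "Full RadLITE Pipeline" example in the README calls `retrieve_with_biencoder(query, corpus, biencoder, top_k=50)`, but the README never defines that function. The block below is a minimal sketch of what such a helper might look like, not the author's implementation. It assumes only that `biencoder.encode` takes a list of strings and returns one vector per string, which is how `SentenceTransformer.encode` behaves. The names `cosine` and `ToyEncoder` are hypothetical and exist only so the sketch runs without downloading a model.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_with_biencoder(query: str, corpus: List[str], biencoder, top_k: int = 50) -> List[str]:
    """Hypothetical helper: return up to top_k corpus documents, best match first.

    `biencoder` is any object whose encode(list_of_str) returns one vector
    per input string.
    """
    query_vec = biencoder.encode([query])[0]
    doc_vecs = biencoder.encode(corpus)
    scored = [(cosine(query_vec, vec), doc) for vec, doc in zip(doc_vecs, corpus)]
    # Sort on the score only, highest first; ties keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


class ToyEncoder:
    """Stand-in encoder for illustration only: counts a few keywords per text."""

    VOCAB = ("mri", "ct", "ligament", "embolism")

    def encode(self, texts):
        return [[float(t.lower().count(w)) for w in self.VOCAB] for t in texts]


corpus = [
    "CT angiography shows filling defects in pulmonary embolism.",
    "MRI of an ACL tear shows a discontinuous ligament.",
    "Chest radiograph technique and positioning.",
]
top = retrieve_with_biencoder("MRI findings in ligament tear", corpus, ToyEncoder(), top_k=2)
print(top[0])
```

With a real model, the `SentenceTransformer` instance would be passed in place of `ToyEncoder()`. The pure-Python similarity loop is for readability only; for a large corpus, pre-computing the corpus embeddings once and using a vectorized search such as `sentence_transformers.util.semantic_search` is likely the better choice.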