+---
+license: apache-2.0
+datasets:
+- BeIR/scidocs
+- miriad/miriad-4.4M
+- BioASQ-b
+language:
+- en
+base_model:
+- BAAI/bge-reranker-v2-gemma
+pipeline_tag: text-classification
+tags:
+- medical
+- merge
+- rerank
+---
+# MedSwin/MedSwin-Reranker-bge-gemma — Fine-tuned Biomedical & EMR Context Ranking
+- **Developed by:** Medical Swinburne University of Technology AI Team
+- **Funded by:** [Swinburne University of Technology](https://www.swinburne.edu.au)
+- **Language(s):** English
+- **License:** Apache 2.0
+## Overview
+This reranker targets two use cases:
+1. **RAG Context Reranking**
+   Re-rank candidate passages retrieved from a vector database (initial recall via embeddings) to improve final context selection for downstream medical LLM reasoning.
+2. **EMR Profile Reranking**
+   Re-rank a patient's historical information (e.g., past assessments, diagnoses, medications) to surface the most clinically relevant records for the current assessment.
+
+The reranker outputs a **direct relevance score** for each *(query, passage)* pair and can be used as a drop-in second-stage ranking component after embedding-based retrieval.
+---
+## Why a Reranker?
+Embedding retrieval is fast and scalable but may miss nuanced relevance (clinical relationships, subtle terminology, long context dependencies).
+A reranker improves precision by explicitly scoring each candidate passage against the query, typically yielding better top-k context for medical QA and decision support.
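The second-stage step described above can be sketched as follows. This is a minimal illustration, not the project's pipeline: `rerank` and `toy_overlap_score` are hypothetical names, and the token-overlap scorer is only a stand-in so the example runs without model weights. In practice `score_fn` would wrap the fine-tuned reranker.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return the top_k passages."""
    scored = [(passage, score_fn(query, passage)) for passage in candidates]
    # Python's sort is stable, so ties keep the first-stage retrieval order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def toy_overlap_score(query: str, passage: str) -> float:
    """Stand-in scorer (assumption): fraction of query tokens found in the passage."""
    q_tokens = set(query.lower().split())
    p_tokens = set(passage.lower().split())
    return len(q_tokens & p_tokens) / len(q_tokens) if q_tokens else 0.0


# Candidates as they might come back from embedding-based recall.
candidates = [
    "aspirin dosing for headache",
    "type 2 diabetes diet advice",
    "metformin is first-line therapy for type 2 diabetes",
]
query = "first-line therapy for type 2 diabetes"

for passage, score in rerank(query, candidates, toy_overlap_score, top_k=2):
    print(f"{score:.2f}  {passage}")
```

Keeping the scorer as a plain callable means the same function can serve both the RAG and the EMR use case: only the candidate list changes.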
+---
+## Base Model
+- **Model**: [BAAI/bge-reranker-v2-gemma](https://huggingface.co/BAAI/bge-reranker-v2-gemma)
+- **Fine-tuning strategy**: **LoRA** (parameter-efficient fine-tuning) with gradient checkpointing and mixed precision (fp16 or bf16, depending on the GPU).
+- **Rationale**: Gemma-based rerankers generally provide stronger relevance modelling and support longer contexts than smaller rerankers.
+---
+## Training Data (Offline, Local)
+We fine-tune on **open Hugging Face datasets** stored locally on an HPC cluster:
+### 1) BioASQ (Generated Queries)
+- Used as: (query, document) positives; negatives sampled from a rolling buffer.
+- Specialises the model for the complex terminology and high precision required by BioASQ Task B (Biomedical Semantic QA). The reranker acts as the second stage of a two-stage retrieval system, filtering the initial candidate lists from a PubMed-indexed retriever so that the highest-ranked documents contain the specific evidence needed for factoid and 'ideal' answer generation.
+### 2) MIRIAD (Medical IR Instruction Dataset)
+- Used as: (question → passage) positives; negatives sampled from a rolling buffer.
+- Using [MIRIAD's 4.4M](https://huggingface.co/datasets/miriad/miriad-4.4M) literature-grounded QA pairs, the model is trained to distinguish between highly similar clinical concepts. This specialisation is intended to reduce medical hallucinations downstream and to prioritise the most scientifically accurate evidence in a multi-stage retrieval pipeline for healthcare professionals.
+### 3) SciDocs
+- Trained on this multi-task dataset, which includes citation prediction and co-citation analysis, the model learns to capture nuanced semantic relationships that standard bi-encoders miss. The resulting reranker serves as the second stage in a two-stage retrieval pipeline, with the aim of improving top-k relevance for complex scholarly queries.
+---
+## Methodology
+### Data Construction (Triplets)
+The training corpus is converted into reranker triplets:
+```json
+{
+  "query": "clinical question",
+  "pos": ["relevant passage 1", "relevant passage 2"],
+  "neg": ["irrelevant passage A", "irrelevant passage B"],
+  "source": "dataset_name"
+}
+```
+* **Positives**: from dataset relevance labels or paired question–passage examples.
+* **Negatives**: sampled from an in-memory rolling buffer (fast, scalable offline).
+* Output splits: **train / val / test** created in one run.
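The rolling-buffer negative sampling described above might look like the sketch below. The function name `build_triplets`, the buffer size, the number of negatives per query, and the warm-up behaviour (skipping queries until the buffer holds enough passages) are all assumptions made for illustration; the actual preprocessing script is not shown in this card.

```python
import random
from collections import deque
from typing import Dict, Iterable, List, Tuple


def build_triplets(
    pairs: Iterable[Tuple[str, str]],
    source: str,
    buffer_size: int = 1000,  # assumed value, not from the model card
    num_neg: int = 2,         # assumed value, not from the model card
    seed: int = 0,
) -> List[Dict]:
    """Turn (query, positive passage) pairs into reranker triplets."""
    rng = random.Random(seed)
    # Fixed-size buffer of recently seen passages; old entries fall off the left.
    buffer = deque(maxlen=buffer_size)
    triplets = []
    for query, positive in pairs:
        # Negatives come from earlier passages, never the query's own positive.
        pool = [p for p in buffer if p != positive]
        if len(pool) >= num_neg:
            triplets.append({
                "query": query,
                "pos": [positive],
                "neg": rng.sample(pool, num_neg),
                "source": source,
            })
        buffer.append(positive)
    return triplets


pairs = [(f"question {i}", f"passage {i}") for i in range(5)]
for triplet in build_triplets(pairs, source="toy_dataset"):
    print(triplet)
```

Because the buffer only holds passages already streamed past, memory stays bounded regardless of corpus size. The trade-off is that these are random negatives rather than mined hard negatives.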
+### Evaluation
+The evaluation computes IR ranking metrics by scoring each query against its *(pos + neg)* candidates:
+* **nDCG@10:** 0.60+
+* **MRR@10:** 0.50+
+* **MAP@10:** 0.40+
+* **Hit@1:** 0.40+
+* Metrics are reported overall and broken down by data source.
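For reference, the four metrics can be computed per query as in the sketch below, assuming binary relevance labels (1 for `pos`, 0 for `neg`) listed in ranked order. The function names are hypothetical, and the AP@k normalisation (dividing by `min(number of positives, k)`) is one common convention; the evaluation script behind the numbers above may differ.

```python
import math
from typing import List, Sequence


def ranked_labels(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Order relevance labels by descending reranker score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def ndcg_at_k(labels: Sequence[int], k: int = 10) -> float:
    """Discounted cumulative gain, normalised by the ideal ordering."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(labels[:k]))
    ideal = sorted(labels, reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def mrr_at_k(labels: Sequence[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant passage within the top k."""
    for i, rel in enumerate(labels[:k]):
        if rel:
            return 1.0 / (i + 1)
    return 0.0


def ap_at_k(labels: Sequence[int], k: int = 10) -> float:
    """Average precision at k, normalised by min(#positives, k) (assumed convention)."""
    hits, total = 0, 0.0
    for i, rel in enumerate(labels[:k]):
        if rel:
            hits += 1
            total += hits / (i + 1)
    denom = min(sum(labels), k)
    return total / denom if denom else 0.0


def hit_at_1(labels: Sequence[int]) -> float:
    """1.0 if the top-ranked passage is relevant, else 0.0."""
    return 1.0 if labels and labels[0] else 0.0


# One toy query: two positives and two negatives with made-up scores.
labels = ranked_labels(scores=[0.9, 0.7, 0.4, 0.2], labels=[0, 1, 0, 1])
print(ndcg_at_k(labels), mrr_at_k(labels), ap_at_k(labels), hit_at_1(labels))
```

Averaging each per-query value over all queries gives nDCG@10, MRR@10, MAP@10 and Hit@1; grouping queries by their `source` field before averaging gives the per-dataset breakdown.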