---
license: apache-2.0
---
# MedRAGChecker Claim Extractor · LoRA Adapter
Biomedical claim-triple extractor fine-tuned from a medical LLM using GPT-4.1 teacher labels.
This adapter is part of the **MedRAGChecker** pipeline for claim-level verification in biomedical RAG.
> **Task:** given a medical question and its answer, extract factual triples of the form
> `[subject, relation, object]` as a pure JSON array.
---
## Model summary
- **Base model:** `<BASE_MODEL_ID>` (for example: `med42-llama3-8b`, `Meditron3-8B`, `PMC_LLaMA_13B`, or `qwen2-med-7b`)
- **Adapter type:** LoRA (rank = 16, alpha = 32, dropout = 0.0) via PEFT
- **Architecture:** same as base causal LM (LLaMA-style or Qwen-style)
- **Task:** biomedical claim triple extraction
- **Input:** question text + model answer (plain text)
- **Output:** JSON array of triples, e.g.
```json
[
  ["Psoriasis", "is", "chronic inflammatory skin disease"],
  ["Psoriasis", "is associated with", "systemic comorbidities"]
]
```
You can either:
- keep one Hugging Face repo per adapter (recommended), or
- store several adapters in one repo and refer to specific subfolders.
Replace `<BASE_MODEL_ID>` and any placeholder names below with your actual base model and repo id (for example: `JoyDaJun/MedRAGChecker-Extractor-Meditron3-8B`).
---
## Intended use
- Post-hoc analysis of biomedical QA systems at *claim level*.
- Use inside a RAG or QA evaluation pipeline to:
- extract atomic factual statements from a generated answer;
- feed those triples to a checker model (e.g. MedRAGChecker NLI+KG).
This adapter is **not** a general-purpose chat model and **must not** be used as a standalone medical assistant.
---
## How to use
### 1. LLaMA-style base models (Meditron, Med42, PMC-LLaMA, etc.)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch, json
base_model_id = "<BASE_MODEL_ID>" # e.g. "med42-llama3-8b"
adapter_id = "<ADAPTER_REPO_ID>" # e.g. "JoyDaJun/MedRAGChecker-Extractor-Med42-8B"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

def build_prompt(question: str, answer: str) -> str:
    system_part = (
        "You are an information extraction assistant. "
        "Given a medical question and its answer, extract all factual triples "
        "as [subject, relation, object]. "
        "Return a pure JSON array of triples, with no explanations, no extra text, "
        "no comments. If there are no clear factual triples, return an empty JSON array []."
    )
    qa_part = f"Question: {question}\nAnswer: {answer}"
    return (
        system_part
        + "\n\n"
        + qa_part
        + '\n\nTriples (JSON only, e.g. [["subj", "rel", "obj"], ...]):\n'
    )

question = "Does hypercholesterolemia increase leukotriene B4 in neutrophils?"
answer = "Hypercholesterolemia increases 5-LO activity in neutrophils..."
prompt = build_prompt(question, answer)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    gen_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
    )

# Decode only the newly generated tokens; the prompt itself contains an
# example "[[...]]", which would otherwise confuse the JSON extraction below.
new_tokens = gen_ids[0][inputs["input_ids"].shape[1]:]
text = tokenizer.decode(new_tokens, skip_special_tokens=True)

# Optional: keep only the JSON array.
# Note: rfind returns -1 on failure, so guard before adding 1.
start = text.find("[")
end = text.rfind("]")
json_str = text[start : end + 1] if start != -1 and end > start else "[]"
triples = json.loads(json_str)
print(triples)
```
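The slice-based extraction above is brittle: if the decoded text still echoes the bracketed example from the prompt, or the array is malformed, `json.loads` fails. A slightly more defensive helper (the function name and fallback behavior are our own, not part of the MedRAGChecker codebase) scans candidate arrays from the end of the generation backwards:

```python
import json

def extract_triples(text: str) -> list:
    """Return the last JSON array of 3-element lists that parses in `text`.

    Tries every '[' from the end of the generation backwards, so the model's
    final answer wins over bracketed examples echoed from the prompt.
    Falls back to [] when nothing parses.
    """
    decoder = json.JSONDecoder()
    for i in range(len(text) - 1, -1, -1):
        if text[i] != "[":
            continue
        try:
            # raw_decode parses one JSON value at position i, ignoring trailing text.
            parsed, _ = decoder.raw_decode(text, i)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, list):
            triples = [t for t in parsed if isinstance(t, list) and len(t) == 3]
            if triples:
                return triples
    return []

raw = 'e.g. [["subj", "rel", "obj"], ...]:\n[["Psoriasis", "is", "chronic"]] extra text'
print(extract_triples(raw))  # → [['Psoriasis', 'is', 'chronic']]
```

The prompt's echoed example is skipped naturally because its `...` is not valid JSON.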
### 2. Chat-style base models (Qwen2-med, etc.)
For chat-style models, wrap the same prompt inside the chat template.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch, json
base_model_id = "<QWEN_BASE_MODEL_ID>" # e.g. "qwen2-med-7b"
adapter_id = "<ADAPTER_REPO_ID_QWEN>" # e.g. "JoyDaJun/MedRAGChecker-Extractor-Qwen2-med-7B"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

def build_prompt(question: str, answer: str) -> str:
    system_part = (
        "Given a medical question and its answer, extract all factual triples "
        "as [subject, relation, object]. "
        "Return only a JSON array of triples."
    )
    qa_part = f"Question: {question}\nAnswer: {answer}"
    return system_part + "\n\n" + qa_part + '\n\nTriples (JSON only, e.g. [["subj", "rel", "obj"], ...]):\n'

question = "Does hypercholesterolemia increase leukotriene B4 in neutrophils?"
answer = "Hypercholesterolemia increases 5-LO activity in neutrophils..."
messages = [
    {"role": "system", "content": "You are an information extraction assistant."},
    {"role": "user", "content": build_prompt(question, answer)},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    gen_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
    )

# Decode only the newly generated tokens, then keep only the JSON array.
# Note: rfind returns -1 on failure, so guard before adding 1.
new_tokens = gen_ids[0][inputs["input_ids"].shape[1]:]
text = tokenizer.decode(new_tokens, skip_special_tokens=True)
start = text.find("[")
end = text.rfind("]")
json_str = text[start : end + 1] if start != -1 and end > start else "[]"
triples = json.loads(json_str)
print(triples)
```
---
## Training details
This adapter was trained with the `DistillExtractor/train_extractor_sft.py` script in the MedRAGChecker codebase.
- **Teacher model:** GPT-4.1 as claim-triple annotator.
- **Training data:**
- JSONL file `extractor_sft.jsonl` with fields:
- `instruction`: system prompt + `Question:` + `Answer:` (from biomedical QA datasets and RAG outputs).
- `output`: pure JSON array of `[subject, relation, object]` triples labeled by GPT-4.1.
- Sources include consumer and research-style biomedical QA (e.g., MedQuAD, PubMedQA, LiveQA Medical, CSIRO MedRedQA, and AskDocs-style Reddit threads).
- **Preprocessing:**
- Parse `Question:` and `Answer:` from the `instruction` field using regex.
- Rebuild a canonical prompt with an explicit
`Triples (JSON only, e.g. [["subj", "rel", "obj"], ...]):`
header.
- **Fine-tuning setup (example):**
- Epochs: `10`
- Batch size: `1` with gradient accumulation `32` (effective batch size 32).
- Max input length: `2048`.
- Optimizer: AdamW, learning rate `1e-4`.
- LoRA config: `r = 16`, `alpha = 32`, `dropout = 0.0`.
- Precision: `bfloat16` on GPUs with `device_map="auto"`.
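The `Question:`/`Answer:` parsing step can be sketched as follows. This is an assumed reconstruction; the exact regex and helper names in `train_extractor_sft.py` may differ:

```python
import re

def parse_instruction(instruction: str) -> tuple[str, str]:
    """Split an `instruction` field into (question, answer).

    Assumes the SFT data embeds both as 'Question: ...' and 'Answer: ...'
    blocks; everything before 'Question:' (the system prompt) is discarded.
    """
    match = re.search(r"Question:\s*(.*?)\s*Answer:\s*(.*)", instruction, re.DOTALL)
    if match is None:
        raise ValueError("instruction does not contain Question:/Answer: blocks")
    return match.group(1).strip(), match.group(2).strip()

def build_canonical_prompt(question: str, answer: str) -> str:
    """Rebuild the canonical prompt with the explicit JSON-only header."""
    return (
        f"Question: {question}\nAnswer: {answer}"
        '\n\nTriples (JSON only, e.g. [["subj", "rel", "obj"], ...]):\n'
    )
```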
Example training command:
```bash
export WANDB_PROJECT=MedRAGChecker
export WANDB_NAME=extractor_<BASE_NAME>
BASE=/path/to/<BASE_MODEL_ID>
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python DistillExtractor/train_extractor_sft.py \
--model_name "$BASE" \
--train_path ./data/extractor_sft.jsonl \
--output_dir ./runs/extractor_sft_<BASE_NAME> \
--epochs 10 \
--batch_size 1 \
--grad_accum 32 \
--lr 1e-4 \
--bf16
```
Replace `<BASE_MODEL_ID>` and `<BASE_NAME>` with your actual base model.
---
## Evaluation
We evaluate on a held-out split of the same GPT-4.1-annotated dataset using two families of metrics:
1. **Strict triple match**
- Normalize to lowercase and strip whitespace.
- Treat each triple as a set element `(subject, relation, object)`.
- Compute precision/recall/F1 on exact triple matches.
- Also report exact match rate (all triples in an example match exactly).
2. **Soft triple match**
- Tokenize subject, relation, and object.
- Compute token-level F1 for each field between predicted and gold triples.
- Aggregate into a per-triple similarity score.
- Run greedy matching between predicted and gold triples by similarity.
- Compute soft precision/recall/F1 from matched pairs.
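The two metric families above can be sketched as follows. This is our minimal reconstruction of the described steps, not the official implementation (which lives in `run_extractor_eval_soft.py`):

```python
from collections import Counter

def normalize(triple):
    """Lowercase and strip each field so formatting differences don't count."""
    return tuple(field.lower().strip() for field in triple)

def strict_prf(pred_triples, gold_triples):
    """Precision/recall/F1 on exact (subject, relation, object) matches."""
    pred = {normalize(t) for t in pred_triples}
    gold = {normalize(t) for t in gold_triples}
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

def token_f1(pred_field: str, gold_field: str) -> float:
    """Token-overlap F1 between two strings (multiset overlap)."""
    p, g = Counter(pred_field.lower().split()), Counter(gold_field.lower().split())
    overlap = sum((p & g).values())
    if overlap == 0:
        return 0.0
    prec, rec = overlap / sum(p.values()), overlap / sum(g.values())
    return 2 * prec * rec / (prec + rec)

def triple_similarity(pred, gold) -> float:
    """Average token-level F1 over subject, relation, and object."""
    return sum(token_f1(pf, gf) for pf, gf in zip(pred, gold)) / 3

def soft_prf(pred_triples, gold_triples, threshold=0.0):
    """Greedy one-to-one matching by similarity, then soft P/R/F1."""
    pairs = sorted(
        ((triple_similarity(p, g), i, j)
         for i, p in enumerate(pred_triples)
         for j, g in enumerate(gold_triples)),
        reverse=True,
    )
    used_p, used_g, score = set(), set(), 0.0
    for sim, i, j in pairs:
        if sim > threshold and i not in used_p and j not in used_g:
            used_p.add(i)
            used_g.add(j)
            score += sim
    precision = score / len(pred_triples) if pred_triples else 0.0
    recall = score / len(gold_triples) if gold_triples else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

pred = [["Psoriasis", "is", "chronic inflammatory skin disease"],
        ["psoriasis", "affects", "skin"]]
gold = [["psoriasis", "is", "chronic inflammatory skin disease"]]
print(strict_prf(pred, gold))  # → (0.5, 1.0, 0.6666666666666666)
```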
Example metrics on a random subsample of `N = 200` examples for a Meditron3-8B-based extractor:
| Metric | Value |
|------------------|--------|
| strict_precision | 0.0890 |
| strict_recall | 0.0930 |
| strict_f1 | 0.0900 |
| exact_match | 0.0500 |
| soft_precision | 0.2052 |
| soft_recall | 0.2598 |
| soft_f1 | 0.2148 |
These numbers illustrate that:
- the model is far from perfect at exact triple reconstruction;
- soft matching shows it still captures many approximate facts, which is often sufficient for downstream diagnostics in MedRAGChecker.
You can reproduce these metrics (and compute new ones for other checkpoints) with the evaluation script:
```bash
python DistillExtractor/run_extractor_eval_soft.py \
--base_model <BASE_MODEL_ID> \
--adapter_path <ADAPTER_REPO_OR_LOCAL_PATH> \
--data_path ./data/extractor_sft.jsonl \
--output_path ./results/extractor_soft_<BASE_NAME>.json \
--num_examples 200
```
---
## Limitations and risks
- The adapter inherits all limitations and biases of the base model and GPT-4.1 teacher.
- Extracted triples may still be incomplete, redundant, or slightly rephrased.
- The model is optimized for **English biomedical text**; performance on other domains or languages is likely poor.
- Do **not** use this model (or its extracted triples) directly for patient-facing decisions or clinical care without expert validation.
---
## Citation
If you use this adapter or MedRAGChecker in your work, please consider citing our paper (details to be updated):
```bibtex
@inproceedings{ji2025medragchecker,
title = {MedRAGChecker: Claim-level Verification for Biomedical Retrieval-Augmented Generation},
author = {Ji, Yuelyu and collaborators},
booktitle = {Proceedings of a future venue},
year = {2025}
}
```
---
## License
- This adapter is released under the same license terms as the corresponding base model `<BASE_MODEL_ID>`.
- You must accept and comply with the license of the base model before using this LoRA.