# Qwen3-8B-LoRA-ContextBioEL-Reranker-SFT
This repository provides a LoRA adapter for Qwen/Qwen3-8B that implements the reranker stage of a clinical biomedical entity linking pipeline.
The model reranks a top-10 candidate list using the rewritten term, the marked note context, and candidate semantic tags, then outputs the best `concept_id`. It was trained with supervised fine-tuning (SFT).
## Model type
- Base model: Qwen/Qwen3-8B
- Adapter type: LoRA
- Stage: Reranker
- Training: SFT
- Task: Context-aware biomedical entity linking reranking
## Intended use
Inputs:
- `rewritten_term`
- `context_marked`, where the target mention is explicitly enclosed by `<mention>...</mention>`
- `candidates`: a top-10 candidate list in which each entry contains `concept_id`, `concept_name`, and `semantic_tag`

Output:
- exactly one selected `concept_id`, inside the `<answer>...</answer>` block
This model is intended for research use in biomedical entity linking pipelines.
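As a concrete illustration, the inputs listed above might be assembled like this (field names follow this card; the clinical snippet and candidates are invented for illustration):

```python
import json

# Hypothetical example payload; field names follow the model card,
# the note text and candidate entries are made up for illustration.
example_input = {
    "rewritten_term": "acute myocardial infarction",
    "context_marked": "The patient was admitted for <mention>heart attack</mention> yesterday.",
    "candidates": [
        {"concept_id": "22298006", "concept_name": "myocardial infarction", "semantic_tag": "disorder"},
        {"concept_id": "57054005", "concept_name": "acute myocardial infarction", "semantic_tag": "disorder"},
    ],
}
print(json.dumps(example_input, indent=2))
```

In a real pipeline the `candidates` list would hold the retriever's full top-10, not two entries.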
## Important decoding note
This adapter was trained with reasoning-style outputs.
Please:
- enable thinking
- do not use greedy decoding
Recommended decoding:
- `do_sample=True` with non-greedy decoding such as temperature/top-p sampling
- parse the final prediction from the `<answer>...</answer>` span
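A minimal way to pull the prediction out of a generation, assuming the model emits the `<think>...</think><answer>...</answer>` format described above (`parse_answer` is a helper sketched here, not shipped with this repository):

```python
import re

def parse_answer(generation: str):
    """Extract the concept_id from the <answer>...</answer> span, if present."""
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", generation, re.DOTALL)
    return match.group(1) if match else None

sample = "<think>The context describes an acute event...</think>\n<answer>57054005</answer>"
print(parse_answer(sample))  # -> 57054005
```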
## Usage example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
import json

base_model_path = "Qwen/Qwen3-8B"
adapter_path = "Tao-AI-Informatics/Qwen3-8B-LoRA-ContextBioEL-Reranker-SFT"

# Load the base model and attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_path)

# Top-10 candidate list (truncated to two entries for brevity).
cands_json = json.dumps([
    {"concept_id": "22298006", "concept_name": "myocardial infarction", "semantic_tag": "disorder"},
    {"concept_id": "57054005", "concept_name": "acute myocardial infarction", "semantic_tag": "disorder"},
], indent=2)

messages = [
    {
        "role": "system",
        "content": (
            "You are a clinical concept normalization model that reranks a top-10 candidate list using context and semantic tags.\n\n"
            "Inputs you will receive:\n"
            "- rewritten_term\n"
            "- context_marked with <mention>...</mention>\n"
            "- candidates: top-10 items (concept_id, concept_name, semantic_tag)\n\n"
            "Think before answer\n\n"
            "Output ONLY:\n"
            "<think>...</think>\n"
            "<answer>...</answer>\n\n"
            "In <think>, write a detailed reasoning with these parts:\n"
            "1) Context interpretation: what the mention means in this note (section cues, negation, experiencer, temporality).\n"
            "2) Type inference: what semantic type/tag is expected (and why other tags are wrong).\n"
            "3) Candidate comparison: evaluate multiple candidates. Note over-specific vs too-general, added qualifiers, and tag alignment.\n"
            "4) Decision: justify the final choice.\n\n"
            "In <answer>, use exactly one of:\n"
            "- <answer><concept_id></answer>\n"
        ),
    },
    {
        "role": "user",
        "content": (
            "Task: Choose the best concept_id from candidates.\n\n"
            "rewritten_term:\nacute myocardial infarction\n\n"
            "context_marked:\n"
            "The patient was admitted for <mention>heart attack</mention> yesterday.\n\n"
            f"candidates (top10; no scores):\n{cands_json}"
        ),
    },
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # Qwen3 chat-template flag; keeps the <think> stage on
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Non-greedy decoding, as recommended above.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
## Notes
This is a LoRA adapter, not a standalone full model.
The adapter is designed for the reranking stage; it does not perform retrieval itself.
In downstream pipelines, a retriever first produces the top-10 candidate list, which this adapter then reranks to select the final `concept_id`.
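A pipeline along those lines might be wired up as follows. This is only a structural sketch: `rewrite`, `retrieve`, and `rerank` are hypothetical placeholders standing in for the rewriter adapter, a retriever, and this reranker adapter, not APIs shipped with this repository.

```python
# Hypothetical three-stage pipeline sketch; every function body is a placeholder.
def rewrite(mention: str, context: str) -> str:
    # Stage 1 (separate rewriter adapter): normalize the raw mention.
    return mention  # placeholder

def retrieve(rewritten_term: str, k: int = 10) -> list:
    # Stage 2 (retriever): fetch the top-k candidate concepts from the ontology.
    return [{"concept_id": "57054005", "concept_name": "acute myocardial infarction",
             "semantic_tag": "disorder"}]  # placeholder

def rerank(rewritten_term: str, context_marked: str, candidates: list) -> str:
    # Stage 3 (this adapter): choose the best concept_id from the candidates.
    return candidates[0]["concept_id"]  # placeholder

context = "The patient was admitted for <mention>heart attack</mention> yesterday."
term = rewrite("heart attack", context)
candidates = retrieve(term)
print(rerank(term, context, candidates))  # -> 57054005
```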
## Limitations
This model is intended for research use only.
Performance may vary across ontologies, institutions, and note styles.
The model should be evaluated carefully before any real-world deployment.
The final selected `concept_id` should be extracted from the `<answer>...</answer>` block.
## Citation
If you use this model, please cite the associated paper when available.