Qwen3-8B-LoRA-ContextBioEL-Reranker-RL

This repository provides a LoRA adapter for Qwen3-8B for the reranker stage of a clinical biomedical entity linking pipeline.

This model reranks a top-10 candidate list using the rewritten term, the marked note context, and candidate semantic tags, then outputs the best concept_id. It was further optimized with reinforcement learning (RL) for entity-linking-oriented reranking behavior.

Model type

  • Base model: Qwen/Qwen3-8B
  • Adapter type: LoRA
  • Stage: Reranker
  • Training: RL
  • Task: Context-aware biomedical entity linking reranking

Intended use

Inputs:

  • rewritten_term
  • context_marked, where the target mention is explicitly enclosed by <mention>...</mention>
  • candidates, a top-10 candidate list containing:
    • concept_id
    • concept_name
    • semantic_tag

Output:

  • exactly one selected concept_id in the <answer>...</answer> block

This model is intended for research use in biomedical entity linking pipelines.
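As a sketch of how the inputs above fit together, the following assembles them into a user prompt matching the usage example below. The `build_user_prompt` helper and its field layout are illustrative, not a fixed API of this repository:

```python
import json

def build_user_prompt(rewritten_term, context_marked, candidates):
    """Assemble the reranker user message (illustrative layout)."""
    cands_json = json.dumps(candidates, indent=2)
    return (
        "Task: Choose the best concept_id from candidates.\n\n"
        f"rewritten_term:\n{rewritten_term}\n\n"
        f"context_marked:\n{context_marked}\n\n"
        f"candidates (top10; no scores):\n{cands_json}"
    )

prompt = build_user_prompt(
    "acute myocardial infarction",
    "The patient was admitted for <mention>heart attack</mention> yesterday.",
    [{"concept_id": "22298006", "concept_name": "myocardial infarction",
      "semantic_tag": "disorder"}],
)
print(prompt)
```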

Important decoding note

This adapter was trained with reasoning-style outputs.

Please:

  • enable thinking
  • do not use greedy decoding

Recommended decoding:

  • do_sample=True with temperature/top-p sampling (avoid greedy decoding)
  • parse the final prediction from the <answer>...</answer> span
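A minimal parsing sketch for the last step. It is regex-based and assumes the model emits a well-formed <answer>...</answer> span; the `parse_answer` helper is illustrative, not part of this repository:

```python
import re

def parse_answer(generated_text):
    """Return the contents of the last <answer>...</answer> span, or None."""
    matches = re.findall(r"<answer>\s*(.*?)\s*</answer>", generated_text, re.DOTALL)
    return matches[-1] if matches else None

sample = "<think>The mention is an acute event...</think>\n<answer>57054005</answer>"
print(parse_answer(sample))  # → 57054005
```

Taking the last match makes the parser robust if the reasoning trace itself quotes the tag format.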

Usage example

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
import json

base_model_path = "Qwen/Qwen3-8B"
adapter_path = "Tao-AI-Informatics/Qwen3-8B-LoRA-ContextBioEL-Reranker-RL"

tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

cands_json = json.dumps([
    {"concept_id": "22298006", "concept_name": "myocardial infarction", "semantic_tag": "disorder"},
    {"concept_id": "57054005", "concept_name": "acute myocardial infarction", "semantic_tag": "disorder"}
], indent=2)

messages = [
    {
        "role": "system",
        "content": (
            "You are a clinical concept normalization model that reranks a top-10 candidate list using context and semantic tags.\n\n"
            "Inputs you will receive:\n"
            "- rewritten_term\n"
            "- context_marked with <mention>...</mention>\n"
            "- candidates: top-10 items (concept_id, concept_name, semantic_tag)\n\n"
            "Think before answering.\n\n"
            "Output ONLY:\n"
            "<think>...</think>\n"
            "<answer>...</answer>\n\n"
            "In <think>, write a detailed reasoning with these parts:\n"
            "1) Context interpretation: what the mention means in this note (section cues, negation, experiencer, temporality).\n"
            "2) Type inference: what semantic type/tag is expected (and why other tags are wrong).\n"
            "3) Candidate comparison: evaluate multiple candidates. Note over-specific vs too-general, added qualifiers, and tag alignment.\n"
            "4) Decision: justify the final choice.\n\n"
            "In <answer>, use exactly one of:\n"
            "- <answer><concept_id></answer>\n"
        ),
    },
    {
        "role": "user",
        "content": (
            "Task: Choose the best concept_id from candidates.\n\n"
            "rewritten_term:\nacute myocardial infarction\n\n"
            "context_marked:\n"
            "The patient was admitted for <mention>heart attack</mention> yesterday.\n\n"
            f"candidates (top10; no scores):\n{cands_json}"
        ),
    },
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
    )

# Decode only the newly generated tokens, keeping the <think>/<answer> tags
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=False))

Notes

  • This is a LoRA adapter, not a standalone full model.

  • The adapter is designed for the reranking stage; it does not perform retrieval or candidate generation by itself.

  • In downstream pipelines, it consumes the top-10 candidate list produced by a retriever and outputs the final concept_id.

Limitations

  • This model is intended for research use only.

  • Performance may vary across ontologies, institutions, and note styles.

  • The model should be evaluated carefully before any real-world deployment.

  • The final selected concept_id should be extracted from the <answer>...</answer> block.

Citation

If you use this model, please cite the associated paper when available.
