# MedLayEval

MedLayEval is a distilled multimodal evaluator for medical lay-language generation. Given a triple (medical image, expert caption, candidate lay caption), it returns five attribute scores in [0, 1] plus their mean as an overall score, and serves as the headline metric for the MedLayXPlain benchmark.
The model is a Qwen2.5-VL-3B-Instruct backbone with LoRA adapters and a small attention-mask-pooled regression head trained by distillation from a stronger judge.
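For intuition, a minimal sketch of such a regressor is shown below. The class name and pooling details are illustrative assumptions; the actual module ships as `VLMRegressor` in `model.py` (see Files and Quick start).

```python
import torch.nn as nn

ATTRS = ["modality", "anatomy", "finding", "factual", "readability"]

class VLMRegressorSketch(nn.Module):
    """Illustrative sketch, not the shipped model.py: a VLM backbone whose
    mask-pooled last hidden state feeds a small MLP emitting five scores."""

    def __init__(self, vlm, hidden_size, n_attrs=len(ATTRS)):
        super().__init__()
        self.vlm = vlm
        self.head = nn.Sequential(
            nn.Linear(hidden_size, 256),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(256, n_attrs),
            nn.Sigmoid(),  # bounds every attribute score to [0, 1]
        )

    def forward(self, input_ids, attention_mask, **vision_kwargs):
        out = self.vlm(
            input_ids=input_ids,
            attention_mask=attention_mask,
            output_hidden_states=True,
            **vision_kwargs,                  # pixel_values, image_grid_thw, ...
        )
        h = out.hidden_states[-1]             # (B, T, hidden_size)
        mask = attention_mask.unsqueeze(-1).to(h.dtype)
        pooled = (h * mask).sum(1) / mask.sum(1).clamp(min=1)  # attention-mask pooling
        return self.head(pooled.float())      # (B, 5) scores in [0, 1]
```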
## The five attributes

| Attribute | What it scores |
|---|---|
| `modality` | Correctly identifies the imaging modality (CT, MRI, histology, …) |
| `anatomy` | Correctly identifies the depicted anatomy / region |
| `finding` | Correctly conveys the radiological / pathological finding |
| `factual` | Factually consistent with the expert caption and image |
| `readability` | Written in patient-facing lay language (no jargon) |
The overall score reported on the MedLayXPlain leaderboard is the mean of the five attributes.
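As a concrete example (with made-up attribute values), the overall score is just the unweighted mean:

```python
# Hypothetical per-attribute scores returned by MedLayEval for one candidate.
scores = {"modality": 0.92, "anatomy": 0.88, "finding": 0.75,
          "factual": 0.81, "readability": 0.95}
overall = sum(scores.values()) / len(scores)
print(round(overall, 3))  # 0.862
```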
## Files

```
adapter_config.json
adapter_model.safetensors        # PEFT LoRA, r=16, alpha=32, dropout=0.05
                                 # targets q/k/v/o + gate/up/down projections
regression_head.pt               # 2-layer MLP head: 2048 -> 256 -> 5 (+ Sigmoid)
config.json                      # base model config (Qwen2.5-VL-3B-Instruct)
generation_config.json
preprocessor_config.json
video_preprocessor_config.json
tokenizer.json, tokenizer_config.json, vocab.json, merges.txt
added_tokens.json, special_tokens_map.json, chat_template.jinja
model.py                         # VLMRegressor module (importable)
inference_example.py             # minimal usage example
```
The base model weights are not redistributed: `adapter_config.json` points at Qwen/Qwen2.5-VL-3B-Instruct, which is fetched from the Hub at load time. Users must accept the Qwen license for the base weights separately; the LoRA + head weights in this repo are released under Apache 2.0.
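Because `model.py` and the regression head live in this repo, a convenient way to get a local copy (so the import in the quick start below resolves) is `huggingface_hub.snapshot_download`; the variable name here is illustrative:

```python
from huggingface_hub import snapshot_download

# Downloads the adapter, regression head, processor files, and model.py;
# the base Qwen weights are pulled separately by transformers at load time.
ckpt_dir = snapshot_download("anonymous-medical/MedLayEval")
print(ckpt_dir)  # use this path as CKPT in the quick start below
```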
## Quick start

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel

from model import VLMRegressor, ATTRS  # shipped in this repo

BASE = "Qwen/Qwen2.5-VL-3B-Instruct"
CKPT = "."  # this repo, after `huggingface_hub.snapshot_download`
device = "cuda:0"

# Processor from the base model; make sure a pad token exists.
processor = AutoProcessor.from_pretrained(BASE, max_pixels=448 * 448)
if processor.tokenizer.pad_token is None:
    processor.tokenizer.pad_token = processor.tokenizer.eos_token

# Base VLM + LoRA adapters, merged for inference.
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, attn_implementation="sdpa",
)
vlm = PeftModel.from_pretrained(base, CKPT).merge_and_unload()

# Wrap with the regression head and load its weights.
hidden = vlm.config.hidden_size if hasattr(vlm.config, "hidden_size") else vlm.config.text_config.hidden_size
model = VLMRegressor(vlm, hidden).to(device, dtype=torch.bfloat16)
model.head.load_state_dict(torch.load(f"{CKPT}/regression_head.pt", map_location=device))
model.head = model.head.to(device, dtype=torch.float32)
model.eval()

# One (image, expert caption, candidate lay caption) triple.
image = Image.open("example.png").convert("RGB")
expert = "Axial chest CT showing a 1.2 cm spiculated nodule in the right upper lobe ..."
lay = "The scan shows a small spot in the upper part of the right lung that ..."

user_text = f"<expert>{expert[:1500]}</expert>\n<lay>{lay[:1500]}</lay>"
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": user_text},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
inputs = processor(text=[text], images=[image], padding=True, truncation=True,
                   max_length=2048, return_tensors="pt").to(device)

with torch.no_grad():
    scores = model(**inputs).cpu().float().numpy()[0]

print({a: float(s) for a, s in zip(ATTRS, scores)})
print("overall:", float(scores.mean()))
```
`inference_example.py` runs the same flow end-to-end on dummy inputs.
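Since the leaderboard use case is ranking candidate lay captions, a simple extension of the quick start is to score several candidates for the same (image, expert caption) pair and sort by the mean score. The loop below reuses `processor`, `model`, `image`, and `expert` from above; the candidate strings are made up for illustration:

```python
candidates = [
    "The scan shows a small spot in the upper part of the right lung.",
    "CT demonstrates a spiculated RUL nodule measuring 1.2 cm.",  # jargon-heavy; should rank lower on readability
]

results = []
for lay in candidates:
    user_text = f"<expert>{expert[:1500]}</expert>\n<lay>{lay[:1500]}</lay>"
    messages = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": user_text},
    ]}]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
    inputs = processor(text=[text], images=[image], padding=True, truncation=True,
                       max_length=2048, return_tensors="pt").to(device)
    with torch.no_grad():
        s = model(**inputs).cpu().float().numpy()[0]
    results.append((lay, float(s.mean())))

# Highest overall score first.
for lay, overall in sorted(results, key=lambda r: r[1], reverse=True):
    print(f"{overall:.3f}  {lay}")
```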
## Training (high level)

- Base: Qwen/Qwen2.5-VL-3B-Instruct.
- Adapters: LoRA r=16, α=32, dropout=0.05, on `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`.
- Head: `Linear(2048, 256) → GELU → Dropout(0.1) → Linear(256, 5) → Sigmoid` on the attention-mask-pooled last hidden state.
- Loss: MSE against the five attribute scores produced by a larger judge on the MedLayXPlain training partition (sketched after this section).
- Decoding at inference: none; a single forward pass reads the pooled hidden state.
Full data construction, training, and validation details are in the appendix of the submission and in the public MedLayXPlain repository.
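As a rough illustration of the loss, a single distillation step might look like the following, assuming a batch dict of processor outputs plus a `targets` tensor of judge scores in [0, 1]; data loading, judge prompting, and optimizer setup are omitted, and the function name is hypothetical:

```python
import torch.nn.functional as F

def distillation_step(model, batch, optimizer):
    """One MSE distillation step against the larger judge's attribute scores.

    `batch["targets"]` is a (B, 5) tensor of judge scores in [0, 1];
    the remaining entries are processor outputs for (image, expert, lay) triples.
    """
    targets = batch.pop("targets")
    preds = model(**batch)                       # (B, 5) sigmoid outputs
    loss = F.mse_loss(preds, targets.to(preds.dtype))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```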
## Intended use & limitations
- Intended: as an automatic, ground-truth-free metric for ranking lay-language outputs of medical VLMs that already have a matched expert caption available.
- Not intended: as a clinical assessment tool. The five attributes measure agreement with expert text, not real-world clinical correctness or patient outcomes.
- The model is trained on captions paired with images from MedTrinity-25M; out-of-distribution modalities or text styles may degrade scores.
- Calibration: scores are not probabilities. They are bounded to [0, 1] by the final sigmoid, but the ranking, not the absolute value, is what has been validated.
## Citation

```bibtex
@inproceedings{anonymous2026medlayxplain,
  title     = {MedLayXPlain: A Benchmark and Distilled Evaluator for Medical Lay-Language Generation},
  author    = {Anonymous},
  booktitle = {NeurIPS Datasets and Benchmarks},
  year      = {2026}
}
```
## License

- LoRA adapter weights (`adapter_*`), regression head (`regression_head.pt`), `model.py`, and `inference_example.py`: Apache 2.0.
- Tokenizer / processor files are copied from the base model and remain under the Qwen license.
- Use of this checkpoint requires the base model Qwen/Qwen2.5-VL-3B-Instruct, which has its own license. Users are responsible for accepting it separately.