---
library_name: transformers
tags:
- vision-language
- medical
- radiology
- chest-xray
- qwen2.5-vl
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen2.5-VL-7B-Instruct
---

# EvidenceAIResearch/VReason-QwenVL

VReason-QwenVL model checkpoint for chest X-ray visual reasoning and report generation.

## What is included

- Model weights (`safetensors` shards)
- Tokenizer and config files
- `generation_config.json`
- Built-in `model.visual_reason(...)` method available via `trust_remote_code=True`

## Installation

```bash
pip install -r requirements.txt
pip install cxas-vreason
```

If `cxas-vreason` is not yet available in your environment, install it from this repo:

```bash
pip install "git+https://huggingface.co/EvidenceAIResearch/VReason-QwenVL#subdirectory=cxas_vreason"
```

## Quick start (Transformers)

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

repo_id = "EvidenceAIResearch/VReason-QwenVL"

# trust_remote_code=True loads the custom code shipped with the checkpoint.
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,
).eval().cuda()

# Load the frontal chest radiograph as an RGB image.
image = Image.open("frontal.jpg").convert("RGB")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {
                "type": "text",
                "text": "Based on the provided chest radiograph, explain your diagnosis procedure and write a report.",
            },
        ],
    }
]

# Render the chat template to a prompt string, then tokenize text and image together.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[[image]], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=1024)

# The decoded text includes the prompt tokens, and special tokens are kept.
text = processor.batch_decode(output_ids, skip_special_tokens=False)[0]
print(text)
```

## Visual reasoning method

After loading with `trust_remote_code=True`, the model exposes:

- `model.visual_reason(...)`

This method can:

- Produce `reasoning.json` with regions, sub-regions, and extracted reasoning text
- Generate ROI image artifacts for anatomy/pathology tool calls (blur/crop/blurcrop)

Example:

```python
out = model.visual_reason(
    processor=processor,
    image="frontal.jpg",
    generate_roi=True,
    output_dir="./visual_reason_out",
    viz_mode="blurcrop",
)
print(out["report"])
```

Notes:

- `trust_remote_code=True` is required to enable `model.visual_reason(...)`.
- Pass `generate_roi=False` when you only need structured text parsing.

## Limitations

- Intended for research use only.
- Not a medical device; outputs must not be used as sole clinical evidence.
- Performance can vary by data source and imaging protocol.

---

## Citation

```bibtex
@unpublished{ye2026visual,
  title={Visual Reasoning Enables Evidence-Grounded Radiology {AI}},
  author={Ye, Shuchang and Robertson, Harry and Moghadam, Alireza and Shu, Matthew and Harb, Nathan and Li, Jennifer and Mogdil, Aadhar and Raythatha, Jineel and Shen, Yujia and Song, Xinyun and Tan, Xinchen and Fu, Xiaolong and Meng, Mingyuan and Bi, Lei and Yang, Jean YH and Kim, Jinman},
  year={2026},
}
```
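
## Inspecting `reasoning.json` (sketch)

The visual reasoning method described earlier is said to produce a `reasoning.json` file with regions, sub-regions, and extracted reasoning text. This card does not document that file's location or schema, so the following is only a minimal sketch under two assumptions: that the file is written directly into `output_dir`, and that it is ordinary JSON. The helper names `load_reasoning` and `summarize` are hypothetical and not part of the model's API. The sketch reports the top-level shape of the file rather than assuming any particular keys.

```python
import json
from pathlib import Path


def load_reasoning(output_dir):
    """Load reasoning.json from output_dir, or return None if it is absent.

    Assumption: visual_reason(...) writes reasoning.json directly into
    output_dir; the model card does not state the exact location.
    """
    path = Path(output_dir) / "reasoning.json"
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def summarize(data):
    """Describe the top-level shape of parsed JSON without assuming a schema."""
    if isinstance(data, dict):
        # Map each top-level key to the type name of its value.
        return {key: type(value).__name__ for key, value in data.items()}
    if isinstance(data, list):
        return {"items": len(data)}
    return {"value": type(data).__name__}


if __name__ == "__main__":
    # Same output_dir as in the visual_reason example.
    reasoning = load_reasoning("./visual_reason_out")
    if reasoning is None:
        print("reasoning.json not found; check output_dir")
    else:
        print(summarize(reasoning))
```

Listing the top-level keys first is a cautious way to learn the actual structure before writing code that depends on specific fields.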