🩺 Medical Image QA Model: Vision-Language Expert
This multimodal model is fine-tuned for image-based biomedical question answering and captioning on scientific figures from the PMC Open Access subset. It takes a biomedical image and an optional question, then generates an expert-level description or answer.
🧠 Model Architecture
- Base Model: FastVisionModel (e.g., a BLIP, MiniGPT4, or Flamingo-style model)
- Backbone: Vision encoder + LLM (supports apply_chat_template for prompt formatting)
- Trained for Tasks:
  - Biomedical image captioning
  - Image-based question answering
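Chat-template formatting works on a list of role/content messages. The sketch below shows one way to assemble that list for an image plus an optional question. The helper name build_messages is hypothetical, and the message layout is assumed to match the usage example in this card rather than taken from the model's documentation.

```python
from typing import Any, Optional


def build_messages(image: Any, instruction: str, question: Optional[str] = None) -> list:
    """Assemble a single-turn multimodal chat message (hypothetical helper)."""
    content = [
        {"type": "image", "image": image},
        {"type": "text", "text": instruction},
    ]
    # The question is optional: skip empty or whitespace-only input.
    if question and question.strip():
        content.append({"type": "text", "text": question.strip()})
    return [{"role": "user", "content": content}]
```

The returned list can then be passed to tokenizer.apply_chat_template(messages, add_generation_prompt=True).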
🧬 Dataset
- Name: axiong/pmc_oa_demo
- Samples: 100 (demo)
- Fields:
  - image: Biomedical figure (from scientific paper)
  - caption: Expert-written caption
  - question: (optional) User query about the image
  - answer: (optional) Expert response
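Since question and answer are optional, one way to use each record for either task is to fall back to the caption when they are missing. The sketch below assumes records behave like plain dictionaries with the field names listed above. The helper to_prompt_target and the default instruction string are hypothetical and not part of the dataset or the training code.

```python
# Assumed default prompt for records without a question (hypothetical).
DEFAULT_INSTRUCTION = "Describe accurately what you see in this image."


def to_prompt_target(sample: dict) -> tuple:
    """Return a (prompt, target) text pair for one record (hypothetical helper).

    Uses question/answer when both are present and non-empty (QA task),
    otherwise falls back to the instruction/caption pair (captioning task).
    """
    question = (sample.get("question") or "").strip()
    answer = (sample.get("answer") or "").strip()
    if question and answer:
        return question, answer
    return DEFAULT_INSTRUCTION, (sample.get("caption") or "").strip()
```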
🧪 Example Usage
Visual Inference with Instruction & Optional Question
from unsloth import FastVisionModel  # FastVisionModel is provided by Unsloth
from transformers import TextStreamer
import matplotlib.pyplot as plt

# Assumes `model`, `tokenizer`, and `dataset` are already loaded.
# Switch the model to inference mode
FastVisionModel.for_inference(model)

# Pick one sample and read its image and reference caption
sample = dataset[10]
image = sample["image"]
caption = sample.get("caption", "")

# Display the image
plt.imshow(image)
plt.axis("off")
plt.title("Input Image")
plt.show()

instruction = "You are an expert Doctor. Describe accurately what you see in this image."
question = input("Please enter your question about the image (or press Enter to skip): ").strip()

# Build messages for the chat template
user_content = [
    {"type": "image", "image": image},
    {"type": "text", "text": instruction},
]
if question:
    # Append the question only when the user entered one
    user_content.append({"type": "text", "text": question})
messages = [{"role": "user", "content": user_content}]

# Format the prompt, then tokenize it together with the image
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

# Stream the generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=128,
    use_cache=True,
    # Sampling settings; they take effect only when sampling is enabled
    temperature=1.5,
    min_p=0.1,
)

# Optional: display true caption for comparison
print("\nGround Truth Caption:\n", caption)
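Printing the ground-truth caption only allows a visual comparison. For a rough numeric check, a token-overlap F1 between the generated text and the caption can be computed. This is a sketch of a generic metric, not an evaluation used by the model's authors, and the function name token_f1 is hypothetical. It also assumes the generated text has been captured as a string (for example by decoding the output of model.generate) rather than only streamed.

```python
import re
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two strings (lowercased, alphanumeric tokens)."""
    pred_tokens = re.findall(r"[a-z0-9]+", prediction.lower())
    ref_tokens = re.findall(r"[a-z0-9]+", reference.lower())
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection: a shared token counts min(count_pred, count_ref) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A score near 1.0 means the two texts share most of their words; it says nothing about clinical correctness, so it is best treated as a quick sanity check.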