---
tags:
- vision-language
- multimodal
- image-question-answering
- biomedical
- transformers
- huggingface
- fastvision
license: openrail
language:
- en
datasets:
- axiong/pmc_oa_demo
library_name: transformers
model-index:
- name: Medical Image QA Model (PMC-OA)
  results: []
---
# 🩺 Medical Image QA Model — Vision-Language Expert

This is a multimodal model fine-tuned for **image-based biomedical question answering and captioning**, trained on scientific figures from the [PMC Open Access subset](https://huggingface.co/datasets/axiong/pmc_oa_demo). The model takes a biomedical image and an optional question, then generates an expert-level description or answer.
---

## 🧠 Model Architecture

- **Base Model:** `FastVisionModel` (e.g., a BLIP-, MiniGPT-4-, or Flamingo-style model)
- **Backbone:** Vision encoder + LLM (supports `apply_chat_template` for prompt formatting)
- **Trained for Tasks:**
  - Biomedical image captioning
  - Image-based question answering
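Because the backbone relies on `apply_chat_template`, prompts are built as chat messages whose `content` list mixes image and text parts. A minimal sketch of that message shape (the `build_messages` helper is illustrative, not part of the model's API; the `"type"`/`"image"`/`"text"` keys follow the common multimodal chat convention and may differ for other backbones):

```python
# Build a multimodal chat message: one image part plus one or more text parts.
# The "image" value would normally be a PIL.Image; a placeholder string is used here.
def build_messages(image, instruction, question=None):
    content = [
        {"type": "image", "image": image},
        {"type": "text", "text": instruction},
    ]
    if question:  # the question part is optional
        content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]

messages = build_messages("<PIL image>", "Describe this figure.", "Is there a lesion?")
```

The resulting `messages` list is what gets passed to `tokenizer.apply_chat_template(...)` in the usage example below.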
---
## 🧬 Dataset

- **Name:** [axiong/pmc_oa_demo](https://huggingface.co/datasets/axiong/pmc_oa_demo)
- **Samples:** 100 samples (demo)
- **Fields:**
  - `image`: Biomedical figure (from a scientific paper)
  - `caption`: Expert-written caption
  - `question`: (optional) User query about the image
  - `answer`: (optional) Expert response
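Since `question` and `answer` are optional, each row can serve either as a QA example or as a captioning example. A small sketch of that fallback logic (the field names come from the dataset card above; the `row_to_example` helper and the default instruction string are illustrative):

```python
# Map one dataset row to a (prompt, target) pair:
# rows with both question and answer become QA examples,
# the rest fall back to captioning with a fixed instruction.
def row_to_example(row):
    if row.get("question") and row.get("answer"):
        return row["question"], row["answer"]
    return "Describe accurately what you see in this image.", row["caption"]

# Example rows (images omitted for brevity)
qa_row = {"caption": "CT scan.", "question": "Which organ is shown?", "answer": "The liver."}
cap_row = {"caption": "Histology slide of liver tissue."}
```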
---
## 🧪 Example Usage

### 🔍 Visual Inference with Instruction & Optional Question
```python
from unsloth import FastVisionModel
from transformers import TextStreamer
import matplotlib.pyplot as plt

# Assumes `model` and `tokenizer` were loaded via FastVisionModel.from_pretrained(...)
# and `dataset` via datasets.load_dataset("axiong/pmc_oa_demo", split="train").
FastVisionModel.for_inference(model)  # switch the model to inference mode

sample = dataset[10]
image = sample["image"]
caption = sample.get("caption", "")

# Display the input image
plt.imshow(image)
plt.axis("off")
plt.title("Input Image")
plt.show()

instruction = "You are an expert Doctor. Describe accurately what you see in this image."
question = input("Please enter your question about the image (or press Enter to skip): ").strip()

# Build the multimodal message for the chat template
user_content = [
    {"type": "image", "image": image},
    {"type": "text", "text": instruction},
]
if question:
    user_content.append({"type": "text", "text": question})

messages = [{"role": "user", "content": user_content}]
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Tokenize the image and prompt together, then stream the generated tokens
inputs = tokenizer(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")
streamer = TextStreamer(tokenizer, skip_prompt=True)

_ = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=128,
    use_cache=True,
    temperature=1.5,
    min_p=0.1,
)

# Optional: display the ground-truth caption for comparison
print("\nGround Truth Caption:\n", caption)
```