Sai1290 committed · Commit 2c1f736 · verified · 1 Parent(s): fd476a3

Update README.md

  1. README.md +87 -11
README.md CHANGED
@@ -1,21 +1,97 @@
  ---
- base_model: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit
  tags:
- - text-generation-inference
  - transformers
- - unsloth
- - mllama
- license: apache-2.0
  language:
  - en
  ---

- # Uploaded finetuned model

- - **Developed by:** Sai1290
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit

- This mllama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 
  ---
  tags:
+ - vision-language
+ - multimodal
+ - image-question-answering
+ - biomedical
  - transformers
+ - huggingface
+ - fastvision
+ license: openrail
  language:
  - en
+ datasets:
+ - axiong/pmc_oa_demo
+ library_name: transformers
+ model-index:
+ - name: Medical Image QA Model (PMC-OA)
+   results: []
  ---

+ # 🩺 Medical Image QA Model: Vision-Language Expert

+ This is a multimodal model fine-tuned for **image-based biomedical question answering and captioning** on scientific figures from the [PMC Open Access subset](https://huggingface.co/datasets/axiong/pmc_oa_demo). Given a biomedical image and an optional question, it generates a description of the image or an answer to the question.

+ ---
+
+ ## 🧠 Model Architecture
+
+ - **Base Model:** loaded through Unsloth's `FastVisionModel`; the previous revision of this README listed `unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit` (an `mllama` model) as the fine-tuning base
+ - **Backbone:** Vision encoder + LLM (supports `apply_chat_template` for prompt formatting)
+ - **Trained for Tasks:**
+   - Biomedical image captioning
+   - Image-based question answering
+
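The chat-template input mentioned above is a list of role/content messages. The following is a minimal sketch of building that structure, assuming the message layout used in this README's usage example. `build_messages` is a hypothetical helper written for illustration and is not part of Transformers or Unsloth.

```python
from typing import Any, Dict, List, Optional


def build_messages(
    image: Any, instruction: str, question: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Build a single-turn user message: one image part plus one or two text parts."""
    content: List[Dict[str, Any]] = [
        {"type": "image", "image": image},
        {"type": "text", "text": instruction},
    ]
    # Append the question only when the user actually supplied one.
    if question and question.strip():
        content.append({"type": "text", "text": question.strip()})
    return [{"role": "user", "content": content}]
```

The returned list is what would be passed to `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`; keeping the question as a separate text part mirrors the usage example rather than any requirement of the template.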
+ ---
+
+ ## 🧬 Dataset
+
+ - **Name:** [axiong/pmc_oa_demo](https://huggingface.co/datasets/axiong/pmc_oa_demo)
+ - **Samples:** 100 (demo subset)
+ - **Fields:**
+   - `image`: Biomedical figure (from a scientific paper)
+   - `caption`: Expert-written caption
+   - `question`: (optional) User query about the image
+   - `answer`: (optional) Expert response
+
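Because `question` and `answer` are optional, code that reads rows has to tolerate their absence. The sketch below shows one way to normalize a row, assuming only the four field names listed above. `FigureSample` and `sample_from_row` are hypothetical names introduced for illustration, not part of the dataset or any library.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class FigureSample:
    """One dataset row; `question` and `answer` may be absent."""

    image: Any
    caption: str
    question: Optional[str] = None
    answer: Optional[str] = None

    @property
    def is_qa(self) -> bool:
        # A row counts as a QA example only when both optional fields are filled.
        return bool(self.question) and bool(self.answer)


def sample_from_row(row: Mapping[str, Any]) -> FigureSample:
    """Convert a raw row into a FigureSample, rejecting rows without an image."""
    if row.get("image") is None:
        raise ValueError("row has no image")
    return FigureSample(
        image=row["image"],
        caption=(row.get("caption") or "").strip(),
        question=row.get("question") or None,
        answer=row.get("answer") or None,
    )
```

Rows without a question fall back to captioning (`is_qa` is false), which matches the "optional question" behavior described in this README.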
+ ---
+
+ ## 🧪 Example Usage
+
+ ### 🔍 Visual Inference with Instruction & Optional Question
+
+ ```python
+ from transformers import TextStreamer
+ from unsloth import FastVisionModel
+ import matplotlib.pyplot as plt
+
+ # Assumes `model` and `tokenizer` were already loaded with FastVisionModel
+ # and that `dataset` holds the loaded axiong/pmc_oa_demo samples.
+
+ # Switch the model to inference mode
+ FastVisionModel.for_inference(model)
+
+ sample = dataset[10]
+ image = sample["image"]
+ caption = sample.get("caption", "")
+
+ # Display the image
+ plt.imshow(image)
+ plt.axis("off")
+ plt.title("Input Image")
+ plt.show()
+
+ instruction = "You are an expert Doctor. Describe accurately what you see in this image."
+ question = input("Please enter your question about the image (or press Enter to skip): ").strip()
+
+ # Build messages for the chat template
+ user_content = [
+     {"type": "image", "image": image},
+     {"type": "text", "text": instruction},
+ ]
+ if question:
+     user_content.append({"type": "text", "text": question})
+
+ messages = [{"role": "user", "content": user_content}]
+ input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
+
+ # The chat template already inserts special tokens, so do not add them again
+ inputs = tokenizer(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")
+ streamer = TextStreamer(tokenizer, skip_prompt=True)
+
+ # Stream the generated answer to stdout
+ _ = model.generate(
+     **inputs,
+     streamer=streamer,
+     max_new_tokens=128,
+     use_cache=True,
+     temperature=1.5,
+     min_p=0.1,
+ )

+ # Optional: display true caption for comparison
+ print("\nGround Truth Caption:\n", caption)
+ ```
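The usage example prints the ground-truth caption so it can be compared with the generated text by eye. For a rough numeric comparison, a token-overlap F1 score is one simple option. This is an illustrative stand-in rather than a metric the README reports or the author used, and `token_f1` is a hypothetical helper.

```python
import re
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two strings, using lowercase alphanumeric tokens."""
    pred = re.findall(r"[a-z0-9]+", prediction.lower())
    ref = re.findall(r"[a-z0-9]+", reference.lower())
    if not pred or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

To use it with the example, the generated text would need to be captured as a string (for instance by decoding the output of `model.generate` instead of only streaming it) and passed in together with `caption`. Overlap scores ignore meaning, so they are only a coarse sanity check for biomedical captions.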