Commit cbb41f5 (verified) by xenosxiny · parent: 565e667
Update README.md · 1 file changed: README.md (+77 -1)

pipeline_tag: image-to-text
tags:
- medical
- retinal
---

# RetinalGPT: Large Language-and-Vision Assistant for Retinal Health 👁️

**RetinalGPT** is a specialized multimodal vision-language model (VLM) built on the **LLaVA-v1.5** architecture. It is engineered for the high-precision domain of **ophthalmology**, with a focus on interpreting retinal fundus photographs and Optical Coherence Tomography (OCT) scans.

---

## 📌 Model Summary

RetinalGPT bridges the gap between general-purpose VLMs and specialized ophthalmic diagnostics. Fine-tuned on a curated corpus of retinal image-text pairs, the model demonstrates advanced capabilities in identifying pathologies such as Diabetic Retinopathy (DR), Glaucoma, and Age-related Macular Degeneration (AMD).

- **Base LLM:** Llama-7B
- **Vision Tower:** CLIP-ViT-L-14-336px
- **Connector:** MLP projection layer
- **Domain:** Ophthalmology / Retinal Imaging

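To illustrate how these components fit together: in the LLaVA-v1.5 design the connector is a small two-layer MLP that maps CLIP patch features into the LLM's embedding space. The sketch below is illustrative only (not this model's actual weights); the dimensions assume CLIP ViT-L/14 at 336px (1024-d features, 576 patches) and a 7B Llama (4096-d hidden size).

```python
import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    """Illustrative LLaVA-v1.5-style connector: Linear -> GELU -> Linear."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim) from the CLIP vision tower
        return self.proj(patch_features)

projector = MLPProjector()
# A 336px image with 14px patches yields (336/14)^2 = 576 patch tokens
dummy = torch.randn(1, 576, 1024)
print(projector(dummy).shape)  # torch.Size([1, 576, 4096])
```

The projected tokens are simply spliced into the LLM's input sequence in place of the image placeholder token.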
---

## 🚀 Key Capabilities

RetinalGPT is trained to perform complex visual reasoning tasks, including:

* **Automated Screening:** Grading Diabetic Retinopathy severity (Stage 0-4).
* **Lesion Characterization:** Identifying and describing microaneurysms, hemorrhages, and exudates.
* **Anatomical Mapping:** Precise description of the optic disc, cup-to-disc ratio, and foveal reflex.
* **Clinical QA:** Engaging in multi-turn dialogues about specific clinical findings in a retinal scan.

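For reference, the Stage 0-4 grading above corresponds to the standard International Clinical Diabetic Retinopathy (ICDR) severity scale. A minimal helper for mapping numeric grades to their clinical labels (this function is hypothetical, not part of the model's API):

```python
# ICDR severity scale for the Stage 0-4 grading used above
DR_GRADES = {
    0: "No apparent retinopathy",
    1: "Mild non-proliferative DR",
    2: "Moderate non-proliferative DR",
    3: "Severe non-proliferative DR",
    4: "Proliferative DR",
}

def grade_to_label(grade: int) -> str:
    """Map a numeric DR grade (0-4) to its clinical description."""
    if grade not in DR_GRADES:
        raise ValueError(f"DR grade must be 0-4, got {grade}")
    return DR_GRADES[grade]

print(grade_to_label(2))  # Moderate non-proliferative DR
```
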
---

## 💻 How to Use

RetinalGPT follows the standard LLaVA inference pipeline; you will need the `llava` library installed.

### Installation

```bash
pip install git+https://github.com/haotian-liu/LLaVA.git
```

### Python Inference

```python
from llava.model.builder import load_pretrained_model
from llava.mm_utils import tokenizer_image_token, get_model_name_from_path
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from PIL import Image
import torch

model_path = "your-username/retinalgpt"
model_name = get_model_name_from_path(model_path)

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=model_name
)

# Prepare the image (the processor expects RGB input)
image = Image.open("fundus_sample.jpg").convert("RGB")
image_tensor = image_processor.preprocess(image, return_tensors='pt')['pixel_values'].half().cuda()

# The prompt must contain the image placeholder token
prompt = DEFAULT_IMAGE_TOKEN + "\nCan you describe this image?"

input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors='pt').unsqueeze(0).cuda()

# Generate a response
with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        do_sample=True,
        temperature=0.2,
        max_new_tokens=512,
        use_cache=True
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
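A bare question works for a quick test, but LLaVA-v1.5 checkpoints are typically trained with the Vicuna-v1 conversation template (normally built via `conv_templates` in `llava.conversation`), so wrapping the question in that format usually improves responses. A minimal sketch of the format, assuming the standard Vicuna system prompt:

```python
# Assumed Vicuna-v1 style template; "<image>" is replaced by IMAGE_TOKEN_INDEX
# during tokenization with tokenizer_image_token.
SYSTEM = ("A chat between a curious user and an artificial intelligence assistant. "
          "The assistant gives helpful, detailed, and polite answers to the user's questions.")

def build_prompt(question: str) -> str:
    """Wrap a single-turn question in a Vicuna-v1 style conversation prompt."""
    return f"{SYSTEM} USER: <image>\n{question} ASSISTANT:"

print(build_prompt("Can you describe this image?"))
```

In practice, prefer `conv_templates["llava_v1"].copy()` from the `llava` library so the template exactly matches the one used at training time.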