---
license: cc-by-nc-4.0
language:
- en
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-text-to-text
tags:
- Chest-Xray
- CXR
- Reasoning
- VQA
- Report
- Grounding
---

<div align="center">
<img src="https://github.com/YBZh/CheXOne/raw/main/asset/chexone_logo1.png" width="600" alt="CheXOne Logo">

<p align="center">
📝 <a href="https://huggingface.co/StanfordAIMI/CheXOne" target="_blank">Paper</a> • 🤗 <a href="https://huggingface.co/StanfordAIMI/CheXOne" target="_blank">Hugging Face</a> • 🧩 <a href="https://github.com/YBZh/CheXOne" target="_blank">Github</a> • 🪄 <a href="https://github.com/YBZh/CheXOne" target="_blank">Project</a>
</p>
</div>

## ✨ Key Features

* **Reasoning Capability**: Produces explicit reasoning traces alongside final answers.

* **Multi-Task Support**: Covers Visual Question Answering (VQA), Report Generation, and Visual Grounding.

* **Resident-Level Report Drafting**: Matches or outperforms resident-drafted reports in 50% of cases.

* **Two Inference Modes**:
  * **Reasoning Mode**: Higher performance, with explicit reasoning traces.
  * **Instruct Mode**: Faster inference, without reasoning traces.

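The two inference modes differ only in a fixed suffix appended to the user prompt. As a minimal sketch (the helper name and example URL below are hypothetical, not part of the CheXOne release), switching modes can be reduced to a single flag:

```python
def build_message(image_url: str, question: str, reasoning: bool = True) -> list:
    """Build a CheXOne chat message; Reasoning Mode only appends a step-by-step suffix."""
    suffix = " Please reason step by step, and put your final answer within \\boxed{}."
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_url},
                {"type": "text", "text": question + (suffix if reasoning else "")},
            ],
        }
    ]

# Hypothetical usage: same question, two modes
msg = build_message("https://example.com/cxr.jpg", "Write an example findings section for the CXR.")
print(msg[0]["content"][1]["text"])
```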
## 🎬 Get Started

CheXOne is post-trained from Qwen2.5-VL-3B-Instruct, whose architecture is supported in the latest Hugging Face `transformers`. We advise installing `transformers` from source:

```shell
pip install git+https://github.com/huggingface/transformers accelerate
```

```python
import torch  # needed if enabling the bfloat16/flash_attention_2 variant below
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Default: load the model on the available device(s)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "StanfordAIMI/CheXOne", torch_dtype="auto", device_map="auto"
)

# We recommend enabling flash_attention_2 for better acceleration and memory
# saving, especially in multi-image scenarios.
# model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
#     "StanfordAIMI/CheXOne",
#     torch_dtype=torch.bfloat16,
#     attn_implementation="flash_attention_2",
#     device_map="auto",
# )

# Default processor
processor = AutoProcessor.from_pretrained("StanfordAIMI/CheXOne")

# The default range for the number of visual tokens per image is 4-16384.
# We recommend setting max_pixels=512*512 to align with the training setting.
# min_pixels = 256*28*28
# max_pixels = 512*512
# processor = AutoProcessor.from_pretrained("StanfordAIMI/CheXOne", min_pixels=min_pixels, max_pixels=max_pixels)

# Inference mode: Reasoning
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://github.com/YBZh/CheXOne/raw/main/asset/cxr.jpg",
            },
            {"type": "text", "text": "Write an example findings section for the CXR. Please reason step by step, and put your final answer within \\boxed{}."},
        ],
    }
]

# Inference mode: Instruct
# messages = [
#     {
#         "role": "user",
#         "content": [
#             {
#                 "type": "image",
#                 "image": "https://github.com/YBZh/CheXOne/raw/main/asset/cxr.jpg",
#             },
#             {"type": "text", "text": "Write an example findings section for the CXR."},
#         ],
#     }
# ]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference: generate the output and strip the prompt tokens before decoding
generated_ids = model.generate(**inputs, max_new_tokens=1024)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
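In Reasoning Mode the final answer arrives inside a `\boxed{...}` span at the end of the trace. A minimal, hypothetical helper (not part of the CheXOne release) to pull it out of `output_text[0]`:

```python
def extract_boxed_answer(text: str) -> str:
    """Return the content of the last \\boxed{...} span in a model response.

    Falls back to the full text when no \\boxed{} span is found; a simple
    brace-matching scan tolerates nested braces inside the answer.
    """
    start = text.rfind("\\boxed{")
    if start == -1:
        return text.strip()
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
        out.append(ch)
        i += 1
    return "".join(out).strip()

sample = "Step 1: ... Step 2: ... \\boxed{No acute cardiopulmonary abnormality.}"
print(extract_boxed_answer(sample))  # → No acute cardiopulmonary abnormality.
```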

<details>
<summary>Multi-image inference</summary>

```python
# Messages containing multiple images and a text query
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://github.com/YBZh/CheXOne/raw/main/asset/cxr.jpg"},
            {"type": "image", "image": "https://github.com/YBZh/CheXOne/raw/main/asset/cxr_lateral.jpg"},
            {"type": "text", "text": "Write an example findings section for the CXR. Please reason step by step, and put your final answer within \\boxed{}."},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference
generated_ids = model.generate(**inputs, max_new_tokens=1024)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
</details>

## ✏️ Citation

```bibtex
@article{xx,
  title={xx},
  author={Cxxx},
  journal={xx},
  url={xx},
  year={xx}
}
```