DanielJi committed on
Commit 4387770 · verified · 1 Parent(s): 176f945

Update README.md

---
license: apache-2.0
---

# ColonR1: the first R1-styled model tailored for reasoning in colonoscopy tasks

📖 [Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning](https://arxiv.org/abs/2512.03667)

🏠 For more details, see our project page: [https://github.com/ai4colonoscopy/Colon-X](https://github.com/ai4colonoscopy/Colon-X)

<p align="center">
    <img src="./assets/ColonR1.jpg"/> <br />
    <em>
    Figure 1: Details of our colonoscopy-specific reasoning model, ColonR1.
    </em>
</p>

# Quick start

Below is a code snippet to help you quickly try out our ColonR1 model using Hugging Face Transformers. For convenience, we manually combined some configuration and code files. Note that this is only a quick-start script; we recommend working from the source code in our repository to explore further.

- Before running the snippet, install the following minimum dependencies:

```shell
conda create -n quickstart python=3.10
conda activate quickstart
pip install torch transformers accelerate pillow
```

- Then run `python ColonR1/quickstart.py`; the full script is shown below.

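Before launching the script, it can help to confirm that the dependencies above actually resolve in the active environment. The check below is a convenience we suggest, not part of the release (note that the `pillow` package is imported as `PIL`):

```python
import importlib.util

# Quick-start dependencies; "pillow" installs the importable package "PIL".
REQUIRED = ["torch", "transformers", "accelerate", "PIL"]

for pkg in REQUIRED:
    # find_spec returns None when the package is not importable.
    status = "ok" if importlib.util.find_spec(pkg) is not None else "MISSING"
    print(f"{pkg}: {status}")
```

If any line prints `MISSING`, rerun the `pip install` step above inside the `quickstart` environment.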
```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from PIL import Image
import warnings
import os

warnings.filterwarnings('ignore')
device = "cuda" if torch.cuda.is_available() else "cpu"

MODEL_PATH = "ai4colonoscopy/ColonR1"
IMAGE_PATH = "assets/example.jpg"
QUESTION = "Does the image contain a polyp? Answer me with Yes or No."

print(f"[Info] Loading model from {MODEL_PATH}...")
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    # "flash_attention_2" requires the flash-attn package; if it is not
    # installed, remove this argument to fall back to the default attention.
    attn_implementation="flash_attention_2",
    device_map="auto",
)
model.eval()

processor = AutoProcessor.from_pretrained(MODEL_PATH)

if not os.path.exists(IMAGE_PATH):
    raise FileNotFoundError(f"Image not found at {IMAGE_PATH}. Please provide a valid image path.")

image = Image.open(IMAGE_PATH).convert("RGB")

# ColonR1 is prompted to emit its reasoning in <think> tags and its final
# answer in <answer> tags, so this instruction is appended to every question.
TASK_SUFFIX = (
    "Your task: 1. First, Think through the question step by step, enclose your reasoning process "
    "in <think>...</think> tags. 2. Then provide the correct answer inside <answer>...</answer> tags. "
    "3. No extra information or text outside of these tags."
)

final_question = f"{QUESTION}\n{TASK_SUFFIX}"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": IMAGE_PATH},
            {"type": "text", "text": final_question},
        ],
    }
]

print("[Info] Processing inputs...")
text_prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(
    text=[text_prompt],
    images=[image],
    padding=True,
    return_tensors="pt",
).to(device)

print("[Info] Generating response...")
with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=1024,
        do_sample=False,
    )

# Strip the prompt tokens so that only the newly generated text is decoded.
generated_ids_trimmed = generated_ids[:, inputs.input_ids.shape[1]:]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True)[0]

print(output_text)
```
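
Since the prompt instructs the model to wrap its reasoning in `<think>...</think>` and its final answer in `<answer>...</answer>`, the raw output can be split into the two parts with a small helper. The function below is our illustrative suggestion, not part of the released code:

```python
import re

def parse_response(text):
    """Split ColonR1's tagged output into (reasoning, answer).

    Either element is None when the corresponding tag is absent.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )

# Example with a hypothetical model response:
demo = "<think>The mucosa shows a raised lesion.</think><answer>Yes</answer>"
print(parse_response(demo))  # → ('The mucosa shows a raised lesion.', 'Yes')
```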

# Reference

Feel free to cite the Colon-X project if you find it useful for your work:

```bibtex
@article{ji2025colonx,
    title={Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning},
    author={Ji, Ge-Peng and Liu, Jingyi and Fan, Deng-Ping and Barnes, Nick},
    journal={arXiv preprint arXiv:2512.03667},
    year={2025}
}
```

# License

This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of those licenses. The content of this project itself is licensed under the Apache License 2.0.