JingyiLiu committed · Commit aefc92d · verified · 1 Parent(s): 667d38c

Upload README.md

Files changed (1)
  1. README.md +104 -0
README.md CHANGED
@@ -1,3 +1,107 @@
  ---
  license: apache-2.0
  ---
+
+ # ColonR1
+
+ <p align="center">
+ <img src="./assets/ColonR1.jpg"/> <br />
+ <em>
+ Figure 1: Details of our colonoscopy-specific reasoning model, ColonR1.
+ </em>
+ </p>
+
+
+ 📖 [Paper](https://arxiv.org/abs/2512.03667) | 🏠 [Home](https://github.com/ai4colonoscopy/Colon-X)
+
+
+ # Quick start
+
+ Below is a code snippet to help you quickly try out our ColonR1 model with Hugging Face Transformers. For convenience, we manually combined some configuration and code files. Note that this is only a quick demo; we recommend using the full source code to explore further.
+
+ - Before running the snippet, install the following minimum dependencies.
+
+ ```shell
+ conda create -n quickstart python=3.10
+ conda activate quickstart
+ pip install torch transformers accelerate pillow
+ ```
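
Before loading the model, it can help to confirm that the packages from the install step are actually present in the active environment. The following is a minimal sketch using only the Python standard library; `installed_versions` and `REQUIRED` are hypothetical names introduced here for illustration and are not part of the ColonR1 code.

```python
from importlib import metadata

# Distribution names taken from the pip command above.
REQUIRED = ("torch", "transformers", "accelerate", "pillow")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as missing can be installed with the same `pip install` command shown above.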
+
+ - Then run the snippet below with `python ColonR1/quickstart.py`.
+
+ ```python
+ import torch
+ from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
+ from transformers.utils import is_flash_attn_2_available
+ from PIL import Image
+ import warnings
+ import os
+
+ warnings.filterwarnings("ignore")
+
+ MODEL_PATH = "ai4colonoscopy/ColonR1"
+ IMAGE_PATH = "assets/example.jpg"
+ QUESTION = "Does the image contain a polyp? Answer me with Yes or No."
+
+ print(f"[Info] Loading model from {MODEL_PATH}...")
+ model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
+     MODEL_PATH,
+     torch_dtype=torch.bfloat16,
+     # FlashAttention-2 needs a CUDA GPU and the separate flash-attn package; otherwise use PyTorch SDPA.
+     attn_implementation="flash_attention_2" if is_flash_attn_2_available() else "sdpa",
+     device_map="auto",
+ )
+ model.eval()
+
+ processor = AutoProcessor.from_pretrained(MODEL_PATH)
+
+ if not os.path.exists(IMAGE_PATH):
+     raise FileNotFoundError(f"Image not found at {IMAGE_PATH}. Please provide a valid image path.")
+
+ image = Image.open(IMAGE_PATH).convert("RGB")
+
+ TASK_SUFFIX = (
+     "Your task: 1. First, Think through the question step by step, enclose your reasoning process "
+     "in <think>...</think> tags. 2. Then provide the correct answer inside <answer>...</answer> tags. "
+     "3. No extra information or text outside of these tags."
+ )
+
+ final_question = f"{QUESTION}\n{TASK_SUFFIX}"
+
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": IMAGE_PATH},
+             {"type": "text", "text": final_question},
+         ],
+     }
+ ]
+
+ print("[Info] Processing inputs...")
+ text_prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+ inputs = processor(
+     text=[text_prompt],
+     images=[image],
+     padding=True,
+     return_tensors="pt",
+ ).to(model.device)  # inputs must sit on the device chosen by device_map="auto"
+
+ print("[Info] Generating response...")
+ with torch.no_grad():
+     generated_ids = model.generate(
+         **inputs,
+         max_new_tokens=1024,
+         do_sample=False,  # greedy decoding
+     )
+
+ generated_ids_trimmed = generated_ids[:, inputs.input_ids.shape[1]:]  # drop the prompt tokens
+ output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True)[0]
+
+ print(output_text)
+ ```
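
The prompt in the snippet asks the model to put its reasoning inside `<think>...</think>` and its final answer inside `<answer>...</answer>`. The sketch below shows one way the printed `output_text` could be split into those two parts, assuming the model follows that format. `parse_response` is a hypothetical helper introduced here for illustration, not part of the ColonR1 code, and the sample string is made up.

```python
import re


def parse_response(output_text: str) -> dict:
    """Split a response into its <think> reasoning and <answer> parts.

    Assumption: if no <answer> tag is found, the whole stripped text is
    returned as the answer so the caller still gets something usable.
    """
    think = re.search(r"<think>(.*?)</think>", output_text, flags=re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", output_text, flags=re.DOTALL)
    return {
        "reasoning": think.group(1).strip() if think else None,
        "answer": answer.group(1).strip() if answer else output_text.strip(),
    }


if __name__ == "__main__":
    # Made-up response text, used only to exercise the parser.
    sample = "<think>\nA raised lesion is visible.\n</think>\n<answer>Yes</answer>"
    parsed = parse_response(sample)
    print("Reasoning:", parsed["reasoning"])
    print("Answer:", parsed["answer"])
```

In the quick-start script, this would be called as `parse_response(output_text)` after generation.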
+
+ # License
+ This project uses certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of those licenses.
+ The content of this project itself is licensed under the Apache License 2.0.