Yuhao committed on
Commit 6bfad80 · 1 Parent(s): 394cd7d

Unify checkpoint path and environment docs
README.md CHANGED

@@ -36,6 +36,7 @@ See [LICENSE](LICENSE) for details.
 ```text
 SkinGPT-R1/
 ├── checkpoints/
+├── environment.yml
 ├── inference/
 │   ├── full_precision/
 │   └── int4_quantized/
@@ -43,10 +44,9 @@ SkinGPT-R1/
 └── README.md
 ```

-Checkpoint paths:
+Model weights directory:

-- Full precision: `./checkpoints/full_precision`
-- INT4 quantized: `./checkpoints/int4`
+- `./checkpoints`

 ## Highlights

@@ -57,12 +57,65 @@ Checkpoint paths:

 ## Install

+`environment.yml` is a Conda environment definition file. It captures the Python version and the package versions we use, so other users can recreate a working environment from scratch with one command.
+
+Recommended from scratch:
+
+```bash
+cd SkinGPT-R1
+conda env create -f environment.yml
+conda activate skingpt-r1
+```
+
+Manual setup:
+
 ```bash
-conda create -n skingpt-r1 python=3.10 -y
+cd SkinGPT-R1
+conda create -n skingpt-r1 python=3.10.20 -y
 conda activate skingpt-r1
 pip install -r requirements.txt
 ```

+This repo is currently aligned to the maintainers' working environment:
+
+- `torch==2.10.0`
+- `torchvision==0.25.0`
+- `transformers==5.3.0`
+- `qwen-vl-utils==0.0.14`
+
+For RTX 50 series, start with the default `sdpa` path and do not install `flash-attn`
+unless you have already verified that your CUDA stack supports it.
+
+## Quick Start
+
+1. Clone the repository and enter it.
+
+```bash
+git clone <your-repo-url>
+cd SkinGPT-R1
+```
+
+2. Create the environment.
+
+```bash
+conda env create -f environment.yml
+conda activate skingpt-r1
+```
+
+3. Put model weights under:
+
+```text
+./checkpoints
+```
+
+4. Prepare a test image, for example:
+
+```text
+./test_images/lesion.jpg
+```
+
+5. Run one of the inference commands below.
+
 ## Attention Backend Notes

 This repo uses two attention acceleration paths:
@@ -77,9 +130,9 @@ Recommended choice:

 Practical notes:

-- The current repo pins `torch==2.4.0`, and SDPA is already built into PyTorch in this version.
+- The current repo pins `torch==2.10.0`, and SDPA is already built into PyTorch in this version.
 - FlashAttention's official README currently lists Ampere, Ada, and Hopper support for FlashAttention-2. It does not list RTX 50 / Blackwell consumer GPUs in that section, so this repo defaults to `sdpa` for that path.
-- PyTorch 2.5 added a newer cuDNN SDPA backend for H100-class or newer GPUs, but this repo is pinned to PyTorch 2.4, so you should not assume those 2.5-specific gains here.
+- Newer PyTorch releases continue improving SDPA, and this repo is already on a modern PyTorch stack. Even so, RTX 50 series should still default to `sdpa` unless `flash-attn` has been explicitly validated in your environment.

 If you are on an RTX 5090 and `flash-attn` is unavailable or unstable in your environment, use the INT4 path in this repo, which is already configured with `attn_implementation="sdpa"`.

@@ -93,6 +146,12 @@ Single image:
 bash inference/full_precision/run_infer.sh --image ./test_images/lesion.jpg
 ```

+If you are on a multi-GPU server and want to select one GPU:
+
+```bash
+CUDA_VISIBLE_DEVICES=0 bash inference/full_precision/run_infer.sh --image ./test_images/lesion.jpg
+```
+
 Multi-turn chat:

 ```bash
@@ -115,6 +174,12 @@ Single image:
 bash inference/int4_quantized/run_infer.sh --image_path ./test_images/lesion.jpg
 ```

+If you are on a multi-GPU server and want to select one GPU:
+
+```bash
+CUDA_VISIBLE_DEVICES=0 bash inference/int4_quantized/run_infer.sh --image_path ./test_images/lesion.jpg
+```
+
 Multi-turn chat:

 ```bash
environment.yml ADDED

@@ -0,0 +1,22 @@
+name: skingpt-r1
+channels:
+  - defaults
+dependencies:
+  - python=3.10.20
+  - pip
+  - pip:
+      - accelerate==1.13.0
+      - av==17.0.0
+      - bitsandbytes==0.49.2
+      - fastapi>=0.100.0
+      - huggingface-hub==1.7.1
+      - openai>=1.0.0
+      - pillow==12.0.0
+      - python-multipart>=0.0.6
+      - qwen-vl-utils==0.0.14
+      - safetensors==0.7.0
+      - tokenizers==0.22.2
+      - torch==2.10.0
+      - torchvision==0.25.0
+      - transformers==5.3.0
+      - uvicorn>=0.20.0
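The attention-backend guidance in the README amounts to a capability probe with a safe default: use FlashAttention-2 only when the package is actually present, otherwise fall back to SDPA. A minimal sketch of that decision using only the standard library; `pick_attn_implementation` is a hypothetical helper (not part of the repo), and the returned strings match the `attn_implementation` values the README already uses:

```python
import importlib.util


def pick_attn_implementation(flash_module: str = "flash_attn") -> str:
    """Prefer FlashAttention-2 when its package is importable; otherwise fall back to SDPA.

    The probed module name is a parameter only so the check is easy to exercise.
    """
    if importlib.util.find_spec(flash_module) is not None:
        return "flash_attention_2"
    return "sdpa"
```

On an RTX 50 series machine without `flash-attn` installed, this resolves to `"sdpa"`, matching the repo default; the result can be passed as the `attn_implementation` argument when loading the model.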
inference/README.md CHANGED

@@ -5,7 +5,6 @@ Two runtime tracks are provided:
 - `full_precision/`: single-image inference, multi-turn chat, and FastAPI service
 - `int4_quantized/`: single-image inference, multi-turn chat, and FastAPI service for the INT4 path

-Checkpoint paths:
+Model weights directory:

-- `./checkpoints/full_precision`
-- `./checkpoints/int4`
+- `./checkpoints`
inference/full_precision/model_utils.py CHANGED

@@ -12,7 +12,7 @@ from transformers import (
     TextIteratorStreamer,
 )

-DEFAULT_MODEL_PATH = "./checkpoints/full_precision"
+DEFAULT_MODEL_PATH = "./checkpoints"
 DEFAULT_SYSTEM_PROMPT = "You are a professional AI dermatology assistant."
inference/int4_quantized/model_utils.py CHANGED

@@ -19,7 +19,7 @@ from transformers.models.qwen2_5_vl.modeling_qwen2_5_vl import (
     Qwen2_5_VLForConditionalGeneration,
 )

-DEFAULT_MODEL_PATH = "./checkpoints/int4"
+DEFAULT_MODEL_PATH = "./checkpoints"
 DEFAULT_SYSTEM_PROMPT = (
     "You are a professional AI dermatology assistant. "
     "Reason step by step, keep the reasoning concise, avoid repetition, "
requirements.txt CHANGED

@@ -1,30 +1,22 @@
-# Base requirements
-torch==2.4.0
-torchvision==0.19.0
-accelerate>=0.26.0
-
-# Model specific utilities
-qwen-vl-utils==0.0.10
-transformers-stream-generator==0.0.4
-av
-pillow>=10.0.0
-
-# API & Serving (New: For FastAPI deployment)
+# Reproducible runtime dependencies for SkinGPT-R1.
+# These versions are aligned with the maintainers' working environment.
+
+accelerate==1.13.0
+av==17.0.0
+bitsandbytes==0.49.2
 fastapi>=0.100.0
-uvicorn>=0.20.0
+huggingface-hub==1.7.1
+openai>=1.0.0
+pillow==12.0.0
 python-multipart>=0.0.6
-openai>=1.0.0  # For DeepSeek API (OpenAI-compatible)
-bitsandbytes>=0.43.0  # Required for INT4 quantized inference
-# Attention notes:
-# - SDPA is built into PyTorch 2.x
-# - flash-attn is optional and mainly useful on GPUs officially supported by the project
-
-# Install latest transformers from source (Required for Qwen2.5-VL/Vision-R1)
-git+https://github.com/huggingface/transformers.git
-
-# Optional but recommended for GPU acceleration
-# flash-attn==2.6.1
-
-# For potential future demo usage
-gradio==5.4.0
-gradio_client==1.4.2
+qwen-vl-utils==0.0.14
+safetensors==0.7.0
+tokenizers==0.22.2
+torch==2.10.0
+torchvision==0.25.0
+transformers==5.3.0
+uvicorn>=0.20.0
+
+# Attention backend notes:
+# - SDPA is built into torch 2.10.0 and is the default choice for RTX 50 series.
+# - flash-attn is optional and should only be installed on stacks known to support it.