Yuhao committed
Commit · 6bfad80
1 Parent(s): 394cd7d

Unify checkpoint path and environment docs

- README.md +71 -6
- environment.yml +22 -0
- inference/README.md +2 -3
- inference/full_precision/model_utils.py +1 -1
- inference/int4_quantized/model_utils.py +1 -1
- requirements.txt +18 -26
README.md CHANGED

@@ -36,6 +36,7 @@ See [LICENSE](LICENSE) for details.
 ```text
 SkinGPT-R1/
 ├── checkpoints/
+├── environment.yml
 ├── inference/
 │   ├── full_precision/
 │   ├── int4_quantized/
@@ -43,10 +44,9 @@ SkinGPT-R1/
 └── README.md
 ```

-
+Model weights directory:

-
-- INT4 quantized: `./checkpoints/int4`
+- `./checkpoints`

 ## Highlights

@@ -57,12 +57,65 @@ Checkpoint paths:

 ## Install

+`environment.yml` is a Conda environment definition file. It captures the Python version and the package versions we use, so other users can recreate a working environment from scratch with one command.
+
+Recommended from scratch:
+
+```bash
+cd SkinGPT-R1
+conda env create -f environment.yml
+conda activate skingpt-r1
+```
+
+Manual setup:
+
 ```bash
-
+cd SkinGPT-R1
+conda create -n skingpt-r1 python=3.10.20 -y
 conda activate skingpt-r1
 pip install -r requirements.txt
 ```

+This repo is currently aligned to the maintainers' working environment:
+
+- `torch==2.10.0`
+- `torchvision==0.25.0`
+- `transformers==5.3.0`
+- `qwen-vl-utils==0.0.14`
+
+For RTX 50 series, start with the default `sdpa` path and do not install `flash-attn`
+unless you have already verified that your CUDA stack supports it.
+
+## Quick Start
+
+1. Clone the repository and enter it.
+
+```bash
+git clone <your-repo-url>
+cd SkinGPT-R1
+```
+
+2. Create the environment.
+
+```bash
+conda env create -f environment.yml
+conda activate skingpt-r1
+```
+
+3. Put model weights under:
+
+```text
+./checkpoints
+```
+
+4. Prepare a test image, for example:
+
+```text
+./test_images/lesion.jpg
+```
+
+5. Run one of the inference commands below.
+
 ## Attention Backend Notes

 This repo uses two attention acceleration paths:
@@ -77,9 +130,9 @@ Recommended choice:

 Practical notes:

-- The current repo pins `torch==2.
+- The current repo pins `torch==2.10.0`, and SDPA is already built into PyTorch in this version.
 - FlashAttention's official README currently lists Ampere, Ada, and Hopper support for FlashAttention-2. It does not list RTX 50 / Blackwell consumer GPUs in that section, so this repo defaults to `sdpa` for that path.
-- PyTorch
+- Newer PyTorch releases continue improving SDPA, and this repo is already on a modern PyTorch stack. Even so, RTX 50 series should still default to `sdpa` unless `flash-attn` has been explicitly validated in your environment.

 If you are on an RTX 5090 and `flash-attn` is unavailable or unstable in your environment, use the INT4 path in this repo, which is already configured with `attn_implementation="sdpa"`.

@@ -93,6 +146,12 @@ Single image:
 bash inference/full_precision/run_infer.sh --image ./test_images/lesion.jpg
 ```

+If you are on a multi-GPU server and want to select one GPU:
+
+```bash
+CUDA_VISIBLE_DEVICES=0 bash inference/full_precision/run_infer.sh --image ./test_images/lesion.jpg
+```
+
 Multi-turn chat:

 ```bash
@@ -115,6 +174,12 @@ Single image:
 bash inference/int4_quantized/run_infer.sh --image_path ./test_images/lesion.jpg
 ```

+If you are on a multi-GPU server and want to select one GPU:
+
+```bash
+CUDA_VISIBLE_DEVICES=0 bash inference/int4_quantized/run_infer.sh --image_path ./test_images/lesion.jpg
+```
+
 Multi-turn chat:

 ```bash
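The Practical notes in this README diff reduce to a single choice passed to `from_pretrained` at load time. As a minimal, hypothetical sketch that is not part of this commit (the helper name is illustrative), a loader could probe for `flash-attn` and otherwise fall back to the `sdpa` default recommended above:

```python
# Hypothetical helper, not in the repo: choose the attention backend at load time,
# falling back to the "sdpa" default recommended for RTX 50 series GPUs.
def pick_attn_implementation() -> str:
    try:
        import flash_attn  # noqa: F401  # optional dependency; often absent on RTX 50
        return "flash_attention_2"
    except ImportError:
        return "sdpa"
```

The returned string would be passed as `attn_implementation=...` to `from_pretrained`, as in the sketches after the `model_utils.py` diffs below.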
environment.yml ADDED

@@ -0,0 +1,22 @@
+name: skingpt-r1
+channels:
+  - defaults
+dependencies:
+  - python=3.10.20
+  - pip
+  - pip:
+      - accelerate==1.13.0
+      - av==17.0.0
+      - bitsandbytes==0.49.2
+      - fastapi>=0.100.0
+      - huggingface-hub==1.7.1
+      - openai>=1.0.0
+      - pillow==12.0.0
+      - python-multipart>=0.0.6
+      - qwen-vl-utils==0.0.14
+      - safetensors==0.7.0
+      - tokenizers==0.22.2
+      - torch==2.10.0
+      - torchvision==0.25.0
+      - transformers==5.3.0
+      - uvicorn>=0.20.0
inference/README.md CHANGED

@@ -5,7 +5,6 @@ Two runtime tracks are provided:
 - `full_precision/`: single-image inference, multi-turn chat, and FastAPI service
 - `int4_quantized/`: single-image inference, multi-turn chat, and FastAPI service for the INT4 path

-
+Model weights directory:

-- `./checkpoints
-- `./checkpoints/int4`
+- `./checkpoints`
inference/full_precision/model_utils.py CHANGED

@@ -12,7 +12,7 @@ from transformers import (
     TextIteratorStreamer,
 )

-DEFAULT_MODEL_PATH = "./checkpoints
+DEFAULT_MODEL_PATH = "./checkpoints"
 DEFAULT_SYSTEM_PROMPT = "You are a professional AI dermatology assistant."


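For context, here is a minimal sketch, not this repo's actual code, of how the unified `DEFAULT_MODEL_PATH` could be used to load the full-precision model with the SDPA backend. The model class and import path mirror what this diff already shows for the INT4 utilities; the keyword arguments are standard `transformers` options and are assumptions about this repo's loader:

```python
# Illustrative sketch only: load the model from the unified checkpoint path with SDPA.
from transformers import AutoProcessor
from transformers.models.qwen2_5_vl.modeling_qwen2_5_vl import (
    Qwen2_5_VLForConditionalGeneration,
)

DEFAULT_MODEL_PATH = "./checkpoints"  # same default this commit sets

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    DEFAULT_MODEL_PATH,
    attn_implementation="sdpa",  # default backend recommended in the README
    device_map="auto",           # let accelerate place weights on available GPUs
)
processor = AutoProcessor.from_pretrained(DEFAULT_MODEL_PATH)
```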
inference/int4_quantized/model_utils.py CHANGED

@@ -19,7 +19,7 @@ from transformers.models.qwen2_5_vl.modeling_qwen2_5_vl import (
     Qwen2_5_VLForConditionalGeneration,
 )

-DEFAULT_MODEL_PATH = "./checkpoints
+DEFAULT_MODEL_PATH = "./checkpoints"
 DEFAULT_SYSTEM_PROMPT = (
     "You are a professional AI dermatology assistant. "
     "Reason step by step, keep the reasoning concise, avoid repetition, "
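Similarly, a hypothetical sketch of a 4-bit load from the same unified path using `bitsandbytes`, which the new `requirements.txt` and `environment.yml` pin. Whether the INT4 track quantizes on the fly or ships pre-quantized weights is not stated in this diff, so treat this purely as an illustration:

```python
# Illustrative sketch only: 4-bit quantized load via bitsandbytes with the SDPA backend.
import torch
from transformers import AutoProcessor, BitsAndBytesConfig
from transformers.models.qwen2_5_vl.modeling_qwen2_5_vl import (
    Qwen2_5_VLForConditionalGeneration,
)

DEFAULT_MODEL_PATH = "./checkpoints"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights via bitsandbytes
    bnb_4bit_quant_type="nf4",              # common default for 4-bit quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: bf16 compute; use fp16 if needed
)

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    DEFAULT_MODEL_PATH,
    quantization_config=quant_config,
    attn_implementation="sdpa",  # the INT4 path is already configured for sdpa
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(DEFAULT_MODEL_PATH)
```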
requirements.txt CHANGED

@@ -1,30 +1,22 @@
-#
-
-torchvision==0.19.0
-accelerate>=0.26.0
+# Reproducible runtime dependencies for SkinGPT-R1.
+# These versions are aligned with the maintainers' working environment.

-
-
-
-av
-pillow>=10.0.0
-
-# API & Serving (New: For FastAPI deployment)
+accelerate==1.13.0
+av==17.0.0
+bitsandbytes==0.49.2
 fastapi>=0.100.0
-
+huggingface-hub==1.7.1
+openai>=1.0.0
+pillow==12.0.0
 python-multipart>=0.0.6
-
-
-
-
-
-
-
-git+https://github.com/huggingface/transformers.git
-
-# Optional but recommended for GPU acceleration
-# flash-attn==2.6.1
+qwen-vl-utils==0.0.14
+safetensors==0.7.0
+tokenizers==0.22.2
+torch==2.10.0
+torchvision==0.25.0
+transformers==5.3.0
+uvicorn>=0.20.0

-#
-
-
+# Attention backend notes:
+# - SDPA is built into torch 2.10.0 and is the default choice for RTX 50 series.
+# - flash-attn is optional and should only be installed on stacks known to support it.
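As a quick, hypothetical post-install check that is not part of this commit, the pinned stack can be verified from Python before running inference:

```python
# Sanity check: confirm the pinned packages import and that CUDA and SDPA are usable.
import torch
import torchvision
import transformers

print("torch:", torch.__version__)                # expected 2.10.0 per requirements.txt
print("torchvision:", torchvision.__version__)    # expected 0.25.0
print("transformers:", transformers.__version__)  # expected 5.3.0
print("CUDA available:", torch.cuda.is_available())
print("SDPA available:", hasattr(torch.nn.functional, "scaled_dot_product_attention"))
```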
|