pennypm committed on
Commit
c139c13
·
verified ·
1 Parent(s): 44391d2

Update README.md

Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -47,7 +47,9 @@ Max ITL (ms): 7819.3
 
 ## How To Use
 
-Mirror:lmsysorg/sglang:v0.4.6.post5-cu124
+Mirror:lmsysorg/sglang:v0.4.6.post5-cu124
+or lmsysorg/sglang:v0.5.1.post5-cu126 (cuda12.6 env need to update ptxas to 12.8 on Hopper. [reference](https://github.com/sgl-project/sglang/blob/main/sgl-kernel/README.md))
+
 
 ```
 # 1. Repo clone
@@ -55,7 +57,6 @@ git clone https://github.com/TMElyralab/sglang && cd sglang
 git checkout lyra_w4afp8
 
 # 2. SGLang install
-pip install vllm==0.10.0
 pip install uv cmake
 pip install -e "python[all]"
 
@@ -63,7 +64,7 @@ pip install -e "python[all]"
 cd sgl-kernel & make build
 
 # 4. Run SGLang
-python3 -m sglang.launch_server --model-path /path/to/DeepSeek-R1-AWQ-W4AFP8 --tp 8 --trust-remote-code --host 0.0.0.0 --port 8000 --mem-fraction-static 0.9 --quantization w4a8_machete --dtype half --cuda-graph-max-bs 128 --max-running-requests 128
+python3 -m sglang.launch_server --model-path /path/to/DeepSeek-R1-AWQ-W4AFP8 --tp 8 --trust-remote-code --host 0.0.0.0 --port 8000 --mem-fraction-static 0.9 --quantization w4a8_machete --cuda-graph-max-bs 128 --max-running-requests 128
 ```
 
 
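
For anyone applying this change locally, the updated step 4 command can be sketched as a small wrapper script that lists one flag per line, which makes it easier to confirm that `--dtype half` is no longer passed. This is only an illustration: the `MODEL_PATH` and `RUN` environment variables are assumptions introduced here and are not part of the README, and the model path remains the README's placeholder.

```shell
#!/bin/sh
# Sketch of step 4 ("Run SGLang") after this commit.
# MODEL_PATH and RUN are hypothetical helper variables, not README options.
MODEL_PATH="${MODEL_PATH:-/path/to/DeepSeek-R1-AWQ-W4AFP8}"

# Store the command in the positional parameters, one flag per line.
# Note: --dtype half was removed by this commit and is intentionally absent.
set -- python3 -m sglang.launch_server \
  --model-path "$MODEL_PATH" \
  --tp 8 \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 8000 \
  --mem-fraction-static 0.9 \
  --quantization w4a8_machete \
  --cuda-graph-max-bs 128 \
  --max-running-requests 128

if [ "${RUN:-0}" = "1" ]; then
  # Start the server only when explicitly requested.
  "$@"
else
  # Default: print the assembled command for review.
  printf '%s ' "$@"
  printf '\n'
fi
```

Running the script without `RUN=1` prints the command without starting the server, so the flags can be reviewed first.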