TMElyralab/DeepSeek-R1-AWQ-W4AFP8

Text Generation

text-generation-inference

4-bit precision

pennypm committed on Aug 28, 2025

Commit c139c13 · verified · 1 Parent(s): 44391d2

Update README.md

Files changed (1)

README.md +4 -3

README.md CHANGED

@@ -47,7 +47,9 @@ Max ITL (ms):                            7819.3
 ## How To Use
-Mirror：lmsysorg/sglang:v0.4.6.post5-cu124
 ```
 # 1. Repo clone
@@ -55,7 +57,6 @@ git clone https://github.com/TMElyralab/sglang && cd sglang
 git checkout lyra_w4afp8
 # 2. SGLang install
-pip install vllm==0.10.0
 pip install uv cmake
 pip install -e "python[all]"
@@ -63,7 +64,7 @@ pip install -e "python[all]"
 cd sgl-kernel & make build
 # 4. Run SGLang
-python3 -m sglang.launch_server --model-path /path/to/DeepSeek-R1-AWQ-W4AFP8 --tp 8 --trust-remote-code --host 0.0.0.0 --port 8000 --mem-fraction-static 0.9 --quantization w4a8_machete --dtype half --cuda-graph-max-bs 128 --max-running-requests 128
 ```

 ## How To Use
+Mirror：lmsysorg/sglang:v0.4.6.post5-cu124
+or lmsysorg/sglang:v0.5.1.post5-cu126 (cuda12.6 env need to update ptxas to 12.8 on Hopper. [reference](https://github.com/sgl-project/sglang/blob/main/sgl-kernel/README.md))
 ```
 # 1. Repo clone
 git checkout lyra_w4afp8
 # 2. SGLang install
 pip install uv cmake
 pip install -e "python[all]"
 cd sgl-kernel & make build
 # 4. Run SGLang
+python3 -m sglang.launch_server --model-path /path/to/DeepSeek-R1-AWQ-W4AFP8 --tp 8 --trust-remote-code --host 0.0.0.0 --port 8000 --mem-fraction-static 0.9 --quantization w4a8_machete --cuda-graph-max-bs 128 --max-running-requests 128
 ```
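
The added note says a CUDA 12.6 environment needs `ptxas` updated to 12.8 on Hopper. A minimal sketch of how one might check this before building is shown below. The helper names `version_ge` and `parse_ptxas_release` are hypothetical, the script assumes GNU `sort -V` is available, and it assumes `ptxas --version` prints a line of the form `Cuda compilation tools, release X.Y, ...`; adjust the parsing if your toolkit prints something different.

```shell
#!/usr/bin/env bash
# Sketch: report whether the ptxas on PATH is at least release 12.8.
# Helper names are illustrative and not part of SGLang or the CUDA toolkit.

# Succeeds (returns 0) when dotted version $1 is >= dotted version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Reads ptxas --version output on stdin and prints the "X.Y" release number.
# Assumes a line like: "Cuda compilation tools, release 12.8, V12.8.61".
parse_ptxas_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

REQUIRED="12.8"  # minimum release named in the README note

if command -v ptxas >/dev/null 2>&1; then
    found="$(ptxas --version | parse_ptxas_release)"
    if [ -n "$found" ] && version_ge "$found" "$REQUIRED"; then
        echo "ptxas $found meets the $REQUIRED requirement"
    else
        echo "ptxas ${found:-unknown} is older than $REQUIRED; update it before building"
    fi
else
    echo "ptxas not found on PATH"
fi
```

The script only reports the result; it does not modify the environment. For the actual update procedure, follow the sgl-kernel README linked in the note.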