Update README.md
README.md
CHANGED
@@ -47,7 +47,9 @@ Max ITL (ms): 7819.3
 
 ## How To Use
 
-Mirror:lmsysorg/sglang:v0.4.6.post5-cu124
+Mirror:lmsysorg/sglang:v0.4.6.post5-cu124
+or lmsysorg/sglang:v0.5.1.post5-cu126 (cuda12.6 env need to update ptxas to 12.8 on Hopper. [reference](https://github.com/sgl-project/sglang/blob/main/sgl-kernel/README.md))
+
 
 ```
 # 1. Repo clone
@@ -55,7 +57,6 @@ git clone https://github.com/TMElyralab/sglang && cd sglang
 git checkout lyra_w4afp8
 
 # 2. SGLang install
-pip install vllm==0.10.0
 pip install uv cmake
 pip install -e "python[all]"
 
@@ -63,7 +64,7 @@ pip install -e "python[all]"
 cd sgl-kernel & make build
 
 # 4. Run SGLang
-python3 -m sglang.launch_server --model-path /path/to/DeepSeek-R1-AWQ-W4AFP8 --tp 8 --trust-remote-code --host 0.0.0.0 --port 8000 --mem-fraction-static 0.9 --quantization w4a8_machete --
+python3 -m sglang.launch_server --model-path /path/to/DeepSeek-R1-AWQ-W4AFP8 --tp 8 --trust-remote-code --host 0.0.0.0 --port 8000 --mem-fraction-static 0.9 --quantization w4a8_machete --cuda-graph-max-bs 128 --max-running-requests 128
 ```
 
 
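
Note on the unchanged context line `cd sgl-kernel & make build`: the single `&` appears to be a typo for `&&`. With `&`, the shell runs `cd` as a background job in a subshell, so `make build` would still start in the repository root. The sketch below illustrates the difference in a throwaway directory. The `workdir`, `after_single`, and `after_double` names are hypothetical and used only for this demonstration; nothing here touches a real sglang checkout.

```shell
# Create a scratch directory that mimics the layout "<repo>/sgl-kernel".
workdir="$(mktemp -d)"
mkdir "$workdir/sgl-kernel"
cd "$workdir"

# Single '&': 'cd' runs in a background subshell, so the current shell
# stays in $workdir and the next command runs there.
cd sgl-kernel &
after_single="$PWD"
wait

# '&&': 'cd' runs in the current shell, and the next command runs only
# if 'cd' succeeded, so it executes inside sgl-kernel.
cd sgl-kernel && after_double="$PWD"

echo "directory after 'cd sgl-kernel &':  $after_single"
echo "directory after 'cd sgl-kernel &&': $after_double"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

If that reading is right, the build step in the README would be written as `cd sgl-kernel && make build`.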