xiaosa commited on
Commit
44391d2
·
verified ·
1 Parent(s): 25fc680

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +36 -5
README.md CHANGED
@@ -1,12 +1,10 @@
1
  ---
2
  license: mit
3
- base_model:
4
- - deepseek-ai/DeepSeek-R1
5
- base_model_relation: quantized
6
  ---
7
- # DeepSeek-V3.1-W4AFP8
8
 
9
- This model is a W4AFP8 quantized DeepSeek-V3.1 with AWQ quantizaton.
10
  Releated PR:https://github.com/sgl-project/sglang/pull/8573
11
  Releated Project: https://github.com/TMElyralab/sglang/tree/lyra_w4afp8
12
 
@@ -47,6 +45,39 @@ Max ITL (ms): 7819.3
47
  ==================================================
48
  ```
49
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
50
  ------
51
  <!-- markdownlint-disable first-line-h1 -->
52
  <!-- markdownlint-disable html -->
 
1
  ---
2
  license: mit
3
+ library_name: transformers
 
 
4
  ---
5
+ # DeepSeek-R1-AWQ-W4AFP8
6
 
7
+ This model is a W4AFP8 quantized DeepSeek-R1 with AWQ quantizaton.
8
  Releated PR:https://github.com/sgl-project/sglang/pull/8573
9
  Releated Project: https://github.com/TMElyralab/sglang/tree/lyra_w4afp8
10
 
 
45
  ==================================================
46
  ```
47
 
48
+ ## How To Use
49
+
50
+ Mirror:lmsysorg/sglang:v0.4.6.post5-cu124
51
+
52
+ ```
53
+ # 1. Repo clone
54
+ git clone https://github.com/TMElyralab/sglang && cd sglang
55
+ git checkout lyra_w4afp8
56
+
57
+ # 2. SGLang install
58
+ pip install vllm==0.10.0
59
+ pip install uv cmake
60
+ pip install -e "python[all]"
61
+
62
+ # 3. Recompile sgl-kernel
63
+ cd sgl-kernel & make build
64
+
65
+ # 4. Run SGLang
66
+ python3 -m sglang.launch_server --model-path /path/to/DeepSeek-R1-AWQ-W4AFP8 --tp 8 --trust-remote-code --host 0.0.0.0 --port 8000 --mem-fraction-static 0.9 --quantization w4a8_machete --dtype half --cuda-graph-max-bs 128 --max-running-requests 128
67
+ ```
68
+
69
+
70
+ ## Citation
71
+ We are TMElyralab, the Acceleration Team from Tencent Music Entertainment (TME).
72
+ ```
73
+ @Misc{TMElyralab_2025,
74
+ author = {Sa Xiao, Mian Peng, Haoxiong Su, Kangjian Wu, Bin Wu, Yibo Lu, Qiwen Mao, Wenjiang Zhou},
75
+ howpublished = {\url{https://github.com/TMElyralab}},
76
+ year = {2025}
77
+ }
78
+ ```
79
+
80
+
81
  ------
82
  <!-- markdownlint-disable first-line-h1 -->
83
  <!-- markdownlint-disable html -->