curryandsun committed · verified
Commit 0e36ba9 · Parent(s): 43f23e6

Update README.md

Files changed (1): README.md (+35 −7)

README.md CHANGED
@@ -74,9 +74,9 @@ What is truly exciting is that in the comparison with Qwen3-32B, Ring-flash-line

 <div align="center">

- | **Model** | **Download** |
- | :----------------: | :----------: |
- | Ring-flash-linear-2.0 | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ring-flash-linear-2.0) <br>[🤖 Modelscope](https://modelscope.cn/models/inclusionAI/Ring-flash-linear-2.0)|
 </div>

 ## Quickstart
@@ -140,8 +140,26 @@ print(responses)
 print("*" * 30)
 ```

- ### SGLang
- ```bash
 python -m sglang.launch_server \
     --model-path <model_path> \
     --trust-remote-code \
@@ -149,7 +167,17 @@ python -m sglang.launch_server \
     --disable-radix-cache \
     --json-model-override-args "{\"linear_backend\": \"seg_la\"}"
 ```
 ### vLLM
- Todo
-
 ## Citation
 

 <div align="center">

+ | **Model** | **Context Length** | **Download** |
+ | :----------------: | :----------------: | :----------: |
+ | Ring-flash-linear-2.0 | 128K | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ring-flash-linear-2.0) <br>[🤖 Modelscope](https://modelscope.cn/models/inclusionAI/Ring-flash-linear-2.0)|
 </div>

 ## Quickstart

 print("*" * 30)
 ```

+ ### 🚀 SGLang
+
+ #### Environment Preparation
+
+ We will submit our model to the official SGLang release later; for now, you can prepare the environment with the following steps:
+ ```shell
+ pip3 install sgl-kernel==0.3.9.post2 vllm==0.10.2
+ ```
+
+ Then install our SGLang wheel package:
+ ```shell
+ pip install https://github.com/inclusionAI/Ring-V2/blob/main/hybrid_linear/whls/sglang-0.5.2-py3-none-any.whl
+ ```
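The custom wheel installed above has to sit on top of these exact version pins, so keeping them in a requirements file makes the environment easier to reproduce. A minimal sketch — the file name is arbitrary, and only the versions pinned above are assumed:

```shell
# Write the pinned prerequisites to a requirements file, then install
# them in one step before adding the custom sglang wheel.
cat > ring-linear-requirements.txt <<'EOF'
sgl-kernel==0.3.9.post2
vllm==0.10.2
EOF
grep -c '==' ring-linear-requirements.txt   # → 2 pinned packages
# pip3 install -r ring-linear-requirements.txt   # same effect as the command above
```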

+ #### Run Inference
+
+ SGLang now supports both BF16 and FP8 models; which one is used depends on the dtype of the model at `<model_path>`. Both share the same commands below:
+
+ - Start server:
+ ```shell
 python -m sglang.launch_server \
     --model-path <model_path> \
     --trust-remote-code \
@@ -149,7 +167,17 @@ python -m sglang.launch_server \
     --disable-radix-cache \
     --json-model-override-args "{\"linear_backend\": \"seg_la\"}"
 ```
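The escaped JSON passed to `--json-model-override-args` is easy to mistype by hand. A small sketch of one way to generate the string safely, letting `json.dumps` handle the quoting:

```python
import json

# Build the model-override dict and serialize it; json.dumps produces
# exactly the JSON string the launch flag expects, with correct quoting.
override = {"linear_backend": "seg_la"}
print(json.dumps(override))  # prints {"linear_backend": "seg_la"}
```

In a shell, wrapping the printed output in single quotes avoids the backslash escapes shown in the launch command.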
+
+ - Client:
+
+ ```shell
+ curl -s http://localhost:${PORT}/v1/chat/completions \
+     -H "Content-Type: application/json" \
+     -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
+ ```
+
+ More usage examples can be found [here](https://docs.sglang.ai/basic_usage/send_request.html).
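For programmatic use, the same OpenAI-compatible endpoint can be called from Python. A minimal standard-library sketch mirroring the `curl` request — the port value is an assumption and must match the server's `--port`:

```python
import json
import urllib.request

PORT = 30000  # assumption: set this to the port the server was launched with

def chat(prompt: str) -> str:
    """Send one chat-completion request to the local SGLang server
    and return the assistant's reply text."""
    payload = {
        "model": "auto",
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        f"http://localhost:{PORT}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses nest the text under choices[0].message.content
    return body["choices"][0]["message"]["content"]

# Example (requires the server from the previous step to be running):
# print(chat("What is the capital of France?"))
```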
+
 ### vLLM
+ TODO
 ## Citation