Update README.md
README.md CHANGED
@@ -74,9 +74,9 @@ What is truly exciting is that in the comparison with Qwen3-32B, Ring-flash-line

 <div align="center">

-| **Model** |
-| :----------------: | :----------: |
-| Ring-flash-linear-2.0 |
+| **Model** | **Context Length** | **Download** |
+| :----------------: | :----------------: | :----------: |
+| Ring-flash-linear-2.0 | 128K | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ring-flash-linear-2.0) <br>[🤖 Modelscope](https://modelscope.cn/models/inclusionAI/Ring-flash-linear-2.0)|
 </div>

 ## Quickstart
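The Download column above can also be used programmatically. A minimal sketch for fetching the checkpoint from the Hugging Face repo listed in the table, assuming `huggingface_hub` is installed; the destination directory is an arbitrary example, not something the README specifies:

```python
# Minimal sketch: pull the Ring-flash-linear-2.0 checkpoint from the Hub.
# Requires `pip install huggingface_hub`; local_dir is an example path.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="inclusionAI/Ring-flash-linear-2.0",  # repo from the Download column
    local_dir="./Ring-flash-linear-2.0",          # example destination
)
print(f"Checkpoint downloaded to {local_path}")
```

The resulting directory can then be passed as `<model_path>` to the serving commands in the hunks below.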
@@ -140,8 +140,26 @@ print(responses)
 print("*" * 30)
 ```

-### SGLang
-
+### 🚀 SGLang
+
+#### Environment Preparation
+
+We will submit our model to the official SGLang release later; for now, prepare the environment as follows:
+```shell
+pip3 install sgl-kernel==0.3.9.post2 vllm==0.10.2
+```
+
+Then install our sglang whl package:
+```shell
+pip install https://github.com/inclusionAI/Ring-V2/blob/main/hybrid_linear/whls/sglang-0.5.2-py3-none-any.whl
+```
+
+#### Run Inference
+
+SGLang now supports both BF16 and FP8 models; which one is used depends on the dtype of the model in ${MODEL_PATH}. Both share the same commands below.
+
+- Start the server:
+```shell
 python -m sglang.launch_server \
     --model-path <model_path> \
     --trust-remote-code \
@@ -149,7 +167,17 @@ python -m sglang.launch_server \
     --disable-radix-cache \
     --json-model-override-args "{\"linear_backend\": \"seg_la\"}"
 ```
+
+- Client:
+
+```shell
+curl -s http://localhost:${PORT}/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
+```
+
+More usage can be found [here](https://docs.sglang.ai/basic_usage/send_request.html).
+
 ### vLLM
-
-
+TODO
 ## Citation
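The curl client above talks to SGLang's OpenAI-compatible `/v1/chat/completions` endpoint, so any OpenAI-style client works as well. A minimal Python sketch, assuming `pip install openai` and a server on port 30000 (SGLang's default; adjust to whatever `--port` you passed):

```python
# Minimal sketch: query the launched SGLang server through its
# OpenAI-compatible API. Assumes the server from the diff above is
# listening on localhost:30000; the api_key value is a dummy.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="auto",  # same placeholder model name as in the curl example
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```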
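Which of the BF16 or FP8 paths the server takes is decided by the checkpoint itself, as the Run Inference note says. A quick way to inspect this, assuming a standard Hugging Face `config.json` layout (the `quantization_config` key marking FP8 checkpoints is an assumption, not something the README states):

```python
# Minimal sketch: inspect the checkpoint's dtype to see whether SGLang
# will serve it as BF16 or FP8. Replace <model_path> with the real path.
# Assumes a standard Hugging Face config.json layout.
import json
from pathlib import Path

cfg = json.loads((Path("<model_path>") / "config.json").read_text())
print("torch_dtype:", cfg.get("torch_dtype"))
print("quantization_config:", cfg.get("quantization_config"))  # present for FP8 (assumed)
```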