RangiLyu committed 15c08d2 (verified) · Parent(s): 7301366

Update README.md

Files changed (1): README.md (+28, -13)
````diff
@@ -190,34 +190,49 @@ print(decoded_output)
 
 ### Serving
 
-You can utilize one of the following LLM inference frameworks to create an OpenAI compatible server:
+The minimum hardware requirements for deploying Intern-S1 series models are:
+
+| Model | A100(GPUs) | H800(GPUs) | H100(GPUs) | H200(GPUs) |
+| :---------------------------------------------------------------------: | :--------: | :--------: | :--------: | :--------: |
+| [internlm/Intern-S1](https://huggingface.co/internlm/Intern-S1) | 8 | 8 | 8 | 4 |
+| [internlm/Intern-S1-FP8](https://huggingface.co/internlm/Intern-S1-FP8) | - | 4 | 4 | 2 |
+
+You can utilize one of the following LLM inference frameworks to create an OpenAI compatible server:
 
 #### [lmdeploy(>=0.9.2)](https://github.com/InternLM/lmdeploy)
 
-```
-lmdeploy serve api_server internlm/Intern-S1-FP8 --reasoning-parser intern-s1 --tool-call-parser intern-s1 --tp 4
+```bash
+lmdeploy serve api_server internlm/Intern-S1 --reasoning-parser intern-s1 --tool-call-parser intern-s1 --tp 8
 ```
 
 #### [vllm](https://github.com/vllm-project/vllm)
 
-Coming soon.
+```bash
+vllm serve internlm/Intern-S1 --tensor-parallel-size 8 --trust-remote-code
+```
 
 #### [sglang](https://github.com/sgl-project/sglang)
 
-Supporting Intern-S1 with SGLang is still in progress. Please refer to this [PR](https://github.com/sgl-project/sglang/pull/8350).
-
 ```bash
-CUDA_VISIBLE_DEVICES=0,1,2,3 \
-python3 -m sglang.launch_server \
-  --model-path internlm/Intern-S1-FP8 \
+python3 -m sglang.launch_server \
+  --model-path internlm/Intern-S1 \
   --trust-remote-code \
-  --tp 4 \
-  --port 8001 \
-  --mem-fraction-static 0.85 \
-  --enable-multimodal \
+  --tp 8 \
   --grammar-backend none
 ```
 
+#### ollama for local deployment:
+
+```bash
+# install ollama
+curl -fsSL https://ollama.com/install.sh | sh
+# fetch model
+ollama pull internlm/interns1
+# run model
+ollama run internlm/interns1
+# then use openai client to call on http://localhost:11434/v1
+```
+
 ## Advanced Usage
 
 ### Tool Calling
````
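
Every serving route above, including ollama, exposes the same OpenAI-compatible `/v1/chat/completions` API, and tool calling goes through that endpoint as well. Below is a minimal standard-library sketch of building such a request; the port (lmdeploy's default 23333) and the model name are assumptions to adjust to your launch flags (ollama serves at `http://localhost:11434/v1`):

```python
import json
import urllib.request

# Assumption: lmdeploy's api_server listens on port 23333 by default; for
# vllm/sglang use the --port you passed at launch, and 11434 for ollama.
BASE_URL = "http://localhost:23333/v1"

payload = {
    "model": "internlm/Intern-S1",
    "messages": [
        {"role": "user", "content": "What is the molecular formula of caffeine?"}
    ],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Uncomment once one of the servers above is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The official `openai` Python client works the same way: point `base_url` at the server; local deployments typically ignore the API key unless you configured one.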
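
The `--reasoning-parser intern-s1` flag in the lmdeploy command asks the server to split the model's chain of thought from its final answer. The sketch below assumes the response carries a DeepSeek-style `reasoning_content` field next to `content`; that field name and shape are an assumption to verify against your lmdeploy version:

```python
# Hand-written response stub in the assumed shape; a real reply would come
# back as JSON from /v1/chat/completions.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "Count atoms: 8 C, 10 H, 4 N, 2 O ...",
                "content": "C8H10N4O2",
            }
        }
    ]
}

message = response["choices"][0]["message"]
# reasoning_content may be absent when no reasoning parser is configured
thinking = message.get("reasoning_content", "")
answer = message["content"]
print(answer)  # → C8H10N4O2
```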