Update README.md
README.md
CHANGED
|
@@ -1,13 +1,17 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
pipeline_tag: image-text-to-text
|
| 4 |
-
---
|
| 5 |
|
| 6 |
|
| 7 |
## Intern-S1
|
| 8 |
|
| 9 |

|
| 10 |
|
| 11 |
## Introduction
|
| 12 |
|
| 13 |
We introduce **Intern-S1**, our **most advanced open-source multimodal reasoning model** to date. Intern-S1 combines **strong general-task capabilities with state-of-the-art performance on a wide range of scientific tasks**, rivaling leading closed-source commercial models.
|
|
@@ -24,7 +28,43 @@ Features
|
|
| 24 |
|
| 25 |
We evaluate Intern-S1 on various benchmarks, including general and scientific datasets. We report the performance comparison with recent VLMs and LLMs below.
|
| 26 |
|
| 27 |
-
|
| 28 |
|
| 29 |
We use the [OpenCompass](https://github.com/open-compass/OpenCompass/) and [VLMEvalkit](https://github.com/open-compass/vlmevalkit) to evaluate all models.
|
| 30 |
|
|
@@ -74,7 +114,7 @@ decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1] :
|
|
| 74 |
print(decoded_output)
|
| 75 |
```
|
| 76 |
|
| 77 |
-
####
|
| 78 |
|
| 79 |
```python
|
| 80 |
from transformers import AutoProcessor, AutoModelForCausalLM
|
|
@@ -156,11 +196,14 @@ Coming soon.
|
|
| 156 |
|
| 157 |
#### [sglang](https://github.com/sgl-project/sglang)
|
| 158 |
|
| 159 |
```bash
|
| 160 |
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
|
| 161 |
python3 -m sglang.launch_server \
|
| 162 |
--model-path internlm/Intern-S1 \
|
| 163 |
--trust-remote-code \
|
| 164 |
--tp 8 \
|
| 165 |
--enable-multimodal \
|
| 166 |
--grammar-backend none
|
|
@@ -172,9 +215,9 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
|
|
| 172 |
# install ollama
|
| 173 |
curl -fsSL https://ollama.com/install.sh | sh
|
| 174 |
# fetch model
|
| 175 |
-
ollama pull internlm/
|
| 176 |
# run model
|
| 177 |
-
ollama run internlm/
|
| 178 |
# then use openai client to call on http://localhost:11434/v1
|
| 179 |
```
|
| 180 |
|
|
@@ -186,9 +229,10 @@ Many Large Language Models (LLMs) now feature **Tool Calling**, a powerful capab
|
|
| 186 |
|
| 187 |
A key advantage for developers is that a growing number of open-source LLMs are designed to be compatible with the OpenAI API. This means you can leverage the same familiar syntax and structure from the OpenAI library to implement tool calling with these open-source models. As a result, the code demonstrated in this tutorial is versatile: it works not just with OpenAI models, but with any model that follows the same interface standard.
|
| 188 |
|
| 189 |
-
To illustrate how this works, let's dive into a practical code example that uses tool calling to get the latest weather forecast.
|
| 190 |
|
| 191 |
```python
|
| 192 |
from openai import OpenAI
|
| 193 |
import json
|
| 194 |
|
|
@@ -313,7 +357,7 @@ response = client.chat.completions.create(
|
|
| 313 |
temperature=0.8,
|
| 314 |
top_p=0.8,
|
| 315 |
stream=False,
|
| 316 |
-
extra_body=dict(spaces_between_special_tokens=False),
|
| 317 |
tools=tools)
|
| 318 |
print(response.choices[0].message)
|
| 319 |
messages.append(response.choices[0].message)
|
|
@@ -335,11 +379,10 @@ response = client.chat.completions.create(
|
|
| 335 |
temperature=0.8,
|
| 336 |
top_p=0.8,
|
| 337 |
stream=False,
|
| 338 |
-
extra_body=dict(spaces_between_special_tokens=False),
|
| 339 |
tools=tools)
|
| 340 |
print(response.choices[0].message.content)
|
| 341 |
```
|
| 342 |
-
|
| 343 |
|
| 344 |
### Switching Between Thinking and Non-Thinking Modes
|
| 345 |
|
|
@@ -400,4 +443,4 @@ For vllm and sglang users, configure this through,
|
|
| 400 |
extra_body={
|
| 401 |
"chat_template_kwargs": {"enable_thinking": False}
|
| 402 |
}
|
| 403 |
-
```
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
pipeline_tag: image-text-to-text
|
| 4 |
+
---
|
| 5 |
|
| 6 |
|
| 7 |
## Intern-S1
|
| 8 |
|
| 9 |

|
| 10 |
|
| 11 |
+
|
| 12 |
+
[GitHub](https://github.com/InternLM/Intern-S1)
|
| 13 |
+
|
| 14 |
+
|
| 15 |
## Introduction
|
| 16 |
|
| 17 |
We introduce **Intern-S1**, our **most advanced open-source multimodal reasoning model** to date. Intern-S1 combines **strong general-task capabilities with state-of-the-art performance on a wide range of scientific tasks**, rivaling leading closed-source commercial models.
|
|
|
|
| 28 |
|
| 29 |
We evaluate Intern-S1 on various benchmarks, including general and scientific datasets. We report the performance comparison with recent VLMs and LLMs below.
|
| 30 |
|
| 31 |
+
<table>
|
| 32 |
+
<thead>
|
| 33 |
+
<tr>
|
| 34 |
+
<th rowspan="2">Benchmarks</th>
|
| 35 |
+
<th colspan="2">Intern-S1</th>
|
| 36 |
+
<th>InternVL3-78B</th>
|
| 37 |
+
<th>Qwen2.5-VL-72B</th>
|
| 38 |
+
<th>DS-R1-0528</th>
|
| 39 |
+
<th>Qwen3-235B-A22B</th>
|
| 40 |
+
<th>Kimi-K2-Instruct</th>
|
| 41 |
+
<th>Gemini-2.5 Pro</th>
|
| 42 |
+
<th>o3</th>
|
| 43 |
+
<th>Grok-4</th>
|
| 44 |
+
</tr>
|
| 45 |
+
</thead>
|
| 46 |
+
<tbody>
|
| 47 |
+
<tr><td>MMLU-Pro</td><td colspan="2">83.5 ✅</td><td>73.0</td><td>72.1</td><td>83.4</td><td>82.2</td><td>82.7</td><td>86.0</td><td>85.0</td><td>85.9</td></tr>
|
| 48 |
+
<tr><td>MMMU</td><td colspan="2">77.7 ✅</td><td>72.2</td><td>70.2</td><td>-</td><td>-</td><td>-</td><td>81.9</td><td>80.8</td><td>77.9</td></tr>
|
| 49 |
+
<tr><td>GPQA</td><td colspan="2">77.3</td><td>49.9</td><td>49.0</td><td>80.6</td><td>71.1</td><td>77.8</td><td>83.8</td><td>83.3</td><td>87.5</td></tr>
|
| 50 |
+
<tr><td>MMStar</td><td colspan="2">74.9 ✅</td><td>72.5</td><td>70.8</td><td>-</td><td>-</td><td>-</td><td>79.3</td><td>75.1</td><td>69.6</td></tr>
|
| 51 |
+
<tr><td>MathVista</td><td colspan="2">81.5 👑</td><td>79.0</td><td>74.8</td><td>-</td><td>-</td><td>-</td><td>80.3</td><td>77.5</td><td>72.5</td></tr>
|
| 52 |
+
<tr><td>AIME2025</td><td colspan="2">86.0</td><td>10.7</td><td>10.9</td><td>87.5</td><td>81.5</td><td>51.4</td><td>83.0</td><td>88.9</td><td>91.7</td></tr>
|
| 53 |
+
<tr><td>MathVision</td><td colspan="2">62.5 ✅</td><td>43.1</td><td>38.1</td><td>-</td><td>-</td><td>-</td><td>73.0</td><td>67.7</td><td>67.3</td></tr>
|
| 54 |
+
<tr><td>IFEval</td><td colspan="2">86.7</td><td>75.6</td><td>83.9</td><td>79.7</td><td>85.0</td><td>90.2</td><td>91.5</td><td>92.2</td><td>92.8</td></tr>
|
| 55 |
+
<tr><td>SFE</td><td colspan="2">44.3 👑</td><td>36.2</td><td>30.5</td><td>-</td><td>-</td><td>-</td><td>43.0</td><td>37.7</td><td>31.2</td></tr>
|
| 56 |
+
<tr><td>Physics</td><td colspan="2">44.0 ✅</td><td>23.1</td><td>15.7</td><td>-</td><td>-</td><td>-</td><td>40.0</td><td>47.9</td><td>42.8</td></tr>
|
| 57 |
+
<tr><td>SmolInstruct</td><td colspan="2">51.0 👑</td><td>19.4</td><td>21.0</td><td>30.7</td><td>28.7</td><td>48.1</td><td>40.4</td><td>43.9</td><td>47.3</td></tr>
|
| 58 |
+
<tr><td>ChemBench</td><td colspan="2">83.4 👑</td><td>61.3</td><td>61.6</td><td>75.6</td><td>75.8</td><td>75.3</td><td>82.8</td><td>81.6</td><td>83.3</td></tr>
|
| 59 |
+
<tr><td>MatBench</td><td colspan="2">75.0 👑</td><td>49.3</td><td>51.5</td><td>57.7</td><td>52.1</td><td>61.7</td><td>61.7</td><td>61.6</td><td>67.9</td></tr>
|
| 60 |
+
<tr><td>MicroVQA</td><td colspan="2">63.9 👑</td><td>59.1</td><td>53.0</td><td>-</td><td>-</td><td>-</td><td>63.1</td><td>58.3</td><td>59.5</td></tr>
|
| 61 |
+
<tr><td>ProteinLMBench</td><td colspan="2">63.1</td><td>61.6</td><td>61.0</td><td>61.4</td><td>59.8</td><td>66.7</td><td>62.9</td><td>67.7</td><td>66.2</td></tr>
|
| 62 |
+
<tr><td>MSEarthMCQ</td><td colspan="2">65.7 👑</td><td>57.2</td><td>37.6</td><td>-</td><td>-</td><td>-</td><td>59.9</td><td>61.0</td><td>58.0</td></tr>
|
| 63 |
+
<tr><td>XLRS-Bench</td><td colspan="2">55.0 👑</td><td>49.3</td><td>50.9</td><td>-</td><td>-</td><td>-</td><td>45.2</td><td>43.6</td><td>45.4</td></tr>
|
| 64 |
+
</tbody>
|
| 65 |
+
</table>
|
| 66 |
+
|
| 67 |
+
> **Note**: ✅ means the best performance among open-source models; 👑 indicates the best performance among all models.
|
| 68 |
|
| 69 |
We use the [OpenCompass](https://github.com/open-compass/OpenCompass/) and [VLMEvalkit](https://github.com/open-compass/vlmevalkit) to evaluate all models.
|
| 70 |
|
|
|
|
| 114 |
print(decoded_output)
|
| 115 |
```
|
| 116 |
|
| 117 |
+
#### Image input
|
| 118 |
|
| 119 |
```python
|
| 120 |
from transformers import AutoProcessor, AutoModelForCausalLM
|
|
|
|
| 196 |
|
| 197 |
#### [sglang](https://github.com/sgl-project/sglang)
|
| 198 |
|
| 199 |
+
Support for Intern-S1 in SGLang is still in progress. Please refer to this [PR](https://github.com/sgl-project/sglang/pull/8350).
|
| 200 |
+
|
| 201 |
```bash
|
| 202 |
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
|
| 203 |
python3 -m sglang.launch_server \
|
| 204 |
--model-path internlm/Intern-S1 \
|
| 205 |
--trust-remote-code \
|
| 206 |
+
--mem-fraction-static 0.85 \
|
| 207 |
--tp 8 \
|
| 208 |
--enable-multimodal \
|
| 209 |
--grammar-backend none
|
|
|
|
| 215 |
# install ollama
|
| 216 |
curl -fsSL https://ollama.com/install.sh | sh
|
| 217 |
# fetch model
|
| 218 |
+
ollama pull internlm/interns1
|
| 219 |
# run model
|
| 220 |
+
ollama run internlm/interns1
|
| 221 |
# then use openai client to call on http://localhost:11434/v1
|
| 222 |
```
|
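The last comment in the block above mentions calling the local server through its OpenAI-compatible endpoint. A minimal sketch of such a call, using only the Python standard library, might look like the following. It assumes Ollama is listening at `http://localhost:11434/v1` and that the model was pulled as `internlm/interns1`; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of any library.

```python
import json
import urllib.request

# Assumption: the OpenAI-compatible API is served at this base URL (taken from the comment above).
BASE_URL = "http://localhost:11434/v1"
MODEL_NAME = "internlm/interns1"


def build_chat_request(prompt, model=MODEL_NAME, base_url=BASE_URL):
    """Build (without sending) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Once the server is running, `send_chat_request(build_chat_request("Hello"))` should return the reply text. Building and sending are kept separate so the request can be inspected without a live server.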
| 223 |
|
|
|
|
| 229 |
|
| 230 |
A key advantage for developers is that a growing number of open-source LLMs are designed to be compatible with the OpenAI API. This means you can leverage the same familiar syntax and structure from the OpenAI library to implement tool calling with these open-source models. As a result, the code demonstrated in this tutorial is versatile: it works not just with OpenAI models, but with any model that follows the same interface standard.
|
| 231 |
|
| 232 |
+
To illustrate how this works, let's dive into a practical code example that uses tool calling to get the latest weather forecast (based on the LMDeploy API server).
|
| 233 |
|
| 234 |
```python
|
| 235 |
+
|
| 236 |
from openai import OpenAI
|
| 237 |
import json
|
| 238 |
|
|
|
|
| 357 |
temperature=0.8,
|
| 358 |
top_p=0.8,
|
| 359 |
stream=False,
|
| 360 |
+
extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
|
| 361 |
tools=tools)
|
| 362 |
print(response.choices[0].message)
|
| 363 |
messages.append(response.choices[0].message)
|
|
|
|
| 379 |
temperature=0.8,
|
| 380 |
top_p=0.8,
|
| 381 |
stream=False,
|
| 382 |
+
extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
|
| 383 |
tools=tools)
|
| 384 |
print(response.choices[0].message.content)
|
| 385 |
```
|
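Between the two `client.chat.completions.create` calls, the tool calls returned by the model have to be executed locally and their results appended to `messages`. The sketch below shows one way that step could be written. The tool `get_current_temperature`, its fixed return value, and the names `TOOL_REGISTRY` and `run_tool_call` are hypothetical placeholders, and the input is assumed to be a plain dict in the OpenAI chat-completions tool-call format.

```python
import json


def get_current_temperature(location, unit="celsius"):
    """Hypothetical tool: returns a fixed reading instead of querying a weather service."""
    return {"location": location, "unit": unit, "temperature": 26.1}


# Map tool names (as advertised to the model) to local Python callables.
TOOL_REGISTRY = {"get_current_temperature": get_current_temperature}


def run_tool_call(tool_call):
    """Execute one tool call and return the `tool` message to append to `messages`.

    `tool_call` is assumed to look like:
    {"id": ..., "type": "function",
     "function": {"name": ..., "arguments": "<JSON string>"}}
    """
    name = tool_call["function"]["name"]
    # The model sends arguments as a JSON-encoded string, not as a dict.
    arguments = json.loads(tool_call["function"]["arguments"])
    if name not in TOOL_REGISTRY:
        result = {"error": f"unknown tool: {name}"}
    else:
        result = TOOL_REGISTRY[name](**arguments)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }
```

Returning an error payload for an unknown tool name, rather than raising, lets the model see the failure in the next turn and recover.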
| 386 |
|
| 387 |
### Switching Between Thinking and Non-Thinking Modes
|
| 388 |
|
|
|
|
| 443 |
extra_body={
|
| 444 |
"chat_template_kwargs": {"enable_thinking": False}
|
| 445 |
}
|
| 446 |
+
```
|
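As a rough illustration of what the `extra_body` setting above amounts to on the wire: the OpenAI Python client merges `extra_body` keys into the top level of the JSON request body, so the server receives a `chat_template_kwargs` field next to `model` and `messages`. The sketch below builds that body by hand; the function name `build_chat_payload` is an assumption made for this example.

```python
import json


def build_chat_payload(model, prompt, enable_thinking):
    """Build the JSON body of a chat request with the thinking switch set explicitly."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Equivalent to passing extra_body={"chat_template_kwargs": {...}} to the OpenAI client.
    payload["chat_template_kwargs"] = {"enable_thinking": enable_thinking}
    return payload


# Python's False is serialized as JSON false in the request body.
body = json.dumps(build_chat_payload("internlm/Intern-S1", "Hello", enable_thinking=False))
```

The same body can be posted to the `/v1/chat/completions` route of a vLLM or SGLang server with any HTTP client.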