
RangiLyu committed on Jul 25, 2025

Commit 2a79595 (verified) · 1 Parent(s): ff0c7fd

Update README.md

Files changed (1)

README.md +56 -13

README.md CHANGED

@@ -1,13 +1,17 @@
----
-license: apache-2.0
-pipeline_tag: image-text-to-text
----
 ## Intern-S1
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/642695e5274e7ad464c8a5ba/E43cgEXBRWjVJlU_-hdh6.png)
 ## Introduction
 We introduce **Intern-S1**, our **most advanced open-source multimodal reasoning model** to date. Intern-S1 combines **strong general-task capabilities with state-of-the-art performance on a wide range of scientific tasks**, rivaling leading closed-source commercial models.
@@ -24,7 +28,43 @@ Features
 We evaluate Intern-S1 on various benchmarks, including general and scientific datasets. We report the performance comparison with recent VLMs and LLMs below.
 We use the [OpenCompass](https://github.com/open-compass/OpenCompass/) and [VLMEvalkit](https://github.com/open-compass/vlmevalkit) to evaluate all models.
@@ -74,7 +114,7 @@ decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1] :
 print(decoded_output)
 ```
-#### Image input
 ```python
 from transformers import AutoProcessor, AutoModelForCausalLM
@@ -156,11 +196,14 @@ Coming soon.
 #### [sglang](https://github.com/sgl-project/sglang)
 ```bash
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
     python3 -m sglang.launch_server \
     --model-path internlm/Intern-S1 \
     --trust-remote-code \
     --tp 8 \
     --enable-multimodal \
     --grammar-backend none
@@ -172,9 +215,9 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
 # install ollama
 curl -fsSL https://ollama.com/install.sh | sh
 # fetch model
-ollama pull internlm/Intern-S1
 # run model
-ollama run internlm/Intern-S1
 # then use openai client to call on http://localhost:11434/v1
 ```
@@ -186,9 +229,10 @@ Many Large Language Models (LLMs) now feature **Tool Calling**, a powerful capab
 A key advantage for developers is that a growing number of open-source LLMs are designed to be compatible with the OpenAI API. This means you can leverage the same familiar syntax and structure from the OpenAI library to implement tool calling with these open-source models. As a result, the code demonstrated in this tutorial is versatile—it works not just with OpenAI models, but with any model that follows the same interface standard.
-To illustrate how this works, let's dive into a practical code example that uses tool calling to get the latest weather forecast.
 ```python
 from openai import OpenAI
 import json
@@ -313,7 +357,7 @@ response = client.chat.completions.create(
     temperature=0.8,
     top_p=0.8,
     stream=False,
-    extra_body=dict(spaces_between_special_tokens=False),
     tools=tools)
 print(response.choices[0].message)
 messages.append(response.choices[0].message)
@@ -335,11 +379,10 @@ response = client.chat.completions.create(
     temperature=0.8,
     top_p=0.8,
     stream=False,
-    extra_body=dict(spaces_between_special_tokens=False),
     tools=tools)
 print(response.choices[0].message.content)
 ```
 ### Switching Between Thinking and Non-Thinking Modes
@@ -400,4 +443,4 @@ For vllm and sglang users, configure this through,
 extra_body={
     "chat_template_kwargs": {"enable_thinking": False}
 }
-```

+---
+license: apache-2.0
+pipeline_tag: image-text-to-text
+---
 ## Intern-S1
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/642695e5274e7ad464c8a5ba/E43cgEXBRWjVJlU_-hdh6.png)
+[![GitHub](https://img.shields.io/badge/GitHub-InternS1-blue)](https://github.com/InternLM/Intern-S1)
 ## Introduction
 We introduce **Intern-S1**, our **most advanced open-source multimodal reasoning model** to date. Intern-S1 combines **strong general-task capabilities with state-of-the-art performance on a wide range of scientific tasks**, rivaling leading closed-source commercial models.
 We evaluate Intern-S1 on various benchmarks, including general and scientific datasets. We report the performance comparison with recent VLMs and LLMs below.
+<table>
+  <thead>
+    <tr>
+      <th rowspan="2">Benchmarks</th>
+      <th colspan="2">Intern-S1</th>
+      <th>InternVL3-78B</th>
+      <th>Qwen2.5-VL-72B</th>
+      <th>DS-R1-0528</th>
+      <th>Qwen3-235B-A22B</th>
+      <th>Kimi-K2-Instruct</th>
+      <th>Gemini-2.5 Pro</th>
+      <th>o3</th>
+      <th>Grok-4</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr><td>MMLU-Pro</td><td colspan="2">83.5 ✅</td><td>73.0</td><td>72.1</td><td>83.4</td><td>82.2</td><td>82.7</td><td>86.0</td><td>85.0</td><td>85.9</td></tr>
+    <tr><td>MMMU</td><td colspan="2">77.7 ✅</td><td>72.2</td><td>70.2</td><td>-</td><td>-</td><td>-</td><td>81.9</td><td>80.8</td><td>77.9</td></tr>
+    <tr><td>GPQA</td><td colspan="2">77.3</td><td>49.9</td><td>49.0</td><td>80.6</td><td>71.1</td><td>77.8</td><td>83.8</td><td>83.3</td><td>87.5</td></tr>
+    <tr><td>MMStar</td><td colspan="2">74.9 ✅</td><td>72.5</td><td>70.8</td><td>-</td><td>-</td><td>-</td><td>79.3</td><td>75.1</td><td>69.6</td></tr>
+    <tr><td>MathVista</td><td colspan="2">81.5 👑</td><td>79.0</td><td>74.8</td><td>-</td><td>-</td><td>-</td><td>80.3</td><td>77.5</td><td>72.5</td></tr>
+    <tr><td>AIME2025</td><td colspan="2">86.0</td><td>10.7</td><td>10.9</td><td>87.5</td><td>81.5</td><td>51.4</td><td>83.0</td><td>88.9</td><td>91.7</td></tr>
+    <tr><td>MathVision</td><td colspan="2">62.5 ✅</td><td>43.1</td><td>38.1</td><td>-</td><td>-</td><td>-</td><td>73.0</td><td>67.7</td><td>67.3</td></tr>
+    <tr><td>IFEval</td><td colspan="2">86.7</td><td>75.6</td><td>83.9</td><td>79.7</td><td>85.0</td><td>90.2</td><td>91.5</td><td>92.2</td><td>92.8</td></tr>
+    <tr><td>SFE</td><td colspan="2">44.3 👑</td><td>36.2</td><td>30.5</td><td>-</td><td>-</td><td>-</td><td>43.0</td><td>37.7</td><td>31.2</td></tr>
+    <tr><td>Physics</td><td colspan="2">44.0 ✅</td><td>23.1</td><td>15.7</td><td>-</td><td>-</td><td>-</td><td>40.0</td><td>47.9</td><td>42.8</td></tr>
+    <tr><td>SmolInstruct</td><td colspan="2">51.0 👑</td><td>19.4</td><td>21.0</td><td>30.7</td><td>28.7</td><td>48.1</td><td>40.4</td><td>43.9</td><td>47.3</td></tr>
+    <tr><td>ChemBench</td><td colspan="2">83.4 👑</td><td>61.3</td><td>61.6</td><td>75.6</td><td>75.8</td><td>75.3</td><td>82.8</td><td>81.6</td><td>83.3</td></tr>
+    <tr><td>MatBench</td><td colspan="2">75.0 👑</td><td>49.3</td><td>51.5</td><td>57.7</td><td>52.1</td><td>61.7</td><td>61.7</td><td>61.6</td><td>67.9</td></tr>
+    <tr><td>MicroVQA</td><td colspan="2">63.9 👑</td><td>59.1</td><td>53.0</td><td>-</td><td>-</td><td>-</td><td>63.1</td><td>58.3</td><td>59.5</td></tr>
+    <tr><td>ProteinLMBench</td><td colspan="2">63.1</td><td>61.6</td><td>61.0</td><td>61.4</td><td>59.8</td><td>66.7</td><td>62.9</td><td>67.7</td><td>66.2</td></tr>
+    <tr><td>MSEarthMCQ</td><td colspan="2">65.7 👑</td><td>57.2</td><td>37.6</td><td>-</td><td>-</td><td>-</td><td>59.9</td><td>61.0</td><td>58.0</td></tr>
+    <tr><td>XLRS-Bench</td><td colspan="2">55.0 👑</td><td>49.3</td><td>50.9</td><td>-</td><td>-</td><td>-</td><td>45.2</td><td>43.6</td><td>45.4</td></tr>
+  </tbody>
+</table>
+> **Note**: ✅ marks the best performance among open-source models; 👑 marks the best performance among all models.
 We use the [OpenCompass](https://github.com/open-compass/OpenCompass/) and [VLMEvalkit](https://github.com/open-compass/vlmevalkit) to evaluate all models.
 print(decoded_output)
 ```
+#### Image input
 ```python
 from transformers import AutoProcessor, AutoModelForCausalLM
 #### [sglang](https://github.com/sgl-project/sglang)
+Supporting Intern-S1 with SGLang is still in progress. Please refer to this [PR](https://github.com/sgl-project/sglang/pull/8350).
 ```bash
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
     python3 -m sglang.launch_server \
     --model-path internlm/Intern-S1 \
     --trust-remote-code \
+    --mem-fraction-static 0.85 \
     --tp 8 \
     --enable-multimodal \
     --grammar-backend none
 # install ollama
 curl -fsSL https://ollama.com/install.sh | sh
 # fetch model
+ollama pull internlm/interns1
 # run model
+ollama run internlm/interns1
 # then use openai client to call on http://localhost:11434/v1
 ```
 A key advantage for developers is that a growing number of open-source LLMs are designed to be compatible with the OpenAI API. This means you can leverage the same familiar syntax and structure from the OpenAI library to implement tool calling with these open-source models. As a result, the code demonstrated in this tutorial is versatile—it works not just with OpenAI models, but with any model that follows the same interface standard.
+To illustrate how this works, let's walk through a practical code example that uses tool calling to get the latest weather forecast (based on the LMDeploy API server).
 ```python
 from openai import OpenAI
 import json
     temperature=0.8,
     top_p=0.8,
     stream=False,
+    extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
     tools=tools)
 print(response.choices[0].message)
 messages.append(response.choices[0].message)
     temperature=0.8,
     top_p=0.8,
     stream=False,
+    extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
     tools=tools)
 print(response.choices[0].message.content)
 ```
 ### Switching Between Thinking and Non-Thinking Modes
 extra_body={
     "chat_template_kwargs": {"enable_thinking": False}
 }
+```
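
The `chat_template_kwargs` fragment that closes the updated README can be wrapped in a small helper so that the thinking mode is chosen per request. The sketch below only builds the keyword arguments for an OpenAI-compatible chat completion call and sends nothing over the network. The helper name `build_chat_request` is hypothetical, the sampling values are copied from the tool-calling example in the README, and it assumes the server reads `enable_thinking` from `chat_template_kwargs`, as the README describes for vLLM and SGLang.

```python
import json
from typing import Any


def build_chat_request(
    prompt: str,
    enable_thinking: bool = True,
    model: str = "internlm/Intern-S1",
) -> dict[str, Any]:
    """Build kwargs for an OpenAI-compatible chat completion call.

    Hypothetical helper for illustration; it is not part of any library.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Sampling values taken from the README's tool-calling example.
        "temperature": 0.8,
        "top_p": 0.8,
        # Assumption: the server forwards chat_template_kwargs to the chat
        # template, which is how the README toggles thinking mode.
        "extra_body": {
            "chat_template_kwargs": {"enable_thinking": enable_thinking}
        },
    }


if __name__ == "__main__":
    request = build_chat_request("What is 12 * 7?", enable_thinking=False)
    # The Python boolean False becomes JSON false when serialized.
    print(json.dumps(request["extra_body"]))
```

With an OpenAI client pointed at the serving endpoint, the result could be passed as `client.chat.completions.create(**request)`. Keeping the toggle in one helper avoids repeating the nested `extra_body` dictionary at every call site.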