base_model:
  - internlm/Intern-S1
language:
  - en
license: apache-2.0
pipeline_tag: image-text-to-text
tags:
  - chat
base_model_relation: quantized

Intern-S1-GGUF: A Scientific Multimodal Foundation Model

👋 Join us on Discord and WeChat

Paper

The model was presented in the paper Intern-S1: A Scientific Multimodal Foundation Model.

Abstract

In recent years, a plethora of open-source foundation models have emerged, achieving remarkable progress in some widely attended fields, with performance being quite close to that of closed-source models. However, in high-value but more challenging scientific professional fields, either the fields still rely on expert models, or the progress of general foundation models lags significantly compared to those in popular areas, far from sufficient for transforming scientific research and leaving substantial gap between open-source models and closed-source models in these scientific domains. To mitigate this gap and explore a step further toward Artificial General Intelligence (AGI), we introduce Intern-S1, a specialized generalist equipped with general understanding and reasoning capabilities with expertise to analyze multiple science modal data. Intern-S1 is a multimodal Mixture-of-Experts (MoE) model with 28 billion activated parameters and 241 billion total parameters, continually pre-trained on 5T tokens, including over 2.5T tokens from scientific domains. In the post-training stage, Intern-S1 undergoes offline and then online reinforcement learning (RL) in InternBootCamp, where we propose Mixture-of-Rewards (MoR) to synergize the RL training on more than 1000 tasks simultaneously. Through integrated innovations in algorithms, data, and training systems, Intern-S1 achieved top-tier performance in online RL training. On comprehensive evaluation benchmarks, Intern-S1 demonstrates competitive performance on general reasoning tasks among open-source models and significantly outperforms open-source models in scientific domains, surpassing closed-source state-of-the-art models in professional tasks, such as molecular synthesis planning, reaction condition prediction, predicting thermodynamic stabilities for crystals. Our models are available at this https URL .

Introduction

We introduce Intern-S1, our most advanced open-source multimodal reasoning model to date. Intern-S1 combines strong general-task capabilities with state-of-the-art performance on a wide range of scientific tasks, rivaling leading closed-source commercial models.

Built upon a 235B MoE language model (Qwen3) and a 6B vision encoder (InternViT), Intern-S1 has been further pretrained on 5 trillion tokens of multimodal data, including over 2.5 trillion scientific-domain tokens. This enables the model to retain strong general capabilities while excelling in specialized scientific tasks such as interpreting chemical structures, understanding protein sequences, and planning compound synthesis routes, making Intern-S1 a capable research assistant for real-world scientific applications.

This repository provides Intern-S1 models in GGUF format, in both half precision and low-bit quantized versions such as q8_0. GGUF models can be run with llama.cpp, a widely used open-source framework for Large Language Model (LLM) inference, on a variety of hardware platforms, both locally and in the cloud.

The following sections cover installation, model download, and then inference and service deployment, each with concrete examples.

Features

Strong performance across language and vision reasoning benchmarks, especially scientific tasks.
Continuously pretrained on a massive 5T token dataset, with over 50% specialized scientific data, embedding deep domain expertise.
Dynamic tokenizer enables native understanding of molecular formulas, protein sequences, and seismic signals.

Model Zoo

Intern-S1

	BF16	FP8	GGUF
🤗HuggingFace	internlm/Intern-S1	internlm/Intern-S1-FP8	internlm/Intern-S1-GGUF
ModelScope	Shanghai_AI_Laboratory/Intern-S1	Shanghai_AI_Laboratory/Intern-S1-FP8	Shanghai_AI_Laboratory/Intern-S1-GGUF

Intern-S1-mini

	BF16	FP8	GGUF
🤗HuggingFace	internlm/Intern-S1-mini	internlm/Intern-S1-mini-FP8	internlm/Intern-S1-mini-GGUF
ModelScope	Shanghai_AI_Laboratory/Intern-S1-mini	Shanghai_AI_Laboratory/Intern-S1-mini-FP8	-

Performance

We evaluate Intern-S1 on a range of general and scientific benchmarks and compare it with recent VLMs and LLMs below.

Intern-S1

Benchmarks	Intern-S1		InternVL3-78B	Qwen2.5-VL-72B	DS-R1-0528	Qwen3-235B-A22B	Kimi-K2-Instruct	Gemini-2.5 Pro	o3	Grok-4
MMLU-Pro	83.5 ✅		73.0	72.1	83.4	82.2	82.7	86.0	85.0	85.9
MMMU	77.7 ✅		72.2	70.2	-	-	-	81.9	80.8	77.9
GPQA	77.3		49.9	49.0	80.6	71.1	77.8	83.8	83.3	87.5
MMStar	74.9 ✅		72.5	70.8	-	-	-	79.3	75.1	69.6
MathVista	81.5 👑		79.0	74.8	-	-	-	80.3	77.5	72.5
AIME2025	86.0		10.7	10.9	87.5	81.5	51.4	83.0	88.9	91.7
MathVision	62.5 ✅		43.1	38.1	-	-	-	73.0	67.7	67.3
IFEval	86.7		75.6	83.9	79.7	85.0	90.2	91.5	92.2	92.8
SFE	44.3 👑		36.2	30.5	-	-	-	43.0	37.7	31.2
Physics	44.0 ✅		23.1	15.7	-	-	-	40.0	47.9	42.8
SmolInstruct	51.0 👑		19.4	21.0	30.7	28.7	48.1	40.4	43.9	47.3
ChemBench	83.4 👑		61.3	61.6	75.6	75.8	75.3	82.8	81.6	83.3
MatBench	75.0 👑		49.3	51.5	57.7	52.1	61.7	61.7	61.6	67.9
MicroVQA	63.9 👑		59.1	53.0	-	-	-	63.1	58.3	59.5
ProteinLMBench	63.1		61.6	61.0	61.4	59.8	66.7	62.9	67.7	66.2
MSEarthMCQ	65.7 👑		57.2	37.6	-	-	-	59.9	61.0	58.0
XLRS-Bench	55.0 👑		49.3	50.9	-	-	-	45.2	43.6	45.4

Note: ✅ marks the best performance among open-source models; 👑 marks the best performance among all models.

Intern-S1-mini

Benchmarks	Intern-S1-mini	Qwen3-8B	GLM-4.1V	MiMo-VL-7B-RL-2508
MMLU-Pro	74.78	73.7	57.1	73.93
MMMU	72.33	-	69.9	70.4
MMStar	65.2	-	71.5	72.9
GPQA	65.15	62	50.32	60.35
AIME2024	84.58	76	36.2	72.6
AIME2025	80	67.3	32	64.4
MathVision	51.41	-	53.9	54.5
MathVista	70.3	-	80.7	79.4
IFEval	81.15	85	71.53	71.4
SFE	35.84	-	43.2	43.9
Physics	28.76	-	28.3	28.2
SmolInstruct	32.2	17.6	18.1	16.11
ChemBench	76.47	61.1	56.2	66.78
MatBench	61.55	45.24	54.3	46.9
MicroVQA	56.62	-	50.2	50.96
ProteinLMBench	58.47	59.1	58.3	59.8
MSEarthMCQ	58.12	-	50.3	47.3
XLRS-Bench	51.63	-	49.8	12.29

We use OpenCompass and VLMEvalKit to evaluate all models. Please refer to this page to quickly start the text-only evaluation task.

Quick Start (GGUF via llama.cpp and Ollama)

Sampling Parameters

We recommend the following sampling hyperparameters for better results.

For Intern-S1:

top_p = 1.0
top_k = 50
min_p = 0.0
temperature = 0.7

For Intern-S1-mini:

top_p = 1.0
top_k = 50
min_p = 0.0
temperature = 0.8
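
When the model is called through an OpenAI-compatible client, these settings can be kept in one place and expanded into request arguments. The sketch below is illustrative only: `RECOMMENDED_SAMPLING` and `sampling_kwargs` are hypothetical names, and passing `top_k` and `min_p` through `extra_body` assumes the serving backend reads those non-standard fields from the request body.

```python
# Recommended sampling settings, copied from the lists above.
RECOMMENDED_SAMPLING = {
    "Intern-S1": {"top_p": 1.0, "top_k": 50, "min_p": 0.0, "temperature": 0.7},
    "Intern-S1-mini": {"top_p": 1.0, "top_k": 50, "min_p": 0.0, "temperature": 0.8},
}


def sampling_kwargs(model_family: str) -> dict:
    """Split the settings into standard OpenAI arguments and extra_body fields."""
    params = RECOMMENDED_SAMPLING[model_family]
    return {
        "temperature": params["temperature"],
        "top_p": params["top_p"],
        # Assumption: the server accepts these extra sampling fields.
        "extra_body": {"top_k": params["top_k"], "min_p": params["min_p"]},
    }


print(sampling_kwargs("Intern-S1"))
```

The returned dictionary can be unpacked into a request, for example `client.chat.completions.create(model=..., messages=..., **sampling_kwargs("Intern-S1"))`.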

Installation (llama.cpp)

We recommend building llama.cpp from source. The following code snippet provides an example for the Linux CUDA platform. For instructions on other platforms, please refer to the official guide.

Step 1: create a conda environment and install cmake

conda create --name interns1 python=3.10 -y
conda activate interns1
pip install cmake

Step 2: clone the source code and build the project

git clone --depth=1 https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

All built targets can be found in the subdirectory build/bin.

The following sections assume that the working directory is the root directory of llama.cpp.

Download models (GGUF)

As mentioned in the introduction, this repository includes models at several precision levels. Download the one that fits your requirements. For instance, the fp16 GGUF files can be downloaded as follows:

pip install huggingface-hub
huggingface-cli download internlm/Intern-S1-GGUF --include "*-f16-*.gguf" --local-dir Intern-S1-GGUF --local-dir-use-symlinks False
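
The same download can be scripted with the `huggingface_hub` Python API. The sketch below is a hedged example rather than the authors' procedure: `matches` and `download_f16` are hypothetical helpers, and the second glob pattern is an assumption added because the projector file name used later on this page (`mmproj-Intern-S1-f16.gguf`) does not contain `-f16-` and so would not match the first pattern.

```python
from fnmatch import fnmatch

REPO_ID = "internlm/Intern-S1-GGUF"
# The first pattern comes from the CLI command above. The second is an
# assumption so that the mmproj file referenced later is fetched as well.
F16_PATTERNS = ["*-f16-*.gguf", "*mmproj*-f16.gguf"]


def matches(path: str, patterns=F16_PATTERNS) -> bool:
    """Return True if a repository path matches any of the glob patterns."""
    return any(fnmatch(path, pattern) for pattern in patterns)


def download_f16(local_dir: str = "Intern-S1-GGUF") -> str:
    """Download only the matching files (needs network and huggingface-hub)."""
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(
        repo_id=REPO_ID, allow_patterns=F16_PATTERNS, local_dir=local_dir
    )


# Offline check of which file names the patterns select.
print(matches("f16/Intern-S1-f16-00001-of-00016.gguf"))
print(matches("f16/mmproj-Intern-S1-f16.gguf", ["*-f16-*.gguf"]))
```

Calling `download_f16()` performs the actual download. The two `print` lines only check the patterns locally, so they run without network access.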

Merge model files (GGUF)

Run the following command to merge gguf files into one:

build/bin/llama-gguf-split \
--merge \
Intern-S1-GGUF/f16/Intern-S1-f16-00001-of-00016.gguf \
Intern-S1-GGUF/f16/Intern-S1-f16.gguf
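
Before merging, it can help to confirm that every shard is present, since an interrupted download leaves gaps. The sketch below assumes the `<prefix>-NNNNN-of-NNNNN.gguf` naming seen in the command above; `shard_names` and `missing_shards` are hypothetical helpers, not part of llama.cpp.

```python
from pathlib import Path


def shard_names(prefix: str, total: int) -> list[str]:
    """Expected shard file names, e.g. Intern-S1-f16-00001-of-00016.gguf."""
    return [f"{prefix}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)]


def missing_shards(directory: str, prefix: str, total: int) -> list[str]:
    """Return the expected shard names that are not present in the directory."""
    folder = Path(directory)
    return [name for name in shard_names(prefix, total) if not (folder / name).is_file()]


# An empty list means all 16 shards are in place and the merge can proceed.
print(missing_shards("Intern-S1-GGUF/f16", "Intern-S1-f16", 16))
```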

Inference (llama.cpp)

You can use build/bin/llama-mtmd-cli to run inference. For a detailed explanation of build/bin/llama-mtmd-cli, please refer to this guide.

Chat example

Here is an example of using the thinking system prompt.


system_prompt="<|im_start|>system
You are an expert reasoner with extensive experience in all areas. You approach problems through systematic thinking and rigorous reasoning. Your response should reflect deep understanding and precise logical thinking, making your solution path and reasoning clear to others. Please put your thinking process within <think>...</think> tags.
<|im_end|>
"

build/bin/llama-mtmd-cli \
    --model Intern-S1-GGUF/f16/Intern-S1-f16.gguf \
    --mmproj Intern-S1-GGUF/f16/mmproj-Intern-S1-f16.gguf \
    --predict 2048 \
    --ctx-size 8192 \
    --gpu-layers 100 \
    --temp 0.8 \
    --top-p 0.8 \
    --top-k 50 \
    --seed 1024

Then type your question at the prompt. To attach an image, enter /image xxx.jpg.

Serving (llama.cpp)

llama.cpp provides an OpenAI-API-compatible server, llama-server. You can deploy the model as a service like this:

./build/bin/llama-server \
    --model Intern-S1-GGUF/f16/Intern-S1-f16.gguf \
    --mmproj Intern-S1-GGUF/f16/mmproj-Intern-S1-f16.gguf \
    --gpu-layers 100 \
    --temp 0.8 \
    --top-p 0.8 \
    --top-k 50 \
    --port 8080 \
    --seed 1024

On the client side, you can access the service through the OpenAI API:

from openai import OpenAI
client = OpenAI(
    api_key='YOUR_API_KEY',
    base_url='http://localhost:8080/v1'
)
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "provide three suggestions about time management"},
    ],
    temperature=0.8,
    top_p=0.8,
)
print(response)
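
Because the server is started with an `--mmproj` file, requests can also carry images. The sketch below builds a user message in the OpenAI content-parts format with a base64 data URL. It assumes the running llama-server build accepts `image_url` parts, and `image_message` is a hypothetical helper name.

```python
import base64


def image_message(question: str, image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Build a user message that pairs a text question with an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }


# Placeholder bytes stand in for a real file read with open(path, "rb").read().
message = image_message("Describe this image.", b"\xff\xd8\xff")
print(message["content"][1]["image_url"]["url"])
```

The resulting dictionary can be passed as an element of `messages` in the `client.chat.completions.create` call shown above.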

Ollama

# install ollama
curl -fsSL https://ollama.com/install.sh | sh
# fetch model
ollama pull internlm/interns1
# run model 
ollama run internlm/interns1
# then use openai client to call on http://localhost:11434/v1
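
The last comment above can be made concrete with the OpenAI Python client pointed at Ollama's OpenAI-compatible endpoint. This is a minimal sketch: `build_request` and `ask` are hypothetical helpers, and the `api_key` value is a placeholder, on the assumption that a local Ollama server does not validate it.

```python
OLLAMA_BASE_URL = "http://localhost:11434/v1"
OLLAMA_MODEL = "internlm/interns1"  # model name from the pull command above


def build_request(prompt: str) -> dict:
    """Assemble the arguments for a single-turn chat completion."""
    return {
        "model": OLLAMA_MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }


def ask(prompt: str) -> str:
    """Send the prompt to a running Ollama server and return the reply text."""
    from openai import OpenAI  # imported lazily; requires the openai package

    client = OpenAI(base_url=OLLAMA_BASE_URL, api_key="ollama")
    response = client.chat.completions.create(**build_request(prompt))
    return response.choices[0].message.content


print(build_request("Provide three suggestions about time management."))
```

With `ollama run` or the Ollama service active, `ask("...")` returns the model's reply as a string.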

Advanced Usage

Tool Calling

Many Large Language Models (LLMs) now feature Tool Calling, a powerful capability that allows them to extend their functionality by interacting with external tools and APIs. This enables models to perform tasks like fetching up-to-the-minute information, running code, or calling functions within other applications.

A key advantage for developers is that a growing number of open-source LLMs are designed to be compatible with the OpenAI API. This means you can use the same syntax and structure from the OpenAI library to implement tool calling with these open-source models. As a result, the code in this section works not just with OpenAI models, but with any model served behind the same interface.

The following example uses tool calling to get the latest weather forecast. It assumes the model is served by an LMDeploy API server.


from openai import OpenAI
import json


def get_current_temperature(location: str, unit: str = "celsius"):
    """Get current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])

    Returns:
        the temperature, the location, and the unit in a dict
    """
    return {
        "temperature": 26.1,
        "location": location,
        "unit": unit,
    }


def get_temperature_date(location: str, date: str, unit: str = "celsius"):
    """Get temperature at a location and date.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        date: The date to get the temperature for, in the format "Year-Month-Day".
        unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])

    Returns:
        the temperature, the location, the date and the unit in a dict
    """
    return {
        "temperature": 25.9,
        "location": location,
        "date": date,
        "unit": unit,
    }

def get_function_by_name(name):
    if name == "get_current_temperature":
        return get_current_temperature
    if name == "get_temperature_date":
        return get_temperature_date

tools = [{
    'type': 'function',
    'function': {
        'name': 'get_current_temperature',
        'description': 'Get current temperature at a location.',
        'parameters': {
            'type': 'object',
            'properties': {
                'location': {
                    'type': 'string',
                    'description': 'The location to get the temperature for, in the format \'City, State, Country\'.'
                },
                'unit': {
                    'type': 'string',
                    'enum': [
                        'celsius',
                        'fahrenheit'
                    ],
                    'description': 'The unit to return the temperature in. Defaults to \'celsius\'.'
                }
            },
            'required': [
                'location'
            ]
        }
    }
}, {
    'type': 'function',
    'function': {
        'name': 'get_temperature_date',
        'description': 'Get temperature at a location and date.',
        'parameters': {
            'type': 'object',
            'properties': {
                'location': {
                    'type': 'string',
                    'description': 'The location to get the temperature for, in the format \'City, State, Country\'.'
                },
                'date': {
                    'type': 'string',
                    'description': 'The date to get the temperature for, in the format \'Year-Month-Day\'.'
                },
                'unit': {
                    'type': 'string',
                    'enum': [
                        'celsius',
                        'fahrenheit'
                    ],
                    'description': 'The unit to return the temperature in. Defaults to \'celsius\'.'
                }
            },
            'required': [
                'location',
                'date'
            ]
        }
    }
}]



messages = [
    {'role': 'user', 'content': 'Today is 2024-11-14, What\'s the temperature in San Francisco now? How about tomorrow?'}
]

openai_api_key = "EMPTY"
openai_api_base = "http://0.0.0.0:23333/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    max_tokens=32768,
    temperature=0.8,
    top_p=0.8,
    stream=False,
    extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
    tools=tools)
print(response.choices[0].message)
messages.append(response.choices[0].message)

for tool_call in response.choices[0].message.tool_calls or []:  # tool_calls is None when no tool is requested
    tool_call_args = json.loads(tool_call.function.arguments)
    tool_call_result = get_function_by_name(tool_call.function.name)(**tool_call_args)
    tool_call_result = json.dumps(tool_call_result, ensure_ascii=False)
    messages.append({
        'role': 'tool',
        'name': tool_call.function.name,
        'content': tool_call_result,
        'tool_call_id': tool_call.id
    })

response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.8,
    top_p=0.8,
    stream=False,
    extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
    tools=tools)
print(response.choices[0].message.content)
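
The dispatch loop above assumes that the model always names a known tool and emits valid JSON arguments. A more defensive variant can return the failure to the model as the tool result instead of raising. The sketch below is one possible shape, not part of any library: `run_tool_call` is a hypothetical helper, and `SimpleNamespace` objects stand in for the SDK's tool-call objects.

```python
import json
from types import SimpleNamespace


def run_tool_call(tool_call, registry: dict) -> dict:
    """Execute one tool call and wrap the outcome as a 'tool' role message."""
    name = tool_call.function.name
    try:
        if name not in registry:
            raise ValueError(f"unknown tool: {name}")
        # json.JSONDecodeError is a subclass of ValueError, so bad JSON is caught too.
        result = registry[name](**json.loads(tool_call.function.arguments))
    except (ValueError, TypeError) as exc:
        result = {"error": str(exc)}
    return {
        "role": "tool",
        "name": name,
        "content": json.dumps(result, ensure_ascii=False),
        "tool_call_id": tool_call.id,
    }


# Stand-in tool and tool call for a local dry run (no server needed).
registry = {
    "get_current_temperature": lambda location, unit="celsius": {
        "temperature": 26.1, "location": location, "unit": unit,
    }
}
fake_call = SimpleNamespace(
    id="call_0",
    function=SimpleNamespace(
        name="get_current_temperature",
        arguments='{"location": "San Francisco, CA, USA"}',
    ),
)
print(run_tool_call(fake_call, registry))
```

In the example above, the body of the `for` loop could then be reduced to `messages.append(run_tool_call(tool_call, registry))`, with `registry` mapping tool names to the two temperature functions.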

Switching Between Thinking and Non-Thinking Modes

Intern-S1 enables thinking mode by default, which strengthens the model's reasoning and leads to higher-quality responses. To disable it, set enable_thinking=False in tokenizer.apply_chat_template:

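from transformers import AutoTokenizer

# Assumption: the tokenizer is loaded from the base model repository.
tokenizer = AutoTokenizer.from_pretrained("internlm/Intern-S1", trust_remote_code=True)
messages = [{"role": "user", "content": "who are you"}]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # think mode indicator
)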

With LMDeploy serving Intern-S1 models, you can dynamically control the thinking mode by adjusting the enable_thinking parameter in your requests.

from openai import OpenAI
import json

messages = [
{
    'role': 'user',
    'content': 'who are you'
}, {
    'role': 'assistant',
    'content': 'I am an AI'
}, {
    'role': 'user',
    'content': 'AGI is?'
}]

openai_api_key = "EMPTY"
openai_api_base = "http://0.0.0.0:23333/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.7,
    top_p=0.8,
    max_tokens=2048,
    extra_body={
        "enable_thinking": False,
    }
)
print(json.dumps(response.model_dump(), indent=2, ensure_ascii=False))

For vLLM and SGLang users, configure this through:

extra_body={
    "chat_template_kwargs": {"enable_thinking": False}
}
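
The two request shapes above can be wrapped in a small helper, together with a function that separates the reasoning from the final answer when thinking mode is on. This is a sketch under assumptions: `thinking_extra_body` and `split_thinking` are hypothetical names, and `split_thinking` assumes the reasoning arrives inline in the message content inside `<think>...</think>` tags, which may not hold for servers that return it in a separate field.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def thinking_extra_body(backend: str, enabled: bool) -> dict:
    """Return the extra_body payload that toggles thinking mode for a backend."""
    if backend == "lmdeploy":
        return {"enable_thinking": enabled}
    if backend in ("vllm", "sglang"):
        return {"chat_template_kwargs": {"enable_thinking": enabled}}
    raise ValueError(f"unsupported backend: {backend}")


def split_thinking(text: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer); reasoning is '' if absent."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return match.group(1).strip(), answer


print(thinking_extra_body("vllm", False))
print(split_thinking("<think>2 + 2 = 4</think>\nThe answer is 4."))
```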

Fine-tuning

See this documentation for more details.

License

This project is released under the Apache 2.0 license.

Citation

If you find this work useful, please cite our paper.

@misc{bai2025interns1scientificmultimodalfoundation,
      title={Intern-S1: A Scientific Multimodal Foundation Model},
      author={Lei Bai and Zhongrui Cai and Maosong Cao and Weihan Cao and Chiyu Chen and Haojiong Chen and Kai Chen and Pengcheng Chen and Ying Chen and Yongkang Chen and Yu Cheng and Yu Cheng and Pei Chu and Tao Chu and Erfei Cui and Ganqu Cui and Long Cui and Ziyun Cui and Nianchen Deng and Ning Ding and Nanqin Dong and Peijie Dong and Shihan Dou and Sinan Du and Haodong Duan and Caihua Fan and Ben Gao and Changjiang Gao and Jianfei Gao and Songyang Gao and Yang Gao and Zhangwei Gao and Jiaye Ge and Qiming Ge and Lixin Gu and Yuzhe Gu and Aijia Guo and Qipeng Guo and Xu Guo and Conghui He and Junjun He and Yili Hong and Siyuan Hou and Caiyu Hu and Hanglei Hu and Jucheng Hu and Ming Hu and Zhouqi Hua and Haian Huang and Junhao Huang and Xu Huang and Zixian Huang and Zhe Jiang and Lingkai Kong and Linyang Li and Peiji Li and Pengze Li and Shuaibin Li and Tianbin Li and Wei Li and Yuqiang Li and Dahua Lin and Junyao Lin and Tianyi Lin and Zhishan Lin and Hongwei Liu and Jiangning Liu and Jiyao Liu and Junnan Liu and Kai Liu and Kaiwen Liu and Kuikun Liu and Shichun Liu and Shudong Liu and Wei Liu and Xinyao Liu and Yuhong Liu and Zhan Liu and Yinquan Lu and Haijun Lv and Hongxia Lv and Huijie Lv and Qidang Lv and Ying Lv and Chengqi Lyu and Chenglong Ma and Jianpeng Ma and Ren Ma and Runmin Ma and Runyuan Ma and Xinzhu Ma and Yichuan Ma and Zihan Ma and Sixuan Mi and Junzhi Ning and Wenchang Ning and Xinle Pang and Jiahui Peng and Runyu Peng and Yu Qiao and Jiantao Qiu and Xiaoye Qu and Yuan Qu and Yuchen Ren and Fukai Shang and Wenqi Shao and Junhao Shen and Shuaike Shen and Chunfeng Song and Demin Song and Diping Song and Chenlin Su and Weijie Su and Weigao Sun and Yu Sun and Qian Tan and Cheng Tang and Huanze Tang and Kexian Tang and Shixiang Tang and Jian Tong and Aoran Wang and Bin Wang and Dong Wang and Lintao Wang and Rui Wang and Weiyun Wang and Wenhai Wang and Yi Wang and Ziyi Wang and Ling-I Wu and Wen Wu and Yue Wu and Zijian Wu and Linchen Xiao and 
Shuhao Xing and Chao Xu and Huihui Xu and Jun Xu and Ruiliang Xu and Wanghan Xu and GanLin Yang and Yuming Yang and Haochen Ye and Jin Ye and Shenglong Ye and Jia Yu and Jiashuo Yu and Jing Yu and Fei Yuan and Bo Zhang and Chao Zhang and Chen Zhang and Hongjie Zhang and Jin Zhang and Qiaosheng Zhang and Qiuyinzhe Zhang and Songyang Zhang and Taolin Zhang and Wenlong Zhang and Wenwei Zhang and Yechen Zhang and Ziyang Zhang and Haiteng Zhao and Qian Zhao and Xiangyu Zhao and Xiangyu Zhao and Bowen Zhou and Dongzhan Zhou and Peiheng Zhou and Yuhao Zhou and Yunhua Zhou and Dongsheng Zhu and Lin Zhu and Yicheng Zou},
      year={2025},
      eprint={2508.15763},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.15763},
}
