Improve Intern-S1-mini-GGUF model card: Add Transformers usage, paper/GitHub links, and `library_name`

#3
by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +370 -10
README.md CHANGED
@@ -1,21 +1,24 @@
  ---
- license: apache-2.0
  language:
  - en
- base_model:
- - internlm/Intern-S1-mini
-
- base_model_relation: quantized
-
  pipeline_tag: image-text-to-text
  tags:
  - chat
  ---

  # Intern-S1-mini-GGUF Model

- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/642695e5274e7ad464c8a5ba/E43cgEXBRWjVJlU_-hdh6.png)

  <p align="center">
  👋 join us on <a href="https://discord.gg/xa29JuW87d" target="_blank">Discord</a> and <a href="https://cdn.vansin.top/intern-s1.jpg" target="_blank">WeChat</a>
@@ -64,7 +67,7 @@ pip install huggingface-hub
  huggingface-cli download internlm/Intern-S1-mini-GGUF --include *-f16.gguf --local-dir Intern-S1-mini-GGUF --local-dir-use-symlinks False
  ```

- ## Inference

  You can use `build/bin/llama-mtmd-cli` to run inference. For a detailed explanation of `build/bin/llama-mtmd-cli`, please refer to [this guide](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md).

@@ -74,7 +77,10 @@ Here is an example of using the thinking system prompt.

  ```shell

- system_prompt="<|im_start|>system\nYou are an expert reasoner with extensive experience in all areas. You approach problems through systematic thinking and rigorous reasoning. Your response should reflect deep understanding and precise logical thinking, making your solution path and reasoning clear to others. Please put your thinking process within <think>...</think> tags.\n<|im_end|>\n"

  build/bin/llama-mtmd-cli \
  --model Intern-S1-mini-GGUF/f16/Intern-S1-mini-f16.gguf  \
@@ -90,7 +96,7 @@ build/bin/llama-mtmd-cli \

  Then input your question with image input as `/image xxx.jpg`.

- ## Serving

  `llama.cpp` provides an OpenAI API-compatible server, `llama-server`. You can deploy the model as a service like this:

@@ -140,3 +146,357 @@ ollama pull internlm/interns1:mini
  ollama run internlm/interns1:mini
  # then use openai client to call on http://localhost:11434/v1
  ```
  ---
+ base_model:
+ - internlm/Intern-S1-mini
  language:
  - en
+ license: apache-2.0
  pipeline_tag: image-text-to-text
  tags:
  - chat
+ base_model_relation: quantized
+ library_name: transformers
  ---

  # Intern-S1-mini-GGUF Model

+ This repository contains the `Intern-S1-mini` model in GGUF format, which is part of the **Intern-S1** family of scientific multimodal foundation models introduced in the paper [Intern-S1: A Scientific Multimodal Foundation Model](https://huggingface.co/papers/2508.15763) ([arXiv:2508.15763](https://arxiv.org/abs/2508.15763)).
+
+ For more details about the project, visit the [official GitHub repository](https://github.com/InternLM/Intern-S1).
+ You can also try the [online chat demo](https://chat.intern-ai.org.cn/).
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/642695e5274e7ad464c8a5ba/E43cgEXBRWjVJlU_-hdh6.png)

  <p align="center">
  👋 join us on <a href="https://discord.gg/xa29JuW87d" target="_blank">Discord</a> and <a href="https://cdn.vansin.top/intern-s1.jpg" target="_blank">WeChat</a>
 
  huggingface-cli download internlm/Intern-S1-mini-GGUF --include *-f16.gguf --local-dir Intern-S1-mini-GGUF --local-dir-use-symlinks False
  ```

+ ## Inference (llama.cpp)

  You can use `build/bin/llama-mtmd-cli` to run inference. For a detailed explanation of `build/bin/llama-mtmd-cli`, please refer to [this guide](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md).


  ```shell

+ system_prompt="<|im_start|>system
+ You are an expert reasoner with extensive experience in all areas. You approach problems through systematic thinking and rigorous reasoning. Your response should reflect deep understanding and precise logical thinking, making your solution path and reasoning clear to others. Please put your thinking process within <think>...</think> tags.
+ <|im_end|>
+ "

  build/bin/llama-mtmd-cli \
  --model Intern-S1-mini-GGUF/f16/Intern-S1-mini-f16.gguf  \

  Then input your question with image input as `/image xxx.jpg`.

+ ## Serving (llama.cpp)

  `llama.cpp` provides an OpenAI API-compatible server, `llama-server`. You can deploy the model as a service like this:

 
  ollama run internlm/interns1:mini
  # then use openai client to call on http://localhost:11434/v1
  ```
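
+ As a rough illustration of that last comment, a minimal OpenAI-client call against the Ollama endpoint (the prompt is an illustrative assumption; Ollama accepts any placeholder API key):
+
+ ```python
+ from openai import OpenAI
+
+ # Ollama exposes an OpenAI-compatible API on port 11434 by default.
+ client = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1")
+ response = client.chat.completions.create(
+     model="internlm/interns1:mini",  # the tag pulled above
+     messages=[{"role": "user", "content": "Hello!"}],
+ )
+ print(response.choices[0].message.content)
+ ```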
+
+ ## Quick Start (Transformers)
+
+ ### Sampling Parameters
+
+ We recommend the following sampling hyperparameters for better results.
+
+ For Intern-S1-mini:
+
+ ```python
+ top_p = 1.0
+ top_k = 50
+ min_p = 0.0
+ temperature = 0.8
+ ```
+
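+ As a rough sketch, these values map directly onto `generate()` keyword arguments in Transformers (sampling must be enabled for them to take effect; `model` and `inputs` refer to the examples below):
+
+ ```python
+ # Illustrative only: plug the recommended sampling values into generation.
+ generate_ids = model.generate(
+     **inputs,
+     max_new_tokens=32768,
+     do_sample=True,  # required for top_p/top_k/temperature to apply
+     top_p=1.0,
+     top_k=50,
+     min_p=0.0,
+     temperature=0.8,
+ )
+ ```
+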
+ ### Text input
+
+ ```python
+ from transformers import AutoProcessor, AutoModelForCausalLM
+ import torch
+
+ model_name = "internlm/Intern-S1-mini"
+ processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)
+
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "text", "text": "Tell me about an interesting physical phenomenon."},
+         ],
+     }
+ ]
+
+ inputs = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt").to(model.device, dtype=torch.bfloat16)
+
+ generate_ids = model.generate(**inputs, max_new_tokens=32768)
+ decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+ print(decoded_output)
+ ```
+
+ ### Image input
+
+ ```python
+ from transformers import AutoProcessor, AutoModelForCausalLM
+ import torch
+
+ model_name = "internlm/Intern-S1-mini"
+ processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)
+
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
+             {"type": "text", "text": "Please describe the image explicitly."},
+         ],
+     }
+ ]
+
+ inputs = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt").to(model.device, dtype=torch.bfloat16)
+
+ generate_ids = model.generate(**inputs, max_new_tokens=32768)
+ decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+ print(decoded_output)
+ ```
+
+ ### Video input
+
+ Please ensure that the decord video decoding library is installed via `pip install decord`.
+
+ ```python
+ from transformers import AutoProcessor, AutoModelForCausalLM
+ import torch
+
+ model_name = "internlm/Intern-S1-mini"
+ processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)
+
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {
+                 "type": "video",
+                 "url": "https://huggingface.co/datasets/hf-internal-testing/fixtures_videos/resolve/main/tennis.mp4",
+             },
+             {"type": "text", "text": "What type of shot is the man performing?"},
+         ],
+     }
+ ]
+
+ inputs = processor.apply_chat_template(
+     messages,
+     return_tensors="pt",
+     add_generation_prompt=True,
+     video_load_backend="decord",
+     tokenize=True,
+     return_dict=True,
+ ).to(model.device, dtype=torch.float16)
+
+ generate_ids = model.generate(**inputs, max_new_tokens=32768)
+ decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+ print(decoded_output)
+ ```
+
+ ## Advanced Usage
+
+ ### Tool Calling
+
+ Many Large Language Models (LLMs) now feature **tool calling**, a powerful capability that lets them extend their functionality by interacting with external tools and APIs. This enables models to perform tasks like fetching up-to-date information, running code, or calling functions within other applications.
+
+ A key advantage for developers is that a growing number of open-source LLMs are designed to be compatible with the OpenAI API. This means you can use the same familiar syntax and structure from the OpenAI library to implement tool calling with these open-source models. As a result, the code demonstrated here works not just with OpenAI models, but with any model that follows the same interface standard.
+
+ To illustrate how this works, here is a practical example that uses tool calling to get the latest weather forecast (based on the LMDeploy API server):
+ ```python
+ from openai import OpenAI
+ import json
+
+
+ def get_current_temperature(location: str, unit: str = "celsius"):
+     """Get current temperature at a location.
+
+     Args:
+         location: The location to get the temperature for, in the format "City, State, Country".
+         unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])
+
+     Returns:
+         the temperature, the location, and the unit in a dict
+     """
+     return {
+         "temperature": 26.1,
+         "location": location,
+         "unit": unit,
+     }
+
+
+ def get_temperature_date(location: str, date: str, unit: str = "celsius"):
+     """Get temperature at a location and date.
+
+     Args:
+         location: The location to get the temperature for, in the format "City, State, Country".
+         date: The date to get the temperature for, in the format "Year-Month-Day".
+         unit: The unit to return the temperature in. Defaults to "celsius". (choices: ["celsius", "fahrenheit"])
+
+     Returns:
+         the temperature, the location, the date and the unit in a dict
+     """
+     return {
+         "temperature": 25.9,
+         "location": location,
+         "date": date,
+         "unit": unit,
+     }
+
+
+ def get_function_by_name(name):
+     if name == "get_current_temperature":
+         return get_current_temperature
+     if name == "get_temperature_date":
+         return get_temperature_date
+
+
+ tools = [{
+     'type': 'function',
+     'function': {
+         'name': 'get_current_temperature',
+         'description': 'Get current temperature at a location.',
+         'parameters': {
+             'type': 'object',
+             'properties': {
+                 'location': {
+                     'type': 'string',
+                     'description': 'The location to get the temperature for, in the format \'City, State, Country\'.'
+                 },
+                 'unit': {
+                     'type': 'string',
+                     'enum': ['celsius', 'fahrenheit'],
+                     'description': 'The unit to return the temperature in. Defaults to \'celsius\'.'
+                 }
+             },
+             'required': ['location']
+         }
+     }
+ }, {
+     'type': 'function',
+     'function': {
+         'name': 'get_temperature_date',
+         'description': 'Get temperature at a location and date.',
+         'parameters': {
+             'type': 'object',
+             'properties': {
+                 'location': {
+                     'type': 'string',
+                     'description': 'The location to get the temperature for, in the format \'City, State, Country\'.'
+                 },
+                 'date': {
+                     'type': 'string',
+                     'description': 'The date to get the temperature for, in the format \'Year-Month-Day\'.'
+                 },
+                 'unit': {
+                     'type': 'string',
+                     'enum': ['celsius', 'fahrenheit'],
+                     'description': 'The unit to return the temperature in. Defaults to \'celsius\'.'
+                 }
+             },
+             'required': ['location', 'date']
+         }
+     }
+ }]
+
+ messages = [
+     {'role': 'user', 'content': 'Today is 2024-11-14, what\'s the temperature in San Francisco now? How about tomorrow?'}
+ ]
+
+ openai_api_key = "EMPTY"
+ openai_api_base = "http://0.0.0.0:23333/v1"
+ client = OpenAI(
+     api_key=openai_api_key,
+     base_url=openai_api_base,
+ )
+ model_name = client.models.list().data[0].id
+ response = client.chat.completions.create(
+     model=model_name,
+     messages=messages,
+     max_tokens=32768,
+     temperature=0.8,
+     top_p=0.8,
+     stream=False,
+     extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
+     tools=tools)
+ print(response.choices[0].message)
+ messages.append(response.choices[0].message)
+
+ for tool_call in response.choices[0].message.tool_calls:
+     tool_call_args = json.loads(tool_call.function.arguments)
+     tool_call_result = get_function_by_name(tool_call.function.name)(**tool_call_args)
+     tool_call_result = json.dumps(tool_call_result, ensure_ascii=False)
+     messages.append({
+         'role': 'tool',
+         'name': tool_call.function.name,
+         'content': tool_call_result,
+         'tool_call_id': tool_call.id
+     })
+
+ response = client.chat.completions.create(
+     model=model_name,
+     messages=messages,
+     temperature=0.8,
+     top_p=0.8,
+     stream=False,
+     extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
+     tools=tools)
+ print(response.choices[0].message.content)
+ ```
+
+ ### Switching Between Thinking and Non-Thinking Modes
+
+ Intern-S1 enables thinking mode by default, enhancing the model's reasoning capability to generate higher-quality responses. This feature can be disabled by setting `enable_thinking=False` in `tokenizer.apply_chat_template`:
+
+ ```python
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+     enable_thinking=False,  # thinking mode switch
+ )
+ ```
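+
+ For context, a minimal end-to-end sketch of how this template call fits into plain `transformers` text-only generation (the `AutoTokenizer` setup here is an assumption; the multimodal examples above use `AutoProcessor` instead):
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model_name = "internlm/Intern-S1-mini"
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)
+
+ messages = [{"role": "user", "content": "Briefly explain superconductivity."}]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+     enable_thinking=False,  # disable the <think>...</think> block
+ )
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
+ generate_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.8, top_p=1.0, top_k=50)
+ print(tokenizer.decode(generate_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
+ ```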
+
+ When serving Intern-S1 models with LMDeploy, you can dynamically control the thinking mode by adjusting the `enable_thinking` parameter in your requests:
+
+ ```python
+ from openai import OpenAI
+ import json
+
+ messages = [
+     {
+         'role': 'user',
+         'content': 'who are you'
+     }, {
+         'role': 'assistant',
+         'content': 'I am an AI'
+     }, {
+         'role': 'user',
+         'content': 'AGI is?'
+     }
+ ]
+
+ openai_api_key = "EMPTY"
+ openai_api_base = "http://0.0.0.0:23333/v1"
+ client = OpenAI(
+     api_key=openai_api_key,
+     base_url=openai_api_base,
+ )
+ model_name = client.models.list().data[0].id
+
+ response = client.chat.completions.create(
+     model=model_name,
+     messages=messages,
+     temperature=0.7,
+     top_p=0.8,
+     max_tokens=2048,
+     extra_body={
+         "enable_thinking": False,
+     }
+ )
+ print(json.dumps(response.model_dump(), indent=2, ensure_ascii=False))
+ ```
+
+ For vLLM and SGLang users, configure this through:
+
+ ```python
+ extra_body={
+     "chat_template_kwargs": {"enable_thinking": False}
+ }
+ ```
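+
+ For example, reusing `client`, `model_name`, and `messages` from the LMDeploy snippet above, a hedged sketch of the same request against a vLLM or SGLang OpenAI-compatible endpoint:
+
+ ```python
+ response = client.chat.completions.create(
+     model=model_name,
+     messages=messages,
+     temperature=0.7,
+     top_p=0.8,
+     max_tokens=2048,
+     # vLLM/SGLang route chat-template kwargs through this nested field
+     # rather than a top-level enable_thinking key.
+     extra_body={"chat_template_kwargs": {"enable_thinking": False}},
+ )
+ print(response.choices[0].message.content)
+ ```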
479
+
480
+ ## Fine-tuning
481
+
482
+ See this [documentation](docs/sft.md) for more details.
483
+
484
+ ## License
485
+
486
+ This project is released under the Apache 2.0 license.
487
+
488
+ ## Citation
489
+
490
+ If you find this work useful, feel free to give us a cite.
491
+
+ ```bibtex
+ @misc{bai2025interns1scientificmultimodalfoundation,
+     title={Intern-S1: A Scientific Multimodal Foundation Model},
+     author={Lei Bai and Zhongrui Cai and Maosong Cao and Weihan Cao and Chiyu Chen and Haojiong Chen and Kai Chen and Pengcheng Chen and Ying Chen and Yongkang Chen and Yu Cheng and Yu Cheng and Pei Chu and Tao Chu and Erfei Cui and Ganqu Cui and Long Cui and Ziyun Cui and Nianchen Deng and Ning Ding and Nanqin Dong and Peijie Dong and Shihan Dou and Sinan Du and Haodong Duan and Caihua Fan and Ben Gao and Changjiang Gao and Jianfei Gao and Songyang Gao and Yang Gao and Zhangwei Gao and Jiaye Ge and Qiming Ge and Lixin Gu and Yuzhe Gu and Aijia Guo and Qipeng Guo and Xu Guo and Conghui He and Junjun He and Yili Hong and Siyuan Hou and Caiyu Hu and Hanglei Hu and Jucheng Hu and Ming Hu and Zhouqi Hua and Haian Huang and Junhao Huang and Xu Huang and Zixian Huang and Zhe Jiang and Lingkai Kong and Linyang Li and Peiji Li and Pengze Li and Shuaibin Li and Tianbin Li and Wei Li and Yuqiang Li and Dahua Lin and Junyao Lin and Tianyi Lin and Zhishan Lin and Hongwei Liu and Jiangning Liu and Jiyao Liu and Junnan Liu and Kai Liu and Kaiwen Liu and Kuikun Liu and Shichun Liu and Shudong Liu and Wei Liu and Xinyao Liu and Yuhong Liu and Zhan Liu and Yinquan Lu and Haijun Lv and Hongxia Lv and Huijie Lv and Qidang Lv and Ying Lv and Chengqi Lyu and Chenglong Ma and Jianpeng Ma and Ren Ma and Runmin Ma and Runyuan Ma and Xinzhu Ma and Yichuan Ma and Zihan Ma and Sixuan Mi and Junzhi Ning and Wenchang Ning and Xinle Pang and Jiahui Peng and Runyu Peng and Yu Qiao and Jiantao Qiu and Xiaoye Qu and Yuan Qu and Yuchen Ren and Fukai Shang and Wenqi Shao and Junhao Shen and Shuaike Shen and Chunfeng Song and Demin Song and Diping Song and Chenlin Su and Weijie Su and Weigao Sun and Yu Sun and Qian Tan and Cheng Tang and Huanze Tang and Kexian Tang and Shixiang Tang and Jian Tong and Aoran Wang and Bin Wang and Dong Wang and Lintao Wang and Rui Wang and Weiyun Wang and Wenhai Wang and Yi Wang and Ziyi Wang and Ling-I Wu and Wen Wu and Yue Wu and Zijian Wu and Linchen Xiao and Shuhao Xing and Chao Xu and Huihui Xu and Jun Xu and Ruiliang Xu and Wanghan Xu and GanLin Yang and Yuming Yang and Haochen Ye and Jin Ye and Shenglong Ye and Jia Yu and Jiashuo Yu and Jing Yu and Fei Yuan and Bo Zhang and Chao Zhang and Chen Zhang and Hongjie Zhang and Jin Zhang and Qiaosheng Zhang and Qiuyinzhe Zhang and Songyang Zhang and Taolin Zhang and Wenlong Zhang and Wenwei Zhang and Yechen Zhang and Ziyang Zhang and Haiteng Zhao and Qian Zhao and Xiangyu Zhao and Xiangyu Zhao and Bowen Zhou and Dongzhan Zhou and Peiheng Zhou and Yuhao Zhou and Yunhua Zhou and Dongsheng Zhu and Lin Zhu and Yicheng Zou},
+     year={2025},
+     eprint={2508.15763},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG},
+     url={https://arxiv.org/abs/2508.15763},
+ }
+ ```