# SenseNova-MARS
## Overview
While Vision-Language Models (VLMs) can solve complex tasks through agentic reasoning, their capabilities remain largely constrained to text-oriented chain-of-thought or isolated tool invocation. They lack the human-like proficiency to seamlessly interleave dynamic tool manipulation with continuous reasoning, particularly in knowledge-intensive and visually complex scenarios that demand coordinated external tools such as search and image cropping. In this work, we introduce SenseNova-MARS, a novel Multimodal Agentic Reasoning and Search framework that empowers VLMs with interleaved visual reasoning and tool-use capabilities via reinforcement learning (RL). Specifically, SenseNova-MARS dynamically integrates image search, text search, and image crop tools to tackle fine-grained and knowledge-intensive visual understanding challenges. In the RL stage, we propose the Batch-Normalized Group Sequence Policy Optimization (BN-GSPO) algorithm to improve training stability and advance the model's ability to invoke tools and reason effectively. To comprehensively evaluate agentic VLMs on complex visual tasks, we introduce the HR-MMSearch benchmark, the first search-oriented benchmark composed of high-resolution images paired with knowledge-intensive, search-driven questions. Experiments demonstrate that SenseNova-MARS achieves state-of-the-art performance on open-source search and fine-grained image understanding benchmarks. In particular, on search-oriented benchmarks, SenseNova-MARS-32B scores 74.3 on MMSearch and 54.4 on HR-MMSearch, surpassing proprietary models such as Gemini-3-Pro and GPT-5.2. SenseNova-MARS represents a promising step toward agentic VLMs with effective and robust tool-use capabilities.
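The overview names BN-GSPO but does not spell out its update rule here. As a rough illustration only, the sketch below shows one plausible reading under stated assumptions: it keeps GSPO's length-normalized, sequence-level importance ratio and group-relative reward centering, then rescales the centered advantages by batch-level statistics (the assumed "batch-normalized" step). The function name, argument layout, and normalization choice are hypothetical, not the paper's implementation.

```python
import torch

def bn_gspo_loss(logp_new, logp_old, rewards, group_ids, seq_lens,
                 clip_eps=0.2, eps=1e-6):
    # GSPO-style sequence-level ratio: the geometric mean of per-token
    # ratios, i.e. exp((log pi_new - log pi_old) / |y_i|) per sequence.
    ratio = torch.exp((logp_new - logp_old) / seq_lens)

    # Group-relative baseline: center each rollout's reward on the mean
    # reward of the rollouts sampled for the same prompt.
    adv = rewards.clone()
    for g in group_ids.unique():
        mask = group_ids == g
        adv[mask] = adv[mask] - adv[mask].mean()

    # Assumed batch-normalized step: rescale by the standard deviation of
    # the centered advantages over the whole batch rather than per group,
    # so near-tied groups do not blow up the advantage scale.
    adv = adv / (adv.std() + eps)

    # Standard clipped surrogate on the sequence-level ratio.
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -torch.min(unclipped, clipped).mean()
```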
Overall performance of SenseNova-MARS-32B compared to other models across six benchmarks. SenseNova-MARS-32B surpasses proprietary models such as Gemini-3-Pro and GPT-5.2 on search-oriented benchmarks such as MMSearch and HR-MMSearch.
Demo example: SenseNova-MARS tackles a challenging visual task by leveraging an integrated suite of text search, image search, and image crop tools within its reasoning process.
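To make the interleaved tool use above concrete, here is a minimal, hypothetical driver loop. The `<tool_call>` tag protocol, the tool names, and the `model.generate` chat-style API are all assumptions for illustration; the released model's actual interface may differ.

```python
import json

# Stub tools standing in for the three capabilities the model coordinates.
TOOLS = {
    "text_search":  lambda args: f"[web snippets for {args!r}]",
    "image_search": lambda args: "[pages that match the query image]",
    "image_crop":   lambda args: f"[zoomed-in crop at {args!r}]",
}

def run_agent(model, image, question, max_turns=8):
    """Alternate model reasoning with tool calls until a final answer."""
    history = [{"role": "user", "content": [image, question]}]
    reply = ""
    for _ in range(max_turns):
        reply = model.generate(history)  # assumed chat-style API
        history.append({"role": "assistant", "content": reply})
        if "<tool_call>" not in reply:   # no tool requested: final answer
            break
        payload = reply.split("<tool_call>")[1].split("</tool_call>")[0]
        call = json.loads(payload)  # e.g. {"name": "image_crop", "arguments": [x1, y1, x2, y2]}
        result = TOOLS[call["name"]](call["arguments"])
        history.append({"role": "tool", "content": result})
    return reply
```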
## Benchmark Performance

### Search-oriented benchmarks
| Type | Model | Average | MMSearch | HR-MMSearch | FVQA-test | InfoSeek | SimpleVQA | LiveVQA | MAT-Search |
|---|---|---|---|---|---|---|---|---|---|
| **Direct Answer** | | | | | | | | | |
| Open-source | Qwen2.5-VL-7B-Instruct | 27.70 | 7.60 | 0.58 | 26.28 | 31.95 | 47.88 | 19.63 | 60.00 |
| | Qwen3-VL-8B-Instruct | 29.24 | 11.70 | 12.13 | 24.22 | 23.15 | 42.94 | 23.18 | 67.33 |
| | Qwen2.5-VL-32B-Instruct | 32.01 | 11.70 | 3.93 | 30.50 | 36.65 | 48.57 | 21.40 | 71.33 |
| | Qwen3-VL-32B-Instruct | 35.22 | 16.96 | 19.02 | 32.17 | 28.95 | 45.90 | 31.59 | 72.67 |
| Proprietary | GPT-4o-mini | 33.08 | 15.79 | 1.31 | 36.83 | 35.95 | 44.42 | 24.63 | 72.66 |
| | Gemini-2.5-Flash | 40.87 | 21.64 | 7.54 | 43.78 | 44.10 | 55.48 | 31.57 | 82.00 |
| | GPT-4o | 42.38 | 23.39 | 13.11 | 48.00 | 52.90 | 51.73 | 28.18 | 79.33 |
| | GPT-5 | 50.24 | 35.09 | 22.62 | 54.39 | 54.15 | 61.70 | 44.39 | 79.33 |
| | GPT-5.2 | 50.92 | 43.27 | 24.92 | 50.94 | 50.40 | 59.92 | 47.00 | 80.00 |
| | Gemini-3-Flash | 53.68 | 57.31 | 21.97 | 56.50 | 54.85 | 63.57 | 38.90 | 82.67 |
| | Gemini-3-Pro | 55.87 | 62.57 | 26.89 | 59.22 | 56.30 | 64.07 | 40.06 | 82.00 |
| **Agentic Model (zero-shot)** | | | | | | | | | |
| Open-source | Qwen2.5-VL-7B-Instruct | 35.50 | 32.16 | 19.34 | 36.00 | 28.80 | 42.35 | 22.52 | 67.33 |
| | Qwen3-VL-8B-Instruct | 51.52 | 47.37 | 27.87 | 53.61 | 46.15 | 62.29 | 39.37 | 84.00 |
| | Qwen2.5-VL-32B-Instruct | 53.45 | 49.71 | 33.44 | 52.22 | 50.10 | 65.15 | 42.17 | 81.33 |
| | Qwen3-VL-32B-Instruct | 53.82 | 49.12 | 34.43 | 54.28 | 49.85 | 64.17 | 42.87 | 82.00 |
| Proprietary | GPT-4o-mini | 45.65 | 38.60 | 26.23 | 50.00 | 42.35 | 50.84 | 31.54 | 80.00 |
| | GPT-4o | 55.09 | 49.12 | 30.16 | 66.34 | 59.55 | 63.67 | 40.09 | 76.67 |
| | Gemini-2.5-Flash | 58.05 | 59.06 | 40.00 | 61.72 | 53.70 | 68.81 | 47.75 | 75.33 |
| | GPT-5 | 60.12 | 52.63 | 38.36 | 62.61 | 55.95 | 70.58 | 56.02 | 84.67 |
| | Gemini-3-Flash | 61.26 | 62.57 | 41.64 | 64.89 | 61.10 | 67.92 | 48.06 | 82.67 |
| | GPT-5.2 | 67.64 | 66.08 | 48.20 | 68.78 | 65.55 | 78.18 | 65.99 | 80.67 |
| | Gemini-3-Pro | 69.06 | 74.27 | 48.52 | 72.61 | 66.45 | 75.91 | 59.69 | 86.00 |
| **Agentic Model** | | | | | | | | | |
| Open-source | Visual-ARFT | 40.13 | 34.50 | 24.92 | 41.72 | 37.95 | 42.45 | 25.40 | 74.00 |
| | DeepMMSearch-R1 | - | - | - | - | 47.51 | 55.87 | - | - |
| | MMSearch-R1 | 52.49 | 53.80 | 20.33 | 58.40 | 55.10 | 57.40 | 48.40 | 74.00 |
| | DeepEyesV2 | - | 63.70 | - | 60.60 | 51.10 | 59.40 | - | - |
| | SenseNova-MARS-8B | 64.20 | 67.84 | 41.64 | 67.11 | 61.70 | 70.19 | 56.22 | 84.67 |
| | SenseNova-MARS-32B | 69.74 | 74.27 | 54.43 | 72.61 | 65.25 | 74.14 | 60.83 | 86.67 |
### High-resolution benchmarks
| Model | V* Bench | HR-Bench 4K | HR-Bench 8K | MME-RealWorld | Avg. |
|---|---|---|---|---|---|
| **Direct Answer** | | | | | |
| Gemini-2.5-Pro | 83.8 | 87.3 | 85.4 | - | - |
| GPT-4o | 67.5 | 65.0 | 59.6 | 62.8 | 63.7 |
| LLaVA-OneVision | 75.4 | 63.0 | 59.8 | 57.4 | 63.9 |
| Qwen2.5-VL-7B-Instruct | 75.3 | 65.5 | 62.1 | 56.8 | 64.9 |
| Qwen2.5-VL-32B-Instruct | 80.6 | 69.3 | 63.6 | 59.1 | 68.2 |
| Qwen3-VL-8B-Instruct | 86.4 | 78.9 | 74.6 | 61.9 | 75.5 |
| **Agentic Model** | | | | | |
| SEAL | 74.8 | - | - | - | - |
| Qwen3-VL-32B-Instruct | 91.1 | 84.6 | 81.6 | - | - |
| Qwen3-VL-235B-A22B-Instruct | 93.7 | 85.4 | 82.4 | - | - |
| Monet | 83.3 | 71.0 | 68.0 | - | - |
| Pixel-Reasoner | 84.3 | 72.6 | 66.1 | 64.4 | 71.9 |
| DeepEyes | 83.3 | 73.2 | 69.5 | 64.1 | 72.5 |
| Thyme | 82.2 | 77.0 | 72.0 | 64.8 | 74.0 |
| DeepEyesV2 | 81.8 | 77.9 | 73.8 | 64.9 | 74.6 |
| Mini-o3 | 88.2 | 77.5 | 73.3 | 65.5 | 76.1 |
| Skywork-R1V4 | 88.0 | 82.8 | 79.8 | 71.4 | 80.5 |
| SenseNova-MARS-8B | 92.2 | 83.1 | 78.4 | 67.9 | 80.4 |
| SenseNova-MARS-32B | 94.2 | 90.2 | 86.6 | 72.7 | 85.9 |
## Citation

```bibtex
@article{SenseNova-MARS,
  title={SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning},
  author={Yong Xien Chng and Tao Hu and Wenwen Tong and Xueheng Li and Jiandong Chen and Haojia Yu and Jiefan Lu and Hewei Guo and Hanming Deng and Chengjun Xie and Gao Huang and Dahua Lin and Lewei Lu},
  journal={arXiv preprint arXiv:2512.24330},
  year={2025}
}
```