
Best Small model out there Butt....

by Faint6005 - opened Oct 1, 2025

Discussion

Faint6005

Oct 1, 2025

And it's a big butt. This model thinks a lot, I mean a looooooottttttt. Every single time, I see that it got the answer much earlier in its thinking process, but then it tries a different approach. It does this 10 or so times. I mean, OK, you can try a different approach once or twice to verify your original answer, but doing it 10 times is unnecessary.

amant555

ServiceNow-AI org Oct 1, 2025 • edited Oct 1, 2025

@Faint6005 thanks for the feedback and for pointing this out!
The model currently runs in high-reasoning mode by default, which means it puts extra effort into reasoning even on simple queries. This helps with complex/ambiguous tasks, but can also lead to a bit more token usage and slightly slower responses. We’re already working on optimizing this trade-off so future releases reduce the overhead while keeping the strong performance.

Faint6005

Oct 1, 2025

Ohh.... I didn't know it also had reasoning levels like GPT-OSS.

amant555

ServiceNow-AI org Oct 1, 2025

@Faint6005 Just to clarify, we don’t support configurable reasoning levels like GPT-OSS. The “high-reasoning mode” I mentioned simply describes the model's default behavior, where it allocates more internal reasoning effort to improve robustness and accuracy. Users cannot manually adjust this; it's baked into how the model processes queries.

We have added a developer note to our README as well. This is the initial version, built using only CPT and SFT with no RL, which leads to the model performing extensive reasoning by default. We are actively working on making it more efficient and concise in the next release.
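
Since the reasoning effort cannot be configured, one way to see how much of a reply goes to thinking is to split the raw output into its reasoning part and its final answer on the client side. The sketch below is only an illustration: the `[BEGIN FINAL RESPONSE]` and `[END FINAL RESPONSE]` markers are an assumption that is not confirmed anywhere in this thread, the helper names `split_reasoning` and `reasoning_share` are hypothetical, and word counts are only a rough stand-in for token counts.

```python
import re

# Assumed output markers: these are NOT confirmed anywhere in this thread.
# Check the model's README and adjust them to the format the model really emits.
FINAL_START = "[BEGIN FINAL RESPONSE]"
FINAL_END = "[END FINAL RESPONSE]"


def split_reasoning(output: str, start: str = FINAL_START, end: str = FINAL_END):
    """Return (reasoning, final_answer); final_answer is None if no start marker is found."""
    # Capture everything after the start marker, up to the end marker or end of text.
    pattern = re.escape(start) + r"(.*?)(?:" + re.escape(end) + r"|\Z)"
    match = re.search(pattern, output, re.DOTALL)
    if match is None:
        # Generation may have stopped (for example at max_new_tokens) before the answer.
        return output.strip(), None
    return output[: match.start()].strip(), match.group(1).strip()


def reasoning_share(output: str) -> float:
    """Fraction of whitespace-separated words spent on reasoning (rough proxy for tokens)."""
    reasoning, final = split_reasoning(output)
    reasoning_words = len(reasoning.split())
    final_words = len(final.split()) if final else 0
    total = reasoning_words + final_words
    return reasoning_words / total if total else 0.0


if __name__ == "__main__":
    sample = (
        "Here are my reasoning steps: try approach one. Now try approach two. "
        "[BEGIN FINAL RESPONSE] The answer is 4. [END FINAL RESPONSE]"
    )
    reasoning, final = split_reasoning(sample)
    print(final)
    print(round(reasoning_share(sample), 2))
```

If `split_reasoning` returns `None` for the answer, the generation budget was probably used up during reasoning, so raising `max_new_tokens` may help more than retrying the prompt.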

ivanfioravanti

Oct 2, 2025

Glad to hear it! The model is strong, but it really thinks too much. Good to know you are working on this aspect 💪🏻

Faint6005

Oct 2, 2025

Sorry for jumping to conclusions. And I have one more thing to say: this model isn't good at coding.
