Instructions to use ServiceNow-AI/Apriel-1.5-15b-Thinker with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ServiceNow-AI/Apriel-1.5-15b-Thinker with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="ServiceNow-AI/Apriel-1.5-15b-Thinker")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("ServiceNow-AI/Apriel-1.5-15b-Thinker")
model = AutoModelForImageTextToText.from_pretrained("ServiceNow-AI/Apriel-1.5-15b-Thinker")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use ServiceNow-AI/Apriel-1.5-15b-Thinker with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ServiceNow-AI/Apriel-1.5-15b-Thinker"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ServiceNow-AI/Apriel-1.5-15b-Thinker",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
Use Docker
docker model run hf.co/ServiceNow-AI/Apriel-1.5-15b-Thinker
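The vLLM server and the SGLang server below both expose the same OpenAI-compatible `/v1/chat/completions` endpoint, so the curl call can also be made from Python. The following is a minimal sketch that uses only the standard library (`urllib.request`). It assumes a server at `http://localhost:8000`, which is vLLM's default port as used above; pass `http://localhost:30000` for the SGLang setup. The helper names `build_chat_request` and `send_chat_request`, along with the `max_tokens` default of 2048, are illustrative assumptions and are not part of the model card. A generous token budget is used because the model reasons at length before it answers.

```python
import json
import urllib.request

MODEL_ID = "ServiceNow-AI/Apriel-1.5-15b-Thinker"


def build_chat_request(base_url: str, prompt: str, image_url: str,
                       max_tokens: int = 2048) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions.

    base_url is assumed to be the root of a running OpenAI-compatible server,
    e.g. http://localhost:8000 (vLLM) or http://localhost:30000 (SGLang).
    """
    payload = {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        # Hypothetical budget: leave room for the long reasoning plus the answer.
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request,
                      timeout: float = 600.0) -> str:
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Once a server is running, `send_chat_request(build_chat_request("http://localhost:8000", "Describe this image in one sentence.", "<image URL>"))` should return the assistant's text, including the model's reasoning. Pointing the official `openai` Python client at the same base URL is a common alternative. The standard-library version shown here simply avoids an extra dependency.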
- SGLang
How to use ServiceNow-AI/Apriel-1.5-15b-Thinker with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ServiceNow-AI/Apriel-1.5-15b-Thinker" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ServiceNow-AI/Apriel-1.5-15b-Thinker",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ServiceNow-AI/Apriel-1.5-15b-Thinker" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ServiceNow-AI/Apriel-1.5-15b-Thinker",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
- Docker Model Runner
How to use ServiceNow-AI/Apriel-1.5-15b-Thinker with Docker Model Runner:
docker model run hf.co/ServiceNow-AI/Apriel-1.5-15b-Thinker
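Because the model reasons at length before it answers, the raw text from the Transformers snippet or from the server endpoints contains the reasoning followed by the final answer. The following is a minimal sketch for separating the two. It assumes the final answer is delimited by `[BEGIN FINAL RESPONSE]` and `[END FINAL RESPONSE]` markers. Treat those strings as an assumption to check against the model card and chat template, and pass different `begin`/`end` arguments if your output differs. The function name `split_reasoning` is hypothetical.

```python
from typing import Tuple

# Assumption: the final answer is wrapped in these markers. Verify the exact
# strings against the model card / chat template of the checkpoint you use.
BEGIN_MARKER = "[BEGIN FINAL RESPONSE]"
END_MARKER = "[END FINAL RESPONSE]"


def split_reasoning(text: str,
                    begin: str = BEGIN_MARKER,
                    end: str = END_MARKER) -> Tuple[str, str]:
    """Split raw generated text into (reasoning, final_answer).

    If the begin marker never appears (for example, generation stopped
    mid-reasoning because the token budget ran out), the whole text is
    returned as reasoning and the final answer is an empty string.
    """
    start = text.find(begin)
    if start == -1:
        return text.strip(), ""
    reasoning = text[:start].strip()
    answer = text[start + len(begin):]
    stop = answer.find(end)
    if stop != -1:
        # Drop the end marker and anything after it (e.g. trailing special tokens).
        answer = answer[:stop]
    return reasoning, answer.strip()
```

For example, `reasoning, answer = split_reasoning(text)` can be applied to the decoded string from the Transformers example. An empty `answer` is a likely sign that the token budget (`max_new_tokens=40` in that snippet) was too small for the model to finish reasoning.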
Best small model out there, but...
And it's a big but. This model thinks a lot, I mean a looooot. Every single time, I can see it reached the answer much earlier in its thinking process, but then it tries a different approach. It tries a different approach 10 or so times. I mean, OK, you can try a different approach once or twice to verify your original answer, but doing it 10 times is unnecessary.
@Faint6005 thanks for the feedback and for pointing this out!
The model currently runs in high-reasoning mode by default, which means it puts extra effort into reasoning even on simple queries. This helps with complex or ambiguous tasks, but it can also lead to somewhat higher token usage and slightly slower responses. We’re already working on optimizing this trade-off so that future releases reduce the overhead while keeping the model's strong performance.
Ohh... I didn't know it also had reasoning levels like GPT-OSS.
@Faint6005 Just to clarify, we don’t support configurable reasoning levels like GPT-OSS. The “high-reasoning mode” I mentioned simply describes the model's default behavior, in which it allocates more internal reasoning effort to improve robustness and accuracy. Users cannot adjust this manually; it's baked into how the model processes queries.
We have also added a developer note to our README. This is the initial version, built using only CPT (continual pre-training) and SFT (supervised fine-tuning) with no RL, which is why the model performs extensive reasoning by default. We are actively working on making it more efficient and concise in the next release.
Glad to hear it! The model is strong, but it really does think too much. Glad to hear you are working on this aspect 💪🏻
Sorry for jumping to conclusions. I have one more thing to say: this model isn't good at coding.