Instructions to use nvidia/AceReason-Nemotron-7B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use nvidia/AceReason-Nemotron-7B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="nvidia/AceReason-Nemotron-7B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nvidia/AceReason-Nemotron-7B")
model = AutoModelForCausalLM.from_pretrained("nvidia/AceReason-Nemotron-7B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
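inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# A reasoning model usually needs far more than 40 new tokens to finish; the evaluation setup in the model card uses max_tokens=32768.
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))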
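The decoded output of a reasoning model typically contains a thinking trace followed by the final answer. Below is a minimal sketch for separating the two. It assumes the trace is closed by a literal `</think>` tag that survives decoding. This assumption is based on the `<think>` prefix used in the Usage Recommendations prompt, not on a documented API. `split_reasoning` is a hypothetical helper name.

```python
def split_reasoning(text: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    Assumption: the reasoning trace ends with `close_tag` (hypothetical default "</think>").
    """
    # Split once on the first closing tag: everything before it is reasoning.
    head, sep, tail = text.partition(close_tag)
    # Drop an optional leading "<think>" and surrounding whitespace from the reasoning.
    reasoning = head.strip().removeprefix("<think>").strip()
    if not sep:
        # No closing tag (e.g. generation hit max_new_tokens): treat all text as unfinished reasoning.
        return reasoning, ""
    return reasoning, tail.strip()


reasoning, answer = split_reasoning("<think>\n2 + 2 = 4.\n</think>\n\nThe answer is 4.")
print(answer)
```

With a small `max_new_tokens`, the closing tag is usually never reached, so the helper returns an empty answer rather than mislabeling partial reasoning as the answer.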


vLLM

How to use nvidia/AceReason-Nemotron-7B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nvidia/AceReason-Nemotron-7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nvidia/AceReason-Nemotron-7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
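The same OpenAI-compatible endpoint can also be called from Python. Below is a minimal sketch that uses only the standard library (`urllib`). It assumes the server started above is listening on `localhost:8000`. The request follows the Usage Recommendations further down: all instructions go into a single user turn with no system prompt, and sampling uses the evaluation settings top-p=0.95, temperature=0.6, max_tokens=32768. `build_chat_request` and `chat` are hypothetical helper names. Nothing is sent until `chat()` is called.

```python
import json
import urllib.request


def build_chat_request(user_content: str, model: str = "nvidia/AceReason-Nemotron-7B",
                       temperature: float = 0.6, top_p: float = 0.95,
                       max_tokens: int = 32768) -> dict:
    """Build an OpenAI-style chat completion payload (hypothetical helper)."""
    return {
        "model": model,
        # Single user message: no system prompt, per the Usage Recommendations.
        "messages": [{"role": "user", "content": user_content}],
        # Sampling defaults mirror the evaluation settings quoted in the model card.
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }


def chat(user_content: str, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to /v1/chat/completions and return the assistant text."""
    payload = json.dumps(build_chat_request(user_content)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` returns the assistant message. For the SGLang server below, pass `base_url="http://localhost:30000"`.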

Use Docker

docker model run hf.co/nvidia/AceReason-Nemotron-7B

SGLang

How to use nvidia/AceReason-Nemotron-7B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nvidia/AceReason-Nemotron-7B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nvidia/AceReason-Nemotron-7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nvidia/AceReason-Nemotron-7B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nvidia/AceReason-Nemotron-7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use nvidia/AceReason-Nemotron-7B with Docker Model Runner:
```
docker model run hf.co/nvidia/AceReason-Nemotron-7B
```

Add link to paper and Github repo

by nielsr HF Staff - opened Jun 18, 2025

base: refs/heads/main ← from: refs/pr/1

Files changed (1): README.md +40 -29

@@ -1,26 +1,24 @@
 ---
 library_name: transformers
 license: other
 license_name: nvidia-open-model-license
-license_link: >-
-  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
 pipeline_tag: text-generation
-language:
-  - en
 tags:
-  - nvidia
-  - reasoning
-  - math
-  - code
-  - reinforcement learning
-  - pytorch
 ---
 # AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
 <p align="center">
 [![Technical Report](https://img.shields.io/badge/2505.16400-Technical_Report-blue)](https://arxiv.org/abs/2505.16400)
@@ -33,7 +31,7 @@ tags:
 ## 🔥News
 - **6/16/2025**: We are excited to share our new release combining SFT with RL: **AceReason-Nemotron-1.1-7B**
-  - Paper: https://arxiv.org/pdf/2506.13284
   - Model: https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B
   - 4M SFT Data: https://huggingface.co/datasets/nvidia/AceReason-1.1-SFT
 - **6/11/2025**: We share our evaluation toolkit at [AceReason Evaluation](https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md) including:
@@ -68,10 +66,6 @@ We evaluate our model against competitive reasoning models of comparable size wi
 | [AceReason-Nemotron-7B 🤗](https://huggingface.co/nvidia/AceReason-Nemotron-7B)| 69.0 | 53.6 | 51.8 | 44.1 |
 | [AceReason-Nemotron-14B 🤗](https://huggingface.co/nvidia/AceReason-Nemotron-14B)| 78.6 | 67.4 | 61.1 | 54.9 |
 ## How to use
 ```python
 import torch
@@ -104,7 +98,6 @@ generated_ids = [
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 ```
 ## Usage Recommendations
 1. Don't include a system prompt; instead, place all instructions directly in the user prompt.
@@ -114,15 +107,33 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 question = "" # code question
 starter_code = "" # starter code function header
-code_instruction_nostartercode = """Write Python code to solve the problem. Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
-code_instruction_hasstartercode = """Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
 if starter_code != "":
-    question += "\n\n" + "Solve the problem starting with the provided function header.\n\nFunction header:\n" + "```\n" + starter_code + "\n```"
-    question += "\n\n" + code_instruction_hasstartercode
 else:
-    question += "\n\n" + code_instruction_nostartercode
-final_prompt = "<｜User｜>" + question + "<｜Assistant｜><think>\n"
 ```
 4. Our inference engine for evaluation is **vLLM==0.7.3** using top-p=0.95, temperature=0.6, max_tokens=32768.
@@ -130,15 +141,16 @@ final_prompt = "<｜User｜>" + question + "<｜Assistant｜><think>\n"
 Please check evaluation code, scripts, cached prediction files in https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md
 ## Correspondence to
 Yang Chen (yachen@nvidia.com), Zhuolin Yang (zhuoliny@nvidia.com), Zihan Liu (zihanl@nvidia.com), Chankyu Lee (chankyul@nvidia.com), Wei Ping (wping@nvidia.com)
 ## License
 Your use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
 ## Citation
 ```
 @article{chen2025acereason,
@@ -147,5 +159,4 @@ Your use of this model is governed by the [NVIDIA Open Model License](https://ww
   journal={arXiv preprint arXiv:2505.16400},
   year={2025}
 }
-```

 ---
+language:
+- en
 library_name: transformers
 license: other
 license_name: nvidia-open-model-license
+license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
 pipeline_tag: text-generation
 tags:
+- nvidia
+- reasoning
+- math
+- code
+- reinforcement learning
+- pytorch
 ---
 # AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
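+This repository contains the AceReason-Nemotron-7B model presented in [AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning](https://arxiv.org/abs/2505.16400). The follow-up release, AceReason-Nemotron 1.1, is presented in [AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy](https://huggingface.co/papers/2506.13284).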
 <p align="center">
 [![Technical Report](https://img.shields.io/badge/2505.16400-Technical_Report-blue)](https://arxiv.org/abs/2505.16400)
 ## 🔥News
 - **6/16/2025**: We are excited to share our new release combining SFT with RL: **AceReason-Nemotron-1.1-7B**
+  - Paper: https://huggingface.co/papers/2506.13284
   - Model: https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B
   - 4M SFT Data: https://huggingface.co/datasets/nvidia/AceReason-1.1-SFT
 - **6/11/2025**: We share our evaluation toolkit at [AceReason Evaluation](https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md) including:
 | [AceReason-Nemotron-7B 🤗](https://huggingface.co/nvidia/AceReason-Nemotron-7B)| 69.0 | 53.6 | 51.8 | 44.1 |
 | [AceReason-Nemotron-14B 🤗](https://huggingface.co/nvidia/AceReason-Nemotron-14B)| 78.6 | 67.4 | 61.1 | 54.9 |
 ## How to use
 ```python
 import torch
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 ```
 ## Usage Recommendations
 1. Don't include a system prompt; instead, place all instructions directly in the user prompt.
 question = "" # code question
 starter_code = "" # starter code function header
+code_instruction_nostartercode = """Write Python code to solve the problem. Please place the solution code in the following format:
+```python
+# Your solution code here
+```"""
+code_instruction_hasstartercode = """Please place the solution code in the following format:
+```python
+# Your solution code here
+```"""
 if starter_code != "":
+    question += "\n\n" + "Solve the problem starting with the provided function header.\n\nFunction header:\n" + "```\n" + starter_code + "\n```"
+    question += "\n\n" + code_instruction_hasstartercode
 else:
+    question += "\n\n" + code_instruction_nostartercode
+final_prompt = "<｜User｜>" + question + "<｜Assistant｜><think>\n"
 ```
 4. Our inference engine for evaluation is **vLLM==0.7.3** using top-p=0.95, temperature=0.6, max_tokens=32768.
 Please check evaluation code, scripts, cached prediction files in https://huggingface.co/nvidia/AceReason-Nemotron-14B/blob/main/README_EVALUATION.md
+## Code
+Our code is available at https://github.com/NVIDIA/TRT-LLM/tree/main/examples/research/ace_reason
 ## Correspondence to
 Yang Chen (yachen@nvidia.com), Zhuolin Yang (zhuoliny@nvidia.com), Zihan Liu (zihanl@nvidia.com), Chankyu Lee (chankyul@nvidia.com), Wei Ping (wping@nvidia.com)
 ## License
 Your use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
 ## Citation
 ```
 @article{chen2025acereason,
   journal={arXiv preprint arXiv:2505.16400},
   year={2025}
 }
+```
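The code-prompt recipe in the Usage Recommendations can be wrapped in a single function. Below is a minimal sketch that reproduces the string concatenation from the README snippet on the `-` side of the diff. `build_code_prompt` is a hypothetical name. The `<｜User｜>` and `<｜Assistant｜>` markers are copied verbatim from that snippet; note that they use the full-width vertical bar `｜`.

```python
FENCE = "`" * 3  # built this way so the string does not close the surrounding Markdown fence

# Same text as code_instruction_hasstartercode in the README snippet.
FORMAT_HINT = (
    "Please place the solution code in the following format:\n"
    f"{FENCE}python\n# Your solution code here\n{FENCE}"
)


def build_code_prompt(question: str, starter_code: str = "") -> str:
    """Assemble the raw code-generation prompt (hypothetical helper)."""
    if starter_code != "":
        # Starter code present: ask the model to complete the given function header.
        question += (
            "\n\nSolve the problem starting with the provided function header.\n\n"
            f"Function header:\n{FENCE}\n{starter_code}\n{FENCE}"
        )
        question += "\n\n" + FORMAT_HINT
    else:
        # No starter code: same text as code_instruction_nostartercode.
        question += "\n\nWrite Python code to solve the problem. " + FORMAT_HINT
    # No system prompt; the assistant turn opens with a reasoning block.
    return "<｜User｜>" + question + "<｜Assistant｜><think>\n"


print(build_code_prompt("Return the sum of two integers.", "def add(a: int, b: int) -> int:"))
```

The returned string is a raw prompt. It is meant for a plain completion call rather than a chat endpoint, which would apply the chat template a second time.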