Instructions for using AI-MO/NuminaMath-72B-TIR with libraries and local apps.

Libraries

How to use AI-MO/NuminaMath-72B-TIR with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AI-MO/NuminaMath-72B-TIR")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so the output is visible when run as a script
print(pipe(messages))

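# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AI-MO/NuminaMath-72B-TIR")
# A 72B model will not fit on a single consumer GPU. device_map="auto" (requires the
# accelerate package) spreads the weights across available devices, and
# torch_dtype="auto" keeps the checkpoint's stored precision instead of float32.
model = AutoModelForCausalLM.from_pretrained(
    "AI-MO/NuminaMath-72B-TIR",
    torch_dtype="auto",
    device_map="auto",
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Build the prompt with the model's chat template and move it to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))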
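
The model card describes tool-integrated reasoning as decomposing each problem into rationales, Python programs, and their outputs. The sketch below shows one way a caller might pull the last Python program out of a generated response and capture what it prints. It assumes the model emits programs inside Markdown `python` fences, which the source does not state. The helper names `extract_last_program` and `run_program` are hypothetical, and this is not the authors' inference pipeline.

```python
import contextlib
import io
import re

# Assumption: programs appear inside Markdown fences tagged "python".
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def extract_last_program(response):
    """Return the source of the last fenced Python block, or None if there is none."""
    blocks = CODE_BLOCK.findall(response)
    return blocks[-1] if blocks else None


def run_program(source):
    """Execute the program and return whatever it printed to stdout.

    WARNING: exec() runs arbitrary code. Model output is untrusted, so a real
    deployment would need a sandbox; this sketch has none.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {"__name__": "__tir__"})
    return buffer.getvalue()
```

In a multi-step loop, the captured output would typically be appended to the conversation so the model can continue reasoning from it. How NuminaMath formats that feedback is not described here.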

Local Apps

vLLM

How to use AI-MO/NuminaMath-72B-TIR with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AI-MO/NuminaMath-72B-TIR"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AI-MO/NuminaMath-72B-TIR",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
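
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above. The function names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request


def build_chat_request(prompt, model="AI-MO/NuminaMath-72B-TIR",
                       base_url="http://localhost:8000"):
    """Build a POST request for the server's OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the prompt to a running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumption: the response follows the OpenAI chat completions schema.
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
#   print(chat("What is the capital of France?"))
```

Passing `base_url="http://localhost:30000"` should target an SGLang server started on port 30000 in the same way.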

Use Docker

docker model run hf.co/AI-MO/NuminaMath-72B-TIR

SGLang

How to use AI-MO/NuminaMath-72B-TIR with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AI-MO/NuminaMath-72B-TIR" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AI-MO/NuminaMath-72B-TIR",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AI-MO/NuminaMath-72B-TIR" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AI-MO/NuminaMath-72B-TIR",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner

How to use AI-MO/NuminaMath-72B-TIR with Docker Model Runner:

```
docker model run hf.co/AI-MO/NuminaMath-72B-TIR
```

add results table

by benlipkin - opened Jul 20, 2024

base: refs/heads/main ← from: refs/pr/1

README.md +15 -2

@@ -101,8 +101,6 @@ NuminaMath is a series of language models that are trained with two stages of su
 * **Stage 1:** fine-tune the base model on a large, diverse dataset of natural language math problems and solutions, where each solution is templated with Chain of Thought (CoT) to facilitate reasoning.
 * **Stage 2:** fine-tune the model from Stage 1 on a synthetic dataset of tool-integrated reasoning, where each math problem is decomposed into a sequence of rationales, Python programs, and their outputs.
 ## Model description
 - **Model type:** A 72B parameter math LLM fine-tuned on a dataset with 860k+ math problem-solution pairs.
@@ -110,6 +108,21 @@ NuminaMath is a series of language models that are trained with two stages of su
 - **License:** Tongyi Qianwen
 - **Finetuned from model:** [Qwen/Qwen2-72B](https://huggingface.co/Qwen/Qwen2-72B)
+## Model performance
+| | | NuminaMath-72B-CoT | NuminaMath-72B-TIR | Qwen2-72B-Instruct | Llama3-70B-Instruct | Claude-3.5-Sonnet | GPT-4o-0513 |
+| --- | --- | :---: | :---: | :---: | :---: | :---: | :---: |
+| **GSM8k** | 0-shot | 91.4% | 91.5% | 91.1% | 93.0% | **96.4%** | 95.8% |
+| Grade school math |
+| **MATH** | 0-shot | 68.0% | 75.8% | 59.7% | 50.4% | 71.1% | **76.6%** |
+| Math problem-solving |
+| **AMC 2023** | 0-shot | 21/40 | **24/40** | 19/40 | 13/40 | 17/40 | 20/40 |
+| Competition-level math | maj@64 | 24/40 | **34/40** | 21/40 | 13/40 | - | - |
+| **AIME 2024** | 0-shot | 1/30 | **5/30** | 3/30 | 0/30 | 2/30 | 2/30 |
+| Competition-level math | maj@64 | 3/30 | **12/30** | 4/30 | 2/30 | - | - |
+*Table: Comparison of various open weight and proprietary language models on different math benchmarks. All scores except those for NuminaMath-72B-TIR are reported without tool-integrated reasoning.*
 ### Model Sources
 <!-- Provide the basic links for the model. -->
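
The table's `maj@64` rows are not defined in this PR. They are read here as majority voting over 64 sampled answers per problem, which is an assumption. Under that reading, scoring could look like the sketch below. The function names are hypothetical, and the tie-breaking rule (first answer seen wins) and any answer normalization used by the authors are unknown.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer, ignoring None (no answer extracted).

    Ties go to the answer that appeared first. This is an illustrative choice,
    not necessarily the rule used to produce the reported scores.
    """
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return None
    # max() returns the first maximal key in insertion order
    return max(counts, key=counts.get)


def maj_at_k_score(samples_per_problem, references):
    """Count problems whose majority-voted answer equals the reference answer."""
    correct = sum(
        majority_vote(samples) == ref
        for samples, ref in zip(samples_per_problem, references)
    )
    return correct, len(references)
```

With 64 samples per problem and 40 problems, a result of `(34, 40)` would correspond to the 34/40 entry format used in the table.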