Instructions for using amd/SAND-MathScience-DeepSeek-Qwen32B with libraries and local apps.

Libraries

How to use amd/SAND-MathScience-DeepSeek-Qwen32B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="amd/SAND-MathScience-DeepSeek-Qwen32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("amd/SAND-MathScience-DeepSeek-Qwen32B")
model = AutoModelForCausalLM.from_pretrained("amd/SAND-MathScience-DeepSeek-Qwen32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
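
The decoded text of a reasoning model usually contains the chain of thought followed by the final answer. A minimal sketch for separating the two, assuming the model closes its reasoning with a `</think>` tag as DeepSeek-R1-style models typically do (this page does not confirm the exact output format, and `split_reasoning` is a hypothetical helper, not part of Transformers):

```python
def split_reasoning(text: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    Assumption: the reasoning ends at the first `close_tag`. If the tag is
    missing (for example because generation was cut off), the whole text is
    returned as the answer and the reasoning is empty.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    # Drop the opening tag if the model emitted one.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


sample = "<think>\nLet the ball cost x...\n</think>\nThe ball costs $0.05."
reasoning, answer = split_reasoning(sample)
print(answer)
```

With `max_new_tokens=40` as in the snippet above, a long reasoning trace would likely be truncated before any closing tag appears, so a larger budget is probably needed in practice.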

Local Apps

vLLM

How to use amd/SAND-MathScience-DeepSeek-Qwen32B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "amd/SAND-MathScience-DeepSeek-Qwen32B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "amd/SAND-MathScience-DeepSeek-Qwen32B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
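
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above; `build_chat_request` and `ask` are illustrative names, and the response layout (`choices[0].message.content`) is assumed to follow the OpenAI chat-completions format that the server advertises.

```python
import json
import urllib.request

MODEL_ID = "amd/SAND-MathScience-DeepSeek-Qwen32B"


def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8000",
                       model: str = MODEL_ID) -> urllib.request.Request:
    """Build the POST request that the curl command sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, **kwargs) -> str:
    """Send the request to a running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumed OpenAI-compatible layout: the reply is the first choice's message.
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` should return the reply text. Passing `base_url="http://localhost:30000"` targets the SGLang port used later on this page.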

Use Docker

docker model run hf.co/amd/SAND-MathScience-DeepSeek-Qwen32B

SGLang

How to use amd/SAND-MathScience-DeepSeek-Qwen32B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "amd/SAND-MathScience-DeepSeek-Qwen32B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "amd/SAND-MathScience-DeepSeek-Qwen32B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "amd/SAND-MathScience-DeepSeek-Qwen32B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "amd/SAND-MathScience-DeepSeek-Qwen32B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use amd/SAND-MathScience-DeepSeek-Qwen32B with Docker Model Runner:
```
docker model run hf.co/amd/SAND-MathScience-DeepSeek-Qwen32B
```

Prakamya committed on Dec 6, 2025

Commit e55db52 (verified) · 1 Parent(s): 723cdff

Upload folder using huggingface_hub

Files changed (3)

.gitattributes +1 -0
README.md +8 -6
SAND-MATH-Blog.png +3 -0

.gitattributes CHANGED

@@ -4,4 +4,5 @@
 *.pth filter=lfs diff=lfs merge=lfs -text
 PipelineSimple.png filter=lfs diff=lfs merge=lfs -text
+SAND-MATH-Blog.png filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
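
The added line routes `SAND-MATH-Blog.png` through Git LFS. As a rough illustration of what these attribute lines mean, the sketch below checks whether a path would be LFS-tracked. It is a simplification: patterns are matched against the file name only, which covers the lines shown here but not the full gitattributes matching rules, and `is_lfs_tracked` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The attribute lines visible in the hunk above, after the change.
GITATTRIBUTES = """\
*.pth filter=lfs diff=lfs merge=lfs -text
PipelineSimple.png filter=lfs diff=lfs merge=lfs -text
SAND-MATH-Blog.png filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
"""


def is_lfs_tracked(path: str, attributes: str = GITATTRIBUTES) -> bool:
    """Return True if the last matching line that sets `filter` sets it to lfs."""
    name = PurePosixPath(path).name
    tracked = False
    for line in attributes.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        pattern, attrs = parts[0], parts[1:]
        if not fnmatchcase(name, pattern):
            continue
        for attr in attrs:
            if attr == "filter=lfs":
                tracked = True
            elif attr in ("-filter", "!filter") or attr.startswith("filter="):
                tracked = False
    return tracked


print(is_lfs_tracked("SAND-MATH-Blog.png"))
```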

README.md CHANGED

@@ -12,7 +12,9 @@ base_model:
   - deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
 ---
-# SAND-Reasoning: Best-in-class Large Reasoning Model Built with Synthetic Data only using AMD GPUs
+# State-of-the-art Large Reasoning Model Built Using Only Synthetic Data on AMD GPUs
+<div align="center">
 | [![Paper](https://img.shields.io/badge/ArXiv-2507.20527-B31B1B.svg)](https://arxiv.org/pdf/2507.20527) | [![Hugging Face Dataset](https://img.shields.io/badge/🤗%20Hugging%20Face-Dataset-green)](https://huggingface.co/datasets/amd/SAND-Post-Training-Dataset) | [![GitHub](https://img.shields.io/badge/GitHub-Repository-black)](https://github.com/AMD-AGI/sand-pipeline) | [![Blog Post](https://img.shields.io/badge/Blog%20Post-Read%20More-blue)](https://rocm.blogs.amd.com/artificial-intelligence/sand-math/README.html) |
 | :---: | :---: | :---: | :---: |
@@ -20,7 +22,7 @@ base_model:
 ## Model Summary
-We introduce **SAND-Math-Qwen2.5-32B** and **SAND-MathScience-DeepSeek-Qwen32B**, reasoning models built entirely using a synthetic data pipeline running on the **AMD ROCm™ stack** and **AMD Instinct™ MI325 GPUs**.
+We introduce **SAND-Math-Qwen2.5-32B** and **SAND-MathScience-DeepSeek-Qwen32B**, state-of-the-art reasoning models in the 32B parameter range, built entirely using a synthetic data pipeline running on the **AMD ROCm™ stack** and **AMD Instinct™ MI325 GPUs**.
 By prioritizing data difficulty along with quantity, we demonstrate that high-difficulty synthetic data can elevate prior-generation models to match or exceed modern proprietary models. `SAND-Math-Qwen2.5-32B` is fine-tuned from **Qwen2.5-32B-Instruct** on just **14k synthetic math samples**, achieving strong reasoning capabilities with minimal data outperforming other data distillation and post training approaches. `SAND-MathScience-DeepSeek-Qwen32B` is fine-tuned from **DeepSeek-R1-Distill-Qwen-32B** on a compact dataset of **27k samples** (15k Math + 12k Science), achieving a generational leap in performance that rivals **Qwen3-32B**.
@@ -46,10 +48,10 @@ Using only **14k synthetic math samples** and standard SFT (no RL), our approach
 | Model | Data Size | AIME24 | AIME25 | MATH500 | GPQA |
 | :--- | :--- | :---: | :---: | :---: | :---: |
 | Qwen2.5-32B-Instruct (Base) | - | 16.7 | 13.3 | 83.4 | 53.5 |
-| DeepSeek-R1-Distill-Qwen-32B | 800k | 72.6 | 54.9 | 94.3 | 62.1 |
+| DeepSeek-R1-Distill-Qwen-32B | 800k | 72.6 | 54.9 | **94.3** | **62.1** |
 | Light-R1-32B | 79k | 73.0 | 64.3 | 93.3 | 60.6 |
 | OpenThinker-32B | 114k | 66.0 | 53.3 | 89.4 | 57.6 |
-| **SAND-Math-Qwen2.5-32B (Ours)** | **14k** | **74.01** | **68.18** | **92.05** | **60.8** |
+| **SAND-Math-Qwen2.5-32B (Ours)** | **14k** | **74.01** | **68.18** | 92.05 | 60.8 |
 ---
@@ -57,7 +59,7 @@ Using only **14k synthetic math samples** and standard SFT (no RL), our approach
 Our results are powered by a 4-stage automated pipeline running on AMD hardware that prioritizes **difficulty and novelty** over volume. Unlike datasets that recycle easy problems, our pipeline leverages a Teacher Model (`GPT-OSS120b`) to generate, validate, and systematically "hike" the difficulty of reasoning problems.
-![Pipeline Overview](PipelineSimple.png)
+![Pipeline Overview](SAND-MATH-Blog.png)
 ### Pipeline Stages
@@ -95,7 +97,7 @@ model = AutoModelForCausalLM.from_pretrained(
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 # Example prompt
-prompt = "Find the number of pairs of positive integers $(m, n)$ such that $m^2 + n < 22$ and $n^2 + m < 22$."
+prompt = "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?"
 messages = [
     {"role": "user", "content": prompt}
 ]
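
The benchmark-table hunk moves the bold markers so that each column highlights its top score, not only the SAND row. A small sketch that recomputes the per-column maxima from the five rows visible in the diff (the dictionary layout and the name `best_per_benchmark` are illustrative, not part of the repository):

```python
BENCHMARKS = ("AIME24", "AIME25", "MATH500", "GPQA")

# Scores copied from the table rows shown in the README diff.
ROWS = {
    "Qwen2.5-32B-Instruct (Base)": (16.7, 13.3, 83.4, 53.5),
    "DeepSeek-R1-Distill-Qwen-32B": (72.6, 54.9, 94.3, 62.1),
    "Light-R1-32B": (73.0, 64.3, 93.3, 60.6),
    "OpenThinker-32B": (66.0, 53.3, 89.4, 57.6),
    "SAND-Math-Qwen2.5-32B (Ours)": (74.01, 68.18, 92.05, 60.8),
}


def best_per_benchmark(rows=ROWS, benchmarks=BENCHMARKS):
    """Map each benchmark name to the model with the highest score."""
    best = {}
    for column, bench in enumerate(benchmarks):
        best[bench] = max(rows, key=lambda model: rows[model][column])
    return best


for bench, model in best_per_benchmark().items():
    print(f"{bench}: {model}")
```

The result agrees with the new bolding: the SAND model leads on AIME24 and AIME25, while DeepSeek-R1-Distill-Qwen-32B leads on MATH500 and GPQA.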

SAND-MATH-Blog.png ADDED

Git LFS Details

SHA256: f1841bc904fb83d026983cf6fb1eae6188dcd10d1ddd9d4e4c295eab7458640f
Pointer size: 132 Bytes
Size of remote file: 1.31 MB
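
The repository stores only a small pointer file for the image; the SHA256 above identifies the real content. A sketch of how a downloaded copy could be checked against that digest, and of how the pointer text is laid out under the Git LFS v1 pointer format. The exact byte size is not shown on this page, so the value used below is a hypothetical stand-in for "1.31 MB".

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"
OID = "f1841bc904fb83d026983cf6fb1eae6188dcd10d1ddd9d4e4c295eab7458640f"


def lfs_pointer_text(oid_sha256: str, size_bytes: int) -> str:
    """Render a Git LFS pointer: version, oid, and size lines."""
    return f"version {LFS_SPEC}\noid sha256:{oid_sha256}\nsize {size_bytes}\n"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical size: any 7-digit byte count gives the same pointer length.
pointer = lfs_pointer_text(OID, 1_310_000)
print(len(pointer.encode("ascii")))
```

A 7-digit size yields a 132-byte pointer, consistent with the "Pointer size" reported above. A local copy of the image matches the remote file when `sha256_of_file(path) == OID`.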