Instructions for using PKU-ML/GRASP-4B with libraries and local apps.

Libraries

How to use PKU-ML/GRASP-4B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="PKU-ML/GRASP-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("PKU-ML/GRASP-4B")
model = AutoModelForCausalLM.from_pretrained("PKU-ML/GRASP-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use PKU-ML/GRASP-4B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "PKU-ML/GRASP-4B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PKU-ML/GRASP-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
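
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server started above is listening on `localhost:8000`; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="PKU-ML/GRASP-4B",
                       base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` should return the reply text. Passing `base_url="http://localhost:30000"` would target a server on another port, such as the SGLang setup described later.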

Use Docker

docker model run hf.co/PKU-ML/GRASP-4B

SGLang

How to use PKU-ML/GRASP-4B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "PKU-ML/GRASP-4B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PKU-ML/GRASP-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "PKU-ML/GRASP-4B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "PKU-ML/GRASP-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use PKU-ML/GRASP-4B with Docker Model Runner:
```
docker model run hf.co/PKU-ML/GRASP-4B
```



	---
	license: apache-2.0
	language:
	- en
	base_model:
	- Qwen/Qwen3-4B-Thinking-2507
	library_name: transformers
	---




	<p align="center">
	<img src="https://raw.githubusercontent.com/PKU-ML/GRASP/main/logo-new.png" width="15%"/>
	</p>

	# PKU-ML/GRASP-4B

	## 📊 Overview

	Integrating graph knowledge into Large Language Models (LLMs) via passive representation faces critical bottlenecks: limited context windows, unreliable numerical computation, and structural hallucinations.
	To solve this, we propose GRASP (Graph Reasoning via Agentic Solving and Probing), shifting the paradigm from passive ingestion to proactive agentic exploration.
	By interleaving Neighbor Retrieval for on-demand probing with Code Interpreter as a deterministic solver, GRASP enables LLMs to autonomously navigate and compute over complex topologies.
	We employ a staged reinforcement learning strategy (GRPO) that transitions from visible tuning to a structure-blind environment, forcing the agent to develop genuine topological awareness.
	Evaluated on multi-domain graph reasoning benchmarks, our 4B model raises the average score by 53.06 points over the Qwen3-4B-Thinking baseline (30.79 to 83.85), surpasses SOTA baselines such as DeepSeek-V3.2 on average (83.85 vs. 68.90), and generalizes to unseen tasks.
	It also shows promise for sampling on million-node graphs and for solving Hard-level LeetCode graph problems.
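
As a rough illustration of the group-relative idea behind GRPO, the sketch below normalizes each rollout's reward against the mean and standard deviation of its own group. The reward values and the function name `group_relative_advantages` are hypothetical. This README does not describe GRASP's actual reward design or training code, so treat this only as a sketch of the general technique.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every rollout gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one graph question: 1.0 = correct, 0.0 = wrong.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Rollouts that beat their group's average receive a positive advantage and the rest a negative one, so no separate value model is needed to score them.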



	## 📌 Key Takeaways

	1️⃣ Agentic Probing over Passive Ingestion.
	We propose GRASP (Graph Reasoning via Agentic Solving and Probing), shifting the paradigm from passive ingestion to proactive agentic exploration. By interleaving Neighbor Retrieval (Eyes 👀) for on-demand probing with Code Interpreter (Hands 🙌) as a deterministic solver, GRASP enables LLMs to autonomously navigate and compute over complex topologies.

	2️⃣ Structure-Blind RL Training.
	We employ a staged reinforcement learning strategy (GRPO) that transitions from visible tuning to a structure-blind environment, forcing the agent to develop genuine topological awareness.

	3️⃣ From Million-Node Graphs to Hard LeetCode.
	Evaluated on multi-domain graph reasoning benchmarks, our 4B model raises the average score by 53.06 points over the Qwen3-4B-Thinking baseline (30.79 to 83.85), surpasses SOTA baselines such as DeepSeek-V3.2 on average (83.85 vs. 68.90), and generalizes to unseen tasks. It also shows promise for sampling on million-node graphs and for solving Hard-level LeetCode graph problems.
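
To make the probing idea in the first takeaway concrete, here is a toy sketch in which a solver discovers a graph only through a neighbor-retrieval call and then computes a shortest path deterministically. Everything in it is hypothetical: the graph, the tool name `get_neighbors`, and the BFS solver are stand-ins for illustration, not GRASP's actual tool interface (see the linked example in the repository for that).

```python
from collections import deque

# Hypothetical toy graph; in a structure-blind setting the agent never sees this dict.
_HIDDEN_GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def get_neighbors(node):
    """Stand-in for a neighbor-retrieval tool call (illustrative name)."""
    return list(_HIDDEN_GRAPH.get(node, []))


def shortest_path_by_probing(source, target):
    """BFS that learns the graph only through get_neighbors calls.

    Returns (path, probes), where probes counts the tool calls made.
    """
    parent = {source: None}
    queue = deque([source])
    probes = 0
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], probes
        probes += 1
        for nxt in get_neighbors(node):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None, probes


print(shortest_path_by_probing("A", "E"))  # (['A', 'B', 'D', 'E'], 4)
```

Only the nodes actually probed ever enter the solver's memory, which is why on-demand retrieval can sidestep the context-window limit that comes with serializing a whole graph into the prompt.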




	## 🌊 Evaluation on Graph Reasoning Benchmarks


	| Model | Arxiv | PubMed | Products | WikiCS | fb15k237 | wn18rr | TSG-Bench | ExplaGraphs | Erdős | RealErdős | Average |
	|-------|-------|--------|----------|--------|----------|--------|-----------|-------------|-------|-----------|---------|
	| Qwen3-4B-Thinking | 51.00 | 25.00 | 21.00 | 29.00 | 16.00 | 13.00 | 62.00 | 45.00 | 38.80 | 7.11 | 30.79 |
	| GPT-4o | 52.00 | 43.00 | 72.00 | 24.00 | 52.00 | 24.00 | 72.00 | 77.00 | 40.60 | 18.07 | 47.46 |
	| DeepSeek-V3.2 | 65.00 | 47.00 | 70.00 | 79.00 | 65.00 | 26.00 | 88.00 | 99.00 | 83.60 | 66.44 | 68.90 |
	| GRASP-4B | 73.00 | 90.00 | 77.00 | 88.00 | 82.00 | 67.00 | 85.00 | 97.00 | 91.00 | 88.57 | 83.85 |
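
The Average column can be checked by recomputing the mean of the ten benchmark scores. The short sketch below does this for two rows and also computes the gap between their reported averages. That gap equals the 53.06 figure quoted earlier in this README, though reading that figure as this difference is our assumption, since the README does not say how it was derived.

```python
# Scores copied from the table above (ten benchmarks per model).
scores = {
    "Qwen3-4B-Thinking": [51.00, 25.00, 21.00, 29.00, 16.00,
                          13.00, 62.00, 45.00, 38.80, 7.11],
    "GRASP-4B": [73.00, 90.00, 77.00, 88.00, 82.00,
                 67.00, 85.00, 97.00, 91.00, 88.57],
}
reported_average = {"Qwen3-4B-Thinking": 30.79, "GRASP-4B": 83.85}


def mean(values):
    """Arithmetic mean of a non-empty list of scores."""
    return sum(values) / len(values)


for model, values in scores.items():
    print(f"{model}: recomputed {mean(values):.3f}, "
          f"reported {reported_average[model]:.2f}")

# Gap between the two reported averages, in points.
gain = reported_average["GRASP-4B"] - reported_average["Qwen3-4B-Thinking"]
print(f"gain: {gain:.2f} points")  # gain: 53.06 points
```

The recomputed means agree with the reported averages to within 0.01, which is consistent with the reported values being truncated to two decimals.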






	## Quickstart

	Qwen3 support is included in recent releases of Hugging Face `transformers`, and we advise you to use the latest version.

	With `transformers<4.51.0`, you will encounter the following error:
	```
	KeyError: 'qwen3'
	```
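
To catch this before loading the model, the installed version can be checked up front. The sketch below uses only the standard library. The helper names are illustrative, and the parser is deliberately simple: it compares the leading numeric components only, so a pre-release such as `4.51.0rc1` is treated like `4.51.0`.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Parse leading numeric components, e.g. '4.52.0.dev0' -> (4, 52, 0)."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # suffix such as 'rc1' ends the numeric part
            break
    parts += [0] * (3 - len(parts))  # pad '4.51' to (4, 51, 0)
    return tuple(parts[:3])


def supports_qwen3(version_string, minimum=(4, 51, 0)):
    """Return True if the given transformers version is at least `minimum`."""
    return release_tuple(version_string) >= minimum


def installed_transformers_supports_qwen3():
    """Check the installed transformers package; False if it is not installed."""
    try:
        return supports_qwen3(version("transformers"))
    except PackageNotFoundError:
        return False
```

If `installed_transformers_supports_qwen3()` returns `False`, upgrading with `pip install -U transformers` should resolve the `KeyError`.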

	The following code snippet illustrates how to use the model to generate content from a given input.
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "PKU-ML/GRASP-4B"

	# load the tokenizer and the model
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto",
	)

	# prepare the model input
	prompt = "Give me a short introduction to large language model."
	messages = [
	    {"role": "user", "content": prompt},
	]
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	# conduct text completion
	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=8192,
	)
	# keep only the newly generated tokens (drop the prompt)
	output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

	# parse the thinking content
	try:
	    # rindex: find the last occurrence of 151668 (</think>)
	    index = len(output_ids) - output_ids[::-1].index(151668)
	except ValueError:
	    # no </think> token found: treat everything as the final answer
	    index = 0

	thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
	content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

	print("thinking content:", thinking_content)  # no opening <think> tag
	print("content:", content)
	```
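
The split at the last `</think>` token can be factored into a small helper that works on plain lists of token ids, which makes the edge cases easy to check without loading the model. The function name `split_thinking` is illustrative; the token id 151668 is the one used in the snippet above.

```python
THINK_END_ID = 151668  # token id of </think>, as used in the snippet above


def split_thinking(output_ids, end_id=THINK_END_ID):
    """Split generated ids at the last occurrence of end_id.

    Returns (thinking_ids, answer_ids); thinking_ids includes end_id itself.
    If end_id is absent, everything is treated as the answer.
    """
    try:
        # reverse search gives the distance of the last end_id from the end
        index = len(output_ids) - output_ids[::-1].index(end_id)
    except ValueError:
        index = 0
    return output_ids[:index], output_ids[index:]


print(split_thinking([11, 22, 151668, 33, 44]))  # ([11, 22, 151668], [33, 44])
print(split_thinking([11, 22, 33]))              # ([], [11, 22, 33])
```

Searching from the end means that if the marker appears more than once, only the text after the final occurrence is treated as the answer.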

	## Agentic Use

	For the specific tool configuration and agentic usage of GRASP, please refer to our [example](https://github.com/PKU-ML/GRASP/blob/main/evaluation/example.py) on GitHub.



	## Citation

	If you find our work helpful, please consider citing it.

	```

	```