---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-4B-Thinking-2507
library_name: transformers
---
# PKU-ML/GRASP-4B
## 📊 Overview
Integrating graph knowledge into Large Language Models (LLMs) via passive representation faces critical bottlenecks: limited context windows, unreliable numerical computation, and structural hallucinations.
To solve this, we propose **GRASP** (Graph Reasoning via Agentic Solving and Probing), shifting the paradigm from passive ingestion to proactive agentic exploration.
By interleaving **Neighbor Retrieval** for on-demand probing with **Code Interpreter** as a deterministic solver, GRASP enables LLMs to autonomously navigate and compute over complex topologies.
We employ a staged reinforcement learning strategy (GRPO) that transitions from visible tuning to a structure-blind environment, forcing the agent to develop genuine topological awareness.
Evaluated on multi-domain graph reasoning benchmarks, our 4B model raises the average score by 53.06 points over its Qwen3-4B-Thinking base (30.79 to 83.85), surpassing SOTA baselines like DeepSeek-V3.2 and successfully generalizing to unseen tasks,
with high potential for sampling on million-node graphs and solving Hard-level LeetCode graph problems.
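
To make the probe-then-solve idea concrete, here is a minimal sketch of interleaving on-demand neighbor probing with a deterministic solver. It is an illustration under stated assumptions, not GRASP's actual tool interface: `HIDDEN_GRAPH`, `neighbor_retrieval`, `shortest_path_length`, and `probe_then_solve` are hypothetical names, and a scripted breadth-first policy stands in for the LLM agent.

```python
from collections import deque

# Hypothetical hidden graph: the "agent" never receives this dict directly,
# it can only query it one node at a time through neighbor_retrieval().
HIDDEN_GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def neighbor_retrieval(node):
    """Hypothetical probing tool: reveal the neighbors of a single node."""
    return list(HIDDEN_GRAPH.get(node, []))


def shortest_path_length(adj, source, target):
    """Deterministic solver step: plain BFS over the revealed subgraph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None  # target not reachable within the revealed subgraph


def probe_then_solve(source, target, max_probes=10):
    """Scripted stand-in for the agent: probe frontier nodes, then compute."""
    revealed = {}
    frontier = deque([source])
    probes = 0
    while frontier and probes < max_probes:
        node = frontier.popleft()
        if node in revealed:
            continue
        revealed[node] = neighbor_retrieval(node)  # on-demand probing
        probes += 1
        if node == target or target in revealed[node]:
            break  # enough structure has been revealed to answer
        frontier.extend(n for n in revealed[node] if n not in revealed)
    # Hand the revealed subgraph to the deterministic solver.
    return shortest_path_length(revealed, source, target), probes


if __name__ == "__main__":
    length, probes = probe_then_solve("A", "E")
    print(f"shortest path length: {length}, probes used: {probes}")
```

In this toy setting the answer comes from executed code rather than from the model's own arithmetic, and only the part of the graph that was actually probed ever needs to fit in context.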
## 📌 Key Takeaways
1️⃣ **Agentic Probing over Passive Ingestion**.
We propose GRASP (Graph Reasoning via Agentic Solving and Probing), shifting the paradigm from passive ingestion to proactive agentic exploration. By interleaving Neighbor Retrieval (Eyes 👀) for on-demand probing with Code Interpreter (Hands 🙌) as a deterministic solver, GRASP enables LLMs to autonomously navigate and compute over complex topologies.
2️⃣ **Structure-Blind RL Training**.
We employ a staged reinforcement learning strategy (GRPO) that transitions from visible tuning to a structure-blind environment, forcing the agent to develop genuine topological awareness.
3️⃣ **From Million-Node Graphs to Hard LeetCode**.
Evaluated on multi-domain graph reasoning benchmarks, our 4B model raises the average score by 53.06 points over its Qwen3-4B-Thinking base (30.79 to 83.85), surpassing SOTA baselines like DeepSeek-V3.2 and successfully generalizing to unseen tasks, with high potential for sampling on million-node graphs and solving Hard-level LeetCode graph problems.
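
For readers unfamiliar with GRPO (mentioned in takeaway 2), its core idea is to score each sampled rollout relative to the other rollouts drawn for the same prompt. The sketch below shows only that group-relative advantage step. It is not the training code behind this model: the function name `group_relative_advantages`, the binary rewards, and the use of the population standard deviation are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own sampling group.

    rewards: scalar rewards for rollouts sampled from the same prompt.
    eps: small constant that avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std within the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical group of four rollouts: 1.0 = correct answer, 0.0 = wrong.
    group = [1.0, 0.0, 0.0, 1.0]
    for reward, adv in zip(group, group_relative_advantages(group)):
        print(f"reward={reward:.1f} advantage={adv:+.3f}")
```

Rollouts that beat their group's mean receive a positive advantage and the rest a negative one, so no separate value network is needed to provide a baseline.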
## 🌊 Evaluation on Graph Reasoning Benchmarks
| Model | Arxiv | PubMed | Products | WikiCS | fb15k237 | wn18rr | TSG-Bench | ExplaGraphs | Erdős | RealErdős | Average |
|------------------|-----------|-----------|-----------|-----------|------------|-----------|------------|------------|------------|------------|------------|
| Qwen3-4B-Thinking|51.00 |25.00 |21.00 |29.00 |16.00 |13.00 |62.00 |45.00 |38.80 |7.11 |30.79 |
| GPT-4o |52.00 |43.00 |72.00 |24.00 |52.00 |24.00 |72.00 |77.00 |40.60 |18.07 |47.46 |
| DeepSeek-V3.2    |65.00 |47.00 |70.00 |79.00 |65.00 |26.00 |**88.00** |**99.00** |83.60 |66.44 |68.90 |
| GRASP-4B |**73.00** |**90.00** |**77.00** |**88.00** |**82.00** |**67.00** |85.00 |97.00 |**91.00** |**88.57** |**83.85** |
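
The Average column can be reproduced from the per-benchmark scores. The short check below assumes that it is the unweighted mean of the ten benchmark columns, truncated (not rounded) to two decimals, which is what the reported figures are consistent with. `SCORES` and `truncated_mean` are names introduced only for this illustration.

```python
from decimal import Decimal, ROUND_DOWN

# Per-benchmark scores copied from the table above, in column order.
SCORES = {
    "Qwen3-4B-Thinking": ["51.00", "25.00", "21.00", "29.00", "16.00",
                          "13.00", "62.00", "45.00", "38.80", "7.11"],
    "GPT-4o": ["52.00", "43.00", "72.00", "24.00", "52.00",
               "24.00", "72.00", "77.00", "40.60", "18.07"],
    "DeepSeek-V3.2": ["65.00", "47.00", "70.00", "79.00", "65.00",
                      "26.00", "88.00", "99.00", "83.60", "66.44"],
    "GRASP-4B": ["73.00", "90.00", "77.00", "88.00", "82.00",
                 "67.00", "85.00", "97.00", "91.00", "88.57"],
}


def truncated_mean(values):
    """Unweighted mean, truncated to two decimals (assumed convention)."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.01"), rounding=ROUND_DOWN)


averages = {model: truncated_mean(vals) for model, vals in SCORES.items()}
gap = averages["GRASP-4B"] - averages["Qwen3-4B-Thinking"]

for model, avg in averages.items():
    print(f"{model}: {avg}")
print(f"GRASP-4B minus Qwen3-4B-Thinking: {gap}")
```

The gap between the GRASP-4B and Qwen3-4B-Thinking averages is the 53.06 figure quoted in the overview, so it reads as an absolute difference in score points rather than a relative percentage.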
## Quickstart
Qwen3 support is included in recent releases of Hugging Face `transformers`, and we advise you to use the latest version.
With `transformers<4.51.0`, you will encounter the following error:
```
KeyError: 'qwen3'
```
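
If you want to catch this before loading the model, a small pre-flight check along the following lines may help. It is only a sketch: `release_tuple`, `supports_qwen3`, and `check_installed` are hypothetical helper names, and the parsing deliberately ignores pre-release suffixes, so a build such as `4.51.0.dev0` is treated like `4.51.0`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = "4.51.0"  # threshold quoted in this model card


def release_tuple(v):
    """Leading numeric components, e.g. '4.51.0.dev0' -> (4, 51, 0)."""
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
    # Pad so that '4.51' compares equal to '4.51.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_qwen3(installed, minimum=MIN_VERSION):
    """True if the installed version is at least the minimum version."""
    return release_tuple(installed) >= release_tuple(minimum)


def check_installed():
    """Describe whether the local transformers install is recent enough."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    if supports_qwen3(installed):
        return f"transformers {installed} is recent enough"
    return f"transformers {installed} is older than {MIN_VERSION}; please upgrade"


if __name__ == "__main__":
    print(check_installed())
```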
The following code snippet illustrates how to use the model to generate content based on given inputs.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PKU-ML/GRASP-4B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=8192
)
# keep only the newly generated tokens (drop the prompt tokens)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # no </think> token found: treat the whole output as final content
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)  # no opening <think> tag