license: apache-2.0
language:
  - en
base_model:
  - Qwen/Qwen3-4B-Thinking-2507
library_name: transformers

PKU-ML/GRASP-4B

📊 Overview

Integrating graph knowledge into Large Language Models (LLMs) via passive representation faces critical bottlenecks: limited context windows, unreliable numerical computation, and structural hallucinations. To address this, we propose GRASP (Graph Reasoning via Agentic Solving and Probing), shifting the paradigm from passive ingestion to proactive agentic exploration. By interleaving Neighbor Retrieval for on-demand probing with Code Interpreter as a deterministic solver, GRASP enables LLMs to autonomously navigate and compute over complex topologies. We employ a staged reinforcement learning strategy (GRPO) that transitions from visible tuning to a structure-blind environment, forcing the agent to develop genuine topological awareness. Evaluated on multi-domain graph reasoning benchmarks, our 4B model gains 53.06 points in average score over its Qwen3-4B-Thinking base model (30.79 to 83.85), surpassing SOTA baselines like DeepSeek-V3.2 and generalizing to unseen tasks. It also shows promise for sampling on million-node graphs and for solving Hard-level LeetCode graph problems.

📌 Key Takeaways

1️⃣ Agentic Probing over Passive Ingestion. We propose GRASP (Graph Reasoning via Agentic Solving and Probing), shifting the paradigm from passive ingestion to proactive agentic exploration. By interleaving Neighbor Retrieval (Eyes 👀) for on-demand probing with Code Interpreter (Hands 🙌) as a deterministic solver, GRASP enables LLMs to autonomously navigate and compute over complex topologies.

2️⃣ Structure-Blind RL Training. We employ a staged reinforcement learning strategy (GRPO) that transitions from visible tuning to a structure-blind environment, forcing the agent to develop genuine topological awareness.

3️⃣ From Million-Node Graphs to Hard LeetCode. Evaluated on multi-domain graph reasoning benchmarks, our 4B model gains 53.06 points in average score over its Qwen3-4B-Thinking base model (30.79 to 83.85), surpassing SOTA baselines like DeepSeek-V3.2 and generalizing to unseen tasks. It also shows promise for sampling on million-node graphs and for solving Hard-level LeetCode graph problems.
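To make the probing idea concrete, here is a minimal sketch of a structure-blind environment in which a solver never sees the edge list and can only ask for the neighbors of one node at a time. The names `BlindGraph`, `get_neighbors`, and `shortest_path_length` are hypothetical and chosen for illustration; they are not GRASP's actual tool interface, and the breadth-first search is hand-written here rather than produced by the model.

```python
from collections import deque


class BlindGraph:
    """Hypothetical environment: edges are hidden, only per-node lookups are allowed."""

    def __init__(self, edges):
        self._adj = {}
        for u, v in edges:
            self._adj.setdefault(u, set()).add(v)
            self._adj.setdefault(v, set()).add(u)
        self.calls = 0  # number of probes issued so far

    def get_neighbors(self, node):
        """Stand-in for a neighbor-retrieval tool call (assumed interface)."""
        self.calls += 1
        return sorted(self._adj.get(node, ()))


def shortest_path_length(graph, source, target):
    """Breadth-first search that discovers the graph only through probes.

    Returns the hop count, or -1 if the target is unreachable.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get_neighbors(node):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return -1


g = BlindGraph([("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")])
print(shortest_path_length(g, "a", "d"), "hops,", g.calls, "probes")
```

The probe counter illustrates why on-demand retrieval matters on large graphs: the solver touches only the nodes it needs instead of loading the whole topology into context.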

🌊 Evaluation on Graph Reasoning Benchmarks

| Model | Arxiv | PubMed | Products | WikiCS | fb15k237 | wn18rr | TSG-Bench | ExplaGraphs | Erdős | RealErdős | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3-4B-Thinking | 51.00 | 25.00 | 21.00 | 29.00 | 16.00 | 13.00 | 62.00 | 45.00 | 38.80 | 7.11 | 30.79 |
| GPT-4o | 52.00 | 43.00 | 72.00 | 24.00 | 52.00 | 24.00 | 72.00 | 77.00 | 40.60 | 18.07 | 47.46 |
| DeepSeek-V3.2 | 65.00 | 47.00 | 70.00 | 79.00 | 65.00 | 26.00 | 88.00 | 99.00 | 83.60 | 66.44 | 68.90 |
| GRASP-4B | 73.00 | 90.00 | 77.00 | 88.00 | 82.00 | 67.00 | 85.00 | 97.00 | 91.00 | 88.57 | 83.85 |
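As a quick sanity check on the table, the sketch below recomputes each row's mean over the ten benchmarks and the gap between the reported averages of GRASP-4B and Qwen3-4B-Thinking. The 53.06 figure quoted earlier appears to correspond to this gap in points. The scores are copied from the table; the variable and function names are illustrative.

```python
# Per-benchmark scores copied from the table above (10 benchmarks per model).
scores = {
    "Qwen3-4B-Thinking": [51.00, 25.00, 21.00, 29.00, 16.00, 13.00, 62.00, 45.00, 38.80, 7.11],
    "GPT-4o": [52.00, 43.00, 72.00, 24.00, 52.00, 24.00, 72.00, 77.00, 40.60, 18.07],
    "DeepSeek-V3.2": [65.00, 47.00, 70.00, 79.00, 65.00, 26.00, 88.00, 99.00, 83.60, 66.44],
    "GRASP-4B": [73.00, 90.00, 77.00, 88.00, 82.00, 67.00, 85.00, 97.00, 91.00, 88.57],
}

# "Average" column as reported in the table.
reported_average = {
    "Qwen3-4B-Thinking": 30.79,
    "GPT-4o": 47.46,
    "DeepSeek-V3.2": 68.90,
    "GRASP-4B": 83.85,
}


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


for name, row in scores.items():
    print(f"{name}: recomputed {mean(row):.3f}, reported {reported_average[name]:.2f}")

# Gap between the reported averages of GRASP-4B and the Qwen3-4B-Thinking row.
gain = reported_average["GRASP-4B"] - reported_average["Qwen3-4B-Thinking"]
print(f"gain over Qwen3-4B-Thinking: {gain:.2f} points")
```

The recomputed means agree with the reported column to within 0.01.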

Quickstart

Qwen3 support is included in recent releases of Hugging Face transformers, and we recommend using the latest version.

With transformers<4.51.0, you will encounter the following error:

KeyError: 'qwen3'
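One way to catch this before loading the model is to compare the installed version against the 4.51.0 threshold. The sketch below uses only the standard library; `version_tuple` and `supports_qwen3` are hypothetical helper names. The parser is deliberately simple: it ignores pre-release suffixes such as `.dev0`, so treat it as a rough guard rather than a full version comparison.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = "4.51.0"  # threshold quoted above


def version_tuple(text):
    """Parse leading numeric components, e.g. '4.51.0.dev0' -> (4, 51, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:  # pad so that '4.51' compares equal to '4.51.0'
        parts.append(0)
    return tuple(parts)


def supports_qwen3(installed):
    """True if the given transformers version string meets the minimum."""
    return version_tuple(installed) >= version_tuple(MIN_TRANSFORMERS)


try:
    installed = version("transformers")
except PackageNotFoundError:
    print("transformers is not installed")
else:
    status = "ok" if supports_qwen3(installed) else f"too old, need >= {MIN_TRANSFORMERS}"
    print(f"transformers {installed}: {status}")
```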

The following code snippet illustrates how to use the model to generate content from a given input.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PKU-ML/GRASP-4B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=8192
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content) # no opening <think> tag
print("content:", content)
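The split on the last `</think>` token is the least obvious step in the snippet above, so here it is isolated into a small helper and run on toy token ids. The helper name `split_at_last` and the toy ids are illustrative; only the id 151668 comes from the snippet.

```python
THINK_END_ID = 151668  # id of </think>, as used in the snippet above


def split_at_last(ids, marker=THINK_END_ID):
    """Split a list of token ids just after the last occurrence of marker.

    Returns (head, tail). head ends with the marker itself; if the marker
    is absent, head is empty and tail is the whole list.
    """
    try:
        # Searching the reversed list finds the last occurrence (an "rindex").
        cut = len(ids) - ids[::-1].index(marker)
    except ValueError:
        cut = 0
    return ids[:cut], ids[cut:]


toy = [11, 12, THINK_END_ID, 13, THINK_END_ID, 21, 22]
thinking_ids, answer_ids = split_at_last(toy)
print("thinking ids:", thinking_ids)
print("answer ids:", answer_ids)
```

Splitting at the last occurrence rather than the first keeps any earlier `</think>` tokens inside the thinking part. When no marker is found, the whole output is treated as the answer.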

Agentic Use

For the specific tool configuration and agentic usage of GRASP, please refer to our example on GitHub.

Citation

If you find our work helpful, please consider citing it.