---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-4B-Thinking-2507
library_name: transformers
---

# PKU-ML/GRASP-4B

## 📊 Overview

Integrating graph knowledge into Large Language Models (LLMs) via passive representation faces critical bottlenecks: limited context windows, unreliable numerical computation, and structural hallucinations. To address these, we propose **GRASP** (Graph Reasoning via Agentic Solving and Probing), shifting the paradigm from passive ingestion to proactive agentic exploration. By interleaving **Neighbor Retrieval** for on-demand probing with a **Code Interpreter** as a deterministic solver, GRASP enables LLMs to autonomously navigate and compute over complex topologies. We employ a staged reinforcement learning strategy (GRPO) that transitions from visible tuning to a structure-blind environment, forcing the agent to develop genuine topological awareness. Evaluated on multi-domain graph reasoning benchmarks, our 4B model achieves a 53.06-point average improvement over the Qwen3-4B-Thinking baseline (30.79 to 83.85), surpassing SOTA baselines like DeepSeek-V3.2 and generalizing to unseen tasks. It also shows strong potential for sampling on million-node graphs and for solving Hard-level LeetCode graph problems.

## 📌 Key Takeaways

1️⃣ **Agentic Probing over Passive Ingestion**. We propose GRASP (Graph Reasoning via Agentic Solving and Probing), shifting the paradigm from passive ingestion to proactive agentic exploration. By interleaving Neighbor Retrieval (Eyes 👀) for on-demand probing with a Code Interpreter (Hands 🙌) as a deterministic solver, GRASP enables LLMs to autonomously navigate and compute over complex topologies.

2️⃣ **Structure-Blind RL Training**. We employ a staged reinforcement learning strategy (GRPO) that transitions from visible tuning to a structure-blind environment, forcing the agent to develop genuine topological awareness.

3️⃣ **From Million-Node Graphs to Hard LeetCode**.
Evaluated on multi-domain graph reasoning benchmarks, our 4B model achieves a 53.06-point average improvement over the Qwen3-4B-Thinking baseline (30.79 to 83.85), surpassing SOTA baselines like DeepSeek-V3.2 and generalizing to unseen tasks. It also shows strong potential for sampling on million-node graphs and for solving Hard-level LeetCode graph problems.

## 🌊 Evaluation on Graph Reasoning Benchmarks

| Model             | Arxiv     | PubMed    | Products  | WikiCS    | fb15k237  | wn18rr    | TSG-Bench | ExplaGraphs | Erdős     | RealErdős | Average   |
|-------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-------------|-----------|-----------|-----------|
| Qwen3-4B-Thinking | 51.00     | 25.00     | 21.00     | 29.00     | 16.00     | 13.00     | 62.00     | 45.00       | 38.80     | 7.11      | 30.79     |
| GPT-4o            | 52.00     | 43.00     | 72.00     | 24.00     | 52.00     | 24.00     | 72.00     | 77.00       | 40.60     | 18.07     | 47.46     |
| DeepSeek-V3.2     | 65.00     | 47.00     | 70.00     | 79.00     | 65.00     | 26.00     | **88.00** | **99.00**   | 83.60     | 66.44     | 68.90     |
| GRASP-4B          | **73.00** | **90.00** | **77.00** | **88.00** | **82.00** | **67.00** | 85.00     | 97.00       | **91.00** | **88.57** | **83.85** |

## Quickstart

Support for Qwen3 is included in recent releases of Hugging Face `transformers`, and we advise you to use the latest version.

With `transformers<4.51.0`, you will encounter the following error:

```
KeyError: 'qwen3'
```

The following code snippet illustrates how to use the model to generate content based on given inputs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PKU-ML/GRASP-4B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=8192
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)  # no opening <think> tag
print("content:", content)
```

## Agentic Use

For the specific tool configuration and agentic usage of GRASP, please refer to our [example](https://github.com/PKU-ML/GRASP/blob/main/evaluation/example.py) on GitHub.

## Citation

If you find our work helpful, please consider citing it.

```
```
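
## Illustrative Sketch: Probing a Hidden Graph

To make the "probe, then solve" idea from the Overview concrete, here is a toy sketch in plain Python. It is not the GRASP tool interface: `NeighborRetriever`, `probe_shortest_path`, and `hidden_graph` are hypothetical names invented for illustration, and the actual tool configuration lives in the example linked under Agentic Use. The caller never sees the full adjacency structure. It can only ask for one node's neighbors at a time, and a deterministic routine (here a breadth-first search) computes the answer from what was probed.

```python
from collections import deque


class NeighborRetriever:
    """Hypothetical stand-in for a neighbor-retrieval tool (not the GRASP API)."""

    def __init__(self, adjacency):
        self._adjacency = adjacency  # kept private: the caller is "structure-blind"
        self.calls = 0               # number of probes issued so far

    def neighbors(self, node):
        """Return the neighbors of a single node, i.e. one on-demand probe."""
        self.calls += 1
        return sorted(self._adjacency.get(node, ()))


def probe_shortest_path(retriever, source, target):
    """Breadth-first search that sees the graph only through `retriever`."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in retriever.neighbors(node):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # target is unreachable from source


# Toy graph (assumed data): two components, {a, b, c, d, e} and {f, g}.
hidden_graph = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
    "f": {"g"},
    "g": {"f"},
}

tool = NeighborRetriever(hidden_graph)
path = probe_shortest_path(tool, "a", "e")
print(path, tool.calls)  # ['a', 'b', 'd', 'e'] 4
```

Only four probes are issued here and the `f`/`g` component is never touched, which is the point of on-demand retrieval: the amount of graph that must be read scales with the query rather than with the size of the whole graph.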