---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-4B-Thinking-2507
library_name: transformers
---

<p align="center">
  <img src="https://raw.githubusercontent.com/PKU-ML/GRASP/main/logo-new.png" width="15%"/>
</p>

# PKU-ML/GRASP-4B

## 📊 Overview

Integrating graph knowledge into Large Language Models (LLMs) through passive representation faces critical bottlenecks: limited context windows, unreliable numerical computation, and structural hallucinations.
To address this, we propose **GRASP** (Graph Reasoning via Agentic Solving and Probing), which shifts the paradigm from passive ingestion to proactive agentic exploration.
By interleaving **Neighbor Retrieval** for on-demand probing with a **Code Interpreter** as a deterministic solver, GRASP enables LLMs to autonomously navigate and compute over complex topologies.
We employ a staged reinforcement learning strategy (GRPO) that transitions from visible tuning to a structure-blind environment, forcing the agent to develop genuine topological awareness.
Evaluated on multi-domain graph reasoning benchmarks, our 4B model raises the average score by 53.06 points over its base model (30.79 → 83.85), surpasses state-of-the-art baselines such as DeepSeek-V3.2, and generalizes to unseen tasks.
It also shows promise for sampling on million-node graphs and solving Hard-level LeetCode graph problems.

## 📌 Key Takeaways

1️⃣ **Agentic Probing over Passive Ingestion**.
We propose GRASP (Graph Reasoning via Agentic Solving and Probing), shifting the paradigm from passive ingestion to proactive agentic exploration. By interleaving Neighbor Retrieval (Eyes 👀) for on-demand probing with a Code Interpreter (Hands 🙌) as a deterministic solver, GRASP enables LLMs to autonomously navigate and compute over complex topologies.

2️⃣ **Structure-Blind RL Training**.
We employ a staged reinforcement learning strategy (GRPO) that transitions from visible tuning to a structure-blind environment, forcing the agent to develop genuine topological awareness.

3️⃣ **From Million-Node Graphs to Hard LeetCode**.
Evaluated on multi-domain graph reasoning benchmarks, our 4B model raises the average score by 53.06 points over its base model, surpasses state-of-the-art baselines such as DeepSeek-V3.2, and generalizes to unseen tasks. It also shows promise for sampling on million-node graphs and solving Hard-level LeetCode graph problems.

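GRPO, as it is commonly described, drops the learned critic and instead scores each sampled response relative to the other responses drawn for the same prompt. The sketch below shows that group-relative advantage, plus a hypothetical switch between the two training stages named above. The 0/1 rewards, the `eps` constant, the use of the population standard deviation, and the `environment_mode` schedule are all illustrative assumptions, not details of GRASP's training code.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standardize each reward against the other samples for the same prompt.

    Uses the population standard deviation; implementations differ on this detail.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def environment_mode(step: int, visible_steps: int) -> str:
    """Hypothetical two-stage schedule: graph visible in the prompt, then hidden."""
    return "visible" if step < visible_steps else "structure_blind"


# Four sampled answers to one graph question, scored 1 (correct) or 0 (wrong).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # approximately [1, -1, -1, 1]
print(environment_mode(step=10, visible_steps=100))    # visible
```

When every sample in a group gets the same reward, all advantages are zero, so that group contributes no learning signal; this is one reason a curriculum that keeps tasks neither trivially easy nor impossible can matter.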
## 🌊 Evaluation on Graph Reasoning Benchmarks

| Model             | Arxiv     | PubMed    | Products  | WikiCS    | fb15k237  | wn18rr    | TSG-Bench | ExplaGraphs | Erdős     | RealErdős | Average   |
|-------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-------------|-----------|-----------|-----------|
| Qwen3-4B-Thinking | 51.00     | 25.00     | 21.00     | 29.00     | 16.00     | 13.00     | 62.00     | 45.00       | 38.80     | 7.11      | 30.79     |
| GPT-4o            | 52.00     | 43.00     | 72.00     | 24.00     | 52.00     | 24.00     | 72.00     | 77.00       | 40.60     | 18.07     | 47.46     |
| DeepSeek-V3.2     | 65.00     | 47.00     | 70.00     | 79.00     | 65.00     | 26.00     | **88.00** | **99.00**   | 83.60     | 66.44     | 68.90     |
| GRASP-4B          | **73.00** | **90.00** | **77.00** | **88.00** | **82.00** | **67.00** | 85.00     | 97.00       | **91.00** | **88.57** | **83.85** |

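The 53.06-point headline figure is the difference between the two Average entries for GRASP-4B and its base model. The short script below, which only copies numbers from the table, reproduces that gap and the per-benchmark gains; variable names are illustrative.

```python
# Scores copied from the benchmark table above (column order preserved).
BENCHMARKS = ["Arxiv", "PubMed", "Products", "WikiCS", "fb15k237",
              "wn18rr", "TSG-Bench", "ExplaGraphs", "Erdős", "RealErdős"]
QWEN3_4B_THINKING = [51.00, 25.00, 21.00, 29.00, 16.00, 13.00, 62.00, 45.00, 38.80, 7.11]
GRASP_4B = [73.00, 90.00, 77.00, 88.00, 82.00, 67.00, 85.00, 97.00, 91.00, 88.57]
REPORTED_AVERAGE = {"Qwen3-4B-Thinking": 30.79, "GRASP-4B": 83.85}

# Per-benchmark gain of GRASP-4B over its base model.
gains = {name: round(g - q, 2)
         for name, q, g in zip(BENCHMARKS, QWEN3_4B_THINKING, GRASP_4B)}

# Headline number: difference between the two reported averages.
headline_gap = round(REPORTED_AVERAGE["GRASP-4B"] - REPORTED_AVERAGE["Qwen3-4B-Thinking"], 2)

print(gains)
print(headline_gap)  # 53.06
```

The largest single gain is on RealErdős (+81.46), and the smallest is on Arxiv (+22.00).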
## Quickstart

Support for Qwen3 is included in recent versions of Hugging Face `transformers`, and we advise using the latest version.
With `transformers<4.51.0`, you will encounter the following error:

```
KeyError: 'qwen3'
```

The following code snippet illustrates how to use the model to generate content from a given input.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PKU-ML/GRASP-4B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=8192
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)  # no opening <think> tag
print("content:", content)
```

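The reverse search for `</think>` in the Quickstart can be isolated into a small pure-Python function that is easy to test without loading the model. This is a sketch: `split_thinking` is a hypothetical helper name, and instead of hard-coding 151668 you could look the id up with `tokenizer.convert_tokens_to_ids("</think>")`.

```python
from typing import List, Tuple

# The id the Quickstart hard-codes for </think>.
END_THINK_ID = 151668


def split_thinking(output_ids: List[int],
                   end_think_id: int = END_THINK_ID) -> Tuple[List[int], List[int]]:
    """Split generated ids just after the LAST end_think_id token.

    Returns (thinking_ids, answer_ids); the end_think_id token itself stays at
    the end of thinking_ids, mirroring output_ids[:index] in the Quickstart.
    """
    try:
        # Position just past the last occurrence, found by searching the reversed list.
        index = len(output_ids) - output_ids[::-1].index(end_think_id)
    except ValueError:
        # No </think> at all: treat everything as the answer, as the Quickstart does.
        index = 0
    return output_ids[:index], output_ids[index:]


print(split_thinking([5, 6, END_THINK_ID, 7, 8]))  # ([5, 6, 151668], [7, 8])
```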
## Agentic Use

For the tool configuration and agentic usage of GRASP, please refer to our [example](https://github.com/PKU-ML/GRASP/blob/main/evaluation/example.py) on GitHub.

## Citation

If you find our work helpful, please consider citing it.

```
```