PKU-ML/GRASP-4B
📊 Overview
Integrating graph knowledge into Large Language Models (LLMs) via passive representation faces critical bottlenecks: limited context windows, unreliable numerical computation, and structural hallucinations. To solve this, we propose GRASP (Graph Reasoning via Agentic Solving and Probing), shifting the paradigm from passive ingestion to proactive agentic exploration. By interleaving Neighbor Retrieval for on-demand probing with Code Interpreter as a deterministic solver, GRASP enables LLMs to autonomously navigate and compute over complex topologies. We employ a staged reinforcement learning strategy (GRPO) that transitions from visible tuning to a structure-blind environment, forcing the agent to develop genuine topological awareness. Evaluated on multi-domain graph reasoning benchmarks, our 4B model raises the average score by 53.06 points over Qwen3-4B-Thinking (30.79 to 83.85), surpasses SOTA baselines such as DeepSeek-V3.2 on average, and generalizes to unseen tasks. It also shows high potential for sampling on million-node graphs and for solving Hard-level LeetCode graph problems.
📌 Key Takeaways
1️⃣ Agentic Probing over Passive Ingestion. We propose GRASP (Graph Reasoning via Agentic Solving and Probing), shifting the paradigm from passive ingestion to proactive agentic exploration. By interleaving Neighbor Retrieval (Eyes 👀) for on-demand probing with Code Interpreter (Hands 🙌) as a deterministic solver, GRASP enables LLMs to autonomously navigate and compute over complex topologies.
2️⃣ Structure-Blind RL Training. We employ a staged reinforcement learning strategy (GRPO) that transitions from visible tuning to a structure-blind environment, forcing the agent to develop genuine topological awareness.
3️⃣ From Million-Node Graphs to Hard LeetCode. Evaluated on multi-domain graph reasoning benchmarks, our 4B model raises the average score by 53.06 points over Qwen3-4B-Thinking (30.79 to 83.85), surpasses SOTA baselines such as DeepSeek-V3.2 on average, and generalizes to unseen tasks. It also shows high potential for sampling on million-node graphs and for solving Hard-level LeetCode graph problems.
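To make the "probing" idea concrete, here is a minimal toy sketch. It is not the GRASP implementation: the graph stays hidden behind a hypothetical `NeighborRetriever` class, and a plain breadth-first search stands in for the agent's learned policy, finding a shortest path by requesting one neighborhood at a time. The class name, method name, and probe counter are all assumptions made for illustration.

```python
from collections import deque


class NeighborRetriever:
    """Hypothetical stand-in for an on-demand neighbor-lookup tool."""

    def __init__(self, adjacency):
        self._adjacency = adjacency  # hidden from the caller
        self.probes = 0              # number of lookups issued so far

    def neighbors(self, node):
        self.probes += 1
        return list(self._adjacency.get(node, []))


def shortest_path_by_probing(tool, source, target):
    """Breadth-first search that only sees the graph through `tool`."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in tool.neighbors(node):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # target unreachable


# Toy undirected graph; nodes "x" and "y" are never probed in this query.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
    "x": ["y"],
    "y": ["x"],
}
tool = NeighborRetriever(graph)
path = shortest_path_by_probing(tool, "a", "e")
print(path, "probes:", tool.probes)
```

The point of the sketch is that the caller never receives the full adjacency structure, so the cost of a query scales with the region actually explored rather than with the size of the graph.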
🌊 Evaluation on Graph Reasoning Benchmarks
| Model | Arxiv | PubMed | Products | WikiCS | fb15k237 | wn18rr | TSG-Bench | ExplaGraphs | Erdős | RealErdős | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3-4B-Thinking | 51.00 | 25.00 | 21.00 | 29.00 | 16.00 | 13.00 | 62.00 | 45.00 | 38.80 | 7.11 | 30.79 |
| GPT-4o | 52.00 | 43.00 | 72.00 | 24.00 | 52.00 | 24.00 | 72.00 | 77.00 | 40.60 | 18.07 | 47.46 |
| DeepSeek-V3.2 | 65.00 | 47.00 | 70.00 | 79.00 | 65.00 | 26.00 | 88.00 | 99.00 | 83.60 | 66.44 | 68.90 |
| GRASP-4B | 73.00 | 90.00 | 77.00 | 88.00 | 82.00 | 67.00 | 85.00 | 97.00 | 91.00 | 88.57 | 83.85 |
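The headline figure can be cross-checked against the table with a few lines of Python. The sketch below copies the GRASP-4B and DeepSeek-V3.2 rows and the reported averages from the table above; the variable names are illustrative only.

```python
benchmarks = ["Arxiv", "PubMed", "Products", "WikiCS", "fb15k237",
              "wn18rr", "TSG-Bench", "ExplaGraphs", "Erdős", "RealErdős"]
grasp = [73.00, 90.00, 77.00, 88.00, 82.00, 67.00, 85.00, 97.00, 91.00, 88.57]
deepseek = [65.00, 47.00, 70.00, 79.00, 65.00, 26.00, 88.00, 99.00, 83.60, 66.44]

# Averages as reported in the last column of the table.
reported_avg = {"Qwen3-4B-Thinking": 30.79, "GRASP-4B": 83.85}

# Gap between GRASP-4B and Qwen3-4B-Thinking, in points.
gain = round(reported_avg["GRASP-4B"] - reported_avg["Qwen3-4B-Thinking"], 2)

# Per-benchmark comparison against DeepSeek-V3.2.
wins = [name for name, g, d in zip(benchmarks, grasp, deepseek) if g > d]
losses = [name for name, g, d in zip(benchmarks, grasp, deepseek) if g < d]

print(gain, len(wins), losses)
```

The difference between the two reported averages is 53.06 points, which matches the stated boost. GRASP-4B scores higher than DeepSeek-V3.2 on 8 of the 10 benchmarks, with TSG-Bench and ExplaGraphs as the two exceptions.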
Quickstart
Support for Qwen3 is included in the latest Hugging Face transformers, and we advise you to use the latest version.
With transformers<4.51.0, you will encounter the following error:
KeyError: 'qwen3'
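A quick way to catch this before loading the model is to compare the installed version against the 4.51.0 threshold mentioned above. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `supports_qwen3` are hypothetical, and pre-release suffixes such as `.dev0` are ignored for simplicity.

```python
import re
from importlib import metadata

MIN_VERSION = (4, 51, 0)  # threshold stated in the text above


def parse_version(version):
    """Return (major, minor, patch) from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def supports_qwen3(version):
    """True if this transformers version is at least MIN_VERSION."""
    return parse_version(version) >= MIN_VERSION


try:
    installed = metadata.version("transformers")
except metadata.PackageNotFoundError:
    print("transformers is not installed")
else:
    if supports_qwen3(installed):
        print(f"transformers {installed}: OK")
    else:
        print(f"transformers {installed}: too old, run: pip install -U transformers")
```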
The following code snippet illustrates how to use the model to generate content from a given input.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PKU-ML/GRASP-4B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=8192
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)  # no opening <think> tag
print("content:", content)
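The split between thinking content and the final answer in the snippet above depends only on list operations, so it can be tried without loading the model. Below is a small sketch of the same logic wrapped in a hypothetical helper, `split_thinking`; the token ids in the example are made up, and only 151668 comes from the snippet.

```python
def split_thinking(output_ids, end_think_id=151668):
    """Split generated token ids at the last occurrence of `end_think_id`.

    Returns (thinking_ids, answer_ids). The marker itself stays at the end of
    thinking_ids, matching the snippet above. If the marker is absent,
    thinking_ids is empty and everything is treated as the answer.
    """
    try:
        # Search the reversed list to locate the last marker.
        index = len(output_ids) - output_ids[::-1].index(end_think_id)
    except ValueError:
        index = 0
    return output_ids[:index], output_ids[index:]


# Made-up token ids purely for illustration.
thinking, answer = split_thinking([11, 12, 151668, 21, 22])
print(thinking, answer)
```

Searching from the end means that if the marker appears more than once, everything up to the last occurrence is treated as thinking content.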
Agentic Use
For the specific tool configuration and agentic usage of GRASP, please refer to our example on GitHub.
Citation
If you find our work helpful, please consider citing it.