PKU-ML committed 19 days ago

Commit bdf0762 (1 parent: eb89784)

Update README.md

Files changed (1)

README.md +128 -3

@@ -1,3 +1,128 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+language:
+- en
+base_model:
+- Qwen/Qwen3-4B-Thinking-2507
+library_name: transformers
+---
+<p align="center">
+    <img src="https://raw.githubusercontent.com/PKU-ML/GRASP/main/logo-new.png" width="15%"/>
+</p>
+# PKU-ML/GRASP-base-4B
+## 📊 Overview
+Integrating graph knowledge into Large Language Models (LLMs) via passive representation faces critical bottlenecks: limited context windows, unreliable numerical computation, and structural hallucinations.
+To solve this, we propose **GRASP** (Graph Reasoning via Agentic Solving and Probing), shifting the paradigm from passive ingestion to proactive agentic exploration.
+By interleaving **Neighbor Retrieval** for on-demand probing with **Code Interpreter** as a deterministic solver, GRASP enables LLMs to autonomously navigate and compute over complex topologies.
+We employ a staged reinforcement learning strategy (GRPO) that transitions from visible tuning to a structure-blind environment, forcing the agent to develop genuine topological awareness.
+Evaluated on multi-domain graph reasoning benchmarks, our 4B model achieves a 53.06% average performance boost, surpasses state-of-the-art baselines such as DeepSeek-V3.2, and generalizes to unseen tasks.
+It also shows strong potential for sampling on million-node graphs and for solving Hard-level LeetCode graph problems.
+## 📌 Key Takeaways
+1️⃣ **Agentic Probing over Passive Ingestion**.
+We propose GRASP (Graph Reasoning via Agentic Solving and Probing), shifting the paradigm from passive ingestion to proactive agentic exploration. By interleaving Neighbor Retrieval (Eyes 👀) for on-demand probing with Code Interpreter (Hands 🙌) as a deterministic solver, GRASP enables LLMs to autonomously navigate and compute over complex topologies.
+2️⃣ **Structure-Blind RL Training**.
+We employ a staged reinforcement learning strategy (GRPO) that transitions from visible tuning to a structure-blind environment, forcing the agent to develop genuine topological awareness.
+3️⃣ **From Million-Node Graphs to Hard LeetCode**.
+Evaluated on multi-domain graph reasoning benchmarks, our 4B model achieves a 53.06% average performance boost, surpasses state-of-the-art baselines such as DeepSeek-V3.2, and generalizes to unseen tasks. It also shows strong potential for sampling on million-node graphs and for solving Hard-level LeetCode graph problems.
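As a rough illustration of the probing-plus-solving idea described in the takeaways above, the following minimal sketch runs a breadth-first search that learns the graph only through on-demand neighbor lookups. Everything in it is an assumption made for illustration: the toy graph, the `neighbor_retrieval` function, and the `shortest_path_length` solver are hypothetical names and are not GRASP's actual tool interface (see the repository example for that).

```python
from collections import deque

# Toy graph held by the environment; the solver never reads it directly.
# This adjacency list and every name below are hypothetical.
_HIDDEN_GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

probe_log = []  # records which nodes were probed, in order


def neighbor_retrieval(node):
    """Hypothetical probing tool: return the neighbors of one node on demand."""
    probe_log.append(node)
    return list(_HIDDEN_GRAPH.get(node, []))


def shortest_path_length(source, target):
    """Deterministic solver: breadth-first search that discovers the graph
    only through neighbor_retrieval calls. Returns -1 if unreachable."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in neighbor_retrieval(node):
            if nxt == target:
                return dist + 1
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return -1


print(shortest_path_length("A", "E"))  # 3
print(probe_log)                       # ['A', 'B', 'C', 'D']
```

The point of the sketch is only that the solver touches four nodes rather than receiving the whole edge list up front; how the trained agent decides what to probe is not modeled here.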
+## 🌊 Evaluation on Graph Reasoning Benchmarks
+| Model             | Arxiv     | PubMed    | Products  | WikiCS    | fb15k237  | wn18rr    | TSG-Bench | ExplaGraphs | Erdős     | RealErdős | Average   |
+|-------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-------------|-----------|-----------|-----------|
+| Qwen3-4B-Thinking | 51.00     | 25.00     | 21.00     | 29.00     | 16.00     | 13.00     | 62.00     | 45.00       | 38.80     | 7.11      | 30.79     |
+| GPT-4o            | 52.00     | 43.00     | 72.00     | 24.00     | 52.00     | 24.00     | 72.00     | 77.00       | 40.60     | 18.07     | 47.46     |
+| DeepSeek-V3.2     | 65.00     | 47.00     | 70.00     | 79.00     | 65.00     | 26.00     | **88.00** | **99.00**   | 83.60     | 66.44     | 68.90     |
+| GRASP-base-4B     | **69.00** | **91.00** | **78.00** | **88.00** | **86.00** | **68.00** | 85.00     | 95.00       | **89.40** | **86.22** | **83.56** |
+## Quickstart
+Qwen3 support is included in recent releases of Hugging Face `transformers`, and we advise using the latest version.
+With `transformers<4.51.0`, you will encounter the following error:
+```
+KeyError: 'qwen3'
+```
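One way to catch this before loading the model is to compare the installed version against the 4.51.0 threshold stated above. The sketch below is a minimal, standard-library-only check; the helper names `parse_version` and `supports_qwen3` are hypothetical, and the parser is a simplification that ignores pre-release suffixes such as `rc1` or `.dev0`.

```python
import re
from importlib.metadata import PackageNotFoundError, version as installed_version


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple,
    padded to at least three entries, e.g. "4.51.0.dev0" -> (4, 51, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_qwen3(version, minimum="4.51.0"):
    """True if `version` is at least `minimum` (hypothetical helper)."""
    return parse_version(version) >= parse_version(minimum)


def check_installed():
    """Report whether the installed transformers release meets the threshold."""
    try:
        current = installed_version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    status = "OK" if supports_qwen3(current) else "too old for qwen3"
    return f"transformers {current}: {status}"


print(check_installed())
```

If the check reports an older release, upgrading with `pip install -U transformers` should resolve the `KeyError`.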
+The following code snippet illustrates how to use the model to generate content from given inputs.
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "PKU-ML/GRASP-base-4B"
+# load the tokenizer and the model
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto"
+)
+# prepare the model input
+prompt = "Give me a short introduction to large language model."
+messages = [
+    {"role": "user", "content": prompt}
+]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+# conduct text completion
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=8192
+)
+output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
+# parsing thinking content
+try:
+    # find the position just past the last occurrence of token 151668 (</think>)
+    index = len(output_ids) - output_ids[::-1].index(151668)
+except ValueError:
+    index = 0
+thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
+content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
+print("thinking content:", thinking_content) # no opening <think> tag
+print("content:", content)
+```
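The thinking/answer split in the snippet above can be isolated into a small function and checked without loading the model. This is a minimal sketch: `split_thinking` is a hypothetical helper name, the token id 151668 is taken from the snippet, and the sample ids are arbitrary stand-ins for real model output.

```python
END_THINK_ID = 151668  # token id of </think>, as used in the snippet above


def split_thinking(output_ids, end_think_id=END_THINK_ID):
    """Split generated token ids at the last </think> token.

    Returns (thinking_ids, answer_ids). The </think> token itself stays at
    the end of thinking_ids, matching the slicing in the snippet above.
    If no </think> token is present, everything is treated as the answer.
    """
    try:
        # Position just past the last occurrence of end_think_id.
        index = len(output_ids) - output_ids[::-1].index(end_think_id)
    except ValueError:
        index = 0
    return output_ids[:index], output_ids[index:]


# Toy ids standing in for real model output (values are arbitrary).
thinking_ids, answer_ids = split_thinking([11, 12, END_THINK_ID, 21, 22])
print(thinking_ids)  # [11, 12, 151668]
print(answer_ids)    # [21, 22]
```

Each half can then be passed to `tokenizer.decode(..., skip_special_tokens=True)` as in the snippet.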
+## Agentic Use
+For the specific tool configuration and agentic usage of GRASP, please refer to our [example](https://github.com/PKU-ML/GRASP/blob/main/evaluation/example.py) on GitHub.
+## Citation
+If you find our work helpful, please consider citing it.
+```
+```