TopoPRM Checkpoints
LoRA adapters trained with TopoPRM (Topology-Aware Process Rewards) for mathematical reasoning.
Paper: EMNLP 2026 (ARR May cycle) Code: https://github.com/RWLinno/TopoPRM
Available Checkpoints
| Subdirectory | Base Model | Method | Steps | Key Results |
|---|---|---|---|---|
sft-dr1-7b-final |
DeepSeek-R1-Distill-Qwen-7B | SFT | 3651 | GSM8K 83.5% baseline |
grpo-topoprm-dr1-7b |
DeepSeek-R1-Distill-Qwen-7B | GRPO+TopoPRM | 100 | Hierarchical reward |
grpo-topoprm-qwen35-9b |
Qwen3.5-9B | GRPO+TopoPRM | 50 | GSM8K 93.5%, MATH500 49.8% |
opd-topoprm-dr1-7b-v2 |
DeepSeek-R1-Distill-Qwen-7B | OPD Stage3 | 200 | MATH500 60.8%, Omni-MATH 56.9% |
opd-topoprm-qwen35-9b-v2 |
Qwen3.5-9B | OPD Stage3 | 50 | Distillation |
grpo-scae-qwen35-9b |
Qwen3.5-9B | GRPO+SCAE | 949 | SCAE variant |
Common LoRA Config
All adapters share:
- r: 64
- alpha: 128
- dropout: 0.05
- target_modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- task_type: CAUSAL_LM
Usage
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base_model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
model = PeftModel.from_pretrained(base_model, "rwlinno/topoprm-ckpts", subfolder="grpo-topoprm-dr1-7b")
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
Citation
@inproceedings{topoprm2026,
title={Topology-Aware Process Rewards for Verifiable Mathematical Reasoning},
author={Weilin Ruan},
booktitle={Proceedings of EMNLP 2026},
year={2026}
}
Model tree for rwlinno/topoprm-ckpts
Base model
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B