TopoPRM Checkpoints

LoRA adapters trained with TopoPRM (Topology-Aware Process Rewards) for mathematical reasoning.

Paper: EMNLP 2026 (ARR May cycle)
Code: https://github.com/RWLinno/TopoPRM

Available Checkpoints

Subdirectory	Base Model	Method	Steps	Key Results
`sft-dr1-7b-final`	DeepSeek-R1-Distill-Qwen-7B	SFT	3651	GSM8K 83.5% baseline
`grpo-topoprm-dr1-7b`	DeepSeek-R1-Distill-Qwen-7B	GRPO+TopoPRM	100	Hierarchical reward
`grpo-topoprm-qwen35-9b`	Qwen3.5-9B	GRPO+TopoPRM	50	GSM8K 93.5%, MATH500 49.8%
`opd-topoprm-dr1-7b-v2`	DeepSeek-R1-Distill-Qwen-7B	OPD Stage3	200	MATH500 60.8%, Omni-MATH 56.9%
`opd-topoprm-qwen35-9b-v2`	Qwen3.5-9B	OPD Stage3	50	Distillation
`grpo-scae-qwen35-9b`	Qwen3.5-9B	GRPO+SCAE	949	SCAE variant

Common LoRA Config

All adapters share the same LoRA hyperparameters:

r: 64
alpha: 128
dropout: 0.05
target_modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
task_type: CAUSAL_LM
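
As a minimal sketch, the shared settings above can be written as keyword arguments for `peft.LoraConfig`. Mapping `alpha` to `lora_alpha` and `dropout` to `lora_dropout` is an assumption based on the usual PEFT field names; the names `LORA_KWARGS`, `scaling`, and `build_lora_config` are hypothetical and do not come from the TopoPRM code.

```python
# Shared adapter settings from the list above, keyed by the usual
# peft.LoraConfig argument names (assumed mapping, not the authors' file).
LORA_KWARGS = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}

# Under standard LoRA scaling, the low-rank update is multiplied by lora_alpha / r.
scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def build_lora_config():
    """Build a peft.LoraConfig; peft is imported lazily so the dict works without it."""
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)
```

With these values the update scale is 128 / 64 = 2, and the adapters cover all attention and MLP projection layers.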

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# The base model must match the "Base Model" column for the chosen subdirectory.
base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)

# Attach the LoRA adapter stored in the matching subfolder of the checkpoint repo.
model = PeftModel.from_pretrained(base_model, "rwlinno/topoprm-ckpts", subfolder="grpo-topoprm-dr1-7b")
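
Because each adapter only works with its own base model, a small lookup can guard against mismatches. The sketch below copies the subdirectory-to-base pairing from the checkpoint table; the helper names (`BASE_MODEL`, `HUB_ID`, `resolve_base`, `load_adapter`) are hypothetical. Only the DeepSeek Hub ID appears in this card, so the Qwen3.5-9B Hub ID is left unspecified and must be supplied by the caller.

```python
# Subdirectory -> base model name, copied from the checkpoint table.
BASE_MODEL = {
    "sft-dr1-7b-final": "DeepSeek-R1-Distill-Qwen-7B",
    "grpo-topoprm-dr1-7b": "DeepSeek-R1-Distill-Qwen-7B",
    "grpo-topoprm-qwen35-9b": "Qwen3.5-9B",
    "opd-topoprm-dr1-7b-v2": "DeepSeek-R1-Distill-Qwen-7B",
    "opd-topoprm-qwen35-9b-v2": "Qwen3.5-9B",
    "grpo-scae-qwen35-9b": "Qwen3.5-9B",
}

# Only the Hub ID shown in the usage snippet is filled in here.
HUB_ID = {"DeepSeek-R1-Distill-Qwen-7B": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"}


def resolve_base(subfolder, base_id=None):
    """Return the Hub ID of the base model for an adapter subfolder."""
    if subfolder not in BASE_MODEL:
        raise KeyError(f"unknown checkpoint subfolder: {subfolder}")
    name = BASE_MODEL[subfolder]
    if base_id is None:
        base_id = HUB_ID.get(name)
    if base_id is None:
        raise ValueError(f"pass base_id explicitly for base model {name}")
    return base_id


def load_adapter(subfolder, base_id=None, repo="rwlinno/topoprm-ckpts"):
    """Load base model, tokenizer, and adapter (downloads weights on first call)."""
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_id = resolve_base(subfolder, base_id)
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, repo, subfolder=subfolder)
    return model, tokenizer
```

For example, `load_adapter("opd-topoprm-dr1-7b-v2")` would resolve the DeepSeek base automatically, while the Qwen3.5-9B adapters need an explicit `base_id` argument.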

Citation

@inproceedings{topoprm2026,
  title={Topology-Aware Process Rewards for Verifiable Mathematical Reasoning},
  author={Weilin Ruan},
  booktitle={Proceedings of EMNLP 2026},
  year={2026}
}
