TopoPRM Checkpoints

LoRA adapters trained with TopoPRM (Topology-Aware Process Rewards) for mathematical reasoning.

Paper: EMNLP 2026 (ARR May cycle)
Code: https://github.com/RWLinno/TopoPRM

Available Checkpoints

| Subdirectory | Base Model | Method | Steps | Key Results |
|---|---|---|---|---|
| sft-dr1-7b-final | DeepSeek-R1-Distill-Qwen-7B | SFT | 3651 | GSM8K 83.5% (baseline) |
| grpo-topoprm-dr1-7b | DeepSeek-R1-Distill-Qwen-7B | GRPO+TopoPRM | 100 | Hierarchical reward |
| grpo-topoprm-qwen35-9b | Qwen3.5-9B | GRPO+TopoPRM | 50 | GSM8K 93.5%, MATH500 49.8% |
| opd-topoprm-dr1-7b-v2 | DeepSeek-R1-Distill-Qwen-7B | OPD Stage3 | 200 | MATH500 60.8%, Omni-MATH 56.9% |
| opd-topoprm-qwen35-9b-v2 | Qwen3.5-9B | OPD Stage3 | 50 | Distillation |
| grpo-scae-qwen35-9b | Qwen3.5-9B | GRPO+SCAE | 949 | SCAE variant |

Common LoRA Config

All adapters share:

  • r: 64
  • alpha: 128
  • dropout: 0.05
  • target_modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • task_type: CAUSAL_LM
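The shared settings above can be written out as a PEFT configuration. This is a minimal sketch, not the authors' training script: it assumes that "alpha" and "dropout" correspond to PEFT's `lora_alpha` and `lora_dropout` fields, and that the default LoRA scaling (`lora_alpha / r`) was used. The names `LORA_KWARGS` and `build_lora_config` are hypothetical.

```python
# Values copied from the list above. The key names follow peft.LoraConfig;
# mapping "alpha" -> lora_alpha and "dropout" -> lora_dropout is an assumption.
LORA_KWARGS = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}

# Standard LoRA multiplies the low-rank update by lora_alpha / r
# (assuming the default scaling rather than a rank-stabilized variant).
scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def build_lora_config():
    """Build a peft.LoraConfig from the shared settings.

    peft is imported lazily so the plain dict stays usable without it.
    """
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)
```

With these values the scaling factor works out to 128 / 64 = 2.0, and the adapter touches all attention and MLP projection matrices.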

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# The base model must match the "Base Model" listed for the chosen adapter.
base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
base_model = AutoModelForCausalLM.from_pretrained(base_id)
# Each adapter lives in its own subfolder of the checkpoint repo.
model = PeftModel.from_pretrained(base_model, "rwlinno/topoprm-ckpts", subfolder="grpo-topoprm-dr1-7b")
tokenizer = AutoTokenizer.from_pretrained(base_id)
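
Because the adapters target two different base models, loading one on the wrong base will fail or give meaningless outputs. The sketch below is one possible way to look up the right base for a subfolder. The helper `resolve_base_id` and both dictionaries are hypothetical. Only the DeepSeek hub ID appears in this card; the hub ID for Qwen3.5-9B is not stated here, so the helper requires it to be supplied explicitly instead of guessing.

```python
# Subfolder -> base model name, copied from the checkpoint table.
ADAPTER_BASE = {
    "sft-dr1-7b-final": "DeepSeek-R1-Distill-Qwen-7B",
    "grpo-topoprm-dr1-7b": "DeepSeek-R1-Distill-Qwen-7B",
    "grpo-topoprm-qwen35-9b": "Qwen3.5-9B",
    "opd-topoprm-dr1-7b-v2": "DeepSeek-R1-Distill-Qwen-7B",
    "opd-topoprm-qwen35-9b-v2": "Qwen3.5-9B",
    "grpo-scae-qwen35-9b": "Qwen3.5-9B",
}

# Hub IDs that this card states explicitly. The Qwen3.5-9B ID is
# intentionally absent because the card does not give it.
KNOWN_HUB_IDS = {
    "DeepSeek-R1-Distill-Qwen-7B": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
}


def resolve_base_id(subfolder, extra_hub_ids=None):
    """Return the hub ID of the base model for an adapter subfolder.

    Raises KeyError if the subfolder is unknown or if no hub ID has been
    provided for its base model.
    """
    hub_ids = dict(KNOWN_HUB_IDS)
    if extra_hub_ids:
        hub_ids.update(extra_hub_ids)
    base_name = ADAPTER_BASE[subfolder]
    if base_name not in hub_ids:
        raise KeyError(f"No hub ID provided for base model {base_name!r}")
    return hub_ids[base_name]
```

The returned string can be passed to `AutoModelForCausalLM.from_pretrained` and `AutoTokenizer.from_pretrained` in place of the hard-coded base model ID, with the same subfolder given to `PeftModel.from_pretrained`.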

Citation

@inproceedings{topoprm2026,
  title={Topology-Aware Process Rewards for Verifiable Mathematical Reasoning},
  author={Weilin Ruan},
  booktitle={Proceedings of EMNLP 2026},
  year={2026}
}