UKPLab/Qwen2.5-3b-spare-prm-math
Text Generation • 3B • Updated • 7
Process Reward Models (PRMs) trained using the Single-Pass Annotation with Reference-Guided Evaluation (SPARE) methodology proposed in our AAAI-2026 paper.