# Qwen2.5 Math PRM Student (1.5B)

A custom pairwise reward model head (2 logits) on top of Qwen/Qwen2.5-Math-1.5B.
Pooling is done at the last occurrence of the `</think>` token.
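The pooling rule above can be illustrated with a small, dependency-free sketch. It is not the code shipped in the repo: the helper names `last_token_index` and `pool_at_marker` are hypothetical, the id `99` is a stand-in for whatever id the tokenizer assigns to `</think>`, and the sketch assumes `</think>` maps to a single token id.

```python
from typing import List, Sequence


def last_token_index(input_ids: Sequence[int], marker_id: int) -> int:
    """Return the index of the last occurrence of marker_id, or -1 if absent."""
    for i in range(len(input_ids) - 1, -1, -1):
        if input_ids[i] == marker_id:
            return i
    return -1


def pool_at_marker(
    hidden_states: Sequence[Sequence[float]],
    input_ids: Sequence[int],
    marker_id: int,
) -> List[float]:
    """Pick the hidden-state vector at the last marker position (hypothetical helper)."""
    idx = last_token_index(input_ids, marker_id)
    if idx < 0:
        raise ValueError("marker token not found in input_ids")
    return list(hidden_states[idx])


# Toy data: 99 is a made-up id standing in for `</think>`.
ids = [5, 99, 7, 99, 3]
states = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1], [4.0, 4.1]]
pooled = pool_at_marker(states, ids, marker_id=99)
print(pooled)  # [3.0, 3.1], the vector at index 3 (the last 99)
```

In a batched setting the same idea applies per sequence; an input with no `</think>` token has no pooling position, which is why the sketch raises an error rather than guessing one.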
## Usage

```python
from transformers import AutoTokenizer, AutoModel

repo = 'omrisap/Qwen2.5-Math-PRM-1.5B'
# trust_remote_code=True lets transformers load the custom head code from the repo.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModel.from_pretrained(repo, trust_remote_code=True)
# The model's output logits have shape (batch, 2).
```
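This card does not say what each of the two logits means, so the following is only a sketch of one common way to read a `(batch, 2)` output: normalize each row with a softmax to get two probabilities that sum to 1. The `softmax` helper and the example values are illustrative assumptions, not part of the repo.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for a single example (one row of a (batch, 2) output).
row = [0.0, 0.0]
probs = softmax(row)
print(probs)  # [0.5, 0.5], equal logits give equal probabilities
```

Which index corresponds to the preferred or correct outcome is an assumption to verify against the repo's modeling code before using the probabilities.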