Tags: Text Classification · Transformers · Safetensors · English · Chinese · qwen2 · feature-extraction · reward model · custom_code · text-embeddings-inference
Instructions to use Qwen/Qwen2.5-Math-PRM-7B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Qwen/Qwen2.5-Math-PRM-7B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="Qwen/Qwen2.5-Math-PRM-7B", trust_remote_code=True)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Math-PRM-7B", trust_remote_code=True)
model = AutoModel.from_pretrained("Qwen/Qwen2.5-Math-PRM-7B", trust_remote_code=True)
```

- Notebooks
- Google Colab
- Kaggle
If the response length exceeds 4096, is a sliding window used, or is it simply truncated?
#6
by ShelterW - opened
```python
step_reward = make_step_rewards(logits, token_masks)
product_step_reward = 1.0
for reward in step_reward:
    product_step_reward *= reward
```
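`make_step_rewards` is not defined anywhere in this thread. As an assumption based on how PRM-style heads are typically read out, the minimal pure-Python sketch below shows one plausible implementation: apply a two-class softmax to the logits, then keep the positive-class probability at each position where the step-separator mask is set, per sequence. All names and shapes here are hypothetical, not the model's confirmed API.

```python
import math

def softmax2(pair):
    # Numerically stable softmax over a two-class logit pair.
    m = max(pair)
    exps = [math.exp(x - m) for x in pair]
    s = sum(exps)
    return [e / s for e in exps]

def make_step_rewards(logits, token_masks):
    """Hypothetical sketch: for each sequence, return the positive-class
    probability at every step-separator position (mask value is True).

    logits:      [batch][seq_len][2] two-class logits
    token_masks: [batch][seq_len] booleans marking step separators
    """
    rewards = []
    for seq_logits, seq_mask in zip(logits, token_masks):
        seq_rewards = [softmax2(l)[1]
                       for l, m in zip(seq_logits, seq_mask) if m]
        rewards.append(seq_rewards)
    return rewards
```

Note that under this reading the function returns one list of step scores *per sequence*, so the product loop above would be taken over a single sequence's score list.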
According to the paper, the score of each candidate response is the product of the scores of its individual steps. How, then, should responses with fewer steps be weighed against responses with more steps for the same question?
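To make the length-bias concern concrete: since each step score is at most 1, a product over more steps tends to be smaller even when every step is good. The sketch below illustrates this, and shows a geometric mean as one *possible* length normalization; the paper does not prescribe this, so treat it purely as an illustrative assumption.

```python
import math

# Hypothetical step scores for two responses to the same question.
step_rewards_short = [0.9, 0.8]            # 2-step response
step_rewards_long = [0.9, 0.9, 0.9, 0.9]   # 4-step response, every step strong

# Product aggregation (as in the paper): more steps -> smaller product.
prod_short = math.prod(step_rewards_short)   # 0.72
prod_long = math.prod(step_rewards_long)     # 0.9**4 = 0.6561 < 0.72

# One possible length normalization (an assumption, not the paper's method):
# the geometric mean divides out the number of steps.
geo_short = prod_short ** (1 / len(step_rewards_short))   # ~0.849
geo_long = prod_long ** (1 / len(step_rewards_long))      # 0.9
```

Under the geometric mean the longer all-strong response scores higher, whereas the raw product ranks it lower.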
How should PRM@8 be combined with QwQ to obtain the best performance?
Same problem here. The sliding window is not enabled in the config, yet when I fed in an input of about 22k tokens there were no errors or warnings.