HuggingFaceH4/Qwen2.5-Math-7B-Instruct-PRM-0.2

Token Classification

Generated from Trainer

text-generation-inference


plaguss committed on Jan 9, 2025

Commit db98287 · verified · 1 Parent(s): f1c76f6

Update README.md

Files changed (1)

README.md +28 -4


@@ -20,10 +20,34 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
 ```python
 from transformers import pipeline
-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="plaguss/Qwen2.5-Math-7B-Instruct-PRM-0.2", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
+pipe = pipeline("token-classification", model="HuggingFaceH4/Qwen2.5-Math-7B-Instruct-PRM-0.2", device="cuda")
+example = {
+    "prompt": "Let $a,$ $b,$ and $c$ be positive real numbers.  Find the set of all possible values of\n\\[\\frac{c}{a} + \\frac{a}{b + c} + \\frac{b}{c}.\\]",
+    "completions": [
+        "This problem involves finding the range of an expression involving three variables.",
+        "One possible strategy is to try to eliminate some variables and write the expression in terms of one variable only.",
+        "To do this, I might look for some common factors or symmetries in the expression.",
+        "I notice that the first and last terms have $c$ in the denominator, so I can factor out $c$ from the whole expression and get\n\\[\\frac{1}{c}\\left(c + \\frac{a^2}{b + c} + b\\right).\\]"
+    ],
+    "labels": [True, True, True, False],
+}
+separator = "\n\n"  # It's important to use the same separator as the one used during training
+for idx in range(1, len(example["completions"]) + 1):
+    steps = example["completions"][0:idx]
+    text = separator.join((example["prompt"], *steps)) + separator  # Join the prompt and each step with the separator, and end with a trailing separator
+    pred_entity = pipe(text)[-1]["entity"]
+    pred = {"LABEL_0": False, "LABEL_1": True}[pred_entity]
+    label = example["labels"][idx - 1]
+    print(f"Step {idx}\tPredicted: {pred} \tLabel: {label}")
+# Step 1  Predicted: True         Label: True
+# Step 2  Predicted: True         Label: True
+# Step 3  Predicted: True         Label: True
+# Step 4  Predicted: False        Label: False
 ```
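
The loop in the example above builds one input per step: the prompt plus every step so far, joined by the separator and ending with a trailing separator, and then reads the label of the last predicted token. The following is a minimal sketch of that input construction on its own, so it can be checked without loading the model. The helper names `build_prm_inputs` and `decode_last_entity` are hypothetical and are not part of TRL or Transformers; the `LABEL_0`/`LABEL_1` mapping is copied from the example above.

```python
# Hypothetical helpers (not part of TRL or Transformers) that mirror the
# input construction and label decoding used in the README example.

LABEL_TO_BOOL = {"LABEL_0": False, "LABEL_1": True}  # mapping taken from the README example


def build_prm_inputs(prompt, steps, separator="\n\n"):
    """Return one text per step: prompt + steps[0..i], each followed by the separator."""
    return [
        separator.join((prompt, *steps[:idx])) + separator
        for idx in range(1, len(steps) + 1)
    ]


def decode_last_entity(pipe_output):
    """Map the entity of the last predicted token to a boolean step label."""
    return LABEL_TO_BOOL[pipe_output[-1]["entity"]]


texts = build_prm_inputs("Q", ["s1", "s2"])
for text in texts:
    print(repr(text))
```

Each returned string can then be passed to the token-classification pipeline as `pipe(text)`, and `decode_last_entity` applied to the result, which is what the loop in the README does inline.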
 ## Training procedure