DeBERTa-v3-large PN (Process Necessity) Step Scorer (3-class)
A step-level Process Necessity Score (PN) classifier, fine-tuned from
microsoft/deberta-v3-large. It takes a reasoning prefix + a candidate
reasoning step and predicts one of three PN bins:
| class_id | class_name | PN value |
|---|---|---|
| 0 | PN_low | 0.0 |
| 1 | PN_mid | 0.5 |
| 2 | PN_high | 1.0 |
These scores can be used to redistribute the trajectory-level reward in
GRPO / PPO training for mathematical reasoning LLMs
(see verl fork,
branch feat/pns-reward-redistribution).
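This card does not spell out the redistribution rule, so the following is only a minimal sketch of one plausible scheme: a scalar trajectory reward is split across steps in proportion to their PN values. The function name `redistribute_reward` and the uniform fallback when every PN value is zero are assumptions made for illustration, not the behaviour of the verl fork.

```python
from typing import List


def redistribute_reward(trajectory_reward: float, pn_scores: List[float]) -> List[float]:
    """Split one trajectory-level reward across steps, weighted by PN.

    Hypothetical scheme: each step receives a share proportional to its PN value.
    Assumption: if all PN values are zero, the reward is split uniformly.
    """
    if not pn_scores:
        return []
    total = sum(pn_scores)
    if total == 0:
        weights = [1.0 / len(pn_scores)] * len(pn_scores)
    else:
        weights = [pn / total for pn in pn_scores]
    return [trajectory_reward * w for w in weights]


# Four steps scored PN_high, PN_low, PN_mid, PN_mid.
step_rewards = redistribute_reward(1.0, [1.0, 0.0, 0.5, 0.5])
print(step_rewards)  # [0.5, 0.0, 0.25, 0.25]
```

The per-step rewards sum to the original trajectory reward, so this particular scheme changes only how credit is assigned within a trajectory, not its total.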
Training data
Chain-of-thought rollouts on GSM8K + MATH with k=5 intervention
rollouts per step, PN labelled by downstream success-rate change.
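The exact labelling rule is not given here. As a rough sketch of how a success-rate change could be turned into three bins, the code below compares a baseline success rate with the success rate over the k=5 intervention rollouts for one step. The function names, the direction of the comparison (a larger drop means a more necessary step), and the thresholds 0.3 and 0.7 are all hypothetical.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[int]) -> float:
    """Fraction of rollouts that reached a correct final answer (1 = success)."""
    return sum(outcomes) / len(outcomes)


def pn_bin(base_rate: float, intervened_outcomes: Sequence[int],
           low_thr: float = 0.3, high_thr: float = 0.7) -> int:
    """Map the drop in success rate after intervening on a step to a class_id.

    Assumption: thresholds are illustrative, not the ones used for this model.
    Returns 0 (PN_low), 1 (PN_mid) or 2 (PN_high).
    """
    drop = base_rate - success_rate(intervened_outcomes)
    if drop < low_thr:
        return 0
    if drop < high_thr:
        return 1
    return 2


# k=5 intervention rollouts for one step: 3 of 5 still succeed.
label = pn_bin(base_rate=1.0, intervened_outcomes=[1, 1, 1, 0, 0])
print(label)  # 1
```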
Validation metrics (best epoch)
| metric | value |
|---|---|
| epoch | 5 |
| val_acc | 0.7131 |
| val_loss | 0.8059 |
| macro_recall | 0.5560 |
| focal_gamma | 0.0 |
Per-class recall:
| class | correct / total | recall |
|---|---|---|
| PN_low | 907 / 1142 | 79% |
| PN_mid | 39 / 121 | 32% |
| PN_high | 155 / 281 | 55% |
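The aggregate metrics can be re-derived from the per-class counts above. The short check below uses only the numbers in the table; the variable names are illustrative.

```python
# (correct, total) per class, copied from the per-class recall table.
counts = {
    "PN_low": (907, 1142),
    "PN_mid": (39, 121),
    "PN_high": (155, 281),
}

# Recall per class, then the unweighted mean across classes.
recall = {name: correct / total for name, (correct, total) in counts.items()}
macro_recall = sum(recall.values()) / len(recall)

# Overall accuracy pools all examples, so large classes dominate.
accuracy = sum(c for c, _ in counts.values()) / sum(t for _, t in counts.values())

print(f"macro_recall={macro_recall:.4f} accuracy={accuracy:.4f}")
# macro_recall=0.5560 accuracy=0.7131
```

PN_low accounts for 1142 of the 1544 validation examples, so accuracy is dominated by that class, while macro recall weights the three classes equally.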
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL_ID = "Thedatababbler/deberta-v3-large-pn-scorer-3class"
# Use the GPU when one is available; otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).eval().to(device)

PN_VALUES = [0.0, 0.5, 1.0]  # class_id -> PN
CLASS_NAMES = ["PN_low", "PN_mid", "PN_high"]

# The input is the reasoning prefix followed by the candidate step.
prefix = "Q: ... A:\nStep 1: First, we compute 3+4 = 7.\nStep 2: "
step = "Then multiply by 2 to get 14."
text = prefix + step

batch = tokenizer(text, truncation=True, max_length=512, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**batch).logits
probs = logits.softmax(-1)[0]  # class probabilities for the single input
pred_idx = int(probs.argmax())
print(CLASS_NAMES[pred_idx], PN_VALUES[pred_idx], probs.tolist())
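The snippet above takes the argmax class. If a continuous score is more convenient for reward shaping, one option is the probability-weighted PN value. This is a sketch of that alternative, not something the card prescribes; `expected_pn` is a hypothetical helper that takes the list produced by `probs.tolist()`.

```python
from typing import Sequence

PN_VALUES = [0.0, 0.5, 1.0]  # class_id -> PN, as in the usage snippet


def expected_pn(probs: Sequence[float]) -> float:
    """Probability-weighted PN score in [0, 1] (illustrative soft alternative to argmax)."""
    if len(probs) != len(PN_VALUES):
        raise ValueError("expected one probability per PN class")
    return sum(p * v for p, v in zip(probs, PN_VALUES))


# Example with made-up probabilities for (PN_low, PN_mid, PN_high).
score = expected_pn([0.2, 0.3, 0.5])
print(round(score, 2))  # 0.65
```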
Intended use
Step-level reward shaping for reasoning RL. Not a safety classifier; not for general NLU.
Citation / provenance
Project: PNS reward redistribution on top of
verl-project/verl.
Notes on classes
config.json keeps generic LABEL_0/1/2 for framework compatibility.
The semantic labels are:
LABEL_0 = PN_low (PN = 0.0)
LABEL_1 = PN_mid (PN = 0.5)
LABEL_2 = PN_high (PN = 1.0)
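Because the config only exposes generic names, downstream code has to apply the mapping itself. A minimal sketch follows, assuming predictions arrive as dicts with a `label` key, which is the shape returned by the transformers text-classification pipeline. `decode_label` and the sample prediction are illustrative.

```python
from typing import Dict, Tuple

# Generic label from config.json -> (semantic name, PN value), per the list above.
PN_LABELS: Dict[str, Tuple[str, float]] = {
    "LABEL_0": ("PN_low", 0.0),
    "LABEL_1": ("PN_mid", 0.5),
    "LABEL_2": ("PN_high", 1.0),
}


def decode_label(generic_label: str) -> Tuple[str, float]:
    """Translate a generic LABEL_n string into its PN class name and value."""
    return PN_LABELS[generic_label]


# Hand-written stand-in for one pipeline prediction (the score is made up).
prediction = {"label": "LABEL_2", "score": 0.61}
name, pn_value = decode_label(prediction["label"])
print(name, pn_value)  # PN_high 1.0
```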