DeBERTa-v3-large PN (Process Necessity) Step Scorer (3-class)

A step-level Process Necessity (PN) score classifier, fine-tuned from microsoft/deberta-v3-large. It takes a reasoning prefix followed by a candidate reasoning step, concatenated into a single text, and predicts one of three PN bins:

class_id	class_name	PN value
0	`PN_low`	0.0
1	`PN_mid`	0.5
2	`PN_high`	1.0

These scores can be used to redistribute the trajectory-level reward in GRPO / PPO training for mathematical reasoning LLMs (see verl fork, branch feat/pns-reward-redistribution).
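As one illustration of how per-step PN values could redistribute a trajectory-level reward, the sketch below splits a scalar reward across steps in proportion to their PN. This is a hypothetical scheme and not necessarily what the verl fork implements; the function name `redistribute_reward` and the uniform fallback when every PN value is zero are assumptions made for the example.

```python
from typing import List


def redistribute_reward(trajectory_reward: float, pn_scores: List[float]) -> List[float]:
    """Split one trajectory-level reward across steps in proportion to PN.

    Hypothetical helper: proportional weighting is an assumption, not the
    documented behaviour of the verl fork.
    """
    if not pn_scores:
        return []
    total = sum(pn_scores)
    if total == 0:
        # Assumed fallback: no step is rated necessary, so share evenly.
        share = trajectory_reward / len(pn_scores)
        return [share] * len(pn_scores)
    return [trajectory_reward * pn / total for pn in pn_scores]


# Four steps rated high, low, mid, mid.
step_rewards = redistribute_reward(1.0, [1.0, 0.0, 0.5, 0.5])
print(step_rewards)
```

Under this scheme the per-step rewards always sum to the original trajectory reward, so the total learning signal is preserved and only its distribution over steps changes.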

Training data

Chain-of-thought rollouts on GSM8K and MATH, with k=5 intervention rollouts per step. Each step's PN label is derived from the change in downstream success rate.
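The labelling idea can be sketched as follows: compare how often rollouts succeed with the original step against how often they succeed after the intervention, then bin the drop into one of the three PN values. The exact intervention and bin thresholds are not given here, so `pn_bin_from_rollouts` and the 0.3 / 0.7 cut-offs are purely illustrative assumptions.

```python
def pn_bin_from_rollouts(
    baseline_successes: int,
    intervened_successes: int,
    k: int = 5,
    mid_threshold: float = 0.3,   # assumed cut-off, not from the model card
    high_threshold: float = 0.7,  # assumed cut-off, not from the model card
) -> float:
    """Map a success-rate drop over k rollouts to a PN bin (0.0, 0.5 or 1.0)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Drop in success rate caused by intervening on the step.
    drop = (baseline_successes - intervened_successes) / k
    if drop >= high_threshold:
        return 1.0  # PN_high: success collapses without the step
    if drop >= mid_threshold:
        return 0.5  # PN_mid: success is noticeably reduced
    return 0.0      # PN_low: little or no effect (or an improvement)


print(pn_bin_from_rollouts(baseline_successes=5, intervened_successes=1))
```

With k=5 the drop can only take values in steps of 0.2, which is why the assumed thresholds sit between those steps rather than on them.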

Validation metrics (best epoch)

metric	value
epoch	5
val_acc	0.7131
val_loss	0.8059
macro_recall	0.5560
focal_gamma	0.0

Per-class recall:

class	correct / total	recall
PN_low	907 / 1142	79%
PN_mid	39 / 121	32%
PN_high	155 / 281	55%
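The aggregate metrics can be recomputed from the per-class counts as a consistency check. The snippet below uses only the numbers in the table above; accuracy is total correct over total examples, and macro recall is the unweighted mean of the three per-class recalls.

```python
# (correct, total) per class, copied from the per-class recall table.
counts = {
    "PN_low": (907, 1142),
    "PN_mid": (39, 121),
    "PN_high": (155, 281),
}

recalls = {name: correct / total for name, (correct, total) in counts.items()}
macro_recall = sum(recalls.values()) / len(recalls)

total_correct = sum(correct for correct, _ in counts.values())
total_examples = sum(total for _, total in counts.values())
accuracy = total_correct / total_examples

print(f"accuracy={accuracy:.4f} macro_recall={macro_recall:.4f}")
# accuracy=0.7131 macro_recall=0.5560
```

The gap between accuracy and macro recall reflects the class imbalance: PN_low makes up 1142 of the 1544 validation examples, so it dominates accuracy while PN_mid's low recall pulls the macro average down.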

Usage

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "Thedatababbler/deberta-v3-large-pn-scorer-3class"

# Use the GPU when one is available; otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).to(device).eval()

PN_VALUES = [0.0, 0.5, 1.0]  # class_id -> PN
CLASS_NAMES = ["PN_low", "PN_mid", "PN_high"]

# The input is the reasoning prefix followed directly by the step to score.
prefix = "Q: ... A:\nStep 1: First, we compute 3+4 = 7.\nStep 2: "
step = "Then multiply by 2 to get 14."
text = prefix + step

batch = tokenizer(text, truncation=True, max_length=512, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**batch).logits
probs = logits.softmax(-1)[0]  # one probability per PN bin
pred_idx = int(probs.argmax())
print(CLASS_NAMES[pred_idx], PN_VALUES[pred_idx], probs.tolist())
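Because the three bins map to numeric PN values, a soft score can also be computed as the probability-weighted average instead of taking the argmax. Whether the reward-redistribution code uses the argmax or such an expectation is not stated here; `expected_pn` is a hypothetical helper that operates on a plain list of probabilities such as `probs.tolist()`.

```python
from typing import Sequence

PN_VALUES = (0.0, 0.5, 1.0)  # class_id -> PN, as in the usage example


def expected_pn(probs: Sequence[float], pn_values: Sequence[float] = PN_VALUES) -> float:
    """Return the probability-weighted PN value (hypothetical soft score)."""
    if len(probs) != len(pn_values):
        raise ValueError("probs must have one entry per PN bin")
    return sum(p * v for p, v in zip(probs, pn_values))


# Illustrative probabilities, not real model output.
print(expected_pn([0.2, 0.3, 0.5]))
```

A soft score keeps information that the argmax discards, which may matter given the low PN_mid recall: a step the model is unsure about lands between bins rather than being forced into one.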

Intended use

Step-level reward shaping for reasoning RL. Not a safety classifier; not for general NLU.

Citation / provenance

Project: PNS reward redistribution on top of verl-project/verl.

Notes on classes

config.json keeps generic LABEL_0/1/2 for framework compatibility. The semantic labels are:

LABEL_0 = PN_low  (PN = 0.0)
LABEL_1 = PN_mid  (PN = 0.5)
LABEL_2 = PN_high (PN = 1.0)
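Tools that read label names from config.json will report `LABEL_0`, `LABEL_1`, or `LABEL_2`. A small lookup like the one below can translate those into the semantic names and PN values listed above; `LABEL_TO_PN` and `decode_label` are hypothetical names introduced for this example.

```python
from typing import Tuple

# Generic config label -> (semantic class name, PN value), per the list above.
LABEL_TO_PN = {
    "LABEL_0": ("PN_low", 0.0),
    "LABEL_1": ("PN_mid", 0.5),
    "LABEL_2": ("PN_high", 1.0),
}


def decode_label(label: str) -> Tuple[str, float]:
    """Translate a generic LABEL_i string into (class_name, pn_value)."""
    try:
        return LABEL_TO_PN[label]
    except KeyError:
        raise ValueError(f"unknown label: {label!r}") from None


name, pn = decode_label("LABEL_2")
print(name, pn)
```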
