DeBERTa-v3-large PN (Process Necessity) Step Scorer (3-class)

A step-level Process Necessity (PN) score classifier, fine-tuned from microsoft/deberta-v3-large. Given a reasoning prefix followed by a candidate reasoning step, it predicts one of three PN bins:

| class_id | class_name | PN value |
|---|---|---|
| 0 | PN_low | 0.0 |
| 1 | PN_mid | 0.5 |
| 2 | PN_high | 1.0 |
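The card defines only the class_id -> PN mapping, and the usage example below reads off the PN value of the most probable class. The sketch here shows that hard mapping next to a soft alternative, the probability-weighted mean of the bin values. The soft score and both helper names (`pn_from_argmax`, `pn_expected`) are hypothetical illustrations, not something the card says the scorer was trained or evaluated with.

```python
# Hypothetical helpers; only the class_id -> PN table comes from the card.
PN_VALUES = [0.0, 0.5, 1.0]  # class_id -> PN
CLASS_NAMES = ["PN_low", "PN_mid", "PN_high"]


def pn_from_argmax(probs):
    """Hard score: PN value of the most probable class."""
    idx = max(range(len(probs)), key=lambda i: probs[i])
    return PN_VALUES[idx]


def pn_expected(probs):
    """Soft score (assumption): probability-weighted mean of the PN bin values."""
    return sum(p * v for p, v in zip(probs, PN_VALUES))


probs = [0.2, 0.3, 0.5]
print(pn_from_argmax(probs), pn_expected(probs))
```

With these probabilities the hard score is 1.0 and the soft score is 0.65; the soft variant keeps some of the model's uncertainty about the rarely predicted PN_mid bin.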

These scores can be used to redistribute the trajectory-level reward across individual steps in GRPO / PPO training of LLMs for mathematical reasoning (see the verl fork, branch feat/pns-reward-redistribution).
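One simple way to picture "redistributing" a trajectory reward is to split it across steps in proportion to their PN scores. The sketch below does exactly that, falling back to a uniform split when every step scores 0. It is a hypothetical scheme for illustration only; it is not the implementation in the verl fork, whose exact rule the card does not describe. `redistribute_reward` is an illustrative name.

```python
def redistribute_reward(traj_reward, step_pn):
    """Split a trajectory-level reward across steps in proportion to PN.

    Hypothetical scheme, not the verl fork's implementation.
    Falls back to a uniform split when every step has PN == 0.
    """
    n = len(step_pn)
    if n == 0:
        return []
    total = sum(step_pn)
    if total == 0:
        return [traj_reward / n] * n
    return [traj_reward * pn / total for pn in step_pn]


print(redistribute_reward(1.0, [1.0, 0.0, 0.5]))
```

Proportional splitting preserves the total reward per trajectory, which keeps the scale of the RL signal unchanged; other schemes (e.g. adding a PN-weighted bonus on top of the outcome reward) would not.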

Training data

Chain-of-thought rollouts on GSM8K and MATH, with k = 5 intervention rollouts per step. Each step's PN label is derived from the change in downstream success rate.
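The labelling rule is only summarised above. A minimal sketch, assuming PN is the drop in success rate when a step is intervened on (k = 5 continuations with the step kept vs. k = 5 after the intervention), clipped to [0, 1] and cut into three equal-width bins, could look like this. The clipping, the bin thresholds and the name `pn_label` are all assumptions, not the card's documented procedure.

```python
K = 5  # intervention rollouts per step, as stated in the card


def pn_label(successes_with_step, successes_intervened, k=K):
    """Hypothetical PN class_id from the drop in downstream success rate.

    successes_with_step: successful continuations (out of k) with the step kept.
    successes_intervened: successful continuations (out of k) after the intervention.
    Thresholds at 1/3 and 2/3 are assumptions.
    """
    drop = (successes_with_step - successes_intervened) / k
    drop = min(max(drop, 0.0), 1.0)  # only a drop in success counts as necessity
    if drop < 1 / 3:
        return 0  # PN_low
    if drop < 2 / 3:
        return 1  # PN_mid
    return 2      # PN_high


print(pn_label(5, 5), pn_label(4, 2), pn_label(5, 0))
```

With k = 5 the success-rate drop can only take the values 0, 0.2, 0.4, 0.6, 0.8 and 1.0, so any binning of it is coarse.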

Validation metrics (best epoch)

| metric | value |
|---|---|
| epoch | 5 |
| val_acc | 0.7131 |
| val_loss | 0.8059 |
| macro_recall | 0.5560 |
| focal_gamma | 0.0 |
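Assuming `focal_gamma` is the γ of the standard focal loss, FL(p_t) = -(1 - p_t)^γ · log p_t, a value of 0.0 makes the modulating factor equal to 1, so the objective reduces to ordinary cross-entropy. The sketch below checks this numerically for single examples; `focal_loss` and `cross_entropy` are illustrative helpers, not the training code.

```python
import math


def focal_loss(p_true, gamma):
    """Focal loss for one example, given the probability of the true class."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


def cross_entropy(p_true):
    """Cross-entropy for one example, given the probability of the true class."""
    return -math.log(p_true)


for p in (0.1, 0.5, 0.9):
    print(p, focal_loss(p, 0.0), cross_entropy(p))
```

With γ > 0, well-classified examples (large p_t) are down-weighted, which is the usual motivation for trying focal loss on an imbalanced label set like this one.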

Per-class recall:

| class | correct / total | recall |
|---|---|---|
| PN_low | 907 / 1142 | 79% |
| PN_mid | 39 / 121 | 32% |
| PN_high | 155 / 281 | 55% |
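The headline metrics follow directly from the per-class counts: macro recall is the unweighted mean of the three recalls, and accuracy is total correct over total examples. The snippet recomputes both from the table as a consistency check.

```python
# Per-class (correct, total) counts copied from the table above.
counts = {
    "PN_low": (907, 1142),
    "PN_mid": (39, 121),
    "PN_high": (155, 281),
}

recalls = {name: c / t for name, (c, t) in counts.items()}
macro_recall = sum(recalls.values()) / len(recalls)
accuracy = sum(c for c, _ in counts.values()) / sum(t for _, t in counts.values())

print({k: round(v, 4) for k, v in recalls.items()})
print(round(macro_recall, 4), round(accuracy, 4))  # macro_recall ≈ 0.5560, accuracy ≈ 0.7131
```

The gap between accuracy (0.7131) and macro recall (0.5560) reflects the class imbalance: PN_low makes up 1142 of the 1544 validation examples, while PN_mid recall is only 32%.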

Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL_ID = "Thedatababbler/deberta-v3-large-pn-scorer-3class"
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU if no GPU

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).eval().to(device)

PN_VALUES = [0.0, 0.5, 1.0]  # class_id -> PN
CLASS_NAMES = ["PN_low", "PN_mid", "PN_high"]

# The scorer reads the reasoning prefix followed by the candidate step as one string.
prefix = "Q: ... A:\nStep 1: First, we compute 3+4 = 7.\nStep 2: "
step   = "Then multiply by 2 to get 14."
text   = prefix + step

batch = tokenizer(text, truncation=True, max_length=512, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**batch).logits
probs = logits.softmax(-1)[0]  # class probabilities for the single input
pred_idx = int(probs.argmax())
print(CLASS_NAMES[pred_idx], PN_VALUES[pred_idx], probs.tolist())
```
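To score every step of a trajectory, each step needs its own input consisting of all preceding steps plus the step itself. The sketch below builds those strings in the same "Q: ... A:" / "Step k: " layout as the usage example. It assumes the scorer was trained on exactly this layout, which the card only shows by example; `build_step_inputs` is a hypothetical helper.

```python
def build_step_inputs(question, steps):
    """Return one (prefix + step) string per step.

    Assumption: inputs follow the layout of the usage example,
    "Q: <question> A:\n" followed by "Step k: <text>" lines.
    """
    texts = []
    prefix = f"Q: {question} A:\n"
    for i, step in enumerate(steps, start=1):
        texts.append(prefix + f"Step {i}: " + step)  # prefix + candidate step
        prefix += f"Step {i}: {step}\n"              # step joins the prefix for later steps
    return texts


texts = build_step_inputs("...", ["First, we compute 3+4 = 7.", "Then multiply by 2 to get 14."])
for t in texts:
    print(repr(t))
```

The resulting list can be passed to the tokenizer in a single call (with `padding=True`, `truncation=True`, `max_length=512`, `return_tensors="pt"`), and `logits.softmax(-1)` then gives one row of class probabilities per step.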

Intended use

Step-level reward shaping for reasoning RL. Not a safety classifier; not for general NLU.

Citation / provenance

Project: PNS reward redistribution on top of verl-project/verl.

Notes on classes

config.json keeps generic LABEL_0/1/2 for framework compatibility. The semantic labels are:

LABEL_0 = PN_low  (PN = 0.0)
LABEL_1 = PN_mid  (PN = 0.5)
LABEL_2 = PN_high (PN = 1.0)
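Because config.json keeps the generic names, tools that read `id2label` from the config (for example a transformers `text-classification` pipeline) report `LABEL_0` / `LABEL_1` / `LABEL_2`. A small lookup table, sketched below with the hypothetical helper `semantic_label`, translates those names into the semantic class and PN value listed above.

```python
# Mapping from the generic config labels to (semantic name, PN value), as listed above.
SEMANTIC = {
    "LABEL_0": ("PN_low", 0.0),
    "LABEL_1": ("PN_mid", 0.5),
    "LABEL_2": ("PN_high", 1.0),
}


def semantic_label(raw_label):
    """Translate a generic config label such as "LABEL_2" into (name, PN)."""
    return SEMANTIC[raw_label]


print(semantic_label("LABEL_2"))
```

Alternatively, passing `id2label={0: "PN_low", 1: "PN_mid", 2: "PN_high"}` (with the matching `label2id`) to `from_pretrained` overrides the config at load time without editing config.json.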
Model tree: drdoggo/deberta-v3-large-pn-scorer-3class, fine-tuned from microsoft/deberta-v3-large.