# DeBERTaV3-Large Reward Model: FormatGuard + Preference Distillation (ckpt-799)
- Base backbone: microsoft/deberta-v3-large
- Init RM: yungshun317/deberta-v3-large-format-guard
- This snapshot adds anti-format-spam preference tuning on top of the initial reward model.
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("yungshun317/deberta-v3-large-format-guard-preference-distillation")
rm = AutoModelForSequenceClassification.from_pretrained("yungshun317/deberta-v3-large-format-guard-preference-distillation")
```
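A minimal sketch of how the loaded reward model might be queried for a scalar reward. The prompt/response pairing via the tokenizer's sentence-pair interface and the single-logit classification head are assumptions about this checkpoint, not documented behavior:

```python
import torch


def score(rm, tok, prompt: str, response: str) -> float:
    """Return a scalar reward for a (prompt, response) pair.

    Assumes the RM is a sequence classifier whose first logit is the
    reward; the exact input formatting for this checkpoint is an
    assumption.
    """
    inputs = tok(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = rm(**inputs).logits
    return logits.squeeze().item()
```

A higher score would indicate the response is preferred; comparing scores across candidate responses to the same prompt is the typical way a reward model like this is used for ranking.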