metadata
tags:
- deberta-v3
- cross-encoder
- osmosis
- response-sufficiency
- binary-classification
language: en
license: mit
base_model: MoritzLaurer/deberta-v3-base-zeroshot-v2.0
datasets:
- KingTechnician/yahoo-answers-osmosis-labeled
- KingTechnician/triage-synthetic-data-v1
OSMoSIS Binary Cross-Encoder
DeBERTa-v3 cross-encoder for binary response-sufficiency classification.
Given an (objective, response) pair, it predicts ADDR (the response addresses the objective)
or NOADDR (it does not).
Intended use
First stage of a cascaded pipeline. Confident binary predictions are used directly; low-confidence cases should route to an LLM judge for fine-grained classification.
Trained on Sonnet-4.6-generated labels (flat prompt, echo-stripped responses), validated against 254 human-reviewed labels for deployment-grade evaluation.
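The cascade routing described above can be sketched as a confidence gate on the softmax of the two logits. This is a minimal illustration, not the pipeline's actual routing code: the `route` helper, the `"LLM_JUDGE"` sentinel, and the 0.80 threshold are all hypothetical, and a real threshold would need to be tuned on held-out data.

```python
import math

LABELS = ["ADDR", "NOADDR"]  # same index order as the Usage snippet


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, threshold=0.80):
    """Return (decision, confidence).

    decision is a label when the top probability reaches `threshold`,
    otherwise the hypothetical sentinel "LLM_JUDGE".
    The 0.80 default is a placeholder, not a tuned value.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    decision = LABELS[best] if confidence >= threshold else "LLM_JUDGE"
    return decision, confidence


print(route([2.3, -1.1]))  # large margin: kept by the cross-encoder
print(route([0.2, 0.1]))   # near-tie: deferred to the LLM judge
```

With the real model, the input would be the two logits for one pair, for example `logits[0].tolist()` from the Usage snippet.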
Performance
Yahoo within-domain test
Accuracy: 0.806 | Macro F1: 0.669
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| ADDR | 0.857 | 0.909 | 0.882 | 798 |
| NOADDR | 0.526 | 0.401 | 0.455 | 202 |
Triage synthetic held-out (architecture validation)
Accuracy: 0.995 | Macro F1: 0.994
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| ADDR | 1.000 | 0.987 | 0.993 | 150 |
| NOADDR | 0.991 | 1.000 | 0.996 | 223 |
Human gold-standard held-out
Accuracy: 0.776 | Macro F1: 0.680
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| ADDR | 0.824 | 0.889 | 0.855 | 189 |
| NOADDR | 0.580 | 0.446 | 0.504 | 65 |
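As a reading aid for the tables above, the headline numbers can be recomputed from the per-class rows: F1 is the harmonic mean of precision and recall, macro F1 is the unweighted mean of the per-class F1 scores, and accuracy equals the support-weighted mean of recall. The sketch below uses the human gold-standard rows; the variable names are illustrative only. Because macro F1 weights both classes equally, the weaker NOADDR class pulls it well below accuracy.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, support) copied from the human gold-standard table
human = {"ADDR": (0.824, 0.889, 189), "NOADDR": (0.580, 0.446, 65)}

per_class_f1 = {name: f1(p, r) for name, (p, r, _) in human.items()}

# Macro F1: every class counts equally, regardless of support
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Accuracy: recall * support gives the correct count per class
total = sum(n for _, _, n in human.values())
accuracy = sum(r * n for _, r, n in human.values()) / total

print(round(macro_f1, 3), round(accuracy, 3))
```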
Training
- Base: MoritzLaurer/deberta-v3-base-zeroshot-v2.0 (NLI-pretrained)
- Data: Yahoo Answers (echo-stripped) + Triage synthetic, joint training
- Best epoch: 3 (selected by val macro-F1)
- Batch size: 16, max length: 512, LR: 2e-05
- Class weights: [0.724, 1.615]
- Early stopping: patience 3 on val macro-F1
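The card does not say how the class weights were derived. One common choice is the "balanced" heuristic, weight_c = N / (K * n_c), and the sketch below assumes that heuristic, plus the [ADDR, NOADDR] index order from the Usage snippet, to show what class split the listed weights would imply. Both assumptions are unconfirmed, and the helper names are hypothetical.

```python
def balanced_class_weights(counts):
    """weight_c = N / (K * n_c), the common "balanced" heuristic."""
    total, k = sum(counts), len(counts)
    return [total / (k * n) for n in counts]


def implied_fractions(weights):
    """Invert the heuristic: n_c / N = 1 / (K * weight_c)."""
    k = len(weights)
    return [1 / (k * w) for w in weights]


# Weights from the Training list; the [ADDR, NOADDR] order is an assumption
published = [0.724, 1.615]
fractions = implied_fractions(published)

print([round(f, 2) for f in fractions])  # implied share of each class
print(round(sum(fractions), 3))          # close to 1 if the heuristic fits
```

Under these assumptions the weights correspond to roughly 69% ADDR and 31% NOADDR in the training data, and the implied shares summing to almost exactly 1 is consistent with, though not proof of, balanced weighting.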
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned cross-encoder and its tokenizer from the Hub
model = AutoModelForSequenceClassification.from_pretrained("KingTechnician/osmosis-crossencoder-binary")
tokenizer = AutoTokenizer.from_pretrained("KingTechnician/osmosis-crossencoder-binary")

# Encode the (objective, response) pair as a single cross-encoder input
inputs = tokenizer("What causes rain?",
                   "Rain forms when water vapor condenses into droplets.",
                   return_tensors="pt", truncation=True, max_length=512)

# Inference only, so skip gradient tracking
with torch.no_grad():
    logits = model(**inputs).logits

# Index 0 is ADDR, index 1 is NOADDR
pred = logits.argmax(dim=-1).item()
print(["ADDR", "NOADDR"][pred])
Limitations
- The NOADDR class is heterogeneous: on-topic-but-not-answering, tangential, and off-topic responses all map to the same target. Sub-classification of NOADDR requires a stronger model; see the cascade evaluation in the OSMoSIS repo.
- Synthetic Triage results are near ceiling and validate the architecture, but they are not representative of open-domain difficulty. Treat the human gold-standard held-out results (accuracy 0.776, macro F1 0.680) as the realistic deployment estimate.