OSMoSIS Binary Cross-Encoder

DeBERTa-v3 cross-encoder for binary response-sufficiency classification. Given an (objective, response) pair, it predicts ADDR (the response addresses the objective) or NOADDR (it does not).

Intended use

First stage of a cascaded pipeline. Confident binary predictions are used directly; low-confidence cases should route to an LLM judge for fine-grained classification.
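The routing rule described above could look roughly like the sketch below. It is an illustration only: the 0.9 threshold, the `triage` function name, and the `ROUTE_TO_LLM_JUDGE` marker are assumptions made for this example, not values or identifiers from the OSMoSIS pipeline. The label order follows the Usage section.

```python
import math

LABELS = ["ADDR", "NOADDR"]  # same index order as in the Usage section


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(logits, threshold=0.9):
    """Return (decision, confidence) for one pair of class logits.

    `threshold` is a hypothetical value; in practice it would be tuned
    on held-out data.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        # Confident: use the binary prediction directly.
        return LABELS[best], probs[best]
    # Not confident: defer to the LLM judge stage.
    return "ROUTE_TO_LLM_JUDGE", probs[best]
```

Raising the threshold sends more cases to the LLM judge, trading cost for accuracy; lowering it keeps more decisions in the cross-encoder.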

The model was trained on Sonnet-4.6-generated labels (flat prompt, echo-stripped responses) and validated against 254 human-reviewed labels for deployment-grade evaluation.

Performance

Yahoo within-domain test

Accuracy: 0.806 | Macro F1: 0.669

Class    Precision  Recall  F1     Support
ADDR     0.857      0.909   0.882  798
NOADDR   0.526      0.401   0.455  202

Triage synthetic held-out (architecture validation)

Accuracy: 0.995 | Macro F1: 0.994

Class    Precision  Recall  F1     Support
ADDR     1.000      0.987   0.993  150
NOADDR   0.991      1.000   0.996  223

Human gold-standard held-out

Accuracy: 0.776 | Macro F1: 0.680

Class    Precision  Recall  F1     Support
ADDR     0.824      0.889   0.855  189
NOADDR   0.580      0.446   0.504  65
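For readers who want to see how the summary figures relate to the per-class rows, the sketch below recomputes macro F1 and accuracy for the human gold-standard set from the reported precision, recall, and support. Because the inputs are already rounded to three decimals, the results only agree with the reported values approximately. The variable names are illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, support) as reported for the human held-out set.
human = {
    "ADDR": (0.824, 0.889, 189),
    "NOADDR": (0.580, 0.446, 65),
}

per_class_f1 = {c: f1(p, r) for c, (p, r, _) in human.items()}

# Macro F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Correct predictions per class are recall * support, so accuracy is the
# support-weighted mean of the per-class recalls.
total = sum(n for _, _, n in human.values())
accuracy = sum(r * n for _, r, n in human.values()) / total

print(f"macro F1 ~ {macro_f1:.3f}, accuracy ~ {accuracy:.3f}")
```

Macro F1 weights both classes equally, which is why it sits well below accuracy here: the minority NOADDR class has a much lower F1 than ADDR.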

Training

  • Base: MoritzLaurer/deberta-v3-base-zeroshot-v2.0 (NLI-pretrained)
  • Data: Yahoo Answers (echo-stripped) + Triage synthetic, joint training
  • Best epoch: 3 (selected by val macro-F1)
  • Batch size: 16, max length: 512, LR: 2e-05
  • Class weights: [0.724, 1.615]
  • Early stopping: patience 3 on val macro-F1
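The class weights listed above look consistent with inverse-frequency ("balanced") weighting, w_c = n / (k * n_c), the same formula scikit-learn uses for `class_weight="balanced"`. That is an inference from the numbers, not something the card states. Under that assumption, the sketch below recovers the class shares the weights would imply; the function names are illustrative.

```python
def balanced_weights(counts):
    """Inverse-frequency weights: n / (k * n_c) for each class c."""
    n = sum(counts)
    k = len(counts)
    return [n / (k * c) for c in counts]


def implied_shares(weights):
    """Invert the formula: class share n_c / n = 1 / (k * w_c)."""
    k = len(weights)
    return [1 / (k * w) for w in weights]


# Weights reported in the card, in [ADDR, NOADDR] order (assumed).
reported = [0.724, 1.615]
shares = implied_shares(reported)

# If the weights are balanced weights, the shares should sum to about 1.
print([round(s, 3) for s in shares], round(sum(shares), 3))
```

If the assumption holds, roughly 69% of training examples are ADDR and 31% are NOADDR, so the loss penalizes NOADDR errors a little more than twice as heavily to offset the imbalance.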

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL_ID = "KingTechnician/osmosis-crossencoder-binary"
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model.eval()  # inference mode (disables dropout)

# Encode the (objective, response) pair as a single cross-encoder input.
inputs = tokenizer("What causes rain?",
                   "Rain forms when water vapor condenses into droplets.",
                   return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
    pred = logits.argmax(dim=-1).item()
# Index 0 is ADDR, index 1 is NOADDR.
print(["ADDR", "NOADDR"][pred])

Limitations

  • The NOADDR class is heterogeneous: on-topic-but-not-answering, tangential, and off-topic responses all map to the same target. Sub-classification of NOADDR requires a stronger model; see the cascade evaluation in the OSMoSIS repo.
  • Synthetic Triage results (near-ceiling) validate the architecture but are not representative of open-domain difficulty. Use the human gold-standard held-out results (accuracy 0.776, macro F1 0.680) as the realistic deployment estimate.