
---
tags:
  - deberta-v3
  - cross-encoder
  - osmosis
  - response-sufficiency
  - binary-classification
language: en
license: mit
base_model: MoritzLaurer/deberta-v3-base-zeroshot-v2.0
datasets:
  - KingTechnician/yahoo-answers-osmosis-labeled
  - KingTechnician/triage-synthetic-data-v1
---

OSMoSIS Binary Cross-Encoder

DeBERTa-v3 cross-encoder for binary response-sufficiency classification. Given an (objective, response) pair, it predicts ADDR (the response addresses the objective) or NOADDR (the response does not).

Intended use

This model is the first stage of a cascaded pipeline. Confident binary predictions are used directly, and low-confidence cases should be routed to an LLM judge for fine-grained classification.
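The card does not specify how "confident" is measured or where the cut-off lies. A common approach is to threshold the top-class softmax probability. The sketch below shows that idea in plain Python so that it stays self-contained. The 0.9 threshold and the `route`/`LLM_JUDGE` names are hypothetical placeholders, and the ADDR=0, NOADDR=1 index order is taken from the Usage snippet.

```python
import math
from typing import List, Tuple

LABELS = ("ADDR", "NOADDR")  # assumed index order, matching the Usage snippet


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities (max-subtraction for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits: List[float], threshold: float = 0.9) -> Tuple[str, float]:
    """Return (decision, confidence).

    decision is the predicted label when the top-class probability reaches
    `threshold` (a hypothetical cut-off), otherwise 'LLM_JUDGE' to signal that
    the case should be deferred to the second stage of the cascade.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    if confidence >= threshold:
        return LABELS[best], confidence
    return "LLM_JUDGE", confidence


print(route([3.0, -1.0]))  # large logit gap: accepted as ADDR
print(route([0.2, 0.0]))   # near-tie: deferred to the LLM judge
```

In practice the threshold would be tuned on held-out data to trade cascade cost (how many cases reach the LLM judge) against first-stage accuracy.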

The model was trained on Sonnet-4.6-generated labels (flat prompt, echo-stripped responses) and validated against 254 human-reviewed labels, which form the human gold-standard held-out set used for deployment-grade evaluation.

Performance

Yahoo within-domain test

Accuracy: 0.806 | Macro F1: 0.669

Class	Precision	Recall	F1	Support
ADDR	0.857	0.909	0.882	798
NOADDR	0.526	0.401	0.455	202

Triage synthetic held-out (architecture validation)

Accuracy: 0.995 | Macro F1: 0.994

Class	Precision	Recall	F1	Support
ADDR	1.000	0.987	0.993	150
NOADDR	0.991	1.000	0.996	223

Human gold-standard held-out

Accuracy: 0.776 | Macro F1: 0.680

Class	Precision	Recall	F1	Support
ADDR	0.824	0.889	0.855	189
NOADDR	0.580	0.446	0.504	65
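The per-class numbers above are internally consistent, and a quick worked check shows how accuracy, precision, recall, and macro F1 relate. The card does not publish confusion matrices. The counts below are back-derived from the reported recall and support of the human held-out set (for example, 0.889 × 189 ≈ 168 correct ADDR), and they are the only integer counts that reproduce the table to three decimals. The helper name `classification_report` is illustrative and is not a library function.

```python
from typing import Dict

LABELS = ("ADDR", "NOADDR")


def classification_report(cm: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Accuracy, per-class precision/recall/F1 and macro F1 from cm[true][pred] counts."""
    out: Dict[str, float] = {}
    total = sum(cm[t][p] for t in LABELS for p in LABELS)
    out["accuracy"] = sum(cm[c][c] for c in LABELS) / total
    f1s = []
    for c in LABELS:
        tp = cm[c][c]
        fn = sum(cm[c][p] for p in LABELS if p != c)  # true c, predicted another class
        fp = sum(cm[t][c] for t in LABELS if t != c)  # another class, predicted c
        out[f"{c}_precision"] = tp / (tp + fp) if tp + fp else 0.0
        out[f"{c}_recall"] = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0  # equals 2PR / (P + R)
        out[f"{c}_f1"] = f1
        f1s.append(f1)
    out["macro_f1"] = sum(f1s) / len(f1s)  # unweighted mean, so NOADDR counts equally
    return out


# Human gold-standard held-out, counts back-derived from recall x support.
human_cm = {
    "ADDR":   {"ADDR": 168, "NOADDR": 21},  # 189 ADDR examples
    "NOADDR": {"ADDR": 36,  "NOADDR": 29},  # 65 NOADDR examples
}

for name, value in classification_report(human_cm).items():
    print(f"{name:>16}: {value:.3f}")
```

Because macro F1 weights both classes equally, the weak NOADDR F1 pulls it well below accuracy, even though ADDR dominates the support.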

Training

Base: MoritzLaurer/deberta-v3-base-zeroshot-v2.0 (NLI-pretrained)
Data: Yahoo Answers (echo-stripped) + Triage synthetic, joint training
Best epoch: 3 (selected by val macro-F1)
Batch size: 16, max length: 512, LR: 2e-05
Class weights: [0.724, 1.615]
Early stopping: patience 3 on val macro-F1
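The card reports class weights but not how they were derived. They are consistent with the common "balanced" inverse-frequency heuristic w_c = N / (K · n_c): 1/0.724 + 1/1.615 ≈ 2.000 = K, which would imply roughly 69% ADDR and 31% NOADDR in training. That the author used this heuristic is an assumption, and so is the [ADDR, NOADDR] ordering, which is taken from the Usage snippet. The sketch below computes such weights for a hypothetical label split and applies them in a class-weighted cross-entropy. The loss follows the averaging that `torch.nn.CrossEntropyLoss(weight=...)` uses with `reduction="mean"` (dividing by the sum of the targets' weights), written in plain Python so that it runs stand-alone.

```python
import math
from collections import Counter
from typing import List, Sequence


def balanced_class_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """w_c = N / (num_classes * n_c): rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) for c in range(num_classes)]


def weighted_cross_entropy(batch_logits: List[List[float]], targets: List[int],
                           weights: List[float]) -> float:
    """Class-weighted mean negative log-likelihood over a batch."""
    num, den = 0.0, 0.0
    for logits, y in zip(batch_logits, targets):
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # log-sum-exp
        nll = log_z - logits[y]                                      # -log softmax[y]
        num += weights[y] * nll
        den += weights[y]
    return num / den  # weighted mean, as in torch's reduction="mean"


# Hypothetical split (0 = ADDR, 1 = NOADDR); NOT the actual training counts.
labels = [0] * 6904 + [1] * 3096
weights = balanced_class_weights(labels, num_classes=2)
print([round(w, 3) for w in weights])  # [0.724, 1.615]
```

With these weights, a misclassified NOADDR example contributes about 2.2 times as much to the loss as a misclassified ADDR example. This counteracts the ADDR majority that otherwise depresses NOADDR recall.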

Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "KingTechnician/osmosis-crossencoder-binary"
model = AutoModelForSequenceClassification.from_pretrained(model_id)  # loaded in eval mode
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Encode (objective, response) as a single sentence pair for the cross-encoder.
inputs = tokenizer("What causes rain?",
                   "Rain forms when water vapor condenses into droplets.",
                   return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1)       # class probabilities, usable as a confidence score
pred = probs.argmax(dim=-1).item()
# Label order ADDR=0, NOADDR=1, as in the original example.
print(["ADDR", "NOADDR"][pred], round(probs[0, pred].item(), 3))
```

Limitations

The NOADDR class is heterogeneous: on-topic but non-answering, tangential, and off-topic responses all map to the same label. Sub-classifying NOADDR requires a stronger model; see the cascade evaluation in the OSMoSIS repo.
The near-ceiling Triage synthetic results validate the architecture but do not represent open-domain difficulty. Use the human gold-standard held-out results (accuracy 0.776, macro F1 0.680) as the realistic deployment estimate.