---
tags:
- deberta-v3
- cross-encoder
- osmosis
- response-sufficiency
- binary-classification
language: en
license: mit
base_model: MoritzLaurer/deberta-v3-base-zeroshot-v2.0
datasets:
- KingTechnician/yahoo-answers-osmosis-labeled
- KingTechnician/triage-synthetic-data-v1
---

# OSMoSIS Binary Cross-Encoder

DeBERTa-v3 cross-encoder for binary response-sufficiency classification. Given `(objective, response)`, it predicts ADDR (the response addresses the objective) or NOADDR (the response does not).

## Intended use

First stage of a cascaded pipeline. Confident binary predictions are used directly; low-confidence cases should route to an LLM judge for fine-grained classification.

Trained on Sonnet-4.6-generated labels (flat prompt, echo-stripped responses), validated against 254 human-reviewed labels for deployment-grade evaluation.

## Performance

### Yahoo within-domain test

Accuracy: **0.806** | Macro F1: **0.669**

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| ADDR | 0.857 | 0.909 | 0.882 | 798 |
| NOADDR | 0.526 | 0.401 | 0.455 | 202 |

### Triage synthetic held-out (architecture validation)

Accuracy: **0.995** | Macro F1: **0.994**

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| ADDR | 1.000 | 0.987 | 0.993 | 150 |
| NOADDR | 0.991 | 1.000 | 0.996 | 223 |

### Human gold-standard held-out

Accuracy: **0.776** | Macro F1: **0.680**

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| ADDR | 0.824 | 0.889 | 0.855 | 189 |
| NOADDR | 0.580 | 0.446 | 0.504 | 65 |

## Training

- Base: `MoritzLaurer/deberta-v3-base-zeroshot-v2.0` (NLI-pretrained)
- Data: Yahoo Answers (echo-stripped) + Triage synthetic, joint training
- Best epoch: 3 (selected by val macro-F1)
- Batch size: 16, max length: 512, LR: 2e-05
- Class weights: [0.724, 1.615]
- Early stopping: patience 3 on val macro-F1

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("KingTechnician/osmosis-crossencoder-binary")
tokenizer = AutoTokenizer.from_pretrained("KingTechnician/osmosis-crossencoder-binary")

# Encode the (objective, response) pair as a single cross-encoder input.
inputs = tokenizer(
    "What causes rain?",
    "Rain forms when water vapor condenses into droplets.",
    return_tensors="pt",
    truncation=True,
    max_length=512,
)

# Inference only, so skip gradient tracking.
with torch.no_grad():
    logits = model(**inputs).logits

# Index 0 maps to ADDR, index 1 to NOADDR.
pred = logits.argmax(dim=-1).item()
print(["ADDR", "NOADDR"][pred])
```

## Limitations

- The NOADDR class is heterogeneous: on-topic-but-not-answering, tangential, and off-topic responses all map to the same target. Sub-classification of NOADDR requires a stronger model; see the cascade evaluation in the OSMoSIS repo.
- Synthetic Triage results (near-ceiling) validate the architecture but are not representative of open-domain difficulty. Use the human held-out number as the realistic deployment estimate.
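
## Example: confidence-based routing

The cascade described under Intended use (accept confident predictions, send the rest to an LLM judge) could be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual implementation: the `route` function, the `LLM_JUDGE` marker, and the `0.9` threshold are all hypothetical, and the label order is assumed to match the Usage snippet. It uses only the standard library so it can run without downloading the model.

```python
import math
from typing import List, Sequence, Tuple

LABELS = ("ADDR", "NOADDR")  # index order assumed from the Usage snippet
ESCALATE = "LLM_JUDGE"       # hypothetical marker for cases sent to the LLM judge


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits to probabilities (max-subtracted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits: Sequence[float], threshold: float = 0.9) -> Tuple[str, float]:
    """Return (label, confidence) when confident, else (ESCALATE, confidence).

    `threshold` is an illustrative value, not one taken from the model card.
    """
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[idx]
    if confidence >= threshold:
        return LABELS[idx], confidence
    return ESCALATE, confidence


if __name__ == "__main__":
    print(route([3.0, -2.0]))  # large margin: classifier label is kept
    print(route([0.2, 0.1]))   # near tie: escalated to the LLM judge
```

With the model loaded as in the Usage section, the input would be `logits[0].tolist()`. A sensible threshold would need to be chosen on held-out data, for example the human-reviewed labels, rather than taken from this sketch.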