---
tags:
  - deberta-v3
  - cross-encoder
  - osmosis
  - response-sufficiency
  - binary-classification
language: en
license: mit
base_model: MoritzLaurer/deberta-v3-base-zeroshot-v2.0
datasets:
  - KingTechnician/yahoo-answers-osmosis-labeled
  - KingTechnician/triage-synthetic-data-v1
---

# OSMoSIS Binary Cross-Encoder

DeBERTa-v3 cross-encoder for binary response-sufficiency classification.
Given `(objective, response)`, predicts ADDR (response addresses the objective)
or NOADDR (response does not).

## Intended use

First stage of a cascaded pipeline. Confident binary predictions are used
directly; low-confidence cases should route to an LLM judge for fine-grained
classification.
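
The routing rule described above could be sketched as follows. This is a minimal illustration, not the pipeline's actual code: using the maximum softmax probability as the confidence score, the `0.9` threshold, the `route` function, and the `"LLM_JUDGE"` sentinel are all assumptions made for the example.

```python
import math

# Label order follows the Usage section of this card (index 0 = ADDR).
LABELS = ["ADDR", "NOADDR"]


def route(logits, threshold=0.9):
    """Return (decision, confidence) for one pair of raw logits.

    `threshold` is a hypothetical value; tune it on held-out data.
    """
    # Numerically stable softmax over the two logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    confidence = max(probs)
    if confidence >= threshold:
        # Confident enough: use the cross-encoder's own prediction.
        return LABELS[probs.index(confidence)], confidence
    # Otherwise defer to the second stage (hypothetical sentinel value).
    return "LLM_JUDGE", confidence


print(route([4.0, -1.0]))  # confident, decided by the cross-encoder
print(route([0.2, 0.0]))   # uncertain, deferred to the LLM judge
```

In practice the logits would come from `model(**inputs).logits`, and the threshold would be chosen by trading off judge cost against the error rate on the human-reviewed set.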

Training labels were generated by Sonnet-4.6 (flat prompt, echo-stripped responses).
The model was then evaluated against 254 human-reviewed labels to give a
deployment-grade estimate.

## Performance

### Yahoo within-domain test

Accuracy: **0.806**  |  Macro F1: **0.669**

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| ADDR | 0.857 | 0.909 | 0.882 | 798 |
| NOADDR | 0.526 | 0.401 | 0.455 | 202 |

### Triage synthetic held-out (architecture validation)

Accuracy: **0.995**  |  Macro F1: **0.994**

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| ADDR | 1.000 | 0.987 | 0.993 | 150 |
| NOADDR | 0.991 | 1.000 | 0.996 | 223 |

### Human gold-standard held-out

Accuracy: **0.776**  |  Macro F1: **0.680**

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| ADDR | 0.824 | 0.889 | 0.855 | 189 |
| NOADDR | 0.580 | 0.446 | 0.504 | 65 |
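
As a sanity check on how the headline numbers relate to the per-class rows, the short sketch below recomputes macro F1 and accuracy from the human gold-standard table. It uses only the rounded values printed above, so small rounding differences are expected; the variable names are illustrative.

```python
# (precision, recall, support) per class, copied from the human gold-standard table.
rows = {
    "ADDR": (0.824, 0.889, 189),
    "NOADDR": (0.580, 0.446, 65),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


per_class_f1 = {name: f1(p, r) for name, (p, r, _) in rows.items()}

# Macro F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Correct predictions per class are recall * support, so accuracy is the
# support-weighted mean of the per-class recalls.
total = sum(n for _, _, n in rows.values())
accuracy = sum(r * n for _, r, n in rows.values()) / total

print(f"macro F1 = {macro_f1:.3f}, accuracy = {accuracy:.3f}")
```

Because macro F1 weights both classes equally, the weaker NOADDR score pulls it well below accuracy, which is dominated by the larger ADDR class.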


## Training

- Base: `MoritzLaurer/deberta-v3-base-zeroshot-v2.0` (NLI-pretrained)
- Data: Yahoo Answers (echo-stripped) + Triage synthetic, joint training
- Best epoch: 3 (selected by val macro-F1)
- Batch size: 16, max length: 512, LR: 2e-05
- Class weights: [0.724, 1.615]
- Early stopping: patience 3 on val macro-F1
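
The card does not say how the class weights were derived. The listed values are consistent with the common "balanced" scheme, `weight_c = N / (K * n_c)`, which is also what scikit-learn's `class_weight="balanced"` computes. The sketch below illustrates that formula under this assumption. The counts are hypothetical, chosen only so the arithmetic reproduces the listed weights; they are not the real training-set sizes, and the `[ADDR, NOADDR]` order is assumed from the Usage section.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: N / (K * n_c) for each class count n_c."""
    n_total = sum(counts)
    n_classes = len(counts)
    return [n_total / (n_classes * n) for n in counts]


# Hypothetical [ADDR, NOADDR] counts, for illustration only.
hypothetical_counts = [6904, 3096]

weights = balanced_class_weights(hypothetical_counts)
print([round(w, 3) for w in weights])
```

Weights like these are typically passed to the loss (for example the `weight` argument of `torch.nn.CrossEntropyLoss`) so that errors on the minority NOADDR class cost more than errors on ADDR.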

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "KingTechnician/osmosis-crossencoder-binary"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.eval()  # inference mode (dropout disabled)

# Encode the (objective, response) pair as a single cross-encoder input.
inputs = tokenizer("What causes rain?",
                   "Rain forms when water vapor condenses into droplets.",
                   return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
    pred = logits.argmax(dim=-1).item()
# Index 0 = ADDR, index 1 = NOADDR.
print(["ADDR", "NOADDR"][pred])
```

## Limitations

- The NOADDR class is heterogeneous: on-topic-but-not-answering, tangential, and
  off-topic responses all map to the same target. Sub-classifying NOADDR requires
  a stronger model; see the cascade evaluation in the OSMoSIS repo.
- Synthetic Triage results (near-ceiling) validate the architecture but are not
  representative of open-domain difficulty. Use the human held-out number as the
  realistic deployment estimate.