---
tags:
- deberta-v3
- cross-encoder
- osmosis
- response-sufficiency
- binary-classification
language: en
license: mit
base_model: MoritzLaurer/deberta-v3-base-zeroshot-v2.0
datasets:
- KingTechnician/yahoo-answers-osmosis-labeled
- KingTechnician/triage-synthetic-data-v1
---
# OSMoSIS Binary Cross-Encoder
DeBERTa-v3 cross-encoder for binary response-sufficiency classification.
Given an `(objective, response)` pair, it predicts ADDR (the response addresses
the objective) or NOADDR (it does not).
## Intended use
First stage of a cascaded pipeline. Confident binary predictions are used
directly; low-confidence cases should route to an LLM judge for fine-grained
classification.
The model was trained on Sonnet-4.6-generated labels (flat prompt, echo-stripped
responses) and validated against 254 human-reviewed labels for deployment-grade
evaluation.
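
The routing described in this section can be sketched as below. This is a minimal illustration, not the OSMoSIS pipeline itself: the card does not specify a confidence measure or cut-off, so using the maximum softmax probability as confidence, the `threshold=0.9` default, and the `route` helper with its return format are all assumptions made for the example. The label order follows the usage example in this card.

```python
import math

# Label order taken from the usage example in this card.
LABELS = ["ADDR", "NOADDR"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, threshold=0.9):
    """Hypothetical cascade router.

    Accepts the cross-encoder's prediction when its top softmax
    probability reaches `threshold` (an assumed value); otherwise
    flags the pair for the LLM judge.
    """
    probs = softmax(logits)
    confidence = max(probs)
    label = LABELS[probs.index(confidence)]
    if confidence >= threshold:
        return {"route": "cross_encoder", "label": label, "confidence": confidence}
    return {"route": "llm_judge", "label": None, "confidence": confidence}


# A confident prediction is used directly; an uncertain one is deferred.
print(route([4.0, -1.0]))
print(route([0.2, 0.1]))
```

In practice the threshold would need to be tuned on held-out data, for example against the human-reviewed labels, to trade off LLM-judge cost against error rate.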
## Performance
### Yahoo within-domain test
Accuracy: **0.806** | Macro F1: **0.669**
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| ADDR | 0.857 | 0.909 | 0.882 | 798 |
| NOADDR | 0.526 | 0.401 | 0.455 | 202 |
### Triage synthetic held-out (architecture validation)
Accuracy: **0.995** | Macro F1: **0.994**
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| ADDR | 1.000 | 0.987 | 0.993 | 150 |
| NOADDR | 0.991 | 1.000 | 0.996 | 223 |
### Human gold-standard held-out
Accuracy: **0.776** | Macro F1: **0.680**
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| ADDR | 0.824 | 0.889 | 0.855 | 189 |
| NOADDR | 0.580 | 0.446 | 0.504 | 65 |
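
To make the relationship between the table rows and the headline numbers explicit, the sketch below recomputes macro F1 and accuracy for the human gold-standard split from the per-class precision, recall, and support shown above. The helper names are illustrative only, and because the table values are rounded to three decimals, the recomputed figures match the reported ones only approximately.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(rows):
    """Unweighted mean of per-class F1; rows are (precision, recall, support)."""
    scores = [f1(p, r) for p, r, _ in rows]
    return sum(scores) / len(scores)


def accuracy(rows):
    """Accuracy equals support-weighted recall across classes."""
    total = sum(n for _, _, n in rows)
    return sum(r * n for _, r, n in rows) / total


# (precision, recall, support) from the human gold-standard table.
human = [
    (0.824, 0.889, 189),  # ADDR
    (0.580, 0.446, 65),   # NOADDR
]

print(f"macro F1 ~ {macro_f1(human):.3f}")
print(f"accuracy ~ {accuracy(human):.3f}")
```

Accuracy is dominated by the larger ADDR class, while macro F1 weights both classes equally, which is why the weaker NOADDR recall pulls macro F1 well below accuracy.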
## Training
- Base: `MoritzLaurer/deberta-v3-base-zeroshot-v2.0` (NLI-pretrained)
- Data: Yahoo Answers (echo-stripped) + Triage synthetic, joint training
- Best epoch: 3 (selected by val macro-F1)
- Batch size: 16, max length: 512, LR: 2e-05
- Class weights: [0.724, 1.615]
- Early stopping: patience 3 on val macro-F1
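
The card does not say how the class weights were derived. One common scheme is "balanced" inverse-frequency weighting, `weight_c = N / (K * n_c)`, and the reported values are numerically consistent with it. The sketch below shows that scheme and inverts it to recover the class fractions it would imply. Treat this as an assumption about the training setup, not a description of it; both function names are hypothetical.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: N / (K * n_c) for each class c."""
    total = sum(counts)
    k = len(counts)
    return [total / (k * c) for c in counts]


def implied_class_fractions(weights):
    """Invert the balanced formula: n_c / N = 1 / (K * weight_c)."""
    k = len(weights)
    return [1.0 / (k * w) for w in weights]


# Weights reported in this card, in [ADDR, NOADDR] order (order assumed
# from the usage example).
reported = [0.724, 1.615]
fractions = implied_class_fractions(reported)

# If the balanced scheme was used, these fractions should sum to about 1.
print([round(f, 3) for f in fractions], round(sum(fractions), 3))
```

Under this assumption the training mix would be roughly 69% ADDR and 31% NOADDR, so errors on the minority NOADDR class are penalised a little more than twice as heavily as errors on ADDR.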
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "KingTechnician/osmosis-crossencoder-binary"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Encode the (objective, response) pair as a single cross-encoder input.
inputs = tokenizer("What causes rain?",
                   "Rain forms when water vapor condenses into droplets.",
                   return_tensors="pt", truncation=True, max_length=512)

# Inference only, so skip gradient tracking.
with torch.no_grad():
    logits = model(**inputs).logits

# Index 0 is ADDR, index 1 is NOADDR.
pred = logits.argmax(dim=-1).item()
print(["ADDR", "NOADDR"][pred])
```
## Limitations
- The NOADDR class is heterogeneous: on-topic-but-not-answering, tangential, and
off-topic responses all map to the same target. Sub-classification of NOADDR
requires a stronger model; see the cascade evaluation in the OSMoSIS repo.
- The near-ceiling synthetic Triage results validate the architecture but are not
representative of open-domain difficulty. Use the human gold-standard held-out
results (accuracy 0.776, macro F1 0.680) as the realistic deployment estimate.