	---
	tags:
	- deberta-v3
	- cross-encoder
	- osmosis
	- response-sufficiency
	- binary-classification
	language: en
	license: mit
	base_model: MoritzLaurer/deberta-v3-base-zeroshot-v2.0
	datasets:
	- KingTechnician/yahoo-answers-osmosis-labeled
	- KingTechnician/triage-synthetic-data-v1
	---

	# OSMoSIS Binary Cross-Encoder

	DeBERTa-v3 cross-encoder for binary response-sufficiency classification.
	Given `(objective, response)`, predicts ADDR (response addresses the objective)
	or NOADDR (response does not).

	## Intended use

	First stage of a cascaded pipeline. Confident binary predictions are used
	directly; low-confidence cases should route to an LLM judge for fine-grained
	classification.
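
The routing rule described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the `route` helper, the `0.9` threshold, and the `"accept"` / `"llm_judge"` destination names are all hypothetical, and the label order is taken from the Usage snippet in this card.

```python
import math

LABELS = ["ADDR", "NOADDR"]  # label order as used in the Usage snippet


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, threshold=0.9):
    """Return (label, confidence, destination) for one pair of logits.

    `threshold` is a hypothetical value; it should be tuned on held-out data.
    """
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    destination = "accept" if probs[idx] >= threshold else "llm_judge"
    return LABELS[idx], probs[idx], destination


print(route([3.2, -1.1]))  # confident prediction: kept as the final answer
print(route([0.3, 0.1]))   # low confidence: handed to the LLM judge
```

Raising the threshold sends more traffic to the LLM judge and lowers the error rate of the predictions used directly; the right trade-off depends on the cost of the second stage.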

	The model was trained on labels generated by Sonnet-4.6 (flat prompt,
	echo-stripped responses) and validated against 254 human-reviewed labels for
	deployment-grade evaluation.

	## Performance

	### Yahoo within-domain test

	Accuracy: 0.806 | Macro F1: 0.669

	| Class | Precision | Recall | F1 | Support |
	|---|---|---|---|---|
	| ADDR | 0.857 | 0.909 | 0.882 | 798 |
	| NOADDR | 0.526 | 0.401 | 0.455 | 202 |

	### Triage synthetic held-out (architecture validation)

	Accuracy: 0.995 | Macro F1: 0.994

	| Class | Precision | Recall | F1 | Support |
	|---|---|---|---|---|
	| ADDR | 1.000 | 0.987 | 0.993 | 150 |
	| NOADDR | 0.991 | 1.000 | 0.996 | 223 |

	### Human gold-standard held-out

	Accuracy: 0.776 | Macro F1: 0.680

	| Class | Precision | Recall | F1 | Support |
	|---|---|---|---|---|
	| ADDR | 0.824 | 0.889 | 0.855 | 189 |
	| NOADDR | 0.580 | 0.446 | 0.504 | 65 |
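
As a worked check of how the summary figures relate to the per-class rows, the sketch below recomputes F1, macro F1, and accuracy from the human gold-standard table. It assumes the standard definitions (F1 as the harmonic mean of precision and recall, macro F1 as the unweighted class mean, accuracy as support-weighted recall); the variable names are illustrative only.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, support) copied from the human gold-standard table
rows = {"ADDR": (0.824, 0.889, 189), "NOADDR": (0.580, 0.446, 65)}

per_class_f1 = {name: f1(p, r) for name, (p, r, _) in rows.items()}

# Macro F1 weights both classes equally, regardless of support.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Accuracy equals recall averaged over classes, weighted by support.
total = sum(s for _, _, s in rows.values())
accuracy = sum(r * s for _, r, s in rows.values()) / total

for name, value in per_class_f1.items():
    print(f"{name}: F1 = {value:.3f}")                       # ADDR 0.855, NOADDR 0.504
print(f"macro F1 = {macro_f1:.3f}, accuracy = {accuracy:.3f}")  # 0.680, 0.776
```

Because macro F1 ignores support, the weaker NOADDR class pulls it well below accuracy, which is dominated by the larger ADDR class.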


	## Training
	- Base: `MoritzLaurer/deberta-v3-base-zeroshot-v2.0` (NLI-pretrained)
	- Data: Yahoo Answers (echo-stripped) + Triage synthetic, joint training
	- Best epoch: 3 (selected by val macro-F1)
	- Batch size: 16, max length: 512, LR: 2e-05
	- Class weights: [0.724, 1.615]
	- Early stopping: patience 3 on val macro-F1
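
To illustrate what the class weights do, here is a minimal sketch of a weighted cross-entropy loss for a single example. The card does not state how the weights enter the loss, so the standard form `-w[target] * log softmax(logits)[target]` is an assumption, as is the index order (0 = ADDR, 1 = NOADDR, following the Usage snippet). The function name is hypothetical, and batch reduction is omitted.

```python
import math

# Weights from the Training list; index order is assumed to be [ADDR, NOADDR].
CLASS_WEIGHTS = [0.724, 1.615]


def weighted_cross_entropy(logits, target, weights=CLASS_WEIGHTS):
    """Per-example loss: -weights[target] * log softmax(logits)[target]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    log_prob = logits[target] - log_sum_exp
    return -weights[target] * log_prob


# With uninformative logits, the same uncertainty is penalized more heavily
# when the true class is the higher-weighted one.
loss_addr = weighted_cross_entropy([0.0, 0.0], target=0)
loss_noaddr = weighted_cross_entropy([0.0, 0.0], target=1)
print(round(loss_addr, 3), round(loss_noaddr, 3))  # 0.502 1.119
```

Under this assumption, a mistake on the class with weight 1.615 contributes a little over twice the gradient signal of the same mistake on the class with weight 0.724.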

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	model_id = "KingTechnician/osmosis-crossencoder-binary"
	model = AutoModelForSequenceClassification.from_pretrained(model_id)
	tokenizer = AutoTokenizer.from_pretrained(model_id)

	# Encode the (objective, response) pair as a single cross-encoder input.
	inputs = tokenizer(
	    "What causes rain?",
	    "Rain forms when water vapor condenses into droplets.",
	    return_tensors="pt", truncation=True, max_length=512,
	)
	# Inference only: no gradients needed.
	with torch.no_grad():
	    logits = model(**inputs).logits
	pred = logits.argmax(dim=-1).item()
	print(["ADDR", "NOADDR"][pred])  # index 0 = ADDR, index 1 = NOADDR
	```

	## Limitations

	- The NOADDR class is heterogeneous: on-topic-but-not-answering, tangential, and
	off-topic responses all map to the same target. Sub-classification of NOADDR
	requires a stronger model; see the cascade evaluation in the OSMoSIS repo.
	- The near-ceiling results on synthetic Triage data validate the architecture but
	are not representative of open-domain difficulty. Use the human held-out results
	(accuracy 0.776, macro F1 0.680) as the realistic deployment estimate.