---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- text-classification
- multi-label-classification
- dialogue
- conversational-ai
- gricean-maxims
- cooperative-communication
- deberta
- nlp
- pragmatics
datasets:
- topical-chat
metrics:
- f1
- precision
- recall
- roc_auc
pipeline_tag: text-classification
base_model: microsoft/deberta-v3-base
model-index:
- name: GriceBench-Detector
  results:
  - task:
      type: text-classification
      name: Multi-Label Gricean Maxim Violation Detection
    dataset:
      name: Topical-Chat (GriceBench held-out split, N=1000)
      type: topical-chat
      split: test
    metrics:
    - type: f1
      value: 0.955
      name: Macro F1
    - type: f1
      value: 1.000
      name: Quantity F1
    - type: f1
      value: 0.928
      name: Quality F1
    - type: f1
      value: 1.000
      name: Relation F1
    - type: f1
      value: 0.891
      name: Manner F1
---

<div align="center">

# 🔍 GriceBench-Detector

**Detects cooperative communication failures in AI dialogue, one Gricean maxim at a time.**

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[Hugging Face: Pushkar27](https://huggingface.co/Pushkar27)
[Python](https://www.python.org/downloads/)

**Part of the GriceBench system:**
[GitHub](https://github.com/PushkarPrabhath27/Research-Model) |
[🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair) |
[⚡ DPO Generator](https://huggingface.co/Pushkar27/GriceBench-DPO)

</div>

---

## What This Model Does

GriceBench-Detector identifies which of Paul Grice's four conversational maxims a dialogue response violates. It returns four independent, calibrated violation probabilities, one per maxim, enabling targeted and explainable repair downstream.

| | Output | Maxim | Violation Detected | Example | |
| |--------|-------|-------------------|---------| |
| | `quantity_prob` | Quantity | Response too short (<8 words) or too long (>38 words) | "Yes." to a detailed question | |
| | `quality_prob` | Quality | Factually inconsistent with knowledge evidence | Wrong date, incorrect name | |
| | `relation_prob` | Relation | Off-topic response | Jazz question answered with classical music facts | |
| | `manner_prob` | Manner | Ambiguous, jargon-heavy, or disorganized | Unclear pronoun references | |
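The Quantity condition in the table (responses under 8 or over 38 words) is a plain word-count rule, so it can be checked without the model. A minimal sketch of that rule; `quantity_heuristic` is a name made up for illustration, not part of the released code:

```python
def quantity_heuristic(response: str, low: int = 8, high: int = 38) -> bool:
    """Flag a likely Quantity violation by word count alone.

    Thresholds follow the table above; this mirrors the stated rule,
    not the detector model itself.
    """
    n_words = len(response.split())
    return n_words < low or n_words > high

print(quantity_heuristic("Yes."))  # True: far too short
```

The model itself learns these patterns from text, so it also handles cases the raw count misses (e.g. long but uninformative replies).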

Used in the full GriceBench pipeline, this detector helps achieve a **95.0% cooperative rate**, outperforming Mistral-7B-Instruct (89.1%) and Qwen2.5-7B-Instruct (84.2%).

---

## Quick Start

```python
import torch
import torch.nn as nn
import json
from transformers import AutoTokenizer, AutoModel

class MaximDetector(nn.Module):
    def __init__(self, model_name="microsoft/deberta-v3-base", num_maxims=4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # One independent binary classification head per maxim
        self.classifiers = nn.ModuleList([
            nn.Sequential(
                nn.Dropout(0.15),
                nn.Linear(hidden, hidden // 2), nn.GELU(),
                nn.Dropout(0.15),
                nn.Linear(hidden // 2, hidden // 4), nn.GELU(),
                nn.Dropout(0.15),
                nn.Linear(hidden // 4, 1)
            ) for _ in range(num_maxims)
        ])

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = outputs.last_hidden_state[:, 0, :]  # [CLS] token representation
        return torch.cat([head(cls) for head in self.classifiers], dim=1)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = MaximDetector()
state_dict = torch.load("pytorch_model.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Per-maxim calibration temperatures shipped with the model
with open("temperatures.json") as f:
    temperatures = json.load(f)

def detect_violations(context: str, response: str, evidence: str = "") -> dict:
    input_text = f"Context: {context}\nEvidence: {evidence}\nResponse: {response}"
    inputs = tokenizer(
        input_text, return_tensors="pt",
        max_length=512, truncation=True
    )

    maxim_names = ["quantity", "quality", "relation", "manner"]
    temp_values = [
        temperatures.get("quantity", 0.9),
        temperatures.get("quality", 0.55),
        temperatures.get("relation", 0.75),
        temperatures.get("manner", 0.45),
    ]

    with torch.no_grad():
        # Pass the tensors explicitly: the tokenizer may also return
        # token_type_ids, which this forward() does not accept.
        logits = model(inputs["input_ids"], inputs["attention_mask"])

    probs, violations = {}, {}
    for i, (maxim, temp) in enumerate(zip(maxim_names, temp_values)):
        prob = torch.sigmoid(logits[0, i] / temp).item()
        probs[maxim] = round(prob, 4)
        violations[maxim] = prob > 0.5

    return {
        "violations": violations,
        "probabilities": probs,
        "is_cooperative": not any(violations.values())
    }

result = detect_violations(
    context="What do you think about the latest developments in AI?",
    response="Yes.",
    evidence="AI has seen rapid advancement in large language models during 2024-2025."
)
print(result)
```

---

## Performance

Evaluated on **1,000 held-out Topical-Chat dialogue turns** (500 violation-injected, 500 clean).

| | Maxim | F1 | Precision | Recall | AUC-ROC | |
| |-------|-----|-----------|--------|---------| |
| | Quantity | **1.000** | 1.000 | 1.000 | 1.000 | |
| | Quality | 0.928 | 0.866 | 1.000 | 0.999 | |
| | Relation | **1.000** | 1.000 | 1.000 | 1.000 | |
| | Manner | 0.891 | 0.864 | 0.919 | 0.979 | |
| | **Macro Avg** | **0.955** | — | — | — | |
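The macro average is the unweighted mean of the four per-maxim F1 scores:

```python
# Per-maxim F1 values from the table above
per_maxim_f1 = {"quantity": 1.000, "quality": 0.928, "relation": 1.000, "manner": 0.891}

# Macro F1: unweighted mean across maxims
macro_f1 = sum(per_maxim_f1.values()) / len(per_maxim_f1)
print(round(macro_f1, 3))  # 0.955
```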

---

## Architecture & Training

- **Base model:** `microsoft/deberta-v3-base` (184M parameters)
- **Heads:** 4 independent binary classification heads (one per maxim)
- **Loss:** Focal Loss (α=0.25, γ=2.0) for class imbalance
- **Calibration:** Per-head temperature scaling (see `temperatures.json`)
- **Training data:** 4,012 examples (weak supervision + ~1,000 gold labels)
- **Epochs:** 5 | **LR:** 2e-5 | **Hardware:** Kaggle T4 ×2, ~2–3 hours
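Focal loss down-weights examples the model already classifies confidently, concentrating the gradient on hard cases and rare violation types. A minimal sketch of the binary variant with the α=0.25, γ=2.0 settings above, assuming the standard formulation (Lin et al.) rather than this repository's exact training code:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss over per-maxim logits (standard formulation)."""
    # Per-element BCE; p_t is the model's probability of the true class
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)
    # alpha_t weights positives by alpha, negatives by (1 - alpha)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - p_t)^gamma shrinks the loss on easy, well-classified examples
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```

Each of the four heads would apply this loss to its own logit, with targets in {0, 1}.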

**Calibrated temperatures:**

| | Maxim | Temperature | Effect | |
| |-------|-------------|--------| |
| | Quantity | 0.90 | Slightly sharper | |
| | Quality | 0.55 | Conservative (fewer false positives) | |
| | Relation | 0.75 | Balanced | |
| | Manner | 0.45 | Most conservative (subjective maxim) | |
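At inference, each head's logit is divided by its stored temperature before the sigmoid, exactly as in the Quick Start snippet. A small standalone illustration using the Manner temperature from the table; `calibrated_prob` is a helper name for this card, not part of the released code:

```python
import math

def calibrated_prob(logit: float, temperature: float) -> float:
    """Temperature-scaled sigmoid, as applied per maxim at inference time."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))

# Manner head, temperature 0.45 (value from the table above)
raw = 1.0 / (1.0 + math.exp(-1.0))   # uncalibrated sigmoid: ~0.731
cal = calibrated_prob(1.0, 0.45)     # calibrated probability: ~0.902
print(round(raw, 3), round(cal, 3))
```

Note that a zero logit maps to 0.5 regardless of temperature, so calibration reshapes confidence without moving the decision boundary itself.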

---

## Files

| | File | Description | |
| |------|-------------| |
| | `pytorch_model.pt` | Trained model weights | |
| | `temperatures.json` | Per-maxim calibration temperatures | |

---

## Limitations & Biases

- **Subjectivity:** The "Manner" maxim is inherently subjective; detection reflects the labels in the training set.
- **Domain specificity:** Performance is optimized for general-knowledge dialogue (Topical-Chat); results may vary in specialized domains.
- **English only:** The model is trained and evaluated exclusively on English dialogue.
- **Prompt sensitivity:** Detection results can be sensitive to the formatting of the "Evidence" field.

---

## Citation

```bibtex
@article{prabhath2026gricebench,
  title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
  author={Prabhath, Pushkar},
  year={2026},
  note={Under review, EMNLP 2026}
}
```

---

## Related Models

| | Model | Role | Link | |
| |-------|------|------| |
| | GriceBench-Detector | Detects violations (this model) | You are here | |
| | GriceBench-Repair | Repairs detected violations | [🔧 Repair](https://huggingface.co/Pushkar27/GriceBench-Repair) | |
| | GriceBench-DPO | Generates cooperative responses | [⚡ DPO](https://huggingface.co/Pushkar27/GriceBench-DPO) | |

**GitHub:** https://github.com/PushkarPrabhath27/Research-Model

---

## Environmental Impact

| Aspect | Value |
|--------|-------|
| Hardware Used | 2× NVIDIA Tesla T4 GPUs (Kaggle) |
| Training Time | ~3 hours |
| Estimated Carbon Footprint | ~0.45 kg CO2eq |