---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- text-classification
- multi-label-classification
- dialogue
- conversational-ai
- gricean-maxims
- cooperative-communication
- deberta
- nlp
- pragmatics
datasets:
- topical-chat
metrics:
- f1
- precision
- recall
- roc_auc
pipeline_tag: text-classification
base_model: microsoft/deberta-v3-base
model-index:
- name: GriceBench-Detector
  results:
  - task:
      type: text-classification
      name: Multi-Label Gricean Maxim Violation Detection
    dataset:
      name: Topical-Chat (GriceBench held-out split, N=1000)
      type: topical-chat
      split: test
    metrics:
    - type: f1
      value: 0.955
      name: Macro F1
    - type: f1
      value: 1.000
      name: Quantity F1
    - type: f1
      value: 0.928
      name: Quality F1
    - type: f1
      value: 1.000
      name: Relation F1
    - type: f1
      value: 0.891
      name: Manner F1
---
<div align="center">
# GriceBench-Detector
**Detects cooperative communication failures in AI dialogue, one Gricean maxim at a time.**
**Part of the GriceBench system:**
[GitHub](https://github.com/PushkarPrabhath27/Research-Model) |
[🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair) |
[⚡ DPO Generator](https://huggingface.co/Pushkar27/GriceBench-DPO)
</div>
---
## What This Model Does
GriceBench-Detector identifies which of Paul Grice's four conversational maxims a dialogue response violates. It returns four independent, calibrated violation probabilities (one per maxim), which enables targeted, explainable repair downstream.
| Output | Maxim | Violation Detected | Example |
|--------|-------|-------------------|---------|
| `quantity_prob` | Quantity | Response too short (<8 words) or too long (>38 words) | "Yes." to a detailed question |
| `quality_prob` | Quality | Factually inconsistent with knowledge evidence | Wrong date, incorrect name |
| `relation_prob` | Relation | Off-topic response | Jazz question answered with classical music facts |
| `manner_prob` | Manner | Ambiguous, jargon-heavy, or disorganized | Unclear pronoun references |
Used in the full GriceBench pipeline, this detector helps achieve a **95.0% cooperative rate**, outperforming Mistral-7B-Instruct (89.1%) and Qwen2.5-7B-Instruct (84.2%).
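The Quantity row in the table above is the only criterion stated as a concrete rule (fewer than 8 or more than 38 words). As a rough illustration of that rule alone, and not of what the trained detector computes internally, a length check might look like the sketch below. Whitespace tokenization and the function name `violates_quantity_rule` are assumptions; the card does not say how words are counted.

```python
def violates_quantity_rule(response: str, min_words: int = 8, max_words: int = 38) -> bool:
    """Return True when the response falls outside the stated word-count band."""
    # Assumption: a "word" is any whitespace-separated token.
    n_words = len(response.split())
    return n_words < min_words or n_words > max_words


# "Yes." is a single word, so it falls below the 8-word minimum.
print(violates_quantity_rule("Yes."))
```

A rule like this can serve as a quick sanity check next to `quantity_prob`, but the model's probability remains the documented output.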
---
## Quick Start
```python
import json

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class MaximDetector(nn.Module):
    """DeBERTa encoder with one binary classification head per maxim."""

    def __init__(self, model_name="microsoft/deberta-v3-base", num_maxims=4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.classifiers = nn.ModuleList([
            nn.Sequential(
                nn.Dropout(0.15),
                nn.Linear(hidden, hidden // 2), nn.GELU(),
                nn.Dropout(0.15),
                nn.Linear(hidden // 2, hidden // 4), nn.GELU(),
                nn.Dropout(0.15),
                nn.Linear(hidden // 4, 1)
            ) for _ in range(num_maxims)
        ])

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = outputs.last_hidden_state[:, 0, :]  # first-token ([CLS]) representation
        # One logit per maxim, concatenated to shape (batch, num_maxims)
        return torch.cat([head(cls) for head in self.classifiers], dim=1)


tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = MaximDetector()

# Load the trained weights from this repository (see "Files")
state_dict = torch.load("pytorch_model.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Per-maxim calibration temperatures
with open("temperatures.json") as f:
    temperatures = json.load(f)


def detect_violations(context: str, response: str, evidence: str = "") -> dict:
    input_text = f"Context: {context}\nEvidence: {evidence}\nResponse: {response}"
    inputs = tokenizer(
        input_text, return_tensors="pt",
        max_length=512, truncation=True, padding=True
    )

    maxim_names = ["quantity", "quality", "relation", "manner"]
    # Fall back to the documented temperatures if a key is missing
    temp_values = [
        temperatures.get("quantity", 0.9),
        temperatures.get("quality", 0.55),
        temperatures.get("relation", 0.75),
        temperatures.get("manner", 0.45),
    ]

    with torch.no_grad():
        # Pass only the tensors forward() accepts; the tokenizer may also
        # return token_type_ids, which would break model(**inputs).
        logits = model(inputs["input_ids"], inputs["attention_mask"])

    probs, violations = {}, {}
    for i, (maxim, temp) in enumerate(zip(maxim_names, temp_values)):
        # Temperature-scaled sigmoid gives the calibrated violation probability
        prob = torch.sigmoid(logits[0, i] / temp).item()
        probs[maxim] = round(prob, 4)
        violations[maxim] = prob > 0.5

    return {
        "violations": violations,
        "probabilities": probs,
        "is_cooperative": not any(violations.values())
    }


result = detect_violations(
    context="What do you think about the latest developments in AI?",
    response="Yes.",
    evidence="AI has seen rapid advancement in large language models during 2024-2025."
)
print(result)
```
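The dictionary returned by `detect_violations` can be used to decide what to repair first. The following is a minimal sketch, assuming a caller wants the flagged maxims ordered from most to least confident. The helper name `flagged_maxims` and the sample probabilities are hypothetical; they are not output from the model.

```python
from typing import Dict, List


def flagged_maxims(result: Dict, threshold: float = 0.5) -> List[str]:
    """List maxims whose probability exceeds the threshold, most confident first."""
    probs = result["probabilities"]
    flagged = [maxim for maxim, p in probs.items() if p > threshold]
    return sorted(flagged, key=lambda maxim: probs[maxim], reverse=True)


# Illustrative values only, shaped like the detect_violations() return value.
sample = {
    "violations": {"quantity": True, "quality": False, "relation": False, "manner": True},
    "probabilities": {"quantity": 0.97, "quality": 0.12, "relation": 0.08, "manner": 0.61},
    "is_cooperative": False,
}
print(flagged_maxims(sample))  # ['quantity', 'manner']
```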
---
## Performance
Evaluated on **1,000 held-out Topical-Chat dialogue turns** (500 violation-injected, 500 clean).
| Maxim | F1 | Precision | Recall | AUC-ROC |
|-------|-----|-----------|--------|---------|
| Quantity | **1.000** | 1.000 | 1.000 | 1.000 |
| Quality | 0.928 | 0.866 | 1.000 | 0.999 |
| Relation | **1.000** | 1.000 | 1.000 | 1.000 |
| Manner | 0.891 | 0.864 | 0.919 | 0.979 |
| **Macro Avg** | **0.955** | - | - | - |
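The figures in the table can be cross-checked with a little arithmetic: each F1 is the harmonic mean of its precision and recall, and the macro average is the unweighted mean of the four per-maxim F1 scores. The sketch below recomputes both from the reported numbers; the tolerance of 0.001 is an assumption that allows for rounding in the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above
reported = {
    "quantity": (1.000, 1.000, 1.000),
    "quality": (0.866, 1.000, 0.928),
    "relation": (1.000, 1.000, 1.000),
    "manner": (0.864, 0.919, 0.891),
}

for maxim, (precision, recall, f1) in reported.items():
    # Each recomputed F1 should match the reported value up to rounding
    assert abs(f1_score(precision, recall) - f1) < 1e-3, maxim

macro_f1 = sum(f1 for _, _, f1 in reported.values()) / len(reported)
assert abs(macro_f1 - 0.955) < 1e-3
print("per-maxim and macro F1 are consistent with the table")
```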
---
## Architecture & Training
- **Base model:** `microsoft/deberta-v3-base` (184M parameters)
- **Heads:** 4 independent binary classification heads (one per maxim)
- **Loss:** Focal Loss (α=0.25, γ=2.0) for class imbalance
- **Calibration:** Per-head temperature scaling (see `temperatures.json`)
- **Training data:** 4,012 examples (weak supervision + ~1,000 gold labels)
- **Epochs:** 5 | **LR:** 2e-5 | **Hardware:** Kaggle T4 ×2, ~2-3 hours
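To make the loss setting above concrete, here is a scalar sketch of binary focal loss with α=0.25 and γ=2.0. The training code is not part of this card, so this is an illustration only. It assumes the common weighting convention (α for positive targets, 1-α for negative targets), and the function name `binary_focal_loss` is hypothetical.

```python
import math


def binary_focal_loss(logit: float, target: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss for one binary prediction: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p = 1.0 / (1.0 + math.exp(-logit))               # predicted violation probability
    p_t = p if target == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha  # assumed class-weighting convention
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct example contributes far less than a confidently wrong one.
easy = binary_focal_loss(logit=4.0, target=1)
hard = binary_focal_loss(logit=-4.0, target=1)
print(easy < hard)  # True
```

The `(1 - p_t) ** gamma` factor is what down-weights easy examples, so training focuses on the rarer, harder cases.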
**Calibrated temperatures:**
| Maxim | Temperature | Effect |
|-------|-------------|--------|
| Quantity | 0.90 | Slightly sharper |
| Quality | 0.55 | Conservative (fewer false positives) |
| Relation | 0.75 | Balanced |
| Manner | 0.45 | Most conservative (subjective maxim) |
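Per-head temperature scaling divides each head's logit by its temperature before the sigmoid, as in the Quick Start code. The sketch below applies the documented temperatures to a single example logit; the logit value and the function name `calibrated_prob` are illustrative assumptions.

```python
import math

# Temperatures as listed in the table above
TEMPERATURES = {"quantity": 0.90, "quality": 0.55, "relation": 0.75, "manner": 0.45}


def calibrated_prob(logit: float, temperature: float) -> float:
    """Sigmoid of the temperature-scaled logit."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))


example_logit = 1.0  # hypothetical raw head output
for maxim, temperature in TEMPERATURES.items():
    unscaled = calibrated_prob(example_logit, 1.0)
    scaled = calibrated_prob(example_logit, temperature)
    # A temperature below 1 moves the probability further from 0.5
    print(f"{maxim}: unscaled={unscaled:.3f} scaled={scaled:.3f}")
```

With a fixed 0.5 threshold, the decision depends only on the sign of the logit, so in this setup temperature changes the reported confidence rather than which maxims are flagged.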
---
## Files
| File | Description |
|------|-------------|
| `pytorch_model.pt` | Trained model weights |
| `temperatures.json` | Per-maxim calibration temperatures |
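The Quick Start expects both files in the working directory. One way to obtain them is `hf_hub_download` from the `huggingface_hub` package. The sketch below assumes the files sit at the repository root under the names listed above; the helper name `fetch_detector_files` and its injectable `download` argument are hypothetical conveniences for testing without network access.

```python
from typing import Callable, Dict, Optional

REPO_ID = "Pushkar27/GriceBench-Detector"
FILES = ("pytorch_model.pt", "temperatures.json")  # assumed to be at the repo root


def fetch_detector_files(download: Optional[Callable[..., str]] = None) -> Dict[str, str]:
    """Return a mapping from file name to its local cached path."""
    if download is None:
        # Imported lazily so the helper can be exercised without the package installed
        from huggingface_hub import hf_hub_download
        download = hf_hub_download
    return {name: download(repo_id=REPO_ID, filename=name) for name in FILES}


# Usage (requires network access and the huggingface_hub package):
# paths = fetch_detector_files()
# state_dict_path = paths["pytorch_model.pt"]
```

The returned paths can be passed to `torch.load` and `open` in place of the bare file names used in the Quick Start.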
---
## Limitations & Biases
- **Subjectivity:** The "Manner" maxim is inherently subjective; detection reflects the labels in the training set.
- **Domain Specificity:** Performance is optimized for general knowledge dialogue (Topical-Chat). Results may vary in specialized domains.
- **English-Only:** This model is trained and evaluated exclusively on English dialogue.
- **Prompt Sensitivity:** Detection results can be sensitive to the formatting of the "Evidence" field.
---
## Citation
```bibtex
@article{prabhath2026gricebench,
title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
author={Prabhath, Pushkar},
year={2026},
note={Under review, EMNLP 2026}
}
```
---
## Related Models
| Model | Role | Link |
|-------|------|------|
| GriceBench-Detector | Detects violations (this model) | You are here |
| GriceBench-Repair | Repairs detected violations | [🔧 Repair](https://huggingface.co/Pushkar27/GriceBench-Repair) |
| GriceBench-DPO | Generates cooperative responses | [⚡ DPO](https://huggingface.co/Pushkar27/GriceBench-DPO) |
**GitHub:** https://github.com/PushkarPrabhath27/Research-Model
---
## Environmental Impact
| Aspect | Value |
|--------|-------|
| Hardware Used | 2x NVIDIA Tesla T4 GPUs (Kaggle) |
| Training Time | ~3 hours |
| Estimated Carbon Footprint | ~0.45 kg CO2eq |