---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- text-classification
- multi-label-classification
- dialogue
- conversational-ai
- gricean-maxims
- cooperative-communication
- deberta
- nlp
- pragmatics
datasets:
- topical-chat
metrics:
- f1
- precision
- recall
- roc_auc
pipeline_tag: text-classification
base_model: microsoft/deberta-v3-base
model-index:
- name: GriceBench-Detector
  results:
  - task:
      type: text-classification
      name: Multi-Label Gricean Maxim Violation Detection
    dataset:
      name: Topical-Chat (GriceBench held-out split, N=1000)
      type: topical-chat
      split: test
    metrics:
    - type: f1
      value: 0.955
      name: Macro F1
    - type: f1
      value: 1.0
      name: Quantity F1
    - type: f1
      value: 0.928
      name: Quality F1
    - type: f1
      value: 1.0
      name: Relation F1
    - type: f1
      value: 0.891
      name: Manner F1
---

# GriceBench-Detector

*Detects cooperative communication failures in AI dialogue, one Gricean maxim at a time.*

Part of the GriceBench system: [GitHub](https://github.com/PushkarPrabhath27/Research-Model) | Repair Model | DPO Generator
## What This Model Does

GriceBench-Detector identifies which of Paul Grice's four conversational maxims a dialogue response violates. It returns four independent, calibrated violation probabilities (one per maxim), enabling targeted, explainable repair downstream.
| Output | Maxim | Violation Detected | Example |
|---|---|---|---|
| `quantity_prob` | Quantity | Response too short (<8 words) or too long (>38 words) | "Yes." to a detailed question |
| `quality_prob` | Quality | Factually inconsistent with knowledge evidence | Wrong date, incorrect name |
| `relation_prob` | Relation | Off-topic response | Jazz question answered with classical-music facts |
| `manner_prob` | Manner | Ambiguous, jargon-heavy, or disorganized | Unclear pronoun references |
Used in the full GriceBench pipeline, this detector helps achieve a 95.0% cooperative rate, outperforming Mistral-7B-Instruct (89.1%) and Qwen2.5-7B-Instruct (84.2%).
## Quick Start

```python
import torch
import torch.nn as nn
import json
from transformers import AutoTokenizer, AutoModel

class MaximDetector(nn.Module):
    def __init__(self, model_name="microsoft/deberta-v3-base", num_maxims=4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.classifiers = nn.ModuleList([
            nn.Sequential(
                nn.Dropout(0.15),
                nn.Linear(hidden, hidden // 2), nn.GELU(),
                nn.Dropout(0.15),
                nn.Linear(hidden // 2, hidden // 4), nn.GELU(),
                nn.Dropout(0.15),
                nn.Linear(hidden // 4, 1)
            ) for _ in range(num_maxims)
        ])

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = outputs.last_hidden_state[:, 0, :]  # [CLS] token representation
        return torch.cat([head(cls) for head in self.classifiers], dim=1)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = MaximDetector()
state_dict = torch.load("pytorch_model.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

with open("temperatures.json") as f:
    temperatures = json.load(f)

def detect_violations(context: str, response: str, evidence: str = "") -> dict:
    input_text = f"Context: {context}\nEvidence: {evidence}\nResponse: {response}"
    inputs = tokenizer(
        input_text, return_tensors="pt",
        max_length=512, truncation=True, padding=True
    )
    maxim_names = ["quantity", "quality", "relation", "manner"]
    temp_values = [
        temperatures.get("quantity", 0.9),
        temperatures.get("quality", 0.55),
        temperatures.get("relation", 0.75),
        temperatures.get("manner", 0.45),
    ]
    with torch.no_grad():
        # Pass the tensors explicitly: the tokenizer may also return
        # token_type_ids, which this forward() does not accept.
        logits = model(inputs["input_ids"], inputs["attention_mask"])
    probs, violations = {}, {}
    for i, (maxim, temp) in enumerate(zip(maxim_names, temp_values)):
        prob = torch.sigmoid(logits[0, i] / temp).item()
        probs[maxim] = round(prob, 4)
        violations[maxim] = prob > 0.5
    return {
        "violations": violations,
        "probabilities": probs,
        "is_cooperative": not any(violations.values())
    }

result = detect_violations(
    context="What do you think about the latest developments in AI?",
    response="Yes.",
    evidence="AI has seen rapid advancement in large language models during 2024-2025."
)
print(result)
```
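The returned dictionary can drive targeted repair downstream. The sketch below shows one way to route flagged maxims to a repair step; the `select_repairs` helper and the hardcoded `example` result are hypothetical illustrations, not part of the GriceBench pipeline.

```python
def select_repairs(result: dict, threshold: float = 0.5) -> list:
    """Return maxims whose calibrated probability exceeds the threshold,
    most confident first, so repair can target the worst violation."""
    flagged = [(m, p) for m, p in result["probabilities"].items() if p > threshold]
    return [m for m, _ in sorted(flagged, key=lambda mp: mp[1], reverse=True)]

# Hypothetical detector output with the same shape as detect_violations().
example = {
    "violations": {"quantity": True, "quality": False, "relation": False, "manner": True},
    "probabilities": {"quantity": 0.97, "quality": 0.12, "relation": 0.31, "manner": 0.64},
    "is_cooperative": False,
}
print(select_repairs(example))  # ['quantity', 'manner']
```

Raising the threshold trades recall for precision; with `threshold=0.9` only the Quantity violation above would be routed to repair.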
## Performance

Evaluated on 1,000 held-out Topical-Chat dialogue turns (500 violation-injected, 500 clean).
| Maxim | F1 | Precision | Recall | AUC-ROC |
|---|---|---|---|---|
| Quantity | 1.000 | 1.000 | 1.000 | 1.000 |
| Quality | 0.928 | 0.866 | 1.000 | 0.999 |
| Relation | 1.000 | 1.000 | 1.000 | 1.000 |
| Manner | 0.891 | 0.864 | 0.919 | 0.979 |
| Macro Avg | 0.955 | β | β | β |
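The macro average is simply the unweighted mean of the four per-maxim F1 scores:

```python
# Per-maxim F1 scores from the table above.
per_maxim_f1 = {"quantity": 1.000, "quality": 0.928, "relation": 1.000, "manner": 0.891}

# Macro F1 = unweighted mean across maxims.
macro_f1 = sum(per_maxim_f1.values()) / len(per_maxim_f1)
# macro_f1 is about 0.9548, which matches the reported 0.955 to rounding
```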
## Architecture & Training

- **Base model:** `microsoft/deberta-v3-base` (184M parameters)
- **Heads:** 4 independent binary classification heads (one per maxim)
- **Loss:** Focal Loss (α=0.25, γ=2.0) for class imbalance
- **Calibration:** Per-head temperature scaling (see `temperatures.json`)
- **Training data:** 4,012 examples (weak supervision + ~1,000 gold labels)
- **Epochs:** 5 | **LR:** 2e-5 | **Hardware:** Kaggle T4 ×2, ~2-3 hours
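Focal loss down-weights examples the model already classifies confidently, so training focuses on hard cases. A minimal single-logit, plain-Python sketch of binary focal loss with the α=0.25, γ=2.0 settings above (the actual training code operates on tensors and is not included here):

```python
import math

def binary_focal_loss(logit: float, target: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for one logit: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p = 1.0 / (1.0 + math.exp(-logit))      # sigmoid probability
    p_t = p if target == 1 else 1.0 - p     # probability assigned to the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct prediction contributes almost nothing to the loss...
easy = binary_focal_loss(4.0, 1)
# ...while a confidently wrong one still produces a large loss.
hard = binary_focal_loss(-2.0, 1)
```

The `(1 - p_t)^gamma` factor is what suppresses easy examples; α balances the positive and negative classes under label imbalance.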
Calibrated temperatures:
| Maxim | Temperature | Effect |
|---|---|---|
| Quantity | 0.90 | Slightly sharper |
| Quality | 0.55 | Conservative (fewer false positives) |
| Relation | 0.75 | Balanced |
| Manner | 0.45 | Most conservative (subjective maxim) |
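Calibration divides each head's raw logit by its temperature before the sigmoid. A minimal sketch of the mechanism; note that dividing by a positive temperature preserves the logit's sign, so the 0.5 decision threshold is unchanged and only the reported confidence is rescaled:

```python
import math

def calibrated_prob(logit: float, temperature: float) -> float:
    # Divide the raw logit by the per-maxim temperature, then apply the sigmoid.
    return 1.0 / (1.0 + math.exp(-logit / temperature))

raw = calibrated_prob(1.0, 1.0)      # uncalibrated: ~0.731
manner = calibrated_prob(1.0, 0.45)  # Manner head (T=0.45): ~0.902
```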
## Files

| File | Description |
|---|---|
| `pytorch_model.pt` | Trained model weights |
| `temperatures.json` | Per-maxim calibration temperatures |
## Limitations & Biases
- Subjectivity: The "Manner" maxim is inherently subjective; detection reflects the labels in the training set.
- Domain Specificity: Performance is optimized for general knowledge dialogue (Topical-Chat). Results may vary in specialized domains.
- English-Only: This model is trained and evaluated exclusively on English dialogue.
- Prompt Sensitivity: Detection results can be sensitive to the formatting of the "Evidence" field.
## Citation

```bibtex
@article{prabhath2026gricebench,
  title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
  author={Prabhath, Pushkar},
  year={2026},
  note={Under review, EMNLP 2026}
}
```
## Related Models

| Model | Role | Link |
|---|---|---|
| GriceBench-Detector | Detects violations (this model) | You are here |
| GriceBench-Repair | Repairs detected violations | Repair |
| GriceBench-DPO | Generates cooperative responses | DPO |

GitHub: https://github.com/PushkarPrabhath27/Research-Model
## Environmental Impact
| Aspect | Value |
|---|---|
| Hardware Used | 2x NVIDIA Tesla T4 GPUs (Kaggle) |
| Training Time | ~3 hours |
| Estimated Carbon Footprint | ~0.45 kg CO2eq |