--- language: - en license: apache-2.0 library_name: transformers tags: - text-classification - multi-label-classification - dialogue - conversational-ai - gricean-maxims - cooperative-communication - deberta - nlp - pragmatics datasets: - topical-chat metrics: - f1 - precision - recall - roc_auc pipeline_tag: text-classification base_model: microsoft/deberta-v3-base model-index: - name: GriceBench-Detector results: - task: type: text-classification name: Multi-Label Gricean Maxim Violation Detection dataset: name: Topical-Chat (GriceBench held-out split, N=1000) type: topical-chat split: test metrics: - type: f1 value: 0.955 name: Macro F1 - type: f1 value: 1.000 name: Quantity F1 - type: f1 value: 0.928 name: Quality F1 - type: f1 value: 1.000 name: Relation F1 - type: f1 value: 0.891 name: Manner F1 ---
# 🔍 GriceBench-Detector **Detects cooperative communication failures in AI dialogue — one Gricean maxim at a time.** [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![HuggingFace](https://img.shields.io/badge/🤗-GriceBench-yellow)](https://huggingface.co/Pushkar27) [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/) **Part of the GriceBench system** — [GitHub](https://github.com/PushkarPrabhath27/Research-Model) | [🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair) | [⚡ DPO Generator](https://huggingface.co/Pushkar27/GriceBench-DPO)
--- ## What This Model Does GriceBench-Detector identifies which of Paul Grice's four conversational maxims a dialogue response violates. It returns four independent calibrated violation probabilities — one per maxim — enabling targeted, explainable repair downstream. | Output | Maxim | Violation Detected | Example | |--------|-------|-------------------|---------| | `quantity_prob` | Quantity | Response too short (<8 words) or too long (>38 words) | "Yes." to a detailed question | | `quality_prob` | Quality | Factually inconsistent with knowledge evidence | Wrong date, incorrect name | | `relation_prob` | Relation | Off-topic response | Jazz question answered with classical music facts | | `manner_prob` | Manner | Ambiguous, jargon-heavy, or disorganized | Unclear pronoun references | Used in the full GriceBench pipeline, this detector helps achieve a **95.0% cooperative rate** — outperforming Mistral-7B-Instruct (89.1%) and Qwen2.5-7B-Instruct (84.2%). --- ## Quick Start ```python import torch import torch.nn as nn import json from transformers import AutoTokenizer, AutoModel class MaximDetector(nn.Module): def __init__(self, model_name="microsoft/deberta-v3-base", num_maxims=4): super().__init__() self.encoder = AutoModel.from_pretrained(model_name) hidden = self.encoder.config.hidden_size self.classifiers = nn.ModuleList([ nn.Sequential( nn.Dropout(0.15), nn.Linear(hidden, hidden // 2), nn.GELU(), nn.Dropout(0.15), nn.Linear(hidden // 2, hidden // 4), nn.GELU(), nn.Dropout(0.15), nn.Linear(hidden // 4, 1) ) for _ in range(num_maxims) ]) def forward(self, input_ids, attention_mask): outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask) cls = outputs.last_hidden_state[:, 0, :] return torch.cat([head(cls) for head in self.classifiers], dim=1) tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base") model = MaximDetector() state_dict = torch.load("pytorch_model.pt", map_location="cpu") model.load_state_dict(state_dict) model.eval() with open("temperatures.json") as f: temperatures = json.load(f) def detect_violations(context: str, response: str, evidence: str = "") -> dict: input_text = f"Context: {context}\nEvidence: {evidence}\nResponse: {response}" inputs = tokenizer( input_text, return_tensors="pt", max_length=512, truncation=True, padding=True ) maxim_names = ["quantity", "quality", "relation", "manner"] temp_values = [ temperatures.get("quantity", 0.9), temperatures.get("quality", 0.55), temperatures.get("relation", 0.75), temperatures.get("manner", 0.45), ] with torch.no_grad(): logits = model(**inputs) probs, violations = {}, {} for i, (maxim, temp) in enumerate(zip(maxim_names, temp_values)): prob = torch.sigmoid(logits[0, i] / temp).item() probs[maxim] = round(prob, 4) violations[maxim] = prob > 0.5 return { "violations": violations, "probabilities": probs, "is_cooperative": not any(violations.values()) } result = detect_violations( context="What do you think about the latest developments in AI?", response="Yes.", evidence="AI has seen rapid advancement in large language models during 2024-2025." ) print(result) ``` --- ## Performance Evaluated on **1,000 held-out Topical-Chat dialogue turns** (500 violation-injected, 500 clean). | Maxim | F1 | Precision | Recall | AUC-ROC | |-------|-----|-----------|--------|---------| | Quantity | **1.000** | 1.000 | 1.000 | 1.000 | | Quality | 0.928 | 0.866 | 1.000 | 0.999 | | Relation | **1.000** | 1.000 | 1.000 | 1.000 | | Manner | 0.891 | 0.864 | 0.919 | 0.979 | | **Macro Avg** | **0.955** | — | — | — | --- ## Architecture & Training - **Base model:** `microsoft/deberta-v3-base` (184M parameters) - **Heads:** 4 independent binary classification heads (one per maxim) - **Loss:** Focal Loss (α=0.25, γ=2.0) for class imbalance - **Calibration:** Per-head temperature scaling (see `temperatures.json`) - **Training data:** 4,012 examples (weak supervision + ~1,000 gold labels) - **Epochs:** 5 | **LR:** 2e-5 | **Hardware:** Kaggle T4 ×2, ~2–3 hours **Calibrated temperatures:** | Maxim | Temperature | Effect | |-------|-------------|--------| | Quantity | 0.90 | Slightly sharper | | Quality | 0.55 | Conservative (fewer false positives) | | Relation | 0.75 | Balanced | | Manner | 0.45 | Most conservative (subjective maxim) | --- ## Files | File | Description | |------|-------------| | `pytorch_model.pt` | Trained model weights | | `temperatures.json` | Per-maxim calibration temperatures | --- ## Limitations & Biases - **Subjectivity:** The "Manner" maxim is inherently subjective; detection reflects the labels in the training set. - **Domain Specificity:** Performance is optimized for general knowledge dialogue (Topical-Chat). Results may vary in specialized domains. - **English-Only:** This model is trained and evaluated exclusively on English dialogue. - **Prompt Sensitivity:** Detection results can be sensitive to the formatting of the "Evidence" field. --- ## Citation ```bibtex @article{prabhath2026gricebench, title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation}, author={Prabhath, Pushkar}, year={2026}, note={Under review, EMNLP 2026} } ``` --- ## Related Models | Model | Role | Link | |-------|------|------| | GriceBench-Detector | Detects violations (this model) | You are here | | GriceBench-Repair | Repairs detected violations | [🔧 Repair](https://huggingface.co/Pushkar27/GriceBench-Repair) | | GriceBench-DPO | Generates cooperative responses | [⚡ DPO](https://huggingface.co/Pushkar27/GriceBench-DPO) | **GitHub:** https://github.com/PushkarPrabhath27/Research-Model --- ## Environmental Impact | Aspect | Value | |--------|-------| | Hardware Used | 2x NVIDIA Tesla T4 GPUs (Kaggle) | | Training Time | ~3 hours | | Estimated Carbon Footprint | ~0.45 kg CO2eq