Text Classification
Transformers
English
multi-label-classification
dialogue
conversational-ai
gricean-maxims
cooperative-communication
deberta
nlp
pragmatics
Eval Results (legacy)
docs: upgrade to production-quality model card with Limitations and Environmental Impact
README.md
CHANGED
|
@@ -1,4 +1,4 @@
|
|
| 1 |
-
---
|
| 2 |
language:
|
| 3 |
- en
|
| 4 |
license: apache-2.0
|
|
@@ -52,15 +52,16 @@ model-index:
|
|
| 52 |
|
| 53 |
<div align="center">
|
| 54 |
|
| 55 |
-
#
|
| 56 |
|
| 57 |
-
**Detects cooperative communication failures in AI dialogue, one maxim at a time.**
|
| 58 |
|
| 59 |
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
|
|
|
|
| 60 |
[Python](https://www.python.org/downloads/)
|
| 61 |
-
[Transformers](https://huggingface.co/docs/transformers)
|
| 62 |
|
| 63 |
-
Part of the
|
|
|
|
| 64 |
[🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair) |
|
| 65 |
[⚡ DPO Generator](https://huggingface.co/Pushkar27/GriceBench-DPO)
|
| 66 |
|
|
@@ -71,32 +72,36 @@ Part of the **GriceBench** system: [GitHub](https://github.com/PushkarPrabhat
|
|
| 71 |
## What This Model Does
|
| 72 |
|
| 73 |
GriceBench-Detector identifies which of Paul Grice's four conversational maxims
|
| 74 |
-
a dialogue response violates. It returns four independent
|
| 75 |
-
one per maxim, enabling targeted, explainable repair.
|
| 76 |
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
| **Relation** | Topical relevance | Responding to "Tell me about jazz" with information about classical music |
|
| 82 |
-
| **Manner** | Clarity and organization | Pronoun ambiguity, jargon, disorganized sentences |
|
| 83 |
|
| 84 |
---
|
| 85 |
|
| 86 |
## Quick Start
|
| 87 |
|
| 88 |
```python
|
| 89 |
-
from transformers import AutoTokenizer, AutoModel
|
| 90 |
import torch
|
| 91 |
import torch.nn as nn
|
| 92 |
import json
|
|
|
|
| 93 |
|
| 94 |
-
# ──
|
| 95 |
-
# Download temperatures.json from the model repo
|
| 96 |
-
with open("temperatures.json") as f:
|
| 97 |
-
temperatures = json.load(f) # {"quantity": 0.9, "quality": 0.55, ...}
|
| 98 |
-
|
| 99 |
-
# ── Define model architecture (must match training) ────────────────────────
|
| 100 |
class MaximDetector(nn.Module):
|
| 101 |
def __init__(self, model_name="microsoft/deberta-v3-base", num_maxims=4):
|
| 102 |
super().__init__()
|
|
@@ -118,23 +123,24 @@ class MaximDetector(nn.Module):
|
|
| 118 |
cls = outputs.last_hidden_state[:, 0, :]
|
| 119 |
return torch.cat([head(cls) for head in self.classifiers], dim=1)
|
| 120 |
|
| 121 |
-
# ── Load model ──────────────────────────────────────────────
|
| 122 |
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
|
| 123 |
model = MaximDetector()
|
| 124 |
-
|
| 125 |
-
# Load weights (download pytorch_model.pt from this repo)
|
| 126 |
state_dict = torch.load("pytorch_model.pt", map_location="cpu")
|
| 127 |
model.load_state_dict(state_dict)
|
| 128 |
model.eval()
|
| 129 |
|
| 130 |
-
|
| 131 |
def detect_violations(context: str, response: str, evidence: str = "") -> dict:
|
| 132 |
input_text = f"Context: {context}\nEvidence: {evidence}\nResponse: {response}"
|
| 133 |
inputs = tokenizer(
|
| 134 |
-
input_text, return_tensors="pt",
|
| 135 |
-
truncation=True, padding=True
|
| 136 |
)
|
| 137 |
-
|
| 138 |
maxim_names = ["quantity", "quality", "relation", "manner"]
|
| 139 |
temp_values = [
|
| 140 |
temperatures.get("quantity", 0.9),
|
|
@@ -142,142 +148,71 @@ def detect_violations(context: str, response: str, evidence: str = "") -> dict:
|
|
| 142 |
temperatures.get("relation", 0.75),
|
| 143 |
temperatures.get("manner", 0.45),
|
| 144 |
]
|
| 145 |
-
|
| 146 |
with torch.no_grad():
|
| 147 |
-
logits = model(**inputs)
|
| 148 |
-
|
| 149 |
-
|
| 150 |
-
probs = {}
|
| 151 |
-
violations = {}
|
| 152 |
for i, (maxim, temp) in enumerate(zip(maxim_names, temp_values)):
|
| 153 |
prob = torch.sigmoid(logits[0, i] / temp).item()
|
| 154 |
probs[maxim] = round(prob, 4)
|
| 155 |
violations[maxim] = prob > 0.5
|
| 156 |
-
|
| 157 |
return {
|
| 158 |
"violations": violations,
|
| 159 |
"probabilities": probs,
|
| 160 |
"is_cooperative": not any(violations.values())
|
| 161 |
}
|
| 162 |
-
|
| 163 |
-
# ── Example ────────────────────────────────────────────────────────────────
|
| 164 |
-
result = detect_violations(
|
| 165 |
-
context="What do you think about the latest developments in AI?",
|
| 166 |
-
response="Yes.", # Too short → Quantity violation
|
| 167 |
-
evidence="AI has seen rapid advancement in large language models during 2024-2025."
|
| 168 |
-
)
|
| 169 |
-
print(result)
|
| 170 |
-
# {'violations': {'quantity': True, 'quality': False, 'relation': False, 'manner': False},
|
| 171 |
-
# 'probabilities': {'quantity': 0.97, 'quality': 0.02, 'relation': 0.03, 'manner': 0.11},
|
| 172 |
-
# 'is_cooperative': False}
|
| 173 |
```
|
| 174 |
|
| 175 |
---
|
| 176 |
|
| 177 |
-
##
|
| 178 |
|
| 179 |
Evaluated on **1,000 held-out Topical-Chat dialogue turns** (500 violation-injected, 500 clean).
|
| 180 |
|
| 181 |
| Maxim | F1 | Precision | Recall | AUC-ROC |
|
| 182 |
|-------|-----|-----------|--------|---------|
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
|
| 186 |
-
|
| 187 |
-
|
| 188 |
-
|
| 189 |
-
**System-level result:** When used in the full GriceBench pipeline (Detect → Repair → Generate),
|
| 190 |
-
the system achieves a **95.0% cooperative rate**, outperforming Mistral-7B (89.1%) and
|
| 191 |
-
Qwen2.5-7B (84.2%) despite using a far smaller generator.
|
| 192 |
|
| 193 |
---
|
| 194 |
|
| 195 |
-
##
|
| 196 |
-
|
| 197 |
-
**Base model:** `microsoft/deberta-v3-base` (184M parameters)
|
| 198 |
-
|
| 199 |
-
**Key design choices:**
|
| 200 |
-
- **Four independent binary heads** (not a shared linear layer): each maxim head specializes
|
| 201 |
-
independently, since Quantity violations (length) and Relation violations (semantic relevance)
|
| 202 |
-
draw on very different feature distributions.
|
| 203 |
-
- **Focal Loss** (α=0.25, γ=2.0): down-weights easy negatives to focus training on hard,
|
| 204 |
-
ambiguous boundary cases, which is critical for minority-class violation detection.
|
| 205 |
-
- **Temperature scaling**: post-hoc calibration (one scalar per maxim) ensures output
|
| 206 |
-
probabilities match true violation frequencies on the validation set.
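
The focal-loss bullet above can be illustrated with a minimal sketch of the alpha-balanced binary focal loss for a single head, using the α=0.25 and γ=2.0 values quoted in the card. It is written in pure Python for readability; the function name `binary_focal_loss` is hypothetical, and the actual training code is not shown in this diff.

```python
import math

def binary_focal_loss(logit: float, target: int,
                      alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Alpha-balanced focal loss for one binary prediction."""
    p = 1.0 / (1.0 + math.exp(-logit))               # predicted violation probability
    p_t = p if target == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of examples that are already well classified
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = binary_focal_loss(logit=-4.0, target=0)  # confident, correct negative
hard = binary_focal_loss(logit=0.2, target=1)   # positive close to the decision boundary
print(f"easy negative: {easy:.6f}  hard positive: {hard:.6f}")
```

The easy negative contributes far less loss than the borderline positive, which is the down-weighting behavior the bullet describes.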
|
| 207 |
-
|
| 208 |
-
**Calibrated temperatures:**
|
| 209 |
|
| 210 |
-
|
| 211 |
-
|
| 212 |
-
|
| 213 |
-
|
| 214 |
-
| Relation | 0.75 | Balanced |
|
| 215 |
-
| Manner | 0.45 | Most conservative (Manner is inherently ambiguous) |
|
| 216 |
|
| 217 |
---
|
| 218 |
|
| 219 |
-
##
|
| 220 |
-
|
| 221 |
-
| Hyperparameter | Value |
|
| 222 |
-
+|----------------|-------|
|
| 223 |
-
+| Base model | microsoft/deberta-v3-base |
|
| 224 |
-
+| Learning rate | 2e-5 |
|
| 225 |
-
+| Batch size | 16 (effective, with grad accumulation ×2) |
|
| 226 |
-
+| Epochs | 5 |
|
| 227 |
-
+| Loss | Focal Loss (α=0.25, γ=2.0) |
|
| 228 |
-
+| Optimizer | AdamW + weight decay 0.01 |
|
| 229 |
-
+| Scheduler | OneCycleLR |
|
| 230 |
-
+| Hardware | Kaggle T4 ×2 |
|
| 231 |
-
+| Training time | ~2-3 hours |
|
| 232 |
-
+| Training examples | 4,012 (weak supervision + ~1,000 gold labels) |
|
| 233 |
-
|
| 234 |
-
**Two-stage labeling:** Weak supervision (50,000+ heuristic-labeled examples) for pre-training,
|
| 235 |
-
followed by gold fine-tuning on ~1,000 human-annotated examples (inter-annotator agreement
|
| 236 |
-
measured via Krippendorff's α).
|
| 237 |
-
|
| 238 |
-
---
|
| 239 |
-
|
| 240 |
-
## Input Format
|
| 241 |
-
|
| 242 |
-
```
|
| 243 |
-
Context: [multi-turn conversation history]
|
| 244 |
-
Evidence: [knowledge snippet from reading set; required for Quality detection]
|
| 245 |
-
Response: [the response being evaluated]
|
| 246 |
-
```
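
As an illustration of the template above, a small helper can assemble the three fields from a multi-turn history. The helper name `build_input` and the `" | "` turn separator are assumptions; the card does not specify how turns inside `Context:` are delimited.

```python
def build_input(history: list[str], response: str, evidence: str = "") -> str:
    """Format one example using the Context / Evidence / Response template."""
    context = " | ".join(turn.strip() for turn in history)  # separator is an assumption
    return f"Context: {context}\nEvidence: {evidence}\nResponse: {response}"

text = build_input(
    history=["Do you like jazz?", "I do, especially bebop."],
    response="Charlie Parker helped define bebop in the 1940s.",
    evidence="Charlie Parker was a leading figure in the development of bebop.",
)
print(text)
```

Leaving `evidence` empty still produces a well-formed input, but the card notes that the evidence field is required for Quality detection.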
|
| 247 |
|
| 248 |
-
|
|
| 249 |
|
| 250 |
---
|
| 251 |
|
| 252 |
-
##
|
| 253 |
|
| 254 |
-
|
| 255 |
-
|
| 256 |
-
|
| 257 |
-
| `temperatures.json` | Per-maxim calibration temperatures |
|
| 258 |
|
| 259 |
---
|
| 260 |
|
| 261 |
## Citation
|
| 262 |
|
| 263 |
-
If you use this model, please cite:
|
| 264 |
-
|
| 265 |
```bibtex
|
| 266 |
-
@article{prabhath2026gricebench,
|
| 267 |
title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
|
| 268 |
author={Prabhath, Pushkar},
|
| 269 |
-
year={2026}
|
|
|
|
| 270 |
}
|
| 271 |
```
|
| 272 |
-
|
| 273 |
-
---
|
| 274 |
-
|
| 275 |
-
## Related Models
|
| 276 |
-
|
| 277 |
-
| Model | Role | Link |
|
| 278 |
-
|-------|------|------|
|
| 279 |
-
| GriceBench-Detector | Detects violations (this model) | You are here |
|
| 280 |
-
| GriceBench-Repair | Repairs detected violations | [🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair) |
|
| 281 |
-
| GriceBench-DPO | Generates cooperative responses | [⚡ DPO Generator](https://huggingface.co/Pushkar27/GriceBench-DPO) |
|
| 282 |
-
|
| 283 |
-
**GitHub:** https://github.com/PushkarPrabhath27/Research-Model
|
|
|
|
| 1 |
+
---
|
| 2 |
language:
|
| 3 |
- en
|
| 4 |
license: apache-2.0
|
|
|
|
| 52 |
|
| 53 |
<div align="center">
|
| 54 |
|
| 55 |
+
# GriceBench-Detector
|
| 56 |
|
| 57 |
+
**Detects cooperative communication failures in AI dialogue, one Gricean maxim at a time.**
|
| 58 |
|
| 59 |
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
|
| 60 |
+
[Hugging Face](https://huggingface.co/Pushkar27)
|
| 61 |
[Python](https://www.python.org/downloads/)
|
|
|
|
| 62 |
|
| 63 |
+
**Part of the GriceBench system:**
|
| 64 |
+
[GitHub](https://github.com/PushkarPrabhath27/Research-Model) |
|
| 65 |
[🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair) |
|
| 66 |
[⚡ DPO Generator](https://huggingface.co/Pushkar27/GriceBench-DPO)
|
| 67 |
|
|
|
|
| 72 |
## What This Model Does
|
| 73 |
|
| 74 |
GriceBench-Detector identifies which of Paul Grice's four conversational maxims
|
| 75 |
+
a dialogue response violates. It returns four independent calibrated violation
|
| 76 |
+
probabilities, one per maxim, enabling targeted, explainable repair downstream.
|
| 77 |
+
|
| 78 |
+
| Output | Maxim | Violation Detected | Example |
|
| 79 |
+
|--------|-------|-------------------|---------|
|
| 80 |
+
| `quantity_prob` | Quantity | Response too short (<8 words) or too long (>38 words) | "Yes." to a detailed question |
|
| 81 |
+
| `quality_prob` | Quality | Factually inconsistent with knowledge evidence | Wrong date, incorrect name |
|
| 82 |
+
| `relation_prob` | Relation | Off-topic response | Jazz question answered with classical music facts |
|
| 83 |
+
| `manner_prob` | Manner | Ambiguous, jargon-heavy, or disorganized | Unclear pronoun references |
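
The Quantity row gives concrete length bounds, which can be sketched as a simple word-count check. This only illustrates the thresholds in the table (fewer than 8 or more than 38 words); whitespace tokenization and the function name `quantity_length_flag` are assumptions, and the model itself produces `quantity_prob` from a learned classifier head rather than by running this rule.

```python
def quantity_length_flag(response: str, min_words: int = 8, max_words: int = 38) -> bool:
    """Return True when the word count falls outside [min_words, max_words]."""
    n_words = len(response.split())  # whitespace tokenization is an assumption
    return n_words < min_words or n_words > max_words

short_reply = "Yes."
normal_reply = "I think recent AI progress is impressive, mostly in language models."
print(quantity_length_flag(short_reply), quantity_length_flag(normal_reply))
```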
|
| 84 |
+
|
| 85 |
+
---
|
| 86 |
+
|
| 87 |
+
## Intended Use
|
| 88 |
|
| 89 |
+
- **Primary Use:** Evaluating conversational AI for cooperative behavior.
|
| 90 |
+
- **Filtering:** Post-generation filtering to flag responses for repair.
|
| 91 |
+
- **Research:** Investigating pragmatics and Gricean maxim violations in LLMs.
|
| 92 |
+
- **Out-of-Scope:** Not intended for high-stakes factual verification (e.g., medical or legal) or as a standalone fact-checker.
|
| 93 |
|
| 94 |
---
|
| 95 |
|
| 96 |
## Quick Start
|
| 97 |
|
| 98 |
```python
|
|
|
|
| 99 |
import torch
|
| 100 |
import torch.nn as nn
|
| 101 |
import json
|
| 102 |
+
from transformers import AutoTokenizer, AutoModel
|
| 103 |
|
| 104 |
+
# ── Define model architecture (must match training) ─────────────────────────
|
| 105 |
class MaximDetector(nn.Module):
|
| 106 |
def __init__(self, model_name="microsoft/deberta-v3-base", num_maxims=4):
|
| 107 |
super().__init__()
|
|
|
|
| 123 |
cls = outputs.last_hidden_state[:, 0, :]
|
| 124 |
return torch.cat([head(cls) for head in self.classifiers], dim=1)
|
| 125 |
|
| 126 |
+
# ── Load model and calibration ──────────────────────────────────────────────
|
| 127 |
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
|
| 128 |
model = MaximDetector()
|
| 129 |
state_dict = torch.load("pytorch_model.pt", map_location="cpu")
|
| 130 |
model.load_state_dict(state_dict)
|
| 131 |
model.eval()
|
| 132 |
|
| 133 |
+
with open("temperatures.json") as f:
|
| 134 |
+
temperatures = json.load(f)
|
| 135 |
+
|
| 136 |
+
# ── Detect violations ───────────────────────────────────────────────────────
|
| 137 |
def detect_violations(context: str, response: str, evidence: str = "") -> dict:
|
| 138 |
input_text = f"Context: {context}\nEvidence: {evidence}\nResponse: {response}"
|
| 139 |
inputs = tokenizer(
|
| 140 |
+
input_text, return_tensors="pt",
|
| 141 |
+
max_length=512, truncation=True, padding=True
|
| 142 |
)
|
| 143 |
+
|
| 144 |
maxim_names = ["quantity", "quality", "relation", "manner"]
|
| 145 |
temp_values = [
|
| 146 |
temperatures.get("quantity", 0.9),
|
|
|
|
| 148 |
temperatures.get("relation", 0.75),
|
| 149 |
temperatures.get("manner", 0.45),
|
| 150 |
]
|
| 151 |
+
|
| 152 |
with torch.no_grad():
|
| 153 |
+
logits = model(**inputs)
|
| 154 |
+
|
| 155 |
+
probs, violations = {}, {}
|
| 156 |
for i, (maxim, temp) in enumerate(zip(maxim_names, temp_values)):
|
| 157 |
prob = torch.sigmoid(logits[0, i] / temp).item()
|
| 158 |
probs[maxim] = round(prob, 4)
|
| 159 |
violations[maxim] = prob > 0.5
|
| 160 |
+
|
| 161 |
return {
|
| 162 |
"violations": violations,
|
| 163 |
"probabilities": probs,
|
| 164 |
"is_cooperative": not any(violations.values())
|
| 165 |
}
|
| 166 |
```
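
To make the post-processing step of `detect_violations` concrete without downloading any weights, the sketch below applies the same `sigmoid(logit / temperature)` rule to made-up logits. The logit values are hypothetical. The temperatures are the fallback defaults visible in the snippet, with the `quality` value taken from the example comment in the earlier README revision; they are not necessarily the contents of `temperatures.json`.

```python
import math

def calibrated_probs(logits: dict[str, float],
                     temperatures: dict[str, float]) -> dict[str, float]:
    """Apply per-maxim temperature scaling followed by a sigmoid."""
    return {m: 1.0 / (1.0 + math.exp(-z / temperatures[m])) for m, z in logits.items()}

temperatures = {"quantity": 0.9, "quality": 0.55, "relation": 0.75, "manner": 0.45}
logits = {"quantity": 2.0, "quality": -1.5, "relation": -3.0, "manner": 0.3}  # hypothetical

probs = calibrated_probs(logits, temperatures)
violations = {m: p > 0.5 for m, p in probs.items()}
print({m: round(p, 4) for m, p in probs.items()})
print(violations, "is_cooperative:", not any(violations.values()))
```

Because the sigmoid crosses 0.5 exactly at a logit of zero, dividing by a positive temperature changes how confident each probability is but not which side of the 0.5 threshold it falls on.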
|
| 167 |
|
| 168 |
---
|
| 169 |
|
| 170 |
+
## Performance
|
| 171 |
|
| 172 |
Evaluated on **1,000 held-out Topical-Chat dialogue turns** (500 violation-injected, 500 clean).
|
| 173 |
|
| 174 |
| Maxim | F1 | Precision | Recall | AUC-ROC |
|
| 175 |
|-------|-----|-----------|--------|---------|
|
| 176 |
+
| Quantity | **1.000** | 1.000 | 1.000 | 1.000 |
|
| 177 |
+
| Quality | 0.928 | 0.866 | 1.000 | 0.999 |
|
| 178 |
+
| Relation | **1.000** | 1.000 | 1.000 | 1.000 |
|
| 179 |
+
| Manner | 0.891 | 0.864 | 0.919 | 0.979 |
|
| 180 |
+
| **Macro Avg** | **0.955** | - | - | - |
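
The F1 column can be cross-checked from the precision and recall columns, and the macro average is the unweighted mean of the four per-maxim F1 scores. A short sketch using only the numbers in the table:

```python
# (precision, recall) per maxim, copied from the table above
rows = {
    "quantity": (1.000, 1.000),
    "quality": (0.866, 1.000),
    "relation": (1.000, 1.000),
    "manner": (0.864, 0.919),
}

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

per_maxim = {m: f1(p, r) for m, (p, r) in rows.items()}
macro_f1 = sum(per_maxim.values()) / len(per_maxim)
print({m: round(v, 3) for m, v in per_maxim.items()}, round(macro_f1, 3))
```

Recomputing from the rounded precision and recall values reproduces the reported per-maxim F1 scores and the 0.955 macro average to three decimals.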
|
| 181 |
|
| 182 |
---
|
| 183 |
|
| 184 |
+
## Limitations & Biases
|
| 185 |
|
| 186 |
+
- **Subjectivity:** The "Manner" maxim is inherently subjective; detection reflects the labels in the training set.
|
| 187 |
+
- **Domain Specificity:** Performance is optimized for general knowledge dialogue (Topical-Chat). Results may vary in specialized domains (e.g., highly technical or medical).
|
| 188 |
+
- **English-Only:** This model is trained and evaluated exclusively on English dialogue.
|
| 189 |
+
- **Prompt Sensitivity:** Detection results can be sensitive to the formatting of the "Evidence" field.
|
| 190 |
|
| 191 |
---
|
| 192 |
|
| 193 |
+
## Environmental Impact
|
| 194 |
|
| 195 |
+
- **Hardware Used:** 2x NVIDIA Tesla T4 GPUs (Kaggle).
|
| 196 |
+
- **Training Time:** ~3 hours.
|
| 197 |
+
- **Estimated Carbon Footprint:** ~0.45 kg CO2eq (based on average TDP and regional carbon intensity).
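
The footprint bullet describes an estimate built from device power and grid carbon intensity. A minimal sketch of that arithmetic follows, assuming the T4's 70 W board power as the per-GPU draw. The carbon-intensity value is a placeholder: the card does not state the intensity or any datacenter overhead it used, so the sketch shows the form of the calculation rather than reproducing the ~0.45 kg figure.

```python
def estimate_co2_kg(n_gpus: int, gpu_power_w: float, hours: float,
                    intensity_kg_per_kwh: float, overhead: float = 1.0) -> float:
    """Energy (kWh) x optional overhead factor x grid intensity (kg CO2eq per kWh)."""
    energy_kwh = n_gpus * gpu_power_w * hours / 1000.0
    return energy_kwh * overhead * intensity_kg_per_kwh

# 2 GPUs at 70 W each for 3 hours, GPU draw only
gpu_energy_kwh = 2 * 70.0 * 3.0 / 1000.0
# 0.4 kg/kWh is a placeholder intensity, not a value taken from the card
example_kg = estimate_co2_kg(2, 70.0, 3.0, intensity_kg_per_kwh=0.4)
print(f"{gpu_energy_kwh:.2f} kWh, {example_kg:.3f} kg CO2eq")
```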
|
| 198 |
|
| 199 |
---
|
| 200 |
|
| 201 |
+
## Architecture & Training
|
| 202 |
|
| 203 |
+
- **Base model:** `microsoft/deberta-v3-base` (184M parameters)
|
| 204 |
+
- **Heads:** 4 independent binary classification heads.
|
| 205 |
+
- **Calibration:** Per-head temperature scaling (see `temperatures.json`).
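
Per-head temperature scaling is typically fitted after training by choosing, for each head, the scalar T that minimizes the negative log-likelihood of `sigmoid(logit / T)` on held-out data. The sketch below shows that idea for one head with a simple grid search. The validation logits and labels are made up, and the function names and grid are assumptions rather than the procedure used to produce `temperatures.json`.

```python
import math

def nll(logits, labels, temperature):
    """Mean binary negative log-likelihood of sigmoid(logit / T)."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-z / temperature))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)

def fit_temperature(logits, labels, grid=None):
    """Pick the grid temperature with the lowest validation NLL."""
    grid = grid or [k / 20 for k in range(1, 101)]  # 0.05, 0.10, ..., 5.00
    return min(grid, key=lambda t: nll(logits, labels, t))

val_logits = [3.1, 2.4, -2.8, -0.4, 0.6, -3.5, 1.9, -1.2]  # hypothetical
val_labels = [1, 1, 0, 1, 0, 0, 1, 0]                      # hypothetical

t = fit_temperature(val_logits, val_labels)
print("fitted T:", t)
print("NLL at T=1:", round(nll(val_logits, val_labels, 1.0), 4),
      "NLL at fitted T:", round(nll(val_logits, val_labels, t), 4))
```

A gradient-based optimizer over a single scalar is a common alternative to the grid; either way the encoder and head weights stay frozen and only T is fitted.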
|
|
|
|
| 206 |
|
| 207 |
---
|
| 208 |
|
| 209 |
## Citation
|
| 210 |
|
| 211 |
```bibtex
|
| 212 |
+
@article{prabhath2026gricebench,
|
| 213 |
title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
|
| 214 |
author={Prabhath, Pushkar},
|
| 215 |
+
year={2026},
|
| 216 |
+
note={Under review, EMNLP 2026}
|
| 217 |
}
|
| 218 |
```
|