Text Classification
Transformers
English
multi-label-classification
dialogue
conversational-ai
gricean-maxims
cooperative-communication
deberta
nlp
pragmatics
Eval Results (legacy)
Fix YAML metadata - proper list syntax with - prefix, complete model-index with evaluation results
README.md
CHANGED
|
@@ -30,7 +30,7 @@ model-index:
|
|
| 30 |
name: Multi-Label Gricean Maxim Violation Detection
|
| 31 |
dataset:
|
| 32 |
name: Topical-Chat (GriceBench held-out split, N=1000)
|
| 33 |
-
type:
|
| 34 |
split: test
|
| 35 |
metrics:
|
| 36 |
- type: f1
|
|
@@ -50,40 +50,42 @@ model-index:
|
|
| 50 |
name: Manner F1
|
| 51 |
---
|
| 52 |
|
| 53 |
-
|
| 54 |
|
| 55 |
-
|
| 56 |
|
|
|
|
| 57 |
|
| 58 |
-
License-Apache%202.0-blue.svg
|
| 59 |
|
| 60 |
|
| 61 |
-
|
| 62 |
|
|
|
|
| 63 |
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
Part of the GriceBench system →
|
| 68 |
-
|
| 69 |
-
GitHub |
|
| 70 |
|
| 71 |
-
|
| 72 |
|
| 73 |
-
|
| 74 |
|
|
|
|
| 75 |
|
| 76 |
-
|
| 77 |
-
GriceBench-Detector identifies which of Paul Grice's four conversational maxims a dialogue response violates. It returns four independent, calibrated violation probabilities (one per maxim), enabling targeted, explainable repair downstream.
|
| 78 |
|
| 79 |
-
|
| 80 |
-
quantity_prob Quantity Response too short (<8 words) or too long (>38 words) "Yes." to a detailed question
|
| 81 |
-
quality_prob Quality Factually inconsistent with knowledge evidence Wrong date, incorrect name
|
| 82 |
-
relation_prob Relation Off-topic response Jazz question answered with classical music facts
|
| 83 |
-
manner_prob Manner Ambiguous, jargon-heavy, or disorganized Unclear pronoun references
|
| 84 |
-
Used in the full GriceBench pipeline, this detector helps achieve a 95.0% cooperative rate, outperforming Mistral-7B-Instruct (89.1%) and Qwen2.5-7B-Instruct (84.2%).
|
| 85 |
|
| 86 |
-
Quick Start
|
| 87 |
```python
|
| 88 |
import torch
|
| 89 |
import torch.nn as nn
|
|
@@ -125,7 +127,7 @@ with open("temperatures.json") as f:
|
|
| 125 |
|
| 126 |
# -- Detect violations ------------------------------------------------------
|
| 127 |
def detect_violations(context: str, response: str, evidence: str = "") -> dict:
|
| 128 |
-
input_text = f"Context: {context}\nEvidence: {evidence}\nResponse: {response}"
|
| 129 |
inputs = tokenizer(
|
| 130 |
input_text, return_tensors="pt",
|
| 131 |
max_length=512, truncation=True, padding=True
|
|
@@ -165,39 +167,63 @@ print(result)
|
|
| 165 |
# 'probabilities': {'quantity': 0.97, 'quality': 0.02, 'relation': 0.03, 'manner': 0.11},
|
| 166 |
# 'is_cooperative': False}
|
| 167 |
```
|
| 168 |
-
|
| 169 |
-
|
| 170 |
-
|
| 171 |
-
|
| 172 |
-
|
| 173 |
-
|
| 174 |
-
|
| 175 |
-
|
| 176 |
-
|
| 177 |
-
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
|
| 181 |
-
|
| 182 |
-
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
|
| 186 |
-
|
| 187 |
-
|
| 188 |
-
|
| 189 |
-
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
|
| 201 |
```bibtex
|
| 202 |
@article{prabhath2026gricebench,
|
| 203 |
title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
|
|
@@ -206,15 +232,25 @@ Citation
|
|
| 206 |
note={Under review, EMNLP 2026}
|
| 207 |
}
|
| 208 |
```
|
| 209 |
-
|
| 210 |
-
|
| 211 |
-
|
| 212 |
-
|
| 213 |
-
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
|
| 220 |
-
|
| 30 |
name: Multi-Label Gricean Maxim Violation Detection
|
| 31 |
dataset:
|
| 32 |
name: Topical-Chat (GriceBench held-out split, N=1000)
|
| 33 |
+
type: topical_chat
|
| 34 |
split: test
|
| 35 |
metrics:
|
| 36 |
- type: f1
|
|
|
|
| 50 |
name: Manner F1
|
| 51 |
---
|
| 52 |
|
| 53 |
+
<div align="center">
|
| 54 |
|
| 55 |
+
# GriceBench-Detector
|
| 56 |
|
| 57 |
+
**Detects cooperative communication failures in AI dialogue, one Gricean maxim at a time.**
|
| 58 |
|
| 59 |
+
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
|
| 60 |
+
[Hugging Face: Pushkar27](https://huggingface.co/Pushkar27)
|
| 61 |
+
[Python](https://www.python.org/downloads/)
|
| 62 |
|
| 63 |
+
**Part of the GriceBench system** →
|
| 64 |
+
[GitHub](https://github.com/PushkarPrabhath27/Research-Model) |
|
| 65 |
+
[🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair) |
|
| 66 |
+
[⚡ DPO Generator](https://huggingface.co/Pushkar27/GriceBench-DPO)
|
| 67 |
|
| 68 |
+
</div>
|
| 69 |
|
| 70 |
+
---
|
| 71 |
|
| 72 |
+
## What This Model Does
|
| 73 |
|
| 74 |
+
GriceBench-Detector identifies which of Paul Grice's four conversational maxims a dialogue response violates. It returns four independent, calibrated violation probabilities (one per maxim), enabling targeted, explainable repair downstream.
|
| 75 |
|
| 76 |
+
| Output | Maxim | Violation Detected | Example |
|
| 77 |
+
|--------|-------|-------------------|---------|
|
| 78 |
+
| `quantity_prob` | Quantity | Response too short (<8 words) or too long (>38 words) | "Yes." to a detailed question |
|
| 79 |
+
| `quality_prob` | Quality | Factually inconsistent with knowledge evidence | Wrong date, incorrect name |
|
| 80 |
+
| `relation_prob` | Relation | Off-topic response | Jazz question answered with classical music facts |
|
| 81 |
+
| `manner_prob` | Manner | Ambiguous, jargon-heavy, or disorganized | Unclear pronoun references |
|
| 82 |
|
| 83 |
+
Used in the full GriceBench pipeline, this detector helps achieve a **95.0% cooperative rate**, outperforming Mistral-7B-Instruct (89.1%) and Qwen2.5-7B-Instruct (84.2%).
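
The Quantity row in the table above gives explicit length bounds (fewer than 8 or more than 38 words). The sketch below shows that rule as a plain heuristic. It is only an illustration of the stated bounds, not the detector's learned behavior; the function name `quantity_length_flag` and the whitespace-based word count are assumptions.

```python
def quantity_length_flag(response: str, min_words: int = 8, max_words: int = 38) -> bool:
    """Return True when the response is shorter than min_words or longer than max_words.

    Words are counted by splitting on whitespace (an assumption; the model card
    does not say how words were counted when building the labels).
    """
    n_words = len(response.split())
    return n_words < min_words or n_words > max_words


print(quantity_length_flag("Yes."))  # True: one word is below the lower bound
print(quantity_length_flag("I think the album came out in 1959 and it changed jazz."))  # False: 12 words
```

A rule like this can serve as a quick sanity check next to `quantity_prob`, since the two should usually agree on very short or very long responses.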
|
| 84 |
|
| 85 |
+
---
|
|
|
|
| 86 |
|
| 87 |
+
## Quick Start
|
| 88 |
|
|
|
|
| 89 |
```python
|
| 90 |
import torch
|
| 91 |
import torch.nn as nn
|
|
|
|
| 127 |
|
| 128 |
# -- Detect violations ------------------------------------------------------
|
| 129 |
def detect_violations(context: str, response: str, evidence: str = "") -> dict:
|
| 130 |
+
input_text = f"Context: {context}\n\nEvidence: {evidence}\n\nResponse: {response}"
|
| 131 |
inputs = tokenizer(
|
| 132 |
input_text, return_tensors="pt",
|
| 133 |
max_length=512, truncation=True, padding=True
|
|
|
|
| 167 |
# 'probabilities': {'quantity': 0.97, 'quality': 0.02, 'relation': 0.03, 'manner': 0.11},
|
| 168 |
# 'is_cooperative': False}
|
| 169 |
```
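
The sample output above pairs per-maxim probabilities with an `is_cooperative` flag. A minimal sketch of how such a summary could be derived is shown here, assuming a single 0.5 decision threshold for every maxim. The helper `summarize`, the `flagged_maxims` key, and the threshold value are all hypothetical and are not taken from the Quick Start code.

```python
MAXIMS = ("quantity", "quality", "relation", "manner")


def summarize(probabilities: dict, threshold: float = 0.5) -> dict:
    """Turn per-maxim violation probabilities into a cooperative/non-cooperative summary.

    The 0.5 threshold and the 'flagged_maxims' key are assumptions for illustration.
    """
    flagged = [m for m in MAXIMS if probabilities[m] >= threshold]
    return {
        "flagged_maxims": flagged,
        "probabilities": probabilities,
        "is_cooperative": not flagged,  # cooperative only if no maxim is flagged
    }


probs = {"quantity": 0.97, "quality": 0.02, "relation": 0.03, "manner": 0.11}
result = summarize(probs)
print(result["flagged_maxims"], result["is_cooperative"])
```

With the probabilities from the sample output, only Quantity crosses the assumed threshold, so the response is reported as non-cooperative.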
|
| 170 |
+
|
| 171 |
+
---
|
| 172 |
+
|
| 173 |
+
## Performance
|
| 174 |
+
|
| 175 |
+
Evaluated on **1,000 held-out Topical-Chat dialogue turns** (500 violation-injected, 500 clean).
|
| 176 |
+
|
| 177 |
+
| Maxim | F1 | Precision | Recall | AUC-ROC |
|
| 178 |
+
|-------|-----|-----------|--------|---------|
|
| 179 |
+
| Quantity | **1.000** | 1.000 | 1.000 | 1.000 |
|
| 180 |
+
| Quality | 0.928 | 0.866 | 1.000 | 0.999 |
|
| 181 |
+
| Relation | **1.000** | 1.000 | 1.000 | 1.000 |
|
| 182 |
+
| Manner | 0.891 | 0.864 | 0.919 | 0.979 |
|
| 183 |
+
| **Macro Avg** | **0.955** | - | - | - |
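
The F1 column in the table above can be cross-checked against the reported precision and recall, since F1 is their harmonic mean and the macro average is the unweighted mean over the four maxims. The snippet is a small arithmetic check using only the numbers in the table; the variable names are illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the performance table.
reported = {
    "Quantity": (1.000, 1.000),
    "Quality": (0.866, 1.000),
    "Relation": (1.000, 1.000),
    "Manner": (0.864, 0.919),
}

per_maxim = {name: f1(p, r) for name, (p, r) in reported.items()}
macro_f1 = sum(per_maxim.values()) / len(per_maxim)

for name, score in per_maxim.items():
    print(f"{name}: {score:.3f}")   # Quality -> 0.928, Manner -> 0.891
print(f"Macro F1: {macro_f1:.3f}")  # 0.955
```

The recomputed values match the table to three decimal places.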
|
| 184 |
+
|
| 185 |
+
---
|
| 186 |
+
|
| 187 |
+
## Architecture & Training
|
| 188 |
+
|
| 189 |
+
- **Base model:** `microsoft/deberta-v3-base` (184M parameters)
|
| 190 |
+
- **Heads:** 4 independent binary classification heads (one per maxim)
|
| 191 |
+
- **Loss:** Focal Loss (α=0.25, γ=2.0) for class imbalance
|
| 192 |
+
- **Calibration:** Per-head temperature scaling (see `temperatures.json`)
|
| 193 |
+
- **Training data:** 4,012 examples (weak supervision + ~1,000 gold labels)
|
| 194 |
+
- **Epochs:** 5 | **LR:** 2e-5 | **Hardware:** Kaggle T4 ×2, ~2-3 hours
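
To make the loss in the list above concrete, here is a minimal pure-Python sketch of binary focal loss for a single head and a single example, using the listed α=0.25 and γ=2.0. It assumes the common formulation in which α weights positives and 1-α weights negatives; the model card does not say which convention the training code uses, and the function name is hypothetical.

```python
import math


def binary_focal_loss(logit: float, target: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss for one binary prediction: -alpha_t * (1 - p_t)**gamma * log(p_t).

    Assumes alpha applies to the positive class and (1 - alpha) to the negative class.
    Not numerically hardened for very large |logit|.
    """
    p = 1.0 / (1.0 + math.exp(-logit))            # sigmoid probability of a violation
    p_t = p if target == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(4.0, target=1)   # confident and correct
hard = binary_focal_loss(-1.0, target=1)  # confident and wrong
print(f"easy example loss: {easy:.6f}")
print(f"hard example loss: {hard:.6f}")
```

The `(1 - p_t) ** gamma` factor shrinks the contribution of well-classified examples, so the hard example dominates the loss. That is the usual motivation for using focal loss on imbalanced labels.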
|
| 195 |
+
|
| 196 |
+
**Calibrated temperatures:**
|
| 197 |
+
|
| 198 |
+
| Maxim | Temperature | Effect |
|
| 199 |
+
|-------|-------------|--------|
|
| 200 |
+
| Quantity | 0.90 | Slightly sharper |
|
| 201 |
+
| Quality | 0.55 | Conservative (fewer false positives) |
|
| 202 |
+
| Relation | 0.75 | Balanced |
|
| 203 |
+
| Manner | 0.45 | Most conservative (subjective maxim) |
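
The temperatures in the table above can be applied as in the sketch below. It assumes the standard form of temperature scaling, where each head's logit is divided by its temperature before the sigmoid. The dictionary simply copies the table values, because the key layout of `temperatures.json` is not shown here, and the function name is hypothetical.

```python
import math

# Values copied from the calibration table; the actual JSON key names may differ.
TEMPERATURES = {"quantity": 0.90, "quality": 0.55, "relation": 0.75, "manner": 0.45}


def calibrated_probability(logit: float, temperature: float) -> float:
    """Sigmoid of the temperature-scaled logit (assumed calibration form)."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))


raw_logit = 1.0  # hypothetical head output, identical for every maxim for comparison
for maxim, temperature in TEMPERATURES.items():
    print(maxim, round(calibrated_probability(raw_logit, temperature), 3))
```

Under this form, a positive temperature never changes the sign of the logit, so a probability stays on the same side of 0.5 after scaling. Temperatures below 1 move probabilities further from 0.5, and the effect is strongest for the smallest temperature.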
|
| 204 |
+
|
| 205 |
+
---
|
| 206 |
+
|
| 207 |
+
## Files
|
| 208 |
+
|
| 209 |
+
| File | Description |
|
| 210 |
+
|------|-------------|
|
| 211 |
+
| `pytorch_model.pt` | Trained model weights |
|
| 212 |
+
| `temperatures.json` | Per-maxim calibration temperatures |
|
| 213 |
+
|
| 214 |
+
---
|
| 215 |
+
|
| 216 |
+
## Limitations & Biases
|
| 217 |
+
|
| 218 |
+
- **Subjectivity:** The "Manner" maxim is inherently subjective; detection reflects the labels in the training set.
|
| 219 |
+
- **Domain Specificity:** Performance is optimized for general knowledge dialogue (Topical-Chat). Results may vary in specialized domains.
|
| 220 |
+
- **English-Only:** This model is trained and evaluated exclusively on English dialogue.
|
| 221 |
+
- **Prompt Sensitivity:** Detection results can be sensitive to the formatting of the "Evidence" field.
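
Because results can depend on how the "Evidence" field is formatted, it may help to build every input through one helper so the layout stays identical across calls. The sketch below follows the `Context` / `Evidence` / `Response` template from the Quick Start, with blank lines between fields. The helper name `build_input` and the whitespace stripping are assumptions, not part of the released code.

```python
def build_input(context: str, response: str, evidence: str = "") -> str:
    """Assemble the detector input in the Context / Evidence / Response layout.

    Stripping surrounding whitespace is an assumption made here to keep the
    formatting stable; it is not documented behavior of the model.
    """
    parts = [
        f"Context: {context.strip()}",
        f"Evidence: {evidence.strip()}",
        f"Response: {response.strip()}",
    ]
    return "\n\n".join(parts)


text = build_input(
    context="Who recorded Kind of Blue?",
    response="Miles Davis recorded it in 1959.",
    evidence="  Kind of Blue is a 1959 album by Miles Davis.\n",
)
print(text)
```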
|
| 222 |
+
|
| 223 |
+
---
|
| 224 |
+
|
| 225 |
+
## Citation
|
| 226 |
+
|
| 227 |
```bibtex
|
| 228 |
@article{prabhath2026gricebench,
|
| 229 |
title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
|
|
|
|
| 232 |
note={Under review, EMNLP 2026}
|
| 233 |
}
|
| 234 |
```
|
| 235 |
+
|
| 236 |
+
---
|
| 237 |
+
|
| 238 |
+
## Related Models
|
| 239 |
+
|
| 240 |
+
| Model | Role | Link |
|
| 241 |
+
|-------|------|------|
|
| 242 |
+
| GriceBench-Detector | Detects violations (this model) | You are here |
|
| 243 |
+
| GriceBench-Repair | Repairs detected violations | [🔧 Repair](https://huggingface.co/Pushkar27/GriceBench-Repair) |
|
| 244 |
+
| GriceBench-DPO | Generates cooperative responses | [⚡ DPO](https://huggingface.co/Pushkar27/GriceBench-DPO) |
|
| 245 |
+
|
| 246 |
+
**GitHub:** https://github.com/PushkarPrabhath27/Research-Model
|
| 247 |
+
|
| 248 |
+
---
|
| 249 |
+
|
| 250 |
+
## Environmental Impact
|
| 251 |
+
|
| 252 |
+
| Aspect | Value |
|
| 253 |
+
|--------|-------|
|
| 254 |
+
| Hardware Used | 2x NVIDIA Tesla T4 GPUs (Kaggle) |
|
| 255 |
+
| Training Time | ~3 hours |
|
| 256 |
+
| Estimated Carbon Footprint | ~0.45 kg CO2eq |
|