Pushkar27 committed
Commit 9d98903 · verified · 1 Parent(s): f12cb44

Upload README.md with huggingface_hub

  1. README.md +283 -34
README.md CHANGED
@@ -1,34 +1,283 @@
- # Model Card: GriceBench Violation Detector
-
- ## Model Details
- - **Architecture**: DeBERTa-v3-base with 4 binary classification heads.
- - **Parameters**: 184M
- - **Task**: Multi-label classification of Gricean Maxim violations (Quantity, Quality, Relation, Manner).
- - **Language**: English
- - **Release Date**: March 2026
-
- ## Performance
- Evaluated on 1,000 held-out Topical-Chat dialogue turns.
-
- | Maxim | F1 Score | Precision | Recall | AUC |
- |-------|----------|-----------|--------|-----|
- | Quantity | 1.000 | 1.000 | 1.000 | 1.000 |
- | Quality | 0.928 | 0.866 | 1.000 | 0.999 |
- | Relation | 1.000 | 1.000 | 1.000 | 1.000 |
- | Manner | 0.891 | 0.864 | 0.919 | 0.979 |
- | **Macro Avg** | **0.955** | -- | -- | -- |
-
- ## Intended Use
- - **Primary Use**: Detecting cooperative failures in AI dialogue systems.
- - **Out-of-Scope**: Detection of hate speech, toxic content, or PII.
-
- ## Training Data
- - **Source**: Topical-Chat dataset (50,000+ turns).
- - **Labeling**: Two-stage pipeline (Weak Supervision -> Gold Fine-tuning).
-
- ## Calibration
- The model uses temperature scaling for probability calibration.
- - **Quantity Temp**: 0.90
- - **Quality Temp**: 0.55
- - **Relation Temp**: 0.75
- - **Manner Temp**: 0.45
+ ---
+ language:
+ - en
+ license: apache-2.0
+ library_name: transformers
+ tags:
+ - text-classification
+ - multi-label-classification
+ - dialogue
+ - conversational-ai
+ - gricean-maxims
+ - cooperative-communication
+ - deberta
+ - nlp
+ - pragmatics
+ datasets:
+ - topical_chat
+ metrics:
+ - f1
+ - precision
+ - recall
+ - roc_auc
+ pipeline_tag: text-classification
+ base_model: microsoft/deberta-v3-base
+ model-index:
+ - name: GriceBench-Detector
+   results:
+   - task:
+       type: text-classification
+       name: Multi-Label Gricean Maxim Violation Detection
+     dataset:
+       name: Topical-Chat (GriceBench held-out split)
+       type: custom
+       split: test
+     metrics:
+     - type: f1
+       value: 0.955
+       name: Macro F1
+     - type: f1
+       value: 1.000
+       name: Quantity F1
+     - type: f1
+       value: 0.928
+       name: Quality F1
+     - type: f1
+       value: 1.000
+       name: Relation F1
+     - type: f1
+       value: 0.891
+       name: Manner F1
+ ---
+
+ <div align="center">
+
+ # 🗣️ GriceBench-Detector
+
+ **Detects cooperative communication failures in AI dialogue, one maxim at a time.**
+
+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+ [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+ [![Transformers](https://img.shields.io/badge/🤗-Transformers-yellow)](https://huggingface.co/docs/transformers)
+
+ Part of the **GriceBench** system: [GitHub](https://github.com/PushkarPrabhath27/Research-Model) |
+ [🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair) |
+ [⚡ DPO Generator](https://huggingface.co/Pushkar27/GriceBench-DPO)
+
+ </div>
+
+ ---
+
+ ## What This Model Does
+
+ GriceBench-Detector identifies which of Paul Grice's four conversational maxims
+ a dialogue response violates. It returns four independent violation probabilities,
+ one per maxim, enabling targeted, explainable repair.
+
+ | Maxim | What It Measures | Example Violation |
+ |-------|-----------------|-------------------|
+ | **Quantity** | Response informativeness | "Yes." in response to a detailed question |
+ | **Quality** | Factual consistency with evidence | Stating an incorrect fact contradicted by the knowledge source |
+ | **Relation** | Topical relevance | Responding to "Tell me about jazz" with information about classical music |
+ | **Manner** | Clarity and organization | Pronoun ambiguity, jargon, disorganized sentences |
+
+ ---
+
+ ## Quick Start
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+ import torch
+ import torch.nn as nn
+ import json
+
+ # ── Load calibration temperatures ──────────────────────────────────────────
+ # Download temperatures.json from the model repo
+ with open("temperatures.json") as f:
+     temperatures = json.load(f)  # {"quantity": 0.9, "quality": 0.55, ...}
+
+ # ── Define model architecture (must match training) ────────────────────────
+ class MaximDetector(nn.Module):
+     def __init__(self, model_name="microsoft/deberta-v3-base", num_maxims=4):
+         super().__init__()
+         self.encoder = AutoModel.from_pretrained(model_name)
+         hidden = self.encoder.config.hidden_size  # 768
+         # One independent MLP head per maxim, each producing a single logit
+         self.classifiers = nn.ModuleList([
+             nn.Sequential(
+                 nn.Dropout(0.15),
+                 nn.Linear(hidden, hidden // 2), nn.GELU(),
+                 nn.Dropout(0.15),
+                 nn.Linear(hidden // 2, hidden // 4), nn.GELU(),
+                 nn.Dropout(0.15),
+                 nn.Linear(hidden // 4, 1)
+             ) for _ in range(num_maxims)
+         ])
+
+     def forward(self, input_ids, attention_mask):
+         outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
+         cls = outputs.last_hidden_state[:, 0, :]  # [CLS] token representation
+         return torch.cat([head(cls) for head in self.classifiers], dim=1)
+
+ # ── Load model ─────────────────────────────────────────────────────────────
+ tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
+ model = MaximDetector()
+
+ # Load weights (download pytorch_model.pt from this repo)
+ state_dict = torch.load("pytorch_model.pt", map_location="cpu")
+ model.load_state_dict(state_dict)
+ model.eval()
+
+ # ── Run detection ──────────────────────────────────────────────────────────
+ def detect_violations(context: str, response: str, evidence: str = "") -> dict:
+     input_text = f"Context: {context}\nEvidence: {evidence}\nResponse: {response}"
+     inputs = tokenizer(
+         input_text, return_tensors="pt", max_length=512,
+         truncation=True, padding=True
+     )
+
+     maxim_names = ["quantity", "quality", "relation", "manner"]
+     temp_values = [
+         temperatures.get("quantity", 0.9),
+         temperatures.get("quality", 0.55),
+         temperatures.get("relation", 0.75),
+         temperatures.get("manner", 0.45),
+     ]
+
+     with torch.no_grad():
+         # Pass only the arguments forward() accepts; the tokenizer may also
+         # return token_type_ids, which MaximDetector.forward does not take.
+         logits = model(
+             input_ids=inputs["input_ids"],
+             attention_mask=inputs["attention_mask"],
+         )  # Shape: [1, 4]
+
+     # Apply temperature scaling and sigmoid
+     probs = {}
+     violations = {}
+     for i, (maxim, temp) in enumerate(zip(maxim_names, temp_values)):
+         prob = torch.sigmoid(logits[0, i] / temp).item()
+         probs[maxim] = round(prob, 4)
+         violations[maxim] = prob > 0.5
+
+     return {
+         "violations": violations,
+         "probabilities": probs,
+         "is_cooperative": not any(violations.values())
+     }
+
+ # ── Example ────────────────────────────────────────────────────────────────
+ result = detect_violations(
+     context="What do you think about the latest developments in AI?",
+     response="Yes.",  # Too short: Quantity violation
+     evidence="AI has seen rapid advancement in large language models during 2024-2025."
+ )
+ print(result)
+ # {'violations': {'quantity': True, 'quality': False, 'relation': False, 'manner': False},
+ #  'probabilities': {'quantity': 0.97, 'quality': 0.02, 'relation': 0.03, 'manner': 0.11},
+ #  'is_cooperative': False}
+ ```
+
+ ---
+
+ ## Model Performance
+
+ Evaluated on **1,000 held-out Topical-Chat dialogue turns** (500 violation-injected, 500 clean).
+
+ | Maxim | F1 | Precision | Recall | AUC-ROC |
+ |-------|-----|-----------|--------|---------|
+ | Quantity | **1.000** | 1.000 | 1.000 | 1.000 |
+ | Quality | 0.928 | 0.866 | 1.000 | 0.999 |
+ | Relation | **1.000** | 1.000 | 1.000 | 1.000 |
+ | Manner | 0.891 | 0.864 | 0.919 | 0.979 |
+ | **Macro Avg** | **0.955** | -- | -- | -- |
+
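The F1 and macro-average figures in the table can be cross-checked from the reported precision and recall. A minimal sketch in plain Python follows; the `f1` helper and the `reported` dictionary are illustrative names, not part of the GriceBench code.

```python
# Precision and recall per maxim, copied from the performance table.
reported = {
    "quantity": (1.000, 1.000),
    "quality": (0.866, 1.000),
    "relation": (1.000, 1.000),
    "manner": (0.864, 0.919),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


per_maxim_f1 = {name: f1(p, r) for name, (p, r) in reported.items()}

# Macro average: unweighted mean of the four per-maxim F1 scores.
macro_f1 = sum(per_maxim_f1.values()) / len(per_maxim_f1)

for name, score in per_maxim_f1.items():
    print(f"{name}: {score:.3f}")
print(f"macro: {macro_f1:.3f}")
```

Rounded to three decimals, the recomputed values agree with the table (Quality 0.928, Manner 0.891, macro 0.955).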
+ **System-level result:** When used in the full GriceBench pipeline (Detect → Repair → Generate),
+ the system achieves a **95.0% cooperative rate**, outperforming Mistral-7B (89.1%) and
+ Qwen2.5-7B (84.2%) despite using a far smaller generator.
+
+ ---
+
+ ## Architecture
+
+ **Base model:** `microsoft/deberta-v3-base` (184M parameters)
+
+ **Key design choices:**
+ - **Four independent binary heads** (not a shared linear layer): each maxim head specializes
+   independently, since Quantity violations (length) and Relation violations (semantic relevance)
+   depend on very different features.
+ - **Focal Loss** (α=0.25, γ=2.0): down-weights easy, well-classified examples to focus training on hard,
+   ambiguous boundary cases, which is critical for minority-class violation detection.
+ - **Temperature scaling**: post-hoc calibration (one scalar per maxim), fitted so that output
+   probabilities better match observed violation frequencies on the validation set.
+
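To make the focal-loss bullet concrete, here is a minimal per-example sketch using only the standard library. It follows the commonly used binary formulation, with `alpha` weighting the positive class and `gamma` controlling the down-weighting of easy examples. The exact variant used in training is not shown in this README, so treat the function name `binary_focal_loss` and its details as assumptions.

```python
import math


def binary_focal_loss(logit: float, target: int,
                      alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss for one binary label: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p = 1.0 / (1.0 + math.exp(-logit))            # predicted violation probability
    p_t = p if target == 1 else 1.0 - p           # probability given to the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (confident and correct) versus a hard positive (wrong side of 0).
easy = binary_focal_loss(logit=4.0, target=1)
hard = binary_focal_loss(logit=-0.5, target=1)
print(f"easy positive: {easy:.6f}")
print(f"hard positive: {hard:.6f}")
```

The `(1 - p_t)**gamma` factor shrinks the loss of the confident example far more than that of the uncertain one, so gradient updates are dominated by the hard cases. With `gamma=0` the expression reduces to class-weighted binary cross-entropy.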
+ **Calibrated temperatures:**
+
+ | Maxim | Temperature | Effect |
+ |-------|-------------|--------|
+ | Quantity | 0.90 | Slightly sharper predictions |
+ | Quality | 0.55 | Noticeably sharper predictions |
+ | Relation | 0.75 | Moderately sharper predictions |
+ | Manner | 0.45 | Sharpest predictions (lowest temperature) |
+
+ All four temperatures are below 1, so each one moves probabilities away from 0.5; decisions at the 0.5 threshold are unchanged.
+
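The effect of a per-maxim temperature can be seen by applying `sigmoid(logit / T)` to a few example logits. This is a small standard-library sketch: the temperatures are the ones listed in this card, while the logits and the helper name `calibrated_probability` are made up for illustration.

```python
import math

# Per-maxim temperatures as listed in this model card.
TEMPERATURES = {"quantity": 0.90, "quality": 0.55, "relation": 0.75, "manner": 0.45}


def calibrated_probability(logit: float, temperature: float) -> float:
    """Temperature-scaled sigmoid: divide the logit by T before squashing."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))


# Hypothetical logits, chosen only to show the trend.
for logit in (-1.0, 0.5, 2.0):
    raw = calibrated_probability(logit, 1.0)  # T = 1 leaves the logit unchanged
    scaled = {m: round(calibrated_probability(logit, t), 3)
              for m, t in TEMPERATURES.items()}
    print(f"logit={logit:+.1f} raw={raw:.3f} scaled={scaled}")
```

Dividing by a temperature below 1 enlarges the magnitude of the logit, so the probability moves further from 0.5 in whichever direction it already pointed. Because the sign of the logit is preserved, a decision made at the 0.5 threshold is the same before and after scaling; only the reported confidence changes.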
+ ---
+
+ ## Training Details
+
+ | Hyperparameter | Value |
+ |----------------|-------|
+ | Base model | microsoft/deberta-v3-base |
+ | Learning rate | 2e-5 |
+ | Batch size | 16 (effective, with grad accumulation ×2) |
+ | Epochs | 5 |
+ | Loss | Focal Loss (α=0.25, γ=2.0) |
+ | Optimizer | AdamW + weight decay 0.01 |
+ | Scheduler | OneCycleLR |
+ | Hardware | Kaggle T4 ×2 |
+ | Training time | ~2-3 hours |
+ | Training examples | 4,012 (weak supervision + ~1,000 gold labels) |
+
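The batch-size row implies a certain number of optimizer steps, which is the quantity a one-cycle schedule needs up front (for example as `total_steps` in PyTorch's `OneCycleLR`). The arithmetic below is a rough sketch under assumptions the table does not confirm: a micro-batch of 8 (16 divided by the accumulation factor), no dropped partial batch, and the scheduler stepped once per optimizer step. How the two T4 GPUs share the batch is not stated, so it is ignored here.

```python
import math

# Values from the hyperparameter table.
num_examples = 4012
effective_batch_size = 16
grad_accumulation_steps = 2
epochs = 5

# Assumption: the effective batch is micro-batch size times accumulation steps.
micro_batch_size = effective_batch_size // grad_accumulation_steps

# One optimizer step per effective batch; the last, partial batch is kept.
optimizer_steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_optimizer_steps = optimizer_steps_per_epoch * epochs

print(f"micro-batch size:          {micro_batch_size}")
print(f"optimizer steps per epoch: {optimizer_steps_per_epoch}")
print(f"total optimizer steps:     {total_optimizer_steps}")
```

Under these assumptions the run takes 251 optimizer steps per epoch and 1,255 in total.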
+ **Two-stage labeling:** Weak supervision (50,000+ heuristic-labeled examples) for pre-training,
+ followed by gold fine-tuning on ~1,000 human-annotated examples (inter-annotator agreement
+ measured via Krippendorff's α).
+
+ ---
+
+ ## Input Format
+
+ ```
+ Context: [multi-turn conversation history]
+ Evidence: [knowledge snippet from reading set; required for Quality detection]
+ Response: [the response being evaluated]
+ ```
+
+ Maximum token length: 512 (the response is never truncated; the context is truncated if needed).
+
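Hugging Face tokenizers truncate on the right by default (`truncation_side="right"`), so calling the tokenizer on one concatenated string with `truncation=True`, as in the Quick Start, would cut the end of the response first when the input is too long. One way to get the behaviour described above is to budget tokens per field and drop the oldest context tokens first. The sketch below works on plain token lists; the function name `fit_to_budget` is hypothetical, and it ignores special tokens and the `Context:` / `Evidence:` / `Response:` prefixes, which a real implementation would have to count.

```python
from typing import List, Tuple


def fit_to_budget(context_tokens: List[str],
                  evidence_tokens: List[str],
                  response_tokens: List[str],
                  max_tokens: int = 512) -> Tuple[List[str], List[str], List[str]]:
    """Keep evidence and response whole; trim the context from the left."""
    fixed = len(evidence_tokens) + len(response_tokens)
    if fixed > max_tokens:
        raise ValueError("evidence + response alone exceed the token budget")
    context_budget = max_tokens - fixed
    # Keep only the most recent context tokens; a zero budget keeps none.
    kept_context = context_tokens[-context_budget:] if context_budget > 0 else []
    return kept_context, evidence_tokens, response_tokens


# Toy example: 600 context tokens, 50 evidence tokens, 30 response tokens.
context = [f"c{i}" for i in range(600)]
evidence = ["e"] * 50
response = ["r"] * 30

kept, ev, resp = fit_to_budget(context, evidence, response)
print(len(kept), len(ev), len(resp), len(kept) + len(ev) + len(resp))
```

In the toy example the 168 oldest context tokens are dropped and the total lands exactly on the 512-token limit, with the response untouched.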
+ ---
+
+ ## Files in This Repository
+
+ | File | Description |
+ |------|-------------|
+ | `pytorch_model.pt` | Trained model weights (2.22 GB) |
+ | `temperatures.json` | Per-maxim calibration temperatures |
+
+ ---
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @article{prabhath2026gricebench,
+   title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
+   author={Prabhath, Pushkar},
+   year={2026}
+ }
+ ```
+
+ ---
+
+ ## Related Models
+
+ | Model | Role | Link |
+ |-------|------|------|
+ | GriceBench-Detector | Detects violations (this model) | You are here |
+ | GriceBench-Repair | Repairs detected violations | [🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair) |
+ | GriceBench-DPO | Generates cooperative responses | [⚡ DPO Generator](https://huggingface.co/Pushkar27/GriceBench-DPO) |
+
+ **GitHub:** https://github.com/PushkarPrabhath27/Research-Model