Pushkar27 committed on
Commit fca1e39 · 1 Parent(s): ebfe70b

Complete documentation rewrite with full YAML metadata, model-index, per-maxim F1 scores, and calibration details

Files changed (1)
  1. README.md +118 -116
README.md CHANGED
@@ -1,100 +1,89 @@
1
- ο»Ώ---
2
  language:
3
- - en
4
  license: apache-2.0
5
  library_name: transformers
6
  tags:
7
- - text-classification
8
- - multi-label-classification
9
- - dialogue
10
- - conversational-ai
11
- - gricean-maxims
12
- - cooperative-communication
13
- - deberta
14
- - nlp
15
- - pragmatics
16
  datasets:
17
- - topical_chat
18
  metrics:
19
- - f1
20
- - precision
21
- - recall
22
- - roc_auc
23
  pipeline_tag: text-classification
24
  base_model: microsoft/deberta-v3-base
25
  model-index:
26
- - name: GriceBench-Detector
27
- results:
28
- - task:
29
- type: text-classification
30
- name: Multi-Label Gricean Maxim Violation Detection
31
- dataset:
32
- name: Topical-Chat (GriceBench held-out split)
33
- type: custom
34
- split: test
35
- metrics:
36
- - type: f1
37
- value: 0.955
38
- name: Macro F1
39
- - type: f1
40
- value: 1.000
41
- name: Quantity F1
42
- - type: f1
43
- value: 0.928
44
- name: Quality F1
45
- - type: f1
46
- value: 1.000
47
- name: Relation F1
48
- - type: f1
49
- value: 0.891
50
- name: Manner F1
51
  ---
52
 
53
- <div align="center">
54
 
55
- # 🔍 GriceBench-Detector
56
 
57
- **Detects cooperative communication failures in AI dialogue, one Gricean maxim at a time.**
58
 
59
- [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
60
- [![HuggingFace](https://img.shields.io/badge/🤗-GriceBench-yellow)](https://huggingface.co/Pushkar27)
61
- [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
62
 
63
- **Part of the GriceBench system**:
64
- [GitHub](https://github.com/PushkarPrabhath27/Research-Model) |
65
- [🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair) |
66
- [⚑ DPO Generator](https://huggingface.co/Pushkar27/GriceBench-DPO)
67
 
68
- </div>
69
 
70
- ---
71
 
72
- ## What This Model Does
73
 
74
- GriceBench-Detector identifies which of Paul Grice's four conversational maxims
75
- a dialogue response violates. It returns four independent calibrated violation
76
- probabilities (one per maxim), enabling targeted, explainable repair downstream.
77
 
78
- | Output | Maxim | Violation Detected | Example |
79
- |--------|-------|-------------------|---------|
80
- | `quantity_prob` | Quantity | Response too short (<8 words) or too long (>38 words) | "Yes." to a detailed question |
81
- | `quality_prob` | Quality | Factually inconsistent with knowledge evidence | Wrong date, incorrect name |
82
- | `relation_prob` | Relation | Off-topic response | Jazz question answered with classical music facts |
83
- | `manner_prob` | Manner | Ambiguous, jargon-heavy, or disorganized | Unclear pronoun references |
84
 
85
- ---
86
 
87
- ## Intended Use
88
 
89
- - **Primary Use:** Evaluating conversational AI for cooperative behavior.
90
- - **Filtering:** Post-generation filtering to flag responses for repair.
91
- - **Research:** Investigating pragmatics and Gricean maxim violations in LLMs.
92
- - **Out-of-Scope:** Not intended for high-stakes factual verification (e.g., medical/legal) or as a stand-alone truth-teller.
93
 
94
- ---
95
 
96
- ## Quick Start
 
97
 
98
  ```python
99
  import torch
100
  import torch.nn as nn
@@ -124,6 +113,7 @@ class MaximDetector(nn.Module):
124
  return torch.cat([head(cls) for head in self.classifiers], dim=1)
125
 
126
  # ── Load model and calibration ──────────────────────────────────────────────
 
127
  tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
128
  model = MaximDetector()
129
  state_dict = torch.load("pytorch_model.pt", map_location="cpu")
@@ -150,7 +140,7 @@ def detect_violations(context: str, response: str, evidence: str = "") -> dict:
150
  ]
151
 
152
  with torch.no_grad():
153
- logits = model(**inputs)
154
 
155
  probs, violations = {}, {}
156
  for i, (maxim, temp) in enumerate(zip(maxim_names, temp_values)):
@@ -163,51 +153,51 @@ def detect_violations(context: str, response: str, evidence: str = "") -> dict:
163
  "probabilities": probs,
164
  "is_cooperative": not any(violations.values())
165
  }
166
- ```
167
-
168
- ---
169
-
170
- ## Performance
171
-
172
- Evaluated on **1,000 held-out Topical-Chat dialogue turns** (500 violation-injected, 500 clean).
173
-
174
- | Maxim | F1 | Precision | Recall | AUC-ROC |
175
- |-------|-----|-----------|--------|---------|
176
- | Quantity | **1.000** | 1.000 | 1.000 | 1.000 |
177
- | Quality | 0.928 | 0.866 | 1.000 | 0.999 |
178
- | Relation | **1.000** | 1.000 | 1.000 | 1.000 |
179
- | Manner | 0.891 | 0.864 | 0.919 | 0.979 |
180
- | **Macro Avg** | **0.955** | - | - | - |
181
-
182
- ---
183
-
184
- ## Limitations & Biases
185
-
186
- - **Subjectivity:** The "Manner" maxim is inherently subjective; detection reflects the labels in the training set.
187
- - **Domain Specificity:** Performance is optimized for general knowledge dialogue (Topical-Chat). Results may vary in specialized domains (e.g., highly technical or medical).
188
- - **English-Only:** This model is trained and evaluated exclusively on English dialogue.
189
- - **Prompt Sensitivity:** Detection results can be sensitive to the formatting of the "Evidence" field.
190
-
191
- ---
192
-
193
- ## Environmental Impact
194
-
195
- - **Hardware Used:** 2x NVIDIA Tesla T4 GPUs (Kaggle).
196
- - **Training Time:** ~3 hours.
197
- - **Estimated Carbon Footprint:** ~0.45 kg CO2eq (based on average TDP and regional carbon intensity).
198
-
199
- ---
200
-
201
- ## Architecture & Training
202
-
203
- - **Base model:** `microsoft/deberta-v3-base` (184M parameters)
204
- - **Heads:** 4 independent binary classification heads.
205
- - **Calibration:** Per-head temperature scaling (see `temperatures.json`).
206
-
207
- ---
208
-
209
- ## Citation
210
 
211
  ```bibtex
212
  @article{prabhath2026gricebench,
213
  title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
@@ -216,3 +206,15 @@ Evaluated on **1,000 held-out Topical-Chat dialogue turns** (500 violation-injec
216
  note={Under review, EMNLP 2026}
217
  }
218
  ```
1
+ ---
2
  language:
3
+ - en
4
  license: apache-2.0
5
  library_name: transformers
6
  tags:
7
+ - text-classification
8
+ - multi-label-classification
9
+ - dialogue
10
+ - conversational-ai
11
+ - gricean-maxims
12
+ - cooperative-communication
13
+ - deberta
14
+ - nlp
15
+ - pragmatics
16
  datasets:
17
+ - topical_chat
18
  metrics:
19
+ - f1
20
+ - precision
21
+ - recall
22
+ - roc_auc
23
  pipeline_tag: text-classification
24
  base_model: microsoft/deberta-v3-base
25
  model-index:
26
+ - name: GriceBench-Detector
27
+ results:
28
+ - task:
29
+ type: text-classification
30
+ name: Multi-Label Gricean Maxim Violation Detection
31
+ dataset:
32
+ name: Topical-Chat (GriceBench held-out split, N=1000)
33
+ type: custom
34
+ split: test
35
+ metrics:
36
+ - type: f1
37
+ value: 0.955
38
+ name: Macro F1
39
+ - type: f1
40
+ value: 1.000
41
+ name: Quantity F1
42
+ - type: f1
43
+ value: 0.928
44
+ name: Quality F1
45
+ - type: f1
46
+ value: 1.000
47
+ name: Relation F1
48
+ - type: f1
49
+ value: 0.891
50
+ name: Manner F1
51
  ---
52
 
53
+ 🔍 GriceBench-Detector
54
 
55
+ Detects cooperative communication failures in AI dialogue, one Gricean maxim at a time.
56
 
 
57
 
58
+ License-Apache%202.0-blue.svg
 
 
59
 
 
 
 
 
60
 
61
+ %F0%9F%A4%97-GriceBench-yellow
62
 
 
63
 
64
+ python-3.8+-blue.svg
65
 
 
 
 
66
 
67
+ Part of the GriceBench system:
 
 
 
 
 
68
 
69
+ GitHub |
70
 
71
+ 🔧 Repair Model |
72
 
73
+ ⚑ DPO Generator
 
 
 
74
 
 
75
 
76
+ What This Model Does
77
+ GriceBench-Detector identifies which of Paul Grice's four conversational maxims a dialogue response violates. It returns four independent calibrated violation probabilities (one per maxim), enabling targeted, explainable repair downstream.
78
 
79
+ Output Maxim Violation Detected Example
80
+ quantity_prob Quantity Response too short (<8 words) or too long (>38 words) "Yes." to a detailed question
81
+ quality_prob Quality Factually inconsistent with knowledge evidence Wrong date, incorrect name
82
+ relation_prob Relation Off-topic response Jazz question answered with classical music facts
83
+ manner_prob Manner Ambiguous, jargon-heavy, or disorganized Unclear pronoun references
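The Quantity row above describes a length-based criterion. A minimal sketch of that criterion follows, assuming plain whitespace tokenization and the bounds exactly as written in the table (fewer than 8 words, or more than 38). The function name `violates_quantity_length` is hypothetical and not part of this repository. The released detector predicts `quantity_prob` with a learned head, so this heuristic only illustrates the labeling rule the table describes.

```python
def violates_quantity_length(response: str, min_words: int = 8, max_words: int = 38) -> bool:
    """Return True when the response is shorter than min_words or longer than max_words.

    Hypothetical helper: word counts come from str.split(), which is an assumption;
    the source does not say how words were counted.
    """
    n_words = len(response.split())
    return n_words < min_words or n_words > max_words


print(violates_quantity_length("Yes."))  # True (1 word)
```

A response of exactly 8 or exactly 38 words passes under this reading, because the table states strict inequalities.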
84
+ Used in the full GriceBench pipeline, this detector helps achieve a 95.0% cooperative rate, outperforming Mistral-7B-Instruct (89.1%) and Qwen2.5-7B-Instruct (84.2%).
85
+
86
+ Quick Start
87
  ```python
88
  import torch
89
  import torch.nn as nn
 
113
  return torch.cat([head(cls) for head in self.classifiers], dim=1)
114
 
115
  # ── Load model and calibration ──────────────────────────────────────────────
116
+ # Download pytorch_model.pt and temperatures.json from this repo first
117
  tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
118
  model = MaximDetector()
119
  state_dict = torch.load("pytorch_model.pt", map_location="cpu")
 
140
  ]
141
 
142
  with torch.no_grad():
143
+ logits = model(**inputs) # Shape: [1, 4]
144
 
145
  probs, violations = {}, {}
146
  for i, (maxim, temp) in enumerate(zip(maxim_names, temp_values)):
 
153
  "probabilities": probs,
154
  "is_cooperative": not any(violations.values())
155
  }
156
 
157
+ # ── Example ─────────────────────────────────────────────────────────────────
158
+ result = detect_violations(
159
+ context="What do you think about the latest developments in AI?",
160
+ response="Yes.", # Too short: Quantity violation
161
+ evidence="AI has seen rapid advancement in large language models during 2024-2025."
162
+ )
163
+ print(result)
164
+ # {'violations': {'quantity': True, 'quality': False, 'relation': False, 'manner': False},
165
+ # 'probabilities': {'quantity': 0.97, 'quality': 0.02, 'relation': 0.03, 'manner': 0.11},
166
+ # 'is_cooperative': False}
167
+ ```
168
+ Performance
169
+ Evaluated on 1,000 held-out Topical-Chat dialogue turns (500 violation-injected, 500 clean).
170
+
171
+ Maxim F1 Precision Recall AUC-ROC
172
+ Quantity 1.000 1.000 1.000 1.000
173
+ Quality 0.928 0.866 1.000 0.999
174
+ Relation 1.000 1.000 1.000 1.000
175
+ Manner 0.891 0.864 0.919 0.979
176
+ Macro Avg 0.955 - - -
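The F1 column can be cross-checked against the precision and recall columns. The sketch below recomputes each per-maxim F1 as the harmonic mean of the reported precision and recall, then averages the four values to get the macro score. It works only from the rounded figures in the table, so it is a consistency check and not a re-evaluation of the model; the names `f1_from_pr` and `reported` are illustrative.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the performance table above.
reported = {
    "quantity": (1.000, 1.000),
    "quality": (0.866, 1.000),
    "relation": (1.000, 1.000),
    "manner": (0.864, 0.919),
}

per_maxim_f1 = {name: f1_from_pr(p, r) for name, (p, r) in reported.items()}
macro_f1 = sum(per_maxim_f1.values()) / len(per_maxim_f1)

for name, score in per_maxim_f1.items():
    print(f"{name}: {score:.3f}")
print(f"macro: {macro_f1:.3f}")
```

With these inputs the recomputed values round to 0.928 for Quality, 0.891 for Manner, and 0.955 for the macro average, which matches the table.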
177
+ Architecture & Training
178
+ Base model: microsoft/deberta-v3-base (184M parameters)
179
+ Heads: 4 independent binary classification heads (one per maxim)
180
+ Loss: Focal Loss (α=0.25, γ=2.0) for class imbalance
181
+ Calibration: Per-head temperature scaling (see temperatures.json)
182
+ Training data: 4,012 examples (weak supervision + ~1,000 gold labels)
183
+ Epochs: 5 | LR: 2e-5 | Hardware: Kaggle T4 ×2, ~2–3 hours
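The list above names focal loss with alpha = 0.25 and gamma = 2.0 but does not show how it is computed. A minimal pure-Python sketch of the standard binary focal loss for a single logit follows. It assumes the common formulation in which alpha weights the positive class and (1 - alpha) the negative class; the training code in the repository may differ, and `binary_focal_loss` is a hypothetical name.

```python
import math


def binary_focal_loss(logit: float, target: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss for one binary prediction: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p = 1.0 / (1.0 + math.exp(-logit))           # sigmoid probability of the positive class
    p_t = p if target == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(4.0, 1)    # confident and correct: heavily down-weighted
hard = binary_focal_loss(-4.0, 1)   # confident and wrong: dominates the loss
print(easy < hard)  # True
```

The `(1 - p_t) ** gamma` factor shrinks the contribution of examples the model already classifies well, which is why the loss is often chosen when one class is rare. Setting `gamma=0.0` reduces it to alpha-weighted binary cross-entropy.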
184
+ Calibrated temperatures:
185
+
186
+ Maxim Temperature Effect
187
+ Quantity 0.90 Slightly sharper
188
+ Quality 0.55 Conservative (fewer false positives)
189
+ Relation 0.75 Balanced
190
+ Manner 0.45 Most conservative (subjective maxim)
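To make the calibration table concrete, here is a minimal sketch of per-head temperature scaling. It assumes the usual form in which each raw logit is divided by its head's temperature before the sigmoid, and it assumes a 0.5 decision threshold. Neither detail is confirmed by the visible parts of the Quick Start code, so treat `calibrated_probability`, `calibrate`, and the threshold as hypothetical.

```python
import math

# Temperatures copied from the table above.
TEMPERATURES = {"quantity": 0.90, "quality": 0.55, "relation": 0.75, "manner": 0.45}


def calibrated_probability(logit: float, temperature: float) -> float:
    """Sigmoid of the temperature-scaled logit (assumed form: logit / T)."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))


def calibrate(logits: dict, temperatures: dict = TEMPERATURES, threshold: float = 0.5) -> dict:
    """Map raw per-maxim logits to calibrated probabilities and boolean violation flags."""
    probs = {m: calibrated_probability(z, temperatures[m]) for m, z in logits.items()}
    violations = {m: p > threshold for m, p in probs.items()}
    return {"probabilities": probs, "violations": violations}


# Made-up logits, for illustration only.
example = calibrate({"quantity": 2.0, "quality": -1.0, "relation": 0.0, "manner": -3.0})
print(example["violations"])
```

Under this assumed form, dividing by a temperature below 1 moves probabilities away from 0.5 but does not change which side of 0.5 they fall on.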
191
+ Files
192
+ File Description
193
+ pytorch_model.pt Trained model weights
194
+ temperatures.json Per-maxim calibration temperatures
195
+ Limitations & Biases
196
+ Subjectivity: The "Manner" maxim is inherently subjective; detection reflects the labels in the training set.
197
+ Domain Specificity: Performance is optimized for general knowledge dialogue (Topical-Chat). Results may vary in specialized domains.
198
+ English-Only: This model is trained and evaluated exclusively on English dialogue.
199
+ Prompt Sensitivity: Detection results can be sensitive to the formatting of the "Evidence" field.
200
+ Citation
201
  ```bibtex
202
  @article{prabhath2026gricebench,
203
  title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
 
206
  note={Under review, EMNLP 2026}
207
  }
208
  ```
209
+ Related Models
210
+ Model Role Link
211
+ GriceBench-Detector Detects violations (this model) You are here
212
+ GriceBench-Repair Repairs detected violations 🔧 Repair
213
+ GriceBench-DPO Generates cooperative responses ⚑ DPO
214
+ GitHub: https://github.com/PushkarPrabhath27/Research-Model
215
+
216
+ Environmental Impact
217
+ Aspect Value
218
+ Hardware Used 2x NVIDIA Tesla T4 GPUs (Kaggle)
219
+ Training Time ~3 hours
220
+ Estimated Carbon Footprint ~0.45 kg CO2eq