Pushkar27 committed on
Commit
d923478
·
1 Parent(s): fca1e39

Fix YAML metadata - proper list syntax with - prefix, complete model-index with evaluation results

Files changed (1)
  1. README.md +104 -68
README.md CHANGED
@@ -30,7 +30,7 @@ model-index:
  name: Multi-Label Gricean Maxim Violation Detection
  dataset:
  name: Topical-Chat (GriceBench held-out split, N=1000)
- type: custom
  split: test
  metrics:
  - type: f1
@@ -50,40 +50,42 @@ model-index:
  name: Manner F1
  ---

- 🔍 GriceBench-Detector

- Detects cooperative communication failures in AI dialogue, one Gricean maxim at a time.

- License-Apache%202.0-blue.svg

- %F0%9F%A4%97-GriceBench-yellow

- python-3.8+-blue.svg
-
-
- Part of the GriceBench system:
-
- GitHub |

- 🔧 Repair Model |

- ⚡ DPO Generator

- What This Model Does
- GriceBench-Detector identifies which of Paul Grice's four conversational maxims a dialogue response violates. It returns four independent calibrated violation probabilities, one per maxim, enabling targeted, explainable repair downstream.

- Output Maxim Violation Detected Example
- quantity_prob Quantity Response too short (<8 words) or too long (>38 words) "Yes." to a detailed question
- quality_prob Quality Factually inconsistent with knowledge evidence Wrong date, incorrect name
- relation_prob Relation Off-topic response Jazz question answered with classical music facts
- manner_prob Manner Ambiguous, jargon-heavy, or disorganized Unclear pronoun references
- Used in the full GriceBench pipeline, this detector helps achieve a 95.0% cooperative rate, outperforming Mistral-7B-Instruct (89.1%) and Qwen2.5-7B-Instruct (84.2%).

- Quick Start
  ```python
  import torch
  import torch.nn as nn
@@ -125,7 +127,7 @@ with open("temperatures.json") as f:

  # ── Detect violations ───────────────────────────────────────────────────────
  def detect_violations(context: str, response: str, evidence: str = "") -> dict:
-     input_text = f"Context: {context}\nEvidence: {evidence}\nResponse: {response}"
      inputs = tokenizer(
          input_text, return_tensors="pt",
          max_length=512, truncation=True, padding=True
@@ -165,39 +167,63 @@ print(result)
  # 'probabilities': {'quantity': 0.97, 'quality': 0.02, 'relation': 0.03, 'manner': 0.11},
  # 'is_cooperative': False}
  ```
- Performance
- Evaluated on 1,000 held-out Topical-Chat dialogue turns (500 violation-injected, 500 clean).
-
- Maxim F1 Precision Recall AUC-ROC
- Quantity 1.000 1.000 1.000 1.000
- Quality 0.928 0.866 1.000 0.999
- Relation 1.000 1.000 1.000 1.000
- Manner 0.891 0.864 0.919 0.979
- Macro Avg 0.955 - - -
- Architecture & Training
- Base model: microsoft/deberta-v3-base (184M parameters)
- Heads: 4 independent binary classification heads (one per maxim)
- Loss: Focal Loss (α=0.25, γ=2.0) for class imbalance
- Calibration: Per-head temperature scaling (see temperatures.json)
- Training data: 4,012 examples (weak supervision + ~1,000 gold labels)
- Epochs: 5 | LR: 2e-5 | Hardware: Kaggle T4 ×2, ~2–3 hours
- Calibrated temperatures:
-
- Maxim Temperature Effect
- Quantity 0.90 Slightly sharper
- Quality 0.55 Conservative (fewer false positives)
- Relation 0.75 Balanced
- Manner 0.45 Most conservative (subjective maxim)
- Files
- File Description
- pytorch_model.pt Trained model weights
- temperatures.json Per-maxim calibration temperatures
- Limitations & Biases
- Subjectivity: The "Manner" maxim is inherently subjective; detection reflects the labels in the training set.
- Domain Specificity: Performance is optimized for general knowledge dialogue (Topical-Chat). Results may vary in specialized domains.
- English-Only: This model is trained and evaluated exclusively on English dialogue.
- Prompt Sensitivity: Detection results can be sensitive to the formatting of the "Evidence" field.
- Citation
  ```bibtex
  @article{prabhath2026gricebench,
  title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
@@ -206,15 +232,25 @@ Citation
  note={Under review, EMNLP 2026}
  }
  ```
- Related Models
- Model Role Link
- GriceBench-Detector Detects violations (this model) You are here
- GriceBench-Repair Repairs detected violations 🔧 Repair
- GriceBench-DPO Generates cooperative responses ⚡ DPO
- GitHub: https://github.com/PushkarPrabhath27/Research-Model
-
- Environmental Impact
- Aspect Value
- Hardware Used 2x NVIDIA Tesla T4 GPUs (Kaggle)
- Training Time ~3 hours
- Estimated Carbon Footprint ~0.45 kg CO2eq
 
  name: Multi-Label Gricean Maxim Violation Detection
  dataset:
  name: Topical-Chat (GriceBench held-out split, N=1000)
+ type: topical_chat
  split: test
  metrics:
  - type: f1
 
  name: Manner F1
  ---

+ <div align="center">

+ # 🔍 GriceBench-Detector

+ **Detects cooperative communication failures in AI dialogue, one Gricean maxim at a time.**

+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+ [![HuggingFace](https://img.shields.io/badge/🤗-GriceBench-yellow)](https://huggingface.co/Pushkar27)
+ [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)

+ **Part of the GriceBench system:**
+ [GitHub](https://github.com/PushkarPrabhath27/Research-Model) |
+ [🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair) |
+ [⚡ DPO Generator](https://huggingface.co/Pushkar27/GriceBench-DPO)

+ </div>

+ ---

+ ## What This Model Does

+ GriceBench-Detector identifies which of Paul Grice's four conversational maxims a dialogue response violates. It returns four independent calibrated violation probabilities, one per maxim, enabling targeted, explainable repair downstream.

+ | Output | Maxim | Violation Detected | Example |
+ |--------|-------|-------------------|---------|
+ | `quantity_prob` | Quantity | Response too short (<8 words) or too long (>38 words) | "Yes." to a detailed question |
+ | `quality_prob` | Quality | Factually inconsistent with knowledge evidence | Wrong date, incorrect name |
+ | `relation_prob` | Relation | Off-topic response | Jazz question answered with classical music facts |
+ | `manner_prob` | Manner | Ambiguous, jargon-heavy, or disorganized | Unclear pronoun references |

+ Used in the full GriceBench pipeline, this detector helps achieve a **95.0% cooperative rate**, outperforming Mistral-7B-Instruct (89.1%) and Qwen2.5-7B-Instruct (84.2%).
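
The Quantity row in the table above reduces to a word-count rule. The sketch below illustrates only that stated criterion, assuming words are whitespace-separated tokens. `quantity_label` is a hypothetical helper, not part of the released code, and the detector itself outputs a learned probability rather than applying this rule directly.

```python
def quantity_label(response: str, min_words: int = 8, max_words: int = 38) -> str:
    """Classify a response by the word-count bounds quoted in the table.

    Assumption: words are whitespace-separated tokens; the README does not
    say how the dataset actually counts words.
    """
    n_words = len(response.split())
    if n_words < min_words:
        return "too_short"
    if n_words > max_words:
        return "too_long"
    return "ok"


print(quantity_label("Yes."))  # too_short
```

The bounds are exclusive as written in the table: a response of exactly 8 or exactly 38 words falls inside the accepted range.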

+ ---

+ ## Quick Start

  ```python
  import torch
  import torch.nn as nn
 
  # ── Detect violations ───────────────────────────────────────────────────────
  def detect_violations(context: str, response: str, evidence: str = "") -> dict:
+     input_text = f"Context: {context}\n\nEvidence: {evidence}\n\nResponse: {response}"
      inputs = tokenizer(
          input_text, return_tensors="pt",
          max_length=512, truncation=True, padding=True
 
  # 'probabilities': {'quantity': 0.97, 'quality': 0.02, 'relation': 0.03, 'manner': 0.11},
  # 'is_cooperative': False}
  ```
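
The changed line in this hunk builds the model input from three labeled fields, now separated by blank lines, and the sample output reports per-maxim probabilities plus an `is_cooperative` flag. The sketch below mirrors that f-string and shows one way the flag could be derived. `build_input` and `summarize` are hypothetical names, and the 0.5 decision threshold is an assumption, since the thresholding code is elided from this diff.

```python
from typing import Dict


def build_input(context: str, response: str, evidence: str = "") -> str:
    # Same field order and blank-line separators as the updated f-string.
    return f"Context: {context}\n\nEvidence: {evidence}\n\nResponse: {response}"


def summarize(probabilities: Dict[str, float], threshold: float = 0.5) -> dict:
    # Assumption: a maxim counts as violated when its probability exceeds
    # the threshold; the real cutoff is not visible in this diff.
    violations = [maxim for maxim, p in probabilities.items() if p > threshold]
    return {
        "violations": violations,
        "probabilities": probabilities,
        "is_cooperative": not violations,
    }


probs = {"quantity": 0.97, "quality": 0.02, "relation": 0.03, "manner": 0.11}
result = summarize(probs)
print(result["violations"], result["is_cooperative"])  # ['quantity'] False
```

Because the Limitations section notes sensitivity to the formatting of the "Evidence" field, inputs at inference time should use the same separators the model was trained with.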
+
+ ---
+
+ ## Performance
+
+ Evaluated on **1,000 held-out Topical-Chat dialogue turns** (500 violation-injected, 500 clean).
+
+ | Maxim | F1 | Precision | Recall | AUC-ROC |
+ |-------|-----|-----------|--------|---------|
+ | Quantity | **1.000** | 1.000 | 1.000 | 1.000 |
+ | Quality | 0.928 | 0.866 | 1.000 | 0.999 |
+ | Relation | **1.000** | 1.000 | 1.000 | 1.000 |
+ | Manner | 0.891 | 0.864 | 0.919 | 0.979 |
+ | **Macro Avg** | **0.955** | - | - | - |
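
The F1 column can be checked against the Precision and Recall columns, since F1 is their harmonic mean and the macro average is the unweighted mean of the four per-maxim F1 scores. A small sketch using only the rounded values from the table (the variable names are illustrative):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
reported = {
    "Quantity": (1.000, 1.000),
    "Quality": (0.866, 1.000),
    "Relation": (1.000, 1.000),
    "Manner": (0.864, 0.919),
}

per_maxim = {name: f1_score(p, r) for name, (p, r) in reported.items()}
macro_f1 = sum(per_maxim.values()) / len(per_maxim)

for name, f1 in per_maxim.items():
    print(f"{name}: {f1:.3f}")
print(f"Macro Avg: {macro_f1:.3f}")
```

Recomputed this way, the values round to 0.928 for Quality, 0.891 for Manner, and 0.955 for the macro average, matching the table.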
+
+ ---
+
+ ## Architecture & Training
+
+ - **Base model:** `microsoft/deberta-v3-base` (184M parameters)
+ - **Heads:** 4 independent binary classification heads (one per maxim)
+ - **Loss:** Focal Loss (α=0.25, γ=2.0) for class imbalance
+ - **Calibration:** Per-head temperature scaling (see `temperatures.json`)
+ - **Training data:** 4,012 examples (weak supervision + ~1,000 gold labels)
+ - **Epochs:** 5 | **LR:** 2e-5 | **Hardware:** Kaggle T4 ×2, ~2–3 hours
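
For readers unfamiliar with the loss named in the list above, here is a minimal pure-Python sketch of binary focal loss for a single prediction, using the α=0.25 and γ=2.0 values quoted there. It assumes the common formulation in which α weights the positive class and 1-α the negative class. The training code is not part of this diff, so the exact variant used for the four heads may differ, and `binary_focal_loss` is an illustrative name.

```python
import math


def binary_focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss for one binary prediction.

    p: predicted probability of the positive class.
    y: ground-truth label, 0 or 1.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)             # guard against log(0)
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class weight (assumed convention)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.95, 1)  # confident and correct
hard = binary_focal_loss(0.30, 1)  # wrong side of 0.5
print(f"easy positive: {easy:.6f}, hard positive: {hard:.6f}")
```

The `(1 - p_t) ** gamma` factor shrinks the loss on well-classified examples, so harder examples contribute most of the gradient.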
+
+ **Calibrated temperatures:**
+
+ | Maxim | Temperature | Effect |
+ |-------|-------------|--------|
+ | Quantity | 0.90 | Slightly sharper |
+ | Quality | 0.55 | Conservative (fewer false positives) |
+ | Relation | 0.75 | Balanced |
+ | Manner | 0.45 | Most conservative (subjective maxim) |
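
A minimal sketch of how per-head temperatures like those in the table are typically applied, assuming the usual convention of dividing each head's logit by its temperature before the sigmoid. The function name `calibrated_probability` and the example logit are illustrative, and the way the released code actually consumes `temperatures.json` is not visible in this diff.

```python
import math

# Values copied from the table above.
TEMPERATURES = {"quantity": 0.90, "quality": 0.55, "relation": 0.75, "manner": 0.45}


def calibrated_probability(logit: float, temperature: float) -> float:
    # Assumed convention: p = sigmoid(logit / T).
    return 1.0 / (1.0 + math.exp(-logit / temperature))


logit = 1.0  # arbitrary example logit, not a real model output
for maxim, temperature in TEMPERATURES.items():
    print(f"{maxim}: {calibrated_probability(logit, temperature):.3f}")
```

Under this convention a logit of 0 always maps to 0.5, and a temperature below 1 moves the probability further from 0.5 for the same logit.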
+
+ ---
+
+ ## Files
+
+ | File | Description |
+ |------|-------------|
+ | `pytorch_model.pt` | Trained model weights |
+ | `temperatures.json` | Per-maxim calibration temperatures |
+
+ ---
+
+ ## Limitations & Biases
+
+ - **Subjectivity:** The "Manner" maxim is inherently subjective; detection reflects the labels in the training set.
+ - **Domain Specificity:** Performance is optimized for general knowledge dialogue (Topical-Chat). Results may vary in specialized domains.
+ - **English-Only:** This model is trained and evaluated exclusively on English dialogue.
+ - **Prompt Sensitivity:** Detection results can be sensitive to the formatting of the "Evidence" field.
+
+ ---
+
+ ## Citation
+
  ```bibtex
  @article{prabhath2026gricebench,
  title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
 
  note={Under review, EMNLP 2026}
  }
  ```
+
+ ---
+
+ ## Related Models
+
+ | Model | Role | Link |
+ |-------|------|------|
+ | GriceBench-Detector | Detects violations (this model) | You are here |
+ | GriceBench-Repair | Repairs detected violations | [🔧 Repair](https://huggingface.co/Pushkar27/GriceBench-Repair) |
+ | GriceBench-DPO | Generates cooperative responses | [⚡ DPO](https://huggingface.co/Pushkar27/GriceBench-DPO) |
+
+ **GitHub:** https://github.com/PushkarPrabhath27/Research-Model
+
+ ---
+
+ ## Environmental Impact
+
+ | Aspect | Value |
+ |--------|-------|
+ | Hardware Used | 2x NVIDIA Tesla T4 GPUs (Kaggle) |
+ | Training Time | ~3 hours |
+ | Estimated Carbon Footprint | ~0.45 kg CO2eq |