Pushkar27/GriceBench-Detector

@@ -30,7 +30,7 @@ model-index:
       name: Multi-Label Gricean Maxim Violation Detection
     dataset:
       name: Topical-Chat (GriceBench held-out split, N=1000)
-      type: custom
       split: test
     metrics:
     - type: f1
@@ -50,40 +50,42 @@ model-index:
       name: Manner F1
 ---
-🔍 GriceBench-Detector
-Detects cooperative communication failures in AI dialogue — one Gricean maxim at a time.
-License-Apache%202.0-blue.svg
-%F0%9F%A4%97-GriceBench-yellow
-python-3.8+-blue.svg
-Part of the GriceBench system —
-GitHub |
-🔧 Repair Model |
-⚡ DPO Generator
-What This Model Does
-GriceBench-Detector identifies which of Paul Grice's four conversational maxims a dialogue response violates. It returns four independent calibrated violation probabilities — one per maxim — enabling targeted, explainable repair downstream.
-Output	Maxim	Violation Detected	Example
-quantity_prob	Quantity	Response too short (<8 words) or too long (>38 words)	"Yes." to a detailed question
-quality_prob	Quality	Factually inconsistent with knowledge evidence	Wrong date, incorrect name
-relation_prob	Relation	Off-topic response	Jazz question answered with classical music facts
-manner_prob	Manner	Ambiguous, jargon-heavy, or disorganized	Unclear pronoun references
-Used in the full GriceBench pipeline, this detector helps achieve a 95.0% cooperative rate — outperforming Mistral-7B-Instruct (89.1%) and Qwen2.5-7B-Instruct (84.2%).
-Quick Start
 ```python
 import torch
 import torch.nn as nn
@@ -125,7 +127,7 @@ with open("temperatures.json") as f:
 # ── Detect violations ───────────────────────────────────────────────────────
 def detect_violations(context: str, response: str, evidence: str = "") -> dict:
-    input_text = f"Context: {context}\nEvidence: {evidence}\nResponse: {response}"
     inputs = tokenizer(
         input_text, return_tensors="pt",
         max_length=512, truncation=True, padding=True
@@ -165,39 +167,63 @@ print(result)
 #  'probabilities': {'quantity': 0.97, 'quality': 0.02, 'relation': 0.03, 'manner': 0.11},
 #  'is_cooperative': False}
 ```
-Performance
-Evaluated on 1,000 held-out Topical-Chat dialogue turns (500 violation-injected, 500 clean).
-Maxim	F1	Precision	Recall	AUC-ROC
-Quantity	1.000	1.000	1.000	1.000
-Quality	0.928	0.866	1.000	0.999
-Relation	1.000	1.000	1.000	1.000
-Manner	0.891	0.864	0.919	0.979
-Macro Avg	0.955	—	—	—
-Architecture & Training
-Base model: microsoft/deberta-v3-base (184M parameters)
-Heads: 4 independent binary classification heads (one per maxim)
-Loss: Focal Loss (α=0.25, γ=2.0) for class imbalance
-Calibration: Per-head temperature scaling (see temperatures.json)
-Training data: 4,012 examples (weak supervision + ~1,000 gold labels)
-Epochs: 5 | LR: 2e-5 | Hardware: Kaggle T4 ×2, ~2–3 hours
-Calibrated temperatures:
-Maxim	Temperature	Effect
-Quantity	0.90	Slightly sharper
-Quality	0.55	Conservative (fewer false positives)
-Relation	0.75	Balanced
-Manner	0.45	Most conservative (subjective maxim)
-Files
-File	Description
-pytorch_model.pt	Trained model weights
-temperatures.json	Per-maxim calibration temperatures
-Limitations & Biases
-Subjectivity: The "Manner" maxim is inherently subjective; detection reflects the labels in the training set.
-Domain Specificity: Performance is optimized for general knowledge dialogue (Topical-Chat). Results may vary in specialized domains.
-English-Only: This model is trained and evaluated exclusively on English dialogue.
-Prompt Sensitivity: Detection results can be sensitive to the formatting of the "Evidence" field.
-Citation
 ```bibtex
  @article{prabhath2026gricebench,
   title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
@@ -206,15 +232,25 @@ Citation
   note={Under review, EMNLP 2026}
 }
 ```
-Related Models
-Model	Role	Link
-GriceBench-Detector	Detects violations (this model)	You are here
-GriceBench-Repair	Repairs detected violations	🔧 Repair
-GriceBench-DPO	Generates cooperative responses	⚡ DPO
-GitHub: https://github.com/PushkarPrabhath27/Research-Model
-Environmental Impact
-Aspect	Value
-Hardware Used	2x NVIDIA Tesla T4 GPUs (Kaggle)
-Training Time	~3 hours
-Estimated Carbon Footprint	~0.45 kg CO2eq

       name: Multi-Label Gricean Maxim Violation Detection
     dataset:
       name: Topical-Chat (GriceBench held-out split, N=1000)
+      type: topical_chat
       split: test
     metrics:
     - type: f1
       name: Manner F1
 ---
+<div align="center">
+# 🔍 GriceBench-Detector
+**Detects cooperative communication failures in AI dialogue — one Gricean maxim at a time.**
+[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+[![HuggingFace](https://img.shields.io/badge/🤗-GriceBench-yellow)](https://huggingface.co/Pushkar27)
+[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+**Part of the GriceBench system** —
+[GitHub](https://github.com/PushkarPrabhath27/Research-Model) |
+[🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair) |
+[⚡ DPO Generator](https://huggingface.co/Pushkar27/GriceBench-DPO)
+</div>
+---
+## What This Model Does
+GriceBench-Detector identifies which of Paul Grice's four conversational maxims a dialogue response violates. It returns four independent calibrated violation probabilities — one per maxim — enabling targeted, explainable repair downstream.
+| Output | Maxim | Violation Detected | Example |
+|--------|-------|-------------------|---------|
+| `quantity_prob` | Quantity | Response too short (<8 words) or too long (>38 words) | "Yes." to a detailed question |
+| `quality_prob` | Quality | Factually inconsistent with knowledge evidence | Wrong date, incorrect name |
+| `relation_prob` | Relation | Off-topic response | Jazz question answered with classical music facts |
+| `manner_prob` | Manner | Ambiguous, jargon-heavy, or disorganized | Unclear pronoun references |
+Used in the full GriceBench pipeline, this detector helps achieve a **95.0% cooperative rate** — outperforming Mistral-7B-Instruct (89.1%) and Qwen2.5-7B-Instruct (84.2%).
+---
+## Quick Start
 ```python
 import torch
 import torch.nn as nn
 # ── Detect violations ───────────────────────────────────────────────────────
 def detect_violations(context: str, response: str, evidence: str = "") -> dict:
+    input_text = f"Context: {context}\n\nEvidence: {evidence}\n\nResponse: {response}"
     inputs = tokenizer(
         input_text, return_tensors="pt",
         max_length=512, truncation=True, padding=True
 #  'probabilities': {'quantity': 0.97, 'quality': 0.02, 'relation': 0.03, 'manner': 0.11},
 #  'is_cooperative': False}
 ```
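The example output above pairs per-maxim probabilities with an `is_cooperative` flag. The following is a minimal sketch of how such a flag could be derived from the probabilities, assuming a 0.5 decision threshold per maxim. The threshold, the helper name `summarize`, and the `violations` key are illustrative assumptions, not taken from the released code.

```python
from typing import Dict, List

MAXIMS = ("quantity", "quality", "relation", "manner")


def summarize(probabilities: Dict[str, float], threshold: float = 0.5) -> dict:
    """Turn per-maxim violation probabilities into a list of flagged maxims.

    The 0.5 default threshold is an assumption made for illustration.
    """
    violations: List[str] = [m for m in MAXIMS if probabilities[m] >= threshold]
    return {
        "violations": violations,
        "probabilities": probabilities,
        # A response counts as cooperative only if no maxim is flagged.
        "is_cooperative": not violations,
    }


example = {"quantity": 0.97, "quality": 0.02, "relation": 0.03, "manner": 0.11}
print(summarize(example)["violations"])  # ['quantity']
```

Keeping the threshold as a parameter makes it easy to trade precision for recall on a single maxim, for example a stricter cut-off for Manner.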
+---
+## Performance
+Evaluated on **1,000 held-out Topical-Chat dialogue turns** (500 violation-injected, 500 clean).
+| Maxim | F1 | Precision | Recall | AUC-ROC |
+|-------|-----|-----------|--------|---------|
+| Quantity | **1.000** | 1.000 | 1.000 | 1.000 |
+| Quality | 0.928 | 0.866 | 1.000 | 0.999 |
+| Relation | **1.000** | 1.000 | 1.000 | 1.000 |
+| Manner | 0.891 | 0.864 | 0.919 | 0.979 |
+| **Macro Avg** | **0.955** | — | — | — |
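As a quick consistency check, the F1 column and the macro average can be recomputed from the reported precision and recall. The sketch below uses only the numbers in the table; the function and variable names are illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
reported = {
    "quantity": (1.000, 1.000),
    "quality": (0.866, 1.000),
    "relation": (1.000, 1.000),
    "manner": (0.864, 0.919),
}

per_maxim = {name: round(f1(p, r), 3) for name, (p, r) in reported.items()}
macro_f1 = round(sum(per_maxim.values()) / len(per_maxim), 3)

print(per_maxim)  # {'quantity': 1.0, 'quality': 0.928, 'relation': 1.0, 'manner': 0.891}
print(macro_f1)   # 0.955
```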
+---
+## Architecture & Training
+- **Base model:** `microsoft/deberta-v3-base` (184M parameters)
+- **Heads:** 4 independent binary classification heads (one per maxim)
+- **Loss:** Focal Loss (α=0.25, γ=2.0) for class imbalance
+- **Calibration:** Per-head temperature scaling (see `temperatures.json`)
+- **Training data:** 4,012 examples (weak supervision + ~1,000 gold labels)
+- **Epochs:** 5 | **LR:** 2e-5 | **Hardware:** Kaggle T4 ×2, ~2–3 hours
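To make the loss setting concrete, here is a scalar sketch of binary focal loss with the α and γ values listed above. It assumes the commonly used form `-α_t (1 - p_t)^γ log(p_t)`; the training code itself is not shown in this card, so the function name and the clamping constant are assumptions.

```python
import math


def binary_focal_loss(p: float, target: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss for a single binary prediction.

    p      -- predicted probability of the positive class (a violation)
    target -- 1 for a violation, 0 otherwise
    """
    eps = 1e-7  # clamp to avoid log(0); the value is an illustrative choice
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)  # confident and correct
hard = binary_focal_loss(0.1, 1)  # confident and wrong
print(easy < hard)  # True
```

The modulating factor is what lets hard or rare examples dominate the gradient, which is the usual motivation for focal loss under class imbalance.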
+**Calibrated temperatures:**
+| Maxim | Temperature | Effect |
+|-------|-------------|--------|
+| Quantity | 0.90 | Slightly sharper |
+| Quality | 0.55 | Conservative (fewer false positives) |
+| Relation | 0.75 | Balanced |
+| Manner | 0.45 | Most conservative (subjective maxim) |
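The sketch below shows how a per-head temperature could be applied to a raw logit. It assumes the conventional formulation `sigmoid(logit / T)`; the card does not state which convention the released code uses, and `raw_logit` is a hypothetical head output. Under this assumption, a temperature below 1 moves probabilities away from 0.5 without changing which side of 0.5 they fall on.

```python
import math

# Temperatures copied from the table above; the released values live in temperatures.json.
TEMPERATURES = {"quantity": 0.90, "quality": 0.55, "relation": 0.75, "manner": 0.45}


def calibrated_probability(logit: float, temperature: float) -> float:
    """Apply temperature scaling to one raw logit, then a sigmoid.

    Assumes the conventional form sigmoid(logit / T).
    """
    return 1.0 / (1.0 + math.exp(-logit / temperature))


raw_logit = 1.0  # hypothetical head output, for illustration only
for maxim, t in TEMPERATURES.items():
    print(f"{maxim}: {calibrated_probability(raw_logit, t):.3f}")
```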
+---
+## Files
+| File | Description |
+|------|-------------|
+| `pytorch_model.pt` | Trained model weights |
+| `temperatures.json` | Per-maxim calibration temperatures |
+---
+## Limitations & Biases
+- **Subjectivity:** The "Manner" maxim is inherently subjective; detection reflects the labels in the training set.
+- **Domain Specificity:** Performance is optimized for general knowledge dialogue (Topical-Chat). Results may vary in specialized domains.
+- **English-Only:** This model is trained and evaluated exclusively on English dialogue.
+- **Prompt Sensitivity:** Detection results can be sensitive to the formatting of the "Evidence" field.
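Because results can depend on how the input is formatted, it may help to build the input string in one place. The sketch below reuses the template from the revised Quick Start, which separates the three fields with blank lines; the helper name `build_input` and the whitespace stripping are assumptions added for illustration.

```python
def build_input(context: str, response: str, evidence: str = "") -> str:
    """Assemble the detector input using the Quick Start template.

    Stripping surrounding whitespace is an illustrative assumption, meant to
    keep stray newlines from changing the field layout.
    """
    return (
        f"Context: {context.strip()}\n\n"
        f"Evidence: {evidence.strip()}\n\n"
        f"Response: {response.strip()}"
    )


text = build_input(
    context="What do you know about jazz?",
    response="Yes.",
    evidence="Jazz originated in New Orleans.\n",
)
print(text)
```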
+---
+## Citation
 ```bibtex
  @article{prabhath2026gricebench,
   title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
   note={Under review, EMNLP 2026}
 }
 ```
+---
+## Related Models
+| Model | Role | Link |
+|-------|------|------|
+| GriceBench-Detector | Detects violations (this model) | You are here |
+| GriceBench-Repair | Repairs detected violations | [🔧 Repair](https://huggingface.co/Pushkar27/GriceBench-Repair) |
+| GriceBench-DPO | Generates cooperative responses | [⚡ DPO](https://huggingface.co/Pushkar27/GriceBench-DPO) |
+**GitHub:** https://github.com/PushkarPrabhath27/Research-Model
+---
+## Environmental Impact
+| Aspect | Value |
+|--------|-------|
+| Hardware Used | 2x NVIDIA Tesla T4 GPUs (Kaggle) |
+| Training Time | ~3 hours |
+| Estimated Carbon Footprint | ~0.45 kg CO2eq |