Pushkar27/GriceBench-Detector

Text Classification

multi-label-classification

conversational-ai

cooperative-communication

Eval Results (legacy)


Pushkar27 committed 29 days ago

Commit f12cb44 · verified · 1 Parent(s): 836b34b

Upload README.md with huggingface_hub

Files changed (1)

README.md +34 -0

README.md ADDED

	@@ -0,0 +1,34 @@

+# Model Card: GriceBench Violation Detector
+## Model Details
+- **Architecture**: DeBERTa-v3-base with 4 binary classification heads.
+- **Parameters**: 184M
+- **Task**: Multi-label classification of Gricean Maxim violations (Quantity, Quality, Relation, Manner).
+- **Language**: English
+- **Release Date**: March 2026
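To make the multi-label setup concrete, here is a minimal sketch of how the outputs of four independent binary heads could be turned into per-maxim flags. The function name `decode_violations`, the 0.5 threshold, and the example logits are illustrative assumptions; the card does not describe the model's actual decoding code.

```python
import math

MAXIMS = ["Quantity", "Quality", "Relation", "Manner"]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def decode_violations(logits, threshold=0.5):
    """Map one logit per maxim to an independent yes/no violation flag.

    Each head is treated as its own binary classifier, so several maxims
    (or none) can be flagged for the same dialogue turn.
    """
    if len(logits) != len(MAXIMS):
        raise ValueError("expected one logit per maxim")
    return {m: sigmoid(z) >= threshold for m, z in zip(MAXIMS, logits)}

# Hypothetical logits for a single dialogue turn (not real model output)
flags = decode_violations([2.1, -3.0, 0.4, -0.2])
```

Because the heads are independent, the flags are not forced to sum to one, which is what distinguishes this from a single softmax over the four maxims.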
+## Performance
+Evaluated on 1,000 held-out Topical-Chat dialogue turns.
+| Maxim | F1 Score | Precision | Recall | AUC |
+|-------|----------|-----------|--------|-----|
+| Quantity | 1.000 | 1.000 | 1.000 | 1.000 |
+| Quality | 0.928 | 0.866 | 1.000 | 0.999 |
+| Relation | 1.000 | 1.000 | 1.000 | 1.000 |
+| Manner | 0.891 | 0.864 | 0.919 | 0.979 |
+| **Macro Avg** | **0.955** | -- | -- | -- |
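As a consistency check on the table, the F1 column can be recomputed from the reported precision and recall, and the macro average from the per-maxim F1 values. This is a small sketch using only the numbers above; the helper name `f1_score` is ours, not part of the model's tooling.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from the performance table
reported = {
    "Quantity": (1.000, 1.000),
    "Quality": (0.866, 1.000),
    "Relation": (1.000, 1.000),
    "Manner": (0.864, 0.919),
}

per_maxim_f1 = {m: f1_score(p, r) for m, (p, r) in reported.items()}

# Macro average: unweighted mean of the four per-maxim F1 scores
macro_f1 = sum(per_maxim_f1.values()) / len(per_maxim_f1)
```

Rounded to three decimals, these values match the reported F1 column and the 0.955 macro average.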
+## Intended Use
+- **Primary Use**: Detecting cooperative failures in AI dialogue systems.
+- **Out-of-Scope**: Detection of hate speech, toxic content, or PII.
+## Training Data
+- **Source**: Topical-Chat dataset (50,000+ turns).
+- **Labeling**: Two-stage pipeline (Weak Supervision -> Gold Fine-tuning).
+## Calibration
+The model uses temperature scaling for probability calibration.
+- **Quantity Temp**: 0.90
+- **Quality Temp**: 0.55
+- **Relation Temp**: 0.75
+- **Manner Temp**: 0.45
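The card lists the temperatures but not how they are applied. A minimal sketch, assuming the common form of temperature scaling in which each head's logit is divided by its temperature before the sigmoid; the function name `calibrated_probability` and the example logit are hypothetical, and whether the released checkpoint applies the scaling internally is not stated.

```python
import math

# Per-maxim temperatures as listed in the model card
TEMPERATURES = {
    "Quantity": 0.90,
    "Quality": 0.55,
    "Relation": 0.75,
    "Manner": 0.45,
}

def calibrated_probability(logit: float, temperature: float) -> float:
    """Temperature scaling for a binary head: sigmoid(logit / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return 1.0 / (1.0 + math.exp(-logit / temperature))

raw_logit = 1.0  # hypothetical head output, not real model output
probs = {m: calibrated_probability(raw_logit, t) for m, t in TEMPERATURES.items()}
```

Under this form, a temperature below 1 pushes probabilities away from 0.5, so the same raw logit yields the most extreme probability for Manner (T = 0.45) and the least extreme for Quantity (T = 0.90). A logit of 0 stays at 0.5 regardless of temperature, so the scaling changes confidence but not which side of the 0.5 boundary a prediction falls on.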