Pushkar27 committed on
Commit f12cb44 · verified · 1 Parent(s): 836b34b

Upload README.md with huggingface_hub

  1. README.md +34 -0
README.md ADDED
@@ -0,0 +1,34 @@
+ # Model Card: GriceBench Violation Detector
+
+ ## Model Details
+ - **Architecture**: DeBERTa-v3-base with 4 binary classification heads.
+ - **Parameters**: 184M
+ - **Task**: Multi-label classification of Gricean Maxim violations (Quantity, Quality, Relation, Manner).
+ - **Language**: English
+ - **Release Date**: March 2026
+
+ ## Performance
+ Evaluated on 1,000 held-out Topical-Chat dialogue turns.
+
+ | Maxim | F1 Score | Precision | Recall | AUC |
+ |-------|----------|-----------|--------|-----|
+ | Quantity | 1.000 | 1.000 | 1.000 | 1.000 |
+ | Quality | 0.928 | 0.866 | 1.000 | 0.999 |
+ | Relation | 1.000 | 1.000 | 1.000 | 1.000 |
+ | Manner | 0.891 | 0.864 | 0.919 | 0.979 |
+ | **Macro Avg** | **0.955** | -- | -- | -- |
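As a quick sanity check on the table above, the sketch below recomputes each F1 score as the harmonic mean of the reported precision and recall, and the macro average as the unweighted mean of the four reported F1 scores. It assumes the card uses these standard definitions; the helper name `f1_score` is illustrative and not part of the model's code. Because the inputs are already rounded to three decimals, small differences in the last digit would not be surprising.

```python
# Per-maxim metrics copied from the Performance table.
reported = {
    "Quantity": {"precision": 1.000, "recall": 1.000, "f1": 1.000},
    "Quality":  {"precision": 0.866, "recall": 1.000, "f1": 0.928},
    "Relation": {"precision": 1.000, "recall": 1.000, "f1": 1.000},
    "Manner":   {"precision": 0.864, "recall": 0.919, "f1": 0.891},
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# F1 recomputed from the rounded precision/recall values.
recomputed = {m: f1_score(r["precision"], r["recall"]) for m, r in reported.items()}

# Macro average: unweighted mean of the reported per-maxim F1 scores.
macro_f1 = sum(r["f1"] for r in reported.values()) / len(reported)

for maxim, f1 in recomputed.items():
    print(f"{maxim}: reported {reported[maxim]['f1']:.3f}, recomputed {f1:.3f}")
print(f"Macro F1: {macro_f1:.3f}")
```

Under these assumptions the recomputed values agree with the table to three decimal places.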
+
+ ## Intended Use
+ - **Primary Use**: Detecting cooperative failures in AI dialogue systems.
+ - **Out-of-Scope**: Detection of hate speech, toxic content, or PII.
+
+ ## Training Data
+ - **Source**: Topical-Chat dataset (50,000+ turns).
+ - **Labeling**: Two-stage pipeline (Weak Supervision -> Gold Fine-tuning).
+
+ ## Calibration
+ The model applies temperature scaling to calibrate its output probabilities, with a separate temperature for each maxim:
+ - **Quantity Temp**: 0.90
+ - **Quality Temp**: 0.55
+ - **Relation Temp**: 0.75
+ - **Manner Temp**: 0.45
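The card does not say exactly how these temperatures are applied. A minimal sketch, assuming the usual form of temperature scaling for binary heads (divide each head's raw logit by its temperature, then apply a sigmoid), might look like the following. The function names, the example logits, and the 0.5 decision threshold are all hypothetical and chosen only for illustration.

```python
import math

# Per-head temperatures as listed in the Calibration section.
TEMPERATURES = {"Quantity": 0.90, "Quality": 0.55, "Relation": 0.75, "Manner": 0.45}


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def calibrated_probabilities(logits, temperatures=TEMPERATURES):
    """Assumed calibration rule: p = sigmoid(logit / T) for each head."""
    return {m: sigmoid(logits[m] / t) for m, t in temperatures.items()}


# Hypothetical raw logits for one dialogue turn (not real model output).
example_logits = {"Quantity": -2.0, "Quality": 0.0, "Relation": 1.5, "Manner": 0.9}
probs = calibrated_probabilities(example_logits)

# Assumed decision threshold of 0.5; the card does not specify one.
violations = [m for m, p in probs.items() if p >= 0.5]

for maxim, p in probs.items():
    print(f"{maxim}: {p:.3f}")
print("Flagged:", violations)
```

Under this assumed rule, temperatures below 1 push probabilities away from 0.5 without changing which side of 0.5 they fall on, so calibration would affect reported confidence rather than decisions made at a 0.5 threshold.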