Pushkar27 committed on
Commit bff0b88 · 1 Parent(s): 28742be

Complete documentation rewrite with full YAML metadata, model-index, ablation results, and training details

Files changed (1): README.md +161 -98
README.md CHANGED
@@ -1,90 +1,80 @@
- ---
  language:
- - en
  license: apache-2.0
  library_name: peft
  tags:
- - text-generation
- - dialogue
- - gricean-maxims
- - cooperative-communication
- - lora
- - dpo
- - direct-preference-optimization
- - gpt2
- - nlp
  datasets:
- - topical_chat
  metrics:
- - cooperative_rate
  pipeline_tag: text-generation
  base_model: openai-community/gpt2-medium
  model-index:
- - name: GriceBench-DPO
-   results:
-   - task:
-       type: text-generation
-       name: Cooperative Dialogue Generation
-     dataset:
-       name: Topical-Chat
-       type: custom
-       split: test
-     metrics:
-     - type: custom
-       value: 83.2
-       name: Standalone Cooperative Rate
-     - type: custom
-       value: 95.0
-       name: Full Pipeline Cooperative Rate
-     - type: accuracy
-       value: 75.0
-       name: DPO Preference Accuracy
- ---

- <div align="center">

- # ⚡ GriceBench-DPO

- **GPT-2-medium fine-tuned with Direct Preference Optimization to generate cooperative dialogue.**

- [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
- [![PEFT LoRA](https://img.shields.io/badge/🤗-PEFT%20LoRA-yellow)](https://huggingface.co/docs/peft)
- [![HuggingFace](https://img.shields.io/badge/🤗-GriceBench-yellow)](https://huggingface.co/Pushkar27)

- **Part of the GriceBench system** -
- [GitHub](https://github.com/PushkarPrabhath27/Research-Model) |
- [🔍 Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) |
- [🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair)

- </div>

- ---

- ## What This Model Does

- GriceBench-DPO is a LoRA-adapted GPT-2-medium model trained with Direct Preference
- Optimization (DPO) to generate dialogue responses that comply with Gricean
- conversational maxims. It is the **generation stage** of the GriceBench pipeline.

- | Metric | Score | Context |
- |--------|-------|---------|
- | Standalone cooperative rate | 83.2% | Using this model alone |
- | Full pipeline cooperative rate | **95.0%** | DPO + Detector + Repair |
- | DPO preference accuracy | 75.0% | Held-out preference pairs |

- ---

- ## Intended Use

- - **Primary Use:** Generating dialogue responses that aim to follow Gricean maxims.
- - **System Integration:** Serves as the first stage in the GriceBench pipeline.
- - **Out-of-Scope:** Not intended for high-stakes autonomous decision-making or sensitive medical/legal interactions.

- ---

- ## Quick Start

- ```python
  from peft import PeftModel, PeftConfig
  from transformers import AutoModelForCausalLM, AutoTokenizer
  import torch
@@ -92,56 +82,129 @@ import torch
  # Load LoRA adapter on GPT-2-medium base
  adapter_path = "Pushkar27/GriceBench-DPO"
  config = PeftConfig.from_pretrained(adapter_path)
  tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
- base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
  model = PeftModel.from_pretrained(base_model, adapter_path)
  model.eval()

- def generate_cooperative_response(context: str) -> str:
      prompt = f"Context: {context}\nResponse:"
      inputs = tokenizer(prompt, return_tensors="pt")
      with torch.no_grad():
          output_ids = model.generate(
-             **inputs, max_new_tokens=80, do_sample=True,
-             temperature=0.85, top_p=0.92, repetition_penalty=1.3,
              pad_token_id=tokenizer.eos_token_id,
          )
-     return tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
- ```
-
- ---
-
- ## Limitations & Biases
-
- - **Manner Persistence:** DPO alone struggles to eliminate "Manner" violations (ambiguity, verbosity). The full GriceBench pipeline (with Repair) is required for optimal results.
- - **Reference Model Dependency:** DPO performance is tied to the quality of the reference model and the preference data used during training.
- - **Hallucinations:** The model may still produce factually incorrect or "Quality" violating responses, necessitating post-generation detection.
-
- ---
-
- ## Environmental Impact
-
- - **Hardware Used:** NVIDIA Tesla P100 GPU.
- - **Training Time:** ~24 minutes.
- - **Estimated Carbon Footprint:** ~0.05 kg CO2eq.
-
- ---
-
- ## Architecture & Training
-
- - **Base model:** `openai-community/gpt2-medium` (355M parameters)
- - **Method:** LoRA (rank=128, alpha=256)
- - **Data:** 1,970 filtered preference pairs.
-
- ---
-
- ## Citation

- ```bibtex
  @article{prabhath2026gricebench,
    title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
    author={Prabhath, Pushkar},
    year={2026},
    note={Under review, EMNLP 2026}
  }
- ```
  language:
+ - en
  license: apache-2.0
  library_name: peft
  tags:
+ - text-generation
+ - dialogue
+ - gricean-maxims
+ - cooperative-communication
+ - lora
+ - dpo
+ - direct-preference-optimization
+ - peft
+ - gpt2
+ - nlp
  datasets:
+ - topical_chat
  metrics:
+ - cooperative_rate
  pipeline_tag: text-generation
  base_model: openai-community/gpt2-medium
  model-index:
+ - name: GriceBench-DPO
+   results:
+   - task:
+       type: text-generation
+       name: Cooperative Dialogue Generation
+     dataset:
+       name: Topical-Chat (GriceBench test split)
+       type: topical_chat
+       split: test
+     metrics:
+     - type: cooperative_rate
+       value: 0.832
+       name: Standalone Cooperative Rate
+     - type: cooperative_rate
+       value: 0.950
+       name: Full Pipeline Cooperative Rate
+     - type: accuracy
+       value: 0.750
+       name: DPO Preference Accuracy
+
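+ # ⚡ GriceBench-DPO

+ **GPT-2-medium fine-tuned with Direct Preference Optimization to generate cooperative dialogue.**

+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

+ [![PEFT LoRA](https://img.shields.io/badge/%F0%9F%A4%97-PEFT%20LoRA-yellow)](https://huggingface.co/docs/peft)

+ [![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97-GriceBench-yellow)](https://huggingface.co/Pushkar27)

+ **Part of the GriceBench system** -

+ [GitHub](https://github.com/PushkarPrabhath27/Research-Model) |

+ [🔍 Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) |

+ [🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair)
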
+ ## What This Model Does
+ GriceBench-DPO is a LoRA-adapted GPT-2-medium model trained with Direct Preference Optimization (DPO) to generate dialogue responses that comply with Gricean conversational maxims. It is the generation stage of the GriceBench pipeline, producing responses that are more likely to be cooperative before any post-generation detection and repair is applied.

+ | Metric | Score | Context |
+ |--------|-------|---------|
+ | Standalone cooperative rate | 83.2% | Using this model alone |
+ | Full pipeline cooperative rate | 95.0% | DPO + Detector + Repair |
+ | DPO preference accuracy | 75.0% | Held-out preference pairs |
+ | DPO eval loss | 0.5595 | End of training |
+ **Important:** The 95.0% figure requires the full pipeline. This model alone achieves 83.2%, slightly below the un-tuned baseline (83.8%), while Relation violations drop sharply (~62% → ~10%).

+ ## Quick Start
+ ```python
  from peft import PeftModel, PeftConfig
  from transformers import AutoModelForCausalLM, AutoTokenizer
  import torch

  # Load LoRA adapter on GPT-2-medium base
  adapter_path = "Pushkar27/GriceBench-DPO"
  config = PeftConfig.from_pretrained(adapter_path)
+ print(f"Base model: {config.base_model_name_or_path}")
+ # → openai-community/gpt2-medium
+
  tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
+ base_model = AutoModelForCausalLM.from_pretrained(
+     config.base_model_name_or_path,
+     torch_dtype=torch.float32,
+ )
  model = PeftModel.from_pretrained(base_model, adapter_path)
  model.eval()

+ def generate_cooperative_response(context: str, max_new_tokens: int = 80) -> str:
      prompt = f"Context: {context}\nResponse:"
      inputs = tokenizer(prompt, return_tensors="pt")
+
      with torch.no_grad():
          output_ids = model.generate(
+             **inputs,
+             max_new_tokens=max_new_tokens,
+             do_sample=True,
+             temperature=0.85,
+             top_p=0.92,
+             repetition_penalty=1.3,
              pad_token_id=tokenizer.eos_token_id,
          )

+     # Decode only the newly generated tokens
+     new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
+     return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
+
+
+ # Example
+ context = "What do you think about the history of jazz music in New Orleans?"
+ print(generate_cooperative_response(context))
+ ```
+ ## Full Pipeline Usage (Recommended for Best Results)
+ ```python
+ # For the 95.0% cooperative rate, use all three GriceBench models together.
+ # NOTE: detect_violations, repair_violation, and evidence are not defined in
+ # this snippet; they come from the Detector and Repair model cards and your data.
+ # Step 1: Generate with this DPO model
+ response = generate_cooperative_response(context)
+
+ # Step 2: Detect any remaining violations
+ # (see GriceBench-Detector model card for detection code)
+ result = detect_violations(context, response, evidence)
+
+ # Step 3: Repair each flagged violation (Relation is skipped here)
+ for maxim, violated in result["violations"].items():
+     if violated and maxim != "relation":
+         response = repair_violation(context, response, maxim)
+
+ # On the test set, this pipeline reaches a 95.0% cooperative rate
+ print(response)
+ ```
+ Full pipeline implementation: [GitHub repository](https://github.com/PushkarPrabhath27/Research-Model)
+
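As a rough sketch of the control flow only, the three stages can be wired together with injected callables so the loop runs without any model weights. Everything here is hypothetical: `run_pipeline`, the `skip_maxims` parameter, the `toy_*` stubs, and the assumed shape of the detector output (a `"violations"` dict keyed by maxim name) are illustrations, not the GriceBench API. Skipping `"relation"` by default mirrors the usage snippet.

```python
from typing import Callable, Dict, Iterable


def run_pipeline(
    context: str,
    evidence: str,
    generate: Callable[[str], str],
    detect: Callable[[str, str, str], Dict[str, Dict[str, bool]]],
    repair: Callable[[str, str, str], str],
    skip_maxims: Iterable[str] = ("relation",),
) -> str:
    """Generate once, detect once, then repair each flagged maxim not skipped."""
    skipped = set(skip_maxims)
    response = generate(context)
    result = detect(context, response, evidence)
    for maxim, violated in result["violations"].items():
        if violated and maxim not in skipped:
            response = repair(context, response, maxim)
    return response


# Hypothetical stand-ins for the DPO generator, Detector, and Repair models.
def toy_generate(context: str) -> str:
    return "Jazz, um, started in New Orleans."


def toy_detect(context: str, response: str, evidence: str) -> Dict[str, Dict[str, bool]]:
    # Assumed output shape: one boolean flag per Gricean maxim.
    return {"violations": {"quantity": False, "quality": False,
                           "relation": True, "manner": True}}


def toy_repair(context: str, response: str, maxim: str) -> str:
    # Marks the response instead of rewriting it, so the flow is visible.
    return f"{response} [repaired:{maxim}]"


final = run_pipeline(
    context="What do you think about the history of jazz music in New Orleans?",
    evidence="Jazz originated in New Orleans.",
    generate=toy_generate,
    detect=toy_detect,
    repair=toy_repair,
)
print(final)
```

In real use, the three callables would wrap `generate_cooperative_response` and the detection and repair functions from the other two model cards. Note that, like the snippet it mirrors, this sketch runs detection once and does not re-check the response after repair.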
+ ## Ablation Results (Why You Need the Full Pipeline)
+ | Configuration | Cooperative Rate | Notes |
+ |---------------|------------------|-------|
+ | Baseline (GPT-2, no tuning) | 83.8% | Reference |
+ | This model (DPO only) | 83.2% | Relation violations -52pp; Manner unchanged |
+ | Detect + Repair (no DPO) | 93.0% | Repair handles Manner |
+ | Full System | 95.0% | DPO + Detect + Repair combined |
+ **Why DPO alone barely moves the overall number:** DPO sharply reduces Relation violations (62% → ~10%) but does not address Manner violations (still ~64%), which are the dominant failure mode. The repair model handles Manner; combining both stages yields 95.0%.
+
+ ## Training Details
+ ### Model Architecture
+ | Parameter | Value |
+ |-----------|-------|
+ | Base model | openai-community/gpt2-medium (355M) |
+ | Method | LoRA (Low-Rank Adaptation) |
+ | LoRA rank (r) | 128 |
+ | LoRA alpha (α) | 256 |
+ | Target modules | q, k, v, o attention projections |
+ | Adapter size | ~25 MB |
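To make the rank and alpha settings concrete: in the standard LoRA formulation, a frozen weight `W` is adapted as `W + (alpha / r) * B @ A`, so r = 128 and alpha = 256 give a scaling factor of 2.0. The sketch below illustrates that arithmetic on a toy 2x2 matrix in plain Python. The helper names (`lora_scaling`, `matmul`, `apply_lora`) and the toy values are made up for illustration; the real adapter is applied by PEFT, not by this code.

```python
from typing import List

Matrix = List[List[float]]


def lora_scaling(rank: int, alpha: float) -> float:
    """Standard LoRA scaling factor: alpha divided by rank."""
    return alpha / rank


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight: Matrix, lora_b: Matrix, lora_a: Matrix,
               rank: int, alpha: float) -> Matrix:
    """Return W + (alpha / rank) * (B @ A) without modifying W."""
    scale = lora_scaling(rank, alpha)
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Scaling factor implied by the table above (r = 128, alpha = 256).
print(lora_scaling(128, 256))  # 2.0

# Toy rank-1 update on a 2x2 identity weight (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]   # shape (2, r) with r = 1
A = [[0.5, 0.5]]     # shape (r, 2)
adapted = apply_lora(W, B, A, rank=1, alpha=2.0)
print(adapted)       # [[2.0, 1.0], [0.0, 1.0]]
```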
+ ### DPO Training
+ | Hyperparameter | Value |
+ |----------------|-------|
+ | Algorithm | Direct Preference Optimization (DPO) |
+ | DPO β | 0.1 |
+ | Learning rate | 5e-7 |
+ | Batch size | 16 (grad accum ×8) |
+ | Epochs | 3 |
+ | Training pairs | 1,970 filtered preference pairs |
+ | Hardware | Kaggle P100-16GB, ~24 minutes |
+ ### DPO Loss (Plain Text)
+ Minimizing the DPO loss widens the margin between the chosen (y_w) and rejected (y_l) responses, each measured as a log-probability ratio against a reference model:
+
+ L_DPO = -log sigmoid( beta * [ log(pi(y_w|x)/pi_ref(y_w|x)) - log(pi(y_l|x)/pi_ref(y_l|x)) ] )
+
+ where beta = 0.1 controls preference strength, y_w is the cooperative response, and y_l is the violating response.
+
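The formula above can be checked numerically for a single preference pair. The sketch below is a minimal plain-Python version that takes sequence log-probabilities as inputs; the function name `dpo_loss`, its argument names, and the example log-probabilities are hypothetical, and this is not the training code used for this model.

```python
import math


def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen ratio - rejected ratio))."""
    # log(pi(y|x) / pi_ref(y|x)) is a difference of log-probabilities.
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, so the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))

# Policy raises the chosen response and lowers the rejected one: loss drops.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

A policy that has not moved from the reference sits at log(2) ≈ 0.693 on every pair, which gives a baseline for reading the reported eval loss of 0.5595.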
+ ### Training Data
+ | Source | Pairs | Description |
+ |--------|-------|-------------|
+ | Human-labeled | 411 | Expert-verified cooperative/violating pairs |
+ | Repair-derived | ~1,200 | (original violation, T5-repaired output) pairs |
+ | Synthetic (LLM) | ~1,200 | Generated via Groq API (llama-3.3-70b) |
+ | Total (filtered) | 1,970 | After conflict-detection filtering |
+ ## Files
+ | File | Description |
+ |------|-------------|
+ | adapter_config.json | LoRA configuration (base model, rank, alpha) |
+ | adapter_model.safetensors | LoRA weights (~25 MB) |
+ | tokenizer.json | GPT-2 tokenizer |
+ | tokenizer_config.json | Tokenizer configuration |
+ | special_tokens_map.json | Special token mappings |
+ ## Limitations
+ - **Manner violations persist standalone:** DPO reduces Relation violations but not Manner. The full pipeline is required for the headline 95.0% result.
+ - **Single domain:** Trained and evaluated on Topical-Chat only.
+ - **English only:** No multilingual support.
+ - **Preference accuracy (75.0%) vs. Phase 5 training accuracy (98.7%):** The 75.0% figure comes from the held-out Phase 7 evaluation and is the canonical number. The 98.7% figure came from the in-distribution Phase 5 evaluation and is not representative.
+ ## Citation
+ ```bibtex
  @article{prabhath2026gricebench,
    title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
    author={Prabhath, Pushkar},
    year={2026},
    note={Under review, EMNLP 2026}
  }
+ ```
+ ## Related Models
+ | Model | Role | Link |
+ |-------|------|------|
+ | GriceBench-Detector | Detects violations | [🔍 Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) |
+ | GriceBench-Repair | Repairs violations | [🔧 Repair](https://huggingface.co/Pushkar27/GriceBench-Repair) |
+ | GriceBench-DPO | Generates cooperative responses (this model) | You are here |
+ GitHub: https://github.com/PushkarPrabhath27/Research-Model
+
+ ## Environmental Impact
+ | Aspect | Value |
+ |--------|-------|
+ | Hardware Used | NVIDIA Tesla P100 GPU |
+ | Training Time | ~24 minutes |
+ | Estimated Carbon Footprint | ~0.05 kg CO2eq |