---
language:
- en
license: apache-2.0
library_name: peft
tags:
- text-generation
- dialogue
- gricean-maxims
- cooperative-communication
- lora
- dpo
- direct-preference-optimization
- peft
- gpt2
- nlp
datasets:
- topical-chat
metrics:
- cooperative_rate
pipeline_tag: text-generation
base_model: openai-community/gpt2-medium
model-index:
- name: GriceBench-DPO
  results:
  - task:
      type: text-generation
      name: Cooperative Dialogue Generation
    dataset:
      name: Topical-Chat (GriceBench test split)
      type: topical-chat
      split: test
    metrics:
    - type: cooperative_rate
      value: 0.832
      name: Standalone Cooperative Rate
    - type: cooperative_rate
      value: 0.950
      name: Full Pipeline Cooperative Rate
    - type: accuracy
      value: 0.750
      name: DPO Preference Accuracy
---

<div align="center">

# ⚡ GriceBench-DPO

**GPT-2-medium fine-tuned with Direct Preference Optimization to generate cooperative dialogue.**

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![PEFT LoRA](https://img.shields.io/badge/🤗-PEFT%20LoRA-yellow)](https://huggingface.co/docs/peft)
[![HuggingFace](https://img.shields.io/badge/🤗-GriceBench-yellow)](https://huggingface.co/Pushkar27)

**Part of the GriceBench system:**
[GitHub](https://github.com/PushkarPrabhath27/Research-Model) |
[🔍 Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) |
[🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair)

</div>

---

## What This Model Does

GriceBench-DPO is a LoRA-adapted GPT-2-medium model trained with Direct Preference Optimization (DPO) to generate dialogue responses that comply with Gricean conversational maxims. It is the **generation stage** of the GriceBench pipeline, producing responses that are more likely to be cooperative *before* any post-generation detection and repair is applied.

| Metric | Score | Context |
|--------|-------|---------|
| Standalone cooperative rate | 83.2% | Using this model alone |
| Full pipeline cooperative rate | **95.0%** | DPO + Detector + Repair |
| DPO preference accuracy | 75.0% | Held-out preference pairs |
| DPO eval loss | 0.5595 | End of training |

> **Important:** The 95.0% figure requires the full pipeline. This model alone achieves 83.2%, slightly below the un-tuned baseline (83.8%), but with Relation violations sharply reduced (~62% → ~10%).

---

## Quick Start

```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load LoRA adapter on GPT-2-medium base
adapter_path = "Pushkar27/GriceBench-DPO"
config = PeftConfig.from_pretrained(adapter_path)
print(f"Base model: {config.base_model_name_or_path}")
# → openai-community/gpt2-medium

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float32,
)
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

def generate_cooperative_response(context: str, max_new_tokens: int = 80) -> str:
    prompt = f"Context: {context}\nResponse:"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.85,
            top_p=0.92,
            repetition_penalty=1.3,
            pad_token_id=tokenizer.eos_token_id,
        )

    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


context = "What do you think about the history of jazz music in New Orleans?"
print(generate_cooperative_response(context))
```

---

## Full Pipeline Usage (Recommended for Best Results)

```python
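# For the 95.0% cooperative rate, use all three GriceBench models together.
# `detect_violations` and `repair_violation` are not defined in this card;
# see the full pipeline implementation in the GitHub repository.
# `evidence` must be supplied by the caller.

# Step 1: Generate with this DPO model
response = generate_cooperative_response(context)

# Step 2: Detect any remaining violations
result = detect_violations(context, response, evidence)

# Step 3: Repair each flagged violation (Relation is skipped in this loop)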
for maxim, violated in result["violations"].items():
    if violated and maxim != "relation":
        response = repair_violation(context, response, maxim)

print(response)
```

Full pipeline implementation: [GitHub repository](https://github.com/PushkarPrabhath27/Research-Model)
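
The detect-then-repair control flow shown above can be exercised without the real models. The sketch below is illustrative only: `detect_violations_stub` and `repair_violation_stub` are hypothetical stand-ins, not the GriceBench Detector or Repair models. The lowercase maxim keys and the `{"violations": {...}}` return shape are assumptions inferred from the snippet above.

```python
from typing import Dict

# The four Gricean maxims; the lowercase key names are an assumption.
MAXIMS = ("quantity", "quality", "relation", "manner")


def detect_violations_stub(context: str, response: str, evidence: str) -> Dict[str, Dict[str, bool]]:
    """Hypothetical detector: flags Manner when the response is all upper-case."""
    flags = {maxim: False for maxim in MAXIMS}
    flags["manner"] = response.isupper()
    return {"violations": flags}


def repair_violation_stub(context: str, response: str, maxim: str) -> str:
    """Hypothetical repair: normalises casing for Manner, otherwise a no-op."""
    return response.capitalize() if maxim == "manner" else response


def run_pipeline(context: str, response: str, evidence: str = "") -> str:
    result = detect_violations_stub(context, response, evidence)
    for maxim, violated in result["violations"].items():
        # Mirrors the snippet above: Relation violations are not sent to repair.
        if violated and maxim != "relation":
            response = repair_violation_stub(context, response, maxim)
    return response


repaired = run_pipeline("Tell me about jazz.", "JAZZ BEGAN IN NEW ORLEANS.")
print(repaired)
```

Swapping the two stubs for the real detector and repair calls should leave `run_pipeline` unchanged, provided they keep the same argument order and return shape.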

---

## Ablation Results (Why You Need the Full Pipeline)

| Configuration | Cooperative Rate | Notes |
|---------------|-----------------|-------|
| Baseline (GPT-2, no tuning) | 83.8% | Reference |
| **This model (DPO only)** | **83.2%** | Relation violations -52pp; Manner unchanged |
| Detect + Repair (no DPO) | 93.0% | Repair handles Manner |
| **Full System** | **95.0%** | DPO + Detect + Repair combined |

**Why DPO alone barely moves the overall number:** DPO sharply reduces Relation violations (~62% → ~10%) but does not address Manner violations (still ~64%), which are the dominant failure mode. The repair model handles Manner, and the combined system reaches 95.0%.
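
To make the comparison concrete, the percentage-point differences can be computed directly from the ablation table. This is plain arithmetic on the reported numbers; the dictionary keys are shorthand labels chosen for this example.

```python
# Cooperative rates (%) copied from the ablation table above.
rates = {
    "Baseline": 83.8,
    "DPO only": 83.2,
    "Detect + Repair": 93.0,
    "Full System": 95.0,
}

baseline = rates["Baseline"]

# Change relative to the un-tuned baseline, in percentage points.
deltas = {name: round(rate - baseline, 1) for name, rate in rates.items()}

# What DPO adds once detection and repair are already in place.
dpo_gain_in_pipeline = round(rates["Full System"] - rates["Detect + Repair"], 1)

for name, delta in deltas.items():
    print(f"{name:<16} {delta:+.1f} pp vs. baseline")
print(f"DPO on top of Detect + Repair: {dpo_gain_in_pipeline:+.1f} pp")
```

On these figures, DPO alone is 0.6 pp below the baseline but adds 2.0 pp once detection and repair are in place.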

---

## Training Details

### Model Architecture

| Parameter | Value |
|-----------|-------|
| Base model | `openai-community/gpt2-medium` (355M) |
| Method | LoRA (Low-Rank Adaptation) |
| LoRA rank (r) | 128 |
| LoRA alpha (α) | 256 |
| Target modules | q, k, v, o attention projections |
| Adapter size | ~25 MB |
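
As a rough guide to what these settings imply, the sketch below computes the standard LoRA scaling factor `alpha / r` and the number of trainable parameters one adapter pair adds to a single weight matrix. It assumes standard (not rank-stabilised) LoRA scaling and uses a hypothetical square 1024×1024 projection, 1024 being the GPT-2-medium hidden size. The modules actually adapted and their shapes are defined in `adapter_config.json`, so treat the parameter count as illustrative.

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Multiplier applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one adapted matrix: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


R, ALPHA = 128, 256   # values from the table above
HIDDEN = 1024         # GPT-2-medium hidden size

scaling = lora_scaling(ALPHA, R)
params = lora_param_count(HIDDEN, HIDDEN, R)  # hypothetical square projection

print(f"scaling = {scaling}, params per adapted matrix = {params:,}")
```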

### DPO Training

| Hyperparameter | Value |
|----------------|-------|
| Algorithm | Direct Preference Optimization (DPO) |
| DPO β | 0.1 |
| Learning rate | 5e-7 |
| Batch size | 16 (grad accum ×8) |
| Epochs | 3 |
| Training pairs | 1,970 filtered preference pairs |
| Hardware | Kaggle P100-16GB, ~24 minutes |

### DPO Loss (Plain Text)

Minimizing the DPO loss increases the margin between the chosen (y_w) and rejected (y_l) responses, measured relative to a reference model:

  L_DPO = -log sigmoid( beta * [ log(pi(y_w|x)/pi_ref(y_w|x))
                                  - log(pi(y_l|x)/pi_ref(y_l|x)) ] )

where pi is the policy being trained, pi_ref is the reference model, beta = 0.1 controls preference strength, y_w is the cooperative response, and y_l is the violating response.
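
For a single preference pair, the formula above can be evaluated from four sequence log-probabilities. The sketch below is a minimal pure-Python illustration, not the training code used for this model. The function name, argument names, and the example log-probabilities are hypothetical.

```python
import math


def dpo_loss(pi_chosen_logp: float, pi_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one pair, given sequence log-probs under policy and reference."""
    # log(pi(y_w|x)/pi_ref(y_w|x)) - log(pi(y_l|x)/pi_ref(y_l|x))
    margin = (pi_chosen_logp - ref_chosen_logp) - (pi_rejected_logp - ref_rejected_logp)
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: zero margin, so the loss is log 2.
loss_at_reference = dpo_loss(-12.0, -13.0, -12.0, -13.0)

# Hypothetical pair where the policy has moved toward the chosen response.
loss_after_shift = dpo_loss(-10.0, -15.0, -12.0, -13.0)

print(f"{loss_at_reference:.4f} {loss_after_shift:.4f}")
```

A policy identical to the reference scores log 2 ≈ 0.693 on every pair, so the reported eval loss of 0.5595 sits below that no-preference value.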

### Training Data

| Source | Pairs | Description |
|--------|-------|-------------|
| Human-labeled | 411 | Expert-verified cooperative/violating pairs |
| Repair-derived | ~1,200 | (original violation, T5-repaired output) |
| Synthetic (LLM) | ~1,200 | Generated via Groq API (llama-3.3-70b) |
| **Total (filtered)** | **1,970** | After conflict-detection filtering |

---

## Files

| File | Description |
|------|-------------|
| `adapter_config.json` | LoRA configuration (base model, rank, alpha) |
| `adapter_model.safetensors` | LoRA weights (~25 MB) |
| `tokenizer.json` | GPT-2 tokenizer |
| `tokenizer_config.json` | Tokenizer configuration |
| `special_tokens_map.json` | Special token mappings |

---

## Limitations

- **Manner violations persist standalone:** DPO reduces Relation violations but not Manner. The full pipeline is required for the headline 95.0% result.
- **Single domain:** Trained and evaluated on Topical-Chat only.
- **English only:** No multilingual support.
- **Preference accuracy (75.0%) vs. Phase 5 accuracy (98.7%):** The canonical 75.0% figure comes from the held-out Phase 7 evaluation. The 98.7% figure came from in-distribution Phase 5 evaluation and is not representative.

---

## Citation

```bibtex
@article{prabhath2026gricebench,
  title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
  author={Prabhath, Pushkar},
  year={2026},
  note={Under review, EMNLP 2026}
}
```

---

## Related Models

| Model | Role | Link |
|-------|------|------|
| GriceBench-Detector | Detects violations | [🔍 Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) |
| GriceBench-Repair | Repairs violations | [🔧 Repair](https://huggingface.co/Pushkar27/GriceBench-Repair) |
| GriceBench-DPO | Generates cooperative responses (this model) | You are here |

**GitHub:** https://github.com/PushkarPrabhath27/Research-Model

---

## Environmental Impact

| Aspect | Value |
|--------|-------|
| Hardware Used | NVIDIA Tesla P100 GPU |
| Training Time | ~24 minutes |
| Estimated Carbon Footprint | ~0.05 kg CO2eq |