---
language:
- en
license: apache-2.0
library_name: peft
tags:
- text-generation
- dialogue
- gricean-maxims
- cooperative-communication
- lora
- dpo
- direct-preference-optimization
- peft
- gpt2
- nlp
datasets:
- topical-chat
metrics:
- cooperative_rate
pipeline_tag: text-generation
base_model: openai-community/gpt2-medium
model-index:
- name: GriceBench-DPO
  results:
  - task:
      type: text-generation
      name: Cooperative Dialogue Generation
    dataset:
      name: Topical-Chat (GriceBench test split)
      type: topical-chat
      split: test
    metrics:
    - type: cooperative_rate
      value: 0.832
      name: Standalone Cooperative Rate
    - type: cooperative_rate
      value: 0.950
      name: Full Pipeline Cooperative Rate
    - type: accuracy
      value: 0.750
      name: DPO Preference Accuracy
---
<div align="center">
# ⚡ GriceBench-DPO
**GPT-2-medium fine-tuned with Direct Preference Optimization to generate cooperative dialogue.**
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[PEFT documentation](https://huggingface.co/docs/peft)
[Pushkar27 on Hugging Face](https://huggingface.co/Pushkar27)
**Part of the GriceBench system:**
[GitHub](https://github.com/PushkarPrabhath27/Research-Model) |
[Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) |
[Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair)
</div>
---
## What This Model Does
GriceBench-DPO is a LoRA-adapted GPT-2-medium model trained with Direct Preference Optimization (DPO) to generate dialogue responses that comply with Gricean conversational maxims. It is the **generation stage** of the GriceBench pipeline, producing responses that are more likely to be cooperative *before* any post-generation detection and repair is applied.
| Metric | Score | Context |
|--------|-------|---------|
| Standalone cooperative rate | 83.2% | Using this model alone |
| Full pipeline cooperative rate | **95.0%** | DPO + Detector + Repair |
| DPO preference accuracy | 75.0% | Held-out preference pairs |
| DPO eval loss | 0.5595 | End of training |
> **Important:** The 95.0% figure requires the full pipeline. On its own, this model achieves 83.2%, roughly on par with the un-tuned baseline (83.8%), while sharply reducing Relation violations (~62% → ~10%).
---
## Quick Start
```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load LoRA adapter on GPT-2-medium base
adapter_path = "Pushkar27/GriceBench-DPO"
config = PeftConfig.from_pretrained(adapter_path)
print(f"Base model: {config.base_model_name_or_path}")
# → openai-community/gpt2-medium

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float32,
)
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()


def generate_cooperative_response(context: str, max_new_tokens: int = 80) -> str:
    prompt = f"Context: {context}\nResponse:"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.85,
            top_p=0.92,
            repetition_penalty=1.3,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Keep only the newly generated tokens (drop the prompt)
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


context = "What do you think about the history of jazz music in New Orleans?"
print(generate_cooperative_response(context))
```
---
## Full Pipeline Usage (Recommended for Best Results)
```python
# For the 95.0% cooperative rate, use all three GriceBench models together.
# detect_violations, repair_violation, and evidence are not defined in this card;
# they stand for the Detector and Repair stages (see the GitHub repository).

# Step 1: Generate with this DPO model
response = generate_cooperative_response(context)

# Step 2: Detect any remaining violations
result = detect_violations(context, response, evidence)

# Step 3: Repair each flagged violation (Relation is skipped)
for maxim, violated in result["violations"].items():
    if violated and maxim != "relation":
        response = repair_violation(context, response, maxim)

print(response)
```
Full pipeline implementation: [GitHub repository](https://github.com/PushkarPrabhath27/Research-Model)
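The control flow above can be sketched end to end with the three stages passed in as plain callables. The names `run_pipeline`, `generate`, `detect`, and `repair`, the lower-case maxim keys, and the stub stages in the usage lines are hypothetical stand-ins for illustration. They are not the repository's API; the real stages would wrap the DPO, Detector, and Repair models.

```python
from typing import Callable, Dict

# Hypothetical stage signatures (assumptions, not the repository's API).
Generate = Callable[[str], str]
Detect = Callable[[str, str, str], Dict[str, Dict[str, bool]]]
Repair = Callable[[str, str, str], str]


def run_pipeline(context: str, evidence: str,
                 generate: Generate, detect: Detect, repair: Repair) -> str:
    """Generate a response, detect violations once, then repair each flagged maxim."""
    response = generate(context)
    result = detect(context, response, evidence)
    for maxim, violated in result["violations"].items():
        # Mirrors the snippet above: Relation violations are not sent to repair.
        if violated and maxim != "relation":
            response = repair(context, response, maxim)
    return response


# Stub stages so the sketch runs without any model weights.
def stub_generate(context: str) -> str:
    return "Jazz, well, you know, it sort of started in New Orleans."


def stub_detect(context: str, response: str, evidence: str) -> Dict[str, Dict[str, bool]]:
    return {"violations": {"quantity": False, "quality": False,
                           "relation": True, "manner": True}}


def stub_repair(context: str, response: str, maxim: str) -> str:
    return f"[{maxim} repaired] {response}"


final = run_pipeline("Tell me about jazz.", "", stub_generate, stub_detect, stub_repair)
print(final)
```

Passing the stages as arguments keeps the loop testable without loading weights. Swapping a stub for a model-backed function does not change the control flow.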
---
## Ablation Results (Why You Need the Full Pipeline)
| Configuration | Cooperative Rate | Notes |
|---------------|-----------------|-------|
| Baseline (GPT-2, no tuning) | 83.8% | Reference |
| **This model (DPO only)** | **83.2%** | Relation violations -52pp; Manner unchanged |
| Detect + Repair (no DPO) | 93.0% | Repair handles Manner |
| **Full System** | **95.0%** | DPO + Detect + Repair combined |
**Why DPO alone barely moves the overall number:** DPO dramatically reduces Relation violations (62% → ~10%) but cannot address Manner violations (still ~64%), which are the dominant failure mode. The repair model handles Manner. Together: 95.0%.
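The arithmetic behind these figures is plain percentage-point subtraction, sketched below. The dictionary layout and the helper name `pp_delta` are illustrative choices. The numbers are the ones quoted in the table and paragraph above, with approximate values treated as exact.

```python
# Cooperative rates (%) from the ablation table above.
cooperative_rate = {
    "baseline": 83.8,
    "dpo_only": 83.2,
    "detect_repair": 93.0,
    "full_system": 95.0,
}


def pp_delta(after: float, before: float) -> float:
    """Change in percentage points, rounded to one decimal place."""
    return round(after - before, 1)


baseline = cooperative_rate["baseline"]
for name, rate in cooperative_rate.items():
    print(f"{name:>14}: {rate:.1f}%  ({pp_delta(rate, baseline):+.1f} pp vs. baseline)")

# Relation violation rate before and after DPO (approximate values from the text).
relation_delta = pp_delta(10.0, 62.0)
print(f"Relation violations: {relation_delta:+.1f} pp")
```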
---
## Training Details
### Model Architecture
| Parameter | Value |
|-----------|-------|
| Base model | `openai-community/gpt2-medium` (355M) |
| Method | LoRA (Low-Rank Adaptation) |
| LoRA rank (r) | 128 |
| LoRA alpha (α) | 256 |
| Target modules | q, k, v, o attention projections |
| Adapter size | ~25 MB |
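For a sense of what the rank and alpha settings imply, the sketch below computes the standard LoRA scaling factor (`alpha / r`) and the number of parameters LoRA adds to a single weight matrix. The hidden size of 1024 is GPT-2-medium's published dimension. Treating a target as one square 1024×1024 projection is an assumption made for illustration, and the helper names are hypothetical. The sketch does not try to reproduce the adapter's total size.

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one d_in x d_out weight: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


R, ALPHA = 128, 256   # from the table above
HIDDEN = 1024         # GPT-2-medium hidden size

scale = lora_scaling(ALPHA, R)
added = lora_params(HIDDEN, HIDDEN, R)  # assumption: one square hidden-to-hidden projection
full = HIDDEN * HIDDEN

print(f"scaling factor alpha/r = {scale}")
print(f"LoRA params per {HIDDEN}x{HIDDEN} projection: {added:,} "
      f"({added / full:.0%} of the full matrix)")
```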
### DPO Training
| Hyperparameter | Value |
|----------------|-------|
| Algorithm | Direct Preference Optimization (DPO) |
| DPO β | 0.1 |
| Learning rate | 5e-7 |
| Batch size | 16 (grad accum ×8) |
| Epochs | 3 |
| Training pairs | 1,970 filtered preference pairs |
| Hardware | Kaggle P100-16GB, ~24 minutes |
### DPO Loss (Plain Text)
The DPO loss maximizes the margin between chosen (y_w) and rejected (y_l) responses relative to a reference model:
L_DPO = -log sigmoid( beta * [ log(pi(y_w|x)/pi_ref(y_w|x))
- log(pi(y_l|x)/pi_ref(y_l|x)) ] )
where beta = 0.1 controls preference strength, y_w = cooperative response, y_l = violating response.
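As a minimal sketch, the formula can be evaluated per preference pair from sequence log-probabilities. The function names and the example log-probabilities below are made up for illustration and are not taken from the training code. Counting a pair as correct when its margin is positive is one common way to define preference accuracy, assumed here rather than confirmed by the source.

```python
import math


def dpo_margin(logp_w: float, logp_l: float,
               ref_logp_w: float, ref_logp_l: float, beta: float = 0.1) -> float:
    """beta * [log(pi(y_w|x)/pi_ref(y_w|x)) - log(pi(y_l|x)/pi_ref(y_l|x))]."""
    return beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float, beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(margin)), in a numerically stable form."""
    m = dpo_margin(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))


def prefers_chosen(logp_w: float, logp_l: float,
                   ref_logp_w: float, ref_logp_l: float) -> bool:
    """Assumed accuracy criterion: the policy favors y_w more than the reference does."""
    return dpo_margin(logp_w, logp_l, ref_logp_w, ref_logp_l) > 0


# Policy identical to the reference: zero margin, so the loss equals ln 2.
no_preference = dpo_loss(-40.0, -45.0, -40.0, -45.0)
print(f"{no_preference:.4f}")  # 0.6931

# Illustrative (made-up) log-probs where the policy has shifted toward y_w.
shifted = dpo_loss(-40.0, -45.0, -42.0, -41.0)
print(shifted < no_preference, prefers_chosen(-40.0, -45.0, -42.0, -41.0))
```

A loss of ln 2 ≈ 0.693 corresponds to a policy that expresses no preference beyond the reference model. The reported eval loss of 0.5595 sits below that value.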
### Training Data
| Source | Pairs | Description |
|--------|-------|-------------|
| Human-labeled | 411 | Expert-verified cooperative/violating pairs |
| Repair-derived | ~1,200 | (original violation, T5-repaired output) |
| Synthetic (LLM) | ~1,200 | Generated via Groq API (llama-3.3-70b) |
| **Total (filtered)** | **1,970** | After conflict-detection filtering |
---
## Files
| File | Description |
|------|-------------|
| `adapter_config.json` | LoRA configuration (base model, rank, alpha) |
| `adapter_model.safetensors` | LoRA weights (~25 MB) |
| `tokenizer.json` | GPT-2 tokenizer |
| `tokenizer_config.json` | Tokenizer configuration |
| `special_tokens_map.json` | Special token mappings |
---
## Limitations
- **Manner violations persist standalone:** DPO reduces Relation violations but not Manner. The full pipeline is required for the headline 95.0% result.
- **Single domain:** Trained and evaluated on Topical-Chat only.
- **English only:** No multilingual support.
- **Preference accuracy (75.0%) vs. Phase 5 training accuracy (98.7%):** The 75.0% figure comes from the held-out Phase 7 evaluation and is the canonical number. The 98.7% figure came from in-distribution Phase 5 evaluation and is not representative.
---
## Citation
```bibtex
@article{prabhath2026gricebench,
title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
author={Prabhath, Pushkar},
year={2026},
note={Under review, EMNLP 2026}
}
```
---
## Related Models
| Model | Role | Link |
|-------|------|------|
| GriceBench-Detector | Detects violations | [Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) |
| GriceBench-Repair | Repairs violations | [Repair](https://huggingface.co/Pushkar27/GriceBench-Repair) |
| GriceBench-DPO | Generates cooperative responses (this model) | You are here |
**GitHub:** https://github.com/PushkarPrabhath27/Research-Model
---
## Environmental Impact
| Aspect | Value |
|--------|-------|
| Hardware Used | NVIDIA Tesla P100 GPU |
| Training Time | ~24 minutes |
| Estimated Carbon Footprint | ~0.05 kg CO2eq |