---
language:
- en
license: apache-2.0
library_name: peft
tags:
- text-generation
- dialogue
- gricean-maxims
- cooperative-communication
- lora
- dpo
- direct-preference-optimization
- peft
- gpt2
- nlp
datasets:
- topical-chat
metrics:
- cooperative_rate
pipeline_tag: text-generation
base_model: openai-community/gpt2-medium
model-index:
- name: GriceBench-DPO
results:
- task:
type: text-generation
name: Cooperative Dialogue Generation
dataset:
name: Topical-Chat (GriceBench test split)
type: topical-chat
split: test
metrics:
- type: cooperative_rate
value: 0.832
name: Standalone Cooperative Rate
- type: cooperative_rate
value: 0.950
name: Full Pipeline Cooperative Rate
- type: accuracy
value: 0.750
name: DPO Preference Accuracy
---
# ⚡ GriceBench-DPO
**GPT-2-medium fine-tuned with Direct Preference Optimization to generate cooperative dialogue.**
[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) ·
[Built with PEFT](https://huggingface.co/docs/peft) ·
[Author: Pushkar27](https://huggingface.co/Pushkar27)
**Part of the GriceBench system** —
[GitHub](https://github.com/PushkarPrabhath27/Research-Model) |
[🔍 Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) |
[🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair)
---
## What This Model Does
GriceBench-DPO is a LoRA-adapted GPT-2-medium model trained with Direct Preference Optimization (DPO) to generate dialogue responses that comply with Gricean conversational maxims. It is the **generation stage** of the GriceBench pipeline, producing responses that are more likely to be cooperative *before* any post-generation detection and repair is applied.
| Metric | Score | Context |
|--------|-------|---------|
| Standalone cooperative rate | 83.2% | Using this model alone |
| Full pipeline cooperative rate | **95.0%** | DPO + Detector + Repair |
| DPO preference accuracy | 75.0% | Held-out preference pairs |
| DPO eval loss | 0.5595 | End of training |
> **Important:** The 95.0% figure requires the full pipeline. On its own, this model reaches 83.2%, roughly level with the un-tuned baseline (83.8%) overall, but it cuts Relation violations dramatically (~62% → ~10%); see the ablation results below for why the overall number barely moves.
---
## Quick Start
```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load LoRA adapter on GPT-2-medium base
adapter_path = "Pushkar27/GriceBench-DPO"
config = PeftConfig.from_pretrained(adapter_path)
print(f"Base model: {config.base_model_name_or_path}")
# → openai-community/gpt2-medium
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
base_model = AutoModelForCausalLM.from_pretrained(
config.base_model_name_or_path,
torch_dtype=torch.float32,
)
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()
def generate_cooperative_response(context: str, max_new_tokens: int = 80) -> str:
prompt = f"Context: {context}\nResponse:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
output_ids = model.generate(
**inputs,
max_new_tokens=max_new_tokens,
do_sample=True,
temperature=0.85,
top_p=0.92,
repetition_penalty=1.3,
pad_token_id=tokenizer.eos_token_id,
)
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
context = "What do you think about the history of jazz music in New Orleans?"
print(generate_cooperative_response(context))
```
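Optionally, the LoRA adapter can be merged into the base weights for adapter-free inference (the output directory name below is just an example):

```python
# Merge the LoRA weights into GPT-2-medium and save a standalone checkpoint
merged_model = model.merge_and_unload()
merged_model.save_pretrained("gricebench-dpo-merged")
tokenizer.save_pretrained("gricebench-dpo-merged")
```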
---
## Full Pipeline Usage (Recommended for Best Results)
```python
# For the 95.0% cooperative rate, use all three GriceBench models together.
# `detect_violations` and `repair_violation` are wrappers around the
# GriceBench-Detector and GriceBench-Repair models (see the repository link
# below); `evidence` is the grounding passage for the dialogue turn.

# Step 1: Generate with this DPO model
response = generate_cooperative_response(context)

# Step 2: Detect any remaining violations
result = detect_violations(context, response, evidence)

# Step 3: Repair each flagged violation (Relation is excluded: it is addressed
# at generation time by DPO, not by post-hoc text repair; see the ablations below)
for maxim, violated in result["violations"].items():
    if violated and maxim != "relation":
        response = repair_violation(context, response, maxim)

print(response)
```
Full pipeline implementation: [GitHub repository](https://github.com/PushkarPrabhath27/Research-Model)
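If you want to prototype the pipeline without cloning the repository, the sketch below shows one plausible shape for the two helpers. It assumes the Detector is a multi-label sequence classifier whose labels are the maxims and that the Repair model is a T5-style seq2seq checkpoint; the released checkpoints and their exact input formats are documented in their own model cards and the GitHub repo, so treat this as illustrative only.

```python
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    AutoModelForSeq2SeqLM,
)

# Illustrative wrappers only; architectures and I/O formats are assumptions,
# not the released implementations.
det_tok = AutoTokenizer.from_pretrained("Pushkar27/GriceBench-Detector")
det_model = AutoModelForSequenceClassification.from_pretrained("Pushkar27/GriceBench-Detector")
rep_tok = AutoTokenizer.from_pretrained("Pushkar27/GriceBench-Repair")
rep_model = AutoModelForSeq2SeqLM.from_pretrained("Pushkar27/GriceBench-Repair")

def detect_violations(context: str, response: str, evidence: str = "") -> dict:
    # Assumed: multi-label classification over maxim labels (sigmoid per label)
    text = f"Context: {context}\nEvidence: {evidence}\nResponse: {response}"
    inputs = det_tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.sigmoid(det_model(**inputs).logits)[0]
    labels = det_model.config.id2label
    return {"violations": {labels[i].lower(): bool(p > 0.5) for i, p in enumerate(probs)}}

def repair_violation(context: str, response: str, maxim: str) -> str:
    # Assumed: T5-style "repair <maxim>" prompt; the real format is in the repo
    prompt = f"repair {maxim}: context: {context} response: {response}"
    inputs = rep_tok(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = rep_model.generate(**inputs, max_new_tokens=80)
    return rep_tok.decode(out[0], skip_special_tokens=True).strip()
```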
---
## Ablation Results (Why You Need the Full Pipeline)
| Configuration | Cooperative Rate | Notes |
|---------------|-----------------|-------|
| Baseline (GPT-2, no tuning) | 83.8% | Reference |
| **This model (DPO only)** | **83.2%** | Relation violations -52pp; Manner unchanged |
| Detect + Repair (no DPO) | 93.0% | Repair handles Manner |
| **Full System** | **95.0%** | DPO + Detect + Repair combined |
**Why DPO alone barely moves the overall number:** DPO dramatically reduces Relation violations (62% → ~10%) but cannot address Manner violations (still ~64%), which are the dominant failure mode. The repair model handles Manner. Together: 95.0%.
---
## Training Details
### Model Architecture
| Parameter | Value |
|-----------|-------|
| Base model | `openai-community/gpt2-medium` (355M) |
| Method | LoRA (Low-Rank Adaptation) |
| LoRA rank (r) | 128 |
| LoRA alpha (α) | 256 |
| Target modules | q, k, v, o attention projections |
| Adapter size | ~25 MB |
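
In `peft` terms, the configuration above corresponds roughly to the sketch below. GPT-2 implements attention with a fused `c_attn` projection (q/k/v) plus `c_proj` (output), so the authoritative `target_modules` list is the one in `adapter_config.json`; the dropout value here is an assumption, not a reported setting.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=128,                                 # LoRA rank
    lora_alpha=256,                        # scaling (alpha / r = 2.0)
    target_modules=["c_attn", "c_proj"],   # GPT-2 attention projections (assumed naming)
    lora_dropout=0.05,                     # assumption; not reported in this card
    bias="none",
    task_type="CAUSAL_LM",
)
```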
### DPO Training
| Hyperparameter | Value |
|----------------|-------|
| Algorithm | Direct Preference Optimization (DPO) |
| DPO β | 0.1 |
| Learning rate | 5e-7 |
| Batch size | 16 (grad accum ×8) |
| Epochs | 3 |
| Training pairs | 1,970 filtered preference pairs |
| Hardware | Kaggle P100-16GB, ~24 minutes |
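
A sketch of the corresponding `trl` training call, assuming a recent `trl` release (argument names have moved between versions), the `model` and `tokenizer` from the Quick Start, a preference dataset with `prompt`/`chosen`/`rejected` columns, and that the batch size of 16 is the effective size (per-device 2 × gradient accumulation 8):

```python
from trl import DPOConfig, DPOTrainer

args = DPOConfig(
    output_dir="gricebench-dpo",        # example path
    beta=0.1,
    learning_rate=5e-7,
    per_device_train_batch_size=2,      # assumed split; only the effective size (16) is reported
    gradient_accumulation_steps=8,
    num_train_epochs=3,
)

trainer = DPOTrainer(
    model=model,                  # LoRA-wrapped GPT-2-medium policy
    ref_model=None,               # with a PEFT adapter, trl uses the frozen base as the reference
    args=args,
    train_dataset=preference_dataset,
    processing_class=tokenizer,
)
trainer.train()
```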
### DPO Loss (Plain Text)
The DPO loss maximizes the margin between chosen (y_w) and rejected (y_l) responses relative to a reference model:
```text
L_DPO = -log sigmoid( beta * [ log( pi(y_w|x) / pi_ref(y_w|x) )
                             - log( pi(y_l|x) / pi_ref(y_l|x) ) ] )
```
where beta = 0.1 controls the preference strength, y_w is the chosen (cooperative) response, and y_l is the rejected (violating) response.
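The same objective written as PyTorch, a minimal sketch operating on pre-computed per-sequence log-probabilities (not the `trl` implementation):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over summed per-sequence log-probabilities."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps        # log pi(y_w|x)/pi_ref(y_w|x)
    rejected_logratio = policy_rejected_logps - ref_rejected_logps  # log pi(y_l|x)/pi_ref(y_l|x)
    # -log sigmoid(beta * margin), averaged over the batch
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```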
### Training Data
| Source | Pairs | Description |
|--------|-------|-------------|
| Human-labeled | 411 | Expert-verified cooperative/violating pairs |
| Repair-derived | ~1,200 | (original violation, T5-repaired output) |
| Synthetic (LLM) | ~1,200 | Generated via Groq API (llama-3.3-70b) |
| **Total (filtered)** | **1,970** | After conflict-detection filtering |
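
Each pair follows the standard prompt/chosen/rejected layout that DPO training expects; a hypothetical record (field names per the `trl` convention, content purely illustrative):

```python
example_pair = {
    "prompt": "Context: What do you think about the history of jazz music in New Orleans?\nResponse:",
    "chosen": " Jazz in New Orleans grew out of brass-band and blues traditions around the early 1900s.",
    "rejected": " I had pizza for lunch today.",   # Relation violation: ignores the topic
}
```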
---
## Files
| File | Description |
|------|-------------|
| `adapter_config.json` | LoRA configuration (base model, rank, alpha) |
| `adapter_model.safetensors` | LoRA weights (~25 MB) |
| `tokenizer.json` | GPT-2 tokenizer |
| `tokenizer_config.json` | Tokenizer configuration |
| `special_tokens_map.json` | Special token mappings |
---
## Limitations
- **Manner violations persist standalone:** DPO reduces Relation violations but not Manner. The full pipeline is required for the headline 95.0% result.
- **Single domain:** Trained and evaluated on Topical-Chat only.
- **English only:** No multilingual support.
- **Preference accuracy (75.0%) vs. Phase 5 training accuracy (98.7%):** The 75.0% figure comes from held-out Phase 7 evaluation and is the canonical number; the 98.7% figure came from in-distribution Phase 5 evaluation and should not be quoted as this model's preference accuracy.
---
## Citation
```bibtex
@article{prabhath2026gricebench,
title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
author={Prabhath, Pushkar},
year={2026},
note={Under review, EMNLP 2026}
}
```
---
## Related Models
| Model | Role | Link |
|-------|------|------|
| GriceBench-Detector | Detects violations | [🔍 Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) |
| GriceBench-Repair | Repairs violations | [🔧 Repair](https://huggingface.co/Pushkar27/GriceBench-Repair) |
| GriceBench-DPO | Generates cooperative responses (this model) | You are here |
**GitHub:** https://github.com/PushkarPrabhath27/Research-Model
---
## Environmental Impact
| Aspect | Value |
|--------|-------|
| Hardware Used | NVIDIA Tesla P100 GPU |
| Training Time | ~24 minutes |
| Estimated Carbon Footprint | ~0.05 kg CO2eq |