Instructions to use Pushkar27/GriceBench-DPO with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use Pushkar27/GriceBench-DPO with PEFT:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("gpt2-medium")
model = PeftModel.from_pretrained(base_model, "Pushkar27/GriceBench-DPO")
```
- Notebooks
- Google Colab
- Kaggle
Complete documentation rewrite with full YAML metadata, model-index, ablation results, and training details
README.md
CHANGED
|
@@ -1,90 +1,80 @@
|
|
| 1 |
-
---
|
| 2 |
language:
|
| 3 |
-
|
| 4 |
license: apache-2.0
|
| 5 |
library_name: peft
|
| 6 |
tags:
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
|
|
|
| 16 |
datasets:
|
| 17 |
-
|
| 18 |
metrics:
|
| 19 |
-
|
| 20 |
pipeline_tag: text-generation
|
| 21 |
base_model: openai-community/gpt2-medium
|
| 22 |
model-index:
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
|
|
|
| 43 |
|
| 44 |
-
|
| 45 |
|
| 46 |
-
# ⚡ GriceBench-DPO
|
| 47 |
|
| 48 |
-
|
| 49 |
|
| 50 |
-
[](https://opensource.org/licenses/Apache-2.0)
|
| 51 |
-
[](https://huggingface.co/docs/peft)
|
| 52 |
-
[](https://huggingface.co/Pushkar27)
|
| 53 |
|
| 54 |
-
|
| 55 |
-
[GitHub](https://github.com/PushkarPrabhath27/Research-Model) |
|
| 56 |
-
[Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) |
|
| 57 |
-
[🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair)
|
| 58 |
|
| 59 |
-
</div>
|
| 60 |
|
| 61 |
-
--
|
| 62 |
|
| 63 |
-
## What This Model Does
|
| 64 |
|
GriceBench-DPO is a LoRA-adapted GPT-2-medium model trained with Direct Preference Optimization (DPO) to generate dialogue responses that comply with Gricean conversational maxims. It is the **generation stage** of the GriceBench pipeline.
|
| 68 |
|
| Metric | Score | Context |
|--------|-------|---------|
| Standalone cooperative rate | 83.2% | Using this model alone |
| Full pipeline cooperative rate | **95.0%** | DPO + Detector + Repair |
| DPO preference accuracy | 75.0% | Held-out preference pairs |
|
| 74 |
|
| 75 |
-
|
| 76 |
|
| 77 |
-
|
| 78 |
|
| 79 |
-
- **Primary Use:** Generating dialogue responses that aim to follow Gricean maxims.
|
| 80 |
-
- **System Integration:** Serves as the first stage in the GriceBench pipeline.
|
| 81 |
-
- **Out-of-Scope:** Not intended for high-stakes autonomous decision-making or sensitive medical/legal interactions.
|
| 82 |
|
| 83 |
-
|
|
|
|
| 84 |
|
| 85 |
-
| 86 |
|
| 87 |
-
|
|
|
|
| 88 |
from peft import PeftModel, PeftConfig
|
| 89 |
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 90 |
import torch
|
|
@@ -92,56 +82,129 @@ import torch
|
|
| 92 |
# Load LoRA adapter on GPT-2-medium base
|
| 93 |
adapter_path = "Pushkar27/GriceBench-DPO"
|
| 94 |
config = PeftConfig.from_pretrained(adapter_path)
|
| 95 |
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
|
| 96 |
-
base_model = AutoModelForCausalLM.from_pretrained(
|
| 97 |
model = PeftModel.from_pretrained(base_model, adapter_path)
|
| 98 |
model.eval()
|
| 99 |
|
| 100 |
-
def generate_cooperative_response(context: str) -> str:
|
| 101 |
prompt = f"Context: {context}\nResponse:"
|
| 102 |
inputs = tokenizer(prompt, return_tensors="pt")
|
|
|
|
| 103 |
with torch.no_grad():
|
| 104 |
output_ids = model.generate(
|
| 105 |
-
**inputs,
|
| 106 |
-
| 107 |
pad_token_id=tokenizer.eos_token_id,
|
| 108 |
)
|
| 109 |
-
return tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
|
| 110 |
-
```
|
---

## Limitations & Biases

- **Manner Persistence:** DPO alone struggles to eliminate "Manner" violations (ambiguity, verbosity). The full GriceBench pipeline (with Repair) is required for optimal results.
- **Reference Model Dependency:** DPO performance is tied to the quality of the reference model and of the preference data used during training.
- **Hallucinations:** The model may still produce factually incorrect responses that violate the "Quality" maxim, so post-generation detection remains necessary.

---

## Environmental Impact

- **Hardware Used:** NVIDIA Tesla P100 GPU.
- **Training Time:** ~24 minutes.
- **Estimated Carbon Footprint:** ~0.05 kg CO2eq.

---

## Architecture & Training

- **Base model:** `openai-community/gpt2-medium` (355M parameters)
- **Method:** LoRA (rank=128, alpha=256)
- **Data:** 1,970 filtered preference pairs.

---

## Citation
| 139 |
|
| 140 |
-
| 141 |
@article{prabhath2026gricebench,
|
| 142 |
title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
|
| 143 |
author={Prabhath, Pushkar},
|
| 144 |
year={2026},
|
| 145 |
note={Under review, EMNLP 2026}
|
| 146 |
}
|
| 147 |
-
language:
- en
license: apache-2.0
library_name: peft
tags:
- text-generation
- dialogue
- gricean-maxims
- cooperative-communication
- lora
- dpo
- direct-preference-optimization
- peft
- gpt2
- nlp
datasets:
- topical_chat
metrics:
- cooperative_rate
pipeline_tag: text-generation
base_model: openai-community/gpt2-medium
model-index:
- name: GriceBench-DPO
  results:
  - task:
      type: text-generation
      name: Cooperative Dialogue Generation
    dataset:
      name: Topical-Chat (GriceBench test split)
      type: topical_chat
      split: test
    metrics:
    - type: cooperative_rate
      value: 0.832
      name: Standalone Cooperative Rate
    - type: cooperative_rate
      value: 0.950
      name: Full Pipeline Cooperative Rate
    - type: accuracy
      value: 0.750
      name: DPO Preference Accuracy
# ⚡ GriceBench-DPO

GPT-2-medium fine-tuned with Direct Preference Optimization to generate cooperative dialogue.
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0) | [🤗 PEFT LoRA](https://huggingface.co/docs/peft) | [🤗 GriceBench](https://huggingface.co/Pushkar27)
Part of the GriceBench system: [GitHub](https://github.com/PushkarPrabhath27/Research-Model) | [Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) | [🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair)
## What This Model Does

GriceBench-DPO is a LoRA-adapted GPT-2-medium model trained with Direct Preference Optimization (DPO) to generate dialogue responses that comply with Gricean conversational maxims. It is the generation stage of the GriceBench pipeline, producing responses that are more likely to be cooperative before any post-generation detection and repair is applied.
| Metric | Score | Context |
|--------|-------|---------|
| Standalone cooperative rate | 83.2% | Using this model alone |
| Full pipeline cooperative rate | 95.0% | DPO + Detector + Repair |
| DPO preference accuracy | 75.0% | Held-out preference pairs |
| DPO eval loss | 0.5595 | End of training |
**Important:** The 95.0% figure requires the full pipeline. This model alone achieves 83.2%, slightly below the un-tuned baseline (83.8%), but with Relation violations sharply reduced (~62% → ~10%).
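The cooperative-rate figures are percentages of responses. A minimal sketch of how such a rate could be computed is given here. It assumes, which the card does not state, that a response counts as cooperative only when none of the four Gricean maxims is flagged. The `cooperative_rate` helper, the maxim keys, and the sample data are all hypothetical.

```python
from typing import Dict, List

# The four Gricean maxims; key names are an assumption for this sketch.
MAXIMS = ("quantity", "quality", "relation", "manner")


def cooperative_rate(flags: List[Dict[str, bool]]) -> float:
    """Return the fraction of responses with no flagged maxim violation."""
    if not flags:
        raise ValueError("need at least one response")
    cooperative = sum(
        1 for f in flags if not any(f.get(m, False) for m in MAXIMS)
    )
    return cooperative / len(flags)


# Hypothetical detector output for four responses (True = violation).
sample = [
    {"quantity": False, "quality": False, "relation": False, "manner": False},
    {"quantity": False, "quality": False, "relation": False, "manner": True},
    {"quantity": False, "quality": False, "relation": True, "manner": True},
    {"quantity": False, "quality": False, "relation": False, "manner": False},
]
rate = cooperative_rate(sample)
print(f"{rate:.1%}")
```

Under this definition a single flagged maxim is enough to make a response non-cooperative, which is one reason a persistent Manner problem can hold the overall rate down.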
## Quick Start

```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load LoRA adapter on GPT-2-medium base
adapter_path = "Pushkar27/GriceBench-DPO"
config = PeftConfig.from_pretrained(adapter_path)
print(f"Base model: {config.base_model_name_or_path}")
# → openai-community/gpt2-medium

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float32,
)
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

def generate_cooperative_response(context: str, max_new_tokens: int = 80) -> str:
    prompt = f"Context: {context}\nResponse:"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.85,
            top_p=0.92,
            repetition_penalty=1.3,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


# Example
context = "What do you think about the history of jazz music in New Orleans?"
print(generate_cooperative_response(context))
```
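The `Context: ...\nResponse:` prompt format can be factored into small helpers so that generation and post-processing stay consistent. This is a minimal sketch under stated assumptions: `build_prompt` mirrors the prompt string used in `generate_cooperative_response` (apart from stripping surrounding whitespace), while `trim_response` is a hypothetical post-processing step that is not part of the model card. It cuts the output if the model starts a new `Context:` turn.

```python
def build_prompt(context: str) -> str:
    """Format a dialogue context the way the Quick Start prompt does."""
    return f"Context: {context.strip()}\nResponse:"


def trim_response(text: str, stop: str = "\nContext:") -> str:
    """Keep only the text before the first stop marker (hypothetical helper)."""
    head, _, _ = text.partition(stop)
    return head.strip()


prompt = build_prompt("  What do you think about jazz?  ")
trimmed = trim_response(" It grew out of brass bands.\nContext: And blues?")
print(repr(prompt))
print(repr(trimmed))
```

Whether trimming is needed depends on how often sampled outputs run past a single turn; with `max_new_tokens=80` it may rarely matter.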
## Full Pipeline Usage (Recommended for Best Results)

```python
# For the 95.0% cooperative rate, use all three GriceBench models together.
# `detect_violations`, `repair_violation`, and `evidence` are not defined in
# this card; they come from the companion models and repository.

# Step 1: Generate with this DPO model
response = generate_cooperative_response(context)

# Step 2: Detect any remaining violations
# (see GriceBench-Detector model card for detection code)
result = detect_violations(context, response, evidence)

# Step 3: Repair each flagged violation except Relation
for maxim, violated in result["violations"].items():
    if violated and maxim != "relation":
        response = repair_violation(context, response, maxim)

# The full pipeline reaches a 95.0% cooperative rate on the test set
print(response)
```

Full pipeline implementation: [GitHub repository](https://github.com/PushkarPrabhath27/Research-Model)
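The control flow of the three steps can be exercised without the companion models by passing stand-in functions. The sketch below is illustrative only: `stub_detect`, `stub_repair`, `run_pipeline`, and the `FILLER` marker are hypothetical names, and the stubs do not reflect how the real Detector or Repair models behave. It only shows that flagged maxims other than `relation` are sent to repair.

```python
from typing import Callable, Dict, List

# Hypothetical marker that the stub detector treats as a Manner violation.
FILLER = "Well, you know, "

repaired_maxims: List[str] = []  # records which maxims were sent to repair


def stub_detect(context: str, response: str, evidence: str) -> Dict[str, Dict[str, bool]]:
    """Stand-in detector: flags Manner when the response opens with filler."""
    return {
        "violations": {
            "quantity": False,
            "quality": False,
            "relation": True,  # flagged on purpose to show it is skipped
            "manner": response.startswith(FILLER),
        }
    }


def stub_repair(context: str, response: str, maxim: str) -> str:
    """Stand-in repair: records the maxim and strips the filler opening."""
    repaired_maxims.append(maxim)
    if response.startswith(FILLER):
        return response[len(FILLER):]
    return response


def run_pipeline(context: str, draft: str, evidence: str,
                 detect: Callable, repair: Callable) -> str:
    """Detect violations in a draft, then repair every flagged maxim except Relation."""
    result = detect(context, draft, evidence)
    response = draft
    for maxim, violated in result["violations"].items():
        if violated and maxim != "relation":
            response = repair(context, response, maxim)
    return response


final = run_pipeline(
    "What do you think about the history of jazz music in New Orleans?",
    "Well, you know, jazz grew out of New Orleans brass bands.",
    "",
    stub_detect,
    stub_repair,
)
print(final)
```

Passing the detector and repair functions as arguments keeps the loop testable and lets the real models be swapped in later.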
## Ablation Results (Why You Need the Full Pipeline)

| Configuration | Cooperative Rate | Notes |
|---------------|------------------|-------|
| Baseline (GPT-2, no tuning) | 83.8% | Reference |
| This model (DPO only) | 83.2% | Relation violations -52pp; Manner unchanged |
| Detect + Repair (no DPO) | 93.0% | Repair handles Manner |
| Full System | 95.0% | DPO + Detect + Repair combined |

**Why DPO alone barely moves the overall number:** DPO dramatically reduces Relation violations (62% → ~10%) but cannot address Manner violations (still ~64%), which are the dominant failure mode. The repair model handles Manner. Together: 95.0%.
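The ablation rows are easier to compare as percentage-point differences from the baseline. The short sketch below recomputes them from the rates in the table; the dictionary and variable names are illustrative, not from the repository.

```python
# Cooperative rates (in percent) copied from the ablation table.
ABLATION = {
    "Baseline (GPT-2, no tuning)": 83.8,
    "This model (DPO only)": 83.2,
    "Detect + Repair (no DPO)": 93.0,
    "Full System": 95.0,
}

baseline = ABLATION["Baseline (GPT-2, no tuning)"]

# Percentage-point change of each configuration relative to the baseline.
deltas = {name: round(rate - baseline, 1) for name, rate in ABLATION.items()}

for name, delta in deltas.items():
    print(f"{name:<30} {delta:+.1f} pp")
```

Read this way, adding DPO on top of Detect + Repair accounts for the last 2.0 points (93.0% to 95.0%), even though DPO on its own is 0.6 points below the baseline.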
## Training Details

### Model Architecture

| Parameter | Value |
|-----------|-------|
| Base model | `openai-community/gpt2-medium` (355M) |
| Method | LoRA (Low-Rank Adaptation) |
| LoRA rank (r) | 128 |
| LoRA alpha (α) | 256 |
| Target modules | q, k, v, o attention projections |
| Adapter size | ~25 MB |
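To make the rank and alpha values concrete, the sketch below computes the standard LoRA scaling factor (alpha / r) and the number of trainable parameters LoRA adds to a single linear layer. The 1024 x 1024 layer is an illustrative assumption based on GPT-2-medium's hidden size; it is not a claim about which layers this adapter actually wraps, and the helper names are hypothetical.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one d_in x d_out linear layer.

    LoRA learns A (r x d_in) and B (d_out x r), so the count is r * (d_in + d_out).
    """
    return r * (d_in + d_out)


def lora_scaling(alpha: float, r: int) -> float:
    """Standard LoRA scaling applied to the low-rank update B @ A."""
    return alpha / r


R, ALPHA = 128, 256      # values from the table above
HIDDEN = 1024            # GPT-2-medium hidden size

added = lora_param_count(HIDDEN, HIDDEN, R)
full = HIDDEN * HIDDEN
scale = lora_scaling(ALPHA, R)
print(added, full, scale)
```

With r = 128 the low-rank pair holds a quarter of the parameters of a full 1024 x 1024 matrix, and the update is scaled by 2.0.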
### DPO Training

| Hyperparameter | Value |
|----------------|-------|
| Algorithm | Direct Preference Optimization (DPO) |
| DPO β | 0.1 |
| Learning rate | 5e-7 |
| Batch size | 16 (grad accum ×8) |
| Epochs | 3 |
| Training pairs | 1,970 filtered preference pairs |
| Hardware | Kaggle P100-16GB, ~24 minutes |
### DPO Loss (Plain Text)

The DPO loss maximizes the margin between chosen (y_w) and rejected (y_l) responses relative to a reference model:

`L_DPO = -log sigmoid( beta * [ log(pi(y_w|x)/pi_ref(y_w|x)) - log(pi(y_l|x)/pi_ref(y_l|x)) ] )`

where beta = 0.1 controls preference strength, y_w is the cooperative response, and y_l is the violating response.
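The formula can be checked numerically for a single preference pair. The sketch below is a plain-Python illustration, not the training code: the function name and the log-probability values are made up for the example, and real training averages this quantity over batches of pairs.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair from sequence log-probabilities."""
    # log(pi(y_w|x)/pi_ref(y_w|x)) - log(pi(y_l|x)/pi_ref(y_l|x))
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: the margin is 0 and the loss is ln(2).
at_init = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy raises the chosen response's log-probability: the loss drops.
improved = dpo_loss(-8.0, -12.0, -10.0, -12.0)

print(at_init, improved)
```

Because the loss equals ln 2 ≈ 0.693 whenever the policy matches the reference, the reported eval loss of 0.5595 sits below that starting point.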
### Training Data

| Source | Pairs | Description |
|--------|-------|-------------|
| Human-labeled | 411 | Expert-verified cooperative/violating pairs |
| Repair-derived | ~1,200 | (original violation, T5-repaired output) |
| Synthetic (LLM) | ~1,200 | Generated via Groq API (llama-3.3-70b) |
| Total (filtered) | 1,970 | After conflict-detection filtering |
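The table implies how much data the conflict-detection filter removed. A rough back-of-the-envelope check follows, treating the two "~1,200" entries as exactly 1,200, which is an approximation; the variable names are illustrative.

```python
# Pair counts from the table; the two ~1,200 figures are approximations.
sources = {
    "human_labeled": 411,
    "repair_derived": 1200,
    "synthetic_llm": 1200,
}

candidates = sum(sources.values())   # pairs before filtering (approximate)
retained = 1970                      # pairs after conflict-detection filtering
retention = retained / candidates

print(candidates, f"{retention:.1%}")
```

On these rounded inputs, roughly 70% of candidate pairs survive filtering; the exact share depends on the true source counts.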
## Files

| File | Description |
|------|-------------|
| `adapter_config.json` | LoRA configuration (base model, rank, alpha) |
| `adapter_model.safetensors` | LoRA weights (~25 MB) |
| `tokenizer.json` | GPT-2 tokenizer |
| `tokenizer_config.json` | Tokenizer configuration |
| `special_tokens_map.json` | Special token mappings |
## Limitations

- **Manner violations persist standalone:** DPO reduces Relation violations but not Manner. The full pipeline is required for the headline 95.0% result.
- **Single domain:** Trained and evaluated on Topical-Chat only.
- **English only:** No multilingual support.
- **Preference accuracy (75.0%) vs. Phase 5 training accuracy (98.7%):** The 75.0% figure comes from the held-out Phase 7 evaluation and is the canonical number. The 98.7% figure came from in-distribution Phase 5 evaluation and is not representative.
## Citation

```bibtex
@article{prabhath2026gricebench,
  title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
  author={Prabhath, Pushkar},
  year={2026},
  note={Under review, EMNLP 2026}
}
```
## Related Models

| Model | Role | Link |
|-------|------|------|
| GriceBench-Detector | Detects violations | [Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) |
| GriceBench-Repair | Repairs violations | [🔧 Repair](https://huggingface.co/Pushkar27/GriceBench-Repair) |
| GriceBench-DPO | Generates cooperative responses (this model) | You are here |

GitHub: https://github.com/PushkarPrabhath27/Research-Model
## Environmental Impact

| Aspect | Value |
|--------|-------|
| Hardware Used | NVIDIA Tesla P100 GPU |
| Training Time | ~24 minutes |
| Estimated Carbon Footprint | ~0.05 kg CO2eq |
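One way a figure of this size could be arrived at is energy used multiplied by grid carbon intensity. The sketch below is illustrative only: the 250 W board power and the 0.475 kg CO2eq/kWh grid intensity are assumptions chosen for the example, not values from the model card, and the card does not say how its estimate was produced.

```python
def training_emissions_kg(power_watts: float, minutes: float,
                          kg_co2_per_kwh: float) -> float:
    """Estimate emissions as energy (kWh) times grid carbon intensity."""
    energy_kwh = power_watts / 1000 * minutes / 60
    return energy_kwh * kg_co2_per_kwh


# Assumed inputs: 250 W GPU draw, 24 minutes, 0.475 kg CO2eq per kWh.
estimate = training_emissions_kg(power_watts=250, minutes=24, kg_co2_per_kwh=0.475)
print(f"{estimate:.4f} kg CO2eq")
```

Under these assumptions the run uses about 0.1 kWh, which lands close to the reported ~0.05 kg CO2eq.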