---
language:
- en
license: apache-2.0
library_name: peft
tags:
- text-generation
- dialogue
- gricean-maxims
- cooperative-communication
- lora
- dpo
- direct-preference-optimization
- peft
- gpt2
- nlp
datasets:
- topical-chat
metrics:
- cooperative_rate
pipeline_tag: text-generation
base_model: openai-community/gpt2-medium
model-index:
- name: GriceBench-DPO
  results:
  - task:
      type: text-generation
      name: Cooperative Dialogue Generation
    dataset:
      name: Topical-Chat (GriceBench test split)
      type: topical-chat
      split: test
    metrics:
    - type: cooperative_rate
      value: 0.832
      name: Standalone Cooperative Rate
    - type: cooperative_rate
      value: 0.950
      name: Full Pipeline Cooperative Rate
    - type: accuracy
      value: 0.750
      name: DPO Preference Accuracy
---

<div align="center">

# ⚡ GriceBench-DPO

**GPT-2-medium fine-tuned with Direct Preference Optimization to generate cooperative dialogue.**

[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) ·
[PEFT](https://huggingface.co/docs/peft) ·
[Pushkar27 on Hugging Face](https://huggingface.co/Pushkar27)

**Part of the GriceBench system** —
[GitHub](https://github.com/PushkarPrabhath27/Research-Model) |
[🔍 Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) |
[🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair)

</div>

---

## What This Model Does

GriceBench-DPO is a LoRA-adapted GPT-2-medium model trained with Direct Preference Optimization (DPO) to generate dialogue responses that comply with Gricean conversational maxims. It is the **generation stage** of the GriceBench pipeline, producing responses that are more likely to be cooperative *before* any post-generation detection and repair is applied.

| Metric | Score | Context |
|--------|-------|---------|
| Standalone cooperative rate | 83.2% | Using this model alone |
| Full pipeline cooperative rate | **95.0%** | DPO + Detector + Repair |
| DPO preference accuracy | 75.0% | Held-out preference pairs |
| DPO eval loss | 0.5595 | End of training |

> **Important:** The 95.0% figure requires the full pipeline. On its own, this model reaches 83.2%, essentially matching the un-tuned baseline (83.8%) overall while cutting Relation violations dramatically (~62% → ~10%).

---

## Quick Start

```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load LoRA adapter on GPT-2-medium base
adapter_path = "Pushkar27/GriceBench-DPO"
config = PeftConfig.from_pretrained(adapter_path)
print(f"Base model: {config.base_model_name_or_path}")
# → openai-community/gpt2-medium

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float32,
)
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

def generate_cooperative_response(context: str, max_new_tokens: int = 80) -> str:
    prompt = f"Context: {context}\nResponse:"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.85,
            top_p=0.92,
            repetition_penalty=1.3,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, not the prompt
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


context = "What do you think about the history of jazz music in New Orleans?"
print(generate_cooperative_response(context))
```

---

## Full Pipeline Usage (Recommended for Best Results)

```python
# For the 95.0% cooperative rate, chain all three GriceBench models.
# detect_violations() and repair_violation() wrap the Detector and Repair
# models; reference implementations live in the GitHub repository below.

# Step 1: Generate with this DPO model
response = generate_cooperative_response(context)

# Step 2: Detect any remaining violations
# (`evidence`: the knowledge snippet for the turn, used by the Detector)
result = detect_violations(context, response, evidence)

# Step 3: Repair each flagged violation
# (Relation violations are not routed to the repair model)
for maxim, violated in result["violations"].items():
    if violated and maxim != "relation":
        response = repair_violation(context, response, maxim)

print(response)
```
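
`detect_violations()` and `repair_violation()` are not defined in this card. The sketch below shows one *hypothetical* way to wire them to the published Detector and Repair checkpoints: the input formats, label order, multi-label head, and 0.5 threshold are all assumptions, and the authoritative implementations are in the GitHub repository linked below.

```python
# Hypothetical stand-ins for detect_violations() and repair_violation().
# Everything marked "assumed" is NOT specified by this card.
import torch
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

MAXIMS = ["quantity", "quality", "relation", "manner"]  # assumed label order

det_tok = AutoTokenizer.from_pretrained("Pushkar27/GriceBench-Detector")
det_model = AutoModelForSequenceClassification.from_pretrained(
    "Pushkar27/GriceBench-Detector"
)
rep_tok = AutoTokenizer.from_pretrained("Pushkar27/GriceBench-Repair")
rep_model = AutoModelForSeq2SeqLM.from_pretrained("Pushkar27/GriceBench-Repair")

def detect_violations(context: str, response: str, evidence: str = "") -> dict:
    # Assumed multi-label setup: one sigmoid score per maxim.
    text = f"{context} [SEP] {response} [SEP] {evidence}"  # assumed format
    inputs = det_tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.sigmoid(det_model(**inputs).logits)[0]
    return {"violations": {m: bool(p > 0.5) for m, p in zip(MAXIMS, probs)}}

def repair_violation(context: str, response: str, maxim: str) -> str:
    # The Repair model is T5-based (see Training Data); prompt is assumed.
    prompt = f"repair {maxim}: context: {context} response: {response}"
    inputs = rep_tok(prompt, return_tensors="pt", truncation=True)
    output_ids = rep_model.generate(**inputs, max_new_tokens=80)
    return rep_tok.decode(output_ids[0], skip_special_tokens=True)
```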

Full pipeline implementation: [GitHub repository](https://github.com/PushkarPrabhath27/Research-Model)

---

## Ablation Results (Why You Need the Full Pipeline)

| Configuration | Cooperative Rate | Notes |
|---------------|-----------------|-------|
| Baseline (GPT-2, no tuning) | 83.8% | Reference |
| **This model (DPO only)** | **83.2%** | Relation violations −52pp; Manner unchanged |
| Detect + Repair (no DPO) | 93.0% | Repair handles Manner |
| **Full System** | **95.0%** | DPO + Detect + Repair combined |

**Why DPO alone barely moves the overall number:** DPO dramatically reduces Relation violations (~62% → ~10%) but cannot address Manner violations (still ~64%), which are the dominant failure mode. The repair model handles Manner; together they reach 95.0%.

---

## Training Details

### Model Architecture

| Parameter | Value |
|-----------|-------|
| Base model | `openai-community/gpt2-medium` (355M) |
| Method | LoRA (Low-Rank Adaptation) |
| LoRA rank (r) | 128 |
| LoRA alpha (α) | 256 |
| Target modules | q, k, v, o attention projections |
| Adapter size | ~25 MB |
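
The table above maps onto a PEFT `LoraConfig` roughly as sketched below. This is a reconstruction, not the shipped config: GPT-2 implements the q/k/v projections as a single fused `c_attn` module, so `["c_attn", "c_proj"]` is the assumed GPT-2 equivalent of the q/k/v/o targets. The authoritative values are in this repo's `adapter_config.json`.

```python
from peft import LoraConfig

# Approximate reconstruction of the adapter configuration above.
# GPT-2 fuses q/k/v into `c_attn`; `c_proj` is the output projection.
lora_config = LoraConfig(
    r=128,                                # LoRA rank
    lora_alpha=256,                       # LoRA alpha (α)
    target_modules=["c_attn", "c_proj"],  # assumed GPT-2 module names
    task_type="CAUSAL_LM",
)
```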

### DPO Training

| Hyperparameter | Value |
|----------------|-------|
| Algorithm | Direct Preference Optimization (DPO) |
| DPO β | 0.1 |
| Learning rate | 5e-7 |
| Effective batch size | 16 (gradient accumulation ×8) |
| Epochs | 3 |
| Training pairs | 1,970 filtered preference pairs |
| Hardware | Kaggle P100 (16 GB), ~24 minutes |
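
The card does not name the training framework. As one plausible way to express these hyperparameters, here is a sketch using TRL's `DPOConfig`; the per-device batch size of 2 is an assumption chosen so that ×8 gradient accumulation yields the effective batch size of 16.

```python
from trl import DPOConfig

# One way to express the hyperparameters above with TRL (assumed tooling;
# the card does not state which trainer was actually used).
training_args = DPOConfig(
    output_dir="gricebench-dpo",
    beta=0.1,                        # DPO β
    learning_rate=5e-7,
    per_device_train_batch_size=2,   # assumed: 2 × 8 accumulation = 16
    gradient_accumulation_steps=8,
    num_train_epochs=3,
)
```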

### DPO Loss (Plain Text)

The DPO loss maximizes the margin between chosen (y_w) and rejected (y_l) responses relative to a frozen reference model:

```text
L_DPO = -log sigmoid( beta * [ log( pi(y_w|x) / pi_ref(y_w|x) )
                             - log( pi(y_l|x) / pi_ref(y_l|x) ) ] )
```

where beta = 0.1 controls preference strength, y_w is the cooperative response, and y_l is the violating response.
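
The same loss as a minimal PyTorch sketch, assuming the four inputs are per-sequence log-probabilities summed over response tokens under the policy and the frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi(y_w|x), summed over tokens
    policy_rejected_logps: torch.Tensor,  # log pi(y_l|x)
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w|x)
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l|x)
    beta: float = 0.1,
) -> torch.Tensor:
    # Policy-vs-reference log-ratios for chosen and rejected responses
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin), averaged over the batch
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```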

### Training Data

| Source | Pairs | Description |
|--------|-------|-------------|
| Human-labeled | 411 | Expert-verified cooperative/violating pairs |
| Repair-derived | ~1,200 | (original violation, T5-repaired output) |
| Synthetic (LLM) | ~1,200 | Generated via Groq API (llama-3.3-70b) |
| **Total (filtered)** | **1,970** | After conflict-detection filtering |
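
Pairs follow the usual DPO `(prompt, chosen, rejected)` layout. The example below is invented for illustration and is not from the dataset; the rejected response shows a Relation violation.

```python
# Illustrative preference pair (invented example, not from the dataset):
# the chosen response is cooperative, the rejected one violates Relation.
pair = {
    "prompt": "Context: What do you think about the history of jazz music "
              "in New Orleans?\nResponse:",
    "chosen": " Jazz grew out of New Orleans' blend of blues, ragtime, and "
              "brass-band traditions around the turn of the 20th century.",
    "rejected": " I had pasta for lunch. Do you like pasta?",
}
```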

---

## Files

| File | Description |
|------|-------------|
| `adapter_config.json` | LoRA configuration (base model, rank, alpha) |
| `adapter_model.safetensors` | LoRA weights (~25 MB) |
| `tokenizer.json` | GPT-2 tokenizer |
| `tokenizer_config.json` | Tokenizer configuration |
| `special_tokens_map.json` | Special token mappings |
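
To check the rank, alpha, and target modules actually shipped with the adapter, `adapter_config.json` can be inspected directly. A small sketch, assuming the standard PEFT config keys:

```python
import json
from huggingface_hub import hf_hub_download

# Download and inspect the adapter configuration from the Hub.
path = hf_hub_download("Pushkar27/GriceBench-DPO", "adapter_config.json")
with open(path) as f:
    cfg = json.load(f)
print(cfg["r"], cfg["lora_alpha"], cfg["target_modules"])
```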

---

## Limitations

- **Manner violations persist standalone:** DPO reduces Relation violations but not Manner violations. The full pipeline is required for the headline 95.0% result.
- **Single domain:** Trained and evaluated on Topical-Chat only.
- **English only:** No multilingual support.
- **Preference accuracy (75.0%) vs. training accuracy (98.7%):** The 75.0% figure comes from held-out Phase 7 evaluation and is the canonical number. The 98.7% figure came from in-distribution Phase 5 evaluation and is not representative.

---

## Citation

```bibtex
@article{prabhath2026gricebench,
  title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
  author={Prabhath, Pushkar},
  year={2026},
  note={Under review, EMNLP 2026}
}
```

---

## Related Models

| Model | Role | Link |
|-------|------|------|
| GriceBench-Detector | Detects violations | [🔍 Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) |
| GriceBench-Repair | Repairs violations | [🔧 Repair](https://huggingface.co/Pushkar27/GriceBench-Repair) |
| GriceBench-DPO | Generates cooperative responses (this model) | You are here |

**GitHub:** https://github.com/PushkarPrabhath27/Research-Model

---

## Environmental Impact

| Aspect | Value |
|--------|-------|
| Hardware Used | NVIDIA Tesla P100 GPU |
| Training Time | ~24 minutes |
| Estimated Carbon Footprint | ~0.05 kg CO2eq |