---
language:
- en
license: apache-2.0
library_name: peft
tags:
- text-generation
- dialogue
- gricean-maxims
- cooperative-communication
- lora
- dpo
- direct-preference-optimization
- peft
- gpt2
- nlp
datasets:
- topical-chat
metrics:
- cooperative_rate
pipeline_tag: text-generation
base_model: openai-community/gpt2-medium
model-index:
- name: GriceBench-DPO
  results:
  - task:
      type: text-generation
      name: Cooperative Dialogue Generation
    dataset:
      name: Topical-Chat (GriceBench test split)
      type: topical-chat
      split: test
    metrics:
    - type: cooperative_rate
      value: 0.832
      name: Standalone Cooperative Rate
    - type: cooperative_rate
      value: 0.950
      name: Full Pipeline Cooperative Rate
    - type: accuracy
      value: 0.750
      name: DPO Preference Accuracy
---
# ⚡ GriceBench-DPO

**GPT-2-medium fine-tuned with Direct Preference Optimization to generate cooperative dialogue.**

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![PEFT LoRA](https://img.shields.io/badge/🤗-PEFT%20LoRA-yellow)](https://huggingface.co/docs/peft) [![HuggingFace](https://img.shields.io/badge/🤗-GriceBench-yellow)](https://huggingface.co/Pushkar27)

**Part of the GriceBench system** — [GitHub](https://github.com/PushkarPrabhath27/Research-Model) | [🔍 Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) | [🔧 Repair Model](https://huggingface.co/Pushkar27/GriceBench-Repair)
---

## What This Model Does

GriceBench-DPO is a LoRA-adapted GPT-2-medium model trained with Direct Preference Optimization (DPO) to generate dialogue responses that comply with Gricean conversational maxims. It is the **generation stage** of the GriceBench pipeline, producing responses that are more likely to be cooperative *before* any post-generation detection and repair is applied.

| Metric | Score | Context |
|--------|-------|---------|
| Standalone cooperative rate | 83.2% | Using this model alone |
| Full pipeline cooperative rate | **95.0%** | DPO + Detector + Repair |
| DPO preference accuracy | 75.0% | Held-out preference pairs |
| DPO eval loss | 0.5595 | End of training |

> **Important:** The 95.0% figure requires the full pipeline. This model alone achieves 83.2%, roughly on par with the un-tuned baseline (83.8%), but with Relation violations dramatically reduced (~62% → ~10%).

---

## Quick Start

```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the LoRA adapter on top of the GPT-2-medium base model
adapter_path = "Pushkar27/GriceBench-DPO"
config = PeftConfig.from_pretrained(adapter_path)
print(f"Base model: {config.base_model_name_or_path}")  # → openai-community/gpt2-medium

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float32,
)
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

def generate_cooperative_response(context: str, max_new_tokens: int = 80) -> str:
    prompt = f"Context: {context}\nResponse:"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.85,
            top_p=0.92,
            repetition_penalty=1.3,
            pad_token_id=tokenizer.eos_token_id,
        )
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

context = "What do you think about the history of jazz music in New Orleans?"
print(generate_cooperative_response(context))
```

---

## Full Pipeline Usage (Recommended for Best Results)

```python
# For the 95.0% cooperative rate, use all three GriceBench models together:

# Step 1: Generate with this DPO model
response = generate_cooperative_response(context)

# Step 2: Detect any remaining violations
# `evidence` is whatever supporting text the detector expects for this context
evidence = ""
result = detect_violations(context, response, evidence)

# Step 3: Repair each flagged violation
for maxim, violated in result["violations"].items():
    if violated and maxim != "relation":
        response = repair_violation(context, response, maxim)

print(response)
```

Full pipeline implementation: [GitHub repository](https://github.com/PushkarPrabhath27/Research-Model). A rough, non-authoritative sketch of the `detect_violations` and `repair_violation` helpers appears after the ablation table below.

---

## Ablation Results (Why You Need the Full Pipeline)

| Configuration | Cooperative Rate | Notes |
|---------------|------------------|-------|
| Baseline (GPT-2, no tuning) | 83.8% | Reference |
| **This model (DPO only)** | **83.2%** | Relation violations −52 pp; Manner unchanged |
| Detect + Repair (no DPO) | 93.0% | Repair handles Manner |
| **Full system** | **95.0%** | DPO + Detect + Repair combined |

**Why DPO alone barely moves the overall number:** DPO dramatically reduces Relation violations (~62% → ~10%) but cannot address Manner violations (still ~64%), which remain the dominant failure mode. The repair model handles Manner, so the combined pipeline reaches 95.0%.
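The `detect_violations` and `repair_violation` helpers used in the Full Pipeline Usage snippet are defined in the GitHub repository. The sketch below is purely illustrative: it assumes the Detector loads as a multi-label text-classification model and the Repair model as a sequence-to-sequence model, and the input formats, label order, and threshold are placeholders rather than the canonical interfaces.

```python
import torch
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

# NOTE: illustrative placeholders only. The real interfaces of the Detector and
# Repair models are defined in the GriceBench GitHub repository; the model
# classes, prompt formats, label order, and threshold below are assumptions.
MAXIMS = ["quantity", "quality", "relation", "manner"]

det_tok = AutoTokenizer.from_pretrained("Pushkar27/GriceBench-Detector")
det_model = AutoModelForSequenceClassification.from_pretrained("Pushkar27/GriceBench-Detector")

rep_tok = AutoTokenizer.from_pretrained("Pushkar27/GriceBench-Repair")
rep_model = AutoModelForSeq2SeqLM.from_pretrained("Pushkar27/GriceBench-Repair")

def detect_violations(context: str, response: str, evidence: str = "") -> dict:
    """Return {"violations": {maxim: bool}} for a context/response pair."""
    text = f"Context: {context}\nEvidence: {evidence}\nResponse: {response}"
    inputs = det_tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.sigmoid(det_model(**inputs).logits)[0]  # assumed multi-label head
    return {"violations": {m: bool(probs[i] > 0.5) for i, m in enumerate(MAXIMS)}}

def repair_violation(context: str, response: str, maxim: str) -> str:
    """Rewrite `response` to remove a violation of the given maxim."""
    prompt = f"repair {maxim}: Context: {context} Response: {response}"
    inputs = rep_tok(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out_ids = rep_model.generate(**inputs, max_new_tokens=96)
    return rep_tok.decode(out_ids[0], skip_special_tokens=True).strip()
```

If the published checkpoints use different head types or prompt templates, prefer the loading code in the GitHub repository over this sketch.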
---

## Training Details

### Model Architecture

| Parameter | Value |
|-----------|-------|
| Base model | `openai-community/gpt2-medium` (355M parameters) |
| Method | LoRA (Low-Rank Adaptation) |
| LoRA rank (r) | 128 |
| LoRA alpha (α) | 256 |
| Target modules | q, k, v, o attention projections |
| Adapter size | ~25 MB |

### DPO Training

| Hyperparameter | Value |
|----------------|-------|
| Algorithm | Direct Preference Optimization (DPO) |
| DPO β | 0.1 |
| Learning rate | 5e-7 |
| Batch size | 16 (gradient accumulation ×8) |
| Epochs | 3 |
| Training pairs | 1,970 filtered preference pairs |
| Hardware | Kaggle P100 (16 GB), ~24 minutes |

### DPO Loss (Plain Text)

The DPO loss maximizes the margin between the chosen (y_w) and rejected (y_l) responses relative to a reference model:

`L_DPO = -log sigmoid( beta * [ log(pi(y_w|x) / pi_ref(y_w|x)) - log(pi(y_l|x) / pi_ref(y_l|x)) ] )`

where beta = 0.1 controls preference strength, y_w is the cooperative (chosen) response, and y_l is the violating (rejected) response. A minimal PyTorch sketch of this loss is given in the appendix at the end of this card.

### Training Data

| Source | Pairs | Description |
|--------|-------|-------------|
| Human-labeled | 411 | Expert-verified cooperative/violating pairs |
| Repair-derived | ~1,200 | (original violation, T5-repaired output) pairs |
| Synthetic (LLM) | ~1,200 | Generated via the Groq API (llama-3.3-70b) |
| **Total (filtered)** | **1,970** | After conflict-detection filtering |

---

## Files

| File | Description |
|------|-------------|
| `adapter_config.json` | LoRA configuration (base model, rank, alpha) |
| `adapter_model.safetensors` | LoRA weights (~25 MB) |
| `tokenizer.json` | GPT-2 tokenizer |
| `tokenizer_config.json` | Tokenizer configuration |
| `special_tokens_map.json` | Special token mappings |

---

## Limitations

- **Manner violations persist standalone:** DPO reduces Relation violations but not Manner violations. The full pipeline is required for the headline 95.0% result.
- **Single domain:** Trained and evaluated on Topical-Chat only.
- **English only:** No multilingual support.
- **Preference accuracy (75.0%) vs. Phase 5 training accuracy (98.7%):** The 75.0% figure comes from the held-out Phase 7 evaluation and is the canonical number; the 98.7% figure came from in-distribution Phase 5 evaluation and is not representative.

---

## Citation

```bibtex
@article{prabhath2026gricebench,
  title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
  author={Prabhath, Pushkar},
  year={2026},
  note={Under review, EMNLP 2026}
}
```

---

## Related Models

| Model | Role | Link |
|-------|------|------|
| GriceBench-Detector | Detects violations | [🔍 Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) |
| GriceBench-Repair | Repairs violations | [🔧 Repair](https://huggingface.co/Pushkar27/GriceBench-Repair) |
| GriceBench-DPO | Generates cooperative responses (this model) | You are here |

**GitHub:** https://github.com/PushkarPrabhath27/Research-Model

---

## Environmental Impact

| Aspect | Value |
|--------|-------|
| Hardware Used | NVIDIA Tesla P100 GPU |
| Training Time | ~24 minutes |
| Estimated Carbon Footprint | ~0.05 kg CO2eq |
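---

## Appendix: DPO Loss Sketch

To connect the plain-text DPO loss above to code, here is a minimal PyTorch sketch. It assumes the sequence-level log-probabilities (summed over response tokens) have already been computed for both the policy and the frozen reference model; the variable names are illustrative and do not come from the actual training script.

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logp: torch.Tensor,    # log pi(y_w | x), summed over tokens
    policy_rejected_logp: torch.Tensor,  # log pi(y_l | x)
    ref_chosen_logp: torch.Tensor,       # log pi_ref(y_w | x)
    ref_rejected_logp: torch.Tensor,     # log pi_ref(y_l | x)
    beta: float = 0.1,                   # preference strength used for this model
) -> torch.Tensor:
    # Log-ratios of policy vs. reference for the chosen (y_w) and rejected (y_l) responses
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # L_DPO = -log sigmoid( beta * (chosen_logratio - rejected_logratio) )
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy example with made-up log-probabilities for a batch of two preference pairs
loss = dpo_loss(
    policy_chosen_logp=torch.tensor([-12.0, -15.0]),
    policy_rejected_logp=torch.tensor([-14.0, -15.5]),
    ref_chosen_logp=torch.tensor([-13.0, -15.2]),
    ref_rejected_logp=torch.tensor([-13.5, -15.0]),
)
print(loss.item())
```

The gradient of this loss pushes the policy to assign relatively more probability to the cooperative response y_w than to the violating response y_l, compared with the reference model.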