Pushkar27 committed 24 days ago

Commit 3fb9bfb

1 parent: c208727

Fix YAML metadata - remove escaped underscores, proper list syntax, complete model-index

Files changed (1)

README.md +112 -77

@@ -26,7 +26,7 @@ model-index:
       name: Gricean Maxim Violation Repair
     dataset:
       name: Topical-Chat (GriceBench repair validation split, N=401)
-      type: custom
       split: validation
     metrics:
     - type: bleu
@@ -43,40 +43,42 @@ model-index:
       name: Violation Removal Rate
 ---
-🔧 GriceBench-Repair
-Rewrites Gricean maxim violations into cooperative dialogue — surgically, not generally.
-License-Apache%202.0-blue.svg
-%F0%9F%A4%97-GriceBench-yellow
-python-3.8+-blue.svg
-Part of the GriceBench system —
-GitHub |
-🔍 Detector |
-⚡ DPO Generator
-What This Model Does
-GriceBench-Repair is a T5-base seq2seq model that rewrites Gricean maxim violations into cooperative responses. It is violation-type-aware: different maxims use different generation strategies because the nature of the repair task differs.
-Violation	Decoding Strategy	Why
-Quantity	Beam search (n=4) + length constraints	Needs precise length control
-Quality	Beam search (n=4) + repetition penalty	Needs factual precision
-Manner	Nucleus sampling (T=0.85, top-p=0.92)	Needs creative diverse rewrites
-Relation	NOT this model — use FAISS retrieval	Entire response is off-topic; editing can't fix it
-Violation removal rate: 93.0% (post-fix evaluation, N=200)
-Quick Start
 ```python
 from transformers import T5ForConditionalGeneration, T5Tokenizer
 import torch
@@ -123,7 +125,6 @@ def repair_violation(context: str, response: str, violation_type: str) -> str:
     return tokenizer.decode(output_ids[0], skip_special_tokens=True)
 # ── Examples ────────────────────────────────────────────────────────────────
 # Quantity (too short)
@@ -143,51 +144,75 @@ print(repair_violation(
 ))
 # → "Alice confirmed she would complete the project before leaving the office."
 ```
-Performance
-Violation removal rate: 93.0% (corrected, post-fix evaluation)
 Per-maxim BLEU scores on the repair validation set (N=401):
-Violation Type	BLEU	Notes
-Quality	97.8%	Near-perfect factual correction
-Manner	92.5%	Strong clarity improvements
-Quantity	61.8%	Harder — requires insertions/deletions
-Relation	N/A	Route to FAISS retrieval — do not use T5 for this
-Degeneracy fix (before vs. after violation-type-aware decoding):
-Maxim	Before Fix	After Fix	Improvement
-Quantity	30.1% degenerate	2.1%	−28.0pp
-Manner	93.3% degenerate	4.5%	−88.8pp
-Overall	64.4% degenerate	5.2%	−59.2pp
-Key lesson: Beam search produces mode-collapsed outputs for Manner repairs (model inserts ! as a proxy for "clarity"). Nucleus sampling eliminates this.
-Architecture & Training
-Base model: google-t5/t5-base (220M parameters)
-Training pairs: 3,210 (violation → cooperative) seq2seq pairs
-Validation pairs: 401
-Epochs: 5 | Label smoothing: 0.1 | Hardware: Kaggle T4
-Three-layer degeneracy prevention:
-1.
-Violation-type-aware decoding (nucleus sampling for Manner, beam for others)
-2.
-Post-generation multi-signal filter (punctuation bursts, trigram repetition, exclamation density)
-3.
-Graceful fallback — returns original with is_fallback: True flag if all attempts fail
-Why Relation Violations Use Retrieval
-Relation violations mean the entire response is off-topic — there is nothing to edit. T5 in a seq2seq framing can only edit, not generate entirely new content. We route Relation repairs to a FAISS index over 50,000 Topical-Chat responses (MRR > 0.70, Top-1 accuracy > 60%). See the GitHub repo for the full retrieval system.
-Files
-File	Description
-config.json	T5-base configuration
-model.safetensors	Trained model weights
-tokenizer.json	SentencePiece tokenizer
-tokenizer_config.json	Tokenizer configuration
-Limitations & Biases
-Hallucination Risk: Like all seq2seq models, T5 can occasionally introduce factual errors during repair. Always use the "Quality" detector after repair to verify.
-Dependency on Context: Repair quality is heavily dependent on the provided "Context" being accurate and sufficient.
-Mode Collapse: Avoid using beam search for "Manner" repairs, as it can lead to repetitive punctuation or symbols.
-Citation
 ```bibtex
  @article{prabhath2026gricebench,
   title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
@@ -196,15 +221,25 @@ Citation
   note={Under review, EMNLP 2026}
 }
 ```
-Related Models
-Model	Role	Link
-GriceBench-Detector	Detects which maxim was violated	🔍 Detector
-GriceBench-Repair	Repairs violations (this model)	You are here
-GriceBench-DPO	Generates cooperative responses	⚡ DPO
-GitHub: https://github.com/PushkarPrabhath27/Research-Model
-Environmental Impact
-Aspect	Value
-Hardware Used	NVIDIA Tesla T4 GPU
-Training Time	~2 hours
-Estimated Carbon Footprint	~0.25 kg CO2eq

       name: Gricean Maxim Violation Repair
     dataset:
       name: Topical-Chat (GriceBench repair validation split, N=401)
+      type: topical_chat
       split: validation
     metrics:
     - type: bleu
       name: Violation Removal Rate
 ---
+<div align="center">
+# 🔧 GriceBench-Repair
+**Rewrites Gricean maxim violations into cooperative dialogue — surgically, not generally.**
+[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+[![HuggingFace](https://img.shields.io/badge/🤗-GriceBench-yellow)](https://huggingface.co/Pushkar27)
+[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+**Part of the GriceBench system** —
+[GitHub](https://github.com/PushkarPrabhath27/Research-Model) |
+[🔍 Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) |
+[⚡ DPO Generator](https://huggingface.co/Pushkar27/GriceBench-DPO)
+</div>
+---
+## What This Model Does
+GriceBench-Repair is a T5-base seq2seq model that rewrites Gricean maxim violations into cooperative responses. It is **violation-type-aware**: different maxims use different generation strategies because the nature of the repair task differs.
+| Violation | Decoding Strategy | Why |
+|-----------|------------------|-----|
+| **Quantity** | Beam search (n=4) + length constraints | Needs precise length control |
+| **Quality** | Beam search (n=4) + repetition penalty | Needs factual precision |
+| **Manner** | Nucleus sampling (T=0.85, top-p=0.92) | Needs creative diverse rewrites |
+| **Relation** | NOT this model — use FAISS retrieval | Entire response is off-topic; editing can't fix it |
+**Violation removal rate: 93.0%** (post-fix evaluation, N=200)
+---
+## Quick Start
 ```python
 from transformers import T5ForConditionalGeneration, T5Tokenizer
 import torch
     return tokenizer.decode(output_ids[0], skip_special_tokens=True)
 # ── Examples ────────────────────────────────────────────────────────────────
 # Quantity (too short)
 ))
 # → "Alice confirmed she would complete the project before leaving the office."
 ```
+---
+## Performance
+**Violation removal rate: 93.0%** (corrected, post-fix evaluation)
 Per-maxim BLEU scores on the repair validation set (N=401):
+| Violation Type | BLEU | Notes |
+|----------------|------|-------|
+| Quality | **97.8%** | Near-perfect factual correction |
+| Manner | **92.5%** | Strong clarity improvements |
+| Quantity | 61.8% | Harder — requires insertions/deletions |
+| Relation | N/A | Route to FAISS retrieval — do not use T5 for this |
+**Degeneracy fix (before vs. after violation-type-aware decoding):**
+| Maxim | Before Fix | After Fix | Improvement |
+|-------|-----------|-----------|-------------|
+| Quantity | 30.1% degenerate | 2.1% | **−28.0pp** |
+| Manner | 93.3% degenerate | 4.5% | **−88.8pp** |
+| Overall | 64.4% degenerate | 5.2% | **−59.2pp** |
+> **Key lesson:** Beam search produces mode-collapsed outputs for Manner repairs (model inserts `!` as a proxy for "clarity"). Nucleus sampling eliminates this.
+---
+## Architecture & Training
+- **Base model:** `google-t5/t5-base` (220M parameters)
+- **Training pairs:** 3,210 (violation → cooperative) seq2seq pairs
+- **Validation pairs:** 401
+- **Epochs:** 5 | **Label smoothing:** 0.1 | **Hardware:** Kaggle T4
+**Three-layer degeneracy prevention:**
+1. Violation-type-aware decoding (nucleus sampling for Manner, beam for others)
+2. Post-generation multi-signal filter (punctuation bursts, trigram repetition, exclamation density)
+3. Graceful fallback — returns original with `is_fallback: True` flag if all attempts fail
+---
+## Why Relation Violations Use Retrieval
+Relation violations mean the *entire response* is off-topic — there is nothing to edit. T5 in a seq2seq framing can only edit, not generate entirely new content. We route Relation repairs to a FAISS index over 50,000 Topical-Chat responses (MRR > 0.70, Top-1 accuracy > 60%). See the GitHub repo for the full retrieval system.
+---
+## Files
+| File | Description |
+|------|-------------|
+| `config.json` | T5-base configuration |
+| `model.safetensors` | Trained model weights |
+| `tokenizer.json` | SentencePiece tokenizer |
+| `tokenizer_config.json` | Tokenizer configuration |
+---
+## Limitations & Biases
+- **Hallucination Risk:** Like all seq2seq models, T5 can occasionally introduce factual errors during repair. Always use the "Quality" detector after repair to verify.
+- **Dependency on Context:** Repair quality is heavily dependent on the provided "Context" being accurate and sufficient.
+- **Mode Collapse:** Avoid using beam search for "Manner" repairs, as it can lead to repetitive punctuation or symbols.
+---
+## Citation
 ```bibtex
  @article{prabhath2026gricebench,
   title={GriceBench: Operationalizing Gricean Maxims for Cooperative Dialogue Evaluation and Generation},
   note={Under review, EMNLP 2026}
 }
 ```
+---
+## Related Models
+| Model | Role | Link |
+|-------|------|------|
+| GriceBench-Detector | Detects which maxim was violated | [🔍 Detector](https://huggingface.co/Pushkar27/GriceBench-Detector) |
+| GriceBench-Repair | Repairs violations (this model) | You are here |
+| GriceBench-DPO | Generates cooperative responses | [⚡ DPO](https://huggingface.co/Pushkar27/GriceBench-DPO) |
+**GitHub:** https://github.com/PushkarPrabhath27/Research-Model
+---
+## Environmental Impact
+| Aspect | Value |
+|--------|-------|
+| Hardware Used | NVIDIA Tesla T4 GPU |
+| Training Time | ~2 hours |
+| Estimated Carbon Footprint | ~0.25 kg CO2eq
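
Reviewer note: the "Three-layer degeneracy prevention" list in the new README names the layers but the diff does not show their code. The sketch below is one possible reading of those three layers, not the author's implementation. The names `DECODING`, `decoding_kwargs`, `looks_degenerate`, and `repair_with_fallback` are hypothetical, and so are the length limits, the repetition penalty value, the retry count, and the exclamation-density threshold. Only the beam width (4), the temperature (0.85), top-p (0.92), and the `is_fallback` flag come from the README.

```python
import re
from typing import Any, Callable, Dict

# Layer 1 settings, keyed by violation type. Beam width, temperature and top-p
# follow the README table; every other value is an illustrative assumption.
DECODING: Dict[str, Dict[str, Any]] = {
    "quantity": {"num_beams": 4, "do_sample": False, "min_length": 8, "max_length": 128},
    "quality": {"num_beams": 4, "do_sample": False, "repetition_penalty": 1.2},
    "manner": {"do_sample": True, "temperature": 0.85, "top_p": 0.92},
}


def decoding_kwargs(violation_type: str) -> Dict[str, Any]:
    """Layer 1: choose generation settings by violation type."""
    key = violation_type.lower()
    if key == "relation":
        # The README routes Relation repairs to retrieval instead of T5.
        raise ValueError("Relation violations are not handled by the repair model.")
    return dict(DECODING[key])


def looks_degenerate(text: str) -> bool:
    """Layer 2: flag punctuation bursts, repeated trigrams, or dense '!' usage."""
    words = text.split()
    if not words:
        return True
    # Punctuation burst: the same mark three or more times in a row.
    if re.search(r"([!?,;:])\1{2,}", text):
        return True
    # Trigram repetition: any word trigram that occurs more than once.
    trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    if len(trigrams) != len(set(trigrams)):
        return True
    # Exclamation density: more than one '!' per five words (assumed threshold).
    return text.count("!") / len(words) > 0.2


def repair_with_fallback(
    generate: Callable[..., str],
    source: str,
    violation_type: str,
    attempts: int = 3,
) -> Dict[str, Any]:
    """Layer 3: retry generation, then fall back to the original text."""
    kwargs = decoding_kwargs(violation_type)
    for _ in range(attempts):
        candidate = generate(source, **kwargs)
        if not looks_degenerate(candidate):
            return {"text": candidate, "is_fallback": False}
    return {"text": source, "is_fallback": True}


# Usage with a stand-in generator; a real caller would wrap model.generate().
result = repair_with_fallback(lambda src, **kw: "!!!", "original reply", "manner")
print(result)
```

Retrying only helps for the sampled Manner path, because beam search returns the same output on every attempt. A fuller version would probably vary the settings between attempts.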