How to use Pushkar27/GriceBench-Repair with Transformers:

```python
# Use a pipeline as a high-level helper.
# The model is a T5 seq2seq model, so the task matches the card's
# pipeline_tag (text2text-generation) rather than text-generation.
from transformers import pipeline

pipe = pipeline("text2text-generation", model="Pushkar27/GriceBench-Repair")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Pushkar27/GriceBench-Repair")
model = AutoModelForSeq2SeqLM.from_pretrained("Pushkar27/GriceBench-Repair")
```
CRITICAL: Remove all escaped underscores from YAML metadata
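The commit title says escaped underscores were removed from the YAML metadata. As a minimal sketch of how such a problem could be detected, the snippet below scans a README's front matter for the Markdown escape `\_`. It assumes the front matter is delimited by `---` lines; the function name `find_escaped_underscores` and the sample text are hypothetical and are not taken from the repository.

```python
def find_escaped_underscores(readme_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs inside the YAML front matter that contain '\\_'."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no front matter block at the top of the file
    hits = []
    # Line 1 is the opening '---', so the first metadata line is line 2.
    for number, line in enumerate(lines[1:], start=2):
        if line.strip() == "---":
            break  # closing delimiter: stop before the Markdown body
        if "\\_" in line:
            hits.append((number, line))
    return hits


# Hypothetical README content, for illustration only.
sample = "\n".join([
    "---",
    "datasets:",
    r"- some\_dataset",
    "pipeline_tag: text2text-generation",
    "---",
    r"Body text containing \_ is ignored.",
])

print(find_escaped_underscores(sample))
```

Only the metadata block is checked, because `\_` is a legitimate escape in the Markdown body but becomes part of the value inside YAML.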
README.md CHANGED

@@ -13,7 +13,7 @@ tags:
  - seq2seq
  - nlp
  datasets:
- -
+ - topical-chat
  metrics:
  - bleu
  pipeline_tag: text2text-generation
@@ -26,7 +26,7 @@ model-index:
        name: Gricean Maxim Violation Repair
      dataset:
        name: Topical-Chat (GriceBench repair validation split, N=401)
-       type:
+       type: topical-chat
        split: validation
      metrics:
      - type: bleu
@@ -71,7 +71,7 @@ GriceBench-Repair is a T5-base seq2seq model that rewrites Gricean maxim violati
  | **Quantity** | Beam search (n=4) + length constraints | Needs precise length control |
  | **Quality** | Beam search (n=4) + repetition penalty | Needs factual precision |
  | **Manner** | Nucleus sampling (T=0.85, top-p=0.92) | Needs creative diverse rewrites |
- | **Relation** | NOT this model - use FAISS retrieval | Entire response is off-topic; editing
+ | **Relation** | NOT this model - use FAISS retrieval | Entire response is off-topic; editing cannot fix it |

  **Violation removal rate: 93.0%** (post-fix evaluation, N=200)

@@ -89,17 +89,6 @@ model = T5ForConditionalGeneration.from_pretrained(model_name)
  model.eval()

  def repair_violation(context: str, response: str, violation_type: str) -> str:
-     """
-     Repair a Gricean maxim violation.
-
-     Args:
-         context: Conversation history
-         response: The violating response to fix
-         violation_type: One of "quantity", "quality", "manner"
-                         (Relation → use FAISS retrieval instead)
-     Returns:
-         Rewritten cooperative response string
-     """
      assert violation_type in ["quantity", "quality", "manner"], \
          "Relation violations must use the FAISS retrieval system - not this model."

@@ -108,7 +97,6 @@ def repair_violation(context: str, response: str, violation_type: str) -> str:

      with torch.no_grad():
          if violation_type == "manner":
-             # Nucleus sampling - beam search degenerates for Manner
              output_ids = model.generate(
                  **inputs,
                  do_sample=True, temperature=0.85, top_p=0.92,
@@ -116,7 +104,6 @@ def repair_violation(context: str, response: str, violation_type: str) -> str:
                  repetition_penalty=1.5, no_repeat_ngram_size=3,
              )
          else:
-             # Beam search for precision
              output_ids = model.generate(
                  **inputs,
                  num_beams=4, max_length=128, min_length=8,
@@ -125,16 +112,12 @@ def repair_violation(context: str, response: str, violation_type: str) -> str:

      return tokenizer.decode(output_ids[0], skip_special_tokens=True)

- # ── Examples ────────────────────────────────────────────────────────────────
-
  # Quantity (too short)
  print(repair_violation(
      context="What do you think about commercial space travel?",
      response="It's fine.",
      violation_type="quantity"
  ))
- # → "Commercial space travel has advanced rapidly, with reusable rockets
- #    making orbital access cheaper, though costs remain high for most."

  # Manner (ambiguous pronouns)
  print(repair_violation(
@@ -142,14 +125,13 @@ print(repair_violation(
      response="She said she would do it before she left.",
      violation_type="manner"
  ))
- # → "Alice confirmed she would complete the project before leaving the office."
  ```

  ---

  ## Performance

- **Violation removal rate: 93.0%** (
+ **Violation removal rate: 93.0%** (post-fix evaluation)

  Per-maxim BLEU scores on the repair validation set (N=401):

@@ -158,7 +140,7 @@ Per-maxim BLEU scores on the repair validation set (N=401):
  | Quality | **97.8%** | Near-perfect factual correction |
  | Manner | **92.5%** | Strong clarity improvements |
  | Quantity | 61.8% | Harder - requires insertions/deletions |
- | Relation | N/A | Route to FAISS retrieval
+ | Relation | N/A | Route to FAISS retrieval |

  **Degeneracy fix (before vs. after violation-type-aware decoding):**

@@ -168,8 +150,6 @@ Per-maxim BLEU scores on the repair validation set (N=401):
  | Manner | 93.3% degenerate | 4.5% | **-88.8pp** |
  | Overall | 64.4% degenerate | 5.2% | **-59.2pp** |

- > **Key lesson:** Beam search produces mode-collapsed outputs for Manner repairs (model inserts `!` as a proxy for "clarity"). Nucleus sampling eliminates this.
-
  ---

  ## Architecture & Training
@@ -207,7 +187,7 @@ Relation violations mean the *entire response* is off-topic - there is nothing

  - **Hallucination Risk:** Like all seq2seq models, T5 can occasionally introduce factual errors during repair. Always use the "Quality" detector after repair to verify.
  - **Dependency on Context:** Repair quality is heavily dependent on the provided "Context" being accurate and sufficient.
- - **Mode Collapse:** Avoid using beam search for "Manner" repairs
+ - **Mode Collapse:** Avoid using beam search for "Manner" repairs.

  ---

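The decoding table and the `repair_violation` hunks above choose generation settings by violation type. The sketch below restates that routing as a small lookup, using only the parameter values visible in the diff; arguments on lines elided between hunks are not reproduced. The names `BEAM_KWARGS`, `SAMPLING_KWARGS` and `decoding_kwargs` are hypothetical, and raising `ValueError` is a substitution for the README's `assert`, which is skipped when Python runs with `-O`.

```python
# Settings copied from the visible diff lines; anything elided between hunks is omitted.
BEAM_KWARGS = {"num_beams": 4, "max_length": 128, "min_length": 8}
SAMPLING_KWARGS = {
    "do_sample": True,
    "temperature": 0.85,
    "top_p": 0.92,
    "repetition_penalty": 1.5,
    "no_repeat_ngram_size": 3,
}


def decoding_kwargs(violation_type: str) -> dict:
    """Return generate() keyword arguments for a violation type (hypothetical helper)."""
    kind = violation_type.lower()
    if kind == "manner":
        return dict(SAMPLING_KWARGS)  # nucleus sampling for Manner repairs
    if kind in ("quantity", "quality"):
        return dict(BEAM_KWARGS)  # beam search for Quantity and Quality repairs
    if kind == "relation":
        # The model card routes Relation violations to retrieval instead of editing.
        raise ValueError("Relation violations are handled by retrieval, not by this model.")
    raise ValueError(f"Unknown violation type: {violation_type!r}")


print(decoding_kwargs("manner"))
```

With a loaded model and tokenized `inputs`, the result could be passed along as `model.generate(**inputs, **decoding_kwargs("quality"))`.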