Pushkar27 committed about 1 month ago

Commit fa4778c

1 parent: 3fb9bfb

CRITICAL: Remove all escaped underscores from YAML metadata
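
The commit message describes removing backslash-escaped underscores from the model card's YAML metadata. A check for that kind of residue could be sketched as below. This is a minimal sketch, not the author's tooling: the function name `find_escaped_underscores` and the sample text (including the `topical\_chat` value) are hypothetical, and the front matter is assumed to be delimited by a leading `---` line and the next `---` line. The diff as rendered here shows `topical_chat` changing to `topical-chat`, so the escaped form in the sample is illustrative only.

```python
def find_escaped_underscores(readme_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs from the YAML front matter
    that contain a backslash-escaped underscore."""
    lines = readme_text.splitlines()
    # Assumption: front matter starts on line 1 with '---'.
    if not lines or lines[0].strip() != "---":
        return []
    hits = []
    for number, line in enumerate(lines[1:], start=2):
        if line.strip() == "---":
            break  # end of front matter; the Markdown body is not scanned
        if "\\_" in line:
            hits.append((number, line))
    return hits


# Hypothetical README text, used only to exercise the function.
sample = (
    "---\n"
    "datasets:\n"
    "- topical\\_chat\n"
    "pipeline_tag: text2text-generation\n"
    "---\n"
    "# Body text with \\_ is ignored\n"
)
print(find_escaped_underscores(sample))
```

Plain underscores such as the one in `pipeline_tag` are not flagged; only the two-character sequence backslash plus underscore is.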

Files changed (1)

README.md +6 -26

@@ -13,7 +13,7 @@ tags:
 - seq2seq
 - nlp
 datasets:
-- topical_chat
 metrics:
 - bleu
 pipeline_tag: text2text-generation
@@ -26,7 +26,7 @@ model-index:
       name: Gricean Maxim Violation Repair
     dataset:
       name: Topical-Chat (GriceBench repair validation split, N=401)
-      type: topical_chat
       split: validation
     metrics:
     - type: bleu
@@ -71,7 +71,7 @@ GriceBench-Repair is a T5-base seq2seq model that rewrites Gricean maxim violati
 | **Quantity** | Beam search (n=4) + length constraints | Needs precise length control |
 | **Quality** | Beam search (n=4) + repetition penalty | Needs factual precision |
 | **Manner** | Nucleus sampling (T=0.85, top-p=0.92) | Needs creative diverse rewrites |
-| **Relation** | NOT this model — use FAISS retrieval | Entire response is off-topic; editing can't fix it |
 **Violation removal rate: 93.0%** (post-fix evaluation, N=200)
@@ -89,17 +89,6 @@ model = T5ForConditionalGeneration.from_pretrained(model_name)
 model.eval()
 def repair_violation(context: str, response: str, violation_type: str) -> str:
-    """
-    Repair a Gricean maxim violation.
-    Args:
-        context:        Conversation history
-        response:       The violating response to fix
-        violation_type: One of "quantity", "quality", "manner"
-                        (Relation → use FAISS retrieval instead)
-    Returns:
-        Rewritten cooperative response string
-    """
     assert violation_type in ["quantity", "quality", "manner"], \
         "Relation violations must use the FAISS retrieval system — not this model."
@@ -108,7 +97,6 @@ def repair_violation(context: str, response: str, violation_type: str) -> str:
     with torch.no_grad():
         if violation_type == "manner":
-            # Nucleus sampling — beam search degenerates for Manner
             output_ids = model.generate(
                 **inputs,
                 do_sample=True, temperature=0.85, top_p=0.92,
@@ -116,7 +104,6 @@ def repair_violation(context: str, response: str, violation_type: str) -> str:
                 repetition_penalty=1.5, no_repeat_ngram_size=3,
             )
         else:
-            # Beam search for precision
             output_ids = model.generate(
                 **inputs,
                 num_beams=4, max_length=128, min_length=8,
@@ -125,16 +112,12 @@ def repair_violation(context: str, response: str, violation_type: str) -> str:
     return tokenizer.decode(output_ids[0], skip_special_tokens=True)
-# ── Examples ────────────────────────────────────────────────────────────────
 # Quantity (too short)
 print(repair_violation(
     context="What do you think about commercial space travel?",
     response="It's fine.",
     violation_type="quantity"
 ))
-# → "Commercial space travel has advanced rapidly, with reusable rockets
-#    making orbital access cheaper, though costs remain high for most."
 # Manner (ambiguous pronouns)
 print(repair_violation(
@@ -142,14 +125,13 @@ print(repair_violation(
     response="She said she would do it before she left.",
     violation_type="manner"
 ))
-# → "Alice confirmed she would complete the project before leaving the office."
 ```
 ---
 ## Performance
-**Violation removal rate: 93.0%** (corrected, post-fix evaluation)
 Per-maxim BLEU scores on the repair validation set (N=401):
@@ -158,7 +140,7 @@ Per-maxim BLEU scores on the repair validation set (N=401):
 | Quality | **97.8%** | Near-perfect factual correction |
 | Manner | **92.5%** | Strong clarity improvements |
 | Quantity | 61.8% | Harder — requires insertions/deletions |
-| Relation | N/A | Route to FAISS retrieval — do not use T5 for this |
 **Degeneracy fix (before vs. after violation-type-aware decoding):**
@@ -168,8 +150,6 @@ Per-maxim BLEU scores on the repair validation set (N=401):
 | Manner | 93.3% degenerate | 4.5% | **−88.8pp** |
 | Overall | 64.4% degenerate | 5.2% | **−59.2pp** |
-> **Key lesson:** Beam search produces mode-collapsed outputs for Manner repairs (model inserts `!` as a proxy for "clarity"). Nucleus sampling eliminates this.
 ---
 ## Architecture & Training
@@ -207,7 +187,7 @@ Relation violations mean the *entire response* is off-topic — there is nothing
 - **Hallucination Risk:** Like all seq2seq models, T5 can occasionally introduce factual errors during repair. Always use the "Quality" detector after repair to verify.
 - **Dependency on Context:** Repair quality is heavily dependent on the provided "Context" being accurate and sufficient.
-- **Mode Collapse:** Avoid using beam search for "Manner" repairs, as it can lead to repetitive punctuation or symbols.
 ---

 - seq2seq
 - nlp
 datasets:
+- topical-chat
 metrics:
 - bleu
 pipeline_tag: text2text-generation
       name: Gricean Maxim Violation Repair
     dataset:
       name: Topical-Chat (GriceBench repair validation split, N=401)
+      type: topical-chat
       split: validation
     metrics:
     - type: bleu
 | **Quantity** | Beam search (n=4) + length constraints | Needs precise length control |
 | **Quality** | Beam search (n=4) + repetition penalty | Needs factual precision |
 | **Manner** | Nucleus sampling (T=0.85, top-p=0.92) | Needs creative diverse rewrites |
+| **Relation** | NOT this model — use FAISS retrieval | Entire response is off-topic; editing cannot fix it |
 **Violation removal rate: 93.0%** (post-fix evaluation, N=200)
 model.eval()
 def repair_violation(context: str, response: str, violation_type: str) -> str:
     assert violation_type in ["quantity", "quality", "manner"], \
         "Relation violations must use the FAISS retrieval system — not this model."
     with torch.no_grad():
         if violation_type == "manner":
             output_ids = model.generate(
                 **inputs,
                 do_sample=True, temperature=0.85, top_p=0.92,
                 repetition_penalty=1.5, no_repeat_ngram_size=3,
             )
         else:
             output_ids = model.generate(
                 **inputs,
                 num_beams=4, max_length=128, min_length=8,
     return tokenizer.decode(output_ids[0], skip_special_tokens=True)
 # Quantity (too short)
 print(repair_violation(
     context="What do you think about commercial space travel?",
     response="It's fine.",
     violation_type="quantity"
 ))
 # Manner (ambiguous pronouns)
 print(repair_violation(
     response="She said she would do it before she left.",
     violation_type="manner"
 ))
 ```
 ---
 ## Performance
+**Violation removal rate: 93.0%** (post-fix evaluation)
 Per-maxim BLEU scores on the repair validation set (N=401):
 | Quality | **97.8%** | Near-perfect factual correction |
 | Manner | **92.5%** | Strong clarity improvements |
 | Quantity | 61.8% | Harder — requires insertions/deletions |
+| Relation | N/A | Route to FAISS retrieval |
 **Degeneracy fix (before vs. after violation-type-aware decoding):**
 | Manner | 93.3% degenerate | 4.5% | **−88.8pp** |
 | Overall | 64.4% degenerate | 5.2% | **−59.2pp** |
 ---
 ## Architecture & Training
 - **Hallucination Risk:** Like all seq2seq models, T5 can occasionally introduce factual errors during repair. Always use the "Quality" detector after repair to verify.
 - **Dependency on Context:** Repair quality is heavily dependent on the provided "Context" being accurate and sufficient.
+- **Mode Collapse:** Avoid using beam search for "Manner" repairs.
 ---
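
The violation-type-aware decoding that the README's table and `repair_violation` describe can be isolated into a small helper, sketched below. This is a sketch under stated assumptions rather than the author's code: the name `generation_kwargs` is hypothetical, and the values are only the arguments visible in the diff hunks above, which elide some lines of each `model.generate` call, so the real calls may pass more.

```python
def generation_kwargs(violation_type: str) -> dict:
    """Pick decoding settings for one violation type.

    Values are copied from the arguments visible in the README diff;
    arguments hidden between diff hunks are not reproduced here.
    """
    if violation_type == "relation":
        # The README routes Relation violations to FAISS retrieval instead.
        raise ValueError(
            "Relation violations must use the FAISS retrieval system, not this model."
        )
    if violation_type == "manner":
        # Nucleus sampling, as listed for Manner in the decoding table.
        return {
            "do_sample": True,
            "temperature": 0.85,
            "top_p": 0.92,
            "repetition_penalty": 1.5,
            "no_repeat_ngram_size": 3,
        }
    if violation_type in ("quantity", "quality"):
        # Beam search, as listed for Quantity and Quality.
        return {"num_beams": 4, "max_length": 128, "min_length": 8}
    raise ValueError(f"Unknown violation type: {violation_type!r}")


print(generation_kwargs("manner"))
print(generation_kwargs("quantity"))
```

With the README's `model` and `inputs` in scope, the result could be passed as `model.generate(**inputs, **generation_kwargs(violation_type))`. The sketch raises `ValueError` where the README uses `assert`, because assertions are skipped when Python runs with `-O`.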