Pushkar27 committed on
Commit
fa4778c
·
1 Parent(s): 3fb9bfb

CRITICAL: Remove all escaped underscores from YAML metadata
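One way to check a model card for the problem named in the commit title is to scan the YAML front matter for backslash-escaped underscores. The following is a minimal sketch, not part of the commit: the helper name `find_escaped_underscores` and the sample card text are hypothetical, and it assumes the front matter is the block delimited by `---` lines at the top of README.md. The diff on this page does not show the escaped form itself, so the sample string is made up for illustration.

```python
def find_escaped_underscores(readme_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs in the YAML front matter that contain '\\_'.

    Hypothetical helper; assumes front matter is delimited by '---' lines.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no front matter block at the top of the file
    hits = []
    for number, line in enumerate(lines[1:], start=2):
        if line.strip() == "---":
            break  # end of front matter; the card body is not checked
        if "\\_" in line:
            hits.append((number, line))
    return hits


# Made-up sample card: one escaped underscore in the metadata, one in the body.
sample = (
    "---\n"
    "datasets:\n"
    "- topical\\_chat\n"
    "pipeline_tag: text2text-generation\n"
    "---\n"
    "# Card body with snake\\_case\n"
)
print(find_escaped_underscores(sample))
```

Only the metadata block is scanned, so escaped underscores in the Markdown body are deliberately ignored.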

Files changed (1)
  1. README.md +6 -26
README.md CHANGED
@@ -13,7 +13,7 @@ tags:
  - seq2seq
  - nlp
  datasets:
- - topical_chat
  metrics:
  - bleu
  pipeline_tag: text2text-generation
@@ -26,7 +26,7 @@ model-index:
  name: Gricean Maxim Violation Repair
  dataset:
  name: Topical-Chat (GriceBench repair validation split, N=401)
- type: topical_chat
  split: validation
  metrics:
  - type: bleu
@@ -71,7 +71,7 @@ GriceBench-Repair is a T5-base seq2seq model that rewrites Gricean maxim violati
  | **Quantity** | Beam search (n=4) + length constraints | Needs precise length control |
  | **Quality** | Beam search (n=4) + repetition penalty | Needs factual precision |
  | **Manner** | Nucleus sampling (T=0.85, top-p=0.92) | Needs creative diverse rewrites |
- | **Relation** | NOT this model - use FAISS retrieval | Entire response is off-topic; editing can't fix it |

  **Violation removal rate: 93.0%** (post-fix evaluation, N=200)
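The per-maxim routing in the table above can be sketched as a lookup from violation type to `generate` keyword arguments. This is an illustrative sketch, not the card's code: the helper name `decoding_kwargs` is hypothetical, only values visible in this diff are used, and any arguments on lines the diff elides are left out.

```python
def decoding_kwargs(violation_type: str) -> dict:
    """Map a violation type to generate() keyword arguments (hypothetical helper)."""
    if violation_type == "relation":
        # Per the table, Relation violations are routed to retrieval, not to this model.
        raise ValueError("Relation violations are handled by retrieval, not by the repair model.")
    if violation_type == "manner":
        # Nucleus sampling settings visible in the diff.
        return {
            "do_sample": True,
            "temperature": 0.85,
            "top_p": 0.92,
            "repetition_penalty": 1.5,
            "no_repeat_ngram_size": 3,
        }
    if violation_type in ("quantity", "quality"):
        # Beam search settings visible in the diff.
        return {"num_beams": 4, "max_length": 128, "min_length": 8}
    raise ValueError(f"Unknown violation type: {violation_type!r}")


print(decoding_kwargs("manner"))
```

With a loaded model and tokenized `inputs`, the result would be passed as `model.generate(**inputs, **decoding_kwargs(violation_type))`. Raising `ValueError` instead of using `assert` keeps the Relation check active even when Python runs with assertions disabled.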
@@ -89,17 +89,6 @@ model = T5ForConditionalGeneration.from_pretrained(model_name)
  model.eval()

  def repair_violation(context: str, response: str, violation_type: str) -> str:
-     """
-     Repair a Gricean maxim violation.
-
-     Args:
-         context: Conversation history
-         response: The violating response to fix
-         violation_type: One of "quantity", "quality", "manner"
-             (Relation → use FAISS retrieval instead)
-     Returns:
-         Rewritten cooperative response string
-     """
      assert violation_type in ["quantity", "quality", "manner"], \
          "Relation violations must use the FAISS retrieval system - not this model."

@@ -108,7 +97,6 @@ def repair_violation(context: str, response: str, violation_type: str) -> str:

      with torch.no_grad():
          if violation_type == "manner":
-             # Nucleus sampling - beam search degenerates for Manner
              output_ids = model.generate(
                  **inputs,
                  do_sample=True, temperature=0.85, top_p=0.92,
@@ -116,7 +104,6 @@ def repair_violation(context: str, response: str, violation_type: str) -> str:
                  repetition_penalty=1.5, no_repeat_ngram_size=3,
              )
          else:
-             # Beam search for precision
              output_ids = model.generate(
                  **inputs,
                  num_beams=4, max_length=128, min_length=8,
@@ -125,16 +112,12 @@ def repair_violation(context: str, response: str, violation_type: str) -> str:

      return tokenizer.decode(output_ids[0], skip_special_tokens=True)

- # ── Examples ──────────────────────────────────────
-
  # Quantity (too short)
  print(repair_violation(
      context="What do you think about commercial space travel?",
      response="It's fine.",
      violation_type="quantity"
  ))
- # → "Commercial space travel has advanced rapidly, with reusable rockets
- #    making orbital access cheaper, though costs remain high for most."

  # Manner (ambiguous pronouns)
  print(repair_violation(
@@ -142,14 +125,13 @@ print(repair_violation(
      response="She said she would do it before she left.",
      violation_type="manner"
  ))
- # → "Alice confirmed she would complete the project before leaving the office."
  ```

  ---

  ## Performance

- **Violation removal rate: 93.0%** (corrected, post-fix evaluation)

  Per-maxim BLEU scores on the repair validation set (N=401):

@@ -158,7 +140,7 @@ Per-maxim BLEU scores on the repair validation set (N=401):
  | Quality | **97.8%** | Near-perfect factual correction |
  | Manner | **92.5%** | Strong clarity improvements |
  | Quantity | 61.8% | Harder - requires insertions/deletions |
- | Relation | N/A | Route to FAISS retrieval - do not use T5 for this |

  **Degeneracy fix (before vs. after violation-type-aware decoding):**

@@ -168,8 +150,6 @@ Per-maxim BLEU scores on the repair validation set (N=401):
  | Manner | 93.3% degenerate | 4.5% | **−88.8pp** |
  | Overall | 64.4% degenerate | 5.2% | **−59.2pp** |

- > **Key lesson:** Beam search produces mode-collapsed outputs for Manner repairs (model inserts `!` as a proxy for "clarity"). Nucleus sampling eliminates this.
-
  ---

  ## Architecture & Training
@@ -207,7 +187,7 @@ Relation violations mean the *entire response* is off-topic - there is nothing

  - **Hallucination Risk:** Like all seq2seq models, T5 can occasionally introduce factual errors during repair. Always use the "Quality" detector after repair to verify.
  - **Dependency on Context:** Repair quality is heavily dependent on the provided "Context" being accurate and sufficient.
- - **Mode Collapse:** Avoid using beam search for "Manner" repairs, as it can lead to repetitive punctuation or symbols.

  ---

 
  - seq2seq
  - nlp
  datasets:
+ - topical-chat
  metrics:
  - bleu
  pipeline_tag: text2text-generation
 
  name: Gricean Maxim Violation Repair
  dataset:
  name: Topical-Chat (GriceBench repair validation split, N=401)
+ type: topical-chat
  split: validation
  metrics:
  - type: bleu
 
  | **Quantity** | Beam search (n=4) + length constraints | Needs precise length control |
  | **Quality** | Beam search (n=4) + repetition penalty | Needs factual precision |
  | **Manner** | Nucleus sampling (T=0.85, top-p=0.92) | Needs creative diverse rewrites |
+ | **Relation** | NOT this model - use FAISS retrieval | Entire response is off-topic; editing cannot fix it |

  **Violation removal rate: 93.0%** (post-fix evaluation, N=200)

 
  model.eval()

  def repair_violation(context: str, response: str, violation_type: str) -> str:
      assert violation_type in ["quantity", "quality", "manner"], \
          "Relation violations must use the FAISS retrieval system - not this model."

 

      with torch.no_grad():
          if violation_type == "manner":
              output_ids = model.generate(
                  **inputs,
                  do_sample=True, temperature=0.85, top_p=0.92,
 
                  repetition_penalty=1.5, no_repeat_ngram_size=3,
              )
          else:
              output_ids = model.generate(
                  **inputs,
                  num_beams=4, max_length=128, min_length=8,
 

      return tokenizer.decode(output_ids[0], skip_special_tokens=True)

  # Quantity (too short)
  print(repair_violation(
      context="What do you think about commercial space travel?",
      response="It's fine.",
      violation_type="quantity"
  ))

  # Manner (ambiguous pronouns)
  print(repair_violation(
 
      response="She said she would do it before she left.",
      violation_type="manner"
  ))
  ```

  ---

  ## Performance

+ **Violation removal rate: 93.0%** (post-fix evaluation)

  Per-maxim BLEU scores on the repair validation set (N=401):

 
  | Quality | **97.8%** | Near-perfect factual correction |
  | Manner | **92.5%** | Strong clarity improvements |
  | Quantity | 61.8% | Harder - requires insertions/deletions |
+ | Relation | N/A | Route to FAISS retrieval |

  **Degeneracy fix (before vs. after violation-type-aware decoding):**

 
  | Manner | 93.3% degenerate | 4.5% | **−88.8pp** |
  | Overall | 64.4% degenerate | 5.2% | **−59.2pp** |
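
The last column of the degeneracy table is a difference in percentage points (after minus before), not a relative change. As a worked check using only the Manner and Overall rows visible in this hunk (the helper name `pp_delta` is hypothetical):

```python
def pp_delta(before_pct: float, after_pct: float) -> float:
    """Change in percentage points (after minus before), rounded to one decimal."""
    return round(after_pct - before_pct, 1)


# (before, after) degenerate-output rates from the two rows shown in the diff.
rows = {"Manner": (93.3, 4.5), "Overall": (64.4, 5.2)}
for name, (before, after) in rows.items():
    print(f"{name}: {pp_delta(before, after):+.1f}pp")
```

The computed deltas agree with the −88.8pp and −59.2pp entries in the table.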

  ---

  ## Architecture & Training
 

  - **Hallucination Risk:** Like all seq2seq models, T5 can occasionally introduce factual errors during repair. Always use the "Quality" detector after repair to verify.
  - **Dependency on Context:** Repair quality is heavily dependent on the provided "Context" being accurate and sufficient.
+ - **Mode Collapse:** Avoid using beam search for "Manner" repairs.

  ---