Commit 77b80e2 by Amit5674 · verified · 1 Parent(s): b2f876b

Update README.md

Files changed (1)
  1. README.md +0 -127
README.md CHANGED
@@ -120,133 +120,6 @@ print(f"Probabilities: {probabilities}")For detailed inference examples, see the
 
  ## Citation
 
- @misc{hebrew_binary_nli_classifier,
- title={Hebrew Binary NLI Classifier for Factuality Checking},
- author={Your Name},
- year={2025},
- publisher={Hugging Face}
- }---
- license: apache-2.0
- language:
- - he
- base_model:
- - dicta-il/neodictabert
- tags:
- - nli
- - natural-language-inference
- - hebrew
- - fact-checking
- - contradiction-detection
- pipeline_tag: text-classification
- library_name: transformers
- metrics:
- - accuracy
- - f1
- ---
-
- # Hebrew Binary NLI Classifier for Factuality Checking
-
- ## Model Description
-
- Fine-tuned [dicta-il/neodictabert](https://huggingface.co/dicta-il/neodictabert) for binary Natural Language Inference in Hebrew. Detects whether a summary claim contradicts a source article.
-
- **Task:** Entailment vs Contradiction Detection
- **Language:** Hebrew
- **Max Context:** 4,096 tokens
-
- ## Performance
-
- - **Accuracy:** 96.78%
- - **F1 Score:** 96.20%
-
- ## Architecture
-
- - **Base Model:** `dicta-il/neodictabert`
- - **Classification Head:** Binary (softmax over 2 classes)
- - **Input Format:** `[CLS] source_article [SEP] summary_claim [SEP]`
- - **Output:** Probability distribution over [contradiction, entailment]
-
- ## Training Configuration
-
- - **Learning Rate:** 2e-5
- - **Epochs:** 2
- - **Batch Size:** 2 per device (effective: 16 with gradient accumulation)
- - **Max Sequence Length:** 4,096 tokens
- - **Learning Rate Scheduler:** Linear
- - **Warmup Steps:** 500
- - **Best Model Selection:** Based on eval_f1
-
- ## Usage
-
- from transformers import AutoTokenizer, AutoModelForSequenceClassification
- import torch
-
- model_name = "Amit5674/NLI-hebrew-binary-correctness-metric"
- tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
- model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True)
- model.eval()
-
- # Example usage
- article = "ישראל התחילה בהרעשה רגע אחרי הפסקת האש. הממשלה הודיעה על צעדים חדשים..."
- summary = "ישראל התחילה להתרגש רגע אחרי הפסקת האש"
-
- # Tokenize
- inputs = tokenizer(
-     article,
-     summary,
-     return_tensors="pt",
-     padding="max_length",
-     max_length=4096,
-     truncation=True
- )
-
- # Predict
- with torch.no_grad():
-     outputs = model(**inputs)
-     logits = outputs.logits[0]
-     probs = torch.softmax(logits, dim=-1)
-     predicted_class_idx = torch.argmax(probs).item()
-     predicted_class = model.config.id2label[predicted_class_idx]
-     confidence = probs[predicted_class_idx].item()
-
- probabilities = {
-     model.config.id2label[i]: float(probs[i].item())
-     for i in range(model.config.num_labels)
- }
-
- print(f"Prediction: {predicted_class}")
- print(f"Confidence: {confidence:.4f}")
- print(f"Probabilities: {probabilities}")For detailed inference examples, see the inference scripts and server API documentation.
-
- ## Input Format
-
- - **Premise:** Source article text (full document)
- - **Hypothesis:** Summary claim (can be full summary or individual claim)
- - **Processing:** Binary classification (entailment vs contradiction)
-
- ## Output Format
-
- - **Prediction:** String label (`"entailment"` or `"contradiction"`)
- - **Confidence:** Probability of predicted class (0.0 to 1.0)
- - **Probabilities:** Dictionary with probabilities for both classes:
-   - `{"entailment": 0.9678, "contradiction": 0.0322}`
-
- ## Use Cases
-
- - **Production Fact-Checking:** Fast yes/no contradiction detection for Hebrew summaries
- - **Quality Control:** Automated validation of summary factuality
- - **Batch Processing:** Efficient processing of large document-summary pairs
- - **Real-Time Validation:** Low-latency factuality checking in summary generation pipelines
-
- ## Limitations
-
- - Max sequence length: 4,096 tokens (may truncate very long articles)
- - Binary classification: Cannot identify specific error types (use multi-label models for detailed error analysis)
- - Context dependency: Performance may vary with article length and complexity
- - Hebrew-specific: Optimized for Hebrew text; may not generalize to other languages
-
- ## Citation
-
  @misc{hebrew_binary_nli_classifier,
  title={Hebrew Binary NLI Classifier for Factuality Checking},
  author={Your Name},
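
Note on the removed "Usage" and "Output Format" sections: the post-processing they describe (softmax over two logits, then a prediction, a confidence, and a per-class probability dictionary) can be sketched without loading the model. This is a minimal illustration, not the repository's code. The helper name `format_prediction`, the `ID2LABEL` mapping, and the example logits are assumptions; the label order follows the "[contradiction, entailment]" bullet in the removed text, and the authoritative mapping is whatever `model.config.id2label` reports.

```python
import math

# Assumed label order, taken from the removed README's "Output" bullet.
# The real mapping should be read from model.config.id2label.
ID2LABEL = {0: "contradiction", 1: "entailment"}


def format_prediction(logits, id2label=ID2LABEL):
    """Turn raw two-class logits into prediction, confidence, and probabilities."""
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Index of the most probable class (first index wins on a tie).
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "prediction": id2label[best],
        "confidence": probs[best],
        "probabilities": {id2label[i]: p for i, p in enumerate(probs)},
    }


# Made-up logits, for illustration only.
result = format_prediction([-1.2, 2.3])
print(result["prediction"], round(result["confidence"], 4))
```

With a real model, the input would be `outputs.logits[0].tolist()` from the removed snippet.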
 