DipeshChaudhary committed on
Commit 1e829c4 · verified · 1 Parent(s): a04e01b

Training in progress, step 20000
README.md CHANGED
@@ -1,462 +1,95 @@
  ---
- language: ne
- license: apache-2.0
- tags:
- - nepali
- - grammatical-error-detection
- - text-classification
- - roberta
- - sequence-classification
- - nlp
- datasets:
- - sumitaryal/nepali_grammatical_error_detection
  base_model: IRIIS-RESEARCH/RoBERTa_Nepali_125M
  metrics:
  - accuracy
- - f1
  - precision
  - recall
- pipeline_tag: text-classification
- widget:
- - text: "म विद्यालय जान्छु।"
-   example_title: "Correct Nepali"
- - text: "म विद्यालय जान्छ।"
-   example_title: "Grammatical Error"
- ---
-
- # RoBERTa Nepali Grammatical Error Detection
-
- This model is a fine-tuned version of [IRIIS-RESEARCH/RoBERTa_Nepali_125M](https://huggingface.co/IRIIS-RESEARCH/RoBERTa_Nepali_125M) trained to detect grammatical errors in Nepali text. It was trained on an NVIDIA H100 GPU with several throughput optimizations.
-
- ## Model Description
-
- - **Model Type:** Binary Text Classification (Sequence Classification)
- - **Language:** Nepali (ne)
- - **Base Model:** IRIIS-RESEARCH/RoBERTa_Nepali_125M (125M parameters)
- - **License:** Apache 2.0
- - **Training Infrastructure:** NVIDIA H100 (80GB)
- - **Training Time:** ~3 hours
- - **Fine-tuning Dataset:** [sumitaryal/nepali_grammatical_error_detection](https://huggingface.co/datasets/sumitaryal/nepali_grammatical_error_detection)
-
- ## Performance Metrics
-
- Evaluated on a validation set of 771,511 samples:
-
- | Metric | Score |
- |--------|-------|
- | Accuracy | 0.9234 |
- | F1 Score | 0.9156 |
- | Precision | 0.9087 |
- | Recall | 0.9226 |
-
- ### Class-wise Performance
-
- | Class | Precision | Recall | F1-Score |
- |-------|-----------|--------|----------|
- | Correct | 0.9321 | 0.9145 | 0.9232 |
- | Incorrect | 0.8853 | 0.9307 | 0.9074 |
-
- ## Training Details
-
- ### Training Data
-
- - **Training Samples:** 10,082,804
- - **Validation Samples:** 771,511
- - **Total Dataset Size:** ~10.8M Nepali sentences
- - **Label Distribution:** Balanced mix of grammatically correct and incorrect sentences
-
- ### Training Configuration
-
- - **GPU:** NVIDIA H100 (80GB VRAM)
- - **Precision:** BF16 (Brain Floating Point 16-bit)
- - **Batch Size:** 128 per device
- - **Gradient Accumulation:** 2 steps (effective batch size: 256)
- - **Learning Rate:** 2e-5 with 10% warmup
- - **Optimizer:** AdamW (Fused)
- - **Weight Decay:** 0.01
- - **Epochs:** 3
- - **Max Sequence Length:** 256 tokens
- - **Parallel Processing:** 26 CPU cores
-
- ### Optimization Techniques
-
- - BF16 mixed precision training
- - Fused AdamW optimizer for faster updates
- - Group-by-length batching to minimize padding
- - Pin memory and prefetching for faster data loading
- - Multi-process tokenization (26 workers; see the sketch below)
-
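The multi-process tokenization step above might look like the following minimal sketch. It assumes the standard `datasets`/`transformers` APIs; the split layout and the `text` column name are illustrative assumptions, not the actual training script:

```python
# Minimal sketch of multi-process tokenization, mirroring the settings listed
# above. The "text" column name is an assumption about the dataset schema.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("IRIIS-RESEARCH/RoBERTa_Nepali_125M")
dataset = load_dataset("sumitaryal/nepali_grammatical_error_detection")

def tokenize(batch):
    # Truncate to the 256-token maximum noted in the training configuration.
    return tokenizer(batch["text"], truncation=True, max_length=256)

# batched=True tokenizes many rows per call; num_proc=26 mirrors the
# 26 CPU workers mentioned above.
tokenized = dataset.map(tokenize, batched=True, num_proc=26)
```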
-
- ## Usage
-
- ### Quick Start
-
- ```python
- from transformers import AutoTokenizer, AutoModelForSequenceClassification
- import torch
-
- # Load model and tokenizer
- model_name = "DipeshChaudhary/roberta-nepali-sequence-ged"
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForSequenceClassification.from_pretrained(model_name)
-
- # Function to check grammar
- def check_grammar(sentence):
-     inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=256)
-
-     with torch.no_grad():
-         outputs = model(**inputs)
-         probs = torch.softmax(outputs.logits, dim=-1)
-
-     pred_class = probs.argmax().item()
-     confidence = probs[0][pred_class].item()
-
-     return {
-         "label": "correct" if pred_class == 0 else "incorrect",
-         "confidence": confidence,
-         "probabilities": {
-             "correct": probs[0][0].item(),
-             "incorrect": probs[0][1].item()
-         }
-     }
-
- # Example usage
- result = check_grammar("म विद्यालय जान्छु।")
- print(result)
- # Output: {'label': 'correct', 'confidence': 0.9876, 'probabilities': {'correct': 0.9876, 'incorrect': 0.0124}}
-
- result = check_grammar("म विद्यालय जान्छ।")
- print(result)
- # Output: {'label': 'incorrect', 'confidence': 0.9543, 'probabilities': {'correct': 0.0457, 'incorrect': 0.9543}}
- ```
-
- ### Batch Processing
-
- ```python
- def check_grammar_batch(sentences):
-     inputs = tokenizer(sentences, return_tensors="pt", truncation=True,
-                        max_length=256, padding=True)
-
-     with torch.no_grad():
-         outputs = model(**inputs)
-         probs = torch.softmax(outputs.logits, dim=-1)
-
-     results = []
-     for i, sentence in enumerate(sentences):
-         pred_class = probs[i].argmax().item()
-         results.append({
-             "sentence": sentence,
-             "label": "correct" if pred_class == 0 else "incorrect",
-             "confidence": probs[i][pred_class].item()
-         })
-
-     return results
-
- # Process multiple sentences
- sentences = [
-     "तिमी कस्तो छौ?",
-     "नेपाल सुन्दर देश हो।",
-     "उनीहरू काम गर्दछन्।"
- ]
-
- results = check_grammar_batch(sentences)
- for result in results:
-     print(f"{result['sentence']} → {result['label']} ({result['confidence']:.4f})")
- ```
-
- ### Using Pipeline API
-
- ```python
- from transformers import pipeline
-
- # Create classifier pipeline
- classifier = pipeline(
-     "text-classification",
-     model="DipeshChaudhary/roberta-nepali-sequence-ged",
-     device=0  # GPU; use device=-1 to run on CPU
- )
-
- # Check grammar
- result = classifier("म विद्यालय जान्छु।")
- print(result)
- # Output: [{'label': 'correct', 'score': 0.9876}]
- ```
-
- ## Use Cases
-
- ### 1. Writing Assistant for Nepali
-
- ```python
- def writing_assistant(text):
-     # Check and highlight grammatical errors in Nepali text
-     sentences = text.split('।')  # Split by the Nepali sentence delimiter
-     sentences = [s.strip() + '।' for s in sentences if s.strip()]
-
-     results = check_grammar_batch(sentences)
-
-     print("Grammar Check Results:")
-     print("=" * 60)
-     for i, result in enumerate(results, 1):
-         status = "✓" if result['label'] == 'correct' else "✗"
-         print(f"{status} Sentence {i}: {result['sentence']}")
-         if result['label'] == 'incorrect':
-             print(f"   └─ Potential grammar error (confidence: {result['confidence']:.2%})")
-
-     error_count = sum(1 for r in results if r['label'] == 'incorrect')
-     print(f"\nSummary: {error_count}/{len(results)} sentences may contain errors")
-
-     return results
-
- # Example
- text = "म विद्यालय जान्छु। तिमी कस्तो छौ? उनीहरू काम गर्दछन्।"
- writing_assistant(text)
- ```
-
- ### 2. Educational Application
-
- ```python
- def nepali_grammar_quiz(student_answer, correct_answer):
-     result = check_grammar(student_answer)
-
-     if result['label'] == 'correct':
-         print("✓ Excellent! Your sentence is grammatically correct.")
-         print(f"  Confidence: {result['confidence']:.2%}")
-     else:
-         print("✗ There might be a grammatical error.")
-         print(f"  Confidence: {result['confidence']:.2%}")
-         print(f"  Hint: Compare with the correct form: {correct_answer}")
-
-     return result
-
- # Example quiz question
- nepali_grammar_quiz(
-     student_answer="म स्कूल जान्छ।",
-     correct_answer="म स्कूल जान्छु।"
- )
- ```
-
- ### 3. Content Quality Control
-
- ```python
- def validate_nepali_content(content, threshold=0.85):
-     """Validate the grammar quality of Nepali content."""
-     sentences = content.split('।')
-     sentences = [s.strip() + '।' for s in sentences if s.strip()]
-
-     results = check_grammar_batch(sentences)
-
-     # Calculate quality score
-     correct_count = sum(1 for r in results if r['label'] == 'correct')
-     quality_score = correct_count / len(results)
-
-     return {
-         "passed": quality_score >= threshold,
-         "quality_score": quality_score,
-         "total_sentences": len(results),
-         "correct_sentences": correct_count,
-         "error_sentences": len(results) - correct_count,
-         "details": results
-     }
-
- # Example
- content = "नेपाल सुन्दर देश हो। यहाँ धेरै हिमाल छन्।"
- validation = validate_nepali_content(content)
- print(f"Quality Score: {validation['quality_score']:.2%}")
- print(f"Status: {'PASSED' if validation['passed'] else 'NEEDS REVIEW'}")
- ```
-
- ### 4. Real-time Text Editor Integration
-
- ```python
- class NepaliGrammarChecker:
-     def __init__(self, model_name="DipeshChaudhary/roberta-nepali-sequence-ged"):
-         self.tokenizer = AutoTokenizer.from_pretrained(model_name)
-         self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
-         self.model.eval()
-
-     def check_realtime(self, text):
-         """Check grammar and return error positions for highlighting."""
-         sentences = text.split('।')
-         sentences = [s.strip() for s in sentences if s.strip()]
-
-         errors = []
-         position = 0
-
-         for sentence in sentences:
-             result = check_grammar(sentence + '।')
-
-             if result['label'] == 'incorrect':
-                 errors.append({
-                     "sentence": sentence,
-                     "start": position,
-                     "end": position + len(sentence),
-                     "confidence": result['confidence']
-                 })
-
-             position += len(sentence) + 1  # +1 for '।'
-
-         return errors
-
- # Example: integrate with a text editor
- checker = NepaliGrammarChecker()
- text = "म स्कूल जान्छ। तिमी कस्तो छौ?"
- errors = checker.check_realtime(text)
- print(f"Found {len(errors)} potential errors")
- ```
-
- ## Model Architecture
-
- ```
- RoBERTa Base Architecture
- ├── Embedding Layer (50,256 vocab size)
- ├── 12 Transformer Layers
- │   ├── Multi-Head Self-Attention (12 heads)
- │   ├── Feed-Forward Network (3072 hidden)
- │   └── Layer Normalization
- └── Classification Head
-     ├── Dense Layer (768 → 768)
-     ├── Dropout (0.1)
-     └── Output Layer (768 → 2)
-
- Total Parameters: ~125M
- ```
-
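The summary above can be sanity-checked against the published configuration. A minimal sketch, assuming the standard RoBERTa config fields:

```python
# Minimal sketch: confirm the architecture summary from the model config,
# assuming the standard RoBERTa config attribute names.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("DipeshChaudhary/roberta-nepali-sequence-ged")
print(config.num_hidden_layers)    # transformer layers (12 per the summary)
print(config.num_attention_heads)  # attention heads (12)
print(config.hidden_size)          # hidden size (768)
print(config.intermediate_size)    # feed-forward size (3072)
print(config.vocab_size)           # embedding vocabulary size
print(config.num_labels)           # 2-way classification head
```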
- ## Intended Use
-
- ### Primary Applications
- - **Writing Assistance:** Help writers identify grammatical errors in Nepali text
- - **Educational Tools:** Assist students learning Nepali grammar
- - **Content Quality Control:** Validate grammar in published content
- - **Language Learning Apps:** Provide instant feedback on grammar usage
- - **Translation Post-Editing:** Verify grammar correctness in translated text
-
- ### Target Users
- - Nepali language learners
- - Content creators and writers
- - Educators and students
- - Publishing platforms
- - NLP researchers working on the Nepali language
-
- ## Limitations and Considerations
-
- ### Known Limitations
-
- 1. **Dialectal Variations:** The model is trained primarily on standard Nepali and may not perform optimally on regional dialects
- 2. **Informal Language:** Performance may vary with colloquial or informal Nepali
- 3. **Context Dependency:** Some grammatical errors require broader context than a single sentence
- 4. **Punctuation Sensitivity:** The model considers punctuation as part of grammar checking
- 5. **Domain Specificity:** May not capture domain-specific grammar rules (legal, medical, etc.)
-
- ### Important Considerations
-
- - **False Positives:** The model may occasionally flag correct sentences as incorrect
- - **False Negatives:** Some grammatical errors might not be detected
- - **Not a Grammar Corrector:** This model only detects errors; it does not suggest corrections
- - **Sentence-Level Only:** Designed for sentence-level classification, not word-level error detection
- - **Static Training Data:** Based on data available up to the training cutoff date
-
- ### Best Practices
-
- - Use as an assistive tool, not as the sole authority on grammar
- - Combine with human review for critical content
- - Consider the confidence scores when making decisions
- - Test on your specific domain/use case before deployment
- - Provide user feedback mechanisms to improve over time
-
- ## Technical Specifications
-
- ### Input/Output Format
-
- - **Input:** A single Nepali sentence (max 256 tokens)
- - **Output:** Binary classification (correct/incorrect) with confidence scores
- - **Processing:** Tokenization with the RoBERTa BPE tokenizer
-
- ### Performance Benchmarks
-
- On NVIDIA H100:
- - **Inference Speed:** ~500 sentences/second (batch size 32)
- - **Latency:** <5 ms per sentence (single inference)
- - **Memory:** ~2GB GPU memory (FP16 inference)
-
- ### Deployment Recommendations
-
- - **CPU:** 4+ cores recommended for production
- - **GPU:** Any CUDA-capable GPU (T4, V100, A100, H100)
- - **Memory:** 4GB+ RAM, 2GB+ VRAM
- - **Precision:** FP16 or BF16 for the best speed/memory tradeoff (see the sketch below)
-
- ## Training Infrastructure
388
-
389
- - **GPU:** NVIDIA H100 (80GB HBM3)
390
- - **CPU:** 26 cores
391
- - **RAM:** 200GB+
392
- - **Training Duration:** 3.00 hours
393
- - **Cost:** ~$8.97
394
-
395
- ## Ethical Considerations
396
-
397
- ### Bias and Fairness
398
- - The model reflects patterns in the training data, which may contain biases
399
- - Performance may vary across different writing styles, registers, and demographics
400
- - Users should be aware that "grammatically incorrect" is context-dependent
401
-
402
- ### Privacy
403
- - The model processes text locally and doesn't store user inputs
404
- - For production deployments, implement appropriate data handling policies
405
-
406
- ### Accessibility
407
- - This tool should support, not replace, language learning and education
408
- - Should not be used to discriminate against non-native speakers or learners
409
-
410
- ## Citation
411
-
412
- If you use this model in your research or application, please cite:
413
-
414
- ```bibtex
415
- @misc{roberta-nepali-ged-2024,
416
- author = {Dipesh Chaudhary},
417
- title = {RoBERTa Nepali Grammatical Error Detection},
418
- year = {2024},
419
- publisher = {Hugging Face},
420
- howpublished = {\url{https://huggingface.co/DipeshChaudhary/roberta-nepali-sequence-ged}}
421
- }
422
- ```
423
-
424
- Also cite the base model:
425
-
426
- ```bibtex
427
- @misc{roberta-nepali-125m,
428
- author = {IRIIS Research},
429
- title = {RoBERTa Nepali 125M},
430
- year = {2024},
431
- publisher = {Hugging Face},
432
- howpublished = {\url{https://huggingface.co/IRIIS-RESEARCH/RoBERTa_Nepali_125M}}
433
- }
434
- ```
435
-
436
- ## References
437
-
438
- 1. **Base Model:** [IRIIS-RESEARCH/RoBERTa_Nepali_125M](https://huggingface.co/IRIIS-RESEARCH/RoBERTa_Nepali_125M)
439
- 2. **Dataset:** [sumitaryal/nepali_grammatical_error_detection](https://huggingface.co/datasets/sumitaryal/nepali_grammatical_error_detection)
440
- 3. **RoBERTa Paper:** [Liu et al., 2019 - RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692)
441
- 4. **Transformers Library:** [Hugging Face Transformers](https://github.com/huggingface/transformers)
442
-
443
- ## Contact and Support
444
-
445
- - **Model Repository:** [https://huggingface.co/DipeshChaudhary/roberta-nepali-sequence-ged](https://huggingface.co/DipeshChaudhary/roberta-nepali-sequence-ged)
446
- - **Issues:** Please report issues on the model repository
447
- - **Updates:** Follow the repository for model updates and improvements
448
-
449
- ## License
450
-
451
- This model is released under the Apache 2.0 License. See LICENSE for details.
452
-
453
- ## Acknowledgments
454
-
455
- - IRIIS Research for the pre-trained RoBERTa Nepali model
456
- - Sumit Aryal for the grammatical error detection dataset
457
- - Hugging Face for the Transformers library and model hosting
458
- - The Nepali NLP community for continued support and feedback
459
-
460
  ---
461
 
462
- *Last Updated: November 08, 2025*
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  ---
+ library_name: transformers
  base_model: IRIIS-RESEARCH/RoBERTa_Nepali_125M
+ tags:
+ - generated_from_trainer
  metrics:
  - accuracy
  - precision
  - recall
+ - f1
+ model-index:
+ - name: roberta-nepali-sequence-ged
+   results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # roberta-nepali-sequence-ged
+
+ This model is a fine-tuned version of [IRIIS-RESEARCH/RoBERTa_Nepali_125M](https://huggingface.co/IRIIS-RESEARCH/RoBERTa_Nepali_125M) on an unknown dataset.
+ It achieves the following results on the evaluation set (per-class metrics are sketched below):
+ - Loss: 0.1973
+ - Model Preparation Time: 0.002
+ - Accuracy: 0.9231
+ - Precision: 0.9222
+ - Recall: 0.9326
+ - F1: 0.9274
+ - Precision Correct: 0.9242
+ - Recall Correct: 0.9127
+ - F1 Correct: 0.9184
+ - Precision Incorrect: 0.9222
+ - Recall Incorrect: 0.9326
+ - F1 Incorrect: 0.9274
+
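The headline Precision/Recall/F1 above coincide exactly with the `Incorrect`-class values, which suggests the error class is treated as the positive class. A toy sketch of how such per-class and headline metrics could be computed; scikit-learn usage here is an assumption, since the Trainer's actual `compute_metrics` function is not shown:

```python
# Toy sketch of per-class vs. headline metrics. Labels: 0 = correct, 1 = incorrect.
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 1, 0, 1, 0]  # illustrative labels only
y_pred = [0, 1, 0, 0, 1, 1]

# Per-class values ("Precision Correct", "Precision Incorrect", ...).
precisions, recalls, f1s, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1]
)

# Headline values, treating the incorrect class (label 1) as positive.
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)
```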
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (a reconstruction sketch follows the list):
+ - learning_rate: 2e-05
+ - train_batch_size: 512
+ - eval_batch_size: 1024
+ - seed: 42
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 1024
+ - optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 1000
+ - num_epochs: 2
+ - mixed_precision_training: Native AMP
+
+ ### Training results
66
+
67
+ | Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Accuracy | Precision | Recall | F1 | Precision Correct | Recall Correct | F1 Correct | Precision Incorrect | Recall Incorrect | F1 Incorrect |
68
+ |:-------------:|:------:|:-----:|:---------------:|:----------------------:|:--------:|:---------:|:------:|:------:|:-----------------:|:--------------:|:----------:|:-------------------:|:----------------:|:------------:|
69
+ | 0.2734 | 0.1016 | 1000 | 0.2748 | 0.002 | 0.8894 | 0.8951 | 0.8946 | 0.8949 | 0.8831 | 0.8836 | 0.8833 | 0.8951 | 0.8946 | 0.8949 |
70
+ | 0.2302 | 0.2031 | 2000 | 0.2455 | 0.002 | 0.9026 | 0.9049 | 0.9106 | 0.9078 | 0.9001 | 0.8937 | 0.8969 | 0.9049 | 0.9106 | 0.9078 |
71
+ | 0.2169 | 0.3047 | 3000 | 0.2462 | 0.002 | 0.9016 | 0.8918 | 0.9252 | 0.9082 | 0.9134 | 0.8753 | 0.8939 | 0.8918 | 0.9252 | 0.9082 |
72
+ | 0.2101 | 0.4062 | 4000 | 0.2315 | 0.002 | 0.9086 | 0.9047 | 0.9236 | 0.9140 | 0.9131 | 0.8920 | 0.9024 | 0.9047 | 0.9236 | 0.9140 |
73
+ | 0.2052 | 0.5078 | 5000 | 0.2234 | 0.002 | 0.9124 | 0.9131 | 0.9212 | 0.9171 | 0.9117 | 0.9026 | 0.9071 | 0.9131 | 0.9212 | 0.9171 |
74
+ | 0.2003 | 0.6094 | 6000 | 0.2248 | 0.002 | 0.9100 | 0.9024 | 0.9294 | 0.9157 | 0.9189 | 0.8885 | 0.9034 | 0.9024 | 0.9294 | 0.9157 |
75
+ | 0.1987 | 0.7109 | 7000 | 0.2187 | 0.002 | 0.9131 | 0.9074 | 0.9298 | 0.9184 | 0.9199 | 0.8946 | 0.9071 | 0.9074 | 0.9298 | 0.9184 |
76
+ | 0.1965 | 0.8125 | 8000 | 0.2105 | 0.002 | 0.9180 | 0.9189 | 0.9260 | 0.9224 | 0.9171 | 0.9092 | 0.9131 | 0.9189 | 0.9260 | 0.9224 |
77
+ | 0.1939 | 0.9140 | 9000 | 0.2129 | 0.002 | 0.9166 | 0.9126 | 0.9306 | 0.9215 | 0.9212 | 0.9010 | 0.9110 | 0.9126 | 0.9306 | 0.9215 |
78
+ | 0.1896 | 1.0155 | 10000 | 0.2055 | 0.002 | 0.9198 | 0.9206 | 0.9277 | 0.9241 | 0.9190 | 0.9111 | 0.9150 | 0.9206 | 0.9277 | 0.9241 |
79
+ | 0.1796 | 1.1171 | 11000 | 0.2065 | 0.002 | 0.9188 | 0.9169 | 0.9301 | 0.9234 | 0.9211 | 0.9064 | 0.9137 | 0.9169 | 0.9301 | 0.9234 |
80
+ | 0.1788 | 1.2187 | 12000 | 0.2058 | 0.002 | 0.9192 | 0.9164 | 0.9314 | 0.9238 | 0.9224 | 0.9056 | 0.9139 | 0.9164 | 0.9314 | 0.9238 |
81
+ | 0.1787 | 1.3202 | 13000 | 0.2018 | 0.002 | 0.9212 | 0.9204 | 0.9307 | 0.9255 | 0.9221 | 0.9106 | 0.9163 | 0.9204 | 0.9307 | 0.9255 |
82
+ | 0.1774 | 1.4218 | 14000 | 0.2038 | 0.002 | 0.9206 | 0.9177 | 0.9328 | 0.9252 | 0.9240 | 0.9072 | 0.9155 | 0.9177 | 0.9328 | 0.9252 |
83
+ | 0.1767 | 1.5233 | 15000 | 0.1940 | 0.002 | 0.9251 | 0.9309 | 0.9263 | 0.9286 | 0.9186 | 0.9237 | 0.9211 | 0.9309 | 0.9263 | 0.9286 |
84
+ | 0.1785 | 1.6249 | 16000 | 0.1943 | 0.002 | 0.9245 | 0.9283 | 0.9282 | 0.9283 | 0.9203 | 0.9204 | 0.9204 | 0.9283 | 0.9282 | 0.9283 |
85
+ | 0.1761 | 1.7265 | 17000 | 0.1957 | 0.002 | 0.9237 | 0.9253 | 0.9301 | 0.9277 | 0.9220 | 0.9166 | 0.9193 | 0.9253 | 0.9301 | 0.9277 |
86
+ | 0.176 | 1.8280 | 18000 | 0.1960 | 0.002 | 0.9240 | 0.9253 | 0.9307 | 0.9280 | 0.9225 | 0.9165 | 0.9195 | 0.9253 | 0.9307 | 0.9280 |
87
+ | 0.1761 | 1.9296 | 19000 | 0.1973 | 0.002 | 0.9231 | 0.9222 | 0.9326 | 0.9274 | 0.9242 | 0.9127 | 0.9184 | 0.9222 | 0.9326 | 0.9274 |
88
+
89
+
90
+ ### Framework versions
91
+
92
+ - Transformers 4.57.1
93
+ - Pytorch 2.8.0+cu128
94
+ - Datasets 4.4.1
95
+ - Tokenizers 0.22.1
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f8093edfb19ed3b19d6c9432977e8120fc673a07c82522e76d4cc51ab6046c5f
+ oid sha256:3135c543cbee8ced633a4aa5ab8f697ad254757b6ac93a80f3aa0b9a530922af
  size 498585176
runs/Nov08_16-52-16_192-222-52-54/events.out.tfevents.1762620831.192-222-52-54.6829.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0c432bccba24462a0d5a9f13c2cd8a3b9a4b4370e33d50fb15ebdffecb0b1ad1
+ size 5618
runs/Nov08_16-59-39_192-222-52-54/events.out.tfevents.1762621188.192-222-52-54.6829.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a3879cd2b4ded555a76748cd933c8a66c8c5c3d3c150b13f0e77936c00a5917
+ size 5273
runs/Nov08_17-02-08_192-222-52-54/events.out.tfevents.1762621331.192-222-52-54.6829.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e2cd49ac314676e507d681e49e927458a9aa67caf349e347e9590112fa56e91
+ size 5272
runs/Nov08_17-03-01_192-222-52-54/events.out.tfevents.1762621515.192-222-52-54.6829.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cd667e2f3669a6129dfa79bf11ca98127151e593830897a5047f1b29f37d2444
+ size 5632
runs/Nov08_17-06-03_192-222-52-54/events.out.tfevents.1762621569.192-222-52-54.6829.4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0866006b75113baba648956bb93673b01e9e03642a092eb8275ee6579bc74fdb
+ size 5487
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f459c9b427a544ee2bf4d9385b392338c599e2c2cb6e5b7dd8714d246e41e30a
+ oid sha256:95b36cd54d1ba2203f0fa90fc777ed7cf1bb94f9720246f2b7bbed2dd99b6ba6
  size 5841