
sumitaryal committed on Oct 4, 2024

Commit d5a6ec0 · verified · 1 Parent(s): 705bb0a

Create README.md

Files changed (1)

README.md +84 -0

	@@ -0,0 +1,84 @@

+---
+license: apache-2.0
+datasets:
+- sumitaryal/nepali_grammatical_error_detection
+language:
+- ne
+metrics:
+- accuracy
+base_model:
+- google/muril-base-cased
+pipeline_tag: text-classification
+---
+# Model Card for Nepali Grammatical Error Detection (MuRIL)
+This model is designed for the **Nepali Grammatical Error Detection (GED)** task. It fine-tunes the BERT-based MuRIL model to classify whether a Nepali sentence contains grammatical errors.
+## Model Details
+### Model Description
+- **Developed by:** Sumit Aryal
+- **Model type:** BERT (MuRIL-based)
+- **Language(s):** Nepali
+- **License:** Apache 2.0
+- **Finetuned from model:** google/muril-base-cased
+### Dataset
+- **Dataset Name:** [Nepali Grammatical Error Detection Dataset](https://huggingface.co/datasets/sumitaryal/nepali_grammatical_error_detection)
+- **Description:** The training set pairs **2,568,682** correctly constructed sentences with their erroneous counterparts, for a total of **7,514,122** samples. The validation set contains **365,606** correct sentences and **405,905** incorrect sentences. The errors cover several types, including verb inflections, homophones, punctuation errors, and sentence structure issues.
+### Model Sources
+- **Repository:** [Nepali Grammatical Error Detection MuRIL](https://huggingface.co/sumitaryal/Nepali_Grammatical_Error_Detection_MuRIL)
+- **Paper:** "BERT-Based Nepali Grammatical Error Detection and Correction Leveraging a New Corpus" (INSPECT-2024)
+## Uses
+### Direct Use
+- Grammar checking for written Nepali text.
+## Evaluation Metrics
+- **Accuracy:** 91.1515%
+- **Training Loss:** 0.242700
+- **Validation Loss:** 0.217756
+## How to Get Started with the Model
+Use the code below to get started with the model.
+```python
+import torch
+from transformers import AutoTokenizer, BertForSequenceClassification
+
+model_id = "sumitaryal/Nepali_Grammatical_Error_Detection_MuRIL"
+model = BertForSequenceClassification.from_pretrained(model_id)
+# MuRIL is a cased model, so keep the original casing
+tokenizer = AutoTokenizer.from_pretrained(model_id, do_lower_case=False)
+
+input_sentence = "रामले भात खायो ।"
+inputs = tokenizer(input_sentence, return_tensors="pt")
+
+# Inference only, so skip gradient tracking
+with torch.no_grad():
+    logits = model(**inputs).logits
+
+# logits has shape (1, num_labels); pick the highest-scoring class
+predicted_class_id = logits.argmax(dim=-1).item()
+predicted_class = model.config.id2label[predicted_class_id]
+print(f'The sentence "{input_sentence}" is "{predicted_class}"')
+```
+## Training Details
+- Framework: PyTorch
+- Hyperparameters:
+  - Epoch = 1
+  - Train Batch Size = 256
+  - Valid Batch Size = 256
+  - Loss Function = Cross Entropy Loss
+  - Optimizer = AdamW
+  - Optimizer Parameters:
+    - Learning Rate = 5e-5
+    - β1 = 0.9
+    - β2 = 0.999
+    - ϵ = 1e-8
+- GPU = NVIDIA® GeForce® RTX™ 4060 GPU, 8GB VRAM
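
As a rough sanity check on the figures in the README above, the sketch below works out how many optimizer steps one epoch implies and what a majority-class baseline would score on the validation split. It uses only the sample counts, batch size, and epoch count quoted in the card. Two things are assumptions, not statements from the card: that the final partial batch is kept rather than dropped, and that the reported accuracy is comparable to a baseline computed on the validation split. The helper name `steps_per_epoch` is hypothetical.

```python
import math

# Figures quoted in the model card.
TRAIN_SAMPLES = 7_514_122
VALID_CORRECT = 365_606
VALID_INCORRECT = 405_905
BATCH_SIZE = 256
EPOCHS = 1


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Batches needed to cover the data once.

    Assumption: the last partial batch is kept (not dropped).
    """
    return math.ceil(num_samples / batch_size)


valid_samples = VALID_CORRECT + VALID_INCORRECT
train_steps = steps_per_epoch(TRAIN_SAMPLES, BATCH_SIZE) * EPOCHS
valid_steps = steps_per_epoch(valid_samples, BATCH_SIZE)

# Accuracy of always predicting the more frequent validation label.
majority_baseline = max(VALID_CORRECT, VALID_INCORRECT) / valid_samples

print(f"training steps: {train_steps}")
print(f"validation samples: {valid_samples}, batches: {valid_steps}")
print(f"majority-class baseline: {majority_baseline:.2%}")
```

Under these assumptions, a single epoch comes to roughly 29 thousand optimizer steps, and the validation split is close to balanced, so the reported accuracy of 91.1515% sits well above what a constant prediction would achieve.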