	---
	license: apache-2.0
	datasets:
	- sumitaryal/nepali_grammatical_error_detection
	language:
	- ne
	metrics:
	- accuracy
	base_model:
	- google/muril-base-cased
	pipeline_tag: text-classification
	widget:
	- text: रामले भात खायो ।
	  example_title: Sample 1
	new_version: sumitaryal/Nepali_Grammatical_Error_Detection_MuRIL
	library_name: transformers
	---

	# Model Card for Nepali Grammatical Error Detection (MuRIL)

	This model is designed for the Nepali Grammatical Error Detection (GED) task. It uses the BERT-based MuRIL model, fine-tuned for sequence classification, to detect grammatical errors in Nepali text.

	## Model Details

	### Model Description

	- Developed by: Sumit Aryal
	- Model type: BERT (MuRIL-based)
	- Language(s): Nepali
	- License: Apache 2.0
	- Finetuned from model: google/muril-base-cased

	### Dataset

	- Dataset Name: [Nepali Grammatical Error Detection Dataset](https://huggingface.co/datasets/sumitaryal/nepali_grammatical_error_detection)
	- Description: The training set is built from 2,568,682 correctly constructed sentences and their erroneous counterparts, for a total of 7,514,122 samples. The validation set contains 365,606 correct sentences and 405,905 incorrect sentences. The errors cover verb inflections, homophones, punctuation, and sentence structure.
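
To put the validation counts in context, the short calculation below derives the validation-set size and the accuracy a trivial majority-class predictor would reach. It is a sketch that uses only the two counts quoted above; the variable names are illustrative.

```python
# Validation-set counts quoted in the dataset description.
val_correct = 365_606
val_incorrect = 405_905

val_total = val_correct + val_incorrect
# Accuracy of a predictor that always outputs the more frequent class.
majority_baseline = max(val_correct, val_incorrect) / val_total

print(f"validation samples: {val_total:,}")
print(f"majority-class baseline accuracy: {majority_baseline:.2%}")
```

Because the two classes are close to balanced, an accuracy figure on this split is not inflated by class skew.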

	### Model Sources

	- Repository: [Nepali Grammatical Error Detection MuRIL](https://huggingface.co/sumitaryal/Nepali_Grammatical_Error_Detection_MuRIL)
	- Paper: "BERT-Based Nepali Grammatical Error Detection and Correction Leveraging a New Corpus" (INSPECT-2024)

	## Uses

	### Direct Use

	- Grammar checking for written Nepali text.

	## Evaluation Metrics
	- Accuracy: 91.1515%
	- Training Loss: 0.242700
	- Validation Loss: 0.217756
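
One way to read the loss values is to convert them into a probability. Assuming the reported losses are mean cross-entropy values in nats (the default behaviour of PyTorch's `CrossEntropyLoss`, though the card does not say so explicitly), `exp(-loss)` is the geometric mean of the probability the model assigns to the true label. The helper name below is hypothetical.

```python
import math

# Losses reported in the metrics list above.
losses = {"training": 0.242700, "validation": 0.217756}


def geometric_mean_true_class_prob(mean_ce_loss: float) -> float:
    """Return exp(-L), the geometric mean of the probability given to the
    true label, assuming L is a mean cross-entropy measured in nats."""
    return math.exp(-mean_ce_loss)


for split, loss in losses.items():
    print(f"{split}: {geometric_mean_true_class_prob(loss):.3f}")
```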

	## How to Get Started with the Model

	Use the code below to get started with the model.

	```python
	import torch
	from transformers import AutoTokenizer, BertForSequenceClassification

	model_id = "sumitaryal/Nepali_Grammatical_Error_Detection_MuRIL"
	model = BertForSequenceClassification.from_pretrained(model_id)
	tokenizer = AutoTokenizer.from_pretrained(model_id, do_lower_case=False)

	input_sentence = "रामले भात खायो ।"
	inputs = tokenizer(input_sentence, return_tensors="pt")

	# Inference only, so skip gradient tracking.
	with torch.no_grad():
	    logits = model(**inputs).logits

	# Pick the highest-scoring class and map it to its label name.
	predicted_class_id = logits.argmax(dim=-1).item()
	predicted_class = model.config.id2label[predicted_class_id]
	print(f'The sentence "{input_sentence}" is "{predicted_class}"')
	```
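
The snippet above reports only the arg-max label. If a confidence score is also wanted, the logits can be passed through a softmax. The sketch below does this in plain Python for an illustrative pair of logits; the values are made up and are not real model output. With the tensors from the snippet above, `torch.softmax(logits, dim=-1)` would do the same job.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a two-class model; not real model output.
example_logits = [-1.2, 2.3]
probs = softmax(example_logits)

# Index of the most probable class, analogous to logits.argmax().
predicted_class_id = max(range(len(probs)), key=probs.__getitem__)
print(predicted_class_id, [round(p, 3) for p in probs])
```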

	## Training Details
	- Framework: PyTorch
	- Hyperparameters:
	  - Epochs = 1
	  - Train Batch Size = 256
	  - Valid Batch Size = 256
	  - Loss Function = Cross-Entropy Loss
	  - Optimizer = AdamW
	  - Optimizer Parameters:
	    - Learning Rate = 5e-5
	    - β1 = 0.9
	    - β2 = 0.999
	    - ϵ = 1e-8
	- GPU = NVIDIA® GeForce® RTX™ 4060 GPU, 8 GB VRAM
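
As a rough illustration of what these settings imply, the sketch below computes the number of optimizer steps in the single epoch and applies one AdamW update to a scalar parameter using the listed learning rate, betas, and epsilon. Several things here are assumptions rather than facts from this card: the final partial batch is kept, no learning-rate schedule is applied, and weight decay is left at 0.0 because the card does not state it. `adamw_step` is a hypothetical helper, not the training code.

```python
import math

# Hyperparameters listed above.
LR, BETA1, BETA2, EPS = 5e-5, 0.9, 0.999, 1e-8
TRAIN_SAMPLES, BATCH_SIZE = 7_514_122, 256

# One epoch over the training set, assuming the last partial batch is kept.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)


def adamw_step(param, grad, m, v, step, weight_decay=0.0):
    """One AdamW update for a single scalar parameter.

    weight_decay is an assumption: the card does not state it.
    """
    param -= LR * weight_decay * param          # decoupled weight decay
    m = BETA1 * m + (1 - BETA1) * grad          # first-moment estimate
    v = BETA2 * v + (1 - BETA2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - BETA1 ** step)             # bias correction
    v_hat = v / (1 - BETA2 ** step)
    param -= LR * m_hat / (math.sqrt(v_hat) + EPS)
    return param, m, v


new_param, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, step=1)
print(steps_per_epoch, new_param)
```

On the first step the bias correction cancels the moment decay, so the parameter moves by almost exactly the learning rate regardless of the gradient's magnitude.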