|
|
--- |
|
|
base_model: |
|
|
- LiquidAI/LFM2.5-1.2B-Instruct |
|
|
library_name: peft |
|
|
pipeline_tag: text-generation |
|
|
tags: |
|
|
- lora |
|
|
- peft |
|
|
- qlora |
|
|
- grammar-correction |
|
|
- adapter |
|
|
- adapters |
|
|
license: apache-2.0
language:
- en
datasets:
- jhu-clsp/jfleg
---
|
|
|
|
|
# Model Card for LiquidAI Grammarly (LoRA) |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
This repository contains **LoRA adapter weights** fine-tuned for **English grammar correction**. |
|
|
The adapters are trained on top of the **LiquidAI/LFM2.5-1.2B-Instruct** base model using **QLoRA**. |
|
|
|
|
|
The model is designed to: |
|
|
- Correct grammatical errors |
|
|
- Preserve the original meaning |
|
|
- Minimize unnecessary rewrites |
|
|
|
|
|
This repository **does not contain the base model weights**, only the LoRA adapters. |
|
|
|
|
|
--- |
|
|
## ⚠️ About Hugging Face Auto-Generated Code Snippets |
|
|
|
|
|
Hugging Face may display examples such as: |
|
|
|
|
|
```python |
|
|
pipeline("text-generation", model="arjunverma2004/LiquidAI-grammarly-lora") |
|
|
``` |
|
|
or |
|
|
```python |
|
|
AutoModel.from_pretrained("arjunverma2004/LiquidAI-grammarly-lora") |
|
|
``` |
|
|
|
|
|
These examples are automatically generated by the Hub and **will not work** for this repository, because it contains only LoRA adapter weights rather than a standalone model.
The correct loading code is provided in the *How to Get Started with the Model* section below.
|
|
### Developed by |
|
|
Independent contributor |
|
|
|
|
|
### Funded by |
|
|
Not applicable |
|
|
|
|
|
### Shared by |
|
|
Community contribution |
|
|
|
|
|
### Model type |
|
|
Causal Language Model (LoRA adapters) |
|
|
|
|
|
### Language(s) |
|
|
English |
|
|
|
|
|
### License |
|
|
Apache 2.0 (inherits base model license) |
|
|
|
|
|
### Finetuned from model |
|
|
LiquidAI/LFM2.5-1.2B-Instruct |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Sources |
|
|
|
|
|
- **Repository**: https://huggingface.co/arjunverma2004/LiquidAI-grammarly-lora
- **Base model**: https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct
|
|
- **Paper**: Not available |
|
|
- **Demo**: Not available |
|
|
|
|
|
--- |
|
|
|
|
|
## Uses |
|
|
|
|
|
### Direct Use |
|
|
- English grammar correction |
|
|
- Proofreading short and medium-length texts |
|
|
- Educational and language-learning tools |
|
|
|
|
|
### Downstream Use |
|
|
- Writing assistants |
|
|
- Grammar checking pipelines |
|
|
- Preprocessing text for downstream NLP tasks |
|
|
|
|
|
### Out-of-Scope Use |
|
|
- Content generation beyond grammar correction |
|
|
- Legal, medical, or professional advice |
|
|
- Multilingual grammar correction |
|
|
|
|
|
--- |
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
- The model may reflect biases present in the training data. |
|
|
- It may over-correct stylistic choices in creative writing. |
|
|
- It is optimized for **grammatical correctness**, not factual accuracy. |
|
|
- Performance may degrade on very long or highly technical texts. |
|
|
|
|
|
--- |
|
|
|
|
|
## Recommendations |
|
|
|
|
|
Users should: |
|
|
- Review corrections before final use |
|
|
- Avoid relying on the model for high-stakes or sensitive applications |
|
|
- Combine with human review for best results |
|
|
|
|
|
--- |
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
from peft import PeftModel |
|
|
|
|
|
base_model_name = "LiquidAI/LFM2.5-1.2B-Instruct" |
|
|
adapter_name = "arjunverma2004/LiquidAI-grammarly-lora" |
|
|
|
|
|
# Load base model |
|
|
base_model = AutoModelForCausalLM.from_pretrained( |
|
|
base_model_name, |
|
|
device_map="auto", |
|
|
trust_remote_code=True, |
|
|
) |
|
|
|
|
|
# Attach LoRA adapters |
|
|
model = PeftModel.from_pretrained( |
|
|
base_model, |
|
|
adapter_name, |
|
|
) |
|
|
|
|
|
# Load tokenizer |
|
|
tokenizer = AutoTokenizer.from_pretrained( |
|
|
base_model_name, |
|
|
trust_remote_code=True, |
|
|
) |
|
|
tokenizer.pad_token = tokenizer.eos_token |
|
|
|
|
|
``` |
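For inference with the standard `pipeline` API, the LoRA weights can be merged back into the base model. A minimal sketch using PEFT's `merge_and_unload()`; it defines the `merged_model` variable used in the next snippet:

```python
# Merge the LoRA adapters into the base weights for standalone inference
merged_model = model.merge_and_unload()
```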
|
|
```python |
|
|
from transformers import pipeline |
|
|
|
|
|
# Build the correction prompt using the instruction format from fine-tuning
sentence = """Write this sentence correctly: Here was no promise of morning except that we looked up through the trees we saw how low the forest had swung .
"""
messages = [{"role": "user", "content": sentence}]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Run the fine-tuned model
pipe = pipeline(task="text-generation", model=merged_model, tokenizer=tokenizer)
print(pipe(prompt, max_new_tokens=128)[0]["generated_text"])  # adjust max_new_tokens as needed
|
|
``` |
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
|
|
|
- **JFLEG (JHU Fluency-Extended GUG Corpus)** |
|
|
- Dataset focused on grammatical error correction with multiple human references |
|
|
|
|
|
### Training Procedure |
|
|
|
|
|
#### Preprocessing |
|
|
|
|
|
- Inputs formatted using the base model’s chat template |
|
|
- Each example consists of an erroneous sentence and a corrected version (see the sketch below)
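A minimal sketch of this preprocessing, assuming the `sentence`/`corrections` fields of the JFLEG dataset on the Hub and the instruction wording shown in the inference example above (the exact formatting used during training may differ):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct", trust_remote_code=True)

# JFLEG ships only validation and test splits
dataset = load_dataset("jhu-clsp/jfleg", split="validation")

def to_chat(example):
    # Pair the erroneous sentence with its first human-written correction
    messages = [
        {"role": "user", "content": f"Write this sentence correctly: {example['sentence']}"},
        {"role": "assistant", "content": example["corrections"][0]},
    ]
    # Render with the base model's chat template into a single training string
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

train_dataset = dataset.map(to_chat)
```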
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
|
|
- Training regime: Supervised Fine-Tuning (SFT) |
|
|
- Method: QLoRA |
|
|
- Precision: 4-bit (NF4) |
|
|
- Max sequence length: 512 tokens |
|
|
- Optimizer: AdamW (via TRL) |
|
|
- PEFT: LoRA (see the configuration sketch below)
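A minimal sketch of this training setup; the LoRA rank, alpha, dropout, target modules, batch size, and epoch count below are illustrative assumptions rather than the exact settings used, and argument names can vary across TRL versions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 quantization (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "LiquidAI/LFM2.5-1.2B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# LoRA adapter configuration (values are illustrative)
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # depends on the base architecture
    task_type="CAUSAL_LM",
)

# Supervised fine-tuning via TRL (AdamW is the default optimizer)
training_args = SFTConfig(
    output_dir="lfm-grammar-lora",
    max_seq_length=512,  # may be named max_length in newer TRL versions
    num_train_epochs=1,
    per_device_train_batch_size=4,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # the "text" column from the preprocessing sketch above
    peft_config=peft_config,
)
trainer.train()
```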
|
|
|
|
|
### Speeds, Sizes, Times |
|
|
|
|
|
- Training performed on a single GPU |
|
|
- Lightweight adapter-only training |
|
|
|
|
|
--- |
|
|
|
|
|
## Evaluation |
|
|
|
|
|
### Testing Data |
|
|
|
|
|
- Held-out samples from JFLEG |
|
|
- Custom manually written grammatical error examples |
|
|
|
|
|
### Factors |
|
|
|
|
|
- Error type (tense, agreement, articles, prepositions) |
|
|
- Sentence length |
|
|
- Error density |
|
|
|
|
|
### Metrics |
|
|
|
|
|
- Training loss (cross-entropy) |
|
|
- Qualitative human evaluation |
|
|
- (Optional) GLEU score (see the sketch below)
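GLEU can be approximated with NLTK's implementation (the official JFLEG evaluation uses its own GLEU variant, so treat this only as a rough sketch; the token lists below are illustrative):

```python
from nltk.translate.gleu_score import corpus_gleu

# One list of reference corrections per hypothesis, all pre-tokenized
references = [[["there", "was", "no", "promise", "of", "morning", "."]]]
hypotheses = [["there", "was", "no", "promise", "of", "morning", "."]]

print(corpus_gleu(references, hypotheses))  # 1.0 for an exact match
```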
|
|
|
|
|
### Results |
|
|
|
|
|
- Rapid loss convergence |
|
|
- High-quality grammatical corrections |
|
|
- Minimal semantic drift |
|
|
|
|
|
--- |
|
|
|
|
|
## Summary |
|
|
|
|
|
### Model Examination |
|
|
|
|
|
The model demonstrates strong grammatical correction capabilities while preserving sentence meaning. |
|
|
It performs best on common ESL-style grammatical errors. |
|
|
|
|
|
--- |
|
|
|
|
|
## Environmental Impact |
|
|
|
|
|
- Hardware Type: NVIDIA GPU (single device) |
|
|
- Hours Used: < 5 hours |
|
|
- Cloud Provider: Google Colab |
|
|
- Compute Region: Not specified |
|
|
- Carbon Emitted: Not estimated |
|
|
|
|
|
--- |
|
|
|
|
|
## Technical Specifications |
|
|
|
|
|
### Model Architecture and Objective |
|
|
|
|
|
- Base architecture: Transformer-based causal language model |
|
|
- Objective: Next-token prediction for grammar-corrected text |
|
|
|
|
|
### Compute Infrastructure |
|
|
|
|
|
- Single-GPU training with quantization |
|
|
|
|
|
### Hardware |
|
|
|
|
|
- NVIDIA GPU (Google Colab environment) |
|
|
|
|
|
### Software |
|
|
|
|
|
- Python |
|
|
- PyTorch |
|
|
- Hugging Face Transformers |
|
|
- TRL |
|
|
- PEFT |
|
|
- bitsandbytes |
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
### BibTeX |
|
|
|
|
|
```bibtex |
|
|
@misc{liquidai_grammarly_lora, |
|
|
title={LiquidAI Grammarly LoRA}, |
|
|
author={Anonymous}, |
|
|
year={2026}, |
|
|
  url={https://huggingface.co/arjunverma2004/LiquidAI-grammarly-lora}
|
|
} |
|
|
``` |
|
|
### APA |
|
|
|
|
|
LiquidAI Grammarly LoRA. (2026). Hugging Face. |
|
|
https://huggingface.co/arjunverma2004/LiquidAI-grammarly-lora
|
|
|
|
|
## Glossary
|
|
|
|
|
- **LoRA**: Low-Rank Adaptation, a parameter-efficient fine-tuning method that trains small low-rank update matrices instead of the full model
- **QLoRA**: Quantized LoRA, i.e. LoRA fine-tuning on top of a 4-bit quantized base model
- **SFT**: Supervised Fine-Tuning
- **JFLEG**: JHU FLuency-Extended GUG corpus, a grammatical error correction benchmark with multiple human-written corrections per sentence