NepaliBERT — Nepali Hate Content Classification

Fine-tuned NepaliBERT for multi-class hate content classification of Nepali social media text. The model is specifically optimized for Devanagari script Nepali and handles mixed-script inputs through a comprehensive preprocessing pipeline.

Model Description

This model was developed as part of a Bachelor of Computer Engineering final project at Khwopa College of Engineering, Tribhuvan University (February 2026). It classifies Nepali social media comments into four categories targeting different types of offensive content.

Base model: Rajan/NepaliBERT (110M parameters, 12 transformer layers, pre-trained on a large Nepali corpus using masked language modelling)

Task: Multi-class text classification (4 classes)

Languages: Nepali (Devanagari primary), Romanized Nepali, code-mixed

Compared to XLM-RoBERTa Large (our other model): NepaliBERT's Nepali-specific pre-training gives it stronger Devanagari understanding and the best OR (Offensive-Racist) class F1 (0.4833) among all evaluated models. However, it has limited exposure to Romanized Nepali and English, making XLM-RoBERTa more robust on heavily code-mixed inputs.


Labels

| ID | Label | Description |
|----|-------|-------------|
| 0 | NON_OFFENSIVE | Text containing no offensive content |
| 1 | OTHER_OFFENSIVE | General offensive content not targeting specific groups |
| 2 | OFFENSIVE_RACIST | Content targeting individuals/groups based on ethnicity, race, or caste |
| 3 | OFFENSIVE_SEXIST | Content targeting individuals based on gender |

Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="UDHOV/nepalibert-nepali-hate-classification"
)

# Devanagari input
classifier("यो राम्रो छ")

# Romanized Nepali (ideally transliterated to Devanagari first; see Preprocessing Pipeline)
classifier("yo ramro cha")
```

Or manually:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("UDHOV/nepalibert-nepali-hate-classification")
model = AutoModelForSequenceClassification.from_pretrained("UDHOV/nepalibert-nepali-hate-classification")

text = "तिमी देखी घृणा लाग्छ"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_class])
```
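For moderation use cases, per-class probabilities are usually more informative than the argmax label alone. The sketch below shows the softmax step with illustrative logits (the numbers are made up, not real model outputs; with the model above you would instead apply `torch.softmax(logits, dim=-1)`):

```python
import math

# Illustrative logits for the four classes -- NOT real model outputs
logits = [2.1, 0.3, -0.8, -1.5]
labels = ["NON_OFFENSIVE", "OTHER_OFFENSIVE", "OFFENSIVE_RACIST", "OFFENSIVE_SEXIST"]

# Numerically stable softmax: subtract the max before exponentiating
m = max(logits)
exps = [math.exp(x - m) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.4f}")
```

Thresholding on these probabilities (rather than taking the argmax) lets a moderation system route low-confidence predictions to human review.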

Preprocessing Pipeline

The model was trained on text processed through a 5-stage pipeline:

  1. Script Detection — Unicode-based confidence scoring to classify input as Devanagari, Romanized Nepali, or English
  2. Script Unification — Romanized Nepali transliterated to Devanagari via ITRANS; English translated to Nepali via Deep Translator API
  3. Emoji Processing — 180+ emojis semantically mapped to Nepali equivalents; unknown emojis preserved; 18-dimensional emoji feature vector extracted
  4. Text Cleaning — URL removal, @mention removal, hashtag handling, whitespace normalization
  5. Feature Extraction — Script metadata, emoji features, and text statistics merged with cleaned text

Note: NepaliBERT's WordPiece tokenizer is optimized for Devanagari. For best results, pre-process Romanized or English inputs through the transliteration/translation pipeline before passing to this model.
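Stage 1 of the pipeline can be sketched with plain Unicode-range counting. The threshold and function below are illustrative assumptions, not the project's actual code, and distinguishing Romanized Nepali from English requires additional heuristics beyond what is shown here:

```python
def detect_script(text: str) -> str:
    """Classify text by the fraction of alphabetic characters that fall in
    the Devanagari Unicode block (U+0900-U+097F).
    The 0.5 threshold is an illustrative assumption, not the project's value.
    Telling Romanized Nepali apart from English needs further heuristics
    (e.g. lexicon lookup), which this sketch omits."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "empty"
    deva = sum(1 for ch in letters if "\u0900" <= ch <= "\u097f")
    confidence = deva / len(letters)
    return "devanagari" if confidence >= 0.5 else "romanized"

print(detect_script("यो राम्रो छ"))   # devanagari
print(detect_script("yo ramro cha"))  # romanized
```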


Training Data

  • Source: Niraula et al. (2021) — Offensive Language Detection in Nepali Social Media (ACL Anthology)
  • Platform: Facebook and YouTube comments
  • Total samples: 7,625
| Split | NO | OO | OR | OS | Total |
|-------|----|----|----|----|-------|
| Train | 3,206 (57.7%) | 1,759 (31.6%) | 376 (6.8%) | 214 (3.8%) | 5,555 |
| Validation | 356 (57.5%) | 195 (31.5%) | 42 (6.8%) | 27 (4.4%) | 620 |
| Test | 896 (62.1%) | 486 (33.7%) | 49 (3.4%) | 19 (1.3%) | 1,450 |

Class imbalance: NO vs OS imbalance ratio = 14.98×. Addressed via class-weighted cross-entropy loss with weights capped in the range [0.5, 3.0] to prevent extreme gradient updates from the severely under-represented OS class.
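The capped weighting can be illustrated with standard balanced inverse-frequency weights, `n_total / (n_classes * n_c)`. Whether the project used exactly this formula is an assumption; the [0.5, 3.0] cap and the training counts come from the text above:

```python
counts = {"NO": 3206, "OO": 1759, "OR": 376, "OS": 214}  # training split
total = sum(counts.values())   # 5,555
n_classes = len(counts)

weights = {}
for label, n in counts.items():
    raw = total / (n_classes * n)              # balanced inverse-frequency weight
    weights[label] = min(max(raw, 0.5), 3.0)   # cap to [0.5, 3.0]

print(weights)
```

Without the cap, the OS weight would be about 6.5, so a handful of OS samples would dominate each gradient update; capping at 3.0 keeps the minority-class signal without destabilizing training.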


Training Configuration

| Hyperparameter | Value |
|----------------|-------|
| Optimizer | AdamW |
| Learning rate | 2e-5 (discriminative LR strategy) |
| Weight decay | 0.01 |
| Warmup steps | 10% of total steps |
| LR schedule | Linear decay |
| Batch size | 16 (× 2 gradient accumulation = effective 32) |
| Max epochs | 5 |
| Early stopping patience | 2 epochs |
| Max sequence length | 128 tokens |
| Dropout (classification head) | 0.3 |
| Label smoothing | 0.05 |
| Class weight capping | [0.5, 3.0] |
| Gradient clipping | 1.0 |
| Loss | Class-weighted cross-entropy |

Training took approximately 3,759 seconds (~62.7 minutes) on a single GPU.
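As a rough guide, the table maps onto Hugging Face `TrainingArguments` as sketched below. This is a configuration sketch, not the repo's actual training script: the class-weighted loss and the discriminative layer-wise learning rates would both require a custom `Trainer` subclass and optimizer parameter groups that are not shown here.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters in the table above (assumed mapping)
args = TrainingArguments(
    output_dir="nepalibert-hate",
    learning_rate=2e-5,                # base LR; discriminative LRs need custom optimizer groups
    weight_decay=0.01,
    warmup_ratio=0.1,                  # 10% of total steps
    lr_scheduler_type="linear",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,     # effective batch size 32
    num_train_epochs=5,
    max_grad_norm=1.0,                 # gradient clipping
    label_smoothing_factor=0.05,
)
```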


Evaluation Results

Test Set Performance

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| NON_OFFENSIVE | 0.7805 | 0.7701 | 0.7753 | 896 |
| OTHER_OFFENSIVE | 0.6102 | 0.5926 | 0.6013 | 486 |
| OFFENSIVE_RACIST | 0.4085 | 0.5918 | 0.4833 | 49 |
| OFFENSIVE_SEXIST | 0.1739 | 0.2105 | 0.1905 | 19 |
| Macro Avg | 0.4933 | 0.5413 | 0.5126 | 1,450 |
| Weighted Avg | 0.7029 | 0.6972 | 0.6994 | 1,450 |
| Accuracy | | | 0.6972 | 1,450 |

Validation Set Performance (Best Checkpoint)

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| NON_OFFENSIVE | 0.7961 | 0.8118 | 0.8039 | 356 |
| OTHER_OFFENSIVE | 0.6609 | 0.5897 | 0.6233 | 195 |
| OFFENSIVE_RACIST | 0.6727 | 0.8810 | 0.7629 | 42 |
| OFFENSIVE_SEXIST | 0.8214 | 0.8519 | 0.8364 | 27 |
| Macro Avg | 0.7378 | 0.7836 | 0.7566 | 620 |
| Accuracy | | | 0.7484 | 620 |

NepaliBERT achieved the highest validation macro F1 (0.7566) among all evaluated models, outperforming even XLM-RoBERTa Large (0.7392 val macro F1). The validation-to-test gap is primarily explained by distributional shift in the OR and OS minority classes, not overfitting (train-val loss gap = 0.066).

Primary metric: Macro F1-score. Accuracy is misleading given class imbalance; macro F1 weights all classes equally, making it the appropriate metric for evaluating minority hate class performance.
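The difference between the two averages is easy to verify directly from the test-set table above:

```python
# Per-class F1 and support, taken from the test-set table above
f1      = [0.7753, 0.6013, 0.4833, 0.1905]   # NO, OO, OR, OS
support = [896, 486, 49, 19]

# Macro F1: every class counts equally, so minority-class collapse is visible
macro_f1 = sum(f1) / len(f1)

# Weighted F1: dominated by the large NO and OO classes
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / sum(support)

print(f"macro F1:    {macro_f1:.4f}")     # 0.5126
print(f"weighted F1: {weighted_f1:.4f}")  # 0.6994
```

The ~0.19 gap between the two numbers is entirely due to the OR and OS minority classes, which is exactly the behavior a hate-content classifier must be evaluated on.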


Training Dynamics

Training proceeded over approximately 1,000 gradient steps in three phases:

  • Phase 1 (steps 0–300): Rapid co-descent of train and validation loss (1.50 → 1.00), faster than XLM-RoBERTa due to Nepali-specific pre-training. Validation F1 rises from 0.26 to 0.47.
  • Phase 2 (steps 300–600): Training loss continues declining (~0.90); validation loss stabilizes around 1.00–1.02. Validation F1 improves to 0.65 as OO and OR class discrimination refines.
  • Phase 3 (steps 600–1000): Validation F1 peaks near 0.75 at step 700, then settles at 0.72. Post-step-600 divergence between F1 and accuracy reflects a trade-off between majority class accuracy and minority class precision.

The final train-validation loss gap of 0.066 confirms minimal overfitting; poor OS test performance is a data distribution issue rather than model overfitting.


Comparison with Other Models

| Approach | Model | Accuracy | Macro F1 |
|----------|-------|----------|----------|
| Classical ML | Logistic Regression (TF-IDF) | 0.7538 | 0.5701 |
| Classical ML | SVM | 0.7552 | 0.5502 |
| Deep Learning | GRU + Word2Vec | n/a | 0.3307 (test) |
| Transformer | XLM-RoBERTa Large | 0.7034 | 0.5465 |
| Transformer | NepaliBERT (this model) | 0.6972 | 0.5126 |

Per-Class F1 Comparison (Test Set)

| Model | Macro F1 | NO | OO | OR | OS |
|-------|----------|----|----|----|----|
| Logistic Regression | 0.5701 | 0.8225 | 0.6722 | 0.5000 | 0.2857 |
| SVM | 0.5502 | 0.8288 | 0.6659 | 0.4660 | 0.2400 |
| XLM-RoBERTa Large | 0.5465 | 0.7825 | 0.6306 | 0.3731 | 0.4000 |
| NepaliBERT (this model) | 0.5126 | 0.7753 | 0.6013 | 0.4833 | 0.1905 |

Key finding: NepaliBERT achieves the best OR class F1 (0.4833) among all models, outperforming XLM-RoBERTa Large (0.3731), confirming that Nepali domain pre-training provides a meaningful advantage for ethnicity/caste-related hate content. XLM-RoBERTa Large outperforms NepaliBERT on the OS class (0.4000 vs 0.1905).


Limitations

  • Romanized Nepali coverage: NepaliBERT's pre-training corpus is predominantly Devanagari, limiting its ability to handle Romanized Nepali without prior transliteration. The OR test set contains 59.2% Romanized script vs 46.1% in training, contributing to the validation-to-test gap.
  • OS class collapse: With only 19 OS test samples, high length mismatch (train avg 13.1 words vs test avg 19.9 words), and narrow training vocabulary, OS results (F1 = 0.1905) should be interpreted with significant caution.
  • Optimal checkpoint sensitivity: NepaliBERT shows a more pronounced F1 peak-and-drop than XLM-RoBERTa, making it more sensitive to early stopping checkpoint selection.
  • Preprocessing dependency: Performance on Romanized or English inputs degrades without prior transliteration/translation through the preprocessing pipeline.
  • Language scope: Optimized specifically for Nepali. Not evaluated on other South Asian languages.

Intended Use

  • Automated hate content moderation on Nepali social media platforms, especially where content is primarily in Devanagari script
  • Research on Nepali-specific NLP and low-resource hate speech detection
  • Comparative study of language-specific vs multilingual transformer models
  • Explainable AI integration — this model was evaluated with LIME, SHAP, and Captum-based Integrated Gradients for token-level attribution

Out-of-scope uses: This model should not be used as the sole decision-making system for content removal without human review. OS class predictions carry particularly high uncertainty due to extremely limited test support.


Explainability

The deployment system integrates three complementary XAI methods for token-level explanation of predictions:

  • LIME — Local surrogate model via word masking perturbations
  • SHAP — Shapley value attribution (KernelSHAP)
  • Integrated Gradients (Captum) — Gradient-based attribution along input-to-baseline path
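The word-masking idea behind LIME-style attribution can be illustrated with a stand-in scorer. Everything below is a toy assumption for illustration: the keyword scorer replaces the real classifier, and in the deployed system the score would come from the model's class probability for the predicted label:

```python
def toy_offensive_score(text: str) -> float:
    """Stand-in for the model's offensive-class probability: a keyword score.
    Purely illustrative -- the real system queries the classifier."""
    bad_words = {"hate", "stupid"}
    tokens = text.lower().split()
    return sum(t in bad_words for t in tokens) / max(len(tokens), 1)

def mask_attributions(text: str) -> dict:
    """LIME-style perturbation: drop each token and measure the score change.
    A positive value means the token pushed the prediction toward offensive."""
    tokens = text.split()
    base = toy_offensive_score(text)
    attributions = {}
    for i, tok in enumerate(tokens):
        masked = " ".join(tokens[:i] + tokens[i + 1:])
        attributions[tok] = base - toy_offensive_score(masked)
    return attributions

print(mask_attributions("i hate this stupid thing"))
```

LIME fits a local surrogate over many such perturbations rather than one-token-at-a-time deletions, but the per-token score delta shown here is the core signal all three methods surface at the token level.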

Citation

If you use this model, please cite the original dataset:

@inproceedings{niraula2021offensive,
  title={Offensive Language Detection in Nepali Social Media},
  author={Niraula, Nobal B. and Dulal, Saurav and Koirala, Diwa},
  booktitle={Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)},
  pages={67--75},
  year={2021}
}

And the base model:

@article{thapa2024nepali,
  title={Development of Pre-trained Transformer-based Models for the Nepali Language},
  author={Thapa, Prashant and Sharma, Prajwal and Kharel, Aman},
  journal={Transactions on Asian and Low-Resource Language Information Processing},
  year={2024}
}

Authors

Uddav Rajbhandari

Department of Computer and Electronics Engineering, Khwopa College of Engineering, Tribhuvan University, Nepal (2026)

Checkpoint: Safetensors, 81.9M params, F32 tensors.