NepaliBERT — Nepali Hate Content Classification

Fine-tuned NepaliBERT for multi-class hate content classification of Nepali social media text. The model is specifically optimized for Devanagari script Nepali and handles mixed-script inputs through a comprehensive preprocessing pipeline.

Model Description

This model was developed as part of a Bachelor of Computer Engineering final project at Khwopa College of Engineering, Tribhuvan University (February 2026). It classifies Nepali social media comments into four categories targeting different types of offensive content.

Base model: Rajan/NepaliBERT (110M parameters, 12 transformer layers, pre-trained on a large Nepali corpus using masked language modelling)

Task: Multi-class text classification (4 classes)

Languages: Nepali (Devanagari primary), Romanized Nepali, code-mixed

Compared to XLM-RoBERTa Large (our other model): NepaliBERT's Nepali-specific pre-training gives it stronger Devanagari understanding and the best OR (Offensive-Racist) class F1 (0.4833) among all evaluated models. However, it has limited exposure to Romanized Nepali and English, making XLM-RoBERTa more robust on heavily code-mixed inputs.


Labels

| ID | Label | Description |
|----|-------|-------------|
| 0 | NON_OFFENSIVE | Text containing no offensive content |
| 1 | OTHER_OFFENSIVE | General offensive content not targeting specific groups |
| 2 | OFFENSIVE_RACIST | Content targeting individuals/groups based on ethnicity, race, or caste |
| 3 | OFFENSIVE_SEXIST | Content targeting individuals based on gender |

Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="UDHOV/nepalibert-nepali-hate-classification"
)

# Devanagari input
classifier("यो राम्रो छ")

# Romanized Nepali (ideally transliterated to Devanagari first; see Preprocessing Pipeline)
classifier("yo ramro cha")
```

Or manually:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("UDHOV/nepalibert-nepali-hate-classification")
model = AutoModelForSequenceClassification.from_pretrained("UDHOV/nepalibert-nepali-hate-classification")

text = "तिमी देखी घृणा लाग्छ"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_class])
```
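For moderation use cases, per-class probabilities are usually more informative than the argmax label alone. The sketch below shows the softmax step with illustrative logits (the numbers are made up, not real model outputs; with the model above you would instead apply `torch.softmax(logits, dim=-1)`):

```python
import math

# Illustrative logits for the four classes -- NOT real model outputs
logits = [2.1, 0.3, -0.8, -1.5]
labels = ["NON_OFFENSIVE", "OTHER_OFFENSIVE", "OFFENSIVE_RACIST", "OFFENSIVE_SEXIST"]

# Numerically stable softmax: subtract the max before exponentiating
m = max(logits)
exps = [math.exp(x - m) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.4f}")
```

Thresholding on these probabilities (rather than taking the argmax) lets a moderation system route low-confidence predictions to human review.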

Preprocessing Pipeline

The model was trained on text processed through a 5-stage pipeline:

  1. Script Detection — Unicode-based confidence scoring to classify input as Devanagari, Romanized Nepali, or English
  2. Script Unification — Romanized Nepali transliterated to Devanagari via ITRANS; English translated to Nepali via Deep Translator API
  3. Emoji Processing — 180+ emojis semantically mapped to Nepali equivalents; unknown emojis preserved; 18-dimensional emoji feature vector extracted
  4. Text Cleaning — URL removal, @mention removal, hashtag handling, whitespace normalization
  5. Feature Extraction — Script metadata, emoji features, and text statistics merged with cleaned text

Note: NepaliBERT's WordPiece tokenizer is optimized for Devanagari. For best results, pre-process Romanized or English inputs through the transliteration/translation pipeline before passing to this model.
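Stage 1 of the pipeline can be sketched with plain Unicode-range counting. The threshold and function below are illustrative assumptions, not the project's actual code, and distinguishing Romanized Nepali from English requires additional heuristics beyond what is shown here:

```python
def detect_script(text: str) -> str:
    """Classify text by the fraction of alphabetic characters that fall in
    the Devanagari Unicode block (U+0900-U+097F).
    The 0.5 threshold is an illustrative assumption, not the project's value.
    Telling Romanized Nepali apart from English needs further heuristics
    (e.g. lexicon lookup), which this sketch omits."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "empty"
    deva = sum(1 for ch in letters if "\u0900" <= ch <= "\u097f")
    confidence = deva / len(letters)
    return "devanagari" if confidence >= 0.5 else "romanized"

print(detect_script("यो राम्रो छ"))   # devanagari
print(detect_script("yo ramro cha"))  # romanized
```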


Training Data

  • Source: Niraula et al. (2021) — Offensive Language Detection in Nepali Social Media (ACL Anthology)
  • Platform: Facebook and YouTube comments
  • Total samples: 7,625
| Split | NO | OO | OR | OS | Total |
|-------|----|----|----|----|-------|
| Train | 3,206 (57.7%) | 1,759 (31.6%) | 376 (6.8%) | 214 (3.8%) | 5,555 |
| Validation | 356 (57.5%) | 195 (31.5%) | 42 (6.8%) | 27 (4.4%) | 620 |
| Test | 896 (62.1%) | 486 (33.7%) | 49 (3.4%) | 19 (1.3%) | 1,450 |

Class imbalance: NO vs OS imbalance ratio = 14.98×. Addressed via class-weighted cross-entropy loss with weights capped in the range [0.5, 3.0] to prevent extreme gradient updates from the severely under-represented OS class.
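The capped weighting can be illustrated with standard balanced inverse-frequency weights, `n_total / (n_classes * n_c)`. Whether the project used exactly this formula is an assumption; the [0.5, 3.0] cap and the training counts come from the text above:

```python
counts = {"NO": 3206, "OO": 1759, "OR": 376, "OS": 214}  # training split
total = sum(counts.values())   # 5,555
n_classes = len(counts)

weights = {}
for label, n in counts.items():
    raw = total / (n_classes * n)              # balanced inverse-frequency weight
    weights[label] = min(max(raw, 0.5), 3.0)   # cap to [0.5, 3.0]

print(weights)
```

Without the cap, the OS weight would be about 6.5, so a handful of OS samples would dominate each gradient update; capping at 3.0 keeps the minority-class signal without destabilizing training.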


Training Configuration

| Hyperparameter | Value |
|----------------|-------|
| Optimizer | AdamW |
| Learning rate | 2e-5 (discriminative LR strategy) |
| Weight decay | 0.01 |
| Warmup steps | 10% of total steps |
| LR schedule | Linear decay |
| Batch size | 16 (× 2 gradient accumulation = effective 32) |
| Max epochs | 5 |
| Early stopping patience | 2 epochs |
| Max sequence length | 128 tokens |
| Dropout (classification head) | 0.3 |
| Label smoothing | 0.05 |
| Class weight capping | [0.5, 3.0] |
| Gradient clipping | 1.0 |
| Loss | Class-weighted cross-entropy |

Training took approximately 3,759 seconds (~62.7 minutes) on a single GPU.
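As a rough guide, the table maps onto Hugging Face `TrainingArguments` as sketched below. This is a configuration sketch, not the repo's actual training script: the class-weighted loss and the discriminative layer-wise learning rates would both require a custom `Trainer` subclass and optimizer parameter groups that are not shown here.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters in the table above (assumed mapping)
args = TrainingArguments(
    output_dir="nepalibert-hate",
    learning_rate=2e-5,                # base LR; discriminative LRs need custom optimizer groups
    weight_decay=0.01,
    warmup_ratio=0.1,                  # 10% of total steps
    lr_scheduler_type="linear",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,     # effective batch size 32
    num_train_epochs=5,
    max_grad_norm=1.0,                 # gradient clipping
    label_smoothing_factor=0.05,
)
```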


Evaluation Results

Test Set Performance

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| NON_OFFENSIVE | 0.7805 | 0.7701 | 0.7753 | 896 |
| OTHER_OFFENSIVE | 0.6102 | 0.5926 | 0.6013 | 486 |
| OFFENSIVE_RACIST | 0.4085 | 0.5918 | 0.4833 | 49 |
| OFFENSIVE_SEXIST | 0.1739 | 0.2105 | 0.1905 | 19 |
| Macro Avg | 0.4933 | 0.5413 | 0.5126 | 1,450 |
| Weighted Avg | 0.7029 | 0.6972 | 0.6994 | 1,450 |
| Accuracy | | | 0.6972 | 1,450 |

Validation Set Performance (Best Checkpoint)

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| NON_OFFENSIVE | 0.7961 | 0.8118 | 0.8039 | 356 |
| OTHER_OFFENSIVE | 0.6609 | 0.5897 | 0.6233 | 195 |
| OFFENSIVE_RACIST | 0.6727 | 0.8810 | 0.7629 | 42 |
| OFFENSIVE_SEXIST | 0.8214 | 0.8519 | 0.8364 | 27 |
| Macro Avg | 0.7378 | 0.7836 | 0.7566 | 620 |
| Accuracy | | | 0.7484 | 620 |

NepaliBERT achieved the highest validation macro F1 (0.7566) among all evaluated models, outperforming even XLM-RoBERTa Large (0.7392 val macro F1). The validation-to-test gap is primarily explained by distributional shift in the OR and OS minority classes, not overfitting (train-val loss gap = 0.066).

Primary metric: Macro F1-score. Accuracy is misleading given class imbalance; macro F1 weights all classes equally, making it the appropriate metric for evaluating minority hate class performance.
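The difference between the two averages is easy to verify directly from the test-set table above:

```python
# Per-class F1 and support, taken from the test-set table above
f1      = [0.7753, 0.6013, 0.4833, 0.1905]   # NO, OO, OR, OS
support = [896, 486, 49, 19]

# Macro F1: every class counts equally, so minority-class collapse is visible
macro_f1 = sum(f1) / len(f1)

# Weighted F1: dominated by the large NO and OO classes
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / sum(support)

print(f"macro F1:    {macro_f1:.4f}")     # 0.5126
print(f"weighted F1: {weighted_f1:.4f}")  # 0.6994
```

The ~0.19 gap between the two numbers is entirely due to the OR and OS minority classes, which is exactly the behavior a hate-content classifier must be evaluated on.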


Training Dynamics

Training proceeded over approximately 1,000 gradient steps in three phases:

  • Phase 1 (steps 0–300): Rapid co-descent of train and validation loss (1.50 → 1.00), faster than XLM-RoBERTa due to Nepali-specific pre-training. Validation F1 rises from 0.26 to 0.47.
  • Phase 2 (steps 300–600): Training loss continues declining (~0.90); validation loss stabilizes around 1.00–1.02. Validation F1 improves to 0.65 as OO and OR class discrimination refines.
  • Phase 3 (steps 600–1000): Validation F1 peaks near 0.75 at step 700, then settles at 0.72. Post-step-600 divergence between F1 and accuracy reflects a trade-off between majority class accuracy and minority class precision.

The final train-validation loss gap of 0.066 confirms minimal overfitting; poor OS test performance is a data distribution issue rather than model overfitting.


Comparison with Other Models

| Approach | Model | Accuracy | Macro F1 |
|----------|-------|----------|----------|
| Classical ML | Logistic Regression (TF-IDF) | 0.7538 | 0.5701 |
| Classical ML | SVM | 0.7552 | 0.5502 |
| Deep Learning | GRU + Word2Vec | n/a | 0.3307 (test) |
| Transformer | XLM-RoBERTa Large | 0.7034 | 0.5465 |
| Transformer | NepaliBERT (this model) | 0.6972 | 0.5126 |

Per-Class F1 Comparison (Test Set)

| Model | Macro F1 | NO | OO | OR | OS |
|-------|----------|----|----|----|----|
| Logistic Regression | 0.5701 | 0.8225 | 0.6722 | 0.5000 | 0.2857 |
| SVM | 0.5502 | 0.8288 | 0.6659 | 0.4660 | 0.2400 |
| XLM-RoBERTa Large | 0.5465 | 0.7825 | 0.6306 | 0.3731 | 0.4000 |
| NepaliBERT (this model) | 0.5126 | 0.7753 | 0.6013 | 0.4833 | 0.1905 |

Key finding: NepaliBERT achieves the best OR class F1 (0.4833) among all models, outperforming XLM-RoBERTa Large (0.3731), confirming that Nepali domain pre-training provides a meaningful advantage for ethnicity/caste-related hate content. XLM-RoBERTa Large outperforms NepaliBERT on the OS class (0.4000 vs 0.1905).


Limitations

  • Romanized Nepali coverage: NepaliBERT's pre-training corpus is predominantly Devanagari, limiting its ability to handle Romanized Nepali without prior transliteration. The OR test set contains 59.2% Romanized script vs 46.1% in training, contributing to the validation-to-test gap.
  • OS class collapse: With only 19 OS test samples, high length mismatch (train avg 13.1 words vs test avg 19.9 words), and narrow training vocabulary, OS results (F1 = 0.1905) should be interpreted with significant caution.
  • Optimal checkpoint sensitivity: NepaliBERT shows a more pronounced F1 peak-and-drop than XLM-RoBERTa, making it more sensitive to early stopping checkpoint selection.
  • Preprocessing dependency: Performance on Romanized or English inputs degrades without prior transliteration/translation through the preprocessing pipeline.
  • Language scope: Optimized specifically for Nepali. Not evaluated on other South Asian languages.

Intended Use

  • Automated hate content moderation on Nepali social media platforms, especially where content is primarily in Devanagari script
  • Research on Nepali-specific NLP and low-resource hate speech detection
  • Comparative study of language-specific vs multilingual transformer models
  • Explainable AI integration — this model was evaluated with LIME, SHAP, and Captum-based Integrated Gradients for token-level attribution

Out-of-scope uses: This model should not be used as the sole decision-making system for content removal without human review. OS class predictions carry particularly high uncertainty due to extremely limited test support.


Explainability

The deployment system integrates three complementary XAI methods for token-level explanation of predictions:

  • LIME — Local surrogate model via word masking perturbations
  • SHAP — Shapley value attribution (KernelSHAP)
  • Integrated Gradients (Captum) — Gradient-based attribution along input-to-baseline path
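The word-masking idea behind LIME-style attribution can be illustrated with a stand-in scorer. Everything below is a toy assumption for illustration: the keyword scorer replaces the real classifier, and in the deployed system the score would come from the model's class probability for the predicted label:

```python
def toy_offensive_score(text: str) -> float:
    """Stand-in for the model's offensive-class probability: a keyword score.
    Purely illustrative -- the real system queries the classifier."""
    bad_words = {"hate", "stupid"}
    tokens = text.lower().split()
    return sum(t in bad_words for t in tokens) / max(len(tokens), 1)

def mask_attributions(text: str) -> dict:
    """LIME-style perturbation: drop each token and measure the score change.
    A positive value means the token pushed the prediction toward offensive."""
    tokens = text.split()
    base = toy_offensive_score(text)
    attributions = {}
    for i, tok in enumerate(tokens):
        masked = " ".join(tokens[:i] + tokens[i + 1:])
        attributions[tok] = base - toy_offensive_score(masked)
    return attributions

print(mask_attributions("i hate this stupid thing"))
```

LIME fits a local surrogate over many such perturbations rather than one-token-at-a-time deletions, but the per-token score delta shown here is the core signal all three methods surface at the token level.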

Citation

If you use this model, please cite the original dataset:

@inproceedings{niraula2021offensive,
  title={Offensive Language Detection in Nepali Social Media},
  author={Niraula, Nobal B. and Dulal, Saurav and Koirala, Diwa},
  booktitle={Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)},
  pages={67--75},
  year={2021}
}

And the base model:

@article{thapa2024nepali,
  title={Development of Pre-trained Transformer-based Models for the Nepali Language},
  author={Thapa, Prashant and Sharma, Prajwal and Kharel, Aman},
  journal={Transactions on Asian and Low-Resource Language Information Processing},
  year={2024}
}

Authors

Uddav Rajbhandari

Department of Computer and Electronics Engineering, Khwopa College of Engineering, Tribhuvan University, Nepal (2026)

Checkpoint: Safetensors, 81.9M params, F32 tensors.