XLM-RoBERTa Large — Nepali Hate Content Classification

Fine-tuned XLM-RoBERTa Large for multi-class hate content classification of Nepali social media text. The model handles Devanagari script, Romanized Nepali, English, and code-mixed inputs through a comprehensive preprocessing pipeline.

Model Description

This model was developed as part of a Bachelor of Computer Engineering final project at Khwopa College of Engineering, Tribhuvan University (February 2026). It classifies Nepali social media comments into four categories that distinguish different types of offensive content.

Base model: FacebookAI/xlm-roberta-large (560M parameters, 24 transformer layers, pre-trained on 2.5TB CommonCrawl across 100 languages)

Task: Multi-class text classification (4 classes)

Languages: Nepali (Devanagari + Romanized), English, code-mixed


Labels

| ID | Label | Description |
|----|-------|-------------|
| 0 | NON_OFFENSIVE | Text containing no offensive content |
| 1 | OTHER_OFFENSIVE | General offensive content not targeting specific groups |
| 2 | OFFENSIVE_RACIST | Content targeting individuals/groups based on ethnicity, race, or caste |
| 3 | OFFENSIVE_SEXIST | Content targeting individuals based on gender |

Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="UDHOV/xlm-roberta-large-nepali-hate-classification"
)

# Devanagari input ("this is good")
classifier("यो राम्रो छ")

# Romanized Nepali input (same sentence)
classifier("yo ramro cha")
```

Or manually:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("UDHOV/xlm-roberta-large-nepali-hate-classification")
model = AutoModelForSequenceClassification.from_pretrained("UDHOV/xlm-roberta-large-nepali-hate-classification")

text = "तिमी देखी घृणा लाग्छ"  # "I feel hatred toward you"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_class])
```
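If you want class probabilities rather than just the argmax, apply a softmax over the four class logits. The sketch below shows the arithmetic in plain Python with hypothetical logits (not actual model outputs); with the code above you would simply use `torch.softmax(logits, dim=-1)`.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the 4 classes (NO, OO, OR, OS)
logits = [2.1, 0.3, -1.0, -0.5]
probs = softmax(logits)
print([round(p, 3) for p in probs])
```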

Preprocessing Pipeline

The model was trained on text processed through a 5-stage pipeline:

  1. Script Detection — Unicode-based confidence scoring to classify input as Devanagari, Romanized Nepali, or English
  2. Script Unification — Romanized Nepali transliterated to Devanagari via ITRANS; English translated to Nepali via Deep Translator API
  3. Emoji Processing — 180+ emojis semantically mapped to Nepali equivalents; unknown emojis preserved; 18-dimensional emoji feature vector extracted
  4. Text Cleaning — URL removal, @mention removal, hashtag handling, whitespace normalization
  5. Feature Extraction — Script metadata, emoji features, and text statistics merged with cleaned text

Note: For best results, apply the same preprocessing before inference. Raw text is also accepted but may slightly reduce performance on heavily Romanized or emoji-rich inputs.
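Stage 1 of the pipeline can be sketched as a Unicode-range confidence score. This is a simplified illustration, not the actual implementation: the real pipeline's thresholds and its Romanized-Nepali-vs-English disambiguation are not specified in this card.

```python
def devanagari_ratio(text):
    """Fraction of alphabetic characters in the Devanagari block (U+0900-U+097F)."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(1 for c in letters if "\u0900" <= c <= "\u097f") / len(letters)

def detect_script(text, threshold=0.5):
    # Romanized Nepali and English cannot be separated by Unicode ranges alone;
    # the real pipeline relies on additional cues for that distinction.
    return "devanagari" if devanagari_ratio(text) >= threshold else "latin"

print(detect_script("यो राम्रो छ"))   # devanagari
print(detect_script("yo ramro cha"))  # latin
```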


Training Data

  • Source: Niraula et al. (2021) — Offensive Language Detection in Nepali Social Media (ACL Anthology)
  • Platform: Facebook and YouTube comments
  • Total samples: 7,625
(NO = NON_OFFENSIVE, OO = OTHER_OFFENSIVE, OR = OFFENSIVE_RACIST, OS = OFFENSIVE_SEXIST)

| Split | NO | OO | OR | OS | Total |
|-------|----|----|----|----|-------|
| Train | 3,206 (57.7%) | 1,759 (31.6%) | 376 (6.8%) | 214 (3.8%) | 5,555 |
| Validation | 356 (57.5%) | 195 (31.5%) | 42 (6.8%) | 27 (4.4%) | 620 |
| Test | 896 (62.1%) | 486 (33.7%) | 49 (3.4%) | 19 (1.3%) | 1,450 |

Class imbalance: the NON_OFFENSIVE to OFFENSIVE_SEXIST ratio in the training split is 14.98× (3,206 vs 214 samples). This was addressed via class-weighted cross-entropy loss.
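The exact weighting scheme is not stated in this card; a common choice is inverse-frequency "balanced" weights, `weight_c = N / (num_classes × count_c)`, which the sketch below computes for the training split. The resulting values would then be passed as the `weight` tensor to a weighted cross-entropy loss.

```python
# Sketch: "balanced" inverse-frequency class weights for the training split.
# Assumes weight_c = N / (num_classes * count_c); the card does not state
# the exact formula used in training.
train_counts = {
    "NON_OFFENSIVE": 3206,
    "OTHER_OFFENSIVE": 1759,
    "OFFENSIVE_RACIST": 376,
    "OFFENSIVE_SEXIST": 214,
}

total = sum(train_counts.values())   # 5,555 training samples
num_classes = len(train_counts)      # 4 classes

weights = {label: total / (num_classes * count)
           for label, count in train_counts.items()}

for label, w in weights.items():
    print(f"{label}: {w:.3f}")   # minority classes receive the largest weights
```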


Training Configuration

| Hyperparameter | Value |
|----------------|-------|
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup steps | 10% of total steps |
| LR schedule | Linear decay |
| Batch size | 16 (grad accum × 2 = effective 32) |
| Max epochs | 5 |
| Early stopping patience | 2 epochs |
| Max sequence length | 128 tokens |
| Dropout (classification head) | 0.1 |
| Gradient clipping | 1.0 |
| Mixed precision | FP16 |
| Loss | Class-weighted cross-entropy |
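Given these settings, the warmup length works out as follows. This is a sketch assuming the 5,555-sample training split, optimizer steps counted after gradient accumulation, and a schedule configured over the full 5 epochs (early stopping may end training before all steps are taken).

```python
import math

train_samples = 5555
effective_batch = 32   # batch size 16 x gradient accumulation 2
epochs = 5

steps_per_epoch = math.ceil(train_samples / effective_batch)  # optimizer steps per epoch
total_steps = steps_per_epoch * epochs
warmup_steps = int(0.10 * total_steps)                        # 10% of total steps

print(steps_per_epoch, total_steps, warmup_steps)
```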

Training took approximately 2 hours on a single GPU.


Evaluation Results

Test Set Performance

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| NON_OFFENSIVE | 0.8344 | 0.7366 | 0.7825 | 896 |
| OTHER_OFFENSIVE | 0.5949 | 0.6708 | 0.6306 | 486 |
| OFFENSIVE_RACIST | 0.2941 | 0.5102 | 0.3731 | 49 |
| OFFENSIVE_SEXIST | 0.3462 | 0.4737 | 0.4000 | 19 |
| Macro Avg | 0.5174 | 0.5978 | 0.5465 | 1,450 |
| Weighted Avg | 0.7295 | 0.7034 | 0.7127 | 1,450 |
| Accuracy | — | — | 0.7034 | 1,450 |

Validation Set Performance (Best Checkpoint)

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| NON_OFFENSIVE | 0.8360 | 0.7444 | 0.7875 | 356 |
| OTHER_OFFENSIVE | 0.6296 | 0.6974 | 0.6618 | 195 |
| OFFENSIVE_RACIST | 0.6250 | 0.8333 | 0.7143 | 42 |
| OFFENSIVE_SEXIST | 0.7419 | 0.8519 | 0.7931 | 27 |
| Macro Avg | 0.7081 | 0.7818 | 0.7392 | 620 |

Primary metric: Macro F1-score. Accuracy is misleading given class imbalance; macro F1 weights all classes equally, making it the appropriate metric for evaluating minority hate class performance.
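To illustrate why the two metrics diverge under imbalance, here is a minimal per-class F1 and macro-F1 computation on a toy prediction set (toy labels for illustration, not the model's actual outputs):

```python
def per_class_f1(y_true, y_pred, label):
    """Binary F1 for one class in a multi-class prediction list."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy data: 8 majority-class samples, 2 minority-class samples.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 3, 3]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 3]   # one minority sample missed

labels = sorted(set(y_true))
macro_f1 = sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Accuracy looks strong (0.90) while macro F1 is pulled down by the minority class.
print(f"accuracy={accuracy:.2f}  macro_f1={macro_f1:.2f}")
```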


Comparison with Other Models

| Approach | Model | Accuracy | Macro F1 |
|----------|-------|----------|----------|
| Classical ML | Logistic Regression (TF-IDF) | 0.7538 | 0.5701 |
| Classical ML | SVM | 0.7552 | 0.5502 |
| Deep Learning | GRU + Word2Vec | — | 0.3307 (test) |
| Transformer | XLM-RoBERTa Large (this model) | 0.7034 | 0.5465 |
| Transformer | NepaliBERT | 0.6972 | 0.5126 |

Logistic Regression achieves a marginally higher macro F1 (0.5701) due to better generalization on the OR class given the small test set size (49 samples). XLM-RoBERTa achieves the best OS class F1 (0.4000) among all models.


Limitations

  • Minority class performance: OR and OS classes have low test support (49 and 19 samples respectively), and both exhibit significant train-test lexical shift (~33–36% keyword overlap), limiting generalization.
  • Distributional shift: The OR class shows a higher proportion of Romanized script in the test set (59.2%) compared to training (46.1%), contributing to lower OR test performance.
  • OS class fragility: With only 19 OS test samples and high length mismatch between train (avg 13.1 words) and test (avg 19.9 words), OS results should be interpreted cautiously.
  • Preprocessing dependency: Performance may degrade on raw text without the preprocessing pipeline, especially for heavily Romanized or emoji-rich content.
  • Language scope: Primarily optimized for Nepali. Performance on other low-resource South Asian languages is not evaluated.

Intended Use

  • Automated hate content moderation on Nepali social media platforms (Facebook, YouTube, Twitter/X)
  • Research on low-resource language NLP and hate speech detection
  • Explainable AI integration — this model was evaluated with LIME, SHAP, and Captum-based Integrated Gradients for token-level attribution

Out-of-scope uses: This model should not be used as the sole decision-making system for content removal without human review. Predictions on minority classes (OR, OS) carry higher uncertainty.


Explainability

The deployment system integrates three complementary XAI methods for token-level explanation of predictions:

  • LIME — Local surrogate model via word masking perturbations
  • SHAP — Shapley value attribution (KernelSHAP)
  • Integrated Gradients (Captum) — Gradient-based attribution along input-to-baseline path
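The word-masking perturbation idea that underlies LIME-style explanations can be sketched with a toy scorer. The `toy_score` function below is a hypothetical stand-in for the classifier's probability of the predicted class (not the model's actual API), and the attribution here is simple occlusion (score drop per masked word), not LIME's full surrogate-model fit.

```python
def occlusion_attribution(text, score_fn):
    """Score drop when each word is masked: larger drop = more influential word."""
    words = text.split()
    base = score_fn(text)
    attributions = []
    for i in range(len(words)):
        masked = " ".join(words[:i] + ["[MASK]"] + words[i + 1:])
        attributions.append((words[i], base - score_fn(masked)))
    return attributions

# Hypothetical scorer: pretend the word "hate" drives the offensive probability.
def toy_score(text):
    return 0.9 if "hate" in text.split() else 0.1

print(occlusion_attribution("i hate you", toy_score))
```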

Citation

If you use this model, please cite the original dataset:

@inproceedings{niraula2021offensive,
  title={Offensive Language Detection in Nepali Social Media},
  author={Niraula, Nobal B. and Dulal, Saurav and Koirala, Diwa},
  booktitle={Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)},
  pages={67--75},
  year={2021}
}

Authors

Uddav Rajbhandari

Department of Computer and Electronics Engineering, Khwopa College of Engineering, Tribhuvan University, Nepal (2026)
