# XLM-RoBERTa Large — Nepali Hate Content Classification
Fine-tuned XLM-RoBERTa Large for multi-class hate content classification of Nepali social media text. The model handles Devanagari script, Romanized Nepali, English, and code-mixed inputs through a comprehensive preprocessing pipeline.
## Model Description

This model was developed as part of a Bachelor of Computer Engineering final project at Khwopa College of Engineering, Tribhuvan University (February 2026). It classifies Nepali social media comments into four categories targeting different types of offensive content.

- **Base model:** FacebookAI/xlm-roberta-large (560M parameters, 24 transformer layers, pre-trained on 2.5TB of CommonCrawl data across 100 languages)
- **Task:** Multi-class text classification (4 classes)
- **Languages:** Nepali (Devanagari + Romanized), English, code-mixed
## Labels

| ID | Label | Description |
|---|---|---|
| 0 | NON_OFFENSIVE | Text containing no offensive content |
| 1 | OTHER_OFFENSIVE | General offensive content not targeting specific groups |
| 2 | OFFENSIVE_RACIST | Content targeting individuals/groups based on ethnicity, race, or caste |
| 3 | OFFENSIVE_SEXIST | Content targeting individuals based on gender |
## Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="UDHOV/xlm-roberta-large-nepali-hate-classification"
)

# Devanagari input ("this is good")
classifier("यो राम्रो छ")

# Romanized Nepali input ("this is good")
classifier("yo ramro cha")
```
Or load the tokenizer and model manually:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("UDHOV/xlm-roberta-large-nepali-hate-classification")
model = AutoModelForSequenceClassification.from_pretrained("UDHOV/xlm-roberta-large-nepali-hate-classification")

text = "तिमी देखी घृणा लाग्छ"  # "I feel hatred toward you"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax().item()
print(model.config.id2label[predicted_class])
```
## Preprocessing Pipeline

The model was trained on text processed through a five-stage pipeline:

1. **Script Detection** — Unicode-based confidence scoring classifies input as Devanagari, Romanized Nepali, or English
2. **Script Unification** — Romanized Nepali is transliterated to Devanagari via ITRANS; English is translated to Nepali via the Deep Translator API
3. **Emoji Processing** — 180+ emojis are semantically mapped to Nepali equivalents; unknown emojis are preserved; an 18-dimensional emoji feature vector is extracted
4. **Text Cleaning** — URL removal, @mention removal, hashtag handling, whitespace normalization
5. **Feature Extraction** — script metadata, emoji features, and text statistics are merged with the cleaned text
Note: For best results, apply the same preprocessing before inference. Raw text is also accepted but may slightly reduce performance on heavily Romanized or emoji-rich inputs.
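The script-detection stage can be approximated with a simple Unicode range check. The helper below is an illustrative sketch, not the pipeline's actual code; the function name and the 0.5 threshold are assumptions.

```python
# Sketch of Unicode-based script detection (illustrative, not the exact
# pipeline implementation): classify text by the share of alphabetic
# characters falling in the Devanagari block (U+0900-U+097F).

def detect_script(text: str, threshold: float = 0.5) -> str:
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "unknown"
    devanagari = sum(1 for ch in letters if "\u0900" <= ch <= "\u097f")
    ratio = devanagari / len(letters)
    if ratio >= threshold:
        return "devanagari"
    if ratio == 0:
        return "latin"  # Romanized Nepali or English; needs a further check
    return "mixed"

print(detect_script("यो राम्रो छ"))   # devanagari
print(detect_script("yo ramro cha"))  # latin
```

A real confidence-scoring stage would also have to separate Romanized Nepali from English within the `latin` bucket, e.g. with a lexicon or character n-gram model.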
## Training Data
- Source: Niraula et al. (2021) — Offensive Language Detection in Nepali Social Media (ACL Anthology)
- Platform: Facebook and YouTube comments
- Total samples: 7,625
| Split | NO | OO | OR | OS | Total |
|---|---|---|---|---|---|
| Train | 3,206 (57.7%) | 1,759 (31.6%) | 376 (6.8%) | 214 (3.8%) | 5,555 |
| Validation | 356 (57.5%) | 195 (31.5%) | 42 (6.8%) | 27 (4.4%) | 620 |
| Test | 896 (62.1%) | 486 (33.7%) | 49 (3.4%) | 19 (1.3%) | 1,450 |
Class imbalance: the NON_OFFENSIVE-to-OFFENSIVE_SEXIST ratio in the training set is 14.98× (3,206 vs 214 samples). This was addressed with class-weighted cross-entropy loss.
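One common way to derive such weights is inverse class frequency; the sketch below uses the training-split counts from the table above. This formulation is an assumption for illustration — the card does not specify the exact weights used.

```python
# Sketch: inverse-frequency class weights for weighted cross-entropy.
# Counts are the training-split label counts from the table above.
counts = {
    "NON_OFFENSIVE": 3206,
    "OTHER_OFFENSIVE": 1759,
    "OFFENSIVE_RACIST": 376,
    "OFFENSIVE_SEXIST": 214,
}
total = sum(counts.values())  # 5555
num_classes = len(counts)

# weight_c = total / (num_classes * count_c): rare classes get larger weights
weights = {label: total / (num_classes * n) for label, n in counts.items()}
for label, w in weights.items():
    print(f"{label}: {w:.2f}")
```

Weights like these would typically be passed to `torch.nn.CrossEntropyLoss(weight=...)` as a tensor ordered by label ID.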
## Training Configuration
| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup steps | 10% of total steps |
| LR schedule | Linear decay |
| Batch size | 16 (gradient accumulation × 2 = effective 32) |
| Max epochs | 5 |
| Early stopping patience | 2 epochs |
| Max sequence length | 128 tokens |
| Dropout (classification head) | 0.1 |
| Gradient clipping | 1.0 |
| Mixed precision | FP16 |
| Loss | Class-weighted cross-entropy |
Training took approximately 2 hours on a single GPU.
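The warmup-plus-linear-decay schedule from the table can be sketched as a function of the step index. This is a standalone illustration of the schedule's shape, not the training code itself.

```python
# Sketch of the LR schedule: linear warmup over the first 10% of steps,
# then linear decay to zero, peaking at the base rate of 2e-5.
BASE_LR = 2e-5

def learning_rate(step: int, total_steps: int, warmup_frac: float = 0.1) -> float:
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    return BASE_LR * (total_steps - step) / (total_steps - warmup_steps)

print(learning_rate(100, 1000))   # peak: 2e-05 at the end of warmup
print(learning_rate(1000, 1000))  # 0.0 at the final step
```

This matches the shape produced by `get_linear_schedule_with_warmup` in the transformers library.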
## Evaluation Results

### Test Set Performance
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| NON_OFFENSIVE | 0.8344 | 0.7366 | 0.7825 | 896 |
| OTHER_OFFENSIVE | 0.5949 | 0.6708 | 0.6306 | 486 |
| OFFENSIVE_RACIST | 0.2941 | 0.5102 | 0.3731 | 49 |
| OFFENSIVE_SEXIST | 0.3462 | 0.4737 | 0.4000 | 19 |
| Macro Avg | 0.5174 | 0.5978 | 0.5465 | 1,450 |
| Weighted Avg | 0.7295 | 0.7034 | 0.7127 | 1,450 |
| Accuracy | | | 0.7034 | 1,450 |
### Validation Set Performance (Best Checkpoint)
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| NON_OFFENSIVE | 0.8360 | 0.7444 | 0.7875 | 356 |
| OTHER_OFFENSIVE | 0.6296 | 0.6974 | 0.6618 | 195 |
| OFFENSIVE_RACIST | 0.6250 | 0.8333 | 0.7143 | 42 |
| OFFENSIVE_SEXIST | 0.7419 | 0.8519 | 0.7931 | 27 |
| Macro Avg | 0.7081 | 0.7818 | 0.7392 | 620 |
Primary metric: Macro F1-score. Accuracy is misleading given class imbalance; macro F1 weights all classes equally, making it the appropriate metric for evaluating minority hate class performance.
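The macro and weighted averages in the test table can be reproduced directly from the per-class rows, which makes the difference between the two metrics concrete:

```python
# Reproduce the test-set macro and weighted F1 from the per-class rows above.
f1 = {"NO": 0.7825, "OO": 0.6306, "OR": 0.3731, "OS": 0.4000}
support = {"NO": 896, "OO": 486, "OR": 49, "OS": 19}
total = sum(support.values())  # 1450

macro_f1 = sum(f1.values()) / len(f1)                          # unweighted mean
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total      # support-weighted

print(f"macro F1:    {macro_f1:.4f}")   # ~0.5465, every class counts equally
print(f"weighted F1: {weighted_f1:.4f}")  # ~0.7127, dominated by the majority class
```

The ~0.17 gap between the two averages is exactly the effect of the weak minority-class (OR, OS) scores that the weighted average hides.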
## Comparison with Other Models
| Approach | Model | Accuracy | Macro F1 |
|---|---|---|---|
| Classical ML | Logistic Regression (TF-IDF) | 0.7538 | 0.5701 |
| Classical ML | SVM | 0.7552 | 0.5502 |
| Deep Learning | GRU + Word2Vec | — | 0.3307 (test) |
| Transformer | XLM-RoBERTa Large (this model) | 0.7034 | 0.5465 |
| Transformer | NepaliBERT | 0.6972 | 0.5126 |
Logistic Regression achieves a marginally higher macro F1 (0.5701) due to better generalization on the OR class given the small test set size (49 samples). XLM-RoBERTa achieves the best OS class F1 (0.4000) among all models.
## Limitations
- Minority class performance: OR and OS classes have low test support (49 and 19 samples respectively), and both exhibit significant train-test lexical shift (~33–36% keyword overlap), limiting generalization.
- Distributional shift: The OR class shows a higher proportion of Romanized script in the test set (59.2%) compared to training (46.1%), contributing to lower OR test performance.
- OS class fragility: With only 19 OS test samples and high length mismatch between train (avg 13.1 words) and test (avg 19.9 words), OS results should be interpreted cautiously.
- Preprocessing dependency: Performance may degrade on raw text without the preprocessing pipeline, especially for heavily Romanized or emoji-rich content.
- Language scope: Primarily optimized for Nepali. Performance on other low-resource South Asian languages is not evaluated.
## Intended Use
- Automated hate content moderation on Nepali social media platforms (Facebook, YouTube, Twitter/X)
- Research on low-resource language NLP and hate speech detection
- Explainable AI integration — this model was evaluated with LIME, SHAP, and Captum-based Integrated Gradients for token-level attribution
**Out-of-scope uses:** This model should not be used as the sole decision-making system for content removal without human review. Predictions on minority classes (OR, OS) carry higher uncertainty.
## Explainability
The deployment system integrates three complementary XAI methods for token-level explanation of predictions:
- LIME — Local surrogate model via word masking perturbations
- SHAP — Shapley value attribution (KernelSHAP)
- Integrated Gradients (Captum) — Gradient-based attribution along input-to-baseline path
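The word-masking idea behind the LIME-style explanations can be illustrated with a toy occlusion attribution. The scoring function below is a hypothetical stand-in for the classifier's class probability; real LIME fits a local surrogate over many random maskings rather than masking one word at a time.

```python
# Toy occlusion-style attribution: mask one word at a time and measure how
# much a (hypothetical) class score drops. A large drop means the word
# mattered for the prediction.

def toy_score(words):
    # Hypothetical stand-in for the model's OFFENSIVE-class probability:
    # here it simply measures the density of a tiny "offensive" lexicon.
    lexicon = {"घृणा"}  # "hatred"
    return sum(1.0 for w in words if w in lexicon) / max(len(words), 1)

def occlusion_attribution(text):
    words = text.split()
    base = toy_score(words)
    # Attribution of word i = score drop when word i is removed.
    return {
        w: base - toy_score(words[:i] + words[i + 1:])
        for i, w in enumerate(words)
    }

attr = occlusion_attribution("तिमी देखी घृणा लाग्छ")
print(max(attr, key=attr.get))  # the lexicon word gets the largest attribution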
## Citation
If you use this model, please cite the original dataset paper:

```bibtex
@inproceedings{niraula2021offensive,
  title={Offensive Language Detection in Nepali Social Media},
  author={Niraula, Nobal B. and Dulal, Saurav and Koirala, Diwa},
  booktitle={Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)},
  pages={67--75},
  year={2021}
}
```
## Authors

Uddav Rajbhandari
Department of Computer and Electronics Engineering, Khwopa College of Engineering, Tribhuvan University, Nepal (2026)