XLM-RoBERTa Large — Nepali Hate Content Classification

Fine-tuned XLM-RoBERTa Large for multi-class hate content classification of Nepali social media text. The model handles Devanagari script, Romanized Nepali, English, and code-mixed inputs through a comprehensive preprocessing pipeline.

Model Description

This model was developed as part of a Bachelor of Computer Engineering final project at Khwopa College of Engineering, Tribhuvan University (February 2026). It classifies Nepali social media comments into four categories that distinguish different types of offensive content.

Base model: FacebookAI/xlm-roberta-large (560M parameters, 24 transformer layers, pre-trained on 2.5TB CommonCrawl across 100 languages)

Task: Multi-class text classification (4 classes)

Languages: Nepali (Devanagari + Romanized), English, code-mixed


Labels

| ID | Label | Description |
|----|-------|-------------|
| 0 | NON_OFFENSIVE | Text containing no offensive content |
| 1 | OTHER_OFFENSIVE | General offensive content not targeting specific groups |
| 2 | OFFENSIVE_RACIST | Content targeting individuals/groups based on ethnicity, race, or caste |
| 3 | OFFENSIVE_SEXIST | Content targeting individuals based on gender |

Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="UDHOV/xlm-roberta-large-nepali-hate-classification"
)

# Devanagari input ("this is good")
classifier("यो राम्रो छ")

# Romanized Nepali input (same sentence)
classifier("yo ramro cha")
```

Or manually:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("UDHOV/xlm-roberta-large-nepali-hate-classification")
model = AutoModelForSequenceClassification.from_pretrained("UDHOV/xlm-roberta-large-nepali-hate-classification")

text = "तिमी देखी घृणा लाग्छ"  # "I feel hatred toward you"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_class])
```
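If you want class probabilities rather than just the argmax, apply a softmax over the four class logits. The sketch below shows the arithmetic in plain Python with hypothetical logits (not actual model outputs); with the code above you would simply use `torch.softmax(logits, dim=-1)`.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the 4 classes (NO, OO, OR, OS)
logits = [2.1, 0.3, -1.0, -0.5]
probs = softmax(logits)
print([round(p, 3) for p in probs])
```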

Preprocessing Pipeline

The model was trained on text processed through a 5-stage pipeline:

  1. Script Detection — Unicode-based confidence scoring to classify input as Devanagari, Romanized Nepali, or English
  2. Script Unification — Romanized Nepali transliterated to Devanagari via ITRANS; English translated to Nepali via Deep Translator API
  3. Emoji Processing — 180+ emojis semantically mapped to Nepali equivalents; unknown emojis preserved; 18-dimensional emoji feature vector extracted
  4. Text Cleaning — URL removal, @mention removal, hashtag handling, whitespace normalization
  5. Feature Extraction — Script metadata, emoji features, and text statistics merged with cleaned text

Note: For best results, apply the same preprocessing before inference. Raw text is also accepted but may slightly reduce performance on heavily Romanized or emoji-rich inputs.
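Stage 1 of the pipeline can be sketched as a Unicode-range confidence score. This is a simplified illustration, not the actual implementation: the real pipeline's thresholds and its Romanized-Nepali-vs-English disambiguation are not specified in this card.

```python
def devanagari_ratio(text):
    """Fraction of alphabetic characters in the Devanagari block (U+0900-U+097F)."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(1 for c in letters if "\u0900" <= c <= "\u097f") / len(letters)

def detect_script(text, threshold=0.5):
    # Romanized Nepali and English cannot be separated by Unicode ranges alone;
    # the real pipeline relies on additional cues for that distinction.
    return "devanagari" if devanagari_ratio(text) >= threshold else "latin"

print(detect_script("यो राम्रो छ"))   # devanagari
print(detect_script("yo ramro cha"))  # latin
```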


Training Data

  • Source: Niraula et al. (2021) — Offensive Language Detection in Nepali Social Media (ACL Anthology)
  • Platform: Facebook and YouTube comments
  • Total samples: 7,625
(NO = NON_OFFENSIVE, OO = OTHER_OFFENSIVE, OR = OFFENSIVE_RACIST, OS = OFFENSIVE_SEXIST)

| Split | NO | OO | OR | OS | Total |
|-------|----|----|----|----|-------|
| Train | 3,206 (57.7%) | 1,759 (31.6%) | 376 (6.8%) | 214 (3.8%) | 5,555 |
| Validation | 356 (57.5%) | 195 (31.5%) | 42 (6.8%) | 27 (4.4%) | 620 |
| Test | 896 (62.1%) | 486 (33.7%) | 49 (3.4%) | 19 (1.3%) | 1,450 |

Class imbalance: the NON_OFFENSIVE to OFFENSIVE_SEXIST ratio in the training split is 14.98× (3,206 vs 214 samples). This was addressed via class-weighted cross-entropy loss.
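The exact weighting scheme is not stated in this card; a common choice is inverse-frequency "balanced" weights, `weight_c = N / (num_classes × count_c)`, which the sketch below computes for the training split. The resulting values would then be passed as the `weight` tensor to a weighted cross-entropy loss.

```python
# Sketch: "balanced" inverse-frequency class weights for the training split.
# Assumes weight_c = N / (num_classes * count_c); the card does not state
# the exact formula used in training.
train_counts = {
    "NON_OFFENSIVE": 3206,
    "OTHER_OFFENSIVE": 1759,
    "OFFENSIVE_RACIST": 376,
    "OFFENSIVE_SEXIST": 214,
}

total = sum(train_counts.values())   # 5,555 training samples
num_classes = len(train_counts)      # 4 classes

weights = {label: total / (num_classes * count)
           for label, count in train_counts.items()}

for label, w in weights.items():
    print(f"{label}: {w:.3f}")   # minority classes receive the largest weights
```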


Training Configuration

| Hyperparameter | Value |
|----------------|-------|
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup steps | 10% of total steps |
| LR schedule | Linear decay |
| Batch size | 16 (grad accum × 2 = effective 32) |
| Max epochs | 5 |
| Early stopping patience | 2 epochs |
| Max sequence length | 128 tokens |
| Dropout (classification head) | 0.1 |
| Gradient clipping | 1.0 |
| Mixed precision | FP16 |
| Loss | Class-weighted cross-entropy |
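Given these settings, the warmup length works out as follows. This is a sketch assuming the 5,555-sample training split, optimizer steps counted after gradient accumulation, and a schedule configured over the full 5 epochs (early stopping may end training before all steps are taken).

```python
import math

train_samples = 5555
effective_batch = 32   # batch size 16 x gradient accumulation 2
epochs = 5

steps_per_epoch = math.ceil(train_samples / effective_batch)  # optimizer steps per epoch
total_steps = steps_per_epoch * epochs
warmup_steps = int(0.10 * total_steps)                        # 10% of total steps

print(steps_per_epoch, total_steps, warmup_steps)
```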

Training took approximately 2 hours on a single GPU.


Evaluation Results

Test Set Performance

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| NON_OFFENSIVE | 0.8344 | 0.7366 | 0.7825 | 896 |
| OTHER_OFFENSIVE | 0.5949 | 0.6708 | 0.6306 | 486 |
| OFFENSIVE_RACIST | 0.2941 | 0.5102 | 0.3731 | 49 |
| OFFENSIVE_SEXIST | 0.3462 | 0.4737 | 0.4000 | 19 |
| Macro Avg | 0.5174 | 0.5978 | 0.5465 | 1,450 |
| Weighted Avg | 0.7295 | 0.7034 | 0.7127 | 1,450 |
| Accuracy | — | — | 0.7034 | 1,450 |

Validation Set Performance (Best Checkpoint)

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| NON_OFFENSIVE | 0.8360 | 0.7444 | 0.7875 | 356 |
| OTHER_OFFENSIVE | 0.6296 | 0.6974 | 0.6618 | 195 |
| OFFENSIVE_RACIST | 0.6250 | 0.8333 | 0.7143 | 42 |
| OFFENSIVE_SEXIST | 0.7419 | 0.8519 | 0.7931 | 27 |
| Macro Avg | 0.7081 | 0.7818 | 0.7392 | 620 |

Primary metric: Macro F1-score. Accuracy is misleading given class imbalance; macro F1 weights all classes equally, making it the appropriate metric for evaluating minority hate class performance.
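To illustrate why the two metrics diverge under imbalance, here is a minimal per-class F1 and macro-F1 computation on a toy prediction set (toy labels for illustration, not the model's actual outputs):

```python
def per_class_f1(y_true, y_pred, label):
    """Binary F1 for one class in a multi-class prediction list."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy data: 8 majority-class samples, 2 minority-class samples.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 3, 3]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 3]   # one minority sample missed

labels = sorted(set(y_true))
macro_f1 = sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Accuracy looks strong (0.90) while macro F1 is pulled down by the minority class.
print(f"accuracy={accuracy:.2f}  macro_f1={macro_f1:.2f}")
```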


Comparison with Other Models

| Approach | Model | Accuracy | Macro F1 |
|----------|-------|----------|----------|
| Classical ML | Logistic Regression (TF-IDF) | 0.7538 | 0.5701 |
| Classical ML | SVM | 0.7552 | 0.5502 |
| Deep Learning | GRU + Word2Vec | — | 0.3307 (test) |
| Transformer | XLM-RoBERTa Large (this model) | 0.7034 | 0.5465 |
| Transformer | NepaliBERT | 0.6972 | 0.5126 |

Logistic Regression achieves a marginally higher macro F1 (0.5701) due to better generalization on the OR class given the small test set size (49 samples). XLM-RoBERTa achieves the best OS class F1 (0.4000) among all models.


Limitations

  • Minority class performance: OR and OS classes have low test support (49 and 19 samples respectively), and both exhibit significant train-test lexical shift (~33–36% keyword overlap), limiting generalization.
  • Distributional shift: The OR class shows a higher proportion of Romanized script in the test set (59.2%) compared to training (46.1%), contributing to lower OR test performance.
  • OS class fragility: With only 19 OS test samples and high length mismatch between train (avg 13.1 words) and test (avg 19.9 words), OS results should be interpreted cautiously.
  • Preprocessing dependency: Performance may degrade on raw text without the preprocessing pipeline, especially for heavily Romanized or emoji-rich content.
  • Language scope: Primarily optimized for Nepali. Performance on other low-resource South Asian languages is not evaluated.

Intended Use

  • Automated hate content moderation on Nepali social media platforms (Facebook, YouTube, Twitter/X)
  • Research on low-resource language NLP and hate speech detection
  • Explainable AI integration — this model was evaluated with LIME, SHAP, and Captum-based Integrated Gradients for token-level attribution

Out-of-scope uses: This model should not be used as the sole decision-making system for content removal without human review. Predictions on minority classes (OR, OS) carry higher uncertainty.


Explainability

The deployment system integrates three complementary XAI methods for token-level explanation of predictions:

  • LIME — Local surrogate model via word masking perturbations
  • SHAP — Shapley value attribution (KernelSHAP)
  • Integrated Gradients (Captum) — Gradient-based attribution along input-to-baseline path
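The word-masking perturbation idea that underlies LIME-style explanations can be sketched with a toy scorer. The `toy_score` function below is a hypothetical stand-in for the classifier's probability of the predicted class (not the model's actual API), and the attribution here is simple occlusion (score drop per masked word), not LIME's full surrogate-model fit.

```python
def occlusion_attribution(text, score_fn):
    """Score drop when each word is masked: larger drop = more influential word."""
    words = text.split()
    base = score_fn(text)
    attributions = []
    for i in range(len(words)):
        masked = " ".join(words[:i] + ["[MASK]"] + words[i + 1:])
        attributions.append((words[i], base - score_fn(masked)))
    return attributions

# Hypothetical scorer: pretend the word "hate" drives the offensive probability.
def toy_score(text):
    return 0.9 if "hate" in text.split() else 0.1

print(occlusion_attribution("i hate you", toy_score))
```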

Citation

If you use this model, please cite the original dataset:

@inproceedings{niraula2021offensive,
  title={Offensive Language Detection in Nepali Social Media},
  author={Niraula, Nobal B. and Dulal, Saurav and Koirala, Diwa},
  booktitle={Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)},
  pages={67--75},
  year={2021}
}

Authors

Uddav Rajbhandari

Department of Computer and Electronics Engineering, Khwopa College of Engineering, Tribhuvan University, Nepal (2026)
