🌐 XLM-Prohori-v2: Bangla/English SMS Smishing Classifier

Repository: squadgoals404/XLM-Prohori-v2
Base Model: xlm-roberta-base


📌 Overview

XLM-Prohori-v2 is a fine-tuned XLM-RoBERTa-base model for detecting smishing (SMS phishing) in Bangla and English.
It classifies each SMS message into one of three categories:

  • normal → Casual, harmless, informational texts
  • promo → Promotional/advertising messages
  • smish → Smishing (phishing via SMS) attempts

📊 Dataset

  • Total samples (after deduplication): 4,507
  • Languages: Bangla, English, Banglish
  • Labels: balanced across normal, promo, smish
  • Preprocessing: All URLs normalized to [LINK]; duplicates removed; stratified train/val/test split
  • Splits: Train=3064, Val=541, Test=902 (verified zero overlap)
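
The preprocessing steps listed above could be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the URL regex, the helper names (normalize_urls, deduplicate, stratified_split), the seed, and the split fractions are all assumptions. The fractions are only inferred from the reported counts (902/4507 ≈ 0.20 for test, then 541/3605 ≈ 0.15 of the remainder for validation).

```python
import random
import re
from collections import defaultdict

# Assumed URL pattern; the pattern actually used for this dataset is not documented.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def normalize_urls(text):
    """Replace every URL-like token with the literal placeholder [LINK]."""
    return URL_RE.sub("[LINK]", text)


def deduplicate(rows):
    """Drop (text, label) rows whose normalized text was already seen."""
    seen, unique = set(), []
    for text, label in rows:
        norm = normalize_urls(text).strip()
        if norm not in seen:
            seen.add(norm)
            unique.append((norm, label))
    return unique


def stratified_split(rows, test_frac=0.20, val_frac=0.15, seed=42):
    """Split per label so each class keeps a similar share in every split.

    val_frac is applied to what remains after the test rows are taken.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        n_val = round((len(group) - n_test) * val_frac)
        test += group[:n_test]
        val += group[n_test:n_test + n_val]
        train += group[n_test + n_val:]
    return train, val, test
```

Deduplicating after URL normalization means two messages that differ only in their link collapse into one row, which is one plausible way to keep near-identical texts from landing in different splits.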

The raw dataset is not publicly released for privacy reasons. Some synthetic smish examples were included to balance classes.


📈 Performance

  • Validation Accuracy: ~97.60%
  • Test Accuracy: ~97.23%

Confusion matrices indicate generally balanced performance, with minor confusion between promo and smish in link-heavy texts.
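
For readers who want to reproduce this kind of analysis on their own labeled messages, a confusion matrix and accuracy can be computed as sketched below. The function names and the toy labels are illustrative assumptions; the toy data is not the model's output. As a sanity check on the reported figure, if accuracy is computed over all 902 test samples, 97.23% corresponds to roughly 877 correct predictions (877/902 ≈ 0.9723).

```python
CLASS_NAMES = ("normal", "promo", "smish")


def confusion_matrix(y_true, y_pred, labels=CLASS_NAMES):
    """Count predictions; rows are true labels, columns are predicted labels."""
    index = {name: i for i, name in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. correct predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy labels for illustration only; these are not the model's results.
y_true = ["normal", "promo", "promo", "smish", "smish", "smish"]
y_pred = ["normal", "promo", "smish", "smish", "smish", "promo"]
cm = confusion_matrix(y_true, y_pred)
```

With this layout, promo messages predicted as smish are counted in cm[1][2] and smish messages predicted as promo in cm[2][1], which are the two cells to watch for the confusion described above.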


🚀 Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch.nn.functional as F
import torch

model_id = "squadgoals404/XLM-Prohori-v2"
tok = AutoTokenizer.from_pretrained(model_id)
mdl = AutoModelForSequenceClassification.from_pretrained(model_id)

# Force semantic class names in-memory
CLASS_NAMES = ["normal", "promo", "smish"]  # make sure the order matches 0,1,2
mdl.config.id2label = {i: c for i, c in enumerate(CLASS_NAMES)}
mdl.config.label2id = {c: i for i, c in enumerate(CLASS_NAMES)}

text = "Bank Account temporarily locked—identity verify করতে জরুরি কল 017XX-XXXXXX"
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    probs = F.softmax(mdl(**inputs).logits, dim=-1).squeeze().tolist()

print({CLASS_NAMES[i]: round(p, 4) for i, p in enumerate(probs)})
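
To score several messages at once and get a single predicted label per message, something like the following sketch may help. The helpers top_label and classify_batch are hypothetical names, max_length=128 is an assumed limit, and the class order is assumed to be normal, promo, smish for indices 0, 1, 2, as in the snippet above.

```python
def top_label(probs, class_names=("normal", "promo", "smish")):
    """Return (label, probability) for the highest-scoring class."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


def classify_batch(texts, model_id="squadgoals404/XLM-Prohori-v2", max_length=128):
    """Score several messages in one forward pass; one (label, prob) per text."""
    # Imported lazily so top_label stays usable without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id)
    mdl = AutoModelForSequenceClassification.from_pretrained(model_id)
    mdl.eval()

    # padding=True pads to the longest message in the batch;
    # truncation guards against unusually long inputs (max_length is assumed).
    inputs = tok(list(texts), padding=True, truncation=True,
                 max_length=max_length, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(mdl(**inputs).logits, dim=-1).tolist()
    return [top_label(row) for row in probs]
```

Because the training text had URLs replaced with [LINK], applying the same replacement to incoming messages before tokenizing is likely to matter; the exact pattern the authors used is not documented here.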