surrogate-xlmroberta-malayalam-fakenews

Fine-tuned xlm-roberta-base for binary fake news classification on Malayalam text.

Label 0 → FAKE
Label 1 → REAL

Training data

DravidianLangTech Malayalam Fake News dataset (ma_fake.csv + ma_true.csv), 80/10/10 stratified train/val/test split.
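The card does not say which tool produced the split, so the following is only a minimal sketch of an 80/10/10 stratified split using the Python standard library. The function name `stratified_split`, the seed, and the toy label list are assumptions for illustration; real labels would come from ma_fake.csv (0 = FAKE) and ma_true.csv (1 = REAL).

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=42):
    """Return (train, val, test) index lists that keep class proportions."""
    rng = random.Random(seed)

    # Group example indices by class so each class is split separately.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(round(ratios[0] * len(indices)))
        n_val = int(round(ratios[1] * len(indices)))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy labels standing in for the real dataset (hypothetical sizes).
labels = [0] * 50 + [1] * 50
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting per class before merging is what keeps the FAKE/REAL ratio the same in all three subsets.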

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("computerboy410/Roberta_Malayalam")
model     = AutoModelForSequenceClassification.from_pretrained("computerboy410/Roberta_Malayalam")
model.eval()

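text = "ഇത് ഒരു പ്രധാനപ്പെട്ട വാർത്തയാണ്"
# Tokenize one sentence; truncation caps the input at the model's maximum length.
enc = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**enc).logits
# Take the argmax over the class dimension and map the index to its label name.
pred = model.config.id2label[logits.argmax(dim=-1).item()]
print(pred)  # "FAKE" or "REAL"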
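The snippet above prints only the top label. If a confidence score is also wanted, one option is to apply a softmax to the logits. The sketch below does this in plain Python so it runs without downloading the model; `predict_with_confidence` and the example logits are hypothetical, and the label mapping is the one stated in this card.

```python
import math

# Label mapping as stated in this card.
ID2LABEL = {0: "FAKE", 1: "REAL"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits):
    """Return the predicted label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one input; with the model loaded, these could be
# taken from logits[0].tolist() in the usage snippet.
example_logits = [2.0, -1.0]
label, confidence = predict_with_confidence(example_logits)
print(label, round(confidence, 3))
```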

Purpose

This model serves as the surrogate classifier in an adversarial paraphrase attack pipeline, built for fake news detection research as part of a thesis project.

Safetensors

Model size: 0.3B params
Tensor type: F32