surrogate-xlmroberta-malayalam-fakenews
Fine-tuned xlm-roberta-base for binary fake news classification on Malayalam text.
- Label 0 → FAKE
- Label 1 → REAL
Training data
DravidianLangTech Malayalam Fake News dataset (ma_fake.csv + ma_true.csv),
split 80/10/10 into train/validation/test sets with stratification.
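The splitting code itself is not part of this card, so the following is only a minimal sketch of what an 80/10/10 stratified split looks like. The helper name `stratified_split`, the seed, and the toy labels are assumptions made for illustration. In practice scikit-learn's `train_test_split` with the `stratify` argument is the more common tool.

```python
import random
from collections import defaultdict

def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=42):
    """Return (train, val, test) index lists that keep per-label proportions.

    Hypothetical helper for illustration; not the code used to train this model.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)  # shuffle within each class before slicing
        n_train = int(len(indices) * ratios[0])
        n_val = int(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Toy labels standing in for the real data: 0 = FAKE, 1 = REAL.
labels = [0] * 50 + [1] * 50
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the split is done per class, each subset keeps the same FAKE/REAL ratio as the full dataset.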
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
tokenizer = AutoTokenizer.from_pretrained("computerboy410/Roberta_Malayalam")
model = AutoModelForSequenceClassification.from_pretrained("computerboy410/Roberta_Malayalam")
model.eval()
text = "ഇത് ഒരു പ്രധാനപ്പെട്ട വാർത്തയാണ്"
enc = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():  # gradients are not needed for inference
    logits = model(**enc).logits
pred = model.config.id2label[logits.argmax(dim=-1).item()]
print(pred)  # "FAKE" or "REAL"
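If a confidence score is wanted alongside the label, the two logits can be passed through a softmax. The sketch below uses plain Python so it runs without the model. The `decode` helper and the example logits are made up for illustration, and `ID2LABEL` simply restates the mapping given in this card. With the real model, something like `logits[0].tolist()` from the snippet above would supply the input.

```python
import math

# Label mapping as stated in this card (assumed to match model.config.id2label).
ID2LABEL = {0: "FAKE", 1: "REAL"}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

# Made-up logits for illustration, not real model output.
label, confidence = decode([2.0, -1.0])
print(label, round(confidence, 3))
```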
Purpose
Used as the surrogate classifier in an adversarial paraphrase attack pipeline for fake news detection research (thesis project).