---
language:
- ml
tags:
- text-classification
- fake-news-detection
- malayalam
- xlm-roberta
license: apache-2.0
---

# surrogate-xlmroberta-malayalam-fakenews

Fine-tuned `xlm-roberta-base` for **binary fake news classification** on Malayalam text.

- **Label 0** → FAKE
- **Label 1** → REAL

## Training data

DravidianLangTech Malayalam Fake News dataset (`ma_fake.csv` + `ma_true.csv`), 80/10/10 stratified train/val/test split.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("computerboy410/Roberta_Malayalam")
model = AutoModelForSequenceClassification.from_pretrained("computerboy410/Roberta_Malayalam")
model.eval()

text = "ഇത് ഒരു പ്രധാനപ്പെട്ട വാർത്തയാണ്"
enc = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**enc).logits
pred = model.config.id2label[logits.argmax().item()]
print(pred)  # "FAKE" or "REAL"
```

## Purpose

Used as the surrogate classifier in an adversarial paraphrase attack pipeline for fake news detection research (thesis project).
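
## Reading confidence scores

The usage snippet returns only the predicted label. When a per-class confidence is also useful (for example, to see how close a borderline input is to flipping), the two raw logits can be passed through a softmax. The following is a minimal sketch rather than part of the released code: the helper names `logits_to_probs`, `label_scores`, and `predict` are hypothetical, and the `ID2LABEL` map is hard-coded from the label list in this card instead of being read from `model.config.id2label`.

```python
import math

# Label map as stated in this card: 0 -> FAKE, 1 -> REAL.
# Assumption: hard-coded here for illustration.
ID2LABEL = {0: "FAKE", 1: "REAL"}


def logits_to_probs(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_scores(logits):
    """Map each class name to its softmax probability."""
    probs = logits_to_probs(logits)
    return {ID2LABEL[i]: p for i, p in enumerate(probs)}


def predict(logits):
    """Return (label, probability) for the highest-scoring class."""
    scores = label_scores(logits)
    label = max(scores, key=scores.get)
    return label, scores[label]


# Illustrative logits only, not real model output.
label, confidence = predict([2.0, 0.0])
print(label, round(confidence, 3))
```

With the usage snippet, the input list would come from `logits[0].tolist()`, since the model returns one row of two logits per input text.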