---
language:
- ml
tags:
- text-classification
- fake-news-detection
- malayalam
- xlm-roberta
license: apache-2.0
---

# surrogate-xlmroberta-malayalam-fakenews

Fine-tuned `xlm-roberta-base` for **binary fake news classification** on Malayalam text.

- **Label 0** → FAKE
- **Label 1** → REAL

## Training data

DravidianLangTech Malayalam Fake News dataset (`ma_fake.csv` + `ma_true.csv`),
split 80/10/10 into stratified train/validation/test sets.
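
A stratified 80/10/10 split keeps the FAKE/REAL ratio the same in each subset. The sketch below is a dependency-free illustration of that idea, not the script used to build this model's splits: the helper name `stratified_split`, the seed, and the toy label counts are all assumptions. In practice, scikit-learn's `train_test_split` with its `stratify=` argument, applied twice, gives the same kind of split.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=42):
    """Return (train, val, test) index lists that preserve class proportions.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * ratios[0])
        n_val = int(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to test
    return train, val, test


# Toy labels: 0 = FAKE, 1 = REAL (counts are made up for illustration).
labels = [0] * 50 + [1] * 50
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Each class is shuffled and cut separately, so a class that makes up half of the data also makes up half of every subset.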

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned tokenizer and classifier from the Hub.
tokenizer = AutoTokenizer.from_pretrained("computerboy410/Roberta_Malayalam")
model = AutoModelForSequenceClassification.from_pretrained("computerboy410/Roberta_Malayalam")
model.eval()  # disable dropout for inference

text = "ഇത് ഒരു പ്രധാനപ്പെട്ട വാർത്തയാണ്"
enc = tokenizer(text, return_tensors="pt", truncation=True)

# No gradients are needed at inference time.
with torch.no_grad():
    logits = model(**enc).logits

# Map the highest-scoring class index to its label name.
pred = model.config.id2label[logits.argmax().item()]
print(pred)  # "FAKE" or "REAL"
```
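
The usage snippet reports only the predicted label. When a confidence score is also useful, the two logits can be passed through a softmax. The following is a minimal, dependency-free sketch of that step: the helper name `logits_to_prediction` and the example logits are made up, and the label mapping is the one stated in this card. On the tensor returned by the model, `torch.softmax(logits, dim=-1)` computes the same probabilities.

```python
import math

# Label mapping as stated in this card.
ID2LABEL = {0: "FAKE", 1: "REAL"}


def logits_to_prediction(logits):
    """Apply a softmax to a list of logits and return (label, probability).

    Hypothetical helper for illustration only.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits, standing in for `logits[0].tolist()` from the model.
label, prob = logits_to_prediction([2.0, -1.0])
print(label, round(prob, 3))
```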

## Purpose

This model serves as the surrogate classifier in an adversarial paraphrase attack pipeline
for fake news detection research (thesis project).