---
language:
- ml
tags:
- text-classification
- fake-news-detection
- malayalam
- xlm-roberta
license: apache-2.0
---
# surrogate-xlmroberta-malayalam-fakenews
Fine-tuned `xlm-roberta-base` for **binary fake news classification** on Malayalam text.
- **Label 0** → FAKE
- **Label 1** → REAL
## Training data
DravidianLangTech Malayalam Fake News dataset (`ma_fake.csv` + `ma_true.csv`),
80/10/10 stratified train/val/test split.
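The split described above can be sketched as follows. This is a minimal standard-library illustration, not the script used to train this model: the helper name `stratified_split`, the seed, and the toy labels are all hypothetical, and it assumes rows from `ma_fake.csv` are labelled 0 and rows from `ma_true.csv` are labelled 1, matching the label mapping above.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=42):
    """Return (train, val, test) index lists that keep per-class proportions."""
    rng = random.Random(seed)

    # Group example indices by class so each class is split independently.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Toy labels standing in for the real data (assumption: 0 = FAKE, 1 = REAL).
labels = [0] * 50 + [1] * 50
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

In practice, scikit-learn's `train_test_split` with the `stratify` argument, applied twice, is a common way to get the same kind of split.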
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("computerboy410/Roberta_Malayalam")
model = AutoModelForSequenceClassification.from_pretrained("computerboy410/Roberta_Malayalam")
model.eval()  # disable dropout for inference

text = "ഇത് ഒരു പ്രധാനപ്പെട്ട വാർത്തയാണ്"  # "This is an important piece of news"
enc = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():  # gradients are not needed at inference time
    logits = model(**enc).logits

# Map the highest-scoring class index to its label name
pred = model.config.id2label[logits.argmax(dim=-1).item()]
print(pred)  # "FAKE" or "REAL"
```
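If a confidence score is wanted rather than only the top label, the two logits can be passed through a softmax. The sketch below uses only the standard library and hypothetical logit values; the helper `logits_to_probs` and the `ID2LABEL` dictionary are illustrative names, with the label order taken from the mapping stated at the top of this card.

```python
import math

# Label order as stated in this model card: 0 = FAKE, 1 = REAL.
ID2LABEL = {0: "FAKE", 1: "REAL"}

def logits_to_probs(logits):
    """Softmax over a flat list of two logits, keyed by label name."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {ID2LABEL[i]: e / total for i, e in enumerate(exps)}

# Hypothetical logits; with the snippet above they could come from
# logits.squeeze(0).tolist().
probs = logits_to_probs([2.0, -1.0])
print(probs)
```

With PyTorch tensors, `torch.softmax(logits, dim=-1)` computes the same probabilities directly.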
## Purpose
Used as the surrogate classifier in an adversarial paraphrase attack pipeline
for fake news detection research (thesis project).