language:
  - ml
tags:
  - text-classification
  - fake-news-detection
  - malayalam
  - xlm-roberta
license: apache-2.0

surrogate-xlmroberta-malayalam-fakenews

Fine-tuned xlm-roberta-base for binary fake news classification on Malayalam text.

Label 0 → FAKE
Label 1 → REAL

Training data

DravidianLangTech Malayalam Fake News dataset (ma_fake.csv + ma_true.csv), 80/10/10 stratified train/val/test split.
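
The split described above can be illustrated with a minimal, dependency-free sketch. This is not the author's preprocessing code: the `stratified_split` helper, the seed, and the toy label list are hypothetical stand-ins, and the real pipeline may well have used a library routine instead. It only shows what "80/10/10 stratified" means: each class is shuffled and divided separately, so the FAKE/REAL ratio is preserved in every subset.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=42):
    """Return (train, val, test) index lists with per-class proportions preserved.

    Hypothetical helper for illustration; not taken from the model's training code.
    """
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * fractions[0])
        n_val = int(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy labels standing in for the real data: 0 = FAKE, 1 = REAL.
labels = [0] * 50 + [1] * 50
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

With real data, `labels` would be the label column obtained after concatenating the rows of ma_fake.csv and ma_true.csv.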

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("computerboy410/Roberta_Malayalam")
model     = AutoModelForSequenceClassification.from_pretrained("computerboy410/Roberta_Malayalam")
model.eval()

text = "ഇത് ഒരു പ്രധാനപ്പെട്ട വാർത്തയാണ്"
enc  = tokenizer(text, return_tensors="pt", truncation=True)
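with torch.no_grad():  # inference only, no gradients needed
    logits = model(**enc).logits  # shape (1, 2): one row, two classes
# Index of the highest logit, mapped to its label name via the model config
pred = model.config.id2label[logits.argmax(dim=-1).item()]
print(pred)  # "FAKE" or "REAL"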
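
If a confidence score is wanted alongside the label, the logits can be passed through a softmax. The sketch below is an illustrative assumption rather than part of the original usage: `label_with_confidence` is a hypothetical helper, the logit values are made up, and the mapping is hard-coded from the label table in this card. In the snippet above, the input row could be obtained with `logits[0].tolist()`.

```python
import math

# Label mapping as stated in this card (0 -> FAKE, 1 -> REAL).
ID2LABEL = {0: "FAKE", 1: "REAL"}


def label_with_confidence(logits_row, id2label=ID2LABEL):
    """Turn one row of raw logits into (label, softmax probability).

    Hypothetical helper for illustration only.
    """
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits_row)
    exps = [math.exp(x - m) for x in logits_row]
    total = sum(exps)
    probs = [e / total for e in exps]

    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Made-up logits standing in for model output.
label, confidence = label_with_confidence([-1.2, 2.3])
print(label, round(confidence, 3))
```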

Purpose

This model serves as the surrogate classifier in an adversarial paraphrase attack pipeline for fake news detection research (thesis project).