Roberta_Malayalam / README.md
metadata
language:
  - ml
tags:
  - text-classification
  - fake-news-detection
  - malayalam
  - xlm-roberta
license: apache-2.0

surrogate-xlmroberta-malayalam-fakenews

Fine-tuned xlm-roberta-base for binary fake news classification on Malayalam text.

  • Label 0 → FAKE
  • Label 1 → REAL

Training data

DravidianLangTech Malayalam Fake News dataset (ma_fake.csv + ma_true.csv), split 80/10/10 into train/validation/test sets, stratified by label.
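The 80/10/10 stratified split can be sketched as follows. This is a minimal illustration, not the script used to train this model: the helper name `stratified_split`, the seed, and the toy labels are assumptions, and the real CSV column layout is not specified in this card. A common alternative is scikit-learn's `train_test_split` with its `stratify` argument.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=42):
    """Return (train, val, test) index lists that preserve per-label proportions.

    Hypothetical helper for illustration; not taken from the model's training code.
    """
    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        # Slice each label group separately so every split keeps the class ratio.
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test

# Toy labels standing in for the dataset: 0 = FAKE, 1 = REAL.
labels = [0] * 50 + [1] * 50
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 80 10 10
```

Splitting inside each label group, rather than over the shuffled dataset as a whole, is what keeps the FAKE/REAL ratio the same across train, validation, and test.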

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned checkpoint and its tokenizer from the Hub.
tokenizer = AutoTokenizer.from_pretrained("computerboy410/Roberta_Malayalam")
model = AutoModelForSequenceClassification.from_pretrained("computerboy410/Roberta_Malayalam")
model.eval()  # disable dropout for inference

text = "ഇത് ഒരു പ്രധാനപ്പെട്ട വാർത്തയാണ്"  # "This is an important piece of news"
enc = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**enc).logits  # shape: (1, 2)

# Take the argmax over the class dimension: index 0 = FAKE, 1 = REAL.
pred_id = logits.argmax(dim=-1).item()
pred = model.config.id2label[pred_id]
print(pred)  # "FAKE" or "REAL"
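If a confidence score is wanted alongside the label, the raw logits can be passed through a softmax. The sketch below is an illustration only: `decode_logits` and `ID2LABEL` are hypothetical names, the mapping is copied from the label list in this card, and the example logits are made up. An explicit mapping also helps if the checkpoint's config does not define custom label names, in which case `id2label` falls back to the generic `LABEL_0` / `LABEL_1`.

```python
import math

# Mapping copied from the label list in this card (assumed to match the checkpoint).
ID2LABEL = {0: "FAKE", 1: "REAL"}

def decode_logits(logits):
    """Turn one row of raw logits into (label, probability) with a stable softmax."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

# Made-up logits for illustration, not real model output.
label, prob = decode_logits([2.0, -1.0])
print(label, round(prob, 3))  # FAKE 0.953
```

With the usage snippet, the input would be `logits[0].tolist()`.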

Purpose

This model serves as the surrogate classifier in an adversarial paraphrase attack pipeline, built for a thesis project on fake news detection research.