license: apache-2.0
language: en
library_name: transformers
pipeline_tag: text-classification
tags:
  - bert
  - text-classification
  - tweet-classification
  - meme-detection
  - event-detection

Meme vs Real Event Tweet Classifier

A fine-tuned bert-base-uncased model that classifies a tweet as either a meme / low-signal cultural post or a real-world event (breaking news, infrastructure outages, disasters, politics, etc.).

Base model: bert-base-uncased
Task: binary sequence classification
Labels: 0 = meme, 1 = real_event
Max sequence length: 128 tokens
Preprocessing: lowercase, strip URLs / mentions / hashtags / non-word chars
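The preprocessing step listed above could look roughly like the sketch below. The exact rules used in training are not given here, so the regular expressions, the choice to drop whole hashtags (rather than only the `#` sign), and the function name `clean_tweet` are all assumptions for illustration.

```python
import re

# Assumed patterns; the training notebook may use different ones.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NON_WORD_RE = re.compile(r"[^\w\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, hashtags and non-word characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)       # remove links before punctuation is stripped
    text = MENTION_RE.sub(" ", text)   # remove @user mentions
    text = HASHTAG_RE.sub(" ", text)   # remove whole hashtags (assumption)
    text = NON_WORD_RE.sub(" ", text)  # replace remaining punctuation/symbols with spaces
    return WHITESPACE_RE.sub(" ", text).strip()  # collapse repeated whitespace


if __name__ == "__main__":
    print(clean_tweet("Massive QUAKE in Istanbul!! @user #earthquake https://t.co/abc"))
```

If the model was trained on text cleaned this way, applying the same cleaning before tokenization at inference time should keep inputs closer to the training distribution.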

Quick start

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

repo = "Aryan047/Dynamic-event-detector"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo).eval()  # eval() disables dropout

text = "Massive 6.5 earthquake just rocked Istanbul, buildings swaying"
enc = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")

# Inference only, so skip gradient tracking
with torch.no_grad():
    logits = model(**enc).logits[0]

# Index 0 = meme, index 1 = real_event
probs = F.softmax(logits, dim=-1).tolist()
print({"meme": probs[0], "real_event": probs[1]})

Training pipeline

Clusters of tweets were auto-labeled against the GDELT DOC 2.0 API using a lifespan-aware heuristic, then BERT was fine-tuned on an 80/20 split. See the companion notebook meme_vs_event_classifier.ipynb for the full pipeline.
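The 80/20 split mentioned above could be reproduced along the lines of the sketch below. The notebook's actual splitting code is not shown here, so stratifying by label, the fixed seed, and the helper name `stratified_split` are assumptions, not the author's method.

```python
import random
from collections import defaultdict


def stratified_split(examples, train_frac=0.8, seed=42):
    """Split (text, label) pairs into two parts, keeping label ratios similar in both."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # local RNG so the split is reproducible
    train, held_out = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        cut = round(len(items) * train_frac)  # number of items kept for training
        train.extend(items[:cut])
        held_out.extend(items[cut:])

    # Shuffle again so the classes are interleaved rather than grouped
    rng.shuffle(train)
    rng.shuffle(held_out)
    return train, held_out


if __name__ == "__main__":
    # Toy data: 0 = meme, 1 = real_event
    data = [(f"tweet {i}", i % 2) for i in range(100)]
    train, held_out = stratified_split(data)
    print(len(train), len(held_out))
```

Stratifying is one reasonable choice for a binary task where the auto-labeled classes may be imbalanced; a plain random split would also yield an 80/20 partition.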