---
license: apache-2.0
language: en
library_name: transformers
pipeline_tag: text-classification
tags:
- bert
- text-classification
- tweet-classification
- meme-detection
- event-detection
---
# Meme vs Real Event Tweet Classifier
Fine-tuned `bert-base-uncased` that classifies a tweet as either a **meme /
low-signal cultural post** or a **real-world event** (breaking news,
infrastructure outages, disasters, politics, etc.).
- **Base model:** `bert-base-uncased`
- **Task:** binary sequence classification
- **Labels:** `0 = meme`, `1 = real_event`
- **Max sequence length:** 128 tokens
- **Preprocessing:** lowercase, strip URLs / mentions / hashtags / non-word chars
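
The preprocessing step listed above could look roughly like the sketch below. The function name `clean_tweet` and the exact regular expressions are assumptions for illustration, not the patterns used in training. In particular, this version drops hashtags entirely rather than keeping the tag text, and replaces punctuation with spaces, so `6.5` becomes `6 5`.

```python
import re

# Hypothetical patterns; the original notebook may use different ones.
_URL = re.compile(r"https?://\S+|www\.\S+")
_MENTION = re.compile(r"@\w+")
_HASHTAG = re.compile(r"#\w+")
_NON_WORD = re.compile(r"[^\w\s]")
_WHITESPACE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, hashtags, and non-word chars."""
    text = text.lower()
    text = _URL.sub(" ", text)       # remove links first, before punctuation is touched
    text = _MENTION.sub(" ", text)   # remove @user handles
    text = _HASHTAG.sub(" ", text)   # remove #tags (assumption: the whole tag is dropped)
    text = _NON_WORD.sub(" ", text)  # replace remaining punctuation with spaces
    return _WHITESPACE.sub(" ", text).strip()


cleaned = clean_tweet("Massive 6.5 EARTHQUAKE in Istanbul!! @user #quake https://t.co/abc")
print(cleaned)
```

Running the same cleaning on inputs at inference time should keep them closer to what the model saw during fine-tuning.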
## Quick start
```python
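import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo = "Aryan047/Dynamic-event-detector"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo).eval()

# Training text was lowercased and stripped of URLs, mentions, hashtags, and
# non-word characters; cleaning inputs the same way keeps them in distribution.
text = "Massive 6.5 earthquake just rocked Istanbul, buildings swaying"
enc = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")

with torch.no_grad():  # inference only, so skip gradient tracking
    logits = model(**enc).logits[0]
probs = F.softmax(logits, dim=-1).tolist()
print({"meme": probs[0], "real_event": probs[1]})  # index 0 = meme, 1 = real_event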
```
## Training pipeline
Clusters of tweets were auto-labeled against the GDELT DOC 2.0 API using a
lifespan-aware heuristic, then BERT was fine-tuned on an 80/20 split. See the
companion notebook `meme_vs_event_classifier.ipynb` for the full pipeline.
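
As a rough illustration of the 80/20 split, here is a minimal sketch in plain Python. It assumes the split is stratified by label and made at the tweet level; neither detail is confirmed here, and the notebook may split differently (for example, by cluster). The helper name `stratified_split`, the seed, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(examples, train_frac=0.8, seed=42):
    """Split (text, label) pairs into train and held-out lists, label by label."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, held_out = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        cut = int(round(len(items) * train_frac))
        train.extend(items[:cut])      # first 80% of each class
        held_out.extend(items[cut:])   # remaining 20% of each class
    rng.shuffle(train)  # mix the classes in the training list
    return train, held_out


# Toy data: 10 examples per class, labels follow 0 = meme, 1 = real_event.
examples = [(f"meme tweet {i}", 0) for i in range(10)]
examples += [(f"event tweet {i}", 1) for i in range(10)]

train, held_out = stratified_split(examples)
print(len(train), len(held_out))
```

Stratifying keeps the meme and real-event proportions the same in both partitions, which matters if the auto-labeled classes are imbalanced.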