---
license: apache-2.0
language: en
library_name: transformers
pipeline_tag: text-classification
tags:
  - bert
  - text-classification
  - tweet-classification
  - meme-detection
  - event-detection
---

# Meme vs Real Event Tweet Classifier

Fine-tuned `bert-base-uncased` that classifies a tweet as either a **meme /
low-signal cultural post** or a **real-world event** (breaking news,
infrastructure outages, disasters, politics, etc.).

- **Base model:** `bert-base-uncased`
- **Task:** binary sequence classification
- **Labels:** `0 = meme`, `1 = real_event`
- **Max sequence length:** 128 tokens
- **Preprocessing:** lowercase, strip URLs / mentions / hashtags / non-word chars (see the sketch below)
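
The card does not include the exact cleaning code, so the snippet below is only a minimal sketch of the listed preprocessing steps; the `clean_tweet` name, the regexes, and the choice to drop whole hashtag/mention tokens are illustrative assumptions, not the training implementation.

```python
import re

def clean_tweet(text: str) -> str:
    """Illustrative cleanup matching the steps listed above (not the exact training code)."""
    text = text.lower()                                 # lowercase
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # strip URLs
    text = re.sub(r"[@#]\w+", " ", text)                # strip mentions / hashtags (whole token; an assumption)
    text = re.sub(r"[^\w\s]", " ", text)                # strip non-word characters
    return re.sub(r"\s+", " ", text).strip()            # collapse leftover whitespace

print(clean_tweet("BREAKING: 6.5 quake hits Istanbul!! https://t.co/xyz @news #earthquake"))
# -> "breaking 6 5 quake hits istanbul"
```

If the training data was cleaned this way, applying the same function before tokenization at inference time keeps inputs consistent with training.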

## Quick start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

repo = "Aryan047/Dynamic-event-detector"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo).eval()

text = "Massive 6.5 earthquake just rocked Istanbul, buildings swaying"
enc = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")

# Run inference without tracking gradients, then turn logits into class probabilities.
with torch.no_grad():
    logits = model(**enc).logits[0]
probs = F.softmax(logits, dim=-1).tolist()
print({"meme": probs[0], "real_event": probs[1]})
```

## Training pipeline

Clusters of tweets were auto-labeled against the GDELT DOC 2.0 API using a
lifespan-aware heuristic, then BERT was fine-tuned on an 80/20 split. See the
companion notebook `meme_vs_event_classifier.ipynb` for the full pipeline.
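
The notebook holds the actual labeling logic; the sketch below only illustrates the general idea of querying the GDELT DOC 2.0 API for a cluster's keywords and checking how long matching news coverage persists. The `label_cluster` helper, the query parameters, and the thresholds are hypothetical, and "lifespan" here is simply the spread of GDELT `seendate` values.

```python
# Hypothetical sketch of GDELT-based auto-labeling: clusters with little or no
# matching news coverage are treated as memes; clusters with a concentrated
# burst of coverage are treated as real events. Thresholds are illustrative.
import requests
from datetime import datetime

GDELT_DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"

def label_cluster(keywords: list[str], timespan: str = "7d",
                  min_articles: int = 10, max_lifespan_hours: float = 72.0) -> int:
    """Return 1 (real_event) or 0 (meme) for a cluster of tweets."""
    params = {
        "query": " ".join(keywords),
        "mode": "artlist",
        "format": "json",
        "timespan": timespan,
        "maxrecords": 75,
    }
    resp = requests.get(GDELT_DOC_API, params=params, timeout=30)
    articles = resp.json().get("articles", [])
    if len(articles) < min_articles:
        return 0  # little or no news coverage -> meme / low-signal

    # "Lifespan": spread between the first and last time GDELT saw a matching article.
    seen = sorted(datetime.strptime(a["seendate"], "%Y%m%dT%H%M%SZ") for a in articles)
    lifespan_hours = (seen[-1] - seen[0]).total_seconds() / 3600.0
    # Assumption: a short, bursty coverage window indicates a concrete real-world event.
    return 1 if lifespan_hours <= max_lifespan_hours else 0

print(label_cluster(["istanbul", "earthquake"]))
```

Labels produced this way would then feed the 80/20 split used for fine-tuning; see the notebook for the real heuristic and thresholds.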