---
license: apache-2.0
language: en
library_name: transformers
pipeline_tag: text-classification
tags:
- bert
- text-classification
- tweet-classification
- meme-detection
- event-detection
---

# Meme vs Real Event Tweet Classifier

A fine-tuned `bert-base-uncased` model that classifies a tweet as either a
**meme / low-signal cultural post** or a **real-world event** (breaking news,
infrastructure outages, disasters, politics, etc.).

- **Base model:** `bert-base-uncased`
- **Task:** binary sequence classification
- **Labels:** `0 = meme`, `1 = real_event`
- **Max sequence length:** 128 tokens
- **Preprocessing:** lowercase; strip URLs, mentions, hashtags, and non-word characters
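
The preprocessing listed above could be approximated with a few regular expressions. This is a minimal sketch, not the exact code used for training: the function name `clean_tweet`, the specific patterns, and the choice to drop whole hashtag tokens (rather than only the `#` symbol) are assumptions made for illustration.

```python
import re

# Hypothetical patterns; the training notebook may use different ones.
_URL = re.compile(r"https?://\S+|www\.\S+")
_MENTION = re.compile(r"@\w+")
_HASHTAG = re.compile(r"#\w+")       # assumption: remove the whole hashtag token
_NON_WORD = re.compile(r"[^\w\s]")   # anything that is not a word char or whitespace
_SPACES = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, hashtags, and non-word chars."""
    text = text.lower()
    # URLs go first so that '@' or '#' inside a link is not matched separately.
    for pattern in (_URL, _MENTION, _HASHTAG, _NON_WORD):
        text = pattern.sub(" ", text)
    # Collapse the whitespace left behind by the substitutions.
    return _SPACES.sub(" ", text).strip()


print(clean_tweet("BREAKING: Power outage in #Austin!! https://t.co/abc123 @user"))
```

Applying the same cleaning at inference time should keep inputs closer to what the model saw during fine-tuning.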

## Quick start

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "Aryan047/Dynamic-event-detector"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo).eval()

# Raw text is passed as-is here; see the preprocessing listed in the summary.
text = "Massive 6.5 earthquake just rocked Istanbul, buildings swaying"
enc = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")

# Inference only, so skip gradient tracking.
with torch.no_grad():
    logits = model(**enc).logits[0]

# Index 0 = meme, index 1 = real_event.
probs = F.softmax(logits, dim=-1).tolist()
print({"meme": probs[0], "real_event": probs[1]})
```
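
To turn the two probabilities into a single label, one option is a simple threshold on the `real_event` score. The helper below is a sketch: the name `to_label` and the default threshold of 0.5 are illustrative assumptions, and only the label mapping (`0 = meme`, `1 = real_event`) comes from this card.

```python
# Label mapping as stated in this card.
ID2LABEL = {0: "meme", 1: "real_event"}


def to_label(probs, threshold=0.5):
    """Map [p_meme, p_real_event] to a label name.

    `threshold` is a hypothetical knob: raise it to flag a tweet as a
    real event only when the model is more confident.
    """
    p_real_event = probs[1]
    return ID2LABEL[1] if p_real_event >= threshold else ID2LABEL[0]


print(to_label([0.1, 0.9]))                   # real_event
print(to_label([0.45, 0.55], threshold=0.8))  # meme
```

With the quick-start snippet, `to_label(probs)` would give the predicted class for `text`.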

## Training pipeline

Clusters of tweets were auto-labeled against the GDELT DOC 2.0 API using a
lifespan-aware heuristic, and `bert-base-uncased` was then fine-tuned on an
80/20 split of the labeled data. See the companion notebook
`meme_vs_event_classifier.ipynb` for the full pipeline.
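
For orientation, an 80/20 split of labeled examples can be sketched in plain Python as below. This is an assumption about the general shape of the step, not the notebook's code: the function name `split_80_20`, the seed, and the use of a plain shuffled (non-stratified) split are all illustrative, and the rows are placeholder data.

```python
import random


def split_80_20(examples, seed=42):
    """Shuffle a copy of `examples` and return (train, held_out) at 80/20."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:]


# Placeholder (text, label) rows; 0 = meme, 1 = real_event.
rows = [(f"tweet {i}", i % 2) for i in range(10)]
train, held_out = split_80_20(rows)
print(len(train), len(held_out))  # 8 2
```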
|