---
license: apache-2.0
language: en
library_name: transformers
pipeline_tag: text-classification
tags:
- bert
- text-classification
- tweet-classification
- meme-detection
- event-detection
---

# Meme vs Real Event Tweet Classifier

Fine-tuned `bert-base-uncased` that classifies a tweet as either a **meme / low-signal cultural post** or a **real-world event** (breaking news, infrastructure outages, disasters, politics, etc.).

- **Base model:** `bert-base-uncased`
- **Task:** binary sequence classification
- **Labels:** `0 = meme`, `1 = real_event`
- **Max sequence length:** 128 tokens
- **Preprocessing:** lowercase; strip URLs, mentions, hashtags, and non-word characters

## Quick start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

repo = "Aryan047/Dynamic-event-detector"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo).eval()

text = "Massive 6.5 earthquake just rocked Istanbul, buildings swaying"
enc = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    probs = F.softmax(model(**enc).logits[0], dim=-1).tolist()
print({"meme": probs[0], "real_event": probs[1]})
```

## Training pipeline

Clusters of tweets were auto-labeled against the GDELT DOC 2.0 API using a lifespan-aware heuristic, then BERT was fine-tuned on an 80/20 train/test split. See the companion notebook `meme_vs_event_classifier.ipynb` for the full pipeline.
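The preprocessing described above (lowercase; strip URLs, mentions, hashtags, and non-word characters) can be sketched with standard-library regexes. This is an illustrative approximation, not the exact function from the training notebook; the name `preprocess_tweet` and the specific patterns are assumptions.

```python
import re

def preprocess_tweet(text: str) -> str:
    """Approximate the card's described cleaning: lowercase, then strip
    URLs, @mentions, #hashtags, and remaining non-word characters.
    (Illustrative sketch; not the exact training-time function.)"""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)                # drop mentions and hashtags
    text = re.sub(r"[^\w\s]", " ", text)                # drop non-word characters
    return re.sub(r"\s+", " ", text).strip()            # collapse whitespace

print(preprocess_tweet("BREAKING: 6.5 quake hits #Istanbul via @CNN https://t.co/xyz"))
# → "breaking 6 5 quake hits via"
```

Applying the same cleaning at inference time as was used during training generally helps, since the model only ever saw normalized text.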