---
license: apache-2.0
language: en
library_name: transformers
pipeline_tag: text-classification
tags:
- bert
- text-classification
- tweet-classification
- meme-detection
- event-detection
---

# Meme vs Real Event Tweet Classifier

A fine-tuned `bert-base-uncased` model that classifies a tweet as either a
**meme / low-signal cultural post** or a **real-world event** (breaking news,
infrastructure outages, disasters, politics, etc.).

- Base model: `bert-base-uncased`
- Task: binary sequence classification
- Labels: `0 = meme`, `1 = real_event`
- Max sequence length: 128 tokens
- Preprocessing: lowercasing; removal of URLs, @mentions, hashtags, and non-word characters
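The preprocessing step can be approximated with a few regular expressions. This is a minimal sketch, not the exact training code: the patterns and the name `preprocess` are assumptions, and it assumes "strip hashtags" means dropping the whole `#tag` token (if only the `#` was removed in training, change `HASHTAG_RE` to match just `#`).

```python
import re

# Hypothetical patterns approximating the card's preprocessing description.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")  # assumption: drop the whole hashtag
NON_WORD_RE = re.compile(r"[^\w\s]")
WS_RE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, hashtags and punctuation."""
    text = text.lower()
    # Remove URLs and mentions before punctuation, since "@", ":" and "/"
    # would otherwise be split apart and leave fragments behind.
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    # Collapse the runs of spaces left by the substitutions.
    return WS_RE.sub(" ", text).strip()


print(preprocess("BREAKING: 6.5 quake hits Istanbul! https://t.co/abc @newsbot #earthquake"))
```

Note that digits separated by punctuation become separate tokens ("6.5" turns into "6 5"), which is a side effect of removing all non-word characters.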

## Quick start

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo = "Aryan047/Dynamic-event-detector"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo).eval()

# Training inputs were preprocessed (see above); applying the same cleaning
# to raw tweets before tokenizing keeps inputs close to the training data.
text = "Massive 6.5 earthquake just rocked Istanbul, buildings swaying"
enc = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")

with torch.no_grad():  # inference only, so skip gradient tracking
    logits = model(**enc).logits[0]
probs = F.softmax(logits, dim=-1).tolist()  # index 0 = meme, 1 = real_event
print({"meme": probs[0], "real_event": probs[1]})
```

## Training pipeline

Tweet clusters were labeled automatically by checking them against the GDELT
DOC 2.0 API with a lifespan-aware heuristic, and `bert-base-uncased` was then
fine-tuned on an 80/20 split. See the companion notebook
`meme_vs_event_classifier.ipynb` for the full pipeline.
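The card does not describe the lifespan-aware heuristic itself, so the sketch below covers only the lookup side: building a GDELT DOC 2.0 article-list query for one tweet cluster over a time window. The function name `build_gdelt_query_url`, the keyword input, and the chosen window are hypothetical; the endpoint and the `query`, `mode`, `format`, `maxrecords`, `startdatetime` and `enddatetime` parameters belong to the public DOC 2.0 API.

```python
from datetime import datetime
from urllib.parse import urlencode

GDELT_DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"


def build_gdelt_query_url(keywords, start: datetime, end: datetime,
                          max_records: int = 50) -> str:
    """Build a GDELT DOC 2.0 article-list URL for one (hypothetical) cluster.

    keywords: terms summarizing the cluster; multi-word terms are quoted
    so GDELT treats them as exact phrases.
    start/end: time window around the cluster's activity.
    """
    query = " ".join(f'"{k}"' if " " in k else k for k in keywords)
    params = {
        "query": query,
        "mode": "artlist",       # return a list of matching articles
        "format": "json",
        "maxrecords": max_records,
        # GDELT expects timestamps as YYYYMMDDHHMMSS
        "startdatetime": start.strftime("%Y%m%d%H%M%S"),
        "enddatetime": end.strftime("%Y%m%d%H%M%S"),
    }
    return f"{GDELT_DOC_API}?{urlencode(params)}"


url = build_gdelt_query_url(["earthquake", "istanbul"],
                            datetime(2024, 1, 1), datetime(2024, 1, 2))
print(url)
```

Fetching such a URL (for example with `requests.get(url).json()`) typically returns an object with an `articles` list, whose size or spread over time could feed a labeling rule; how the original pipeline turned GDELT results into `meme` / `real_event` labels is documented only in the companion notebook.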