Aryan047 committed on
Commit 1d4db61 · verified · 1 Parent(s): e412e96

Upload fine-tuned BERT meme-vs-event classifier

Files changed (6)
  1. README.md +46 -0
  2. config.json +39 -0
  3. model.safetensors +3 -0
  4. tokenizer.json +0 -0
  5. tokenizer_config.json +14 -0
  6. training_args.bin +3 -0
README.md ADDED
@@ -0,0 +1,46 @@
+ ---
+ license: apache-2.0
+ language: en
+ library_name: transformers
+ pipeline_tag: text-classification
+ tags:
+ - bert
+ - text-classification
+ - tweet-classification
+ - meme-detection
+ - event-detection
+ ---
+
+ # Meme vs Real Event Tweet Classifier
+
+ Fine-tuned `bert-base-uncased` that classifies a tweet as either a **meme /
+ low-signal cultural post** or a **real-world event** (breaking news,
+ infrastructure outages, disasters, politics, etc.).
+
+ - **Base model:** `bert-base-uncased`
+ - **Task:** binary sequence classification
+ - **Labels:** `0 = meme`, `1 = real_event`
+ - **Max sequence length:** 128 tokens
+ - **Preprocessing:** lowercase, strip URLs / mentions / hashtags / non-word chars
+
+ ## Quick start
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch, torch.nn.functional as F
+
+ repo = "Aryan047/Dynamic-event-detector"
+ tokenizer = AutoTokenizer.from_pretrained(repo)
+ model = AutoModelForSequenceClassification.from_pretrained(repo).eval()
+
+ text = "Massive 6.5 earthquake just rocked Istanbul, buildings swaying"
+ enc = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
+ probs = F.softmax(model(**enc).logits[0], dim=-1).tolist()
+ print({"meme": probs[0], "real_event": probs[1]})
+ ```
+
+ ## Training pipeline
+
+ Clusters of tweets were auto-labeled against the GDELT DOC 2.0 API using a
+ lifespan-aware heuristic, then BERT was fine-tuned on an 80/20 split. See the
+ companion notebook `meme_vs_event_classifier.ipynb` for the full pipeline.
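The README above lists the preprocessing steps (lowercase, strip URLs / mentions / hashtags / non-word chars) but this commit does not include the code that performs them. The following is only a minimal sketch of what such a cleaning step might look like. The function name `clean_tweet` and all four regular expressions are assumptions, and the notebook's real patterns may differ (for example, it may keep the hashtag word and drop only the `#`).

```python
import re

# Hypothetical patterns: the notebook's actual regexes are not part of this commit.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")       # assumption: drop the whole hashtag, not just '#'
NON_WORD_RE = re.compile(r"[^\w\s]")   # anything that is not a word char or whitespace


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, hashtags, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    # Collapse the runs of whitespace left behind by the substitutions.
    return " ".join(text.split())


print(clean_tweet("BREAKING: Power outage in #Austin, see https://t.co/abc @user!"))
```

If the model was trained on text cleaned this way, applying the same cleaning before tokenization at inference time would keep the inputs consistent with training.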
config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "add_cross_attention": false,
+   "architectures": [
+     "BertForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": null,
+   "classifier_dropout": null,
+   "dtype": "float32",
+   "eos_token_id": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "meme",
+     "1": "real_event"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "is_decoder": false,
+   "label2id": {
+     "meme": 0,
+     "real_event": 1
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "problem_type": "single_label_classification",
+   "tie_word_embeddings": true,
+   "transformers_version": "5.0.0",
+   "type_vocab_size": 2,
+   "use_cache": false,
+   "vocab_size": 30522
+ }
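The `id2label` block in `config.json` above is what ties each output index to a label name. As a rough illustration that uses only the standard library, the sketch below turns a pair of logits into named probabilities using that mapping. The helper names `softmax` and `label_scores` and the logit values are made up for the example and are not model output.

```python
import json
import math

# Excerpt of the config.json shown above; only the label mapping is reproduced.
CONFIG_JSON = '{"id2label": {"0": "meme", "1": "real_event"}}'


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_scores(logits, config):
    """Map each class probability to its label name from id2label."""
    probs = softmax(logits)
    # JSON object keys are strings, so look up str(i) rather than i.
    return {config["id2label"][str(i)]: p for i, p in enumerate(probs)}


config = json.loads(CONFIG_JSON)
scores = label_scores([-1.2, 2.3], config)  # hypothetical logits, not model output
print(max(scores, key=scores.get))  # prints "real_event"
```

Reading the names from the config rather than hard-coding `probs[0]` and `probs[1]` keeps calling code correct if the label order ever changes.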
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:24d6b17c203ad33df80d5e6e1ddce4676a9aa7fb7c2154e7d772ac28b3599b39
+ size 437958624
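The three lines added for `model.safetensors` are a Git LFS pointer, not the weights themselves: the spec version, a SHA-256 object ID, and the object size in bytes. A small sketch of reading such a pointer follows. The function name `parse_lfs_pointer` is hypothetical, and the pointer text is copied from the diff above.

```python
# Pointer text copied from the model.safetensors entry above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:24d6b17c203ad33df80d5e6e1ddce4676a9aa7fb7c2154e7d772ac28b3599b39
size 437958624
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    # Each non-empty line is "<key> <value>", separated by a single space.
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
print(info["algo"], f"{info['size'] / 1e6:.0f} MB")
```

The size of about 438 MB is in line with a BERT-base checkpoint stored as `float32`, which matches the `dtype` field in `config.json`.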
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "backend": "tokenizers",
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "is_local": false,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2d95f651825f49a44a544ba4f8bb25740788ee96b49b002bdec27e31a9d9b4df
+ size 5201