Source: CrisisMMD (Alam et al., 2017)

Data Type: Multimodal. Each sample includes:

tweet_text (social media text)

tweet_image (corresponding image from the tweet)

Total Samples Used: ~18,802 (from the dataset)

Class Labels:

0 → Non-informative

1 → Informative

Only samples whose tweet_text and tweet_image labels are equal were kept. This yielded 12,743 tweets, which were converted into train and test .pt files.
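
The filtering step can be sketched as follows. This is a minimal illustration, assuming "equal" refers to the per-modality labels of each sample; the field names `text_label` and `image_label`, the label strings, and the sample rows are hypothetical and not taken from the dataset files.

```python
# Hypothetical label strings mapped to the class ids used in this document.
LABEL_TO_ID = {"not_informative": 0, "informative": 1}


def keep_agreeing(rows):
    """Keep rows whose text and image labels match and attach the integer class."""
    kept = []
    for row in rows:
        if row["text_label"] == row["image_label"]:
            kept.append({**row, "label": LABEL_TO_ID[row["text_label"]]})
    return kept


# Made-up rows for illustration only.
rows = [
    {"tweet_id": "a", "text_label": "informative", "image_label": "informative"},
    {"tweet_id": "b", "text_label": "informative", "image_label": "not_informative"},
    {"tweet_id": "c", "text_label": "not_informative", "image_label": "not_informative"},
]

filtered = keep_agreeing(rows)
print([(r["tweet_id"], r["label"]) for r in filtered])
```

Rows where the two modalities disagree (row "b" here) are dropped, so every remaining sample has a single unambiguous label.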

✅ Preprocessing Done
Text:

Tokenized using BERT tokenizer (bert-base-uncased)

Extracted input_ids and attention_mask

Image:

Processed using ResNet-50

Extracted 2048-dimensional feature vectors

Label:

Encoded to 0 or 1 as per class

The final preprocessed dataset was saved as .pt files:

train_info.pt

test_info.pt

Each contains: input_ids, attention_mask, image_vector, and label tensors.
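
A file with this layout could be wrapped for training roughly as shown here. This is a sketch under assumptions: it supposes each .pt file stores a dictionary of equally sized tensors under the four keys listed above, and the class name `CrisisPairDataset`, the helper `make_dummy_file`, and the sequence length of 16 are hypothetical. Random tensors stand in for the real data.

```python
import os
import tempfile

import torch
from torch.utils.data import Dataset


class CrisisPairDataset(Dataset):
    """Hypothetical wrapper around a dict of tensors saved with torch.save."""

    def __init__(self, path):
        self.data = torch.load(path)

    def __len__(self):
        return self.data["label"].shape[0]

    def __getitem__(self, idx):
        # Return one sample as a dict with the same keys as the file.
        return {key: tensor[idx] for key, tensor in self.data.items()}


def make_dummy_file(path, n=4, seq_len=16):
    """Write random stand-in tensors; seq_len is an assumed value."""
    data = {
        "input_ids": torch.randint(0, 30522, (n, seq_len)),  # bert-base-uncased vocab size
        "attention_mask": torch.ones(n, seq_len, dtype=torch.long),
        "image_vector": torch.randn(n, 2048),
        "label": torch.randint(0, 2, (n,)),
    }
    torch.save(data, path)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train_info.pt")
    make_dummy_file(path)
    dataset = CrisisPairDataset(path)
    sample = dataset[0]
    print(len(dataset), {k: tuple(v.shape) for k, v in sample.items()})
```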

✅ Model Architecture
A custom multimodal neural network combining both BERT and ResNet features:

Component	Details
Text Encoder	BERT base model (bert-base-uncased) – outputs pooler_output (768-d)
Image Encoder	ResNet-50 pre-extracted features (2048-d)
Fusion	Concatenation → FC layers → Softmax
Classifier	Fully connected layers with BatchNorm, ReLU, Dropout
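
The fusion and classifier rows of the table can be illustrated with a small PyTorch module. This is only a sketch: the hidden size (512), dropout rate (0.3), and the single hidden layer are assumptions, since the source does not give them, and random tensors stand in for the BERT pooler_output and the ResNet-50 features.

```python
import torch
import torch.nn as nn


class FusionClassifier(nn.Module):
    """Concatenate text and image vectors, then classify with FC layers."""

    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=512,
                 num_classes=2, dropout=0.3):
        super().__init__()
        # hidden_dim and dropout are assumed values, not taken from the source.
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, text_vec, image_vec):
        fused = torch.cat([text_vec, image_vec], dim=1)  # shape (batch, 2816)
        return self.classifier(fused)  # raw logits


model = FusionClassifier().eval()
text_vec = torch.randn(4, 768)    # stand-in for BERT pooler_output
image_vec = torch.randn(4, 2048)  # stand-in for ResNet-50 features

with torch.no_grad():
    logits = model(text_vec, image_vec)
    probs = torch.softmax(logits, dim=1)  # softmax applied at inference

print(tuple(logits.shape), probs.sum(dim=1))
```

The module returns logits rather than probabilities because CrossEntropyLoss applies log-softmax internally; the explicit softmax is only needed when class probabilities are wanted.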

✅ Training Setup
Loss Function: CrossEntropyLoss

Optimizer: AdamW

Scheduler: StepLR (γ = 0.9)

Epochs: 8

Batch Size: 16

Device: CUDA (if available)
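
These settings translate into a training loop along the following lines. It is a sketch, not the original training script: the learning rate (1e-3) and the StepLR step_size (1) are assumptions because the source does not state them, and a tiny stand-in model with random features replaces the real network and data.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Random stand-in data: 64 fused vectors (768 + 2048 dims) with binary labels.
features = torch.randn(64, 2816)
labels = torch.randint(0, 2, (64,))
loader = DataLoader(TensorDataset(features, labels), batch_size=16, shuffle=True)

# Tiny stand-in model; the real architecture is larger.
model = nn.Sequential(nn.Linear(2816, 64), nn.ReLU(), nn.Linear(64, 2)).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)  # lr is assumed
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)  # step_size is assumed

epoch_losses = []
for epoch in range(8):
    model.train()
    running = 0.0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        running += loss.item() * x.size(0)
    scheduler.step()  # multiply the learning rate by gamma once per epoch
    epoch_losses.append(running / len(loader.dataset))
    print(f"epoch {epoch + 1}: loss={epoch_losses[-1]:.4f} "
          f"lr={scheduler.get_last_lr()[0]:.6f}")
```

With step_size=1 the learning rate decays by 10% after every epoch; a larger step_size would hold it constant for several epochs between decays.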

✅ Evaluation Metrics
Accuracy

Precision

Recall

F1 Score

✅ Test Accuracy : 0.8518
✅ Precision     : 0.8289
✅ Recall        : 0.8032
✅ F1 Score      : 0.8142
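
For reference, the four metrics can be computed from predictions as in the sketch below. It treats class 1 (Informative) as the positive class, which is an assumption: the source does not say which averaging mode produced the reported numbers, so this need not reproduce them exactly. The label lists are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```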

Newly created dataset: https://huggingface.co/datasets/Henishma/crisisMMD_cleaned_task1