Source: CrisisMMD (Alam et al., 2017)
Data Type: Multimodal. Each sample includes:
tweet_text (social media text)
tweet_image (corresponding image from the tweet)
Total Samples Used: ~18,802 (from the dataset)
Class Labels:
0 → Non-informative
1 → Informative
Only samples where the tweet_text and tweet_image labels are equal were kept. This yielded 12,743 tweets, which were converted into train and test .pt files.
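The filtering step above can be sketched as follows. This is a minimal illustration, not the author's script: it assumes each record carries separate text and image annotations under the hypothetical keys `label_text` and `label_image`, and it reads "equal" as the two annotations agreeing.

```python
# Hypothetical record layout: label_text / label_image are assumed field names.
def keep_agreeing(records):
    """Keep only records whose text and image annotations match."""
    return [r for r in records if r["label_text"] == r["label_image"]]


# Toy records, invented purely for illustration.
samples = [
    {"tweet_text": "Flood waters rising downtown",
     "label_text": "informative", "label_image": "informative"},
    {"tweet_text": "Thoughts and prayers",
     "label_text": "not_informative", "label_image": "informative"},
    {"tweet_text": "Nice sunset today",
     "label_text": "not_informative", "label_image": "not_informative"},
]

filtered = keep_agreeing(samples)
print(len(filtered), "of", len(samples), "records kept")
```

Dropping disagreeing pairs gives each sample a single unambiguous label, at the cost of a smaller dataset.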
Preprocessing Done
Text:
Tokenized using BERT tokenizer (bert-base-uncased)
Extracted input_ids and attention_mask
Image:
Processed using ResNet-50
Extracted 2048-dimensional feature vectors
Label:
Encoded to 0 or 1 as per class
The final preprocessed dataset was saved as .pt files:
train_info.pt
test_info.pt
Each contains: input_ids, attention_mask, image_vector, and label tensors.
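A minimal sketch of how such a file might be consumed, assuming each .pt file stores a dict of equally long tensors under the four keys listed above. The on-disk layout, the sequence length of 128, and the class name `CrisisDataset` are assumptions. Synthetic tensors stand in for `torch.load("train_info.pt")` so the snippet runs without the files.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class CrisisDataset(Dataset):
    """Wraps a dict of aligned tensors (assumed layout of the .pt files)."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return self.data["label"].shape[0]

    def __getitem__(self, idx):
        # Return one sample as a dict with the same keys as the file.
        return {key: value[idx] for key, value in self.data.items()}


# Synthetic stand-in for: data = torch.load("train_info.pt")
n, seq_len = 32, 128  # seq_len is an assumed padding length
data = {
    "input_ids": torch.randint(0, 30522, (n, seq_len)),
    "attention_mask": torch.ones(n, seq_len, dtype=torch.long),
    "image_vector": torch.randn(n, 2048),
    "label": torch.randint(0, 2, (n,)),
}

loader = DataLoader(CrisisDataset(data), batch_size=16, shuffle=True)
batch = next(iter(loader))
print({key: tuple(value.shape) for key, value in batch.items()})
```

The default collate function stacks the per-sample dicts, so each batch is again a dict with the same four keys.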
Model Architecture
A custom multimodal neural network combining both BERT and ResNet features:
Component       Details
Text Encoder    BERT base model (bert-base-uncased), outputs pooler_output (768-d)
Image Encoder   ResNet-50 pre-extracted features (2048-d)
Fusion          Concatenation → FC layers → Softmax
Classifier      Fully connected layers with BatchNorm, ReLU, Dropout
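The fusion and classifier stages might look roughly like the sketch below. It is not the author's code: the hidden width (512), the dropout rate (0.3), and the class name `FusionClassifier` are assumptions, and random tensors stand in for the BERT and ResNet-50 outputs. Because PyTorch's `CrossEntropyLoss` applies log-softmax internally, the sketch returns raw logits and applies softmax only when probabilities are needed.

```python
import torch
import torch.nn as nn


class FusionClassifier(nn.Module):
    """Concatenates text and image features, then classifies them."""

    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=512,
                 num_classes=2, dropout=0.3):
        super().__init__()
        # hidden_dim and dropout are assumed values, not taken from the source.
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, text_features, image_features):
        # Fusion by concatenation: (batch, 768 + 2048) = (batch, 2816)
        fused = torch.cat([text_features, image_features], dim=1)
        return self.classifier(fused)  # raw logits


model = FusionClassifier().eval()
text_features = torch.randn(4, 768)    # stand-in for BERT pooler_output
image_features = torch.randn(4, 2048)  # stand-in for ResNet-50 features
with torch.no_grad():
    logits = model(text_features, image_features)
    probs = torch.softmax(logits, dim=1)
print(logits.shape, probs.sum(dim=1))
```

In a full model, `text_features` would come from the `pooler_output` of `BertModel.from_pretrained("bert-base-uncased")` run on `input_ids` and `attention_mask`.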
Training Setup
Loss Function: CrossEntropyLoss
Optimizer: AdamW
Scheduler: StepLR (γ = 0.9)
Epochs: 8
Batch Size: 16
Device: CUDA (if available)
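The settings above could be wired together as in the following sketch. The learning rate (2e-5) and the scheduler `step_size` (1) are not given in the source and are assumed here. A single linear layer and random tensors replace the real model and data so the loop runs anywhere.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(2816, 2).to(device)  # placeholder for the multimodal network
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # lr is assumed
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=1, gamma=0.9  # step_size is assumed
)

features = torch.randn(64, 2816)      # synthetic fused features
labels = torch.randint(0, 2, (64,))   # synthetic binary labels

lrs = []
for epoch in range(8):
    model.train()
    for start in range(0, len(features), 16):  # batch size 16
        x = features[start:start + 16].to(device)
        y = labels[start:start + 16].to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    lrs.append(scheduler.get_last_lr()[0])  # learning rate used this epoch
    scheduler.step()  # multiply the learning rate by gamma once per epoch

print([f"{lr:.2e}" for lr in lrs])
```

With `step_size=1`, the learning rate in epoch k (counting from 0) is the initial rate times 0.9 to the power k.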
Evaluation Metrics
Accuracy
Precision
Recall
F1 Score
Test Accuracy: 0.8518
Precision: 0.8289
Recall: 0.8032
F1 Score: 0.8142
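For reference, the four metrics can be computed from predictions as sketched below. Macro averaging over the two classes is an assumption, since the source does not state the averaging mode. The reported F1 of 0.8142 differs from the harmonic mean of the reported precision and recall (about 0.8158), which is something macro averaging can produce. The labels in the example are toy values, not model outputs.

```python
def macro_metrics(y_true, y_pred, classes=(0, 1)):
    """Accuracy plus macro-averaged precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
accuracy, precision, recall, f1 = macro_metrics(y_true, y_pred)
print(f"acc={accuracy:.4f} p={precision:.4f} r={recall:.4f} f1={f1:.4f}")
```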
Newly created dataset: https://huggingface.co/datasets/Henishma/crisisMMD_cleaned_task1