**Source:** CrisisMMD (Alam et al., 2017)

**Data type:** Multimodal — each sample includes:
- `tweet_text` (social media text)
- `tweet_image` (the image attached to the tweet)

**Total samples used:** ~18,802 (from the dataset)

**Class labels:**
- 0 → Non-informative
- 1 → Informative

Only tweets whose text label and image label agree were kept, yielding 12,743 tweets, which were then split into train and test `.pt` files.

## ✅ Preprocessing Done

- **Text:** tokenized with the BERT tokenizer (`bert-base-uncased`); extracted `input_ids` and `attention_mask`
- **Image:** processed with ResNet-50; extracted 2048-dimensional feature vectors
- **Label:** encoded as 0 or 1 per class

The final preprocessed dataset was saved as `.pt` files — `train_info.pt` and `test_info.pt` — each containing `input_ids`, `attention_mask`, `image_vector`, and `label` tensors.

## ✅ Model Architecture

A custom multimodal neural network combining BERT and ResNet features:

| Component | Details |
|---|---|
| Text encoder | BERT base model (`bert-base-uncased`) — outputs `pooler_output` (768-d) |
| Image encoder | ResNet-50 pre-extracted features (2048-d) |
| Fusion | Concatenation → FC layers → Softmax |
| Classifier | Fully connected layers with BatchNorm, ReLU, Dropout |

## ✅ Training Setup

- **Loss function:** CrossEntropyLoss
- **Optimizer:** AdamW
- **Scheduler:** StepLR (γ = 0.9)
- **Epochs:** 8
- **Batch size:** 16
- **Device:** CUDA (if available)

## ✅ Evaluation Metrics

| Metric | Score |
|---|---|
| Accuracy | 0.8518 |
| Precision | 0.8289 |
| Recall | 0.8032 |
| F1 score | 0.8142 |

Newly created dataset: https://huggingface.co/datasets/Henishma/crisisMMD_cleaned_task1
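The `.pt` layout described above can be sketched as follows. The field names (`input_ids`, `attention_mask`, `image_vector`, `label`) come from the section; the sample count and sequence length here are illustrative assumptions, not the real data:

```python
import torch

# Dummy tensors matching the fields stored in train_info.pt / test_info.pt.
# n and the sequence length (128) are illustrative; the real files hold the
# 12,743 filtered tweets split into train and test.
n = 4
data = {
    "input_ids": torch.randint(0, 30522, (n, 128)),       # BERT token ids
    "attention_mask": torch.ones(n, 128, dtype=torch.long),
    "image_vector": torch.randn(n, 2048),                 # ResNet-50 features
    "label": torch.randint(0, 2, (n,)),                   # 0 / 1
}
torch.save(data, "train_info_demo.pt")

loaded = torch.load("train_info_demo.pt")
print(loaded["image_vector"].shape)  # torch.Size([4, 2048])
```

Storing plain tensor dictionaries keeps loading fast, since tokenization and image feature extraction only happen once, at preprocessing time.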
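The fusion design in the architecture table can be sketched as a small PyTorch module. The 768-d and 2048-d input sizes come from the section; the hidden width (512) and dropout rate (0.3) are assumptions, since the exact layer sizes are not stated:

```python
import torch
import torch.nn as nn

class MultimodalClassifier(nn.Module):
    """Concatenates the 768-d BERT pooler output with the 2048-d ResNet-50
    feature vector, then classifies with FC + BatchNorm + ReLU + Dropout.
    Hidden size (512) and dropout (0.3) are illustrative assumptions."""
    def __init__(self, text_dim=768, image_dim=2048, hidden=512, num_classes=2):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, num_classes),  # logits; softmax is applied by the loss
        )

    def forward(self, text_feat, image_feat):
        fused = torch.cat([text_feat, image_feat], dim=1)  # (B, 2816)
        return self.classifier(fused)

model = MultimodalClassifier()
logits = model(torch.randn(16, 768), torch.randn(16, 2048))
print(logits.shape)  # torch.Size([16, 2])
```

Note that the final layer emits raw logits: `CrossEntropyLoss` applies log-softmax internally, so an explicit softmax is only needed at inference time.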
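The training setup above (CrossEntropyLoss, AdamW, StepLR with γ = 0.9, 8 epochs, batch size 16) can be sketched with synthetic data standing in for the real features; the learning rate and StepLR `step_size=1` are assumptions not stated in the section:

```python
import torch
import torch.nn as nn

# Stand-in head over pre-fused 2816-d features (768 text + 2048 image);
# synthetic data keeps the sketch runnable without the real .pt files.
model = nn.Linear(2816, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # lr is an assumption
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)

features = torch.randn(64, 2816)
labels = torch.randint(0, 2, (64,))

for epoch in range(8):                 # 8 epochs, as in the setup above
    for i in range(0, 64, 16):         # batch size 16
        optimizer.zero_grad()
        loss = criterion(model(features[i:i + 16]), labels[i:i + 16])
        loss.backward()
        optimizer.step()
    scheduler.step()                   # decay lr by γ = 0.9 once per epoch

# After 8 epochs the lr has been scaled by 0.9 ** 8
print(round(scheduler.get_last_lr()[0] / 2e-5, 4))  # 0.4305
```

With `step_size=1`, the learning rate decays geometrically every epoch; a larger `step_size` would hold it constant for several epochs between decays.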
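The four reported metrics follow the standard binary definitions, with class 1 (Informative) as the positive class. A minimal sketch of how they are computed (the toy predictions below are illustrative only, not the reported results):

```python
import torch

def binary_metrics(preds, labels):
    """Accuracy, precision, recall, and F1 for binary predictions,
    treating class 1 (Informative) as the positive class."""
    tp = ((preds == 1) & (labels == 1)).sum().item()
    fp = ((preds == 1) & (labels == 0)).sum().item()
    fn = ((preds == 0) & (labels == 1)).sum().item()
    accuracy = (preds == labels).float().mean().item()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Toy example: 6 predictions against 6 ground-truth labels
preds = torch.tensor([1, 0, 1, 1, 0, 0])
labels = torch.tensor([1, 0, 0, 1, 1, 0])
acc, prec, rec, f1 = binary_metrics(preds, labels)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
# 0.667 0.667 0.667 0.667
```

In practice `sklearn.metrics.precision_recall_fscore_support` gives the same numbers; the explicit version above just makes the definitions visible.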