**Source:** CrisisMMD (Alam et al., 2017)

**Data type:** Multimodal — each sample includes:
- `tweet_text` (social media text)
- `tweet_image` (the image attached to the tweet)

**Total samples used:** ~18,802 (from the dataset)

**Class labels:**
- 0 → Non-informative
- 1 → Informative

Only tweets whose text label and image label agree were kept, yielding 12,743 tweets, which were then split into train and test `.pt` files.

## ✅ Preprocessing Done

- **Text:** tokenized with the BERT tokenizer (`bert-base-uncased`); extracted `input_ids` and `attention_mask`
- **Image:** processed with ResNet-50; extracted 2048-dimensional feature vectors
- **Label:** encoded as 0 or 1 per class

The final preprocessed dataset was saved as `.pt` files — `train_info.pt` and `test_info.pt` — each containing `input_ids`, `attention_mask`, `image_vector`, and `label` tensors.

## ✅ Model Architecture

A custom multimodal neural network combining BERT and ResNet features:

| Component | Details |
|---|---|
| Text encoder | BERT base model (`bert-base-uncased`) — outputs `pooler_output` (768-d) |
| Image encoder | ResNet-50 pre-extracted features (2048-d) |
| Fusion | Concatenation → FC layers → Softmax |
| Classifier | Fully connected layers with BatchNorm, ReLU, Dropout |

## ✅ Training Setup

- **Loss function:** CrossEntropyLoss
- **Optimizer:** AdamW
- **Scheduler:** StepLR (γ = 0.9)
- **Epochs:** 8
- **Batch size:** 16
- **Device:** CUDA (if available)

## ✅ Evaluation Metrics

| Metric | Score |
|---|---|
| Accuracy | 0.8518 |
| Precision | 0.8289 |
| Recall | 0.8032 |
| F1 score | 0.8142 |

Newly created dataset: https://huggingface.co/datasets/Henishma/crisisMMD_cleaned_task1
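The `.pt` layout described above can be sketched as follows. The field names (`input_ids`, `attention_mask`, `image_vector`, `label`) come from the section; the sample count and sequence length here are illustrative assumptions, not the real data:

```python
import torch

# Dummy tensors matching the fields stored in train_info.pt / test_info.pt.
# n and the sequence length (128) are illustrative; the real files hold the
# 12,743 filtered tweets split into train and test.
n = 4
data = {
    "input_ids": torch.randint(0, 30522, (n, 128)),       # BERT token ids
    "attention_mask": torch.ones(n, 128, dtype=torch.long),
    "image_vector": torch.randn(n, 2048),                 # ResNet-50 features
    "label": torch.randint(0, 2, (n,)),                   # 0 / 1
}
torch.save(data, "train_info_demo.pt")

loaded = torch.load("train_info_demo.pt")
print(loaded["image_vector"].shape)  # torch.Size([4, 2048])
```

Storing plain tensor dictionaries keeps loading fast, since tokenization and image feature extraction only happen once, at preprocessing time.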
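The fusion design in the architecture table can be sketched as a small PyTorch module. The 768-d and 2048-d input sizes come from the section; the hidden width (512) and dropout rate (0.3) are assumptions, since the exact layer sizes are not stated:

```python
import torch
import torch.nn as nn

class MultimodalClassifier(nn.Module):
    """Concatenates the 768-d BERT pooler output with the 2048-d ResNet-50
    feature vector, then classifies with FC + BatchNorm + ReLU + Dropout.
    Hidden size (512) and dropout (0.3) are illustrative assumptions."""
    def __init__(self, text_dim=768, image_dim=2048, hidden=512, num_classes=2):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, num_classes),  # logits; softmax is applied by the loss
        )

    def forward(self, text_feat, image_feat):
        fused = torch.cat([text_feat, image_feat], dim=1)  # (B, 2816)
        return self.classifier(fused)

model = MultimodalClassifier()
logits = model(torch.randn(16, 768), torch.randn(16, 2048))
print(logits.shape)  # torch.Size([16, 2])
```

Note that the final layer emits raw logits: `CrossEntropyLoss` applies log-softmax internally, so an explicit softmax is only needed at inference time.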
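The training setup above (CrossEntropyLoss, AdamW, StepLR with γ = 0.9, 8 epochs, batch size 16) can be sketched with synthetic data standing in for the real features; the learning rate and StepLR `step_size=1` are assumptions not stated in the section:

```python
import torch
import torch.nn as nn

# Stand-in head over pre-fused 2816-d features (768 text + 2048 image);
# synthetic data keeps the sketch runnable without the real .pt files.
model = nn.Linear(2816, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # lr is an assumption
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)

features = torch.randn(64, 2816)
labels = torch.randint(0, 2, (64,))

for epoch in range(8):                 # 8 epochs, as in the setup above
    for i in range(0, 64, 16):         # batch size 16
        optimizer.zero_grad()
        loss = criterion(model(features[i:i + 16]), labels[i:i + 16])
        loss.backward()
        optimizer.step()
    scheduler.step()                   # decay lr by γ = 0.9 once per epoch

# After 8 epochs the lr has been scaled by 0.9 ** 8
print(round(scheduler.get_last_lr()[0] / 2e-5, 4))  # 0.4305
```

With `step_size=1`, the learning rate decays geometrically every epoch; a larger `step_size` would hold it constant for several epochs between decays.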
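The four reported metrics follow the standard binary definitions, with class 1 (Informative) as the positive class. A minimal sketch of how they are computed (the toy predictions below are illustrative only, not the reported results):

```python
import torch

def binary_metrics(preds, labels):
    """Accuracy, precision, recall, and F1 for binary predictions,
    treating class 1 (Informative) as the positive class."""
    tp = ((preds == 1) & (labels == 1)).sum().item()
    fp = ((preds == 1) & (labels == 0)).sum().item()
    fn = ((preds == 0) & (labels == 1)).sum().item()
    accuracy = (preds == labels).float().mean().item()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Toy example: 6 predictions against 6 ground-truth labels
preds = torch.tensor([1, 0, 1, 1, 0, 0])
labels = torch.tensor([1, 0, 0, 1, 1, 0])
acc, prec, rec, f1 = binary_metrics(preds, labels)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
# 0.667 0.667 0.667 0.667
```

In practice `sklearn.metrics.precision_recall_fscore_support` gives the same numbers; the explicit version above just makes the definitions visible.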