---
license: llama3.2
base_model: meta-llama/Llama-3.2-11B-Vision-Instruct
datasets:
- QCRI/MemeXplain
language:
- en
- ar
pipeline_tag: image-text-to-text
tags:
- meme-detection
- propaganda
- hate-speech
- multimodal
- vision-language
- explainability
library_name: transformers
---

# MemeIntel: Explainable Detection of Propagandistic and Hateful Memes

MemeIntel is a Vision-Language Model fine-tuned from [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) for detecting propaganda in Arabic memes and hateful content in English memes, with explainable reasoning.

## Model Description

MemeIntel addresses the challenge of understanding and moderating complex, context-dependent multimodal content on social media. The model performs:
- **Label Detection**: Classifies memes into categories (propaganda/not-propaganda/not-meme/other for Arabic; hateful/not-hateful for English)
- **Explanation Generation**: Provides human-readable explanations for its predictions

The model was trained using a novel multi-stage optimization approach on the [MemeXplain](https://huggingface.co/datasets/QCRI/MemeXplain) dataset.

## Usage

```python
import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor
from PIL import Image

# Load model and processor
model = MllamaForConditionalGeneration.from_pretrained(
    "QCRI/MemeIntel",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("QCRI/MemeIntel")

# Load your meme image
image = Image.open("path/to/meme.jpg")
```

### Arabic Propaganda Meme Detection (Arabic Explanation)

```python
messages = [
    {"role": "system", "content": "You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts."},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts. I will provide you with Arabic memes and the text extracted from these images. Your task is to classify the image as one of the following: 'propaganda', 'not-propaganda', 'not-meme', or 'other', and provide a brief explanation in Arabic. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: لما يقولي انتي مالكيش عزيز\nاعز ما ليا البطاطس المقلية"}
    ]}
]

input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```

### Arabic Propaganda Meme Detection (English Explanation)

```python
messages = [
    {"role": "system", "content": "You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts."},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts. I will provide you with Arabic memes and the text extracted from these images. Your task is to classify the image as one of the following: 'propaganda', 'not-propaganda', 'not-meme', or 'other', and provide a brief explanation in English. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: وأنا أبكي\n٣\nانت تتمنى وانا البي\n{7"}
    ]}
]

input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```

### English Hateful Meme Detection

```python
messages = [
    {"role": "system", "content": "You are an expert social media image analyzer specializing in identifying hateful content in memes"},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "I will provide you with memes and the text extracted from these images. Your task is to classify the image as one of the following: 'hateful' or 'not-hateful' and provide a brief explanation. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: bows here, bows there, bows everywhere"}
    ]}
]

input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```
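
In each of the snippets above, `output[0]` holds the prompt tokens followed by the generated tokens, so the printed text includes the prompt. The helper below is a minimal sketch for decoding only the model's reply; `decode_new_tokens` is a hypothetical name, not part of `transformers`, and it assumes `inputs` exposes `input_ids` the way the processor output above does.

```python
def decode_new_tokens(processor, inputs, output_ids):
    """Decode only the tokens generated after the prompt.

    Hypothetical helper: assumes a single-sequence batch and that
    `inputs["input_ids"]` is the prompt passed to `model.generate`.
    """
    # Length of the prompt in tokens (works for tensors and nested lists).
    prompt_len = len(inputs["input_ids"][0])
    # Everything past the prompt is newly generated text.
    new_tokens = output_ids[0][prompt_len:]
    return processor.decode(new_tokens, skip_special_tokens=True)
```

With the variables from the snippets above, this would be called as `decode_new_tokens(processor, inputs, output)`.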

## Prompt Templates

### Arabic Meme (Arabic Explanation)
```
System: You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts.

User: You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts. I will provide you with Arabic memes and the text extracted from these images. Your task is to classify the image as one of the following: 'propaganda', 'not-propaganda', 'not-meme', or 'other', and provide a brief explanation in Arabic. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: {OCR_TEXT}
```

### Arabic Meme (English Explanation)
```
System: You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts.

User: You are an expert social media image analyzer specializing in identifying propaganda in Arabic contexts. I will provide you with Arabic memes and the text extracted from these images. Your task is to classify the image as one of the following: 'propaganda', 'not-propaganda', 'not-meme', or 'other', and provide a brief explanation in English. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: {OCR_TEXT}
```

### English Hateful Meme
```
System: You are an expert social media image analyzer specializing in identifying hateful content in memes

User: I will provide you with memes and the text extracted from these images. Your task is to classify the image as one of the following: 'hateful' or 'not-hateful' and provide a brief explanation. Start your response with 'Label:' followed by the classification label, then on a new line begin with 'Explanation:' and briefly state your reasoning. Text extracted: {OCR_TEXT}
```
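
The three templates above can be filled in programmatically. The following is a minimal sketch that builds the `messages` list used in the Usage section; the function name `build_messages` and the task identifiers `"arabic-propaganda"` and `"english-hateful"` are illustrative assumptions, while the prompt strings are copied from the templates.

```python
ARABIC_SYSTEM = (
    "You are an expert social media image analyzer specializing in "
    "identifying propaganda in Arabic contexts."
)
# The Arabic user prompt repeats the system prompt, as in the templates.
ARABIC_USER = (
    ARABIC_SYSTEM + " I will provide you with Arabic memes and the text "
    "extracted from these images. Your task is to classify the image as one "
    "of the following: 'propaganda', 'not-propaganda', 'not-meme', or "
    "'other', and provide a brief explanation in {language}. Start your "
    "response with 'Label:' followed by the classification label, then on a "
    "new line begin with 'Explanation:' and briefly state your reasoning. "
    "Text extracted: {ocr_text}"
)
HATEFUL_SYSTEM = (
    "You are an expert social media image analyzer specializing in "
    "identifying hateful content in memes"
)
HATEFUL_USER = (
    "I will provide you with memes and the text extracted from these images. "
    "Your task is to classify the image as one of the following: 'hateful' "
    "or 'not-hateful' and provide a brief explanation. Start your response "
    "with 'Label:' followed by the classification label, then on a new line "
    "begin with 'Explanation:' and briefly state your reasoning. "
    "Text extracted: {ocr_text}"
)


def build_messages(task, ocr_text, language="English"):
    """Return chat messages for one meme (hypothetical helper).

    task: "arabic-propaganda" or "english-hateful" (illustrative names).
    language: explanation language for the Arabic task ("Arabic" or "English").
    """
    if task == "arabic-propaganda":
        if language not in ("Arabic", "English"):
            raise ValueError(f"unsupported explanation language: {language}")
        system = ARABIC_SYSTEM
        user = ARABIC_USER.format(language=language, ocr_text=ocr_text)
    elif task == "english-hateful":
        system = HATEFUL_SYSTEM
        user = HATEFUL_USER.format(ocr_text=ocr_text)
    else:
        raise ValueError(f"unknown task: {task}")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": user},
        ]},
    ]
```

The returned list can be passed to `processor.apply_chat_template(messages, add_generation_prompt=True)` exactly as in the Usage section.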

## Expected Output Format

The model responds in the following format:
```
Label: [classification_label]
Explanation: [reasoning for the classification]
```
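
For downstream use, the response can be split into its two fields. The parser below is a minimal sketch under the assumption that the model follows the format above; `parse_prediction` and `MemePrediction` are hypothetical names. It uses the last line starting with `Label:` so that it still works if the decoded text also contains the prompt.

```python
import re
from typing import NamedTuple, Optional


class MemePrediction(NamedTuple):
    label: Optional[str]
    explanation: Optional[str]


# A label line starts with "Label:"; the explanation runs to the end of the text.
_LABEL_RE = re.compile(r"^\s*Label:\s*(.+?)\s*$", re.MULTILINE)
_EXPLANATION_RE = re.compile(r"^\s*Explanation:\s*(.*)", re.MULTILINE | re.DOTALL)


def parse_prediction(text: str) -> MemePrediction:
    """Extract the label and explanation from a decoded model response."""
    label_matches = list(_LABEL_RE.finditer(text))
    if not label_matches:
        # The response did not follow the expected format.
        return MemePrediction(None, None)
    last = label_matches[-1]
    # Normalize: drop surrounding quotes or brackets and lowercase the label.
    label = last.group(1).strip("'\"[] ").lower()
    # Look for the explanation only after the chosen label line.
    expl_match = _EXPLANATION_RE.search(text, last.end())
    explanation = expl_match.group(1).strip() if expl_match else None
    return MemePrediction(label, explanation or None)
```

A `label` of `None` signals a malformed response, which callers may want to log or retry rather than treat as a class.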

## Training

- **Base Model**: [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)
- **Training Dataset**: [QCRI/MemeXplain](https://huggingface.co/datasets/QCRI/MemeXplain)
- **Training Method**: Multi-stage optimization approach

## Performance

MemeIntel achieves state-of-the-art results:
- **ArMeme (Arabic Propaganda)**: ~3% absolute improvement over previous SOTA
- **Hateful Memes (English)**: ~7% absolute improvement over previous SOTA

## Citation

If you use this model, please cite:

```bibtex
@inproceedings{kmainasi-etal-2025-memeintel,
    title = "{M}eme{I}ntel: Explainable Detection of Propagandistic and Hateful Memes",
    author = "Kmainasi, Mohamed Bayan  and
      Hasnat, Abul  and
      Hasan, Md Arid  and
      Shahroor, Ali Ezzat  and
      Alam, Firoj",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1539/",
    doi = "10.18653/v1/2025.emnlp-main.1539",
    pages = "30263--30279",
    ISBN = "979-8-89176-332-6",
}
```

## License

This model is released under the [Llama 3.2 Community License](https://www.llama.com/llama3_2/license/).

## Authors

- Mohamed Bayan Kmainasi
- Abul Hasnat
- Md Arid Hasan
- Ali Ezzat Shahroor
- Firoj Alam

Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University