|
|
--- |
|
|
base_model: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit |
|
|
tags: |
|
|
- text-generation-inference |
|
|
- transformers |
|
|
- unsloth |
|
|
- mllama |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
--- |
|
|
|
|
|
# Llama 3.2-11B-Based Hate Detection in Arabic Multimodal Memes
|
|
|
|
|
The rise of social media and online communication platforms has led to the spread of Arabic memes as a key form of digital expression. |
|
|
While memes can be humorous and informative, they are increasingly used to spread offensive language and hate speech.
|
|
Consequently, there is a growing demand for precise analysis of Arabic meme content.
|
|
|
|
|
This work uses Llama 3.2 with its vision capability to identify hateful content in Arabic memes.
|
|
The evaluation is conducted using a dataset of Arabic memes proposed in the ArabicNLP MAHED 2025 challenge. |
|
|
The results underscore the capacity of ***Llama 3.2-11B fine-tuned on Arabic memes*** to deliver superior performance.
|
|
|
|
|
The fine-tuned model achieves an **accuracy** of **80.3%** and a **macro F1 score** of **73.3%**.
|
|
|
|
|
The proposed solution offers a more nuanced understanding of memes, enabling accurate and efficient Arabic content moderation systems.
|
|
|
|
|
|
|
|
# Examples of Arabic Memes from ArabicNLP MAHED 2025 challenge |
|
|
|
|
|
|
|
|
|
|
| | |
|:-------------------------:|:-------------------------:|
|<img width="500" height="500" src="https://cdn-uploads.huggingface.co/production/uploads/656ee240c5ac4733e9ccdd0e/jBuVCt5163WlugFRXkSgq.jpeg"> |<img width="500" height="500" src="https://cdn-uploads.huggingface.co/production/uploads/656ee240c5ac4733e9ccdd0e/jiPId6f5IiGXxpI898llC.jpeg"> |
|<img width="500" height="500" src="https://cdn-uploads.huggingface.co/production/uploads/656ee240c5ac4733e9ccdd0e/61acyltUsTB--ZOAMkv0a.jpeg"> |<img width="500" height="500" src="https://cdn-uploads.huggingface.co/production/uploads/656ee240c5ac4733e9ccdd0e/_alSRnwG0azE_iYq2BrpP.jpeg"> |
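
The following example loads the fine-tuned model and runs it over the test split used for evaluation, predicting a "hate" / "no_hate" label for each meme: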
|
|
|
|
|
|
|
|
```python
import os

import torch
from datasets import load_dataset
from transformers import TextStreamer
from unsloth import FastVisionModel

os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Load the fine-tuned model and its processor from the Hugging Face Hub.
model_name = "NYUAD-ComNets/Llama3.2_MultiModal_Memes_Hate_Detector"
model, tokenizer = FastVisionModel.from_pretrained(model_name, token='xxxxxxxxxxxxxxxxxxxxxx')

# Switch the model into inference mode.
FastVisionModel.for_inference(model)

# Load the test split of the Arabic hateful-meme dataset.
dataset_test = load_dataset("QCRI/Prop2Hate-Meme", split="test")
print(dataset_test)

# Map the numeric hate label to a textual label.
def add_labels_column(example):
    example["labels"] = "no_hate" if example["hate_label"] == 0 else "hate"
    return example

dataset_test = dataset_test.map(add_labels_column)

pred = []

for k in range(len(dataset_test)):
    image = dataset_test[k]["image"]
    text = dataset_test[k]["text"]
    lab = dataset_test[k]["labels"]  # gold label, kept for later comparison

    # Build a multimodal chat prompt containing the meme image and its caption text.
    messages = [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": text}
        ]}
    ]

    input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    inputs = tokenizer(
        image,
        input_text,
        add_special_tokens=False,
        return_tensors="pt",
    ).to("cuda")

    text_streamer = TextStreamer(tokenizer, skip_prompt=True)

    # Generate the model's verdict ("hate" / "no_hate") for this meme.
    p = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128,
                       use_cache=False, temperature=0.3, min_p=0.3)
    p = tokenizer.decode(p[0], skip_special_tokens=True)

    # Keep only the assistant's reply, i.e. the predicted label.
    pred.append(p.split('assistant')[1].strip())

print(pred)
```
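
To reproduce the reported scores, the predictions above can be compared against the gold labels, for example with scikit-learn. This is a minimal scoring sketch (scikit-learn is assumed to be installed; it is not part of the original snippet):

```python
# Minimal scoring sketch, assuming `pred` and `dataset_test` come from the loop above.
from sklearn.metrics import accuracy_score, f1_score

gold = [dataset_test[k]["labels"] for k in range(len(pred))]
print("accuracy:", accuracy_score(gold, pred))
print("macro F1:", f1_score(gold, pred, average="macro"))
```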
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
 |
|
|
|
|
|
|
|
|
|
|
|
We used Low-Rank Adaptation (LoRA) as the Parameter-Efficient Fine-Tuning (PEFT) method, fine-tuning the model with the Unsloth framework; a configuration sketch is shown after the hyper-parameter list below.
|
|
|
|
|
The hyper-parameters of Llama 3.2-11B are as follows:

- training batch size per device: 4
- gradient accumulation steps: 4
- learning-rate warm-up steps: 5
- total training steps: 150
- learning rate: 0.0002
- optimizer: 8-bit AdamW
- weight decay: 0.01
- learning-rate scheduler: linear
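
The sketch below illustrates how such a configuration could look using Unsloth's vision fine-tuning recipe. The LoRA rank/alpha values and the `converted_train_dataset` variable (the MAHED training memes converted to the multimodal chat format used in the inference example) are assumptions for illustration, not values reported in this card:

```python
import torch
from trl import SFTConfig, SFTTrainer
from unsloth import FastVisionModel
from unsloth.trainer import UnslothVisionDataCollator

# Load the 4-bit base model and attach LoRA adapters.
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit",
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,    # adapt the vision encoder as well as the language layers
    finetune_language_layers=True,
    r=16, lora_alpha=16,            # assumed LoRA rank/alpha; not stated in this card
)
FastVisionModel.for_training(model)

# `converted_train_dataset` is assumed to hold the MAHED training memes in the
# multimodal chat format ({"messages": [...]}) shown in the inference example.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=converted_train_dataset,
    args=SFTConfig(
        per_device_train_batch_size=4,   # training batch size per device
        gradient_accumulation_steps=4,   # gradients accumulated over 4 steps
        warmup_steps=5,                  # learning-rate warm-up steps
        max_steps=150,                   # total training steps
        learning_rate=2e-4,              # i.e. 0.0002
        optim="adamw_8bit",              # 8-bit AdamW
        weight_decay=0.01,
        lr_scheduler_type="linear",      # linear learning-rate schedule
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=10,
        output_dir="outputs",
        remove_unused_columns=False,
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        max_seq_length=2048,
    ),
)
trainer.train()
```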
|
|
|
|
|
# BibTeX entry and citation info |
|
|
|
|
|
```bibtex
|
|
|
|
|
|
|
|
|
|
|
@inproceedings{aldahoul2025nyuad, |
|
|
title={NYUAD at MAHED Shared Task: Detecting Hope, Hate, and Emotion in Arabic Textual Speech and Multi-modal Memes Using Large Language Models}, |
|
|
author={Aldahoul, Nouar and Zaki, Yasir}, |
|
|
booktitle={Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks}, |
|
|
pages={575--584}, |
|
|
year={2025} |
|
|
} |
|
|
|
|
|
|
|
|
@misc{aldahoul2025detectinghopehateemotion, |
|
|
title={Detecting Hope, Hate, and Emotion in Arabic Textual Speech and Multi-modal Memes Using Large Language Models}, |
|
|
author={Nouar AlDahoul and Yasir Zaki}, |
|
|
year={2025}, |
|
|
eprint={2508.15810}, |
|
|
archivePrefix={arXiv}, |
|
|
primaryClass={cs.CL}, |
|
|
url={https://arxiv.org/abs/2508.15810}, |
|
|
} |
|
|
|
|
|
|
|
|
``` |
|
|
|