---
license: mit
language:
- en
- ar
- bn
- de
- es
- hi
- ro
- ru
- zh
base_model: Qwen/Qwen3-VL-8B-Instruct
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- memes
- multimodal
- multilingual
- hate-speech
- vision-language-model
- qwen3-vl
datasets:
- QCRI/MemeLens
---

# MemeLens-VLM

**MemeLens** is a unified multilingual, multitask Vision-Language Model (VLM) for meme understanding. It is fine-tuned from [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) using a **classify-then-explain** training strategy on the [MemeLens dataset](https://huggingface.co/datasets/QCRI/MemeLens), which consolidates 38 public meme datasets across 20 tasks and 9 languages.

**Paper:** [MemeLens: Multilingual Multitask VLMs for Memes](https://arxiv.org/abs/2601.12539)

*Figure: MemeLens data construction overview.*

## Overview

| | |
|---|---|
| **Base model** | Qwen3-VL-8B-Instruct |
| **Training** | Classify-then-explain (multi-stage) |
| **Tasks** | 20 tasks: harm, targets, figurative/pragmatic intent, affect |
| **Languages** | AR, BN, DE, EN, ES, HI, RO, RU, ZH |
| **Datasets** | 38 consolidated public meme datasets |

## Results

### Overall Comparison

| Model / Modality | Acc | M-F1 | W-F1 |
|---|---|---|---|
| Uni-modal (Text) | 65.0 | 0.460 | 0.590 |
| Uni-modal (Image) | 63.6 | 0.472 | 0.600 |
| Multi-modal (Seq-Classification) | 71.0 | 0.580 | 0.680 |
| GPT-4.1 (Zero-Shot) | 61.2 | 0.533 | 0.599 |
| Qwen3-VL-8B-Instruct (Zero-Shot) | 55.1 | 0.482 | 0.539 |
| InternVL3.5-8B (Zero-Shot) | 55.4 | 0.476 | 0.545 |
| Gemma-3-12B (Zero-Shot) | 48.2 | 0.439 | 0.485 |
| Qwen3-2B (Zero-Shot) | 45.6 | 0.394 | 0.431 |
| Phi-3.5-Vision-4.2B (Zero-Shot) | 43.8 | 0.393 | 0.447 |
| **MemeLens (Ours)** | **74.1** | **0.625** | **0.720** |

### Per-Dataset Breakdown

| Dataset | Task | Lang. | Text-Only (Acc/Ma/W) | Image-Only (Acc/Ma/W) | MM-Seq (Acc/Ma/W) | **MemeLens** (Acc/Ma/W) |
|---|---|---|---|---|---|---|
| BanglaAbuse | Abuse | BN | .660/.564/.615 | .680/.628/.663 | .731/.698/.723 | **.787/.759/.782** |
| RoMemes | Deepfake | RO | .634/.259/.493 | .645/.399/.630 | .575/.338/.551 | **.770/.491/.753** |
| HarMeme (Co) | Harmful | EN | .712/.499/.706 | .703/.443/.677 | **.811/.546/.797** | .748/.523/.740 |
| HarMeme | Harmful | EN | .499/.338/.489 | .535/.362/.527 | .590/.400/.590 | **.622/.467/.617** |
| Prop2Hate | Propaganda | AR | .746/.427/.637 | .743/.426/.636 | **.800/.650/.760** | .772/.546/.703 |
| MUTE | Propaganda | BN | .642/.556/.602 | .688/.659/.682 | **.730/.710/.730** | .719/.700/.718 |
| Multi3Hate | Hateful | DE | .590/.371/.438 | .557/.504/.533 | .720/.710/.720 | **.754/.731/.745** |
| MIMIC_Isl | Hateful | EN | .647/.633/.635 | .580/.576/.577 | .510/.340/.350 | **.707/.707/.707** |
| MMHS | Hateful | EN | .631/.387/.488 | .621/.495/.561 | **.630**/.500/**.570** | .614/**.516**/.568 |
| Multi3Hate | Hateful | EN | .574/.573/.573 | .508/.507/.507 | **.770/.770/.770** | .741/.735/.734 |
| FHM | Hateful | EN | .633/.541/.592 | .623/.507/.567 | .760/.740/.760 | **.798/.782/.798** |
| Multi3Hate | Hateful | ES | .557/.358/.399 | .672/.661/.668 | .620/.600/.610 | **.800/.796/.799** |
| Multi3Hate | Hateful | HI | .656/.579/.618 | .574/.528/.559 | **.750/.750/.760** | .754/.724/.744 |
| Multi3Hate | Hateful | ZH | .639/.390/.499 | .574/.470/.535 | .640/.610/.640 | **.770/.740/.765** |
| Memotion | Humor | EN | **.353**/.204/.276 | .325/.235/.297 | .350/.250/.310 | .352/**.248/.316** |
| MET-Meme | Intention | EN | .464/.353/.444 | .383/.295/.363 | .320/.220/.340 | **.524/.442/.514** |
| MET-Meme | Intention | ZH | .621/.443/.611 | .443/.212/.376 | .670/.470/.660 | **.710/.521/.701** |
| MET-Meme | Metaphor | EN | .810/.725/.796 | .814/.724/.797 | **.870/.820/.870** | .867/.821/.863 |
| MET-Meme | Metaphor | ZH | .847/.838/.846 | .678/.636/.662 | **.900/.890/.890** | .866/.859/.865 |
| MAMI | Misogyny | EN | .623/.620/.620 | .628/.611/.611 | .750/.740/.740 | **.849/.849/.849** |
| MIMIC2024 | Misogyny (Cat.) | HI-EN | .470/.146/.412 | .470/.146/.412 | .660/.290/.430 | **.766/.592/.659** |
| MIMIC2024 | Misogyny | HI-EN | .673/.671/.671 | .630/.246/.570 | .850/.850/.850 | **.899/.899/.899** |
| Memotion | Motivational | EN | **.647**/.399/.512 | .608/**.451**/.537 | .640/.450/.540 | .637/.450/**.545** |
| MAMI | Objectification | EN | .670/.503/.590 | .732/.671/.714 | .810/.780/.810 | **.835/.797/.826** |
| Memotion | Offensive | EN | .388/.140/.217 | .371/**.227/.334** | **.390**/.220/.330 | .386/.215/.325 |
| MET-Meme | Offensive | EN | .748/.242/.667 | .742/.233/.658 | .740/**.310/.710** | **.748**/.309/.708 |
| MET-Meme | Offensive | ZH | .803/.485/.787 | .742/.223/.684 | .810/.500/.790 | **.830/.535/.819** |
| RoMemes | Political | RO | .677/.404/.547 | .656/.524/.613 | .830/.780/.820 | **.867/.834/.858** |
| ArMeme | Propaganda | AR | .755/.639/.734 | .735/.554/.685 | **.790/.690/.770** | .789/.679/.765 |
| BanglaAbuse | Sarcasm | BN | .639/.568/.599 | .656/.636/.651 | **.680**/.660/.670 | .674/**.661/.672** |
| Memotion | Sarcasm | EN | .502/.167/.335 | .468/**.195/.352** | **.510**/.170/.340 | .501/.167/.337 |
| MAMI | Shaming | EN | .854/.461/.787 | .834/.610/.819 | .870/.710/.870 | **.898/.719/.883** |
| MAMI | Stereotype | EN | .661/.561/.624 | .729/.336/.707 | .740/.700/.730 | **.784/.739/.772** |
| HarMeme (Co) | Target | EN | .777/.449/.788 | .729/.336/.707 | **.870**/.550/.870 | .823/.420/.840 |
| HarMeme | Target | EN | .485/.314/.479 | .451/.204/.404 | **.590**/.350/**.580** | .562/**.493**/.565 |
| Toxic | Toxic | RU | .826/.493/.771 | .839/.495/.777 | .860/.700/.850 | **.866/.691/.853** |
| MAMI | Violence | EN | .853/.504/.793 | .722/.618/.702 | .910/.770/.890 | **.923/.809/.914** |
| BanglaAbuse | Vulgar | BN | .743/.680/.740 | .743/.680/.740 | .800/.750/.800 | **.827/.772/.821** |
| **Average** | | | 0.650/0.460/0.590 | 0.636/0.472/0.600 | 0.706/0.579/0.678 | **0.741/0.625/0.720** |

Acc = Accuracy, Ma = Macro-F1, W = Weighted-F1. Bold indicates best performance for the dataset's official evaluation metric.

## Supported Tasks

*Figure: Task and language distribution.*

The model supports 20 meme understanding tasks across multiple categories:

| Category | Tasks |
|---|---|
| **Harm** | Hateful, Harmful, Toxic, Abusive, Vulgar |
| **Targets** | Target identification, Misogyny, Misogyny Categories, Objectification, Shaming, Stereotype, Violence |
| **Figurative/Pragmatic** | Propaganda, Metaphor, Intention, Sarcasm, Deepfake, Political |
| **Affect** | Humor, Offensive, Motivational, Sentiment |

## Usage

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from PIL import Image

model = Qwen3VLForConditionalGeneration.from_pretrained("QCRI/MemeLens-VLM", torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained("QCRI/MemeLens-VLM")

image = Image.open("path/to/meme.jpg")

messages = [
    {"role": "system", "content": "You are an expert social media image analyzer specializing in identifying hateful content in memes."},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": (
            "Analyze the content of the meme to see if it promotes or displays hateful views against "
            "specific individuals or groups. Look for discriminatory, prejudiced, or hate speech elements "
            "associated with protected characteristics, including race, religion, and gender. "
            "Classify as: 'hateful' if present, or 'not-hateful' if absent.\n\n"
            "IMPORTANT: Your response must strictly follow this format:\n"
            "'Label: