# Current Tasks

> The name in parentheses is the task name used by lmms_eval. The task name is also used to specify the dataset in the configuration file.

**Note:** This documentation is manually maintained. For the most up-to-date and complete list of supported tasks, please run:

```bash
python -m lmms_eval --tasks list
```

To see the number of questions in each task:

```bash
python -m lmms_eval --tasks list_with_num
```

(Note: `list_with_num` will download all datasets and may require significant time and storage.)

## 1. Image tasks:

- [AI2D](https://arxiv.org/abs/1603.07396) (ai2d)
- [ChartQA](https://github.com/vis-nlp/ChartQA) (chartqa)
- [COCO Caption](https://github.com/tylin/coco-caption) (coco_cap)
  - COCO 2014 Caption (coco2014_cap)
    - COCO 2014 Caption Validation (coco2014_cap_val)
    - COCO 2014 Caption Test (coco2014_cap_test)
  - COCO 2017 Caption (coco2017_cap)
    - COCO 2017 Caption MiniVal (coco2017_cap_val)
    - COCO 2017 Caption MiniTest (coco2017_cap_test)
- [ConBench](https://github.com/foundation-multimodal-models/ConBench) (conbench)
- [DetailCaps-4870](https://github.com/foundation-multimodal-models/CAPTURE) (detailcaps)
- [DOCVQA](https://github.com/anisha2102/docvqa) (docvqa)
  - DOCVQA Validation (docvqa_val)
  - DOCVQA Test (docvqa_test)
- [Ferret](https://github.com/apple/ml-ferret) (ferret)
- [Flickr30K](https://github.com/BryanPlummer/flickr30k_entities) (flickr30k)
  - Flickr30K Test (flickr30k_test)
- [GQA](https://cs.stanford.edu/people/dorarad/gqa/index.html) (gqa)
- [GQA-ru](https://huggingface.co/datasets/deepvk/GQA-ru) (gqa_ru)
- [II-Bench](https://github.com/II-Bench/II-Bench) (ii_bench)
- [IllusionVQA](https://illusionvqa.github.io/) (illusionvqa)
- [Infographic VQA](https://www.docvqa.org/datasets/infographicvqa) (infovqa)
  - Infographic VQA Validation (infovqa_val)
  - Infographic VQA Test (infovqa_test)
- [LiveBench](https://huggingface.co/datasets/lmms-lab/LiveBench) (live_bench)
  - LiveBench 06/2024 (live_bench_2406)
  - LiveBench 07/2024 (live_bench_2407)
- [LLaVA-Bench-Wilder](https://huggingface.co/datasets/lmms-lab/LLaVA-Bench-Wilder) (llava_wilder_small)
- [LLaVA-Bench-COCO](https://llava-vl.github.io/) (llava_bench_coco)
- [LLaVA-Bench](https://llava-vl.github.io/) (llava_in_the_wild)
- [MathVerse](https://github.com/ZrrSkywalker/MathVerse) (mathverse)
  - MathVerse Text Dominant (mathverse_testmini_text_dominant)
  - MathVerse Text Only (mathverse_testmini_text_only)
  - MathVerse Text Lite (mathverse_testmini_text_lite)
  - MathVerse Vision Dominant (mathverse_testmini_vision_dominant)
  - MathVerse Vision Intensive (mathverse_testmini_vision_intensive)
  - MathVerse Vision Only (mathverse_testmini_vision_only)
- [MathVista](https://mathvista.github.io/) (mathvista)
  - MathVista Validation (mathvista_testmini)
  - MathVista Test (mathvista_test)
- [MMBench](https://github.com/open-compass/MMBench) (mmbench)
  - MMBench English (mmbench_en)
    - MMBench English Dev (mmbench_en_dev)
    - MMBench English Test (mmbench_en_test)
  - MMBench Chinese (mmbench_cn)
    - MMBench Chinese Dev (mmbench_cn_dev)
    - MMBench Chinese Test (mmbench_cn_test)
- [MME](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation) (mme)
- [MME-RealWorld](https://mme-realworld.github.io/) (mmerealworld)
  - MME-RealWorld English (mmerealworld)
  - MME-RealWorld Mini (mmerealworld_lite)
  - MME-RealWorld Chinese (mmerealworld_cn)
- [MMRefine](http://mmrefine.github.io/) (mmrefine)
- [MMStar](https://github.com/MMStar-Benchmark/MMStar) (mmstar)
- [MMUPD](https://huggingface.co/datasets/MM-UPD/MM-UPD) (mmupd)
  - MMUPD Base (mmupd_base)
  - MMAAD Base (mmaad_base)
  - MMIASD Base (mmiasd_base)
  - MMIVQD Base (mmivqd_base)
  - MMUPD Option (mmupd_option)
  - MMAAD Option (mmaad_option)
  - MMIASD Option (mmiasd_option)
  - MMIVQD Option (mmivqd_option)
  - MMUPD Instruction (mmupd_instruction)
  - MMAAD Instruction (mmaad_instruction)
  - MMIASD Instruction (mmiasd_instruction)
  - MMIVQD Instruction (mmivqd_instruction)
- [MMVet](https://github.com/yuweihao/MM-Vet) (mmvet)
- [Multilingual LLaVA Bench](https://huggingface.co/datasets/gagan3012/multilingual-llava-bench)
  - llava_in_the_wild_arabic
  - llava_in_the_wild_bengali
  - llava_in_the_wild_chinese
  - llava_in_the_wild_french
  - llava_in_the_wild_hindi
  - llava_in_the_wild_japanese
  - llava_in_the_wild_russian
  - llava_in_the_wild_spanish
  - llava_in_the_wild_urdu
- [NaturalBench](https://huggingface.co/datasets/BaiqiL/NaturalBench)
- [NoCaps](https://nocaps.org/) (nocaps)
  - NoCaps Validation (nocaps_val)
  - NoCaps Test (nocaps_test)
- [OCRBench](https://github.com/Yuliang-Liu/MultimodalOCR) (ocrbench)
- [OKVQA](https://okvqa.allenai.org/) (ok_vqa)
  - OKVQA Validation 2014 (ok_vqa_val2014)
- [POPE](https://github.com/RUCAIBox/POPE) (pope)
- [RefCOCO](https://github.com/lichengunc/refer) (refcoco)
  - refcoco_seg_test
  - refcoco_seg_val
  - refcoco_seg_testA
  - refcoco_seg_testB
  - refcoco_bbox_test
  - refcoco_bbox_val
  - refcoco_bbox_testA
  - refcoco_bbox_testB
- [RefCOCO+](https://github.com/lichengunc/refer) (refcoco+)
  - refcoco+\_seg
  - refcoco+\_seg_val
  - refcoco+\_seg_testA
  - refcoco+\_seg_testB
  - refcoco+\_bbox
  - refcoco+\_bbox_val
  - refcoco+\_bbox_testA
  - refcoco+\_bbox_testB
- [RefCOCOg](https://github.com/lichengunc/refer) (refcocog)
  - refcocog_seg_test
  - refcocog_seg_val
  - refcocog_bbox_test
  - refcocog_bbox_val
- [ScienceQA](https://scienceqa.github.io/) (scienceqa_full)
  - ScienceQA Full (scienceqa)
  - ScienceQA IMG (scienceqa_img)
- [ScreenSpot](https://github.com/njucckevin/SeeClick) (screenspot)
  - ScreenSpot REC / Grounding (screenspot_rec)
  - ScreenSpot REG / Instruction Generation (screenspot_reg)
- [ST-VQA](https://rrc.cvc.uab.es/?ch=11) (stvqa)
- [synthdog](https://github.com/clovaai/donut) (synthdog)
  - synthdog English (synthdog_en)
  - synthdog Chinese (synthdog_zh)
- [TextCaps](https://textvqa.org/textcaps/) (textcaps)
  - TextCaps Validation (textcaps_val)
  - TextCaps Test (textcaps_test)
- [TextVQA](https://textvqa.org/) (textvqa)
  - TextVQA Validation (textvqa_val)
  - TextVQA Test (textvqa_test)
- [VCR-Wiki](https://github.com/tianyu-z/VCR)
  - VCR-Wiki English
    - VCR-Wiki English easy 100 (vcr_wiki_en_easy_100)
    - VCR-Wiki English easy 500 (vcr_wiki_en_easy_500)
    - VCR-Wiki English easy (vcr_wiki_en_easy)
    - VCR-Wiki English hard 100 (vcr_wiki_en_hard_100)
    - VCR-Wiki English hard 500 (vcr_wiki_en_hard_500)
    - VCR-Wiki English hard (vcr_wiki_en_hard)
  - VCR-Wiki Chinese
    - VCR-Wiki Chinese easy 100 (vcr_wiki_zh_easy_100)
    - VCR-Wiki Chinese easy 500 (vcr_wiki_zh_easy_500)
    - VCR-Wiki Chinese easy (vcr_wiki_zh_easy)
    - VCR-Wiki Chinese hard 100 (vcr_wiki_zh_hard_100)
    - VCR-Wiki Chinese hard 500 (vcr_wiki_zh_hard_500)
    - VCR-Wiki Chinese hard (vcr_wiki_zh_hard)
- [VibeEval](https://github.com/reka-ai/reka-vibe-eval) (vibe_eval)
- [VizWizVQA](https://vizwiz.org/tasks-and-datasets/vqa/) (vizwiz_vqa)
  - VizWizVQA Validation (vizwiz_vqa_val)
  - VizWizVQA Test (vizwiz_vqa_test)
- [VL-RewardBench](https://vl-rewardbench.github.io) (vl_rewardbench)
- [VQAv2](https://visualqa.org/) (vqav2)
  - VQAv2 Validation (vqav2_val)
  - VQAv2 Test (vqav2_test)
- [WebSRC](https://x-lance.github.io/WebSRC/) (websrc)
  - WebSRC Validation (websrc_val)
  - WebSRC Test (websrc_test)
- [WildVision-Bench](https://github.com/WildVision-AI/WildVision-Bench) (wildvision)
  - WildVision 0617 (wildvision_0617)
  - WildVision 0630 (wildvision_0630)
- [SeedBench 2 Plus](https://huggingface.co/datasets/AILab-CVC/SEED-Bench-2-plus) (seedbench_2_plus)
- [SalBench](https://salbench.github.io/)
  - p3
  - p3_box
  - p3_box_img
  - o3
  - o3_box
  - o3_box_img

## 2. Multi-image tasks:

- [CMMMU](https://cmmmu-benchmark.github.io/) (cmmmu)
  - CMMMU Validation (cmmmu_val)
  - CMMMU Test (cmmmu_test)
- [HallusionBench](https://github.com/tianyi-lab/HallusionBench) (hallusion_bench_image)
- [ICON-QA](https://iconqa.github.io/) (iconqa)
  - ICON-QA Validation (iconqa_val)
  - ICON-QA Test (iconqa_test)
- [JMMMU](https://mmmu-japanese-benchmark.github.io/JMMMU/) (jmmmu)
- [LLaVA-NeXT-Interleave-Bench](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Interleave-Bench) (llava_interleave_bench)
  - llava_interleave_bench_in_domain
  - llava_interleave_bench_out_domain
  - llava_interleave_bench_multi_view
- [MIRB](https://github.com/ys-zong/MIRB) (mirb)
- [MMMU](https://mmmu-benchmark.github.io/) (mmmu)
  - MMMU Validation (mmmu_val)
  - MMMU Test (mmmu_test)
- [MMMU_Pro](https://huggingface.co/datasets/MMMU/MMMU_Pro)
  - MMMU Pro (mmmu_pro)
  - MMMU Pro Original (mmmu_pro_original)
  - MMMU Pro Vision (mmmu_pro_vision)
  - MMMU Pro COT (mmmu_pro_cot)
  - MMMU Pro Original COT (mmmu_pro_original_cot)
  - MMMU Pro Vision COT (mmmu_pro_vision_cot)
  - MMMU Pro Composite COT (mmmu_pro_composite_cot)
- [MMT Multiple Image](https://mmt-bench.github.io/) (mmt_mi)
  - MMT Multiple Image Validation (mmt_mi_val)
  - MMT Multiple Image Test (mmt_mi_test)
- [MuirBench](https://muirbench.github.io/) (muirbench)
- [MP-DocVQA](https://github.com/rubenpt91/MP-DocVQA-Framework) (multidocvqa)
  - MP-DocVQA Validation (multidocvqa_val)
  - MP-DocVQA Test (multidocvqa_test)
- [OlympiadBench](https://github.com/OpenBMB/OlympiadBench) (olympiadbench)
  - OlympiadBench Test English (olympiadbench_test_en)
  - OlympiadBench Test Chinese (olympiadbench_test_cn)
- [Q-Bench](https://q-future.github.io/Q-Bench/) (qbenchs_dev)
  - Q-Bench2-HF (qbench2_dev)
  - Q-Bench-HF (qbench_dev)
  - A-Bench-HF (abench_dev)
- [MEGA-Bench](https://tiger-ai-lab.github.io/MEGA-Bench/) (megabench)
  - MEGA-Bench Core (megabench_core)
  - MEGA-Bench Open (megabench_open)
  - MEGA-Bench Core single-image subset (megabench_core_si)
  - MEGA-Bench Open single-image subset (megabench_open_si)

## 3. Video tasks:

- [ActivityNet-QA](https://github.com/MILVLG/activitynet-qa) (activitynetqa_generation)
- [SeedBench](https://github.com/AILab-CVC/SEED-Bench) (seedbench)
- [SeedBench 2](https://github.com/AILab-CVC/SEED-Bench) (seedbench_2)
- [CVRR-ES](https://github.com/mbzuai-oryx/CVRR-Evaluation-Suite) (cvrr)
  - cvrr_continuity_and_object_instance_count
  - cvrr_fine_grained_action_understanding
  - cvrr_interpretation_of_social_context
  - cvrr_interpretation_of_visual_context
  - cvrr_multiple_actions_in_a_single_video
  - cvrr_non_existent_actions_with_existent_scene_depictions
  - cvrr_non_existent_actions_with_non_existent_scene_depictions
  - cvrr_partial_actions
  - cvrr_time_order_understanding
  - cvrr_understanding_emotional_context
  - cvrr_unusual_and_physically_anomalous_activities
- [EgoSchema](https://github.com/egoschema/EgoSchema) (egoschema)
  - egoschema_mcppl
  - egoschema_subset_mcppl
  - egoschema_subset
- [LEMONADE](https://huggingface.co/datasets/amathislab/LEMONADE) (lemonade)
- [LongVideoBench](https://github.com/longvideobench/LongVideoBench)
- [MovieChat](https://github.com/rese1f/MovieChat) (moviechat)
  - Global Mode for entire video (moviechat_global)
  - Breakpoint Mode for specific moments (moviechat_breakpoint)
- [MLVU](https://github.com/JUNJIE99/MLVU) (mlvu)
- [MMT-Bench](https://mmt-bench.github.io/) (mmt)
  - MMT Validation (mmt_val)
  - MMT Test (mmt_test)
- [MVBench](https://github.com/OpenGVLab/Ask-Anything/blob/main/video_chat2/MVBENCH.md) (mvbench)
  - mvbench_action_sequence
  - mvbench_moving_count
  - mvbench_action_prediction
  - mvbench_episodic_reasoning
  - mvbench_action_antonym
  - mvbench_action_count
  - mvbench_scene_transition
  - mvbench_object_shuffle
  - mvbench_object_existence
  - mvbench_fine_grained_pose
  - mvbench_unexpected_action
  - mvbench_moving_direction
  - mvbench_state_change
  - mvbench_object_interaction
  - mvbench_character_order
  - mvbench_action_localization
  - mvbench_counterfactual_inference
  - mvbench_fine_grained_action
  - mvbench_moving_attribute
  - mvbench_egocentric_navigation
- [NExT-QA](https://github.com/doc-doc/NExT-QA) (nextqa)
  - NExT-QA Multiple Choice Test (nextqa_mc_test)
  - NExT-QA Open Ended Validation (nextqa_oe_val)
  - NExT-QA Open Ended Test (nextqa_oe_test)
- [PerceptionTest](https://github.com/google-deepmind/perception_test)
  - PerceptionTest Test
    - perceptiontest_test_mc
    - perceptiontest_test_mcppl
  - PerceptionTest Validation
    - perceptiontest_val_mc
    - perceptiontest_val_mcppl
- [TempCompass](https://github.com/llyx97/TempCompass) (tempcompass)
  - tempcompass_multi_choice
  - tempcompass_yes_no
  - tempcompass_caption_matching
  - tempcompass_captioning
- [TemporalBench](https://huggingface.co/datasets/microsoft/TemporalBench) (temporalbench)
  - temporalbench_short_qa
  - temporalbench_long_qa
  - temporalbench_short_caption
- [Vatex](https://eric-xw.github.io/vatex-website/index.html) (vatex)
  - Vatex Chinese (vatex_val_zh)
  - Vatex Test (vatex_test)
- [VideoDetailDescription](https://huggingface.co/datasets/lmms-lab/VideoDetailCaption) (video_dc499)
- [Video-ChatGPT](https://github.com/mbzuai-oryx/Video-ChatGPT) (videochatgpt)
  - Video-ChatGPT Generic (videochatgpt_gen)
  - Video-ChatGPT Temporal (videochatgpt_temporal)
  - Video-ChatGPT Consistency (videochatgpt_consistency)
- [Video-MME](https://video-mme.github.io/) (videomme)
- [Vinoground](https://vinoground.github.io) (vinoground)
- [VITATECS](https://github.com/lscpku/VITATECS) (vitatecs)
  - VITATECS Direction (vitatecs_direction)
  - VITATECS Intensity (vitatecs_intensity)
  - VITATECS Sequence (vitatecs_sequence)
  - VITATECS Compositionality (vitatecs_compositionality)
  - VITATECS Localization (vitatecs_localization)
  - VITATECS Type (vitatecs_type)
- [WorldQA](https://zhangyuanhan-ai.github.io/WorldQA/) (worldqa)
  - WorldQA Generation (worldqa_gen)
  - WorldQA Multiple Choice (worldqa_mc)
- [YouCook2](http://youcook2.eecs.umich.edu/) (youcook2_val)
- [VDC](https://github.com/rese1f/aurora) (vdc)
  - VDC Detailed Caption (detailed_test)
  - VDC Camera Caption (camera_test)
  - VDC Short Caption (short_test)
  - VDC Background Caption (background_test)
  - VDC Main Object Caption (main_object_test)
- [VideoEval-Pro](https://tiger-ai-lab.github.io/VideoEval-Pro/) (videoevalpro)

## 4. Text Tasks

- [GSM8K](https://github.com/openai/grade-school-math) (gsm8k)
- [HellaSwag](https://rowanzellers.com/hellaswag/) (hellaswag)
- [IFEval](https://github.com/google-research/google-research/tree/master/instruction_following_eval) (ifeval)
- [MMLU](https://github.com/hendrycks/test) (mmlu)
- [MMLU_pro](https://github.com/TIGER-AI-Lab/MMLU-Pro) (mmlu_pro)
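Any task name above can be passed to the `--tasks` flag (comma-separated for multiple tasks). The sketch below is illustrative only: the model name `llava` and the remaining flag values are placeholders for your setup, and the available flags may differ between lmms_eval versions; run `python -m lmms_eval --help` for the authoritative list.

```bash
# Illustrative invocation: evaluate a model on MME and the MMBench English dev split.
# "llava", batch size, and output path are placeholders; adjust for your setup.
python -m lmms_eval \
    --model llava \
    --tasks mme,mmbench_en_dev \
    --batch_size 1 \
    --output_path ./logs/
```

Results and per-sample logs (if enabled) are written under the chosen output path, one directory per run.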