# LMMs Eval Documentation
Welcome to the documentation for lmms-eval, a unified evaluation framework for Large Multimodal Models!
The framework enables consistent, reproducible evaluation of multimodal models across a range of tasks and modalities, including images, videos, and audio.
## Overview
lmms-eval provides:
- Standardized evaluation protocols for multimodal models
- Support for image, video, and audio tasks
- Easy integration of new models and tasks
- Reproducible benchmarking with shareable configurations (a sample invocation follows this list)
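
To give a concrete sense of the workflow, the sketch below shows a typical command line run. The model, checkpoint, and task names are illustrative placeholders; the Commands Guide documents the full set of flags.

```bash
# A representative run: evaluate a LLaVA checkpoint on the MME benchmark.
# The model name, checkpoint, and task are placeholders - substitute your own.
python3 -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks mme \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```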
The majority of this documentation is adapted from lm-eval-harness.
## Table of Contents
- Commands Guide - Learn about command line flags and options
- Model Guide - How to add and integrate new models
- Task Guide - Create custom evaluation tasks
- Current Tasks - List of all supported evaluation tasks (see the snippet after this table of contents for listing them locally)
- Run Examples - Example commands for running evaluations
- Caching - Enable and reload results from the JSONL cache
- Version 0.3 Features - Audio evaluation and new features
- Throughput Metrics - Understanding performance metrics
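
For a quick local view of what the Current Tasks page covers, the following command (a convention carried over from lm-eval-harness) should print the names of all registered tasks:

```bash
# List every evaluation task registered with the framework
# (flag convention inherited from lm-eval-harness).
python3 -m lmms_eval --tasks list
```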
## Additional Resources
- For dataset formatting tools, see lmms-eval tools
- For the latest updates, visit our GitHub repository