# LMMs Eval Documentation
Welcome to the documentation for lmms-eval, a unified evaluation framework for Large Multimodal Models!
The framework enables consistent, reproducible evaluation of multimodal models across a range of tasks and modalities, including images, videos, and audio.
## Overview
lmms-eval provides:
- Standardized evaluation protocols for multimodal models
- Support for image, video, and audio tasks
- Easy integration of new models and tasks
- Reproducible benchmarking with shareable configurations (a sample invocation follows this list)
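
To give a concrete sense of the workflow, the sketch below shows a typical command line run. The model, checkpoint, and task names are illustrative placeholders; the Commands Guide documents the full set of flags.

```bash
# A representative run: evaluate a LLaVA checkpoint on the MME benchmark.
# The model name, checkpoint, and task are placeholders - substitute your own.
python3 -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks mme \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```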
The majority of this documentation is adapted from lm-eval-harness.
## Table of Contents
- Commands Guide - Learn about command line flags and options
- Model Guide - How to add and integrate new models
- Task Guide - Create custom evaluation tasks
- Current Tasks - List of all supported evaluation tasks (see the snippet after this table of contents for listing them locally)
- Run Examples - Example commands for running evaluations
- Caching - Enable and reload results from the JSONL cache
- Version 0.3 Features - Audio evaluation and new features
- Throughput Metrics - Understanding performance metrics
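
For a quick local view of what the Current Tasks page covers, the following command (a convention carried over from lm-eval-harness) should print the names of all registered tasks:

```bash
# List every evaluation task registered with the framework
# (flag convention inherited from lm-eval-harness).
python3 -m lmms_eval --tasks list
```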
## Additional Resources
- For dataset formatting tools, see lmms-eval tools
- For the latest updates, visit our GitHub repository