---
title: README
emoji: 🌍
colorFrom: blue
colorTo: blue
sdk: static
pinned: false
---


<div align="center">
<img src='https://cdn-uploads.huggingface.co/production/uploads/647773a1168cb428e00e9a8f/N8lP93rB6lL3iqzML4SKZ.png'  width=100px>

<h1 align="center"><b>On Path to Multimodal Generalist: Levels and Benchmarks</b></h1>
<p align="center">
<a href="https://generalist.top/">[📖 Project]</a>
<a href="https://level.generalist.top">[🏆 Leaderboard]</a>
<a href="https://arxiv.org/abs/2510.10101">[📄 Paper]</a>
<a href="https://huggingface.co/General-Level">[🤗 Dataset-HF]</a>
<a href="https://github.com/path2generalist/GeneralBench">[📁 Dataset-Github]</a>
</p>

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/license/mit)

---
</div>



<h1 align="center" style="color:#F27E7E"><em>
Does higher performance across tasks indicate a stronger MLLM, one that is closer to AGI?
<br>
NO! <b style="color:red">Synergy</b> does.
</em></h1>


This project introduces:

1. **General-Level**, a 5-level evaluation framework that sets a new norm for assessing multimodal generalists (multimodal LLMs/agents). Its core idea is to use Synergy as the evaluation criterion, categorizing capabilities by whether MLLMs preserve synergy across comprehension and generation, as well as across multimodal interactions.

2. **General-Bench**, a companion massive multimodal benchmark dataset that encompasses a broad spectrum of skills, modalities, formats, and capabilities, including over 700 tasks and 325K instances.