from textwrap import dedent

NUM_FEWSHOT = 0  # Change to match your few-shot setting
# ---------------------------------------------------


# Your leaderboard name
TITLE = """<h1 align="center" id="space-title">EASI Leaderboard</h1>"""

# What does your leaderboard evaluate?
INTRODUCTION_TEXT = dedent("""
**EASI: Holistic Evaluation of Multimodal LLMs on Spatial Intelligence**

EASI conceptualizes a comprehensive taxonomy of spatial tasks that unifies existing benchmarks, together with a standardized protocol for the fair evaluation of state-of-the-art proprietary and open-source models.
""")

# Which evaluations are you running? how can people reproduce what you have?
LLM_BENCHMARKS_TEXT = dedent("""
## Leaderboard

You can find the documentation of EASI here: [EvolvingLMMs-Lab/EASI](https://github.com/EvolvingLMMs-Lab/EASI).

The dataset for this leaderboard: [lmms-lab-si/EASI-Leaderboard-Data](https://huggingface.co/datasets/lmms-lab-si/EASI-Leaderboard-Data)
""")

EVALUATION_QUEUE_TEXT = dedent("""
## Some good practices before submitting an evaluation with EASI

### 1) Make sure you can load your model and tokenizer using AutoClasses:
```python
from transformers import AutoConfig, AutoModel, AutoTokenizer
config = AutoConfig.from_pretrained("your model name", revision=revision)
model = AutoModel.from_pretrained("your model name", revision=revision)
tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
```
If this step fails, follow the error messages to debug your model before submitting it. Most likely, your model was uploaded incorrectly.

Note: make sure your model is public!
Note: if your model needs `trust_remote_code=True`, we do not support this option yet, but we are working on adding it. Stay posted!

### 2) Convert your model weights to [safetensors](https://huggingface.co/docs/safetensors/index)
Safetensors is a newer format for storing weights that is safer and faster to load than pickle-based checkpoints. It will also allow us to show the number of parameters of your model in the `Extended Viewer`!
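
For intuition, here is a pure-Python sketch of the safetensors on-disk layout. The two helper functions are hypothetical, written only for illustration; for real checkpoints, use the official `safetensors` library.

```python
import json
import struct

# Illustrative sketch of the safetensors on-disk layout:
# [8-byte little-endian header size][JSON header][raw tensor bytes].
# Hypothetical helper, not part of the official `safetensors` library.
def save_safetensors_sketch(path, tensors):
    # tensors: dict mapping name -> (dtype_str, shape, raw_bytes)
    header = {}
    buf = b""
    for name, (dtype, shape, data) in tensors.items():
        start = len(buf)
        buf += data
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [start, len(buf)]}
    encoded = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(encoded)))  # header size, u64 LE
        f.write(encoded)                          # JSON header
        f.write(buf)                              # concatenated tensor data

def load_safetensors_sketch(path):
    with open(path, "rb") as f:
        (size,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(size))
        buf = f.read()
    data = {name: buf[meta["data_offsets"][0]:meta["data_offsets"][1]]
            for name, meta in header.items()}
    return data, header
```

For `transformers` models, calling `model.save_pretrained(path, safe_serialization=True)` produces safetensors weights directly.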

### 3) Make sure your model has an open license!
This is a leaderboard for Open LLMs, and we'd love for as many people as possible to know they can use your model 🤗

### 4) Fill up your model card
When we add extra information about models to the leaderboard, it will be taken automatically from the model card.

## In case of model failure
If your model is displayed in the `FAILED` category, its execution stopped.
Make sure you have followed the above steps first.
If everything is done, check that you can launch the EleutherAI harness on your model locally (you can add `--limit` to limit the number of examples per task).
""")

CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = dedent("""
@article{easi2025,
  title={Has {GPT-5} Achieved Spatial Intelligence? An Empirical Study},
  author={Cai, Zhongang and Wang, Yubo and Sun, Qingping and Wang, Ruisi and Gu, Chenyang and Yin, Wanqi and Lin, Zhiqian and Yang, Zhitao and Wei, Chen and Shi, Xuanke and Deng, Kewang and Han, Xiaoyang and Chen, Zukai and Li, Jiaqi and Fan, Xiangyu and Deng, Hanming and Lu, Lewei and Li, Bo and Liu, Ziwei and Wang, Quan and Lin, Dahua and Yang, Lei},
  journal={arXiv preprint arXiv:2508.13142},
  year={2025}
}
""").strip()

# --------------------------------------

SUBMISSION_INSTRUCTIONS_TEXT = dedent("""
## Submission Instructions

0. **Login** to your HuggingFace account.
1. Fill in the model name to search for on the HuggingFace Hub (e.g. `qwen/qwen3-vl-8b-instruct`).
2. Select the model from search results, and check the model name autofilled below (e.g. `Qwen/Qwen3-VL-8B-Instruct`).
3. (Optional) Fill in the revision commit of the model. If left empty, the latest commit on the `main` branch is used.
4. Select the model type. (e.g. `pretrained`)
5. Select the precision of the model. (e.g. `bfloat16`)
6. Select the weights type of the model. (defaults to `Original`)
7. (Optional) Fill in the base model name for **delta** or **adapter** weights. (e.g. `Qwen/Qwen3-VL-8B-Instruct`)
8. Select a benchmark to evaluate on, and fill in the evaluation result value (e.g. `0.5` for `VSI-bench` `acc`).
9. (Optional) Click the **[+]** button to add more benchmarks and evaluation result values.
10. Click the **Submit Eval** button to submit the evaluation request.
""")