from textwrap import dedent
NUM_FEWSHOT = 0  # change to match your few-shot setting
# ---------------------------------------------------
# Your leaderboard name
TITLE = """<h1 align="center" id="space-title">EASI Leaderboard</h1>"""
# What does your leaderboard evaluate?
INTRODUCTION_TEXT = dedent("""
**EASI: Holistic Evaluation of Multimodal LLMs on Spatial Intelligence**
EASI conceptualizes a comprehensive taxonomy of spatial tasks that unifies existing benchmarks, together with a standardized protocol for the fair evaluation of state-of-the-art proprietary and open-source models.
""")
# Which evaluations are you running? how can people reproduce what you have?
LLM_BENCHMARKS_TEXT = dedent("""
## Leaderboard
You can find the documentation of EASI here: [EvolvingLMMs-Lab/EASI](https://github.com/EvolvingLMMs-Lab/EASI).
The dataset backing this leaderboard is at [lmms-lab-si/EASI-Leaderboard-Data](https://huggingface.co/datasets/lmms-lab-si/EASI-Leaderboard-Data).
""")
EVALUATION_QUEUE_TEXT = dedent("""
## Some good practices before submitting an evaluation with EASI
### 1) Make sure you can load your model and tokenizer using AutoClasses:
```python
from transformers import AutoConfig, AutoModel, AutoTokenizer

revision = "main"  # or a specific branch / commit SHA
config = AutoConfig.from_pretrained("your model name", revision=revision)
model = AutoModel.from_pretrained("your model name", revision=revision)
tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
```
If this step fails, follow the error messages to debug your model before submitting it. It's likely your model has been improperly uploaded.
Note: make sure your model is public!
Note: if your model needs `trust_remote_code=True`, we do not support this option yet, but we are working on adding it. Stay posted!
### 2) Convert your model weights to [safetensors](https://huggingface.co/docs/safetensors/index)
It's a new format for storing weights which is safer and faster to load and use. It will also allow us to add the number of parameters of your model to the `Extended Viewer`!
### 3) Make sure your model has an open license!
This is a leaderboard for open models, and we'd love for as many people as possible to know they can use your model 🤗
### 4) Fill up your model card
When we add extra information about models to the leaderboard, it will be automatically taken from the model card.
## In case of model failure
If your model is displayed in the `FAILED` category, its execution stopped.
Make sure you have followed the above steps first.
If everything looks correct, check that you can run the EASI evaluation on your model locally (you can add `--limit` to cap the number of examples per task).
""")
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = dedent("""
@article{easi2025,
  title={Has {GPT-5} Achieved Spatial Intelligence? An Empirical Study},
author={Cai, Zhongang and Wang, Yubo and Sun, Qingping and Wang, Ruisi and Gu, Chenyang and Yin, Wanqi and Lin, Zhiqian and Yang, Zhitao and Wei, Chen and Shi, Xuanke and Deng, Kewang and Han, Xiaoyang and Chen, Zukai and Li, Jiaqi and Fan, Xiangyu and Deng, Hanming and Lu, Lewei and Li, Bo and Liu, Ziwei and Wang, Quan and Lin, Dahua and Yang, Lei},
journal={arXiv preprint arXiv:2508.13142},
year={2025}
}
""").strip()
# --------------------------------------
SUBMISSION_INSTRUCTIONS_TEXT = dedent("""
## Submission Instructions
0. **Login** to your HuggingFace account.
1. Fill in the model name to search for on the HuggingFace Hub (e.g. `qwen/qwen3-vl-8b-instruct`).
2. Select the model from search results, and check the model name autofilled below (e.g. `Qwen/Qwen3-VL-8B-Instruct`).
3. (Optional) Fill in the revision (commit SHA) of the model. If left empty, the latest commit on the `main` branch is used.
4. Select the model type. (e.g. `pretrained`)
5. Select the precision of the model. (e.g. `bfloat16`)
6. Select the weights type of the model. (defaults to `Original`)
7. (Optional) Fill in the base model name for **delta** or **adapter** weights. (e.g. `Qwen/Qwen3-VL-8B-Instruct`)
8. Select a benchmark to evaluate on, and fill in the evaluation result value (e.g. `0.5` for `VSI-bench` `acc`).
9. (Optional) Click the **[+]** button to add more benchmarks and evaluation result values.
10. Click the **Submit Eval** button to submit the evaluation request.
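
Put together, the steps above assemble a record along these lines (a hypothetical sketch: the field names mirror the form fields but are not the leaderboard's actual submission schema):
```python
# Illustrative only: field names follow the steps above, not the real schema.
submission = {
    "model": "Qwen/Qwen3-VL-8B-Instruct",  # steps 1-2
    "revision": "main",                    # step 3: defaults to latest main
    "model_type": "pretrained",            # step 4
    "precision": "bfloat16",               # step 5
    "weights_type": "Original",            # step 6
    "base_model": None,                    # step 7: delta/adapter weights only
    "results": [                           # steps 8-9: one entry per benchmark
        {"benchmark": "VSI-bench", "metric": "acc", "value": 0.5},
    ],
}
print(len(submission["results"]))  # 1
```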
""")