Spaces:
Runtime error
Runtime error
| from dataclasses import dataclass | |
| from enum import Enum | |
| class Task: | |
| benchmark: str | |
| metric: str | |
| col_name: str | |
| # Select your tasks here | |
| # --------------------------------------------------- | |
| class Tasks(Enum): | |
| # task_key in the json file, metric_key in the json file, name to display in the leaderboard | |
| task0 = Task("anli_r1", "acc", "ANLI") | |
| task1 = Task("logiqa", "acc_norm", "LogiQA") | |
| NUM_FEWSHOT = 0 # Change with your few shot | |
| # --------------------------------------------------- | |
| # Your leaderboard name | |
| TITLE = """<h1 align="center" id="space-title">🥇 Test Space</h1>""" | |
| # What does your leaderboard evaluate? | |
| INTRODUCTION_TEXT = """ | |
| Leaderboards for LLM evaluation. | |
| *TRUE(Trustworthy Real-world Usage Evaluation)Bench* is designed to evaluate LLMs for Productivity Assistants which stand for human's job productivity. | |
| """ | |
| # Which evaluations are you running? how can people reproduce what you have? | |
| LLM_BENCHMARKS_TEXT = f""" | |
| ## How it works | |
| We utilize LLM Judge with human-crafted criteria to assess AI response. | |
| """ | |
| EVALUATION_QUEUE_TEXT = """ | |
| ## Submission Policy | |
| For each benchmark: | |
| 1. Each model affiliation (individual or organization) can submit up to 3 times within 24 hours. | |
| 2. The same model can only be submitted once within 24 hours. | |
| 3. Criteria for determining duplicate submissions: | |
| - Benchmark name | |
| - Model full name | |
| - Sampling parameters, dtype, vLLM version, etc. are not subject to duplicate checking. | |
| 4. Submissions are only allowed if the model's organization or username matches that of the submitter. | |
| ## Some good practices before submitting a model | |
| ### 1) Make sure you can load your model and tokenizer using AutoClasses: | |
| ```python | |
| from transformers import AutoConfig, AutoModel, AutoTokenizer | |
| config = AutoConfig.from_pretrained("your model name", revision=revision) | |
| model = AutoModel.from_pretrained("your model name", revision=revision) | |
| tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision) | |
| ``` | |
| If this step fails, follow the error messages to debug your model before submitting it. It's likely your model has been improperly uploaded. | |
| Note: make sure your model is public! | |
| Note: if your model needs `use_remote_code=True`, we do not support this option yet but we are working on adding it, stay posted! | |
| ### 2) Convert your model weights to [safetensors](https://huggingface.co/docs/safetensors/index) | |
| It's a new format for storing weights which is safer and faster to load and use. It will also allow us to add the number of parameters of your model to the `Extended Viewer`! | |
| ### 3) Make sure your model has an open license! | |
| This is a leaderboard for Open LLMs, and we'd love for as many people as possible to know they can use your model 🤗 | |
| ### 4) Fill up your model card | |
| When we add extra information about models to the leaderboard, it will be automatically taken from the model card | |
| """ | |
| EVALUATION_QUEUE_TEXT_OPTION1 = """ | |
| # (Option 1) Submit HF model where vLLM inference is available | |
| 1. Fill the information including model name, vLLM version, sampling hyperparameters. | |
| 2. Sign in using the log-in button below. | |
| 3. Press "Submit Eval" button to submit. | |
| """ | |
| EVALUATION_QUEUE_TEXT_OPTION2 = """ | |
| # (Option 2) Submit HF model where vLLM inference is unavailable | |
| 1. Fill the information same with Option 1 and code snippets of model loading, inference, and termination. | |
| 2. Sign in using the log-in button below. | |
| 3. Press "Submit Eval" button to submit. | |
| """ | |
| EVALUATION_QUEUE_TEXT_OPTION3 = """ | |
| # (Option 3) Pull Request | |
| If Option 1 & 2 is unavailable, make [PR](https://huggingface.co/spaces/coms1580/test_space/discussions?new_pr=true) with [ADD_MODEL] prefix with contents as follows: | |
| ``` | |
| ### Open-weight models: | |
| - Benchmark Name: [The name of benchmark to be evaluated] | |
| - HugingFace Model ID: [HF_MODEL_ID] | |
| - Pretty Name: [PRETTY_NAME] | |
| - Sampling parameters: | |
| - Temperature | |
| - Top-p | |
| - Top-k | |
| - Presence penalty | |
| - Frequency penalty | |
| - Repetition penalty | |
| - Supported by vLLM: [yes/no] | |
| - (If yes) Version of vLLM | |
| - (If no) Code snippets: | |
| - Model loading | |
| - Inference | |
| - Termination | |
| ### Misc. | |
| - Contact: [your email] | |
| - Description: [e.g., paper link, blog post, etc.] | |
| - Notes: [optional] | |
| ``` | |
| """ | |
| CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results" | |
| CITATION_BUTTON_TEXT = r""" | |
| """ | |