| # Tutorial for Evaluating Intern-S1 | |
| OpenCompass now provides the necessary configs for evaluating Intern-S1. Please perform the following steps to initiate the evaluation of Intern-S1. | |
| ## Model Download and Deployment | |
| The Intern-S1 now has been open-sourced, which can be downloaded from [Huggingface](https://huggingface.co/internlm/Intern-S1). | |
| After completing the model download, it is recommended to deploy it as an API service for calling. | |
| You can deploy it based on LMdeploy/vlLM/sglang according to [this page](https://github.com/InternLM/Intern-S1/blob/main/README.md#Serving). | |
| ## Evaluation Configs | |
| ### Model Configs | |
| We provide a config example in `opencompass/configs/models/interns1/intern_s1.py`. | |
| Please make the changes according to your needs. | |
| ```python | |
| models = [ | |
| dict( | |
| abbr="intern-s1", | |
| key="YOUR_API_KEY", # Fill in your API KEY here | |
| openai_api_base="YOUR_API_BASE", # Fill in your API BASE here | |
| type=OpenAISDK, | |
| path="internlm/Intern-S1", | |
| temperature=0.7, | |
| meta_template=api_meta_template, | |
| query_per_second=1, | |
| batch_size=8, | |
| max_out_len=64000, | |
| max_seq_len=65536, | |
| openai_extra_kwargs={ | |
| 'top_p': 0.95, | |
| }, | |
| retry=10, | |
| extra_body={ | |
| "chat_template_kwargs": {"enable_thinking": True} # Control the thinking mode when deploying the model based on vllm or sglang | |
| }, | |
| pred_postprocessor=dict(type=extract_non_reasoning_content), # Extract non-reasoning contents when opening the thinking mode | |
| ), | |
| ] | |
| ``` | |
| ### Dataset Configs | |
| We provide a config for datasets used for evaluating Intern-S1 in `examples/eval_bench_intern_s1.py`. | |
| You can also add other datasets as needed. | |
| In addition, you need to add the configuration of the LLM Judger in this config file, as shown in the following example: | |
| ```python | |
| judge_cfg = dict( | |
| abbr='YOUR_JUDGE_MODEL', | |
| type=OpenAISDK, | |
| path='YOUR_JUDGE_MODEL_PATH', | |
| key='YOUR_API_KEY', | |
| openai_api_base='YOUR_API_BASE', | |
| meta_template=dict( | |
| round=[ | |
| dict(role='HUMAN', api_role='HUMAN'), | |
| dict(role='BOT', api_role='BOT', generate=True), | |
| ]), | |
| query_per_second=1, | |
| batch_size=1, | |
| temperature=0.001, | |
| max_out_len=8192, | |
| max_seq_len=32768, | |
| mode='mid', | |
| ) | |
| ``` | |
| ## Start Evaluation | |
| After completing the above configuration, | |
| enter the following command to start the evaluation: | |
| ```bash | |
| opencompass examples/eval_bench_intern_s1.py | |
| ``` | |