msj19
/

opencompass

Model card Files Files and versions

opencompass / docs /en /user_guides /interns1.md

msj19's picture

Add files using upload-large-folder tool

65775f0 verified about 1 month ago

|

history blame contribute delete

2.51 kB

	# Tutorial for Evaluating Intern-S1

	OpenCompass now provides the necessary configs for evaluating Intern-S1. Please perform the following steps to initiate the evaluation of Intern-S1.

	## Model Download and Deployment

	The Intern-S1 now has been open-sourced, which can be downloaded from [Huggingface](https://huggingface.co/internlm/Intern-S1).
	After completing the model download, it is recommended to deploy it as an API service for calling.
	You can deploy it based on LMdeploy/vlLM/sglang according to [this page](https://github.com/InternLM/Intern-S1/blob/main/README.md#Serving).

	## Evaluation Configs

	### Model Configs

	We provide a config example in `opencompass/configs/models/interns1/intern_s1.py`.
	Please make the changes according to your needs.

	```python
	models = [
	dict(
	abbr="intern-s1",
	key="YOUR_API_KEY", # Fill in your API KEY here
	openai_api_base="YOUR_API_BASE", # Fill in your API BASE here
	type=OpenAISDK,
	path="internlm/Intern-S1",
	temperature=0.7,
	meta_template=api_meta_template,
	query_per_second=1,
	batch_size=8,
	max_out_len=64000,
	max_seq_len=65536,
	openai_extra_kwargs={
	'top_p': 0.95,
	},
	retry=10,
	extra_body={
	"chat_template_kwargs": {"enable_thinking": True} # Control the thinking mode when deploying the model based on vllm or sglang
	},
	pred_postprocessor=dict(type=extract_non_reasoning_content), # Extract non-reasoning contents when opening the thinking mode
	),
	]
	```

	### Dataset Configs

	We provide a config for datasets used for evaluating Intern-S1 in `examples/eval_bench_intern_s1.py`.
	You can also add other datasets as needed.

	In addition, you need to add the configuration of the LLM Judger in this config file, as shown in the following example:

	```python
	judge_cfg = dict(
	abbr='YOUR_JUDGE_MODEL',
	type=OpenAISDK,
	path='YOUR_JUDGE_MODEL_PATH',
	key='YOUR_API_KEY',
	openai_api_base='YOUR_API_BASE',
	meta_template=dict(
	round=[
	dict(role='HUMAN', api_role='HUMAN'),
	dict(role='BOT', api_role='BOT', generate=True),
	]),
	query_per_second=1,
	batch_size=1,
	temperature=0.001,
	max_out_len=8192,
	max_seq_len=32768,
	mode='mid',
	)
	```

	## Start Evaluation

	After completing the above configuration,
	enter the following command to start the evaluation:

	```bash
	opencompass examples/eval_bench_intern_s1.py
	```