| # Task |
|
|
| The tasks in MCPMark follows two major principles |
| - The tasks are based on realistic digital environments that are also used by human programmers. |
| - The task outcome can be robustly verified in python scripts. |
|
|
| Therefore, each MCPMark task consists of three files |
| - `meta.json` |
| - `description.md` |
| - `verify.py` |
|
|
| Here, `metadata.json` includes the meta information of the task, `description.md` describes the purpose and setting of the task, as well as the instruction to complete the task. `verify.py` checks whether the task is completed successfully. |
|
|
| For example, you can ask the model agent to create a file with specific name and write specific content to the file, which belongs to the category of operating the file context. The structure looks like |
|
|
| ``` |
| tasks |
| │ |
| └───filesystem |
| │ |
| └───standard # task_suite (also supports `easy`) |
| │ |
| └───file_context # category_id |
| │ |
| └───create_file_write |
| │ meta.json |
| │ description.md |
| │ verify.py |
| ``` |
|
|
| All tasks live under `tasks/<mcp>/<task_suite>/<category>/<task_id>/`. `filesystem` refers to the MCP service and `task_suite` captures the difficulty slice (`standard` benchmark vs `easy` smoke tests). |
|
|
| `meta.json` includes the meta information about the task, including the following key |
| - task_id: the id of the task. |
| - task_name: full name of the task. |
| - description: task description. |
| - category_id: the id of task category. |
| - category_name: the full name of task categeory. |
| - author: the author of the task. |
| - difficulty: the task difficulty level. |
| - created_at: the timestamp of task creation. |
| - tags: a list of tags that describe the task. |
| - mcp: a list of MCP services it belongs to. |
| - metadata: other meta information. |
| |
| Here `category_name` describes the shared feature or the environment across different tasks (e.g. the github repository or notion page the task is built on). In this running example, `category_name` refers to `file_context`. |
|
|
| `description.md` could include the following information |
|
|
| - Task name |
| - Create and Write File. |
| - Task description |
| - Use the filesystem MCP tools to create a new file and write content to it. |
| - Task Objectives |
| - Create a new file named `hello_world.txt` in the test directory. |
| - Write the following content to the file: ``` Hello, World``` |
| - Verify the file was created successfully |
| - Verification Criteria |
| - File `hello_world.txt` exists in the test directory |
| - File contains the expected content structure |
| - File includes "Hello, World!" on the first line |
| - Tips |
| - Use the `write_file` tool to create and write content to the file |
| - The test directory path will be provided in the task context |
|
|
| The entire content of `description.md` will be read by the model agent for completing the task. |
|
|
| Accordingly, the `verify.py` contains the following functionalities |
| - Check whether the target directory exists. [](https://postimg.cc/4nnLrw3M) |
| - Check whether the target directory contains the file with target file name. [](https://postimg.cc/7fGRTX87) |
| - Check whether the target file contains the desired content `EXPECTED_PATTERNS = ["Hello Wolrd"]`. [](https://postimg.cc/w7ZSWZc0) |
|
|
| - If the outcome passes **all the above verification functionalities**, the task would be marked as successfully completed. |
|
|