File size: 3,681 Bytes
a2ec7b6
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
# Task

The tasks in MCPMark follows two major principles
- The tasks are based on realistic digital environments that are also used by human programmers.
- The task outcome can be robustly verified in python scripts.

Therefore, each MCPMark task consists of three files
- `meta.json`
- `description.md`
- `verify.py`

Here, `metadata.json` includes the meta information of the task, `description.md` describes the purpose and setting of the task, as well as the instruction to complete the task. `verify.py` checks whether the task is completed successfully.

For example, you can ask the model agent to create a file with specific name and write specific content to the file, which belongs to the category of operating the file context. The structure looks like

```
tasks 

└───filesystem

   └───standard          # task_suite (also supports `easy`)

      └───file_context   # category_id

         └───create_file_write
            │   meta.json 
            │   description.md
            │   verify.py
```

All tasks live under `tasks/<mcp>/<task_suite>/<category>/<task_id>/`. `filesystem` refers to the MCP service and `task_suite` captures the difficulty slice (`standard` benchmark vs `easy` smoke tests).

`meta.json` includes the meta information about the task, including the following key
- task_id: the id of the task.
- task_name: full name of the task.
- description: task description.
- category_id: the id of task category.
- category_name: the full name of task categeory.
- author: the author of the task.
- difficulty: the task difficulty level.
- created_at: the timestamp of task creation.
- tags: a list of tags that describe the task.
- mcp: a list of MCP services it belongs to.
- metadata: other meta information.

Here `category_name` describes the shared feature or the environment across different tasks (e.g. the github repository or notion page the task is built on). In this running example, `category_name` refers to `file_context`.

`description.md` could include the following information

- Task name
    - Create and Write File.
- Task description
    - Use the filesystem MCP tools to create a new file and write content to it.
- Task Objectives
    - Create a new file named `hello_world.txt` in the test directory.
    - Write the following content to the file:   ```   Hello, World```
    - Verify the file was created successfully
-  Verification Criteria
    - File `hello_world.txt` exists in the test directory
    - File contains the expected content structure
    - File includes "Hello, World!" on the first line
- Tips
    - Use the `write_file` tool to create and write content to the file
    - The test directory path will be provided in the task context

The entire content of `description.md` will be read by the model agent for completing the task. 

Accordingly, the `verify.py` contains the following functionalities
- Check whether the target directory exists. [![Check Target Directory](https://i.postimg.cc/SQfBYvby/task-sample-verify-get-test-dir.png)](https://postimg.cc/4nnLrw3M)
- Check whether the target directory contains the file with target file name. [![Check Target File Existence](https://i.postimg.cc/Qx0Zwnf6/task-sample-verify-file-existence.png)](https://postimg.cc/7fGRTX87)
- Check whether the target file contains the desired content `EXPECTED_PATTERNS = ["Hello Wolrd"]`. [![Check Content in Target File](https://i.postimg.cc/JzzMhWyV/task-sample-verify-check-content.png)](https://postimg.cc/w7ZSWZc0)

- If the outcome passes **all the above verification functionalities**, the task would be marked as successfully completed.