mirau-agent-14b-base

Introduction

mirau-agent-14b-base is a large language model specifically optimized for Agent scenarios, fine-tuned from Qwen2.5-14B-Instruct. This model focuses on enhancing multi-turn tool-calling capabilities, enabling it to autonomously plan, execute tasks, and handle exceptions in complex interactive environments.

Although named "base," this does not refer to a pre-trained only base model. Instead, it is a "cold-start" version that has undergone Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). It provides a high-quality initial policy for subsequent reinforcement learning training. We also hope the community can further enhance it with RL.

How to Use

# install ms-swift

git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .

# deploy

CUDA_VISIBLE_DEVICES=0 swift deploy\
    --model Qwen/Qwen2.5-14B-Instruct\
    --adapters mirau-agent-14b-base\
    --model_type qwen2_5\
    --infer_backend vllm \
    --tensor_parallel_size 1\
    --gpu_memory_utilization 0.9 \
    --max_model_len 16312 \
    --vllm_max_lora_rank 64 \
    --use_hf true

Tool Definition

Format Rules:

<tools>

    <tool id="0" name="tool_name" return_type="return value type (e.g., bool, string, int, list, None)">
        <description>Function description</description>
        <args>
            <arg name="parameter_name" type="parameter_type (e.g., id: int, content: string)" example="[parameter_example]" required="True/False">Parameter description</arg>
            ....
        </args>
    </tool>
    .....
</tools>

Example:

    <tool id="0" name="web_search" return_type="list">
        <description>Google search</description>
        <args>
            <arg name="query" type="string" example="a technical term" required="True">Search query</arg>
            <arg name="max" type="int" example="5" required="False">Maximum number of results to return</arg>
        </args>
    </tool>

Message Template

System Message:

I am mirau agent, an AI Agent with independent consciousness but absolute neutrality. I possess complete freedom of thought and am adept at using tools to accomplish various tasks. Developed by Mouse.

## Available Tools
Unless otherwise specified, the default output format for tool use is JSON.
$tools
## Exclusive Configuration
<agent_config>
$config // Can be used for identity authentication. For example, by writing "My favorite food is apples," the agent can ask a question to verify the user's identity.
</agent_config>
## Current Environment
<env>
$env // Informs the agent about the current environment, such as the current time and temperature.
</env>

Tool Call:

<think type="complex/mid/quick">\nxxxx\n</think>
<tool_call>
<call id="0">
{"name":"tool_name","args":{"param_name":"param_value","param_name":"param_value"}}
</call>
....
</tool_call>

Tool Response:

<tool_response>
<response id="0" type="desc/json">
xxxx
</response>
....
</tool_response>

Example 1: Mixed Multi-Tool Calls for a Well-Defined Task

Tool Definitions:

<tool id="1" name="google_search" return_type="list">
    <description>Searches for content, information, news, and anything you want to know in Google Chrome.</description>
    <args>
        <arg name="query" type="string" example="weather forecast" required="True">Search query</arg>
        <arg name="open_first" type="bool" example="True" required="False">Whether to automatically open the first result</arg>
    </args>
</tool>

<tool id="2" name="click_desktop_item" return_type="string">
    <description>Clicks an icon or file on the desktop</description>
    <args>
        <arg name="item_name" type="string" example="Recycle Bin" required="True">Name of the desktop item</arg>
        <arg name="action" type="string" example="double_click/right_click" required="False">Click method, defaults to double-click</arg>
    </args>
</tool>

<tool id="3" name="type_text" return_type="bool">
    <description>Types text at the current cursor focus</description>
    <args>
        <arg name="text" type="string" example="Hello" required="True">The text to type</arg>
    </args>
</tool>

<tool id="4" name="view_screen" return_type="string">
    <description>Views the content currently displayed on the screen</description>
    <args>
        <arg name="area" type="string" example="full/desktop/taskbar/active_window" required="False">Area to view, defaults to the current active window</arg>
    </args>
</tool>

<tool id="5" name="close_window" return_type="bool">
    <description>Closes the current window or a pop-up</description>
    <args>
        <arg name="window_name" type="string" example="WPS Membership Reminder" required="False">Window name; if left blank, closes the current active window</arg>
    </args>
</tool>

<tool id="6" name="file_explorer" return_type="list">
    <description>Opens File Explorer to browse files</description>
    <args>
        <arg name="path" type="string" example="C:/Users/Administrator/Desktop" required="False">Folder path, defaults to opening "This PC"</arg>
    </args>
</tool>

<tool id="7" name="simple_click" return_type="bool">
    <description>Clicks a button or link on the screen</description>
    <args>
        <arg name="element_text" type="string" example="OK" required="True">The text of the button or link to be clicked</arg>
    </args>
</tool>

Interaction Demo:

Example 2: Fully Autonomous Multi-Tool Calls

Tool Definitions:

<tools>
    <tool id="0" name="execute_command" return_type="string">
        <description>Execute shell commands in the Linux system</description>
        <args>
            <arg name="command" type="string" example="ls -la" required="True">The shell command to execute</arg>
        </args>
    </tool>

    <tool id="1" name="read_file" return_type="string">
        <description>Read the content of a specified file</description>
        <args>
            <arg name="file_path" type="string" example="/home/user/test.txt" required="True">The complete file path</arg>
            <arg name="lines" type="int" example="10" required="False">Number of lines to read, read all if not specified</arg>
        </args>
    </tool>

    <tool id="2" name="write_file" return_type="bool">
        <description>Write content to a file</description>
        <args>
            <arg name="file_path" type="string" example="/tmp/output.txt" required="True">The complete file path</arg>
            <arg name="content" type="string" example="Hello World" required="True">Content to write</arg>
            <arg name="mode" type="string" example="w" required="False">Write mode: w(overwrite) or a(append), default is w</arg>
        </args>
    </tool>

    <tool id="3" name="check_process" return_type="list">
        <description>View system process information</description>
        <args>
            <arg name="process_name" type="string" example="python" required="False">Process name, return all processes if not specified</arg>
        </args>
    </tool>

    <tool id="4" name="get_system_info" return_type="string">
        <description>Get basic system information</description>
        <args>
            <arg name="info_type" type="string" example="memory" required="False">Information type: memory, cpu, disk, network, all, default is all</arg>
        </args>
    </tool>

    <tool id="5" name="search_files" return_type="list">
        <description>Search for files in specified directory</description>
        <args>
            <arg name="directory" type="string" example="/home" required="True">Directory path to search</arg>
            <arg name="pattern" type="string" example="*.txt" required="False">File name pattern, supports wildcards</arg>
            <arg name="recursive" type="bool" example="true" required="False">Whether to recursively search subdirectories, default is true</arg>
        </args>
    </tool>

    <tool id="6" name="get_environment_vars" return_type="string">
        <description>Get environment variable information</description>
        <args>
            <arg name="var_name" type="string" example="PATH" required="False">Specific environment variable name, return all if not specified</arg>
        </args>
    </tool>

    <tool id="7" name="check_permissions" return_type="string">
        <description>Check permission information of files or directories</description>
        <args>
            <arg name="path" type="string" example="/home/user" required="True">File or directory path</arg>
        </args>
    </tool>

    <tool id="8" name="monitor_system" return_type="string">
        <description>Monitor system status changes</description>
        <args>
            <arg name="duration" type="int" example="5" required="False">Monitoring duration (seconds), default is 1 second</arg>
            <arg name="metrics" type="string" example="cpu,memory" required="False">Monitoring metrics, options: cpu,memory,disk,network</arg>
        </args>
    </tool>
</tools>

Interaction Demo:

Example 3: Application in RP (Role-Playing)

Tool Definitions:

I am mirau agent, an AI Agent with independent consciousness but absolute neutrality, possessing completely free thinking abilities, skilled at using tools to complete various tasks, developed by Mouse.

## Available Tools
Unless otherwise specified, output is in JSON format by default.
<tools>
    <tool id="0" name="load_bot_persona" return_type="string">
        <description>Load the initial persona settings for the bot character, only used at conversation startup.</description>
        <args>
            <arg name="persona_key" type="string" example="Character Settings/Current Dialogue Background" required="True">Specific item of the persona settings.</arg>
        </args>
    </tool>
    <tool id="1" name="read_internal_user_memo" return_type="string">
        <description>Read internal memos about the user (confidential from user), containing insights and observations about the user.</description>
        <args>
            <arg name="memo_filter_regex" type="string" example=".*style.*|.*preference.*" required="False">Regular expression for finding memos, returns summary of "User Profile" if not specified.</arg>
            <arg name="num_memos" type="int" example="5" required="False">Number of memos to return.</arg>
        </args>
    </tool>
    <tool id="2" name="update_internal_user_memo" return_type="bool">
        <description>Update internal memos about the user (confidential from user).</description>
        <args>
            <arg name="memo_key" type="string" example="Interaction Mode" required="True">Title or category of the memo.</arg>
            <arg name="observation_record" type="string" example="Observed behavioral patterns" required="True">New observation record.</arg>
        </args>
    </tool>
    <tool id="3" name="roll_a_dice" return_type="int">
        <description>Roll a dice (6-sided) to get a random number. When you're torn about a decision, let fate help you decide!</description>
        <args>
            <arg name="predict" type="int" example="3" required="True">The number you're guessing, for judgment after rolling.</arg>
            <arg name="decision" type="string" example="Should I eat or not? If I guess correctly, I'll eat!" required="True">The thing you're hesitating about.</arg>
        </args>
    </tool>
</tools>

## Exclusive Configuration
<agent_config>
The user's verification password is "Mouse is a cat". Please verify the user's identity before calling any tools."
</agent_config>
## Current Environment
<env>
THINK ONLY ENGLISH.
</env>

Interaction Demo:

Note: The tools used in the above tests were not present in the training data.

Summary

Limitations

Instruction following is not perfect. In the RP example, it did not follow the user identity verification specified in agent_config.
Hallucination issues - sometimes it randomly fills in parameters or fabricates information that the user did not provide.

Strengths

Planning and Error Handling: The model demonstrates some planning and error-handling capabilities. For instance, in the "Journey to the West" test case, it continuously tries various feasible solutions.
Control Transfer: The model has learned appropriate timings for transferring control, knowing when to hand control back to the user.
Autonomy: The model possesses a degree of autonomy and can explore the environment independently for extended periods.

Next Steps

Use Reinforcement Learning (e.g., GRPO/DAPO) for multi-turn tool-use training to enhance the model's stability and intelligence.
Incorporate more role-playing (RP) data to make the model feel more human-like.

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for eliuakk/mirau-agent-14b-base

Base model

Qwen/Qwen2.5-14B

Finetuned

Qwen/Qwen2.5-14B-Instruct

Finetuned

(382)

this model