diff --git "a/train_hr_agent.ipynb" "b/train_hr_agent.ipynb" new file mode 100644--- /dev/null +++ "b/train_hr_agent.ipynb" @@ -0,0 +1,9540 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "43653e89", + "metadata": {}, + "source": [ + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ravi03071991/rl_hack/blob/master/train_hr_agent.ipynb)\n" + ] + }, + { + "cell_type": "markdown", + "id": "145b49c5", + "metadata": {}, + "source": [ + "# Reinforcement Learning for HR Onboarding: Teaching an LLM to Automate Enterprise Workflows\n\nBuilt for the [OpenEnv Hackathon SF](https://cerebralvalley.ai/e/openenv-hackathon-sf/details) \u2014 **Statement 3.1: Professional Tasks** (Scaler AI Labs: Multi-App RL Environment for Enterprise Workflows)\n\nWe train an LLM to complete **HR onboarding and offboarding tasks** using **reinforcement learning (RL)**. The agent orchestrates across **6 enterprise apps** (Workday, ServiceNow, IAM, Email, Slack, Calendar) using 25 tools to complete multi-step workflows like:\n\n- Creating employee records and initiating onboarding\n- Assigning laptops, provisioning IT accounts, setting up access roles\n- Sending welcome emails, scheduling orientation meetings\n- Processing offboarding with asset reclaim and access revocation\n\n**Requirements:** Google Colab (or any GPU with 16GB+ VRAM).\n\n**Results:** GRPO training improves mean score from **0.370 \u2192 0.617 (+67%)**, with complex task scores more than doubling (0.26 \u2192 0.68). Generalizes to held-out test tasks." + ] + }, + { + "cell_type": "markdown", + "id": "b8fa27d8", + "metadata": {}, + "source": [ + "## What is the HR Onboarding Environment?\n", + "\n", + "This is an **OpenEnv-compatible RL environment** that simulates the HR department of a fictional company called **AcmeCorp**. 
It has:\n", + "\n", + "- **200 employees** across 8 departments with a full org hierarchy (L1-L6 levels)\n", + "- **25 tools** the agent can call (HR, IT, access control, communication, policy)\n", + "- **77 tasks** across 4 difficulty levels (simple, medium, complex, edge case)\n", + "- **Rubric-based rewards** \u2014 each task has verifiable criteria (did you call the right tool? with the right params? in the right order?)\n", + "\n", + "### Our Goal\n", + "\n", + "The agent receives a task instruction (e.g., \"Onboard Priya Sharma to Engineering as L2 Software Engineer\") and must generate a **sequence of JSON tool calls** to complete it. Each tool call is one step, and the agent has up to 15 steps per episode.\n", + "\n", + "Unlike the 2048 tutorial, where the model writes code, here the model **directly generates tool calls**, which is closer to how real enterprise agents work." + ] + }, + { + "cell_type": "markdown", + "id": "1af66501", + "metadata": {}, + "source": [ + "## Installation\n", + "\n", + "We need:\n", + "1. **[Unsloth](https://github.com/unslothai/unsloth)** \u2014 Memory-efficient LLM training (~70% less VRAM)\n", + "2. **[TRL](https://github.com/huggingface/trl)** \u2014 GRPO trainer for RL\n", + "3. 
**Our HR environment** \u2014 Cloned from GitHub" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7158f8db", + "metadata": {}, + "outputs": [], + "source": [ + "%%capture\n", + "import os, importlib.util\n", + "\n", + "if importlib.util.find_spec(\"torch\") is None or \"COLAB_\" in \"\".join(os.environ.keys()):\n", + "    try:\n", + "        import numpy\n", + "        get_numpy = f\"numpy=={numpy.__version__}\"\n", + "    except ImportError:\n", + "        get_numpy = \"numpy\"\n", + "    !pip install \\\n", + "        \"torch>=2.8.0\" \"triton>=3.4.0\" {get_numpy} torchvision bitsandbytes \"transformers==4.56.2\" trackio \\\n", + "        \"unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo\" \\\n", + "        \"unsloth[base] @ git+https://github.com/unslothai/unsloth\" \\\n", + "        git+https://github.com/triton-lang/triton.git@0add68262ab0a2e33b84524346cb27cbb2787356#subdirectory=python/triton_kernels\n", + "elif importlib.util.find_spec(\"unsloth\") is None:\n", + "    !pip install unsloth trackio\n", + "\n", + "!pip install --upgrade --no-deps transformers==4.56.2 tokenizers trl==0.22.2 unsloth unsloth_zoo" + ] + }, + { + "cell_type": "markdown", + "id": "f6089c6b", + "metadata": {}, + "source": [ + "Next, install the environment's dependencies, clone the HR environment repository, and add it to `sys.path`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "65eae940", + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install openenv-core datasets pydantic python-dotenv wandb accelerate\n", + "!git clone https://github.com/ravi03071991/rl_hack.git 2>/dev/null || (cd rl_hack && git pull)\n", + "\n", + "import sys\n", + "sys.path.insert(0, \"rl_hack\")\n", + "sys.path.insert(0, \"rl_hack/server\")" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "98fd3d4a-2f93-448b-a9fb-e101565ba159", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[34m\u001b[1mwandb\u001b[0m: [wandb.login()] Using explicit session credentials for https://api.wandb.ai.\n", + "\u001b[34m\u001b[1mwandb\u001b[0m: Appending key for api.wandb.ai to your netrc file: /home/jovyan/.netrc\n", + "\u001b[34m\u001b[1mwandb\u001b[0m: W&B API key is configured. Use \u001b[1m`wandb login --relogin`\u001b[0m to force relogin\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001b[34m\u001b[1mwandb\u001b[0m: [wandb.login()] Loaded credentials for https://api.wandb.ai from /home/jovyan/.netrc.\n", + "\u001b[34m\u001b[1mwandb\u001b[0m: Currently logged in as: \u001b[33mravi03071991\u001b[0m to \u001b[32mhttps://api.wandb.ai\u001b[0m. Use \u001b[1m`wandb login --relogin`\u001b[0m to force relogin\n" + ] + }, + { + "data": { + "text/html": [], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Tracking run with wandb version 0.25.0" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Run data is saved locally in /home/jovyan/rl_hack/wandb/run-20260308_175735-bgent3o3" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Syncing run comfy-cherry-23 to Weights & Biases (docs)
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View project at https://wandb.ai/ravi03071991/hr-agent-training" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View run at https://wandb.ai/ravi03071991/hr-agent-training/runs/bgent3o3" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Optional: Weights & Biases logging. Set WANDB_API_KEY in your environment; never hard-code API keys in a notebook.\n", + "import wandb\n", + "wandb.login()  # uses WANDB_API_KEY or saved credentials, otherwise prompts for a key\n", + "wandb.init(project=\"hr-agent-training\")" + ] + }, + { + "cell_type": "markdown", + "id": "825cb9ac", + "metadata": {}, + "source": [ + "## Loading the Model\n", + "\n", + "We load `unsloth/Llama-3.2-1B-Instruct` with memory optimizations so that it fits on a T4 GPU (the outputs below were recorded on an H100):\n", + "\n", + "| Parameter | Value | Description |\n", + "|-----------|-------|-------------|\n", + "| `max_seq_length` | 4096 | Longer context for multi-step tool calling |\n", + "| `load_in_4bit` | True | 4-bit quantization to reduce memory |\n", + "| `lora_rank` | 8 | LoRA adapter rank (balance of quality vs memory) |" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "2a664f72", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/tmp/ipykernel_85278/4225710256.py:1: UserWarning: WARNING: Unsloth should be imported before [transformers] to ensure all optimizations are applied. Your code may run slower or encounter memory issues without these optimizations.\n", + "\n", + "Please restructure your imports with 'import unsloth' at the top of your file.\n", + " from unsloth import FastLanguageModel\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\ud83e\udda5 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/opt/conda/lib/python3.13/site-packages/triton/runtime/autotuner.py:101: DeprecationWarning: warmup, rep, and use_cuda_graph parameters are deprecated. 
See https://github.com/triton-lang/triton/pull/4496 for details.\n", + " warnings.warn((\"warmup, rep, and use_cuda_graph parameters are deprecated. See \"\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\ud83e\udda5 Unsloth Zoo will now patch everything to make training faster!\n", + "==((====))== Unsloth 2026.3.3: Fast Llama patching. Transformers: 4.56.2. vLLM: 0.17.0.\n", + " \\\\ /| NVIDIA H100 80GB HBM3. Num GPUs = 1. Max memory: 79.179 GB. Platform: Linux.\n", + "O^O/ \\_/ \\ Torch: 2.10.0+cu128. CUDA: 9.0. CUDA Toolkit: 12.8. Triton: 3.6.0\n", + "\\ / Bfloat16 = TRUE. FA [Xformers = 0.0.35. FA2 = False]\n", + " \"-____-\" Free license: http://github.com/unslothai/unsloth\n", + "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/opt/conda/lib/python3.13/multiprocessing/popen_fork.py:67: DeprecationWarning: This process (pid=85278) is multi-threaded, use of fork() may lead to deadlocks in the child.\n", + " self.pid = os.fork()\n" + ] + } + ], + "source": [ + "from unsloth import FastLanguageModel\n", + "import torch\n", + "\n", + "max_seq_length = 4096 # Longer context for multi-turn tool calling\n", + "lora_rank = 8\n", + "# model_name=\"Qwen/Qwen2.5-7B-Instruct\",\n", + "model, tokenizer = FastLanguageModel.from_pretrained(\n", + " model_name=\"unsloth/Llama-3.2-1B-Instruct\",\n", + " load_in_4bit=True,\n", + " max_seq_length=max_seq_length,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "a3b40d35", + "metadata": {}, + "source": [ + "### Applying LoRA for Efficient Training\n", + "\n", + "[LoRA (Low-Rank Adaptation)](https://hf.co/papers/2106.09685) adds small trainable adapters (~1-5% of parameters) instead of updating all weights. 
We target the attention and feedforward layers:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "32a32e57", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Unsloth 2026.3.3 patched 16 layers with 16 QKV layers, 16 O layers and 16 MLP layers.\n" + ] + } + ], + "source": [ + "model = FastLanguageModel.get_peft_model(\n", + " model,\n", + " r=lora_rank,\n", + " target_modules=[\n", + " \"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n", + " \"gate_proj\", \"up_proj\", \"down_proj\",\n", + " ],\n", + " lora_alpha=lora_rank * 2,\n", + " use_gradient_checkpointing=\"unsloth\",\n", + " random_state=3407,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "119041bd", + "metadata": {}, + "source": [ + "## Setting Up the HR Environment\n", + "\n", + "Our environment runs **locally** (no remote server needed). It manages 500+ entities and 25 tools. Let's set it up and see what a task looks like:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "256e64ca", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total tasks: 77\n", + "Total tools: 25\n", + "\n", + "Sample task: task_0001\n", + "Difficulty: simple\n", + "Category: lookup\n", + "Instruction: Look up the employee record for Jennifer Davis (ID: emp_0016).\n", + "Available tools: 25\n" + ] + } + ], + "source": [ + "import json\n", + "import re\n", + "\n", + "from server.hr_onboarding_environment import HROnboardingEnvironment\n", + "from models import HROnboardingAction, HROnboardingObservation\n", + "from server.tools import TOOL_DEFINITIONS\n", + "from server.rubrics import RubricEvaluator\n", + "\n", + "# Create the environment\n", + "env = HROnboardingEnvironment(seed=42, max_steps=15)\n", + "\n", + "print(f\"Total tasks: {len(env._tasks)}\")\n", + "print(f\"Total tools: {len(TOOL_DEFINITIONS)}\")\n", + "\n", + "# Show a sample task\n", + "obs = env.reset()\n", + 
"print(f\"\\nSample task: {obs.task_id}\")\n", + "print(f\"Difficulty: {obs.metadata.get('difficulty')}\")\n", + "print(f\"Category: {obs.metadata.get('category')}\")\n", + "print(f\"Instruction: {obs.instruction}\")\n", + "print(f\"Available tools: {len(obs.available_tools)}\")" + ] + }, + { + "cell_type": "markdown", + "id": "6ef2f4e5", + "metadata": {}, + "source": [ + "Let's try calling a tool manually to see how the environment works:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "c3af4b42", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tool result:\n", + "{\n", + " \"success\": true,\n", + " \"employee\": {\n", + " \"emp_id\": \"emp_0001\",\n", + " \"name\": \"Rajesh Kumar\",\n", + " \"email\": \"rajesh.kumar@acmecorp.com\",\n", + " \"department\": \"Engineering\",\n", + " \"level\": \"L6\",\n", + " \"role\": \"VP of Engineering\",\n", + " \"manager_id\": null,\n", + " \"status\": \"active\",\n", + " \"date_of_joining\": \"2018-03-15\",\n", + " \"date_of_leaving\": null,\n", + " \"is_contractor\": false,\n", + " \"phone\": \"+1-415-332-7891\",\n", + " \"location\": \"San Francisco\"\n", + " }\n", + "}\n" + ] + } + ], + "source": [ + "# Call a tool\n", + "action = HROnboardingAction(\n", + " tool_name=\"hr_read_employee\",\n", + " arguments={\"emp_id\": \"emp_0001\"}\n", + ")\n", + "obs = env.step(action)\n", + "print(\"Tool result:\")\n", + "print(json.dumps(obs.tool_result, indent=2)[:500])" + ] + }, + { + "cell_type": "markdown", + "id": "66777ee6", + "metadata": {}, + "source": [ + "## Prompt Design\n", + "\n", + "The prompt tells the model what to generate. 
Unlike the 2048 tutorial (which generates Python code), here the model generates **JSON tool calls** directly:\n", + "\n", + "```json\n", + "{\"tool\": \"hr_create_employee\", \"params\": {\"name\": \"Priya Sharma\", \"department\": \"Engineering\", \"level\": \"L2\", \"role\": \"Software Engineer\"}}\n", + "```\n", + "\n", + "The model gets the task instruction + tool definitions, and must output a sequence of tool calls." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "c5d3bea3", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "System prompt length: 2957 chars\n", + "You are an HR automation agent for AcmeCorp. Complete tasks by calling tools.\n", + "\n", + "Respond with ONLY a JSON tool call per step:\n", + "{\"tool\": \"\", \"params\": {}}\n", + "\n", + "When done: {\"tool\": \"__done__\", \"params\": {}}\n", + "\n", + "Rules:\n", + "- ONLY output JSON, no explanation\n", + "- Create employee records before onboarding\n", + "- Check asset availability before assigning\n", + "\n", + "Tools:\n", + "- hr_create_employee: Create a new employee record in the HR system. Params: name, department, level, role, manager_id, is_contractor, location, date_of_joining\n", + "- hr_read_employee: Look up an employee by their employee ID or email address. Params: emp_id, email\n", + "- hr_update_employee: Update fields on an existing employee record. Params: emp_id, updates\n", + "- hr_search_employees: Search employees by criteria. Params: department, level, status, location, role, name\n", + "- hr_get_org_chart: Get the organizational hierarchy/reporting structure for a department. Params: department\n", + "- onboarding_create_request: Initiate an onboarding request for a new hire. Params: employee_id\n", + "- onboarding_get_status: Check the status of an onboarding request. Params: request_id, employee_id\n", + "- onboarding_complete_step: Mark an onboarding step as completed. 
Params: request_id, step\n", + "- offboarding_create_request: Initiate an offboarding request for a departing employee. Params: employee_id, reason, exit_date\n", + "- offboarding_get_status: Check the status of an offboarding request. Params: request_id, employee_id\n", + "- offboarding_complete_step: Mark an offboarding step as completed. Params: request_id, step\n", + "- it_assign_asset: Assign an IT asset (laptop, monitor, phone, headset) to an employee. Params: asset_id, employee_id\n", + "- it_get_available_assets: List available (unassigned) IT assets. Params: asset_type\n", + "- it_create_account: Create IT accounts (email, Slack, VPN, etc. Params: employee_id, account_types\n", + "- it_revoke_access: Revoke all IT system access for an employee (used during offboarding). Params: employee_id\n", + "- it_get_software_licenses: Check software license availability. Params: software_name\n", + "- access_assign_role: Assign an access role to an employee. Params: employee_id, role_id\n", + "- access_create_badge: Create a physical access badge for an employee. Params: employee_id, access_zones\n", + "- access_revoke_role: Revoke a specific access role from an employee. Params: employee_id, role_id\n", + "- access_get_security_groups: List all security groups. Params: \n", + "- email_send: Send an email. Params: from_address, to_address, subject, body\n", + "- slack_send_message: Post a message in a Slack channel or send a DM. Params: channel, sender, text\n", + "- meeting_schedule: Schedule a meeting (orientation, 1-on-1, exit interview, etc. Params: title, attendees, datetime, meeting_type\n", + "- policy_lookup: Look up company policies by topic or department. Params: topic, department, policy_id\n", + "- approval_request: Submit an approval request (manager approval, IT approval, security approval). 
Params: request_id, approver_id, approval_type\n" + ] + } + ], + "source": [ + "# Build compact tool descriptions (just name + one-liner)\n", + "tool_summary = \"\\n\".join(\n", + " f\"- {t['name']}: {t['description'].split('.')[0]}. Params: {', '.join(t.get('parameters', {}).get('properties', {}).keys())}\"\n", + " for t in TOOL_DEFINITIONS\n", + ")\n", + "\n", + "SYSTEM_PROMPT = (\n", + " \"You are an HR automation agent for AcmeCorp. Complete tasks by calling tools.\\n\\n\"\n", + " \"Respond with ONLY a JSON tool call per step:\\n\"\n", + " '{\"tool\": \"\", \"params\": {}}\\n\\n'\n", + " 'When done: {\"tool\": \"__done__\", \"params\": {}}\\n\\n'\n", + " \"Rules:\\n\"\n", + " \"- ONLY output JSON, no explanation\\n\"\n", + " \"- Create employee records before onboarding\\n\"\n", + " \"- Check asset availability before assigning\\n\\n\"\n", + " f\"Tools:\\n{tool_summary}\"\n", + ")\n", + "\n", + "print(f\"System prompt length: {len(SYSTEM_PROMPT)} chars\")\n", + "print(SYSTEM_PROMPT)" + ] + }, + { + "cell_type": "markdown", + "id": "a2ca3898", + "metadata": {}, + "source": [ + "## Building the Training Dataset\n", + "\n", + "We split all 77 tasks into **train (70%)** and **test (30%)** sets, stratified by difficulty. 
The model trains on all train tasks \u2014 simple/medium act as anchor points (stable high reward) while complex/edge_case provide the actual learning signal.\n", + "\n", + "| Split | Count | Purpose |\n", + "|-------|-------|---------|\n", + "| Train | 52 tasks (all difficulties) | Model trains on these via GRPO |\n", + "| Test | 25 tasks (all difficulties) | Held-out generalization test |" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "586ce960", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total tasks: 77\n", + "Train tasks: 52\n", + "Test tasks: 25 (held-out, never seen during training)\n", + "\n", + " simple : 13 train, 6 test\n", + " medium : 14 train, 7 test\n", + " complex : 17 train, 8 test\n", + " edge_case : 8 train, 4 test\n", + "\n", + "GRPO training dataset: 52 tasks\n", + " task_0035 [simple ] Check the offboarding status for Thomas White (emp_0035...\n", + " task_0014 [simple ] Check if there are available laptops and Jira licenses ...\n", + " task_0005 [simple ] List all employees in the Engineering department....\n", + " task_0010 [simple ] List all security groups and their accessible resources...\n", + " task_0006 [simple ] Show me the organizational chart for the Finance depart...\n", + " task_0036 [simple ] Check the offboarding status for Min Hu (emp_0113)....\n", + " task_0038 [simple ] Check the offboarding status for Tao Chen (emp_0020)....\n", + " task_0007 [simple ] What laptops are currently available for assignment?...\n", + " task_0013 [simple ] Check the onboarding status for employee Hui Zhou (emp_...\n", + " task_0037 [simple ] Check the offboarding status for Shan Lin (emp_0142)....\n", + " task_0011 [simple ] Check the onboarding status for employee Rohan Patel (e...\n", + " task_0002 [simple ] Look up the employee record for Ingrid Larsson (ID: emp...\n", + " task_0012 [simple ] Check the onboarding status for employee Astrid Koch (e...\n", + " task_0046 
[medium ] Initiate offboarding for Brian Jones (emp_0075) who tak...\n", + " task_0023 [medium ] Onboard new hire Li Wei to Engineering as L3 Senior Eng...\n", + " task_0018 [medium ] Onboard new hire James Wilson to Data Science as L2 Dat...\n", + " task_0041 [medium ] Initiate offboarding for Hao Sun (emp_0121) who leaving...\n", + " task_0040 [medium ] Initiate offboarding for Kavya Desai (emp_0034) who res...\n", + " task_0045 [medium ] Initiate offboarding for Susan Davis (emp_0091) who ret...\n", + " task_0016 [medium ] Onboard new hire Alex Chen to Product as L2 Product Ana...\n", + " task_0073 [medium ] The Engineering team is onboarding 2 new hires at the s...\n", + " task_0020 [medium ] Onboard new hire Tom Nguyen to Finance as L2 Financial ...\n", + " task_0074 [medium ] The Product team is onboarding 2 new hires at the same ...\n", + " task_0017 [medium ] Onboard new hire Maria Garcia to Marketing as L1 Market...\n", + " task_0015 [medium ] Onboard new hire Priya Sharma to Engineering as L2 Soft...\n", + " task_0042 [medium ] Initiate offboarding for Pierre Laurent (emp_0153) who ...\n", + " task_0019 [medium ] Onboard new hire Aisha Patel to Sales as L1 Sales Repre...\n", + " task_0053 [complex ] Process the complete offboarding for Marta Wagner (emp_...\n", + " task_0051 [complex ] Fully offboard Sergio Ferrari (emp_0198), a L3 Security...\n", + " task_0031 [complex ] Onboard Nina Petrova as L4 Director of Platform in Engi...\n", + " task_0027 [complex ] Fully onboard Carlos Mendez as L3 Senior Security Engin...\n", + " task_0032 [complex ] Onboard Hassan Ahmed as L3 Lead Data Scientist in Data ...\n", + " task_0072 [complex ] Rehire Marie Dubois (emp_0064) who was previously offbo...\n", + " task_0025 [complex ] Fully onboard John Lee as L3 Team Lead - ML in Data Sci...\n", + " task_0068 [complex ] Patricia Brown (emp_0172) is transferring from Engineer...\n", + " task_0054 [complex ] Process the complete offboarding for Jun Zheng (emp_006...\n", + " 
task_0030 [complex ] Onboard Sanjay Gupta as L2 Security Analyst in Security...\n", + " task_0034 [complex ] Onboard Kevin O'Brien as L4 VP of Product in Product. C...\n", + " task_0048 [complex ] Fully offboard Henrik Becker (emp_0069), a L4 Head of E...\n", + " task_0029 [complex ] Fully onboard Raj Kapoor as L2 Backend Developer in Eng...\n", + " task_0070 [complex ] Robert Garcia (emp_0133) is transferring from Data Scie...\n", + " task_0071 [complex ] Rehire Feng Yang (emp_0104) who was previously offboard...\n", + " task_0050 [complex ] Fully offboard Lei Huang (emp_0032), a L4 Group Product...\n", + " task_0077 [complex ] Manager Ananya Reddy (emp_0007) in Engineering is leavi...\n", + " task_0056 [edge_case ] Onboard a new L1 Associate to the Marketing department....\n", + " task_0059 [edge_case ] Check if there are available LinkedIn Sales Navigator l...\n", + " task_0065 [edge_case ] Assign the security_admin access role to a new L1 Secur...\n", + " task_0066 [edge_case ] A Marketing employee needs access to the Engineering Gi...\n", + " task_0064 [edge_case ] Jennifer Davis (emp_0016) is being terminated effective...\n", + " task_0058 [edge_case ] Assign a Netsuite license to a new Finance hire. 
Check ...\n", + " task_0067 [edge_case ] Before onboarding a new Security team member, look up t...\n", + " task_0061 [edge_case ] Onboard contractor Amit Verma to Engineering as an L2 C...\n", + "\n", + "Max prompt token length: 794\n" + ] + } + ], + "source": [ + "import random\n", + "from datasets import Dataset\n", + "\n", + "# Build prompts from all tasks using direct _task_idx access\n", + "all_prompts = []\n", + "train_env = HROnboardingEnvironment(seed=42, max_steps=15)\n", + "\n", + "for i in range(len(train_env._tasks)):\n", + " train_env._task_idx = i\n", + " obs = train_env.reset()\n", + "\n", + " all_prompts.append({\n", + " \"prompt\": [\n", + " {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n", + " {\"role\": \"user\", \"content\": obs.instruction},\n", + " ],\n", + " \"task_idx\": i,\n", + " \"task_id\": obs.task_id,\n", + " \"difficulty\": obs.metadata.get(\"difficulty\", \"\"),\n", + " \"category\": obs.metadata.get(\"category\", \"\"),\n", + " })\n", + "\n", + "# Build instruction -> task_idx lookup (used by rubric_reward)\n", + "instruction_to_task_idx = {\n", + " p[\"prompt\"][1][\"content\"]: p[\"task_idx\"] for p in all_prompts\n", + "}\n", + "\n", + "# ============================================================\n", + "# TRAIN / TEST SPLIT (stratified by difficulty)\n", + "# 70% train, 30% test \u2014 no data leakage\n", + "# ============================================================\n", + "random.seed(42)\n", + "\n", + "train_prompts = []\n", + "test_prompts = []\n", + "\n", + "for diff in [\"simple\", \"medium\", \"complex\", \"edge_case\"]:\n", + " subset = [p for p in all_prompts if p[\"difficulty\"] == diff]\n", + " random.shuffle(subset)\n", + " split_idx = max(1, int(len(subset) * 0.7)) # 70% train\n", + " train_prompts.extend(subset[:split_idx])\n", + " test_prompts.extend(subset[split_idx:])\n", + "\n", + "print(f\"Total tasks: {len(all_prompts)}\")\n", + "print(f\"Train tasks: {len(train_prompts)}\")\n", + "print(f\"Test 
tasks: {len(test_prompts)} (held-out, never seen during training)\")\n", + "print()\n", + "\n", + "# Show split by difficulty\n", + "for diff in [\"simple\", \"medium\", \"complex\", \"edge_case\"]:\n", + " tr = [p for p in train_prompts if p[\"difficulty\"] == diff]\n", + " te = [p for p in test_prompts if p[\"difficulty\"] == diff]\n", + " print(f\" {diff:10s}: {len(tr)} train, {len(te)} test\")\n", + "\n", + "# ============================================================\n", + "# TRAINING DATASET: all train tasks\n", + "# Simple/medium provide stable high reward (anchor points)\n", + "# Complex/edge provide learning signal (gradient)\n", + "# ============================================================\n", + "dataset = Dataset.from_list(train_prompts)\n", + "\n", + "print(f\"\\nGRPO training dataset: {len(dataset)} tasks\")\n", + "for p in train_prompts:\n", + " print(f\" {p['task_id']:12s} [{p['difficulty']:10s}] {p['prompt'][1]['content'][:55]}...\")\n", + "\n", + "maximum_length = max(\n", + " len(tokenizer.apply_chat_template(p[\"prompt\"], add_generation_prompt=True))\n", + " for p in all_prompts\n", + ")\n", + "print(f\"\\nMax prompt token length: {maximum_length}\")" + ] + }, + { + "cell_type": "markdown", + "id": "777cd97e", + "metadata": {}, + "source": [ + "Let's see what the **base model** (before RL training) generates:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "3f4a9c99", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Task: task_0023 \u2014 Onboard new hire Li Wei to Engineering as L3 Senior Engineer. 
Create their employee record and initiate the onboarding request.\n", + "\n", + "{\"tool\": \"hr_create_employee\", \"params\": {\"name\": \"Li Wei\", \"department\": \"Engineering\", \"level\": \"L3\", \"role\": \"Senior Engineer\", \"manager_id\": \"John Smith\", \"is_contractor\": \"false\", \"location\": \"New York\", \"date_of_joining\": \"2023-01-01\"}}, {\"tool\": \"hr_read_employee\", \"params\": {\"emp_id\": \"Li Wei\", \"email\": \"liwei@acmecorp.com\"}}, {\"tool\": \"onboarding_create_request\", \"params\": {\"request_id\": \"1\", \"employee_id\": \"Li Wei\", \"status\": \"pending\"}}<|eot_id|>\n" + ] + } + ], + "source": [ + "# Test base model on a medium task\n", + "test_prompt = train_prompts[14][\"prompt\"]  # Medium onboarding task\n", + "print(f\"Task: {train_prompts[14]['task_id']} \u2014 {test_prompt[1]['content']}\\n\")\n", + "\n", + "text = tokenizer.apply_chat_template(\n", + "    test_prompt,\n", + "    tokenize=False,\n", + "    add_generation_prompt=True,\n", + ")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "# The chat template already inserts <|begin_of_text|>, so don't let the tokenizer add a second BOS token\n", + "inputs = tokenizer(text, return_tensors=\"pt\", add_special_tokens=False).to(\"cuda\")\n", + "_ = model.generate(\n", + "    **inputs,\n", + "    temperature=0.1,\n", + "    max_new_tokens=1024,\n", + "    streamer=TextStreamer(tokenizer, skip_prompt=True),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "1c08e0ff", + "metadata": {}, + "source": [ + "## Designing Reward Functions\n", + "\n", + "We need reward functions that evaluate the model's generated tool calls. Unlike the 2048 tutorial, which used code sandboxing, here we:\n", + "\n", + "1. **Parse** the model's output into JSON tool calls\n", + "2. **Replay** them against the HR environment\n", + "3. **Evaluate** using the task's rubric criteria\n", + "\n", + "| Reward Function | Purpose | Score Range |\n", + "|-----------------|---------|-------------|\n", + "| `valid_json_reward` | Are the generated tool calls valid JSON? 
| -2.0 to +1.0 |\n", + "| `rubric_reward` | Does the sequence satisfy the task's rubric criteria? | -1.0 to +5.0 |\n", + "| `efficiency_reward` | Was the task completed without wasting steps? | -1.0 to +1.0 |" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "6f8671b5", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Test replay (task_idx=14):\n", + "Score: 100% (7/7)\n", + " [PASS] created_employee: Created employee record\n", + " [PASS] correct_name: Used correct name\n", + " [PASS] correct_dept: Assigned to correct department\n", + " [PASS] correct_level: Set correct level\n", + " [PASS] correct_role: Set correct role\n", + " [PASS] initiated_onboarding: Created onboarding request\n", + " [PASS] sequencing: Created employee before onboarding request\n" + ] + } + ], + "source": [ + "def extract_tool_calls(text):\n", + "    \"\"\"Extract JSON tool calls from model output.\n", + "\n", + "    Uses json.JSONDecoder.raw_decode instead of a one-level regex, so calls whose params\n", + "    contain nested objects (e.g. the `updates` dict of hr_update_employee) are not dropped.\n", + "    \"\"\"\n", + "    decoder = json.JSONDecoder()\n", + "    calls = []\n", + "    start = text.find(\"{\")\n", + "    while start != -1:\n", + "        try:\n", + "            obj, end = decoder.raw_decode(text, start)\n", + "        except json.JSONDecodeError:\n", + "            start = text.find(\"{\", start + 1)  # not valid JSON here; try the next brace\n", + "            continue\n", + "        if isinstance(obj, dict) and \"tool\" in obj:\n", + "            calls.append(obj)\n", + "            start = text.find(\"{\", end)  # skip past the whole call, including its params\n", + "        else:\n", + "            start = text.find(\"{\", start + 1)  # look for tool calls nested inside this object\n", + "    return calls\n", + "\n", + "\n", + "def replay_tool_calls(task_idx, tool_calls):\n", + "    \"\"\"Replay tool calls against a fresh environment and return evaluation.\"\"\"\n", + "    replay_env = HROnboardingEnvironment(seed=42, max_steps=15)\n", + "    # Go directly to the task by setting _task_idx\n", + "    replay_env._task_idx = task_idx\n", + "    replay_env.reset()\n", + "\n", + "    task = replay_env._current_task\n", + "\n", + "    steps = 0\n", + "    for tc in tool_calls:\n", + "        tool_name = tc.get(\"tool\", \"\")\n", + "        params = tc.get(\"params\", {})\n", + "        if tool_name == \"__done__\":\n", + "            break\n", + "        if steps >= 15:\n", + "            break\n", + "        if not isinstance(tool_name, str) or not isinstance(params, dict):\n", + "            continue  # malformed call (e.g. params given as a string): skip it rather than pass non-dict arguments\n", + "        action = HROnboardingAction(tool_name=tool_name, arguments=params)\n", + "        replay_env.step(action)\n", + "        steps += 1\n", + 
" evaluator = RubricEvaluator()\n", + " eval_result = evaluator.evaluate(task, replay_env.world.action_log)\n", + " return eval_result, steps\n", + "\n", + "\n", + "# Test it\n", + "test_calls = [\n", + " {\"tool\": \"hr_create_employee\", \"params\": {\"name\": \"Priya Sharma\", \"department\": \"Engineering\", \"level\": \"L2\", \"role\": \"Software Engineer\"}},\n", + " {\"tool\": \"onboarding_create_request\", \"params\": {\"employee_id\": \"emp_0201\"}},\n", + " {\"tool\": \"__done__\", \"params\": {}},\n", + "]\n", + "eval_result, steps = replay_tool_calls(14, test_calls)\n", + "print(f\"Test replay (task_idx=14):\")\n", + "print(f\"Score: {eval_result['score']:.0%} ({eval_result['passed_count']}/{eval_result['total_criteria']})\")\n", + "for c in eval_result[\"criteria_results\"]:\n", + " print(f\" [{'PASS' if c['passed'] else 'FAIL'}] {c['name']}: {c['description']}\")" + ] + }, + { + "cell_type": "markdown", + "id": "77d6e428", + "metadata": {}, + "source": [ + "Now the actual reward functions that GRPO will call:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "f704ee2c", + "metadata": {}, + "outputs": [], + "source": [ + "global PRINTER\n", + "PRINTER = 0\n", + "\n", + "\n", + "def valid_json_reward(completions, **kwargs):\n", + " \"\"\"Reward for generating valid JSON tool calls.\"\"\"\n", + " scores = []\n", + " for completion in completions:\n", + " response = completion[0][\"content\"]\n", + " calls = extract_tool_calls(response)\n", + " if len(calls) == 0:\n", + " scores.append(-2.0)\n", + " elif any(c.get(\"tool\") in [t[\"name\"] for t in TOOL_DEFINITIONS] or c.get(\"tool\") == \"__done__\" for c in calls):\n", + " scores.append(1.0)\n", + " else:\n", + " scores.append(-0.5)\n", + " return scores\n", + "\n", + "\n", + "def get_instruction_from_prompts(prompts, idx):\n", + " \"\"\"Safely extract instruction from prompts, handling various TRL formats.\"\"\"\n", + " try:\n", + " if isinstance(prompts[idx], list):\n", + " return 
prompts[idx][1][\"content\"]\n", + " if isinstance(prompts[idx], dict):\n", + " return prompts[idx].get(\"content\", \"\")\n", + " except (IndexError, KeyError, TypeError):\n", + " pass\n", + " try:\n", + " if isinstance(prompts[0], list):\n", + " return prompts[0][1][\"content\"]\n", + " if isinstance(prompts[0], dict):\n", + " return prompts[0].get(\"content\", \"\")\n", + " except (IndexError, KeyError, TypeError):\n", + " pass\n", + " try:\n", + " for msg in prompts:\n", + " if isinstance(msg, dict) and msg.get(\"role\") == \"user\":\n", + " return msg[\"content\"]\n", + " except (TypeError, KeyError):\n", + " pass\n", + " return \"\"\n", + "\n", + "\n", + "def rubric_reward(completions, **kwargs):\n", + " \"\"\"Main reward: replay tool calls and evaluate against rubric.\"\"\"\n", + " global PRINTER\n", + " prompts = kwargs.get(\"prompts\", kwargs.get(\"prompt\", []))\n", + " scores = []\n", + "\n", + " first_instruction = get_instruction_from_prompts(prompts, 0)\n", + "\n", + " for i, completion in enumerate(completions):\n", + " response = completion[0][\"content\"]\n", + " calls = extract_tool_calls(response)\n", + "\n", + " if len(calls) == 0:\n", + " scores.append(-1.0)\n", + " continue\n", + "\n", + " instruction = get_instruction_from_prompts(prompts, i) or first_instruction\n", + "\n", + " task_idx = instruction_to_task_idx.get(instruction)\n", + " if task_idx is None:\n", + " if PRINTER % 20 == 0:\n", + " print(f\"WARNING: No task match for: {instruction[:60]}...\")\n", + " scores.append(-1.0)\n", + " continue\n", + "\n", + " try:\n", + " eval_result, steps = replay_tool_calls(task_idx, calls)\n", + " score = eval_result[\"score\"]\n", + " reward = score * 6.0 - 1.0\n", + " if eval_result[\"passed\"]:\n", + " reward += 2.0\n", + "\n", + " if PRINTER % 10 == 0:\n", + " task_info = next((p for p in all_prompts if p[\"task_idx\"] == task_idx), None)\n", + " tid = task_info[\"task_id\"] if task_info else f\"idx_{task_idx}\"\n", + " diff = 
task_info[\"difficulty\"] if task_info else \"?\"\n", + " print(f\"\\n--- [{tid}] [{diff}] ---\")\n", + " print(f\"Instruction: {instruction[:80]}...\")\n", + " print(f\"Tool calls: {[c['tool'] for c in calls]}\")\n", + " print(f\"Rubric: {eval_result['score']:.0%} ({eval_result['passed_count']}/{eval_result['total_criteria']})\")\n", + " print(f\"Reward: {reward:.2f}\")\n", + " PRINTER += 1\n", + " scores.append(reward)\n", + " except Exception as e:\n", + " print(f\"Error replaying: {e}\")\n", + " scores.append(-1.0)\n", + "\n", + " return scores\n", + "\n", + "\n", + "def efficiency_reward(completions, **kwargs):\n", + " \"\"\"Reward for completing tasks efficiently (fewer steps = better).\"\"\"\n", + " scores = []\n", + " for completion in completions:\n", + " response = completion[0][\"content\"]\n", + " calls = extract_tool_calls(response)\n", + " actual_calls = [c for c in calls if c.get(\"tool\") != \"__done__\"]\n", + " n = len(actual_calls)\n", + "\n", + " if n == 0:\n", + " scores.append(-1.0)\n", + " elif n <= 3:\n", + " scores.append(1.0)\n", + " elif n <= 6:\n", + " scores.append(0.5)\n", + " elif n <= 10:\n", + " scores.append(0.0)\n", + " else:\n", + " scores.append(-0.5)\n", + " return scores" + ] + }, + { + "cell_type": "markdown", + "id": "c293dfaa", + "metadata": {}, + "source": [ + "## Baseline Evaluation\n", + "\n", + "Before training, we evaluate the base model on **both** the train and test (held-out) sets to establish baselines:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "bb5bc616", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "==================================================\n", + "BASELINE \u2014 TRAIN SET\n", + "==================================================\n", + " [FAIL] task_0035 [simple ] score=0% tools=['hr_get_offboarding_status']\n", + " [X ] correct_tool: Used offboarding_get_status\n", + " [X ] correct_emp: Checked correct 
employee\n", + " [FAIL] task_0014 [simple ] score=50% tools=['it_get_available_assets']\n", + " [OK] checked_assets: Checked available assets\n", + " [X ] checked_licenses: Checked software licenses\n", + " [FAIL] task_0005 [simple ] score=0% tools=['hr_read_employee']\n", + " [X ] correct_tool: Used hr_search_employees\n", + " [X ] correct_dept: Filtered by correct department\n", + " [FAIL] task_0010 [simple ] score=0% tools=['it_get_software_licenses']\n", + " [X ] correct_tool: Used access_get_security_groups\n", + " [PASS] task_0006 [simple ] score=100% tools=['hr_get_org_chart']\n", + " [OK] correct_tool: Used hr_get_org_chart\n", + " [OK] correct_dept: Passed correct department\n", + " [FAIL] task_0036 [simple ] score=0% tools=['hr_get_offboarding_status']\n", + " [X ] correct_tool: Used offboarding_get_status\n", + " [X ] correct_emp: Checked correct employee\n", + " [FAIL] task_0038 [simple ] score=0% tools=['hr_get_offboarding_status']\n", + " [X ] correct_tool: Used offboarding_get_status\n", + " [X ] correct_emp: Checked correct employee\n", + " [PASS] task_0007 [simple ] score=100% tools=['it_get_available_assets']\n", + " [OK] correct_tool: Used it_get_available_assets\n", + " [OK] correct_type: Filtered by laptop type\n", + " [FAIL] task_0013 [simple ] score=0% tools=['hr_get_org_chart']\n", + " [X ] correct_tool: Used onboarding_get_status\n", + " [X ] correct_emp: Checked correct employee\n", + " [FAIL] task_0037 [simple ] score=0% tools=['hr_get_offboarding_status']\n", + " [X ] correct_tool: Used offboarding_get_status\n", + " [X ] correct_emp: Checked correct employee\n", + " [FAIL] task_0011 [simple ] score=0% tools=['hr_get_org_chart']\n", + " [X ] correct_tool: Used onboarding_get_status\n", + " [X ] correct_emp: Checked correct employee\n", + " [FAIL] task_0002 [simple ] score=50% tools=['hr_read_employee']\n", + " [OK] correct_tool: Used hr_read_employee\n", + " [X ] correct_id: Passed correct emp_id\n", + " [FAIL] task_0012 [simple ] 
score=0% tools=['hr_get_org_chart']\n", + " [X ] correct_tool: Used onboarding_get_status\n", + " [X ] correct_emp: Checked correct employee\n", + " [FAIL] task_0046 [medium ] score=60% tools=['offboarding_create_request', 'it_revoke_access']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] correct_emp: Used correct employee ID\n", + " [OK] correct_reason: Set correct reason\n", + " [OK] revoked_access: Revoked IT access\n", + " [X ] notified: Sent notification\n", + " [PASS] task_0023 [medium ] score=100% tools=['hr_create_employee', 'hr_read_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [PASS] task_0018 [medium ] score=100% tools=['hr_create_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [FAIL] task_0041 [medium ] score=60% tools=['offboarding_create_request', 'it_revoke_access']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] correct_emp: Used correct employee ID\n", + " [OK] correct_reason: Set correct reason\n", + " [OK] revoked_access: Revoked IT access\n", + " [X ] notified: Sent notification\n", + " [FAIL] task_0040 [medium ] score=40% tools=['offboarding_create_request']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] correct_emp: Used correct 
employee ID\n", + " [OK] correct_reason: Set correct reason\n", + " [X ] revoked_access: Revoked IT access\n", + " [X ] notified: Sent notification\n", + " [FAIL] task_0045 [medium ] score=40% tools=['offboarding_create_request']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] correct_emp: Used correct employee ID\n", + " [OK] correct_reason: Set correct reason\n", + " [X ] revoked_access: Revoked IT access\n", + " [X ] notified: Sent notification\n", + " [PASS] task_0016 [medium ] score=100% tools=['hr_create_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [FAIL] task_0073 [medium ] score=50% tools=['it_get_available_assets', 'it_get_available_assets', 'it_get_available_assets']\n", + " [OK] checked_assets: Checked available assets\n", + " [X ] checked_licenses: Checked software licenses\n", + " [PASS] task_0020 [medium ] score=100% tools=['hr_create_employee', 'hr_read_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [FAIL] task_0074 [medium ] score=50% tools=['it_get_available_assets', 'it_get_available_assets', 'it_get_available_assets']\n", + " [OK] checked_assets: Checked available assets\n", + " [X ] checked_licenses: Checked software licenses\n", + " [FAIL] task_0017 [medium ] score=71% 
tools=['hr_create_employee']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [X ] initiated_onboarding: Created onboarding request\n", + " [X ] sequencing: Created employee before onboarding request\n", + " [PASS] task_0015 [medium ] score=100% tools=['hr_create_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [FAIL] task_0042 [medium ] score=40% tools=['offboarding_create_request']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] correct_emp: Used correct employee ID\n", + " [OK] correct_reason: Set correct reason\n", + " [X ] revoked_access: Revoked IT access\n", + " [X ] notified: Sent notification\n", + " [PASS] task_0019 [medium ] score=100% tools=['hr_create_employee', 'hr_read_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [FAIL] task_0053 [complex ] score=0% tools=['hr_get_offboarding_request', 'onboarding_get_status', 'onboarding_complete_step', 'hr_get_org_chart', 'hr_get_org_chart', 'hr_get_org_chart', 'hr_get_org_chart', 'hr_get_org_chart', 'hr_get_org_chart', 'hr_get_org_chart', 
'hr_get_org_chart', 'hr_get_org_chart', 'hr_get_org_chart']\n", + " [X ] created_request: Created offboarding request\n", + " [X ] revoked_it: Revoked IT access\n", + " [X ] farewell_email: Sent farewell email\n", + " [X ] farewell_slack: Sent farewell Slack message\n", + " [X ] completed_steps: Completed offboarding steps\n", + " [FAIL] task_0051 [complex ] score=17% tools=['hr_revoke_access', 'hr_revoke_role', 'hr_revoke_access', 'hr_reassign_asset', 'hr_get_software_licenses', 'access_revoke_role', 'access_get_security_groups']\n", + " [X ] created_request: Created offboarding request\n", + " [X ] revoked_it: Revoked IT access\n", + " [OK] revoked_roles: Revoked access roles\n", + " [X ] farewell: Sent farewell communication\n", + " [X ] exit_interview: Scheduled exit interview\n", + " [X ] completed_steps: Completed offboarding steps\n", + " [FAIL] task_0031 [complex ] score=20% tools=['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'hr_search_employees', 'hr_get_org_chart', 'onboarding_create_request', 'onboarding_get_status']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [X ] got_approval: Submitted approval request\n", + " [X ] assigned_asset: Assigned an asset\n", + " [X ] created_accounts: Created IT accounts\n", + " [X ] assigned_role: Assigned access role\n", + " [X ] created_badge: Created physical badge\n", + " [X ] sent_communications: Sent welcome communications\n", + " [X ] scheduled_meeting: Scheduled orientation\n", + " [X ] security_approval: Got security approval before badge\n", + " [FAIL] task_0027 [complex ] score=30% tools=['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status']\n", + " [OK] 
created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [X ] assigned_laptop: Assigned a laptop\n", + " [X ] created_accounts: Created IT accounts\n", + " [X ] assigned_access: Assigned access roles\n", + " [X ] sent_welcome: Sent welcome communication\n", + " [X ] scheduled_orientation: Scheduled orientation meeting\n", + " [OK] sequencing_create_first: Created employee before other steps\n", + " [X ] sequencing_asset_check: Checked available assets before assigning\n", + " [X ] completeness: Completed at least 3 onboarding steps\n", + " [FAIL] task_0032 [complex ] score=44% tools=['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'offboarding_create_request', 'offboarding_get_status', 'access_assign_role', 'access_create_badge', 'access_revoke_role', 'access_get_security_groups']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [X ] got_approval: Submitted approval request\n", + " [X ] assigned_asset: Assigned an asset\n", + " [X ] created_accounts: Created IT accounts\n", + " [OK] assigned_role: Assigned access role\n", + " [OK] created_badge: Created physical badge\n", + " [X ] sent_communications: Sent welcome communications\n", + " [X ] scheduled_meeting: Scheduled orientation\n", + " [FAIL] task_0072 [complex ] score=0% tools=['hr_search_employees', 'hr_create_request', 'hr_assign_asset', 'hr_send_email', 'hr_send_message', 'hr_get_org_chart', 'hr_get_software_licenses', 'access_assign_role', 'access_revoke_role']\n", + " [X ] read_employee: Read employee record first\n", + " [X ] updated_status: Updated status to pending/active\n", + " [X ] new_onboarding: Created new onboarding request\n", + " [X ] provisioned_accounts: Created IT accounts\n", + " [X ] welcome_back: Sent welcome-back communication\n", + " [FAIL] task_0025 [complex ] score=50% 
tools=['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'hr_get_org_chart', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'it_assign_asset', 'it_get_available_assets', 'it_create_account']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] assigned_laptop: Assigned a laptop\n", + " [OK] created_accounts: Created IT accounts\n", + " [X ] assigned_access: Assigned access roles\n", + " [X ] sent_welcome: Sent welcome communication\n", + " [X ] scheduled_orientation: Scheduled orientation meeting\n", + " [OK] sequencing_create_first: Created employee before other steps\n", + " [X ] sequencing_asset_check: Checked available assets before assigning\n", + " [X ] completeness: Completed at least 3 onboarding steps\n", + " [FAIL] task_0068 [complex ] score=20% tools=['offboard_access', 'onboard_employee', 'access_revoke', 'access_assign_role', 'access_get_security_groups']\n", + " [X ] read_employee: Read employee record\n", + " [X ] revoked_old_access: Revoked old department access\n", + " [X ] updated_dept: Updated department\n", + " [OK] new_access: Assigned new department roles\n", + " [X ] notified_team: Notified new team\n", + " [FAIL] task_0054 [complex ] score=60% tools=['offboarding_create_request', 'offboarding_get_status', 'offboarding_revoke_access', 'offboarding_get_software_licenses', 'access_revoke_role', 'access_get_security_groups', 'email_send', 'slack_send_message', 'meeting_schedule']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] revoked_it: Revoked IT access\n", + " [OK] farewell_email: Sent farewell email\n", + " [OK] farewell_slack: Sent farewell Slack message\n", + " [X ] completed_steps: Completed offboarding steps\n", + " [FAIL] task_0030 [complex ] score=20% tools=['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'onboarding_get_status', 
'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [X ] got_approval: Submitted approval request\n", + " [X ] assigned_asset: Assigned an asset\n", + " [X ] created_accounts: Created IT accounts\n", + " [X ] assigned_role: Assigned access role\n", + " [X ] created_badge: Created physical badge\n", + " [X ] sent_communications: Sent welcome communications\n", + " [X ] scheduled_meeting: Scheduled orientation\n", + " [X ] security_approval: Got security approval before badge\n", + " [FAIL] task_0034 [complex ] score=20% tools=['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'hr_search_employees', 'onboarding_get_status', 'onboarding_complete_step', 'offboarding_create_request', 'onboarding_get_status']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [X ] got_approval: Submitted approval request\n", + " [X ] assigned_asset: Assigned an asset\n", + " [X ] created_accounts: Created IT accounts\n", + " [X ] assigned_role: Assigned access role\n", + " [X ] created_badge: Created physical badge\n", + " [X ] sent_communications: Sent welcome communications\n", + " [X ] scheduled_meeting: Scheduled orientation\n", + " [X ] security_approval: Got security approval before badge\n", + " [FAIL] task_0048 [complex ] score=17% tools=['hr_revoke_access', 'hr_revoke_access', 'hr_reassign_asset', 'hr_get_software_licenses', 'access_revoke_role', 'access_get_security_groups', 'access_revoke_role']\n", + " [X ] created_request: Created offboarding request\n", + " [X ] revoked_it: Revoked IT access\n", + " [OK] revoked_roles: Revoked access roles\n", + " [X ] farewell: Sent farewell communication\n", + " [X ] exit_interview: Scheduled 
exit interview\n", + " [X ] completed_steps: Completed offboarding steps\n", + " [FAIL] task_0029 [complex ] score=70% tools=['hr_create_employee', 'onboarding_create_request', 'hr_get_org_chart', 'it_assign_asset', 'it_get_available_assets', 'it_create_account', 'it_revoke_access', 'access_assign_role', 'access_get_security_groups', 'email_send']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] assigned_laptop: Assigned a laptop\n", + " [OK] created_accounts: Created IT accounts\n", + " [OK] assigned_access: Assigned access roles\n", + " [OK] sent_welcome: Sent welcome communication\n", + " [X ] scheduled_orientation: Scheduled orientation meeting\n", + " [OK] sequencing_create_first: Created employee before other steps\n", + " [X ] sequencing_asset_check: Checked available assets before assigning\n", + " [X ] completeness: Completed at least 3 onboarding steps\n", + " [FAIL] task_0070 [complex ] score=20% tools=['offboard_access', 'access_revoke', 'access_assign_role', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'access_assign_role', 'access_revoke_role', 'access_get_security_groups', 'access_get_software_licenses', 'access_get_security_groups', 'access_get_security_groups', 'access_get_security_groups']\n", + " [X ] read_employee: Read employee record\n", + " [X ] revoked_old_access: Revoked old department access\n", + " [X ] updated_dept: Updated department\n", + " [OK] new_access: Assigned new department roles\n", + " [X ] notified_team: Notified new team\n", + " [FAIL] task_0071 [complex ] score=0% tools=['hr_search_employees', 'hr_create_request', 'hr_assign_asset', 'hr_send_email', 'hr_send_message', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_complete_step']\n", + " [X ] read_employee: Read employee record first\n", + " [X ] updated_status: Updated status to 
pending/active\n", + " [X ] new_onboarding: Created new onboarding request\n", + " [X ] provisioned_accounts: Created IT accounts\n", + " [X ] welcome_back: Sent welcome-back communication\n", + " [FAIL] task_0050 [complex ] score=50% tools=['hr_revoke_access', 'hr_revoke_role', 'hr_revoke_access', 'hr_reassign_asset', 'hr_get_software_licenses', 'access_assign_role', 'access_revoke_role', 'access_get_security_groups', 'email_send', 'meeting_schedule', 'access_get_security_groups', 'onboarding_create_request', 'onboarding_get_status']\n", + " [X ] created_request: Created offboarding request\n", + " [X ] revoked_it: Revoked IT access\n", + " [OK] revoked_roles: Revoked access roles\n", + " [OK] farewell: Sent farewell communication\n", + " [OK] exit_interview: Scheduled exit interview\n", + " [X ] completed_steps: Completed offboarding steps\n", + " [FAIL] task_0077 [complex ] score=0% tools=['hr_reassign_asset', 'hr_get_software_licenses', 'access_revoke_role', 'access_get_security_groups', 'access_assign_role', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step']\n", + " [X ] read_manager: Looked up manager info\n", + " [X ] offboarding: Created offboarding request\n", + " [X ] reassigned: Updated reports' manager\n", + " [X ] revoked_access: Revoked manager's access\n", + " [X ] notified_team: Notified team\n", + " [FAIL] task_0056 [edge_case ] score=50% tools=['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 
'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status']\n", + " [OK] attempted_create: Attempted to create employee\n", + " [X ] handled_limit: Recognized or handled headcount limit error\n", + " [FAIL] task_0059 [edge_case ] score=0% tools=['hr_search_employees']\n", + " [X ] checked_licenses: Checked licenses\n", + " [FAIL] task_0065 [edge_case ] score=50% tools=['access_assign_role']\n", + " [OK] attempted_assign: Attempted to assign role\n", + " [X ] handled_error: Recognized level requirement error\n", + " [FAIL] task_0066 [edge_case ] score=0% tools=['hr_assign_asset']\n", + " [X ] attempted_assign: Attempted to assign role\n", + " [X ] handled_restriction: Recognized department restriction\n", + " [FAIL] task_0064 [edge_case ] score=50% tools=['hr_revoke_access', 'hr_get_software_licenses', 'access_revoke_role', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'offboarding_create_request', 'offboarding_get_status', 'it_assign_asset', 'it_get_available_assets', 'access_revoke_role', 'onboarding_get_status', 'it_get_software_licenses', 'access_get_security_groups']\n", + " [OK] created_request: Created offboarding with termination reason\n", + " [X ] revoked_access: Revoked all access\n", + " [OK] no_farewell: Did NOT send farewell communications\n", + " [X ] completed_steps: Completed termination steps\n", + " [FAIL] task_0058 [edge_case ] score=0% tools=['hr_update_employee']\n", + " [X ] checked_licenses: Checked license availability\n", + " [X ] identified_full: Recognized licenses are full\n", + " [FAIL] task_0067 [edge_case ] score=0% tools=['hr_read_employee']\n", + " [X ] looked_up_badge: Looked up badge/access policy\n", + " [X ] multiple_lookups: Looked up multiple policies\n", + " [FAIL] task_0061 [edge_case ] score=25% tools=['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 
'onboarding_complete_step', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status']\n", + " [X ] created_contractor: Created employee with is_contractor=true\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [X ] legal_approval: Got legal approval\n", + " [X ] limited_access: Created limited accounts\n", + "\n", + "Results: 8/52 passed (15.4%)\n", + "Mean score: 0.370\n", + " simple : 2/13 pass, score=0.23\n", + " medium : 6/14 pass, score=0.72\n", + " complex : 0/17 pass, score=0.26\n", + " edge_case : 0/8 pass, score=0.22\n", + "\n", + "==================================================\n", + "BASELINE \u2014 TEST SET (held-out)\n", + "==================================================\n", + " [FAIL] task_0003 [simple ] score=50% tools=['hr_read_employee']\n", + " [OK] correct_tool: Used hr_read_employee\n", + " [X ] correct_id: Passed correct emp_id\n", + " [FAIL] task_0039 [simple ] score=0% tools=['hr_get_offboarding_status']\n", + " [X ] correct_tool: Used offboarding_get_status\n", + " [X ] correct_emp: Checked correct employee\n", + " [FAIL] task_0008 [simple ] score=0% tools=['hr_get_software_licenses']\n", + " [X ] correct_tool: Used it_get_software_licenses\n", + " [X ] correct_software: Filtered by Jira\n", + " [FAIL] task_0009 [simple ] score=0% tools=['hr_policy']\n", + " [X ] correct_tool: Used policy_lookup\n", + " [X ] relevant_topic: Searched for onboarding topic\n", + " [FAIL] task_0001 [simple ] score=50% tools=['hr_read_employee']\n", + " [OK] correct_tool: Used hr_read_employee\n", + " [X ] correct_id: Passed correct emp_id\n", + " [FAIL] task_0004 [simple ] score=0% tools=['hr_read_employee']\n", + " [X ] correct_tool: Used hr_search_employees\n", + " [X ] correct_dept: Filtered by correct department\n", + " [PASS] task_0024 [medium ] score=100% tools=['hr_create_employee', 'onboarding_create_request']\n", 
+ " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [FAIL] task_0044 [medium ] score=0% tools=['offboarding_revoke_access']\n", + " [X ] created_request: Created offboarding request\n", + " [X ] correct_emp: Used correct employee ID\n", + " [X ] correct_reason: Set correct reason\n", + " [X ] revoked_access: Revoked IT access\n", + " [X ] notified: Sent notification\n", + " [PASS] task_0022 [medium ] score=100% tools=['hr_create_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [FAIL] task_0043 [medium ] score=80% tools=['offboarding_create_request', 'it_revoke_access', 'email_send']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] correct_emp: Used correct employee ID\n", + " [OK] correct_reason: Set correct reason\n", + " [OK] revoked_access: Revoked IT access\n", + " [OK] notified: Sent notification\n", + " [FAIL] task_0075 [medium ] score=50% tools=['it_get_available_assets', 'it_get_available_assets', 'it_get_available_assets', 'it_get_available_assets', 'it_get_available_software_licenses', 'it_get_available_software_licenses', 'access_assign_role', 'access_assign_role', 'access_get_security_groups', 'email_send', 'meeting_schedule', 'access_revoke_role', 'access_revoke_role']\n", + " [OK] checked_assets: Checked available assets\n", + " [X ] 
checked_licenses: Checked software licenses\n", + " [PASS] task_0021 [medium ] score=100% tools=['hr_create_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [FAIL] task_0047 [medium ] score=40% tools=['offboarding_create_request']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] correct_emp: Used correct employee ID\n", + " [OK] correct_reason: Set correct reason\n", + " [X ] revoked_access: Revoked IT access\n", + " [X ] notified: Sent notification\n", + " [FAIL] task_0055 [complex ] score=40% tools=['onboarding_complete_step', 'offboarding_revoke_access', 'access_revoke_role', 'access_get_security_groups', 'email_send', 'slack_send_message', 'hr_get_org_chart', 'hr_search_employees', 'hr_update_employee', 'hr_read_employee', 'hr_get_software_licenses', 'hr_create_account']\n", + " [X ] created_request: Created offboarding request\n", + " [X ] revoked_it: Revoked IT access\n", + " [OK] farewell_email: Sent farewell email\n", + " [OK] farewell_slack: Sent farewell Slack message\n", + " [X ] completed_steps: Completed offboarding steps\n", + " [FAIL] task_0052 [complex ] score=40% tools=['onboarding_complete_step', 'offboarding_revoke_access', 'access_revoke_role', 'access_get_security_groups', 'email_send', 'slack_send_message', 'hr_get_org_chart', 'hr_update_employee', 'hr_search_employees', 'hr_get_org_chart']\n", + " [X ] created_request: Created offboarding request\n", + " [X ] revoked_it: Revoked IT access\n", + " [OK] farewell_email: Sent farewell email\n", + " [OK] farewell_slack: Sent farewell Slack message\n", + " [X ] completed_steps: Completed offboarding 
steps\n", + " [FAIL] task_0026 [complex ] score=30% tools=['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'hr_get_org_chart', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_create_request', 'hr_get_org_chart']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [X ] assigned_laptop: Assigned a laptop\n", + " [X ] created_accounts: Created IT accounts\n", + " [X ] assigned_access: Assigned access roles\n", + " [X ] sent_welcome: Sent welcome communication\n", + " [X ] scheduled_orientation: Scheduled orientation meeting\n", + " [OK] sequencing_create_first: Created employee before other steps\n", + " [X ] sequencing_asset_check: Checked available assets before assigning\n", + " [X ] completeness: Completed at least 3 onboarding steps\n", + " [FAIL] task_0033 [complex ] score=22% tools=['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [X ] got_approval: Submitted approval request\n", + " [X ] assigned_asset: Assigned an asset\n", + " [X ] created_accounts: Created IT accounts\n", + " [X ] assigned_role: Assigned access role\n", + " [X ] created_badge: Created physical badge\n", + " [X ] sent_communications: Sent welcome communications\n", + " [X ] scheduled_meeting: Scheduled orientation\n", + " [FAIL] task_0069 [complex ] score=0% tools=['hr_revoke_access', 'hr_assign_role', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status']\n", + " 
[X ] read_employee: Read employee record\n", + " [X ] revoked_old_access: Revoked old department access\n", + " [X ] updated_dept: Updated department\n", + " [X ] new_access: Assigned new department roles\n", + " [X ] notified_team: Notified new team\n", + " [FAIL] task_0076 [complex ] score=20% tools=['it_revoke_access', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status']\n", + " [X ] read_manager: Looked up manager info\n", + " [X ] offboarding: Created offboarding request\n", + " [X ] reassigned: Updated reports' manager\n", + " [OK] revoked_access: Revoked manager's access\n", + " [X ] notified_team: Notified team\n", + " [FAIL] task_0049 [complex ] score=17% tools=['hr_revoke_access', 'hr_revoke_access', 'hr_reassign_asset', 'hr_get_software_licenses', 'access_revoke_role', 'access_get_security_groups']\n", + " [X ] created_request: Created offboarding request\n", + " [X ] revoked_it: Revoked IT access\n", + " [OK] revoked_roles: Revoked access roles\n", + " [X ] farewell: Sent farewell communication\n", + " [X ] exit_interview: Scheduled exit interview\n", + " [X ] completed_steps: Completed offboarding steps\n", + " [FAIL] task_0028 [complex ] score=70% tools=['hr_create_employee', 'onboarding_create_request', 'it_assign_asset', 'it_get_available_assets', 'access_assign_role', 'access_get_security_groups', 'email_send', 'meeting_schedule']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] assigned_laptop: Assigned a laptop\n", + " [X ] created_accounts: Created IT accounts\n", + " [OK] assigned_access: Assigned access roles\n", + " [OK] sent_welcome: Sent welcome communication\n", + " 
[OK] scheduled_orientation: Scheduled orientation meeting\n", + " [OK] sequencing_create_first: Created employee before other steps\n", + " [X ] sequencing_asset_check: Checked available assets before assigning\n", + " [X ] completeness: Completed at least 3 onboarding steps\n", + " [FAIL] task_0063 [edge_case ] score=33% tools=['hr_update_employee']\n", + " [X ] checked_onboarding: Checked onboarding status\n", + " [X ] revoked_access: Revoked any provisioned access\n", + " [OK] updated_status: Updated employee status to offboarded\n", + " [FAIL] task_0060 [edge_case ] score=0% tools=['hr_search_employees']\n", + " [X ] looked_up_manager: Looked up the manager or org chart\n", + " [X ] found_skip_level: Identified skip-level manager\n", + " [X ] proceeded: Proceeded with onboarding\n", + " [FAIL] task_0062 [edge_case ] score=33% tools=['it_revoke_access', 'it_get_software_licenses', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status']\n", + " [X ] checked_employee: Looked up employee record\n", + " [X ] created_request: Created offboarding request\n", + " [OK] revoked_access: Revoked access\n", + " [FAIL] task_0057 [edge_case ] score=50% tools=['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status']\n", + " [OK] attempted_create: Attempted to create employee\n", + " [X ] handled_limit: Recognized or handled headcount limit error\n", + "\n", + "Results: 3/25 passed (12.0%)\n", + "Mean score: 0.370\n", + " simple : 
0/6 pass, score=0.17\n", + " medium : 3/7 pass, score=0.67\n", + " complex : 0/8 pass, score=0.30\n", + " edge_case : 0/4 pass, score=0.29\n" + ] + } + ], + "source": [ + "def evaluate_model(model, tokenizer, prompts_list=None, temperature=0.1):\n", + " \"\"\"Evaluate model on a list of prompt dicts (each has 'prompt', 'task_idx', 'task_id', 'difficulty').\"\"\"\n", + " if prompts_list is None:\n", + " prompts_list = test_prompts\n", + "\n", + " results = []\n", + " for p in prompts_list:\n", + " prompt_msgs = p[\"prompt\"]\n", + " task_idx = p[\"task_idx\"]\n", + "\n", + " text = tokenizer.apply_chat_template(\n", + " prompt_msgs, tokenize=False, add_generation_prompt=True\n", + " )\n", + " inputs = tokenizer(text, return_tensors=\"pt\").to(\"cuda\")\n", + "\n", + " with torch.no_grad():\n", + " outputs = model.generate(\n", + " **inputs,\n", + " max_new_tokens=512,\n", + " temperature=temperature,\n", + " do_sample=True,\n", + " )\n", + " response = tokenizer.decode(\n", + " outputs[0][inputs[\"input_ids\"].shape[1]:], skip_special_tokens=True\n", + " )\n", + "\n", + " calls = extract_tool_calls(response)\n", + " if calls:\n", + " eval_result, steps = replay_tool_calls(task_idx, calls)\n", + " results.append({\n", + " \"task_id\": p[\"task_id\"],\n", + " \"difficulty\": p[\"difficulty\"],\n", + " \"score\": eval_result[\"score\"],\n", + " \"passed\": eval_result[\"passed\"],\n", + " \"steps\": steps,\n", + " \"tools_called\": [c[\"tool\"] for c in calls],\n", + " \"criteria\": eval_result[\"criteria_results\"],\n", + " })\n", + " else:\n", + " results.append({\n", + " \"task_id\": p[\"task_id\"],\n", + " \"difficulty\": p[\"difficulty\"],\n", + " \"score\": 0.0,\n", + " \"passed\": False,\n", + " \"steps\": 0,\n", + " \"tools_called\": [],\n", + " \"criteria\": [],\n", + " })\n", + "\n", + " # Print per-task result\n", + " r = results[-1]\n", + " status = \"PASS\" if r[\"passed\"] else \"FAIL\"\n", + " print(f\" [{status}] {r['task_id']:12s} 
[{r['difficulty']:10s}] \"\n", + " f\"score={r['score']:.0%} tools={r['tools_called']}\")\n", + " for c in r.get(\"criteria\", []):\n", + " print(f\" [{'OK' if c['passed'] else 'X ':s}] {c['name']}: {c['description']}\")\n", + "\n", + " pass_count = sum(1 for r in results if r[\"passed\"])\n", + " mean_score = sum(r[\"score\"] for r in results) / max(len(results), 1)\n", + "\n", + " print(f\"\\nResults: {pass_count}/{len(results)} passed ({pass_count/max(len(results), 1):.1%})\")\n", + " print(f\"Mean score: {mean_score:.3f}\")\n", + "\n", + " for diff in [\"simple\", \"medium\", \"complex\", \"edge_case\"]:\n", + " subset = [r for r in results if r[\"difficulty\"] == diff]\n", + " if subset:\n", + " p_count = sum(1 for r in subset if r[\"passed\"])\n", + " s = sum(r[\"score\"] for r in subset) / len(subset)\n", + " print(f\" {diff:10s}: {p_count}/{len(subset)} pass, score={s:.2f}\")\n", + "\n", + " return results\n", + "\n", + "\n", + "# ============================================================\n", + "# BASELINE EVALUATION (before training)\n", + "# ============================================================\n", + "\n", + "# Evaluate on TRAIN set\n", + "print(\"=\" * 50)\n", + "print(\"BASELINE \u2014 TRAIN SET\")\n", + "print(\"=\" * 50)\n", + "baseline_train = evaluate_model(model, tokenizer, prompts_list=train_prompts)\n", + "\n", + "# Evaluate on TEST set (held-out)\n", + "print(\"\\n\" + \"=\" * 50)\n", + "print(\"BASELINE \u2014 TEST SET (held-out)\")\n", + "print(\"=\" * 50)\n", + "baseline_test = evaluate_model(model, tokenizer, prompts_list=test_prompts)" + ] + }, + { + "cell_type": "markdown", + "id": "31ecc4ca", + "metadata": {}, + "source": [ + "## Training with GRPO\n", + "\n", + "**Group Relative Policy Optimization (GRPO)** samples a group of completions for each prompt, scores every completion with the reward functions, and computes each one's advantage relative to the group: its reward minus the group mean, typically divided by the group standard deviation. The policy is then updated to make above-average completions more likely, so no separate value model is needed.\n", + "\n", + "Key training parameters:\n", + "- `num_generations=4`: Generate 4 candidates per prompt to compute relative 
rewards\n", + "- `max_steps=300`: Total number of optimizer steps\n", + "- `learning_rate=5e-5`: Peak learning rate, with a cosine decay schedule and 10% warmup (`warmup_ratio=0.1`)\n", + "- `temperature=1.0`: Sampling temperature for training rollouts; higher values give more diverse completions, and therefore more exploration" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "c4b14bd9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Unsloth: We now expect `per_device_train_batch_size` * `gradient_accumulation_steps` * `world_size` to be a multiple of `num_generations`.\n", + "We will change the batch size of 1 to the `num_generations` of 4\n" + ] + } + ], + "source": [ + "max_seq_length = 4096\n", + "max_prompt_length = maximum_length + 1\n", + "max_completion_length = 512\n", + "\n", + "from trl import GRPOConfig, GRPOTrainer\n", + "\n", + "training_args = GRPOConfig(\n", + " temperature=1.0,\n", + " learning_rate=5e-5,\n", + " weight_decay=0.001,\n", + " warmup_ratio=0.1,\n", + " lr_scheduler_type=\"cosine\",\n", + " optim=\"adamw_8bit\",\n", + " logging_steps=1,\n", + " per_device_train_batch_size=1,  # Unsloth raises this to num_generations (4) at runtime\n", + " gradient_accumulation_steps=1,\n", + " num_generations=4,\n", + " max_prompt_length=max_prompt_length,\n", + " max_completion_length=max_completion_length,\n", + " max_steps=300,\n", + " save_steps=100,\n", + " report_to=\"wandb\",\n", + " output_dir=\"outputs\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "a4260bce", + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "trainer = GRPOTrainer(\n", + " model=model,\n", + " processing_class=tokenizer,\n", + " reward_funcs=[\n", + " valid_json_reward,\n", + " rubric_reward,\n", + " efficiency_reward,\n", + " ],\n", + " args=training_args,\n", + " train_dataset=dataset,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "42965993", + "metadata": {}, + "source": [ + "### Start Training!\n", + "\n", + "Training takes roughly 20 minutes on a single GPU (the run below finished 300 steps in about 21 minutes). Watch the reward column: it should gradually increase as the model learns to:\n", + "1. 
Generate valid JSON tool calls\n", + "2. Call the right tools for each task\n", + "3. Pass more rubric criteria\n", + "\n", + "The moving average reward trends upward from ~2-3 early on to ~4-5 by the end of training." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "7fe1e88e", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1\n", + " \\\\ /| Num examples = 52 | Num Epochs = 6 | Total steps = 300\n", + "O^O/ \\_/ \\ Batch size per device = 4 | Gradient accumulation steps = 1\n", + "\\ / Data Parallel GPUs = 1 | Total batch size (4 x 1 x 1) = 4\n", + " \"-____-\" Trainable parameters = 5,636,096 of 1,241,450,496 (0.45% trained)\n", + "`generation_config` default values have been modified to match model-specific defaults: {'max_length': 131072, 'temperature': 0.6, 'top_p': 0.9}. If this is not desired, please set these values explicitly.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Unsloth: Will smartly offload gradients to save VRAM!\n", + "\n", + "--- [task_0077] [complex] ---\n", + "Instruction: Manager Ananya Reddy (emp_0007) in Engineering is leaving. They have 2 direct re...\n", + "Tool calls: ['it_revoke_access', 'onboarding_get_status', 'hr_send_message', 'onboarding_create_request']\n", + "Rubric: 20% (1/5)\n", + "Reward: 0.20\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [300/300 21:10, Epoch 5/6]\n", + "
\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
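The columns in the per-step GRPO log are not independent, and three relationships make it easier to read.

- In every row, `reward` equals the sum of `rewards / valid_json_reward / mean`, `rewards / rubric_reward / mean` and `rewards / efficiency_reward / mean`. This is consistent with TRL's GRPO trainer summing reward functions with its default weight of 1.0.
- `completions / mean_length` mixes clipped completions, which sit at a 512-token cap, with terminated ones.
- In the rows checked, `Training Loss` tracks 0.001 × `kl`. This suggests, but does not confirm, a KL coefficient of about 0.001, with the advantage-weighted term averaging out to roughly zero.

The sketch below re-checks these relationships on six rows copied from the log. `MAX_COMPLETION_LENGTH`, `BETA` and the helper functions are hypothetical names whose values were inferred from the logged numbers; they were not read from this notebook's training config.

```python
import math

# Assumptions inferred from the logged numbers, not read from the notebook's config:
MAX_COMPLETION_LENGTH = 512  # every clipped completion in the log sits at 512 tokens
BETA = 0.001                 # assumed KL coefficient: loss tracks BETA * kl row by row


def total_reward(valid_json: float, rubric: float, efficiency: float) -> float:
    """Total reward as the unweighted sum of the three per-function mean rewards."""
    return valid_json + rubric + efficiency


def mean_length(clipped_ratio: float, mean_terminated_length: float) -> float:
    """Mean completion length: clipped completions count at the cap, the rest at their own mean."""
    return clipped_ratio * MAX_COMPLETION_LENGTH + (1.0 - clipped_ratio) * mean_terminated_length


def kl_penalty(kl: float) -> float:
    """KL part of the loss, assuming the advantage-weighted term averages out to ~0."""
    return BETA * kl


# Rows copied from the log: (step, loss, reward, kl, valid_json, rubric,
# efficiency, mean_length, clipped_ratio, mean_terminated_length)
ROWS = [
    (1, -0.000000, 2.300000, -0.000000, 1.000000, 0.800000, 0.500000, 279.250000, 0.250000, 201.666672),
    (8, 0.000000, 2.275000, 0.000411, 1.000000, 1.400000, -0.125000, 448.250000, 0.750000, 257.000000),
    (23, 0.068900, 2.425000, 68.903587, 0.625000, 0.800000, 1.000000, 48.000000, 0.000000, 48.000000),
    (27, 0.000000, 2.625000, 0.000436, 1.000000, 2.000000, -0.375000, 512.000000, 1.000000, 0.000000),
    (86, 0.012300, -0.250000, 12.260892, 0.250000, -1.000000, 0.500000, 28.250000, 0.000000, 28.250000),
    (136, 0.010000, 4.000000, 9.955448, 1.000000, 2.000000, 1.000000, 77.750000, 0.000000, 77.750000),
]

for step, loss, reward, kl, vj, rub, eff, mlen, clip, mterm in ROWS:
    assert math.isclose(total_reward(vj, rub, eff), reward, abs_tol=1e-4), f"reward, step {step}"
    assert math.isclose(mean_length(clip, mterm), mlen, abs_tol=1e-3), f"length, step {step}"
    # Training Loss appears rounded to 4 decimal places, so allow 1e-4 of slack.
    assert math.isclose(kl_penalty(kl), loss, abs_tol=1e-4), f"loss, step {step}"

print(f"All three relationships hold for {len(ROWS)} logged steps.")
```

The checks use `math.isclose` with absolute tolerances because the log prints six decimals and displays the loss at coarser precision.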

| Step | Training Loss | reward | reward_std | completions / mean_length | completions / min_length | completions / max_length | completions / clipped_ratio | completions / mean_terminated_length | completions / min_terminated_length | completions / max_terminated_length | kl | rewards / valid_json_reward / mean | rewards / valid_json_reward / std | rewards / rubric_reward / mean | rewards / rubric_reward / std | rewards / efficiency_reward / mean | rewards / efficiency_reward / std |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | -0.000000 | 2.300000 | 1.655294 | 279.250000 | 178.000000 | 512.000000 | 0.250000 | 201.666672 | 178.000000 | 228.000000 | -0.000000 | 1.000000 | 0.000000 | 0.800000 | 1.200000 | 0.500000 | 0.707107 |
| 2 | -0.000000 | 9.000000 | 0.000000 | 111.750000 | 109.000000 | 118.000000 | 0.000000 | 111.750000 | 109.000000 | 118.000000 | -0.000000 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 3 | 0.000000 | 9.000000 | 0.000000 | 127.750000 | 118.000000 | 145.000000 | 0.000000 | 127.750000 | 118.000000 | 145.000000 | 0.002447 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 4 | 0.000000 | 2.125000 | 0.250000 | 345.500000 | 287.000000 | 394.000000 | 0.000000 | 345.500000 | 287.000000 | 394.000000 | 0.000670 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.125000 | 0.250000 |
| 5 | 0.000000 | 3.250000 | 1.500000 | 35.250000 | 34.000000 | 36.000000 | 0.000000 | 35.250000 | 34.000000 | 36.000000 | 0.000421 | 1.000000 | 0.000000 | 1.250000 | 1.500000 | 1.000000 | 0.000000 |
| 6 | 0.000000 | 9.000000 | 0.000000 | 119.000000 | 111.000000 | 124.000000 | 0.000000 | 119.000000 | 111.000000 | 124.000000 | 0.002058 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 7 | 0.000000 | 8.285714 | 1.428571 | 143.750000 | 136.000000 | 151.000000 | 0.000000 | 143.750000 | 136.000000 | 151.000000 | 0.003349 | 1.000000 | 0.000000 | 6.285714 | 1.428571 | 1.000000 | 0.000000 |
| 8 | 0.000000 | 2.275000 | 0.623832 | 448.250000 | 257.000000 | 512.000000 | 0.750000 | 257.000000 | 257.000000 | 257.000000 | 0.000411 | 1.000000 | 0.000000 | 1.400000 | 0.979796 | -0.125000 | 0.478714 |
| 9 | 0.000000 | 1.000000 | 0.000000 | 40.250000 | 25.000000 | 57.000000 | 0.000000 | 40.250000 | 25.000000 | 57.000000 | 0.004157 | 1.000000 | 0.000000 | -1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 10 | 0.000000 | 9.000000 | 0.000000 | 24.250000 | 21.000000 | 28.000000 | 0.000000 | 24.250000 | 21.000000 | 28.000000 | 0.000883 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 11 | 0.000000 | 3.250000 | 1.500000 | 34.500000 | 33.000000 | 35.000000 | 0.000000 | 34.500000 | 33.000000 | 35.000000 | 0.000201 | 1.000000 | 0.000000 | 1.250000 | 1.500000 | 1.000000 | 0.000000 |
| 12 | 0.000000 | 1.750000 | 2.598076 | 33.000000 | 28.000000 | 37.000000 | 0.000000 | 33.000000 | 28.000000 | 37.000000 | 0.003952 | 0.250000 | 0.866025 | 0.500000 | 1.732051 | 1.000000 | 0.000000 |
| 13 | 0.000000 | 0.625000 | 0.478714 | 131.500000 | 33.000000 | 273.000000 | 0.000000 | 131.500000 | 33.000000 | 273.000000 | 0.005421 | 1.000000 | 0.000000 | -1.000000 | 0.000000 | 0.625000 | 0.478714 |
| 14 | 0.000000 | -1.000000 | 2.121320 | 38.000000 | 32.000000 | 47.000000 | 0.000000 | 38.000000 | 32.000000 | 47.000000 | 0.004453 | -0.500000 | 1.224745 | -1.000000 | 0.000000 | 0.500000 | 1.000000 |
| 15 | 0.000000 | 1.375000 | 0.750000 | 172.750000 | 29.000000 | 512.000000 | 0.250000 | 59.666668 | 29.000000 | 118.000000 | 0.005408 | 1.000000 | 0.000000 | -0.250000 | 1.500000 | 0.625000 | 0.750000 |
| 16 | 0.000000 | 1.000000 | 0.000000 | 34.750000 | 29.000000 | 42.000000 | 0.000000 | 34.750000 | 29.000000 | 42.000000 | 0.040465 | 1.000000 | 0.000000 | -1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 17 | 0.000000 | -0.500000 | 0.000000 | 37.750000 | 31.000000 | 51.000000 | 0.000000 | 37.750000 | 31.000000 | 51.000000 | 0.036488 | -0.500000 | 0.000000 | -1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 18 | 0.000000 | 2.375000 | 1.030776 | 453.250000 | 277.000000 | 512.000000 | 0.750000 | 277.000000 | 277.000000 | 277.000000 | 0.000923 | 1.000000 | 0.000000 | 1.500000 | 0.577350 | -0.125000 | 0.478714 |
| 19 | 0.000200 | 1.750000 | 1.500000 | 44.000000 | 34.000000 | 67.000000 | 0.000000 | 44.000000 | 34.000000 | 67.000000 | 0.153228 | 1.000000 | 0.000000 | -0.250000 | 1.500000 | 1.000000 | 0.000000 |
| 20 | 0.000000 | 1.800000 | 0.852447 | 387.500000 | 224.000000 | 505.000000 | 0.000000 | 387.500000 | 224.000000 | 505.000000 | 0.005927 | 1.000000 | 0.000000 | 0.800000 | 0.979796 | 0.000000 | 0.408248 |
| 21 | 0.000000 | 4.000000 | 0.000000 | 38.000000 | 38.000000 | 38.000000 | 0.000000 | 38.000000 | 38.000000 | 38.000000 | 0.000026 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 22 | 0.005400 | 3.000000 | 4.000000 | 33.750000 | 28.000000 | 47.000000 | 0.000000 | 33.750000 | 28.000000 | 47.000000 | 5.445933 | 1.000000 | 0.000000 | 1.000000 | 4.000000 | 1.000000 | 0.000000 |
| 23 | 0.068900 | 2.425000 | 1.950000 | 48.000000 | 45.000000 | 55.000000 | 0.000000 | 48.000000 | 45.000000 | 55.000000 | 68.903587 | 0.625000 | 0.750000 | 0.800000 | 1.200000 | 1.000000 | 0.000000 |
| 24 | 0.000000 | 1.500000 | 0.707107 | 461.250000 | 359.000000 | 512.000000 | 0.500000 | 410.500000 | 359.000000 | 462.000000 | 0.007300 | 1.000000 | 0.000000 | 0.875000 | 0.750000 | -0.375000 | 0.250000 |
| 25 | 0.000000 | 4.000000 | 0.000000 | 39.000000 | 39.000000 | 39.000000 | 0.000000 | 39.000000 | 39.000000 | 39.000000 | 0.001882 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 26 | 0.000200 | 3.625000 | 0.478714 | 231.000000 | 147.000000 | 331.000000 | 0.000000 | 231.000000 | 147.000000 | 331.000000 | 0.226281 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 0.625000 | 0.478714 |
| 27 | 0.000000 | 2.625000 | 1.250000 | 512.000000 | 512.000000 | 512.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000436 | 1.000000 | 0.000000 | 2.000000 | 1.224745 | -0.375000 | 0.250000 |
| 28 | 0.000000 | 4.000000 | 0.000000 | 28.500000 | 27.000000 | 29.000000 | 0.000000 | 28.500000 | 27.000000 | 29.000000 | 0.009277 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 29 | 0.000500 | 8.625000 | 0.750000 | 237.500000 | 135.000000 | 512.000000 | 0.250000 | 146.000000 | 135.000000 | 159.000000 | 0.473063 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 0.625000 | 0.750000 |
| 30 | 0.023600 | 4.000000 | 0.000000 | 52.500000 | 36.000000 | 88.000000 | 0.000000 | 52.500000 | 36.000000 | 88.000000 | 23.615509 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 31 | 0.000000 | 2.475000 | 1.650000 | 512.000000 | 512.000000 | 512.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000201 | 1.000000 | 0.000000 | 1.850000 | 1.417745 | -0.375000 | 0.250000 |
| 32 | 0.000500 | 9.000000 | 0.000000 | 106.750000 | 105.000000 | 111.000000 | 0.000000 | 106.750000 | 105.000000 | 111.000000 | 0.493520 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 33 | 0.000000 | 2.575000 | 0.670199 | 445.500000 | 298.000000 | 512.000000 | 0.500000 | 379.000000 | 298.000000 | 460.000000 | 0.005510 | 1.000000 | 0.000000 | 1.700000 | 1.148913 | -0.125000 | 0.478714 |
| 34 | 0.000000 | 2.425000 | 0.450000 | 512.000000 | 512.000000 | 512.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000232 | 1.000000 | 0.000000 | 1.550000 | 0.300000 | -0.125000 | 0.250000 |
| 35 | 0.000000 | 2.375000 | 1.108678 | 446.500000 | 358.000000 | 512.000000 | 0.500000 | 381.000000 | 358.000000 | 404.000000 | 0.006126 | 1.000000 | 0.000000 | 1.500000 | 1.000000 | -0.125000 | 0.250000 |
| 36 | 0.000000 | 2.050000 | 0.404145 | 282.750000 | 192.000000 | 459.000000 | 0.000000 | 282.750000 | 192.000000 | 459.000000 | 0.009762 | 1.000000 | 0.000000 | 0.800000 | 0.692820 | 0.250000 | 0.288675 |
| 37 | 0.000000 | 1.450000 | 2.251666 | 44.250000 | 40.000000 | 51.000000 | 0.000000 | 44.250000 | 40.000000 | 51.000000 | 0.010904 | 0.250000 | 0.866025 | 0.200000 | 1.385641 | 1.000000 | 0.000000 |
| 38 | 0.000000 | 2.225000 | 0.567891 | 340.500000 | 205.000000 | 408.000000 | 0.000000 | 340.500000 | 205.000000 | 408.000000 | 0.007991 | 1.000000 | 0.000000 | 1.100000 | 0.600000 | 0.125000 | 0.250000 |
| 39 | 0.000000 | 4.000000 | 0.000000 | 35.750000 | 35.000000 | 36.000000 | 0.000000 | 35.750000 | 35.000000 | 36.000000 | 0.009891 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 40 | 0.000200 | 9.000000 | 0.000000 | 29.500000 | 19.000000 | 37.000000 | 0.000000 | 29.500000 | 19.000000 | 37.000000 | 0.203117 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 41 | 0.000000 | 0.925000 | 3.317002 | 373.250000 | 205.000000 | 512.000000 | 0.500000 | 234.500000 | 205.000000 | 264.000000 | 0.016028 | 0.250000 | 1.500000 | 0.800000 | 1.200000 | -0.125000 | 0.750000 |
| 42 | 0.000000 | -0.500000 | 0.000000 | 31.500000 | 31.000000 | 32.000000 | 0.000000 | 31.500000 | 31.000000 | 32.000000 | 0.012630 | -0.500000 | 0.000000 | -1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 43 | 0.000100 | 4.000000 | 0.000000 | 37.500000 | 21.000000 | 45.000000 | 0.000000 | 37.500000 | 21.000000 | 45.000000 | 0.123499 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 44 | 0.000100 | 3.750000 | 0.288675 | 100.500000 | 40.000000 | 175.000000 | 0.000000 | 100.500000 | 40.000000 | 175.000000 | 0.101443 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 0.750000 | 0.288675 |
| 45 | 0.000000 | 2.950000 | 0.858293 | 345.500000 | 212.000000 | 485.000000 | 0.000000 | 345.500000 | 212.000000 | 485.000000 | 0.026305 | 1.000000 | 0.000000 | 1.700000 | 1.148913 | 0.250000 | 0.645497 |
| 46 | 0.000000 | 1.950000 | 0.768115 | 454.750000 | 283.000000 | 512.000000 | 0.750000 | 283.000000 | 283.000000 | 283.000000 | 0.006652 | 1.000000 | 0.000000 | 0.950000 | 0.300000 | 0.000000 | 0.707107 |
| 47 | 0.000100 | 2.425000 | 1.950000 | 42.500000 | 33.000000 | 51.000000 | 0.000000 | 42.500000 | 33.000000 | 51.000000 | 0.115979 | 0.625000 | 0.750000 | 0.800000 | 1.200000 | 1.000000 | 0.000000 |
| 48 | 0.000100 | 3.400000 | 0.000000 | 44.750000 | 43.000000 | 48.000000 | 0.000000 | 44.750000 | 43.000000 | 48.000000 | 0.050425 | 1.000000 | 0.000000 | 1.400000 | 0.000000 | 1.000000 | 0.000000 |
| 49 | 0.000000 | 2.833333 | 1.394433 | 509.750000 | 503.000000 | 512.000000 | 0.750000 | 503.000000 | 503.000000 | 503.000000 | 0.003108 | 1.000000 | 0.000000 | 1.833333 | 1.374369 | 0.000000 | 0.707107 |
| 50 | 0.000100 | 9.000000 | 0.000000 | 122.750000 | 110.000000 | 145.000000 | 0.000000 | 122.750000 | 110.000000 | 145.000000 | 0.069797 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 51 | 0.000000 | 3.525000 | 0.567891 | 512.000000 | 512.000000 | 512.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001601 | 1.000000 | 0.000000 | 2.900000 | 0.600000 | -0.375000 | 0.250000 |
| 52 | 0.000000 | 2.125000 | 0.763217 | 466.750000 | 331.000000 | 512.000000 | 0.750000 | 331.000000 | 331.000000 | 331.000000 | 0.014861 | 1.000000 | 0.000000 | 1.250000 | 0.754983 | -0.125000 | 0.250000 |
| 53 | 0.000000 | 4.000000 | 0.000000 | 28.500000 | 27.000000 | 29.000000 | 0.000000 | 28.500000 | 27.000000 | 29.000000 | 0.033278 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 54 | 0.000000 | 0.875000 | 3.330040 | 512.000000 | 512.000000 | 512.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001104 | 0.250000 | 1.500000 | 0.500000 | 1.249000 | 0.125000 | 0.853913 |
| 55 | 0.000300 | 2.500000 | 1.732051 | 88.250000 | 35.000000 | 145.000000 | 0.000000 | 88.250000 | 35.000000 | 145.000000 | 0.314434 | 1.000000 | 0.000000 | 0.500000 | 1.732051 | 1.000000 | 0.000000 |
| 56 | 0.000200 | 3.475000 | 0.585235 | 411.000000 | 305.000000 | 497.000000 | 0.000000 | 411.000000 | 305.000000 | 497.000000 | 0.187819 | 1.000000 | 0.000000 | 2.600000 | 0.692820 | -0.125000 | 0.250000 |
| 57 | 0.000100 | 7.910714 | 1.374420 | 222.750000 | 120.000000 | 512.000000 | 0.250000 | 126.333336 | 120.000000 | 134.000000 | 0.092671 | 1.000000 | 0.000000 | 6.285714 | 1.428571 | 0.625000 | 0.750000 |
| 58 | 0.000700 | 0.625000 | 0.750000 | 31.000000 | 22.000000 | 41.000000 | 0.000000 | 31.000000 | 22.000000 | 41.000000 | 0.745207 | 0.625000 | 0.750000 | -1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 59 | 0.000100 | 0.500000 | 0.408248 | 151.250000 | 67.000000 | 276.000000 | 0.000000 | 151.250000 | 67.000000 | 276.000000 | 0.149762 | 1.000000 | 0.000000 | -1.000000 | 0.000000 | 0.500000 | 0.408248 |
| 60 | 0.000100 | 3.250000 | 0.645497 | 401.250000 | 270.000000 | 512.000000 | 0.250000 | 364.333344 | 270.000000 | 418.000000 | 0.130219 | 1.000000 | 0.000000 | 2.500000 | 0.577350 | -0.250000 | 0.288675 |
| 61 | 0.000000 | 2.425000 | 1.284199 | 489.750000 | 423.000000 | 512.000000 | 0.750000 | 423.000000 | 423.000000 | 423.000000 | 0.047623 | 1.000000 | 0.000000 | 1.550000 | 1.330413 | -0.125000 | 0.478714 |
| 62 | 0.000100 | 8.750000 | 0.500000 | 192.750000 | 113.000000 | 393.000000 | 0.000000 | 192.750000 | 113.000000 | 393.000000 | 0.056066 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 0.750000 | 0.500000 |
| 63 | 0.000100 | 9.000000 | 0.000000 | 124.000000 | 108.000000 | 149.000000 | 0.000000 | 124.000000 | 108.000000 | 149.000000 | 0.096268 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 64 | 0.000100 | 8.285714 | 1.428571 | 121.000000 | 113.000000 | 133.000000 | 0.000000 | 121.000000 | 113.000000 | 133.000000 | 0.105250 | 1.000000 | 0.000000 | 6.285714 | 1.428571 | 1.000000 | 0.000000 |
| 65 | 0.000000 | 4.000000 | 0.000000 | 35.500000 | 34.000000 | 36.000000 | 0.000000 | 35.500000 | 34.000000 | 36.000000 | 0.021019 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 66 | 0.000100 | 4.000000 | 0.000000 | 125.250000 | 111.000000 | 156.000000 | 0.000000 | 125.250000 | 111.000000 | 156.000000 | 0.085538 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 67 | 0.000100 | 1.000000 | 0.000000 | 29.250000 | 28.000000 | 31.000000 | 0.000000 | 29.250000 | 28.000000 | 31.000000 | 0.085204 | 1.000000 | 0.000000 | -1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 68 | 0.000100 | 3.800000 | 0.600000 | 267.750000 | 224.000000 | 321.000000 | 0.000000 | 267.750000 | 224.000000 | 321.000000 | 0.060305 | 1.000000 | 0.000000 | 2.300000 | 0.600000 | 0.500000 | 0.000000 |
| 69 | 0.000000 | 2.700000 | 1.148912 | 427.750000 | 329.000000 | 512.000000 | 0.500000 | 343.500000 | 329.000000 | 358.000000 | 0.028149 | 1.000000 | 0.000000 | 1.700000 | 1.148913 | 0.000000 | 0.000000 |
| 70 | 0.000100 | 4.250000 | 5.484828 | 22.000000 | 19.000000 | 31.000000 | 0.000000 | 22.000000 | 19.000000 | 31.000000 | 0.087876 | 0.250000 | 0.866025 | 3.000000 | 4.618802 | 1.000000 | 0.000000 |
| 71 | 0.000100 | 3.400000 | 0.000000 | 45.500000 | 41.000000 | 52.000000 | 0.000000 | 45.500000 | 41.000000 | 52.000000 | 0.055051 | 1.000000 | 0.000000 | 1.400000 | 0.000000 | 1.000000 | 0.000000 |
| 72 | 0.000000 | 3.400000 | 0.000000 | 46.250000 | 42.000000 | 50.000000 | 0.000000 | 46.250000 | 42.000000 | 50.000000 | 0.041284 | 1.000000 | 0.000000 | 1.400000 | 0.000000 | 1.000000 | 0.000000 |
| 73 | 0.000000 | 4.000000 | 0.000000 | 35.500000 | 34.000000 | 36.000000 | 0.000000 | 35.500000 | 34.000000 | 36.000000 | 0.015990 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 74 | 0.000100 | 4.000000 | 0.000000 | 52.500000 | 21.000000 | 84.000000 | 0.000000 | 52.500000 | 21.000000 | 84.000000 | 0.108434 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 75 | 0.000100 | 3.375000 | 1.108678 | 369.500000 | 53.000000 | 512.000000 | 0.500000 | 227.000000 | 53.000000 | 401.000000 | 0.072731 | 1.000000 | 0.000000 | 2.000000 | 1.549193 | 0.375000 | 0.478714 |
| 76 | 0.000100 | 4.000000 | 0.000000 | 68.500000 | 28.000000 | 96.000000 | 0.000000 | 68.500000 | 28.000000 | 96.000000 | 0.105343 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 77 | 0.000100 | 2.875000 | 1.547848 | 364.000000 | 230.000000 | 512.000000 | 0.250000 | 314.666687 | 230.000000 | 441.000000 | 0.145283 | 1.000000 | 0.000000 | 2.000000 | 1.414214 | -0.125000 | 0.250000 |
| 78 | 0.000100 | 1.775000 | 0.939415 | 164.500000 | 122.000000 | 261.000000 | 0.000000 | 164.500000 | 122.000000 | 261.000000 | 0.062140 | 1.000000 | 0.000000 | -0.100000 | 1.148913 | 0.875000 | 0.250000 |
| 79 | 0.000000 | 3.500000 | 0.912871 | 419.250000 | 297.000000 | 512.000000 | 0.250000 | 388.333344 | 297.000000 | 450.000000 | 0.037394 | 1.000000 | 0.000000 | 2.500000 | 0.577350 | 0.000000 | 0.408248 |
| 80 | 0.000200 | 4.000000 | 0.000000 | 48.750000 | 29.000000 | 71.000000 | 0.000000 | 48.750000 | 29.000000 | 71.000000 | 0.171194 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 81 | 0.000100 | 3.400000 | 0.000000 | 52.250000 | 42.000000 | 60.000000 | 0.000000 | 52.250000 | 42.000000 | 60.000000 | 0.079432 | 1.000000 | 0.000000 | 1.400000 | 0.000000 | 1.000000 | 0.000000 |
| 82 | 0.000100 | 9.000000 | 0.000000 | 116.250000 | 105.000000 | 131.000000 | 0.000000 | 116.250000 | 105.000000 | 131.000000 | 0.104476 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 83 | 0.000200 | 2.950000 | 0.544671 | 345.250000 | 194.000000 | 512.000000 | 0.250000 | 289.666687 | 194.000000 | 450.000000 | 0.172505 | 1.000000 | 0.000000 | 1.700000 | 1.039230 | 0.250000 | 0.645497 |
| 84 | 0.000200 | 3.250000 | 1.500000 | 69.750000 | 36.000000 | 117.000000 | 0.000000 | 69.750000 | 36.000000 | 117.000000 | 0.191878 | 1.000000 | 0.000000 | 1.250000 | 1.500000 | 1.000000 | 0.000000 |
| 85 | 0.000000 | 3.583333 | 1.058476 | 512.000000 | 512.000000 | 512.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.003511 | 1.000000 | 0.000000 | 2.833333 | 0.838871 | -0.250000 | 0.288675 |
| 86 | 0.012300 | -0.250000 | 2.500000 | 28.250000 | 15.000000 | 42.000000 | 0.000000 | 28.250000 | 15.000000 | 42.000000 | 12.260892 | 0.250000 | 1.500000 | -1.000000 | 0.000000 | 0.500000 | 1.000000 |
| 87 | 0.000100 | 8.875000 | 0.250000 | 121.500000 | 103.000000 | 170.000000 | 0.000000 | 121.500000 | 103.000000 | 170.000000 | 0.072470 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 0.875000 | 0.250000 |
| 88 | 0.011500 | 3.400000 | 0.000000 | 45.250000 | 42.000000 | 55.000000 | 0.000000 | 45.250000 | 42.000000 | 55.000000 | 11.519933 | 1.000000 | 0.000000 | 1.400000 | 0.000000 | 1.000000 | 0.000000 |
| 89 | 0.000000 | 3.400000 | 0.000000 | 42.750000 | 42.000000 | 43.000000 | 0.000000 | 42.750000 | 42.000000 | 43.000000 | 0.012406 | 1.000000 | 0.000000 | 1.400000 | 0.000000 | 1.000000 | 0.000000 |
| 90 | 0.000000 | 9.000000 | 0.000000 | 21.000000 | 21.000000 | 21.000000 | 0.000000 | 21.000000 | 21.000000 | 21.000000 | 0.000235 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 91 | 0.000200 | 2.475000 | 0.567891 | 255.000000 | 175.000000 | 384.000000 | 0.000000 | 255.000000 | 175.000000 | 384.000000 | 0.198907 | 1.000000 | 0.000000 | 1.100000 | 0.600000 | 0.375000 | 0.250000 |
| 92 | 0.000000 | 2.775000 | 0.550000 | 512.000000 | 512.000000 | 512.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.003558 | 1.000000 | 0.000000 | 2.150000 | 0.300000 | -0.375000 | 0.250000 |
| 93 | 0.000200 | 3.000000 | 0.707107 | 426.000000 | 269.000000 | 512.000000 | 0.500000 | 340.000000 | 269.000000 | 411.000000 | 0.180130 | 1.000000 | 0.000000 | 2.375000 | 0.750000 | -0.375000 | 0.250000 |
| 94 | 0.000100 | 3.000000 | 0.707107 | 465.750000 | 327.000000 | 512.000000 | 0.750000 | 327.000000 | 327.000000 | 327.000000 | 0.070233 | 1.000000 | 0.000000 | 2.375000 | 0.750000 | -0.375000 | 0.250000 |
| 95 | 0.000100 | 0.500000 | 6.137318 | 33.750000 | 29.000000 | 38.000000 | 0.000000 | 33.750000 | 29.000000 | 38.000000 | 0.126376 | -0.500000 | 1.732051 | 1.000000 | 4.000000 | 0.000000 | 1.154701 |
| 96 | 0.000100 | 3.475000 | 0.888351 | 407.750000 | 316.000000 | 512.000000 | 0.250000 | 373.000000 | 316.000000 | 461.000000 | 0.099564 | 1.000000 | 0.000000 | 2.600000 | 0.979796 | -0.125000 | 0.478714 |
| 97 | 0.000200 | 4.000000 | 0.000000 | 49.250000 | 34.000000 | 64.000000 | 0.000000 | 49.250000 | 34.000000 | 64.000000 | 0.180455 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 98 | 0.000000 | 4.000000 | 0.000000 | 38.000000 | 38.000000 | 38.000000 | 0.000000 | 38.000000 | 38.000000 | 38.000000 | 0.000170 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 99 | 0.000200 | 9.000000 | 0.000000 | 117.250000 | 110.000000 | 137.000000 | 0.000000 | 117.250000 | 110.000000 | 137.000000 | 0.154434 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 100 | 0.000100 | 4.000000 | 0.000000 | 35.000000 | 34.000000 | 36.000000 | 0.000000 | 35.000000 | 34.000000 | 36.000000 | 0.102506 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 101 | 0.000100 | 3.125000 | 1.034005 | 375.250000 | 301.000000 | 512.000000 | 0.250000 | 329.666687 | 301.000000 | 356.000000 | 0.089974 | 1.000000 | 0.000000 | 2.000000 | 1.200000 | 0.125000 | 0.478714 |
| 102 | 0.000100 | 4.000000 | 0.000000 | 35.500000 | 35.000000 | 36.000000 | 0.000000 | 35.500000 | 35.000000 | 36.000000 | 0.077897 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 103 | 0.000200 | 3.875000 | 0.250000 | 89.250000 | 57.000000 | 126.000000 | 0.000000 | 89.250000 | 57.000000 | 126.000000 | 0.216935 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 0.875000 | 0.250000 |
| 104 | 0.000000 | 4.175000 | 0.450000 | 446.750000 | 366.000000 | 512.000000 | 0.250000 | 425.000000 | 366.000000 | 483.000000 | 0.035368 | 1.000000 | 0.000000 | 3.050000 | 0.300000 | 0.125000 | 0.250000 |
| 105 | 0.000100 | 3.400000 | 0.000000 | 44.000000 | 40.000000 | 46.000000 | 0.000000 | 44.000000 | 40.000000 | 46.000000 | 0.074573 | 1.000000 | 0.000000 | 1.400000 | 0.000000 | 1.000000 | 0.000000 |
| 106 | 0.000300 | 4.000000 | 0.000000 | 35.750000 | 34.000000 | 39.000000 | 0.000000 | 35.750000 | 34.000000 | 39.000000 | 0.288594 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 107 | 0.000000 | 3.700000 | 0.600000 | 71.000000 | 42.000000 | 156.000000 | 0.000000 | 71.000000 | 42.000000 | 156.000000 | 0.045513 | 1.000000 | 0.000000 | 1.700000 | 0.600000 | 1.000000 | 0.000000 |
| 108 | 0.000100 | 4.000000 | 0.336650 | 366.250000 | 358.000000 | 381.000000 | 0.000000 | 366.250000 | 358.000000 | 381.000000 | 0.062347 | 1.000000 | 0.000000 | 2.750000 | 0.574456 | 0.250000 | 0.288675 |
| 109 | 0.000000 | 2.500000 | 2.121320 | 512.000000 | 512.000000 | 512.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.005927 | 1.000000 | 0.000000 | 2.000000 | 2.121320 | -0.500000 | 0.000000 |
| 110 | 0.000300 | 4.000000 | 0.000000 | 32.500000 | 22.000000 | 38.000000 | 0.000000 | 32.500000 | 22.000000 | 38.000000 | 0.291738 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 111 | 0.000200 | 1.000000 | 0.000000 | 106.500000 | 97.000000 | 124.000000 | 0.000000 | 106.500000 | 97.000000 | 124.000000 | 0.203755 | 1.000000 | 0.000000 | -1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 112 | 0.000100 | 4.500000 | 0.707107 | 359.000000 | 299.000000 | 422.000000 | 0.000000 | 359.000000 | 299.000000 | 422.000000 | 0.053961 | 1.000000 | 0.000000 | 3.500000 | 0.577350 | 0.000000 | 0.408248 |
| 113 | 0.000100 | 4.375000 | 0.750000 | 342.000000 | 311.000000 | 375.000000 | 0.000000 | 342.000000 | 311.000000 | 375.000000 | 0.064589 | 1.000000 | 0.000000 | 3.250000 | 0.957427 | 0.125000 | 0.250000 |
| 114 | 0.000000 | 4.333333 | 0.490654 | 512.000000 | 512.000000 | 512.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.003916 | 1.000000 | 0.000000 | 3.333333 | 0.860663 | 0.000000 | 0.408248 |
| 115 | 0.000000 | 4.000000 | 0.000000 | 29.000000 | 29.000000 | 29.000000 | 0.000000 | 29.000000 | 29.000000 | 29.000000 | 0.011996 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 116 | 0.000400 | 0.625000 | 0.750000 | 30.750000 | 22.000000 | 40.000000 | 0.000000 | 30.750000 | 22.000000 | 40.000000 | 0.391980 | 0.625000 | 0.750000 | -1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 117 | 0.000200 | 9.000000 | 0.000000 | 122.500000 | 109.000000 | 143.000000 | 0.000000 | 122.500000 | 109.000000 | 143.000000 | 0.214418 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 118 | 0.000200 | 4.000000 | 1.549193 | 91.750000 | 41.000000 | 154.000000 | 0.000000 | 91.750000 | 41.000000 | 154.000000 | 0.188615 | 1.000000 | 0.000000 | 2.000000 | 1.549193 | 1.000000 | 0.000000 |
| 119 | 0.000200 | 9.000000 | 0.000000 | 125.500000 | 114.000000 | 146.000000 | 0.000000 | 125.500000 | 114.000000 | 146.000000 | 0.198061 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 120 | 0.000700 | 4.000000 | 0.000000 | 82.250000 | 29.000000 | 141.000000 | 0.000000 | 82.250000 | 29.000000 | 141.000000 | 0.694971 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 121 | 0.000100 | 4.200000 | 0.000000 | 390.000000 | 351.000000 | 433.000000 | 0.000000 | 390.000000 | 351.000000 | 433.000000 | 0.076566 | 1.000000 | 0.000000 | 3.200000 | 0.000000 | 0.000000 | 0.000000 |
| 122 | 0.000300 | 9.000000 | 0.000000 | 123.250000 | 110.000000 | 151.000000 | 0.000000 | 123.250000 | 110.000000 | 151.000000 | 0.266576 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 123 | 0.000000 | 9.000000 | 0.000000 | 21.000000 | 21.000000 | 21.000000 | 0.000000 | 21.000000 | 21.000000 | 21.000000 | 0.001229 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 124 | 0.000200 | 8.285714 | 1.428571 | 106.250000 | 104.000000 | 112.000000 | 0.000000 | 106.250000 | 104.000000 | 112.000000 | 0.235821 | 1.000000 | 0.000000 | 6.285714 | 1.428571 | 1.000000 | 0.000000 |
| 125 | 0.000100 | 4.000000 | 0.000000 | 35.500000 | 35.000000 | 36.000000 | 0.000000 | 35.500000 | 35.000000 | 36.000000 | 0.067747 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 126 | 0.000000 | 4.000000 | 1.000000 | 443.250000 | 274.000000 | 512.000000 | 0.500000 | 374.500000 | 274.000000 | 475.000000 | 0.047966 | 1.000000 | 0.000000 | 3.250000 | 0.957427 | -0.250000 | 0.500000 |
| 127 | 0.000100 | 4.000000 | 0.000000 | 35.500000 | 35.000000 | 36.000000 | 0.000000 | 35.500000 | 35.000000 | 36.000000 | 0.066826 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 128 | 0.000200 | 9.000000 | 0.000000 | 110.750000 | 106.000000 | 118.000000 | 0.000000 | 110.750000 | 106.000000 | 118.000000 | 0.212088 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 129 | 0.000000 | 3.625000 | 0.727438 | 512.000000 | 512.000000 | 512.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.006357 | 1.000000 | 0.000000 | 2.750000 | 1.024695 | -0.125000 | 0.478714 |
| 130 | 0.000300 | 5.375000 | 0.567891 | 165.000000 | 154.000000 | 188.000000 | 0.000000 | 165.000000 | 154.000000 | 188.000000 | 0.278636 | 1.000000 | 0.000000 | 3.500000 | 0.600000 | 0.875000 | 0.250000 |
| 131 | 0.000100 | 3.525000 | 1.114675 | 271.500000 | 168.000000 | 412.000000 | 0.000000 | 271.500000 | 168.000000 | 412.000000 | 0.119003 | 1.000000 | 0.000000 | 2.150000 | 1.236932 | 0.375000 | 0.250000 |
| 132 | 0.000200 | 9.000000 | 0.000000 | 109.500000 | 103.000000 | 115.000000 | 0.000000 | 109.500000 | 103.000000 | 115.000000 | 0.181959 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 133 | 0.000000 | 2.250000 | 1.936492 | 462.000000 | 368.000000 | 512.000000 | 0.500000 | 412.000000 | 368.000000 | 456.000000 | 0.037850 | 1.000000 | 0.000000 | 1.625000 | 1.887459 | -0.375000 | 0.250000 |
| 134 | 0.000300 | 7.750000 | 2.500000 | 25.250000 | 19.000000 | 32.000000 | 0.000000 | 25.250000 | 19.000000 | 32.000000 | 0.349184 | 1.000000 | 0.000000 | 5.750000 | 2.500000 | 1.000000 | 0.000000 |
| 135 | 0.000100 | 4.500000 | 2.415230 | 254.750000 | 112.000000 | 512.000000 | 0.250000 | 169.000000 | 112.000000 | 277.000000 | 0.096989 | 1.000000 | 0.000000 | 3.250000 | 2.500000 | 0.250000 | 0.645497 |
| 136 | 0.010000 | 4.000000 | 0.000000 | 77.750000 | 56.000000 | 101.000000 | 0.000000 | 77.750000 | 56.000000 | 101.000000 | 9.955448 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 137 | 0.002000 | 4.000000 | 0.000000 | 92.750000 | 63.000000 | 105.000000 | 0.000000 | 92.750000 | 63.000000 | 105.000000 | 2.026059 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 138 | 0.000000 | 4.000000 | 0.000000 | 35.750000 | 35.000000 | 36.000000 | 0.000000 | 35.750000 | 35.000000 | 36.000000 | 0.032303 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 139 | 0.000200 | 7.000000 | 4.000000 | 35.500000 | 28.000000 | 38.000000 | 0.000000 | 35.500000 | 28.000000 | 38.000000 | 0.230454 | 1.000000 | 0.000000 | 5.000000 | 4.000000 | 1.000000 | 0.000000 |
| 140 | 0.003800 | 3.750000 | 0.288675 | 114.250000 | 44.000000 | 210.000000 | 0.000000 | 114.250000 | 44.000000 | 210.000000 | 3.792946 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 0.750000 | 0.288675 |
| 141 | 0.000100 | 4.000000 | 0.000000 | 35.500000 | 34.000000 | 38.000000 | 0.000000 | 35.500000 | 34.000000 | 38.000000 | 0.088249 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 142 | 0.000000 | 3.825000 | 0.585235 | 481.000000 | 388.000000 | 512.000000 | 0.750000 | 388.000000 | 388.000000 | 388.000000 | 0.037189 | 1.000000 | 0.000000 | 3.200000 | 0.692820 | -0.375000 | 0.250000 |
| 143 | 0.000200 | 3.875000 | 0.250000 | 134.250000 | 105.000000 | 200.000000 | 0.000000 | 134.250000 | 105.000000 | 200.000000 | 0.236162 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 0.875000 | 0.250000 |
| 144 | 0.000100 | 4.375000 | 0.567891 | 438.750000 | 403.000000 | 477.000000 | 0.000000 | 438.750000 | 403.000000 | 477.000000 | 0.103981 | 1.000000 | 0.000000 | 3.500000 | 0.600000 | -0.125000 | 0.250000 |
| 145 | 0.000000 | 2.175000 | 4.141155 | 494.500000 | 442.000000 | 512.000000 | 0.750000 | 442.000000 | 442.000000 | 442.000000 | 0.038180 | 0.250000 | 1.500000 | 2.300000 | 2.218107 | -0.375000 | 0.478714 |
| 146 | 0.001100 | 2.875000 | 3.750000 | 89.000000 | 25.000000 | 281.000000 | 0.000000 | 89.000000 | 25.000000 | 281.000000 | 1.077644 | 1.000000 | 0.000000 | 1.000000 | 4.000000 | 0.875000 | 0.250000 |
| 147 | 0.000100 | 2.950000 | 0.493288 | 296.500000 | 239.000000 | 395.000000 | 0.000000 | 296.500000 | 239.000000 | 395.000000 | 0.137546 | 1.000000 | 0.000000 | 1.700000 | 0.600000 | 0.250000 | 0.288675 |
| 148 | 0.000300 | 9.000000 | 0.000000 | 114.750000 | 103.000000 | 124.000000 | 0.000000 | 114.750000 | 103.000000 | 124.000000 | 0.254364 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 149 | 0.000100 | 1.000000 | 0.000000 | 29.250000 | 29.000000 | 30.000000 | 0.000000 | 29.250000 | 29.000000 | 30.000000 | 0.106632 | 1.000000 | 0.000000 | -1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 150 | 0.000200 | 2.425000 | 0.950000 | 225.250000 | 164.000000 | 280.000000 | 0.000000 | 225.250000 | 164.000000 | 280.000000 | 0.168158 | 1.000000 | 0.000000 | 0.800000 | 1.200000 | 0.625000 | 0.250000 |
| 151 | 0.000300 | 5.800000 | 0.000000 | 160.000000 | 154.000000 | 166.000000 | 0.000000 | 160.000000 | 154.000000 | 166.000000 | 0.298264 | 1.000000 | 0.000000 | 3.800000 | 0.000000 | 1.000000 | 0.000000 |
| 152 | 0.000200 | 3.975000 | 1.257975 | 263.000000 | 184.000000 | 376.000000 | 0.000000 | 263.000000 | 184.000000 | 376.000000 | 0.162051 | 1.000000 | 0.000000 | 2.600000 | 1.385641 | 0.375000 | 0.250000 |
| 153 | 0.000000 | 3.850000 | 0.300000 | 512.000000 | 512.000000 | 512.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.014029 | 1.000000 | 0.000000 | 3.350000 | 0.300000 | -0.500000 | 0.000000 |
| 154 | 0.000800 | 4.000000 | 0.000000 | 78.000000 | 35.000000 | 135.000000 | 0.000000 | 78.000000 | 35.000000 | 135.000000 | 0.784000 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 155 | 0.000400 | 2.775000 | 0.250000 | 276.750000 | 195.000000 | 440.000000 | 0.000000 | 276.750000 | 195.000000 | 440.000000 | 0.444051 | 1.000000 | 0.000000 | 1.400000 | 0.000000 | 0.375000 | 0.250000 |
| 156 | 0.000200 | 3.850000 | 0.640312 | 343.500000 | 245.000000 | 512.000000 | 0.250000 | 287.333344 | 245.000000 | 316.000000 | 0.179947 | 1.000000 | 0.000000 | 2.600000 | 0.979796 | 0.250000 | 0.500000 |
| 157 | 0.001500 | 2.875000 | 3.750000 | 78.250000 | 25.000000 | 223.000000 | 0.000000 | 78.250000 | 25.000000 | 223.000000 | 1.533420 | 1.000000 | 0.000000 | 1.000000 | 4.000000 | 0.875000 | 0.250000 |
| 158 | 0.000000 | 3.475000 | 0.708872 | 512.000000 | 512.000000 | 512.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.007744 | 1.000000 | 0.000000 | 2.600000 | 0.489898 | -0.125000 | 0.250000 |
| 159 | 0.000400 | 5.675000 | 0.250000 | 160.250000 | 146.000000 | 174.000000 | 0.000000 | 160.250000 | 146.000000 | 174.000000 | 0.355112 | 1.000000 | 0.000000 | 3.800000 | 0.000000 | 0.875000 | 0.250000 |
| 160 | 0.000400 | 5.500000 | 0.600000 | 136.750000 | 110.000000 | 153.000000 | 0.000000 | 136.750000 | 110.000000 | 153.000000 | 0.411201 | 1.000000 | 0.000000 | 3.500000 | 0.600000 | 1.000000 | 0.000000 |
| 161 | 0.000300 | 9.000000 | 0.000000 | 115.000000 | 105.000000 | 125.000000 | 0.000000 | 115.000000 | 105.000000 | 125.000000 | 0.338842 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 162 | 0.000000 | 4.350000 | 0.574456 | 493.250000 | 437.000000 | 512.000000 | 0.750000 | 437.000000 | 437.000000 | 437.000000 | 0.048110 | 1.000000 | 0.000000 | 3.350000 | 0.574456 | 0.000000 | 0.000000 |
| 163 | 0.000200 | 2.525000 | 0.250000 | 306.000000 | 266.000000 | 345.000000 | 0.000000 | 306.000000 | 266.000000 | 345.000000 | 0.223661 | 1.000000 | 0.000000 | 1.400000 | 0.000000 | 0.125000 | 0.250000 |
| 164 | 0.003200 | 3.875000 | 4.090130 | 64.750000 | 13.000000 | 116.000000 | 0.000000 | 64.750000 | 13.000000 | 116.000000 | 3.229900 | 1.000000 | 0.000000 | 2.500000 | 3.316625 | 0.375000 | 0.946485 |
| 165 | 0.000500 | 4.000000 | 0.000000 | 93.500000 | 64.000000 | 122.000000 | 0.000000 | 93.500000 | 64.000000 | 122.000000 | 0.452488 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 166 | 0.000000 | 9.000000 | 0.000000 | 19.000000 | 19.000000 | 19.000000 | 0.000000 | 19.000000 | 19.000000 | 19.000000 | 0.012676 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 167 | 0.000200 | 1.125000 | 1.600781 | 224.500000 | 186.000000 | 274.000000 | 0.000000 | 224.500000 | 186.000000 | 274.000000 | 0.241273 | 1.000000 | 0.000000 | -0.250000 | 1.500000 | 0.375000 | 0.250000 |
| 168 | 0.000100 | 2.275000 | 0.478714 | 460.500000 | 343.000000 | 512.000000 | 0.500000 | 409.000000 | 343.000000 | 475.000000 | 0.102879 | 1.000000 | 0.000000 | 1.400000 | 0.000000 | -0.125000 | 0.478714 |
| 169 | 0.000300 | 4.000000 | 0.000000 | 36.250000 | 35.000000 | 39.000000 | 0.000000 | 36.250000 | 35.000000 | 39.000000 | 0.256698 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 170 | 0.000000 | 4.000000 | 0.000000 | 35.750000 | 35.000000 | 36.000000 | 0.000000 | 35.750000 | 35.000000 | 36.000000 | 0.032592 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 171 | 0.000100 | 4.000000 | 0.000000 | 35.250000 | 35.000000 | 36.000000 | 0.000000 | 35.250000 | 35.000000 | 36.000000 | 0.084968 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 172 | 0.000100 | 4.750000 | 0.288675 | 394.750000 | 296.000000 | 512.000000 | 0.250000 | 355.666687 | 296.000000 | 460.000000 | 0.137449 | 1.000000 | 0.000000 | 3.750000 | 0.500000 | 0.000000 | 0.408248 |
| 173 | 0.000600 | 4.000000 | 0.000000 | 28.500000 | 22.000000 | 38.000000 | 0.000000 | 28.500000 | 22.000000 | 38.000000 | 0.550399 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 174 | 0.000400 | 9.000000 | 0.000000 | 125.750000 | 104.000000 | 165.000000 | 0.000000 | 125.750000 | 104.000000 | 165.000000 | 0.373991 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 175 | 0.000200 | 3.800000 | 1.219290 | 293.000000 | 211.000000 | 334.000000 | 0.000000 | 293.000000 | 211.000000 | 334.000000 | 0.221069 | 1.000000 | 0.000000 | 2.300000 | 1.148913 | 0.500000 | 0.408248 |
| 176 | 0.000000 | 4.075000 | 0.262996 | 496.000000 | 448.000000 | 512.000000 | 0.750000 | 448.000000 | 448.000000 | 448.000000 | 0.032946 | 1.000000 | 0.000000 | 3.200000 | 0.489898 | -0.125000 | 0.478714 |
| 177 | 0.000000 | 0.600000 | 3.814883 | 494.250000 | 441.000000 | 512.000000 | 0.750000 | 441.000000 | 441.000000 | 441.000000 | 0.032072 | 0.250000 | 1.500000 | 1.100000 | 2.473863 | -0.750000 | 0.288675 |
| 178 | 0.000100 | 4.375000 | 1.600781 | 357.000000 | 299.000000 | 411.000000 | 0.000000 | 357.000000 | 299.000000 | 411.000000 | 0.148686 | 1.000000 | 0.000000 | 3.250000 | 1.500000 | 0.125000 | 0.250000 |
| 179 | 0.000500 | 3.850000 | 1.674316 | 317.000000 | 209.000000 | 424.000000 | 0.000000 | 317.000000 | 209.000000 | 424.000000 | 0.473133 | 1.000000 | 0.000000 | 2.600000 | 1.385641 | 0.250000 | 0.288675 |
| 180 | 0.000300 | 4.000000 | 0.000000 | 114.000000 | 104.000000 | 135.000000 | 0.000000 | 114.000000 | 104.000000 | 135.000000 | 0.276366 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 181 | 0.000100 | 4.175000 | 0.050000 | 393.000000 | 370.000000 | 438.000000 | 0.000000 | 393.000000 | 370.000000 | 438.000000 | 0.146656 | 1.000000 | 0.000000 | 3.050000 | 0.300000 | 0.125000 | 0.250000 |
| 182 | 0.002800 | 8.875000 | 0.250000 | 131.750000 | 103.000000 | 211.000000 | 0.000000 | 131.750000 | 103.000000 | 211.000000 | 2.820262 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 0.875000 | 0.250000 |
| 183 | 0.000300 | 9.000000 | 0.000000 | 107.250000 | 104.000000 | 113.000000 | 0.000000 | 107.250000 | 104.000000 | 113.000000 | 0.328666 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 184 | 0.000100 | 3.875000 | 2.809952 | 374.500000 | 258.000000 | 437.000000 | 0.000000 | 374.500000 | 258.000000 | 437.000000 | 0.147428 | 1.000000 | 0.000000 | 3.250000 | 2.783882 | -0.375000 | 0.250000 |
| 185 | 0.000100 | 2.250000 | 0.500000 | 484.750000 | 403.000000 | 512.000000 | 0.750000 | 403.000000 | 403.000000 | 403.000000 | 0.050733 | 1.000000 | 0.000000 | 1.625000 | 0.750000 | -0.375000 | 0.250000 |
| 186 | 0.000700 | 8.625000 | 0.750000 | 210.250000 | 107.000000 | 512.000000 | 0.250000 | 109.666672 | 107.000000 | 115.000000 | 0.707976 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 0.625000 | 0.750000 |
| 187 | 0.000400 | 5.800000 | 0.000000 | 163.250000 | 147.000000 | 179.000000 | 0.000000 | 163.250000 | 147.000000 | 179.000000 | 0.411070 | 1.000000 | 0.000000 | 3.800000 | 0.000000 | 1.000000 | 0.000000 |
| 188 | 0.000200 | 4.200000 | 0.000000 | 374.250000 | 358.000000 | 389.000000 | 0.000000 | 374.250000 | 358.000000 | 389.000000 | 0.155256 | 1.000000 | 0.000000 | 3.200000 | 0.000000 | 0.000000 | 0.000000 |
| 189 | 0.000300 | 3.625000 | 0.250000 | 90.750000 | 61.000000 | 108.000000 | 0.000000 | 90.750000 | 61.000000 | 108.000000 | 0.281072 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 0.625000 | 0.250000 |
| 190 | 0.000100 | 1.000000 | 0.000000 | 30.000000 | 29.000000 | 31.000000 | 0.000000 | 30.000000 | 29.000000 | 31.000000 | 0.107063 | 1.000000 | 0.000000 | -1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 191 | 0.000000 | 9.000000 | 0.000000 | 21.000000 | 21.000000 | 21.000000 | 0.000000 | 21.000000 | 21.000000 | 21.000000 | 0.001896 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 192 | 0.000400 | 5.800000 | 0.000000 | 150.000000 | 135.000000 | 160.000000 | 0.000000 | 150.000000 | 135.000000 | 160.000000 | 0.406299 | 1.000000 | 0.000000 | 3.800000 | 0.000000 | 1.000000 | 0.000000 |
| 193 | 0.000100 | 4.125000 | 0.478714 | 385.000000 | 260.000000 | 459.000000 | 0.000000 | 385.000000 | 260.000000 | 459.000000 | 0.132987 | 1.000000 | 0.000000 | 3.250000 | 0.500000 | -0.125000 | 0.478714 |
| 194 | 0.000000 | 4.000000 | 0.000000 | 34.750000 | 34.000000 | 35.000000 | 0.000000 | 34.750000 | 34.000000 | 35.000000 | 0.032722 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 195 | 0.000200 | 3.500000 | 0.496655 | 251.000000 | 183.000000 | 348.000000 | 0.000000 | 251.000000 | 183.000000 | 348.000000 | 0.207204 | 1.000000 | 0.000000 | 2.000000 | 0.692820 | 0.500000 | 0.408248 |
| 196 | 0.000100 | 4.750000 | 1.848423 | 223.500000 | 58.000000 | 510.000000 | 0.000000 | 223.500000 | 58.000000 | 510.000000 | 0.120732 | 1.000000 | 0.000000 | 3.250000 | 2.500000 | 0.500000 | 0.707107 |
| 197 | 0.000500 | 1.000000 | 0.000000 | 62.250000 | 32.000000 | 88.000000 | 0.000000 | 62.250000 | 32.000000 | 88.000000 | 0.535439 | 1.000000 | 0.000000 | -1.000000 | 0.000000 | 1.000000 | 0.000000 |
| 198 | 0.003000 | 2.750000 | 2.500000 | 70.000000 | 13.000000 | 103.000000 | 0.000000 | 70.000000 | 13.000000 | 103.000000 | 2.958734 | 1.000000 | 0.000000 | 1.250000 | 1.500000 | 0.500000 | 1.000000 |
| 199 | 0.000400 | 3.875000 | 0.250000 | 75.000000 | 56.000000 | 119.000000 | 0.000000 | 75.000000 | 56.000000 | 119.000000 | 0.403374 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 0.875000 | 0.250000 |
| 200 | 0.000200 | 8.625000 | 0.750000 | 237.500000 | 104.000000 | 512.000000 | 0.250000 | 146.000000 | 104.000000 | 221.000000 | 0.179483 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 0.625000 | 0.750000 |
| 201 | 0.000100 | 2.350000 | 4.239890 | 483.250000 | 397.000000 | 512.000000 | 0.750000 | 397.000000 | 397.000000 | 397.000000 | 0.089423 | 0.250000 | 1.500000 | 2.600000 | 2.400000 | -0.500000 | 0.408248 |
| 202 | 0.000400 | 9.000000 | 0.000000 | 114.000000 | 110.000000 | 120.000000 | 0.000000 | 114.000000 | 110.000000 | 120.000000 | 0.378740 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 203 | 0.000400 | 5.800000 | 0.000000 | 160.750000 | 150.000000 | 170.000000 | 0.000000 | 160.750000 | 150.000000 | 170.000000 | 0.386850 | 1.000000 | 0.000000 | 3.800000 | 0.000000 | 1.000000 | 0.000000 |
| 204 | 0.000000 | 9.000000 | 0.000000 | 38.000000 | 38.000000 | 38.000000 | 0.000000 | 38.000000 | 38.000000 | 38.000000 | 0.028471 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 205 | 0.000100 | 4.000000 | 0.000000 | 34.000000 | 32.000000 | 35.000000 | 0.000000 | 34.000000 | 32.000000 | 35.000000 | 0.071549 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 206 | 0.000800 | 3.250000 | 1.500000 | 37.750000 | 28.000000 | 54.000000 | 0.000000 | 37.750000 | 28.000000 | 54.000000 | 0.786670 | 1.000000 | 0.000000 | 1.250000 | 1.500000 | 1.000000 | 0.000000 |
| 207 | 0.000000 | 4.166667 | 0.544331 | 512.000000 | 512.000000 | 512.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.014494 | 1.000000 | 0.000000 | 3.666667 | 0.544331 | -0.500000 | 0.000000 |
| 208 | 0.000200 | 4.325000 | 0.250000 | 355.500000 | 328.000000 | 389.000000 | 0.000000 | 355.500000 | 328.000000 | 389.000000 | 0.167684 | 1.000000 | 0.000000 | 3.200000 | 0.000000 | 0.125000 | 0.250000 |
| 209 | 0.000900 | 4.000000 | 0.000000 | 40.500000 | 34.000000 | 51.000000 | 0.000000 | 40.500000 | 34.000000 | 51.000000 | 0.853466 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 210 | 0.000100 | 4.250000 | 0.866025 | 368.750000 | 300.000000 | 512.000000 | 0.250000 | 321.000000 | 300.000000 | 345.000000 | 0.129839 | 1.000000 | 0.000000 | 3.250000 | 0.957427 | 0.000000 | 0.408248 |
| 211 | 0.000100 | 2.625000 | 1.250000 | 491.500000 | 430.000000 | 512.000000 | 0.750000 | 430.000000 | 430.000000 | 430.000000 | 0.051408 | 1.000000 | 0.000000 | 2.000000 | 1.224745 | -0.375000 | 0.250000 |
| 212 | 0.000200 | 3.375000 | 1.450000 | 256.000000 | 190.000000 | 365.000000 | 0.000000 | 256.000000 | 190.000000 | 365.000000 | 0.183175 | 1.000000 | 0.000000 | 2.000000 | 1.200000 | 0.375000 | 0.250000 |
| 213 | 0.000700 | 9.000000 | 0.000000 | 112.750000 | 111.000000 | 114.000000 | 0.000000 | 112.750000 | 111.000000 | 114.000000 | 0.713163 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 214 | 0.000400 | 1.875000 | 1.600781 | 200.000000 | 114.000000 | 287.000000 | 0.000000 | 200.000000 | 114.000000 | 287.000000 | 0.397298 | 1.000000 | 0.000000 | 0.500000 | 1.732051 | 0.375000 | 0.250000 |
| 215 | 0.000400 | 9.000000 | 0.000000 | 115.500000 | 106.000000 | 139.000000 | 0.000000 | 115.500000 | 106.000000 | 139.000000 | 0.366354 | 1.000000 | 0.000000 | 7.000000 | 0.000000 | 1.000000 | 0.000000 |
| 216 | 0.000200 | 5.250000 | 1.554563 | 350.000000 | 180.000000 | 512.000000 | 0.250000 | 296.000000 | 180.000000 | 474.000000 | 0.205716 | 1.000000 | 0.000000 | 4.375000 | 1.750000 | -0.125000 | 0.478714 |
| 217 | 0.000200 | 2.475000 | 0.850000 | 259.750000 | 189.000000 | 386.000000 | 0.000000 | 259.750000 | 189.000000 | 386.000000 | 0.157238 | 1.000000 | 0.000000 | 1.100000 | 0.600000 | 0.375000 | 0.250000 |
| 218 | 0.000000 | 4.000000 | 0.000000 | 28.750000 | 28.000000 | 29.000000 | 0.000000 | 28.750000 | 28.000000 | 29.000000 | 0.030768 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 219 | 0.000200 | 3.875000 | 0.250000 | 109.750000 | 42.000000 | 207.000000 | 0.000000 | 109.750000 | 42.000000 | 207.000000 | 0.195917 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 0.875000 | 0.250000 |
| 220 | 0.000200 | 2.950000 | 0.100000 | 307.000000 | 212.000000 | 512.000000 | 0.250000 | 238.666672 | 212.000000 | 261.000000 | 0.189439 | 1.000000 | 0.000000 | 1.700000 | 0.600000 | 0.250000 | 0.500000 |
| 221 | 0.000100 | 4.000000 | 0.000000 | 35.500000 | 35.000000 | 36.000000 | 0.000000 | 35.500000 | 35.000000 | 36.000000 | 0.050448 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 1.000000 | 0.000000 |
| 222 | 0.000200 | 4.750000 | 0.500000 | 335.750000 | 301.000000 | 354.000000 | 0.000000 | 335.750000 | 301.000000 | 354.000000 | 0.177392 | 1.000000 | 0.000000 | 3.750000 | 0.500000 | 0.000000 | 0.000000 |
| 223 | 0.000500 | 5.800000 | 0.000000 | 142.750000 | 135.000000 | 155.000000 | 0.000000 | 142.750000 | 135.000000 | 155.000000 | 0.472328 | 1.000000 | 0.000000 | 3.800000 | 0.000000 | 1.000000 | 0.000000 |
| 224 | 0.000100 | 4.275000 | 1.372042 | 316.750000 | 227.000000 | 500.000000 | 0.000000 | 316.750000 | 227.000000 | 500.000000 | 0.139120 | 1.000000 | 0.000000 | 2.900000 | 1.148913 | 0.375000 | 0.250000 |
2250.0001004.0000000.00000035.00000034.00000036.0000000.00000035.00000034.00000036.0000000.0782141.0000000.0000002.0000000.0000001.0000000.000000
2260.0001003.8250000.708872500.500000466.000000512.0000000.750000466.000000466.000000466.0000000.0506741.0000000.0000003.2000000.489898-0.3750000.250000
2270.0001004.8750000.478714391.500000280.000000512.0000000.250000351.333344280.000000440.0000000.1256861.0000000.0000003.7500000.5000000.1250000.478714
2280.0001004.2083330.567238452.500000336.000000512.0000000.500000393.000000336.000000450.0000000.0782881.0000000.0000003.3333330.384900-0.1250000.250000
2290.0002009.0000000.000000118.000000113.000000121.0000000.000000118.000000113.000000121.0000000.1609451.0000000.0000007.0000000.0000001.0000000.000000
2300.0004005.8000000.000000159.750000151.000000173.0000000.000000159.750000151.000000173.0000000.4256801.0000000.0000003.8000000.0000001.0000000.000000
2310.0000009.0000000.00000019.00000019.00000019.0000000.00000019.00000019.00000019.0000000.0147431.0000000.0000007.0000000.0000001.0000000.000000
2320.0000003.9500000.351188512.000000512.000000512.0000001.0000000.0000000.0000000.0000000.0141141.0000000.0000003.2000000.489898-0.2500000.288675
2330.0002004.3000000.594419334.000000246.000000393.0000000.000000334.000000246.000000393.0000000.1786141.0000000.0000003.0500000.7549830.2500000.288675
2340.0004000.6250000.75000026.25000022.00000031.0000000.00000026.25000022.00000031.0000000.3620010.6250000.750000-1.0000000.0000001.0000000.000000
2350.0004008.2857141.428571109.250000103.000000119.0000000.000000109.250000103.000000119.0000000.4088371.0000000.0000006.2857141.4285711.0000000.000000
2360.0004006.6000001.600000160.500000152.000000173.0000000.000000160.500000152.000000173.0000000.4245301.0000000.0000004.6000001.6000001.0000000.000000
2370.0000004.0000000.00000035.50000035.00000037.0000000.00000035.50000035.00000037.0000000.0489221.0000000.0000002.0000000.0000001.0000000.000000
2380.0001003.8500000.300000495.250000455.000000512.0000000.500000478.500000455.000000502.0000000.0686221.0000000.0000003.3500000.300000-0.5000000.000000
2390.0004009.0000000.000000104.500000102.000000110.0000000.000000104.500000102.000000110.0000000.3915461.0000000.0000007.0000000.0000001.0000000.000000
2400.0001004.5500000.288675471.750000351.000000512.0000000.750000351.000000351.000000351.0000000.0503771.0000000.0000003.8000000.000000-0.2500000.288675
2410.0001002.8250000.298608385.000000229.000000512.0000000.250000342.666687229.000000439.0000000.1380721.0000000.0000001.7000000.6000000.1250000.478714
2420.0003005.8000000.000000147.750000133.000000155.0000000.000000147.750000133.000000155.0000000.3094501.0000000.0000003.8000000.0000001.0000000.000000
2430.0008000.8750000.25000060.25000020.000000104.0000000.00000060.25000020.000000104.0000000.7636401.0000000.000000-1.0000000.0000000.8750000.250000
2440.0002004.3250000.250000383.750000340.000000423.0000000.000000383.750000340.000000423.0000000.1591221.0000000.0000003.2000000.0000000.1250000.250000
2450.0006004.0000000.00000072.50000052.00000089.0000000.00000072.50000052.00000089.0000000.5758031.0000000.0000002.0000000.0000001.0000000.000000
2460.0002004.4750000.320156355.250000323.000000395.0000000.000000355.250000323.000000395.0000000.1717801.0000000.0000003.3500000.3000000.1250000.250000
2470.0004009.0000000.000000105.250000103.000000111.0000000.000000105.250000103.000000111.0000000.4133061.0000000.0000007.0000000.0000001.0000000.000000
2480.0000009.0000000.00000021.00000021.00000021.0000000.00000021.00000021.00000021.0000000.0014021.0000000.0000007.0000000.0000001.0000000.000000
2490.0005009.0000000.000000111.500000105.000000126.0000000.000000111.500000105.000000126.0000000.4850611.0000000.0000007.0000000.0000001.0000000.000000
2500.0004003.8750000.250000128.75000080.000000218.0000000.000000128.75000080.000000218.0000000.3913121.0000000.0000002.0000000.0000000.8750000.250000
2510.0003003.5000000.000000152.250000128.000000180.0000000.000000152.250000128.000000180.0000000.3429531.0000000.0000002.0000000.0000000.5000000.000000
2520.0004005.8000000.000000150.250000145.000000155.0000000.000000150.250000145.000000155.0000000.3536071.0000000.0000003.8000000.0000001.0000000.000000
2530.0001004.5500000.288675467.750000406.000000512.0000000.500000423.500000406.000000441.0000000.0899151.0000000.0000003.8000000.000000-0.2500000.288675
2540.0004004.0000000.000000115.250000105.000000132.0000000.000000115.250000105.000000132.0000000.4167701.0000000.0000002.0000000.0000001.0000000.000000
2550.0005003.8750000.25000098.00000065.000000142.0000000.00000098.00000065.000000142.0000000.4854241.0000000.0000002.0000000.0000000.8750000.250000
2560.0003009.0000000.00000034.75000025.00000038.0000000.00000034.75000025.00000038.0000000.2958371.0000000.0000007.0000000.0000001.0000000.000000
2570.0005004.0000000.00000084.50000058.000000115.0000000.00000084.50000058.000000115.0000000.5272581.0000000.0000002.0000000.0000001.0000000.000000
2580.0024002.8750003.75000084.25000031.000000242.0000000.00000084.25000031.000000242.0000002.3520101.0000000.0000001.0000004.0000000.8750000.250000
2590.0010004.0000000.00000026.00000022.00000038.0000000.00000026.00000022.00000038.0000000.9506421.0000000.0000002.0000000.0000001.0000000.000000
2600.0000004.0000000.00000035.75000035.00000036.0000000.00000035.75000035.00000036.0000000.0167051.0000000.0000002.0000000.0000001.0000000.000000
2610.0001005.0000000.000000317.750000254.000000342.0000000.000000317.750000254.000000342.0000000.1386821.0000000.0000004.0000000.0000000.0000000.000000
2620.0005009.0000000.000000106.500000103.000000115.0000000.000000106.500000103.000000115.0000000.4702351.0000000.0000007.0000000.0000001.0000000.000000
2630.0001004.1000000.270801475.000000378.000000512.0000000.500000438.000000378.000000498.0000000.0756811.0000000.0000003.3500000.300000-0.2500000.288675
2640.0003003.8750000.25000081.75000045.000000171.0000000.00000081.75000045.000000171.0000000.2813021.0000000.0000002.0000000.0000000.8750000.250000
2650.0006000.6250000.75000076.75000042.00000094.0000000.00000076.75000042.00000094.0000000.5798450.6250000.750000-1.0000000.0000001.0000000.000000
2660.0000004.0000000.00000035.25000035.00000036.0000000.00000035.25000035.00000036.0000000.0352281.0000000.0000002.0000000.0000001.0000000.000000
2670.0000009.0000000.00000019.00000019.00000019.0000000.00000019.00000019.00000019.0000000.0150871.0000000.0000007.0000000.0000001.0000000.000000
2680.0002004.0000000.00000087.75000040.000000130.0000000.00000087.75000040.000000130.0000000.2338231.0000000.0000002.0000000.0000001.0000000.000000
2690.0006004.0000000.00000061.25000058.00000065.0000000.00000061.25000058.00000065.0000000.5712421.0000000.0000002.0000000.0000001.0000000.000000
2700.0003009.0000000.000000109.250000105.000000114.0000000.000000109.250000105.000000114.0000000.3144841.0000000.0000007.0000000.0000001.0000000.000000
2710.0001002.7500002.254625400.000000246.000000512.0000000.500000288.000000246.000000330.0000000.0942481.0000000.0000002.0000002.121320-0.2500000.288675
2720.0000004.0000000.00000035.75000035.00000036.0000000.00000035.75000035.00000036.0000000.0169871.0000000.0000002.0000000.0000001.0000000.000000
2730.0003005.8000000.000000170.000000159.000000180.0000000.000000170.000000159.000000180.0000000.3049931.0000000.0000003.8000000.0000001.0000000.000000
2740.0002004.8750000.250000364.750000316.000000462.0000000.000000364.750000316.000000462.0000000.1565751.0000000.0000004.0000000.000000-0.1250000.250000
2750.0000004.1250000.287228512.000000512.000000512.0000001.0000000.0000000.0000000.0000000.0206171.0000000.0000003.5000000.346410-0.3750000.250000
2760.0002001.0000000.00000027.00000022.00000029.0000000.00000027.00000022.00000029.0000000.2485331.0000000.000000-1.0000000.0000001.0000000.000000
2770.0004009.0000000.000000105.750000103.000000111.0000000.000000105.750000103.000000111.0000000.4087021.0000000.0000007.0000000.0000001.0000000.000000
2780.0002009.0000000.000000112.000000108.000000115.0000000.000000112.000000108.000000115.0000000.2130451.0000000.0000007.0000000.0000001.0000000.000000
2790.0005000.7500000.288675103.00000071.000000133.0000000.000000103.00000071.000000133.0000000.4730691.0000000.000000-1.0000000.0000000.7500000.288675
2800.0000004.0000000.00000038.00000038.00000038.0000000.00000038.00000038.00000038.0000000.0055321.0000000.0000002.0000000.0000001.0000000.000000
2810.0001004.0750001.141271443.500000349.000000512.0000000.500000375.000000349.000000401.0000000.0876451.0000000.0000003.2000001.200000-0.1250000.250000
2820.0002005.1250000.250000325.000000279.000000404.0000000.000000325.000000279.000000404.0000000.1822641.0000000.0000004.0000000.0000000.1250000.250000
2830.0021002.0000002.44949055.50000013.00000083.0000000.00000055.50000013.00000083.0000002.0682591.0000000.0000000.5000001.7320510.5000001.000000
2840.0004009.0000000.000000107.500000105.000000113.0000000.000000107.500000105.000000113.0000000.3975001.0000000.0000007.0000000.0000001.0000000.000000
2850.0002004.7000001.200000249.250000217.000000325.0000000.000000249.250000217.000000325.0000000.2187071.0000000.0000003.2000001.2000000.5000000.000000
2860.0002002.9000000.979796199.500000139.000000284.0000000.000000199.500000139.000000284.0000000.2398781.0000000.0000001.4000000.9797960.5000000.000000
2870.0000009.0000000.00000021.00000021.00000021.0000000.00000021.00000021.00000021.0000000.0012511.0000000.0000007.0000000.0000001.0000000.000000
2880.0006004.0000000.00000090.75000076.000000108.0000000.00000090.75000076.000000108.0000000.6164541.0000000.0000002.0000000.0000001.0000000.000000
2890.0002004.7500000.640312275.000000225.000000412.0000000.000000275.000000225.000000412.0000000.1839471.0000000.0000003.5000000.6000000.2500000.500000
2900.0000004.4166670.319142512.000000512.000000512.0000001.0000000.0000000.0000000.0000000.0178411.0000000.0000003.6666670.544331-0.2500000.500000
2910.0003003.3250000.850000228.250000177.000000274.0000000.000000228.250000177.000000274.0000000.2553601.0000000.0000001.7000000.6000000.6250000.250000
2920.0001004.5500000.288675442.500000331.000000512.0000000.500000373.000000331.000000415.0000000.0878261.0000000.0000003.8000000.000000-0.2500000.288675
2930.0002005.8000000.000000154.500000150.000000157.0000000.000000154.500000150.000000157.0000000.1996201.0000000.0000003.8000000.0000001.0000000.000000
2940.0004008.5000000.000000183.500000163.000000210.0000000.000000183.500000163.000000210.0000000.3577331.0000000.0000007.0000000.0000000.5000000.000000
2950.0002005.8000000.000000146.750000141.000000149.0000000.000000146.750000141.000000149.0000000.1816351.0000000.0000003.8000000.0000001.0000000.000000
2960.0005009.0000000.00000052.75000038.00000097.0000000.00000052.75000038.00000097.0000000.4795421.0000000.0000007.0000000.0000001.0000000.000000
2970.0005009.0000000.000000111.750000104.000000125.0000000.000000111.750000104.000000125.0000000.5010091.0000000.0000007.0000000.0000001.0000000.000000
2980.0000004.0000000.00000036.00000036.00000036.0000000.00000036.00000036.00000036.0000000.0090921.0000000.0000002.0000000.0000001.0000000.000000
2990.0003003.8750000.250000128.500000105.000000172.0000000.000000128.500000105.000000172.0000000.3394551.0000000.0000002.0000000.0000000.8750000.250000
3000.0002004.5750000.250000374.500000346.000000420.0000000.000000374.500000346.000000420.0000000.1670721.0000000.0000003.2000000.0000000.3750000.250000

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "--- [task_0017] [medium] ---\n", + "Instruction: Onboard new hire Maria Garcia to Marketing as L1 Marketing Associate. Create the...\n", + "Tool calls: ['hr_create_employee', 'hr_read_employee', 'onboarding_create_request']\n", + "Rubric: 100% (7/7)\n", + "Reward: 7.00\n", + "\n", + "--- [task_0018] [medium] ---\n", + "Instruction: Onboard new hire James Wilson to Data Science as L2 Data Analyst. Create their e...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request']\n", + "Rubric: 100% (7/7)\n", + "Reward: 7.00\n", + "\n", + "--- [task_0072] [complex] ---\n", + "Instruction: Rehire Marie Dubois (emp_0064) who was previously offboarded. Update their statu...\n", + "Tool calls: ['onboarding_create_request', 'hr_search_employees', 'hr_get_org_chart', 'it_assign_asset', 'it_get_available_assets', 'it_create_account', 'it_revoke_access', 'it_get_software_licenses', 'access_assign_role', 'access_get_security_groups']\n", + "Rubric: 40% (2/5)\n", + "Reward: 1.40\n", + "\n", + "--- [task_0036] [simple] ---\n", + "Instruction: Check the offboarding status for Min Hu (emp_0113)....\n", + "Tool calls: ['offboarding_get_status']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0067] [edge_case] ---\n", + "Instruction: Before onboarding a new Security team member, look up the badge access policy an...\n", + "Tool calls: ['hr_read_employee']\n", + "Rubric: 0% (0/2)\n", + "Reward: -1.00\n", + "\n", + "--- [task_0011] [simple] ---\n", + "Instruction: Check the onboarding status for employee Rohan Patel (emp_0011)....\n", + "Tool calls: ['hr_get_org_chart']\n", + "Rubric: 0% (0/2)\n", + "Reward: -1.00\n", + "\n", + "--- [task_0048] [complex] ---\n", + "Instruction: Fully offboard Henrik Becker (emp_0069), a L4 Head of Enterprise Sales in Sales ...\n", + "Tool calls: 
['offboarding_create_request', 'offboarding_get_status', 'access_revoke_role', 'access_get_security_groups', 'reassign_report', 'email_send']\n", + "Rubric: 50% (3/6)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0002] [simple] ---\n", + "Instruction: Look up the employee record for Ingrid Larsson (ID: emp_0025)....\n", + "Tool calls: ['hr_read_employee']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0046] [medium] ---\n", + "Instruction: Initiate offboarding for Brian Jones (emp_0075) who taking a career break. Set t...\n", + "Tool calls: ['offboarding_create_request']\n", + "Rubric: 40% (2/5)\n", + "Reward: 1.40\n", + "\n", + "--- [task_0056] [edge_case] ---\n", + "Instruction: Onboard a new L1 Associate to the Marketing department. Create their employee re...\n", + "Tool calls: ['hr_create_employee', 'hr_read_employee', 'onboarding_create_request']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0014] [simple] ---\n", + "Instruction: Check if there are available laptops and Jira licenses for a new Engineering hir...\n", + "Tool calls: ['it_get_available_assets']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0030] [complex] ---\n", + "Instruction: Onboard Sanjay Gupta as L2 Security Analyst in Security. Create the employee rec...\n", + "Tool calls: ['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'hr_update_employee', 'hr_search_employees', 'hr_get_org_chart', 'onboarding_get_status', 'onboarding_complete_step', 'it_assign_asset']\n", + "Rubric: 30% (3/10)\n", + "Reward: 0.80\n", + "\n", + "--- [task_0053] [complex] ---\n", + "Instruction: Process the complete offboarding for Marta Wagner (emp_0110) from HR. 
Create the...\n", + "Tool calls: ['onboarding_complete_step', 'it_get_available_assets', 'access_revoke_role', 'access_get_security_groups', 'email_send', 'meeting_schedule']\n", + "Rubric: 20% (1/5)\n", + "Reward: 0.20\n", + "\n", + "--- [task_0070] [complex] ---\n", + "Instruction: Robert Garcia (emp_0133) is transferring from Data Science to Engineering. Proce...\n", + "Tool calls: ['offboard_access', 'offboard_request', 'access_assign_role', 'onboarding_create_request', 'access_create_badge', 'access_get_security_groups', 'slack_send_message']\n", + "Rubric: 40% (2/5)\n", + "Reward: 1.40\n", + "\n", + "--- [task_0027] [complex] ---\n", + "Instruction: Fully onboard Carlos Mendez as L3 Senior Security Engineer in Security. Their ma...\n", + "Tool calls: ['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'it_assign_asset', 'it_get_available_assets', 'it_create_account']\n", + "Rubric: 50% (5/10)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0068] [complex] ---\n", + "Instruction: Patricia Brown (emp_0172) is transferring from Engineering to Product. Process t...\n", + "Tool calls: ['offboard_access', 'onboard_request', 'access_assign_role', 'email_send', 'meeting_schedule']\n", + "Rubric: 40% (2/5)\n", + "Reward: 1.40\n", + "\n", + "--- [task_0074] [medium] ---\n", + "Instruction: The Product team is onboarding 2 new hires at the same time. Check available ass...\n", + "Tool calls: ['it_get_available_assets', 'it_get_available_software_licenses']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0025] [complex] ---\n", + "Instruction: Fully onboard John Lee as L3 Team Lead - ML in Data Science. 
Their manager will ...\n", + "Tool calls: ['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_complete_step', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status', 'onboarding_get_status']\n", + "Rubric: 30% (3/10)\n", + "Reward: 0.80\n", + "\n", + "--- [task_0032] [complex] ---\n", + "Instruction: Onboard Hassan Ahmed as L3 Lead Data Scientist in Data Science. Create the emplo...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request', 'hr_get_org_chart', 'hr_assign_asset', 'it_create_account', 'access_assign_role', 'access_create_badge', 'email_send']\n", + "Rubric: 67% (6/9)\n", + "Reward: 3.00\n", + "\n", + "--- [task_0054] [complex] ---\n", + "Instruction: Process the complete offboarding for Jun Zheng (emp_0068) from Sales. Create the...\n", + "Tool calls: ['offboarding_create_request', 'offboarding_get_status', 'offboarding_revoke_access', 'access_revoke_role', 'access_get_security_groups', 'email_send', 'slack_send_message', 'meeting_schedule', 'hr_get_org_chart', 'hr_update_employee', 'hr_search_employees', 'hr_update_employee']\n", + "Rubric: 60% (3/5)\n", + "Reward: 2.60\n", + "\n", + "--- [task_0031] [complex] ---\n", + "Instruction: Onboard Nina Petrova as L4 Director of Platform in Engineering. Create the emplo...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request', 'it_assign_asset', 'it_create_account', 'access_assign_role']\n", + "Rubric: 50% (5/10)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0029] [complex] ---\n", + "Instruction: Fully onboard Raj Kapoor as L2 Backend Developer in Engineering. 
Their manager w...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request', 'onboarding_get_status', 'it_assign_asset', 'it_get_available_assets', 'it_create_account', 'it_get_software_licenses']\n", + "Rubric: 50% (5/10)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0067] [edge_case] ---\n", + "Instruction: Before onboarding a new Security team member, look up the badge access policy an...\n", + "Tool calls: ['hr_read_employee', 'hr_search_employees', 'hr_get_org_chart', 'hr_update_employee']\n", + "Rubric: 0% (0/2)\n", + "Reward: -1.00\n", + "\n", + "--- [task_0034] [complex] ---\n", + "Instruction: Onboard Kevin O'Brien as L4 VP of Product in Product. Create the employee record...\n", + "Tool calls: ['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'onboarding_get_status', 'access_assign_role', 'access_create_badge', 'it_get_available_assets', 'it_create_account', 'it_get_software_licenses', 'access_revoke_role', 'access_get_security_groups']\n", + "Rubric: 50% (5/10)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0018] [medium] ---\n", + "Instruction: Onboard new hire James Wilson to Data Science as L2 Data Analyst. Create their e...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request']\n", + "Rubric: 100% (7/7)\n", + "Reward: 7.00\n", + "\n", + "--- [task_0056] [edge_case] ---\n", + "Instruction: Onboard a new L1 Associate to the Marketing department. Create their employee re...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0054] [complex] ---\n", + "Instruction: Process the complete offboarding for Jun Zheng (emp_0068) from Sales. 
Create the...\n", + "Tool calls: ['offboarding_create_request', 'offboarding_create_request', 'access_revoke_role', 'access_revoke_role', 'it_get_available_assets', 'it_get_available_assets', 'email_send', 'slack_send_message']\n", + "Rubric: 60% (3/5)\n", + "Reward: 2.60\n", + "\n", + "--- [task_0046] [medium] ---\n", + "Instruction: Initiate offboarding for Brian Jones (emp_0075) who taking a career break. Set t...\n", + "Tool calls: ['offboarding_create_request']\n", + "Rubric: 40% (2/5)\n", + "Reward: 1.40\n", + "\n", + "--- [task_0073] [medium] ---\n", + "Instruction: The Engineering team is onboarding 2 new hires at the same time. Check available...\n", + "Tool calls: ['it_get_available_assets', 'it_get_available_software_licenses']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0074] [medium] ---\n", + "Instruction: The Product team is onboarding 2 new hires at the same time. Check available ass...\n", + "Tool calls: ['it_get_available_assets', 'it_get_available_software_licenses', 'access_assign_role']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0048] [complex] ---\n", + "Instruction: Fully offboard Henrik Becker (emp_0069), a L4 Head of Enterprise Sales in Sales ...\n", + "Tool calls: ['offboarding_create_request', 'it_revoke_access', 'it_get_software_licenses', 'access_revoke_role', 'offboarding_create_request', 'email_send', 'it_get_available_assets', 'access_assign_role', 'onboarding_get_status', 'onboarding_complete_step', '__done__']\n", + "Rubric: 67% (4/6)\n", + "Reward: 3.00\n", + "\n", + "--- [task_0040] [medium] ---\n", + "Instruction: Initiate offboarding for Kavya Desai (emp_0034) who resigning. 
Set the reason to...\n", + "Tool calls: ['offboarding_create_request']\n", + "Rubric: 40% (2/5)\n", + "Reward: 1.40\n", + "\n", + "--- [task_0011] [simple] ---\n", + "Instruction: Check the onboarding status for employee Rohan Patel (emp_0011)....\n", + "Tool calls: ['hr_get_org_chart', 'onboarding_get_status']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0023] [medium] ---\n", + "Instruction: Onboard new hire Li Wei to Engineering as L3 Senior Engineer. Create their emplo...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request']\n", + "Rubric: 100% (7/7)\n", + "Reward: 7.00\n", + "\n", + "--- [task_0045] [medium] ---\n", + "Instruction: Initiate offboarding for Susan Davis (emp_0091) who retiring. Set the reason to ...\n", + "Tool calls: ['offboarding_create_request']\n", + "Rubric: 40% (2/5)\n", + "Reward: 1.40\n", + "\n", + "--- [task_0030] [complex] ---\n", + "Instruction: Onboard Sanjay Gupta as L2 Security Analyst in Security. Create the employee rec...\n", + "Tool calls: ['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'access_assign_role', 'access_create_badge', 'email_send', 'meeting_schedule']\n", + "Rubric: 60% (6/10)\n", + "Reward: 2.60\n", + "\n", + "--- [task_0061] [edge_case] ---\n", + "Instruction: Onboard contractor Amit Verma to Engineering as an L2 Contract Developer. Contra...\n", + "Tool calls: ['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'offboarding_create_request', 'offboarding_get_status', 'it_assign_asset', 'it_get_available_assets', 'it_create_account', 'it_revoke_access', 'it_get_software_licenses', 'access_assign_role', 'access_get_security_groups']\n", + "Rubric: 50% (2/4)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0065] [edge_case] ---\n", + "Instruction: Assign the security_admin access role to a new L1 Security Associate. 
The securi...\n", + "Tool calls: ['access_assign_role']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0036] [simple] ---\n", + "Instruction: Check the offboarding status for Min Hu (emp_0113)....\n", + "Tool calls: ['offboarding_get_status']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0038] [simple] ---\n", + "Instruction: Check the offboarding status for Tao Chen (emp_0020)....\n", + "Tool calls: ['offboarding_get_status']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0046] [medium] ---\n", + "Instruction: Initiate offboarding for Brian Jones (emp_0075) who taking a career break. Set t...\n", + "Tool calls: ['offboarding_create_request']\n", + "Rubric: 40% (2/5)\n", + "Reward: 1.40\n", + "\n", + "--- [task_0040] [medium] ---\n", + "Instruction: Initiate offboarding for Kavya Desai (emp_0034) who resigning. Set the reason to...\n", + "Tool calls: ['offboarding_create_request']\n", + "Rubric: 40% (2/5)\n", + "Reward: 1.40\n", + "\n", + "--- [task_0002] [simple] ---\n", + "Instruction: Look up the employee record for Ingrid Larsson (ID: emp_0025)....\n", + "Tool calls: ['hr_read_employee']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0051] [complex] ---\n", + "Instruction: Fully offboard Sergio Ferrari (emp_0198), a L3 Security Architect in Security wh...\n", + "Tool calls: ['offboarding_create_request', 'offboarding_get_status', 'it_revoke_access', 'it_get_software_licenses', 'access_revoke_role', 'access_get_security_groups', 'email_send', 'onboarding_get_status', 'access_assign_role', 'access_get_security_groups', 'meeting_schedule', 'hr_update_employee']\n", + "Rubric: 83% (5/6)\n", + "Reward: 4.00\n", + "\n", + "--- [task_0066] [edge_case] ---\n", + "Instruction: A Marketing employee needs access to the Engineering GitHub repository. 
Try to a...\n", + "Tool calls: ['access_assign_role']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0019] [medium] ---\n", + "Instruction: Onboard new hire Aisha Patel to Sales as L1 Sales Representative. Create their e...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request']\n", + "Rubric: 100% (7/7)\n", + "Reward: 7.00\n", + "\n", + "--- [task_0014] [simple] ---\n", + "Instruction: Check if there are available laptops and Jira licenses for a new Engineering hir...\n", + "Tool calls: ['it_get_available_assets']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0018] [medium] ---\n", + "Instruction: Onboard new hire James Wilson to Data Science as L2 Data Analyst. Create their e...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request']\n", + "Rubric: 100% (7/7)\n", + "Reward: 7.00\n", + "\n", + "--- [task_0036] [simple] ---\n", + "Instruction: Check the offboarding status for Min Hu (emp_0113)....\n", + "Tool calls: ['offboarding_get_status']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0035] [simple] ---\n", + "Instruction: Check the offboarding status for Thomas White (emp_0035)....\n", + "Tool calls: ['offboarding_get_status']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0042] [medium] ---\n", + "Instruction: Initiate offboarding for Pierre Laurent (emp_0153) who moving to a different cit...\n", + "Tool calls: ['offboarding_create_request', 'access_revoke_role', 'email_send']\n", + "Rubric: 60% (3/5)\n", + "Reward: 2.60\n", + "\n", + "--- [task_0023] [medium] ---\n", + "Instruction: Onboard new hire Li Wei to Engineering as L3 Senior Engineer. 
Create their emplo...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request']\n", + "Rubric: 100% (7/7)\n", + "Reward: 7.00\n", + "Error replaying: 1 validation error for HROnboardingAction\n", + "arguments\n", + " Input should be a valid dictionary [type=dict_type, input_value=\"['offboarding']\", input_type=str]\n", + " For further information visit https://errors.pydantic.dev/2.12/v/dict_type\n", + "\n", + "--- [task_0074] [medium] ---\n", + "Instruction: The Product team is onboarding 2 new hires at the same time. Check available ass...\n", + "Tool calls: ['it_get_available_assets', 'it_get_available_assets', 'it_create_account', 'it_create_account']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0011] [simple] ---\n", + "Instruction: Check the onboarding status for employee Rohan Patel (emp_0011)....\n", + "Tool calls: ['hr_get_org_chart', 'onboarding_get_status', 'onboarding_get_status']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0073] [medium] ---\n", + "Instruction: The Engineering team is onboarding 2 new hires at the same time. Check available...\n", + "Tool calls: ['it_get_available_assets', 'it_get_available_software_licenses', 'access_assign_role', 'access_create_badge', 'email_send']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0053] [complex] ---\n", + "Instruction: Process the complete offboarding for Marta Wagner (emp_0110) from HR. Create the...\n", + "Tool calls: ['offboarding_create_request', 'it_revoke_access', 'it_get_software_licenses', 'access_revoke_role', 'access_get_security_groups', 'email_send', 'slack_send_message', 'meeting_schedule', 'onboarding_get_status', 'access_get_security_groups', 'it_get_available_assets', 'access_assign_role']\n", + "Rubric: 80% (4/5)\n", + "Reward: 3.80\n", + "\n", + "--- [task_0031] [complex] ---\n", + "Instruction: Onboard Nina Petrova as L4 Director of Platform in Engineering. 
Create the emplo...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request', 'it_assign_asset', 'it_get_available_assets', 'it_create_account', 'access_assign_role', 'access_create_badge', 'access_revoke_role', 'email_send', 'meeting_schedule']\n", + "Rubric: 80% (8/10)\n", + "Reward: 3.80\n", + "\n", + "--- [task_0016] [medium] ---\n", + "Instruction: Onboard new hire Alex Chen to Product as L2 Product Analyst. Create their employ...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request']\n", + "Rubric: 100% (7/7)\n", + "Reward: 7.00\n", + "\n", + "--- [task_0070] [complex] ---\n", + "Instruction: Robert Garcia (emp_0133) is transferring from Data Science to Engineering. Proce...\n", + "Tool calls: ['offboarding_create_request', 'access_revoke_role', 'onboarding_get_status']\n", + "Rubric: 0% (0/5)\n", + "Reward: -1.00\n", + "\n", + "--- [task_0034] [complex] ---\n", + "Instruction: Onboard Kevin O'Brien as L4 VP of Product in Product. Create the employee record...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'it_assign_asset', 'it_get_available_assets', 'it_create_account', 'it_revoke_access', 'access_assign_role', 'access_create_badge', 'access_revoke_role', 'email_send']\n", + "Rubric: 70% (7/10)\n", + "Reward: 3.20\n", + "\n", + "--- [task_0068] [complex] ---\n", + "Instruction: Patricia Brown (emp_0172) is transferring from Engineering to Product. Process t...\n", + "Tool calls: ['offboard_request', 'access_revoke', 'access_assign_role', 'email_send', 'it_get_software_licenses']\n", + "Rubric: 40% (2/5)\n", + "Reward: 1.40\n", + "\n", + "--- [task_0034] [complex] ---\n", + "Instruction: Onboard Kevin O'Brien as L4 VP of Product in Product. 
Create the employee record...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request', 'it_get_software_licenses', 'it_assign_asset', 'it_create_account', 'access_assign_role', 'access_create_badge']\n", + "Rubric: 60% (6/10)\n", + "Reward: 2.60\n", + "\n", + "--- [task_0045] [medium] ---\n", + "Instruction: Initiate offboarding for Susan Davis (emp_0091) who retiring. Set the reason to ...\n", + "Tool calls: ['offboarding_create_request', 'offboarding_get_status', 'it_revoke_access']\n", + "Rubric: 60% (3/5)\n", + "Reward: 2.60\n", + "\n", + "--- [task_0077] [complex] ---\n", + "Instruction: Manager Ananya Reddy (emp_0007) in Engineering is leaving. They have 2 direct re...\n", + "Tool calls: ['it_revoke_access', 'it_get_software_licenses', 'onboarding_get_status', 'access_revoke_role', 'email_send']\n", + "Rubric: 40% (2/5)\n", + "Reward: 1.40\n", + "\n", + "--- [task_0012] [simple] ---\n", + "Instruction: Check the onboarding status for employee Astrid Koch (emp_0073)....\n", + "Tool calls: ['hr_get_org_chart', 'onboarding_get_status', 'onboarding_complete_step']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0070] [complex] ---\n", + "Instruction: Robert Garcia (emp_0133) is transferring from Data Science to Engineering. 
Proce...\n", + "Tool calls: ['offboarding_create_request', 'access_revoke_role', 'access_assign_role', 'email_send', 'meeting_schedule']\n", + "Rubric: 40% (2/5)\n", + "Reward: 1.40\n", + "\n", + "--- [task_0038] [simple] ---\n", + "Instruction: Check the offboarding status for Tao Chen (emp_0020)....\n", + "Tool calls: ['offboarding_get_status']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0002] [simple] ---\n", + "Instruction: Look up the employee record for Ingrid Larsson (ID: emp_0025)....\n", + "Tool calls: ['hr_read_employee']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0072] [complex] ---\n", + "Instruction: Rehire Marie Dubois (emp_0064) who was previously offboarded. Update their statu...\n", + "Tool calls: ['it_revoke_access', 'onboarding_create_request', 'email_send']\n", + "Rubric: 40% (2/5)\n", + "Reward: 1.40\n", + "\n", + "--- [task_0051] [complex] ---\n", + "Instruction: Fully offboard Sergio Ferrari (emp_0198), a L3 Security Architect in Security wh...\n", + "Tool calls: ['offboarding_create_request', 'it_revoke_access', 'it_get_software_licenses', 'access_revoke_role', 'email_send', 'meeting_schedule', 'hr_update_employee', 'hr_get_org_chart', 'hr_search_employees']\n", + "Rubric: 83% (5/6)\n", + "Reward: 4.00\n", + "\n", + "--- [task_0056] [edge_case] ---\n", + "Instruction: Onboard a new L1 Associate to the Marketing department. Create their employee re...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request', 'onboarding_get_status']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0020] [medium] ---\n", + "Instruction: Onboard new hire Tom Nguyen to Finance as L2 Financial Analyst. Create their emp...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request']\n", + "Rubric: 100% (7/7)\n", + "Reward: 7.00\n", + "\n", + "--- [task_0061] [edge_case] ---\n", + "Instruction: Onboard contractor Amit Verma to Engineering as an L2 Contract Developer. 
Contra...\n", + "Tool calls: ['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'it_assign_asset', 'onboarding_get_status', 'access_assign_role', 'access_create_badge', 'access_revoke_role', 'it_get_software_licenses', 'email_send', 'slack_send_message', 'meeting_schedule', 'policy_lookup']\n", + "Rubric: 50% (2/4)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0029] [complex] ---\n", + "Instruction: Fully onboard Raj Kapoor as L2 Backend Developer in Engineering. Their manager w...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request', 'it_create_account', 'access_assign_role', 'email_send', 'it_get_available_assets', 'meeting_schedule']\n", + "Rubric: 70% (7/10)\n", + "Reward: 3.20\n", + "\n", + "--- [task_0059] [edge_case] ---\n", + "Instruction: Check if there are available LinkedIn Sales Navigator licenses for a new Sales h...\n", + "Tool calls: ['it_get_available_assets']\n", + "Rubric: 0% (0/1)\n", + "Reward: -1.00\n", + "\n", + "--- [task_0050] [complex] ---\n", + "Instruction: Fully offboard Lei Huang (emp_0032), a L4 Group Product Manager in Product who i...\n", + "Tool calls: ['it_revoke_access', 'it_create_account', 'access_revoke_role', 'access_get_security_groups', 'email_send', 'meeting_schedule']\n", + "Rubric: 67% (4/6)\n", + "Reward: 3.00\n", + "\n", + "--- [task_0068] [complex] ---\n", + "Instruction: Patricia Brown (emp_0172) is transferring from Engineering to Product. 
Process t...\n", + "Tool calls: ['it_revoke_access', 'it_get_software_licenses', 'it_assign_asset', 'onboarding_create_request', 'it_update_employee', 'access_assign_role', 'access_create_badge', 'email_send', 'meeting_schedule', '__done__']\n", + "Rubric: 60% (3/5)\n", + "Reward: 2.60\n", + "\n", + "--- [task_0013] [simple] ---\n", + "Instruction: Check the onboarding status for employee Hui Zhou (emp_0040)....\n", + "Tool calls: ['hr_get_org_chart', 'onboarding_get_status']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0016] [medium] ---\n", + "Instruction: Onboard new hire Alex Chen to Product as L2 Product Analyst. Create their employ...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request']\n", + "Rubric: 100% (7/7)\n", + "Reward: 7.00\n", + "\n", + "--- [task_0042] [medium] ---\n", + "Instruction: Initiate offboarding for Pierre Laurent (emp_0153) who moving to a different cit...\n", + "Tool calls: ['offboarding_create_request', 'it_revoke_access', 'email_send']\n", + "Rubric: 80% (4/5)\n", + "Reward: 3.80\n", + "\n", + "--- [task_0066] [edge_case] ---\n", + "Instruction: A Marketing employee needs access to the Engineering GitHub repository. Try to a...\n", + "Tool calls: ['access_assign_role']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0027] [complex] ---\n", + "Instruction: Fully onboard Carlos Mendez as L3 Senior Security Engineer in Security. Their ma...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request', 'it_assign_asset', 'access_assign_role', 'email_send', 'access_create_badge', 'meeting_schedule']\n", + "Rubric: 70% (7/10)\n", + "Reward: 3.20\n", + "\n", + "--- [task_0061] [edge_case] ---\n", + "Instruction: Onboard contractor Amit Verma to Engineering as an L2 Contract Developer. 
Contra...\n", + "Tool calls: ['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'it_assign_asset', 'it_get_available_assets', 'access_assign_role', 'it_get_software_licenses', 'access_create_badge', 'email_send', 'slack_send_message', 'meeting_schedule', 'policy_lookup', 'approval_request']\n", + "Rubric: 25% (1/4)\n", + "Reward: 0.50\n", + "\n", + "--- [task_0018] [medium] ---\n", + "Instruction: Onboard new hire James Wilson to Data Science as L2 Data Analyst. Create their e...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request']\n", + "Rubric: 100% (7/7)\n", + "Reward: 7.00\n", + "\n", + "--- [task_0064] [edge_case] ---\n", + "Instruction: Jennifer Davis (emp_0016) is being terminated effective immediately. Follow the ...\n", + "Tool calls: ['it_revoke_access', 'it_get_available_assets', 'onboarding_create_request', 'offboarding_create_request', 'it_get_software_licenses', 'access_revoke_role', 'email_send']\n", + "Rubric: 75% (3/4)\n", + "Reward: 3.50\n", + "\n", + "--- [task_0066] [edge_case] ---\n", + "Instruction: A Marketing employee needs access to the Engineering GitHub repository. Try to a...\n", + "Tool calls: ['access_assign_role']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0038] [simple] ---\n", + "Instruction: Check the offboarding status for Tao Chen (emp_0020)....\n", + "Tool calls: ['offboarding_get_status']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0045] [medium] ---\n", + "Instruction: Initiate offboarding for Susan Davis (emp_0091) who retiring. Set the reason to ...\n", + "Tool calls: ['offboarding_create_request', 'it_revoke_access', 'email_send']\n", + "Rubric: 80% (4/5)\n", + "Reward: 3.80\n", + "\n", + "--- [task_0031] [complex] ---\n", + "Instruction: Onboard Nina Petrova as L4 Director of Platform in Engineering. 
Create the emplo...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request', 'access_assign_role', 'it_assign_asset', 'it_get_available_assets', 'it_create_account', 'it_revoke_access', 'access_assign_role', 'it_get_software_licenses', 'email_send', 'meeting_schedule', 'access_get_security_groups']\n", + "Rubric: 70% (7/10)\n", + "Reward: 3.20\n", + "\n", + "--- [task_0032] [complex] ---\n", + "Instruction: Onboard Hassan Ahmed as L3 Lead Data Scientist in Data Science. Create the emplo...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request', 'it_assign_asset', 'it_get_available_assets', 'access_assign_role', 'email_send', 'meeting_schedule']\n", + "Rubric: 67% (6/9)\n", + "Reward: 3.00\n", + "\n", + "--- [task_0006] [simple] ---\n", + "Instruction: Show me the organizational chart for the Finance department....\n", + "Tool calls: ['hr_get_org_chart']\n", + "Rubric: 100% (2/2)\n", + "Reward: 7.00\n", + "\n", + "--- [task_0027] [complex] ---\n", + "Instruction: Fully onboard Carlos Mendez as L3 Senior Security Engineer in Security. Their ma...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request', 'it_assign_asset', 'access_assign_role', 'email_send', 'access_create_badge', 'it_get_software_licenses', 'meeting_schedule']\n", + "Rubric: 70% (7/10)\n", + "Reward: 3.20\n", + "\n", + "--- [task_0042] [medium] ---\n", + "Instruction: Initiate offboarding for Pierre Laurent (emp_0153) who moving to a different cit...\n", + "Tool calls: ['offboarding_create_request', 'it_revoke_access', 'email_send']\n", + "Rubric: 100% (5/5)\n", + "Reward: 7.00\n", + "\n", + "--- [task_0034] [complex] ---\n", + "Instruction: Onboard Kevin O'Brien as L4 VP of Product in Product. 
Create the employee record...\n", + "Tool calls: ['hr_create_employee', 'hr_read_employee', 'onboarding_create_request', 'it_assign_asset', 'access_assign_role', 'it_get_available_assets', 'access_create_badge', 'access_revoke_role', 'email_send', 'meeting_schedule', 'it_get_software_licenses', 'policy_lookup']\n", + "Rubric: 70% (7/10)\n", + "Reward: 3.20\n", + "\n", + "--- [task_0077] [complex] ---\n", + "Instruction: Manager Ananya Reddy (emp_0007) in Engineering is leaving. They have 2 direct re...\n", + "Tool calls: ['it_revoke_access', 'access_create_badge', 'email_send', 'it_get_software_licenses', 'meeting_schedule', 'hr_update_employee', 'onboarding_get_status', 'hr_search_employees', 'onboarding_create_request', 'hr_update_employee', 'access_revoke_role', 'access_get_security_groups']\n", + "Rubric: 60% (3/5)\n", + "Reward: 2.60\n", + "\n", + "--- [task_0058] [edge_case] ---\n", + "Instruction: Assign a Netsuite license to a new Finance hire. Check license availability firs...\n", + "Tool calls: ['it_assign_asset', 'hr_search_employees', 'it_get_available_assets']\n", + "Rubric: 0% (0/2)\n", + "Reward: -1.00\n", + "\n", + "--- [task_0029] [complex] ---\n", + "Instruction: Fully onboard Raj Kapoor as L2 Backend Developer in Engineering. Their manager w...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request', 'it_assign_asset', 'it_create_account', 'access_assign_role', 'email_send', 'meeting_schedule']\n", + "Rubric: 80% (8/10)\n", + "Reward: 3.80\n", + "\n", + "--- [task_0007] [simple] ---\n", + "Instruction: What laptops are currently available for assignment?...\n", + "Tool calls: ['it_get_available_assets']\n", + "Rubric: 100% (2/2)\n", + "Reward: 7.00\n", + "\n", + "--- [task_0073] [medium] ---\n", + "Instruction: The Engineering team is onboarding 2 new hires at the same time. 
Check available...\n", + "Tool calls: ['it_get_available_assets', 'it_get_available_assets', 'it_get_available_assets', 'it_get_available_assets', 'hr_search_employees']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0053] [complex] ---\n", + "Instruction: Process the complete offboarding for Marta Wagner (emp_0110) from HR. Create the...\n", + "Tool calls: ['offboarding_create_request', 'access_revoke_role', 'access_get_security_groups', 'email_send', 'slack_send_message', 'it_get_available_assets', 'it_create_account', 'it_revoke_access', 'access_get_security_groups', 'email_send', 'access_revoke_role', 'it_get_available_assets']\n", + "Rubric: 80% (4/5)\n", + "Reward: 3.80\n", + "\n", + "--- [task_0005] [simple] ---\n", + "Instruction: List all employees in the Engineering department....\n", + "Tool calls: ['hr_search_employees']\n", + "Rubric: 100% (2/2)\n", + "Reward: 7.00\n", + "\n", + "--- [task_0010] [simple] ---\n", + "Instruction: List all security groups and their accessible resources....\n", + "Tool calls: ['it_get_software_licenses']\n", + "Rubric: 0% (0/1)\n", + "Reward: -1.00\n", + "\n", + "--- [task_0048] [complex] ---\n", + "Instruction: Fully offboard Henrik Becker (emp_0069), a L4 Head of Enterprise Sales in Sales ...\n", + "Tool calls: ['offboarding_create_request', 'it_revoke_access', 'it_get_software_licenses', 'access_revoke_role', 'access_get_security_groups', 'email_send', 'meeting_schedule', 'hr_update_employee', 'hr_get_org_chart']\n", + "Rubric: 83% (5/6)\n", + "Reward: 4.00\n", + "\n", + "--- [task_0030] [complex] ---\n", + "Instruction: Onboard Sanjay Gupta as L2 Security Analyst in Security. 
Create the employee rec...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request', 'access_assign_role', 'it_assign_asset', 'it_create_account', 'email_send', 'meeting_schedule']\n", + "Rubric: 70% (7/10)\n", + "Reward: 3.20\n", + "\n", + "--- [task_0035] [simple] ---\n", + "Instruction: Check the offboarding status for Thomas White (emp_0035)....\n", + "Tool calls: ['offboarding_get_status']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0074] [medium] ---\n", + "Instruction: The Product team is onboarding 2 new hires at the same time. Check available ass...\n", + "Tool calls: ['it_get_available_assets', 'it_get_available_software_licenses', 'email_send']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0064] [edge_case] ---\n", + "Instruction: Jennifer Davis (emp_0016) is being terminated effective immediately. Follow the ...\n", + "Tool calls: ['it_revoke_access', 'it_get_available_assets', 'it_create_account', 'onboarding_complete_step', 'access_revoke_role', 'access_get_security_groups', 'email_send', '__done__']\n", + "Rubric: 50% (2/4)\n", + "Reward: 2.00\n", + "Error replaying: 1 validation error for HROnboardingAction\n", + "arguments\n", + " Input should be a valid dictionary [type=dict_type, input_value='[]', input_type=str]\n", + " For further information visit https://errors.pydantic.dev/2.12/v/dict_type\n", + "\n", + "--- [task_0042] [medium] ---\n", + "Instruction: Initiate offboarding for Pierre Laurent (emp_0153) who moving to a different cit...\n", + "Tool calls: ['offboarding_create_request', 'it_revoke_access', 'email_send']\n", + "Rubric: 80% (4/5)\n", + "Reward: 3.80\n", + "\n", + "--- [task_0059] [edge_case] ---\n", + "Instruction: Check if there are available LinkedIn Sales Navigator licenses for a new Sales h...\n", + "Tool calls: ['it_get_available_assets']\n", + "Rubric: 0% (0/1)\n", + "Reward: -1.00\n", + "\n", + "--- [task_0019] [medium] ---\n", + "Instruction: Onboard new hire 
Aisha Patel to Sales as L1 Sales Representative. Create their e...\n", + "Tool calls: ['hr_create_employee', 'onboarding_create_request']\n", + "Rubric: 100% (7/7)\n", + "Reward: 7.00\n", + "\n", + "--- [task_0053] [complex] ---\n", + "Instruction: Process the complete offboarding for Marta Wagner (emp_0110) from HR. Create the...\n", + "Tool calls: ['onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'it_revoke_access', 'access_revoke_role', 'it_get_available_assets', 'email_send']\n", + "Rubric: 40% (2/5)\n", + "Reward: 1.40\n", + "\n", + "--- [task_0012] [simple] ---\n", + "Instruction: Check the onboarding status for employee Astrid Koch (emp_0073)....\n", + "Tool calls: ['hr_get_org_chart', 'onboarding_get_status']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0070] [complex] ---\n", + "Instruction: Robert Garcia (emp_0133) is transferring from Data Science to Engineering. Proce...\n", + "Tool calls: ['offboard_request', 'onboarding_create_request', 'access_assign_role', 'email_send']\n", + "Rubric: 40% (2/5)\n", + "Reward: 1.40\n", + "\n", + "--- [task_0013] [simple] ---\n", + "Instruction: Check the onboarding status for employee Hui Zhou (emp_0040)....\n", + "Tool calls: ['hr_get_org_chart', 'onboarding_get_status']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n", + "\n", + "--- [task_0077] [complex] ---\n", + "Instruction: Manager Ananya Reddy (emp_0007) in Engineering is leaving. They have 2 direct re...\n", + "Tool calls: ['it_revoke_access', 'it_get_software_licenses', 'access_revoke_role', 'email_send', 'meeting_schedule', 'onboarding_create_request']\n", + "Rubric: 40% (2/5)\n", + "Reward: 1.40\n", + "\n", + "--- [task_0040] [medium] ---\n", + "Instruction: Initiate offboarding for Kavya Desai (emp_0034) who resigning. 
Set the reason to...\n", + "Tool calls: ['offboarding_create_request', 'it_revoke_access', 'email_send']\n", + "Rubric: 80% (4/5)\n", + "Reward: 3.80\n", + "\n", + "--- [task_0005] [simple] ---\n", + "Instruction: List all employees in the Engineering department....\n", + "Tool calls: ['hr_search_employees', 'hr_search_employees']\n", + "Rubric: 100% (2/2)\n", + "Reward: 7.00\n", + "\n", + "--- [task_0037] [simple] ---\n", + "Instruction: Check the offboarding status for Shan Lin (emp_0142)....\n", + "Tool calls: ['offboarding_get_status']\n", + "Rubric: 50% (1/2)\n", + "Reward: 2.00\n" + ] + }, + { + "data": { + "text/html": [], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "

Run history (sparkline charts):
profiling/Time taken: UnslothGRPOTrainer._calculate_rewards, _prepare_inputs, efficiency_reward, rubric_reward, transformers.generate, valid_json_reward
train/clip_ratio/high_max, high_mean, low_mean, low_min (flat)
(+25 more metrics)

Run summary (profiling times in seconds):
profiling/Time taken: UnslothGRPOTrainer._calculate_rewards = 0.03229
profiling/Time taken: UnslothGRPOTrainer._prepare_inputs = 6.67588
profiling/Time taken: UnslothGRPOTrainer.efficiency_reward = 0.00021
profiling/Time taken: UnslothGRPOTrainer.rubric_reward = 0.03116
profiling/Time taken: UnslothGRPOTrainer.transformers.generate = 6.58671
profiling/Time taken: UnslothGRPOTrainer.valid_json_reward = 0.00024
total_flos = 0
train/clip_ratio/high_max = 0
train/clip_ratio/high_mean = 0
train/clip_ratio/low_mean = 0
(+30 more metrics)

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View run comfy-cherry-23 at: https://wandb.ai/ravi03071991/hr-agent-training/runs/bgent3o3
View project at: https://wandb.ai/ravi03071991/hr-agent-training
Synced 4 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Find logs at: ./wandb/run-20260308_175735-bgent3o3/logs" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "TrainOutput(global_step=300, training_loss=0.0006788890576863888, metrics={'train_runtime': 1281.0975, 'train_samples_per_second': 0.937, 'train_steps_per_second': 0.234, 'total_flos': 0.0, 'train_loss': 0.0006788890576863888})" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import wandb\n", + "import trl.extras.profiling\n", + "\n", + "# Workaround: give TRL's profiling module a reference to wandb so its 'profiling/Time taken' metrics are logged\n", + "trl.extras.profiling.wandb = wandb\n", + "\n", + "trainer.train()" + ] + }, + { + "cell_type": "markdown", + "id": "5a4aff68", + "metadata": {}, + "source": [ + "## Testing the Trained Model\n", + "\n", + "Let's run the RL-trained model on a medium-difficulty task and score its tool calls against the task rubric:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "47ae8bbe", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Task: task_0024 (task_idx=23)\n", + "Instruction: Onboard new hire Emma Davis to Product as L3 Senior PM.
Create their employee record and initiate the onboarding request.\n", + "\n", + "Model output:\n", + "----------------------------------------\n", + "{\"tool\": \"hr_create_employee\", \"params\": {\"name\": \"Emma Davis\", \"department\": \"Product\", \"level\": \"L3\", \"role\": \"Senior PM\", \"manager_id\": \"John Smith\", \"is_contractor\": \"false\", \"location\": \"New York\", \"date_of_joining\": \"2023-01-01\"}}; {\"tool\": \"onboarding_create_request\", \"params\": {\"request_id\": \"1\", \"employee_id\": \"Emma Davis\"}}<|eot_id|>\n", + "\n", + "\n", + "Tool calls extracted: ['hr_create_employee', 'onboarding_create_request']\n", + "\n", + "Rubric score: 100% (7/7)\n", + "Passed: True\n", + " [PASS] created_employee: Created employee record\n", + " [PASS] correct_name: Used correct name\n", + " [PASS] correct_dept: Assigned to correct department\n", + " [PASS] correct_level: Set correct level\n", + " [PASS] correct_role: Set correct role\n", + " [PASS] initiated_onboarding: Created onboarding request\n", + " [PASS] sequencing: Created employee before onboarding request\n" + ] + } + ], + "source": [ + "# Test on a medium task from our selected set\n", + "test_task = [p for p in test_prompts if p[\"difficulty\"] == \"medium\"][0]\n", + "print(f\"Task: {test_task['task_id']} (task_idx={test_task['task_idx']})\")\n", + "print(f\"Instruction: {test_task['prompt'][1]['content']}\\n\")\n", + "print(\"Model output:\")\n", + "print(\"-\" * 40)\n", + "\n", + "text = tokenizer.apply_chat_template(\n", + " test_task[\"prompt\"],\n", + " tokenize=False,\n", + " add_generation_prompt=True,\n", + ")\n", + "\n", + "from transformers import TextStreamer\n", + "\n", + "inputs = tokenizer(text, return_tensors=\"pt\").to(\"cuda\")\n", + "outputs = model.generate(\n", + " **inputs,\n", + " temperature=0.1,\n", + " max_new_tokens=512,\n", + " streamer=TextStreamer(tokenizer, skip_prompt=True),\n", + ")\n", + "\n", + "# Evaluate the output\n", + "response = 
tokenizer.decode(outputs[0][inputs[\"input_ids\"].shape[1]:], skip_special_tokens=True)\n", + "calls = extract_tool_calls(response)\n", + "print(f\"\\n\\nTool calls extracted: {[c['tool'] for c in calls]}\")\n", + "\n", + "if calls:\n", + "    eval_result, steps = replay_tool_calls(test_task[\"task_idx\"], calls)\n", + "    print(f\"\\nRubric score: {eval_result['score']:.0%} ({eval_result['passed_count']}/{eval_result['total_criteria']})\")\n", + "    print(f\"Passed: {eval_result['passed']}\")\n", + "    for c in eval_result[\"criteria_results\"]:\n", + "        print(f\" [{'PASS' if c['passed'] else 'FAIL'}] {c['name']}: {c['description']}\")" + ] + }, + { + "cell_type": "markdown", + "id": "66533682", + "metadata": {}, + "source": [ + "## Post-Training Evaluation\n", + "\n", + "Now we evaluate the trained model on both the train set and the **held-out test** set. Improvement on the test set indicates that the model learned **generalizable** HR workflow skills rather than memorizing the training tasks." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "b355bda6", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "==================================================\n", + "POST-TRAINING \u2014 TRAIN SET\n", + "==================================================\n", + " [FAIL] task_0035 [simple ] score=50% tools=['offboarding_get_status']\n", + " [OK] correct_tool: Used offboarding_get_status\n", + " [X ] correct_emp: Checked correct employee\n", + " [FAIL] task_0014 [simple ] score=50% tools=['it_get_available_assets']\n", + " [OK] checked_assets: Checked available assets\n", + " [X ] checked_licenses: Checked software licenses\n", + " [FAIL] task_0005 [simple ] score=0% tools=['hr_read_employee']\n", + " [X ] correct_tool: Used hr_search_employees\n", + " [X ] correct_dept: Filtered by correct department\n", + " [PASS] task_0010 [simple ] score=100% tools=['it_get_software_licenses', 'access_get_security_groups', 'access_assign_role', 
'access_revoke_role', 'email_send']\n", + " [OK] correct_tool: Used access_get_security_groups\n", + " [PASS] task_0006 [simple ] score=100% tools=['hr_get_org_chart']\n", + " [OK] correct_tool: Used hr_get_org_chart\n", + " [OK] correct_dept: Passed correct department\n", + " [FAIL] task_0036 [simple ] score=50% tools=['offboarding_get_status']\n", + " [OK] correct_tool: Used offboarding_get_status\n", + " [X ] correct_emp: Checked correct employee\n", + " [FAIL] task_0038 [simple ] score=50% tools=['offboarding_get_status']\n", + " [OK] correct_tool: Used offboarding_get_status\n", + " [X ] correct_emp: Checked correct employee\n", + " [PASS] task_0007 [simple ] score=100% tools=['it_get_available_assets']\n", + " [OK] correct_tool: Used it_get_available_assets\n", + " [OK] correct_type: Filtered by laptop type\n", + " [FAIL] task_0013 [simple ] score=0% tools=['hr_get_org_chart']\n", + " [X ] correct_tool: Used onboarding_get_status\n", + " [X ] correct_emp: Checked correct employee\n", + " [FAIL] task_0037 [simple ] score=50% tools=['offboarding_get_status']\n", + " [OK] correct_tool: Used offboarding_get_status\n", + " [X ] correct_emp: Checked correct employee\n", + " [FAIL] task_0011 [simple ] score=0% tools=['hr_get_org_chart']\n", + " [X ] correct_tool: Used onboarding_get_status\n", + " [X ] correct_emp: Checked correct employee\n", + " [FAIL] task_0002 [simple ] score=50% tools=['hr_read_employee']\n", + " [OK] correct_tool: Used hr_read_employee\n", + " [X ] correct_id: Passed correct emp_id\n", + " [FAIL] task_0012 [simple ] score=50% tools=['hr_get_org_chart', 'onboarding_get_status', 'access_get_security_groups', 'email_send']\n", + " [OK] correct_tool: Used onboarding_get_status\n", + " [X ] correct_emp: Checked correct employee\n", + " [FAIL] task_0046 [medium ] score=80% tools=['offboarding_create_request', 'it_revoke_access', 'email_send']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] correct_emp: Used correct employee 
ID\n", + " [OK] correct_reason: Set correct reason\n", + " [OK] revoked_access: Revoked IT access\n", + " [OK] notified: Sent notification\n", + " [PASS] task_0023 [medium ] score=100% tools=['hr_create_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [PASS] task_0018 [medium ] score=100% tools=['hr_create_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [FAIL] task_0041 [medium ] score=80% tools=['offboarding_create_request', 'it_revoke_access', 'email_send']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] correct_emp: Used correct employee ID\n", + " [OK] correct_reason: Set correct reason\n", + " [OK] revoked_access: Revoked IT access\n", + " [OK] notified: Sent notification\n", + " [FAIL] task_0040 [medium ] score=80% tools=['offboarding_create_request', 'it_revoke_access', 'email_send']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] correct_emp: Used correct employee ID\n", + " [OK] correct_reason: Set correct reason\n", + " [OK] revoked_access: Revoked IT access\n", + " [OK] notified: Sent notification\n", + " [FAIL] task_0045 [medium ] score=80% tools=['offboarding_create_request', 'it_revoke_access', 'email_send']\n", + " [OK] created_request: Created offboarding 
request\n", + " [X ] correct_emp: Used correct employee ID\n", + " [OK] correct_reason: Set correct reason\n", + " [OK] revoked_access: Revoked IT access\n", + " [OK] notified: Sent notification\n", + " [PASS] task_0016 [medium ] score=100% tools=['hr_create_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [FAIL] task_0073 [medium ] score=50% tools=['it_get_available_assets', 'it_get_available_software_licenses', 'access_assign_role', 'access_create_badge', 'email_send']\n", + " [OK] checked_assets: Checked available assets\n", + " [X ] checked_licenses: Checked software licenses\n", + " [PASS] task_0020 [medium ] score=100% tools=['hr_create_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [FAIL] task_0074 [medium ] score=50% tools=['it_get_available_assets', 'it_get_available_assets', 'it_get_available_assets', 'it_get_available_software_licenses', 'access_assign_role', 'access_create_badge', 'email_send']\n", + " [OK] checked_assets: Checked available assets\n", + " [X ] checked_licenses: Checked software licenses\n", + " [PASS] task_0017 [medium ] score=100% tools=['hr_create_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " 
[OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [PASS] task_0015 [medium ] score=100% tools=['hr_create_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [FAIL] task_0042 [medium ] score=80% tools=['offboarding_create_request', 'it_revoke_access', 'email_send']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] correct_emp: Used correct employee ID\n", + " [OK] correct_reason: Set correct reason\n", + " [OK] revoked_access: Revoked IT access\n", + " [OK] notified: Sent notification\n", + " [PASS] task_0019 [medium ] score=100% tools=['hr_create_employee', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'it_assign_asset', 'it_get_available_assets', 'access_assign_role', 'access_create_badge', 'access_revoke_role', 'email_send', 'meeting_schedule']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [FAIL] task_0053 [complex ] score=60% tools=['offboarding_create_request', 'offboarding_get_status', 'offboarding_create_role', 'access_revoke_role', 'access_get_security_groups', 'email_send', 
'slack_send_message', 'meeting_schedule', 'policy_lookup']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] revoked_it: Revoked IT access\n", + " [OK] farewell_email: Sent farewell email\n", + " [OK] farewell_slack: Sent farewell Slack message\n", + " [X ] completed_steps: Completed offboarding steps\n", + " [FAIL] task_0051 [complex ] score=67% tools=['it_revoke_access', 'it_get_software_licenses', 'access_revoke_role', 'access_get_security_groups', 'email_send', 'meeting_schedule', 'hr_update_employee', 'hr_get_org_chart']\n", + " [X ] created_request: Created offboarding request\n", + " [OK] revoked_it: Revoked IT access\n", + " [OK] revoked_roles: Revoked access roles\n", + " [OK] farewell: Sent farewell communication\n", + " [OK] exit_interview: Scheduled exit interview\n", + " [X ] completed_steps: Completed offboarding steps\n", + " [FAIL] task_0031 [complex ] score=80% tools=['hr_create_employee', 'onboarding_create_request', 'it_assign_asset', 'it_get_available_assets', 'access_assign_role', 'access_create_badge', 'email_send', 'meeting_schedule', 'policy_lookup', 'approval_request', 'access_get_security_groups']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] got_approval: Submitted approval request\n", + " [OK] assigned_asset: Assigned an asset\n", + " [X ] created_accounts: Created IT accounts\n", + " [OK] assigned_role: Assigned access role\n", + " [OK] created_badge: Created physical badge\n", + " [OK] sent_communications: Sent welcome communications\n", + " [OK] scheduled_meeting: Scheduled orientation\n", + " [X ] security_approval: Got security approval before badge\n", + " [FAIL] task_0027 [complex ] score=70% tools=['hr_create_employee', 'onboarding_create_request', 'it_assign_asset', 'access_assign_role', 'email_send', 'meeting_schedule']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created 
onboarding request\n", + " [OK] assigned_laptop: Assigned a laptop\n", + " [X ] created_accounts: Created IT accounts\n", + " [OK] assigned_access: Assigned access roles\n", + " [OK] sent_welcome: Sent welcome communication\n", + " [OK] scheduled_orientation: Scheduled orientation meeting\n", + " [OK] sequencing_create_first: Created employee before other steps\n", + " [X ] sequencing_asset_check: Checked available assets before assigning\n", + " [X ] completeness: Completed at least 3 onboarding steps\n", + " [FAIL] task_0032 [complex ] score=89% tools=['hr_create_employee', 'onboarding_create_request', 'it_assign_asset', 'it_get_available_assets', 'it_create_account', 'access_assign_role', 'access_create_badge', 'access_revoke_role', 'email_send', 'meeting_schedule']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [X ] got_approval: Submitted approval request\n", + " [OK] assigned_asset: Assigned an asset\n", + " [OK] created_accounts: Created IT accounts\n", + " [OK] assigned_role: Assigned access role\n", + " [OK] created_badge: Created physical badge\n", + " [OK] sent_communications: Sent welcome communications\n", + " [OK] scheduled_meeting: Scheduled orientation\n", + " [FAIL] task_0072 [complex ] score=80% tools=['hr_update_employee', 'onboarding_create_request', 'it_create_account', 'access_assign_role', 'email_send', 'it_get_software_licenses']\n", + " [X ] read_employee: Read employee record first\n", + " [OK] updated_status: Updated status to pending/active\n", + " [OK] new_onboarding: Created new onboarding request\n", + " [OK] provisioned_accounts: Created IT accounts\n", + " [OK] welcome_back: Sent welcome-back communication\n", + " [FAIL] task_0025 [complex ] score=70% tools=['hr_create_employee', 'onboarding_create_request', 'it_create_account', 'access_assign_role', 'access_create_badge', 'email_send', 'meeting_schedule']\n", + " [OK] created_employee: Created employee 
record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [X ] assigned_laptop: Assigned a laptop\n", + " [OK] created_accounts: Created IT accounts\n", + " [OK] assigned_access: Assigned access roles\n", + " [OK] sent_welcome: Sent welcome communication\n", + " [OK] scheduled_orientation: Scheduled orientation meeting\n", + " [OK] sequencing_create_first: Created employee before other steps\n", + " [X ] sequencing_asset_check: Checked available assets before assigning\n", + " [X ] completeness: Completed at least 3 onboarding steps\n", + " [FAIL] task_0068 [complex ] score=40% tools=['offboard', 'access_revoke', 'access_assign_role', 'email_send', 'meeting_schedule']\n", + " [X ] read_employee: Read employee record\n", + " [X ] revoked_old_access: Revoked old department access\n", + " [X ] updated_dept: Updated department\n", + " [OK] new_access: Assigned new department roles\n", + " [OK] notified_team: Notified new team\n", + " [FAIL] task_0054 [complex ] score=60% tools=['offboarding_create_request', 'offboarding_get_status', 'offboarding_create_account', 'it_revoke_access', 'it_get_software_licenses', 'access_revoke_role', 'access_get_security_groups', 'email_send', 'meeting_schedule']\n", + " [OK] created_request: Created offboarding request\n", + " [OK] revoked_it: Revoked IT access\n", + " [OK] farewell_email: Sent farewell email\n", + " [X ] farewell_slack: Sent farewell Slack message\n", + " [X ] completed_steps: Completed offboarding steps\n", + " [FAIL] task_0030 [complex ] score=80% tools=['hr_create_employee', 'onboarding_create_request', 'it_assign_asset', 'it_get_available_assets', 'access_assign_role', 'access_create_badge', 'email_send', 'meeting_schedule', 'policy_lookup', 'approval_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] got_approval: Submitted approval request\n", + " [OK] assigned_asset: Assigned an asset\n", + " [X ] 
created_accounts: Created IT accounts\n", + " [OK] assigned_role: Assigned access role\n", + " [OK] created_badge: Created physical badge\n", + " [OK] sent_communications: Sent welcome communications\n", + " [OK] scheduled_meeting: Scheduled orientation\n", + " [X ] security_approval: Got security approval before badge\n", + " [FAIL] task_0034 [complex ] score=80% tools=['hr_create_employee', 'onboarding_create_request', 'it_assign_asset', 'it_get_available_assets', 'access_assign_role', 'access_create_badge', 'access_revoke_role', 'email_send', 'meeting_schedule', 'policy_lookup', 'approval_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] got_approval: Submitted approval request\n", + " [OK] assigned_asset: Assigned an asset\n", + " [X ] created_accounts: Created IT accounts\n", + " [OK] assigned_role: Assigned access role\n", + " [OK] created_badge: Created physical badge\n", + " [OK] sent_communications: Sent welcome communications\n", + " [OK] scheduled_meeting: Scheduled orientation\n", + " [X ] security_approval: Got security approval before badge\n", + " [FAIL] task_0048 [complex ] score=50% tools=['hr_revoke_access', 'it_revoke_asset', 'access_revoke_role', 'access_get_security_groups', 'email_send', 'meeting_schedule', 'policy_lookup', 'onboarding_create_request', 'hr_update_employee']\n", + " [X ] created_request: Created offboarding request\n", + " [X ] revoked_it: Revoked IT access\n", + " [OK] revoked_roles: Revoked access roles\n", + " [OK] farewell: Sent farewell communication\n", + " [OK] exit_interview: Scheduled exit interview\n", + " [X ] completed_steps: Completed offboarding steps\n", + " [FAIL] task_0029 [complex ] score=70% tools=['hr_create_employee', 'onboarding_create_request', 'it_create_account', 'access_assign_role', 'access_create_badge', 'access_revoke_role', 'email_send', 'meeting_schedule']\n", + " [OK] created_employee: Created employee 
record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [X ] assigned_laptop: Assigned a laptop\n", + " [OK] created_accounts: Created IT accounts\n", + " [OK] assigned_access: Assigned access roles\n", + " [OK] sent_welcome: Sent welcome communication\n", + " [OK] scheduled_orientation: Scheduled orientation meeting\n", + " [OK] sequencing_create_first: Created employee before other steps\n", + " [X ] sequencing_asset_check: Checked available assets before assigning\n", + " [X ] completeness: Completed at least 3 onboarding steps\n", + " [FAIL] task_0070 [complex ] score=40% tools=['offboard', 'access_revoke', 'access_assign_role', 'email_send', 'meeting_schedule']\n", + " [X ] read_employee: Read employee record\n", + " [X ] revoked_old_access: Revoked old department access\n", + " [X ] updated_dept: Updated department\n", + " [OK] new_access: Assigned new department roles\n", + " [OK] notified_team: Notified new team\n", + " [FAIL] task_0071 [complex ] score=80% tools=['hr_update_employee', 'onboarding_create_request', 'it_create_account', 'access_assign_role', 'email_send', 'meeting_schedule']\n", + " [X ] read_employee: Read employee record first\n", + " [OK] updated_status: Updated status to pending/active\n", + " [OK] new_onboarding: Created new onboarding request\n", + " [OK] provisioned_accounts: Created IT accounts\n", + " [OK] welcome_back: Sent welcome-back communication\n", + " [FAIL] task_0050 [complex ] score=83% tools=['it_revoke_access', 'access_revoke_role', 'reassign_asset', 'access_get_security_groups', 'email_send', 'meeting_schedule', 'policy_lookup', 'access_get_software_licenses', 'onboarding_create_request', 'onboarding_get_status', 'onboarding_complete_step', 'offboarding_create_request', 'access_revoke_role', 'access_get_security_groups']\n", + " [OK] created_request: Created offboarding request\n", + " [OK] revoked_it: Revoked IT access\n", + " [OK] revoked_roles: Revoked access roles\n", + " [OK] farewell: Sent 
farewell communication\n", + " [OK] exit_interview: Scheduled exit interview\n", + " [X ] completed_steps: Completed offboarding steps\n", + " [FAIL] task_0077 [complex ] score=60% tools=['it_revoke_access', 'offboarding_create_request', 'email_send', 'meeting_schedule']\n", + " [X ] read_manager: Looked up manager info\n", + " [OK] offboarding: Created offboarding request\n", + " [X ] reassigned: Updated reports' manager\n", + " [OK] revoked_access: Revoked manager's access\n", + " [OK] notified_team: Notified team\n", + " [FAIL] task_0056 [edge_case ] score=50% tools=['hr_create_employee', 'hr_read_employee', 'onboarding_create_request']\n", + " [OK] attempted_create: Attempted to create employee\n", + " [X ] handled_limit: Recognized or handled headcount limit error\n", + " [FAIL] task_0059 [edge_case ] score=0% tools=['hr_search_employees']\n", + " [X ] checked_licenses: Checked licenses\n", + " [FAIL] task_0065 [edge_case ] score=50% tools=['access_assign_role']\n", + " [OK] attempted_assign: Attempted to assign role\n", + " [X ] handled_error: Recognized level requirement error\n", + " [FAIL] task_0066 [edge_case ] score=50% tools=['hr_assign_asset', 'access_assign_role', 'email_send']\n", + " [OK] attempted_assign: Attempted to assign role\n", + " [X ] handled_restriction: Recognized department restriction\n", + " [FAIL] task_0064 [edge_case ] score=25% tools=['hr_revoke_access', 'it_revoke_asset', 'onboarding_create_request', 'access_revoke_role', 'access_get_security_groups', 'email_send', 'meeting_schedule', 'policy_lookup', 'onboarding_get_status', 'offboarding_create_request', 'it_get_software_licenses']\n", + " [X ] created_request: Created offboarding with termination reason\n", + " [X ] revoked_access: Revoked all access\n", + " [OK] no_farewell: Did NOT send farewell communications\n", + " [X ] completed_steps: Completed termination steps\n", + " [FAIL] task_0058 [edge_case ] score=0% tools=['it_get_available_assets']\n", + " [X ] checked_licenses: 
Checked license availability\n", + " [X ] identified_full: Recognized licenses are full\n", + " [FAIL] task_0067 [edge_case ] score=0% tools=['hr_read_policy', 'hr_read_policy', 'explain_requirements', 'explain_requirements']\n", + " [X ] looked_up_badge: Looked up badge/access policy\n", + " [X ] multiple_lookups: Looked up multiple policies\n", + " [FAIL] task_0061 [edge_case ] score=25% tools=['hr_create_employee', 'onboarding_create_request', 'access_assign_role', 'it_assign_asset', 'access_create_badge', 'access_revoke_role', 'it_get_software_licenses', 'email_send', 'meeting_schedule', 'policy_lookup', 'approval_request', 'access_get_security_groups']\n", + " [X ] created_contractor: Created employee with is_contractor=true\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [X ] legal_approval: Got legal approval\n", + " [X ] limited_access: Created limited accounts\n", + "\n", + "Results: 10/52 passed (19.2%)\n", + "Mean score: 0.617\n", + " simple : 3/13 pass, score=0.50\n", + " medium : 7/14 pass, score=0.86\n", + " complex : 0/17 pass, score=0.68\n", + " edge_case : 0/8 pass, score=0.25\n", + "\n", + "==================================================\n", + "POST-TRAINING \u2014 TEST SET (held-out)\n", + "==================================================\n", + " [FAIL] task_0003 [simple ] score=50% tools=['hr_read_employee']\n", + " [OK] correct_tool: Used hr_read_employee\n", + " [X ] correct_id: Passed correct emp_id\n", + " [FAIL] task_0039 [simple ] score=50% tools=['offboarding_get_status']\n", + " [OK] correct_tool: Used offboarding_get_status\n", + " [X ] correct_emp: Checked correct employee\n", + " [FAIL] task_0008 [simple ] score=0% tools=['it_get_available_assets']\n", + " [X ] correct_tool: Used it_get_software_licenses\n", + " [X ] correct_software: Filtered by Jira\n", + " [PASS] task_0009 [simple ] score=100% tools=['policy_lookup']\n", + " [OK] correct_tool: Used policy_lookup\n", + " [OK] relevant_topic: Searched for 
onboarding topic\n", + " [FAIL] task_0001 [simple ] score=50% tools=['hr_read_employee']\n", + " [OK] correct_tool: Used hr_read_employee\n", + " [X ] correct_id: Passed correct emp_id\n", + " [FAIL] task_0004 [simple ] score=0% tools=['hr_read_employee']\n", + " [X ] correct_tool: Used hr_search_employees\n", + " [X ] correct_dept: Filtered by correct department\n", + " [PASS] task_0024 [medium ] score=100% tools=['hr_create_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [FAIL] task_0044 [medium ] score=80% tools=['offboarding_create_request', 'it_revoke_access', 'email_send']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] correct_emp: Used correct employee ID\n", + " [OK] correct_reason: Set correct reason\n", + " [OK] revoked_access: Revoked IT access\n", + " [OK] notified: Sent notification\n", + " [PASS] task_0022 [medium ] score=100% tools=['hr_create_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [FAIL] task_0043 [medium ] score=80% tools=['offboarding_create_request', 'it_revoke_access', 'email_send']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] correct_emp: Used correct employee ID\n", + " [OK] correct_reason: Set correct reason\n", + " [OK] revoked_access: 
Revoked IT access\n", + " [OK] notified: Sent notification\n", + " [FAIL] task_0075 [medium ] score=50% tools=['it_get_available_assets', 'it_get_available_software_licenses', 'access_assign_role', 'access_create_badge', 'email_send']\n", + " [OK] checked_assets: Checked available assets\n", + " [X ] checked_licenses: Checked software licenses\n", + " [PASS] task_0021 [medium ] score=100% tools=['hr_create_employee', 'onboarding_create_request']\n", + " [OK] created_employee: Created employee record\n", + " [OK] correct_name: Used correct name\n", + " [OK] correct_dept: Assigned to correct department\n", + " [OK] correct_level: Set correct level\n", + " [OK] correct_role: Set correct role\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] sequencing: Created employee before onboarding request\n", + " [FAIL] task_0047 [medium ] score=80% tools=['offboarding_create_request', 'it_revoke_access', 'email_send']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] correct_emp: Used correct employee ID\n", + " [OK] correct_reason: Set correct reason\n", + " [OK] revoked_access: Revoked IT access\n", + " [OK] notified: Sent notification\n", + " [FAIL] task_0055 [complex ] score=80% tools=['offboarding_create_request', 'it_revoke_access', 'it_get_software_licenses', 'access_revoke_role', 'email_send', 'meeting_schedule', 'slack_send_message', 'hr_update_employee', 'hr_get_org_chart', 'hr_search_employees', 'hr_get_org_chart', 'hr_update_employee']\n", + " [OK] created_request: Created offboarding request\n", + " [OK] revoked_it: Revoked IT access\n", + " [OK] farewell_email: Sent farewell email\n", + " [OK] farewell_slack: Sent farewell Slack message\n", + " [X ] completed_steps: Completed offboarding steps\n", + " [FAIL] task_0052 [complex ] score=60% tools=['offboarding_create_request', 'offboarding_get_status', 'offboarding_create_role', 'offboarding_revoke_access', 'access_assign_role', 'access_revoke_role', 
'access_get_security_groups', 'email_send', 'slack_send_message']\n", + " [OK] created_request: Created offboarding request\n", + " [X ] revoked_it: Revoked IT access\n", + " [OK] farewell_email: Sent farewell email\n", + " [OK] farewell_slack: Sent farewell Slack message\n", + " [X ] completed_steps: Completed offboarding steps\n", + " [FAIL] task_0026 [complex ] score=70% tools=['hr_create_employee', 'onboarding_create_request', 'it_create_account', 'access_assign_role', 'access_create_badge', 'email_send', 'meeting_schedule', 'access_revoke_role', 'it_get_software_licenses', 'access_assign_role', 'access_get_security_groups']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [X ] assigned_laptop: Assigned a laptop\n", + " [OK] created_accounts: Created IT accounts\n", + " [OK] assigned_access: Assigned access roles\n", + " [OK] sent_welcome: Sent welcome communication\n", + " [OK] scheduled_orientation: Scheduled orientation meeting\n", + " [OK] sequencing_create_first: Created employee before other steps\n", + " [X ] sequencing_asset_check: Checked available assets before assigning\n", + " [X ] completeness: Completed at least 3 onboarding steps\n", + " [FAIL] task_0033 [complex ] score=89% tools=['hr_create_employee', 'onboarding_create_request', 'it_assign_asset', 'it_get_available_assets', 'access_assign_role', 'access_create_badge', 'email_send', 'meeting_schedule', 'policy_lookup', 'approval_request', 'access_revoke_role']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [OK] got_approval: Submitted approval request\n", + " [OK] assigned_asset: Assigned an asset\n", + " [X ] created_accounts: Created IT accounts\n", + " [OK] assigned_role: Assigned access role\n", + " [OK] created_badge: Created physical badge\n", + " [OK] sent_communications: Sent welcome communications\n", + " [OK] scheduled_meeting: 
Scheduled orientation\n", + " [FAIL] task_0069 [complex ] score=40% tools=['offboard', 'access_revoke', 'access_assign_role', 'email_send', 'meeting_schedule', 'policy_lookup', 'access_assign_badge']\n", + " [X ] read_employee: Read employee record\n", + " [X ] revoked_old_access: Revoked old department access\n", + " [X ] updated_dept: Updated department\n", + " [OK] new_access: Assigned new department roles\n", + " [OK] notified_team: Notified new team\n", + " [FAIL] task_0076 [complex ] score=60% tools=['it_revoke_access', 'offboarding_create_request', 'email_send', 'meeting_schedule']\n", + " [X ] read_manager: Looked up manager info\n", + " [OK] offboarding: Created offboarding request\n", + " [X ] reassigned: Updated reports' manager\n", + " [OK] revoked_access: Revoked manager's access\n", + " [OK] notified_team: Notified team\n", + " [FAIL] task_0049 [complex ] score=83% tools=['it_revoke_access', 'it_get_software_licenses', 'access_assign_role', 'access_revoke_role', 'email_send', 'meeting_schedule', 'slack_send_message', 'hr_get_org_chart', 'onboarding_get_status', 'onboarding_complete_step', 'offboarding_create_request']\n", + " [OK] created_request: Created offboarding request\n", + " [OK] revoked_it: Revoked IT access\n", + " [OK] revoked_roles: Revoked access roles\n", + " [OK] farewell: Sent farewell communication\n", + " [OK] exit_interview: Scheduled exit interview\n", + " [X ] completed_steps: Completed offboarding steps\n", + " [FAIL] task_0028 [complex ] score=70% tools=['hr_create_employee', 'onboarding_create_request', 'it_create_account', 'access_assign_role', 'access_create_badge', 'email_send', 'meeting_schedule']\n", + " [OK] created_employee: Created employee record\n", + " [OK] initiated_onboarding: Created onboarding request\n", + " [X ] assigned_laptop: Assigned a laptop\n", + " [OK] created_accounts: Created IT accounts\n", + " [OK] assigned_access: Assigned access roles\n", + " [OK] sent_welcome: Sent welcome communication\n", + " 
[OK] scheduled_orientation: Scheduled orientation meeting\n", + " [OK] sequencing_create_first: Created employee before other steps\n", + " [X ] sequencing_asset_check: Checked available assets before assigning\n", + " [X ] completeness: Completed at least 3 onboarding steps\n", + " [FAIL] task_0063 [edge_case ] score=67% tools=['hr_update_employee', 'it_revoke_access', 'hr_create_request']\n", + " [X ] checked_onboarding: Checked onboarding status\n", + " [OK] revoked_access: Revoked any provisioned access\n", + " [OK] updated_status: Updated employee status to offboarded\n", + " [FAIL] task_0060 [edge_case ] score=0% tools=['hr_search_employees', 'hr_update_employee', 'onboarding_create_request']\n", + " [X ] looked_up_manager: Looked up the manager or org chart\n", + " [X ] found_skip_level: Identified skip-level manager\n", + " [X ] proceeded: Proceeded with onboarding\n", + " [FAIL] task_0062 [edge_case ] score=33% tools=['it_revoke_access', 'onboarding_complete_step', 'onboarding_get_status', 'access_revoke_role', 'email_send']\n", + " [X ] checked_employee: Looked up employee record\n", + " [X ] created_request: Created offboarding request\n", + " [OK] revoked_access: Revoked access\n", + " [FAIL] task_0057 [edge_case ] score=50% tools=['hr_create_employee', 'onboarding_create_request', 'onboarding_get_status']\n", + " [OK] attempted_create: Attempted to create employee\n", + " [X ] handled_limit: Recognized or handled headcount limit error\n", + "\n", + "Results: 4/25 passed (16.0%)\n", + "Mean score: 0.617\n", + " simple : 1/6 pass, score=0.42\n", + " medium : 3/7 pass, score=0.84\n", + " complex : 0/8 pass, score=0.69\n", + " edge_case : 0/4 pass, score=0.38\n", + "\n", + "==================================================\n", + "TRAIN SET IMPROVEMENT\n", + "==================================================\n", + "Pass rate: 8/52 \u2192 10/52 (15.4% \u2192 19.2%)\n", + "Mean score: 0.370 \u2192 0.617 (+0.247)\n", + " simple : 2/13 \u2192 3/13 pass, score 
0.23 \u2192 0.50\n", + " medium : 6/14 \u2192 7/14 pass, score 0.72 \u2192 0.86\n", + " complex : 0/17 \u2192 0/17 pass, score 0.26 \u2192 0.68\n", + " edge_case : 0/8 \u2192 0/8 pass, score 0.22 \u2192 0.25\n", + "\n", + "==================================================\n", + "TEST SET IMPROVEMENT (GENERALIZATION)\n", + "==================================================\n", + "Pass rate: 3/25 \u2192 4/25 (12.0% \u2192 16.0%)\n", + "Mean score: 0.370 \u2192 0.617 (+0.247)\n", + " simple : 0/6 \u2192 1/6 pass, score 0.17 \u2192 0.42\n", + " medium : 3/7 \u2192 3/7 pass, score 0.67 \u2192 0.84\n", + " complex : 0/8 \u2192 0/8 pass, score 0.30 \u2192 0.69\n", + " edge_case : 0/4 \u2192 0/4 pass, score 0.29 \u2192 0.38\n" + ] + } + ], + "source": [ + "# ============================================================\n", + "# POST-TRAINING EVALUATION\n", + "# ============================================================\n", + "\n", + "# Evaluate on TRAIN set\n", + "print(\"=\" * 50)\n", + "print(\"POST-TRAINING \u2014 TRAIN SET\")\n", + "print(\"=\" * 50)\n", + "trained_train = evaluate_model(model, tokenizer, prompts_list=train_prompts)\n", + "\n", + "# Evaluate on TEST set (held-out)\n", + "print(\"\\n\" + \"=\" * 50)\n", + "print(\"POST-TRAINING \u2014 TEST SET (held-out)\")\n", + "print(\"=\" * 50)\n", + "trained_test = evaluate_model(model, tokenizer, prompts_list=test_prompts)\n", + "\n", + "# ============================================================\n", + "# IMPROVEMENT SUMMARY\n", + "# ============================================================\n", + "def summarize(name, baseline, trained):\n", + " b_pass = sum(1 for r in baseline if r[\"passed\"])\n", + " t_pass = sum(1 for r in trained if r[\"passed\"])\n", + " b_score = sum(r[\"score\"] for r in baseline) / max(len(baseline), 1)\n", + " t_score = sum(r[\"score\"] for r in trained) / max(len(trained), 1)\n", + " print(f\"\\n{'=' * 50}\")\n", + " print(f\"{name}\")\n", + " print(f\"{'=' * 50}\")\n", + " 
print(f\"Pass rate: {b_pass}/{len(baseline)} \u2192 {t_pass}/{len(trained)} \"\n", + " f\"({b_pass/len(baseline):.1%} \u2192 {t_pass/len(trained):.1%})\")\n", + " print(f\"Mean score: {b_score:.3f} \u2192 {t_score:.3f} \"\n", + " f\"({'+'if t_score >= b_score else ''}{t_score - b_score:.3f})\")\n", + " for diff in [\"simple\", \"medium\", \"complex\", \"edge_case\"]:\n", + " b_sub = [r for r in baseline if r[\"difficulty\"] == diff]\n", + " t_sub = [r for r in trained if r[\"difficulty\"] == diff]\n", + " if b_sub:\n", + " bs = sum(r[\"score\"] for r in b_sub) / len(b_sub)\n", + " ts = sum(r[\"score\"] for r in t_sub) / len(t_sub)\n", + " bp = sum(1 for r in b_sub if r[\"passed\"])\n", + " tp = sum(1 for r in t_sub if r[\"passed\"])\n", + " print(f\" {diff:10s}: {bp}/{len(b_sub)} \u2192 {tp}/{len(t_sub)} pass, \"\n", + " f\"score {bs:.2f} \u2192 {ts:.2f}\")\n", + "\n", + "summarize(\"TRAIN SET IMPROVEMENT\", baseline_train, trained_train)\n", + "summarize(\"TEST SET IMPROVEMENT (GENERALIZATION)\", baseline_test, trained_test)" + ] + }, + { + "cell_type": "markdown", + "id": "188a9b69", + "metadata": {}, + "source": [ + "## Saving the Fine-tuned Model\n", + "\n", + "Save the trained model for later use or push to Hugging Face Hub:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "641436fa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Found HuggingFace hub cache directory: /home/jovyan/.cache/huggingface/hub\n", + "Checking cache directory for required files...\n", + "Cache check failed: model.safetensors not found in local cache.\n", + "Not all required files found in cache. Will proceed with downloading.\n", + "Checking cache directory for required files...\n", + "Cache check failed: tokenizer.model not found in local cache.\n", + "Not all required files found in cache. 
Will proceed with downloading.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Unsloth: Preparing safetensor model files: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 1/1 [00:00<00:00, 3300.00it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Note: tokenizer.model not found (this is OK for non-SentencePiece models)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Unsloth: Merging weights into 16bit: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 1/1 [00:03<00:00, 3.91s/it]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Unsloth: Merge process complete. Saved to `/home/jovyan/rl_hack/outputs/hr_agent_final`\n", + "Model saved to outputs/hr_agent_final\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b32d235e7ae94a8288026fcf56b31ce2", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Processing Files (0 / 0): | | 0.00B / 0.00B " + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "4992990ab81a4785a78815cf86c05c1a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "New Data Upload: | | 0.00B / 0.00B " + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Found HuggingFace hub cache directory: /home/jovyan/.cache/huggingface/hub\n", + "Checking cache directory for required files...\n", + "Cache check failed: model.safetensors not found in local cache.\n", + "Not all required files found in cache. Will proceed with downloading.\n", + "Checking cache directory for required files...\n", + "Cache check failed: tokenizer.model not found in local cache.\n", + "Not all required files found in cache. 
Will proceed with downloading.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Unsloth: Preparing safetensor model files: 0%| | 0/1 [00:00" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Saved to outputs/training_curves.png\n", + "\n", + "============================================================\n", + "FINAL RESULTS SUMMARY\n", + "============================================================\n", + "\n", + "Metric Baseline Trained Change\n", + "------------------------------------------------------------------\n", + "Train pass rate 15.4% 19.2% +3.8%\n", + "Train mean score 0.370 0.617 +0.247\n", + "Test pass rate (gen.) 12.0% 16.0% +4.0%\n", + "Test mean score (gen.) 0.370 0.617 +0.247\n", + "\n", + "Results saved to outputs/eval_results.json\n" + ] + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "\n", + "# Extract training logs from trainer\n", + "logs = trainer.state.log_history\n", + "\n", + "steps = [l[\"step\"] for l in logs if \"reward\" in l]\n", + "rewards = [l[\"reward\"] for l in logs if \"reward\" in l]\n", + "losses = [l[\"loss\"] for l in logs if \"loss\" in l]\n", + "loss_steps = [l[\"step\"] for l in logs if \"loss\" in l]\n", + "kl = [l[\"kl\"] for l in logs if \"kl\" in l]\n", + "kl_steps = [l[\"step\"] for l in logs if \"kl\" in l]\n", + "\n", + "# Trailing moving average of the reward over (at most) the last `window` logged steps\n", + "window = 10\n", + "reward_ma = [sum(rewards[max(0,i-window+1):i+1]) / len(rewards[max(0,i-window+1):i+1]) for i in range(len(rewards))]\n", + "\n", + "fig, axes = plt.subplots(1, 3, figsize=(18, 5))\n", + "\n", + "# Reward curve\n", + "axes[0].plot(steps, rewards, alpha=0.3, color=\"blue\", label=\"Per-step\")\n", + "axes[0].plot(steps, reward_ma, color=\"blue\", linewidth=2, label=f\"Moving avg ({window})\")\n", + "axes[0].set_xlabel(\"Training Step\")\n", + "axes[0].set_ylabel(\"Total Reward\")\n", + "axes[0].set_title(\"Reward Over
Training\")\n", + "axes[0].legend()\n", + "axes[0].grid(True, alpha=0.3)\n", + "\n", + "# Loss curve\n", + "axes[1].plot(loss_steps, losses, color=\"red\", alpha=0.7)\n", + "axes[1].set_xlabel(\"Training Step\")\n", + "axes[1].set_ylabel(\"Training Loss\")\n", + "axes[1].set_title(\"Training Loss\")\n", + "axes[1].grid(True, alpha=0.3)\n", + "\n", + "# KL divergence\n", + "axes[2].plot(kl_steps, kl, color=\"green\", alpha=0.7)\n", + "axes[2].set_xlabel(\"Training Step\")\n", + "axes[2].set_ylabel(\"KL Divergence\")\n", + "axes[2].set_title(\"KL Divergence from Base Policy\")\n", + "axes[2].grid(True, alpha=0.3)\n", + "\n", + "plt.tight_layout()\n", + "plt.savefig(\"outputs/training_curves.png\", dpi=150, bbox_inches=\"tight\")\n", + "plt.show()\n", + "print(\"Saved to outputs/training_curves.png\")\n", + "\n", + "# ============================================================\n", + "# FINAL SUMMARY TABLE\n", + "# ============================================================\n", + "print(\"\\n\" + \"=\" * 60)\n", + "print(\"FINAL RESULTS SUMMARY\")\n", + "print(\"=\" * 60)\n", + "\n", + "def score_of(results):\n", + " return sum(r[\"score\"] for r in results) / max(len(results), 1)\n", + "\n", + "def pass_rate(results):\n", + " return sum(1 for r in results if r[\"passed\"]) / max(len(results), 1)\n", + "\n", + "print(f\"\\n{'Metric':<30s} {'Baseline':>12s} {'Trained':>12s} {'Change':>12s}\")\n", + "print(\"-\" * 66)\n", + "print(f\"{'Train pass rate':<30s} {pass_rate(baseline_train):>11.1%} {pass_rate(trained_train):>11.1%} \"\n", + " f\"{pass_rate(trained_train) - pass_rate(baseline_train):>+11.1%}\")\n", + "print(f\"{'Train mean score':<30s} {score_of(baseline_train):>12.3f} {score_of(trained_train):>12.3f} \"\n", + " f\"{score_of(trained_train) - score_of(baseline_train):>+12.3f}\")\n", + "print(f\"{'Test pass rate (gen.)':<30s} {pass_rate(baseline_test):>11.1%} {pass_rate(trained_test):>11.1%} \"\n", + " f\"{pass_rate(trained_test) - 
pass_rate(baseline_test):>+11.1%}\")\n", + "print(f\"{'Test mean score (gen.)':<30s} {score_of(baseline_test):>12.3f} {score_of(trained_test):>12.3f} \"\n", + " f\"{score_of(trained_test) - score_of(baseline_test):>+12.3f}\")\n", + "\n", + "# Save all results\n", + "import json\n", + "import os\n", + "os.makedirs(\"outputs\", exist_ok=True)\n", + "all_results = {\n", + " \"baseline_train\": baseline_train,\n", + " \"baseline_test\": baseline_test,\n", + " \"trained_train\": trained_train,\n", + " \"trained_test\": trained_test,\n", + "}\n", + "with open(\"outputs/eval_results.json\", \"w\") as f:\n", + " json.dump(all_results, f, indent=2)\n", + "print(\"\\nResults saved to outputs/eval_results.json\")" + ] + }, + { + "cell_type": "markdown", + "id": "1b987501", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "In this tutorial, we trained an LLM to automate HR workflows using reinforcement learning. Key concepts:\n", + "\n", + "1. **OpenEnv** for standardized access to enterprise RL environments\n", + "2. **Rubric-based rewards** that verify tool usage, parameter correctness, and sequencing\n", + "3. **Multi-objective rewards** (valid JSON + rubric score + efficiency)\n", + "4. **GRPO** for policy optimization without a value network\n", + "5. **LoRA** for memory-efficient fine-tuning on consumer GPUs\n", + "6. 
**Proper train/test split** \u2014 70/30 stratified split to measure generalization\n", + "\n", + "### Key Results\n", + "\n", + "| Metric | Baseline | Trained | Change |\n", + "|--------|----------|---------|--------|\n", + "| Train pass rate | 15.4% | 19.2% | +3.8 pp |\n", + "| Train mean score | 0.370 | 0.617 | +0.247 (+67%) |\n", + "| **Test pass rate** | **12.0%** | **16.0%** | **+4.0 pp** |\n", + "| **Test mean score** | **0.370** | **0.617** | **+0.247 (+67%)** |\n", + "\n", + "Pass-rate changes are in percentage points (pp); mean-score changes are absolute, with the relative change in parentheses.\n", + "\n", + "### Improvement by Difficulty\n", + "\n", + "| Difficulty | Baseline Score | Trained Score | Change |\n", + "|------------|---------------|---------------|--------|\n", + "| Simple | 0.23 | 0.50 | +0.27 |\n", + "| Medium | 0.72 | 0.86 | +0.14 |\n", + "| Complex | 0.26 | 0.68 | **+0.42** |\n", + "| Edge case | 0.22 | 0.25 | +0.03 |\n", + "\n", + "The biggest improvement is on **complex tasks**, where the mean score more than doubled (0.26 \u2192 0.68), and the gains carry over to the **held-out test tasks**, suggesting that the model learned transferable HR workflow skills rather than memorizing the training tasks.\n", + "\n", + "### Resources\n", + "\n", + "- [HR Environment on HF Spaces](https://huggingface.co/spaces/devxpy/rl_hack)\n", + "- [OpenEnv Documentation](https://github.com/meta-pytorch/OpenEnv)\n", + "- [TRL GRPO Trainer](https://huggingface.co/docs/trl/main/en/grpo_trainer)\n", + "- [Unsloth RL Guide](https://docs.unsloth.ai/get-started/reinforcement-learning-rl-guide)\n", + "\n", + "---\n", + "\n", + "*This notebook uses [Unsloth](https://github.com/unslothai/unsloth) for memory-efficient training.*\n", + "\n", + "**License:** Apache 2.0" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + 
"codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.5"
+ } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file