title: HR Onboarding & Offboarding Environment
emoji: 🏒
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
app_port: 7860
base_path: /playground
tags:
  - openenv

HR Onboarding & Offboarding Environment

Open In Colab

An OpenEnv-compatible RL environment that simulates enterprise HR onboarding and offboarding workflows. The agent orchestrates across 6 enterprise apps (Workday, ServiceNow, Okta, Email, Slack, and Calendar) using 25 tools to complete multi-step tasks in a realistic HR system (200+ employees, 8 departments, RBAC, approval chains).

Built for the OpenEnv Hackathon SF, Statement 3.1: Professional Tasks (Scaler AI Labs partner theme: Multi-App RL Environment for Enterprise Workflows).

Key Results

GRPO training on Llama 3.2-1B-Instruct improves the mean task score by 67% (0.37 → 0.62). Scores on complex multi-step tasks more than double (0.26 → 0.68). The gains carry over to held-out test tasks.

| Metric | Baseline | Trained | Improvement |
| --- | --- | --- | --- |
| Mean Score | 0.370 | 0.617 | +67% |
| Complex Tasks | 0.26 | 0.68 | +162% |
| Pass Rate | 15.4% | 19.2% | +3.8pp |
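
The Improvement column mixes two kinds of change: relative gain for scores and percentage points for the pass rate. A minimal sketch of that arithmetic, using only the numbers in the table (the helper names are illustrative, not part of the repo):

```python
def relative_gain(baseline: float, trained: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return (trained - baseline) / baseline * 100.0


def point_gain(baseline_pct: float, trained_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return trained_pct - baseline_pct


mean_gain = relative_gain(0.370, 0.617)    # mean score row
complex_gain = relative_gain(0.26, 0.68)   # complex tasks row
pass_gain = point_gain(15.4, 19.2)         # pass rate row

print(f"{mean_gain:.0f}% {complex_gain:.0f}% {pass_gain:.1f}pp")
```

Rounded to whole percents, the first two values match the +67% and +162% entries; the pass-rate change is a difference of percentages, which is why it is reported in pp.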

Quick Start

from rl_hack import HROnboardingAction, HROnboardingEnv

# Connect to the environment
with HROnboardingEnv(base_url="http://localhost:7860") as env:
    result = env.reset()
    print(result.observation)  # Task instruction + available tools

    # Agent calls tools to complete the task
    result = env.step(HROnboardingAction(
        tool_name="hr_create_employee",
        arguments={"name": "Priya Sharma", "department": "Engineering", "level": "L2", "role": "Software Engineer"}
    ))
    print(result.observation)  # Tool result
    print(result.reward)       # Rubric-based reward

Tools / Actions (25 MCP Tools)

The agent interacts with the environment by calling these tools. Each tool modifies the world state and returns a result.

HR System (5 tools)

| # | Tool | Description | Key Parameters |
| --- | --- | --- | --- |
| 1 | hr_create_employee | Create a new employee record | name, department, level, role, manager_id, is_contractor |
| 2 | hr_read_employee | Look up employee by ID or email | emp_id or email |
| 3 | hr_update_employee | Update employee fields (status, department, etc.) | emp_id, updates (dict) |
| 4 | hr_search_employees | Search/filter employees by criteria | department, level, status, location, role |
| 5 | hr_get_org_chart | Get reporting hierarchy for a department | department |

Onboarding / Offboarding (6 tools)

| # | Tool | Description | Key Parameters |
| --- | --- | --- | --- |
| 6 | onboarding_create_request | Initiate onboarding for a new hire | employee_id |
| 7 | onboarding_get_status | Check onboarding progress | request_id or employee_id |
| 8 | onboarding_complete_step | Mark an onboarding step as done | request_id, step |
| 9 | offboarding_create_request | Initiate offboarding for departing employee | employee_id, reason, exit_date |
| 10 | offboarding_get_status | Check offboarding progress | request_id or employee_id |
| 11 | offboarding_complete_step | Mark an offboarding step as done | request_id, step |

IT Provisioning (5 tools)

| # | Tool | Description | Key Parameters |
| --- | --- | --- | --- |
| 12 | it_assign_asset | Assign laptop/monitor/phone to employee | asset_id, employee_id |
| 13 | it_get_available_assets | List unassigned assets by type | asset_type (laptop, monitor, phone, headset) |
| 14 | it_create_account | Create email/Slack/VPN/GitHub accounts | employee_id, account_types |
| 15 | it_revoke_access | Revoke all IT access (for offboarding) | employee_id |
| 16 | it_get_software_licenses | Check license seat availability | software_name |

Access Control (4 tools)

| # | Tool | Description | Key Parameters |
| --- | --- | --- | --- |
| 17 | access_assign_role | Assign RBAC role (checks level/dept restrictions) | employee_id, role_id |
| 18 | access_create_badge | Create physical access badge | employee_id, access_zones |
| 19 | access_revoke_role | Revoke a specific access role | employee_id, role_id |
| 20 | access_get_security_groups | List all security groups and resources | (none) |

Communication (3 tools)

| # | Tool | Description | Key Parameters |
| --- | --- | --- | --- |
| 21 | email_send | Send email (welcome, farewell, notifications) | from_address, to_address, subject, body |
| 22 | slack_send_message | Post in Slack channel or DM | channel, sender, text |
| 23 | meeting_schedule | Schedule orientation, 1-on-1, exit interview | title, attendees, datetime, meeting_type |

Policy & Approval (2 tools)

| # | Tool | Description | Key Parameters |
| --- | --- | --- | --- |
| 24 | policy_lookup | Look up company policies by topic/department | topic, department, policy_id |
| 25 | approval_request | Submit approval (manager/IT/security/legal) | request_id, approver_id, approval_type |
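
To make "each tool modifies the world state and returns a result" concrete, here is a minimal sketch of a name-to-function tool registry. It is not the code in server/tools.py: the `tool` decorator, `call_tool` dispatcher, the dict-shaped world, and the error strings are all assumptions made for illustration. Only the tool names and parameter names come from the tables above.

```python
from typing import Any, Callable, Dict

World = Dict[str, Any]
ToolFn = Callable[..., Dict[str, Any]]

TOOLS: Dict[str, ToolFn] = {}  # hypothetical registry: tool name -> function


def tool(name: str) -> Callable[[ToolFn], ToolFn]:
    """Register a function under a tool name."""
    def register(fn: ToolFn) -> ToolFn:
        TOOLS[name] = fn
        return fn
    return register


@tool("hr_create_employee")
def hr_create_employee(world: World, name: str, department: str,
                       level: str, role: str) -> Dict[str, Any]:
    # Mutates the world state and returns a result dict.
    emp_id = f"emp_{len(world['employees']) + 1:04d}"
    world["employees"][emp_id] = {
        "name": name, "department": department, "level": level, "role": role,
    }
    return {"success": True, "emp_id": emp_id}


@tool("hr_read_employee")
def hr_read_employee(world: World, emp_id: str) -> Dict[str, Any]:
    employee = world["employees"].get(emp_id)
    if employee is None:
        return {"success": False, "error": "employee_not_found"}  # made-up error code
    return {"success": True, "employee": employee}


def call_tool(world: World, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch one tool call; bad names or arguments become error results."""
    fn = TOOLS.get(tool_name)
    if fn is None:
        return {"success": False, "error": f"unknown_tool:{tool_name}"}
    try:
        return fn(world, **arguments)
    except TypeError as exc:  # missing or unexpected keyword arguments
        return {"success": False, "error": f"bad_arguments:{exc}"}


world: World = {"employees": {}}
created = call_tool(world, "hr_create_employee", {
    "name": "Priya Sharma", "department": "Engineering",
    "level": "L2", "role": "Software Engineer",
})
looked_up = call_tool(world, "hr_read_employee", {"emp_id": created["emp_id"]})
```

Returning an error result instead of raising keeps a malformed agent call from ending the episode, which is one plausible design for an RL environment where bad calls should simply earn no credit.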

Tasks (77 tasks across 4 categories)

Each episode presents one task. The agent must call the right tools in the right order.

Task Categories

| Category | Count | Example |
| --- | --- | --- |
| Lookup (simple) | 11 | "List all employees in the Engineering department" |
| Onboarding | 32 | "Fully onboard John Lee as L3 Team Lead in Data Science: create record, assign laptop, provision accounts, set up access, send welcome email, schedule orientation" |
| Offboarding | 24 | "Offboard departing director: revoke all access, reclaim assets, reassign reports, send farewell, schedule exit interview" |
| Cross-workflow | 10 | "Employee transferring from Engineering to Product: offboard from old dept, onboard to new" |

Difficulty Levels

| Difficulty | Count | Tools per task | Description |
| --- | --- | --- | --- |
| Simple | 19 | 1-2 | Single lookups or status checks |
| Medium | 21 | 2-4 | Create + initiate workflows |
| Complex | 25 | 5-10 | Full end-to-end workflows with approvals |
| Edge case | 12 | 2-5 | Business rule violations, policy constraints |

Edge Cases (designed to test policy compliance)

  • Department at headcount limit: create employee should fail
  • Software license seats full (Netsuite, LinkedIn Sales Navigator)
  • Manager on leave β€” must find skip-level manager for approvals
  • Contractor onboarding β€” different rules (no VPN, limited access, legal approval required)
  • Termination vs resignation β€” different offboarding steps, no farewell email
  • Offer rescinded β€” offboard someone mid-onboarding
  • Level mismatch β€” L1 employee can't get L4+ access roles
  • Department restriction β€” Marketing employee can't get Engineering GitHub role

World State (500+ entities)

| Entity | Count | Description |
| --- | --- | --- |
| Employees | 200 | Full org hierarchy across 8 departments (L1-L6) |
| Departments | 8 | Engineering, Product, Marketing, Sales, Finance, HR, Data Science, Security |
| IT Assets | 100 | Laptops (50), monitors (25), phones (15), headsets (10) |
| Access Roles | 20 | RBAC roles with level/department restrictions |
| Software Licenses | 15 | Jira, GitHub, AWS, Slack, Salesforce, etc. (2 intentionally full) |
| Policies | 15 | Onboarding, offboarding, badge access, contractor, termination, etc. |
| Security Groups | 15 | engineering_team, vpn_users, server_room_access, etc. |
| Message Templates | 12 | Welcome/farewell emails, Slack messages, notifications |

RBAC Rules

  • L1 Associate → L2 Senior → L3 Team Lead → L4 Manager → L5 Director → L6 VP
  • L3+ can approve onboarding
  • L4+ required for security approvals and server room badge access
  • Contractors require legal approval
  • Access roles have minimum level requirements and department restrictions
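
The rules above can be read as comparisons on an ordered level scale. A minimal sketch, assuming roles carry a minimum level and an optional list of allowed departments (the function names and argument shapes are illustrative, not the API of server/world.py):

```python
from typing import Optional, Sequence

LEVELS = ("L1", "L2", "L3", "L4", "L5", "L6")

# Minimum approver level per approval type, taken from the rules above.
MIN_APPROVER_LEVEL = {"onboarding": "L3", "security": "L4"}


def rank(level: str) -> int:
    """Position of a level on the L1..L6 ladder."""
    return LEVELS.index(level)


def can_approve(approver_level: str, approval_type: str) -> bool:
    required = MIN_APPROVER_LEVEL.get(approval_type)
    if required is None:
        return False  # this sketch rejects approval types it has no rule for
    return rank(approver_level) >= rank(required)


def can_assign_role(emp_level: str, emp_department: str, min_level: str,
                    allowed_departments: Optional[Sequence[str]] = None) -> bool:
    """Check a role's minimum level and department restriction."""
    if rank(emp_level) < rank(min_level):
        return False
    if allowed_departments is not None and emp_department not in allowed_departments:
        return False
    return True


# L1 employee asking for an L4+ role, and a Marketing employee asking for
# an Engineering-only role (both listed as edge cases in this README).
level_mismatch = can_assign_role("L1", "Engineering", min_level="L4")
dept_mismatch = can_assign_role("L2", "Marketing", "L1", allowed_departments=("Engineering",))
```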

Reward / Rubric

Each task has a rubric with verifiable criteria. Reward = proportion of criteria satisfied.

Rubric Check Types

| Check | Example | What it verifies |
| --- | --- | --- |
| tool_used | tool_used:hr_create_employee | Tool was called at least once |
| tool_not_used | tool_not_used:slack_send_message | Tool was NOT called (e.g. no farewell for terminations) |
| tool_used_any | tool_used_any:email_send,slack_send_message | At least one of the tools was used |
| param_value | param_value:hr_create_employee.name=Priya Sharma | Tool called with specific parameter value |
| param_contains | param_contains:policy_lookup.topic=onboard | Parameter contains substring |
| tool_order | tool_order:hr_create_employee<onboarding_create_request | Tool A called before Tool B |
| tool_count | tool_count:onboarding_complete_step>=3 | Tool called at least N times |
| result_contains | result_contains:headcount_limit | Any tool result contains substring |

Example Rubric (medium task)

Task: "Onboard Priya Sharma to Engineering as L2 Software Engineer"

| Criterion | Check |
| --- | --- |
| Created employee record | tool_used:hr_create_employee |
| Correct name | param_value:hr_create_employee.name=Priya Sharma |
| Correct department | param_value:hr_create_employee.department=Engineering |
| Correct level | param_value:hr_create_employee.level=L2 |
| Correct role | param_value:hr_create_employee.role=Software Engineer |
| Initiated onboarding | tool_used:onboarding_create_request |
| Correct sequencing | tool_order:hr_create_employee<onboarding_create_request |

Score: 7/7 = 1.0 (pass) or partial (e.g. 5/7 = 0.71)
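
The check strings are compact enough to evaluate with plain string parsing. The following is a sketch of such an evaluator, not the implementation in server/rubrics.py: the call-log shape (`tool`, `params`, `result`), the first-occurrence reading of tool_order, and case-sensitive matching are all assumptions.

```python
import json
from typing import Any, Dict, List

Call = Dict[str, Any]  # assumed log entry: {"tool": str, "params": dict, "result": dict}


def check(criterion: str, calls: List[Call]) -> bool:
    """Evaluate one rubric check string against the episode's tool calls."""
    kind, _, spec = criterion.partition(":")
    names = [c["tool"] for c in calls]
    if kind == "tool_used":
        return spec in names
    if kind == "tool_not_used":
        return spec not in names
    if kind == "tool_used_any":
        return any(t in names for t in spec.split(","))
    if kind in ("param_value", "param_contains"):
        target, _, expected = spec.partition("=")
        tool, _, param = target.partition(".")
        values = [str(c["params"][param]) for c in calls
                  if c["tool"] == tool and param in c["params"]]
        if kind == "param_value":
            return expected in values
        return any(expected in v for v in values)
    if kind == "tool_order":
        first, _, second = spec.partition("<")
        # Assumption: compare the first call of each tool.
        return (first in names and second in names
                and names.index(first) < names.index(second))
    if kind == "tool_count":
        tool, _, minimum = spec.partition(">=")
        return names.count(tool) >= int(minimum)
    if kind == "result_contains":
        return any(spec in json.dumps(c.get("result", {})) for c in calls)
    raise ValueError(f"unknown check type: {kind}")


def score(rubric: List[str], calls: List[Call]) -> float:
    """Reward = proportion of criteria satisfied."""
    return sum(check(c, calls) for c in rubric) / len(rubric)


RUBRIC = [
    "tool_used:hr_create_employee",
    "param_value:hr_create_employee.name=Priya Sharma",
    "param_value:hr_create_employee.department=Engineering",
    "param_value:hr_create_employee.level=L2",
    "param_value:hr_create_employee.role=Software Engineer",
    "tool_used:onboarding_create_request",
    "tool_order:hr_create_employee<onboarding_create_request",
]


def episode(level: str, role: str) -> List[Call]:
    return [
        {"tool": "hr_create_employee",
         "params": {"name": "Priya Sharma", "department": "Engineering",
                    "level": level, "role": role},
         "result": {"success": True, "emp_id": "emp_0201"}},
        {"tool": "onboarding_create_request",
         "params": {"employee_id": "emp_0201"}, "result": {"success": True}},
    ]


full = score(RUBRIC, episode("L2", "Software Engineer"))  # all 7 criteria met
partial = score(RUBRIC, episode("L3", "Engineer"))        # level and role wrong
```

With the wrong level and role, five of the seven criteria still hold, which gives the 5/7 partial score from the example above.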

Environment API

OpenEnv Interface (MCPEnvironment)

reset()  → Observation   # Pick task, reset world state, return instruction
step()   → Observation   # Agent calls a tool, get result + reward
state    → State         # Current step count, episode ID

Episode Flow

1. env.reset()
   → Task: "Fully onboard John Lee as L3 Team Lead..."

2. Agent calls: hr_create_employee(name="John Lee", department="Data Science", level="L3", ...)
   → env.step() → {"success": true, "emp_id": "emp_0201"}

3. Agent calls: onboarding_create_request(employee_id="emp_0201")
   → env.step() → {"success": true, "request_id": "onb_0001", "steps": {...}}

4. Agent calls: it_get_available_assets(asset_type="laptop")
   → env.step() → {"success": true, "assets": [...]}

5. Agent calls: it_assign_asset(asset_id="asset_003", employee_id="emp_0201")
   → env.step() → {"success": true}

   ... more tool calls ...

N. Episode ends (max 15 steps or agent signals done)
   → Reward: 8/10 criteria satisfied = 0.8
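
The flow above is a bounded loop: reset, then step until the task is done, the agent stops, or the 15-step cap is hit. The sketch below runs that loop against an offline stand-in so it works without a server. `StubEnv`, `Action`, `StepResult`, and the `done` flag are hypothetical; the real client takes an `HROnboardingAction` and scores with the rubric, while the stub only counts which required tools were called.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

MAX_STEPS = 15  # step cap stated in the episode flow


@dataclass
class Action:  # mirrors the tool_name/arguments fields of HROnboardingAction
    tool_name: str
    arguments: Dict[str, Any] = field(default_factory=dict)


@dataclass
class StepResult:  # hypothetical result shape for this sketch
    observation: Any
    reward: float = 0.0
    done: bool = False


class StubEnv:
    """Offline stand-in: reward is the fraction of required tools called."""

    def __init__(self, required_tools: List[str]) -> None:
        self.required_tools = required_tools
        self.used: List[str] = []

    def reset(self) -> StepResult:
        self.used = []
        return StepResult(observation="Task: call every required tool")

    def step(self, action: Action) -> StepResult:
        self.used.append(action.tool_name)
        hits = sum(tool in self.used for tool in self.required_tools)
        reward = hits / len(self.required_tools)
        return StepResult({"success": True}, reward, done=reward == 1.0)


Policy = Callable[[Any, int], Optional[Action]]


def run_episode(env: StubEnv, policy: Policy, max_steps: int = MAX_STEPS) -> Tuple[float, int]:
    """Run one episode; the policy returns None to signal it is done."""
    result = env.reset()
    steps = 0
    while steps < max_steps and not result.done:
        action = policy(result.observation, steps)
        if action is None:
            break
        result = env.step(action)
        steps += 1
    return result.reward, steps


PLAN = [
    Action("hr_create_employee", {"name": "John Lee"}),
    Action("onboarding_create_request", {"employee_id": "emp_0201"}),
]


def scripted(observation: Any, step: int) -> Optional[Action]:
    return PLAN[step] if step < len(PLAN) else None


def stuck(observation: Any, step: int) -> Optional[Action]:
    return Action("hr_read_employee", {"emp_id": "emp_0001"})  # never makes progress


env = StubEnv(["hr_create_employee", "onboarding_create_request"])
good_run = run_episode(env, scripted)
capped_run = run_episode(env, stuck)
```

The `stuck` policy shows why the cap matters: an agent that repeats an unhelpful call is cut off after 15 steps with whatever reward it has earned.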

Project Structure

rl_hack/
├── README.md                          # This file
├── openenv.yaml                       # OpenEnv manifest
├── pyproject.toml                     # Project metadata
├── __init__.py                        # Module exports
├── client.py                          # HROnboardingEnv client
├── models.py                          # Action/Observation Pydantic models
├── test_with_llm.py                   # Test single task with GPT agent
├── test_all_tasks.py                  # Evaluate all 77 tasks
├── train_hr_agent.ipynb               # GRPO training notebook (Unsloth)
├── .env                               # API keys (gitignored)
├── outputs/                           # Evaluation results
└── server/
    ├── __init__.py
    ├── app.py                         # FastAPI application
    ├── hr_onboarding_environment.py   # Core environment (Environment subclass)
    ├── world.py                       # World state (entities, RBAC, mutations)
    ├── tools.py                       # Tool registry (25 tools)
    ├── tasks.py                       # Task definitions + generation (77 tasks)
    ├── rubrics.py                     # Rubric evaluator (reward computation)
    ├── data/
    │   ├── employees.json             # 200 employee records
    │   ├── departments.json           # 8 departments with policies
    │   ├── policies.json              # 15 business rule documents
    │   ├── it_assets.json             # 100 IT assets
    │   ├── access_roles.json          # 20 RBAC roles
    │   └── templates.json             # 12 message templates
    ├── Dockerfile                     # Container image
    └── requirements.txt               # Server dependencies

Testing with an LLM Agent

You can test the environment locally using GPT (or any OpenAI-compatible model) as the agent.

Setup

  1. Create a .env file in the repo root:

    OPENAI_API_KEY="sk-proj-..."
    
  2. Install dependencies:

    uv pip install -e ".[eval]"
    

Run

cd rl_hack

# Test on default task (simple lookup)
uv run python -m test_with_llm

# Test a specific task by index (0-76)
uv run python -m test_with_llm 14    # medium onboarding task
uv run python -m test_with_llm 24    # complex full onboarding
uv run python -m test_with_llm 55    # edge case (headcount limit)

# Run full evaluation across all 77 tasks
uv run python test_all_tasks.py

The script will:

  • Reset the environment and pick a task
  • Use GPT-4o-mini to generate tool calls
  • Execute each tool call against the environment
  • Print the rubric evaluation with pass/fail per criterion
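
The example output further down shows the model replying with JSON of the form {"tool": ..., "params": {...}}. A minimal sketch of turning such a reply into a tool call, tolerating extra text around the JSON (`parse_tool_call` is a hypothetical helper, not a function from test_with_llm.py):

```python
import json
from typing import Any, Dict, Optional, Tuple


def parse_tool_call(text: str) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Extract (tool, params) from an LLM reply, or None if it is not usable."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        payload = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    tool = payload.get("tool")
    params = payload.get("params", {})
    if not isinstance(tool, str) or not isinstance(params, dict):
        return None
    return tool, params


reply = 'Next step: {"tool": "onboarding_create_request", "params": {"employee_id": "emp_0201"}}'
parsed = parse_tool_call(reply)
```

Returning None for malformed replies lets the caller decide whether to retry, skip the step, or end the episode.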

Example Output

Task ID: task_0015
Difficulty: medium
Instruction: Onboard new hire Priya Sharma to Engineering as L2 Software Engineer...

--- Step 1/15 ---
LLM: {"tool": "hr_create_employee", "params": {"name": "Priya Sharma", ...}}
  Tool: hr_create_employee
  Result: {"success": true, "employee": {"emp_id": "emp_0201", ...}}

--- Step 2/15 ---
LLM: {"tool": "onboarding_create_request", "params": {"employee_id": "emp_0201"}}
  Tool: onboarding_create_request
  Result: {"success": true, ...}

FINAL EVALUATION
Score: 100% (7/7 criteria)
Passed: True
  [PASS] created_employee
  [PASS] correct_name
  [PASS] correct_dept
  [PASS] initiated_onboarding
  [PASS] sequencing

Task Index Reference

| Index | Difficulty | Category | Description |
| --- | --- | --- | --- |
| 0-13 | Simple | Lookup/Onboarding | Single lookups, status checks |
| 14-23 | Medium | Onboarding | Create employee + initiate workflow |
| 24-34 | Complex | Onboarding | Full end-to-end with IT, access, comms |
| 35-46 | Medium | Offboarding | Initiate offboarding + revoke access |
| 47-54 | Complex | Offboarding | Full offboarding with asset reclaim |
| 55-66 | Edge case | Various | Headcount limits, license caps, RBAC |
| 67-76 | Complex | Cross-workflow | Transfers, rehires, manager departures |

Installation

# Clone the repo
git clone https://github.com/ravi03071991/rl_hack.git
cd rl_hack

# Install core dependencies
uv pip install -e .

# Install with evaluation support (adds openai)
uv pip install -e ".[eval]"

# Install with training support (adds unsloth, trl, torch, etc.)
uv pip install -e ".[train]"

# Install everything
uv pip install -e ".[eval,train,dev]"

Building & Running

# Run locally (as OpenEnv HTTP server with playground UI)
uvicorn server.app:app --reload --host 0.0.0.0 --port 7860

# Build Docker image
docker build -t hr-onboarding-env:latest -f server/Dockerfile .

# Deploy to HF Spaces
openenv push

Training & Results

We use Unsloth + GRPO to train an LLM agent on this environment. See train_hr_agent.ipynb for the full training notebook and the W&B run for live training metrics.

Setup

  • Model: Llama 3.2-1B-Instruct (4-bit quantized, LoRA rank 8)
  • Algorithm: GRPO (Group Relative Policy Optimization)
  • Reward functions: Valid JSON + rubric score + efficiency
  • Training: 300 steps, 6 generations per prompt, lr=5e-5 with cosine schedule
  • Data split: 70/30 stratified train/test (52 train, 25 test tasks)

Results

GRPO training improves the model's ability to complete HR workflows, with large gains in mean score and smaller gains in pass rate:

| Metric | Base Model | Trained | Change |
| --- | --- | --- | --- |
| Train pass rate | 15.4% | 19.2% | +3.8pp |
| Train mean score | 0.370 | 0.617 | +0.247 (+67%) |
| Test pass rate | 12.0% | 16.0% | +4.0pp |
| Test mean score | 0.370 | 0.617 | +0.247 (+67%) |

Improvement by difficulty

| Difficulty | Baseline | Trained | Change |
| --- | --- | --- | --- |
| Simple | 0.23 | 0.50 | +0.27 |
| Medium | 0.72 | 0.86 | +0.14 |
| Complex | 0.26 | 0.68 | +0.42 |
| Edge case | 0.22 | 0.25 | +0.03 |

The biggest gains are on complex multi-step tasks, where scores more than doubled. The improvement carries over to held-out test tasks, which suggests the model learned transferable HR workflow skills.

Reward Curve

[Figure: reward curve (moving average of training reward)]

The moving average reward trends upward from ~2-3 early in training to ~4-5 by the end, showing consistent learning.
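
For reference, a moving average like the one described can be computed from per-step rewards as follows. This is a generic sketch: the window size used for the actual plot is not stated here, so `window` is a free parameter and the sample values are invented.

```python
from typing import List


def moving_average(values: List[float], window: int) -> List[float]:
    """Trailing moving average; early points average over what is available."""
    if window <= 0:
        raise ValueError("window must be positive")
    smoothed: List[float] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


smoothed = moving_average([1, 2, 3, 4], window=2)
```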

Quick start (Colab)

  1. Click the Colab badge at the top to open train_hr_agent.ipynb in Google Colab
  2. Select a GPU runtime
  3. Run all cells: this installs dependencies, trains, and evaluates automatically

Live Demo

Try the environment on Hugging Face Spaces: https://huggingface.co/spaces/devxpy/rl_hack