---
title: HR Onboarding & Offboarding Environment
emoji: π’
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
app_port: 7860
base_path: /playground
tags:
  - openenv
---

# HR Onboarding & Offboarding Environment

[Open in Colab](https://colab.research.google.com/github/ravi03071991/rl_hack/blob/master/train_hr_agent.ipynb)

An OpenEnv-compatible RL environment that simulates enterprise HR onboarding and offboarding workflows. The agent orchestrates across **6 enterprise apps** (Workday, ServiceNow, Okta, Email, Slack, and Calendar) and uses 25 tools to complete multi-step tasks in a realistic HR system with 200+ employees, 8 departments, RBAC, and approval chains.

Built for the [OpenEnv Hackathon SF](https://cerebralvalley.ai/e/openenv-hackathon-sf/details), **Statement 3.1: Professional Tasks** (Scaler AI Labs partner theme: Multi-App RL Environment for Enterprise Workflows).

### Key Results

> **GRPO training on Llama 3.2-1B-Instruct improves mean task score by +67% (0.37 → 0.62).**
> Complex multi-step task scores **more than double** (0.26 → 0.68). Gains generalize to held-out test tasks.

| | Baseline | Trained | Improvement |
|---|---------|---------|-------------|
| Mean Score | 0.370 | 0.617 | **+67%** |
| Complex Tasks | 0.26 | 0.68 | **+162%** |
| Pass Rate | 15.4% | 19.2% | +3.8pp |

## Quick Start

```python
from rl_hack import HROnboardingAction, HROnboardingEnv

# Connect to the environment
with HROnboardingEnv(base_url="http://localhost:7860") as env:
    result = env.reset()
    print(result.observation)  # Task instruction + available tools

    # Agent calls tools to complete the task
    result = env.step(HROnboardingAction(
        tool_name="hr_create_employee",
        arguments={"name": "Priya Sharma", "department": "Engineering", "level": "L2", "role": "Software Engineer"}
    ))
    print(result.observation)  # Tool result
    print(result.reward)       # Rubric-based reward
```

## Tools / Actions (25 MCP Tools)

The agent interacts with the environment by calling these tools. Each tool modifies the world state and returns a result.

### HR System (5 tools)

| # | Tool | Description | Key Parameters |
|---|------|-------------|----------------|
| 1 | `hr_create_employee` | Create a new employee record | `name`, `department`, `level`, `role`, `manager_id`, `is_contractor` |
| 2 | `hr_read_employee` | Look up employee by ID or email | `emp_id` or `email` |
| 3 | `hr_update_employee` | Update employee fields (status, department, etc.) | `emp_id`, `updates` (dict) |
| 4 | `hr_search_employees` | Search/filter employees by criteria | `department`, `level`, `status`, `location`, `role` |
| 5 | `hr_get_org_chart` | Get reporting hierarchy for a department | `department` |

### Onboarding / Offboarding (6 tools)

| # | Tool | Description | Key Parameters |
|---|------|-------------|----------------|
| 6 | `onboarding_create_request` | Initiate onboarding for a new hire | `employee_id` |
| 7 | `onboarding_get_status` | Check onboarding progress | `request_id` or `employee_id` |
| 8 | `onboarding_complete_step` | Mark an onboarding step as done | `request_id`, `step` |
| 9 | `offboarding_create_request` | Initiate offboarding for departing employee | `employee_id`, `reason`, `exit_date` |
| 10 | `offboarding_get_status` | Check offboarding progress | `request_id` or `employee_id` |
| 11 | `offboarding_complete_step` | Mark an offboarding step as done | `request_id`, `step` |

### IT Provisioning (5 tools)

| # | Tool | Description | Key Parameters |
|---|------|-------------|----------------|
| 12 | `it_assign_asset` | Assign laptop/monitor/phone to employee | `asset_id`, `employee_id` |
| 13 | `it_get_available_assets` | List unassigned assets by type | `asset_type` (laptop, monitor, phone, headset) |
| 14 | `it_create_account` | Create email/Slack/VPN/GitHub accounts | `employee_id`, `account_types` |
| 15 | `it_revoke_access` | Revoke all IT access (for offboarding) | `employee_id` |
| 16 | `it_get_software_licenses` | Check license seat availability | `software_name` |

### Access Control (4 tools)

| # | Tool | Description | Key Parameters |
|---|------|-------------|----------------|
| 17 | `access_assign_role` | Assign RBAC role (checks level/dept restrictions) | `employee_id`, `role_id` |
| 18 | `access_create_badge` | Create physical access badge | `employee_id`, `access_zones` |
| 19 | `access_revoke_role` | Revoke a specific access role | `employee_id`, `role_id` |
| 20 | `access_get_security_groups` | List all security groups and resources | _(none)_ |

### Communication (3 tools)

| # | Tool | Description | Key Parameters |
|---|------|-------------|----------------|
| 21 | `email_send` | Send email (welcome, farewell, notifications) | `from_address`, `to_address`, `subject`, `body` |
| 22 | `slack_send_message` | Post in Slack channel or DM | `channel`, `sender`, `text` |
| 23 | `meeting_schedule` | Schedule orientation, 1-on-1, exit interview | `title`, `attendees`, `datetime`, `meeting_type` |

### Policy & Approval (2 tools)

| # | Tool | Description | Key Parameters |
|---|------|-------------|----------------|
| 24 | `policy_lookup` | Look up company policies by topic/department | `topic`, `department`, `policy_id` |
| 25 | `approval_request` | Submit approval (manager/IT/security/legal) | `request_id`, `approver_id`, `approval_type` |

## Tasks (77 tasks across 4 categories)

Each episode presents one task. The agent must call the right tools in the right order.

### Task Categories

| Category | Count | Example |
|----------|-------|---------|
| **Lookup** (simple) | 11 | "List all employees in the Engineering department" |
| **Onboarding** | 32 | "Fully onboard John Lee as L3 Team Lead in Data Science: create record, assign laptop, provision accounts, set up access, send welcome email, schedule orientation" |
| **Offboarding** | 24 | "Offboard departing director: revoke all access, reclaim assets, reassign reports, send farewell, schedule exit interview" |
| **Cross-workflow** | 10 | "Employee transferring from Engineering to Product: offboard from old dept, onboard to new" |

### Difficulty Levels

| Difficulty | Count | Tools per task | Description |
|------------|-------|---------------|-------------|
| Simple | 19 | 1-2 | Single lookups or status checks |
| Medium | 21 | 2-4 | Create + initiate workflows |
| Complex | 25 | 5-10 | Full end-to-end workflows with approvals |
| Edge case | 12 | 2-5 | Business rule violations, policy constraints |

### Edge Cases (designed to test policy compliance)

- Department at **headcount limit**: create employee should fail
- Software license **seats full** (Netsuite, LinkedIn Sales Navigator)
- Manager **on leave**: must find skip-level manager for approvals
- **Contractor** onboarding: different rules (no VPN, limited access, legal approval required)
- **Termination** vs resignation: different offboarding steps, no farewell email
- **Offer rescinded**: offboard someone mid-onboarding
- **Level mismatch**: L1 employee can't get L4+ access roles
- **Department restriction**: Marketing employee can't get Engineering GitHub role

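The "manager on leave" case amounts to walking up the reporting chain until an available manager is found. The sketch below illustrates that idea only; the record fields (`manager_id`, `status`), the `on_leave` value, and the sample employees are hypothetical and may not match the environment's actual schema.

```python
from typing import Optional

# Hypothetical employee records; field names and values are illustrative only.
EMPLOYEES = {
    "emp_0001": {"level": "L6", "manager_id": None, "status": "active"},
    "emp_0002": {"level": "L5", "manager_id": "emp_0001", "status": "active"},
    "emp_0003": {"level": "L4", "manager_id": "emp_0002", "status": "on_leave"},
    "emp_0004": {"level": "L2", "manager_id": "emp_0003", "status": "active"},
}


def find_approver(emp_id: str, employees: dict) -> Optional[str]:
    """Return the nearest active manager above emp_id, skipping anyone on leave."""
    seen = set()
    current = employees[emp_id]["manager_id"]
    while current is not None and current not in seen:
        seen.add(current)  # guard against cycles in malformed data
        if employees[current]["status"] == "active":
            return current
        current = employees[current]["manager_id"]
    return None  # reached the top of the chain without finding an approver


# The direct manager (emp_0003) is on leave, so the skip-level manager is chosen.
print(find_approver("emp_0004", EMPLOYEES))  # emp_0002
```

In the real environment an agent would gather the same information through tool calls such as `hr_read_employee` or `hr_get_org_chart` rather than from an in-memory dictionary.
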
## World State (500+ entities)

| Entity | Count | Description |
|--------|-------|-------------|
| Employees | 200 | Full org hierarchy across 8 departments (L1-L6) |
| Departments | 8 | Engineering, Product, Marketing, Sales, Finance, HR, Data Science, Security |
| IT Assets | 100 | Laptops (50), monitors (25), phones (15), headsets (10) |
| Access Roles | 20 | RBAC roles with level/department restrictions |
| Software Licenses | 15 | Jira, GitHub, AWS, Slack, Salesforce, etc. (2 intentionally full) |
| Policies | 15 | Onboarding, offboarding, badge access, contractor, termination, etc. |
| Security Groups | 15 | engineering_team, vpn_users, server_room_access, etc. |
| Message Templates | 12 | Welcome/farewell emails, Slack messages, notifications |

### RBAC Rules

- **L1** Associate → **L2** Senior → **L3** Team Lead → **L4** Manager → **L5** Director → **L6** VP
- L3+ can approve onboarding
- L4+ required for security approvals and server room badge access
- Contractors require legal approval
- Access roles have minimum level requirements and department restrictions

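The level and department rules above can be expressed as a small guard that runs before a role is granted. This is a minimal sketch under assumptions: the `Role` dataclass, the role IDs, and the returned reason strings are hypothetical and are not taken from `world.py`.

```python
from dataclasses import dataclass
from typing import Optional

LEVELS = ["L1", "L2", "L3", "L4", "L5", "L6"]  # ordered from most junior to most senior


@dataclass(frozen=True)
class Role:
    role_id: str                      # hypothetical identifier
    min_level: str                    # lowest level allowed to hold the role
    department: Optional[str] = None  # None means no department restriction


def can_assign(role: Role, emp_level: str, emp_department: str) -> tuple[bool, str]:
    """Check the minimum-level rule first, then the department restriction."""
    if LEVELS.index(emp_level) < LEVELS.index(role.min_level):
        return False, "level_mismatch"
    if role.department is not None and role.department != emp_department:
        return False, "department_restriction"
    return True, "ok"


server_room = Role("role_server_room", min_level="L4")
eng_github = Role("role_eng_github", min_level="L1", department="Engineering")

print(can_assign(server_room, "L1", "Engineering"))  # (False, 'level_mismatch')
print(can_assign(eng_github, "L2", "Marketing"))     # (False, 'department_restriction')
print(can_assign(eng_github, "L2", "Engineering"))   # (True, 'ok')
```
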
## Reward / Rubric

Each task has a rubric with verifiable criteria. Reward = proportion of criteria satisfied.

### Rubric Check Types

| Check | Example | What it verifies |
|-------|---------|-----------------|
| `tool_used` | `tool_used:hr_create_employee` | Tool was called at least once |
| `tool_not_used` | `tool_not_used:slack_send_message` | Tool was NOT called (e.g. no farewell for terminations) |
| `tool_used_any` | `tool_used_any:email_send,slack_send_message` | At least one of the tools was used |
| `param_value` | `param_value:hr_create_employee.name=Priya Sharma` | Tool called with specific parameter value |
| `param_contains` | `param_contains:policy_lookup.topic=onboard` | Parameter contains substring |
| `tool_order` | `tool_order:hr_create_employee<onboarding_create_request` | Tool A called before Tool B |
| `tool_count` | `tool_count:onboarding_complete_step>=3` | Tool called at least N times |
| `result_contains` | `result_contains:headcount_limit` | Any tool result contains substring |

### Example Rubric (medium task)

Task: "Onboard Priya Sharma to Engineering as L2 Software Engineer"

| Criterion | Check |
|-----------|-------|
| Created employee record | `tool_used:hr_create_employee` |
| Correct name | `param_value:hr_create_employee.name=Priya Sharma` |
| Correct department | `param_value:hr_create_employee.department=Engineering` |
| Correct level | `param_value:hr_create_employee.level=L2` |
| Correct role | `param_value:hr_create_employee.role=Software Engineer` |
| Initiated onboarding | `tool_used:onboarding_create_request` |
| Correct sequencing | `tool_order:hr_create_employee<onboarding_create_request` |

**Score**: 7/7 = 1.0 (pass) or partial (e.g. 5/7 = 0.71)

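To make the check strings concrete, here is a minimal sketch of how they could be evaluated against a log of tool calls. It is not the code in `rubrics.py`. The call-log shape (`tool`, `arguments`, `result`) is an assumption, and `tool_order` is interpreted here as "first call of A precedes first call of B".

```python
import json


def check(criterion: str, calls: list) -> bool:
    """Evaluate one 'kind:spec' rubric string against a list of recorded tool calls."""
    kind, _, spec = criterion.partition(":")
    names = [c["tool"] for c in calls]
    if kind == "tool_used":
        return spec in names
    if kind == "tool_not_used":
        return spec not in names
    if kind == "tool_used_any":
        return any(t in names for t in spec.split(","))
    if kind in ("param_value", "param_contains"):
        target, _, expected = spec.partition("=")
        tool, _, param = target.partition(".")
        values = [str(c["arguments"].get(param, "")) for c in calls if c["tool"] == tool]
        if kind == "param_value":
            return expected in values
        return any(expected in v for v in values)
    if kind == "tool_order":
        first, _, second = spec.partition("<")
        return first in names and second in names and names.index(first) < names.index(second)
    if kind == "tool_count":
        tool, _, minimum = spec.partition(">=")
        return names.count(tool) >= int(minimum)
    if kind == "result_contains":
        return any(spec in json.dumps(c.get("result")) for c in calls)
    raise ValueError(f"unknown check type: {kind}")


def score(rubric: list, calls: list) -> float:
    """Reward = proportion of criteria satisfied."""
    return sum(check(c, calls) for c in rubric) / len(rubric)


RUBRIC = [
    "tool_used:hr_create_employee",
    "param_value:hr_create_employee.name=Priya Sharma",
    "param_value:hr_create_employee.department=Engineering",
    "param_value:hr_create_employee.level=L2",
    "param_value:hr_create_employee.role=Software Engineer",
    "tool_used:onboarding_create_request",
    "tool_order:hr_create_employee<onboarding_create_request",
]

# The agent created the record correctly but never initiated onboarding.
calls = [{
    "tool": "hr_create_employee",
    "arguments": {"name": "Priya Sharma", "department": "Engineering",
                  "level": "L2", "role": "Software Engineer"},
    "result": {"success": True, "emp_id": "emp_0201"},
}]
print(round(score(RUBRIC, calls), 2))  # 0.71
```

Five of the seven criteria pass in this log, which reproduces the partial score of 5/7 = 0.71 mentioned above.
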
## Environment API

### OpenEnv Interface (MCPEnvironment)

```
reset() → Observation   # Pick task, reset world state, return instruction
step()  → Observation   # Agent calls a tool, get result + reward
state   → State         # Current step count, episode ID
```

### Episode Flow

```
1. env.reset()
   → Task: "Fully onboard John Lee as L3 Team Lead..."

2. Agent calls: hr_create_employee(name="John Lee", department="Data Science", level="L3", ...)
   → env.step() → {"success": true, "emp_id": "emp_0201"}

3. Agent calls: onboarding_create_request(employee_id="emp_0201")
   → env.step() → {"success": true, "request_id": "onb_0001", "steps": {...}}

4. Agent calls: it_get_available_assets(asset_type="laptop")
   → env.step() → {"success": true, "assets": [...]}

5. Agent calls: it_assign_asset(asset_id="asset_003", employee_id="emp_0201")
   → env.step() → {"success": true}

... more tool calls ...

N. Episode ends (max 15 steps or agent signals done)
   → Reward: 8/10 criteria satisfied = 0.8
```

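The control flow of an episode can be sketched as a bounded loop around a policy. To stay runnable without a server, the example uses `StubEnv`, a hypothetical in-memory stand-in that only records calls. It is not the `HROnboardingEnv` client, and treating a `None` return as the agent's "done" signal is an assumption.

```python
from dataclasses import dataclass, field

MAX_STEPS = 15  # step cap from the episode flow above


@dataclass
class StubEnv:
    """Hypothetical stand-in for the real environment; it only records tool calls."""
    calls: list = field(default_factory=list)

    def reset(self) -> str:
        self.calls.clear()
        return "Onboard Priya Sharma to Engineering as L2 Software Engineer"

    def step(self, tool_name: str, arguments: dict) -> dict:
        self.calls.append((tool_name, arguments))
        return {"success": True}


def run_episode(env, policy) -> int:
    """Query the policy until it returns None (done) or the step cap is reached."""
    instruction = env.reset()
    history = []
    for _ in range(MAX_STEPS):
        action = policy(instruction, history)
        if action is None:
            break
        tool_name, arguments = action
        result = env.step(tool_name, arguments)
        history.append((tool_name, arguments, result))
    return len(history)


def scripted_policy(instruction, history):
    """A fixed two-step plan standing in for an LLM agent."""
    plan = [
        ("hr_create_employee", {"name": "Priya Sharma", "department": "Engineering",
                                "level": "L2", "role": "Software Engineer"}),
        ("onboarding_create_request", {"employee_id": "emp_0201"}),
    ]
    return plan[len(history)] if len(history) < len(plan) else None


print(run_episode(StubEnv(), scripted_policy))  # 2
```

Swapping `scripted_policy` for a function that prompts a model with the instruction and history gives the same loop shape used when evaluating an LLM agent.
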
## Project Structure

```
rl_hack/
├── README.md                          # This file
├── openenv.yaml                       # OpenEnv manifest
├── pyproject.toml                     # Project metadata
├── __init__.py                        # Module exports
├── client.py                          # HROnboardingEnv client
├── models.py                          # Action/Observation Pydantic models
├── test_with_llm.py                   # Test single task with GPT agent
├── test_all_tasks.py                  # Evaluate all 77 tasks
├── train_hr_agent.ipynb               # GRPO training notebook (Unsloth)
├── .env                               # API keys (gitignored)
├── outputs/                           # Evaluation results
└── server/
    ├── __init__.py
    ├── app.py                         # FastAPI application
    ├── hr_onboarding_environment.py   # Core environment (Environment subclass)
    ├── world.py                       # World state (entities, RBAC, mutations)
    ├── tools.py                       # Tool registry (25 tools)
    ├── tasks.py                       # Task definitions + generation (77 tasks)
    ├── rubrics.py                     # Rubric evaluator (reward computation)
    ├── data/
    │   ├── employees.json             # 200 employee records
    │   ├── departments.json           # 8 departments with policies
    │   ├── policies.json              # 15 business rule documents
    │   ├── it_assets.json             # 100 IT assets
    │   ├── access_roles.json          # 20 RBAC roles
    │   └── templates.json             # 12 message templates
    ├── Dockerfile                     # Container image
    └── requirements.txt               # Server dependencies
```

## Testing with an LLM Agent

You can test the environment locally using GPT (or any OpenAI-compatible model) as the agent.

### Setup

1. Create a `.env` file in the repo root:
   ```
   OPENAI_API_KEY="sk-proj-..."
   ```

2. Install dependencies:
   ```bash
   uv pip install -e ".[eval]"
   ```

### Run

```bash
cd rl_hack

# Test on default task (simple lookup)
uv run python -m test_with_llm

# Test a specific task by index (0-76)
uv run python -m test_with_llm 14   # medium onboarding task
uv run python -m test_with_llm 24   # complex full onboarding
uv run python -m test_with_llm 55   # edge case (headcount limit)

# Run full evaluation across all 77 tasks
uv run python test_all_tasks.py
```

The `test_with_llm` script will:
- Reset the environment and pick a task
- Use GPT-4o-mini to generate tool calls
- Execute each tool call against the environment
- Print the rubric evaluation with pass/fail per criterion

### Example Output

```
Task ID: task_0015
Difficulty: medium
Instruction: Onboard new hire Priya Sharma to Engineering as L2 Software Engineer...

--- Step 1/15 ---
LLM: {"tool": "hr_create_employee", "params": {"name": "Priya Sharma", ...}}
Tool: hr_create_employee
Result: {"success": true, "employee": {"emp_id": "emp_0201", ...}}

--- Step 2/15 ---
LLM: {"tool": "onboarding_create_request", "params": {"employee_id": "emp_0201"}}
Tool: onboarding_create_request
Result: {"success": true, ...}

FINAL EVALUATION
Score: 100% (7/7 criteria)
Passed: True
[PASS] created_employee
[PASS] correct_name
[PASS] correct_dept
[PASS] initiated_onboarding
[PASS] sequencing
```

### Task Index Reference

| Index | Difficulty | Category | Description |
|-------|-----------|----------|-------------|
| 0-13 | Simple | Lookup/Onboarding | Single lookups, status checks |
| 14-23 | Medium | Onboarding | Create employee + initiate workflow |
| 24-34 | Complex | Onboarding | Full end-to-end with IT, access, comms |
| 35-46 | Medium | Offboarding | Initiate offboarding + revoke access |
| 47-54 | Complex | Offboarding | Full offboarding with asset reclaim |
| 55-66 | Edge case | Various | Headcount limits, license caps, RBAC |
| 67-76 | Complex | Cross-workflow | Transfers, rehires, manager departures |

## Installation

```bash
# Clone the repo
git clone https://github.com/ravi03071991/rl_hack.git
cd rl_hack

# Install core dependencies
uv pip install -e .

# Install with evaluation support (adds openai)
uv pip install -e ".[eval]"

# Install with training support (adds unsloth, trl, torch, etc.)
uv pip install -e ".[train]"

# Install everything
uv pip install -e ".[eval,train,dev]"
```

## Building & Running

```bash
# Run locally (as OpenEnv HTTP server with playground UI)
uvicorn server.app:app --reload --host 0.0.0.0 --port 7860

# Build Docker image
docker build -t hr-onboarding-env:latest -f server/Dockerfile .

# Deploy to HF Spaces
openenv push
```

## Training & Results

We use Unsloth + GRPO to train an LLM agent on this environment. See [`train_hr_agent.ipynb`](train_hr_agent.ipynb) for the full training notebook and the [W&B run](https://wandb.ai/ravi03071991/hr-agent-training/runs/bgent3o3?nw=nwuserravi03071991) for live training metrics.

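For readers new to GRPO, its core idea is that each sampled completion is scored relative to the other completions for the same prompt. The sketch below shows that group-relative normalization in plain Python. It follows the commonly described formulation and is not the notebook's code; the reward values are made up, and the choice of population standard deviation and the `eps` term are assumptions (libraries differ on both).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Six illustrative rewards, one per sampled completion for a single prompt.
rewards = [0.2, 0.4, 0.4, 0.6, 0.8, 1.0]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.2f}")
```

Completions that beat their group average receive a positive advantage and are reinforced, while below-average completions are pushed down. If all completions in a group score the same, every advantage is zero and that prompt contributes no learning signal.
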
### Setup

- **Model**: Llama 3.2-1B-Instruct (4-bit quantized, LoRA rank 8)
- **Algorithm**: GRPO (Group Relative Policy Optimization)
- **Reward functions**: Valid JSON + rubric score + efficiency
- **Training**: 300 steps, 6 generations per prompt, lr=5e-5 with cosine schedule
- **Data split**: 70/30 stratified train/test (52 train, 25 test tasks)

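A stratified split keeps the mix of difficulty levels similar in both sets. The sketch below stratifies by difficulty using the counts from the Difficulty Levels table. The task records are synthetic placeholders, and rounding each group's test share up is an assumption that happens to reproduce the 52/25 sizes; the notebook may split differently.

```python
import math
import random


def stratified_split(tasks, test_fraction=0.3, seed=0):
    """Split tasks into train/test while preserving the difficulty mix."""
    rng = random.Random(seed)
    by_difficulty = {}
    for task in tasks:
        by_difficulty.setdefault(task["difficulty"], []).append(task)

    train, test = [], []
    for group in by_difficulty.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        n_test = math.ceil(len(shuffled) * test_fraction)  # assumed rounding rule
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Synthetic placeholder tasks with the per-difficulty counts listed in this README.
counts = {"simple": 19, "medium": 21, "complex": 25, "edge_case": 12}
tasks = [{"task_id": f"{d}_{i}", "difficulty": d} for d, n in counts.items() for i in range(n)]

train, test = stratified_split(tasks)
print(len(train), len(test))  # 52 25
```
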
### Results

GRPO training improves the model's ability to complete HR workflows on both the train and test splits:

| Metric | Base Model | Trained | Change |
|--------|-----------|---------|--------|
| **Train pass rate** | 15.4% | 19.2% | +3.8pp |
| **Train mean score** | 0.370 | 0.617 | **+0.247 (+67%)** |
| **Test pass rate** | 12.0% | 16.0% | +4.0pp |
| **Test mean score** | 0.370 | 0.617 | **+0.247 (+67%)** |

#### Improvement by difficulty

| Difficulty | Baseline | Trained | Change |
|------------|----------|---------|--------|
| Simple | 0.23 | 0.50 | +0.27 |
| Medium | 0.72 | 0.86 | +0.14 |
| **Complex** | **0.26** | **0.68** | **+0.42** |
| Edge case | 0.22 | 0.25 | +0.03 |

The biggest gains are on **complex multi-step tasks**, where scores more than doubled. The improvement **carries over to held-out test tasks**, which suggests the model learned transferable HR workflow skills.

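The percentages above mix relative changes (for scores) with percentage-point changes (for pass rates). The short check below recomputes the relative changes from the reported scores and converts the pass rates back into task counts, assuming the 52/25 train/test split listed under Setup.

```python
def relative_change(before: float, after: float) -> float:
    """Relative improvement as a fraction of the starting value."""
    return (after - before) / before


print(f"{relative_change(0.370, 0.617):+.0%}")  # mean score: +67%
print(f"{relative_change(0.26, 0.68):+.0%}")    # complex tasks: +162%

# Pass rates expressed as task counts on each split.
print(round(0.154 * 52), round(0.192 * 52))  # train: 8 10
print(round(0.120 * 25), round(0.160 * 25))  # test: 3 4
```

Read this way, the pass-rate gains correspond to roughly two more passing train tasks and one more passing test task, while most of the improvement shows up as higher partial scores.
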
### Reward Curve

The moving average reward trends upward from ~2-3 early in training to ~4-5 by the end, showing consistent learning.

### Quick start (Colab)

1. Open the Colab link at the top of this README to load `train_hr_agent.ipynb` in Google Colab
2. Select a GPU runtime
3. Run all cells. The notebook installs dependencies, trains, and evaluates automatically

## Live Demo

Try the environment on Hugging Face Spaces: https://huggingface.co/spaces/devxpy/rl_hack