title: HR Onboarding & Offboarding Environment
emoji: 🏢
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
app_port: 7860
base_path: /playground
tags:
- openenv
HR Onboarding & Offboarding Environment
An OpenEnv-compatible RL environment that simulates enterprise HR onboarding and offboarding workflows. The agent orchestrates across 6 enterprise apps (Workday, ServiceNow, Okta, Email, Slack, and Calendar), using 25 tools to complete multi-step tasks in a realistic HR system (200+ employees, 8 departments, RBAC, approval chains).
Built for the OpenEnv Hackathon SF, Statement 3.1: Professional Tasks (Scaler AI Labs partner theme: Multi-App RL Environment for Enterprise Workflows).
Key Results
GRPO training on Llama 3.2-1B-Instruct improves mean task score by 67% (0.37 → 0.62). Complex multi-step task scores more than double (0.26 → 0.68). Gains generalize to held-out test tasks.
| Metric | Baseline | Trained | Improvement |
|---|---|---|---|
| Mean Score | 0.370 | 0.617 | +67% |
| Complex Tasks | 0.26 | 0.68 | +162% |
| Pass Rate | 15.4% | 19.2% | +3.8pp |
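The percentage figures in the table are relative improvements over the baseline, while the pass-rate row is a difference in percentage points. A small sketch of that arithmetic, using only the values from the table (the helper name `relative_gain` is illustrative, not part of the repo):

```python
def relative_gain(baseline: float, trained: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return (trained - baseline) / baseline * 100.0


mean_gain = relative_gain(0.370, 0.617)    # "Mean Score" row
complex_gain = relative_gain(0.26, 0.68)   # "Complex Tasks" row
pass_rate_delta = 19.2 - 15.4              # percentage points, not a ratio

print(f"mean score:    +{mean_gain:.0f}%")
print(f"complex tasks: +{complex_gain:.0f}%")
print(f"pass rate:     +{pass_rate_delta:.1f}pp")
```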
Quick Start
from rl_hack import HROnboardingAction, HROnboardingEnv
# Connect to the environment
with HROnboardingEnv(base_url="http://localhost:7860") as env:
    result = env.reset()
    print(result.observation)  # Task instruction + available tools
    # Agent calls tools to complete the task
    result = env.step(HROnboardingAction(
        tool_name="hr_create_employee",
        arguments={"name": "Priya Sharma", "department": "Engineering", "level": "L2", "role": "Software Engineer"}
    ))
    print(result.observation)  # Tool result
    print(result.reward)       # Rubric-based reward
Tools / Actions (25 MCP Tools)
The agent interacts with the environment by calling these tools. Each tool modifies the world state and returns a result.
HR System (5 tools)
| # | Tool | Description | Key Parameters |
|---|---|---|---|
| 1 | hr_create_employee | Create a new employee record | name, department, level, role, manager_id, is_contractor |
| 2 | hr_read_employee | Look up employee by ID or email | emp_id or email |
| 3 | hr_update_employee | Update employee fields (status, department, etc.) | emp_id, updates (dict) |
| 4 | hr_search_employees | Search/filter employees by criteria | department, level, status, location, role |
| 5 | hr_get_org_chart | Get reporting hierarchy for a department | department |
Onboarding / Offboarding (6 tools)
| # | Tool | Description | Key Parameters |
|---|---|---|---|
| 6 | onboarding_create_request | Initiate onboarding for a new hire | employee_id |
| 7 | onboarding_get_status | Check onboarding progress | request_id or employee_id |
| 8 | onboarding_complete_step | Mark an onboarding step as done | request_id, step |
| 9 | offboarding_create_request | Initiate offboarding for departing employee | employee_id, reason, exit_date |
| 10 | offboarding_get_status | Check offboarding progress | request_id or employee_id |
| 11 | offboarding_complete_step | Mark an offboarding step as done | request_id, step |
IT Provisioning (5 tools)
| # | Tool | Description | Key Parameters |
|---|---|---|---|
| 12 | it_assign_asset | Assign laptop/monitor/phone to employee | asset_id, employee_id |
| 13 | it_get_available_assets | List unassigned assets by type | asset_type (laptop, monitor, phone, headset) |
| 14 | it_create_account | Create email/Slack/VPN/GitHub accounts | employee_id, account_types |
| 15 | it_revoke_access | Revoke all IT access (for offboarding) | employee_id |
| 16 | it_get_software_licenses | Check license seat availability | software_name |
Access Control (4 tools)
| # | Tool | Description | Key Parameters |
|---|---|---|---|
| 17 | access_assign_role | Assign RBAC role (checks level/dept restrictions) | employee_id, role_id |
| 18 | access_create_badge | Create physical access badge | employee_id, access_zones |
| 19 | access_revoke_role | Revoke a specific access role | employee_id, role_id |
| 20 | access_get_security_groups | List all security groups and resources | (none) |
Communication (3 tools)
| # | Tool | Description | Key Parameters |
|---|---|---|---|
| 21 | email_send | Send email (welcome, farewell, notifications) | from_address, to_address, subject, body |
| 22 | slack_send_message | Post in Slack channel or DM | channel, sender, text |
| 23 | meeting_schedule | Schedule orientation, 1-on-1, exit interview | title, attendees, datetime, meeting_type |
Policy & Approval (2 tools)
| # | Tool | Description | Key Parameters |
|---|---|---|---|
| 24 | policy_lookup | Look up company policies by topic/department | topic, department, policy_id |
| 25 | approval_request | Submit approval (manager/IT/security/legal) | request_id, approver_id, approval_type |
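To make "each tool modifies the world state and returns a result" concrete, here is a minimal sketch of a name-based tool registry with a dispatcher. It is not the code in server/tools.py: the `world` dict, the `tool` decorator, `call_tool`, and the error strings are all assumptions for illustration, and only two of the 25 tools are stubbed.

```python
from typing import Any, Callable, Dict

# Hypothetical in-memory world state; the real one lives in server/world.py.
world: Dict[str, Any] = {"employees": {}, "next_id": 201}

# Registry mapping tool names to plain Python callables.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {}


def tool(fn: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Register a function in TOOLS under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def hr_create_employee(name: str, department: str, level: str, role: str) -> Dict[str, Any]:
    # Mutates the world state and reports the new ID back to the agent.
    emp_id = f"emp_{world['next_id']:04d}"
    world["next_id"] += 1
    world["employees"][emp_id] = {
        "name": name, "department": department, "level": level, "role": role,
    }
    return {"success": True, "emp_id": emp_id}


@tool
def hr_read_employee(emp_id: str) -> Dict[str, Any]:
    record = world["employees"].get(emp_id)
    if record is None:
        return {"success": False, "error": "employee_not_found"}
    return {"success": True, "employee": record}


def call_tool(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch a tool call by name; failures come back as result dicts."""
    fn = TOOLS.get(tool_name)
    if fn is None:
        return {"success": False, "error": f"unknown_tool:{tool_name}"}
    try:
        return fn(**arguments)
    except TypeError as exc:  # missing or unexpected parameters
        return {"success": False, "error": f"bad_arguments:{exc}"}


created = call_tool("hr_create_employee", {
    "name": "Priya Sharma", "department": "Engineering",
    "level": "L2", "role": "Software Engineer",
})
print(created)
```

Returning errors as ordinary results, rather than raising, keeps a malformed call from ending the episode and lets the agent observe and recover from its own mistake.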
Tasks (77 tasks across 4 categories)
Each episode presents one task. The agent must call the right tools in the right order.
Task Categories
| Category | Count | Example |
|---|---|---|
| Lookup (simple) | 11 | "List all employees in the Engineering department" |
| Onboarding | 32 | "Fully onboard John Lee as L3 Team Lead in Data Science: create record, assign laptop, provision accounts, set up access, send welcome email, schedule orientation" |
| Offboarding | 24 | "Offboard departing director: revoke all access, reclaim assets, reassign reports, send farewell, schedule exit interview" |
| Cross-workflow | 10 | "Employee transferring from Engineering to Product: offboard from old dept, onboard to new" |
Difficulty Levels
| Difficulty | Count | Tools per task | Description |
|---|---|---|---|
| Simple | 19 | 1-2 | Single lookups or status checks |
| Medium | 21 | 2-4 | Create + initiate workflows |
| Complex | 25 | 5-10 | Full end-to-end workflows with approvals |
| Edge case | 12 | 2-5 | Business rule violations, policy constraints |
Edge Cases (designed to test policy compliance)
- Department at headcount limit: create employee should fail
- Software license seats full (Netsuite, LinkedIn Sales Navigator)
- Manager on leave: must find skip-level manager for approvals
- Contractor onboarding: different rules (no VPN, limited access, legal approval required)
- Termination vs resignation: different offboarding steps, no farewell email
- Offer rescinded: offboard someone mid-onboarding
- Level mismatch: L1 employee can't get L4+ access roles
- Department restriction: Marketing employee can't get Engineering GitHub role
World State (500+ entities)
| Entity | Count | Description |
|---|---|---|
| Employees | 200 | Full org hierarchy across 8 departments (L1-L6) |
| Departments | 8 | Engineering, Product, Marketing, Sales, Finance, HR, Data Science, Security |
| IT Assets | 100 | Laptops (50), monitors (25), phones (15), headsets (10) |
| Access Roles | 20 | RBAC roles with level/department restrictions |
| Software Licenses | 15 | Jira, GitHub, AWS, Slack, Salesforce, etc. (2 intentionally full) |
| Policies | 15 | Onboarding, offboarding, badge access, contractor, termination, etc. |
| Security Groups | 15 | engineering_team, vpn_users, server_room_access, etc. |
| Message Templates | 12 | Welcome/farewell emails, Slack messages, notifications |
RBAC Rules
- L1 Associate → L2 Senior → L3 Team Lead → L4 Manager → L5 Director → L6 VP
- L3+ can approve onboarding
- L4+ required for security approvals and server room badge access
- Contractors require legal approval
- Access roles have minimum level requirements and department restrictions
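The rules above can be sketched as two small checks: one for who may approve, and one for whether a role may be assigned. This is an illustrative sketch, not the logic in server/world.py. The `AccessRole` class, the reason strings, and the two example roles are hypothetical; only the level ordering and the L3+/L4+ thresholds come from the list.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

LEVELS = ("L1", "L2", "L3", "L4", "L5", "L6")  # Associate ... VP

# Minimum approver level per approval type, as stated in the rules above.
MIN_APPROVER_LEVEL = {"onboarding": "L3", "security": "L4"}


def rank(level: str) -> int:
    """Position of a level in the hierarchy (L1 lowest)."""
    return LEVELS.index(level)


def can_approve(approver_level: str, approval_type: str) -> bool:
    return rank(approver_level) >= rank(MIN_APPROVER_LEVEL[approval_type])


@dataclass(frozen=True)
class AccessRole:
    role_id: str
    min_level: str
    departments: Optional[FrozenSet[str]] = None  # None means any department


def can_assign(role: AccessRole, level: str, department: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for assigning a role to an employee."""
    if rank(level) < rank(role.min_level):
        return False, "level_too_low"
    if role.departments is not None and department not in role.departments:
        return False, "department_restricted"
    return True, "ok"


# Hypothetical roles mirroring two of the edge cases.
github = AccessRole("engineering_github", "L1", frozenset({"Engineering"}))
server_room = AccessRole("server_room_admin", "L4")

print(can_assign(github, "L2", "Marketing"))
print(can_assign(server_room, "L1", "Security"))
```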
Reward / Rubric
Each task has a rubric with verifiable criteria. Reward = proportion of criteria satisfied.
Rubric Check Types
| Check | Example | What it verifies |
|---|---|---|
| tool_used | tool_used:hr_create_employee | Tool was called at least once |
| tool_not_used | tool_not_used:slack_send_message | Tool was NOT called (e.g. no farewell for terminations) |
| tool_used_any | tool_used_any:email_send,slack_send_message | At least one of the tools was used |
| param_value | param_value:hr_create_employee.name=Priya Sharma | Tool called with specific parameter value |
| param_contains | param_contains:policy_lookup.topic=onboard | Parameter contains substring |
| tool_order | tool_order:hr_create_employee<onboarding_create_request | Tool A called before Tool B |
| tool_count | tool_count:onboarding_complete_step>=3 | Tool called at least N times |
| result_contains | result_contains:headcount_limit | Any tool result contains substring |
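A minimal sketch of how check strings in this format could be evaluated against a log of tool calls. This is not the evaluator in server/rubrics.py. The `ToolCall` record and `evaluate_check` are hypothetical, matching is case-sensitive, `tool_order` compares first occurrences, and `tool_count` handles only `>=`; all of these are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToolCall:
    tool: str
    params: Dict[str, Any] = field(default_factory=dict)
    result: str = ""  # serialized tool result


def evaluate_check(check: str, calls: List[ToolCall]) -> bool:
    """Evaluate one 'kind:spec' rubric check against the call log."""
    kind, _, spec = check.partition(":")
    names = [c.tool for c in calls]

    if kind == "tool_used":
        return spec in names
    if kind == "tool_not_used":
        return spec not in names
    if kind == "tool_used_any":
        return any(name in names for name in spec.split(","))
    if kind in ("param_value", "param_contains"):
        target, _, expected = spec.partition("=")
        tool, _, param = target.partition(".")
        for call in calls:
            if call.tool != tool or param not in call.params:
                continue
            actual = str(call.params[param])
            if kind == "param_value" and actual == expected:
                return True
            if kind == "param_contains" and expected in actual:
                return True
        return False
    if kind == "tool_order":
        first, _, second = spec.partition("<")
        if first not in names or second not in names:
            return False
        return names.index(first) < names.index(second)
    if kind == "tool_count":
        tool, _, minimum = spec.partition(">=")
        return names.count(tool) >= int(minimum)
    if kind == "result_contains":
        return any(spec in call.result for call in calls)
    raise ValueError(f"unknown check type: {kind}")


log = [
    ToolCall("hr_create_employee", {"name": "Priya Sharma"}, '{"success": true}'),
    ToolCall("onboarding_create_request", {"employee_id": "emp_0201"}, '{"success": true}'),
]
print(evaluate_check("tool_order:hr_create_employee<onboarding_create_request", log))
```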
Example Rubric (medium task)
Task: "Onboard Priya Sharma to Engineering as L2 Software Engineer"
| Criterion | Check |
|---|---|
| Created employee record | tool_used:hr_create_employee |
| Correct name | param_value:hr_create_employee.name=Priya Sharma |
| Correct department | param_value:hr_create_employee.department=Engineering |
| Correct level | param_value:hr_create_employee.level=L2 |
| Correct role | param_value:hr_create_employee.role=Software Engineer |
| Initiated onboarding | tool_used:onboarding_create_request |
| Correct sequencing | tool_order:hr_create_employee<onboarding_create_request |
Score: 7/7 = 1.0 (pass) or partial (e.g. 5/7 = 0.71)
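As a worked instance of "reward = proportion of criteria satisfied", the sketch below scores a run that misses two of the seven criteria. The criterion names `correct_level` and `correct_role` are made up for this example, and treating "pass" as every criterion satisfied is an assumption rather than something the rubric states.

```python
from typing import Dict


def rubric_score(results: Dict[str, bool]) -> float:
    """Fraction of rubric criteria that were satisfied."""
    return sum(results.values()) / len(results)


# Hypothetical outcome: the agent got the level and role parameters wrong.
results = {
    "created_employee": True,
    "correct_name": True,
    "correct_dept": True,
    "correct_level": False,
    "correct_role": False,
    "initiated_onboarding": True,
    "sequencing": True,
}

score = rubric_score(results)
passed = all(results.values())  # assumption: pass requires every criterion

print(f"score = {sum(results.values())}/{len(results)} = {score:.2f}, passed = {passed}")
```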
Environment API
OpenEnv Interface (MCPEnvironment)
reset() → Observation   # Pick task, reset world state, return instruction
step()  → Observation   # Agent calls a tool, get result + reward
state   → State         # Current step count, episode ID
Episode Flow
1. env.reset()
   → Task: "Fully onboard John Lee as L3 Team Lead..."
2. Agent calls: hr_create_employee(name="John Lee", department="Data Science", level="L3", ...)
   → env.step() → {"success": true, "emp_id": "emp_0201"}
3. Agent calls: onboarding_create_request(employee_id="emp_0201")
   → env.step() → {"success": true, "request_id": "onb_0001", "steps": {...}}
4. Agent calls: it_get_available_assets(asset_type="laptop")
   → env.step() → {"success": true, "assets": [...]}
5. Agent calls: it_assign_asset(asset_id="asset_003", employee_id="emp_0201")
   → env.step() → {"success": true}
... more tool calls ...
N. Episode ends (max 15 steps or agent signals done)
   → Reward: 8/10 criteria satisfied = 0.8
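The flow above amounts to a bounded loop: reset, then step until the agent stops or the 15-step cap is reached. The sketch below runs that loop against a stand-in environment so it can execute without a server. `StubEnv`, `run_episode`, and the two policies are hypothetical and do not mirror the real client's signatures.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

MAX_STEPS = 15  # step cap from the episode flow

Action = Tuple[str, Dict[str, Any]]
Policy = Callable[[str, List[Dict[str, Any]]], Optional[Action]]


class StubEnv:
    """Stand-in for the real environment; not the rl_hack API."""

    def __init__(self) -> None:
        self.step_count = 0

    def reset(self) -> str:
        self.step_count = 0
        return "Look up employee emp_0001"

    def step(self, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        self.step_count += 1
        return {"success": True, "tool": tool_name}


def run_episode(env: StubEnv, policy: Policy) -> List[Dict[str, Any]]:
    """Run one episode; stop when the policy returns None or the cap is hit."""
    instruction = env.reset()
    history: List[Dict[str, Any]] = []
    for _ in range(MAX_STEPS):
        action = policy(instruction, history)
        if action is None:  # agent signals done
            break
        tool_name, arguments = action
        history.append(env.step(tool_name, arguments))
    return history


def one_shot_policy(instruction: str, history: List[Dict[str, Any]]) -> Optional[Action]:
    # Makes a single lookup, then signals done.
    if history:
        return None
    return ("hr_read_employee", {"emp_id": "emp_0001"})


def never_done_policy(instruction: str, history: List[Dict[str, Any]]) -> Optional[Action]:
    # Never stops on its own, so the step cap ends the episode.
    return ("hr_search_employees", {"department": "Engineering"})


print(len(run_episode(StubEnv(), one_shot_policy)))
print(len(run_episode(StubEnv(), never_done_policy)))
```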
Project Structure
rl_hack/
├── README.md                         # This file
├── openenv.yaml                      # OpenEnv manifest
├── pyproject.toml                    # Project metadata
├── __init__.py                       # Module exports
├── client.py                         # HROnboardingEnv client
├── models.py                         # Action/Observation Pydantic models
├── test_with_llm.py                  # Test single task with GPT agent
├── test_all_tasks.py                 # Evaluate all 77 tasks
├── train_hr_agent.ipynb              # GRPO training notebook (Unsloth)
├── .env                              # API keys (gitignored)
├── outputs/                          # Evaluation results
└── server/
    ├── __init__.py
    ├── app.py                        # FastAPI application
    ├── hr_onboarding_environment.py  # Core environment (Environment subclass)
    ├── world.py                      # World state (entities, RBAC, mutations)
    ├── tools.py                      # Tool registry (25 tools)
    ├── tasks.py                      # Task definitions + generation (77 tasks)
    ├── rubrics.py                    # Rubric evaluator (reward computation)
    ├── data/
    │   ├── employees.json            # 200 employee records
    │   ├── departments.json          # 8 departments with policies
    │   ├── policies.json             # 15 business rule documents
    │   ├── it_assets.json            # 100 IT assets
    │   ├── access_roles.json         # 20 RBAC roles
    │   └── templates.json            # 12 message templates
    ├── Dockerfile                    # Container image
    └── requirements.txt              # Server dependencies
Testing with an LLM Agent
You can test the environment locally using GPT (or any OpenAI-compatible model) as the agent.
Setup
Create a .env file in the repo root:
OPENAI_API_KEY="sk-proj-..."
Install dependencies:
uv pip install -e ".[eval]"
Run
cd rl_hack
# Test on default task (simple lookup)
uv run python -m test_with_llm
# Test a specific task by index (0-76)
uv run python -m test_with_llm 14 # medium onboarding task
uv run python -m test_with_llm 24 # complex full onboarding
uv run python -m test_with_llm 55 # edge case (headcount limit)
# Run full evaluation across all 77 tasks
uv run python test_all_tasks.py
The script will:
- Reset the environment and pick a task
- Use GPT-4o-mini to generate tool calls
- Execute each tool call against the environment
- Print the rubric evaluation with pass/fail per criterion
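Between generating a tool call and executing it, the model's text output has to be turned into a tool name and arguments. A minimal sketch of that parsing step, assuming the `{"tool": ..., "params": ...}` shape shown in the example output (`parse_tool_call` is a hypothetical helper, not taken from test_with_llm.py):

```python
import json
from typing import Any, Dict, Optional, Tuple


def parse_tool_call(text: str) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Extract (tool, params) from model output, or None if it is unusable."""
    # Tolerate prose around the JSON by slicing from the first '{' to the last '}'.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        payload = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    tool = payload.get("tool")
    params = payload.get("params", {})
    if not isinstance(tool, str) or not isinstance(params, dict):
        return None
    return tool, params


reply = 'Sure: {"tool": "onboarding_create_request", "params": {"employee_id": "emp_0201"}}'
print(parse_tool_call(reply))
print(parse_tool_call("I am not sure what to do."))
```

Returning `None` instead of raising lets the caller count an unparseable reply as a wasted step, which fits a setup where valid JSON is itself part of the reward.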
Example Output
Task ID: task_0015
Difficulty: medium
Instruction: Onboard new hire Priya Sharma to Engineering as L2 Software Engineer...
--- Step 1/15 ---
LLM: {"tool": "hr_create_employee", "params": {"name": "Priya Sharma", ...}}
Tool: hr_create_employee
Result: {"success": true, "employee": {"emp_id": "emp_0201", ...}}
--- Step 2/15 ---
LLM: {"tool": "onboarding_create_request", "params": {"employee_id": "emp_0201"}}
Tool: onboarding_create_request
Result: {"success": true, ...}
FINAL EVALUATION
Score: 100% (7/7 criteria)
Passed: True
[PASS] created_employee
[PASS] correct_name
[PASS] correct_dept
[PASS] initiated_onboarding
[PASS] sequencing
Task Index Reference
| Index | Difficulty | Category | Description |
|---|---|---|---|
| 0-13 | Simple | Lookup/Onboarding | Single lookups, status checks |
| 14-23 | Medium | Onboarding | Create employee + initiate workflow |
| 24-34 | Complex | Onboarding | Full end-to-end with IT, access, comms |
| 35-46 | Medium | Offboarding | Initiate offboarding + revoke access |
| 47-54 | Complex | Offboarding | Full offboarding with asset reclaim |
| 55-66 | Edge case | Various | Headcount limits, license caps, RBAC |
| 67-76 | Complex | Cross-workflow | Transfers, rehires, manager departures |
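When picking an index for `test_with_llm`, the table can be encoded as a small lookup. The ranges below are copied from the table; the `describe_task` helper itself is hypothetical and not part of the repo.

```python
from typing import List, Tuple

# (first index, last index, difficulty, category), copied from the table.
INDEX_RANGES: List[Tuple[int, int, str, str]] = [
    (0, 13, "Simple", "Lookup/Onboarding"),
    (14, 23, "Medium", "Onboarding"),
    (24, 34, "Complex", "Onboarding"),
    (35, 46, "Medium", "Offboarding"),
    (47, 54, "Complex", "Offboarding"),
    (55, 66, "Edge case", "Various"),
    (67, 76, "Complex", "Cross-workflow"),
]


def describe_task(index: int) -> Tuple[str, str]:
    """Return (difficulty, category) for a task index in 0-76."""
    for first, last, difficulty, category in INDEX_RANGES:
        if first <= index <= last:
            return difficulty, category
    raise IndexError(f"task index out of range: {index}")


total_tasks = sum(last - first + 1 for first, last, _, _ in INDEX_RANGES)
print(describe_task(24), total_tasks)
```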
Installation
# Clone the repo
git clone https://github.com/ravi03071991/rl_hack.git
cd rl_hack
# Install core dependencies
uv pip install -e .
# Install with evaluation support (adds openai)
uv pip install -e ".[eval]"
# Install with training support (adds unsloth, trl, torch, etc.)
uv pip install -e ".[train]"
# Install everything
uv pip install -e ".[eval,train,dev]"
Building & Running
# Run locally (as OpenEnv HTTP server with playground UI)
uvicorn server.app:app --reload --host 0.0.0.0 --port 7860
# Build Docker image
docker build -t hr-onboarding-env:latest -f server/Dockerfile .
# Deploy to HF Spaces
openenv push
Training & Results
We use Unsloth + GRPO to train an LLM agent on this environment. See train_hr_agent.ipynb for the full training notebook and W&B run for live training metrics.
Setup
- Model: Llama 3.2-1B-Instruct (4-bit quantized, LoRA rank 8)
- Algorithm: GRPO (Group Relative Policy Optimization)
- Reward functions: Valid JSON + rubric score + efficiency
- Training: 300 steps, 6 generations per prompt, lr=5e-5 with cosine schedule
- Data split: 70/30 stratified train/test (52 train, 25 test tasks)
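A stratified split keeps each difficulty level represented in both sets. The sketch below is one way to do it, not necessarily how the notebook does: it groups tasks by difficulty, shuffles each group with a fixed seed, and rounds the training share down within each group. With the difficulty counts listed earlier (19/21/25/12), that rounding yields 52 train and 25 test tasks. The task records are synthetic placeholders.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Difficulty counts from the "Difficulty Levels" table.
DIFFICULTY_COUNTS = {"simple": 19, "medium": 21, "complex": 25, "edge_case": 12}

Task = Dict[str, str]


def stratified_split(
    tasks: List[Task], train_fraction: float = 0.7, seed: int = 0
) -> Tuple[List[Task], List[Task]]:
    """Split tasks into train/test while preserving the difficulty mix."""
    by_difficulty: Dict[str, List[Task]] = defaultdict(list)
    for task in tasks:
        by_difficulty[task["difficulty"]].append(task)

    rng = random.Random(seed)
    train: List[Task] = []
    test: List[Task] = []
    for group in by_difficulty.values():
        rng.shuffle(group)
        cut = int(train_fraction * len(group))  # round down per stratum
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Synthetic placeholder tasks; the real definitions live in server/tasks.py.
tasks = [
    {"id": f"{difficulty}_{i}", "difficulty": difficulty}
    for difficulty, count in DIFFICULTY_COUNTS.items()
    for i in range(count)
]
train, test = stratified_split(tasks)
print(len(train), len(test))
```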
Results
GRPO training significantly improves the model's ability to complete HR workflows:
| Metric | Base Model | Trained | Change |
|---|---|---|---|
| Train pass rate | 15.4% | 19.2% | +3.8pp |
| Train mean score | 0.370 | 0.617 | +0.247 (+67%) |
| Test pass rate | 12.0% | 16.0% | +4.0pp |
| Test mean score | 0.370 | 0.617 | +0.247 (+67%) |
Improvement by difficulty
| Difficulty | Baseline | Trained | Change |
|---|---|---|---|
| Simple | 0.23 | 0.50 | +0.27 |
| Medium | 0.72 | 0.86 | +0.14 |
| Complex | 0.26 | 0.68 | +0.42 |
| Edge case | 0.22 | 0.25 | +0.03 |
The biggest gains are on complex multi-step tasks, where scores more than doubled. The improvement carries over to held-out test tasks, which suggests the model learned transferable HR workflow skills.
Reward Curve
The moving average reward trends upward from ~2-3 early in training to ~4-5 by the end, showing consistent learning.
Quick start (Colab)
- Click the Colab badge at the top to open train_hr_agent.ipynb in Google Colab
- Select a GPU runtime
- Run all cells: installs dependencies, trains, and evaluates automatically
Live Demo
Try the environment on Hugging Face Spaces: https://huggingface.co/spaces/devxpy/rl_hack
