---
title: HR Onboarding & Offboarding Environment
emoji: π’
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
app_port: 7860
base_path: /playground
tags:
- openenv
---
# HR Onboarding & Offboarding Environment
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ravi03071991/rl_hack/blob/master/train_hr_agent.ipynb)
An OpenEnv-compatible RL environment that simulates enterprise HR onboarding and offboarding workflows. The agent orchestrates work across **6 enterprise apps** (Workday, ServiceNow, Okta, Email, Slack, and Calendar), using 25 tools to complete multi-step tasks in a realistic HR system (200+ employees, 8 departments, RBAC, approval chains).
Built for the [OpenEnv Hackathon SF](https://cerebralvalley.ai/e/openenv-hackathon-sf/details) under **Statement 3.1: Professional Tasks** (Scaler AI Labs partner theme: Multi-App RL Environment for Enterprise Workflows).
### Key Results
> **GRPO training on Llama 3.2-1B-Instruct improves mean task score by +67% (0.37 → 0.62).**
> Complex multi-step task scores **more than double** (0.26 → 0.68). Gains carry over to held-out test tasks.
| | Baseline | Trained | Improvement |
|---|---------|---------|-------------|
| Mean Score | 0.370 | 0.617 | **+67%** |
| Complex Tasks | 0.26 | 0.68 | **+162%** |
| Pass Rate | 15.4% | 19.2% | +3.8pp |
## Quick Start
```python
from rl_hack import HROnboardingAction, HROnboardingEnv

# Connect to the environment
with HROnboardingEnv(base_url="http://localhost:7860") as env:
    result = env.reset()
    print(result.observation)  # Task instruction + available tools

    # Agent calls tools to complete the task
    result = env.step(HROnboardingAction(
        tool_name="hr_create_employee",
        arguments={"name": "Priya Sharma", "department": "Engineering", "level": "L2", "role": "Software Engineer"},
    ))
    print(result.observation)  # Tool result
    print(result.reward)       # Rubric-based reward
```
## Tools / Actions (25 MCP Tools)
The agent interacts with the environment by calling these tools. Each tool modifies the world state and returns a result.
### HR System (5 tools)
| # | Tool | Description | Key Parameters |
|---|------|-------------|----------------|
| 1 | `hr_create_employee` | Create a new employee record | `name`, `department`, `level`, `role`, `manager_id`, `is_contractor` |
| 2 | `hr_read_employee` | Look up employee by ID or email | `emp_id` or `email` |
| 3 | `hr_update_employee` | Update employee fields (status, department, etc.) | `emp_id`, `updates` (dict) |
| 4 | `hr_search_employees` | Search/filter employees by criteria | `department`, `level`, `status`, `location`, `role` |
| 5 | `hr_get_org_chart` | Get reporting hierarchy for a department | `department` |
### Onboarding / Offboarding (6 tools)
| # | Tool | Description | Key Parameters |
|---|------|-------------|----------------|
| 6 | `onboarding_create_request` | Initiate onboarding for a new hire | `employee_id` |
| 7 | `onboarding_get_status` | Check onboarding progress | `request_id` or `employee_id` |
| 8 | `onboarding_complete_step` | Mark an onboarding step as done | `request_id`, `step` |
| 9 | `offboarding_create_request` | Initiate offboarding for departing employee | `employee_id`, `reason`, `exit_date` |
| 10 | `offboarding_get_status` | Check offboarding progress | `request_id` or `employee_id` |
| 11 | `offboarding_complete_step` | Mark an offboarding step as done | `request_id`, `step` |
### IT Provisioning (5 tools)
| # | Tool | Description | Key Parameters |
|---|------|-------------|----------------|
| 12 | `it_assign_asset` | Assign laptop/monitor/phone to employee | `asset_id`, `employee_id` |
| 13 | `it_get_available_assets` | List unassigned assets by type | `asset_type` (laptop, monitor, phone, headset) |
| 14 | `it_create_account` | Create email/Slack/VPN/GitHub accounts | `employee_id`, `account_types` |
| 15 | `it_revoke_access` | Revoke all IT access (for offboarding) | `employee_id` |
| 16 | `it_get_software_licenses` | Check license seat availability | `software_name` |
### Access Control (4 tools)
| # | Tool | Description | Key Parameters |
|---|------|-------------|----------------|
| 17 | `access_assign_role` | Assign RBAC role (checks level/dept restrictions) | `employee_id`, `role_id` |
| 18 | `access_create_badge` | Create physical access badge | `employee_id`, `access_zones` |
| 19 | `access_revoke_role` | Revoke a specific access role | `employee_id`, `role_id` |
| 20 | `access_get_security_groups` | List all security groups and resources | _(none)_ |
### Communication (3 tools)
| # | Tool | Description | Key Parameters |
|---|------|-------------|----------------|
| 21 | `email_send` | Send email (welcome, farewell, notifications) | `from_address`, `to_address`, `subject`, `body` |
| 22 | `slack_send_message` | Post in Slack channel or DM | `channel`, `sender`, `text` |
| 23 | `meeting_schedule` | Schedule orientation, 1-on-1, exit interview | `title`, `attendees`, `datetime`, `meeting_type` |
### Policy & Approval (2 tools)
| # | Tool | Description | Key Parameters |
|---|------|-------------|----------------|
| 24 | `policy_lookup` | Look up company policies by topic/department | `topic`, `department`, `policy_id` |
| 25 | `approval_request` | Submit approval (manager/IT/security/legal) | `request_id`, `approver_id`, `approval_type` |
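Each row above is a named tool with keyword parameters, so a registry keyed by tool name is a natural way to route an action's `tool_name` and `arguments` to a handler. The sketch below is hypothetical: `register_tool`, `dispatch`, and the stub handler bodies are illustrative and are not taken from `server/tools.py`; only the tool and parameter names come from the tables.

```python
from typing import Any, Callable, Dict, Optional

Handler = Callable[..., Dict[str, Any]]

# Hypothetical registry mapping tool names to handler functions.
TOOLS: Dict[str, Handler] = {}


def register_tool(name: str) -> Callable[[Handler], Handler]:
    """Decorator that records a handler under the given tool name."""
    def decorator(fn: Handler) -> Handler:
        TOOLS[name] = fn
        return fn
    return decorator


@register_tool("hr_read_employee")
def hr_read_employee(emp_id: Optional[str] = None, email: Optional[str] = None) -> Dict[str, Any]:
    # Stub: a real handler would look the employee up in the world state.
    if emp_id is None and email is None:
        return {"success": False, "error": "emp_id or email is required"}
    return {"success": True, "employee": {"emp_id": emp_id, "email": email}}


@register_tool("it_get_available_assets")
def it_get_available_assets(asset_type: str) -> Dict[str, Any]:
    # Stub: only validates asset_type against the types listed in the table.
    if asset_type not in {"laptop", "monitor", "phone", "headset"}:
        return {"success": False, "error": f"unknown asset_type: {asset_type}"}
    return {"success": True, "assets": []}


def dispatch(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Route one tool call; report problems as a result dict instead of raising."""
    handler = TOOLS.get(tool_name)
    if handler is None:
        return {"success": False, "error": f"unknown tool: {tool_name}"}
    try:
        return handler(**arguments)
    except TypeError as exc:  # unexpected or missing keyword arguments
        return {"success": False, "error": str(exc)}


print(dispatch("it_get_available_assets", {"asset_type": "laptop"}))
# {'success': True, 'assets': []}
print(dispatch("hr_delete_employee", {}))
# {'success': False, 'error': 'unknown tool: hr_delete_employee'}
```

Returning errors as results (rather than raising) lets the agent see and recover from a bad call within the episode. Catching `TypeError` also masks genuine bugs inside a handler, so a stricter registry might validate arguments against a schema instead.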
## Tasks (77 tasks across 4 categories)
Each episode presents one task. The agent must call the right tools in the right order.
### Task Categories
| Category | Count | Example |
|----------|-------|---------|
| **Lookup** (simple) | 11 | "List all employees in the Engineering department" |
| **Onboarding** | 32 | "Fully onboard John Lee as L3 Team Lead in Data Science: create record, assign laptop, provision accounts, set up access, send welcome email, schedule orientation" |
| **Offboarding** | 24 | "Offboard departing director: revoke all access, reclaim assets, reassign reports, send farewell, schedule exit interview" |
| **Cross-workflow** | 10 | "Employee transferring from Engineering to Product: offboard from old dept, onboard to new" |
### Difficulty Levels
| Difficulty | Count | Tools per task | Description |
|------------|-------|---------------|-------------|
| Simple | 19 | 1-2 | Single lookups or status checks |
| Medium | 21 | 2-4 | Create + initiate workflows |
| Complex | 25 | 5-10 | Full end-to-end workflows with approvals |
| Edge case | 12 | 2-5 | Business rule violations, policy constraints |
### Edge Cases (designed to test policy compliance)
- Department at **headcount limit**: creating the employee should fail
- Software license **seats full** (Netsuite, LinkedIn Sales Navigator)
- Manager **on leave**: the agent must find the skip-level manager for approvals
- **Contractor** onboarding: different rules (no VPN, limited access, legal approval required)
- **Termination** vs. resignation: different offboarding steps, no farewell email
- **Offer rescinded**: offboard someone mid-onboarding
- **Level mismatch**: an L1 employee can't get L4+ access roles
- **Department restriction**: a Marketing employee can't get the Engineering GitHub role
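Several of these edge cases come down to a guard that fails with a machine-readable error the agent has to notice and react to. The sketch below assumes hypothetical `Department` and `License` records (the field names are invented for illustration); the `headcount_limit` error string is chosen to match the `result_contains:headcount_limit` rubric check in the Reward section.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Department:
    name: str
    headcount: int
    headcount_limit: int  # hypothetical field name


@dataclass
class License:
    software_name: str
    seats_used: int
    seats_total: int  # hypothetical field names


def create_employee_guard(dept: Department) -> Dict[str, Any]:
    """Refuse a new hire once the department has reached its headcount limit."""
    if dept.headcount >= dept.headcount_limit:
        return {"success": False, "error": "headcount_limit",
                "message": f"{dept.name} is at its headcount limit ({dept.headcount_limit})"}
    dept.headcount += 1
    return {"success": True}


def license_availability(lic: License) -> Dict[str, Any]:
    """Report free seats; a full license should steer the agent away from assigning it."""
    free = max(lic.seats_total - lic.seats_used, 0)
    return {"success": True, "software_name": lic.software_name,
            "available_seats": free, "is_full": free == 0}


# Illustrative numbers only, not taken from the environment's data files.
full_dept = Department("Engineering", headcount=40, headcount_limit=40)
print(create_employee_guard(full_dept)["error"])  # headcount_limit
print(license_availability(License("Netsuite", seats_used=10, seats_total=10))["is_full"])  # True
```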
## World State (500+ entities)
| Entity | Count | Description |
|--------|-------|-------------|
| Employees | 200 | Full org hierarchy across 8 departments (L1-L6) |
| Departments | 8 | Engineering, Product, Marketing, Sales, Finance, HR, Data Science, Security |
| IT Assets | 100 | Laptops (50), monitors (25), phones (15), headsets (10) |
| Access Roles | 20 | RBAC roles with level/department restrictions |
| Software Licenses | 15 | Jira, GitHub, AWS, Slack, Salesforce, etc. (2 intentionally full) |
| Policies | 15 | Onboarding, offboarding, badge access, contractor, termination, etc. |
| Security Groups | 15 | engineering_team, vpn_users, server_room_access, etc. |
| Message Templates | 12 | Welcome/farewell emails, Slack messages, notifications |
### RBAC Rules
- **L1** Associate → **L2** Senior → **L3** Team Lead → **L4** Manager → **L5** Director → **L6** VP
- L3+ can approve onboarding
- L4+ required for security approvals and server room badge access
- Contractors require legal approval
- Access roles have minimum level requirements and department restrictions
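The RBAC rules above can be expressed as a handful of predicates over employee level and department. This is a minimal sketch: the `Employee`/`AccessRole` shapes, the role ID `github_engineering`, the employee ID, and the choice that an empty `allowed_departments` means "any department" are all assumptions, not the schema in `server/world.py` or `data/access_roles.json`.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Ordered ladder: L1 Associate ... L6 VP.
LEVELS = ("L1", "L2", "L3", "L4", "L5", "L6")


def rank(level: str) -> int:
    """Map 'L1'..'L6' to 1..6 so levels compare numerically."""
    return LEVELS.index(level) + 1


@dataclass(frozen=True)
class Employee:
    emp_id: str
    department: str
    level: str
    is_contractor: bool = False


@dataclass(frozen=True)
class AccessRole:
    role_id: str
    min_level: str
    # Assumption: an empty set means the role is open to every department.
    allowed_departments: FrozenSet[str] = frozenset()


def can_approve(approver: Employee, approval_type: str) -> bool:
    if approval_type == "onboarding":
        return rank(approver.level) >= 3  # L3+ can approve onboarding
    if approval_type == "security":
        return rank(approver.level) >= 4  # L4+ required for security approvals
    return False  # other approval types are not modeled in this sketch


def can_access_server_room(emp: Employee) -> bool:
    return rank(emp.level) >= 4  # L4+ for server room badge access


def needs_legal_approval(emp: Employee) -> bool:
    return emp.is_contractor  # contractors require legal approval


def role_denial_reason(emp: Employee, role: AccessRole) -> Optional[str]:
    """Return None if the role may be assigned, else a short reason."""
    if rank(emp.level) < rank(role.min_level):
        return "level_too_low"
    if role.allowed_departments and emp.department not in role.allowed_departments:
        return "department_restricted"
    return None


# Hypothetical role mirroring the "Marketing can't get Engineering GitHub" edge case.
github_eng = AccessRole("github_engineering", "L1", frozenset({"Engineering"}))
marketer = Employee("emp_0042", "Marketing", "L1")
print(role_denial_reason(marketer, github_eng))  # department_restricted
```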
## Reward / Rubric
Each task has a rubric with verifiable criteria. Reward = proportion of criteria satisfied.
### Rubric Check Types
| Check | Example | What it verifies |
|-------|---------|-----------------|
| `tool_used` | `tool_used:hr_create_employee` | Tool was called at least once |
| `tool_not_used` | `tool_not_used:slack_send_message` | Tool was NOT called (e.g. no farewell for terminations) |
| `tool_used_any` | `tool_used_any:email_send,slack_send_message` | At least one of the tools was used |
| `param_value` | `param_value:hr_create_employee.name=Priya Sharma` | Tool called with specific parameter value |
| `param_contains` | `param_contains:policy_lookup.topic=onboard` | Parameter contains substring |
| `tool_order` | `tool_order:hr_create_employee<onboarding_create_request` | Tool A called before Tool B |
| `tool_count` | `tool_count:onboarding_complete_step>=3` | Tool called at least N times |
| `result_contains` | `result_contains:headcount_limit` | Any tool result contains substring |
### Example Rubric (medium task)
Task: "Onboard Priya Sharma to Engineering as L2 Software Engineer"
| Criterion | Check |
|-----------|-------|
| Created employee record | `tool_used:hr_create_employee` |
| Correct name | `param_value:hr_create_employee.name=Priya Sharma` |
| Correct department | `param_value:hr_create_employee.department=Engineering` |
| Correct level | `param_value:hr_create_employee.level=L2` |
| Correct role | `param_value:hr_create_employee.role=Software Engineer` |
| Initiated onboarding | `tool_used:onboarding_create_request` |
| Correct sequencing | `tool_order:hr_create_employee<onboarding_create_request` |
**Score**: 7/7 = 1.0 (pass) or partial (e.g. 5/7 = 0.71)
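A compact way to evaluate these check strings is to replay the episode's tool-call log against each one and average the booleans. The sketch below assumes each call is logged as a dict with `tool`, `arguments`, and `result` keys, and reads `tool_order:A<B` as "the first call to A precedes the first call to B"; both are assumptions, and `server/rubrics.py` may do this differently. It reproduces the partial-credit case: with the wrong level and role, the example rubric scores 5/7.

```python
import json
from typing import Any, Dict, List

Call = Dict[str, Any]  # assumed shape: {"tool": str, "arguments": dict, "result": dict}


def check(spec: str, calls: List[Call]) -> bool:
    """Evaluate one rubric check string against the tool-call log."""
    kind, _, arg = spec.partition(":")
    used = [c["tool"] for c in calls]
    if kind == "tool_used":
        return arg in used
    if kind == "tool_not_used":
        return arg not in used
    if kind == "tool_used_any":
        return any(tool in used for tool in arg.split(","))
    if kind in ("param_value", "param_contains"):
        target, _, expected = arg.partition("=")
        tool, _, param = target.partition(".")
        for c in calls:
            if c["tool"] != tool:
                continue
            actual = str(c["arguments"].get(param, ""))
            if kind == "param_value" and actual == expected:
                return True
            if kind == "param_contains" and expected in actual:
                return True
        return False
    if kind == "tool_order":
        first, _, second = arg.partition("<")
        if first not in used or second not in used:
            return False
        return used.index(first) < used.index(second)
    if kind == "tool_count":
        tool, _, n = arg.partition(">=")
        return used.count(tool) >= int(n)
    if kind == "result_contains":
        return any(arg in json.dumps(c["result"]) for c in calls)
    raise ValueError(f"unknown check type: {kind}")


def score(rubric: List[str], calls: List[Call]) -> float:
    """Reward = proportion of rubric checks satisfied."""
    return sum(check(spec, calls) for spec in rubric) / len(rubric)


rubric = [
    "tool_used:hr_create_employee",
    "param_value:hr_create_employee.name=Priya Sharma",
    "param_value:hr_create_employee.department=Engineering",
    "param_value:hr_create_employee.level=L2",
    "param_value:hr_create_employee.role=Software Engineer",
    "tool_used:onboarding_create_request",
    "tool_order:hr_create_employee<onboarding_create_request",
]
calls = [  # level and role are wrong on purpose
    {"tool": "hr_create_employee",
     "arguments": {"name": "Priya Sharma", "department": "Engineering", "level": "L3", "role": "Engineer"},
     "result": {"success": True, "emp_id": "emp_0201"}},
    {"tool": "onboarding_create_request", "arguments": {"employee_id": "emp_0201"},
     "result": {"success": True}},
]
print(f"{score(rubric, calls):.2f}")  # 0.71
```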
## Environment API
### OpenEnv Interface (MCPEnvironment)
```
reset() → Observation          # Pick a task, reset world state, return the instruction
step(action) → Observation     # Execute one tool call; return its result + reward
state → State                  # Current step count, episode ID
```
### Episode Flow
```
1. env.reset()
   → Task: "Fully onboard John Lee as L3 Team Lead..."
2. Agent calls: hr_create_employee(name="John Lee", department="Data Science", level="L3", ...)
   → env.step() → {"success": true, "emp_id": "emp_0201"}
3. Agent calls: onboarding_create_request(employee_id="emp_0201")
   → env.step() → {"success": true, "request_id": "onb_0001", "steps": {...}}
4. Agent calls: it_get_available_assets(asset_type="laptop")
   → env.step() → {"success": true, "assets": [...]}
5. Agent calls: it_assign_asset(asset_id="asset_003", employee_id="emp_0201")
   → env.step() → {"success": true}
... more tool calls ...
N. Episode ends (max 15 steps or agent signals done)
   → Reward: 8/10 criteria satisfied = 0.8
```
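To experiment with this loop without a running server, a small in-process stand-in that mimics `reset()`, `step()`, and `state` is enough. `ToyHREnv` is hypothetical and much simpler than the real `HROnboardingEnv`: handlers are stubs, the rubric is a list of predicates over the tool-name history, the reward is recomputed after every step, and calling a tool named `done` stands in for "agent signals done". Only the 15-step cap comes from the flow above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

MAX_STEPS = 15  # episode cap from the flow above


@dataclass
class StepResult:
    observation: Dict[str, Any]
    reward: float
    done: bool


@dataclass
class ToyHREnv:
    """Hypothetical in-process stand-in; not the real HROnboardingEnv client."""
    instruction: str
    tools: Dict[str, Callable[..., Dict[str, Any]]]
    rubric: List[Callable[[List[str]], bool]]  # predicates over the tool-name history
    history: List[str] = field(default_factory=list)
    step_count: int = 0
    episode_id: str = ""

    def reset(self) -> StepResult:
        self.history, self.step_count = [], 0
        self.episode_id = str(uuid.uuid4())
        return StepResult({"instruction": self.instruction, "tools": sorted(self.tools)}, 0.0, False)

    @property
    def state(self) -> Dict[str, Any]:
        return {"step_count": self.step_count, "episode_id": self.episode_id}

    def _reward(self) -> float:
        return sum(rule(self.history) for rule in self.rubric) / len(self.rubric)

    def step(self, tool_name: str, arguments: Dict[str, Any]) -> StepResult:
        self.step_count += 1
        if tool_name == "done":  # hypothetical "agent signals done" action
            return StepResult({"message": "episode finished"}, self._reward(), True)
        handler = self.tools.get(tool_name)
        if handler is None:
            result = {"success": False, "error": f"unknown tool: {tool_name}"}
        else:
            result = handler(**arguments)
        self.history.append(tool_name)
        return StepResult(result, self._reward(), self.step_count >= MAX_STEPS)


env = ToyHREnv(
    instruction="Fully onboard John Lee as L3 Team Lead in Data Science",
    tools={
        "hr_create_employee": lambda **kw: {"success": True, "emp_id": "emp_0201"},
        "onboarding_create_request": lambda **kw: {"success": True, "request_id": "onb_0001"},
    },
    rubric=[
        lambda h: "hr_create_employee" in h,
        lambda h: "onboarding_create_request" in h,
    ],
)
env.reset()
plan = [
    ("hr_create_employee", {"name": "John Lee", "department": "Data Science", "level": "L3"}),
    ("onboarding_create_request", {"employee_id": "emp_0201"}),
    ("done", {}),
]
for tool_name, arguments in plan:
    result = env.step(tool_name, arguments)
    print(env.state["step_count"], result.reward, result.done)
    if result.done:
        break
# 1 0.5 False
# 2 1.0 False
# 3 1.0 True
```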
## Project Structure
```
rl_hack/
├── README.md                         # This file
├── openenv.yaml                      # OpenEnv manifest
├── pyproject.toml                    # Project metadata
├── __init__.py                       # Module exports
├── client.py                         # HROnboardingEnv client
├── models.py                         # Action/Observation Pydantic models
├── test_with_llm.py                  # Test single task with GPT agent
├── test_all_tasks.py                 # Evaluate all 77 tasks
├── train_hr_agent.ipynb              # GRPO training notebook (Unsloth)
├── .env                              # API keys (gitignored)
├── outputs/                          # Evaluation results
└── server/
    ├── __init__.py
    ├── app.py                        # FastAPI application
    ├── hr_onboarding_environment.py  # Core environment (Environment subclass)
    ├── world.py                      # World state (entities, RBAC, mutations)
    ├── tools.py                      # Tool registry (25 tools)
    ├── tasks.py                      # Task definitions + generation (77 tasks)
    ├── rubrics.py                    # Rubric evaluator (reward computation)
    ├── data/
    │   ├── employees.json            # 200 employee records
    │   ├── departments.json          # 8 departments with policies
    │   ├── policies.json             # 15 business rule documents
    │   ├── it_assets.json            # 100 IT assets
    │   ├── access_roles.json         # 20 RBAC roles
    │   └── templates.json            # 12 message templates
    ├── Dockerfile                    # Container image
    └── requirements.txt              # Server dependencies
```
## Testing with an LLM Agent
You can test the environment locally using GPT (or any OpenAI-compatible model) as the agent.
### Setup
1. Create a `.env` file in the repo root:
```
OPENAI_API_KEY="sk-proj-..."
```
2. Install dependencies:
```bash
uv pip install -e ".[eval]"
```
### Run
```bash
cd rl_hack
# Test on default task (simple lookup)
uv run python -m test_with_llm
# Test a specific task by index (0-76)
uv run python -m test_with_llm 14 # medium onboarding task
uv run python -m test_with_llm 24 # complex full onboarding
uv run python -m test_with_llm 55 # edge case (headcount limit)
# Run full evaluation across all 77 tasks
uv run python test_all_tasks.py
```
The script will:
- Reset the environment and pick a task
- Use GPT-4o-mini to generate tool calls
- Execute each tool call against the environment
- Print the rubric evaluation with pass/fail per criterion
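Turning the model's free-form reply into a tool call is the fragile part of this loop. Judging from the example output, the model replies with JSON of the form `{"tool": ..., "params": ...}`; a tolerant parser might scan for the first JSON object carrying a `tool` key, as sketched below. `parse_tool_call` is a hypothetical helper, not necessarily what `test_with_llm.py` does.

```python
import json
from typing import Any, Dict, Optional, Tuple


def parse_tool_call(text: str) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Return (tool_name, params) from the first JSON object that has a "tool" key.

    Tolerates prose or Markdown fences around the JSON; returns None if nothing usable is found.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and isinstance(obj.get("tool"), str):
            params = obj.get("params", {})
            if isinstance(params, dict):
                return obj["tool"], params
            return None
        start = text.find("{", start + 1)
    return None


reply = 'Next I will start onboarding.\n{"tool": "onboarding_create_request", "params": {"employee_id": "emp_0201"}}'
print(parse_tool_call(reply))
# ('onboarding_create_request', {'employee_id': 'emp_0201'})
```

The returned pair maps directly onto `HROnboardingAction(tool_name=..., arguments=...)` from the Quick Start.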
### Example Output
```
Task ID: task_0015
Difficulty: medium
Instruction: Onboard new hire Priya Sharma to Engineering as L2 Software Engineer...
--- Step 1/15 ---
LLM: {"tool": "hr_create_employee", "params": {"name": "Priya Sharma", ...}}
Tool: hr_create_employee
Result: {"success": true, "employee": {"emp_id": "emp_0201", ...}}
--- Step 2/15 ---
LLM: {"tool": "onboarding_create_request", "params": {"employee_id": "emp_0201"}}
Tool: onboarding_create_request
Result: {"success": true, ...}
FINAL EVALUATION
Score: 100% (7/7 criteria)
Passed: True
[PASS] created_employee
[PASS] correct_name
[PASS] correct_dept
[PASS] initiated_onboarding
[PASS] sequencing
```
### Task Index Reference
| Index | Difficulty | Category | Description |
|-------|-----------|----------|-------------|
| 0-13 | Simple | Lookup/Onboarding | Single lookups, status checks |
| 14-23 | Medium | Onboarding | Create employee + initiate workflow |
| 24-34 | Complex | Onboarding | Full end-to-end with IT, access, comms |
| 35-46 | Medium | Offboarding | Initiate offboarding + revoke access |
| 47-54 | Complex | Offboarding | Full offboarding with asset reclaim |
| 55-66 | Edge case | Various | Headcount limits, license caps, RBAC |
| 67-76 | Complex | Cross-workflow | Transfers, rehires, manager departures |
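If you script evaluations over particular slices, the index ranges in this table can be encoded as sorted start offsets and looked up with `bisect`. `describe_task_index` is a hypothetical convenience helper, not part of the repo; the ranges are copied from the table.

```python
from bisect import bisect_right
from typing import Tuple

# (first_index, difficulty, category), copied from the task index table.
TASK_RANGES = [
    (0, "Simple", "Lookup/Onboarding"),
    (14, "Medium", "Onboarding"),
    (24, "Complex", "Onboarding"),
    (35, "Medium", "Offboarding"),
    (47, "Complex", "Offboarding"),
    (55, "Edge case", "Various"),
    (67, "Complex", "Cross-workflow"),
]
NUM_TASKS = 77


def describe_task_index(index: int) -> Tuple[str, str]:
    """Return (difficulty, category) for a task index in 0-76."""
    if not 0 <= index < NUM_TASKS:
        raise IndexError(f"task index must be in 0-{NUM_TASKS - 1}, got {index}")
    starts = [start for start, _, _ in TASK_RANGES]
    # bisect_right finds the last range whose start is <= index.
    _, difficulty, category = TASK_RANGES[bisect_right(starts, index) - 1]
    return difficulty, category


print(describe_task_index(55))  # ('Edge case', 'Various')
```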
## Installation
```bash
# Clone the repo
git clone https://github.com/ravi03071991/rl_hack.git
cd rl_hack
# Install core dependencies
uv pip install -e .
# Install with evaluation support (adds openai)
uv pip install -e ".[eval]"
# Install with training support (adds unsloth, trl, torch, etc.)
uv pip install -e ".[train]"
# Install everything
uv pip install -e ".[eval,train,dev]"
```
## Building & Running
```bash
# Run locally (as OpenEnv HTTP server with playground UI)
uvicorn server.app:app --reload --host 0.0.0.0 --port 7860
# Build Docker image
docker build -t hr-onboarding-env:latest -f server/Dockerfile .
# Deploy to HF Spaces
openenv push
```
## Training & Results
We use Unsloth + GRPO to train an LLM agent on this environment. See [`train_hr_agent.ipynb`](train_hr_agent.ipynb) for the full training notebook and [W&B run](https://wandb.ai/ravi03071991/hr-agent-training/runs/bgent3o3?nw=nwuserravi03071991) for live training metrics.
### Setup
- **Model**: Llama 3.2-1B-Instruct (4-bit quantized, LoRA rank 8)
- **Algorithm**: GRPO (Group Relative Policy Optimization)
- **Reward functions**: Valid JSON + rubric score + efficiency
- **Training**: 300 steps, 6 generations per prompt, lr=5e-5 with cosine schedule
- **Data split**: 70/30 stratified train/test (52 train, 25 test tasks)
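In GRPO, each prompt is sampled several times (6 here), every completion is scored, and each reward is standardized against its own group to produce an advantage, so no learned value function is needed. The sketch below shows a plausible valid-JSON reward and the group-relative normalization. The reward definitions, the equal-weight sum, the `eps` value, and the rubric scores are assumptions for illustration; the efficiency term is omitted, and `train_hr_agent.ipynb` may define all of these differently.

```python
import json
import statistics
from typing import List

NUM_GENERATIONS = 6  # generations per prompt, from the setup above


def valid_json_reward(completions: List[str]) -> List[float]:
    """1.0 if a completion parses as a JSON object with a "tool" key, else 0.0."""
    rewards = []
    for text in completions:
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            rewards.append(0.0)
            continue
        rewards.append(1.0 if isinstance(obj, dict) and "tool" in obj else 0.0)
    return rewards


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """GRPO-style advantages: (r - group mean) / (group std + eps).

    Uses the population std; implementations differ on this detail.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One group of completions for a single prompt, with made-up rubric scores.
completions = [
    '{"tool": "hr_create_employee", "params": {"name": "Priya Sharma"}}',
    '{"tool": "onboarding_create_request", "params": {}}',
    "I think we should create the employee first.",
    '{"tool": "hr_create_employee", "params": {}}',
    '{"tool": ',
    '{"tool": "email_send", "params": {}}',
]
rubric_scores = [0.6, 0.2, 0.0, 0.4, 0.0, 0.1]  # hypothetical values
totals = [j + s for j, s in zip(valid_json_reward(completions), rubric_scores)]
advantages = group_relative_advantages(totals)
print([round(a, 2) for a in advantages])
```

Because advantages are relative within the group, a completion is only pushed up if it beats its siblings for the same prompt; a group where every sample scores the same contributes zero advantage.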
### Results
GRPO training substantially improves the model's ability to complete HR workflows:
| Metric | Base Model | Trained | Change |
|--------|-----------|---------|--------|
| **Train pass rate** | 15.4% | 19.2% | +3.8pp |
| **Train mean score** | 0.370 | 0.617 | **+0.247 (+67%)** |
| **Test pass rate** | 12.0% | 16.0% | +4.0pp |
| **Test mean score** | 0.370 | 0.617 | **+0.247 (+67%)** |
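The table mixes two kinds of change: mean score is reported as a relative change, while pass rate is an absolute difference in percentage points (pp). The pass rates are consistent with 8 → 10 of the 52 training tasks and 3 → 4 of the 25 test tasks passing, which the arithmetic below reproduces. A +3.8 pp gain on a 15.4% base is about a 25% relative increase, so the two units are not interchangeable.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change, e.g. 0.5 means +50%."""
    return (after - before) / before


def pp_change(before: float, after: float) -> float:
    """Absolute change in percentage points for rates given as fractions."""
    return (after - before) * 100


print(f"{relative_change(0.370, 0.617):.0%}")  # 67%
print(f"{relative_change(0.26, 0.68):.0%}")    # 162%

train_before, train_after = 8 / 52, 10 / 52  # 15.4% -> 19.2%
test_before, test_after = 3 / 25, 4 / 25     # 12.0% -> 16.0%
print(f"{pp_change(train_before, train_after):+.1f} pp")  # +3.8 pp
print(f"{pp_change(test_before, test_after):+.1f} pp")    # +4.0 pp
print(f"{relative_change(train_before, train_after):.0%}")  # 25%
```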
#### Improvement by difficulty
| Difficulty | Baseline | Trained | Change |
|------------|----------|---------|--------|
| Simple | 0.23 | 0.50 | +0.27 |
| Medium | 0.72 | 0.86 | +0.14 |
| **Complex** | **0.26** | **0.68** | **+0.42** |
| Edge case | 0.22 | 0.25 | +0.03 |
The biggest gains are on **complex multi-step tasks**, where scores more than doubled. The improvement **carries over to held-out test tasks**, suggesting the model learned transferable HR workflow skills.
### Reward Curve

The moving average reward trends upward from ~2-3 early in training to ~4-5 by the end, showing consistent learning.
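Per-step GRPO rewards are noisy, so a trend like this is usually read from a smoothed series. A trailing moving average is one simple option; the smoothing actually used for the W&B curve isn't stated here, so the function, window size, and toy values below are illustrative only.

```python
from collections import deque
from typing import Deque, Iterable, List


def moving_average(values: Iterable[float], window: int) -> List[float]:
    """Trailing moving average; the first points average over the values seen so far."""
    if window < 1:
        raise ValueError("window must be >= 1")
    buf: Deque[float] = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is about to fall out of the window
        buf.append(v)
        total += v
        out.append(total / len(buf))
    return out


print(moving_average([2.0, 3.0, 4.0, 5.0], window=2))  # [2.0, 2.5, 3.5, 4.5]
```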
### Quick start (Colab)
1. Click the Colab badge at the top to open `train_hr_agent.ipynb` in Google Colab
2. Select a GPU runtime
3. Run all cells: the notebook installs dependencies, trains, and evaluates automatically
## Live Demo
Try the environment on Hugging Face Spaces: https://huggingface.co/spaces/devxpy/rl_hack