---
title: SimLab HR — AI Recruiting & People Management Agent Environment
emoji: 👔
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
short_description: MCP gym for benchmarking & training AI HR agents
tags:
- openenv
- hr
- human-resources
- recruiting
- hrms
- agent-evaluation
- agent-benchmark
- simlab
- reinforcement-learning
- rl-environment
- ai-agent
- tool-use
- function-calling
- enterprise
- multi-tool
- gymnasium
- gym
- benchmark
- mcp
- model-context-protocol
- reward-model
- verifier
- collinear
- langchain
- openai
- sandbox
- docker
- toolbench
- swe-bench
- bfcl
pinned: true
license: apache-2.0
---
# SimLab HR: MCP-Powered Gym for AI HR Agent Evaluation
A fully functional HR simulation gym for training, evaluating, and benchmarking AI recruiting and people management agents. Test your agent's tool use and function calling across 4 MCP tool servers (HRMS, email, calendar, and team chat), with automated rubric-based evaluation.
Built on OpenEnv and powered by SimLab.
Unlike function-calling benchmarks such as BFCL or ToolBench, which check calls against schemas or simulated APIs, SimLab HR gives your agent a full workplace backed by real services and asks it to complete multi-step HR workflows end-to-end.
## 4 MCP Tool Servers, 1 Environment
| Tool Server | Port | What it does |
|---|---|---|
| HRMS (Frappe) | 8030 | Employee records, leave management, attendance, payroll |
| Email (MailHog) | 8040 | Send and read emails, inbox management |
| Calendar (Baikal/Chronos) | 8050 | Schedule meetings, check availability, manage events |
| RocketChat | 8060 | Team messaging, channels, direct messages |
Agents must reason about and coordinate tool calls across these MCP servers to complete HR workflows. This kind of multi-step function calling is what separates practical tool use from single-API benchmarks.
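Because a single workflow spans several servers, it can help to check a planned sequence of calls against the tools each server actually exposes before running it. The sketch below is a local, hypothetical helper: `PlannedCall` only mirrors the `HRAction` fields shown in the Quickstart, and it assumes `tools_available` maps server names to lists of tool-name strings. The `calendar` server identifier and the `create_event` tool name are illustrative assumptions; only `hrms`/`get_leave_balance` and `email`/`send_email` appear in this README.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedCall:
    """Hypothetical local stand-in with the same fields as simlab_hr.HRAction."""
    tool_server: str
    tool_name: str
    parameters: dict = field(default_factory=dict)


def check_plan(plan, tools_available):
    """Return a list of problems; an empty list means every call targets an exposed tool.

    Assumes tools_available maps server name -> list of tool-name strings.
    """
    problems = []
    for i, call in enumerate(plan):
        tools = tools_available.get(call.tool_server)
        if tools is None:
            problems.append(f"step {i}: unknown server {call.tool_server!r}")
        elif call.tool_name not in tools:
            problems.append(f"step {i}: {call.tool_server!r} has no tool {call.tool_name!r}")
    return problems


def servers_used(plan):
    """Distinct servers a plan touches, in first-use order."""
    return list(dict.fromkeys(call.tool_server for call in plan))


# Illustrative catalogue; "calendar" and "create_event" are hypothetical names.
tools_available = {
    "hrms": ["get_leave_balance"],
    "email": ["send_email"],
    "calendar": ["create_event"],
}
plan = [
    PlannedCall("hrms", "get_leave_balance", {"employee_id": "EMP-0042"}),
    PlannedCall("email", "send_email", {"to": "manager@company.com"}),
    PlannedCall("calendar", "create_event"),
]
print(check_plan(plan, tools_available))
print(servers_used(plan))
```

A plan that touches only one server is a hint that the agent may be skipping a required step, such as the notification email.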
## Quickstart
```python
from simlab_hr import HRAction
from simlab_hr.client import HREnv

client = HREnv(base_url="http://localhost:8000")

with client:
    obs = client.reset()
    print(obs.observation.task_instruction)
    print(obs.observation.tools_available)  # {'hrms': [...], 'email': [...], ...}

    # Check leave balance in HRMS
    result = client.step(HRAction(
        tool_server="hrms",
        tool_name="get_leave_balance",
        parameters={"employee_id": "EMP-0042"}
    ))

    # Send an email notification
    result = client.step(HRAction(
        tool_server="email",
        tool_name="send_email",
        parameters={"to": "manager@company.com", "subject": "Leave approved", "body": "..."}
    ))
```
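The Quickstart issues two hand-written steps. To replay a longer scripted plan and stop once the environment signals the end of the episode, a loop along these lines should work. It assumes that `reset()` and `step()` return objects with `reward` (possibly `None`) and `done` attributes, as OpenEnv step results do; treat those field names as an assumption. `FakeEnv` is a hypothetical stand-in so the loop can be exercised without running the services.

```python
from types import SimpleNamespace


def run_scripted_episode(env, actions):
    """Reset env, apply actions in order, and stop early if the episode ends.

    Assumes env.reset()/env.step() return objects with `reward` (may be None)
    and `done` attributes. Returns (total_reward, steps_taken).
    """
    result = env.reset()
    total, steps = 0.0, 0
    for action in actions:
        if result.done:
            break
        result = env.step(action)
        steps += 1
        if result.reward is not None:
            total += result.reward
    return total, steps


class FakeEnv:
    """Hypothetical stand-in: ends after `max_steps` steps, paying 1.0 on the last one."""

    def __init__(self, max_steps):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return SimpleNamespace(reward=None, done=False)

    def step(self, action):
        self.t += 1
        done = self.t >= self.max_steps
        return SimpleNamespace(reward=1.0 if done else 0.0, done=done)


print(run_scripted_episode(FakeEnv(max_steps=2), ["a", "b", "c"]))
```

With the real client, the same function would be called inside `with client:` and given a list of `HRAction` objects instead of strings.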
## Benchmark Tasks
8 sample tasks covering real HR workflows across three difficulty levels:
| Difficulty | Example Tasks |
|---|---|
| Easy | Approve a leave request, update an employee's designation |
| Medium | Schedule a phone screen + send confirmation, run an attendance report |
| Hard | Multi-person panel interview scheduling, full new-hire onboarding flow |
Every task requires the agent to coordinate function calls across multiple MCP tool servers, which is what makes the tasks hard.
## Automated Evaluation
SimLab HR includes a rubric-based LLM judge that evaluates agent performance after each episode:
- 0.8–1.0: All requirements fully met with clear evidence
- 0.6–0.8: Core requirements met with minor gaps (a score of 0.6 or higher is a PASS)
- 0.4–0.6: Partial completion, significant gaps remain
- 0.0–0.4: Minimal or no meaningful progress
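The score bands share their endpoints (0.4, 0.6, 0.8), so any code that buckets judge scores has to pick a side. A minimal helper that maps a score to its band and a pass/fail verdict might look like this; it assumes each shared boundary belongs to the higher band, which is consistent with 0.6 being a pass. The band labels are paraphrases, not values the judge is known to emit.

```python
def judge_band(score: float) -> tuple[str, bool]:
    """Map a judge score in [0, 1] to (band label, passed).

    Assumption: each shared boundary (0.4, 0.6, 0.8) belongs to the higher band,
    so a score of exactly 0.6 counts as a PASS.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 0.8:
        band = "fully met"
    elif score >= 0.6:
        band = "core met, minor gaps"
    elif score >= 0.4:
        band = "partial"
    else:
        band = "minimal"
    return band, score >= 0.6  # 0.6 is the documented PASS threshold


for s in (0.95, 0.6, 0.59, 0.1):
    print(s, judge_band(s))
```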
Configure the verifier model:
```bash
export VERIFIER_MODEL="gpt-4o"
export VERIFIER_API_KEY="sk-..."
```
## Run Locally
```bash
git clone https://github.com/collinear-ai/simlab.git
cd simlab/envs/simlab_hr

# Start all services (HRMS, Email, Calendar, RocketChat, OpenEnv wrapper)
docker compose up
# First run pulls ~10 images and takes a few minutes for HRMS to initialize
```
Or run from Hugging Face:
```python
from simlab_hr.client import HREnv

client = HREnv.from_hub("collinear/simlab-hr")
```
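Because HRMS can take a few minutes to initialize on the first `docker compose up`, a client may want to wait until the tool servers accept connections before calling `reset()`. The sketch below uses only the standard library and assumes the ports from the tool-server table are published on `localhost`; confirm that against the compose file. The dictionary keys are illustrative labels, not identifiers defined by SimLab. An open TCP port does not guarantee a service is fully ready, so treat this as a coarse check.

```python
import socket
import time

# Ports from the tool-server table; assumed to be published on localhost.
TOOL_SERVER_PORTS = {"hrms": 8030, "email": 8040, "calendar": 8050, "rocketchat": 8060}


def wait_for_ports(ports, host="localhost", timeout=300.0, interval=2.0):
    """Poll until every port accepts a TCP connection or `timeout` seconds pass.

    Returns the list of ports still unreachable (empty on success).
    """
    deadline = time.monotonic() + timeout
    pending = list(ports)
    while True:
        still_down = []
        for port in pending:
            try:
                # Connect and immediately close; success means something is listening.
                with socket.create_connection((host, port), timeout=interval):
                    pass
            except OSError:
                still_down.append(port)
        pending = still_down
        if not pending or time.monotonic() >= deadline:
            return pending
        time.sleep(interval)


# Example (may block for up to 5 minutes while services start):
# down = wait_for_ports(TOOL_SERVER_PORTS.values())
# print("all tool servers up" if not down else f"still down: {down}")
```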
## Unlock 14+ Tasks from the API
This environment ships with 8 sample tasks. Want more?
Set your Collinear API key to unlock the full task set with real HR scenarios:
```bash
export COLLINEAR_API_KEY="your-key-here"
```
Get a free API key at platform.collinear.ai (Developer Resources → API Keys).
With the API key, every reset() pulls a fresh task from Collinear's Scenario Manager — recruiting workflows, people management scenarios, compliance tasks, and more.
## Use with TRL / GRPOTrainer
Compatible with Hugging Face TRL for reinforcement learning fine-tuning:
```python
from simlab_hr import HRAction
from simlab_hr.client import HREnv

env = HREnv.from_hub("collinear/simlab-hr")

with env:
    obs = env.reset()
    # ... your RL training loop
```
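GRPO scores a group of rollouts for the same prompt or task and normalizes each reward against that group. If you use each episode's judge score as its reward, the group-relative advantage can be sketched as below. This is the textbook GRPO normalization (reward minus group mean, divided by group standard deviation plus a small constant); TRL's `GRPOTrainer` computes group-normalized advantages internally, and its exact options may differ, so use this to reason about the training signal rather than as a replacement.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-4):
    """GRPO-style advantages: (r - mean) / (std + eps) within one group of rollouts.

    Uses the population standard deviation; implementations differ on this detail.
    A group with identical rewards yields all-zero advantages (no learning signal).
    """
    if not rewards:
        return []
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical judge scores for four rollouts of the same HR task.
scores = [0.9, 0.6, 0.3, 0.6]
print([round(a, 2) for a in group_relative_advantages(scores)])
```

Rollouts that score above their group's mean get a positive advantage and are reinforced, which is why a spread of judge scores within each group matters more than their absolute level.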
## How SimLab HR Compares
Most tool-use benchmarks evaluate function calling in isolation, with single API calls against predefined schemas. SimLab HR tests whether your agent can actually get work done in a simulated enterprise environment backed by real services.
| | SimLab HR | BFCL | ToolBench | EnterpriseOps-Gym | tau-bench |
|---|---|---|---|---|---|
| What it tests | Multi-tool HR workflows end-to-end | Function call accuracy (single/parallel) | API discovery & chaining across 16k APIs | Enterprise planning across 8 domains | Customer service policy compliance |
| Real backing services | ✅ Frappe HRMS, MailHog, CalDAV, RocketChat | ❌ Schema validation only | ❌ API simulation | ❌ Mock APIs | ❌ Simulated |
| MCP tool servers | ✅ 4 servers | ❌ | ❌ | ❌ REST APIs | ❌ |
| Multi-step workflows | ✅ 10+ steps, cross-system | ❌ Single/parallel calls | ✅ Multi-hop chains | ✅ Avg 9 steps | ✅ Multi-turn |
| HR-specific | ✅ Dedicated | ❌ | ❌ | ✅ 1 of 8 domains | ❌ |
| Automated evaluation | ✅ Rubric-based LLM judge | ✅ AST matching | ✅ Pass rate + win rate | ✅ Expert-curated | ✅ Policy checks |
| RL / Gymnasium support | ✅ OpenEnv-compatible | ❌ | ❌ | ❌ | ❌ |
| Task generation | ✅ API pipeline | ❌ | ❌ | ❌ | ❌ |
## More Environments
SimLab includes 5 enterprise simulation scenarios with 14 MCP tool servers:
| Scenario | MCP Tool Servers |
|---|---|
| Human Resources | HRMS, email, calendar, team chat ← you are here |
| Customer Service | Helpdesk ticketing, team chat, email |
| Finance | SEC filings, market data, Google Workspace |
| Coding | Sandboxed IDE, browser automation, team chat |
| CRM | Contacts, deals, pipelines, activities |
Install the full toolkit:
```bash
pip install simulationlab
simlab templates list
```
Learn more: github.com/collinear-ai/simlab | docs.collinear.ai
## License
Apache 2.0 — Collinear AI