title: SimLab HR - AI Recruiting & People Management Agent Environment
emoji: 👔
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
short_description: MCP gym for benchmarking & training AI HR agents
tags:
  - openenv
  - hr
  - human-resources
  - recruiting
  - hrms
  - agent-evaluation
  - agent-benchmark
  - simlab
  - reinforcement-learning
  - rl-environment
  - ai-agent
  - tool-use
  - function-calling
  - enterprise
  - multi-tool
  - gymnasium
  - gym
  - benchmark
  - mcp
  - model-context-protocol
  - reward-model
  - verifier
  - collinear
  - langchain
  - openai
  - sandbox
  - docker
  - toolbench
  - swe-bench
  - bfcl
pinned: true
license: apache-2.0

SimLab HR — MCP-Powered Gym for AI HR Agent Evaluation

A fully functional HR simulation gym for training, evaluating, and benchmarking AI recruiting and people-management agents. Test your agent's tool-use and function-calling abilities across four MCP tool servers (HRMS, email, calendar, and team chat) with automated rubric-based evaluation.

Built on OpenEnv and powered by SimLab.

Unlike function-calling benchmarks such as BFCL or ToolBench, which check calls against schemas or simulated APIs, SimLab HR gives your agent a full workplace backed by running services (HRMS, email, calendar, and team chat) and asks it to complete multi-step HR workflows end to end.

4 MCP Tool Servers, 1 Environment

| Tool Server | Port | What it does |
|---|---|---|
| HRMS (Frappe) | 8030 | Employee records, leave management, attendance, payroll |
| Email (MailHog) | 8040 | Send and read emails, inbox management |
| Calendar (Baikal/Chronos) | 8050 | Schedule meetings, check availability, manage events |
| RocketChat | 8060 | Team messaging, channels, direct messages |

Agents must reason about and coordinate tool calls across all four MCP servers to complete HR workflows. This kind of multi-step, cross-system function calling is what single-API benchmarks do not exercise.
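
Before starting an episode, it can help to confirm that each backing service is accepting connections. The following is a minimal sketch and not part of SimLab: it assumes the four servers are reachable on `localhost` at the ports listed in the table, and `TOOL_SERVER_PORTS`, `port_open`, and `check_tool_servers` are hypothetical names chosen for illustration. An open port only shows that something is listening; it does not prove the MCP server behind it has finished initializing.

```python
import socket

# Ports come from the table above. The dictionary keys are labels for this
# sketch only, and the host is an assumption (a local Docker setup).
TOOL_SERVER_PORTS = {"hrms": 8030, "email": 8040, "calendar": 8050, "chat": 8060}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_tool_servers(host: str = "localhost") -> dict[str, bool]:
    """Map each tool server label to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in TOOL_SERVER_PORTS.items()}


if __name__ == "__main__":
    for name, ok in check_tool_servers().items():
        print(f"{name:<8} {'up' if ok else 'down'}")
```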

Quickstart

from simlab_hr import HRAction
from simlab_hr.client import HREnv

client = HREnv(base_url="http://localhost:8000")

with client:
    obs = client.reset()
    print(obs.observation.task_instruction)
    print(obs.observation.tools_available)  # {'hrms': [...], 'email': [...], ...}

    # Check leave balance in HRMS
    result = client.step(HRAction(
        tool_server="hrms",
        tool_name="get_leave_balance",
        parameters={"employee_id": "EMP-0042"}
    ))

    # Send an email notification
    result = client.step(HRAction(
        tool_server="email",
        tool_name="send_email",
        parameters={"to": "manager@company.com", "subject": "Leave approved", "body": "..."}
    ))
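
The quickstart issues two calls by hand. A longer workflow can be expressed as an ordered plan and replayed through the same `reset()` and `step()` calls. The sketch below is illustrative only: `ToolCall` is a stand-in that mirrors the three fields passed to `HRAction` above, `run_plan` is a hypothetical helper, and `EchoEnv` is a fake environment so the example runs without any services.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolCall:
    """Stand-in with the same three fields the quickstart passes to HRAction."""
    tool_server: str
    tool_name: str
    parameters: dict[str, Any] = field(default_factory=dict)


def run_plan(env: Any, plan: list[ToolCall],
             to_action: Callable[[ToolCall], Any] = lambda call: call) -> list[Any]:
    """Reset the environment, then send each planned call through env.step()."""
    env.reset()
    results = []
    for call in plan:
        results.append(env.step(to_action(call)))
    return results


class EchoEnv:
    """Hypothetical in-memory environment used only to exercise run_plan."""

    def __init__(self) -> None:
        self.log: list[ToolCall] = []

    def reset(self) -> None:
        self.log.clear()

    def step(self, action: ToolCall) -> str:
        self.log.append(action)
        return f"{action.tool_server}.{action.tool_name}"


plan = [
    ToolCall("hrms", "get_leave_balance", {"employee_id": "EMP-0042"}),
    ToolCall("email", "send_email", {"to": "manager@company.com",
                                     "subject": "Leave approved", "body": "..."}),
]
env = EchoEnv()
print(run_plan(env, plan))  # ['hrms.get_leave_balance', 'email.send_email']
```

With the real client, `to_action` would presumably build an `HRAction` from each `ToolCall`, for example `lambda c: HRAction(tool_server=c.tool_server, tool_name=c.tool_name, parameters=c.parameters)`.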

Benchmark Tasks

8 sample tasks covering real HR workflows across three difficulty levels:

| Difficulty | Example Tasks |
|---|---|
| Easy | Approve a leave request, update an employee's designation |
| Medium | Schedule a phone screen + send confirmation, run an attendance report |
| Hard | Multi-person panel interview scheduling, full new-hire onboarding flow |

Every task requires the agent to coordinate function calls across multiple MCP tool servers, which is the main source of difficulty.

Automated Evaluation

SimLab HR includes a rubric-based LLM judge that scores agent performance on a 0.0–1.0 scale after each episode:

  • 0.8–1.0: All requirements fully met with clear evidence
  • 0.6–0.8: Core requirements met with minor gaps (0.6 = PASS threshold)
  • 0.4–0.6: Partial completion, significant gaps remain
  • 0.0–0.4: Minimal or no meaningful progress
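
The bands above share their boundary values, so a consumer of the score has to decide where 0.4, 0.6, and 0.8 fall. The sketch below is one possible reading, not SimLab's implementation: it assumes a boundary score belongs to the higher band, which agrees with 0.6 counting as a pass. `score_band`, `passed`, and the band labels are hypothetical names.

```python
PASS_THRESHOLD = 0.6  # stated in the rubric above


def score_band(score: float) -> str:
    """Map a judge score in [0.0, 1.0] to a rubric band label.

    Assumption: a score on a shared boundary (0.4, 0.6, 0.8) is placed in
    the higher band.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0.0, 1.0], got {score}")
    if score >= 0.8:
        return "all requirements met"
    if score >= 0.6:
        return "core requirements met"
    if score >= 0.4:
        return "partial completion"
    return "minimal progress"


def passed(score: float) -> bool:
    """Return True when the score reaches the pass threshold."""
    return score >= PASS_THRESHOLD


for s in (0.95, 0.6, 0.59, 0.1):
    print(s, score_band(s), passed(s))
```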

Configure the verifier model:

export VERIFIER_MODEL="gpt-4o"
export VERIFIER_API_KEY="sk-..."
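
When driving evaluation from Python, it may be useful to fail early if either variable is unset. This is a minimal sketch under assumptions: `VerifierConfig` and `load_verifier_config` are hypothetical names, and the sketch treats both variables as required, since whether SimLab applies defaults is not stated here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VerifierConfig:
    model: str
    api_key: str


def load_verifier_config(env: Optional[Mapping[str, str]] = None) -> VerifierConfig:
    """Read the verifier settings, raising if either variable is missing or empty."""
    env = os.environ if env is None else env
    names = ("VERIFIER_MODEL", "VERIFIER_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return VerifierConfig(model=env["VERIFIER_MODEL"], api_key=env["VERIFIER_API_KEY"])


# Example with an explicit mapping instead of the process environment.
config = load_verifier_config({"VERIFIER_MODEL": "gpt-4o", "VERIFIER_API_KEY": "sk-..."})
print(config.model)  # gpt-4o
```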

Run Locally

git clone https://github.com/collinear-ai/simlab.git
cd simlab/envs/simlab_hr

# Start all services (HRMS, Email, Calendar, RocketChat, OpenEnv wrapper)
docker compose up

# First run pulls ~10 images and takes a few minutes for HRMS to initialize

Or run from Hugging Face:

from simlab_hr.client import HREnv

client = HREnv.from_hub("collinear/simlab-hr")

Unlock 14+ Tasks from the API

This environment ships with 8 sample tasks. Want more?

Set your Collinear API key to unlock the full task set with real HR scenarios:

export COLLINEAR_API_KEY="your-key-here"

Get a free API key at platform.collinear.ai (Developer Resources → API Keys).

With the API key, every reset() pulls a fresh task from Collinear's Scenario Manager — recruiting workflows, people management scenarios, compliance tasks, and more.
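
One way to see which tasks a configuration actually serves is to call `reset()` several times and tally the instructions returned. The sketch below assumes only what the quickstart shows, namely that `reset()` returns an object exposing `observation.task_instruction`. `sample_tasks` is a hypothetical helper and `FakeTaskEnv` is a stand-in so the example runs without the service.

```python
from collections import Counter
from itertools import cycle
from types import SimpleNamespace
from typing import Any


def sample_tasks(env: Any, episodes: int) -> Counter:
    """Call reset() `episodes` times and count each task instruction seen."""
    seen: Counter = Counter()
    for _ in range(episodes):
        obs = env.reset()
        seen[obs.observation.task_instruction] += 1
    return seen


class FakeTaskEnv:
    """Hypothetical stand-in that cycles through a fixed list of instructions."""

    def __init__(self, instructions: list[str]) -> None:
        self._next = cycle(instructions)

    def reset(self) -> SimpleNamespace:
        return SimpleNamespace(
            observation=SimpleNamespace(task_instruction=next(self._next)))


counts = sample_tasks(FakeTaskEnv(["approve leave", "schedule screen"]), episodes=5)
print(len(counts), counts.most_common(1))  # 2 [('approve leave', 3)]
```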

Use with TRL / GRPOTrainer

Compatible with Hugging Face TRL for reinforcement learning fine-tuning:

from simlab_hr import HRAction
from simlab_hr.client import HREnv

env = HREnv.from_hub("collinear/simlab-hr")
with env:
    obs = env.reset()
    # ... your RL training loop
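
TRL's `GRPOTrainer` accepts custom reward functions that take a batch of completions plus keyword arguments and return one float per completion. The sketch below shows one way to adapt a per-episode score to that shape. It is an assumption-laden illustration: `make_reward_func` is a hypothetical helper, and `score_completion` stands for whatever code replays a completion's tool calls in the environment and returns the judge score, which this README does not specify.

```python
from typing import Any, Callable


def make_reward_func(
    score_completion: Callable[[Any], float],
) -> Callable[..., list[float]]:
    """Wrap a per-completion scorer as a batch reward function."""

    def reward_func(completions: list[Any], **kwargs: Any) -> list[float]:
        rewards = []
        for completion in completions:
            score = float(score_completion(completion))
            # Clamp to the 0.0-1.0 range used by the rubric.
            rewards.append(min(1.0, max(0.0, score)))
        return rewards

    return reward_func


# Stub scorer for demonstration only: longer text scores higher.
reward_func = make_reward_func(lambda text: len(text) / 10)
print(reward_func(completions=["abc", "x" * 20]))  # [0.3, 1.0]
```

A function built this way could then be passed in the `reward_funcs` argument of `GRPOTrainer`.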

How SimLab HR Compares

Most tool-use benchmarks evaluate function calling in isolation — single API calls with predefined schemas. SimLab HR tests whether your agent can actually get work done across a real enterprise environment.

SimLab HR BFCL ToolBench EnterpriseOps-Gym tau-bench
What it tests Multi-tool HR workflows end-to-end Function call accuracy (single/parallel) API discovery & chaining across 16k APIs Enterprise planning across 8 domains Customer service policy compliance
Real backing services ✅ Frappe HRMS, MailHog, CalDAV, RocketChat ❌ Schema validation only ❌ API simulation ❌ Mock APIs ❌ Simulated
MCP tool servers ✅ 4 servers ❌ REST APIs
Multi-step workflows ✅ 10+ steps, cross-system ❌ Single/parallel calls ✅ Multi-hop chains ✅ Avg 9 steps ✅ Multi-turn
HR-specific ✅ Dedicated ✅ 1 of 8 domains
Automated evaluation ✅ Rubric-based LLM judge ✅ AST matching ✅ Pass rate + win rate ✅ Expert-curated ✅ Policy checks
RL / Gymnasium support ✅ OpenEnv-compatible
Task generation ✅ API pipeline

More Environments

SimLab includes 5 enterprise simulation scenarios with 14 MCP tool servers:

| Scenario | MCP Tool Servers |
|---|---|
| Human Resources | HRMS, email, calendar, team chat ← you are here |
| Customer Service | Helpdesk ticketing, team chat, email |
| Finance | SEC filings, market data, Google Workspace |
| Coding | Sandboxed IDE, browser automation, team chat |
| CRM | Contacts, deals, pipelines, activities |

Install the full toolkit:

pip install simulationlab
simlab templates list

Learn more: github.com/collinear-ai/simlab | docs.collinear.ai

License

Apache 2.0 — Collinear AI