metadata
title: Driver Recruit Environment
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
- reinforcement-learning
- recruiting
- multi-turn
Driver Recruit Environment
A multi-turn, tool-based RL environment for training LLMs to recruit truck drivers through a CRM system. Built on OpenEnv 0.2.1.
The agent must discover driver qualifications through conversation, record info in the CRM, get management approval, and hire, all using structured tool calls across 15-40+ step episodes.
Pipeline
lead → contacted → interested → approval_pending → offer_sent → hired
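The stage order above can be checked on the client side before calling update_stage. A minimal sketch, assuming stages advance strictly one at a time (this README says stage order must be followed but does not state how skipped stages are handled); `next_stage` and `is_valid_transition` are hypothetical helpers, not part of recruitopenenv.

```python
from typing import Optional

# Stage names copied from the pipeline above.
STAGES = ["lead", "contacted", "interested", "approval_pending", "offer_sent", "hired"]


def next_stage(current: str) -> Optional[str]:
    """Return the stage that follows `current`, or None if `current` is terminal."""
    idx = STAGES.index(current)  # raises ValueError for unknown stage names
    return STAGES[idx + 1] if idx + 1 < len(STAGES) else None


def is_valid_transition(current: str, target: str) -> bool:
    """Assumption: only a move to the immediately following stage is valid."""
    return next_stage(current) == target
```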
Tools
| Tool | Actions | Purpose |
|---|---|---|
| crm | read_candidate, update_stage, update_field, add_note | Manage pipeline & record info |
| messaging | send_message, read_reply | Screen driver (18 topics) |
| approval | request_approval, check_approval | Get management sign-off |
| workflow | wait | Advance time for approval processing |
Reward Signal
- Successful hire (good job fit): +10 to +15 (base + CRM bonus)
- Bad hire (poor match): -5
- Ghosted (trust runs out): -4
- Per-step: Small rewards/penalties for correct/incorrect actions
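When reading training logs, it can help to bucket episodes by their terminal reward. A rough sketch, assuming the terminal reward can be separated from the per-step rewards; the thresholds come from the list above, while `classify_terminal_reward` and its labels are hypothetical names.

```python
def classify_terminal_reward(reward: float) -> str:
    """Map a terminal reward to an outcome label using the values listed above.

    The labels are hypothetical; the environment may expose the outcome directly.
    """
    if 10 <= reward <= 15:
        return "good_hire"  # +10 to +15: base reward plus CRM bonus
    if reward == -5:
        return "bad_hire"   # poor job match
    if reward == -4:
        return "ghosted"    # driver's trust ran out
    return "other"          # per-step rewards or anything unlisted
```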
What Makes This Hard
- Long horizon: 15-40+ tool calls per episode
- Information gathering: Must ask the right screening questions to match the driver to the right job
- Trust dynamics: Each message costs trust; ask too many questions and the driver ghosts
- Job matching: 6 jobs per episode (1-2 good, 1-2 traps with deal-breakers, 2-3 partial)
- Procedural correctness: Must follow stage order, read replies before messaging, get approval before offering
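The "read replies before messaging" rule can be enforced on the agent side with a small guard. A minimal sketch, assuming `pending_reply` (see Observation Fields) is truthy while a driver message is unread; `guard_action` is a hypothetical helper that works on plain dicts in the format shown under Action Format.

```python
def guard_action(proposed: dict, pending_reply: bool) -> dict:
    """Swap a send_message for a read_reply while a driver reply is unread.

    Assumption: sending a message with an unread reply is penalized,
    so reading first is never worse.
    """
    wants_to_send = (
        proposed.get("tool") == "messaging"
        and proposed.get("action") == "send_message"
    )
    if wants_to_send and pending_reply:
        return {"tool": "messaging", "action": "read_reply"}
    return proposed
```

All other actions pass through unchanged, so the guard can wrap whatever the policy proposes.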
Quick Start
from recruitopenenv import RecruitopenenvEnv, RecruitopenenvAction
env = RecruitopenenvEnv(base_url="YOUR_SPACE_URL")
result = env.reset(seed=42)
obs = result.observation
print(f"Driver: {obs.driver_name}, Stage: {obs.stage}")
# Read CRM
result = env.step(RecruitopenenvAction(tool="crm", action="read_candidate"))
print(result.observation.jobs_summary)
# Greet driver
result = env.step(RecruitopenenvAction(tool="messaging", action="send_message", topic="greeting"))
print(f"Reward: {result.reward}")
# Read reply
result = env.step(RecruitopenenvAction(tool="messaging", action="read_reply"))
print(result.observation.discovered_info)
env.close()
Training
We train using GRPO/REINFORCE with the model choosing screening topics. See train_grpo.py for the full training script.
python train_grpo.py --model Qwen/Qwen2.5-3B-Instruct
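As a reminder of what the GRPO part computes, the sketch below normalizes episode returns within a group of rollouts sampled from the same starting state (for example, the same seed). This is the generic group-relative advantage, not code taken from train_grpo.py; the function name and the epsilon value are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(returns: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each return by the mean and std of its own group."""
    mu = mean(returns)
    sigma = pstdev(returns)  # population standard deviation over the group
    # eps keeps the division finite when every rollout got the same return
    return [(r - mu) / (sigma + eps) for r in returns]
```

Rollouts that beat their group's average get a positive advantage and the rest a negative one, so no separate value network is needed.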
Deploying
# From the recruitopenenv/ directory
openenv push
Action Format
{"tool": "crm", "action": "read_candidate"}
{"tool": "messaging", "action": "send_message", "topic": "experience"}
{"tool": "messaging", "action": "read_reply"}
{"tool": "crm", "action": "update_field", "field": "cdl_class", "value": "A"}
{"tool": "crm", "action": "update_stage", "stage": "contacted"}
{"tool": "approval", "action": "request_approval", "job_id": 2}
{"tool": "workflow", "action": "wait"}
{"tool": "approval", "action": "check_approval"}
{"tool": "messaging", "action": "send_message", "topic": "offer", "job_id": 2}
{"tool": "crm", "action": "update_stage", "stage": "hired"}
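Because the policy emits these actions as text, it is worth validating each line before sending it to the environment. A minimal sketch: the tool/action pairs and topic names are copied from the Tools table and the Screening Topics list in this README, while `parse_action` and its error messages are hypothetical and check only the fields visible in the examples above.

```python
import json

# Tool -> allowed actions, copied from the Tools table.
TOOL_ACTIONS = {
    "crm": {"read_candidate", "update_stage", "update_field", "add_note"},
    "messaging": {"send_message", "read_reply"},
    "approval": {"request_approval", "check_approval"},
    "workflow": {"wait"},
}

# Copied from the Screening Topics list.
TOPICS = {
    "greeting", "call", "experience", "home_time", "pay", "equipment",
    "route", "deal_breakers", "availability", "violations", "medical_card",
    "references", "pitch", "offer", "negotiate_pay", "negotiate_home_time",
    "signing_bonus", "address_concern",
}


def parse_action(line: str) -> dict:
    """Parse one JSON action and raise ValueError if it is malformed."""
    try:
        action = json.loads(line)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    tool, name = action.get("tool"), action.get("action")
    if not isinstance(tool, str) or not isinstance(name, str):
        raise ValueError("'tool' and 'action' must be strings")
    if name not in TOOL_ACTIONS.get(tool, set()):
        raise ValueError(f"unknown tool/action pair: {tool}/{name}")
    if name == "send_message":
        topic = action.get("topic")
        if not isinstance(topic, str) or topic not in TOPICS:
            raise ValueError(f"unknown topic: {topic!r}")
    return action
```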
Observation Fields
| Field | Description |
|---|---|
| driver_name | Driver's name |
| crm_summary | Full CRM record (empty until read_candidate) |
| jobs_summary | 6 available job listings |
| discovered_info | Info from screening conversations |
| stage | Current pipeline stage |
| feedback | API response from last action |
| pending_reply | Whether driver has unread message |
Screening Topics
greeting, call, experience, home_time, pay, equipment, route, deal_breakers, availability, violations, medical_card, references, pitch, offer, negotiate_pay, negotiate_home_time, signing_bonus, address_concern