---
title: Driver Recruit Environment
emoji: 🚛
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
  - reinforcement-learning
  - recruiting
  - multi-turn
---

# 🚛 Driver Recruit Environment

A **multi-turn, tool-based RL environment** for training LLMs to recruit truck drivers through a CRM system. Built on [OpenEnv 0.2.1](https://github.com/meta-pytorch/OpenEnv).

The agent must discover driver qualifications through conversation, record info in the CRM, get management approval, and hire the driver, all through structured tool calls across episodes of 15-40+ steps.

## Pipeline

```
lead → contacted → interested → approval_pending → offer_sent → hired
```
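
The stage order above can be checked client-side before calling `update_stage`. The following is a minimal sketch, assuming a stage may only advance to the one immediately after it; the README states only that stage order must be followed, so the environment's actual rule may differ. `STAGES` and `is_valid_transition` are hypothetical names, not part of the package.

```python
# Pipeline stages in the order shown above.
STAGES = ["lead", "contacted", "interested", "approval_pending", "offer_sent", "hired"]


def is_valid_transition(current: str, target: str) -> bool:
    """Return True if `target` is the stage immediately after `current`.

    Assumption: only single-step forward moves are allowed.
    """
    if current not in STAGES or target not in STAGES:
        return False
    return STAGES.index(target) == STAGES.index(current) + 1


print(is_valid_transition("lead", "contacted"))  # True
print(is_valid_transition("lead", "hired"))      # False
```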

## Tools

| Tool | Actions | Purpose |
|------|---------|---------|
| **crm** | `read_candidate`, `update_stage`, `update_field`, `add_note` | Manage pipeline & record info |
| **messaging** | `send_message`, `read_reply` | Screen driver (18 topics) |
| **approval** | `request_approval`, `check_approval` | Get management sign-off |
| **workflow** | `wait` | Advance time for approval processing |
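
For filtering malformed model output before it reaches the environment, the table can be mirrored as a lookup. This is only a sketch: `TOOL_ACTIONS` and `is_known_action` are hypothetical helper names, and the environment itself may validate actions differently.

```python
# Tool -> allowed actions, copied from the table above.
TOOL_ACTIONS = {
    "crm": {"read_candidate", "update_stage", "update_field", "add_note"},
    "messaging": {"send_message", "read_reply"},
    "approval": {"request_approval", "check_approval"},
    "workflow": {"wait"},
}


def is_known_action(tool: str, action: str) -> bool:
    """Return True if `action` is listed for `tool` in the table."""
    return action in TOOL_ACTIONS.get(tool, set())


print(is_known_action("crm", "read_candidate"))     # True
print(is_known_action("workflow", "send_message"))  # False
```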

## Reward Signal

- **Successful hire** (good job fit): **+10** to **+15** (base + CRM bonus)
- **Bad hire** (poor match): **-5**
- **Ghosted** (trust runs out): **-4**
- **Per-step**: Small rewards/penalties for correct/incorrect actions

## What Makes This Hard

- **Long horizon**: 15-40+ tool calls per episode
- **Information gathering**: Must ask the right screening questions to match driver to the right job
- **Trust dynamics**: Each message costs trust; ask too many questions and the driver ghosts
- **Job matching**: 6 jobs per episode (1-2 good, 1-2 traps with deal-breakers, 2-3 partial)
- **Procedural correctness**: Must follow stage order, read replies before messaging, get approval before offering
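
The "read replies before messaging" rule can be respected with a small guard in a scripted or baseline policy. The sketch below assumes `pending_reply` is truthy whenever an unread reply is waiting and that the caller decides the topic order; `next_messaging_action` is a hypothetical helper, and the returned dicts follow the action format used in this README.

```python
from typing import Optional


def next_messaging_action(pending_reply: bool, remaining_topics: list) -> Optional[dict]:
    """Pick the next messaging action without skipping an unread reply.

    Returns None when there is nothing left to read or ask.
    """
    if pending_reply:
        # Always read the waiting reply before sending anything new.
        return {"tool": "messaging", "action": "read_reply"}
    if remaining_topics:
        # Consume the next topic from the caller's queue.
        topic = remaining_topics.pop(0)
        return {"tool": "messaging", "action": "send_message", "topic": topic}
    return None


queue = ["greeting", "experience"]
print(next_messaging_action(False, queue))  # sends "greeting"
print(next_messaging_action(True, queue))   # reads the reply first
```

Keeping the topic queue short is one way to account for the trust cost of each message.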

## Quick Start

```python
from recruitopenenv import RecruitopenenvEnv, RecruitopenenvAction

env = RecruitopenenvEnv(base_url="YOUR_SPACE_URL")

result = env.reset(seed=42)
obs = result.observation
print(f"Driver: {obs.driver_name}, Stage: {obs.stage}")

# Read CRM
result = env.step(RecruitopenenvAction(tool="crm", action="read_candidate"))
print(result.observation.jobs_summary)

# Greet driver
result = env.step(RecruitopenenvAction(tool="messaging", action="send_message", topic="greeting"))
print(f"Reward: {result.reward}")

# Read reply
result = env.step(RecruitopenenvAction(tool="messaging", action="read_reply"))
print(result.observation.discovered_info)

env.close()
```

## Training

We train using GRPO/REINFORCE with the model choosing screening topics. See `train_grpo.py` for the full training script.

```bash
python train_grpo.py --model Qwen/Qwen2.5-3B-Instruct
```
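
For readers unfamiliar with GRPO, its core step is to normalize each episode's return against the other episodes sampled for the same prompt. The sketch below illustrates that idea only; it is not taken from `train_grpo.py`, and the function name, the use of the population standard deviation, and the example returns are all assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize episode returns within one sampled group.

    Each advantage is (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up returns for four rollouts of the same starting state.
advantages = group_relative_advantages([12.0, -5.0, -4.0, 10.0])
print(advantages)
```

Rollouts that end in a good hire receive positive advantages, and those that end in a bad hire or a ghosted driver receive negative ones.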

## Deploying

```bash
# From the recruitopenenv/ directory
openenv push
```

## Action Format

```json
{"tool": "crm", "action": "read_candidate"}
{"tool": "messaging", "action": "send_message", "topic": "experience"}
{"tool": "messaging", "action": "read_reply"}
{"tool": "crm", "action": "update_field", "field": "cdl_class", "value": "A"}
{"tool": "crm", "action": "update_stage", "stage": "contacted"}
{"tool": "approval", "action": "request_approval", "job_id": 2}
{"tool": "workflow", "action": "wait"}
{"tool": "approval", "action": "check_approval"}
{"tool": "messaging", "action": "send_message", "topic": "offer", "job_id": 2}
{"tool": "crm", "action": "update_stage", "stage": "hired"}
```
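
When actions come from model-generated text, they can be parsed and checked before being sent. The sketch below assumes one JSON object per action; the required extra keys are inferred only from the examples above, so pairs not shown there (such as `add_note`) are left unchecked. `REQUIRED_KEYS` and `parse_action` are hypothetical names.

```python
import json

# Extra keys carried by each (tool, action) pair in the examples above.
# Inferred from those examples; the environment may require more or fewer.
REQUIRED_KEYS = {
    ("messaging", "send_message"): {"topic"},
    ("crm", "update_field"): {"field", "value"},
    ("crm", "update_stage"): {"stage"},
    ("approval", "request_approval"): {"job_id"},
}


def parse_action(text: str) -> dict:
    """Parse one JSON action and check the keys seen in the examples."""
    action = json.loads(text)
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    for key in ("tool", "action"):
        if key not in action:
            raise ValueError(f"missing required key: {key}")
    needed = REQUIRED_KEYS.get((action["tool"], action["action"]), set())
    missing = needed - set(action)
    if missing:
        raise ValueError(f"missing keys for {action['action']}: {sorted(missing)}")
    return action


print(parse_action('{"tool": "crm", "action": "update_stage", "stage": "contacted"}'))
```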

## Observation Fields

| Field | Description |
|-------|-------------|
| `driver_name` | Driver's name |
| `crm_summary` | Full CRM record (empty until `read_candidate`) |
| `jobs_summary` | 6 available job listings |
| `discovered_info` | Info from screening conversations |
| `stage` | Current pipeline stage |
| `feedback` | API response from last action |
| `pending_reply` | Whether the driver has sent a reply that has not been read yet |
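
To feed an observation to an LLM, these fields can be rendered as plain text. This is a rough sketch: `format_observation` is a hypothetical helper, the choice to skip empty fields is an assumption, and the stand-in object uses made-up values in place of a real observation from `env.reset()` or `env.step()`.

```python
from types import SimpleNamespace

# Field names taken from the table above.
OBSERVATION_FIELDS = (
    "driver_name", "crm_summary", "jobs_summary",
    "discovered_info", "stage", "feedback", "pending_reply",
)


def format_observation(obs) -> str:
    """Render the documented fields as 'name: value' lines, skipping empty ones."""
    lines = []
    for name in OBSERVATION_FIELDS:
        value = getattr(obs, name, None)
        if value is None or value == "":
            continue
        lines.append(f"{name}: {value}")
    return "\n".join(lines)


# Stand-in observation with made-up values.
example = SimpleNamespace(driver_name="Alex", stage="lead", crm_summary="", pending_reply=False)
print(format_observation(example))
```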

## Screening Topics

`greeting`, `call`, `experience`, `home_time`, `pay`, `equipment`, `route`, `deal_breakers`, `availability`, `violations`, `medical_card`, `references`, `pitch`, `offer`, `negotiate_pay`, `negotiate_home_time`, `signing_bonus`, `address_concern`
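
The same list can be kept as a constant for checking the `topic` value of a `send_message` action. The names `SCREENING_TOPICS` and `is_valid_topic` are hypothetical; the package may expose its own definition.

```python
# The 18 topics listed above, in the same order.
SCREENING_TOPICS = (
    "greeting", "call", "experience", "home_time", "pay", "equipment",
    "route", "deal_breakers", "availability", "violations", "medical_card",
    "references", "pitch", "offer", "negotiate_pay", "negotiate_home_time",
    "signing_bonus", "address_concern",
)


def is_valid_topic(topic: str) -> bool:
    """Return True if `topic` is one of the listed screening topics."""
    return topic in SCREENING_TOPICS


print(len(SCREENING_TOPICS))     # 18
print(is_valid_topic("salary"))  # False
```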