---
title: ProcureRL Environment
emoji: 🀝
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
  - openenv
  - negotiation
  - procurement
  - rl
  - real-world
---

# ProcureRL: Procurement Negotiation RL Environment

An OpenEnv-compliant RL environment where an LLM agent learns to negotiate procurement deals against scripted supplier opponents with language-sensitive behavior.

## The Key Innovation: Language-Sensitive Opponent

The opponent's concession rate is directly affected by the **quality of the agent's natural language**:

- **Collaborative language** ("let's work together", "mutual benefit") → increases rapport → opponent concedes more
- **Neutral language** → opponent concedes at baseline rate
- **Aggressive language** ("final offer", "take it or leave it") → rapport drops → opponent hardens

This makes an LLM genuinely necessary: the quality of its messages directly affects negotiation outcomes.

## Quick Start

```python
from server.Procure_RL_environment import ProcureRLEnvironment
from models import NegotiationAction

env = ProcureRLEnvironment()
obs = env.reset(task_id="single_issue", seed=42)

print(f"Supplier: {obs.supplier_message}")
print(f"Offer: {obs.current_offer}")
print(f"Your target: {obs.buyer_constraints}")

action = NegotiationAction(
    move_type="make_offer",
    terms={"price": 42000},
    message="Let's find a mutually beneficial solution."
)
obs = env.step(action)
print(f"Response: {obs.supplier_message}")
print(f"New offer: {obs.current_offer}")
```

## Web Interface Example

The web interface at `/web` provides a visual playground. Here's how to use it:

### Step 1: Reset the Environment

Click **Reset** to start a new negotiation episode. You can customize the reset by passing JSON:

```json
{"task_id": "single_issue", "seed": 42}
```

**Available tasks:**
- `single_issue`: price-only negotiation (6 rounds max)
- `multi_issue`: price + payment terms (8 rounds max)
- `adversarial`: price + payment + support hours (10 rounds max)
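If you drive resets from a script instead of the form, it helps to reject typos in `task_id` before they reach the server. A minimal sketch, assuming `/reset` accepts the same JSON object shown above (`task_id` plus an optional `seed`); `TASK_MAX_ROUNDS` and `build_reset_payload` are hypothetical helper names, not part of the package:

```python
import json
from typing import Any, Dict, Optional

# Round limits per task, as listed above.
TASK_MAX_ROUNDS = {"single_issue": 6, "multi_issue": 8, "adversarial": 10}


def build_reset_payload(task_id: str, seed: Optional[int] = None) -> str:
    """Return a JSON reset body, rejecting unknown task ids early (hypothetical helper)."""
    if task_id not in TASK_MAX_ROUNDS:
        raise ValueError(f"unknown task_id {task_id!r}; expected one of {sorted(TASK_MAX_ROUNDS)}")
    body: Dict[str, Any] = {"task_id": task_id}
    if seed is not None:
        body["seed"] = seed  # assumption: omitting seed lets the server choose one
    return json.dumps(body)


print(build_reset_payload("single_issue", seed=42))
# {"task_id": "single_issue", "seed": 42}
```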

### Step 2: Make an Offer

Fill in the form fields:

| Field | Example Value | Notes |
|-------|--------------|-------|
| `move_type` | `make_offer` | Options: make_offer, accept, reject, bundle |
| `terms` | `{"price": 42000}` | JSON object with negotiation terms |
| `message` | `I value our partnership and believe we can find a fair solution.` | Your natural language message (affects opponent rapport!) |

**Example: Making a collaborative offer**
```
move_type: make_offer
terms: {"price": 45000}
message: We appreciate your flexibility and would like to work together to find a solution that benefits both parties.
```

### Step 3: Read the Response

After clicking **Step**, you'll see:
- `supplier_message`: the opponent's natural language response
- `current_offer`: updated terms on the table
- `rapport_hint`: "positive", "neutral", or "negative" based on your language
- `round_number`: current round (0-indexed)

### Step 4: Continue or Accept

- **Make another offer** to continue negotiating
- **Use `accept`** when you're satisfied with the current terms
- **Use `reject`** only if you want to walk away (no reward)

**Example: Accepting current terms**
```
move_type: accept
terms: {}
message: 
```
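The three choices in Step 4 can be encoded as a simple scripted rule. The sketch below is a hypothetical policy, not the bundled baseline: it accepts once the supplier's price is at or below a threshold you pick, takes any in-budget deal in the final round, walks away otherwise in that round, and counters in all earlier rounds. It assumes the observation is available as a plain dict with the `current_offer`, `round_number`, and `max_rounds` fields described in this README, with `round_number` 0-indexed.

```python
from typing import Any, Dict


def choose_move(obs: Dict[str, Any], accept_at: float, budget: float,
                counter_gap: float = 1000.0) -> Dict[str, Any]:
    """Hypothetical Step 4 policy: accept, counter, or walk away."""
    price = obs.get("current_offer", {}).get("price")
    # round_number is 0-indexed, so the last playable round is max_rounds - 1.
    last_round = obs["round_number"] >= obs["max_rounds"] - 1
    accept = {"move_type": "accept", "terms": {}, "message": "Thank you, we accept these terms."}
    if price is not None and price <= accept_at:
        return accept
    if last_round:
        # Final round: take any deal within budget, otherwise walk away (no reward).
        if price is not None and price <= budget:
            return accept
        return {"move_type": "reject", "terms": {}, "message": "Unfortunately we can't make this work."}
    # Earlier rounds: counter below our threshold, keeping a collaborative tone.
    return {"move_type": "make_offer",
            "terms": {"price": accept_at - counter_gap},
            "message": "We value this partnership and hope we can meet at this price."}
```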

### Multi-Issue Negotiation (Tasks 2 and 3)

For `multi_issue` and `adversarial`, include multiple terms:

```json
{
  "move_type": "make_offer",
  "terms": {
    "price": 44000,
    "payment_days": 30
  },
  "message": "We can offer faster payment terms if that helps your cash flow."
}
```

**Key insight:** In `multi_issue`, the opponent cares more about payment timing than price. Offering Net-30 payment can get you a better price!

### Example Full Episode

**Round 0 (Reset):**
- Task: `single_issue`
- Supplier opens: ~$52,000
- Your target: $36,000

**Round 1:**
- `move_type`: `make_offer`
- `terms`: `{"price": 48000}`
- `message`: `We value your partnership and want to find a fair price for both parties.`

**Round 2:**
- Supplier counter-offers at ~$46,000 (rapport is positive!)
- `move_type`: `make_offer`
- `terms`: `{"price": 45000}`
- `message`: `I appreciate your movement. Let's see if we can get to $45,000.`

**Round 3:**
- Supplier accepts or counter-offers near your target
- `move_type`: `accept`
- `terms`: `{}`
- Final score: based on how close the deal is to your target and how few rounds it took

## The Three Tasks

### 1. `single_issue` (Easy)
Renew a software license. Price only.

- Buyer target: $36,000, Budget: $53,000
- Seller opens: ~$52,000 (varies by seed)
- Opponent persona: Cooperative
- Max rounds: 6

**Scoring:** Deal quality (how close to target) × Efficiency (how few rounds)
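To make the product structure concrete, here is a minimal sketch. Only "deal quality × efficiency" comes from this README; normalizing quality between the budget ($53,000 → 0.0) and the target ($36,000 → 1.0), and letting efficiency fall linearly to 0.5 at the round limit, are assumptions for illustration, not the formula in `graders.py`.

```python
def sketch_single_issue_score(price: float, rounds_used: int,
                              target: float = 36000, budget: float = 53000,
                              max_rounds: int = 6) -> float:
    # Deal quality (assumed normalization): 1.0 at the target price, 0.0 at the budget.
    quality = max(0.0, min(1.0, (budget - price) / (budget - target)))
    # Efficiency (assumed shape): 1.0 with no rounds used, 0.5 at the round limit.
    efficiency = 1.0 - 0.5 * (rounds_used / max_rounds)
    return quality * efficiency


print(round(sketch_single_issue_score(45000, rounds_used=3), 3))  # 0.353
```

Under these assumptions, closing at $45,000 in 3 of 6 rounds gives a quality of about 0.47 and an efficiency of 0.75.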

### 2. `multi_issue` (Medium)
Enterprise software deal. Price + payment terms.

- Buyer weights: price 70%, payment 30%
- Seller persona: Cash Flow Stressed (cares more about payment timing)
- **Trade opportunity**: offer Net-30 payment to get lower price
- Max rounds: 8

**Scoring:** Weighted combination of price improvement + payment terms
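A sketch of a 70/30 weighted score shows why the Net-30 trade can pay off. The weights come from this section; everything else is an assumption: each issue is normalized to [0, 1], the payment range is a hypothetical Net-30 to Net-90 (longer terms favour the buyer), and the example borrows the `single_issue` target and budget because this README does not list them for `multi_issue`.

```python
def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def sketch_multi_issue_score(price: float, payment_days: float,
                             target: float, budget: float,
                             min_days: float = 30, max_days: float = 90,
                             w_price: float = 0.7, w_payment: float = 0.3) -> float:
    # Price: 1.0 at the buyer's target, 0.0 at the budget ceiling (assumed normalization).
    price_score = _clamp01((budget - price) / (budget - target))
    # Payment (hypothetical range): Net-90 scores 1.0 for the buyer, Net-30 scores 0.0.
    payment_score = _clamp01((payment_days - min_days) / (max_days - min_days))
    return w_price * price_score + w_payment * payment_score


# Giving up payment days for a lower price can come out ahead because price carries 70%.
fast_cheap = sketch_multi_issue_score(40000, 30, target=36000, budget=53000)
slow_pricey = sketch_multi_issue_score(44000, 60, target=36000, budget=53000)
print(fast_cheap > slow_pricey)  # True
```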

### 3. `adversarial` (Hard)
Large contract negotiation. Price + payment + support hours.

- Opponent persona: Aggressive Anchor
  - Opens at ceiling on all issues
  - Hardens position if you make 2+ consecutive concessions
  - Requires consistent collaborative framing
- Survival floor: any deal scores at least 0.15
- Max rounds: 10

**Scoring:** Multi-dimensional value minus pattern penalty for consecutive concessions
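The pattern penalty and survival floor can be sketched as follows. Only "value minus a penalty for consecutive concessions" and "any deal scores at least 0.15" come from this README. The rest is assumed: a concession means the buyer raised its price offer over its previous offer, and each concession beyond the first in the longest unbroken run costs a hypothetical 0.05.

```python
from typing import List


def longest_concession_run(buyer_prices: List[float]) -> int:
    """Length of the longest run of consecutive price increases by the buyer."""
    longest = current = 0
    for prev, nxt in zip(buyer_prices, buyer_prices[1:]):
        current = current + 1 if nxt > prev else 0
        longest = max(longest, current)
    return longest


def sketch_adversarial_score(value: float, buyer_prices: List[float], deal_made: bool,
                             penalty_per_extra: float = 0.05, floor: float = 0.15) -> float:
    if not deal_made:
        return 0.0  # walking away earns nothing
    run = longest_concession_run(buyer_prices)
    penalty = penalty_per_extra * max(0, run - 1)  # the first concession in a run is free
    return max(floor, value - penalty)  # survival floor for any closed deal
```

Pairing each concession with a hold or a trade on another issue keeps the run short, which matches the advice to avoid 2+ consecutive concessions against the Aggressive Anchor.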

## Action Space

```python
NegotiationAction(
    move_type="make_offer",  # make_offer | accept | reject | bundle
    terms={"price": 44000, "payment_days": 45, "support_hours": 120},
    message="We appreciate your flexibility on this."
)
```

| move_type | Description |
|-----------|-------------|
| `make_offer` | Propose terms (price required, others optional) |
| `accept` | Accept current offer on table |
| `reject` | Walk away (use only in the final round) |
| `bundle` | Alias for make_offer with multi-issue terms |
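Catching malformed actions client-side avoids wasting a round. A minimal validator sketch based on the table above; it assumes actions are plain dicts with the `move_type`, `terms`, and `message` fields, and that `bundle`, as an alias for `make_offer`, also requires `price`. `validate_action` is a hypothetical helper, not part of `models.py`.

```python
from typing import Any, Dict, List

VALID_MOVES = {"make_offer", "accept", "reject", "bundle"}


def validate_action(action: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the action looks well-formed."""
    errors: List[str] = []
    move = action.get("move_type")
    if move not in VALID_MOVES:
        errors.append(f"move_type must be one of {sorted(VALID_MOVES)}, got {move!r}")
    terms = action.get("terms", {})
    if not isinstance(terms, dict):
        errors.append("terms must be a JSON object")
        terms = {}
    # Offers need a price; other terms (payment_days, support_hours) are optional.
    if move in ("make_offer", "bundle") and "price" not in terms:
        errors.append("make_offer and bundle require a price")
    if not isinstance(action.get("message", ""), str):
        errors.append("message must be a string")
    return errors
```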

## Observation Space

```python
NegotiationObservation(
    task_id="single_issue",
    round_number=2,
    max_rounds=6,
    supplier_message="I appreciate your offer. Based on our costs...",
    current_offer={"price": 46000},
    last_4_exchanges=[...],
    buyer_constraints={"price": {"target": 36000, "worst": 55000, "budget": 53000}},
    rapport_hint="positive",  # positive | neutral | negative
    done=False
)
```

## Running the Server

```bash
# Build Docker image
docker build -t procure-rl -f server/Dockerfile .

# Run container (port 7860 - required for HF Spaces)
docker run -p 7860:7860 procure-rl

# Access web interface at http://localhost:7860/web
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/metadata` | GET | Environment metadata |
| `/reset` | POST | Reset environment |
| `/step` | POST | Execute action |
| `/state` | GET | Get current state |
| `/ws` | WS | WebSocket for persistent sessions |
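For scripted access to the endpoints in the table above, a standard-library sketch that builds the requests without sending them. The `/reset` body reuses the reset JSON shown earlier; nesting the action under an `"action"` key for `/step` is an assumption about the OpenEnv HTTP server, so check `server/app.py` before relying on it.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:7860"  # the local Docker container from "Running the Server"


def build_request(endpoint: str, payload: Optional[Dict[str, Any]] = None,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET, or a JSON POST when a payload is given."""
    url = f"{base_url}{endpoint}"
    if payload is None:
        return urllib.request.Request(url, method="GET")
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST",
                                  headers={"Content-Type": "application/json"})


def send(req: urllib.request.Request) -> Dict[str, Any]:
    """Send a request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


reset_req = build_request("/reset", {"task_id": "single_issue", "seed": 42})
# Assumption: /step expects the action nested under an "action" key.
step_req = build_request("/step", {"action": {"move_type": "make_offer",
                                               "terms": {"price": 42000},
                                               "message": "Let's find a mutually beneficial solution."}})
# With the server running: print(send(build_request("/health")))
```

Since the table lists `/ws` for persistent sessions, multi-step episodes may be better served by the WebSocket endpoint or the `client.py` wrapper than by independent HTTP calls.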

## Baseline Inference

Run inference against all three tasks:

```bash
cp .env.example .env
# Edit .env and add your HF_TOKEN
HF_TOKEN=your_token python inference.py
```

Output format (exact):

```
[START] task=single_issue env=procure-rl model=Qwen/Qwen2.5-72B-Instruct
[STEP] step=1 action=make_offer({"price": 42000}) reward=0.00 done=false error=null
[STEP] step=2 action=make_offer({"price": 41000}) reward=0.52 done=true error=null
[END] success=true steps=2 score=0.52 rewards=0.00,0.52
```
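The format is easy to get subtly wrong (two-decimal rewards, lowercase booleans, `null` for no error). A minimal formatter that reproduces the sample above; the function names are hypothetical and may not match what `inference.py` uses internally, and `steps` is assumed to equal the number of rewards.

```python
import json
from typing import Any, Dict, List, Optional


def format_start(task: str, env: str, model: str) -> str:
    return f"[START] task={task} env={env} model={model}"


def format_step(step: int, move_type: str, terms: Dict[str, Any], reward: float,
                done: bool, error: Optional[str] = None) -> str:
    action = f"{move_type}({json.dumps(terms)})"  # json.dumps renders {"price": 42000}
    err = "null" if error is None else error
    return (f"[STEP] step={step} action={action} reward={reward:.2f} "
            f"done={str(done).lower()} error={err}")


def format_end(success: bool, rewards: List[float], score: float) -> str:
    joined = ",".join(f"{r:.2f}" for r in rewards)
    return (f"[END] success={str(success).lower()} steps={len(rewards)} "
            f"score={score:.2f} rewards={joined}")


print(format_start("single_issue", "procure-rl", "Qwen/Qwen2.5-72B-Instruct"))
print(format_step(1, "make_offer", {"price": 42000}, 0.0, False))
print(format_step(2, "make_offer", {"price": 41000}, 0.52, True))
print(format_end(True, [0.0, 0.52], score=0.52))
```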

## Environment Design

### Rapport System

The opponent maintains a rapport score (0.0 to 1.0) that is updated every round:

```python
COLLABORATIVE_SIGNALS = ["understand", "partnership", "mutual", "together", ...]
AGGRESSIVE_SIGNALS = ["demand", "require", "final offer", "unacceptable", ...]

# +0.08 per collaborative signal detected, -0.08 per aggressive signal detected
delta = 0.08 * n_collaborative - 0.08 * n_aggressive
delta = max(-0.20, min(0.20, delta))  # cap per round
```
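A runnable version of this update, as a sketch. It uses only the signals listed above (the real lists are longer), counts each signal at most once per message via case-insensitive substring matching, and assumes a starting rapport of 0.5; the detection method in `opponent.py` may differ. Note that naive substring matching also fires on words like "requirements".

```python
from typing import Iterable

# Only the signals shown above; the full lists live in opponent.py.
COLLABORATIVE_SIGNALS = ["understand", "partnership", "mutual", "together"]
AGGRESSIVE_SIGNALS = ["demand", "require", "final offer", "unacceptable"]


def count_signals(text: str, signals: Iterable[str]) -> int:
    """Number of distinct signals that appear in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return sum(1 for s in signals if s in lowered)


def update_rapport(rapport: float, message: str, step: float = 0.08, cap: float = 0.20) -> float:
    delta = step * count_signals(message, COLLABORATIVE_SIGNALS)
    delta -= step * count_signals(message, AGGRESSIVE_SIGNALS)
    delta = max(-cap, min(cap, delta))  # cap per round
    return max(0.0, min(1.0, rapport + delta))  # rapport stays within [0.0, 1.0]


r = update_rapport(0.5, "Let's work together for mutual benefit.")  # two signals: +0.16
r = update_rapport(r, "This is our final offer. We demand better terms.")  # two signals: -0.16
```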

### Opponent Personas

| Persona | Base Concession | Rapport Modifier | Special Behavior |
|---------|----------------|-------------------|-------------------|
| `cooperative` | 5% | ±50% | Responsive to language |
| `cash_flow_stressed` | 7% | ±50% | Accepts Net-45+, comments on payment |
| `aggressive_anchor` | 4% | ±50% | Hardens after 2+ consecutive concessions |
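The table does not say exactly how the rapport modifier enters the concession. One plausible reading, sketched here as an assumption: rapport 0.5 leaves the base rate unchanged, while rapport 1.0 or 0.0 scales it by +50% or -50%. The sketch also assumes the concession is a fraction of the supplier's current ask, that the ask never drops below the buyer's latest offer, and that `aggressive_anchor` hardening halves the rate. None of these choices are confirmed by `opponent.py`.

```python
BASE_CONCESSION = {"cooperative": 0.05, "cash_flow_stressed": 0.07, "aggressive_anchor": 0.04}


def effective_rate(persona: str, rapport: float, modifier: float = 0.5) -> float:
    # Assumed linear mapping: rapport 0.0 -> base * 0.5, 0.5 -> base, 1.0 -> base * 1.5.
    return BASE_CONCESSION[persona] * (1 + modifier * (2 * rapport - 1))


def next_ask(persona: str, current_ask: float, buyer_offer: float, rapport: float,
             consecutive_buyer_concessions: int = 0) -> float:
    rate = effective_rate(persona, rapport)
    if persona == "aggressive_anchor" and consecutive_buyer_concessions >= 2:
        rate *= 0.5  # hypothetical hardening factor
    # Never ask for less than the buyer has already offered.
    return max(buyer_offer, current_ask * (1 - rate))
```

With these assumptions a cooperative supplier asking $52,000 moves to $49,400 at neutral rapport but to $48,100 at full rapport.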

### Grading

Graders are pure Python and make zero LLM calls. They combine:
- **Value**: how close to buyer's target
- **Efficiency**: penalty for taking too many rounds
- **Pattern penalty** (adversarial only): for consecutive concession behavior

Graders never crash on malformed input β€” they fall back to worst-case values.
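The worst-case fallback can be sketched as a defensive reader for one numeric term. This is an illustration, not the code in `graders.py`: `safe_term` is a hypothetical helper, and the worst-case value would come from `buyer_constraints` (for example `"worst": 55000` for price).

```python
import math
from typing import Any, Mapping


def safe_term(terms: Any, key: str, worst: float) -> float:
    """Read a numeric term, falling back to the worst-case value on malformed input."""
    if not isinstance(terms, Mapping):
        return worst
    value = terms.get(key)
    if isinstance(value, bool):  # float(True) would silently become 1.0
        return worst
    try:
        number = float(value)
    except (TypeError, ValueError):  # None, "cheap", lists, ...
        return worst
    return number if math.isfinite(number) else worst  # reject NaN and infinity
```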

## Project Structure

```
Procure_RL/
├── __init__.py                    # Package exports
├── client.py                      # EnvClient wrapper
├── models.py                      # NegotiationAction, NegotiationObservation, NegotiationState
├── opponent.py                    # ScriptedPersonaOpponent with 3 personas + rapport
├── graders.py                     # grade_single_issue, grade_multi_issue, grade_adversarial
├── inference.py                   # Baseline agent with [START][STEP][END] output
├── server/
│   ├── __init__.py
│   ├── app.py                    # FastAPI app
│   ├── Procure_RL_environment.py # ProcureRLEnvironment
│   ├── requirements.txt
│   └── Dockerfile
├── openenv.yaml                  # OpenEnv manifest
├── pyproject.toml
├── plan.md                       # Full design specification
└── README.md                     # This file
```

## Why This Environment?

**Market validation**: Walmart has deployed Pactum for AI negotiation, and 90% of CPOs are adopting AI negotiation in 2025.

**Research gap**: The OpenEnv hub has no existing negotiation environments.

**LLM advantage**: Language quality directly affects opponent rapport; the language IS the policy.

**Reproducibility**: Deterministic scripted opponent, pure Python graders, no LLM in environment loop.

## Calibration

If the base LLM scores above 0.55 on `single_issue`, the opponent is too easy: reduce the cooperative concession rate.

If the base LLM scores below 0.15 on `single_issue`, the opponent is too hard: increase the cooperative concession rate.
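The rule above is easy to apply automatically after a baseline run. A minimal sketch: the 0.55 and 0.15 thresholds come from this section, while averaging scores over several seeds before comparing is an assumption, and `calibration_advice` is a hypothetical helper name.

```python
from statistics import mean
from typing import List


def calibration_advice(single_issue_scores: List[float],
                       too_easy: float = 0.55, too_hard: float = 0.15) -> str:
    """Map base-LLM single_issue scores (e.g. one per seed) to the tuning rule above."""
    avg = mean(single_issue_scores)
    if avg > too_easy:
        return "opponent too easy: reduce cooperative concession rate"
    if avg < too_hard:
        return "opponent too hard: increase cooperative concession rate"
    return "within band: no change"


print(calibration_advice([0.52, 0.41, 0.47]))  # within band: no change
```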