---
title: ProcureRL Environment
emoji: 🤝
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
- openenv
- negotiation
- procurement
- rl
- real-world
---
# ProcureRL: Procurement Negotiation RL Environment
An OpenEnv-compliant RL environment where an LLM agent learns to negotiate procurement deals against scripted supplier opponents with language-sensitive behavior.
## The Key Innovation: Language-Sensitive Opponent
The opponent's concession rate is directly affected by the **quality of the agent's natural language**:
- **Collaborative language** ("let's work together", "mutual benefit") → increases rapport → opponent concedes more
- **Neutral language** → opponent concedes at baseline rate
- **Aggressive language** ("final offer", "take it or leave it") → rapport drops → opponent hardens
This makes an LLM genuinely necessary: output quality directly affects negotiation outcomes.
## Quick Start
```python
from server.Procure_RL_environment import ProcureRLEnvironment
from models import NegotiationAction
env = ProcureRLEnvironment()
obs = env.reset(task_id="single_issue", seed=42)
print(f"Supplier: {obs.supplier_message}")
print(f"Offer: {obs.current_offer}")
print(f"Your target: {obs.buyer_constraints}")
action = NegotiationAction(
    move_type="make_offer",
    terms={"price": 42000},
    message="Let's find a mutually beneficial solution."
)
obs = env.step(action)
print(f"Response: {obs.supplier_message}")
print(f"New offer: {obs.current_offer}")
```
## Web Interface Example
The web interface at `/web` provides a visual playground. Here's how to use it:
### Step 1: Reset the Environment
Click **Reset** to start a new negotiation episode. You can customize the reset by passing JSON:
```json
{"task_id": "single_issue", "seed": 42}
```
**Available tasks:**
- `single_issue`: Price-only negotiation (6 rounds max)
- `multi_issue`: Price + payment terms (8 rounds max)
- `adversarial`: Price + payment + support hours (10 rounds max)
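The task list above can be captured as a small lookup when scripting resets. This is a minimal sketch: `TASK_MAX_ROUNDS` and `make_reset_payload` are hypothetical names, not part of the package. Only the task ids, round limits, and the two payload keys come from this README.

```python
import json

# Round limits as listed above; the dict name is illustrative only.
TASK_MAX_ROUNDS = {"single_issue": 6, "multi_issue": 8, "adversarial": 10}


def make_reset_payload(task_id: str, seed: int) -> str:
    """Return the JSON string to pass on reset, rejecting unknown task ids."""
    if task_id not in TASK_MAX_ROUNDS:
        raise ValueError(f"unknown task_id: {task_id!r}")
    return json.dumps({"task_id": task_id, "seed": seed})


print(make_reset_payload("single_issue", 42))
```

Rejecting unknown ids on the client side gives a clearer error than whatever the server would return for a typo.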
### Step 2: Make an Offer
Fill in the form fields:
| Field | Example Value | Notes |
|-------|--------------|-------|
| `move_type` | `make_offer` | Options: make_offer, accept, reject, bundle |
| `terms` | `{"price": 42000}` | JSON object with negotiation terms |
| `message` | `I value our partnership and believe we can find a fair solution.` | Your natural language message (affects opponent rapport!) |
**Example: Making a collaborative offer**
```
move_type: make_offer
terms: {"price": 45000}
message: We appreciate your flexibility and would like to work together to find a solution that benefits both parties.
```
### Step 3: Read the Response
After clicking **Step**, you'll see:
- `supplier_message`: The opponent's natural language response
- `current_offer`: Updated terms on the table
- `rapport_hint`: "positive", "neutral", or "negative" based on your language
- `round_number`: Current round (0-indexed)
### Step 4: Continue or Accept
- **Make another offer** to continue negotiating
- **Use `accept`** when you're satisfied with the current terms
- **Use `reject`** only if you want to walk away (no reward)
**Example: Accepting current terms**
```
move_type: accept
terms: {}
message:
```
### Multi-Issue Negotiation (Tasks 2 and 3)
For `multi_issue` and `adversarial`, include multiple terms:
```json
{
  "move_type": "make_offer",
  "terms": {
    "price": 44000,
    "payment_days": 30
  },
  "message": "We can offer faster payment terms if that helps your cash flow."
}
```
**Key insight:** In `multi_issue`, the opponent cares more about payment timing than price. Offering Net-30 payment can get you a better price!
### Example Full Episode
**Round 0 (Reset):**
- Task: `single_issue`
- Supplier opens: ~$52,000
- Your target: $36,000
**Round 1:**
- `move_type`: `make_offer`
- `terms`: `{"price": 48000}`
- `message`: `We value your partnership and want to find a fair price for both parties.`
**Round 2:**
- Supplier counter-offers at ~$46,000 (rapport is positive!)
- `move_type`: `make_offer`
- `terms`: `{"price": 45000}`
- `message`: `I appreciate your movement. Let's see if we can get to $45,000.`
**Round 3:**
- Supplier accepts or counter-offers near your target
- `move_type`: `accept`
- `terms`: `{}`
- Final score: based on how close the deal is to your target and how few rounds it took
## The Three Tasks
### 1. `single_issue` (Easy)
Renew a software license. Price only.
- Buyer target: $36,000, Budget: $53,000
- Seller opens: ~$52,000 (varies by seed)
- Opponent persona: Cooperative
- Max rounds: 6
**Scoring:** Deal quality (how close to target) × Efficiency (how few rounds)
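To make the quality × efficiency product concrete, here is a rough sketch. The function name, the linear shapes, and the `efficiency_penalty` value are all assumptions for illustration; the real formula lives in `graders.py` and may differ. The default `target` and `worst` values are taken from the observation example in this README.

```python
def single_issue_score(price: float, rounds_used: int, *, target: float = 36_000,
                       worst: float = 55_000, max_rounds: int = 6,
                       efficiency_penalty: float = 0.3) -> float:
    """Hypothetical score: deal quality multiplied by round efficiency."""
    # Deal quality: 1.0 at the target price, 0.0 at the worst acceptable price.
    quality = (worst - price) / (worst - target)
    quality = max(0.0, min(1.0, quality))
    # Efficiency (assumed shape): 1.0 for a first-round deal, falling
    # linearly to 1 - efficiency_penalty at the final round.
    used = max(1, min(max_rounds, rounds_used))
    efficiency = 1.0 - efficiency_penalty * (used - 1) / (max_rounds - 1)
    return quality * efficiency


print(single_issue_score(45_500, rounds_used=3))
```

Under these assumptions, a deal halfway between target and worst price can never score above 0.5, and it scores less the longer the negotiation runs.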
### 2. `multi_issue` (Medium)
Enterprise software deal. Price + payment terms.
- Buyer weights: price 70%, payment 30%
- Seller persona: Cash Flow Stressed (cares more about payment timing)
- **Trade opportunity**: offer Net-30 payment to get lower price
- Max rounds: 8
**Scoring:** Weighted combination of price improvement + payment terms
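The 70/30 weighting can be sketched as follows. Only the two weights come from the task description; the price and payment ranges are placeholders, and the assumption that the buyer prefers a longer payment window is ours. `normalized` and `multi_issue_score` are hypothetical names.

```python
def normalized(value: float, worst: float, best: float) -> float:
    """Map value to [0, 1], where worst -> 0.0 and best -> 1.0."""
    if best == worst:
        return 0.0
    return max(0.0, min(1.0, (value - worst) / (best - worst)))


def multi_issue_score(price: float, payment_days: float) -> float:
    # Ranges below are placeholders, not the environment's real constraints.
    price_part = normalized(price, worst=55_000, best=36_000)
    payment_part = normalized(payment_days, worst=30, best=90)
    # Buyer weights from the task description: price 70%, payment 30%.
    return 0.7 * price_part + 0.3 * payment_part


# Compare a cheaper deal with fast payment against a pricier one with slower payment.
print(multi_issue_score(42_000, 30), multi_issue_score(46_000, 60))
```

A comparison like this shows why giving up payment days is only worthwhile when the price drop it buys outweighs the 30% payment component.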
### 3. `adversarial` (Hard)
Large contract negotiation. Price + payment + support hours.
- Opponent persona: Aggressive Anchor
- Opens at ceiling on all issues
- Hardens position if you make 2+ consecutive concessions
- Requires consistent collaborative framing
- Survival floor: any deal scores at least 0.15
- Max rounds: 10
**Scoring:** Multi-dimensional value minus pattern penalty for consecutive concessions
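One way to picture the pattern penalty and the survival floor is the sketch below. It assumes a buyer concession means raising the offered price, and that each offer extending a run of two or more concessions costs a fixed amount. The penalty size and both function names are hypothetical; only the 0.15 floor is taken from the task description.

```python
def count_concession_streaks(buyer_prices: list[float]) -> int:
    """Count offers that extend a run of 2+ consecutive buyer concessions."""
    streak = 0
    flagged = 0
    for prev, curr in zip(buyer_prices, buyer_prices[1:]):
        if curr > prev:  # buyer raised its price: a concession
            streak += 1
            if streak >= 2:
                flagged += 1
        else:
            streak = 0
    return flagged


def adversarial_score(value: float, buyer_prices: list[float],
                      penalty_per_flag: float = 0.05) -> float:
    penalized = value - penalty_per_flag * count_concession_streaks(buyer_prices)
    # Survival floor from the task description: any deal scores at least 0.15.
    return max(0.15, penalized)


print(adversarial_score(0.6, [40_000, 42_000, 44_000, 46_000]))
```

Holding a price for one round resets the streak in this sketch, which matches the advice to avoid back-to-back concessions.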
## Action Space
```python
NegotiationAction(
    move_type="make_offer",  # make_offer | accept | reject | bundle
    terms={"price": 44000, "payment_days": 45, "support_hours": 120},
    message="We appreciate your flexibility on this."
)
```
| move_type | Description |
|-----------|-------------|
| `make_offer` | Propose terms (price required, others optional) |
| `accept` | Accept current offer on table |
| `reject` | Walk away (only use at final round) |
| `bundle` | Alias for make_offer with multi-issue terms |
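The rules in the table above can be checked before an action is sent. The sketch below is a standalone helper, not part of `models.py`; `VALID_MOVES` and `validate_action` are hypothetical names, and treating `bundle` like `make_offer` (price required) is an assumption based on the "alias" description.

```python
VALID_MOVES = {"make_offer", "accept", "reject", "bundle"}


def validate_action(move_type: str, terms: dict) -> list[str]:
    """Return a list of problems; an empty list means the action looks well-formed."""
    problems = []
    if move_type not in VALID_MOVES:
        problems.append(f"unknown move_type: {move_type!r}")
    # Per the table above, offers need a price; accept/reject carry no terms.
    if move_type in {"make_offer", "bundle"} and "price" not in terms:
        problems.append("make_offer/bundle requires a 'price' term")
    for name, value in terms.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"term {name!r} must be numeric")
    return problems


print(validate_action("make_offer", {"payment_days": 30}))
```

Returning a list of problems rather than raising lets an agent loop log every issue with a malformed LLM output in one pass.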
## Observation Space
```python
NegotiationObservation(
    task_id="single_issue",
    round_number=2,
    max_rounds=6,
    supplier_message="I appreciate your offer. Based on our costs...",
    current_offer={"price": 46000},
    last_4_exchanges=[...],
    buyer_constraints={"price": {"target": 36000, "worst": 55000, "budget": 53000}},
    rapport_hint="positive",  # positive | neutral | negative
    done=False
)
```
## Running the Server
```bash
# Build Docker image
docker build -t procure-rl -f server/Dockerfile .
# Run container (port 7860 - required for HF Spaces)
docker run -p 7860:7860 procure-rl
# Access web interface at http://localhost:7860/web
```
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/metadata` | GET | Environment metadata |
| `/reset` | POST | Reset environment |
| `/step` | POST | Execute action |
| `/state` | GET | Get current state |
| `/ws` | WS | WebSocket for persistent sessions |
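For scripted access to the POST endpoints, requests can be built with the standard library. This sketch only constructs the requests and does not send them. The request bodies simply mirror the JSON shown in the Web Interface section; the exact schema the server expects is not documented here and may use a different envelope, so treat the payload shapes as an assumption. `build_post` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # matches the docker run example


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the given endpoint."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


reset_req = build_post("/reset", {"task_id": "single_issue", "seed": 42})
step_req = build_post("/step", {
    "move_type": "make_offer",
    "terms": {"price": 42000},
    "message": "Let's find a mutually beneficial solution.",
})

# To send against a running server:
#     with urllib.request.urlopen(reset_req) as resp:
#         print(json.load(resp))
```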
## Baseline Inference
Run inference against all three tasks:
```bash
cp .env.example .env
# Edit .env and add your HF_TOKEN
HF_TOKEN=your_token python inference.py
```
Output format (exact):
```
[START] task=single_issue env=procure-rl model=Qwen/Qwen2.5-72B-Instruct
[STEP] step=1 action=make_offer({"price": 42000}) reward=0.00 done=false error=null
[STEP] step=2 action=make_offer({"price": 41000}) reward=0.52 done=true error=null
[END] success=true steps=2 score=0.52 rewards=0.00,0.52
```
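Because the output format is fixed, the log lines can be parsed mechanically. The sketch below is one possible parser, not something shipped with `inference.py`; `parse_log_line` is a hypothetical name. It keeps all values as strings and assumes no field value contains whitespace followed by a `key=` pattern.

```python
import re

LINE_RE = re.compile(r"^\[(START|STEP|END)\]\s+(.*)$")


def parse_log_line(line: str) -> tuple[str, dict[str, str]]:
    """Split one log line into its tag and key=value fields (values kept as strings)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a log line: {line!r}")
    tag, rest = match.groups()
    # Split only on whitespace that precedes a key=; action payloads
    # contain spaces, so a plain str.split() would break them apart.
    parts = re.split(r"\s+(?=\w+=)", rest)
    fields = dict(part.split("=", 1) for part in parts)
    return tag, fields


tag, fields = parse_log_line(
    '[STEP] step=1 action=make_offer({"price": 42000}) reward=0.00 done=false error=null'
)
print(tag, fields["action"], fields["reward"])
```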
## Environment Design
### Rapport System
The opponent maintains a rapport score (0.0 to 1.0) updated per-round:
```python
COLLABORATIVE_SIGNALS = ["understand", "partnership", "mutual", "together", ...]
AGGRESSIVE_SIGNALS = ["demand", "require", "final offer", "unacceptable", ...]

# n_collaborative / n_aggressive: number of signals of each kind detected
# in the agent's message this round.
# +0.08 per collaborative signal, -0.08 per aggressive signal
delta = 0.08 * n_collaborative - 0.08 * n_aggressive
delta = max(-0.20, min(0.20, delta))  # cap per round
```
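A runnable version of the update above might look like the sketch below. It uses only the four signals of each kind that this README lists (the real lists are longer), and it assumes case-insensitive substring matching and clamping to the 0.0 to 1.0 range. `rapport_delta` and `update_rapport` are hypothetical names, not the functions in `opponent.py`.

```python
# Only the signals shown in this README; the real lists contain more entries.
COLLABORATIVE_SIGNALS = ["understand", "partnership", "mutual", "together"]
AGGRESSIVE_SIGNALS = ["demand", "require", "final offer", "unacceptable"]


def rapport_delta(message: str) -> float:
    """Per-round rapport change: +/-0.08 per detected signal, capped at +/-0.20."""
    text = message.lower()
    delta = 0.0
    delta += 0.08 * sum(signal in text for signal in COLLABORATIVE_SIGNALS)
    delta -= 0.08 * sum(signal in text for signal in AGGRESSIVE_SIGNALS)
    return max(-0.20, min(0.20, delta))  # cap per round


def update_rapport(rapport: float, message: str) -> float:
    # Keep the score inside the documented 0.0 to 1.0 range.
    return max(0.0, min(1.0, rapport + rapport_delta(message)))


print(update_rapport(0.5, "We value our partnership and mutual success."))
```

Note that substring matching is crude: a word such as "requirement" would count as the aggressive signal "require" in this sketch.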
### Opponent Personas
| Persona | Base Concession | Rapport Modifier | Special Behavior |
|---------|----------------|-------------------|-------------------|
| `cooperative` | 5% | ±50% | Responsive to language |
| `cash_flow_stressed` | 7% | ±50% | Accepts Net-45+, comments on payment |
| `aggressive_anchor` | 4% | ±50% | Hardens after 2+ consecutive concessions |
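To illustrate how a ±50% rapport modifier could act on the base rates in the table, here is a sketch. The linear mapping (rapport 0.5 is neutral, 0.0 halves the rate, 1.0 raises it by half) is an assumption, as are the names `BASE_CONCESSION` and `effective_concession`; the actual logic in `opponent.py` may differ.

```python
BASE_CONCESSION = {
    "cooperative": 0.05,
    "cash_flow_stressed": 0.07,
    "aggressive_anchor": 0.04,
}


def effective_concession(persona: str, rapport: float, modifier: float = 0.5) -> float:
    """Scale the base rate by up to +/-modifier as rapport moves from 0.0 to 1.0."""
    rapport = max(0.0, min(1.0, rapport))
    # Assumed mapping: rapport 0.5 is neutral, 0.0 -> -50%, 1.0 -> +50%.
    scale = 1.0 + modifier * (2.0 * rapport - 1.0)
    return BASE_CONCESSION[persona] * scale


for persona in BASE_CONCESSION:
    print(persona, effective_concession(persona, rapport=0.8))
```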
### Grading
Graders are pure Python, with zero LLM calls. They combine:
- **Value**: how close to buyer's target
- **Efficiency**: penalty for taking too many rounds
- **Pattern penalty** (adversarial only): for consecutive concession behavior
Graders never crash on malformed input; they fall back to worst-case values.
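The fall-back behavior can be sketched as a defensive read of a single term. `safe_term` is a hypothetical helper written for illustration, not a function from `graders.py`; it shows one way to return the worst-case value instead of raising.

```python
import math


def safe_term(terms: object, key: str, worst: float) -> float:
    """Read a numeric term, returning the worst-case value on any malformed input."""
    if not isinstance(terms, dict):
        return worst
    value = terms.get(key)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return worst
    try:
        number = float(value)
    except OverflowError:  # e.g. an integer too large for a float
        return worst
    return number if math.isfinite(number) else worst


print(safe_term({"price": "cheap"}, "price", worst=55_000))
```

Scoring a malformed deal at the worst-case value keeps the grader total while still penalizing the agent for the bad output.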
## Project Structure
```
Procure_RL/
├── __init__.py                     # Package exports
├── client.py                       # EnvClient wrapper
├── models.py                       # NegotiationAction, NegotiationObservation, NegotiationState
├── opponent.py                     # ScriptedPersonaOpponent with 3 personas + rapport
├── graders.py                      # grade_single_issue, grade_multi_issue, grade_adversarial
├── inference.py                    # Baseline agent with [START][STEP][END] output
├── server/
│   ├── __init__.py
│   ├── app.py                      # FastAPI app
│   ├── Procure_RL_environment.py   # ProcureRLEnvironment
│   ├── requirements.txt
│   └── Dockerfile
├── openenv.yaml                    # OpenEnv manifest
├── pyproject.toml
├── plan.md                         # Full design specification
└── README.md                       # This file
```
## Why This Environment?
**Market validation**: Walmart deployed Pactum for AI negotiation. 90% of CPOs adopting AI negotiation in 2025.
**Research gap**: Zero negotiation environments in OpenEnv hub.
**LLM advantage**: Language quality directly affects opponent rapport; the language IS the policy.
**Reproducibility**: Deterministic scripted opponent, pure Python graders, no LLM in environment loop.
## Calibration
- If the base LLM scores above 0.55 on `single_issue`, the opponent is too easy: reduce the cooperative concession rate.
- If the base LLM scores below 0.15 on `single_issue`, the opponent is too hard: increase the cooperative concession rate.
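The two thresholds can be turned into a quick check over several baseline runs. This is a sketch only: `calibration_verdict` is a hypothetical helper, the scores in the example are made-up inputs rather than measured results, and averaging over runs is our assumption about how the check would be applied.

```python
from statistics import mean


def calibration_verdict(scores: list[float], low: float = 0.15, high: float = 0.55) -> str:
    """Compare the mean single_issue score against the calibration band."""
    avg = mean(scores)
    if avg > high:
        return "too easy: reduce the cooperative concession rate"
    if avg < low:
        return "too hard: increase the cooperative concession rate"
    return "in range"


# Placeholder scores for illustration, not real measurements.
print(calibration_verdict([0.42, 0.51, 0.38]))
```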