Spaces:

akshaypulla
/

procure-rl

Sleeping

File size: 10,243 Bytes

81ddf95
c1be7c3
 
81ddf95
c1be7c3
81ddf95
 
c1be7c3
 
 
 
 
 
 
 
81ddf95
 
c1be7c3

---
title: ProcureRL Environment
emoji: 🤝
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
  - openenv
  - negotiation
  - procurement
  - rl
  - real-world
---

# ProcureRL: Procurement Negotiation RL Environment

An OpenEnv-compliant RL environment where an LLM agent learns to negotiate procurement deals against scripted supplier opponents with language-sensitive behavior.

## The Key Innovation: Language-Sensitive Opponent

The opponent's concession rate is directly affected by the **quality of the agent's natural language**:

- **Collaborative language** ("let's work together", "mutual benefit") → increases rapport → opponent concedes more
- **Neutral language** → opponent concedes at baseline rate
- **Aggressive language** ("final offer", "take it or leave it") → rapport drops → opponent hardens

This makes LLM genuinely required — output quality directly affects negotiation outcomes.

## Quick Start

```python
from server.Procure_RL_environment import ProcureRLEnvironment
from models import NegotiationAction

env = ProcureRLEnvironment()
obs = env.reset(task_id="single_issue", seed=42)

print(f"Supplier: {obs.supplier_message}")
print(f"Offer: {obs.current_offer}")
print(f"Your target: {obs.buyer_constraints}")

action = NegotiationAction(
    move_type="make_offer",
    terms={"price": 42000},
    message="Let's find a mutually beneficial solution."
)
obs = env.step(action)
print(f"Response: {obs.supplier_message}")
print(f"New offer: {obs.current_offer}")
```

## Web Interface Example

The web interface at `/web` provides a visual playground. Here's how to use it:

### Step 1: Reset the Environment

Click **Reset** to start a new negotiation episode. You can customize the reset by passing JSON:

```json
{"task_id": "single_issue", "seed": 42}
```

**Available tasks:**
- `single_issue` — Price-only negotiation (6 rounds max)
- `multi_issue` — Price + payment terms (8 rounds max)  
- `adversarial` — Price + payment + support hours (10 rounds max)

### Step 2: Make an Offer

Fill in the form fields:

| Field | Example Value | Notes |
|-------|--------------|-------|
| `move_type` | `make_offer` | Options: make_offer, accept, reject, bundle |
| `terms` | `{"price": 42000}` | JSON object with negotiation terms |
| `message` | `I value our partnership and believe we can find a fair solution.` | Your natural language message (affects opponent rapport!) |

**Example: Making a collaborative offer**
```
move_type: make_offer
terms: {"price": 45000}
message: We appreciate your flexibility and would like to work together to find a solution that benefits both parties.
```

### Step 3: Read the Response

After clicking **Step**, you'll see:
- `supplier_message` — The opponent's natural language response
- `current_offer` — Updated terms on the table
- `rapport_hint` — "positive", "neutral", or "negative" based on your language
- `round_number` — Current round (0-indexed)

### Step 4: Continue or Accept

- **Make another offer** to continue negotiating
- **Use `accept`** when you're satisfied with the current terms
- **Use `reject`** only if you want to walk away (no reward)

**Example: Accepting current terms**
```
move_type: accept
terms: {}
message: 
```

### Multi-Issue Negotiation (Task 2 & 3)

For `multi_issue` and `adversarial`, include multiple terms:

```json
{
  "move_type": "make_offer",
  "terms": {
    "price": 44000,
    "payment_days": 30
  },
  "message": "We can offer faster payment terms if that helps your cash flow."
}
```

**Key insight:** In `multi_issue`, the opponent cares more about payment timing than price. Offering Net-30 payment can get you a better price!

### Example Full Episode

**Round 0 (Reset):**
- Task: `single_issue`
- Supplier opens: ~$52,000
- Your target: $36,000

**Round 1:**
- `move_type`: `make_offer`
- `terms`: `{"price": 48000}`
- `message`: `We value your partnership and want to find a fair price for both parties.`

**Round 2:**
- Supplier counter-offers at ~$46,000 (rapport is positive!)
- `move_type`: `make_offer`
- `terms`: `{"price": 45000}`
- `message`: `I appreciate your movement. Let's see if we can get to $45,000.`

**Round 3:**
- Supplier accepts or counter-offers near your target
- `move_type`: `accept`
- `terms`: `{}`
- Final score: Based on how close to target and how efficiently

## The Three Tasks

### 1. `single_issue` (Easy)
Renew software license. Price only.

- Buyer target: $36,000, Budget: $53,000
- Seller opens: ~$52,000 (varies by seed)
- Opponent persona: Cooperative
- Max rounds: 6

**Scoring:** Deal quality (how close to target) × Efficiency (how few rounds)

### 2. `multi_issue` (Medium)
Enterprise software deal. Price + payment terms.

- Buyer weights: price 70%, payment 30%
- Seller persona: Cash Flow Stressed (cares more about payment timing)
- **Trade opportunity**: offer Net-30 payment to get lower price
- Max rounds: 8

**Scoring:** Weighted combination of price improvement + payment terms

### 3. `adversarial` (Hard)
Large contract negotiation. Price + payment + support hours.

- Opponent persona: Aggressive Anchor
  - Opens at ceiling on all issues
  - Hardens position if you make 2+ consecutive concessions
  - Requires consistent collaborative framing
- Survival floor: any deal scores at least 0.15
- Max rounds: 10

**Scoring:** Multi-dimensional value minus pattern penalty for consecutive concessions

## Action Space

```python
NegotiationAction(
    move_type="make_offer",  # make_offer | accept | reject | bundle
    terms={"price": 44000, "payment_days": 45, "support_hours": 120},
    message="We appreciate your flexibility on this."
)
```

| move_type | Description |
|-----------|-------------|
| `make_offer` | Propose terms (price required, others optional) |
| `accept` | Accept current offer on table |
| `reject` | Walk away (only use at final round) |
| `bundle` | Alias for make_offer with multi-issue terms |

## Observation Space

```python
NegotiationObservation(
    task_id="single_issue",
    round_number=2,
    max_rounds=6,
    supplier_message="I appreciate your offer. Based on our costs...",
    current_offer={"price": 46000},
    last_4_exchanges=[...],
    buyer_constraints={"price": {"target": 36000, "worst": 55000, "budget": 53000}},
    rapport_hint="positive",  # positive | neutral | negative
    done=False
)
```

## Running the Server

```bash
# Build Docker image
docker build -t procure-rl -f server/Dockerfile .

# Run container (port 7860 - required for HF Spaces)
docker run -p 7860:7860 procure-rl

# Access web interface at http://localhost:7860/web
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/metadata` | GET | Environment metadata |
| `/reset` | POST | Reset environment |
| `/step` | POST | Execute action |
| `/state` | GET | Get current state |
| `/ws` | WS | WebSocket for persistent sessions |

## Baseline Inference

Run inference against all three tasks:

```bash
cp .env.example .env
# Edit .env and add your HF_TOKEN
HF_TOKEN=your_token python inference.py
```

Output format (exact):

```
[START] task=single_issue env=procure-rl model=Qwen/Qwen2.5-72B-Instruct
[STEP] step=1 action=make_offer({"price": 42000}) reward=0.00 done=false error=null
[STEP] step=2 action=make_offer({"price": 41000}) reward=0.52 done=true error=null
[END] success=true steps=2 score=0.52 rewards=0.00,0.52
```

## Environment Design

### Rapport System

The opponent maintains a rapport score (0.0 to 1.0) updated per-round:

```python
COLLABORATIVE_SIGNALS = ["understand", "partnership", "mutual", "together", ...]
AGGRESSIVE_SIGNALS = ["demand", "require", "final offer", "unacceptable", ...]

delta = +0.08 per collaborative signal detected
delta = -0.08 per aggressive signal detected
delta = max(-0.20, min(0.20, delta))  # cap per round
```

### Opponent Personas

| Persona | Base Concession | Rapport Modifier | Special Behavior |
|---------|----------------|-------------------|-------------------|
| `cooperative` | 5% | ±50% | Responsive to language |
| `cash_flow_stressed` | 7% | ±50% | Accepts Net-45+, comments on payment |
| `aggressive_anchor` | 4% | ±50% | Hardens after 2+ consecutive concessions |

### Grading

Graders are pure Python — zero LLM calls. They combine:
- **Value**: how close to buyer's target
- **Efficiency**: penalty for taking too many rounds
- **Pattern penalty** (adversarial only): for consecutive concession behavior

Graders never crash on malformed input — they fall back to worst-case values.

## Project Structure

```
Procure_RL/
├── __init__.py                    # Package exports
├── client.py                      # EnvClient wrapper
├── models.py                      # NegotiationAction, NegotiationObservation, NegotiationState
├── opponent.py                    # ScriptedPersonaOpponent with 3 personas + rapport
├── graders.py                     # grade_single_issue, grade_multi_issue, grade_adversarial
├── inference.py                   # Baseline agent with [START][STEP][END] output
├── server/
│   ├── __init__.py
│   ├── app.py                    # FastAPI app
│   ├── Procure_RL_environment.py # ProcureRLEnvironment
│   ├── requirements.txt
│   └── Dockerfile
├── openenv.yaml                  # OpenEnv manifest
├── pyproject.toml
├── plan.md                       # Full design specification
└── README.md                     # This file
```

## Why This Environment?

**Market validation**: Walmart deployed Pactum for AI negotiation. 90% of CPOs adopting AI negotiation in 2025.

**Research gap**: Zero negotiation environments in OpenEnv hub.

**LLM advantage**: Language quality directly affects opponent rapport — the language IS the policy.

**Reproducibility**: Deterministic scripted opponent, pure Python graders, no LLM in environment loop.

## Calibration

If base LLM scores above 0.55 on single_issue → opponent too easy, reduce cooperative concession rate.

If base LLM scores below 0.15 on single_issue → opponent too hard, increase cooperative concession rate.