title: ProcureRL Environment
emoji: π€
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
- openenv
- negotiation
- procurement
- rl
- real-world
ProcureRL: Procurement Negotiation RL Environment
An OpenEnv-compliant RL environment where an LLM agent learns to negotiate procurement deals against scripted supplier opponents with language-sensitive behavior.
The Key Innovation: Language-Sensitive Opponent
The opponent's concession rate is directly affected by the quality of the agent's natural language:
- Collaborative language ("let's work together", "mutual benefit") → increases rapport → opponent concedes more
- Neutral language → opponent concedes at baseline rate
- Aggressive language ("final offer", "take it or leave it") → rapport drops → opponent hardens
This makes an LLM genuinely necessary: output quality directly affects negotiation outcomes.
Quick Start
from server.Procure_RL_environment import ProcureRLEnvironment
from models import NegotiationAction
env = ProcureRLEnvironment()
obs = env.reset(task_id="single_issue", seed=42)
print(f"Supplier: {obs.supplier_message}")
print(f"Offer: {obs.current_offer}")
print(f"Your target: {obs.buyer_constraints}")
action = NegotiationAction(
    move_type="make_offer",
    terms={"price": 42000},
    message="Let's find a mutually beneficial solution."
)
obs = env.step(action)
print(f"Response: {obs.supplier_message}")
print(f"New offer: {obs.current_offer}")
Web Interface Example
The web interface at /web provides a visual playground. Here's how to use it:
Step 1: Reset the Environment
Click Reset to start a new negotiation episode. You can customize the reset by passing JSON:
{"task_id": "single_issue", "seed": 42}
Available tasks:
- single_issue: Price-only negotiation (6 rounds max)
- multi_issue: Price + payment terms (8 rounds max)
- adversarial: Price + payment + support hours (10 rounds max)
Step 2: Make an Offer
Fill in the form fields:
| Field | Example Value | Notes |
|---|---|---|
| move_type | make_offer | Options: make_offer, accept, reject, bundle |
| terms | {"price": 42000} | JSON object with negotiation terms |
| message | I value our partnership and believe we can find a fair solution. | Your natural language message (affects opponent rapport!) |
Example: Making a collaborative offer
move_type: make_offer
terms: {"price": 45000}
message: We appreciate your flexibility and would like to work together to find a solution that benefits both parties.
Step 3: Read the Response
After clicking Step, you'll see:
- supplier_message: The opponent's natural language response
- current_offer: Updated terms on the table
- rapport_hint: "positive", "neutral", or "negative" based on your language
- round_number: Current round (0-indexed)
Step 4: Continue or Accept
- Make another offer to continue negotiating
- Use accept when you're satisfied with the current terms
- Use reject only if you want to walk away (no reward)
Example: Accepting current terms
move_type: accept
terms: {}
message:
Multi-Issue Negotiation (Tasks 2 & 3)
For multi_issue and adversarial, include multiple terms:
{
  "move_type": "make_offer",
  "terms": {
    "price": 44000,
    "payment_days": 30
  },
  "message": "We can offer faster payment terms if that helps your cash flow."
}
Key insight: In multi_issue, the opponent cares more about payment timing than price. Offering Net-30 payment can get you a better price!
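Before sending a multi-issue action like the one above, it can help to check it locally. The following is a minimal sketch, not part of the package: `build_action` is a hypothetical helper, the allowed move types are taken from the Action Space table in this README, and the set of term keys is an assumption based only on the keys that appear in this document.

```python
import json

# Move types listed in this README's Action Space table.
ALLOWED_MOVES = {"make_offer", "accept", "reject", "bundle"}
# Term keys that appear in this README; the real environment may accept others.
KNOWN_TERMS = {"price", "payment_days", "support_hours"}


def build_action(move_type, terms=None, message=""):
    """Return a JSON-serializable action dict (hypothetical helper)."""
    if move_type not in ALLOWED_MOVES:
        raise ValueError(f"unknown move_type: {move_type!r}")
    terms = dict(terms or {})
    unknown = set(terms) - KNOWN_TERMS
    if unknown:
        raise ValueError(f"unknown term keys: {sorted(unknown)}")
    # The table says price is required for make_offer; applying the same
    # rule to its alias, bundle, is an assumption.
    if move_type in {"make_offer", "bundle"} and "price" not in terms:
        raise ValueError(f"{move_type} requires a price")
    return {"move_type": move_type, "terms": terms, "message": message}


payload = build_action(
    "make_offer",
    {"price": 44000, "payment_days": 30},
    "We can offer faster payment terms if that helps your cash flow.",
)
print(json.dumps(payload, indent=2))
```

Failing fast on a misspelled key such as `payment_day` avoids spending a negotiation round on an offer the environment may not interpret as intended.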
Example Full Episode
Round 0 (Reset):
- Task: single_issue
- Supplier opens: ~$52,000
- Your target: $36,000
Round 1:
- move_type: make_offer
- terms: {"price": 48000}
- message: We value your partnership and want to find a fair price for both parties.
Round 2:
- Supplier counter-offers at ~$46,000 (rapport is positive!)
- move_type: make_offer
- terms: {"price": 45000}
- message: I appreciate your movement. Let's see if we can get to $45,000.
Round 3:
- Supplier accepts or counter-offers near your target
- move_type: accept
- terms: {}
- Final score: based on how close the deal is to your target and how few rounds it took
The Three Tasks
1. single_issue (Easy)
Renew software license. Price only.
- Buyer target: $36,000, Budget: $53,000
- Seller opens: ~$52,000 (varies by seed)
- Opponent persona: Cooperative
- Max rounds: 6
Scoring: Deal quality (how close to target) × Efficiency (how few rounds)
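The README does not give the exact formula, so the following is only a sketch of one way a "quality × efficiency" score could be computed. The linear normalization, the `max_discount` parameter, and all function names are assumptions; the target, budget, and round limit are the single_issue values listed above.

```python
def deal_quality(price, target=36000, budget=53000):
    """1.0 at the buyer's target, 0.0 at the budget, clamped to [0, 1]."""
    raw = (budget - price) / (budget - target)
    return max(0.0, min(1.0, raw))


def efficiency(rounds_used, max_rounds=6, max_discount=0.5):
    """1.0 for a first-round deal, falling linearly to 1 - max_discount."""
    rounds_used = max(1, min(max_rounds, rounds_used))
    return 1.0 - max_discount * (rounds_used - 1) / (max_rounds - 1)


def single_issue_score(price, rounds_used):
    """Product of the two factors, so both must be good for a high score."""
    return deal_quality(price) * efficiency(rounds_used)


print(round(single_issue_score(45000, 3), 3))
```

Multiplying rather than adding the two factors means a fast deal at a bad price, or a good price reached very slowly, both score low.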
2. multi_issue (Medium)
Enterprise software deal. Price + payment terms.
- Buyer weights: price 70%, payment 30%
- Seller persona: Cash Flow Stressed (cares more about payment timing)
- Trade opportunity: offer Net-30 payment to get lower price
- Max rounds: 8
Scoring: Weighted combination of price improvement + payment terms
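As a rough illustration of a weighted combination, the sketch below uses the 70/30 buyer weights stated above. Everything else is an assumption: the function names, the linear normalization, and especially the `RANGES` values, which are placeholders and not the environment's actual bounds.

```python
def normalize(value, worst, best):
    """Map value onto [0, 1], where `worst` scores 0 and `best` scores 1."""
    if best == worst:
        return 0.0
    score = (value - worst) / (best - worst)
    return max(0.0, min(1.0, score))


def multi_issue_score(terms, ranges, weights=None):
    """Weighted sum of per-issue normalized scores."""
    weights = weights or {"price": 0.7, "payment_days": 0.3}
    total = 0.0
    for issue, weight in weights.items():
        worst, best = ranges[issue]
        total += weight * normalize(terms[issue], worst, best)
    return total


# Placeholder (worst, best) bounds from the buyer's point of view.
RANGES = {"price": (53000, 36000), "payment_days": (30, 60)}
score = multi_issue_score({"price": 44000, "payment_days": 30}, RANGES)
print(round(score, 3))
```

Under these placeholder bounds, conceding fully on payment timing costs the buyer at most 0.3 of the score, which is why trading Net-30 for a lower price can be worthwhile.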
3. adversarial (Hard)
Large contract negotiation. Price + payment + support hours.
- Opponent persona: Aggressive Anchor
- Opens at ceiling on all issues
- Hardens position if you make 2+ consecutive concessions
- Requires consistent collaborative framing
- Survival floor: any deal scores at least 0.15
- Max rounds: 10
Scoring: Multi-dimensional value minus pattern penalty for consecutive concessions
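The pattern penalty and survival floor might be combined roughly as follows. This is a sketch under stated assumptions: a "concession" is taken to mean the buyer raising their own price offer, `penalty_per_step` is an invented value, and only the 0.15 floor and the "2+ consecutive concessions" rule come from the list above.

```python
def longest_concession_streak(buyer_prices):
    """Length of the longest run of consecutive upward moves in the buyer's price."""
    longest = current = 0
    for prev, curr in zip(buyer_prices, buyer_prices[1:]):
        current = current + 1 if curr > prev else 0
        longest = max(longest, current)
    return longest


def adversarial_score(value, buyer_prices, deal_reached,
                      penalty_per_step=0.05, floor=0.15):
    """Value minus a streak penalty, with a floor for any completed deal."""
    if not deal_reached:
        return 0.0  # walking away earns no reward
    streak = longest_concession_streak(buyer_prices)
    # Penalize only once the streak reaches 2 or more consecutive concessions.
    penalty = penalty_per_step * max(0, streak - 1)
    return max(floor, value - penalty)


print(adversarial_score(0.6, [40000, 41000, 42000, 42000], deal_reached=True))
```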
Action Space
NegotiationAction(
    move_type="make_offer",  # make_offer | accept | reject | bundle
    terms={"price": 44000, "payment_days": 45, "support_hours": 120},
    message="We appreciate your flexibility on this."
)
| move_type | Description |
|---|---|
| make_offer | Propose terms (price required, others optional) |
| accept | Accept current offer on table |
| reject | Walk away (only use at final round) |
| bundle | Alias for make_offer with multi-issue terms |
Observation Space
NegotiationObservation(
    task_id="single_issue",
    round_number=2,
    max_rounds=6,
    supplier_message="I appreciate your offer. Based on our costs...",
    current_offer={"price": 46000},
    last_4_exchanges=[...],
    buyer_constraints={"price": {"target": 36000, "worst": 55000, "budget": 53000}},
    rapport_hint="positive",  # positive | neutral | negative
    done=False
)
Running the Server
# Build Docker image
docker build -t procure-rl -f server/Dockerfile .
# Run container (port 7860 - required for HF Spaces)
docker run -p 7860:7860 procure-rl
# Access web interface at http://localhost:7860/web
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /metadata | GET | Environment metadata |
| /reset | POST | Reset environment |
| /step | POST | Execute action |
| /state | GET | Get current state |
| /ws | WS | WebSocket for persistent sessions |
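The endpoints above can be called from the standard library alone. The sketch below prepares the requests without sending them. The request body shapes are assumptions: the reset body mirrors the JSON shown for the web interface, and the step body mirrors the action fields, but the server may expect a different envelope, so check /metadata or the server code before relying on it. `build_post` and `send` are hypothetical helpers.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # port from the Docker instructions


def build_post(path, payload):
    """Prepare (but do not send) a JSON POST request."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


reset_request = build_post("/reset", {"task_id": "single_issue", "seed": 42})
step_request = build_post("/step", {
    "move_type": "make_offer",
    "terms": {"price": 42000},
    "message": "Let's find a mutually beneficial solution.",
})

# With the container running:
# observation = send(reset_request)
# observation = send(step_request)
```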
Baseline Inference
Run inference against all three tasks:
cp .env.example .env
# Edit .env and add your HF_TOKEN
HF_TOKEN=your_token python inference.py
Output format (exact):
[START] task=single_issue env=procure-rl model=Qwen/Qwen2.5-72B-Instruct
[STEP] step=1 action=make_offer({"price": 42000}) reward=0.00 done=false error=null
[STEP] step=2 action=make_offer({"price": 41000}) reward=0.52 done=true error=null
[END] success=true steps=2 score=0.52 rewards=0.00,0.52
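Because the format is fixed, the log can be parsed with two regular expressions. This is a sketch written against the sample lines above only; `parse_step` and `parse_end` are hypothetical names and are not part of inference.py. The action field contains spaces, so the pattern anchors on the ` reward=` key that follows it instead of splitting on whitespace.

```python
import re

STEP_RE = re.compile(
    r"^\[STEP\] step=(?P<step>\d+) action=(?P<action>.*) "
    r"reward=(?P<reward>-?\d+(?:\.\d+)?) done=(?P<done>true|false) "
    r"error=(?P<error>.*)$"
)
END_RE = re.compile(
    r"^\[END\] success=(?P<success>true|false) steps=(?P<steps>\d+) "
    r"score=(?P<score>-?\d+(?:\.\d+)?) rewards=(?P<rewards>.*)$"
)


def parse_step(line):
    """Parse one [STEP] line into a dict, or return None if it does not match."""
    match = STEP_RE.match(line.strip())
    if match is None:
        return None
    return {
        "step": int(match["step"]),
        "action": match["action"],
        "reward": float(match["reward"]),
        "done": match["done"] == "true",
        "error": None if match["error"] == "null" else match["error"],
    }


def parse_end(line):
    """Parse the [END] line into a dict, or return None if it does not match."""
    match = END_RE.match(line.strip())
    if match is None:
        return None
    return {
        "success": match["success"] == "true",
        "steps": int(match["steps"]),
        "score": float(match["score"]),
        "rewards": [float(r) for r in match["rewards"].split(",") if r],
    }


step = parse_step(
    '[STEP] step=2 action=make_offer({"price": 41000}) reward=0.52 done=true error=null'
)
end = parse_end("[END] success=true steps=2 score=0.52 rewards=0.00,0.52")
print(step["action"], end["rewards"])
```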
Environment Design
Rapport System
The opponent maintains a rapport score (0.0 to 1.0) updated per-round:
COLLABORATIVE_SIGNALS = ["understand", "partnership", "mutual", "together", ...]
AGGRESSIVE_SIGNALS = ["demand", "require", "final offer", "unacceptable", ...]
delta = +0.08 per collaborative signal detected
delta = -0.08 per aggressive signal detected
delta = max(-0.20, min(0.20, delta)) # cap per round
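The update rule above can be written as a runnable sketch. Several details are assumptions, not taken from opponent.py: case-insensitive substring matching, counting each signal at most once per message, and clamping the result to the stated 0.0 to 1.0 range. The signal lists are truncated in this README, so only the listed entries are used here.

```python
# Only the signals shown in this README; the real lists are longer.
COLLABORATIVE_SIGNALS = ["understand", "partnership", "mutual", "together"]
AGGRESSIVE_SIGNALS = ["demand", "require", "final offer", "unacceptable"]


def update_rapport(rapport, message, step=0.08, cap=0.20):
    """Return the new rapport after one buyer message."""
    text = message.lower()
    delta = step * sum(signal in text for signal in COLLABORATIVE_SIGNALS)
    delta -= step * sum(signal in text for signal in AGGRESSIVE_SIGNALS)
    delta = max(-cap, min(cap, delta))  # cap per round
    return max(0.0, min(1.0, rapport + delta))


print(round(update_rapport(0.5, "Let's work together for mutual benefit."), 2))
print(round(update_rapport(0.5, "This is unacceptable. Final offer."), 2))
```

The per-round cap means that stacking keywords into one message gains at most 0.20, so sustained collaborative framing across rounds matters more than a single keyword-heavy message.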
Opponent Personas
| Persona | Base Concession | Rapport Modifier | Special Behavior |
|---|---|---|---|
| cooperative | 5% | ±50% | Responsive to language |
| cash_flow_stressed | 7% | ±50% | Accepts Net-45+, comments on payment |
| aggressive_anchor | 4% | ±50% | Hardens after 2+ consecutive concessions |
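One plausible reading of the table is that rapport scales the base concession linearly between -50% and +50%. The sketch below implements that reading; the linear mapping, the function names, and the `floor_price` parameter (a stand-in for the seller's reservation price) are assumptions, while the base rates come from the table.

```python
# Base concession rates from the persona table.
BASE_CONCESSION = {
    "cooperative": 0.05,
    "cash_flow_stressed": 0.07,
    "aggressive_anchor": 0.04,
}


def concession_rate(persona, rapport, modifier=0.5):
    """Scale the base rate by 1 - modifier at rapport 0 up to 1 + modifier at rapport 1."""
    rapport = max(0.0, min(1.0, rapport))
    scale = 1.0 + modifier * (2.0 * rapport - 1.0)
    return BASE_CONCESSION[persona] * scale


def next_supplier_price(current_price, persona, rapport, floor_price):
    """Apply one round of concession without going below the seller's floor."""
    conceded = current_price * (1.0 - concession_rate(persona, rapport))
    return max(floor_price, conceded)


for rapport in (0.0, 0.5, 1.0):
    print(rapport, round(next_supplier_price(52000, "cooperative", rapport, 40000)))
```

Under this reading a cooperative supplier concedes between 2.5% and 7.5% per round, so rapport changes the per-round movement by a factor of three.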
Grading
Graders are pure Python with zero LLM calls. They combine:
- Value: how close to buyer's target
- Efficiency: penalty for taking too many rounds
- Pattern penalty (adversarial only): for consecutive concession behavior
Graders never crash on malformed input; they fall back to worst-case values.
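The fall-back behavior could look like the sketch below. `safe_price` is a hypothetical helper, not a function from graders.py, and treating non-positive or non-finite prices as malformed is an assumption. The worst-case value 55000 is the single_issue `worst` price shown in the observation example.

```python
import math


def safe_price(terms, worst_price):
    """Extract a usable price from `terms`, or return the worst case."""
    try:
        price = float(terms["price"])
    except (TypeError, KeyError, ValueError, OverflowError):
        # Covers terms=None, non-dict terms, a missing key, and unparseable values.
        return float(worst_price)
    if not math.isfinite(price) or price <= 0:
        return float(worst_price)
    return price


print(safe_price({"price": "44000"}, 55000))
print(safe_price({"price": "n/a"}, 55000))
print(safe_price(None, 55000))
```

Returning the worst case instead of raising keeps a malformed agent output from aborting an evaluation run while still giving it no credit.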
Project Structure
Procure_RL/
├── __init__.py                    # Package exports
├── client.py                      # EnvClient wrapper
├── models.py                      # NegotiationAction, NegotiationObservation, NegotiationState
├── opponent.py                    # ScriptedPersonaOpponent with 3 personas + rapport
├── graders.py                     # grade_single_issue, grade_multi_issue, grade_adversarial
├── inference.py                   # Baseline agent with [START][STEP][END] output
├── server/
│   ├── __init__.py
│   ├── app.py                     # FastAPI app
│   ├── Procure_RL_environment.py  # ProcureRLEnvironment
│   ├── requirements.txt
│   └── Dockerfile
├── openenv.yaml                   # OpenEnv manifest
├── pyproject.toml
├── plan.md                        # Full design specification
└── README.md                      # This file
Why This Environment?
Market validation: Walmart deployed Pactum for AI negotiation. 90% of CPOs are adopting AI negotiation in 2025.
Research gap: There are zero negotiation environments in the OpenEnv hub.
LLM advantage: Language quality directly affects opponent rapport; the language IS the policy.
Reproducibility: Deterministic scripted opponent, pure Python graders, no LLM in environment loop.
Calibration
If the base LLM scores above 0.55 on single_issue, the opponent is too easy: reduce the cooperative concession rate.
If the base LLM scores below 0.15 on single_issue, the opponent is too hard: increase the cooperative concession rate.
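The two thresholds can be turned into a small check over a batch of baseline runs. This is a sketch: `calibration_advice` is a hypothetical helper, and averaging scores across seeds is an assumption about how the check would be applied.

```python
from statistics import mean


def calibration_advice(scores, too_easy=0.55, too_hard=0.15):
    """Suggest a tuning direction from base-LLM single_issue scores."""
    average = mean(scores)
    if average > too_easy:
        return "reduce cooperative concession rate"
    if average < too_hard:
        return "increase cooperative concession rate"
    return "keep current concession rate"


print(calibration_advice([0.52, 0.61, 0.58]))
```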