---
title: ProcureRL Environment
emoji: 🤝
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
  - openenv
  - negotiation
  - procurement
  - rl
  - real-world
---

# ProcureRL: Procurement Negotiation RL Environment

An OpenEnv-compliant RL environment where an LLM agent learns to negotiate procurement deals against scripted supplier opponents with language-sensitive behavior.

## The Key Innovation: Language-Sensitive Opponent

The opponent's concession rate is directly affected by the **quality of the agent's natural language**:

- **Collaborative language** ("let's work together", "mutual benefit") → increases rapport → opponent concedes more
- **Neutral language** → opponent concedes at baseline rate
- **Aggressive language** ("final offer", "take it or leave it") → rapport drops → opponent hardens

This makes an LLM genuinely required: the quality of its output directly affects negotiation outcomes.

## Quick Start

```python
from server.Procure_RL_environment import ProcureRLEnvironment
from models import NegotiationAction

# Start a seeded episode of the price-only task
env = ProcureRLEnvironment()
obs = env.reset(task_id="single_issue", seed=42)

print(f"Supplier: {obs.supplier_message}")
print(f"Offer: {obs.current_offer}")
print(f"Your constraints: {obs.buyer_constraints}")

# The message text matters: it feeds the opponent's rapport score
action = NegotiationAction(
    move_type="make_offer",
    terms={"price": 42000},
    message="Let's find a mutually beneficial solution."
)

obs = env.step(action)
print(f"Response: {obs.supplier_message}")
print(f"New offer: {obs.current_offer}")
```

## Web Interface Example

The web interface at `/web` provides a visual playground. Here's how to use it:

### Step 1: Reset the Environment

Click **Reset** to start a new negotiation episode.
You can customize the reset by passing JSON:

```json
{"task_id": "single_issue", "seed": 42}
```

**Available tasks:**

- `single_issue`: Price-only negotiation (6 rounds max)
- `multi_issue`: Price + payment terms (8 rounds max)
- `adversarial`: Price + payment + support hours (10 rounds max)

### Step 2: Make an Offer

Fill in the form fields:

| Field | Example Value | Notes |
|-------|--------------|-------|
| `move_type` | `make_offer` | Options: `make_offer`, `accept`, `reject`, `bundle` |
| `terms` | `{"price": 42000}` | JSON object with negotiation terms |
| `message` | `I value our partnership and believe we can find a fair solution.` | Your natural language message (affects opponent rapport!) |

**Example: Making a collaborative offer**

```
move_type: make_offer
terms: {"price": 45000}
message: We appreciate your flexibility and would like to work together to find a solution that benefits both parties.
```

### Step 3: Read the Response

After clicking **Step**, you'll see:

- `supplier_message`: The opponent's natural language response
- `current_offer`: Updated terms on the table
- `rapport_hint`: "positive", "neutral", or "negative" based on your language
- `round_number`: Current round (0-indexed)

### Step 4: Continue or Accept

- **Make another offer** to continue negotiating
- **Use `accept`** when you're satisfied with the current terms
- **Use `reject`** only if you want to walk away (no reward)

**Example: Accepting current terms**

```
move_type: accept
terms: {}
message:
```

### Multi-Issue Negotiation (Tasks 2 & 3)

For `multi_issue` and `adversarial`, include multiple terms:

```json
{
  "move_type": "make_offer",
  "terms": {
    "price": 44000,
    "payment_days": 30
  },
  "message": "We can offer faster payment terms if that helps your cash flow."
}
```

**Key insight:** In `multi_issue`, the opponent cares more about payment timing than price. Offering Net-30 payment can get you a better price!
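The form fields and JSON payloads above all share the same three keys, so a small builder can catch typos before an action is submitted. The following is a minimal sketch, not part of the project: `build_action` and `VALID_MOVES` are hypothetical names, and the client-side checks only mirror what this README states (the four `move_type` options, and `price` being required for offers). How the server itself handles invalid input is not specified here.

```python
import json
from typing import Optional

# The four move types listed in the form-field table
VALID_MOVES = {"make_offer", "accept", "reject", "bundle"}


def build_action(move_type: str, terms: Optional[dict] = None, message: str = "") -> dict:
    """Assemble an action payload with the move_type / terms / message keys."""
    if move_type not in VALID_MOVES:
        raise ValueError(f"unknown move_type: {move_type!r}")
    terms = dict(terms or {})
    # The README marks price as required for offers (bundle is an alias)
    if move_type in {"make_offer", "bundle"} and "price" not in terms:
        raise ValueError("offers must include a 'price' term")
    return {"move_type": move_type, "terms": terms, "message": message}


offer = build_action(
    "make_offer",
    {"price": 44000, "payment_days": 30},
    "We can offer faster payment terms if that helps your cash flow.",
)
print(json.dumps(offer, indent=2))
```

The resulting dictionary has the same shape as the multi-issue JSON example, so it can be pasted into the web form or serialized for a request.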
### Example Full Episode

**Round 0 (Reset):**

- Task: `single_issue`
- Supplier opens: ~$52,000
- Your target: $36,000

**Round 1:**

- `move_type`: `make_offer`
- `terms`: `{"price": 48000}`
- `message`: `We value your partnership and want to find a fair price for both parties.`

**Round 2:**

- Supplier counter-offers at ~$46,000 (rapport is positive!)
- `move_type`: `make_offer`
- `terms`: `{"price": 45000}`
- `message`: `I appreciate your movement. Let's see if we can get to $45,000.`

**Round 3:**

- Supplier accepts or counter-offers near your target
- `move_type`: `accept`
- `terms`: `{}`
- Final score: based on how close the deal is to your target and how few rounds it took

## The Three Tasks

### 1. `single_issue` (Easy)

Renew a software license. Price only.

- Buyer target: $36,000, Budget: $53,000
- Seller opens: ~$52,000 (varies by seed)
- Opponent persona: Cooperative
- Max rounds: 6

**Scoring:** Deal quality (how close to target) × Efficiency (how few rounds)

### 2. `multi_issue` (Medium)

Enterprise software deal. Price + payment terms.

- Buyer weights: price 70%, payment 30%
- Seller persona: Cash Flow Stressed (cares more about payment timing)
- **Trade opportunity**: offer Net-30 payment to get a lower price
- Max rounds: 8

**Scoring:** Weighted combination of price improvement + payment terms

### 3. `adversarial` (Hard)

Large contract negotiation. Price + payment + support hours.

- Opponent persona: Aggressive Anchor
- Opens at ceiling on all issues
- Hardens position if you make 2+ consecutive concessions
- Requires consistent collaborative framing
- Survival floor: any deal scores at least 0.15
- Max rounds: 10

**Scoring:** Multi-dimensional value minus pattern penalty for consecutive concessions

## Action Space

```python
NegotiationAction(
    move_type="make_offer",  # make_offer | accept | reject | bundle
    terms={"price": 44000, "payment_days": 45, "support_hours": 120},
    message="We appreciate your flexibility on this."
)
```

| move_type | Description |
|-----------|-------------|
| `make_offer` | Propose terms (price required, others optional) |
| `accept` | Accept current offer on table |
| `reject` | Walk away (only use at final round) |
| `bundle` | Alias for make_offer with multi-issue terms |

## Observation Space

```python
NegotiationObservation(
    task_id="single_issue",
    round_number=2,
    max_rounds=6,
    supplier_message="I appreciate your offer. Based on our costs...",
    current_offer={"price": 46000},
    last_4_exchanges=[...],
    buyer_constraints={"price": {"target": 36000, "worst": 55000, "budget": 53000}},
    rapport_hint="positive",  # positive | neutral | negative
    done=False
)
```

## Running the Server

```bash
# Build Docker image
docker build -t procure-rl -f server/Dockerfile .

# Run container (port 7860 - required for HF Spaces)
docker run -p 7860:7860 procure-rl

# Access web interface at http://localhost:7860/web
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/metadata` | GET | Environment metadata |
| `/reset` | POST | Reset environment |
| `/step` | POST | Execute action |
| `/state` | GET | Get current state |
| `/ws` | WS | WebSocket for persistent sessions |

## Baseline Inference

Run inference against all three tasks:

```bash
cp .env.example .env
# Edit .env and add your HF_TOKEN
HF_TOKEN=your_token python inference.py
```

Output format (exact):

```
[START] task=single_issue env=procure-rl model=Qwen/Qwen2.5-72B-Instruct
[STEP] step=1 action=make_offer({"price": 42000}) reward=0.00 done=false error=null
[STEP] step=2 action=make_offer({"price": 41000}) reward=0.52 done=true error=null
[END] success=true steps=2 score=0.52 rewards=0.00,0.52
```

## Environment Design

### Rapport System

The opponent maintains a rapport score (0.0 to 1.0) that is updated every round:

```python
COLLABORATIVE_SIGNALS = ["understand", "partnership", "mutual", "together", ...]
AGGRESSIVE_SIGNALS = ["demand", "require", "final offer", "unacceptable", ...]

# Per-round rapport change (pseudocode):
#   delta: +0.08 per collaborative signal detected
#   delta: -0.08 per aggressive signal detected
delta = max(-0.20, min(0.20, delta))  # cap per round
```

### Opponent Personas

| Persona | Base Concession | Rapport Modifier | Special Behavior |
|---------|----------------|-------------------|-------------------|
| `cooperative` | 5% | ±50% | Responsive to language |
| `cash_flow_stressed` | 7% | ±50% | Accepts Net-45+, comments on payment |
| `aggressive_anchor` | 4% | ±50% | Hardens after 2+ consecutive concessions |

### Grading

Graders are pure Python with zero LLM calls. They combine:

- **Value**: how close the deal is to the buyer's target
- **Efficiency**: penalty for taking too many rounds
- **Pattern penalty** (adversarial only): for consecutive concession behavior

Graders never crash on malformed input; they fall back to worst-case values.

## Project Structure

```
Procure_RL/
├── __init__.py                    # Package exports
├── client.py                      # EnvClient wrapper
├── models.py                      # NegotiationAction, NegotiationObservation, NegotiationState
├── opponent.py                    # ScriptedPersonaOpponent with 3 personas + rapport
├── graders.py                     # grade_single_issue, grade_multi_issue, grade_adversarial
├── inference.py                   # Baseline agent with [START][STEP][END] output
├── server/
│   ├── __init__.py
│   ├── app.py                     # FastAPI app
│   ├── Procure_RL_environment.py  # ProcureRLEnvironment
│   ├── requirements.txt
│   └── Dockerfile
├── openenv.yaml                   # OpenEnv manifest
├── pyproject.toml
├── plan.md                        # Full design specification
└── README.md                      # This file
```

## Why This Environment?

- **Market validation**: Walmart deployed Pactum for AI negotiation. 90% of CPOs adopting AI negotiation in 2025.
- **Research gap**: Zero negotiation environments in OpenEnv hub.
- **LLM advantage**: Language quality directly affects opponent rapport, so the language IS the policy.
- **Reproducibility**: Deterministic scripted opponent, pure Python graders, no LLM in environment loop.
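To make the reproducibility point concrete, the per-round rapport update described under Rapport System can be sketched in a few lines of deterministic Python. This is an illustration under stated assumptions, not the code in `opponent.py`: the function names are hypothetical, the signal lists contain only the four entries this README names (the full lists are elided), signals are matched as case-insensitive substrings, and the positive and negative contributions are summed before the ±0.20 cap is applied.

```python
# Only the signals named in this README; the real lists are longer (elided above)
COLLABORATIVE_SIGNALS = ["understand", "partnership", "mutual", "together"]
AGGRESSIVE_SIGNALS = ["demand", "require", "final offer", "unacceptable"]

STEP = 0.08  # rapport change per detected signal
CAP = 0.20   # maximum absolute change per round


def rapport_delta(message: str) -> float:
    """Return the capped rapport change for one buyer message."""
    text = message.lower()
    # Assumption: each signal counts once if it appears anywhere in the text
    positive = sum(1 for signal in COLLABORATIVE_SIGNALS if signal in text)
    negative = sum(1 for signal in AGGRESSIVE_SIGNALS if signal in text)
    delta = STEP * positive - STEP * negative
    return max(-CAP, min(CAP, delta))


def update_rapport(rapport: float, message: str) -> float:
    """Apply the delta and keep rapport inside the 0.0 to 1.0 range."""
    return max(0.0, min(1.0, rapport + rapport_delta(message)))


rapport = 0.5
rapport = update_rapport(rapport, "Let's work together for mutual benefit.")
print(f"rapport after a collaborative message: {rapport:.2f}")
```

Because the update depends only on the message text and the previous score, the same seed and the same messages always produce the same rapport trajectory.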
## Calibration

- If the base LLM scores above 0.55 on `single_issue`, the opponent is too easy: reduce the cooperative concession rate.
- If the base LLM scores below 0.15 on `single_issue`, the opponent is too hard: increase the cooperative concession rate.
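
The calibration rule can be written as a small check to run after a batch of baseline episodes. This is a hedged sketch: `calibration_advice` is a hypothetical helper, and averaging the scores over several seeds is an assumption, since the rule above does not say how many runs the thresholds refer to.

```python
TOO_EASY_ABOVE = 0.55  # base LLM scoring above this: opponent too easy
TOO_HARD_BELOW = 0.15  # base LLM scoring below this: opponent too hard


def calibration_advice(scores: list) -> str:
    """Map single_issue baseline scores to a tuning recommendation."""
    if not scores:
        raise ValueError("need at least one score")
    mean_score = sum(scores) / len(scores)  # assumption: judge by the mean
    if mean_score > TOO_EASY_ABOVE:
        return "reduce cooperative concession rate"
    if mean_score < TOO_HARD_BELOW:
        return "increase cooperative concession rate"
    return "keep current concession rate"


# Example with made-up scores, for illustration only
print(calibration_advice([0.30, 0.40]))
```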