---
title: ProcureRL Environment
emoji: π€
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
  - openenv
  - negotiation
  - procurement
  - rl
  - real-world
---

# ProcureRL: Procurement Negotiation RL Environment

An OpenEnv-compliant RL environment where an LLM agent learns to negotiate procurement deals against scripted supplier opponents with language-sensitive behavior.

## The Key Innovation: Language-Sensitive Opponent

The opponent's concession rate is directly affected by the **quality of the agent's natural language**:

- **Collaborative language** ("let's work together", "mutual benefit") → rapport increases → opponent concedes more
- **Neutral language** → opponent concedes at baseline rate
- **Aggressive language** ("final offer", "take it or leave it") → rapport drops → opponent hardens

This makes an LLM genuinely necessary: output quality directly affects negotiation outcomes.

## Quick Start

```python
from server.Procure_RL_environment import ProcureRLEnvironment
from models import NegotiationAction

# Start a reproducible single-issue episode
env = ProcureRLEnvironment()
obs = env.reset(task_id="single_issue", seed=42)
print(f"Supplier: {obs.supplier_message}")
print(f"Offer: {obs.current_offer}")
print(f"Your target: {obs.buyer_constraints}")

# Make one collaborative counter-offer
action = NegotiationAction(
    move_type="make_offer",
    terms={"price": 42000},
    message="Let's find a mutually beneficial solution."
)
obs = env.step(action)
print(f"Response: {obs.supplier_message}")
print(f"New offer: {obs.current_offer}")
```

## Web Interface Example

The web interface at `/web` provides a visual playground. Here's how to use it:

### Step 1: Reset the Environment

Click **Reset** to start a new negotiation episode. You can customize the reset by passing JSON:

```json
{"task_id": "single_issue", "seed": 42}
```

**Available tasks:**

- `single_issue`: Price-only negotiation (6 rounds max)
- `multi_issue`: Price + payment terms (8 rounds max)
- `adversarial`: Price + payment + support hours (10 rounds max)

### Step 2: Make an Offer

Fill in the form fields:

| Field | Example Value | Notes |
|-------|--------------|-------|
| `move_type` | `make_offer` | Options: `make_offer`, `accept`, `reject`, `bundle` |
| `terms` | `{"price": 42000}` | JSON object with negotiation terms |
| `message` | `I value our partnership and believe we can find a fair solution.` | Your natural language message (affects opponent rapport!) |

**Example: Making a collaborative offer**

```
move_type: make_offer
terms: {"price": 45000}
message: We appreciate your flexibility and would like to work together to find a solution that benefits both parties.
```

### Step 3: Read the Response

After clicking **Step**, you'll see:

- `supplier_message`: The opponent's natural language response
- `current_offer`: Updated terms on the table
- `rapport_hint`: "positive", "neutral", or "negative" based on your language
- `round_number`: Current round (0-indexed)

### Step 4: Continue or Accept

- **Make another offer** to continue negotiating
- **Use `accept`** when you're satisfied with the current terms
- **Use `reject`** only if you want to walk away (no reward)

**Example: Accepting current terms**

```
move_type: accept
terms: {}
message:
```

### Multi-Issue Negotiation (Tasks 2 & 3)

For `multi_issue` and `adversarial`, include multiple terms:

```json
{
  "move_type": "make_offer",
  "terms": {
    "price": 44000,
    "payment_days": 30
  },
  "message": "We can offer faster payment terms if that helps your cash flow."
}
```

**Key insight:** In `multi_issue`, the opponent cares more about payment timing than price. Offering Net-30 payment can get you a better price!

### Example Full Episode

**Round 0 (Reset):**

- Task: `single_issue`
- Supplier opens: ~$52,000
- Your target: $36,000

**Round 1:**

- `move_type`: `make_offer`
- `terms`: `{"price": 48000}`
- `message`: `We value your partnership and want to find a fair price for both parties.`

**Round 2:**

- Supplier counter-offers at ~$46,000 (rapport is positive!)
- `move_type`: `make_offer`
- `terms`: `{"price": 45000}`
- `message`: `I appreciate your movement. Let's see if we can get to $45,000.`

**Round 3:**

- Supplier accepts or counter-offers near your target
- `move_type`: `accept`
- `terms`: `{}`
- Final score: based on how close the deal is to your target and how few rounds it took

## The Three Tasks

### 1. `single_issue` (Easy)

Renew a software license. Price only.

- Buyer target: $36,000, Budget: $53,000
- Seller opens: ~$52,000 (varies by seed)
- Opponent persona: Cooperative
- Max rounds: 6

**Scoring:** Deal quality (how close to target) × Efficiency (how few rounds)

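The product form of this score can be illustrated with a minimal sketch. The exact formula lives in `graders.py` and is not reproduced here; the linear quality term, the linear efficiency decay down to 0.5, and the function name `single_issue_score` are all assumptions made for illustration.

```python
def single_issue_score(final_price, target, worst, rounds_used, max_rounds):
    """Illustrative score: deal quality times efficiency, each clamped to [0, 1]."""
    # Deal quality: 1.0 at the buyer's target, 0.0 at the worst acceptable price.
    quality = (worst - final_price) / (worst - target)
    quality = max(0.0, min(1.0, quality))
    # Efficiency (assumed shape): 1.0 for a round-1 deal, 0.5 at the round limit.
    span = max(1, max_rounds - 1)
    efficiency = 1.0 - 0.5 * (rounds_used - 1) / span
    efficiency = max(0.0, min(1.0, efficiency))
    return quality * efficiency


# Target, worst-case price, and round limit taken from the single_issue task.
score = single_issue_score(45000, target=36000, worst=55000,
                           rounds_used=3, max_rounds=6)
print(round(score, 3))
```

Under these assumptions, a deal at the target price in round 1 scores 1.0, and any deal at or beyond the worst-case price scores 0.0.
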
### 2. `multi_issue` (Medium)

Enterprise software deal. Price + payment terms.

- Buyer weights: price 70%, payment 30%
- Seller persona: Cash Flow Stressed (cares more about payment timing)
- **Trade opportunity**: offer Net-30 payment to get lower price
- Max rounds: 8

**Scoring:** Weighted combination of price improvement + payment terms

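A weighted combination like this can be sketched as follows. The 70/30 weights come from the task description; the constraint values in `CONSTRAINTS` and the helper names are hypothetical, and the real grader may normalize terms differently.

```python
def normalized_gain(value, target, worst):
    """Map a term onto [0, 1]: 1.0 at the buyer's target, 0.0 at the worst case."""
    if target == worst:
        return 0.0
    gain = (worst - value) / (worst - target)
    return max(0.0, min(1.0, gain))


def multi_issue_value(terms, constraints, weights):
    """Weighted sum of per-issue gains."""
    return sum(
        weights[key] * normalized_gain(
            terms[key], constraints[key]["target"], constraints[key]["worst"]
        )
        for key in weights
    )


WEIGHTS = {"price": 0.7, "payment_days": 0.3}  # buyer weights from the task
# Hypothetical constraints: the buyer prefers a low price and long payment terms.
CONSTRAINTS = {
    "price": {"target": 40000, "worst": 50000},
    "payment_days": {"target": 60, "worst": 30},
}

# Conceding Net-30 (worst case on payment) in exchange for a lower price.
value = multi_issue_value({"price": 44000, "payment_days": 30}, CONSTRAINTS, WEIGHTS)
print(round(value, 2))
```

Because price carries 70% of the weight, giving up payment days entirely can still be worthwhile if it buys a meaningful price reduction.
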
### 3. `adversarial` (Hard)

Large contract negotiation. Price + payment + support hours.

- Opponent persona: Aggressive Anchor
- Opens at ceiling on all issues
- Hardens position if you make 2+ consecutive concessions
- Requires consistent collaborative framing
- Survival floor: any deal scores at least 0.15
- Max rounds: 10

**Scoring:** Multi-dimensional value minus pattern penalty for consecutive concessions

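The interaction between the pattern penalty and the survival floor can be sketched as below. The 0.15 floor and the "2+ consecutive concessions" trigger come from the task description; the penalty size `penalty_per_step`, the price-only definition of a concession, and both function names are assumptions.

```python
def longest_concession_streak(buyer_prices):
    """Longest run of consecutive rounds in which the buyer raised their price."""
    longest = run = 0
    for previous, current in zip(buyer_prices, buyer_prices[1:]):
        run = run + 1 if current > previous else 0
        longest = max(longest, run)
    return longest


def adversarial_score(value, buyer_prices, deal_reached,
                      penalty_per_step=0.05, floor=0.15):
    """Value minus a streak penalty; any closed deal scores at least `floor`."""
    if not deal_reached:
        return 0.0  # walking away earns no reward
    streak = longest_concession_streak(buyer_prices)
    # Only streaks of two or more concessions are penalized.
    penalty = penalty_per_step * max(0, streak - 1)
    return max(floor, value - penalty)


offers = [40000, 42000, 44000, 44000, 45000]  # two concessions, a hold, one more
print(adversarial_score(0.5, offers, deal_reached=True))
```
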
## Action Space

```python
NegotiationAction(
    move_type="make_offer",  # make_offer | accept | reject | bundle
    terms={"price": 44000, "payment_days": 45, "support_hours": 120},
    message="We appreciate your flexibility on this."
)
```

| move_type | Description |
|-----------|-------------|
| `make_offer` | Propose terms (price required, others optional) |
| `accept` | Accept current offer on table |
| `reject` | Walk away (only use at final round) |
| `bundle` | Alias for make_offer with multi-issue terms |

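An agent can check an action against the rules in this table before sending it. The following is a hypothetical client-side helper, not part of the package; it assumes `bundle` needs a price just as `make_offer` does, since it is described as an alias.

```python
VALID_MOVES = {"make_offer", "accept", "reject", "bundle"}


def validate_action(move_type, terms):
    """Return a list of problems; an empty list means the action looks well-formed."""
    problems = []
    if move_type not in VALID_MOVES:
        problems.append(f"unknown move_type: {move_type!r}")
    if move_type in {"make_offer", "bundle"}:
        price = terms.get("price")
        # bool is a subclass of int, so exclude it explicitly.
        is_number = isinstance(price, (int, float)) and not isinstance(price, bool)
        if not is_number or price <= 0:
            problems.append("offers need a positive numeric 'price'")
    return problems


print(validate_action("make_offer", {"price": 44000, "payment_days": 45}))
print(validate_action("make_offer", {"payment_days": 45}))
```
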
## Observation Space

```python
NegotiationObservation(
    task_id="single_issue",
    round_number=2,
    max_rounds=6,
    supplier_message="I appreciate your offer. Based on our costs...",
    current_offer={"price": 46000},
    last_4_exchanges=[...],
    buyer_constraints={"price": {"target": 36000, "worst": 55000, "budget": 53000}},
    rapport_hint="positive",  # positive | neutral | negative
    done=False
)
```

## Running the Server

```bash
# Build Docker image
docker build -t procure-rl -f server/Dockerfile .

# Run container (port 7860 - required for HF Spaces)
docker run -p 7860:7860 procure-rl

# Access web interface at http://localhost:7860/web
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/metadata` | GET | Environment metadata |
| `/reset` | POST | Reset environment |
| `/step` | POST | Execute action |
| `/state` | GET | Get current state |
| `/ws` | WS | WebSocket for persistent sessions |

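These endpoints can be called from Python with only the standard library. The sketch below builds the requests without sending them. The JSON body shapes are assumptions based on the reset and action examples earlier in this README (the server may expect a different envelope), and `build_post` and `send` are hypothetical helpers.

```python
import json
from urllib import request

BASE_URL = "http://localhost:7860"  # port from the docker run command


def build_post(path, payload):
    """Build (but do not send) a JSON POST request for an endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req, timeout=10):
    """Send a prepared request and decode the JSON response."""
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Assumed payload shapes; check /metadata or the server code for the real schema.
reset_req = build_post("/reset", {"task_id": "single_issue", "seed": 42})
step_req = build_post("/step", {
    "move_type": "make_offer",
    "terms": {"price": 42000},
    "message": "Let's find a mutually beneficial solution.",
})
```

With the container running, `send(reset_req)` followed by `send(step_req)` would perform one reset and one step. For multi-step episodes the README lists `/ws` for persistent sessions.
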
## Baseline Inference

Run inference against all three tasks:

```bash
cp .env.example .env
# Edit .env and add your HF_TOKEN, or pass the token inline:
HF_TOKEN=your_token python inference.py
```

Output format (exact):

```
[START] task=single_issue env=procure-rl model=Qwen/Qwen2.5-72B-Instruct
[STEP] step=1 action=make_offer({"price": 42000}) reward=0.00 done=false error=null
[STEP] step=2 action=make_offer({"price": 41000}) reward=0.52 done=true error=null
[END] success=true steps=2 score=0.52 rewards=0.00,0.52
```

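Because the format is fixed, the `[STEP]` lines can be parsed with a regular expression when post-processing logs. This is a minimal sketch inferred from the sample output above; `parse_step` is a hypothetical helper, and the pattern assumes rewards are always printed with a decimal point.

```python
import re

STEP_RE = re.compile(
    r"^\[STEP\] step=(?P<step>\d+) action=(?P<action>.+) "
    r"reward=(?P<reward>-?\d+\.\d+) done=(?P<done>true|false) error=(?P<error>.+)$"
)


def parse_step(line):
    """Parse one [STEP] line into a dict, or return None if it does not match."""
    match = STEP_RE.match(line.strip())
    if match is None:
        return None
    error = match.group("error")
    return {
        "step": int(match.group("step")),
        "action": match.group("action"),
        "reward": float(match.group("reward")),
        "done": match.group("done") == "true",
        "error": None if error == "null" else error,
    }


line = '[STEP] step=2 action=make_offer({"price": 41000}) reward=0.52 done=true error=null'
parsed = parse_step(line)
print(parsed)
```
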
## Environment Design

### Rapport System

The opponent maintains a rapport score (0.0 to 1.0) updated per-round:

```python
COLLABORATIVE_SIGNALS = ["understand", "partnership", "mutual", "together", ...]
AGGRESSIVE_SIGNALS = ["demand", "require", "final offer", "unacceptable", ...]

# Pseudocode for the per-round update:
#   delta += 0.08 for each collaborative signal detected
#   delta -= 0.08 for each aggressive signal detected
delta = max(-0.20, min(0.20, delta))  # cap per round
```

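A runnable version of this update might look like the sketch below. The signal lists are only the entries shown above (the full lists are elided in this README), and case-insensitive substring matching is an assumption about how signals are detected in `opponent.py`.

```python
# Only the signals listed in this README; the real lists are longer.
COLLABORATIVE_SIGNALS = ["understand", "partnership", "mutual", "together"]
AGGRESSIVE_SIGNALS = ["demand", "require", "final offer", "unacceptable"]


def rapport_delta(message, step=0.08, cap=0.20):
    """Per-round rapport change: +step or -step per signal, capped at +/-cap."""
    text = message.lower()
    positive = sum(signal in text for signal in COLLABORATIVE_SIGNALS)
    negative = sum(signal in text for signal in AGGRESSIVE_SIGNALS)
    delta = step * (positive - negative)
    return max(-cap, min(cap, delta))


def update_rapport(rapport, message):
    """Apply the capped delta and keep rapport inside [0.0, 1.0]."""
    return max(0.0, min(1.0, rapport + rapport_delta(message)))


print(rapport_delta("Let's work together for mutual benefit."))
print(rapport_delta("This is my final offer, take it or leave it."))
```

The per-round cap means a message stuffed with collaborative keywords gains no more than +0.20, so rapport has to be built over several rounds.
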
### Opponent Personas

| Persona | Base Concession | Rapport Modifier | Special Behavior |
|---------|----------------|-------------------|-------------------|
| `cooperative` | 5% | ±50% | Responsive to language |
| `cash_flow_stressed` | 7% | ±50% | Accepts Net-45+, comments on payment |
| `aggressive_anchor` | 4% | ±50% | Hardens after 2+ consecutive concessions |

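One way to read the ±50% rapport modifier is as a linear scaling of the base concession rate. The sketch below assumes exactly that: rapport 0.0 gives -50%, rapport 0.5 the base rate, and rapport 1.0 gives +50%. The actual mapping in `opponent.py` may differ, and `effective_concession` is a hypothetical name.

```python
# Base concession rates from the persona table.
BASE_CONCESSION = {
    "cooperative": 0.05,
    "cash_flow_stressed": 0.07,
    "aggressive_anchor": 0.04,
}


def effective_concession(persona, rapport, modifier=0.5):
    """Scale the base rate linearly by rapport (assumed mapping, see lead-in)."""
    rapport = max(0.0, min(1.0, rapport))
    scale = 1.0 + modifier * (2.0 * rapport - 1.0)
    return BASE_CONCESSION[persona] * scale


for rapport in (0.0, 0.5, 1.0):
    print(rapport, round(effective_concession("cooperative", rapport), 4))
```
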
### Grading

Graders are pure Python, with zero LLM calls. They combine:

- **Value**: how close to buyer's target
- **Efficiency**: penalty for taking too many rounds
- **Pattern penalty** (adversarial only): for consecutive concession behavior

Graders never crash on malformed input; they fall back to worst-case values.

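The worst-case fallback can be illustrated with a small defensive helper. This is a sketch of the pattern rather than the code in `graders.py`; `safe_price` is a hypothetical name, and treating non-positive or non-finite prices as malformed is an assumption.

```python
import math


def safe_price(terms, worst):
    """Extract a usable price from possibly malformed terms, else return `worst`."""
    try:
        price = float(terms["price"])
    except (TypeError, KeyError, ValueError):
        # terms is not a mapping, has no price, or the price is not numeric.
        return float(worst)
    if not math.isfinite(price) or price <= 0:
        return float(worst)
    return price


print(safe_price({"price": "42000"}, worst=55000))
print(safe_price({"price": "cheap"}, worst=55000))
print(safe_price(None, worst=55000))
```
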
## Project Structure

```
Procure_RL/
├── __init__.py                    # Package exports
├── client.py                      # EnvClient wrapper
├── models.py                      # NegotiationAction, NegotiationObservation, NegotiationState
├── opponent.py                    # ScriptedPersonaOpponent with 3 personas + rapport
├── graders.py                     # grade_single_issue, grade_multi_issue, grade_adversarial
├── inference.py                   # Baseline agent with [START][STEP][END] output
├── server/
│   ├── __init__.py
│   ├── app.py                     # FastAPI app
│   ├── Procure_RL_environment.py  # ProcureRLEnvironment
│   ├── requirements.txt
│   └── Dockerfile
├── openenv.yaml                   # OpenEnv manifest
├── pyproject.toml
├── plan.md                        # Full design specification
└── README.md                      # This file
```

## Why This Environment?

**Market validation**: Walmart deployed Pactum for AI negotiation. 90% of CPOs adopting AI negotiation in 2025.

**Research gap**: Zero negotiation environments in OpenEnv hub.

**LLM advantage**: Language quality directly affects opponent rapport, so the language IS the policy.

**Reproducibility**: Deterministic scripted opponent, pure Python graders, no LLM in environment loop.

## Calibration

If the base LLM scores above 0.55 on `single_issue`, the opponent is too easy: reduce the cooperative concession rate.

If the base LLM scores below 0.15 on `single_issue`, the opponent is too hard: increase the cooperative concession rate.

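The calibration rule can be written as a small check to run after a baseline evaluation. The thresholds come from the text above; `calibration_advice` is a hypothetical helper and not part of the package.

```python
def calibration_advice(mean_score, low=0.15, high=0.55):
    """Suggest how to adjust the cooperative persona's concession rate."""
    if mean_score > high:
        return "decrease"  # opponent too easy
    if mean_score < low:
        return "increase"  # opponent too hard
    return "keep"


# 0.52 is the score shown in the sample inference output.
print(calibration_advice(0.52))
```
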