	---
	title: ProcureRL Environment
	emoji: 🤝
	colorFrom: green
	colorTo: blue
	sdk: docker
	pinned: false
	app_port: 7860
	base_path: /web
	tags:
	- openenv
	- negotiation
	- procurement
	- rl
	- real-world
	---

	# ProcureRL: Procurement Negotiation RL Environment

	An OpenEnv-compliant RL environment where an LLM agent learns to negotiate procurement deals against scripted supplier opponents with language-sensitive behavior.

	## The Key Innovation: Language-Sensitive Opponent

	The opponent's concession rate is directly affected by the quality of the agent's natural language:

	- Collaborative language ("let's work together", "mutual benefit") → increases rapport → opponent concedes more
	- Neutral language → opponent concedes at baseline rate
	- Aggressive language ("final offer", "take it or leave it") → rapport drops → opponent hardens

	This makes an LLM genuinely necessary: the quality of its output directly affects negotiation outcomes.

	## Quick Start

	```python
	from server.Procure_RL_environment import ProcureRLEnvironment
	from models import NegotiationAction

	env = ProcureRLEnvironment()
	obs = env.reset(task_id="single_issue", seed=42)

	print(f"Supplier: {obs.supplier_message}")
	print(f"Offer: {obs.current_offer}")
	print(f"Your target: {obs.buyer_constraints}")

	action = NegotiationAction(
	    move_type="make_offer",
	    terms={"price": 42000},
	    message="Let's find a mutually beneficial solution.",
	)
	obs = env.step(action)
	print(f"Response: {obs.supplier_message}")
	print(f"New offer: {obs.current_offer}")
	```

	## Web Interface Example

	The web interface at `/web` provides a visual playground. Here's how to use it:

	### Step 1: Reset the Environment

	Click Reset to start a new negotiation episode. You can customize the reset by passing JSON:

	```json
	{"task_id": "single_issue", "seed": 42}
	```

	Available tasks:
	- `single_issue` — Price-only negotiation (6 rounds max)
	- `multi_issue` — Price + payment terms (8 rounds max)
	- `adversarial` — Price + payment + support hours (10 rounds max)

	### Step 2: Make an Offer

	Fill in the form fields:

	| Field | Example Value | Notes |
	|-------|---------------|-------|
	| `move_type` | `make_offer` | Options: `make_offer`, `accept`, `reject`, `bundle` |
	| `terms` | `{"price": 42000}` | JSON object with negotiation terms |
	| `message` | `I value our partnership and believe we can find a fair solution.` | Your natural-language message (affects opponent rapport!) |

	Example: Making a collaborative offer
	```
	move_type: make_offer
	terms: {"price": 45000}
	message: We appreciate your flexibility and would like to work together to find a solution that benefits both parties.
	```

	### Step 3: Read the Response

	After clicking Step, you'll see:
	- `supplier_message` — The opponent's natural language response
	- `current_offer` — Updated terms on the table
	- `rapport_hint` — "positive", "neutral", or "negative" based on your language
	- `round_number` — Current round (0-indexed)

	### Step 4: Continue or Accept

	- Make another offer to continue negotiating
	- Use `accept` when you're satisfied with the current terms
	- Use `reject` only if you want to walk away (no reward)

	Example: Accepting current terms
	```
	move_type: accept
	terms: {}
	message:
	```

	### Multi-Issue Negotiation (Tasks 2 & 3)

	For `multi_issue` and `adversarial`, include multiple terms:

	```json
	{
	  "move_type": "make_offer",
	  "terms": {
	    "price": 44000,
	    "payment_days": 30
	  },
	  "message": "We can offer faster payment terms if that helps your cash flow."
	}
	```

	Key insight: In `multi_issue`, the opponent cares more about payment timing than price. Offering Net-30 payment can get you a better price!

	### Example Full Episode

	Round 0 (Reset):
	- Task: `single_issue`
	- Supplier opens: ~$52,000
	- Your target: $36,000

	Round 1:
	- `move_type`: `make_offer`
	- `terms`: `{"price": 48000}`
	- `message`: `We value your partnership and want to find a fair price for both parties.`

	Round 2:
	- Supplier counter-offers at ~$46,000 (rapport is positive!)
	- `move_type`: `make_offer`
	- `terms`: `{"price": 45000}`
	- `message`: `I appreciate your movement. Let's see if we can get to $45,000.`

	Round 3:
	- Supplier accepts or counter-offers near your target
	- `move_type`: `accept`
	- `terms`: `{}`
	- Final score: based on how close the deal is to your target and how few rounds it took

	## The Three Tasks

	### 1. `single_issue` (Easy)
	Renew software license. Price only.

	- Buyer target: $36,000, Budget: $53,000
	- Seller opens: ~$52,000 (varies by seed)
	- Opponent persona: Cooperative
	- Max rounds: 6

	Scoring: Deal quality (how close to target) × Efficiency (how few rounds)
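
The product above can be sketched as follows. This is a minimal illustration, not the actual `grade_single_issue`: the linear normalization between target and budget, the use of the budget as the worst case, and the linear efficiency discount with a 0.5 floor are all assumptions.

```python
def deal_quality(price: float, target: float = 36_000, worst: float = 53_000) -> float:
    """1.0 at the buyer's target, 0.0 at the worst case (assumed linear in between)."""
    if worst == target:
        return 0.0
    raw = (worst - price) / (worst - target)
    return max(0.0, min(1.0, raw))


def efficiency(rounds_used: int, max_rounds: int = 6, floor: float = 0.5) -> float:
    """Assumed discount: 1.0 for a one-round deal, `floor` when all rounds are used."""
    if max_rounds <= 1:
        return 1.0
    used = max(1, min(rounds_used, max_rounds))
    return 1.0 - (1.0 - floor) * (used - 1) / (max_rounds - 1)


def single_issue_score(price: float, rounds_used: int) -> float:
    """Deal quality multiplied by efficiency, as described in the scoring line."""
    return deal_quality(price) * efficiency(rounds_used)
```

Under these assumptions, closing at the target in one round scores 1.0, and the same price after all six rounds scores 0.5.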

	### 2. `multi_issue` (Medium)
	Enterprise software deal. Price + payment terms.

	- Buyer weights: price 70%, payment 30%
	- Seller persona: Cash Flow Stressed (cares more about payment timing)
	- Trade opportunity: offer Net-30 payment to get lower price
	- Max rounds: 8

	Scoring: Weighted combination of price improvement + payment terms
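
A weighted combination with the 70/30 buyer weights might look like the sketch below. The per-issue utilities (0.0 = worst for the buyer, 1.0 = best) and the example values are hypothetical; how `grade_multi_issue` actually normalizes each issue is not specified here.

```python
# Buyer weights from the task description above.
BUYER_WEIGHTS = {"price": 0.7, "payment_days": 0.3}


def weighted_score(utilities: dict, weights: dict = BUYER_WEIGHTS) -> float:
    """Sum of weight * utility, with each utility clamped to [0, 1].

    Missing issues count as 0.0 (worst case for the buyer).
    """
    total = 0.0
    for issue, weight in weights.items():
        utility = max(0.0, min(1.0, float(utilities.get(issue, 0.0))))
        total += weight * utility
    return total


# Hypothetical deals: holding out on payment terms vs. trading them for price.
keep_long_terms = weighted_score({"price": 0.4, "payment_days": 1.0})
trade_for_price = weighted_score({"price": 0.9, "payment_days": 0.0})
```

With these illustrative numbers the trade wins (about 0.63 versus 0.58), because price carries most of the buyer's weight.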

	### 3. `adversarial` (Hard)
	Large contract negotiation. Price + payment + support hours.

	- Opponent persona: Aggressive Anchor
	- Opens at ceiling on all issues
	- Hardens position if you make 2+ consecutive concessions
	- Requires consistent collaborative framing
	- Survival floor: any deal scores at least 0.15
	- Max rounds: 10

	Scoring: Multi-dimensional value minus pattern penalty for consecutive concessions
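
One way to picture the pattern penalty and the survival floor, as a sketch only: a concession is taken to mean the buyer raising its own price offer, and the penalty size (0.05 per consecutive concession beyond the first) is an invented value. The real `grade_adversarial` may count concessions across all issues and use different constants.

```python
def longest_concession_run(buyer_prices: list) -> int:
    """Length of the longest streak of consecutive price increases by the buyer."""
    longest = run = 0
    for prev, cur in zip(buyer_prices, buyer_prices[1:]):
        run = run + 1 if cur > prev else 0
        longest = max(longest, run)
    return longest


def adversarial_score(value: float, buyer_prices: list, deal_reached: bool,
                      penalty_per_extra: float = 0.05, floor: float = 0.15) -> float:
    """Value minus pattern penalty; any completed deal scores at least `floor`."""
    if not deal_reached:
        return 0.0  # walking away earns no reward
    run = longest_concession_run(buyer_prices)
    penalty = penalty_per_extra * max(0, run - 1)  # only streaks of 2+ are penalized
    return max(floor, value - penalty)
```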

	## Action Space

	```python
	NegotiationAction(
	    move_type="make_offer",  # make_offer | accept | reject | bundle
	    terms={"price": 44000, "payment_days": 45, "support_hours": 120},
	    message="We appreciate your flexibility on this.",
	)
	```

	| move_type | Description |
	|-----------|-------------|
	| `make_offer` | Propose terms (price required, others optional) |
	| `accept` | Accept current offer on table |
	| `reject` | Walk away (only use at final round) |
	| `bundle` | Alias for make_offer with multi-issue terms |
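
The rules in the table can be checked before an action is sent. The sketch below uses a stand-in dataclass, `ActionSketch`, rather than the real `NegotiationAction` from `models.py`; the `validate` helper is hypothetical and only encodes what the table states.

```python
from dataclasses import dataclass, field

VALID_MOVES = {"make_offer", "accept", "reject", "bundle"}


@dataclass
class ActionSketch:
    """Illustrative stand-in for NegotiationAction (not the project's class)."""
    move_type: str
    terms: dict = field(default_factory=dict)
    message: str = ""


def validate(action: ActionSketch) -> list:
    """Return a list of human-readable problems; empty means the action looks valid."""
    errors = []
    if action.move_type not in VALID_MOVES:
        errors.append(f"unknown move_type: {action.move_type!r}")
    # bundle is an alias for make_offer, so both need a price term.
    if action.move_type in {"make_offer", "bundle"} and "price" not in action.terms:
        errors.append("make_offer/bundle requires a 'price' term")
    return errors
```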

	## Observation Space

	```python
	NegotiationObservation(
	    task_id="single_issue",
	    round_number=2,
	    max_rounds=6,
	    supplier_message="I appreciate your offer. Based on our costs...",
	    current_offer={"price": 46000},
	    last_4_exchanges=[...],
	    buyer_constraints={"price": {"target": 36000, "worst": 55000, "budget": 53000}},
	    rapport_hint="positive",  # positive | neutral | negative
	    done=False,
	)
	```

	## Running the Server

	```bash
	# Build Docker image
	docker build -t procure-rl -f server/Dockerfile .

	# Run container (port 7860 - required for HF Spaces)
	docker run -p 7860:7860 procure-rl

	# Access web interface at http://localhost:7860/web
	```

	## API Endpoints

	| Endpoint | Method | Description |
	|----------|--------|-------------|
	| `/health` | GET | Health check |
	| `/metadata` | GET | Environment metadata |
	| `/reset` | POST | Reset environment |
	| `/step` | POST | Execute action |
	| `/state` | GET | Get current state |
	| `/ws` | WS | WebSocket for persistent sessions |
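
The HTTP endpoints can be exercised with the standard library alone. This is a sketch under assumptions: the base URL matches the Docker command above, and the JSON body for `/reset` is taken to be the same object the web interface accepts. The exact request schema (for example, whether `/step` expects the action nested under a key) is not documented here, so check it against the running server.

```python
import json
from typing import Optional
from urllib import request

BASE_URL = "http://localhost:7860"  # port from the Docker instructions


def build_request(endpoint: str, payload: Optional[dict] = None) -> request.Request:
    """GET when no payload is given, otherwise POST with a JSON body."""
    url = f"{BASE_URL}{endpoint}"
    if payload is None:
        return request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return request.Request(url, data=body, headers=headers, method="POST")


def call(endpoint: str, payload: Optional[dict] = None) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with request.urlopen(build_request(endpoint, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Built but not sent here; pass it through call() against a live container.
reset_req = build_request("/reset", {"task_id": "single_issue", "seed": 42})
```

With the container running, `call("/health")` followed by `call("/reset", {"task_id": "single_issue", "seed": 42})` would be a reasonable first smoke test.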

	## Baseline Inference

	Run inference against all three tasks:

	```bash
	cp .env.example .env
	# Edit .env and add your HF_TOKEN
	HF_TOKEN=your_token python inference.py
	```

	Output format (exact):

	```
	[START] task=single_issue env=procure-rl model=Qwen/Qwen2.5-72B-Instruct
	[STEP] step=1 action=make_offer({"price": 42000}) reward=0.00 done=false error=null
	[STEP] step=2 action=make_offer({"price": 41000}) reward=0.52 done=true error=null
	[END] success=true steps=2 score=0.52 rewards=0.00,0.52
	```
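
Because the format is fixed, the `[STEP]` lines are easy to parse for analysis. The regular expression below is written against the sample output above; the `parse_step` helper is illustrative and not part of the repository.

```python
import re

# Mirrors the [STEP] line layout shown in the sample output.
STEP_RE = re.compile(
    r"^\[STEP\] step=(?P<step>\d+) action=(?P<action>.+) "
    r"reward=(?P<reward>-?\d+\.\d+) done=(?P<done>true|false) error=(?P<error>.+)$"
)


def parse_step(line: str) -> dict:
    """Turn one [STEP] log line into a dict with typed fields."""
    match = STEP_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a [STEP] line: {line!r}")
    return {
        "step": int(match["step"]),
        "action": match["action"],
        "reward": float(match["reward"]),
        "done": match["done"] == "true",
        "error": None if match["error"] == "null" else match["error"],
    }


sample = '[STEP] step=2 action=make_offer({"price": 41000}) reward=0.52 done=true error=null'
parsed = parse_step(sample)
```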

	## Environment Design

	### Rapport System

	The opponent maintains a rapport score (0.0 to 1.0) updated per-round:

	```python
	COLLABORATIVE_SIGNALS = ["understand", "partnership", "mutual", "together", ...]
	AGGRESSIVE_SIGNALS = ["demand", "require", "final offer", "unacceptable", ...]

	delta = +0.08 per collaborative signal detected
	delta = -0.08 per aggressive signal detected
	delta = max(-0.20, min(0.20, delta)) # cap per round
	```
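
The pseudocode above could be made runnable roughly as follows. Several details are assumptions: signals are matched as case-insensitive substrings, only the signals listed above are included (the full lists are elided), and the updated rapport is clamped to the stated 0.0 to 1.0 range. The actual logic lives in `opponent.py` and may differ.

```python
# Only the signals shown above; the real lists are longer.
COLLABORATIVE_SIGNALS = ["understand", "partnership", "mutual", "together"]
AGGRESSIVE_SIGNALS = ["demand", "require", "final offer", "unacceptable"]


def rapport_delta(message: str) -> float:
    """+0.08 per collaborative signal, -0.08 per aggressive one, capped at +/-0.20."""
    text = message.lower()
    delta = 0.08 * sum(signal in text for signal in COLLABORATIVE_SIGNALS)
    delta -= 0.08 * sum(signal in text for signal in AGGRESSIVE_SIGNALS)
    return max(-0.20, min(0.20, delta))


def update_rapport(rapport: float, message: str) -> float:
    """Apply the per-round delta and keep rapport inside [0.0, 1.0]."""
    return max(0.0, min(1.0, rapport + rapport_delta(message)))
```

Substring matching is crude (it would also fire on "misunderstand"), which is one reason to treat this as a sketch rather than the real detector.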

	### Opponent Personas

	| Persona | Base Concession | Rapport Modifier | Special Behavior |
	|---------|-----------------|------------------|------------------|
	| `cooperative` | 5% | ±50% | Responsive to language |
	| `cash_flow_stressed` | 7% | ±50% | Accepts Net-45+, comments on payment |
	| `aggressive_anchor` | 4% | ±50% | Hardens after 2+ consecutive concessions |
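
To make the table concrete, here is one possible reading of how rapport scales the base concession rate. The linear mapping (rapport 0.5 leaves the base rate unchanged, 0.0 halves the concession, 1.0 adds 50%) is an assumption, and the hardening behavior of `aggressive_anchor` is not modeled.

```python
# Base concession rates from the persona table.
BASE_CONCESSION = {
    "cooperative": 0.05,
    "cash_flow_stressed": 0.07,
    "aggressive_anchor": 0.04,
}


def effective_concession(persona: str, rapport: float) -> float:
    """Base rate scaled by a rapport modifier in [-50%, +50%] (assumed linear)."""
    clamped = max(0.0, min(1.0, rapport))
    modifier = clamped - 0.5  # 0.0 -> -0.5, 0.5 -> 0.0, 1.0 -> +0.5
    return BASE_CONCESSION[persona] * (1.0 + modifier)
```

Under this reading, a cooperative opponent concedes between 2.5% and 7.5% per round depending on rapport.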

	### Grading

	Graders are pure Python — zero LLM calls. They combine:
	- Value: how close to buyer's target
	- Efficiency: penalty for taking too many rounds
	- Pattern penalty (adversarial only): for consecutive concession behavior

	Graders never crash on malformed input — they fall back to worst-case values.
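
The fallback behavior can be illustrated with a small helper. `safe_price` is a hypothetical name, not a function from `graders.py`; it shows the general pattern of catching malformed input and substituting the worst-case value instead of raising.

```python
import math


def safe_price(terms, worst: float) -> float:
    """Return a usable price, or `worst` if `terms` is missing or malformed."""
    try:
        price = float(terms["price"])
    except (KeyError, TypeError, ValueError):
        # Covers: no 'price' key, terms not a mapping, non-numeric values.
        return worst
    if not math.isfinite(price):  # reject NaN and infinities
        return worst
    return price
```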

	## Project Structure

	```
	Procure_RL/
	├── __init__.py # Package exports
	├── client.py # EnvClient wrapper
	├── models.py # NegotiationAction, NegotiationObservation, NegotiationState
	├── opponent.py # ScriptedPersonaOpponent with 3 personas + rapport
	├── graders.py # grade_single_issue, grade_multi_issue, grade_adversarial
	├── inference.py # Baseline agent with [START][STEP][END] output
	├── server/
	│ ├── __init__.py
	│ ├── app.py # FastAPI app
	│ ├── Procure_RL_environment.py # ProcureRLEnvironment
	│ ├── requirements.txt
	│ └── Dockerfile
	├── openenv.yaml # OpenEnv manifest
	├── pyproject.toml
	├── plan.md # Full design specification
	└── README.md # This file
	```

	## Why This Environment?

	Market validation: Walmart deployed Pactum for AI negotiation. 90% of CPOs adopting AI negotiation in 2025.

	Research gap: Zero negotiation environments in OpenEnv hub.

	LLM advantage: Language quality directly affects opponent rapport — the language IS the policy.

	Reproducibility: Deterministic scripted opponent, pure Python graders, no LLM in environment loop.

	## Calibration

	If a base LLM scores above 0.55 on `single_issue`, the opponent is too easy: reduce the cooperative concession rate.

	If a base LLM scores below 0.15 on `single_issue`, the opponent is too hard: increase the cooperative concession rate.
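
The two thresholds translate directly into a small check that could be run after a batch of baseline episodes. The function name and the averaging over several seeds are illustrative choices, not part of the repository.

```python
def calibration_advice(scores: list, low: float = 0.15, high: float = 0.55) -> str:
    """Compare the mean single_issue score against the calibration band."""
    if not scores:
        raise ValueError("need at least one score")
    mean = sum(scores) / len(scores)
    if mean > high:
        return "too easy: reduce the cooperative concession rate"
    if mean < low:
        return "too hard: increase the cooperative concession rate"
    return "ok: within the calibration band"
```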