procure-rl / Instructions.md
	## Overview

	Build a deterministic OpenEnv environment for real-world procurement negotiation.

	- Must follow the OpenEnv API (`reset`, `step`, `state`)
	- Must include 3 tasks (easy → medium → hard)
	- Must produce deterministic rewards in [0.0, 1.0]
	- Must be fully reproducible and deployable

	---

	## Core Requirements

	### 1. Environment

	Implement in:

	```
	procure_rl/environment.py
	```

	- `reset(task_id, seed)` → initial observation
	- `step(action)` → `(observation, reward, done, info)`
	- `state()` → internal state

	Use typed models from:

	```
	procure_rl/models.py
	```
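To make the three-method contract concrete, here is a minimal sketch of how such an environment could be shaped. The `Observation` and `Action` fields, the class name `ProcureEnv`, and the fixed concession rule are all illustrative assumptions, not the contents of the actual `models.py` or `environment.py`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class Observation:
    """Hypothetical observation model; the real fields belong in models.py."""
    task_id: str
    round: int
    opponent_ask: float
    done: bool


@dataclass
class Action:
    """Hypothetical action model: a price offer plus a free-text message."""
    offer: float
    message: str = ""


class ProcureEnv:
    """Toy buyer-side environment exposing reset / step / state."""

    MAX_ROUNDS = 5       # assumed step limit
    OPENING_ASK = 100.0  # assumed seller opening price
    CONCESSION = 5.0     # assumed fixed concession per rejected offer

    def reset(self, task_id: str, seed: int) -> Observation:
        # The seed is stored so a seeded opponent could be built from it;
        # this toy version has no random component at all.
        self._task_id, self._seed = task_id, seed
        self._round = 0
        self._ask = self.OPENING_ASK
        self._done = False
        return self._observe()

    def step(self, action: Action) -> Tuple[Observation, float, bool, Dict[str, Any]]:
        if self._done:
            raise RuntimeError("episode is over; call reset() first")
        self._round += 1
        accepted = action.offer >= self._ask
        if not accepted:
            self._ask -= self.CONCESSION
        self._done = accepted or self._round >= self.MAX_ROUNDS
        reward = 0.0
        if accepted:
            # A cheaper deal scores higher; clamp to the required [0.0, 1.0] range.
            reward = min(1.0, max(0.0, 1.0 - action.offer / self.OPENING_ASK))
        return self._observe(), reward, self._done, {"accepted": accepted}

    def state(self) -> Dict[str, Any]:
        return {
            "task_id": self._task_id,
            "seed": self._seed,
            "round": self._round,
            "ask": self._ask,
            "done": self._done,
        }

    def _observe(self) -> Observation:
        return Observation(self._task_id, self._round, self._ask, self._done)
```

Returning the 4-tuple from `step` matches the signature listed above; everything else about the dynamics is a placeholder.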

	---

	### 2. Tasks (MANDATORY: 3)

	Defined in:

	```
	procure_rl/environment.py (TASK_CONFIG)
	```

	| Task | Description |
	| -------------- | --------------------------------- |
	| `single_issue` | price-only negotiation |
	| `multi_issue` | price + payment tradeoff |
	| `adversarial` | multi-issue + aggressive opponent |

	Each must:

	- have different difficulty
	- run within step limits
	- produce score ∈ [0,1]
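One possible layout for `TASK_CONFIG` is sketched below. Only the three task IDs come from the table above; the `TaskConfig` fields, the step limits, and the opponent style labels are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TaskConfig:
    """Hypothetical per-task settings; field names are illustrative."""
    issues: Tuple[str, ...]
    max_steps: int
    opponent_style: str


# Task IDs follow the table; all values are placeholder assumptions.
TASK_CONFIG: Dict[str, TaskConfig] = {
    "single_issue": TaskConfig(("price",), max_steps=8, opponent_style="cooperative"),
    "multi_issue": TaskConfig(("price", "payment_terms"), max_steps=10, opponent_style="cooperative"),
    "adversarial": TaskConfig(("price", "payment_terms"), max_steps=10, opponent_style="aggressive"),
}


def get_task(task_id: str) -> TaskConfig:
    """Look up a task and fail loudly on an unknown ID."""
    try:
        return TASK_CONFIG[task_id]
    except KeyError:
        known = ", ".join(sorted(TASK_CONFIG))
        raise ValueError(f"unknown task_id {task_id!r}; expected one of: {known}") from None
```

Keeping the configs frozen prevents an episode from mutating shared task settings, which helps reproducibility.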

	---

	### 3. Opponent (CRITICAL)

	Implemented in:

	```
	procure_rl/opponent.py
	```

	Requirements:

	- deterministic (seeded RNG)
	- no LLM usage
	- language-sensitive behavior (via keyword detection)

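	👉 This is what makes an LLM agent useful without breaking reproducibility: the opponent reacts to the wording of the agent's messages, but its behavior stays a deterministic function of the seed and the messages it receives.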
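A minimal sketch of a seeded, keyword-sensitive opponent follows. The keyword lists, the concession sizes, and the class name `ScriptedOpponent` are assumptions for illustration; the real `opponent.py` may use entirely different rules.

```python
import random

# Hypothetical keyword lists; the real vocabulary is a design decision.
FIRM_KEYWORDS = ("final offer", "walk away", "competitor")
POLITE_KEYWORDS = ("partnership", "long-term", "appreciate")


class ScriptedOpponent:
    """Seller that concedes a seeded amount, scaled by the buyer's wording."""

    def __init__(self, seed: int, start_price: float = 100.0, floor_price: float = 70.0):
        # A private Random instance keeps the sequence independent of
        # any other code that touches the global random module.
        self._rng = random.Random(seed)
        self.price = start_price
        self.floor = floor_price

    def respond(self, message: str) -> float:
        text = message.lower()
        concession = self._rng.uniform(1.0, 3.0)
        if any(keyword in text for keyword in FIRM_KEYWORDS):
            concession *= 2.0   # firm language extracts a larger concession
        elif any(keyword in text for keyword in POLITE_KEYWORDS):
            concession *= 1.5   # relationship language helps somewhat
        self.price = max(self.floor, self.price - concession)
        return round(self.price, 2)
```

The RNG is drawn exactly once per call regardless of which branch fires, so the random stream stays aligned across runs even when the messages differ.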

	---

	### 4. Reward / Graders

	Implemented in:

	```
	procure_rl/graders.py
	```

	Requirements:

	- deterministic
	- bounded [0.0, 1.0]
	- reflect:
	  - deal quality
	  - efficiency (rounds)

	- no randomness, no LLM
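The requirements above could be met by a grader along these lines. The function name `grade_deal`, its parameters, and the 80/20 weighting between deal quality and efficiency are assumptions, not the formula in the actual `graders.py`.

```python
from typing import Optional


def _clamp01(value: float) -> float:
    return max(0.0, min(1.0, value))


def grade_deal(
    agreed_price: Optional[float],
    list_price: float,
    target_price: float,
    rounds_used: int,
    max_rounds: int,
    efficiency_weight: float = 0.2,  # assumed weighting
) -> float:
    """Score a finished episode in [0.0, 1.0]; no deal scores 0.0."""
    if list_price <= target_price or max_rounds <= 0:
        raise ValueError("need list_price > target_price and max_rounds > 0")
    if agreed_price is None:
        return 0.0
    # Deal quality: share of the possible saving the buyer captured.
    quality = _clamp01((list_price - agreed_price) / (list_price - target_price))
    # Efficiency: fewer rounds used means a higher score.
    efficiency = _clamp01(1.0 - rounds_used / max_rounds)
    return _clamp01((1.0 - efficiency_weight) * quality + efficiency_weight * efficiency)
```

Because the function is pure (no RNG, no I/O), the same episode outcome always yields the same score.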

	---

	### 5. API Server

	Implemented in:

	```
	server/app.py
	```

	Endpoints:

	- `/reset`
	- `/step`
	- `/state`
	- `/health`

	Every endpoint must return valid JSON with HTTP 200.

	---

	### 6. OpenEnv Spec

	File:

	```
	openenv.yaml
	```

	Must define:

	- environment name
	- tasks (3+)
	- reward range
	- action/observation description

	Validate with:

	```
	openenv validate
	```

	---

	### 7. Inference Script (MANDATORY)

	File:

	```
	inference.py
	```

	Requirements:

	- uses OpenAI client
	- reads:
	  - `API_BASE_URL`
	  - `MODEL_NAME`
	  - `HF_TOKEN`

	- interacts with env via loop
	- prints EXACT format:

	```
	[START] ...
	[STEP] ...
	[END] ...
	```

	⚠️ Any formatting deviation → failure
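As a sketch of the configuration step only, the helper below reads the three required variables and fails fast when one is missing. The names `InferenceConfig` and `load_config` are hypothetical, and the log-line format is deliberately not reproduced here because the spec elides it.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


@dataclass(frozen=True)
class InferenceConfig:
    """Hypothetical container for the three required settings."""
    api_base_url: str
    model_name: str
    hf_token: str


def load_config(env: Mapping[str, str] = os.environ) -> InferenceConfig:
    """Read the required variables, raising if any is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return InferenceConfig(
        api_base_url=env["API_BASE_URL"],
        model_name=env["MODEL_NAME"],
        hf_token=env["HF_TOKEN"],
    )

# Assumption: the client would then be built roughly as
#   OpenAI(base_url=cfg.api_base_url, api_key=cfg.hf_token)
# with cfg.model_name passed per request; check this against your provider.
```

Failing before the first episode starts avoids a partial run that emits some log lines and then crashes.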

	---

	### 8. Docker + Deployment

	File:

	```
	Dockerfile
	```

	Must:

	- build successfully
	- expose port `7860`
	- run FastAPI server

	Test:

	```
	docker build -t procure-rl .
	docker run -p 7860:7860 procure-rl
	```

	---

	### 9. Hugging Face Space

	Must:

	- deploy successfully
	- respond to `/reset` with HTTP 200
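A small standard-library smoke check for a deployed Space might look like the sketch below. The helper name `check_endpoint` is hypothetical, and sending `/reset` as a POST with a JSON body containing `task_id` and `seed` is an assumption about the server's request shape.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional


def check_endpoint(
    base_url: str,
    path: str,
    payload: Optional[Dict[str, Any]] = None,
    timeout: float = 10.0,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> Any:
    """Call an endpoint and return its parsed JSON body.

    Sends GET when payload is None, otherwise POST with a JSON body.
    Raises if the status is not 200 or the body is not valid JSON.
    """
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET" if data is None else "POST",
    )
    with opener(request, timeout=timeout) as response:
        if response.status != 200:
            raise RuntimeError(f"{path} returned HTTP {response.status}")
        return json.loads(response.read().decode("utf-8"))
```

For example, `check_endpoint(space_url, "/reset", {"task_id": "single_issue", "seed": 0})` would exercise the requirement above, where `space_url` is your Space's base URL. The `opener` parameter exists only so the function can be tested without a network.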

	---

	### 10. README

	Must include:

	- environment description
	- action & observation formats
	- task descriptions
	- setup instructions
	- baseline scores

	---

	## Validation Checklist (ALL REQUIRED)

	Run before submission:

	```
	openenv validate
	docker build .
	python inference.py
	```

	Ensure:

	- all 3 tasks run
	- scores ∈ [0,1]
	- runtime < 20 minutes
	- no crashes
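The reproducibility and score-range items can also be checked programmatically. The sketch below assumes an environment object with the `reset(task_id, seed)` and `step(action)` signatures described earlier; the helper names `rollout_rewards` and `check_reproducible` are hypothetical.

```python
from typing import Any, Callable, List, Sequence


def rollout_rewards(
    make_env: Callable[[], Any], task_id: str, seed: int, actions: Sequence[Any]
) -> List[float]:
    """Run one episode with a fixed action sequence and collect rewards."""
    env = make_env()
    env.reset(task_id, seed)
    rewards: List[float] = []
    for action in actions:
        _obs, reward, done, _info = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards


def check_reproducible(
    make_env: Callable[[], Any], task_id: str, seed: int, actions: Sequence[Any]
) -> List[float]:
    """Verify two identical runs match and every reward lies in [0.0, 1.0]."""
    first = rollout_rewards(make_env, task_id, seed, actions)
    second = rollout_rewards(make_env, task_id, seed, actions)
    if first != second:
        raise AssertionError(f"non-deterministic rewards: {first} != {second}")
    for reward in first:
        if not 0.0 <= reward <= 1.0:
            raise AssertionError(f"reward out of range: {reward}")
    return first
```

Running this once per task ID with a fixed action script gives a quick regression check before the slower Docker and inference runs.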

	---

	## Constraints

	- No LLM inside environment
	- No randomness without seed
	- Must run on:
	  - 2 vCPU
	  - 8 GB RAM

	---

	## Key Design Principle

	> The LLM is used for decision-making, not environment logic.

	- Environment = deterministic
	- Agent (LLM) = intelligent

	---

	## File Reference Summary

	```
	procure_rl/
	    models.py        # dataclasses
	    environment.py   # core logic
	    opponent.py      # scripted opponent
	    graders.py       # reward functions

	server/
	    app.py           # API

	inference.py         # baseline agent
	openenv.yaml         # spec
	Dockerfile           # deployment
	README.md            # docs
	```

	---

	## Final Rule

	If any of these fail:

	- Docker build
	- openenv validate
	- inference script

	👉 Submission is disqualified

	---

	## One-line Goal

	> Build a deterministic, real-world negotiation environment where an LLM agent must make sequential decisions to maximize reward.

	---