## Overview
Build a **deterministic OpenEnv environment** for real-world procurement negotiation.
- Must follow OpenEnv API (`reset / step / state`)
- Must include **3 tasks (easy → medium → hard)**
- Must produce **deterministic rewards in [0.0, 1.0]**
- Must be **fully reproducible and deployable**
---
## Core Requirements
### 1. Environment
Implement in:
```
procure_rl/environment.py
```
- `reset(task_id, seed)` → initial observation
- `step(action)` → `(observation, reward, done, info)`
- `state()` → internal state
Use typed models from:
```
procure_rl/models.py
```
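The three-method surface above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Observation` and `Action` fields, the `ProcureEnv` class name, the step limit, and the acceptance rule are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class Observation:
    """Hypothetical observation model; the real fields belong in models.py."""
    task_id: str
    round_no: int
    opponent_price: float


@dataclass
class Action:
    """Hypothetical action model: a price offer plus a free-text message."""
    price: float
    message: str = ""


class ProcureEnv:
    """Toy environment exposing the reset / step / state surface."""

    MAX_ROUNDS = 5  # assumed step limit

    def reset(self, task_id: str, seed: int) -> Observation:
        # All randomness flows from the seed, so equal seeds give equal episodes.
        self._rng = random.Random(seed)
        self._task_id = task_id
        self._round = 0
        self._done = False
        self._ask = round(100.0 + self._rng.uniform(0.0, 20.0), 2)
        return Observation(task_id, self._round, self._ask)

    def step(self, action: Action) -> Tuple[Observation, float, bool, Dict[str, Any]]:
        if self._done:
            raise RuntimeError("episode finished; call reset() first")
        self._round += 1
        # Toy rule: the supplier accepts any offer within 10% of its ask.
        accepted = action.price >= 0.9 * self._ask
        if not accepted:
            self._ask = round(self._ask * 0.97, 2)  # fixed 3% concession
        self._done = accepted or self._round >= self.MAX_ROUNDS
        # Earlier deals score higher; the value always stays in [0.0, 1.0].
        reward = 1.0 - (self._round - 1) / self.MAX_ROUNDS if accepted else 0.0
        obs = Observation(self._task_id, self._round, self._ask)
        return obs, reward, self._done, {"accepted": accepted}

    def state(self) -> Dict[str, Any]:
        return {
            "task_id": self._task_id,
            "round": self._round,
            "ask": self._ask,
            "done": self._done,
        }
```

Resetting two instances with the same `task_id` and `seed` yields identical initial observations, which is the property the reproducibility requirement depends on.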
---
### 2. Tasks (MANDATORY: 3)
Defined in:
```
procure_rl/environment.py (TASK_CONFIG)
```
| Task | Description |
| ------------ | --------------------------------- |
| single_issue | price-only negotiation |
| multi_issue | price + payment tradeoff |
| adversarial | multi-issue + aggressive opponent |
Each must:
- have different difficulty
- run within step limits
- produce score ∈ [0,1]
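One way to lay out `TASK_CONFIG` so the three tasks differ in difficulty is sketched below. Only the task names come from the table above; the keys (`issues`, `max_rounds`, `opponent_style`), their values, and the `validate_task_config` helper are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical configuration; only the three task names come from the spec.
TASK_CONFIG: Dict[str, Dict[str, Any]] = {
    "single_issue": {
        "issues": ["price"],
        "max_rounds": 8,
        "opponent_style": "cooperative",
    },
    "multi_issue": {
        "issues": ["price", "payment_terms"],
        "max_rounds": 10,
        "opponent_style": "cooperative",
    },
    "adversarial": {
        "issues": ["price", "payment_terms"],
        "max_rounds": 10,
        "opponent_style": "aggressive",
    },
}


def validate_task_config(config: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []
    if len(config) < 3:
        problems.append("at least 3 tasks are required")
    for task_id, task in config.items():
        if not task.get("issues"):
            problems.append(task_id + ": no negotiation issues defined")
        if int(task.get("max_rounds", 0)) <= 0:
            problems.append(task_id + ": max_rounds must be positive")
    return problems
```

Running the check at import time or in a unit test catches a missing task or a missing step limit before the environment is deployed.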
---
### 3. Opponent (CRITICAL)
Implemented in:
```
procure_rl/opponent.py
```
Requirements:
- deterministic (seeded RNG)
- no LLM usage
- **language-sensitive behavior** (via keyword detection)
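👉 This is what makes an LLM agent useful without breaking reproducibility: the opponent reacts to the wording of the agent's messages, yet the same seed and the same messages always produce the same responses.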
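A scripted opponent that combines a seeded RNG with keyword detection might look like the sketch below. The keyword lists, concession sizes, price parameters, and the `ScriptedOpponent` name are illustrative assumptions, not the contents of `opponent.py`.

```python
import random

# Hypothetical keyword lists; a real opponent would tune these per task.
CONCILIATORY = ("long-term", "partnership", "volume")
HOSTILE = ("final offer", "ultimatum", "walk away")


class ScriptedOpponent:
    """Supplier whose concessions depend on the wording of the buyer's message."""

    def __init__(self, seed: int, opening_price: float = 100.0,
                 floor_price: float = 70.0) -> None:
        self._rng = random.Random(seed)  # private RNG, never the global one
        self.price = opening_price
        self.floor = floor_price

    def respond(self, message: str) -> float:
        text = message.lower()
        # Base concession between 2% and 3%, drawn from the seeded RNG.
        concession = 0.02 + self._rng.uniform(0.0, 0.01)
        if any(keyword in text for keyword in CONCILIATORY):
            concession += 0.03  # cooperative language earns a larger concession
        if any(keyword in text for keyword in HOSTILE):
            concession = max(0.0, concession - 0.02)  # threats earn less
        self.price = max(self.floor, round(self.price * (1.0 - concession), 2))
        return self.price
```

Because the only source of variation is the seed and the message text, replaying an episode with the same inputs reproduces the same price path, while different phrasing still changes the outcome.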
---
### 4. Reward / Graders
Implemented in:
```
procure_rl/graders.py
```
Requirements:
- deterministic
- bounded [0.0, 1.0]
- reflect:
  - deal quality
  - efficiency (rounds)
- no randomness, no LLM
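A grader meeting these requirements can be sketched as a pure function of the episode outcome. The parameter names, the 0.8 / 0.2 weighting, and the linear scoring are assumptions chosen for illustration.

```python
def clamp01(value: float) -> float:
    """Clamp a number into the closed interval [0.0, 1.0]."""
    return max(0.0, min(1.0, value))


def grade(final_price: float, list_price: float, target_price: float,
          rounds_used: int, max_rounds: int, deal_reached: bool,
          quality_weight: float = 0.8) -> float:
    """Score an episode from deal quality and efficiency (hypothetical weights)."""
    if not deal_reached:
        return 0.0
    # Deal quality: 0.0 at list price, 1.0 at (or below) the buyer's target.
    span = list_price - target_price
    quality = clamp01((list_price - final_price) / span) if span > 0 else 0.0
    # Efficiency: 1.0 for a first-round deal, shrinking with each extra round.
    efficiency = clamp01(1.0 - (rounds_used - 1) / max_rounds)
    score = quality_weight * quality + (1.0 - quality_weight) * efficiency
    return clamp01(score)
```

The function uses no RNG and no external calls, so the same inputs always give the same score, and the final clamp keeps the result bounded even for out-of-range prices.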
---
### 5. API Server
Implemented in:
```
server/app.py
```
Endpoints:
- `/reset`
- `/step`
- `/state`
- `/health`
Every endpoint must return valid JSON with HTTP 200.
---
### 6. OpenEnv Spec
File:
```
openenv.yaml
```
Must define:
- environment name
- tasks (3+)
- reward range
- action/observation description
Validate with:
```
openenv validate
```
---
### 7. Inference Script (MANDATORY)
File:
```
inference.py
```
Requirements:
- uses OpenAI client
- reads:
  - `API_BASE_URL`
  - `MODEL_NAME`
  - `HF_TOKEN`
- interacts with env via loop
- prints EXACT format:
```
[START] ...
[STEP] ...
[END] ...
```
⚠️ Any formatting deviation → failure
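Reading the three required variables can be sketched as below. The `load_settings` helper and its fail-fast behavior are assumptions; the exact `[START]` / `[STEP]` / `[END]` line contents are not specified here, so the sketch leaves them out.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the required variables, failing early if any is missing or empty.

    `env` defaults to os.environ; passing a plain dict makes this testable.
    """
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: source[name] for name in REQUIRED_VARS}
```

The returned values could then be handed to the OpenAI client, for example `OpenAI(base_url=settings["API_BASE_URL"], api_key=settings["HF_TOKEN"])`. Using `HF_TOKEN` as the API key is an assumption about the deployment.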
---
### 8. Docker + Deployment
File:
```
Dockerfile
```
Must:
- build successfully
- expose port `7860`
- run FastAPI server
Test:
```
docker build -t procure-rl .
docker run -p 7860:7860 procure-rl
```
---
### 9. Hugging Face Space
Must:
- deploy successfully
- respond to `/reset` with HTTP 200
---
### 10. README
Must include:
- environment description
- action & observation formats
- task descriptions
- setup instructions
- baseline scores
---
## Validation Checklist (ALL REQUIRED)
Run before submission:
```
openenv validate
docker build .
python inference.py
```
Ensure:
- all 3 tasks run
- scores ∈ [0,1]
- runtime < 20 minutes
- no crashes
---
## Constraints
- No LLM inside environment
- No randomness without seed
- Must run on:
  - 2 vCPU
  - 8GB RAM
---
## Key Design Principle
> The LLM is used for **decision-making**, not environment logic.
- Environment = deterministic
- Agent (LLM) = intelligent
---
## File Reference Summary
```
procure_rl/
  models.py        # dataclasses
  environment.py   # core logic
  opponent.py      # scripted opponent
  graders.py       # reward functions
server/
  app.py           # API
inference.py       # baseline agent
openenv.yaml       # spec
Dockerfile         # deployment
README.md          # docs
```
---
## Final Rule
If any of these fail:
- Docker build
- openenv validate
- inference script
👉 **Submission is disqualified**
---
## One-line Goal
> Build a deterministic, real-world negotiation environment where an LLM agent must make sequential decisions to maximize reward.
---