## Overview

Build a **deterministic OpenEnv environment** for real-world procurement negotiation.

- Must follow the OpenEnv API (`reset / step / state`)
- Must include **3 tasks (easy → medium → hard)**
- Must produce **deterministic rewards in [0.0, 1.0]**
- Must be **fully reproducible and deployable**

---

## Core Requirements

### 1. Environment

Implement in:

```
procure_rl/environment.py
```

- `reset(task_id, seed)` → initial observation
- `step(action)` → `(observation, reward, done, info)`
- `state()` → internal state

Use typed models from:

```
procure_rl/models.py
```

---

### 2. Tasks (MANDATORY: 3)

Defined in:

```
procure_rl/environment.py  (TASK_CONFIG)
```

| Task         | Description                       |
| ------------ | --------------------------------- |
| single_issue | price-only negotiation            |
| multi_issue  | price + payment tradeoff          |
| adversarial  | multi-issue + aggressive opponent |

Each task must:

- have a distinct difficulty level
- run within the step limit
- produce a score ∈ [0, 1]

---

### 3. Opponent (CRITICAL)

Implemented in:

```
procure_rl/opponent.py
```

Requirements:

- deterministic (seeded RNG)
- no LLM usage
- **language-sensitive behavior** (via keyword detection)

👉 This is what makes the LLM useful without breaking reproducibility.

---

### 4. Reward / Graders

Implemented in:

```
procure_rl/graders.py
```

Requirements:

- deterministic
- bounded to [0.0, 1.0]
- reflect:
  - deal quality
  - efficiency (rounds)
- no randomness, no LLM

---

### 5. API Server

Implemented in:

```
server/app.py
```

Endpoints:

- `/reset`
- `/step`
- `/state`
- `/health`

Must return valid JSON and HTTP 200.

---

### 6. OpenEnv Spec

File:

```
openenv.yaml
```

Must define:

- environment name
- tasks (3+)
- reward range
- action/observation description

Validate with:

```
openenv validate
```

---

### 7. Inference Script (MANDATORY)

File:

```
inference.py
```

Requirements:

- uses the OpenAI client
- reads:
  - `API_BASE_URL`
  - `MODEL_NAME`
  - `HF_TOKEN`
- interacts with the environment via a loop
- prints the EXACT format:

```
[START] ...
[STEP] ...
[END] ...
```

⚠️ Any formatting deviation → failure

---

### 8. Docker + Deployment

File:

```
Dockerfile
```

Must:

- build successfully
- expose port `7860`
- run the FastAPI server

Test:

```
docker build -t procure-rl .
docker run -p 7860:7860 procure-rl
```

---

### 9. Hugging Face Space

Must:

- deploy successfully
- respond to `/reset` with HTTP 200

---

### 10. README

Must include:

- environment description
- action & observation formats
- task descriptions
- setup instructions
- baseline scores

---

## Validation Checklist (ALL REQUIRED)

Run before submission:

```
openenv validate
docker build .
python inference.py
```

Ensure:

- all 3 tasks run
- scores ∈ [0, 1]
- runtime < 20 minutes
- no crashes

---

## Constraints

- No LLM inside the environment
- No randomness without a seed
- Must run on:
  - 2 vCPU
  - 8 GB RAM

---

## Key Design Principle

> The LLM is used for **decision-making**, not environment logic.

- Environment = deterministic
- Agent (LLM) = intelligent

---

## File Reference Summary

```
procure_rl/
  models.py        # dataclasses
  environment.py   # core logic
  opponent.py      # scripted opponent
  graders.py       # reward functions
server/
  app.py           # API
inference.py       # baseline agent
openenv.yaml       # spec
Dockerfile         # deployment
README.md          # docs
```

---

## Final Rule

If any of these fail:

- Docker build
- `openenv validate`
- inference script

👉 **Submission is disqualified**

---

## One-line Goal

> Build a deterministic, real-world negotiation environment where an LLM agent must make sequential decisions to maximize reward.

---
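As a non-normative reference for the Section 1 interface, the `reset / step / state` contract might be sketched as below. The `TASK_CONFIG` values, observation fields, and accept/reject logic here are illustrative assumptions, not part of the spec:

```python
import random

# Illustrative task table; real limits and prices live in procure_rl/environment.py.
TASK_CONFIG = {
    "single_issue": {"max_steps": 10, "target_price": 100.0},
    "multi_issue": {"max_steps": 15, "target_price": 100.0},
    "adversarial": {"max_steps": 15, "target_price": 100.0},
}

class ProcurementEnv:
    def reset(self, task_id: str, seed: int) -> dict:
        self.cfg = TASK_CONFIG[task_id]
        self.rng = random.Random(seed)  # any randomness must come from this seeded RNG
        self.round = 0
        self.done = False
        # Hypothetical initial observation: the opponent opens above target price.
        return {"task": task_id, "opponent_offer": self.cfg["target_price"] * 1.5}

    def step(self, action: dict):
        self.round += 1
        accepted = action.get("type") == "accept"
        self.done = accepted or self.round >= self.cfg["max_steps"]
        reward = 1.0 if accepted else 0.0  # placeholder; the real grader is in graders.py
        return {"round": self.round}, reward, self.done, {}

    def state(self) -> dict:
        return {"round": self.round, "done": self.done}
```

Because every source of randomness is derived from the `seed` passed to `reset`, replaying the same task with the same seed and actions reproduces the same trajectory.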
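The Section 3 requirement (deterministic, language-sensitive, no LLM) can be met with seeded keyword matching. A minimal sketch, where the keyword lists and concession rates are illustrative assumptions:

```python
import random

# Hypothetical keyword sets; the real lists belong in procure_rl/opponent.py.
POLITE = {"please", "partnership", "appreciate"}
AGGRESSIVE = {"final", "ultimatum", "take it or leave it"}

class ScriptedOpponent:
    def __init__(self, seed: int, ask: float = 150.0):
        self.rng = random.Random(seed)  # seeded: same seed + messages → same offers
        self.ask = ask

    def respond(self, message: str) -> float:
        text = message.lower()
        concession = 0.05                   # base concession per round
        if any(k in text for k in POLITE):
            concession += 0.05              # polite language earns a larger concession
        if any(k in text for k in AGGRESSIVE):
            concession -= 0.04              # threats harden the opponent's position
        self.ask *= (1.0 - concession)
        return round(self.ask, 2)
```

This is what "language-sensitive behavior via keyword detection" buys: the agent's wording genuinely matters, yet the mapping from text to behavior is a pure function of the message and the seed.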
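A grader satisfying Section 4 can be a pure function that blends deal quality with round efficiency and clamps the result to [0.0, 1.0]. The 0.8/0.2 weighting below is an illustrative assumption:

```python
from typing import Optional

def grade(deal_price: Optional[float], target_price: float, max_price: float,
          rounds_used: int, max_rounds: int) -> float:
    """Deterministic reward in [0.0, 1.0]: deal quality plus efficiency."""
    if deal_price is None:          # no deal reached
        return 0.0
    # Deal quality: 1.0 at the target price, 0.0 at the worst acceptable price.
    quality = (max_price - deal_price) / (max_price - target_price)
    quality = max(0.0, min(1.0, quality))
    # Efficiency: fewer rounds → higher score.
    efficiency = 1.0 - (rounds_used - 1) / max_rounds
    efficiency = max(0.0, min(1.0, efficiency))
    return round(0.8 * quality + 0.2 * efficiency, 4)
```

Keeping the grader a pure function of the episode outcome (no RNG, no LLM) is what makes every score replayable and auditable.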
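The Section 7 loop might be structured as below. The payload shapes are illustrative assumptions; `call_env` and `call_llm` are hypothetical injected callables so the loop logic is testable, while the `[START]/[STEP]/[END]` lines match the required format:

```python
def run_episode(call_env, call_llm, task_id="single_issue", seed=0):
    # call_env(path, payload) -> dict: POSTs to the environment server.
    # call_llm(prompt) -> str: asks the model for the next negotiation message.
    obs = call_env("/reset", {"task_id": task_id, "seed": seed})
    print(f"[START] {obs}")
    done, reward = False, 0.0
    while not done:
        text = call_llm(f"Negotiate. Observation: {obs}")
        result = call_env("/step", {"action": {"type": "message", "text": text}})
        obs, reward, done = result["observation"], result["reward"], result["done"]
        print(f"[STEP] {obs} reward={reward}")
    print(f"[END] score={reward}")
    return reward
```

In `inference.py` itself, `call_env` would wrap HTTP POSTs to `API_BASE_URL`, and `call_llm` would wrap the OpenAI client configured from `MODEL_NAME` and `HF_TOKEN`; keeping them injectable lets you verify the exact output format without a live server or model.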