| ## Overview | |
| Build a **deterministic OpenEnv environment** for real-world procurement negotiation. | |
| - Must follow OpenEnv API (`reset / step / state`) | |
| - Must include **3 tasks (easy → medium → hard)** | |
| - Must produce **deterministic rewards in [0.0, 1.0]** | |
| - Must be **fully reproducible and deployable** | |
| --- | |
| ## Core Requirements | |
| ### 1. Environment | |
| Implement in: | |
| ``` | |
| procure_rl/environment.py | |
| ``` | |
| - `reset(task_id, seed)` → initial observation | |
| - `step(action)` → `(observation, reward, done, info)` | |
| - `state()` → internal state | |
| Use typed models from: | |
| ``` | |
| procure_rl/models.py | |
| ``` | |
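As a rough illustration of how `reset`, `step`, and `state` could fit together, here is a minimal sketch. The `Action` and `Observation` dataclasses, their fields, the `MAX_ROUNDS` limit, and the toy acceptance rule are all assumptions made for this example; the spec does not define them.

```python
from dataclasses import dataclass
import random


@dataclass
class Action:
    """Hypothetical action model: a price offer plus a free-text message."""
    price: float
    message: str = ""


@dataclass
class Observation:
    """Hypothetical observation model returned by reset() and step()."""
    round_index: int
    supplier_price: float


class ProcurementEnv:
    MAX_ROUNDS = 5  # assumed step limit, not taken from the spec

    def reset(self, task_id: str, seed: int) -> Observation:
        self._rng = random.Random(seed)  # every random draw flows from the seed
        self._task_id = task_id
        self._round = 0
        self._done = False
        self._list_price = round(self._rng.uniform(90.0, 110.0), 2)
        self._supplier_price = self._list_price
        return self._observe()

    def step(self, action: Action):
        if self._done:
            raise RuntimeError("episode is over; call reset() first")
        self._round += 1
        if action.price >= self._supplier_price:
            # Toy rule: the supplier accepts any offer at or above its ask.
            self._done = True
            reward = min(1.0, max(0.0, 1.0 - action.price / self._list_price))
        else:
            # Otherwise it concedes 5% and the episode may hit the step limit.
            self._supplier_price = round(self._supplier_price * 0.95, 2)
            self._done = self._round >= self.MAX_ROUNDS
            reward = 0.0
        return self._observe(), reward, self._done, {"task_id": self._task_id}

    def state(self) -> dict:
        return {
            "task_id": self._task_id,
            "round": self._round,
            "supplier_price": self._supplier_price,
            "done": self._done,
        }

    def _observe(self) -> Observation:
        return Observation(self._round, self._supplier_price)
```

Because the only source of randomness is a `random.Random(seed)` instance owned by the environment, two environments reset with the same `task_id` and `seed` start from identical observations.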
| --- | |
| ### 2. Tasks (MANDATORY: 3) | |
| Defined in: | |
| ``` | |
| procure_rl/environment.py (TASK_CONFIG) | |
| ``` | |
| | Task | Description | | |
| | ------------ | --------------------------------- | | |
| | single_issue | price-only negotiation | | |
| | multi_issue | price + payment tradeoff | | |
| | adversarial | multi-issue + aggressive opponent | | |
| Each must: | |
| - have different difficulty | |
| - run within step limits | |
| - produce score ∈ [0,1] | |
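One possible shape for `TASK_CONFIG` is sketched below. Only the three task IDs come from the table above; the keys `issues`, `max_steps`, and `opponent`, their values, and the `get_task` helper are hypothetical.

```python
# Hypothetical layout: only the three task IDs are taken from the spec.
TASK_CONFIG = {
    "single_issue": {
        "issues": ("price",),
        "max_steps": 8,
        "opponent": "cooperative",
    },
    "multi_issue": {
        "issues": ("price", "payment_days"),
        "max_steps": 10,
        "opponent": "cooperative",
    },
    "adversarial": {
        "issues": ("price", "payment_days"),
        "max_steps": 10,
        "opponent": "aggressive",
    },
}


def get_task(task_id: str) -> dict:
    """Look up a task and fail loudly on an unknown ID."""
    try:
        return TASK_CONFIG[task_id]
    except KeyError:
        known = ", ".join(sorted(TASK_CONFIG))
        raise ValueError(f"unknown task_id {task_id!r}; expected one of: {known}") from None
```

Keeping difficulty in plain data like this makes it easy to check that each task has a step limit and that the three tasks really differ.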
| --- | |
| ### 3. Opponent (CRITICAL) | |
| Implemented in: | |
| ``` | |
| procure_rl/opponent.py | |
| ``` | |
| Requirements: | |
| - deterministic (seeded RNG) | |
| - no LLM usage | |
| - **language-sensitive behavior** (via keyword detection) | |
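| 👉 This is what makes an LLM agent useful without breaking reproducibility: the agent's wording changes the opponent's response, but the same seed and the same messages always yield the same outcome. | |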
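The combination of a seeded RNG and keyword detection can be sketched as follows. The keyword lists, the concession sizes, and the `ScriptedOpponent` class name are assumptions for illustration; the spec only requires that behavior be deterministic and sensitive to language.

```python
import random

# Hypothetical keyword lists; the spec only says "keyword detection".
COOPERATIVE_WORDS = ("partnership", "long-term", "volume")
HOSTILE_WORDS = ("ultimatum", "final offer", "walk away")


class ScriptedOpponent:
    """Supplier that lowers its price each round by a seeded, message-dependent amount."""

    def __init__(self, seed: int, list_price: float = 100.0, floor_price: float = 70.0):
        self._rng = random.Random(seed)  # private RNG, no global random state
        self.price = list_price
        self.floor = floor_price

    def respond(self, message: str) -> float:
        text = message.lower()
        # Base concession between 1% and 3%, fully determined by the seed.
        concession = self._rng.uniform(0.01, 0.03)
        if any(word in text for word in COOPERATIVE_WORDS):
            concession += 0.03  # cooperative framing earns a larger concession
        if any(word in text for word in HOSTILE_WORDS):
            concession = 0.0    # hostile framing makes the supplier hold firm
        self.price = max(self.floor, round(self.price * (1.0 - concession), 2))
        return self.price
```

In this sketch the RNG is advanced exactly once per call regardless of the message, so two opponents with the same seed stay in lockstep even when they receive different text.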
| --- | |
| ### 4. Reward / Graders | |
| Implemented in: | |
| ``` | |
| procure_rl/graders.py | |
| ``` | |
| Requirements: | |
| - deterministic | |
| - bounded [0.0, 1.0] | |
| - reflect: | |
|   - deal quality | |
|   - efficiency (rounds) | |
| - no randomness, no LLM | |
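A grader meeting these requirements might look like the sketch below. The function name `grade_deal`, its parameters, and the 80/20 weighting between deal quality and efficiency are assumptions, not part of the spec.

```python
from typing import Optional


def _clamp01(value: float) -> float:
    return min(1.0, max(0.0, value))


def grade_deal(
    agreed_price: Optional[float],
    rounds_used: int,
    max_rounds: int,
    list_price: float,
    floor_price: float,
    efficiency_weight: float = 0.2,  # hypothetical weighting
) -> float:
    """Return a deterministic score in [0.0, 1.0] for a buyer-side negotiation."""
    if list_price <= floor_price:
        raise ValueError("list_price must be greater than floor_price")
    if agreed_price is None:
        return 0.0  # no deal reached

    # Deal quality: 0 at list price, 1 at the supplier's floor price.
    quality = _clamp01((list_price - agreed_price) / (list_price - floor_price))

    # Efficiency: 1 for a first-round deal, 0 for a deal in the last round.
    efficiency = _clamp01(1.0 - (rounds_used - 1) / max(1, max_rounds - 1))

    score = (1.0 - efficiency_weight) * quality + efficiency_weight * efficiency
    return _clamp01(score)
```

The function is pure: it reads only its arguments, so the same episode outcome always maps to the same score.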
| --- | |
| ### 5. API Server | |
| Implemented in: | |
| ``` | |
| server/app.py | |
| ``` | |
| Endpoints: | |
| - `/reset` | |
| - `/step` | |
| - `/state` | |
| - `/health` | |
| Every endpoint must return valid JSON with HTTP 200. | |
| --- | |
| ### 6. OpenEnv Spec | |
| File: | |
| ``` | |
| openenv.yaml | |
| ``` | |
| Must define: | |
| - environment name | |
| - tasks (3+) | |
| - reward range | |
| - action/observation description | |
| Validate with: | |
| ``` | |
| openenv validate | |
| ``` | |
| --- | |
| ### 7. Inference Script (MANDATORY) | |
| File: | |
| ``` | |
| inference.py | |
| ``` | |
| Requirements: | |
| - uses OpenAI client | |
| - reads these environment variables: | |
|   - `API_BASE_URL` | |
|   - `MODEL_NAME` | |
|   - `HF_TOKEN` | |
| - interacts with env via loop | |
| - prints EXACT format: | |
| ``` | |
| [START] ... | |
| [STEP] ... | |
| [END] ... | |
| ``` | |
| ⚠️ Any formatting deviation → failure | |
| --- | |
| ### 8. Docker + Deployment | |
| File: | |
| ``` | |
| Dockerfile | |
| ``` | |
| Must: | |
| - build successfully | |
| - expose port `7860` | |
| - run FastAPI server | |
| Test: | |
| ``` | |
| docker build -t procure-rl . | |
| docker run -p 7860:7860 procure-rl | |
| ``` | |
| --- | |
| ### 9. Hugging Face Space | |
| Must: | |
| - deploy successfully | |
| - respond to `/reset` with HTTP 200 | |
| --- | |
| ### 10. README | |
| Must include: | |
| - environment description | |
| - action & observation formats | |
| - task descriptions | |
| - setup instructions | |
| - baseline scores | |
| --- | |
| ## Validation Checklist (ALL REQUIRED) | |
| Run before submission: | |
| ``` | |
| openenv validate | |
| docker build . | |
| python inference.py | |
| ``` | |
| Ensure: | |
| - all 3 tasks run | |
| - scores ∈ [0,1] | |
| - runtime < 20 minutes | |
| - no crashes | |
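The "all tasks run" and "scores ∈ [0,1]" checks can be automated with a small smoke test. The sketch below assumes an environment object exposing `reset(task_id, seed)` and `step(action)` as described under Core Requirements; the helper names `rollout` and `check_reproducible` and the `policy` callable are hypothetical.

```python
def rollout(make_env, task_id, seed, policy, max_steps=50):
    """Run one episode and return the list of per-step rewards."""
    env = make_env()
    obs = env.reset(task_id, seed)
    rewards = []
    for _ in range(max_steps):
        obs, reward, done, _info = env.step(policy(obs))
        rewards.append(reward)
        if done:
            break
    return rewards


def check_reproducible(make_env, task_ids, seed, policy):
    """Run every task twice and verify identical, bounded reward traces."""
    traces = {}
    for task_id in task_ids:
        first = rollout(make_env, task_id, seed, policy)
        second = rollout(make_env, task_id, seed, policy)
        if first != second:
            raise RuntimeError(f"{task_id}: rewards differ between identical runs")
        if not all(0.0 <= r <= 1.0 for r in first):
            raise RuntimeError(f"{task_id}: reward outside [0.0, 1.0]")
        traces[task_id] = first
    return traces
```

Running this with a fixed, scripted `policy` (rather than an LLM) isolates the environment, so any mismatch points at unseeded randomness inside it.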
| --- | |
| ## Constraints | |
| - No LLM inside environment | |
| - No randomness without seed | |
| - Must run on: | |
|   - 2 vCPU | |
|   - 8GB RAM | |
| --- | |
| ## Key Design Principle | |
| > The LLM is used for **decision-making**, not environment logic. | |
| - Environment = deterministic | |
| - Agent (LLM) = intelligent | |
| --- | |
| ## File Reference Summary | |
| ``` | |
| procure_rl/ | |
|   models.py        # dataclasses | |
|   environment.py   # core logic | |
|   opponent.py      # scripted opponent | |
|   graders.py       # reward functions | |
| server/ | |
|   app.py           # API | |
| inference.py       # baseline agent | |
| openenv.yaml       # spec | |
| Dockerfile         # deployment | |
| README.md          # docs | |
| ``` | |
| --- | |
| ## Final Rule | |
| If any of these fail: | |
| - Docker build | |
| - openenv validate | |
| - inference script | |
| 👉 **Submission is disqualified** | |
| --- | |
| ## One-line Goal | |
| > Build a deterministic, real-world negotiation environment where an LLM agent must make sequential decisions to maximize reward. | |
| --- | |