Overview
Build a deterministic OpenEnv environment for real-world procurement negotiation.
- Must follow OpenEnv API (reset / step / state)
- Must include 3 tasks (easy → medium → hard)
- Must produce deterministic rewards in [0.0, 1.0]
- Must be fully reproducible and deployable
Core Requirements
1. Environment
Implement in:
procure_rl/environment.py
- reset(task_id, seed) → initial observation
- step(action) → (observation, reward, done, info)
- state() → internal state
Use typed models from:
procure_rl/models.py
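A minimal sketch of the reset / step / state contract described above. The Observation and Action dataclasses, the 5-round limit, and the acceptance and reward rules are illustrative assumptions, not the project's actual models or logic:

```python
import random
from dataclasses import dataclass


@dataclass
class Observation:  # hypothetical stand-in for a model in procure_rl/models.py
    round_index: int
    opponent_price: float
    done: bool


@dataclass
class Action:  # hypothetical stand-in for a model in procure_rl/models.py
    offer: float
    message: str = ""


class ProcurementEnv:
    MAX_ROUNDS = 5  # assumed step limit

    def reset(self, task_id: str, seed: int) -> Observation:
        # A private RNG keeps every episode reproducible from (task_id, seed).
        self._rng = random.Random(seed)
        self._task_id = task_id
        self._round = 0
        self._done = False
        self._ask = round(self._rng.uniform(90.0, 110.0), 2)
        return Observation(self._round, self._ask, self._done)

    def step(self, action: Action):
        if self._done:
            raise RuntimeError("episode finished; call reset() first")
        self._round += 1
        # Toy rule (assumption): the opponent accepts offers within 10% of its ask.
        accepted = action.offer >= 0.9 * self._ask
        self._done = accepted or self._round >= self.MAX_ROUNDS
        reward = 0.0
        if accepted:
            # 1.0 at the lowest acceptable price, 0.0 at or above the full ask.
            raw = (self._ask - action.offer) / (0.1 * self._ask)
            reward = min(1.0, max(0.0, raw))
        info = {"accepted": accepted}
        return Observation(self._round, self._ask, self._done), reward, self._done, info

    def state(self) -> dict:
        # Internal state, exposed as a plain dict so it serializes to JSON.
        return {"task_id": self._task_id, "round": self._round,
                "ask": self._ask, "done": self._done}
```

Two instances reset with the same task_id and seed return identical initial observations, which is the property the rest of the spec depends on.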
2. Tasks (MANDATORY: 3)
Defined in:
procure_rl/environment.py (TASK_CONFIG)
| Task | Description |
|---|---|
| single_issue | price-only negotiation |
| multi_issue | price + payment tradeoff |
| adversarial | multi-issue + aggressive opponent |
Each must:
- have different difficulty
- run within step limits
- produce score ∈ [0,1]
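One possible shape for TASK_CONFIG, shown only as a sketch. The task names come from the table above; the TaskConfig fields, round limits, and opponent styles are assumptions chosen to illustrate increasing difficulty:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskConfig:  # hypothetical structure; field names are illustrative
    issues: tuple
    max_rounds: int
    opponent_style: str


# Difficulty rises by adding issues, tightening the round limit,
# and switching to a tougher opponent (all values are assumed).
TASK_CONFIG = {
    "single_issue": TaskConfig(("price",), 10, "cooperative"),
    "multi_issue": TaskConfig(("price", "payment_terms"), 8, "cooperative"),
    "adversarial": TaskConfig(("price", "payment_terms"), 6, "aggressive"),
}


def get_task(task_id: str) -> TaskConfig:
    # Fail loudly on unknown tasks instead of silently using a default.
    if task_id not in TASK_CONFIG:
        raise ValueError(f"unknown task_id: {task_id!r}")
    return TASK_CONFIG[task_id]
```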
3. Opponent (CRITICAL)
Implemented in:
procure_rl/opponent.py
Requirements:
- deterministic (seeded RNG)
- no LLM usage
- language-sensitive behavior (via keyword detection)
👉 This is what makes the LLM useful without breaking reproducibility.
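A sketch of how a seeded, keyword-sensitive opponent could look. The keyword lists, prices, and concession sizes are hypothetical; only the constraints (seeded RNG, no LLM, keyword detection) come from the requirements above:

```python
import random

# Hypothetical keyword lists; the real opponent may use different ones.
SOFTENING_KEYWORDS = ("long-term", "partnership", "volume")
HARDENING_KEYWORDS = ("final offer", "ultimatum", "walk away")


class ScriptedOpponent:
    def __init__(self, seed: int, opening_price: float = 100.0,
                 floor_price: float = 70.0):
        self._rng = random.Random(seed)  # private RNG, never the global one
        self.price = opening_price
        self.floor = floor_price

    def respond(self, message: str) -> float:
        text = message.lower()
        concession = self._rng.uniform(1.0, 3.0)
        if any(word in text for word in SOFTENING_KEYWORDS):
            concession *= 2.0   # cooperative language earns a larger concession
        elif any(word in text for word in HARDENING_KEYWORDS):
            concession *= 0.5   # threats make the opponent dig in
        self.price = max(self.floor, round(self.price - concession, 2))
        return self.price
```

The RNG is drawn exactly once per call regardless of the message, so the same seed and the same message sequence always give the same price path.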
4. Reward / Graders
Implemented in:
procure_rl/graders.py
Requirements:
- deterministic
- bounded [0.0, 1.0]
- reflect:
  - deal quality
  - efficiency (rounds)
- no randomness, no LLM
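A sketch of a grader that satisfies these requirements. The function name, the 80/20 weighting between deal quality and efficiency, and the zero score for a failed deal are assumptions, not the project's actual scoring:

```python
def grade_deal(agreed_price, list_price, target_price,
               rounds_used, max_rounds, efficiency_weight=0.2):
    """Return a deterministic score in [0.0, 1.0] for a buyer-side deal."""
    if list_price <= target_price or max_rounds <= 0:
        raise ValueError("need list_price > target_price and max_rounds > 0")
    if agreed_price is None:
        return 0.0  # assumption: no deal scores zero

    # Deal quality: 1.0 at or below the target price, 0.0 at or above list price.
    quality = (list_price - agreed_price) / (list_price - target_price)
    quality = min(1.0, max(0.0, quality))

    # Efficiency: 1.0 for closing in the first round, falling as rounds are used.
    efficiency = 1.0 - (rounds_used - 1) / max_rounds
    efficiency = min(1.0, max(0.0, efficiency))

    score = (1.0 - efficiency_weight) * quality + efficiency_weight * efficiency
    return min(1.0, max(0.0, score))  # final clamp guards against float drift
```

For example, closing at the target price in the first round scores 1.0, while paying full list price on the last allowed round scores close to 0.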
5. API Server
Implemented in:
server/app.py
Endpoints:
- /reset
- /step
- /state
- /health
Must return valid JSON and HTTP 200.
6. OpenEnv Spec
File:
openenv.yaml
Must define:
- environment name
- tasks (3+)
- reward range
- action/observation description
Validate with:
openenv validate
7. Inference Script (MANDATORY)
File:
inference.py
Requirements:
- uses OpenAI client
- reads:
  - API_BASE_URL
  - MODEL_NAME
  - HF_TOKEN
- interacts with env via loop
- prints EXACT format:
[START] ...
[STEP] ...
[END] ...
⚠️ Any formatting deviation → failure
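A small sketch of reading the three required variables at startup, so a missing value fails fast instead of midway through an episode. The helper name load_settings is hypothetical, and the exact [START] / [STEP] / [END] line contents are not reproduced here because the spec elides them:

```python
import os

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_settings(environ=os.environ) -> dict:
    """Collect the required settings, raising if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


# Assumed usage with the OpenAI Python client (treating HF_TOKEN as the API key
# is an assumption, not something the spec states):
#   settings = load_settings()
#   client = OpenAI(base_url=settings["API_BASE_URL"], api_key=settings["HF_TOKEN"])
```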
8. Docker + Deployment
File:
Dockerfile
Must:
- build successfully
- expose port 7860
- run FastAPI server
Test:
docker build -t procure-rl .
docker run -p 7860:7860 procure-rl
9. Hugging Face Space
Must:
- deploy successfully
- respond to /reset with HTTP 200
10. README
Must include:
- environment description
- action & observation formats
- task descriptions
- setup instructions
- baseline scores
Validation Checklist (ALL REQUIRED)
Run before submission:
openenv validate
docker build .
python inference.py
Ensure:
- all 3 tasks run
- scores ∈ [0,1]
- runtime < 20 minutes
- no crashes
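The first two checks can also be automated locally. The sketch below assumes a run_episode(task_id, seed) callable that plays one episode and returns its final score; both that callable and the fake_episode placeholder are hypothetical:

```python
import random


def check_tasks(run_episode, task_ids, seed=0):
    """Run each task twice with the same seed and validate the scores."""
    scores = {}
    for task_id in task_ids:
        first = run_episode(task_id, seed)
        second = run_episode(task_id, seed)
        if first != second:
            raise ValueError(f"{task_id}: score changed between identical runs")
        if not 0.0 <= first <= 1.0:
            raise ValueError(f"{task_id}: score {first} outside [0, 1]")
        scores[task_id] = first
    return scores


def fake_episode(task_id, seed):
    # Placeholder episode (assumption): a real one would drive the environment.
    rng = random.Random(f"{task_id}:{seed}")
    return rng.random()


scores = check_tasks(fake_episode, ["single_issue", "multi_issue", "adversarial"])
```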
Constraints
- No LLM inside environment
- No randomness without seed
- Must run on:
  - 2 vCPU
  - 8GB RAM
Key Design Principle
LLM is used for decision-making, not environment logic.
- Environment = deterministic
- Agent (LLM) = intelligent
File Reference Summary
procure_rl/
  models.py        # dataclasses
  environment.py   # core logic
  opponent.py      # scripted opponent
  graders.py       # reward functions
server/
  app.py           # API
inference.py       # baseline agent
openenv.yaml       # spec
Dockerfile         # deployment
README.md          # docs
Final Rule
If any of these fail:
- Docker build
- openenv validate
- inference script
👉 Submission is disqualified
One-line Goal
Build a deterministic, real-world negotiation environment where an LLM agent must make sequential decisions to maximize reward.