
Overview

Build a deterministic OpenEnv environment for real-world procurement negotiation.

  • Must follow OpenEnv API (reset / step / state)
  • Must include 3 tasks (easy → medium → hard)
  • Must produce deterministic rewards in [0.0, 1.0]
  • Must be fully reproducible and deployable

Core Requirements

1. Environment

Implement in:

procure_rl/environment.py
  • reset(task_id, seed) → initial observation
  • step(action) → (observation, reward, done, info)
  • state() → internal state

Use typed models from:

procure_rl/models.py
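
As a rough illustration of the reset / step / state contract, here is a minimal sketch. The `Action` and `Observation` fields, the `ProcurementEnv` class name, the 5-round limit, and the concession rule are all assumptions made for the example; the real models live in procure_rl/models.py, and the real reward would come from the graders.

```python
import random
from dataclasses import dataclass
from typing import Any


@dataclass
class Action:
    # Hypothetical action model: a price offer plus a free-text message.
    price: float
    message: str = ""


@dataclass
class Observation:
    # Hypothetical observation model.
    task_id: str
    round: int
    opponent_price: float
    done: bool = False


class ProcurementEnv:
    """Toy environment exposing reset / step / state."""

    MAX_ROUNDS = 5  # assumed step limit

    def reset(self, task_id: str, seed: int) -> Observation:
        self._rng = random.Random(seed)  # all randomness flows from the seed
        self._task_id = task_id
        self._round = 0
        self._done = False
        self._ask = round(self._rng.uniform(90.0, 110.0), 2)  # seller's opening ask
        return self._observe()

    def step(self, action: Action) -> tuple[Observation, float, bool, dict[str, Any]]:
        if self._done:
            raise RuntimeError("episode finished; call reset() first")
        self._round += 1
        accepted = action.price >= self._ask
        if not accepted:
            # Assumed rule: the seller concedes 5% after each rejected offer.
            self._ask = round(self._ask * 0.95, 2)
        self._done = accepted or self._round >= self.MAX_ROUNDS
        reward = 1.0 if accepted else 0.0  # placeholder reward in [0.0, 1.0]
        return self._observe(), reward, self._done, {"accepted": accepted}

    def state(self) -> dict[str, Any]:
        return {"task_id": self._task_id, "round": self._round,
                "ask": self._ask, "done": self._done}

    def _observe(self) -> Observation:
        return Observation(self._task_id, self._round, self._ask, self._done)
```

Two environments reset with the same task_id and seed should return identical observations, which is the property the rest of this document relies on.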

2. Tasks (MANDATORY: 3)

Defined in:

procure_rl/environment.py (TASK_CONFIG)

Task           Description
single_issue   price-only negotiation
multi_issue    price + payment tradeoff
adversarial    multi-issue + aggressive opponent

Each must:

  • have a different difficulty level
  • run within its step limit
  • produce a score ∈ [0.0, 1.0]
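
One possible shape for TASK_CONFIG is sketched below. Only the three task names come from this document; the `TaskSpec` fields, the step limits, and the opponent style names are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    issues: tuple[str, ...]  # what is being negotiated
    max_steps: int           # assumed step limit for the task
    opponent_style: str      # assumed name of an opponent preset


# Task names follow the table above; every value is an illustrative assumption.
TASK_CONFIG = {
    "single_issue": TaskSpec(("price",), max_steps=8, opponent_style="cooperative"),
    "multi_issue": TaskSpec(("price", "payment_terms"), max_steps=10,
                            opponent_style="cooperative"),
    "adversarial": TaskSpec(("price", "payment_terms"), max_steps=12,
                            opponent_style="aggressive"),
}


def get_task(task_id: str) -> TaskSpec:
    """Look up a task and fail loudly on an unknown id."""
    try:
        return TASK_CONFIG[task_id]
    except KeyError:
        raise ValueError(
            f"unknown task_id {task_id!r}; expected one of {sorted(TASK_CONFIG)}"
        ) from None
```

Keeping the specs frozen makes it harder for an episode to mutate shared task settings by accident.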

3. Opponent (CRITICAL)

Implemented in:

procure_rl/opponent.py

Requirements:

  • deterministic (seeded RNG)
  • no LLM usage
  • language-sensitive behavior (via keyword detection)

👉 Keyword detection is what makes an LLM agent useful without breaking reproducibility.
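
The three opponent requirements can be combined as in the sketch below. The `ScriptedOpponent` class, the keyword lists, the concession range, and the multipliers are assumptions for illustration, not the contents of procure_rl/opponent.py.

```python
import random


class ScriptedOpponent:
    """Seller that concedes from an opening ask toward a floor price."""

    # Hypothetical keyword lists; the real ones may differ.
    COOPERATIVE = ("partnership", "long-term", "volume")
    HOSTILE = ("ultimatum", "final offer", "walk away")

    def __init__(self, seed: int, opening: float = 100.0, floor: float = 70.0):
        self._rng = random.Random(seed)  # private seeded RNG, never the global one
        self.ask = opening
        self.floor = floor

    def respond(self, message: str) -> float:
        """Return the seller's next ask given the buyer's message."""
        text = message.lower()
        concession = self._rng.uniform(1.0, 3.0)  # seeded base concession
        if any(keyword in text for keyword in self.COOPERATIVE):
            concession *= 2.0  # cooperative language earns a larger concession
        if any(keyword in text for keyword in self.HOSTILE):
            concession *= 0.5  # hostile language earns a smaller one
        self.ask = max(self.floor, round(self.ask - concession, 2))
        return self.ask
```

Because the only source of variation is the seeded RNG plus plain substring checks, the same seed and the same messages always produce the same sequence of asks, while the wording chosen by the agent still changes the outcome.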


4. Reward / Graders

Implemented in:

procure_rl/graders.py

Requirements:

  • deterministic

  • bounded [0.0, 1.0]

  • reflect:

    • deal quality
    • efficiency (rounds)
  • no randomness, no LLM
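
A grader meeting these requirements might look like the following sketch. The function name `grade`, its parameters, and the 80/20 weighting between deal quality and efficiency are assumptions; the document only requires that the score be deterministic, bounded, and sensitive to both factors.

```python
from typing import Optional


def _clamp(value: float) -> float:
    """Clamp a value into [0.0, 1.0]."""
    return max(0.0, min(1.0, value))


def grade(final_price: Optional[float], rounds_used: int, *, list_price: float,
          target_price: float, max_rounds: int,
          efficiency_weight: float = 0.2) -> float:
    """Score a finished episode from the buyer's point of view.

    Deal quality is the fraction of the gap between list_price and
    target_price that the buyer captured. Efficiency falls linearly
    from 1.0 (deal in round 1) to 0.0 (deal in the last round).
    """
    if final_price is None:  # no agreement reached
        return 0.0
    if list_price <= target_price:
        raise ValueError("list_price must exceed target_price")
    quality = _clamp((list_price - final_price) / (list_price - target_price))
    if max_rounds > 1:
        efficiency = _clamp(1.0 - (rounds_used - 1) / (max_rounds - 1))
    else:
        efficiency = 1.0
    score = (1.0 - efficiency_weight) * quality + efficiency_weight * efficiency
    return _clamp(score)
```

For example, `grade(90.0, 2, list_price=100.0, target_price=80.0, max_rounds=10)` scores higher than the same price reached in round 9, and an episode with no deal scores 0.0.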


5. API Server

Implemented in:

server/app.py

Endpoints:

  • /reset
  • /step
  • /state
  • /health

Must return valid JSON and HTTP 200.


6. OpenEnv Spec

File:

openenv.yaml

Must define:

  • environment name
  • tasks (3+)
  • reward range
  • action/observation description

Validate with:

openenv validate

7. Inference Script (MANDATORY)

File:

inference.py

Requirements:

  • uses the OpenAI client

  • reads:

    • API_BASE_URL
    • MODEL_NAME
    • HF_TOKEN
  • interacts with the environment in a step loop

  • prints EXACT format:

[START] ...
[STEP] ...
[END] ...

⚠️ Any formatting deviation → failure
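
The overall shape of the script can be sketched as below, using only the standard library. The `load_settings` and `run_episode` helpers, the `policy` callable standing in for the OpenAI client call, and above all the fields printed after each tag are assumptions: this document elides the exact line format, so the payloads here are placeholders that must be replaced with the required one.

```python
import os
from typing import Any, Callable, Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read the three required variables and fail early if any is missing."""
    names = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}


def run_episode(reset: Callable[[str, int], Any],
                step: Callable[[Any], tuple],
                policy: Callable[[Any], Any],
                task_id: str, seed: int, max_steps: int = 20) -> list[str]:
    """Run one episode and return the log lines to print.

    `policy` maps an observation to an action; in the real script it
    would wrap a call to the model named by MODEL_NAME.
    """
    lines = [f"[START] task={task_id}"]  # placeholder payload
    obs = reset(task_id, seed)
    for i in range(1, max_steps + 1):
        action = policy(obs)
        obs, reward, done, _info = step(action)
        lines.append(f"[STEP] step={i} reward={reward:.2f}")  # placeholder payload
        if done:
            break
    lines.append(f"[END] task={task_id}")  # placeholder payload
    return lines
```

Building the lines in a list before printing keeps the formatting in one place, which makes it easier to match the required format exactly.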


8. Docker + Deployment

File:

Dockerfile

Must:

  • build successfully
  • expose port 7860
  • run FastAPI server

Test:

docker build -t procure-rl .
docker run -p 7860:7860 procure-rl

9. Hugging Face Space

Must:

  • deploy successfully
  • respond to /reset with HTTP 200

10. README

Must include:

  • environment description
  • action & observation formats
  • task descriptions
  • setup instructions
  • baseline scores

Validation Checklist (ALL REQUIRED)

Run before submission:

openenv validate
docker build .
python inference.py

Ensure:

  • all 3 tasks run
  • scores ∈ [0,1]
  • runtime < 20 minutes
  • no crashes

Constraints

  • No LLM inside environment
  • No randomness without seed
  • Must run on:
    • 2 vCPU
    • 8GB RAM
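
The "no randomness without seed" constraint can be checked mechanically by running the same seed several times and comparing the results. The sketch below uses a toy `rollout` function as a stand-in for a full episode; both helper names are hypothetical.

```python
import random
from typing import Callable


def rollout(seed: int, steps: int = 5) -> list[float]:
    """Toy stand-in for one episode: returns the per-step rewards."""
    rng = random.Random(seed)  # seeded, so the result depends only on the seed
    return [round(rng.random(), 6) for _ in range(steps)]


def assert_reproducible(run: Callable[[int], list[float]], seed: int,
                        repeats: int = 3) -> list[float]:
    """Run `run(seed)` several times and require identical, bounded rewards."""
    baseline = run(seed)
    for _ in range(repeats - 1):
        if run(seed) != baseline:
            raise AssertionError(f"non-deterministic output for seed {seed}")
    if not all(0.0 <= reward <= 1.0 for reward in baseline):
        raise AssertionError("reward outside [0.0, 1.0]")
    return baseline
```

A function that draws from the global `random` module without seeding would fail this check, which is the kind of bug the constraint is meant to rule out.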

Key Design Principle

The LLM is used for decision-making, not environment logic.

  • Environment = deterministic
  • Agent (LLM) = intelligent

File Reference Summary

procure_rl/
  models.py        # dataclasses
  environment.py   # core logic
  opponent.py      # scripted opponent
  graders.py       # reward functions

server/
  app.py           # API

inference.py       # baseline agent
openenv.yaml       # spec
Dockerfile         # deployment
README.md          # docs

Final Rule

If any of these fail:

  • Docker build
  • openenv validate
  • inference script

👉 Submission is disqualified


One-line Goal

Build a deterministic, real-world negotiation environment where an LLM agent must make sequential decisions to maximize reward.