Overview

Build a deterministic OpenEnv environment for real-world procurement negotiation.

Must follow OpenEnv API (reset / step / state)
Must include 3 tasks (easy → medium → hard)
Must produce deterministic rewards in [0.0, 1.0]
Must be fully reproducible and deployable

Core Requirements

1. Environment

Implement in:

procure_rl/environment.py

reset(task_id, seed) → initial observation
step(action) → (observation, reward, done, info)
state() → internal state
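A minimal sketch of the three-method interface, assuming a toy price-only negotiation. The class name `ProcureEnvSketch`, the dict-based observations, the `price` action key, and every number below are illustrative assumptions, not the project's actual implementation.

```python
import random


class ProcureEnvSketch:
    """Toy price-only negotiation; all names and numbers are hypothetical."""

    MAX_ROUNDS = 5  # assumed step limit

    def __init__(self) -> None:
        self._state = {}

    def reset(self, task_id: str, seed: int) -> dict:
        rng = random.Random(seed)  # private, seeded RNG (never the global one)
        ask = round(rng.uniform(90.0, 110.0), 2)
        self._state = {"task_id": task_id, "seed": seed, "round": 0,
                       "ask": ask, "done": False}
        return {"round": 0, "opponent_price": ask}

    def step(self, action: dict):
        s = self._state
        if not s or s["done"]:
            raise RuntimeError("call reset() before step()")
        s["round"] += 1
        offer = float(action["price"])
        accepted = offer >= 0.8 * s["ask"]  # toy acceptance rule
        s["done"] = accepted or s["round"] >= self.MAX_ROUNDS
        # Saving relative to the best achievable 20% discount, clamped to [0, 1].
        saving = (s["ask"] - offer) / (0.2 * s["ask"])
        reward = min(1.0, max(0.0, saving)) if accepted else 0.0
        obs = {"round": s["round"], "opponent_price": s["ask"]}
        return obs, reward, s["done"], {"accepted": accepted}

    def state(self) -> dict:
        return dict(self._state)  # copy, so callers cannot mutate internals
```

Because `reset` builds its own `random.Random(seed)`, two environments reset with the same `task_id` and `seed` start from the same observation.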

Use typed models from:

procure_rl/models.py
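One possible shape for the typed models, sketched with standard-library dataclasses. The field names (`message`, `price`, `payment_days`, and so on) are assumptions chosen to match the task descriptions, not the actual contents of `models.py`.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    """What the agent sends each round (hypothetical fields)."""
    message: str                         # free-text negotiation message
    price: Optional[float] = None        # proposed unit price, if any
    payment_days: Optional[int] = None   # proposed payment term, if any
    accept: bool = False                 # accept the opponent's last offer


@dataclass(frozen=True)
class Observation:
    """What the environment returns each round (hypothetical fields)."""
    task_id: str
    round: int
    max_rounds: int
    opponent_message: str
    opponent_price: float


# asdict() gives a plain dict that can be serialised as JSON by the API server.
payload = asdict(Action(message="Could you do 95?", price=95.0))
```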

2. Tasks (MANDATORY: 3)

Defined in:

procure_rl/environment.py (TASK_CONFIG)

Task          Description
single_issue  price-only negotiation
multi_issue   price + payment tradeoff
adversarial   multi-issue + aggressive opponent

Each must:

have different difficulty
run within step limits
produce score ∈ [0,1]
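As a sketch, `TASK_CONFIG` could be a plain dictionary keyed by task id. The three task ids come from the table above; the keys `issues`, `max_steps`, and `opponent`, and all their values, are assumptions for illustration.

```python
# Hypothetical layout; only the three task ids come from the instructions.
TASK_CONFIG = {
    "single_issue": {"issues": ("price",), "max_steps": 6, "opponent": "cooperative"},
    "multi_issue": {"issues": ("price", "payment"), "max_steps": 8, "opponent": "cooperative"},
    "adversarial": {"issues": ("price", "payment"), "max_steps": 10, "opponent": "aggressive"},
}


def get_task(task_id: str) -> dict:
    """Look up a task, failing loudly on an unknown id."""
    try:
        return TASK_CONFIG[task_id]
    except KeyError:
        known = ", ".join(sorted(TASK_CONFIG))
        raise ValueError(f"unknown task_id {task_id!r}; expected one of: {known}") from None
```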

3. Opponent (CRITICAL)

Implemented in:

procure_rl/opponent.py

Requirements:

deterministic (seeded RNG)
no LLM usage
language-sensitive behavior (via keyword detection)

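👉 This is what makes an LLM agent useful without breaking reproducibility: the wording of its messages changes how the opponent responds, but only through fixed keyword rules.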
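A minimal sketch of such an opponent, assuming it concedes on price each round and scales the concession by keywords found in the agent's message. The keyword lists, the multipliers, and the class name `ScriptedOpponent` are all illustrative assumptions.

```python
import random

# Hypothetical keyword lists; the real opponent may use different ones.
POLITE = ("please", "thank", "appreciate", "partnership")
PRESSURE = ("final offer", "walk away", "ultimatum")


class ScriptedOpponent:
    def __init__(self, seed: int, start_price: float = 100.0,
                 floor_price: float = 80.0) -> None:
        self._rng = random.Random(seed)  # seeded, private RNG
        self.price = start_price
        self.floor = floor_price

    def respond(self, message: str) -> float:
        """Return the opponent's next asking price for the given message."""
        text = message.lower()
        concession = self._rng.uniform(1.0, 3.0)  # seeded base concession
        if any(keyword in text for keyword in POLITE):
            concession *= 1.5  # cooperative tone earns a larger concession
        if any(keyword in text for keyword in PRESSURE):
            concession *= 0.5  # pressure tactics earn a smaller one
        self.price = max(self.floor, round(self.price - concession, 2))
        return self.price
```

With the same seed and the same messages, the sequence of prices is identical on every run; only the message text and the seed influence the outcome.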

4. Reward / Graders

Implemented in:

procure_rl/graders.py

Requirements:

deterministic
bounded [0.0, 1.0]
reflect:
- deal quality
- efficiency (rounds)
no randomness, no LLM
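One way to combine deal quality and efficiency into a bounded score is a clamped weighted sum, sketched below. The function name `grade_deal`, its parameters, and the 80/20 weighting are assumptions, not the project's actual grader.

```python
from typing import Optional


def _clamp01(value: float) -> float:
    return min(1.0, max(0.0, value))


def grade_deal(agreed_price: Optional[float], list_price: float,
               target_price: float, rounds_used: int, max_rounds: int,
               efficiency_weight: float = 0.2) -> float:
    """Deterministic score in [0.0, 1.0]; no deal scores 0.0."""
    if agreed_price is None:
        return 0.0
    span = list_price - target_price
    # Quality: fraction of the list-to-target gap the agent captured.
    quality = _clamp01((list_price - agreed_price) / span) if span > 0 else 0.0
    # Efficiency: 1.0 for closing in round 1, 0.0 for using every round.
    efficiency = _clamp01(1.0 - (rounds_used - 1) / max(1, max_rounds - 1))
    score = (1.0 - efficiency_weight) * quality + efficiency_weight * efficiency
    return _clamp01(score)
```

For example, `grade_deal(90.0, 100.0, 80.0, rounds_used=6, max_rounds=6)` captures half the gap with no efficiency bonus, so the score is 0.8 × 0.5 = 0.4.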

5. API Server

Implemented in:

server/app.py

Endpoints:

/reset
/step
/state
/health

Every endpoint must return valid JSON with HTTP 200.
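The requirement can be checked with a small standard-library smoke test against a running server. This is a sketch under assumptions: the instructions do not state the HTTP methods or request bodies, so `GET` for `/health` and `/state`, `POST` for `/reset`, and the `task_id`/`seed` payload are guesses. `/step` is left out because its body depends on the action model.

```python
import json
import urllib.request


def call_endpoint(base_url, path, payload=None, timeout=10.0):
    """GET when payload is None, else POST it as JSON; return (status, body)."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET" if payload is None else "POST",
    )
    # urlopen raises HTTPError on non-2xx; json.loads raises on invalid JSON.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
        return response.status, body


def smoke_test(base_url="http://localhost:7860"):
    """Return {path: status} for the endpoints checked (methods are assumed)."""
    checks = [
        ("/health", None),
        ("/reset", {"task_id": "single_issue", "seed": 0}),  # assumed body
        ("/state", None),
    ]
    return {path: call_endpoint(base_url, path, payload)[0]
            for path, payload in checks}
```

Running `smoke_test()` while the container from the Docker section is up should yield 200 for every path if the server matches these assumptions.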

6. OpenEnv Spec

File:

openenv.yaml

Must define:

environment name
tasks (3+)
reward range
action/observation description

Validate with:

openenv validate

7. Inference Script (MANDATORY)

File:

inference.py

Requirements:

uses OpenAI client
reads:
- API_BASE_URL
- MODEL_NAME
- HF_TOKEN
interacts with env via loop
prints EXACT format:

[START] ...
[STEP] ...
[END] ...

⚠️ Any formatting deviation → failure
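Reading the three variables up front lets the script fail before any model call if one is missing. The helper below is a hypothetical sketch; `load_settings` is not a name from the instructions. How the values are then passed to the OpenAI client (for instance `API_BASE_URL` as the base URL and `HF_TOKEN` as the API key) is an assumption to confirm against the submission rules.

```python
import os

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_settings(environ=os.environ) -> dict:
    """Return the required settings, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Passing a plain dict instead of `os.environ` makes the helper easy to test without touching the real environment.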

8. Docker + Deployment

File:

Dockerfile

Must:

build successfully
expose port 7860
run FastAPI server

Test:

docker build -t procure-rl .
docker run -p 7860:7860 procure-rl

9. Hugging Face Space

Must:

deploy successfully
respond to /reset with HTTP 200

10. README

Must include:

environment description
action & observation formats
task descriptions
setup instructions
baseline scores

Validation Checklist (ALL REQUIRED)

Run before submission:

openenv validate
docker build .
python inference.py

Ensure:

all 3 tasks run
scores ∈ [0,1]
runtime < 20 minutes
no crashes

Constraints

No LLM inside environment
No randomness without seed
Must run on:
- 2 vCPU
- 8GB RAM
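The "no randomness without seed" constraint can be checked mechanically by replaying the same seed and actions and comparing traces. The sketch below assumes the `reset(task_id, seed)` and `step(action)` signatures described earlier; `rollout`, `is_deterministic`, and the stand-in `TinyEnv` are hypothetical names used only for illustration.

```python
import random


def rollout(make_env, task_id, seed, actions):
    """Run one episode and record everything the agent could observe."""
    env = make_env()
    trace = [env.reset(task_id, seed)]
    for action in actions:
        obs, reward, done, _info = env.step(action)
        trace.append((obs, reward, done))
        if done:
            break
    return trace


def is_deterministic(make_env, task_id, seed, actions, repeats=3):
    """True if repeated rollouts with the same seed and actions are identical."""
    reference = rollout(make_env, task_id, seed, actions)
    return all(
        rollout(make_env, task_id, seed, actions) == reference
        for _ in range(repeats - 1)
    )


class TinyEnv:
    """Stand-in environment (hypothetical), used only to exercise the check."""

    def reset(self, task_id, seed):
        self._rng = random.Random(seed)
        self._round = 0
        return {"round": 0, "price": self._rng.randint(90, 110)}

    def step(self, action):
        self._round += 1
        obs = {"round": self._round, "price": self._rng.randint(90, 110)}
        return obs, 0.0, self._round >= 3, {}
```

Swapping `TinyEnv` for the real environment class and looping over all three task ids gives a quick pre-submission reproducibility check.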

Key Design Principle

The LLM is used for decision-making, not for environment logic.

Environment = deterministic
Agent (LLM) = intelligent

File Reference Summary

procure_rl/
  models.py        # dataclasses
  environment.py   # core logic
  opponent.py      # scripted opponent
  graders.py       # reward functions

server/
  app.py           # API

inference.py       # baseline agent
openenv.yaml       # spec
Dockerfile         # deployment
README.md          # docs

Final Rule

If any of these fail:

Docker build
openenv validate
inference script

👉 Submission is disqualified

One-line Goal

Build a deterministic, real-world negotiation environment where an LLM agent must make sequential decisions to maximize reward.