## Overview

Build a **deterministic OpenEnv environment** for real-world procurement negotiation.

- Must follow the OpenEnv API (`reset / step / state`)
- Must include **3 tasks (easy → medium → hard)**
- Must produce **deterministic rewards in [0.0, 1.0]**
- Must be **fully reproducible and deployable**

---

## Core Requirements

### 1. Environment

Implement in:

```
procure_rl/environment.py
```

- `reset(task_id, seed)` → initial observation
- `step(action)` → `(observation, reward, done, info)`
- `state()` → internal state

Use typed models from:

```
procure_rl/models.py
```

---

### 2. Tasks (MANDATORY: 3)

Defined in:

```
procure_rl/environment.py  (TASK_CONFIG)
```

| Task         | Description                       |
| ------------ | --------------------------------- |
| single_issue | price-only negotiation            |
| multi_issue  | price + payment tradeoff          |
| adversarial  | multi-issue + aggressive opponent |

Each task must:

- have a distinct difficulty level
- run within the step limit
- produce a score ∈ [0, 1]

---

### 3. Opponent (CRITICAL)

Implemented in:

```
procure_rl/opponent.py
```

Requirements:

- deterministic (seeded RNG)
- no LLM usage
- **language-sensitive behavior** (via keyword detection)

👉 This is what makes the LLM useful without breaking reproducibility.

---

### 4. Reward / Graders

Implemented in:

```
procure_rl/graders.py
```

Requirements:

- deterministic
- bounded to [0.0, 1.0]
- reflect:
  - deal quality
  - efficiency (rounds)
- no randomness, no LLM

---

### 5. API Server

Implemented in:

```
server/app.py
```

Endpoints:

- `/reset`
- `/step`
- `/state`
- `/health`

Must return valid JSON and HTTP 200.

---

### 6. OpenEnv Spec

File:

```
openenv.yaml
```

Must define:

- environment name
- tasks (3+)
- reward range
- action/observation description

Validate with:

```
openenv validate
```

---

### 7. Inference Script (MANDATORY)

File:

```
inference.py
```

Requirements:

- uses the OpenAI client
- reads:
  - `API_BASE_URL`
  - `MODEL_NAME`
  - `HF_TOKEN`
- interacts with the environment via a loop
- prints the EXACT format:

```
[START] ...
[STEP] ...
[END] ...
```

⚠️ Any formatting deviation → failure

---

### 8. Docker + Deployment

File:

```
Dockerfile
```

Must:

- build successfully
- expose port `7860`
- run the FastAPI server

Test:

```
docker build -t procure-rl .
docker run -p 7860:7860 procure-rl
```

---

### 9. Hugging Face Space

Must:

- deploy successfully
- respond to `/reset` with HTTP 200

---

### 10. README

Must include:

- environment description
- action & observation formats
- task descriptions
- setup instructions
- baseline scores

---

## Validation Checklist (ALL REQUIRED)

Run before submission:

```
openenv validate
docker build .
python inference.py
```

Ensure:

- all 3 tasks run
- scores ∈ [0, 1]
- runtime < 20 minutes
- no crashes

---

## Constraints

- No LLM inside the environment
- No randomness without a seed
- Must run on:
  - 2 vCPU
  - 8 GB RAM

---

## Key Design Principle

> The LLM is used for **decision-making**, not environment logic.

- Environment = deterministic
- Agent (LLM) = intelligent

---

## File Reference Summary

```
procure_rl/
  models.py        # dataclasses
  environment.py   # core logic
  opponent.py      # scripted opponent
  graders.py       # reward functions
server/
  app.py           # API
inference.py       # baseline agent
openenv.yaml       # spec
Dockerfile         # deployment
README.md          # docs
```

---

## Final Rule

If any of these fail:

- Docker build
- `openenv validate`
- inference script

👉 **Submission is disqualified**

---

## One-line Goal

> Build a deterministic, real-world negotiation environment where an LLM agent must make sequential decisions to maximize reward.

---
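As a non-normative reference for the Section 1 interface, the `reset / step / state` contract might be sketched as below. The `TASK_CONFIG` values, observation fields, and accept/reject logic here are illustrative assumptions, not part of the spec:

```python
import random

# Illustrative task table; real limits and prices live in procure_rl/environment.py.
TASK_CONFIG = {
    "single_issue": {"max_steps": 10, "target_price": 100.0},
    "multi_issue": {"max_steps": 15, "target_price": 100.0},
    "adversarial": {"max_steps": 15, "target_price": 100.0},
}

class ProcurementEnv:
    def reset(self, task_id: str, seed: int) -> dict:
        self.cfg = TASK_CONFIG[task_id]
        self.rng = random.Random(seed)  # any randomness must come from this seeded RNG
        self.round = 0
        self.done = False
        # Hypothetical initial observation: the opponent opens above target price.
        return {"task": task_id, "opponent_offer": self.cfg["target_price"] * 1.5}

    def step(self, action: dict):
        self.round += 1
        accepted = action.get("type") == "accept"
        self.done = accepted or self.round >= self.cfg["max_steps"]
        reward = 1.0 if accepted else 0.0  # placeholder; the real grader is in graders.py
        return {"round": self.round}, reward, self.done, {}

    def state(self) -> dict:
        return {"round": self.round, "done": self.done}
```

Because every source of randomness is derived from the `seed` passed to `reset`, replaying the same task with the same seed and actions reproduces the same trajectory.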
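The Section 3 requirement (deterministic, language-sensitive, no LLM) can be met with seeded keyword matching. A minimal sketch, where the keyword lists and concession rates are illustrative assumptions:

```python
import random

# Hypothetical keyword sets; the real lists belong in procure_rl/opponent.py.
POLITE = {"please", "partnership", "appreciate"}
AGGRESSIVE = {"final", "ultimatum", "take it or leave it"}

class ScriptedOpponent:
    def __init__(self, seed: int, ask: float = 150.0):
        self.rng = random.Random(seed)  # seeded: same seed + messages → same offers
        self.ask = ask

    def respond(self, message: str) -> float:
        text = message.lower()
        concession = 0.05                   # base concession per round
        if any(k in text for k in POLITE):
            concession += 0.05              # polite language earns a larger concession
        if any(k in text for k in AGGRESSIVE):
            concession -= 0.04              # threats harden the opponent's position
        self.ask *= (1.0 - concession)
        return round(self.ask, 2)
```

This is what "language-sensitive behavior via keyword detection" buys: the agent's wording genuinely matters, yet the mapping from text to behavior is a pure function of the message and the seed.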
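A grader satisfying Section 4 can be a pure function that blends deal quality with round efficiency and clamps the result to [0.0, 1.0]. The 0.8/0.2 weighting below is an illustrative assumption:

```python
from typing import Optional

def grade(deal_price: Optional[float], target_price: float, max_price: float,
          rounds_used: int, max_rounds: int) -> float:
    """Deterministic reward in [0.0, 1.0]: deal quality plus efficiency."""
    if deal_price is None:          # no deal reached
        return 0.0
    # Deal quality: 1.0 at the target price, 0.0 at the worst acceptable price.
    quality = (max_price - deal_price) / (max_price - target_price)
    quality = max(0.0, min(1.0, quality))
    # Efficiency: fewer rounds → higher score.
    efficiency = 1.0 - (rounds_used - 1) / max_rounds
    efficiency = max(0.0, min(1.0, efficiency))
    return round(0.8 * quality + 0.2 * efficiency, 4)
```

Keeping the grader a pure function of the episode outcome (no RNG, no LLM) is what makes every score replayable and auditable.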
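The Section 7 loop might be structured as below. The payload shapes are illustrative assumptions; `call_env` and `call_llm` are hypothetical injected callables so the loop logic is testable, while the `[START]/[STEP]/[END]` lines match the required format:

```python
def run_episode(call_env, call_llm, task_id="single_issue", seed=0):
    # call_env(path, payload) -> dict: POSTs to the environment server.
    # call_llm(prompt) -> str: asks the model for the next negotiation message.
    obs = call_env("/reset", {"task_id": task_id, "seed": seed})
    print(f"[START] {obs}")
    done, reward = False, 0.0
    while not done:
        text = call_llm(f"Negotiate. Observation: {obs}")
        result = call_env("/step", {"action": {"type": "message", "text": text}})
        obs, reward, done = result["observation"], result["reward"], result["done"]
        print(f"[STEP] {obs} reward={reward}")
    print(f"[END] score={reward}")
    return reward
```

In `inference.py` itself, `call_env` would wrap HTTP POSTs to `API_BASE_URL`, and `call_llm` would wrap the OpenAI client configured from `MODEL_NAME` and `HF_TOKEN`; keeping them injectable lets you verify the exact output format without a live server or model.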