---
title: Compiler Pass Ordering RL Environment
emoji: ⚙️
colorFrom: gray
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# Compiler Pass Ordering RL Environment

An OpenEnv environment that simulates **compiler optimization pass ordering** — a real-world task performed by production compilers (GCC, LLVM) every time they build software. The agent must select a sequence of optimization passes to apply to a program's Intermediate Representation (IR) to minimize estimated runtime cost.

This is the same class of problem addressed by Google's MLGO work (Trofin et al., 2021), which showed that learned policies can outperform the hand-tuned heuristics used in production compilers today.

---

## Motivation

When a compiler optimizes code, it applies a sequence of transformations called **passes**: dead code elimination, loop unrolling, vectorization, function inlining, and so on. LLVM alone ships dozens of such passes. The order in which they are applied matters enormously: applying pass A before B can unlock optimizations that B before A cannot.

**Why RL is necessary here (not greedy or heuristics):**

- Greedy selection (always pick the pass with highest immediate benefit) achieves ~20% cost reduction.
- An RL agent that learns prerequisite chains (e.g., `alias_analysis → vectorization` gives 7x effectiveness) achieves ~42-50%.
- The search space is combinatorial (roughly 10^10 distinct length-10 orderings of the 15 passes), and the reward for an "enabling" pass only materializes 3-4 steps later — exactly the credit assignment problem RL is built for.

---

## Environment Design

### Prerequisite Gates

The core mechanism that makes RL necessary. Each high-value pass has prerequisites:

| Pass | Prerequisites Required | Synergy Multiplier | Without Prerequisites |
|---|---|---|---|
| vectorization | alias_analysis + dead_code_elimination | 7.0x | 0.3x (penalty) |
| function_inlining | interprocedural_analysis | 5.0x | 0.3x |
| memory_coalescing | alias_analysis | 3.5x | 0.3x |
| strength_reduction | alias_analysis + constant_folding | 3.0x | 0.3x |
| instruction_scheduling | register_allocation | 2.5x | 0.3x |
| loop_unrolling | loop_invariant_motion | 2.0x | 0.3x |

A greedy agent never applies `alias_analysis` (base effect = 0.01) and so never unlocks `vectorization`. An RL agent learns to sacrifice early reward to enable these chains.
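The gate mechanic above can be sketched in a few lines. This is an illustrative model, not the environment's actual implementation: the numbers come from the tables in this README, but the function and dictionary names are hypothetical.

```python
# Illustrative sketch of the prerequisite-gate mechanic (not the env's real code).
# Base effects and multipliers are copied from the tables above.
BASE_EFFECT = {"alias_analysis": 0.01, "dead_code_elimination": 0.08, "vectorization": 0.02}
PREREQS = {"vectorization": {"alias_analysis", "dead_code_elimination"}}
SYNERGY = {"vectorization": 7.0}
NO_PREREQ_PENALTY = 0.3

def effective_reduction(pass_name, applied):
    """Fractional cost reduction a pass achieves given the set of passes applied so far."""
    base = BASE_EFFECT[pass_name]
    prereqs = PREREQS.get(pass_name, set())
    if prereqs and not prereqs <= applied:
        return base * NO_PREREQ_PENALTY      # fired without its prerequisites
    return base * SYNERGY.get(pass_name, 1.0)

# Greedy picks the highest immediate base effect and never touches alias_analysis:
greedy_first = max(BASE_EFFECT, key=BASE_EFFECT.get)
# A planner that accepts alias_analysis's 0.01 unlocks 7x vectorization:
unlocked = effective_reduction("vectorization", {"alias_analysis", "dead_code_elimination"})
gated = effective_reduction("vectorization", set())
```

Under this model, gated vectorization yields 0.02 x 0.3 = 0.006, while the unlocked chain yields 0.02 x 7.0 = 0.14 — a 23x difference that only planning ahead can capture.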

---

## Action Space

```python
class CompilerOptAction(Action):
    pass_id: int   # Integer in [0, 14] — which pass to apply
    task_id: int   # 1 = easy, 2 = medium, 3 = hard
```

**Available passes:**

| ID | Pass Name | Base Effect | Role |
|---|---|---|---|
| 0 | dead_code_elimination | 0.08 | Enabler for vectorization |
| 1 | constant_folding | 0.06 | Enabler for strength_reduction |
| 2 | loop_unrolling | 0.10 | High value, needs LIM first |
| 3 | function_inlining | 0.03 | High potential, needs interproc |
| 4 | vectorization | 0.02 | Highest potential (7x with prereqs) |
| 5 | loop_invariant_motion | 0.06 | Enabler for loop_unrolling |
| 6 | strength_reduction | 0.05 | Good with alias + const_fold |
| 7 | common_subexpr_elimination | 0.07 | Standalone value |
| 8 | tail_call_optimization | 0.04 | Standalone value |
| 9 | branch_prediction_hints | 0.03 | Standalone value |
| 10 | register_allocation | 0.05 | Enabler for instr_scheduling |
| 11 | instruction_scheduling | 0.04 | Needs reg_alloc first |
| 12 | memory_coalescing | 0.06 | Needs alias first |
| 13 | alias_analysis | 0.01 | **Key enabler** — low base, high unlock |
| 14 | interprocedural_analysis | 0.01 | **Key enabler** — low base, high unlock |

---

## Observation Space

```python
class CompilerOptObservation(Observation):
    # Cost tracking
    estimated_cost: float      # Current runtime cost estimate
    baseline_cost: float       # Cost before any optimization

    # Program IR features (static per episode)
    num_instructions: int
    num_loops: int
    num_branches: int
    num_functions: int
    loop_depth: int
    program_type: str          # e.g. "vectorizable", "loop_heavy"

    # Episode progress
    passes_applied: List[int]  # Ordered history of applied passes
    passes_available: List[int]
    step_count: int
    max_steps: int             # 10

    # Synergy state
    synergy_state: List[float] # Per-pass current effectiveness multiplier

    # Task info
    task_id: int
    task_description: str

    # Results
    done: bool
    reward: float
    improvement_pct: float     # Total % cost reduction from baseline
    last_pass_name: str
    grader_score: float        # 0.0–1.0, populated when done=True
```
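As a sketch of how an agent might consume this observation, the helper below scores each available pass by base effect times its current synergy multiplier. The field names mirror the schema above; `BASE_EFFECTS` is copied from the pass table, and `pick_next_pass` itself is a hypothetical helper, not part of the environment's API.

```python
# Hypothetical observation consumer: score available passes by their
# current effectiveness (base effect x synergy multiplier).
# BASE_EFFECTS[i] is the base effect of pass id i, from the pass table above.
BASE_EFFECTS = [0.08, 0.06, 0.10, 0.03, 0.02, 0.06, 0.05, 0.07,
                0.04, 0.03, 0.05, 0.04, 0.06, 0.01, 0.01]

def pick_next_pass(passes_available, synergy_state):
    """Return the available pass id with the highest current effectiveness."""
    return max(passes_available, key=lambda p: BASE_EFFECTS[p] * synergy_state[p])
```

Note that this is exactly the myopic baseline the environment is designed to defeat: with a fresh `synergy_state` it picks `loop_unrolling` (id 2), but once the prerequisites for `vectorization` are satisfied its multiplier dominates and id 4 wins.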

---

## Tasks

### Task 1 — Easy
**Goal:** Optimize a vectorizable or loop-heavy program by discovering the primary synergy chain.

**What the agent must learn:** Apply `alias_analysis` (low immediate value) then `dead_code_elimination`, then `vectorization` fires at 7x base effectiveness.

**Grader thresholds:**
- Score 1.0: ≥35% cost reduction
- Score 0.7: ≥25% cost reduction
- Score 0.4: ≥15% cost reduction
- Score 0.0: <15%

**Greedy baseline:** ~18-20% | **RL target:** ~38-45%

---

### Task 2 — Medium
**Goal:** Optimize a compute-heavy or inlining-heavy program by discovering **two independent synergy chains**.

**What the agent must learn:**
- Chain 1: `alias_analysis + DCE → vectorization` (7x)
- Chain 2: `interprocedural_analysis → function_inlining` (5x)

Both chains must be activated within 10 steps.

**Grader thresholds:**
- Score 1.0: ≥45% cost reduction
- Score 0.7: ≥35% cost reduction
- Score 0.4: ≥22% cost reduction
- Score 0.0: <22%

**Greedy baseline:** ~20-24% | **RL target:** ~42-50%

---

### Task 3 — Hard
**Goal:** Optimize any program type. Program is sampled randomly from 7 templates. The agent must generalize across program types with different optimal orderings.

**What the agent must learn:** Generalized pass ordering policy. Different programs reward different chains. A `vectorizable` program rewards chain 1; a `recursive` program rewards chain 2; a `loop_heavy` program rewards `LIM → loop_unrolling`.

**Grader thresholds:**
- Score 1.0: ≥52% cost reduction
- Score 0.7: ≥42% cost reduction
- Score 0.4: ≥28% cost reduction
- Score 0.0: <28%

**Greedy baseline:** ~19-23% | **RL target:** ~45-52%

---

## Reward Function

Reward is provided at every step (not just terminal), giving a dense signal across the full trajectory:

```
Per-step reward = marginal_improvement - 0.02 (step penalty)

marginal_improvement = (cost_before - cost_after) / baseline_cost

Terminal bonus:
  improvement >= 50%: +0.6
  improvement >= 40%: +0.4
  improvement >= 30%: +0.2
  improvement >= 20%: +0.05
  improvement <= 0%:  -0.2

Penalties:
  Re-applying an already-used pass: -0.3
  Applying a pass without its prerequisites: reward naturally low (0.3x effectiveness)
```

The step penalty encourages the agent to find efficient sequences (fewer passes to reach the same improvement), matching real compiler efficiency goals.
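The pseudocode above translates directly into Python. This is a sketch of the stated rules, not the environment's actual source; constants mirror the block above.

```python
# Sketch of the reward rules quoted above (illustrative, not the env's real code).
STEP_PENALTY = 0.02
REAPPLY_PENALTY = -0.3

def step_reward(cost_before, cost_after, baseline_cost, already_applied):
    """Per-step reward: marginal improvement minus step penalty, or re-apply penalty."""
    if already_applied:
        return REAPPLY_PENALTY
    marginal = (cost_before - cost_after) / baseline_cost
    return marginal - STEP_PENALTY

def terminal_bonus(improvement):
    """Terminal bonus; improvement is fractional, e.g. 0.42 for 42%."""
    if improvement >= 0.50:
        return 0.6
    if improvement >= 0.40:
        return 0.4
    if improvement >= 0.30:
        return 0.2
    if improvement >= 0.20:
        return 0.05
    if improvement <= 0.0:
        return -0.2
    return 0.0
```

For example, reducing cost from 100 to 90 against a baseline of 100 yields a per-step reward of 0.10 - 0.02 = 0.08.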

---

## Grader

The grader is called at episode end and returns a score in [0.0, 1.0]:

```python
score = grade_by_threshold(improvement_pct, task_id)
score -= 0.05 * max(0, steps_used - minimum_steps_needed)
```

The efficiency deduction (0.05 per extra step) rewards agents that find the optimal chain quickly over those that brute-force all 10 steps.
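Combining the per-task thresholds from the task sections with the deduction above, the grader could look like the sketch below. The name `grade_by_threshold` comes from the README's own pseudocode; the clamp at zero is an assumption not stated in this document.

```python
# Sketch of the grader: per-task threshold lookup plus the efficiency deduction.
# Thresholds are copied from the Task 1/2/3 sections; clamping at 0.0 is an assumption.
THRESHOLDS = {
    1: [(35, 1.0), (25, 0.7), (15, 0.4)],   # Task 1 (easy)
    2: [(45, 1.0), (35, 0.7), (22, 0.4)],   # Task 2 (medium)
    3: [(52, 1.0), (42, 0.7), (28, 0.4)],   # Task 3 (hard)
}

def grade_by_threshold(improvement_pct, task_id):
    """Map a % cost reduction to a base score for the given task."""
    for cutoff, score in THRESHOLDS[task_id]:
        if improvement_pct >= cutoff:
            return score
    return 0.0

def final_score(improvement_pct, task_id, steps_used, minimum_steps_needed):
    score = grade_by_threshold(improvement_pct, task_id)
    score -= 0.05 * max(0, steps_used - minimum_steps_needed)
    return max(0.0, score)   # assumed clamp
```

For instance, a 40% reduction on Task 1 in 5 steps when 3 would have sufficed scores 1.0 - 0.05 x 2 = 0.90.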

---

## Baseline Scores

Measured using `gpt-4o-mini` via the OpenAI API, 5 episodes per task:

| Task | Avg Score | Avg Improvement | Best Score |
|---|---|---|---|
| Task 1 (Easy) | ~0.42 | ~28% | ~0.70 |
| Task 2 (Medium) | ~0.28 | ~24% | ~0.40 |
| Task 3 (Hard) | ~0.21 | ~21% | ~0.40 |
| **Overall** | **~0.30** | **~24%** | — |

The LLM agent partially discovers the alias → vectorization chain (~30% of episodes) but rarely sequences both chains correctly in Tasks 2 and 3.

---

## Setup & Usage

### Prerequisites
- Python 3.10+
- Docker (for containerized deployment)

### Local Development (no Docker)

```bash
# Clone / navigate to project
cd compiler_opt_env

# Install
pip install -e .

# Start server
uvicorn server.app:app --reload --host 0.0.0.0 --port 8000

# In another terminal — run sanity test
python test_env.py

# Run LLM baseline
export OPENAI_API_KEY=your_key
python baseline_agent.py --base-url http://localhost:8000 --episodes 3
```

### Docker

```bash
# Build
docker build -t compiler-opt-env:latest -f server/Dockerfile .

# Run
docker run -p 8000:8000 compiler-opt-env:latest

# Web UI
open http://localhost:8000/web
```

### Programmatic Usage

```python
from compiler_opt_env import CompilerOptAction, CompilerOptEnv
from compiler_opt_env.models import TASK_EASY, TASK_MEDIUM, TASK_HARD

with CompilerOptEnv(base_url="http://localhost:8000").sync() as env:
    # Task 1: easy
    obs = env.reset()
    
    # Optimal sequence for Task 1:
    for pass_id in [13, 0, 4]:   # alias → DCE → vectorization
        result = env.step(CompilerOptAction(pass_id=pass_id, task_id=TASK_EASY))
        print(f"{result.observation.last_pass_name}: {result.observation.improvement_pct:.1f}%")
    
    print(f"Grader score: {result.observation.grader_score}")
```

---

## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/reset` | POST | Start new episode |
| `/step` | POST | Apply an action |
| `/state` | GET | Current episode metadata |
| `/schema` | GET | Action/observation schemas |
| `/health` | GET | Health check |
| `/web` | GET | Interactive web UI |
| `/docs` | GET | Swagger API documentation |
| `/ws` | WS | WebSocket for low-latency sessions |

---

## Project Structure

```
compiler_opt_env/
├── __init__.py                          # Exports: CompilerOptEnv, CompilerOptAction, CompilerOptObservation
├── models.py                            # Action, Observation, constants, task IDs
├── client.py                            # CompilerOptEnv(EnvClient) — HTTP/WebSocket client
├── baseline_agent.py                    # LLM baseline using OpenAI API
├── openenv.yaml                         # OpenEnv manifest
├── pyproject.toml                       # Dependencies
├── README.md                            # This file
└── server/
    ├── __init__.py
    ├── compiler_opt_env_environment.py  # Core RL logic: passes, synergy, grader
    ├── app.py                           # FastAPI app
    └── Dockerfile                       # Container definition
```

---

## References

- [DeepMind MLGO: A Machine Learning Guided Compiler Optimizations Framework](https://arxiv.org/abs/2101.04808)
- [Meta OpenEnv](https://github.com/meta-pytorch/OpenEnv)
- [LLVM Optimization Passes](https://llvm.org/docs/Passes.html)