---
title: Compiler Pass Ordering RL Environment
emoji: ⚙️
colorFrom: gray
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---
# Compiler Pass Ordering RL Environment
An OpenEnv environment that simulates **compiler optimization pass ordering** — a real-world task performed by production compilers (GCC, LLVM) every time they build software. The agent must select a sequence of optimization passes to apply to a program's Intermediate Representation (IR) to minimize estimated runtime cost.
This is the same class of problem addressed by Google's MLGO paper (2021), which showed that learned policies can outperform the hand-tuned heuristics used in production compilers today.
---
## Motivation
When a compiler optimizes code, it applies a sequence of transformations called **passes**: dead code elimination, loop unrolling, vectorization, function inlining, etc. There are ~50 such passes in LLVM. The order you apply them matters enormously — applying pass A before B can unlock optimizations that B before A cannot.
**Why RL is necessary here (not greedy or heuristics):**
- Greedy selection (always pick the pass with highest immediate benefit) achieves ~20% cost reduction.
- An RL agent that learns prerequisite chains (e.g., `alias_analysis → vectorization` gives 7x effectiveness) achieves ~42-50%.
- The search space is factorial (15! orderings), and the reward for an "enabling" pass only materializes 3-4 steps later — exactly the credit assignment problem RL is built for.
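To put the factorial claim in concrete terms, the number of full orderings over all 15 passes can be computed directly:

```python
import math

# Number of distinct full orderings of the 15 optimization passes.
orderings = math.factorial(15)
print(orderings)  # 1307674368000 — over a trillion; exhaustive search is hopeless
```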
---
## Environment Design
### Prerequisite Gates
The core mechanism that makes RL necessary. Each high-value pass has prerequisites:
| Pass | Prerequisites Required | Synergy Multiplier | Without Prerequisites |
|---|---|---|---|
| vectorization | alias_analysis + dead_code_elimination | 7.0x | 0.3x (penalty) |
| function_inlining | interprocedural_analysis | 5.0x | 0.3x |
| memory_coalescing | alias_analysis | 3.5x | 0.3x |
| strength_reduction | alias_analysis + constant_folding | 3.0x | 0.3x |
| instruction_scheduling | register_allocation | 2.5x | 0.3x |
| loop_unrolling | loop_invariant_motion | 2.0x | 0.3x |
A greedy agent never applies `alias_analysis` (base effect = 0.01) and so never unlocks `vectorization`. An RL agent learns to sacrifice early reward to enable these chains.
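A minimal sketch of how a prerequisite gate like this can be modeled. The base effects, multipliers, and 0.3x penalty mirror the tables in this README; the exact server-side implementation may differ in detail:

```python
# Hypothetical sketch of the prerequisite-gate mechanism (values from this README).
BASE_EFFECT = {"alias_analysis": 0.01, "dead_code_elimination": 0.08, "vectorization": 0.02}
PREREQS = {"vectorization": {"alias_analysis", "dead_code_elimination"}}
SYNERGY = {"vectorization": 7.0}
PENALTY = 0.3  # multiplier applied when prerequisites are missing

def effectiveness(pass_name, applied):
    """Effective improvement fraction for applying `pass_name` after `applied`."""
    base = BASE_EFFECT[pass_name]
    prereqs = PREREQS.get(pass_name, set())
    if prereqs and not prereqs <= set(applied):
        return base * PENALTY               # gated: 0.3x penalty
    return base * SYNERGY.get(pass_name, 1.0)  # unlocked: full synergy multiplier

# Greedy (pick by immediate value) skips alias_analysis and hits the gate:
print(round(effectiveness("vectorization", []), 4))                                      # 0.006
print(round(effectiveness("vectorization", ["alias_analysis", "dead_code_elimination"]), 4))  # 0.14
```

This is why the credit assignment is delayed: `alias_analysis` itself is nearly worthless (0.01), but it multiplies a later pass's value by 7x.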
---
## Action Space
```python
class CompilerOptAction(Action):
    pass_id: int  # Integer in [0, 14] — which pass to apply
    task_id: int  # 1 = easy, 2 = medium, 3 = hard
```
**Available passes:**
| ID | Pass Name | Base Effect | Role |
|---|---|---|---|
| 0 | dead_code_elimination | 0.08 | Enabler for vectorization |
| 1 | constant_folding | 0.06 | Enabler for strength_reduction |
| 2 | loop_unrolling | 0.10 | High value, needs LIM first |
| 3 | function_inlining | 0.03 | High potential, needs interproc |
| 4 | vectorization | 0.02 | Highest potential (7x with prereqs) |
| 5 | loop_invariant_motion | 0.06 | Enabler for loop_unrolling |
| 6 | strength_reduction | 0.05 | Good with alias + const_fold |
| 7 | common_subexpr_elimination | 0.07 | Standalone value |
| 8 | tail_call_optimization | 0.04 | Standalone value |
| 9 | branch_prediction_hints | 0.03 | Standalone value |
| 10 | register_allocation | 0.05 | Enabler for instr_scheduling |
| 11 | instruction_scheduling | 0.04 | Needs reg_alloc first |
| 12 | memory_coalescing | 0.06 | Needs alias first |
| 13 | alias_analysis | 0.01 | **Key enabler** — low base, high unlock |
| 14 | interprocedural_analysis | 0.01 | **Key enabler** — low base, high unlock |
---
## Observation Space
```python
class CompilerOptObservation(Observation):
    # Cost tracking
    estimated_cost: float      # Current runtime cost estimate
    baseline_cost: float       # Cost before any optimization

    # Program IR features (static per episode)
    num_instructions: int
    num_loops: int
    num_branches: int
    num_functions: int
    loop_depth: int
    program_type: str          # e.g. "vectorizable", "loop_heavy"

    # Episode progress
    passes_applied: List[int]  # Ordered history of applied passes
    passes_available: List[int]
    step_count: int
    max_steps: int             # 10

    # Synergy state
    synergy_state: List[float] # Per-pass current effectiveness multiplier

    # Task info
    task_id: int
    task_description: str

    # Results
    done: bool
    reward: float
    improvement_pct: float     # Total % cost reduction from baseline
    last_pass_name: str
    grader_score: float        # 0.0–1.0, populated when done=True
```
---
## Tasks
### Task 1 — Easy
**Goal:** Optimize a vectorizable or loop-heavy program by discovering the primary synergy chain.
**What the agent must learn:** Apply `alias_analysis` (low immediate value), then `dead_code_elimination`; `vectorization` then fires at 7x its base effectiveness.
**Grader thresholds:**
- Score 1.0: ≥35% cost reduction
- Score 0.7: ≥25% cost reduction
- Score 0.4: ≥15% cost reduction
- Score 0.0: <15%
**Greedy baseline:** ~18-20% | **RL target:** ~38-45%
---
### Task 2 — Medium
**Goal:** Optimize a compute-heavy or inlining-heavy program by discovering **two independent synergy chains**.
**What the agent must learn:**
- Chain 1: `alias_analysis + DCE → vectorization` (7x)
- Chain 2: `interprocedural_analysis → function_inlining` (5x)
Both chains must be activated within 10 steps.
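One valid interleaving of the two chains, using the pass IDs from the table above. Whether this particular order is optimal is an assumption; the environment only requires that each enabler precedes the pass it gates:

```python
# Hypothetical Task 2 sequence interleaving both chains (IDs from the pass table).
# Enablers (13 = alias_analysis, 14 = interprocedural_analysis) must precede
# the passes they unlock (4 = vectorization, 3 = function_inlining).
sequence = [13, 0, 14, 4, 3]  # alias → DCE → interproc → vectorization → inlining

DEPENDS_ON = {4: [13, 0], 3: [14]}  # dependent pass -> required enablers

def prereqs_satisfied(seq):
    """True if every enabler appears before the pass it gates."""
    return all(seq.index(p) < seq.index(d)
               for d, prereqs in DEPENDS_ON.items()
               for p in prereqs)

print(prereqs_satisfied(sequence))  # True — both chains fire within 5 of the 10 steps
```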
**Grader thresholds:**
- Score 1.0: ≥45% cost reduction
- Score 0.7: ≥35% cost reduction
- Score 0.4: ≥22% cost reduction
- Score 0.0: <22%
**Greedy baseline:** ~20-24% | **RL target:** ~42-50%
---
### Task 3 — Hard
**Goal:** Optimize any program type. Program is sampled randomly from 7 templates. The agent must generalize across program types with different optimal orderings.
**What the agent must learn:** Generalized pass ordering policy. Different programs reward different chains. A `vectorizable` program rewards chain 1; a `recursive` program rewards chain 2; a `loop_heavy` program rewards `LIM → loop_unrolling`.
**Grader thresholds:**
- Score 1.0: ≥52% cost reduction
- Score 0.7: ≥42% cost reduction
- Score 0.4: ≥28% cost reduction
- Score 0.0: <28%
**Greedy baseline:** ~19-23% | **RL target:** ~45-52%
---
## Reward Function
Reward is provided at every step (not just terminal), giving a dense signal across the full trajectory:
```
Per-step reward = marginal_improvement - 0.02 (step penalty)
marginal_improvement = (cost_before - cost_after) / baseline_cost
Terminal bonus:
improvement >= 50%: +0.6
improvement >= 40%: +0.4
improvement >= 30%: +0.2
improvement >= 20%: +0.05
improvement <= 0%: -0.2
Penalties:
Re-applying an already-used pass: -0.3
Applying pass without prerequisites: reward naturally low (0.3x effectiveness)
```
The step penalty encourages the agent to find efficient sequences (fewer passes to reach the same improvement), matching real compiler efficiency goals.
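The reward rule above can be sketched as follows. This is a reconstruction from the documented formula, not the server's actual code, and details like repeat detection are assumptions:

```python
# Sketch of the documented reward rule (values from this README).
STEP_PENALTY = 0.02
REPEAT_PENALTY = -0.3

def step_reward(cost_before, cost_after, baseline_cost, already_applied=False):
    """Dense per-step reward: marginal improvement minus a small step cost."""
    if already_applied:
        return REPEAT_PENALTY  # re-applying a used pass is penalized outright
    marginal = (cost_before - cost_after) / baseline_cost
    return marginal - STEP_PENALTY

def terminal_bonus(improvement):
    """End-of-episode bonus; `improvement` is a fraction of baseline cost."""
    if improvement >= 0.50: return 0.6
    if improvement >= 0.40: return 0.4
    if improvement >= 0.30: return 0.2
    if improvement >= 0.20: return 0.05
    if improvement <= 0.0:  return -0.2
    return 0.0

print(round(step_reward(100.0, 86.0, 100.0), 2))  # 0.12 (0.14 improvement - 0.02 penalty)
print(terminal_bonus(0.45))                        # 0.4
```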
---
## Grader
The grader is called at episode end and returns a score in [0.0, 1.0]:
```python
score = grade_by_threshold(improvement_pct, task_id)
score -= 0.05 * max(0, steps_used - minimum_steps_needed)
```
The efficiency deduction (0.05 per extra step) rewards agents that find the optimal chain quickly over those that brute-force all 10 steps.
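A sketch of how `grade_by_threshold` and the efficiency deduction fit together, reconstructed from the per-task threshold tables above. The threshold dictionary and the clamp to [0, 1] are assumptions about the server-side grader:

```python
# Hypothetical grader sketch; thresholds copied from the task tables in this README.
THRESHOLDS = {1: [(35, 1.0), (25, 0.7), (15, 0.4)],
              2: [(45, 1.0), (35, 0.7), (22, 0.4)],
              3: [(52, 1.0), (42, 0.7), (28, 0.4)]}

def grade_by_threshold(improvement_pct, task_id):
    """Map percent cost reduction to a score via the task's threshold table."""
    for cutoff, score in THRESHOLDS[task_id]:
        if improvement_pct >= cutoff:
            return score
    return 0.0

def final_score(improvement_pct, task_id, steps_used, minimum_steps_needed):
    score = grade_by_threshold(improvement_pct, task_id)
    score -= 0.05 * max(0, steps_used - minimum_steps_needed)
    return max(0.0, score)  # assumption: score is clamped to [0, 1]

print(grade_by_threshold(38.0, 1))  # 1.0
print(final_score(38.0, 1, 5, 3))   # 0.9 (two extra steps cost 0.1)
```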
---
## Baseline Scores
Measured using `gpt-4o-mini` via the OpenAI API, 5 episodes per task:
| Task | Avg Score | Avg Improvement | Best Score |
|---|---|---|---|
| Task 1 (Easy) | ~0.42 | ~28% | ~0.70 |
| Task 2 (Medium) | ~0.28 | ~24% | ~0.40 |
| Task 3 (Hard) | ~0.21 | ~21% | ~0.40 |
| **Overall** | **~0.30** | **~24%** | — |
The LLM agent partially discovers the alias → vectorization chain (~30% of episodes) but rarely sequences both chains correctly in Task 2/3.
---
## Setup & Usage
### Prerequisites
- Python 3.10+
- Docker (for containerized deployment)
### Local Development (no Docker)
```bash
# Clone / navigate to project
cd compiler_opt_env
# Install
pip install -e .
# Start server
uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
# In another terminal — run sanity test
python test_env.py
# Run LLM baseline
export OPENAI_API_KEY=your_key
python baseline_agent.py --base-url http://localhost:8000 --episodes 3
```
### Docker
```bash
# Build
docker build -t compiler-opt-env:latest -f server/Dockerfile .
# Run
docker run -p 8000:8000 compiler-opt-env:latest
# Web UI
open http://localhost:8000/web
```
### Programmatic Usage
```python
from compiler_opt_env import CompilerOptAction, CompilerOptEnv
from compiler_opt_env.models import TASK_EASY, TASK_MEDIUM, TASK_HARD
with CompilerOptEnv(base_url="http://localhost:8000").sync() as env:
    # Task 1: easy
    obs = env.reset()
    # Optimal sequence for Task 1: alias → DCE → vectorization
    for pass_id in [13, 0, 4]:
        result = env.step(CompilerOptAction(pass_id=pass_id, task_id=TASK_EASY))
        print(f"{result.observation.last_pass_name}: {result.observation.improvement_pct:.1f}%")
    print(f"Grader score: {result.observation.grader_score}")
```
---
## API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/reset` | POST | Start new episode |
| `/step` | POST | Apply an action |
| `/state` | GET | Current episode metadata |
| `/schema` | GET | Action/observation schemas |
| `/health` | GET | Health check |
| `/web` | GET | Interactive web UI |
| `/docs` | GET | Swagger API documentation |
| `/ws` | WS | WebSocket for low-latency sessions |
---
## Project Structure
```
compiler_opt_env/
├── __init__.py # Exports: CompilerOptEnv, CompilerOptAction, CompilerOptObservation
├── models.py # Action, Observation, constants, task IDs
├── client.py # CompilerOptEnv(EnvClient) — HTTP/WebSocket client
├── baseline_agent.py # LLM baseline using OpenAI API
├── openenv.yaml # OpenEnv manifest
├── pyproject.toml # Dependencies
├── README.md # This file
└── server/
├── __init__.py
├── compiler_opt_env_environment.py # Core RL logic: passes, synergy, grader
├── app.py # FastAPI app
└── Dockerfile # Container definition
```
---
## References
- [MLGO: A Machine Learning Guided Compiler Optimizations Framework (Google, 2021)](https://arxiv.org/abs/2101.04808)
- [Meta OpenEnv](https://github.com/meta-pytorch/OpenEnv)
- [LLVM Optimization Passes](https://llvm.org/docs/Passes.html)