Spaces:
Sleeping
Sleeping
| title: Compiler Pass Ordering RL Environment | |
| emoji: ⚙️ | |
| colorFrom: gray | |
| colorTo: blue | |
| sdk: docker | |
| pinned: false | |
| app_port: 8000 | |
| base_path: /web | |
| tags: | |
| - openenv | |
| # Compiler Pass Ordering RL Environment | |
| An OpenEnv environment that simulates **compiler optimization pass ordering** — a real-world task performed by production compilers (GCC, LLVM) every time they build software. The agent must select a sequence of optimization passes to apply to a program's Intermediate Representation (IR) to minimize estimated runtime cost. | |
| This is the same class of problem addressed by DeepMind's MLGO paper (2022), which showed RL finds better pass orderings than the hand-tuned heuristics used in production compilers today. | |
| --- | |
| ## Motivation | |
| When a compiler optimizes code, it applies a sequence of transformations called **passes**: dead code elimination, loop unrolling, vectorization, function inlining, etc. There are ~50 such passes in LLVM. The order you apply them matters enormously — applying pass A before B can unlock optimizations that B before A cannot. | |
| **Why RL is necessary here (not greedy or heuristics):** | |
| - Greedy selection (always pick the pass with highest immediate benefit) achieves ~20% cost reduction. | |
| - An RL agent that learns prerequisite chains (e.g., `alias_analysis → vectorization` gives 7x effectiveness) achieves ~42-50%. | |
| - The search space is factorial (15! orderings), and the reward for an "enabling" pass only materializes 3-4 steps later — exactly the credit assignment problem RL is built for. | |
| --- | |
| ## Environment Design | |
| ### Prerequisite Gates | |
| The core mechanism that makes RL necessary. Each high-value pass has prerequisites: | |
| | Pass | Prerequisites Required | Synergy Multiplier | Without Prerequisites | | |
| |---|---|---|---| | |
| | vectorization | alias_analysis + dead_code_elimination | 7.0x | 0.3x (penalty) | | |
| | function_inlining | interprocedural_analysis | 5.0x | 0.3x | | |
| | memory_coalescing | alias_analysis | 3.5x | 0.3x | | |
| | strength_reduction | alias_analysis + constant_folding | 3.0x | 0.3x | | |
| | instruction_scheduling | register_allocation | 2.5x | 0.3x | | |
| | loop_unrolling | loop_invariant_motion | 2.0x | 0.3x | | |
| A greedy agent never applies `alias_analysis` (base effect = 0.01) and so never unlocks `vectorization`. An RL agent learns to sacrifice early reward to enable these chains. | |
| --- | |
| ## Action Space | |
| ```python | |
| class CompilerOptAction(Action): | |
| pass_id: int # Integer in [0, 14] — which pass to apply | |
| task_id: int # 1 = easy, 2 = medium, 3 = hard | |
| ``` | |
| **Available passes:** | |
| | ID | Pass Name | Base Effect | Role | | |
| |---|---|---|---| | |
| | 0 | dead_code_elimination | 0.08 | Enabler for vectorization | | |
| | 1 | constant_folding | 0.06 | Enabler for strength_reduction | | |
| | 2 | loop_unrolling | 0.10 | High value, needs LIM first | | |
| | 3 | function_inlining | 0.03 | High potential, needs interproc | | |
| | 4 | vectorization | 0.02 | Highest potential (7x with prereqs) | | |
| | 5 | loop_invariant_motion | 0.06 | Enabler for loop_unrolling | | |
| | 6 | strength_reduction | 0.05 | Good with alias + const_fold | | |
| | 7 | common_subexpr_elimination | 0.07 | Standalone value | | |
| | 8 | tail_call_optimization | 0.04 | Standalone value | | |
| | 9 | branch_prediction_hints | 0.03 | Standalone value | | |
| | 10 | register_allocation | 0.05 | Enabler for instr_scheduling | | |
| | 11 | instruction_scheduling | 0.04 | Needs reg_alloc first | | |
| | 12 | memory_coalescing | 0.06 | Needs alias first | | |
| | 13 | alias_analysis | 0.01 | **Key enabler** — low base, high unlock | | |
| | 14 | interprocedural_analysis | 0.01 | **Key enabler** — low base, high unlock | | |
| --- | |
| ## Observation Space | |
| ```python | |
| class CompilerOptObservation(Observation): | |
| # Cost tracking | |
| estimated_cost: float # Current runtime cost estimate | |
| baseline_cost: float # Cost before any optimization | |
| # Program IR features (static per episode) | |
| num_instructions: int | |
| num_loops: int | |
| num_branches: int | |
| num_functions: int | |
| loop_depth: int | |
| program_type: str # e.g. "vectorizable", "loop_heavy" | |
| # Episode progress | |
| passes_applied: List[int] # Ordered history of applied passes | |
| passes_available: List[int] | |
| step_count: int | |
| max_steps: int # 10 | |
| # Synergy state | |
| synergy_state: List[float] # Per-pass current effectiveness multiplier | |
| # Task info | |
| task_id: int | |
| task_description: str | |
| # Results | |
| done: bool | |
| reward: float | |
| improvement_pct: float # Total % cost reduction from baseline | |
| last_pass_name: str | |
| grader_score: float # 0.0–1.0, populated when done=True | |
| ``` | |
| --- | |
| ## Tasks | |
| ### Task 1 — Easy | |
| **Goal:** Optimize a vectorizable or loop-heavy program by discovering the primary synergy chain. | |
| **What the agent must learn:** Apply `alias_analysis` (low immediate value) then `dead_code_elimination`, then `vectorization` fires at 7x base effectiveness. | |
| **Grader thresholds:** | |
| - Score 1.0: ≥35% cost reduction | |
| - Score 0.7: ≥25% cost reduction | |
| - Score 0.4: ≥15% cost reduction | |
| - Score 0.0: <15% | |
| **Greedy baseline:** ~18-20% | **RL target:** ~38-45% | |
| --- | |
| ### Task 2 — Medium | |
| **Goal:** Optimize a compute-heavy or inlining-heavy program by discovering **two independent synergy chains**. | |
| **What the agent must learn:** | |
| - Chain 1: `alias_analysis + DCE → vectorization` (7x) | |
| - Chain 2: `interprocedural_analysis → function_inlining` (5x) | |
| Both chains must be activated within 10 steps. | |
| **Grader thresholds:** | |
| - Score 1.0: ≥45% cost reduction | |
| - Score 0.7: ≥35% cost reduction | |
| - Score 0.4: ≥22% cost reduction | |
| - Score 0.0: <22% | |
| **Greedy baseline:** ~20-24% | **RL target:** ~42-50% | |
| --- | |
| ### Task 3 — Hard | |
| **Goal:** Optimize any program type. Program is sampled randomly from 7 templates. The agent must generalize across program types with different optimal orderings. | |
| **What the agent must learn:** Generalized pass ordering policy. Different programs reward different chains. A `vectorizable` program rewards chain 1; a `recursive` program rewards chain 2; a `loop_heavy` program rewards `LIM → loop_unrolling`. | |
| **Grader thresholds:** | |
| - Score 1.0: ≥52% cost reduction | |
| - Score 0.7: ≥42% cost reduction | |
| - Score 0.4: ≥28% cost reduction | |
| - Score 0.0: <28% | |
| **Greedy baseline:** ~19-23% | **RL target:** ~45-52% | |
| --- | |
| ## Reward Function | |
| Reward is provided at every step (not just terminal), giving a dense signal across the full trajectory: | |
| ``` | |
| Per-step reward = marginal_improvement - 0.02 (step penalty) | |
| marginal_improvement = (cost_before - cost_after) / baseline_cost | |
| Terminal bonus: | |
| improvement >= 50%: +0.6 | |
| improvement >= 40%: +0.4 | |
| improvement >= 30%: +0.2 | |
| improvement >= 20%: +0.05 | |
| improvement <= 0%: -0.2 | |
| Penalties: | |
| Re-applying an already-used pass: -0.3 | |
| Applying pass without prerequisites: reward naturally low (0.3x effectiveness) | |
| ``` | |
| The step penalty encourages the agent to find efficient sequences (fewer passes to reach the same improvement), matching real compiler efficiency goals. | |
| --- | |
| ## Grader | |
| The grader is called at episode end and returns a score in [0.0, 1.0]: | |
| ```python | |
| score = grade_by_threshold(improvement_pct, task_id) | |
| score -= 0.05 * max(0, steps_used - minimum_steps_needed) | |
| ``` | |
| The efficiency deduction (0.05 per extra step) rewards agents that find the optimal chain quickly over those that brute-force all 10 steps. | |
| --- | |
| ## Baseline Scores | |
| Measured using `gpt-4o-mini` via the OpenAI API, 5 episodes per task: | |
| | Task | Avg Score | Avg Improvement | Best Score | | |
| |---|---|---|---| | |
| | Task 1 (Easy) | ~0.42 | ~28% | ~0.70 | | |
| | Task 2 (Medium) | ~0.28 | ~24% | ~0.40 | | |
| | Task 3 (Hard) | ~0.21 | ~21% | ~0.40 | | |
| | **Overall** | **~0.30** | **~24%** | — | | |
| The LLM agent partially discovers the alias → vectorization chain (~30% of episodes) but rarely sequences both chains correctly in Task 2/3. | |
| --- | |
| ## Setup & Usage | |
| ### Prerequisites | |
| - Python 3.10+ | |
| - Docker (for containerized deployment) | |
| ### Local Development (no Docker) | |
| ```bash | |
| # Clone / navigate to project | |
| cd compiler_opt_env | |
| # Install | |
| pip install -e . | |
| # Start server | |
| uvicorn server.app:app --reload --host 0.0.0.0 --port 8000 | |
| # In another terminal — run sanity test | |
| python test_env.py | |
| # Run LLM baseline | |
| export OPENAI_API_KEY=your_key | |
| python baseline_agent.py --base-url http://localhost:8000 --episodes 3 | |
| ``` | |
| ### Docker | |
| ```bash | |
| # Build | |
| docker build -t compiler-opt-env:latest -f server/Dockerfile . | |
| # Run | |
| docker run -p 8000:8000 compiler-opt-env:latest | |
| # Web UI | |
| open http://localhost:8000/web | |
| ``` | |
| ### Programmatic Usage | |
| ```python | |
| from compiler_opt_env import CompilerOptAction, CompilerOptEnv | |
| from compiler_opt_env.models import TASK_EASY, TASK_MEDIUM, TASK_HARD | |
| with CompilerOptEnv(base_url="http://localhost:8000").sync() as env: | |
| # Task 1: easy | |
| obs = env.reset() | |
| # Optimal sequence for Task 1: | |
| for pass_id in [13, 0, 4]: # alias → DCE → vectorization | |
| result = env.step(CompilerOptAction(pass_id=pass_id, task_id=TASK_EASY)) | |
| print(f"{result.observation.last_pass_name}: {result.observation.improvement_pct:.1f}%") | |
| print(f"Grader score: {result.observation.grader_score}") | |
| ``` | |
| --- | |
| ## API Endpoints | |
| | Endpoint | Method | Description | | |
| |---|---|---| | |
| | `/reset` | POST | Start new episode | | |
| | `/step` | POST | Apply an action | | |
| | `/state` | GET | Current episode metadata | | |
| | `/schema` | GET | Action/observation schemas | | |
| | `/health` | GET | Health check | | |
| | `/web` | GET | Interactive web UI | | |
| | `/docs` | GET | Swagger API documentation | | |
| | `/ws` | WS | WebSocket for low-latency sessions | | |
| --- | |
| ## Project Structure | |
| ``` | |
| compiler_opt_env/ | |
| ├── __init__.py # Exports: CompilerOptEnv, CompilerOptAction, CompilerOptObservation | |
| ├── models.py # Action, Observation, constants, task IDs | |
| ├── client.py # CompilerOptEnv(EnvClient) — HTTP/WebSocket client | |
| ├── baseline_agent.py # LLM baseline using OpenAI API | |
| ├── openenv.yaml # OpenEnv manifest | |
| ├── pyproject.toml # Dependencies | |
| ├── README.md # This file | |
| └── server/ | |
| ├── __init__.py | |
| ├── compiler_opt_env_environment.py # Core RL logic: passes, synergy, grader | |
| ├── app.py # FastAPI app | |
| └── Dockerfile # Container definition | |
| ``` | |
| --- | |
| ## References | |
| - [DeepMind MLGO: A Machine Learning Guided Compiler Optimizations Framework](https://arxiv.org/abs/2101.04808) | |
| - [Meta OpenEnv](https://github.com/meta-pytorch/OpenEnv) | |
| - [LLVM Optimization Passes](https://llvm.org/docs/Passes.html) | |