---
title: Compiler Pass Ordering RL Environment
emoji: ⚙️
colorFrom: gray
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---

# Compiler Pass Ordering RL Environment

An OpenEnv environment that simulates compiler optimization pass ordering — a real-world task performed by production compilers (GCC, LLVM) every time they build software. The agent must select a sequence of optimization passes to apply to a program's Intermediate Representation (IR) to minimize estimated runtime cost.

This is the same class of problem addressed by Google's MLGO work (Trofin et al., 2021), which showed that RL-trained policies can outperform the hand-tuned heuristics used in production compilers today.


## Motivation

When a compiler optimizes code, it applies a sequence of transformations called passes: dead code elimination, loop unrolling, vectorization, function inlining, etc. There are ~50 such passes in LLVM. The order you apply them matters enormously — applying pass A before B can unlock optimizations that B before A cannot.

Why RL is needed here, rather than greedy selection or hand-tuned heuristics:

- Greedy selection (always pick the pass with the highest immediate benefit) achieves ~20% cost reduction.
- An RL agent that learns prerequisite chains (e.g., alias_analysis → vectorization gives 7x effectiveness) achieves ~42-50%.
- The search space is factorial (15! ≈ 1.3 trillion orderings), and the reward for an "enabling" pass only materializes 3-4 steps later: exactly the credit-assignment problem RL is built for.

## Environment Design

### Prerequisite Gates

Prerequisite gates are the core mechanism that makes RL necessary: each high-value pass has prerequisites.

| Pass | Prerequisites Required | Synergy Multiplier | Without Prerequisites |
|------|------------------------|--------------------|-----------------------|
| vectorization | alias_analysis + dead_code_elimination | 7.0x | 0.3x (penalty) |
| function_inlining | interprocedural_analysis | 5.0x | 0.3x |
| memory_coalescing | alias_analysis | 3.5x | 0.3x |
| strength_reduction | alias_analysis + constant_folding | 3.0x | 0.3x |
| instruction_scheduling | register_allocation | 2.5x | 0.3x |
| loop_unrolling | loop_invariant_motion | 2.0x | 0.3x |

A greedy agent never applies alias_analysis (base effect = 0.01) and so never unlocks vectorization. An RL agent learns to sacrifice early reward to enable these chains.
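
The gating logic is simple to state. Here is a minimal sketch, assuming the multipliers are applied exactly as tabulated above (the names PREREQS, SYNERGY, and effectiveness are illustrative, not the server's actual internals):

```python
# Illustrative sketch of the prerequisite gate; values mirror the table above.
PREREQS = {
    "vectorization": {"alias_analysis", "dead_code_elimination"},
    "function_inlining": {"interprocedural_analysis"},
    "memory_coalescing": {"alias_analysis"},
    "strength_reduction": {"alias_analysis", "constant_folding"},
    "instruction_scheduling": {"register_allocation"},
    "loop_unrolling": {"loop_invariant_motion"},
}
SYNERGY = {
    "vectorization": 7.0, "function_inlining": 5.0, "memory_coalescing": 3.5,
    "strength_reduction": 3.0, "instruction_scheduling": 2.5, "loop_unrolling": 2.0,
}

def effectiveness(pass_name: str, applied: set) -> float:
    """Multiplier on a pass's base effect, given the passes applied so far."""
    if pass_name not in PREREQS:
        return 1.0                     # standalone pass: no gate
    if PREREQS[pass_name] <= applied:  # subset test: all prerequisites applied
        return SYNERGY[pass_name]
    return 0.3                         # gate closed: penalty multiplier
```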


### Action Space

```python
class CompilerOptAction(Action):
    pass_id: int   # which pass to apply, integer in [0, 14]
    task_id: int   # 1 = easy, 2 = medium, 3 = hard
```
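
For example, selecting alias_analysis (pass 13) on the easy task (assuming TASK_EASY == 1, consistent with the task_id comment above and the models import shown under Programmatic Usage):

```python
from compiler_opt_env import CompilerOptAction
from compiler_opt_env.models import TASK_EASY

action = CompilerOptAction(pass_id=13, task_id=TASK_EASY)  # apply alias_analysis
```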

Available passes:

| ID | Pass Name | Base Effect | Role |
|----|-----------|-------------|------|
| 0 | dead_code_elimination | 0.08 | Enabler for vectorization |
| 1 | constant_folding | 0.06 | Enabler for strength_reduction |
| 2 | loop_unrolling | 0.10 | High value, needs LIM first |
| 3 | function_inlining | 0.03 | High potential, needs interproc |
| 4 | vectorization | 0.02 | Highest potential (7x with prereqs) |
| 5 | loop_invariant_motion | 0.06 | Enabler for loop_unrolling |
| 6 | strength_reduction | 0.05 | Good with alias + const_fold |
| 7 | common_subexpr_elimination | 0.07 | Standalone value |
| 8 | tail_call_optimization | 0.04 | Standalone value |
| 9 | branch_prediction_hints | 0.03 | Standalone value |
| 10 | register_allocation | 0.05 | Enabler for instr_scheduling |
| 11 | instruction_scheduling | 0.04 | Needs reg_alloc first |
| 12 | memory_coalescing | 0.06 | Needs alias first |
| 13 | alias_analysis | 0.01 | Key enabler: low base, high unlock |
| 14 | interprocedural_analysis | 0.01 | Key enabler: low base, high unlock |
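
A quick back-of-envelope check of why the enablers matter, assuming the multiplier scales the base effect directly:

```python
base = 0.02        # vectorization's base effect (pass 4)
print(base * 7.0)  # 0.14   with alias_analysis + dead_code_elimination applied first
print(base * 0.3)  # 0.006  without prerequisites: worse than any standalone pass
```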

### Observation Space

```python
class CompilerOptObservation(Observation):
    # Cost tracking
    estimated_cost: float      # Current runtime cost estimate
    baseline_cost: float       # Cost before any optimization

    # Program IR features (static per episode)
    num_instructions: int
    num_loops: int
    num_branches: int
    num_functions: int
    loop_depth: int
    program_type: str          # e.g. "vectorizable", "loop_heavy"

    # Episode progress
    passes_applied: List[int]  # Ordered history of applied passes
    passes_available: List[int]
    step_count: int
    max_steps: int             # 10

    # Synergy state
    synergy_state: List[float] # Per-pass current effectiveness multiplier

    # Task info
    task_id: int
    task_description: str

    # Results
    done: bool
    reward: float
    improvement_pct: float     # Total % cost reduction from baseline
    last_pass_name: str
    grader_score: float        # 0.0–1.0, populated when done=True
```
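
The cost fields and improvement_pct are related by construction; a client-side sanity check might look like this (the server's exact rounding is an assumption, hence the loose tolerance):

```python
def check_improvement(obs: CompilerOptObservation) -> float:
    # % cost reduction relative to the pre-optimization baseline.
    pct = 100.0 * (obs.baseline_cost - obs.estimated_cost) / obs.baseline_cost
    assert abs(pct - obs.improvement_pct) < 0.1  # should agree with the server
    return pct
```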

## Tasks

### Task 1 — Easy

Goal: Optimize a vectorizable or loop-heavy program by discovering the primary synergy chain.

What the agent must learn: Apply alias_analysis (low immediate value), then dead_code_elimination; vectorization then fires at 7x its base effectiveness.

Grader thresholds:

- Score 1.0: ≥35% cost reduction
- Score 0.7: ≥25% cost reduction
- Score 0.4: ≥15% cost reduction
- Score 0.0: <15%

Greedy baseline: ~18-20% | RL target: ~38-45%


### Task 2 — Medium

Goal: Optimize a compute-heavy or inlining-heavy program by discovering two independent synergy chains.

What the agent must learn:

- Chain 1: alias_analysis + DCE → vectorization (7x)
- Chain 2: interprocedural_analysis → function_inlining (5x)

Both chains must be activated within 10 steps; one ordering that achieves this is sketched below.
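
One plausible ordering that fires both chains in five steps, using the pass IDs from the table above (whether it is optimal depends on the sampled program):

```python
# alias_analysis -> dead_code_elimination -> vectorization  (chain 1)
# interprocedural_analysis -> function_inlining             (chain 2)
BOTH_CHAINS = [13, 0, 4, 14, 3]
```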

Grader thresholds:

- Score 1.0: ≥45% cost reduction
- Score 0.7: ≥35% cost reduction
- Score 0.4: ≥22% cost reduction
- Score 0.0: <22%

Greedy baseline: ~20-24% | RL target: ~42-50%


### Task 3 — Hard

Goal: Optimize any program type. The program is sampled randomly from 7 templates; the agent must generalize across program types with different optimal orderings.

What the agent must learn: A generalized pass-ordering policy. Different programs reward different chains: a vectorizable program rewards chain 1, a recursive program rewards chain 2, and a loop_heavy program rewards LIM → loop_unrolling.

Grader thresholds:

- Score 1.0: ≥52% cost reduction
- Score 0.7: ≥42% cost reduction
- Score 0.4: ≥28% cost reduction
- Score 0.0: <28%

Greedy baseline: ~19-23% | RL target: ~45-52%


## Reward Function

Reward is provided at every step (not just terminal), giving a dense signal across the full trajectory:

```text
per-step reward      = marginal_improvement - 0.02 (step penalty)
marginal_improvement = (cost_before - cost_after) / baseline_cost

Terminal bonus:
  improvement >= 50%: +0.6
  improvement >= 40%: +0.4
  improvement >= 30%: +0.2
  improvement >= 20%: +0.05
  improvement <= 0%:  -0.2

Penalties:
  Re-applying an already-used pass: -0.3
  Applying a pass without its prerequisites: reward is naturally low (0.3x effectiveness)
```

The step penalty encourages the agent to find efficient sequences (fewer passes to reach the same improvement), matching real compiler efficiency goals.
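
Putting the pieces together, a sketch of the per-step reward, assuming the terminal bonus is added to the final step's reward (names are illustrative, not the server's actual code):

```python
def step_reward(cost_before: float, cost_after: float, baseline_cost: float,
                repeated_pass: bool, done: bool, improvement_pct: float) -> float:
    reward = (cost_before - cost_after) / baseline_cost  # marginal improvement
    reward -= 0.02                                       # step penalty
    if repeated_pass:
        reward -= 0.3                                    # re-applied an already-used pass
    if done:                                             # terminal bonus tiers
        if improvement_pct >= 50:
            reward += 0.6
        elif improvement_pct >= 40:
            reward += 0.4
        elif improvement_pct >= 30:
            reward += 0.2
        elif improvement_pct >= 20:
            reward += 0.05
        elif improvement_pct <= 0:
            reward -= 0.2
    return reward
```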


## Grader

The grader is called at episode end and returns a score in [0.0, 1.0]:

```text
score = grade_by_threshold(improvement_pct, task_id)
score -= 0.05 * max(0, steps_used - minimum_steps_needed)
```

The efficiency deduction (0.05 per extra step) rewards agents that find the optimal chain quickly over those that brute-force all 10 steps.
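
A sketch of grade_by_threshold plus the deduction, using the per-task cutoffs listed above (clamping the final score to [0.0, 1.0] is an assumption):

```python
# (score, minimum % improvement) tiers, copied from the task descriptions above.
THRESHOLDS = {
    1: [(1.0, 35), (0.7, 25), (0.4, 15)],   # Task 1 (Easy)
    2: [(1.0, 45), (0.7, 35), (0.4, 22)],   # Task 2 (Medium)
    3: [(1.0, 52), (0.7, 42), (0.4, 28)],   # Task 3 (Hard)
}

def grade(improvement_pct: float, task_id: int,
          steps_used: int, minimum_steps_needed: int) -> float:
    score = 0.0
    for tier, cutoff in THRESHOLDS[task_id]:
        if improvement_pct >= cutoff:
            score = tier
            break
    score -= 0.05 * max(0, steps_used - minimum_steps_needed)  # efficiency deduction
    return max(0.0, min(1.0, score))
```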


## Baseline Scores

Measured using gpt-4o-mini via the OpenAI API, 5 episodes per task:

| Task | Avg Score | Avg Improvement | Best Score |
|------|-----------|-----------------|------------|
| Task 1 (Easy) | ~0.42 | ~28% | ~0.70 |
| Task 2 (Medium) | ~0.28 | ~24% | ~0.40 |
| Task 3 (Hard) | ~0.21 | ~21% | ~0.40 |
| Overall | ~0.30 | ~24% | |

The LLM agent partially discovers the alias → vectorization chain (~30% of episodes) but rarely sequences both chains correctly in Tasks 2 and 3.


## Setup & Usage

### Prerequisites

- Python 3.10+
- Docker (for containerized deployment)

### Local Development (no Docker)

```bash
# Clone / navigate to the project
cd compiler_opt_env

# Install
pip install -e .

# Start the server
uvicorn server.app:app --reload --host 0.0.0.0 --port 8000

# In another terminal: run the sanity test
python test_env.py

# Run the LLM baseline
export OPENAI_API_KEY=your_key
python baseline_agent.py --base-url http://localhost:8000 --episodes 3
```

### Docker

```bash
# Build
docker build -t compiler-opt-env:latest -f server/Dockerfile .

# Run
docker run -p 8000:8000 compiler-opt-env:latest

# Web UI
open http://localhost:8000/web
```

### Programmatic Usage

```python
from compiler_opt_env import CompilerOptAction, CompilerOptEnv
from compiler_opt_env.models import TASK_EASY, TASK_MEDIUM, TASK_HARD

with CompilerOptEnv(base_url="http://localhost:8000").sync() as env:
    # Task 1: easy
    obs = env.reset()

    # Optimal sequence for Task 1:
    for pass_id in [13, 0, 4]:   # alias → DCE → vectorization
        result = env.step(CompilerOptAction(pass_id=pass_id, task_id=TASK_EASY))
        print(f"{result.observation.last_pass_name}: {result.observation.improvement_pct:.1f}%")

    print(f"Grader score: {result.observation.grader_score}")
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /reset | POST | Start a new episode |
| /step | POST | Apply an action |
| /state | GET | Current episode metadata |
| /schema | GET | Action/observation schemas |
| /health | GET | Health check |
| /web | GET | Interactive web UI |
| /docs | GET | Swagger API documentation |
| /ws | WS | WebSocket for low-latency sessions |
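
If you would rather hit the API directly than use the bundled client, a minimal loop might look like the following; the request payload shapes here are assumptions, so check /schema or /docs for the actual contracts:

```python
import requests

BASE = "http://localhost:8000"

obs = requests.post(f"{BASE}/reset", json={}).json()  # start a new episode
for pass_id in [13, 0, 4]:  # alias_analysis -> DCE -> vectorization
    obs = requests.post(f"{BASE}/step",
                        json={"pass_id": pass_id, "task_id": 1}).json()
    print(obs)
```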

## Project Structure

```text
compiler_opt_env/
├── __init__.py                          # Exports: CompilerOptEnv, CompilerOptAction, CompilerOptObservation
├── models.py                            # Action, Observation, constants, task IDs
├── client.py                            # CompilerOptEnv(EnvClient) — HTTP/WebSocket client
├── baseline_agent.py                    # LLM baseline using OpenAI API
├── openenv.yaml                         # OpenEnv manifest
├── pyproject.toml                       # Dependencies
├── README.md                            # This file
└── server/
    ├── __init__.py
    ├── compiler_opt_env_environment.py  # Core RL logic: passes, synergy, grader
    ├── app.py                           # FastAPI app
    └── Dockerfile                       # Container definition
```

## References