Spaces:

Parth3841
/

compiler_opt_env

Sleeping

App Files Files Community

compiler_opt_env / README.md

Parth3841

Upload folder using huggingface_hub

1047077 verified 8 days ago

preview code

raw

history blame contribute delete

10.7 kB

	---
	title: Compiler Pass Ordering RL Environment
	emoji: ⚙️
	colorFrom: gray
	colorTo: blue
	sdk: docker
	pinned: false
	app_port: 8000
	base_path: /web
	tags:
	- openenv
	---

	# Compiler Pass Ordering RL Environment

	An OpenEnv environment that simulates compiler optimization pass ordering — a real-world task performed by production compilers (GCC, LLVM) every time they build software. The agent must select a sequence of optimization passes to apply to a program's Intermediate Representation (IR) to minimize estimated runtime cost.

	This is the same class of problem addressed by DeepMind's MLGO paper (2022), which showed RL finds better pass orderings than the hand-tuned heuristics used in production compilers today.

	---

	## Motivation

	When a compiler optimizes code, it applies a sequence of transformations called passes: dead code elimination, loop unrolling, vectorization, function inlining, etc. There are ~50 such passes in LLVM. The order you apply them matters enormously — applying pass A before B can unlock optimizations that B before A cannot.

	Why RL is necessary here (not greedy or heuristics):

	- Greedy selection (always pick the pass with highest immediate benefit) achieves ~20% cost reduction.
	- An RL agent that learns prerequisite chains (e.g., `alias_analysis → vectorization` gives 7x effectiveness) achieves ~42-50%.
	- The search space is factorial (15! orderings), and the reward for an "enabling" pass only materializes 3-4 steps later — exactly the credit assignment problem RL is built for.

	---

	## Environment Design

	### Prerequisite Gates

	The core mechanism that makes RL necessary. Each high-value pass has prerequisites:

	\| Pass \| Prerequisites Required \| Synergy Multiplier \| Without Prerequisites \|
	\|---\|---\|---\|---\|
	\| vectorization \| alias_analysis + dead_code_elimination \| 7.0x \| 0.3x (penalty) \|
	\| function_inlining \| interprocedural_analysis \| 5.0x \| 0.3x \|
	\| memory_coalescing \| alias_analysis \| 3.5x \| 0.3x \|
	\| strength_reduction \| alias_analysis + constant_folding \| 3.0x \| 0.3x \|
	\| instruction_scheduling \| register_allocation \| 2.5x \| 0.3x \|
	\| loop_unrolling \| loop_invariant_motion \| 2.0x \| 0.3x \|

	A greedy agent never applies `alias_analysis` (base effect = 0.01) and so never unlocks `vectorization`. An RL agent learns to sacrifice early reward to enable these chains.

	---

	## Action Space

	```python
	class CompilerOptAction(Action):
	pass_id: int # Integer in [0, 14] — which pass to apply
	task_id: int # 1 = easy, 2 = medium, 3 = hard
	```

	Available passes:

	\| ID \| Pass Name \| Base Effect \| Role \|
	\|---\|---\|---\|---\|
	\| 0 \| dead_code_elimination \| 0.08 \| Enabler for vectorization \|
	\| 1 \| constant_folding \| 0.06 \| Enabler for strength_reduction \|
	\| 2 \| loop_unrolling \| 0.10 \| High value, needs LIM first \|
	\| 3 \| function_inlining \| 0.03 \| High potential, needs interproc \|
	\| 4 \| vectorization \| 0.02 \| Highest potential (7x with prereqs) \|
	\| 5 \| loop_invariant_motion \| 0.06 \| Enabler for loop_unrolling \|
	\| 6 \| strength_reduction \| 0.05 \| Good with alias + const_fold \|
	\| 7 \| common_subexpr_elimination \| 0.07 \| Standalone value \|
	\| 8 \| tail_call_optimization \| 0.04 \| Standalone value \|
	\| 9 \| branch_prediction_hints \| 0.03 \| Standalone value \|
	\| 10 \| register_allocation \| 0.05 \| Enabler for instr_scheduling \|
	\| 11 \| instruction_scheduling \| 0.04 \| Needs reg_alloc first \|
	\| 12 \| memory_coalescing \| 0.06 \| Needs alias first \|
	\| 13 \| alias_analysis \| 0.01 \| Key enabler — low base, high unlock \|
	\| 14 \| interprocedural_analysis \| 0.01 \| Key enabler — low base, high unlock \|

	---

	## Observation Space

	```python
	class CompilerOptObservation(Observation):
	# Cost tracking
	estimated_cost: float # Current runtime cost estimate
	baseline_cost: float # Cost before any optimization

	# Program IR features (static per episode)
	num_instructions: int
	num_loops: int
	num_branches: int
	num_functions: int
	loop_depth: int
	program_type: str # e.g. "vectorizable", "loop_heavy"

	# Episode progress
	passes_applied: List[int] # Ordered history of applied passes
	passes_available: List[int]
	step_count: int
	max_steps: int # 10

	# Synergy state
	synergy_state: List[float] # Per-pass current effectiveness multiplier

	# Task info
	task_id: int
	task_description: str

	# Results
	done: bool
	reward: float
	improvement_pct: float # Total % cost reduction from baseline
	last_pass_name: str
	grader_score: float # 0.0–1.0, populated when done=True
	```

	---

	## Tasks

	### Task 1 — Easy
	Goal: Optimize a vectorizable or loop-heavy program by discovering the primary synergy chain.

	What the agent must learn: Apply `alias_analysis` (low immediate value) then `dead_code_elimination`, then `vectorization` fires at 7x base effectiveness.

	Grader thresholds:
	- Score 1.0: ≥35% cost reduction
	- Score 0.7: ≥25% cost reduction
	- Score 0.4: ≥15% cost reduction
	- Score 0.0: <15%

	Greedy baseline: ~18-20% \| RL target: ~38-45%

	---

	### Task 2 — Medium
	Goal: Optimize a compute-heavy or inlining-heavy program by discovering two independent synergy chains.

	What the agent must learn:
	- Chain 1: `alias_analysis + DCE → vectorization` (7x)
	- Chain 2: `interprocedural_analysis → function_inlining` (5x)
	Both chains must be activated within 10 steps.

	Grader thresholds:
	- Score 1.0: ≥45% cost reduction
	- Score 0.7: ≥35% cost reduction
	- Score 0.4: ≥22% cost reduction
	- Score 0.0: <22%

	Greedy baseline: ~20-24% \| RL target: ~42-50%

	---

	### Task 3 — Hard
	Goal: Optimize any program type. Program is sampled randomly from 7 templates. The agent must generalize across program types with different optimal orderings.

	What the agent must learn: Generalized pass ordering policy. Different programs reward different chains. A `vectorizable` program rewards chain 1; a `recursive` program rewards chain 2; a `loop_heavy` program rewards `LIM → loop_unrolling`.

	Grader thresholds:
	- Score 1.0: ≥52% cost reduction
	- Score 0.7: ≥42% cost reduction
	- Score 0.4: ≥28% cost reduction
	- Score 0.0: <28%

	Greedy baseline: ~19-23% \| RL target: ~45-52%

	---

	## Reward Function

	Reward is provided at every step (not just terminal), giving a dense signal across the full trajectory:

	```
	Per-step reward = marginal_improvement - 0.02 (step penalty)

	marginal_improvement = (cost_before - cost_after) / baseline_cost

	Terminal bonus:
	improvement >= 50%: +0.6
	improvement >= 40%: +0.4
	improvement >= 30%: +0.2
	improvement >= 20%: +0.05
	improvement <= 0%: -0.2

	Penalties:
	Re-applying an already-used pass: -0.3
	Applying pass without prerequisites: reward naturally low (0.3x effectiveness)
	```

	The step penalty encourages the agent to find efficient sequences (fewer passes to reach the same improvement), matching real compiler efficiency goals.

	---

	## Grader

	The grader is called at episode end and returns a score in [0.0, 1.0]:

	```python
	score = grade_by_threshold(improvement_pct, task_id)
	score -= 0.05 * max(0, steps_used - minimum_steps_needed)
	```

	The efficiency deduction (0.05 per extra step) rewards agents that find the optimal chain quickly over those that brute-force all 10 steps.

	---

	## Baseline Scores

	Measured using `gpt-4o-mini` via the OpenAI API, 5 episodes per task:

	\| Task \| Avg Score \| Avg Improvement \| Best Score \|
	\|---\|---\|---\|---\|
	\| Task 1 (Easy) \| ~0.42 \| ~28% \| ~0.70 \|
	\| Task 2 (Medium) \| ~0.28 \| ~24% \| ~0.40 \|
	\| Task 3 (Hard) \| ~0.21 \| ~21% \| ~0.40 \|
	\| Overall \| ~0.30 \| ~24% \| — \|

	The LLM agent partially discovers the alias → vectorization chain (~30% of episodes) but rarely sequences both chains correctly in Task 2/3.

	---

	## Setup & Usage

	### Prerequisites
	- Python 3.10+
	- Docker (for containerized deployment)

	### Local Development (no Docker)

	```bash
	# Clone / navigate to project
	cd compiler_opt_env

	# Install
	pip install -e .

	# Start server
	uvicorn server.app:app --reload --host 0.0.0.0 --port 8000

	# In another terminal — run sanity test
	python test_env.py

	# Run LLM baseline
	export OPENAI_API_KEY=your_key
	python baseline_agent.py --base-url http://localhost:8000 --episodes 3
	```

	### Docker

	```bash
	# Build
	docker build -t compiler-opt-env:latest -f server/Dockerfile .

	# Run
	docker run -p 8000:8000 compiler-opt-env:latest

	# Web UI
	open http://localhost:8000/web
	```

	### Programmatic Usage

	```python
	from compiler_opt_env import CompilerOptAction, CompilerOptEnv
	from compiler_opt_env.models import TASK_EASY, TASK_MEDIUM, TASK_HARD

	with CompilerOptEnv(base_url="http://localhost:8000").sync() as env:
	# Task 1: easy
	obs = env.reset()

	# Optimal sequence for Task 1:
	for pass_id in [13, 0, 4]: # alias → DCE → vectorization
	result = env.step(CompilerOptAction(pass_id=pass_id, task_id=TASK_EASY))
	print(f"{result.observation.last_pass_name}: {result.observation.improvement_pct:.1f}%")

	print(f"Grader score: {result.observation.grader_score}")
	```

	---

	## API Endpoints

	\| Endpoint \| Method \| Description \|
	\|---\|---\|---\|
	\| `/reset` \| POST \| Start new episode \|
	\| `/step` \| POST \| Apply an action \|
	\| `/state` \| GET \| Current episode metadata \|
	\| `/schema` \| GET \| Action/observation schemas \|
	\| `/health` \| GET \| Health check \|
	\| `/web` \| GET \| Interactive web UI \|
	\| `/docs` \| GET \| Swagger API documentation \|
	\| `/ws` \| WS \| WebSocket for low-latency sessions \|

	---

	## Project Structure

	```
	compiler_opt_env/
	├── __init__.py # Exports: CompilerOptEnv, CompilerOptAction, CompilerOptObservation
	├── models.py # Action, Observation, constants, task IDs
	├── client.py # CompilerOptEnv(EnvClient) — HTTP/WebSocket client
	├── baseline_agent.py # LLM baseline using OpenAI API
	├── openenv.yaml # OpenEnv manifest
	├── pyproject.toml # Dependencies
	├── README.md # This file
	└── server/
	├── __init__.py
	├── compiler_opt_env_environment.py # Core RL logic: passes, synergy, grader
	├── app.py # FastAPI app
	└── Dockerfile # Container definition
	```

	---

	## References

	- [DeepMind MLGO: A Machine Learning Guided Compiler Optimizations Framework](https://arxiv.org/abs/2101.04808)
	- [Meta OpenEnv](https://github.com/meta-pytorch/OpenEnv)
	- [LLVM Optimization Passes](https://llvm.org/docs/Passes.html)