Spaces: PRANAV05092003/autonomous-code-refactoring-env

PRANAV05092003 committed on Apr 7

Commit bc5030f

1 Parent(s): 8422246

Fixed structure (moved files to root)

Files changed (23)

.gitignore +26 -0
Dockerfile +23 -0
README.md +168 -4
acre/__init__.py +14 -0
acre/actions/__init__.py +6 -0
acre/actions/transformations.py +518 -0
acre/datasets/__init__.py +6 -0
acre/datasets/code_samples.py +34 -0
acre/demo.py +185 -0
acre/main.py +39 -0
acre/tasks/__init__.py +3 -0
acre/tasks/task_registry.py +222 -0
acre/training/__init__.py +6 -0
acre/training/train_agent.py +75 -0
acre/utils/__init__.py +6 -0
acre/utils/metrics.py +33 -0
inference.py +278 -0
models.py +156 -0
openenv.yaml +85 -0
openenv_interface.py +116 -0
requirements.txt +11 -0
server.py +667 -0
validate.py +281 -0

.gitignore ADDED

	@@ -0,0 +1,26 @@

+__pycache__/
+*.pyc
+*.pyo
+*.pyd
+.Python
+*.egg-info/
+dist/
+build/
+.env
+.venv
+venv/
+*.zip
+acre_agent.zip
+*.log
+.DS_Store
+.deps/
+libs/
+numpy.libs/
+*.dll
+*.so
+*.dylib
+env/
+ENV/
+.cache/
+.huggingface/
+Thumbs.db

Dockerfile ADDED

	@@ -0,0 +1,23 @@

+FROM python:3.11-slim
+WORKDIR /app
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    build-essential \
+    && rm -rf /var/lib/apt/lists/*
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+COPY . .
+ENV API_BASE_URL=https://api.openai.com/v1
+ENV MODEL_NAME=gpt-4o-mini
+ENV PORT=7860
+EXPOSE 7860
+HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
+  CMD python -c "import requests; requests.get('http://localhost:7860/').raise_for_status()"
+CMD ["python", "server.py"]

README.md CHANGED

@@ -1,10 +1,174 @@
 ---
-title: Autonomous Code Refactoring Env
-emoji: ⚡
+title: ACRE - Autonomous Code Refactoring Environment
 colorFrom: blue
-colorTo: red
+colorTo: green
 sdk: docker
+app_port: 7860
 pinned: false
+license: mit
+tags:
+  - openenv
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# ACRE - Autonomous Code Refactoring Environment
+ACRE is an OpenEnv-compatible environment for autonomous Python code refactoring. An agent receives real code-cleanup tasks and must improve the code through AST-based transformations while receiving dense reward feedback for correctness, simplification, and performance.
+## Environment Overview and Motivation
+This project simulates a realistic developer workflow: cleaning up messy Python code, removing dead logic, simplifying loops, and inlining trivial helpers. The canonical OpenEnv wrapper lives in `openenv_interface.py`, while the original Gymnasium-compatible environment remains available for RL training and demos.
+## Definitions of Action and Observation Spaces
+### Action Space - Discrete(5)
+| Action | Name | Description |
+|---|---|---|
+| 0 | rename_variable | Rename generic variables like `x`, `tmp`, and `i` |
+| 1 | remove_dead_code | Remove unreachable statements, `if False` branches, and unused assignments |
+| 2 | simplify_loop | Convert append-loops into list comprehensions |
+| 3 | optimize_condition | Simplify `not not x`, `if True`, `if False`, and boolean comparisons |
+| 4 | inline_function | Inline simple single-return module-level functions |
+### Observation Space - Box(4,)
+The environment tracks:
+- `code_length`
+- `complexity_score`
+- `runtime_s`
+- `error_flag`
+### Typed OpenEnv Models
+The submission-facing interface uses Pydantic models in `models.py`:
+- `ObservationModel`
+- `ActionModel`
+- `RewardModel`
+- `StateResponse`
+The canonical interface is:
+```python
+observation = env.reset(...)
+observation, reward, done, info = env.step(action)
+state = env.state()
+```
+## Task Descriptions with Expected Difficulty Levels
+| Task ID | Difficulty | Objective |
+|---|---|---|
+| `rename_variables` | Easy | Remove generic variable names from the snippet |
+| `remove_dead_code` | Medium | Eliminate dead branches, unreachable code, and unused assignments |
+| `full_refactor` | Hard | Combine renaming, dead-code removal, loop simplification, condition optimization, and inlining |
+Each task includes a deterministic AST-based grader returning a score in `[0.0, 1.0]`.
+## Reward Design
+Rewards are shaped throughout the trajectory instead of only at the end.
+- Success reward for syntactically valid, executable output
+- Complexity reward when control-flow complexity decreases
+- Performance reward when runtime improves
+- Error penalty for invalid or failing code
+- No-change penalty to discourage loops and unproductive actions
+Raw reward range is `[-32, 20]`, normalized to `[0.0, 1.0]` with `(raw + 32) / 52`.
+## HTTP API
+| Method | Path | Purpose |
+|---|---|---|
+| GET | `/` | Health check |
+| GET | `/health` | Compatibility health check |
+| POST | `/reset` | Reset environment and return typed observation/state |
+| POST | `/step` | Apply one action and return typed observation/reward/done |
+| GET | `/state` | Return the current typed state |
+| GET | `/tasks` | List available tasks |
+| POST | `/tasks/{task_id}/grade` | Grade submitted code |
+## Setup and Usage Instructions
+### Local setup
+```bash
+pip install -r requirements.txt
+python server.py
+```
+### Baseline inference
+Set environment variables before running:
+```bash
+export API_BASE_URL=https://api.openai.com/v1
+export MODEL_NAME=gpt-4o-mini
+export HF_TOKEN=your_key
+export ENV_URL=http://localhost:7860
+python inference.py
+```
+Notes:
+- `API_BASE_URL` and `MODEL_NAME` have defaults in `inference.py`
+- `HF_TOKEN` is optional because the script falls back to a deterministic heuristic baseline
+- `LOCAL_IMAGE_NAME` is read for evaluator compatibility when using a local Docker image launcher
+### Docker / Hugging Face Spaces
+```bash
+docker build -t acre .
+docker run -p 7860:7860 \
+  -e API_BASE_URL=https://api.openai.com/v1 \
+  -e MODEL_NAME=gpt-4o-mini \
+  -e HF_TOKEN=your_key \
+  -e ENV_URL=http://localhost:7860 \
+  acre
+```
+The repository is configured for a Docker-based Hugging Face Space and includes the `openenv` tag in the front matter.
+## Validation
+Run the repository validator:
+```bash
+python validate.py --url http://localhost:7860
+```
+When using the official hackathon tooling, also run:
+```bash
+openenv validate
+```
+## Interactive Demo
+Start the server and open:
+```text
+http://localhost:7860/demo
+```
+The demo shows:
+- Original code
+- Optimized code
+- Unified diff
+- Per-step action and reward logs
+## Baseline Performance Scores
+The deterministic fallback policy used by `inference.py` produces the following reproducible task scores:
+| Task | Score |
+|---|---|
+| `rename_variables` | 1.0 |
+| `remove_dead_code` | 1.0 |
+| `full_refactor` | 1.0 |
+| Average | 1.0 |
+These scores come from the built-in heuristic policy with `HF_TOKEN` unset, which keeps the baseline reproducible across runs.
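
The normalization stated under Reward Design can be checked with a short sketch. The names `normalize_reward`, `RAW_MIN`, and `RAW_MAX` are illustrative and do not come from the repository, and the clamping of out-of-range values is an assumption, since the README does not say how such values are handled.

```python
RAW_MIN = -32.0  # lower bound of the raw reward range stated in the README
RAW_MAX = 20.0   # upper bound of the raw reward range stated in the README


def normalize_reward(raw: float) -> float:
    """Map a raw reward in [-32, 20] to [0.0, 1.0], i.e. (raw + 32) / 52."""
    scaled = (raw - RAW_MIN) / (RAW_MAX - RAW_MIN)
    # Assumption: values outside the documented range are clamped.
    return min(1.0, max(0.0, scaled))


lowest = normalize_reward(-32.0)   # bottom of the range
midpoint = normalize_reward(-6.0)  # (-6 + 32) / 52
highest = normalize_reward(20.0)   # top of the range
```

With these bounds a raw reward of `-6` lands exactly at the midpoint of the normalized range.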

acre/__init__.py ADDED

	@@ -0,0 +1,14 @@

+"""
+ACRE (Autonomous Code Refactoring Environment).
+Package skeleton for an RL-based code refactoring system.
+"""
+__all__ = [
+    "env",
+    "actions",
+    "datasets",
+    "training",
+    "utils",
+]

acre/actions/__init__.py ADDED

	@@ -0,0 +1,6 @@

+"""Action definitions and transformations for ACRE."""
+from .transformations import Transformation, TransformationResult
+__all__ = ["Transformation", "TransformationResult"]

acre/actions/transformations.py ADDED

	@@ -0,0 +1,518 @@

+from __future__ import annotations
+import ast
+import copy
+from dataclasses import dataclass
+from itertools import zip_longest
+from typing import Any, Dict, Protocol
+@dataclass(frozen=True)
+class TransformationResult:
+    """Output of applying a transformation (placeholder)."""
+    code: str
+    changed: bool
+    metadata: Dict[str, Any]
+class Transformation(Protocol):
+    """Protocol for a code transformation."""
+    name: str
+    def apply(self, code: str) -> TransformationResult: ...
+def noop_transformation(code: str) -> TransformationResult:
+    """Baseline transformation that leaves code unchanged."""
+    return TransformationResult(code=code, changed=False, metadata={"kind": "noop"})
+def _finalize_result(*, original: str, out: str, meta: Dict[str, Any]) -> TransformationResult:
+    """
+    Standardize metadata across transformations.
+    - Adds `lines_changed` and `impact` for explainability/metrics.
+    - Ensures formatting-only changes don't count as `changed`.
+    """
+    def _count_lines_changed(a: str, b: str) -> int:
+        a_lines = a.splitlines()
+        b_lines = b.splitlines()
+        changed = 0
+        for x, y in zip_longest(a_lines, b_lines, fillvalue=None):
+            if x != y:
+                changed += 1
+        return int(changed)
+    lines_changed = _count_lines_changed(original, out)
+    # Fallback identity check: AST round-trips can reformat without changing meaning.
+    # If the textual content is the same after stripping, treat it as unchanged.
+    if out.strip() == original.strip():
+        meta["success"] = False
+        meta["lines_changed"] = 0
+        meta["impact"] = "low"
+        return TransformationResult(code=original, changed=False, metadata=meta)
+    meta["lines_changed"] = lines_changed
+    meta["impact"] = "high" if lines_changed >= 3 else "low"
+    meta["success"] = True
+    return TransformationResult(code=out, changed=True, metadata=meta)
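
Review note: the positional comparison used for `lines_changed` counts every line after an insertion as changed, which can push a one-line edit over the `>= 3` threshold for `"high"` impact. The sketch below reproduces that counting and contrasts it with a `difflib`-based count. Both function names are hypothetical, and the `difflib` variant is only a possible alternative, not something the commit uses.

```python
import difflib
from itertools import zip_longest


def positional_lines_changed(a: str, b: str) -> int:
    """Count line positions whose contents differ (the approach in the diff above)."""
    return sum(1 for x, y in zip_longest(a.splitlines(), b.splitlines()) if x != y)


def diff_lines_changed(a: str, b: str) -> int:
    """Hypothetical alternative: count lines that difflib reports as added or removed."""
    delta = difflib.ndiff(a.splitlines(), b.splitlines())
    return sum(1 for line in delta if line[:2] in ("+ ", "- "))


before = "a = 1\nb = 2\nc = 3\n"
after = "z = 0\na = 1\nb = 2\nc = 3\n"  # one line inserted at the top

positional = positional_lines_changed(before, after)
aligned = diff_lines_changed(before, after)
```

Here the positional count reports four changed lines for a single inserted line, while the aligned count reports one.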
+def _unchanged(*, code: str, meta: Dict[str, Any]) -> TransformationResult:
+    meta.setdefault("success", False)
+    meta.setdefault("lines_changed", 0)
+    meta.setdefault("impact", "low")
+    return TransformationResult(code=code, changed=False, metadata=meta)
+def rename_variable(code: str) -> TransformationResult:
+    """
+    Rename simple, generic variable names to more descriptive ones.
+    Hackathon-scope heuristic:
+    - Rename generic names in priority order: x, tmp, i.
+    - Uses descriptive base names and avoids collisions.
+    - Applies to Name nodes and function args.
+    """
+    meta: Dict[str, Any] = {"type": "rename_variable", "success": False}
+    try:
+        tree = ast.parse(code)
+        class _NameCollector(ast.NodeVisitor):
+            def __init__(self) -> None:
+                self.names: set[str] = set()
+            def visit_Name(self, node: ast.Name) -> None:  # noqa: N802
+                self.names.add(node.id)
+            def visit_arg(self, node: ast.arg) -> None:  # noqa: N802
+                self.names.add(node.arg)
+        collector = _NameCollector()
+        collector.visit(tree)
+        rename_plan = [
+            ("x", "value"),
+            ("tmp", "temp_value"),
+            ("i", "index"),
+        ]
+        old = ""
+        base_new = "value"
+        for candidate_old, candidate_base in rename_plan:
+            if candidate_old in collector.names:
+                old = candidate_old
+                base_new = candidate_base
+                break
+        if not old:
+            return _unchanged(code=code, meta=meta)
+        new = base_new
+        i = 1
+        while new in collector.names:
+            new = f"{base_new}{i}"
+            i += 1
+        class _Renamer(ast.NodeTransformer):
+            def __init__(self, old_name: str, new_name: str) -> None:
+                self.old_name = old_name
+                self.new_name = new_name
+                self.changed = False
+            def visit_Name(self, node: ast.Name) -> ast.AST:  # noqa: N802
+                if node.id == self.old_name:
+                    self.changed = True
+                    return ast.copy_location(ast.Name(id=self.new_name, ctx=node.ctx), node)
+                return node
+            def visit_arg(self, node: ast.arg) -> ast.AST:  # noqa: N802
+                if node.arg == self.old_name:
+                    self.changed = True
+                    new_node = copy.copy(node)
+                    new_node.arg = self.new_name
+                    return new_node
+                return node
+        renamer = _Renamer(old, new)
+        tree = renamer.visit(tree)
+        ast.fix_missing_locations(tree)
+        if not renamer.changed:
+            return _unchanged(code=code, meta=meta)
+        out = ast.unparse(tree)
+        meta["old"] = old
+        meta["new"] = new
+        # Renames tend to be small diffs; label as low impact unless the diff is large.
+        return _finalize_result(original=code, out=out, meta=meta)
+    except Exception:
+        return _unchanged(code=code, meta=meta)
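
Review note: the collision handling in `rename_variable` appends a numeric suffix until the candidate name is unused. A minimal standalone sketch of that idea follows. `pick_fresh_name` and `RenameName` are illustrative names, and the sketch renames only `Name` nodes, whereas the function above also renames function arguments.

```python
import ast


def pick_fresh_name(base: str, taken: set) -> str:
    """Return `base`, or `base1`, `base2`, ... for the first name not in `taken`."""
    candidate = base
    suffix = 1
    while candidate in taken:
        candidate = f"{base}{suffix}"
        suffix += 1
    return candidate


class RenameName(ast.NodeTransformer):
    """Rename every occurrence of one identifier (Name nodes only)."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if node.id == self.old:
            node.id = self.new
        return node


source = "x = 1\nvalue = x + 1"
tree = ast.parse(source)
taken = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
new_name = pick_fresh_name("value", taken)  # "value" is already used in the snippet
renamed = ast.unparse(RenameName("x", new_name).visit(tree))
```

Because `value` already exists in the snippet, `x` is renamed to `value1` rather than shadowing the existing variable.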
+def remove_dead_code(code: str) -> TransformationResult:
+    """
+    Remove simple dead code patterns.
+    Hackathon-scope heuristics:
+    - Drop statements after `return` / `raise` in the same block.
+    - Remove `if False: ...` blocks (keep `else` if present).
+    - Remove assignments to unused names in a block (very simple check).
+    """
+    meta: Dict[str, Any] = {"type": "remove_dead_code", "success": False}
+    try:
+        tree = ast.parse(code)
+        def _is_const_bool(expr: ast.AST, value: bool) -> bool:
+            return isinstance(expr, ast.Constant) and isinstance(expr.value, bool) and expr.value is value
+        class _LoadNameCollector(ast.NodeVisitor):
+            def __init__(self) -> None:
+                self.loaded: set[str] = set()
+            def visit_Name(self, node: ast.Name) -> None:  # noqa: N802
+                if isinstance(node.ctx, ast.Load):
+                    self.loaded.add(node.id)
+        class _DeadCode(ast.NodeTransformer):
+            def __init__(self) -> None:
+                self.changed = False
+            def _prune_unreachable(self, stmts: list[ast.stmt]) -> list[ast.stmt]:
+                out: list[ast.stmt] = []
+                unreachable = False
+                for s in stmts:
+                    if unreachable:
+                        self.changed = True
+                        continue
+                    out.append(s)
+                    if isinstance(s, (ast.Return, ast.Raise)):
+                        unreachable = True
+                return out
+            def _remove_unused_assigns(self, stmts: list[ast.stmt]) -> list[ast.stmt]:
+                collector = _LoadNameCollector()
+                for s in stmts:
+                    collector.visit(s)
+                used = collector.loaded
+                out: list[ast.stmt] = []
+                for s in stmts:
+                    if isinstance(s, ast.Assign) and all(isinstance(t, ast.Name) for t in s.targets):
+                        targets = [t.id for t in s.targets if isinstance(t, ast.Name)]
+                        # Remove only if *all* assigned names are unused.
+                        if targets and all(t not in used for t in targets):
+                            self.changed = True
+                            continue
+                    if isinstance(s, ast.AnnAssign) and isinstance(s.target, ast.Name):
+                        if s.target.id not in used:
+                            self.changed = True
+                            continue
+                    out.append(s)
+                return out
+            def _clean_block(self, stmts: list[ast.stmt]) -> list[ast.stmt]:
+                # First apply transformations inside statements.
+                visited = [self.visit(s) for s in stmts]
+                flat: list[ast.stmt] = []
+                for s in visited:
+                    if s is None:
+                        self.changed = True
+                        continue
+                    if isinstance(s, list):
+                        flat.extend([x for x in s if isinstance(x, ast.stmt)])
+                        self.changed = True
+                    else:
+                        flat.append(s)
+                flat = self._prune_unreachable(flat)
+                flat = self._remove_unused_assigns(flat)
+                return flat
+            def visit_Module(self, node: ast.Module) -> ast.AST:  # noqa: N802
+                node.body = self._clean_block(node.body)
+                return node
+            def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:  # noqa: N802
+                # Fall back to `pass` so a fully pruned body still unparses to valid code.
+                node.body = self._clean_block(node.body) or [ast.Pass()]
+                return node
+            def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> ast.AST:  # noqa: N802
+                node.body = self._clean_block(node.body) or [ast.Pass()]
+                return node
+            def visit_If(self, node: ast.If) -> ast.AST | list[ast.stmt]:  # noqa: N802
+                node = self.generic_visit(node)
+                if _is_const_bool(node.test, False):
+                    self.changed = True
+                    return node.orelse or []
+                return node
+            def visit_While(self, node: ast.While) -> ast.AST | list[ast.stmt] | None:  # noqa: N802
+                node = self.generic_visit(node)
+                if _is_const_bool(node.test, False):
+                    self.changed = True
+                    # The `else` clause of `while False` always runs, so keep it.
+                    return node.orelse or None
+                return node
+        dc = _DeadCode()
+        tree = dc.visit(tree)
+        ast.fix_missing_locations(tree)
+        if not dc.changed:
+            return _unchanged(code=code, meta=meta)
+        out = ast.unparse(tree)
+        return _finalize_result(original=code, out=out, meta=meta)
+    except Exception:
+        return _unchanged(code=code, meta=meta)
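
Review note: the first heuristic in the `remove_dead_code` docstring, dropping statements that follow a `return` or `raise` in the same block, can be sketched in isolation. `prune_after_exit` is a hypothetical helper name. Unlike the transformer above, this sketch handles only a single function body and does not recurse into nested blocks.

```python
import ast


def prune_after_exit(stmts: list) -> list:
    """Keep statements up to and including the first return/raise in a block."""
    kept = []
    for stmt in stmts:
        kept.append(stmt)
        if isinstance(stmt, (ast.Return, ast.Raise)):
            break  # everything after this point in the block is unreachable
    return kept


source = "def f(n):\n    return n + 1\n    print('never runs')\n"
tree = ast.parse(source)
func = tree.body[0]
func.body = prune_after_exit(func.body)
cleaned = ast.unparse(tree)
```

The unreachable `print` call is removed and the function keeps its `return` statement.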
+def simplify_loops(code: str) -> TransformationResult:
+    """
+    Simplify very basic loop patterns into more pythonic forms.
+    Supported pattern (only when adjacent in the same block):
+    - xs = []
+      for t in it:
+          xs.append(expr)
+      => xs = [expr for t in it]
+    """
+    meta: Dict[str, Any] = {"type": "simplify_loops", "success": False}
+    try:
+        tree = ast.parse(code)
+        class _LoopSimplifier(ast.NodeTransformer):
+            def __init__(self) -> None:
+                self.changed = False
+            def _simplify_body(self, body: list[ast.stmt]) -> list[ast.stmt]:
+                out: list[ast.stmt] = []
+                i = 0
+                while i < len(body):
+                    cur = body[i]
+                    nxt = body[i + 1] if i + 1 < len(body) else None
+                    if (
+                        isinstance(cur, ast.Assign)
+                        and len(cur.targets) == 1
+                        and isinstance(cur.targets[0], ast.Name)
+                        and isinstance(cur.value, ast.List)
+                        and cur.value.elts == []
+                        and isinstance(nxt, ast.For)
+                        and not nxt.orelse  # a for/else clause has no comprehension equivalent
+                        and len(nxt.body) == 1
+                        and isinstance(nxt.body[0], ast.Expr)
+                        and isinstance(nxt.body[0].value, ast.Call)
+                    ):
+                        list_name = cur.targets[0].id
+                        call = nxt.body[0].value
+                        if (
+                            isinstance(call.func, ast.Attribute)
+                            and isinstance(call.func.value, ast.Name)
+                            and call.func.value.id == list_name
+                            and call.func.attr == "append"
+                            and len(call.args) == 1
+                            and not call.keywords
+                        ):
+                            # Build list comprehension: [call.args[0] for <target> in <iter>]
+                            comp = ast.ListComp(
+                                elt=call.args[0],
+                                generators=[
+                                    ast.comprehension(
+                                        target=nxt.target,
+                                        iter=nxt.iter,
+                                        ifs=[],
+                                        is_async=0,
+                                    )
+                                ],
+                            )
+                            new_assign = ast.Assign(targets=[ast.Name(id=list_name, ctx=ast.Store())], value=comp)
+                            out.append(ast.copy_location(new_assign, cur))
+                            self.changed = True
+                            i += 2
+                            continue
+                    out.append(cur)
+                    i += 1
+                return out
+            def visit_Module(self, node: ast.Module) -> ast.AST:  # noqa: N802
+                node = self.generic_visit(node)
+                node.body = self._simplify_body(node.body)
+                return node
+            def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:  # noqa: N802
+                node = self.generic_visit(node)
+                node.body = self._simplify_body(node.body)
+                return node
+            def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> ast.AST:  # noqa: N802
+                node = self.generic_visit(node)
+                node.body = self._simplify_body(node.body)
+                return node
+        simp = _LoopSimplifier()
+        tree = simp.visit(tree)
+        ast.fix_missing_locations(tree)
+        if not simp.changed:
+            return _unchanged(code=code, meta=meta)
+        out = ast.unparse(tree)
+        return _finalize_result(original=code, out=out, meta=meta)
+    except Exception:
+        return _unchanged(code=code, meta=meta)
+def simplify_loop(code: str) -> TransformationResult:
+    # Backwards-compatible alias for the environment's action mapping.
+    return simplify_loops(code)
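
Review note: the rewrite described in the `simplify_loops` docstring keeps the resulting list identical, but it does change one observable detail: a `for` loop leaves its loop variable bound afterwards, while a list comprehension does not. The sketch below runs both forms and compares the namespaces. `run_snippet` and the two sample snippets are illustrative and not taken from the repository.

```python
def run_snippet(source: str) -> dict:
    """Execute a snippet in a fresh namespace and return that namespace."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace


# Input shape matched by the transformation: empty list followed by an append-loop.
BEFORE = "xs = []\nfor t in range(4):\n    xs.append(t * t)\n"
# Output shape produced by the transformation: a single list comprehension.
AFTER = "xs = [t * t for t in range(4)]\n"

before_ns = run_snippet(BEFORE)
after_ns = run_snippet(AFTER)
```

Both namespaces hold the same `xs`, but only the loop version still defines `t`, so code that reads the loop variable after the loop would break once the rewrite is applied.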
+def optimize_condition(code: str) -> TransformationResult:
+    """
+    Simplify redundant boolean conditions.
+    Hackathon-scope heuristics:
+    - Replace `if True:` with its body; `if False:` with `else` (if present).
+    - Simplify `not not X` -> `X`.
+    - Simplify comparisons to True/False: `X == True` -> `X`, `X == False` -> `not X`.
+    """
+    meta: Dict[str, Any] = {"type": "optimize_condition", "success": False}
+    try:
+        tree = ast.parse(code)
+        def _is_bool_const(node: ast.AST, value: bool) -> bool:
+            return isinstance(node, ast.Constant) and isinstance(node.value, bool) and node.value is value
+        class _CondOpt(ast.NodeTransformer):
+            def __init__(self) -> None:
+                self.changed = False
+            def visit_UnaryOp(self, node: ast.UnaryOp) -> ast.AST:  # noqa: N802
+                node = self.generic_visit(node)
+                if isinstance(node.op, ast.Not) and isinstance(node.operand, ast.UnaryOp) and isinstance(node.operand.op, ast.Not):
+                    self.changed = True
+                    return node.operand.operand
+                return node
+            def visit_Compare(self, node: ast.Compare) -> ast.AST:  # noqa: N802
+                node = self.generic_visit(node)
+                if len(node.ops) == 1 and len(node.comparators) == 1:
+                    op = node.ops[0]
+                    rhs = node.comparators[0]
+                    if isinstance(op, (ast.Eq, ast.Is)) and _is_bool_const(rhs, True):
+                        self.changed = True
+                        return node.left
+                    if isinstance(op, (ast.Eq, ast.Is)) and _is_bool_const(rhs, False):
+                        self.changed = True
+                        return ast.UnaryOp(op=ast.Not(), operand=node.left)
+                return node
+            def visit_If(self, node: ast.If) -> ast.AST | list[ast.stmt]:  # noqa: N802
+                node = self.generic_visit(node)
+                if _is_bool_const(node.test, True):
+                    self.changed = True
+                    return node.body
+                if _is_bool_const(node.test, False):
+                    self.changed = True
+                    return node.orelse or []
+                return node
+        opt = _CondOpt()
+        tree = opt.visit(tree)
+        ast.fix_missing_locations(tree)
+        if not opt.changed:
+            return _unchanged(code=code, meta=meta)
+        out = ast.unparse(tree)
+        return _finalize_result(original=code, out=out, meta=meta)
+    except Exception:
+        return _unchanged(code=code, meta=meta)
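
Review note: a minimal sketch of the `not not X` rule from `optimize_condition`, followed by a check that shows why the `X == True` rule is a heuristic and not a strict equivalence. `DoubleNegation` and `simplify` are illustrative names, and the sample value `2` is chosen only to expose the difference.

```python
import ast


class DoubleNegation(ast.NodeTransformer):
    """Rewrite `not not X` to `X`."""

    def visit_UnaryOp(self, node: ast.UnaryOp) -> ast.AST:
        node = self.generic_visit(node)  # simplify inner expressions first
        if (
            isinstance(node.op, ast.Not)
            and isinstance(node.operand, ast.UnaryOp)
            and isinstance(node.operand.op, ast.Not)
        ):
            return node.operand.operand
        return node


def simplify(source: str) -> str:
    return ast.unparse(DoubleNegation().visit(ast.parse(source)))


simplified = simplify("flag = not not ready")

# Caveat: `X == True` and plain `X` disagree for truthy values other than True/1.
value = 2
compared = value == True  # noqa: E712  (False, because 2 != 1)
truthy = bool(value)      # True
```

Inside an `if` test the `not not` rewrite is safe, but in an assignment it changes the stored value from a `bool` to whatever `X` is. Likewise, replacing `X == True` with `X` can flip a branch when `X` is a non-boolean truthy value.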
+def inline_function(code: str) -> TransformationResult:
+    """
+    Inline very simple functions into their call sites.
+    Supported pattern:
+    - def f(a, b): return <expr using only a,b>
+    - Replace calls: f(x, y) -> <expr with a->x, b->y>
+    Only handles module-level functions and positional args.
+    """
+    meta: Dict[str, Any] = {"type": "inline_function", "success": False}
+    try:
+        tree = ast.parse(code)
+        simple_fns: Dict[str, tuple[list[str], ast.AST]] = {}
+        for node in tree.body:
+            if not isinstance(node, ast.FunctionDef):
+                continue
+            if node.decorator_list:
+                continue
+            args = node.args
+            if args.vararg or args.kwarg or args.kwonlyargs or args.defaults or args.posonlyargs:
+                continue
+            if len(node.body) != 1 or not isinstance(node.body[0], ast.Return) or node.body[0].value is None:
+                continue
+            arg_names = [a.arg for a in args.args]
+            # Ensure the return expression only references the function's args.
+            referenced: set[str] = set()
+            class _Ref(ast.NodeVisitor):
+                def visit_Name(self, n: ast.Name) -> None:  # noqa: N802
+                    if isinstance(n.ctx, ast.Load):
+                        referenced.add(n.id)
+            _Ref().visit(node.body[0].value)
+            if not referenced.issubset(set(arg_names)):
+                continue
+            simple_fns[node.name] = (arg_names, node.body[0].value)
+        if not simple_fns:
+            return _unchanged(code=code, meta=meta)
+        class _Substitute(ast.NodeTransformer):
+            def __init__(self, mapping: Dict[str, ast.AST]) -> None:
+                self.mapping = mapping
+            def visit_Name(self, n: ast.Name) -> ast.AST:  # noqa: N802
+                if isinstance(n.ctx, ast.Load) and n.id in self.mapping:
+                    return copy.deepcopy(self.mapping[n.id])
+                return n
+        class _Inliner(ast.NodeTransformer):
+            def __init__(self) -> None:
+                self.changed = False
+            def visit_Call(self, node: ast.Call) -> ast.AST:  # noqa: N802
+                node = self.generic_visit(node)
+                if not isinstance(node.func, ast.Name):
+                    return node
+                fn = simple_fns.get(node.func.id)
+                if fn is None:
+                    return node
+                arg_names, expr = fn
+                if node.keywords or len(node.args) != len(arg_names):
+                    return node
+                mapping = {name: arg for name, arg in zip(arg_names, node.args, strict=True)}
+                new_expr = _Substitute(mapping).visit(copy.deepcopy(expr))
+                self.changed = True
+                return ast.copy_location(new_expr, node)
+        inliner = _Inliner()
+        tree = inliner.visit(tree)
+        ast.fix_missing_locations(tree)
+        if not inliner.changed:
+            return _unchanged(code=code, meta=meta)
+        out = ast.unparse(tree)
+        meta["inlined"] = sorted(simple_fns.keys())
+        return _finalize_result(original=code, out=out, meta=meta)
+    except Exception:
+        return _unchanged(code=code, meta=meta)
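
Review note: the substitution step of `inline_function` can be sketched on its own. `Substitute` mirrors the `_Substitute` class in the diff, while `inline_call` is a hypothetical helper that takes the function body, parameter names, and call site as strings to keep the example short. It assumes the call site is a plain positional call.

```python
import ast
import copy


class Substitute(ast.NodeTransformer):
    """Replace parameter names in an expression with argument expressions."""

    def __init__(self, mapping: dict) -> None:
        self.mapping = mapping

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if isinstance(node.ctx, ast.Load) and node.id in self.mapping:
            return copy.deepcopy(self.mapping[node.id])
        return node


def inline_call(body_expr: str, params: list, call_src: str) -> str:
    """Inline `call_src` given a single-return function body and its parameters."""
    call = ast.parse(call_src, mode="eval").body  # assumed to be an ast.Call
    expr = ast.parse(body_expr, mode="eval").body
    mapping = dict(zip(params, call.args))
    return ast.unparse(Substitute(mapping).visit(expr))


summed = inline_call("a + b", ["a", "b"], "add(x * 2, y)")
scaled = inline_call("a * b", ["a", "b"], "mul(x + 1, y)")
doubled = inline_call("a + a", ["a"], "double(next_id())")
```

`ast.unparse` inserts parentheses where precedence requires them, so the second call becomes `(x + 1) * y`. The third call shows a limitation of this style of inlining: an argument with side effects is duplicated and would run twice.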

acre/datasets/__init__.py ADDED

	@@ -0,0 +1,6 @@

+"""Datasets and sample code providers for ACRE."""
+from .code_samples import CodeSample, CodeSampleDataset
+__all__ = ["CodeSample", "CodeSampleDataset"]

acre/datasets/code_samples.py ADDED

	@@ -0,0 +1,34 @@

+from __future__ import annotations
+from dataclasses import dataclass
+from typing import Iterable, Iterator, List, Optional
+@dataclass(frozen=True)
+class CodeSample:
+    """A single code sample (placeholder)."""
+    id: str
+    language: str
+    code: str
+class CodeSampleDataset:
+    """
+    Minimal in-memory dataset stub.
+    Later versions can back this with files, Git repos, or benchmark suites.
+    """
+    def __init__(self, samples: Optional[Iterable[CodeSample]] = None) -> None:
+        self._samples: List[CodeSample] = list(samples or [])
+    def __len__(self) -> int:
+        return len(self._samples)
+    def __iter__(self) -> Iterator[CodeSample]:
+        return iter(self._samples)
+    def add(self, sample: CodeSample) -> None:
+        self._samples.append(sample)

acre/demo.py ADDED

	@@ -0,0 +1,185 @@

+from __future__ import annotations
+import os
+import random
+import sys
+from typing import Any, Optional, Tuple
+from acre.datasets.code_samples import CodeSample, CodeSampleDataset
+from acre.env.refactor_env import RefactorEnv
+def _load_model(path: str):
+    """Load a Stable-Baselines3 PPO model if available; otherwise return None."""
+    if not os.path.exists(path):
+        return None
+    try:
+        from stable_baselines3 import PPO
+    except Exception:
+        return None
+    try:
+        return PPO.load(path)
+    except Exception:
+        return None
+def _messy_sample_code() -> str:
+    # Intentionally "messy" but valid Python for demo purposes.
+    return (
+        "def add(a,b):\n"
+        "    x=0\n"
+        "    for i in range(a):\n"
+        "        x=x+1\n"
+        "    if True:\n"
+        "        x = x\n"
+        "    if False:\n"
+        "        y=123\n"
+        "    else:\n"
+        "        y=0\n"
+        "    def f(p,q):\n"
+        "        return p+q\n"
+        "    r = f(x,y)\n"
+        "    return r\n"
+    )
+def _format_code_block(code: str) -> str:
+    return "\n".join(f"  {line}" for line in code.rstrip().splitlines()) + "\n"
+def _safe_print(text: str) -> None:
+    """
+    Print text safely across Windows consoles (some default encodings can't print emojis).
+    """
+    encoding = sys.stdout.encoding or "utf-8"
+    try:
+        text.encode(encoding)
+        print(text, flush=True)
+    except Exception:
+        # Fall back to ASCII-friendly markers if emojis can't be encoded.
+        safe = text.replace("✅", "[OK]").replace("⚠️", "[WARN]").replace("⚠", "[WARN]")
+        print(safe, flush=True)
+def _compute_runtime(executor: Any, code: str) -> float:
+    """Best-effort runtime metric using the current executor contract."""
+    try:
+        res = executor.run(code, filename="demo.py")
+        if getattr(res, "exit_code", 1) == 0 and isinstance(getattr(res, "metrics", None), dict):
+            return float(res.metrics.get("runtime_s", 0.0) or 0.0)
+    except Exception:
+        pass
+    return 0.0
+def _choose_action(model: Any, obs, env: RefactorEnv, rng: random.Random) -> Tuple[int, str]:
+    """Choose an action from the model, falling back to random."""
+    n_actions = int(getattr(getattr(env, "action_space", None), "n", 5))
+    if model is None:
+        a = int(rng.randint(0, n_actions - 1))
+        return a, "random"
+    try:
+        action, _state = model.predict(obs, deterministic=True)
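+        # SB3 may return a Python scalar, a 0-d array, or a 1-element array.
+        # `.item()` handles both array shapes, whereas indexing a 0-d array raises.
+        if hasattr(action, "item"):
+            a = int(action.item())
+        else:
+            a = int(action)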
+        return a, "ppo"
+    except Exception:
+        a = int(rng.randint(0, n_actions - 1))
+        return a, "random"
+def run_demo(*, model_path: str = "acre_agent.zip", seed: int = 0) -> None:
+    rng = random.Random(seed)
+    # Create a dataset with one messy sample so `reset()` loads it deterministically.
+    dataset = CodeSampleDataset(
+        [
+            CodeSample(
+                id="demo_sample",
+                language="python",
+                code=_messy_sample_code(),
+            )
+        ]
+    )
+    env = RefactorEnv(dataset=dataset, seed=seed)
+    model = _load_model(model_path)
+    model_status = "loaded" if model is not None else "not found (using random actions)"
+    # Reset and capture the original code/metrics.
+    obs, info = env.reset()
+    original_code = getattr(env, "_code", "")
+    original_complexity = float(getattr(env, "_compute_complexity")(original_code))
+    original_runtime = _compute_runtime(env.executor, original_code)
+    print("=" * 72)
+    print("ACRE: Autonomous RL Code Refactoring Agent (5-step episode)")
+    print(f"Model: {model_path} -> {model_status}")
+    print(f"Sample: {info.get('sample_id')} ({info.get('language')})")
+    print("=" * 72)
+    print("\nORIGINAL CODE:\n")
+    print(_format_code_block(original_code))
+    total_reward = 0.0
+    successful_transformations = 0
+    steps_taken = 0
+    for step_idx in range(1, 6):
+        action, policy = _choose_action(model, obs, env, rng)
+        obs, reward, terminated, truncated, step_info = env.step(action)
+        total_reward += float(reward)
+        steps_taken = step_idx
+        action_name = step_info.get("action_name", "unknown")
+        transform_meta = step_info.get("transform", {})
+        if isinstance(transform_meta, dict) and bool(transform_meta.get("success", False)):
+            successful_transformations += 1
+        transformed_code = getattr(env, "_code", "")
+        print("-" * 72)
+        print(f"STEP {step_idx}/5")
+        print(f"policy={policy} action={action} ({action_name})")
+        print(f"transform={transform_meta}")
+        print(f"reward={float(reward):.2f}  components={step_info.get('reward_components')}")
+        print("\nUPDATED CODE:\n")
+        print(_format_code_block(transformed_code))
+        if terminated or truncated:
+            break
+    final_code = getattr(env, "_code", "")
+    final_complexity = float(getattr(env, "_compute_complexity")(final_code))
+    final_runtime = _compute_runtime(env.executor, final_code)
+    print("=" * 72)
+    print("FINAL SUMMARY")
+    print("=" * 72)
+    print(f"total_reward: {total_reward:.2f}")
+    print(f"complexity: {original_complexity:.0f} -> {final_complexity:.0f}")
+    print(f"runtime_s:   {original_runtime:.4f} -> {final_runtime:.4f}")
+    complexity_improvement = ((original_complexity - final_complexity) / max(original_complexity, 1.0)) * 100.0
+    print(f"complexity improvement: {complexity_improvement:.2f}%")
+    print("\nCHANGES APPLIED:")
+    print(f"- Total steps: {steps_taken}")
+    print(f"- Successful transformations: {successful_transformations}")
+    if total_reward > 0:
+        _safe_print("\n✅ Code improved successfully")
+    else:
+        _safe_print("\n⚠️ No significant improvement")
+    print("\nFINAL CODE:\n")
+    print(_format_code_block(final_code))
+    env.close()
+if __name__ == "__main__":
+    run_demo()
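
`_choose_action` has to turn whatever `model.predict` returns into a plain `int`. For a single, non-vectorized observation and a discrete action space, Stable-Baselines3 generally hands back a zero-dimensional NumPy array, which cannot be indexed with `[0]`. The sketch below shows one way to tolerate the common shapes. `action_to_int` and `FakeZeroDimArray` are hypothetical names used only for illustration; the stand-in class mimics a 0-d array so the snippet runs without NumPy.

```python
from typing import Any


def action_to_int(action: Any) -> int:
    """Convert a policy output (scalar, 0-d array, or 1-element sequence) to int."""
    # NumPy arrays and NumPy scalars expose .item(), which works for any
    # array holding exactly one element, regardless of its shape.
    if hasattr(action, "item"):
        return int(action.item())
    # Plain Python lists or tuples with a single element.
    if isinstance(action, (list, tuple)):
        if len(action) != 1:
            raise ValueError(f"expected exactly one action, got {len(action)}")
        return int(action[0])
    return int(action)


class FakeZeroDimArray:
    """Hypothetical stand-in for a 0-d NumPy array (not part of ACRE or SB3)."""

    def __init__(self, value: int) -> None:
        self._value = value

    def item(self) -> int:
        return self._value

    def __getitem__(self, index: int) -> int:
        # A real 0-d array also refuses positional indexing.
        raise IndexError("too many indices for array")


print(action_to_int(3), action_to_int([2]), action_to_int(FakeZeroDimArray(4)))
```

Raising on multi-element sequences, rather than silently taking the first element, keeps a vectorized prediction from being mistaken for a single action.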

acre/main.py ADDED

	@@ -0,0 +1,39 @@

+from __future__ import annotations
+import argparse
+from acre.training.train_agent import TrainConfig, train
+def _build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(prog="acre", description="ACRE: Autonomous Code Refactoring Environment")
+    sub = parser.add_subparsers(dest="command", required=False)
+    train_p = sub.add_parser("train", help="Run training (stub)")
+    train_p.add_argument("--total-steps", type=int, default=100, help="Total training steps (stub)")
+    sub.add_parser("demo", help="Run a small demo (stub)")
+    return parser
+def run_demo() -> None:
+    # Placeholder for a future interactive/demo flow.
+    print("ACRE demo mode is not implemented yet.")
+def main(argv: list[str] | None = None) -> None:
+    parser = _build_parser()
+    args = parser.parse_args(argv)
+    if args.command == "demo":
+        run_demo()
+        return
+    total_steps = getattr(args, "total_steps", 100)
+    train(config=TrainConfig(total_steps=total_steps))
+if __name__ == "__main__":
+    main()
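
The reason `main()` reads `total_steps` through `getattr(args, "total_steps", 100)` is easiest to see by parsing a couple of argument lists. The sketch below rebuilds a parser with the same subcommands; `build_parser` is a simplified copy for illustration, not the module's `_build_parser`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Simplified copy of the ACRE CLI: optional `train` and `demo` subcommands."""
    parser = argparse.ArgumentParser(prog="acre")
    sub = parser.add_subparsers(dest="command", required=False)
    train_p = sub.add_parser("train")
    train_p.add_argument("--total-steps", type=int, default=100)
    sub.add_parser("demo")
    return parser


parser = build_parser()
train_args = parser.parse_args(["train", "--total-steps", "200"])
bare_args = parser.parse_args([])

# With a subcommand, argparse stores its name and its options.
print(train_args.command, train_args.total_steps)
# Without one, `command` is None and `total_steps` is absent from the
# namespace, so a plain attribute access would raise AttributeError.
print(bare_args.command, hasattr(bare_args, "total_steps"))
```

Running the module with no subcommand therefore falls through to training with the `getattr` default of 100 steps.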

acre/tasks/__init__.py ADDED

	@@ -0,0 +1,3 @@


+from acre.tasks.task_registry import Task, TaskRegistry
+
+__all__ = ["Task", "TaskRegistry"]

acre/tasks/task_registry.py ADDED

	@@ -0,0 +1,222 @@

+"""
+Three OpenEnv tasks with AST-based graders scoring 0.0-1.0.
+"""
+from __future__ import annotations
+import ast
+from dataclasses import dataclass
+from typing import Callable, Dict, List, Optional
+@dataclass
+class Task:
+    id: str
+    name: str
+    description: str
+    difficulty: str
+    initial_code: str
+    _grade_fn: Callable[[str], float]
+    def grade(self, code: str) -> float:
+        """Return a score in [0.0, 1.0]."""
+        try:
+            return float(min(1.0, max(0.0, self._grade_fn(code))))
+        except Exception:
+            return 0.0
+# ---------------------------------------------------------------------------
+# Task 1 — Easy: Rename generic variables
+# ---------------------------------------------------------------------------
+_EASY_CODE = """\
+def compute(x, y, tmp):
+    tmp = x + y
+    x = tmp * 2
+    result = x
+    return result
+"""
+def _grade_easy(code: str) -> float:
+    """Score = fraction of generic names (x, tmp) removed from all scopes."""
+    generic = {"x", "tmp"}
+    try:
+        tree = ast.parse(code)
+    except SyntaxError:
+        return 0.0
+    remaining: set[str] = set()
+    class _Collector(ast.NodeVisitor):
+        def visit_Name(self, node: ast.Name) -> None:
+            if node.id in generic:
+                remaining.add(node.id)
+            self.generic_visit(node)
+        def visit_arg(self, node: ast.arg) -> None:
+            if node.arg in generic:
+                remaining.add(node.arg)
+            self.generic_visit(node)
+    _Collector().visit(tree)
+    renamed = len(generic - remaining)
+    return renamed / len(generic)
+# ---------------------------------------------------------------------------
+# Task 2 — Medium: Remove dead code
+# ---------------------------------------------------------------------------
+_MEDIUM_CODE = """\
+def process(data):
+    result = []
+    for item in data:
+        result.append(item * 2)
+    if False:
+        print("never runs")
+    unused_var = 42
+    return result
+    print("unreachable")
+"""
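+def _grade_medium(code: str) -> float:
+    """Score = fraction of 3 checks passed (~0.33 each): no `if False` block, no `unused_var`, and a list comprehension present. The unreachable statement after `return` is not checked."""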
+    try:
+        tree = ast.parse(code)
+    except SyntaxError:
+        return 0.0
+    source = ast.unparse(tree)
+    score = 0.0
+    # Check 1: if-False block removed
+    if "if False" not in source:
+        score += 1 / 3
+    # Check 2: unused_var assignment removed
+    if "unused_var" not in source:
+        score += 1 / 3
+    # Check 3: list comprehension used (loop simplified)
+    has_listcomp = any(isinstance(n, ast.ListComp) for n in ast.walk(tree))
+    if has_listcomp:
+        score += 1 / 3
+    return score
+# ---------------------------------------------------------------------------
+# Task 3 — Hard: Full refactor
+# ---------------------------------------------------------------------------
+_HARD_CODE = """\
+def add(p, q):
+    return p + q
+def compute(x, data, tmp):
+    result = []
+    for item in data:
+        result.append(item * 2)
+    if False:
+        y = 999
+    if True:
+        val = add(x, tmp)
+    unused = 0
+    flag = not not True
+    return val
+    print("dead")
+"""
+def _grade_hard(code: str) -> float:
+    """Score = fraction of 5 quality checks passed."""
+    try:
+        tree = ast.parse(code)
+    except SyntaxError:
+        return 0.0
+    source = ast.unparse(tree)
+    checks = 0
+    # 1. No generic parameter names x/tmp in any function signature (names used in bodies are not checked)
+    has_generic = False
+    class _GenCheck(ast.NodeVisitor):
+        def visit_arg(self, node: ast.arg) -> None:
+            nonlocal has_generic
+            if node.arg in {"x", "tmp"}:
+                has_generic = True
+    _GenCheck().visit(tree)
+    if not has_generic:
+        checks += 1
+    # 2. No if False block
+    if "if False" not in source:
+        checks += 1
+    # 3. if True removed (body inlined)
+    if "if True" not in source:
+        checks += 1
+    # 4. List comprehension used
+    if any(isinstance(n, ast.ListComp) for n in ast.walk(tree)):
+        checks += 1
+    # 5. add() call inlined (no call to 'add')
+    calls = [n for n in ast.walk(tree) if isinstance(n, ast.Call)]
+    fn_names = {c.func.id for c in calls if isinstance(c.func, ast.Name)}
+    if "add" not in fn_names:
+        checks += 1
+    return checks / 5
+# ---------------------------------------------------------------------------
+# Registry
+# ---------------------------------------------------------------------------
+class TaskRegistry:
+    def __init__(self) -> None:
+        self._tasks: Dict[str, Task] = {}
+        self._register_all()
+    def _register_all(self) -> None:
+        self._tasks["rename_variables"] = Task(
+            id="rename_variables",
+            name="Rename Variables (Easy)",
+            description="Rename generic variable names (x, tmp) to descriptive ones",
+            difficulty="easy",
+            initial_code=_EASY_CODE,
+            _grade_fn=_grade_easy,
+        )
+        self._tasks["remove_dead_code"] = Task(
+            id="remove_dead_code",
+            name="Remove Dead Code (Medium)",
+            description="Remove unreachable code, if False blocks, and unused variables",
+            difficulty="medium",
+            initial_code=_MEDIUM_CODE,
+            _grade_fn=_grade_medium,
+        )
+        self._tasks["full_refactor"] = Task(
+            id="full_refactor",
+            name="Full Refactor (Hard)",
+            description="Apply all transformations: rename, dead code, loops, conditions, inlining",
+            difficulty="hard",
+            initial_code=_HARD_CODE,
+            _grade_fn=_grade_hard,
+        )
+    def get_task(self, task_id: str) -> Optional[Task]:
+        return self._tasks.get(task_id)
+    def list_tasks(self) -> List[dict]:
+        return [
+            {
+                "id": t.id,
+                "name": t.name,
+                "description": t.description,
+                "difficulty": t.difficulty,
+                "initial_code": t.initial_code,
+            }
+            for t in self._tasks.values()
+        ]
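
To make the AST-based scoring concrete, here is a minimal, self-contained sketch in the spirit of `_grade_easy`. It uses `ast.walk` instead of a `NodeVisitor` subclass, and `fraction_renamed` plus the three sample strings are illustrative names and inputs, not part of the registry.

```python
import ast

GENERIC_NAMES = frozenset({"x", "tmp"})


def fraction_renamed(code: str, generic: frozenset = GENERIC_NAMES) -> float:
    """Return the fraction of generic names that no longer appear in `code`."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return 0.0  # unparsable code scores zero, as in the graders above
    remaining = set()
    for node in ast.walk(tree):
        # Name nodes cover variable reads and writes; arg nodes cover parameters.
        if isinstance(node, ast.Name) and node.id in generic:
            remaining.add(node.id)
        elif isinstance(node, ast.arg) and node.arg in generic:
            remaining.add(node.arg)
    return len(generic - remaining) / len(generic)


before = "def compute(x, y, tmp):\n    tmp = x + y\n    return tmp\n"
partial = "def compute(first, y, tmp):\n    tmp = first + y\n    return tmp\n"
after = "def compute(first, second, total):\n    total = first + second\n    return total\n"

print(fraction_renamed(before), fraction_renamed(partial), fraction_renamed(after))
```

Because the score counts distinct names rather than occurrences, renaming one of the two generic names everywhere yields exactly half credit.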

acre/training/__init__.py ADDED

	@@ -0,0 +1,6 @@

+"""Training utilities for ACRE."""
+from .train_agent import TrainConfig, train
+__all__ = ["TrainConfig", "train"]

acre/training/train_agent.py ADDED

	@@ -0,0 +1,75 @@

+from __future__ import annotations
+from dataclasses import dataclass
+from typing import Optional
+from acre.env.refactor_env import RefactorEnv
+@dataclass(frozen=True)
+class TrainConfig:
+    """Configuration stub for training."""
+    total_steps: int = 5_000
+    seed: Optional[int] = None
+    model_path: str = "acre_agent.zip"
+def train(*, env: Optional[RefactorEnv] = None, config: Optional[TrainConfig] = None) -> None:
+    """
+    Train a PPO agent on `RefactorEnv` using Stable-Baselines3.
+    This is intentionally lightweight (hackathon-friendly) and focuses on a
+    working demo: basic training loop, simple logging, and saving the model.
+    """
+    _config = config or TrainConfig()
+    _env = env or RefactorEnv(seed=_config.seed)
+    try:
+        from stable_baselines3 import PPO
+        from stable_baselines3.common.callbacks import BaseCallback
+        from stable_baselines3.common.monitor import Monitor
+        from stable_baselines3.common.vec_env import DummyVecEnv
+    except Exception as e:  # pragma: no cover
+        print("Stable-Baselines3 is required for training. Install with `pip install -r requirements.txt`.")
+        print(f"Import error: {e}")
+        return None
+    class EpisodeRewardPrinter(BaseCallback):
+        """Print episode reward when an episode ends (via Monitor)."""
+        def __init__(self) -> None:
+            super().__init__()
+            self.episode_count = 0
+        def _on_step(self) -> bool:
+            infos = self.locals.get("infos", [])
+            for info in infos:
+                ep = info.get("episode") if isinstance(info, dict) else None
+                if isinstance(ep, dict) and "r" in ep:
+                    self.episode_count += 1
+                    print(f"episode={self.episode_count} reward={ep['r']:.2f} length={int(ep.get('l', 0))}")
+            return True
+    # Wrap with Monitor so SB3 can compute episode stats and expose them in `info["episode"]`.
+    def make_env() -> Monitor:
+        return Monitor(_env)
+    vec_env = DummyVecEnv([make_env])
+    model = PPO(
+        policy="MlpPolicy",
+        env=vec_env,
+        verbose=0,
+        seed=_config.seed,
+        n_steps=64,
+        batch_size=64,
+    )
+    print(f"Training PPO for {int(_config.total_steps)} timesteps...")
+    model.learn(total_timesteps=int(_config.total_steps), callback=EpisodeRewardPrinter())
+    model.save(_config.model_path)
+    print(f"Saved model to {_config.model_path!r}")
+    return None
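
The callback relies on the `Monitor` wrapper adding an `episode` entry (with reward `r` and length `l`) to the step `info` only when an episode ends. The filtering it performs can be sketched without Stable-Baselines3 installed; `episode_lines` and the sample `infos` list below are hypothetical and exist only to show the logic.

```python
from typing import Any, Iterable, List


def episode_lines(infos: Iterable[Any], start_count: int = 0) -> List[str]:
    """Format one log line per finished episode found in a batch of step infos."""
    lines: List[str] = []
    count = start_count
    for info in infos:
        # Steps in the middle of an episode carry no "episode" entry.
        ep = info.get("episode") if isinstance(info, dict) else None
        if isinstance(ep, dict) and "r" in ep:
            count += 1
            lines.append(
                f"episode={count} reward={ep['r']:.2f} length={int(ep.get('l', 0))}"
            )
    return lines


infos = [
    {"action_name": "rename_variable"},          # mid-episode step: no stats
    {"episode": {"r": 12.5, "l": 5, "t": 0.8}},  # episode just finished
]
print(episode_lines(infos))
```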

acre/utils/__init__.py ADDED

	@@ -0,0 +1,6 @@

+"""Shared utility helpers for ACRE."""
+from .metrics import Metric, MetricLogger
+__all__ = ["Metric", "MetricLogger"]

acre/utils/metrics.py ADDED

	@@ -0,0 +1,33 @@

+from __future__ import annotations
+from dataclasses import dataclass, field
+from typing import Dict, Iterable, List, Tuple
+@dataclass(frozen=True)
+class Metric:
+    """Single scalar metric value (placeholder)."""
+    name: str
+    value: float
+@dataclass
+class MetricLogger:
+    """Tiny metric logger stub."""
+    _history: Dict[str, List[float]] = field(default_factory=dict)
+    def log(self, metric: Metric) -> None:
+        self._history.setdefault(metric.name, []).append(metric.value)
+    def latest(self) -> Dict[str, float]:
+        return {k: v[-1] for k, v in self._history.items() if v}
+    def as_series(self) -> Dict[str, Tuple[float, ...]]:
+        return {k: tuple(v) for k, v in self._history.items()}
+    def extend(self, metrics: Iterable[Metric]) -> None:
+        for m in metrics:
+            self.log(m)

inference.py ADDED

	@@ -0,0 +1,278 @@

+"""
+ACRE inference script for OpenEnv submission evaluation.
+Required environment variables:
+  API_BASE_URL: LLM API endpoint (default allowed)
+  MODEL_NAME: model identifier (default allowed)
+  HF_TOKEN: API token for the OpenAI-compatible endpoint (if unset, a
+  rule-based heuristic chooses the actions instead of the LLM)
+  ENV_URL: running ACRE server base URL
+Optional:
+  LOCAL_IMAGE_NAME: present for evaluator compatibility when using a local
+  Docker image launcher.
+Stdout format uses strict START / STEP / END event markers.
+"""
+from __future__ import annotations
+import json
+import os
+import re
+import sys
+import time
+from typing import Dict, List, Tuple
+import requests
+from openai import OpenAI
+API_BASE_URL: str = os.getenv("API_BASE_URL", "https://api.openai.com/v1")
+MODEL_NAME: str = os.getenv("MODEL_NAME", "gpt-4o-mini")
+HF_TOKEN: str | None = os.getenv("HF_TOKEN")
+ENV_URL: str | None = os.getenv("ENV_URL")
+LOCAL_IMAGE_NAME: str | None = os.getenv("LOCAL_IMAGE_NAME")
+TASKS: List[str] = ["rename_variables", "remove_dead_code", "full_refactor"]
+ACTION_MEANINGS: Dict[int, str] = {
+    0: "rename_variable",
+    1: "remove_dead_code",
+    2: "simplify_loop",
+    3: "optimize_condition",
+    4: "inline_function",
+}
+SYSTEM_PROMPT = """\
+You are an RL agent that refactors Python code. Choose one action per step.
+Actions:
+  0 rename_variable   - rename generic names (x, tmp, i) to descriptive ones
+  1 remove_dead_code  - remove unreachable stmts, if False blocks, unused vars
+  2 simplify_loop     - convert append-loops to list comprehensions
+  3 optimize_condition- simplify 'not not x', 'if True/False', 'x==True'
+  4 inline_function   - inline simple single-return module-level functions
+Respond ONLY with valid JSON (no markdown):
+{"action": <0-4>, "reason": "<one sentence>"}"""
+def _env_url() -> str:
+    if ENV_URL:
+        return ENV_URL.rstrip("/")
+    raise RuntimeError("ENV_URL must be set before running inference.py")
+def _post(path: str, payload: dict | None = None) -> dict:
+    response = requests.post(f"{_env_url()}{path}", json=payload or {}, timeout=30)
+    response.raise_for_status()
+    return response.json()
+def _get(path: str) -> dict:
+    response = requests.get(f"{_env_url()}{path}", timeout=30)
+    response.raise_for_status()
+    return response.json()
+def reset_env(task_id: str) -> dict:
+    return _post("/reset", {"task_id": task_id})
+def step_env(action: int) -> dict:
+    return _post("/step", {"action": action})
+def get_state() -> dict:
+    return _get("/state")
+def grade(task_id: str, code: str) -> float:
+    response = requests.post(
+        f"{_env_url()}/tasks/{task_id}/grade",
+        json={"code": code},
+        timeout=30,
+    )
+    response.raise_for_status()
+    return float(response.json().get("score", 0.0))
+def choose_action(client: OpenAI, state: dict, task_id: str) -> Tuple[int, str]:
+    def heuristic_action() -> Tuple[int, str]:
+        code = str(state.get("current_code", ""))
+        step_i = int(state.get("episode_steps", 0))
+        has_generic = re.search(r"\b(x|tmp|i)\b", code) is not None
+        has_if_false = re.search(r"\bif\s+False\b", code) is not None
+        has_if_true = re.search(r"\bif\s+True\b", code) is not None
+        has_append_loop = ".append(" in code and "for " in code
+        has_double_not = "not not" in code
+        has_add_call = "add(" in code
+        if task_id == "rename_variables":
+            if has_generic:
+                return 0, "heuristic: remove generic names first"
+            if has_if_false or "unused" in code:
+                return 1, "heuristic: remove dead code"
+            if has_append_loop:
+                return 2, "heuristic: simplify loop"
+            if has_if_true or has_double_not:
+                return 3, "heuristic: optimize conditions"
+            return 4, "heuristic: inline simple function"
+        if task_id == "remove_dead_code":
+            if has_if_false or "unused" in code:
+                return 1, "heuristic: remove dead code patterns"
+            if has_append_loop:
+                return 2, "heuristic: convert append-loop"
+            if has_if_true or has_double_not:
+                return 3, "heuristic: simplify conditions"
+            if has_generic:
+                return 0, "heuristic: clean generic names"
+            return 4, "heuristic: inline helper"
+        if has_generic:
+            return 0, "heuristic: rename generic variables"
+        if has_append_loop:
+            return 2, "heuristic: simplify loop into listcomp"
+        if has_if_false or has_if_true or has_double_not:
+            return 3, "heuristic: optimize boolean branches"
+        if has_add_call:
+            return 4, "heuristic: inline add() call"
+        if step_i >= 2:
+            return 1, "heuristic: remove remaining dead code"
+        return 3, "heuristic: condition optimization as safe default"
+    if not HF_TOKEN:
+        return heuristic_action()
+    messages = [
+        {"role": "system", "content": SYSTEM_PROMPT},
+        {
+            "role": "user",
+            "content": (
+                f"Task: {task_id}\n"
+                f"Steps remaining: {state.get('max_steps', 5) - state.get('episode_steps', 0)}\n"
+                f"Complexity: {state.get('complexity', 0)}\n\n"
+                f"Current code:\n```python\n{state.get('current_code', '')}\n```\n\n"
+                "Choose the best action."
+            ),
+        },
+    ]
+    try:
+        response = client.chat.completions.create(
+            model=MODEL_NAME,
+            messages=messages,
+            temperature=0.0,
+            max_tokens=120,
+        )
+        raw = (response.choices[0].message.content or "").strip()
+        json_blob = raw
+        if "{" not in json_blob or "}" not in json_blob:
+            return heuristic_action()
+        match = re.search(r"\{.*\}", json_blob, flags=re.DOTALL)
+        if match:
+            json_blob = match.group(0)
+        parsed = json.loads(json_blob)
+        action = int(parsed.get("action", -1))
+        reason = str(parsed.get("reason", ""))
+        if 0 <= action <= 4:
+            return action, reason or "llm-selected action"
+        return heuristic_action()
+    except Exception:
+        return heuristic_action()
+def run_episode(client: OpenAI, task_id: str, episode_num: int) -> float:
+    reset_env(task_id)
+    state = get_state()
+    print(
+        json.dumps(
+            {
+                "event": "START",
+                "episode": episode_num,
+                "task_id": task_id,
+                "initial_complexity": state.get("complexity", 0),
+                "initial_code_length": len(state.get("current_code", "")),
+                "timestamp": time.time(),
+            }
+        ),
+        flush=True,
+    )
+    cumulative_reward = 0.0
+    for step_num in range(1, 6):
+        action, reason = choose_action(client, state, task_id)
+        result = step_env(action)
+        state = get_state()
+        reward_payload = result.get("reward", {})
+        raw_reward = float(reward_payload.get("raw", 0.0))
+        norm_reward = float(reward_payload.get("normalized", (raw_reward + 32) / 52))
+        cumulative_reward += raw_reward
+        print(
+            json.dumps(
+                {
+                    "event": "STEP",
+                    "episode": episode_num,
+                    "step": step_num,
+                    "action": action,
+                    "action_name": ACTION_MEANINGS.get(action, "unknown"),
+                    "reason": reason,
+                    "reward": round(raw_reward, 4),
+                    "normalized_reward": round(norm_reward, 4),
+                    "cumulative_reward": round(cumulative_reward, 4),
+                    "changed": result.get("info", {}).get("changed", False),
+                    "reward_components": reward_payload.get("components", {}),
+                    "done": result.get("done", False),
+                }
+            ),
+            flush=True,
+        )
+        if result.get("done") or result.get("terminated") or result.get("truncated"):
+            break
+    final_state = get_state()
+    task_score = grade(task_id, final_state.get("current_code", ""))
+    print(
+        json.dumps(
+            {
+                "event": "END",
+                "episode": episode_num,
+                "task_id": task_id,
+                "cumulative_reward": round(cumulative_reward, 4),
+                # The per-step raw range is [-32, 20], so scale by the number of
+                # steps taken to keep the cumulative value in [0, 1].
+                "normalized_cumulative": round((cumulative_reward + 32 * step_num) / (52 * step_num), 4),
+                "task_score": round(task_score, 4),
+                "final_complexity": final_state.get("complexity", 0),
+                "timestamp": time.time(),
+            }
+        ),
+        flush=True,
+    )
+    return task_score
+def main() -> None:
+    if not ENV_URL:
+        raise SystemExit("ENV_URL is required. Example: ENV_URL=http://localhost:7860")
+    client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN or "dummy")
+    scores: List[float] = []
+    for i, task_id in enumerate(TASKS, start=1):
+        score = run_episode(client, task_id, i)
+        scores.append(score)
+    avg_score = sum(scores) / len(scores) if scores else 0.0
+    sys.exit(0 if avg_score >= 0.5 else 1)
+if __name__ == "__main__":
+    main()
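
The reply handling inside `choose_action` boils down to three steps: pull the first `{...}` span out of the model's text, parse it as JSON, and accept the action only if it is in range. A minimal standalone sketch of that idea follows; `parse_action_reply` is a hypothetical helper that returns `None` where the script above would fall back to its heuristic.

```python
import json
import re
from typing import Optional, Tuple


def parse_action_reply(raw: str, n_actions: int = 5) -> Optional[Tuple[int, str]]:
    """Extract (action, reason) from an LLM reply, or None if it is unusable."""
    # Tolerate markdown fences or chatter around the JSON object.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
        action = int(parsed.get("action", -1))
    except (ValueError, TypeError, AttributeError):
        # Malformed JSON, a non-numeric action, or a non-object payload.
        return None
    if not 0 <= action < n_actions:
        return None
    return action, str(parsed.get("reason", ""))


print(parse_action_reply('```json\n{"action": 2, "reason": "loop"}\n```'))
print(parse_action_reply('{"action": 9}'))
print(parse_action_reply("no json here"))
```

Returning `None` instead of raising keeps the caller's fallback path a single `if` check.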

models.py ADDED

	@@ -0,0 +1,156 @@

+from __future__ import annotations
+from typing import Any, Dict, List, Optional, Sequence
+from pydantic import BaseModel, Field
+class ObservationModel(BaseModel):
+    code_length: float
+    complexity_score: float
+    runtime_s: float
+    error_flag: bool
+    @classmethod
+    def from_vector(cls, values: Sequence[float]) -> "ObservationModel":
+        vector = list(values)
+        if len(vector) != 4:
+            raise ValueError(f"observation vector must have length 4, got {len(vector)}")
+        return cls(
+            code_length=float(vector[0]),
+            complexity_score=float(vector[1]),
+            runtime_s=float(vector[2]),
+            error_flag=bool(vector[3]),
+        )
+    def to_vector(self) -> List[float]:
+        return [
+            float(self.code_length),
+            float(self.complexity_score),
+            float(self.runtime_s),
+            float(int(self.error_flag)),
+        ]
+class ActionModel(BaseModel):
+    action: int = Field(ge=0, le=4)
+    action_name: Optional[str] = None
+class RewardModel(BaseModel):
+    raw: float
+    normalized: float = Field(ge=0.0, le=1.0)
+    components: Dict[str, float]
+class HealthResponse(BaseModel):
+    status: str
+    env: str
+    version: str
+class CompatibilityHealthResponse(BaseModel):
+    status: str
+    service: str
+class ResetRequest(BaseModel):
+    task_id: Optional[str] = None
+    seed: Optional[int] = None
+    code: Optional[str] = None
+class StepRequest(BaseModel):
+    action: int = Field(ge=0, le=4)
+class GradeRequest(BaseModel):
+    code: str
+class TaskInfo(BaseModel):
+    id: str
+    name: str
+    description: str
+    difficulty: str
+    initial_code: str
+class TasksResponse(BaseModel):
+    tasks: List[TaskInfo]
+class GradeResponse(BaseModel):
+    task_id: str
+    score: float
+    passed: bool
+class StateResponse(BaseModel):
+    current_code: str
+    episode_steps: int
+    max_steps: int
+    complexity: float
+    last_runtime: float
+    last_error: bool
+    sample_id: Optional[str]
+    language: Optional[str]
+    task_id: Optional[str]
+    observation: ObservationModel
+    observation_vector: List[float]
+    action_meanings: Dict[int, str]
+class ResetResponse(BaseModel):
+    observation: ObservationModel
+    observation_vector: List[float]
+    info: Dict[str, Any]
+    task_id: Optional[str]
+    state: StateResponse
+class StepResponse(BaseModel):
+    action: ActionModel
+    observation: ObservationModel
+    observation_vector: List[float]
+    reward: RewardModel
+    done: bool
+    terminated: bool
+    truncated: bool
+    info: Dict[str, Any]
+    state: StateResponse
+class OptimizeRequest(BaseModel):
+    code: str
+    task_id: Optional[str] = None
+    max_steps: int = Field(default=5, ge=1, le=5)
+    use_rl: bool = True
+    use_llm: bool = False
+    fallback_to_llm: bool = True
+    rl_model_path: Optional[str] = None
+    api_base_url: Optional[str] = None
+    model_name: Optional[str] = None
+    api_token: Optional[str] = None
+class OptimizationStep(BaseModel):
+    step: int
+    action: int
+    action_name: str
+    reason: str
+    source: str
+    reward: float
+    normalized_reward: float
+    changed: bool
+    complexity: float
+class OptimizeResponse(BaseModel):
+    original_code: str
+    optimized_code: str
+    diff: str
+    steps: List[OptimizationStep]
+    cumulative_reward: float
+    task_id: Optional[str]
+    task_score: Optional[float]
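
The vector round trip on `ObservationModel` can be illustrated without pydantic. The sketch below uses a stdlib dataclass as a stand-in (the class name `Observation` is hypothetical, and pydantic's validation is not reproduced); it shows that the fourth slot is coerced to a boolean on the way in and back to `0.0` or `1.0` on the way out.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Observation:
    """Dataclass stand-in for ObservationModel (no pydantic validation)."""

    code_length: float
    complexity_score: float
    runtime_s: float
    error_flag: bool

    @classmethod
    def from_vector(cls, values: Sequence[float]) -> "Observation":
        vector = list(values)
        if len(vector) != 4:
            raise ValueError(f"observation vector must have length 4, got {len(vector)}")
        # Any non-zero value in the last slot counts as an error.
        return cls(float(vector[0]), float(vector[1]), float(vector[2]), bool(vector[3]))

    def to_vector(self) -> List[float]:
        return [self.code_length, self.complexity_score, self.runtime_s, float(self.error_flag)]


obs = Observation.from_vector([120.0, 7.0, 0.25, 1.0])
print(obs.error_flag, obs.to_vector())
```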

openenv.yaml ADDED

	@@ -0,0 +1,85 @@

+name: ACRE
+version: "1.0.0"
+description: >
+  Autonomous Code Refactoring Environment - an RL environment where an
+  agent improves Python code quality using AST-level transformations.
+author: "Nikhil Pratap Singh, Pranav Mangal, Ananya Gupta"
+entrypoint: "openenv_interface:OpenEnvRefactorEnv"
+tags:
+  - openenv
+tasks:
+  - id: rename_variables
+    name: "Rename Variables (Easy)"
+    description: "Rename generic variable names (x, tmp) to descriptive ones"
+    difficulty: easy
+    reward_range: [0.0, 1.0]
+    max_steps: 5
+  - id: remove_dead_code
+    name: "Remove Dead Code (Medium)"
+    description: "Remove unreachable statements, if-False blocks, and unused assignments"
+    difficulty: medium
+    reward_range: [0.0, 1.0]
+    max_steps: 5
+  - id: full_refactor
+    name: "Full Refactor (Hard)"
+    description: "Apply all transformations - rename, dead code removal, loop simplification, condition optimization, and function inlining"
+    difficulty: hard
+    reward_range: [0.0, 1.0]
+    max_steps: 5
+observation_space:
+  type: Box
+  shape: [4]
+  dtype: float32
+  low: [0.0, 0.0, 0.0, 0.0]
+  high: [.inf, .inf, .inf, 1.0]
+  fields:
+    - code_length
+    - complexity_score
+    - runtime_s
+    - error_flag
+action_space:
+  type: Discrete
+  n: 5
+  actions:
+    0: rename_variable
+    1: remove_dead_code
+    2: simplify_loop
+    3: optimize_condition
+    4: inline_function
+api:
+  health: "GET /"
+  reset: "POST /reset"
+  step: "POST /step"
+  state: "GET /state"
+  tasks: "GET /tasks"
+  grade: "POST /tasks/{task_id}/grade"
+reward:
+  raw_range: [-32, 20]
+  normalized_range: [0.0, 1.0]
+  formula: "(raw + 32) / 52"
+  components:
+    success: { max: 10, min: -10 }
+    complexity: { max: 5, min: -5 }
+    performance: { max: 5, min: -2 }
+    error: { max: 0, min: -15 }
+    no_change: { max: 0, min: -2 }
+validation:
+  python_api:
+    reset: "ObservationModel"
+    step: "(ObservationModel, RewardModel, done, info)"
+    state: "StateResponse"
+  http_api:
+    health: "GET /"
+    reset: "POST /reset"
+    step: "POST /step"
+    state: "GET /state"
+    tasks: "GET /tasks"
+    grade: "POST /tasks/{task_id}/grade"
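
As a worked example of the `reward.formula` entry, the sketch below maps a raw step reward from `[-32, 20]` onto `[0, 1]`. The function name `normalize_reward` and the final clamp are assumptions added for illustration; the manifest itself only states the linear formula.

```python
RAW_MIN = -32.0
RAW_MAX = 20.0


def normalize_reward(raw: float) -> float:
    """Linearly rescale a raw step reward to [0, 1]; same as (raw + 32) / 52."""
    scaled = (raw - RAW_MIN) / (RAW_MAX - RAW_MIN)
    # Clamp (an assumption, not in the manifest) so out-of-range rewards
    # cannot violate the declared normalized_range.
    return min(1.0, max(0.0, scaled))


for raw in (-32.0, 0.0, 20.0):
    print(raw, round(normalize_reward(raw), 4))
```

A raw reward of zero lands at 32/52, roughly 0.615, so "no net change" is not the midpoint of the normalized scale.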

openenv_interface.py ADDED

	@@ -0,0 +1,116 @@

+from __future__ import annotations
+from typing import Any, Dict, Optional, Tuple
+try:
+    from openenv.env import Env as OpenEnvBase
+except Exception:  # pragma: no cover
+    class OpenEnvBase:
+        def __init__(self, *args: Any, **kwargs: Any) -> None:
+            return None
+from acre.datasets.code_samples import CodeSample, CodeSampleDataset
+from acre.env.refactor_env import RefactorEnv
+from acre.tasks.task_registry import TaskRegistry
+from models import ActionModel, ObservationModel, RewardModel, StateResponse
+class OpenEnvRefactorEnv(OpenEnvBase):
+    """
+    Canonical OpenEnv interface for ACRE.
+    This wrapper keeps the strict hackathon contract:
+    - reset() -> ObservationModel
+    - step(action) -> (ObservationModel, RewardModel, done, info)
+    - state() -> StateResponse
+    """
+    def __init__(
+        self,
+        *,
+        env: Optional[RefactorEnv] = None,
+        registry: Optional[TaskRegistry] = None,
+    ) -> None:
+        super().__init__(
+            name="ACRE",
+            state_space="ObservationModel",
+            action_space="ActionModel",
+            episode_max_length=RefactorEnv.MAX_STEPS,
+        )
+        self._env = env or RefactorEnv()
+        self._registry = registry or TaskRegistry()
+        self._task_id: Optional[str] = None
+        self._last_reset_info: Dict[str, Any] = {}
+    @property
+    def action_meanings(self) -> Dict[int, str]:
+        return self._env.ACTION_MEANINGS
+    @property
+    def last_reset_info(self) -> Dict[str, Any]:
+        return dict(self._last_reset_info)
+    def _load_episode_source(self, *, task_id: Optional[str], code: Optional[str]) -> None:
+        initial_code = code
+        if initial_code is None and task_id:
+            task = self._registry.get_task(task_id)
+            if task is None:
+                raise ValueError(f"Task '{task_id}' not found")
+            initial_code = task.initial_code
+        if initial_code is None:
+            return None
+        self._env.dataset = CodeSampleDataset(
+            [
+                CodeSample(
+                    id=task_id or "custom",
+                    language="python",
+                    code=initial_code,
+                )
+            ]
+        )
+        return None
+    def reset(
+        self,
+        *,
+        seed: Optional[int] = None,
+        task_id: Optional[str] = None,
+        code: Optional[str] = None,
+    ) -> ObservationModel:
+        self._task_id = task_id
+        self._load_episode_source(task_id=task_id, code=code)
+        observation, info = self._env.reset(seed=seed)
+        self._last_reset_info = dict(info)
+        return ObservationModel.from_vector(observation.tolist())
+    def step(self, action: int | ActionModel) -> Tuple[ObservationModel, RewardModel, bool, Dict[str, Any]]:
+        action_value = action.action if isinstance(action, ActionModel) else int(action)
+        observation, raw_reward, terminated, truncated, info = self._env.step(action_value)
+        reward = RewardModel(
+            raw=float(raw_reward),
+            normalized=float(info.get("normalized_reward", 0.0)),
+            components=dict(info.get("reward_components", {})),
+        )
+        done = bool(terminated or truncated)
+        return ObservationModel.from_vector(observation.tolist()), reward, done, dict(info)
+    def state(self) -> StateResponse:
+        raw_state = self._env.state()
+        observation_vector = list(raw_state.get("observation", [0.0, 0.0, 0.0, 0.0]))
+        observation = ObservationModel.from_vector(observation_vector)
+        return StateResponse(
+            current_code=str(raw_state.get("current_code", "")),
+            episode_steps=int(raw_state.get("episode_steps", 0)),
+            max_steps=int(raw_state.get("max_steps", RefactorEnv.MAX_STEPS)),
+            complexity=float(raw_state.get("complexity", 0.0)),
+            last_runtime=float(raw_state.get("last_runtime", 0.0)),
+            last_error=bool(raw_state.get("last_error", False)),
+            sample_id=raw_state.get("sample_id"),
+            language=raw_state.get("language"),
+            task_id=self._task_id,
+            observation=observation,
+            observation_vector=observation.to_vector(),
+            action_meanings=dict(raw_state.get("action_meanings", {})),
+        )
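
Note on `step()`: the wrapper above folds Gymnasium's separate `terminated` and `truncated` flags into one `done` value. The following is a minimal sketch of that conversion, written so that the two original flags stay available in `info`. The helper name `collapse_step` and the `terminated`/`truncated` info keys are illustrative assumptions and are not part of ACRE.

```python
from typing import Any, Dict, Tuple


def collapse_step(
    result: Tuple[Any, float, bool, bool, Dict[str, Any]],
) -> Tuple[Any, float, bool, Dict[str, Any]]:
    """Turn a Gymnasium-style 5-tuple into a 4-tuple with a single done flag."""
    observation, reward, terminated, truncated, info = result
    merged = dict(info)  # copy so the caller's dict is not mutated
    # Hypothetical keys: keep the original flags visible to API consumers.
    merged["terminated"] = bool(terminated)
    merged["truncated"] = bool(truncated)
    done = bool(terminated or truncated)
    return observation, float(reward), done, merged


if __name__ == "__main__":
    # A step that hit the time limit without reaching a terminal state.
    obs, reward, done, info = collapse_step(
        ([0.0, 0.0, 0.0, 0.0], 1.5, False, True, {"changed": True})
    )
    print(done, info["terminated"], info["truncated"])
```

With something like this in place, an HTTP layer could read the flags back from `info` instead of assuming every finished episode terminated naturally.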

requirements.txt ADDED

	@@ -0,0 +1,11 @@

+fastapi>=0.109.0
+uvicorn[standard]>=0.27.0
+numpy>=1.26
+gymnasium
+stable-baselines3
+radon>=6.0.1
+openai>=1.0.0
+openenv>=0.1.13
+requests>=2.31.0
+pydantic>=2.0.0
+typing_extensions>=4.0.0

server.py ADDED

	@@ -0,0 +1,667 @@

+"""
+ACRE OpenEnv HTTP server.
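+Endpoints (all required by OpenEnv spec):
+  GET  /          - health check (must return HTTP 200)
+  POST /reset     - reset environment, returns observation + info
+  POST /step      - take one step, returns obs/reward/done/info
+  GET  /state     - full current state snapshot
+  GET  /tasks     - list all tasks with initial code
+  POST /tasks/{task_id}/grade  - grade code for a specific task
+Additional endpoints defined in this module:
+  GET  /health    - compatibility health check
+  GET  /demo      - browser UI for before/after comparison
+  POST /optimize  - run a full episode and return code, diff, and step logs
+"""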
+from __future__ import annotations
+import difflib
+import os
+import re
+import json
+from typing import Optional
+import uvicorn
+import numpy as np
+from fastapi import FastAPI, HTTPException
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import HTMLResponse
+from openai import OpenAI
+try:
+    from stable_baselines3 import PPO
+except Exception:
+    PPO = None  # type: ignore[assignment]
+from acre.tasks.task_registry import TaskRegistry
+from models import (
+    ActionModel,
+    CompatibilityHealthResponse,
+    GradeRequest,
+    GradeResponse,
+    HealthResponse,
+    OptimizationStep,
+    OptimizeRequest,
+    OptimizeResponse,
+    ResetRequest,
+    ResetResponse,
+    StateResponse,
+    StepRequest,
+    StepResponse,
+    TaskInfo,
+    TasksResponse,
+)
+from openenv_interface import OpenEnvRefactorEnv
+DEFAULT_API_BASE_URL = os.getenv("API_BASE_URL", "https://api.openai.com/v1")
+DEFAULT_MODEL_NAME = os.getenv("MODEL_NAME", "gpt-4o-mini")
+DEFAULT_RL_MODEL_PATH = os.getenv("RL_MODEL_PATH", "acre_agent.zip")
+# ---------------------------------------------------------------------------
+# App setup
+# ---------------------------------------------------------------------------
+app = FastAPI(
+    title="ACRE — Autonomous Code Refactoring Environment",
+    description="OpenEnv-compatible RL environment for Python code refactoring.",
+    version="1.0.0",
+)
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
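+# Global singletons. All clients share one environment instance, so concurrent
+# sessions (including /optimize calls) overwrite each other's episode state.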
+registry = TaskRegistry()
+_env: Optional[OpenEnvRefactorEnv] = None
+_rl_model_cache: dict[str, object] = {}
+def get_env() -> OpenEnvRefactorEnv:
+    global _env
+    if _env is None:
+        _env = OpenEnvRefactorEnv(registry=registry)
+    return _env
+def _state_response() -> StateResponse:
+    return get_env().state()
+def _choose_action_heuristic(code: str, task_id: Optional[str]) -> int:
+    has_generic = re.search(r"\b(x|tmp|i)\b", code) is not None
+    has_if_false = re.search(r"\bif\s+False\b", code) is not None
+    has_if_true = re.search(r"\bif\s+True\b", code) is not None
+    has_append_loop = ".append(" in code and "for " in code
+    has_double_not = "not not" in code
+    has_add_call = "add(" in code
+    if task_id == "rename_variables":
+        if has_generic:
+            return 0
+        if has_if_false or "unused" in code:
+            return 1
+        if has_append_loop:
+            return 2
+        if has_if_true or has_double_not:
+            return 3
+        return 4
+    if task_id == "remove_dead_code":
+        if has_if_false or "unused" in code:
+            return 1
+        if has_append_loop:
+            return 2
+        if has_if_true or has_double_not:
+            return 3
+        if has_generic:
+            return 0
+        return 4
+    if has_generic:
+        return 0
+    if has_append_loop:
+        return 2
+    if has_if_false or has_if_true or has_double_not:
+        return 3
+    if has_add_call:
+        return 4
+    return 1
+def _choose_action_llm(
+    *,
+    code: str,
+    task_id: Optional[str],
+    step_index: int,
+    max_steps: int,
+    api_base_url: str,
+    model_name: str,
+    api_token: str,
+) -> tuple[int, str, str]:
+    if not api_token.strip():
+        return _choose_action_heuristic(code, task_id), "empty token -> heuristic", "heuristic"
+    client = OpenAI(base_url=api_base_url, api_key=api_token)
+    messages = [
+        {
+            "role": "system",
+            "content": (
+                "You are a code-refactoring action selector. Return ONLY compact JSON: "
+                '{"action": <0-4>, "reason": "..."}.\n'
+                "Actions: 0=rename_variable,1=remove_dead_code,2=simplify_loop,3=optimize_condition,4=inline_function"
+            ),
+        },
+        {
+            "role": "user",
+            "content": (
+                f"task_id={task_id or 'auto'}\n"
+                f"step={step_index}/{max_steps}\n"
+                "Current code:\n"
+                f"```python\n{code}\n```"
+            ),
+        },
+    ]
+    try:
+        resp = client.chat.completions.create(
+            model=model_name,
+            messages=messages,
+            temperature=0.0,
+            max_tokens=120,
+        )
+        raw = (resp.choices[0].message.content or "").strip()
+        m = re.search(r"\{.*\}", raw, flags=re.DOTALL)
+        blob = m.group(0) if m else raw
+        parsed = json.loads(blob)
+        action = int(parsed.get("action", -1))
+        reason = str(parsed.get("reason", "llm-selected action"))
+        if 0 <= action <= 4:
+            return action, reason, "llm"
+    except Exception as exc:
+        return _choose_action_heuristic(code, task_id), f"llm error -> heuristic: {exc}", "heuristic"
+    return _choose_action_heuristic(code, task_id), "invalid llm output -> heuristic", "heuristic"
+def _choose_action_rl(observation: list[float], model_path: str) -> tuple[Optional[int], str, str]:
+    if PPO is None:
+        return None, "stable-baselines3 unavailable", "rl"
+    if not os.path.exists(model_path):
+        return None, f"rl model not found: {model_path}", "rl"
+    try:
+        model = _rl_model_cache.get(model_path)
+        if model is None:
+            model = PPO.load(model_path)
+            _rl_model_cache[model_path] = model
+        obs = np.asarray(observation, dtype=np.float32)
+        action, _ = model.predict(obs, deterministic=True)
+        action_i = int(action)
+        if 0 <= action_i <= 4:
+            return action_i, "rl policy action", "rl"
+        return None, f"invalid rl action: {action_i}", "rl"
+    except Exception as exc:
+        return None, f"rl failure: {exc}", "rl"
+def _demo_html() -> str:
+    return """<!doctype html>
+<html lang=\"en\">
+<head>
+    <meta charset=\"utf-8\" />
+    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />
+    <title>ACRE Refactor Demo</title>
+    <style>
+        @import url('https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;600;700&display=swap');
+        :root {
+            --bg0: #0b1f2a;
+            --bg1: #14344a;
+            --ink: #eaf7ff;
+            --muted: #a7c8db;
+            --brand: #1ec28b;
+            --warn: #ffcb47;
+            --panel: rgba(8, 24, 36, 0.72);
+            --stroke: rgba(140, 197, 225, 0.35);
+        }
+        * { box-sizing: border-box; }
+        body {
+            margin: 0;
+            color: var(--ink);
+            font-family: 'Space Grotesk', sans-serif;
+            background:
+                radial-gradient(circle at 12% 18%, rgba(30, 194, 139, 0.28), transparent 35%),
+                radial-gradient(circle at 88% 8%, rgba(255, 203, 71, 0.22), transparent 30%),
+                linear-gradient(150deg, var(--bg0), var(--bg1));
+            min-height: 100vh;
+        }
+        .wrap {
+            max-width: 1200px;
+            margin: 0 auto;
+            padding: 28px 20px 40px;
+        }
+        h1 {
+            margin: 0 0 6px;
+            font-size: clamp(1.6rem, 2vw + 1rem, 2.6rem);
+            letter-spacing: 0.2px;
+        }
+        .sub { margin: 0 0 20px; color: var(--muted); }
+        .grid {
+            display: grid;
+            grid-template-columns: 1fr;
+            gap: 16px;
+        }
+        .panel {
+            border: 1px solid var(--stroke);
+            border-radius: 14px;
+            background: var(--panel);
+            backdrop-filter: blur(4px);
+            padding: 14px;
+        }
+        .controls {
+            display: grid;
+            grid-template-columns: 1fr 1fr;
+            gap: 8px;
+            margin-bottom: 10px;
+        }
+        textarea, pre {
+            width: 100%;
+            min-height: 260px;
+            border: 1px solid var(--stroke);
+            border-radius: 10px;
+            padding: 12px;
+            background: rgba(1, 13, 24, 0.82);
+            color: #dcf4ff;
+            font-family: Consolas, 'Courier New', monospace;
+            font-size: 13px;
+            line-height: 1.4;
+            overflow: auto;
+            white-space: pre;
+        }
+        button, select {
+            border: 1px solid var(--stroke);
+            border-radius: 10px;
+            padding: 10px 12px;
+            background: rgba(11, 36, 52, 0.9);
+            color: var(--ink);
+            font-weight: 600;
+        }
+        button.primary {
+            background: linear-gradient(120deg, #19a7ff, #1ec28b);
+            color: #032235;
+            border: none;
+        }
+        .cols {
+            display: grid;
+            grid-template-columns: 1fr;
+            gap: 14px;
+        }
+        .meta {
+            color: var(--muted);
+            font-size: 0.92rem;
+            margin-top: 8px;
+        }
+        .badge {
+            color: #082b22;
+            background: var(--brand);
+            border-radius: 999px;
+            padding: 2px 9px;
+            font-size: 12px;
+            font-weight: 700;
+        }
+        .warn {
+            color: #2a1c00;
+            background: var(--warn);
+        }
+        @media (min-width: 900px) {
+            .cols { grid-template-columns: 1fr 1fr; }
+        }
+    </style>
+</head>
+<body>
+    <div class=\"wrap\">
+        <h1>ACRE Live Refactor Arena</h1>
+        <p class=\"sub\">Paste old code, run the agent, and compare before and after with a full diff and step-by-step rewards.</p>
+        <div class=\"panel\">
+            <div class=\"controls\">
+                <button onclick=\"loadExample(1)\">Load Example 1</button>
+                <button onclick=\"loadExample(2)\">Load Example 2</button>
+                <select id=\"task\">
+                    <option value=\"\">Auto strategy</option>
+                    <option value=\"rename_variables\">rename_variables</option>
+                    <option value=\"remove_dead_code\">remove_dead_code</option>
+                    <option value=\"full_refactor\">full_refactor</option>
+                </select>
+                <button class=\"primary\" onclick=\"runOptimize()\">Run Optimization</button>
+            </div>
+            <div class=\"controls\" style=\"margin-bottom: 10px;\">
+                <select id=\"mode\">
+                    <option value=\"rl_then_llm\">RL First -> LLM Fallback</option>
+                    <option value=\"heuristic\">Heuristic Agent (no API key)</option>
+                    <option value=\"llm\">LLM Agent (OpenAI-compatible API)</option>
+                </select>
+                <input id=\"rlModelPath\" placeholder=\"RL model path\" value=\"acre_agent.zip\" style=\"border:1px solid var(--stroke);border-radius:10px;padding:10px 12px;background:rgba(1,13,24,0.82);color:#dcf4ff;\" />
+                <input id=\"baseUrl\" placeholder=\"API base URL (optional)\" value=\"https://api.openai.com/v1\" style=\"border:1px solid var(--stroke);border-radius:10px;padding:10px 12px;background:rgba(1,13,24,0.82);color:#dcf4ff;\" />
+                <input id=\"modelName\" placeholder=\"Model name (optional)\" value=\"gpt-4o-mini\" style=\"border:1px solid var(--stroke);border-radius:10px;padding:10px 12px;background:rgba(1,13,24,0.82);color:#dcf4ff;\" />
+                <input id=\"apiToken\" type=\"password\" placeholder=\"Paste API token here for LLM mode\" style=\"border:1px solid var(--stroke);border-radius:10px;padding:10px 12px;background:rgba(1,13,24,0.82);color:#dcf4ff;\" />
+            </div>
+            <div class=\"controls\" style=\"margin-bottom: 10px;\">
+                <label style=\"display:flex;align-items:center;gap:8px;padding:8px 10px;border:1px solid var(--stroke);border-radius:10px;\">
+                    <input id=\"autoSuggest\" type=\"checkbox\" />
+                    Auto suggest after typing pause
+                </label>
+            </div>
+            <textarea id=\"input\" spellcheck=\"false\" placeholder=\"Paste your Python code here...\"></textarea>
+            <p class=\"meta\" id=\"status\">Status: ready</p>
+        </div>
+        <div class=\"cols\" style=\"margin-top: 14px\">
+            <div class=\"panel\">
+                <h3>Original Code</h3>
+                <pre id=\"original\"></pre>
+            </div>
+            <div class=\"panel\">
+                <h3>Optimized Code</h3>
+                <pre id=\"optimized\"></pre>
+            </div>
+        </div>
+        <div class=\"panel\" style=\"margin-top: 14px\">
+            <h3>Diff</h3>
+            <pre id=\"diff\"></pre>
+        </div>
+        <div class=\"panel\" style=\"margin-top: 14px\">
+            <h3>Step Logs</h3>
+            <pre id=\"steps\"></pre>
+        </div>
+    </div>
+    <script>
+        const EX1 = `def compute(x, y, tmp):\n    tmp = x + y\n    x = tmp * 2\n    result = x\n    return result\n`;
+        const EX2 = `def add(p, q):\n    return p + q\n\ndef compute(x, data, tmp):\n    result = []\n    for item in data:\n        result.append(item * 2)\n    if False:\n        y = 999\n    if True:\n        val = add(x, tmp)\n    unused = 0\n    flag = not not True\n    return val\n    print(\"dead\")\n`;
+        let autoTimer = null;
+        function loadExample(i) {
+            document.getElementById('input').value = i === 1 ? EX1 : EX2;
+            document.getElementById('status').textContent = `Status: loaded example ${i}`;
+        }
+        async function runOptimize() {
+            const code = document.getElementById('input').value;
+            const task = document.getElementById('task').value || null;
+            const mode = document.getElementById('mode').value;
+            const useRl = mode === 'rl_then_llm';
+            const useLlm = mode === 'llm' || mode === 'rl_then_llm';
+            const fallbackToLlm = mode === 'rl_then_llm';
+            const rlModelPath = document.getElementById('rlModelPath').value || null;
+            const apiToken = document.getElementById('apiToken').value || null;
+            const apiBaseUrl = document.getElementById('baseUrl').value || null;
+            const modelName = document.getElementById('modelName').value || null;
+            if (!code.trim()) {
+                document.getElementById('status').innerHTML = 'Status: <span class=\"badge warn\">please paste code first</span>';
+                return;
+            }
+            if (mode === 'llm' && (!apiToken || !apiToken.trim())) {
+                document.getElementById('status').innerHTML = 'Status: <span class=\"badge warn\">paste API token for LLM mode</span>';
+                return;
+            }
+            document.getElementById('status').textContent = 'Status: running optimization...';
+            try {
+                const res = await fetch('/optimize', {
+                    method: 'POST',
+                    headers: {'Content-Type': 'application/json'},
+                    body: JSON.stringify({
+                        code,
+                        task_id: task,
+                        max_steps: 5,
+                        use_rl: useRl,
+                        use_llm: useLlm,
+                        fallback_to_llm: fallbackToLlm,
+                        rl_model_path: rlModelPath,
+                        api_base_url: apiBaseUrl,
+                        model_name: modelName,
+                        api_token: apiToken,
+                    })
+                });
+                const data = await res.json();
+                if (!res.ok) {
+                    throw new Error(data.detail || 'request failed');
+                }
+                document.getElementById('original').textContent = data.original_code;
+                document.getElementById('optimized').textContent = data.optimized_code;
+                document.getElementById('diff').textContent = data.diff || '(no diff)';
+                document.getElementById('steps').textContent = JSON.stringify(data.steps, null, 2);
+                const scoreText = data.task_score === null ? 'n/a' : data.task_score;
+                document.getElementById('status').innerHTML = `Status: <span class=\"badge\">done</span> cumulative_reward=${data.cumulative_reward.toFixed(2)} task_score=${scoreText}`;
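+            } catch (err) {
+                // Append the message as a text node so server-provided text is never parsed as HTML.
+                const statusEl = document.getElementById('status');
+                statusEl.innerHTML = 'Status: <span class=\"badge warn\">error</span> ';
+                statusEl.append(err.message);
+            }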
+        }
+        loadExample(1);
+        document.getElementById('input').addEventListener('input', () => {
+            if (!document.getElementById('autoSuggest').checked) {
+                return;
+            }
+            if (autoTimer) {
+                clearTimeout(autoTimer);
+            }
+            autoTimer = setTimeout(() => {
+                runOptimize();
+            }, 1200);
+        });
+    </script>
+</body>
+</html>"""
+# ---------------------------------------------------------------------------
+# Routes
+# ---------------------------------------------------------------------------
+@app.get("/", response_model=HealthResponse)
+def health() -> HealthResponse:
+    """Health check — OpenEnv pings this URL to verify the Space is live."""
+    return HealthResponse(status="ok", env="ACRE", version="1.0.0")
+@app.get("/health", response_model=CompatibilityHealthResponse)
+def health_compat() -> CompatibilityHealthResponse:
+    """Compatibility health route used by some OpenEnv reference environments."""
+    return CompatibilityHealthResponse(status="healthy", service="acre-env")
+@app.get("/demo", response_class=HTMLResponse)
+def demo_ui() -> HTMLResponse:
+    """Simple UI to compare original and optimized code side-by-side."""
+    return HTMLResponse(content=_demo_html())
+@app.post("/reset", response_model=ResetResponse)
+def reset(req: ResetRequest = ResetRequest()) -> ResetResponse:
+    """Reset the environment. Optionally load a task's initial code."""
+    env = get_env()
+    try:
+        obs = env.reset(seed=req.seed, task_id=req.task_id, code=req.code)
+    except ValueError as exc:
+        raise HTTPException(status_code=404, detail=str(exc)) from exc
+    return ResetResponse(
+        observation=obs,
+        observation_vector=obs.to_vector(),
+        info=env.last_reset_info,
+        task_id=req.task_id,
+        state=_state_response(),
+    )
+@app.post("/step", response_model=StepResponse)
+def step(req: StepRequest) -> StepResponse:
+    """Take one refactoring step."""
+    env = get_env()
+    if not (0 <= req.action <= 4):
+        raise HTTPException(status_code=400, detail="action must be 0–4")
+    obs, reward, done, info = env.step(req.action)
+    action_name = str(info.get("action_name", env.action_meanings.get(req.action, "unknown")))
+    return StepResponse(
+        action=ActionModel(action=req.action, action_name=action_name),
+        observation=obs,
+        observation_vector=obs.to_vector(),
+        reward=reward,
+        done=done,
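+        # The wrapper reports only a combined `done`, so a time-limit truncation
+        # cannot be told apart from a natural termination here.
+        terminated=done,
+        truncated=False,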
+        info=info,
+        state=_state_response(),
+    )
+@app.get("/state", response_model=StateResponse)
+def state() -> StateResponse:
+    """Return full current environment state (OpenEnv spec requirement)."""
+    return _state_response()
+@app.get("/tasks", response_model=TasksResponse)
+def list_tasks() -> TasksResponse:
+    """Enumerate all tasks (easy → medium → hard)."""
+    return TasksResponse(tasks=[TaskInfo.model_validate(t) for t in registry.list_tasks()])
+@app.post("/tasks/{task_id}/grade", response_model=GradeResponse)
+def grade(task_id: str, req: GradeRequest) -> GradeResponse:
+    """Grade submitted code against a task's grader (returns score 0.0–1.0)."""
+    task = registry.get_task(task_id)
+    if task is None:
+        raise HTTPException(status_code=404, detail=f"Task '{task_id}' not found")
+    score = task.grade(req.code)
+    return GradeResponse(
+        task_id=task_id,
+        score=round(score, 4),
+        passed=score >= 0.8,
+    )
+@app.post("/optimize", response_model=OptimizeResponse)
+def optimize(req: OptimizeRequest) -> OptimizeResponse:
+    """Run a full optimization episode and return code comparison artifacts."""
+    code = req.code.strip("\n")
+    if not code.strip():
+        raise HTTPException(status_code=400, detail="code must be non-empty")
+    env = get_env()
+    try:
+        env.reset(task_id=req.task_id, code=code)
+    except ValueError as exc:
+        raise HTTPException(status_code=404, detail=str(exc)) from exc
+    steps: list[OptimizationStep] = []
+    cumulative_reward = 0.0
+    for step_idx in range(1, req.max_steps + 1):
+        state_now = env.state()
+        current_code = state_now.current_code
+        obs_list = [float(x) for x in state_now.observation_vector]
+        action: int
+        reason: str
+        source: str
+        if req.use_rl:
+            rl_action, rl_reason, rl_source = _choose_action_rl(
+                observation=obs_list,
+                model_path=req.rl_model_path or DEFAULT_RL_MODEL_PATH,
+            )
+            if rl_action is not None:
+                action, reason, source = rl_action, rl_reason, rl_source
+            elif req.fallback_to_llm and req.use_llm:
+                action, reason, source = _choose_action_llm(
+                    code=current_code,
+                    task_id=req.task_id,
+                    step_index=step_idx,
+                    max_steps=req.max_steps,
+                    api_base_url=req.api_base_url or DEFAULT_API_BASE_URL,
+                    model_name=req.model_name or DEFAULT_MODEL_NAME,
+                    api_token=req.api_token or "",
+                )
+                reason = f"{rl_reason}; {reason}"
+            else:
+                action = _choose_action_heuristic(current_code, req.task_id)
+                reason = f"{rl_reason}; heuristic fallback"
+                source = "heuristic"
+        elif req.use_llm:
+            action, reason, source = _choose_action_llm(
+                code=current_code,
+                task_id=req.task_id,
+                step_index=step_idx,
+                max_steps=req.max_steps,
+                api_base_url=req.api_base_url or DEFAULT_API_BASE_URL,
+                model_name=req.model_name or DEFAULT_MODEL_NAME,
+                api_token=req.api_token or "",
+            )
+        else:
+            action = _choose_action_heuristic(current_code, req.task_id)
+            reason = "heuristic policy"
+            source = "heuristic"
+        _, reward, done, info = env.step(action)
+        state_now = env.state()
+        cumulative_reward += float(reward.raw)
+        steps.append(
+            OptimizationStep(
+                step=step_idx,
+                action=action,
+                action_name=info.get("action_name", "unknown"),
+                reason=reason,
+                source=source,
+                reward=float(reward.raw),
+                normalized_reward=float(reward.normalized),
+                changed=bool(info.get("changed", False)),
+                complexity=float(state_now.complexity),
+            )
+        )
+        if done:
+            break
+    final_code = str(env.state().current_code)
+    diff_lines = difflib.unified_diff(
+        code.splitlines(),
+        final_code.splitlines(),
+        fromfile="original.py",
+        tofile="optimized.py",
+        lineterm="",
+    )
+    diff_text = "\n".join(diff_lines)
+    task_score: Optional[float] = None
+    if req.task_id:
+        task = registry.get_task(req.task_id)
+        if task is None:
+            raise HTTPException(status_code=404, detail=f"Task '{req.task_id}' not found")
+        task_score = round(task.grade(final_code), 4)
+    return OptimizeResponse(
+        original_code=code,
+        optimized_code=final_code,
+        diff=diff_text,
+        steps=steps,
+        cumulative_reward=round(cumulative_reward, 4),
+        task_id=req.task_id,
+        task_score=task_score,
+    )
+# ---------------------------------------------------------------------------
+# Entry point
+# ---------------------------------------------------------------------------
+if __name__ == "__main__":
+    port = int(os.getenv("PORT", "7860"))
+    uvicorn.run(app, host="0.0.0.0", port=port)
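
Note on `_choose_action_llm`: the function pulls a JSON object out of the model reply and falls back to the heuristic when the reply is unusable. The following is a minimal, self-contained sketch of that parsing step. The helper name `parse_action` is hypothetical, and the sketch is deliberately stricter than the code above: it rejects booleans, floats, and numeric strings instead of coercing them with `int()`.

```python
import json
import re
from typing import Optional, Tuple

VALID_ACTIONS = range(5)  # 0..4, matching the five refactoring actions


def parse_action(raw: str) -> Optional[Tuple[int, str]]:
    """Extract (action, reason) from a model reply, or None if it is unusable."""
    # Grab the span from the first "{" to the last "}" so surrounding chatter is ignored.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    blob = match.group(0) if match else raw
    try:
        parsed = json.loads(blob)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    action = parsed.get("action")
    # bool is a subclass of int in Python, so reject True/False explicitly.
    if isinstance(action, bool) or not isinstance(action, int):
        return None
    if action not in VALID_ACTIONS:
        return None
    return action, str(parsed.get("reason", "llm-selected action"))


if __name__ == "__main__":
    reply = 'Sure! {"action": 2, "reason": "append loop"} Hope that helps.'
    print(parse_action(reply))
    print(parse_action("no json here"))
```

A caller would treat `None` as the signal to use the heuristic policy, which keeps the fallback decision in one place rather than spread across exception handlers.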

validate.py ADDED

	@@ -0,0 +1,281 @@

+"""
+ACRE pre-submission validator.
+Checks the repository against the submission checklist and, when a server URL is
+available, probes the HTTP API as well.
+Run:
+    python validate.py --url http://localhost:7860
+"""
+from __future__ import annotations
+import argparse
+import ast
+import re
+import sys
+from typing import Any, Tuple
+try:
+    import requests
+except ImportError:
+    print("[ERROR] requests is required. Run: pip install requests")
+    sys.exit(1)
+PASS = "\033[92m[PASS]\033[0m"
+FAIL = "\033[91m[FAIL]\033[0m"
+def check(label: str, ok: bool, detail: str = "") -> bool:
+    status = PASS if ok else FAIL
+    message = f"  {status}  {label}"
+    if detail:
+        message += f" - {detail}"
+    print(message)
+    return ok
+def get(url: str, path: str, timeout: int = 15) -> Tuple[bool, Any]:
+    try:
+        response = requests.get(f"{url}{path}", timeout=timeout)
+        response.raise_for_status()
+        return True, response.json()
+    except Exception as exc:
+        return False, str(exc)
+def post(url: str, path: str, payload: dict, timeout: int = 15) -> Tuple[bool, Any]:
+    try:
+        response = requests.post(f"{url}{path}", json=payload, timeout=timeout)
+        response.raise_for_status()
+        return True, response.json()
+    except Exception as exc:
+        return False, str(exc)
+def read_text(path: str) -> str:
+    with open(path, encoding="utf-8") as handle:
+        return handle.read()
+def run_validation(base_url: str) -> int:
+    failures = 0
+    print("\n" + "=" * 60)
+    print("  ACRE Pre-Submission Validator")
+    print("=" * 60)
+    print(f"  Target: {base_url}\n")
+    print("1. Static repository checks")
+    try:
+        interface_src = read_text("openenv_interface.py")
+        tree = ast.parse(interface_src)
+        classes = {node.name: node for node in tree.body if isinstance(node, ast.ClassDef)}
+        env_cls = classes.get("OpenEnvRefactorEnv")
+        failures += 0 if check("openenv_interface.py exists", True) else 1
+        failures += 0 if check("OpenEnvRefactorEnv is defined", env_cls is not None) else 1
+        if env_cls is not None:
+            methods = {node.name for node in env_cls.body if isinstance(node, ast.FunctionDef)}
+            for method_name in ["reset", "step", "state"]:
+                failures += 0 if check(
+                    f"OpenEnvRefactorEnv implements {method_name}()",
+                    method_name in methods,
+                ) else 1
+    except FileNotFoundError:
+        failures += 1
+        check("openenv_interface.py exists", False, "file not found")
+    try:
+        models_src = read_text("models.py")
+        for name in ["ObservationModel", "ActionModel", "RewardModel"]:
+            failures += 0 if check(
+                f"{name} is defined in models.py",
+                f"class {name}" in models_src,
+            ) else 1
+    except FileNotFoundError:
+        failures += 1
+        check("models.py exists", False, "file not found")
+    print("\n2. Health check (GET /)")
+    ok, data = get(base_url, "/")
+    failures += 0 if check("GET / returns HTTP 200", ok) else 1
+    if ok:
+        failures += 0 if check(
+            "Response has status field",
+            isinstance(data, dict) and "status" in data,
+            str(data),
+        ) else 1
+    print("\n3. Tasks (GET /tasks)")
+    ok, data = get(base_url, "/tasks")
+    failures += 0 if check("GET /tasks returns 200", ok) else 1
+    if ok:
+        tasks = data.get("tasks", []) if isinstance(data, dict) else []
+        failures += 0 if check("At least 3 tasks defined", len(tasks) >= 3, f"found {len(tasks)}") else 1
+        difficulties = [t.get("difficulty", "") for t in tasks]
+        for diff in ["easy", "medium", "hard"]:
+            failures += 0 if check(f"Task with difficulty '{diff}' exists", diff in difficulties) else 1
+        for task in tasks:
+            failures += 0 if check(
+                f"Task '{task.get('id')}' has initial_code",
+                bool(task.get("initial_code")),
+            ) else 1
+    print("\n4. Reset (POST /reset)")
+    ok, data = post(base_url, "/reset", {})
+    failures += 0 if check("POST /reset returns 200", ok) else 1
+    if ok:
+        observation = data.get("observation", {})
+        failures += 0 if check("Response has observation field", isinstance(observation, dict)) else 1
+        failures += 0 if check(
+            "Observation is typed with 4 fields",
+            {"code_length", "complexity_score", "runtime_s", "error_flag"}.issubset(observation),
+            str(observation),
+        ) else 1
+    ok, _ = post(base_url, "/reset", {"task_id": "rename_variables"})
+    failures += 0 if check("POST /reset with task_id works", ok) else 1
+    print("\n5. State (GET /state)")
+    ok, data = get(base_url, "/state")
+    failures += 0 if check("GET /state returns 200", ok) else 1
+    if ok:
+        required_keys = [
+            "current_code",
+            "episode_steps",
+            "max_steps",
+            "complexity",
+            "observation",
+            "observation_vector",
+            "action_meanings",
+        ]
+        for key in required_keys:
+            failures += 0 if check(f"State has '{key}' field", key in data) else 1
+    print("\n6. Step (POST /step)")
+    post(base_url, "/reset", {"task_id": "rename_variables"})
+    for action in range(5):
+        ok, data = post(base_url, "/step", {"action": action})
+        failures += 0 if check(
+            f"Action {action} executes without error",
+            ok and isinstance(data, dict) and "reward" in data and "done" in data,
+        ) else 1
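+        if ok and isinstance(data, dict):
+            reward_payload = data.get("reward", {})
+            # Guard against a scalar reward so the validator reports a failure instead of crashing.
+            typed = isinstance(reward_payload, dict)
+            norm = reward_payload.get("normalized", -1) if typed else -1
+            failures += 0 if check(
+                f"Action {action} returns typed reward payload",
+                typed and {"raw", "normalized", "components"}.issubset(reward_payload),
+                str(reward_payload),
+            ) else 1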
+            failures += 0 if check(
+                f"Action {action} normalized_reward in [0,1]",
+                isinstance(norm, (int, float)) and 0.0 <= float(norm) <= 1.0,
+                f"got {norm}",
+            ) else 1
+            if data.get("done"):
+                break
+    ok, data = post(base_url, "/step", {"action": 99})
+    failures += 0 if check(
+        "Invalid action returns error (not crash)",
+        not ok or "detail" in str(data),
+        "(expected 4xx)",
+    ) else 1
+    print("\n7. Task graders (POST /tasks/{id}/grade)")
+    for task_id in ["rename_variables", "remove_dead_code", "full_refactor"]:
+        ok, data = post(base_url, f"/tasks/{task_id}/grade", {"code": "def f(): pass"})
+        failures += 0 if check(f"Grade endpoint for '{task_id}' works", ok) else 1
+        if ok:
+            score = data.get("score", -1)
+            failures += 0 if check(
+                f"Score for '{task_id}' in [0.0, 1.0]",
+                isinstance(score, (int, float)) and 0.0 <= float(score) <= 1.0,
+                f"got {score}",
+            ) else 1
+    print("\n8. openenv.yaml")
+    try:
+        openenv_yaml = read_text("openenv.yaml")
+        failures += 0 if check("openenv.yaml exists", True) else 1
+        for field in ["tasks:", "action_space:", "observation_space:", "reward:", "entrypoint:", "validation:"]:
+            failures += 0 if check(f"openenv.yaml has '{field}' section", field in openenv_yaml) else 1
+    except FileNotFoundError:
+        failures += 1
+        check("openenv.yaml exists", False, "file not found")
+    print("\n9. inference.py")
+    try:
+        inference_src = read_text("inference.py")
+        failures += 0 if check("inference.py exists", True) else 1
+        for marker in ['"event": "START"', '"event": "STEP"', '"event": "END"']:
+            failures += 0 if check(f"inference.py emits {marker}", marker in inference_src) else 1
+        failures += 0 if check(
+            "Uses OpenAI client",
+            "from openai import OpenAI" in inference_src,
+        ) else 1
+        for var in ["API_BASE_URL", "MODEL_NAME", "HF_TOKEN", "ENV_URL", "LOCAL_IMAGE_NAME"]:
+            failures += 0 if check(f"inference.py reads {var} from env", var in inference_src) else 1
+        failures += 0 if check(
+            "API_BASE_URL has a default",
+            'os.getenv("API_BASE_URL", "https://api.openai.com/v1")' in inference_src,
+        ) else 1
+        failures += 0 if check(
+            "MODEL_NAME has a default",
+            'os.getenv("MODEL_NAME", "gpt-4o-mini")' in inference_src,
+        ) else 1
+        failures += 0 if check(
+            "HF_TOKEN has no default",
+            re.search(r'HF_TOKEN\s*:\s*.*os\.getenv\("HF_TOKEN"\)', inference_src) is not None,
+        ) else 1
+    except FileNotFoundError:
+        failures += 1
+        check("inference.py exists", False, "file not found")
+    print("\n10. Dockerfile")
+    try:
+        dockerfile = read_text("Dockerfile")
+        failures += 0 if check("Dockerfile exists", True) else 1
+        failures += 0 if check("Exposes port 7860", "7860" in dockerfile) else 1
+        failures += 0 if check("Has CMD/ENTRYPOINT", "CMD" in dockerfile or "ENTRYPOINT" in dockerfile) else 1
+        failures += 0 if check("Does not set a default HF_TOKEN", "ENV HF_TOKEN" not in dockerfile) else 1
+    except FileNotFoundError:
+        failures += 1
+        check("Dockerfile exists", False, "file not found")
+    print("\n11. README / Hugging Face metadata")
+    try:
+        readme = read_text("README.md")
+        failures += 0 if check("README has docker SDK front matter", "sdk: docker" in readme) else 1
+        failures += 0 if check("README includes openenv tag", "openenv" in readme) else 1
+        for section in [
+            "Environment Overview and Motivation",
+            "Definitions of Action and Observation Spaces",
+            "Task Descriptions with Expected Difficulty Levels",
+            "Setup and Usage Instructions",
+            "Baseline Performance Scores",
+        ]:
+            failures += 0 if check(f"README includes '{section}'", section in readme) else 1
+    except FileNotFoundError:
+        failures += 1
+        check("README.md exists", False, "file not found")
+    print("\n" + "=" * 60)
+    if failures == 0:
+        print(f"  {PASS}  All checks passed. Repository is submission-ready.")
+    else:
+        print(f"  {FAIL}  {failures} check(s) failed. Fix before submitting.")
+    print("=" * 60 + "\n")
+    return failures
+def main() -> None:
+    parser = argparse.ArgumentParser(description="ACRE pre-submission validator")
+    parser.add_argument(
+        "--url",
+        default="http://localhost:7860",
+        help="Base URL of the running ACRE server",
+    )
+    args = parser.parse_args()
+    # Collapse the failure count to 0/1: exit statuses are truncated to 0-255 on POSIX.
+    sys.exit(1 if run_validation(args.url) else 0)
+if __name__ == "__main__":
+    main()
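
The validator above repeats `failures += 0 if check(...) else 1` for every check, which makes it easy to leave a check out of the count by accident. A minimal sketch of one way to centralize the tally is shown below. `CheckTally` and `reward_payload_ok` are hypothetical names introduced for illustration and are not part of the repository; the reward rule mirrors the step 6 checks (a dict with `raw`, `normalized`, and `components`, and `normalized` in [0, 1]).

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CheckTally:
    """Hypothetical helper: records every check and counts the failures."""

    failures: int = 0
    results: list[tuple[str, bool]] = field(default_factory=list)

    def check(self, label: str, condition: Any, detail: str = "") -> bool:
        passed = bool(condition)
        self.results.append((label, passed))
        if not passed:
            self.failures += 1
        # Only show the detail string when the check fails.
        suffix = f" ({detail})" if detail and not passed else ""
        print(f"  [{'PASS' if passed else 'FAIL'}] {label}{suffix}")
        return passed

    def exit_code(self) -> int:
        # Collapse the count to 0/1 so it stays inside the 0-255 exit-status range.
        return 0 if self.failures == 0 else 1


def reward_payload_ok(payload: Any) -> bool:
    """Mirror the step 6 rule: typed payload with a normalized reward in [0, 1]."""
    if not isinstance(payload, dict):
        return False
    if not {"raw", "normalized", "components"}.issubset(payload):
        return False
    norm = payload["normalized"]
    return isinstance(norm, (int, float)) and 0.0 <= float(norm) <= 1.0


# Usage with made-up payloads (not real server responses).
tally = CheckTally()
tally.check(
    "well-formed payload accepted",
    reward_payload_ok({"raw": 1.5, "normalized": 0.75, "components": {}}),
)
tally.check("scalar reward rejected", not reward_payload_ok(0.5))
tally.check(
    "normalized reward within range",
    reward_payload_ok({"raw": 9.0, "normalized": 1.5, "components": {}}),
    "normalized > 1",
)
```

Because every call goes through `CheckTally.check`, a check cannot be printed without also being counted, and `exit_code()` gives the process a stable 0/1 status regardless of how many checks fail.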