Space: PRANAV05092003/autonomous-code-refactoring-env

PRANAV05092003 committed on Apr 8

Commit 8d66fec (1 parent: 900e1f4)

Updated structure and fixed module import issue

Files changed (28)

ACRE_FINAL/.gitignore +0 -26
ACRE_FINAL/Dockerfile +0 -23
ACRE_FINAL/README.md +0 -174
ACRE_FINAL/acre/__init__.py +0 -14
ACRE_FINAL/acre/actions/__init__.py +0 -6
ACRE_FINAL/acre/actions/transformations.py +0 -518
ACRE_FINAL/acre/datasets/__init__.py +0 -6
ACRE_FINAL/acre/datasets/code_samples.py +0 -34
ACRE_FINAL/acre/demo.py +0 -185
ACRE_FINAL/acre/main.py +0 -39
ACRE_FINAL/acre/tasks/__init__.py +0 -3
ACRE_FINAL/acre/tasks/task_registry.py +0 -222
ACRE_FINAL/acre/training/__init__.py +0 -6
ACRE_FINAL/acre/training/train_agent.py +0 -75
ACRE_FINAL/acre/utils/__init__.py +0 -6
ACRE_FINAL/acre/utils/metrics.py +0 -33
ACRE_FINAL/inference.py +0 -278
ACRE_FINAL/models.py +0 -156
ACRE_FINAL/openenv.yaml +0 -85
ACRE_FINAL/openenv_interface.py +0 -116
ACRE_FINAL/requirements.txt +0 -11
ACRE_FINAL/server.py +0 -667
ACRE_FINAL/validate.py +0 -281
README.md +4 -4
acre/tasks/task_registry.py +212 -31
inference.py +26 -63
openenv_interface.py +17 -1
validate.py +10 -2

ACRE_FINAL/.gitignore DELETED

@@ -1,26 +0,0 @@
-__pycache__/
-*.pyc
-*.pyo
-*.pyd
-.Python
-*.egg-info/
-dist/
-build/
-.env
-.venv
-venv/
-*.zip
-acre_agent.zip
-*.log
-.DS_Store
-.deps/
-libs/
-numpy.libs/
-*.dll
-*.so
-*.dylib
-env/
-ENV/
-.cache/
-.huggingface/
-Thumbs.db

ACRE_FINAL/Dockerfile DELETED

@@ -1,23 +0,0 @@
-FROM python:3.11-slim
-WORKDIR /app
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    build-essential \
-    && rm -rf /var/lib/apt/lists/*
-COPY requirements.txt .
-RUN pip install --no-cache-dir -r requirements.txt
-COPY . .
-ENV API_BASE_URL=https://api.openai.com/v1
-ENV MODEL_NAME=gpt-4o-mini
-ENV PORT=7860
-EXPOSE 7860
-HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
-  CMD python -c "import requests; requests.get('http://localhost:7860/').raise_for_status()"
-CMD ["python", "server.py"]

ACRE_FINAL/README.md DELETED

@@ -1,174 +0,0 @@
----
-title: ACRE - Autonomous Code Refactoring Environment
-colorFrom: blue
-colorTo: green
-sdk: docker
-app_port: 7860
-pinned: false
-license: mit
-tags:
-  - openenv
----
-# ACRE - Autonomous Code Refactoring Environment
-ACRE is an OpenEnv-compatible environment for autonomous Python code refactoring. An agent receives real code-cleanup tasks and must improve the code through AST-based transformations while receiving dense reward feedback for correctness, simplification, and performance.
-## Environment Overview and Motivation
-This project simulates a realistic developer workflow: cleaning up messy Python code, removing dead logic, simplifying loops, and inlining trivial helpers. The canonical OpenEnv wrapper lives in `openenv_interface.py`, while the original Gymnasium-compatible environment remains available for RL training and demos.
-## Definitions of Action and Observation Spaces
-### Action Space - Discrete(5)
-| Action | Name | Description |
-|---|---|---|
-| 0 | rename_variable | Rename generic variables like `x`, `tmp`, and `i` |
-| 1 | remove_dead_code | Remove unreachable statements, `if False` branches, and unused assignments |
-| 2 | simplify_loop | Convert append-loops into list comprehensions |
-| 3 | optimize_condition | Simplify `not not x`, `if True`, `if False`, and boolean comparisons |
-| 4 | inline_function | Inline simple single-return module-level functions |
-### Observation Space - Box(4,)
-The environment tracks:
-- `code_length`
-- `complexity_score`
-- `runtime_s`
-- `error_flag`
-### Typed OpenEnv Models
-The submission-facing interface uses Pydantic models in `models.py`:
-- `ObservationModel`
-- `ActionModel`
-- `RewardModel`
-- `StateResponse`
-The canonical interface is:
-```python
-observation = env.reset(...)
-observation, reward, done, info = env.step(action)
-state = env.state()
-```
-## Task Descriptions with Expected Difficulty Levels
-| Task ID | Difficulty | Objective |
-|---|---|---|
-| `rename_variables` | Easy | Remove generic variable names from the snippet |
-| `remove_dead_code` | Medium | Eliminate dead branches, unreachable code, and unused assignments |
-| `full_refactor` | Hard | Combine renaming, dead-code removal, loop simplification, condition optimization, and inlining |
-Each task includes a deterministic AST-based grader returning a score in `[0.0, 1.0]`.
-## Reward Design
-Rewards are shaped throughout the trajectory instead of only at the end.
-- Success reward for syntactically valid, executable output
-- Complexity reward when control-flow complexity decreases
-- Performance reward when runtime improves
-- Error penalty for invalid or failing code
-- No-change penalty to discourage loops and unproductive actions
-Raw reward range is `[-32, 20]`, normalized to `[0.0, 1.0]` with `(raw + 32) / 52`.
-## HTTP API
-| Method | Path | Purpose |
-|---|---|---|
-| GET | `/` | Health check |
-| GET | `/health` | Compatibility health check |
-| POST | `/reset` | Reset environment and return typed observation/state |
-| POST | `/step` | Apply one action and return typed observation/reward/done |
-| GET | `/state` | Return the current typed state |
-| GET | `/tasks` | List available tasks |
-| POST | `/tasks/{task_id}/grade` | Grade submitted code |
-## Setup and Usage Instructions
-### Local setup
-```bash
-pip install -r requirements.txt
-python server.py
-```
-### Baseline inference
-Set environment variables before running:
-```bash
-export API_BASE_URL=https://api.openai.com/v1
-export MODEL_NAME=gpt-4o-mini
-export HF_TOKEN=your_key
-export ENV_URL=http://localhost:7860
-python inference.py
-```
-Notes:
-- `API_BASE_URL` and `MODEL_NAME` have defaults in `inference.py`
-- `HF_TOKEN` is optional because the script falls back to a deterministic heuristic baseline
-- `LOCAL_IMAGE_NAME` is read for evaluator compatibility when using a local Docker image launcher
-### Docker / Hugging Face Spaces
-```bash
-docker build -t acre .
-docker run -p 7860:7860 \
-  -e API_BASE_URL=https://api.openai.com/v1 \
-  -e MODEL_NAME=gpt-4o-mini \
-  -e HF_TOKEN=your_key \
-  -e ENV_URL=http://localhost:7860 \
-  acre
-```
-The repository is configured for a Docker-based Hugging Face Space and includes the `openenv` tag in the front matter.
-## Validation
-Run the repository validator:
-```bash
-python validate.py --url http://localhost:7860
-```
-When using the official hackathon tooling, also run:
-```bash
-openenv validate
-```
-## Interactive Demo
-Start the server and open:
-```text
-http://localhost:7860/demo
-```
-The demo shows:
-- Original code
-- Optimized code
-- Unified diff
-- Per-step action and reward logs
-## Baseline Performance Scores
-The deterministic fallback policy used by `inference.py` produces the following reproducible task scores:
-| Task | Score |
-|---|---|
-| `rename_variables` | 1.0 |
-| `remove_dead_code` | 1.0 |
-| `full_refactor` | 1.0 |
-| Average | 1.0 |
-These scores come from the built-in heuristic policy with `HF_TOKEN` unset, which keeps the baseline reproducible across runs.
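
The deleted README states that raw rewards in `[-32, 20]` are normalized to `[0.0, 1.0]` with `(raw + 32) / 52`. A minimal sketch of that mapping follows. The function name `normalize_reward`, the constant names, and the clamping of out-of-range values are assumptions made for illustration; they are not taken from the repository.

```python
# Raw reward bounds as stated in the README's "Reward Design" section.
RAW_MIN = -32.0
RAW_MAX = 20.0


def normalize_reward(raw: float) -> float:
    """Map a raw reward from [RAW_MIN, RAW_MAX] onto [0.0, 1.0].

    Equivalent to (raw + 32) / 52 inside the stated range. Clamping
    values outside the range is an assumption, not documented behavior.
    """
    scaled = (raw - RAW_MIN) / (RAW_MAX - RAW_MIN)
    return min(1.0, max(0.0, scaled))


if __name__ == "__main__":
    for raw in (-32.0, -6.0, 20.0):
        print(raw, normalize_reward(raw))
```

With these bounds, a raw reward of `-6` sits exactly halfway through the range and maps to `0.5`.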

ACRE_FINAL/acre/__init__.py DELETED

@@ -1,14 +0,0 @@
-"""
-ACRE (Autonomous Code Refactoring Environment).
-Package skeleton for an RL-based code refactoring system.
-"""
-__all__ = [
-    "env",
-    "actions",
-    "datasets",
-    "training",
-    "utils",
-]

ACRE_FINAL/acre/actions/__init__.py DELETED

@@ -1,6 +0,0 @@
-"""Action definitions and transformations for ACRE."""
-from .transformations import Transformation, TransformationResult
-__all__ = ["Transformation", "TransformationResult"]

ACRE_FINAL/acre/actions/transformations.py DELETED

@@ -1,518 +0,0 @@
-from __future__ import annotations
-import ast
-import copy
-from dataclasses import dataclass
-from itertools import zip_longest
-from typing import Any, Dict, Protocol
-@dataclass(frozen=True)
-class TransformationResult:
-    """Output of applying a transformation (placeholder)."""
-    code: str
-    changed: bool
-    metadata: Dict[str, Any]
-class Transformation(Protocol):
-    """Protocol for a code transformation."""
-    name: str
-    def apply(self, code: str) -> TransformationResult: ...
-def noop_transformation(code: str) -> TransformationResult:
-    """Baseline transformation that leaves code unchanged."""
-    return TransformationResult(code=code, changed=False, metadata={"kind": "noop"})
-def _finalize_result(*, original: str, out: str, meta: Dict[str, Any]) -> TransformationResult:
-    """
-    Standardize metadata across transformations.
-    - Adds `lines_changed` and `impact` for explainability/metrics.
-    - Ensures formatting-only changes don't count as `changed`.
-    """
-    def _count_lines_changed(a: str, b: str) -> int:
-        a_lines = a.splitlines()
-        b_lines = b.splitlines()
-        changed = 0
-        for x, y in zip_longest(a_lines, b_lines, fillvalue=None):
-            if x != y:
-                changed += 1
-        return int(changed)
-    lines_changed = _count_lines_changed(original, out)
-    # Fallback identity check: AST round-trips can reformat without changing meaning.
-    # If the textual content is the same after stripping, treat it as unchanged.
-    if out.strip() == original.strip():
-        meta["success"] = False
-        meta["lines_changed"] = 0
-        meta["impact"] = "low"
-        return TransformationResult(code=original, changed=False, metadata=meta)
-    meta["lines_changed"] = lines_changed
-    meta["impact"] = "high" if lines_changed >= 3 else "low"
-    meta["success"] = True
-    return TransformationResult(code=out, changed=True, metadata=meta)
-def _unchanged(*, code: str, meta: Dict[str, Any]) -> TransformationResult:
-    meta.setdefault("success", False)
-    meta.setdefault("lines_changed", 0)
-    meta.setdefault("impact", "low")
-    return TransformationResult(code=code, changed=False, metadata=meta)
-def rename_variable(code: str) -> TransformationResult:
-    """
-    Rename simple, generic variable names to more descriptive ones.
-    Hackathon-scope heuristic:
-    - Rename generic names in priority order: x, tmp, i.
-    - Uses descriptive base names and avoids collisions.
-    - Applies to Name nodes and function args.
-    """
-    meta: Dict[str, Any] = {"type": "rename_variable", "success": False}
-    try:
-        tree = ast.parse(code)
-        class _NameCollector(ast.NodeVisitor):
-            def __init__(self) -> None:
-                self.names: set[str] = set()
-            def visit_Name(self, node: ast.Name) -> None:  # noqa: N802
-                self.names.add(node.id)
-            def visit_arg(self, node: ast.arg) -> None:  # noqa: N802
-                self.names.add(node.arg)
-        collector = _NameCollector()
-        collector.visit(tree)
-        rename_plan = [
-            ("x", "value"),
-            ("tmp", "temp_value"),
-            ("i", "index"),
-        ]
-        old = ""
-        base_new = "value"
-        for candidate_old, candidate_base in rename_plan:
-            if candidate_old in collector.names:
-                old = candidate_old
-                base_new = candidate_base
-                break
-        if not old:
-            return _unchanged(code=code, meta=meta)
-        new = base_new
-        i = 1
-        while new in collector.names:
-            new = f"{base_new}{i}"
-            i += 1
-        class _Renamer(ast.NodeTransformer):
-            def __init__(self, old_name: str, new_name: str) -> None:
-                self.old_name = old_name
-                self.new_name = new_name
-                self.changed = False
-            def visit_Name(self, node: ast.Name) -> ast.AST:  # noqa: N802
-                if node.id == self.old_name:
-                    self.changed = True
-                    return ast.copy_location(ast.Name(id=self.new_name, ctx=node.ctx), node)
-                return node
-            def visit_arg(self, node: ast.arg) -> ast.AST:  # noqa: N802
-                if node.arg == self.old_name:
-                    self.changed = True
-                    new_node = copy.copy(node)
-                    new_node.arg = self.new_name
-                    return new_node
-                return node
-        renamer = _Renamer(old, new)
-        tree = renamer.visit(tree)
-        ast.fix_missing_locations(tree)
-        if not renamer.changed:
-            return _unchanged(code=code, meta=meta)
-        out = ast.unparse(tree)
-        meta["old"] = old
-        meta["new"] = new
-        # Renames tend to be small diffs; label as low impact unless the diff is large.
-        return _finalize_result(original=code, out=out, meta=meta)
-    except Exception:
-        return _unchanged(code=code, meta=meta)
-def remove_dead_code(code: str) -> TransformationResult:
-    """
-    Remove simple dead code patterns.
-    Hackathon-scope heuristics:
-    - Drop statements after `return` / `raise` in the same block.
-    - Remove `if False: ...` blocks (keep `else` if present).
-    - Remove assignments to unused names in a block (very simple check).
-    """
-    meta: Dict[str, Any] = {"type": "remove_dead_code", "success": False}
-    try:
-        tree = ast.parse(code)
-        def _is_const_bool(expr: ast.AST, value: bool) -> bool:
-            return isinstance(expr, ast.Constant) and isinstance(expr.value, bool) and expr.value is value
-        class _LoadNameCollector(ast.NodeVisitor):
-            def __init__(self) -> None:
-                self.loaded: set[str] = set()
-            def visit_Name(self, node: ast.Name) -> None:  # noqa: N802
-                if isinstance(node.ctx, ast.Load):
-                    self.loaded.add(node.id)
-        class _DeadCode(ast.NodeTransformer):
-            def __init__(self) -> None:
-                self.changed = False
-            def _prune_unreachable(self, stmts: list[ast.stmt]) -> list[ast.stmt]:
-                out: list[ast.stmt] = []
-                unreachable = False
-                for s in stmts:
-                    if unreachable:
-                        self.changed = True
-                        continue
-                    out.append(s)
-                    if isinstance(s, (ast.Return, ast.Raise)):
-                        unreachable = True
-                return out
-            def _remove_unused_assigns(self, stmts: list[ast.stmt]) -> list[ast.stmt]:
-                collector = _LoadNameCollector()
-                for s in stmts:
-                    collector.visit(s)
-                used = collector.loaded
-                out: list[ast.stmt] = []
-                for s in stmts:
-                    if isinstance(s, ast.Assign) and all(isinstance(t, ast.Name) for t in s.targets):
-                        targets = [t.id for t in s.targets if isinstance(t, ast.Name)]
-                        # Remove only if *all* assigned names are unused.
-                        if targets and all(t not in used for t in targets):
-                            self.changed = True
-                            continue
-                    if isinstance(s, ast.AnnAssign) and isinstance(s.target, ast.Name):
-                        if s.target.id not in used:
-                            self.changed = True
-                            continue
-                    out.append(s)
-                return out
-            def _clean_block(self, stmts: list[ast.stmt]) -> list[ast.stmt]:
-                # First apply transformations inside statements.
-                visited = [self.visit(s) for s in stmts]
-                flat: list[ast.stmt] = []
-                for s in visited:
-                    if s is None:
-                        self.changed = True
-                        continue
-                    if isinstance(s, list):
-                        flat.extend([x for x in s if isinstance(x, ast.stmt)])
-                        self.changed = True
-                    else:
-                        flat.append(s)
-                flat = self._prune_unreachable(flat)
-                flat = self._remove_unused_assigns(flat)
-                return flat
-            def visit_Module(self, node: ast.Module) -> ast.AST:  # noqa: N802
-                node.body = self._clean_block(node.body)
-                return node
-            def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:  # noqa: N802
-                node.body = self._clean_block(node.body)
-                return node
-            def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> ast.AST:  # noqa: N802
-                node.body = self._clean_block(node.body)
-                return node
-            def visit_If(self, node: ast.If) -> ast.AST | list[ast.stmt]:  # noqa: N802
-                node = self.generic_visit(node)
-                if _is_const_bool(node.test, False):
-                    self.changed = True
-                    return node.orelse or []
-                return node
-            def visit_While(self, node: ast.While) -> ast.AST | None:  # noqa: N802
-                node = self.generic_visit(node)
-                if _is_const_bool(node.test, False):
-                    self.changed = True
-                    return None
-                return node
-        dc = _DeadCode()
-        tree = dc.visit(tree)
-        ast.fix_missing_locations(tree)
-        if not dc.changed:
-            return _unchanged(code=code, meta=meta)
-        out = ast.unparse(tree)
-        return _finalize_result(original=code, out=out, meta=meta)
-    except Exception:
-        return _unchanged(code=code, meta=meta)
-def simplify_loops(code: str) -> TransformationResult:
-    """
-    Simplify very basic loop patterns into more pythonic forms.
-    Supported pattern (only when adjacent in the same block):
-    - xs = []
-      for t in it:
-          xs.append(expr)
-      => xs = [expr for t in it]
-    """
-    meta: Dict[str, Any] = {"type": "simplify_loops", "success": False}
-    try:
-        tree = ast.parse(code)
-        class _LoopSimplifier(ast.NodeTransformer):
-            def __init__(self) -> None:
-                self.changed = False
-            def _simplify_body(self, body: list[ast.stmt]) -> list[ast.stmt]:
-                out: list[ast.stmt] = []
-                i = 0
-                while i < len(body):
-                    cur = body[i]
-                    nxt = body[i + 1] if i + 1 < len(body) else None
-                    if (
-                        isinstance(cur, ast.Assign)
-                        and len(cur.targets) == 1
-                        and isinstance(cur.targets[0], ast.Name)
-                        and isinstance(cur.value, ast.List)
-                        and cur.value.elts == []
-                        and isinstance(nxt, ast.For)
-                        and len(nxt.body) == 1
-                        and isinstance(nxt.body[0], ast.Expr)
-                        and isinstance(nxt.body[0].value, ast.Call)
-                    ):
-                        list_name = cur.targets[0].id
-                        call = nxt.body[0].value
-                        if (
-                            isinstance(call.func, ast.Attribute)
-                            and isinstance(call.func.value, ast.Name)
-                            and call.func.value.id == list_name
-                            and call.func.attr == "append"
-                            and len(call.args) == 1
-                            and not call.keywords
-                        ):
-                            # Build list comprehension: [call.args[0] for <target> in <iter>]
-                            comp = ast.ListComp(
-                                elt=call.args[0],
-                                generators=[
-                                    ast.comprehension(
-                                        target=nxt.target,
-                                        iter=nxt.iter,
-                                        ifs=[],
-                                        is_async=0,
-                                    )
-                                ],
-                            )
-                            new_assign = ast.Assign(targets=[ast.Name(id=list_name, ctx=ast.Store())], value=comp)
-                            out.append(ast.copy_location(new_assign, cur))
-                            self.changed = True
-                            i += 2
-                            continue
-                    out.append(cur)
-                    i += 1
-                return out
-            def visit_Module(self, node: ast.Module) -> ast.AST:  # noqa: N802
-                node = self.generic_visit(node)
-                node.body = self._simplify_body(node.body)
-                return node
-            def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:  # noqa: N802
-                node = self.generic_visit(node)
-                node.body = self._simplify_body(node.body)
-                return node
-            def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> ast.AST:  # noqa: N802
-                node = self.generic_visit(node)
-                node.body = self._simplify_body(node.body)
-                return node
-        simp = _LoopSimplifier()
-        tree = simp.visit(tree)
-        ast.fix_missing_locations(tree)
-        if not simp.changed:
-            return _unchanged(code=code, meta=meta)
-        out = ast.unparse(tree)
-        return _finalize_result(original=code, out=out, meta=meta)
-    except Exception:
-        return _unchanged(code=code, meta=meta)
-def simplify_loop(code: str) -> TransformationResult:
-    # Backwards-compatible alias for the environment's action mapping.
-    return simplify_loops(code)
-def optimize_condition(code: str) -> TransformationResult:
-    """
-    Simplify redundant boolean conditions.
-    Hackathon-scope heuristics:
-    - Replace `if True:` with its body; `if False:` with `else` (if present).
-    - Simplify `not not X` -> `X`.
-    - Simplify comparisons to True/False: `X == True` -> `X`, `X == False` -> `not X`.
-    """
-    meta: Dict[str, Any] = {"type": "optimize_condition", "success": False}
-    try:
-        tree = ast.parse(code)
-        def _is_bool_const(node: ast.AST, value: bool) -> bool:
-            return isinstance(node, ast.Constant) and isinstance(node.value, bool) and node.value is value
-        class _CondOpt(ast.NodeTransformer):
-            def __init__(self) -> None:
-                self.changed = False
-            def visit_UnaryOp(self, node: ast.UnaryOp) -> ast.AST:  # noqa: N802
-                node = self.generic_visit(node)
-                if isinstance(node.op, ast.Not) and isinstance(node.operand, ast.UnaryOp) and isinstance(node.operand.op, ast.Not):
-                    self.changed = True
-                    return node.operand.operand
-                return node
-            def visit_Compare(self, node: ast.Compare) -> ast.AST:  # noqa: N802
-                node = self.generic_visit(node)
-                if len(node.ops) == 1 and len(node.comparators) == 1:
-                    op = node.ops[0]
-                    rhs = node.comparators[0]
-                    if isinstance(op, (ast.Eq, ast.Is)) and _is_bool_const(rhs, True):
-                        self.changed = True
-                        return node.left
-                    if isinstance(op, (ast.Eq, ast.Is)) and _is_bool_const(rhs, False):
-                        self.changed = True
-                        return ast.UnaryOp(op=ast.Not(), operand=node.left)
-                return node
-            def visit_If(self, node: ast.If) -> ast.AST | list[ast.stmt]:  # noqa: N802
-                node = self.generic_visit(node)
-                if _is_bool_const(node.test, True):
-                    self.changed = True
-                    return node.body
-                if _is_bool_const(node.test, False):
-                    self.changed = True
-                    return node.orelse or []
-                return node
-        opt = _CondOpt()
-        tree = opt.visit(tree)
-        ast.fix_missing_locations(tree)
-        if not opt.changed:
-            return _unchanged(code=code, meta=meta)
-        out = ast.unparse(tree)
-        return _finalize_result(original=code, out=out, meta=meta)
-    except Exception:
-        return _unchanged(code=code, meta=meta)
-def inline_function(code: str) -> TransformationResult:
-    """
-    Inline very simple functions into their call sites.
-    Supported pattern:
-    - def f(a, b): return <expr using only a,b>
-    - Replace calls: f(x, y) -> <expr with a->x, b->y>
-    Only handles module-level functions and positional args.
-    """
-    meta: Dict[str, Any] = {"type": "inline_function", "success": False}
-    try:
-        tree = ast.parse(code)
-        simple_fns: Dict[str, tuple[list[str], ast.AST]] = {}
-        for node in tree.body:
-            if not isinstance(node, ast.FunctionDef):
-                continue
-            if node.decorator_list:
-                continue
-            args = node.args
-            if args.vararg or args.kwarg or args.kwonlyargs or args.defaults or args.posonlyargs:
-                continue
-            if len(node.body) != 1 or not isinstance(node.body[0], ast.Return) or node.body[0].value is None:
-                continue
-            arg_names = [a.arg for a in args.args]
-            # Ensure the return expression only references the function's args.
-            referenced: set[str] = set()
-            class _Ref(ast.NodeVisitor):
-                def visit_Name(self, n: ast.Name) -> None:  # noqa: N802
-                    if isinstance(n.ctx, ast.Load):
-                        referenced.add(n.id)
-            _Ref().visit(node.body[0].value)
-            if not referenced.issubset(set(arg_names)):
-                continue
-            simple_fns[node.name] = (arg_names, node.body[0].value)
-        if not simple_fns:
-            return _unchanged(code=code, meta=meta)
-        class _Substitute(ast.NodeTransformer):
-            def __init__(self, mapping: Dict[str, ast.AST]) -> None:
-                self.mapping = mapping
-            def visit_Name(self, n: ast.Name) -> ast.AST:  # noqa: N802
-                if isinstance(n.ctx, ast.Load) and n.id in self.mapping:
-                    return copy.deepcopy(self.mapping[n.id])
-                return n
-        class _Inliner(ast.NodeTransformer):
-            def __init__(self) -> None:
-                self.changed = False
-            def visit_Call(self, node: ast.Call) -> ast.AST:  # noqa: N802
-                node = self.generic_visit(node)
-                if not isinstance(node.func, ast.Name):
-                    return node
-                fn = simple_fns.get(node.func.id)
-                if fn is None:
-                    return node
-                arg_names, expr = fn
-                if node.keywords or len(node.args) != len(arg_names):
-                    return node
-                mapping = {name: arg for name, arg in zip(arg_names, node.args, strict=True)}
-                new_expr = _Substitute(mapping).visit(copy.deepcopy(expr))
-                self.changed = True
-                return ast.copy_location(new_expr, node)
-        inliner = _Inliner()
-        tree = inliner.visit(tree)
-        ast.fix_missing_locations(tree)
-        if not inliner.changed:
-            return _unchanged(code=code, meta=meta)
-        out = ast.unparse(tree)
-        meta["inlined"] = sorted(simple_fns.keys())
-        return _finalize_result(original=code, out=out, meta=meta)
-    except Exception:
-        return _unchanged(code=code, meta=meta)
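
Every transformation in the file above follows the same shape: parse the source with `ast.parse`, rewrite nodes with an `ast.NodeTransformer`, then emit code with `ast.unparse`. The standalone sketch below isolates that pattern for the `not not X` rewrite only. The class name `DoubleNegationRemover` and the helper `remove_double_negation` are hypothetical names chosen for this example and do not appear in the repository.

```python
import ast


class DoubleNegationRemover(ast.NodeTransformer):
    """Rewrite `not not X` to `X` (hypothetical standalone example)."""

    def visit_UnaryOp(self, node: ast.UnaryOp) -> ast.AST:
        # Visit children first so longer chains of `not` collapse bottom-up.
        node = self.generic_visit(node)
        if (
            isinstance(node.op, ast.Not)
            and isinstance(node.operand, ast.UnaryOp)
            and isinstance(node.operand.op, ast.Not)
        ):
            # Drop both `not` operators and keep the inner expression.
            return node.operand.operand
        return node


def remove_double_negation(source: str) -> str:
    """Parse, transform, and unparse a snippet of Python source."""
    tree = DoubleNegationRemover().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    print(remove_double_negation("flag = not not ready"))
```

Note that `not not X` always evaluates to a `bool`, while `X` itself may be any object, so this rewrite preserves behavior only where truthiness is all that matters (for example, in an `if` test). `ast.unparse` requires Python 3.9 or later.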

ACRE_FINAL/acre/datasets/__init__.py DELETED

@@ -1,6 +0,0 @@
-"""Datasets and sample code providers for ACRE."""
-from .code_samples import CodeSample, CodeSampleDataset
-__all__ = ["CodeSample", "CodeSampleDataset"]

ACRE_FINAL/acre/datasets/code_samples.py DELETED

@@ -1,34 +0,0 @@
-from __future__ import annotations
-from dataclasses import dataclass
-from typing import Iterable, Iterator, List, Optional
-@dataclass(frozen=True)
-class CodeSample:
-    """A single code sample (placeholder)."""
-    id: str
-    language: str
-    code: str
-class CodeSampleDataset:
-    """
-    Minimal in-memory dataset stub.
-    Later versions can back this with files, Git repos, or benchmark suites.
-    """
-    def __init__(self, samples: Optional[Iterable[CodeSample]] = None) -> None:
-        self._samples: List[CodeSample] = list(samples or [])
-    def __len__(self) -> int:
-        return len(self._samples)
-    def __iter__(self) -> Iterator[CodeSample]:
-        return iter(self._samples)
-    def add(self, sample: CodeSample) -> None:
-        self._samples.append(sample)

ACRE_FINAL/acre/demo.py DELETED

@@ -1,185 +0,0 @@
-from __future__ import annotations
-import os
-import random
-import sys
-from typing import Any, Optional, Tuple
-from acre.datasets.code_samples import CodeSample, CodeSampleDataset
-from acre.env.refactor_env import RefactorEnv
-def _load_model(path: str):
-    """Load a Stable-Baselines3 PPO model if available; otherwise return None."""
-    if not os.path.exists(path):
-        return None
-    try:
-        from stable_baselines3 import PPO
-    except Exception:
-        return None
-    try:
-        return PPO.load(path)
-    except Exception:
-        return None
-def _messy_sample_code() -> str:
-    # Intentionally "messy" but valid Python for demo purposes.
-    return (
-        "def add(a,b):\n"
-        "    x=0\n"
-        "    for i in range(a):\n"
-        "        x=x+1\n"
-        "    if True:\n"
-        "        x = x\n"
-        "    if False:\n"
-        "        y=123\n"
-        "    else:\n"
-        "        y=0\n"
-        "    def f(p,q):\n"
-        "        return p+q\n"
-        "    r = f(x,y)\n"
-        "    return r\n"
-    )
-def _format_code_block(code: str) -> str:
-    return "\n".join(f"  {line}" for line in code.rstrip().splitlines()) + "\n"
-def _safe_print(text: str) -> None:
-    """
-    Print text safely across Windows consoles (some default encodings can't print emojis).
-    """
-    encoding = sys.stdout.encoding or "utf-8"
-    try:
-        text.encode(encoding)
-        print(text, flush=True)
-    except Exception:
-        # Fall back to ASCII-friendly markers if emojis can't be encoded.
-        safe = text.replace("✅", "[OK]").replace("⚠️", "[WARN]").replace("⚠", "[WARN]")
-        print(safe, flush=True)
-def _compute_runtime(executor: Any, code: str) -> float:
-    """Best-effort runtime metric using the current executor contract."""
-    try:
-        res = executor.run(code, filename="demo.py")
-        if getattr(res, "exit_code", 1) == 0 and isinstance(getattr(res, "metrics", None), dict):
-            return float(res.metrics.get("runtime_s", 0.0) or 0.0)
-    except Exception:
-        pass
-    return 0.0
-def _choose_action(model: Any, obs, env: RefactorEnv, rng: random.Random) -> Tuple[int, str]:
-    """Choose an action from the model, falling back to random."""
-    n_actions = int(getattr(getattr(env, "action_space", None), "n", 5))
-    if model is None:
-        a = int(rng.randint(0, n_actions - 1))
-        return a, "random"
-    try:
-        action, _state = model.predict(obs, deterministic=True)
-        # SB3 may return scalar or 1-element array.
-        if hasattr(action, "__len__"):
-            a = int(action[0])
-        else:
-            a = int(action)
-        return a, "ppo"
-    except Exception:
-        a = int(rng.randint(0, n_actions - 1))
-        return a, "random"
-def run_demo(*, model_path: str = "acre_agent.zip", seed: int = 0) -> None:
-    rng = random.Random(seed)
-    # Create a dataset with one messy sample so `reset()` loads it deterministically.
-    dataset = CodeSampleDataset(
-        [
-            CodeSample(
-                id="demo_sample",
-                language="python",
-                code=_messy_sample_code(),
-            )
-        ]
-    )
-    env = RefactorEnv(dataset=dataset, seed=seed)
-    model = _load_model(model_path)
-    model_status = "loaded" if model is not None else "not found (using random actions)"
-    # Reset and capture the original code/metrics.
-    obs, info = env.reset()
-    original_code = getattr(env, "_code", "")
-    original_complexity = float(getattr(env, "_compute_complexity")(original_code))
-    original_runtime = _compute_runtime(env.executor, original_code)
-    print("=" * 72)
-    print("ACRE: Autonomous RL Code Refactoring Agent (5-step episode)")
-    print(f"Model: {model_path} -> {model_status}")
-    print(f"Sample: {info.get('sample_id')} ({info.get('language')})")
-    print("=" * 72)
-    print("\nORIGINAL CODE:\n")
-    print(_format_code_block(original_code))
-    total_reward = 0.0
-    successful_transformations = 0
-    steps_taken = 0
-    for step_idx in range(1, 6):
-        action, policy = _choose_action(model, obs, env, rng)
-        obs, reward, terminated, truncated, step_info = env.step(action)
-        total_reward += float(reward)
-        steps_taken = step_idx
-        action_name = step_info.get("action_name", "unknown")
-        transform_meta = step_info.get("transform", {})
-        if isinstance(transform_meta, dict) and bool(transform_meta.get("success", False)):
-            successful_transformations += 1
-        transformed_code = getattr(env, "_code", "")
-        print("-" * 72)
-        print(f"STEP {step_idx}/5")
-        print(f"policy={policy} action={action} ({action_name})")
-        print(f"transform={transform_meta}")
-        print(f"reward={float(reward):.2f}  components={step_info.get('reward_components')}")
-        print("\nUPDATED CODE:\n")
-        print(_format_code_block(transformed_code))
-        if terminated or truncated:
-            break
-    final_code = getattr(env, "_code", "")
-    final_complexity = float(getattr(env, "_compute_complexity")(final_code))
-    final_runtime = _compute_runtime(env.executor, final_code)
-    print("=" * 72)
-    print("FINAL SUMMARY")
-    print("=" * 72)
-    print(f"total_reward: {total_reward:.2f}")
-    print(f"complexity: {original_complexity:.0f} -> {final_complexity:.0f}")
-    print(f"runtime_s:   {original_runtime:.4f} -> {final_runtime:.4f}")
-    complexity_improvement = ((original_complexity - final_complexity) / max(original_complexity, 1.0)) * 100.0
-    print(f"complexity improvement: {complexity_improvement:.2f}%")
-    print("\nCHANGES APPLIED:")
-    print(f"- Total steps: {steps_taken}")
-    print(f"- Successful transformations: {successful_transformations}")
-    if total_reward > 0:
-        _safe_print("\n✅ Code improved successfully")
-    else:
-        _safe_print("\n⚠️ No significant improvement")
-    print("\nFINAL CODE:\n")
-    print(_format_code_block(final_code))
-    env.close()
-if __name__ == "__main__":
-    run_demo()
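
Note on `_safe_print` in the deleted demo.py: the helper tries to encode the text with the console encoding and swaps emojis for ASCII markers when that fails. A minimal sketch of the same idea as a pure function, so it can be checked without a real console. The name `console_safe` and the explicit `encoding` parameter are illustrative assumptions, not part of the original file.

```python
def console_safe(text: str, encoding: str) -> str:
    """Return text unchanged if `encoding` can represent it, else an ASCII-friendly copy."""
    try:
        text.encode(encoding)
        return text
    except (UnicodeEncodeError, LookupError):
        # Replace the emoji-with-variation-selector form first, then the bare symbol.
        return (
            text.replace("\u2705", "[OK]")
            .replace("\u26a0\ufe0f", "[WARN]")
            .replace("\u26a0", "[WARN]")
        )


if __name__ == "__main__":
    print(console_safe("\u2705 Code improved successfully", "ascii"))
```

Returning the string instead of printing keeps the fallback testable; the caller can still pass the result to `print(..., flush=True)`.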

ACRE_FINAL/acre/main.py DELETED Viewed

@@ -1,39 +0,0 @@
-from __future__ import annotations
-import argparse
-from acre.training.train_agent import TrainConfig, train
-def _build_parser() -> argparse.ArgumentParser:
-    parser = argparse.ArgumentParser(prog="acre", description="ACRE: Autonomous Code Refactoring Environment")
-    sub = parser.add_subparsers(dest="command", required=False)
-    train_p = sub.add_parser("train", help="Run PPO training")
-    train_p.add_argument("--total-steps", type=int, default=100, help="Total training timesteps")
-    sub.add_parser("demo", help="Run a small demo (stub)")
-    return parser
-def run_demo() -> None:
-    # Placeholder for a future interactive/demo flow.
-    print("ACRE demo mode is not implemented yet.")
-def main(argv: list[str] | None = None) -> None:
-    parser = _build_parser()
-    args = parser.parse_args(argv)
-    if args.command == "demo":
-        run_demo()
-        return
-    total_steps = getattr(args, "total_steps", 100)
-    train(config=TrainConfig(total_steps=total_steps))
-if __name__ == "__main__":
-    main()
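
Note on the deleted main.py: because the subcommand is optional, running with no arguments leaves `command` as `None` and the namespace has no `total_steps` attribute, which is why `main` reads it through `getattr(..., 100)`. A small sketch of that behaviour, assuming a parser shaped like `_build_parser`. The helper name `resolve` is hypothetical.

```python
import argparse
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="acre")
    sub = parser.add_subparsers(dest="command", required=False)
    train_p = sub.add_parser("train")
    train_p.add_argument("--total-steps", type=int, default=100)
    sub.add_parser("demo")
    return parser


def resolve(argv: List[str]) -> Tuple[Optional[str], int]:
    """Return (command, total_steps) the way main() would see them."""
    args = build_parser().parse_args(argv)
    # `total_steps` only exists on the namespace when the `train` subparser ran.
    return args.command, getattr(args, "total_steps", 100)


if __name__ == "__main__":
    print(resolve([]))
    print(resolve(["train", "--total-steps", "500"]))
```

One consequence worth knowing: with this layout, a bare `acre` invocation starts training with the default step count rather than printing help.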

ACRE_FINAL/acre/tasks/__init__.py DELETED Viewed

@@ -1,3 +0,0 @@
-from acre.tasks.task_registry import Task, TaskRegistry
-__all__ = ["Task", "TaskRegistry"]

ACRE_FINAL/acre/tasks/task_registry.py DELETED Viewed

@@ -1,222 +0,0 @@
-"""
-Three OpenEnv tasks with AST-based graders scoring 0.0-1.0.
-"""
-from __future__ import annotations
-import ast
-from dataclasses import dataclass
-from typing import Callable, Dict, List, Optional
-@dataclass
-class Task:
-    id: str
-    name: str
-    description: str
-    difficulty: str
-    initial_code: str
-    _grade_fn: Callable[[str], float]
-    def grade(self, code: str) -> float:
-        """Return a score in [0.0, 1.0]."""
-        try:
-            return float(min(1.0, max(0.0, self._grade_fn(code))))
-        except Exception:
-            return 0.0
-# ---------------------------------------------------------------------------
-# Task 1 — Easy: Rename generic variables
-# ---------------------------------------------------------------------------
-_EASY_CODE = """\
-def compute(x, y, tmp):
-    tmp = x + y
-    x = tmp * 2
-    result = x
-    return result
-"""
-def _grade_easy(code: str) -> float:
-    """Score = fraction of generic names (x, tmp) removed from all scopes."""
-    generic = {"x", "tmp"}
-    try:
-        tree = ast.parse(code)
-    except SyntaxError:
-        return 0.0
-    remaining: set[str] = set()
-    class _Collector(ast.NodeVisitor):
-        def visit_Name(self, node: ast.Name) -> None:
-            if node.id in generic:
-                remaining.add(node.id)
-            self.generic_visit(node)
-        def visit_arg(self, node: ast.arg) -> None:
-            if node.arg in generic:
-                remaining.add(node.arg)
-            self.generic_visit(node)
-    _Collector().visit(tree)
-    renamed = len(generic - remaining)
-    return renamed / len(generic)
-# ---------------------------------------------------------------------------
-# Task 2 — Medium: Remove dead code
-# ---------------------------------------------------------------------------
-_MEDIUM_CODE = """\
-def process(data):
-    result = []
-    for item in data:
-        result.append(item * 2)
-    if False:
-        print("never runs")
-    unused_var = 42
-    return result
-    print("unreachable")
-"""
-def _grade_medium(code: str) -> float:
-    """Score = fraction of 3 checks passed (~0.33 each): if-False block removed, unused_var removed, list comprehension used."""
-    try:
-        tree = ast.parse(code)
-    except SyntaxError:
-        return 0.0
-    source = ast.unparse(tree)
-    score = 0.0
-    # Check 1: if-False block removed
-    if "if False" not in source:
-        score += 1 / 3
-    # Check 2: unused_var assignment removed
-    if "unused_var" not in source:
-        score += 1 / 3
-    # Check 3: list comprehension used (loop simplified)
-    has_listcomp = any(isinstance(n, ast.ListComp) for n in ast.walk(tree))
-    if has_listcomp:
-        score += 1 / 3
-    return score
-# ---------------------------------------------------------------------------
-# Task 3 — Hard: Full refactor
-# ---------------------------------------------------------------------------
-_HARD_CODE = """\
-def add(p, q):
-    return p + q
-def compute(x, data, tmp):
-    result = []
-    for item in data:
-        result.append(item * 2)
-    if False:
-        y = 999
-    if True:
-        val = add(x, tmp)
-    unused = 0
-    flag = not not True
-    return val
-    print("dead")
-"""
-def _grade_hard(code: str) -> float:
-    """Score = fraction of 5 quality checks passed."""
-    try:
-        tree = ast.parse(code)
-    except SyntaxError:
-        return 0.0
-    source = ast.unparse(tree)
-    checks = 0
-    # 1. No generic parameter names x/tmp in any function signature (names in the body are not checked)
-    has_generic = False
-    class _GenCheck(ast.NodeVisitor):
-        def visit_arg(self, node: ast.arg) -> None:
-            nonlocal has_generic
-            if node.arg in {"x", "tmp"}:
-                has_generic = True
-    _GenCheck().visit(tree)
-    if not has_generic:
-        checks += 1
-    # 2. No if False block
-    if "if False" not in source:
-        checks += 1
-    # 3. if True removed (body inlined)
-    if "if True" not in source:
-        checks += 1
-    # 4. List comprehension used
-    if any(isinstance(n, ast.ListComp) for n in ast.walk(tree)):
-        checks += 1
-    # 5. add() call inlined (no call to 'add')
-    calls = [n for n in ast.walk(tree) if isinstance(n, ast.Call)]
-    fn_names = {c.func.id for c in calls if isinstance(c.func, ast.Name)}
-    if "add" not in fn_names:
-        checks += 1
-    return checks / 5
-# ---------------------------------------------------------------------------
-# Registry
-# ---------------------------------------------------------------------------
-class TaskRegistry:
-    def __init__(self) -> None:
-        self._tasks: Dict[str, Task] = {}
-        self._register_all()
-    def _register_all(self) -> None:
-        self._tasks["rename_variables"] = Task(
-            id="rename_variables",
-            name="Rename Variables (Easy)",
-            description="Rename generic variable names (x, tmp) to descriptive ones",
-            difficulty="easy",
-            initial_code=_EASY_CODE,
-            _grade_fn=_grade_easy,
-        )
-        self._tasks["remove_dead_code"] = Task(
-            id="remove_dead_code",
-            name="Remove Dead Code (Medium)",
-            description="Remove unreachable code, if False blocks, and unused variables",
-            difficulty="medium",
-            initial_code=_MEDIUM_CODE,
-            _grade_fn=_grade_medium,
-        )
-        self._tasks["full_refactor"] = Task(
-            id="full_refactor",
-            name="Full Refactor (Hard)",
-            description="Apply all transformations: rename, dead code, loops, conditions, inlining",
-            difficulty="hard",
-            initial_code=_HARD_CODE,
-            _grade_fn=_grade_hard,
-        )
-    def get_task(self, task_id: str) -> Optional[Task]:
-        return self._tasks.get(task_id)
-    def list_tasks(self) -> List[dict]:
-        return [
-            {
-                "id": t.id,
-                "name": t.name,
-                "description": t.description,
-                "difficulty": t.difficulty,
-                "initial_code": t.initial_code,
-            }
-            for t in self._tasks.values()
-        ]
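
Note on the graders in the deleted task_registry.py: each one parses the submitted code with `ast` and returns a fraction in [0.0, 1.0]. The sketch below mirrors the easy (rename) grader so the scoring can be tried on concrete inputs. It uses `ast.walk` instead of the original `NodeVisitor` subclass, and the names `grade_rename`, `ORIGINAL`, `PARTIAL` and `RENAMED` are illustrative assumptions.

```python
import ast

GENERIC_NAMES = {"x", "tmp"}


def grade_rename(code: str) -> float:
    """Fraction of generic names that no longer appear as a variable or parameter."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return 0.0
    remaining = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in GENERIC_NAMES:
            remaining.add(node.id)
        elif isinstance(node, ast.arg) and node.arg in GENERIC_NAMES:
            remaining.add(node.arg)
    return len(GENERIC_NAMES - remaining) / len(GENERIC_NAMES)


ORIGINAL = (
    "def compute(x, y, tmp):\n"
    "    tmp = x + y\n"
    "    x = tmp * 2\n"
    "    result = x\n"
    "    return result\n"
)
# Only `tmp` has been renamed; `x` is still present.
PARTIAL = ORIGINAL.replace("tmp", "total")
RENAMED = "def compute(left, right):\n    total = left + right\n    return total * 2\n"

if __name__ == "__main__":
    for label, src in [("original", ORIGINAL), ("partial", PARTIAL), ("renamed", RENAMED)]:
        print(label, grade_rename(src))
```

Because the score depends only on which names remain, the grader rewards renaming but does not verify that the refactored function still behaves the same.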

ACRE_FINAL/acre/training/__init__.py DELETED Viewed

@@ -1,6 +0,0 @@
-"""Training utilities for ACRE."""
-from .train_agent import TrainConfig, train
-__all__ = ["TrainConfig", "train"]

ACRE_FINAL/acre/training/train_agent.py DELETED Viewed

@@ -1,75 +0,0 @@
-from __future__ import annotations
-from dataclasses import dataclass
-from typing import Optional
-from acre.env.refactor_env import RefactorEnv
-@dataclass(frozen=True)
-class TrainConfig:
-    """Configuration stub for training."""
-    total_steps: int = 5_000
-    seed: Optional[int] = None
-    model_path: str = "acre_agent.zip"
-def train(*, env: Optional[RefactorEnv] = None, config: Optional[TrainConfig] = None) -> None:
-    """
-    Train a PPO agent on `RefactorEnv` using Stable-Baselines3.
-    This is intentionally lightweight (hackathon-friendly) and focuses on a
-    working demo: basic training loop, simple logging, and saving the model.
-    """
-    _config = config or TrainConfig()
-    _env = env or RefactorEnv(seed=_config.seed)
-    try:
-        from stable_baselines3 import PPO
-        from stable_baselines3.common.callbacks import BaseCallback
-        from stable_baselines3.common.monitor import Monitor
-        from stable_baselines3.common.vec_env import DummyVecEnv
-    except Exception as e:  # pragma: no cover
-        print("Stable-Baselines3 is required for training. Install with `pip install -r requirements.txt`.")
-        print(f"Import error: {e}")
-        return None
-    class EpisodeRewardPrinter(BaseCallback):
-        """Print episode reward when an episode ends (via Monitor)."""
-        def __init__(self) -> None:
-            super().__init__()
-            self.episode_count = 0
-        def _on_step(self) -> bool:
-            infos = self.locals.get("infos", [])
-            for info in infos:
-                ep = info.get("episode") if isinstance(info, dict) else None
-                if isinstance(ep, dict) and "r" in ep:
-                    self.episode_count += 1
-                    print(f"episode={self.episode_count} reward={ep['r']:.2f} length={int(ep.get('l', 0))}")
-            return True
-    # Wrap with Monitor so SB3 can compute episode stats and expose them in `info["episode"]`.
-    def make_env() -> Monitor:
-        return Monitor(_env)
-    vec_env = DummyVecEnv([make_env])
-    model = PPO(
-        policy="MlpPolicy",
-        env=vec_env,
-        verbose=0,
-        seed=_config.seed,
-        n_steps=64,
-        batch_size=64,
-    )
-    print(f"Training PPO for {int(_config.total_steps)} timesteps...")
-    model.learn(total_timesteps=int(_config.total_steps), callback=EpisodeRewardPrinter())
-    model.save(_config.model_path)
-    print(f"Saved model to {_config.model_path!r}")
-    return None
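
Note on `EpisodeRewardPrinter` in the deleted train_agent.py: the callback relies on the `Monitor` wrapper adding an `episode` dict (keys `r` for return and `l` for length) to `info` on the step that ends an episode. The filtering logic can be sketched without Stable-Baselines3 installed. The function name `finished_episodes` is a hypothetical helper, not part of the original code.

```python
from typing import Any, Iterable, List, Tuple


def finished_episodes(infos: Iterable[Any]) -> List[Tuple[float, int]]:
    """Collect (reward, length) for every info dict that reports a finished episode."""
    results: List[Tuple[float, int]] = []
    for info in infos:
        ep = info.get("episode") if isinstance(info, dict) else None
        if isinstance(ep, dict) and "r" in ep:
            results.append((float(ep["r"]), int(ep.get("l", 0))))
    return results


if __name__ == "__main__":
    # Mid-episode steps carry no "episode" key; only the final step does.
    sample_infos = [{}, {"episode": {"r": 3.5, "l": 5, "t": 0.12}}, None]
    print(finished_episodes(sample_infos))
```

Keeping the extraction separate from the printing makes it easy to log the same numbers elsewhere, for example into a metrics history.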

ACRE_FINAL/acre/utils/__init__.py DELETED Viewed

@@ -1,6 +0,0 @@
-"""Shared utility helpers for ACRE."""
-from .metrics import Metric, MetricLogger
-__all__ = ["Metric", "MetricLogger"]

ACRE_FINAL/acre/utils/metrics.py DELETED Viewed

@@ -1,33 +0,0 @@
-from __future__ import annotations
-from dataclasses import dataclass, field
-from typing import Dict, Iterable, List, Tuple
-@dataclass(frozen=True)
-class Metric:
-    """Single scalar metric value (placeholder)."""
-    name: str
-    value: float
-@dataclass
-class MetricLogger:
-    """Tiny metric logger stub."""
-    _history: Dict[str, List[float]] = field(default_factory=dict)
-    def log(self, metric: Metric) -> None:
-        self._history.setdefault(metric.name, []).append(metric.value)
-    def latest(self) -> Dict[str, float]:
-        return {k: v[-1] for k, v in self._history.items() if v}
-    def as_series(self) -> Dict[str, Tuple[float, ...]]:
-        return {k: tuple(v) for k, v in self._history.items()}
-    def extend(self, metrics: Iterable[Metric]) -> None:
-        for m in metrics:
-            self.log(m)

ACRE_FINAL/inference.py DELETED Viewed

@@ -1,278 +0,0 @@
-"""
-ACRE inference script for OpenEnv submission evaluation.
-Environment variables:
-  API_BASE_URL: LLM API endpoint (defaults to https://api.openai.com/v1)
-  MODEL_NAME: model identifier (defaults to gpt-4o-mini)
-  HF_TOKEN: API token for the OpenAI-compatible endpoint; if unset, a rule-based heuristic picks actions
-  ENV_URL: running ACRE server base URL (required)
-Optional:
-  LOCAL_IMAGE_NAME: present for evaluator compatibility when using a local
-  Docker image launcher.
-Stdout format uses strict START / STEP / END event markers.
-"""
-from __future__ import annotations
-import json
-import os
-import re
-import sys
-import time
-from typing import Dict, List, Tuple
-import requests
-from openai import OpenAI
-API_BASE_URL: str = os.getenv("API_BASE_URL", "https://api.openai.com/v1")
-MODEL_NAME: str = os.getenv("MODEL_NAME", "gpt-4o-mini")
-HF_TOKEN: str | None = os.getenv("HF_TOKEN")
-ENV_URL: str | None = os.getenv("ENV_URL")
-LOCAL_IMAGE_NAME: str | None = os.getenv("LOCAL_IMAGE_NAME")
-TASKS: List[str] = ["rename_variables", "remove_dead_code", "full_refactor"]
-ACTION_MEANINGS: Dict[int, str] = {
-    0: "rename_variable",
-    1: "remove_dead_code",
-    2: "simplify_loop",
-    3: "optimize_condition",
-    4: "inline_function",
-}
-SYSTEM_PROMPT = """\
-You are an RL agent that refactors Python code. Choose one action per step.
-Actions:
-  0 rename_variable   - rename generic names (x, tmp, i) to descriptive ones
-  1 remove_dead_code  - remove unreachable stmts, if False blocks, unused vars
-  2 simplify_loop     - convert append-loops to list comprehensions
-  3 optimize_condition- simplify 'not not x', 'if True/False', 'x==True'
-  4 inline_function   - inline simple single-return module-level functions
-Respond ONLY with valid JSON (no markdown):
-{"action": <0-4>, "reason": "<one sentence>"}"""
-def _env_url() -> str:
-    if ENV_URL:
-        return ENV_URL.rstrip("/")
-    raise RuntimeError("ENV_URL must be set before running inference.py")
-def _post(path: str, payload: dict | None = None) -> dict:
-    response = requests.post(f"{_env_url()}{path}", json=payload or {}, timeout=30)
-    response.raise_for_status()
-    return response.json()
-def _get(path: str) -> dict:
-    response = requests.get(f"{_env_url()}{path}", timeout=30)
-    response.raise_for_status()
-    return response.json()
-def reset_env(task_id: str) -> dict:
-    return _post("/reset", {"task_id": task_id})
-def step_env(action: int) -> dict:
-    return _post("/step", {"action": action})
-def get_state() -> dict:
-    return _get("/state")
-def grade(task_id: str, code: str) -> float:
-    response = requests.post(
-        f"{_env_url()}/tasks/{task_id}/grade",
-        json={"code": code},
-        timeout=30,
-    )
-    response.raise_for_status()
-    return float(response.json().get("score", 0.0))
-def choose_action(client: OpenAI, state: dict, task_id: str) -> Tuple[int, str]:
-    def heuristic_action() -> Tuple[int, str]:
-        code = str(state.get("current_code", ""))
-        step_i = int(state.get("episode_steps", 0))
-        has_generic = re.search(r"\b(x|tmp|i)\b", code) is not None
-        has_if_false = re.search(r"\bif\s+False\b", code) is not None
-        has_if_true = re.search(r"\bif\s+True\b", code) is not None
-        has_append_loop = ".append(" in code and "for " in code
-        has_double_not = "not not" in code
-        has_add_call = "add(" in code
-        if task_id == "rename_variables":
-            if has_generic:
-                return 0, "heuristic: remove generic names first"
-            if has_if_false or "unused" in code:
-                return 1, "heuristic: remove dead code"
-            if has_append_loop:
-                return 2, "heuristic: simplify loop"
-            if has_if_true or has_double_not:
-                return 3, "heuristic: optimize conditions"
-            return 4, "heuristic: inline simple function"
-        if task_id == "remove_dead_code":
-            if has_if_false or "unused" in code:
-                return 1, "heuristic: remove dead code patterns"
-            if has_append_loop:
-                return 2, "heuristic: convert append-loop"
-            if has_if_true or has_double_not:
-                return 3, "heuristic: simplify conditions"
-            if has_generic:
-                return 0, "heuristic: clean generic names"
-            return 4, "heuristic: inline helper"
-        if has_generic:
-            return 0, "heuristic: rename generic variables"
-        if has_append_loop:
-            return 2, "heuristic: simplify loop into listcomp"
-        if has_if_false or has_if_true or has_double_not:
-            return 3, "heuristic: optimize boolean branches"
-        if has_add_call:
-            return 4, "heuristic: inline add() call"
-        if step_i >= 2:
-            return 1, "heuristic: remove remaining dead code"
-        return 3, "heuristic: condition optimization as safe default"
-    if not HF_TOKEN:
-        return heuristic_action()
-    messages = [
-        {"role": "system", "content": SYSTEM_PROMPT},
-        {
-            "role": "user",
-            "content": (
-                f"Task: {task_id}\n"
-                f"Steps remaining: {state.get('max_steps', 5) - state.get('episode_steps', 0)}\n"
-                f"Complexity: {state.get('complexity', 0)}\n\n"
-                f"Current code:\n```python\n{state.get('current_code', '')}\n```\n\n"
-                "Choose the best action."
-            ),
-        },
-    ]
-    try:
-        response = client.chat.completions.create(
-            model=MODEL_NAME,
-            messages=messages,
-            temperature=0.0,
-            max_tokens=120,
-        )
-        raw = (response.choices[0].message.content or "").strip()
-        json_blob = raw
-        if "{" not in json_blob or "}" not in json_blob:
-            return heuristic_action()
-        match = re.search(r"\{.*\}", json_blob, flags=re.DOTALL)
-        if match:
-            json_blob = match.group(0)
-        parsed = json.loads(json_blob)
-        action = int(parsed.get("action", -1))
-        reason = str(parsed.get("reason", ""))
-        if 0 <= action <= 4:
-            return action, reason or "llm-selected action"
-        return heuristic_action()
-    except Exception:
-        return heuristic_action()
-def run_episode(client: OpenAI, task_id: str, episode_num: int) -> float:
-    reset_env(task_id)
-    state = get_state()
-    print(
-        json.dumps(
-            {
-                "event": "START",
-                "episode": episode_num,
-                "task_id": task_id,
-                "initial_complexity": state.get("complexity", 0),
-                "initial_code_length": len(state.get("current_code", "")),
-                "timestamp": time.time(),
-            }
-        ),
-        flush=True,
-    )
-    cumulative_reward = 0.0
-    for step_num in range(1, 6):
-        action, reason = choose_action(client, state, task_id)
-        result = step_env(action)
-        state = get_state()
-        reward_payload = result.get("reward", {})
-        raw_reward = float(reward_payload.get("raw", 0.0))
-        norm_reward = float(reward_payload.get("normalized", (raw_reward + 32) / 52))
-        cumulative_reward += raw_reward
-        print(
-            json.dumps(
-                {
-                    "event": "STEP",
-                    "episode": episode_num,
-                    "step": step_num,
-                    "action": action,
-                    "action_name": ACTION_MEANINGS.get(action, "unknown"),
-                    "reason": reason,
-                    "reward": round(raw_reward, 4),
-                    "normalized_reward": round(norm_reward, 4),
-                    "cumulative_reward": round(cumulative_reward, 4),
-                    "changed": result.get("info", {}).get("changed", False),
-                    "reward_components": reward_payload.get("components", {}),
-                    "done": result.get("done", False),
-                }
-            ),
-            flush=True,
-        )
-        if result.get("done") or result.get("terminated") or result.get("truncated"):
-            break
-    final_state = get_state()
-    task_score = grade(task_id, final_state.get("current_code", ""))
-    print(
-        json.dumps(
-            {
-                "event": "END",
-                "episode": episode_num,
-                "task_id": task_id,
-                "cumulative_reward": round(cumulative_reward, 4),
-                "normalized_cumulative": round((cumulative_reward + 32) / 52, 4),
-                "task_score": round(task_score, 4),
-                "final_complexity": final_state.get("complexity", 0),
-                "timestamp": time.time(),
-            }
-        ),
-        flush=True,
-    )
-    return task_score
-def main() -> None:
-    if not ENV_URL:
-        raise SystemExit("ENV_URL is required. Example: ENV_URL=http://localhost:7860")
-    client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN or "dummy")
-    scores: List[float] = []
-    for i, task_id in enumerate(TASKS, start=1):
-        score = run_episode(client, task_id, i)
-        scores.append(score)
-    avg_score = sum(scores) / len(scores) if scores else 0.0
-    sys.exit(0 if avg_score >= 0.5 else 1)
-if __name__ == "__main__":
-    main()
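
Note on `choose_action` in the deleted inference.py: the model reply is expected to be a JSON object, but the code tolerates extra text (for example a markdown fence) by extracting the outermost `{...}` span with a regex before parsing, and falls back to the heuristic on any failure. A minimal sketch of that parsing step in isolation. The function name `parse_action` and its `None` return for "use the fallback" are assumptions made for illustration.

```python
import json
import re
from typing import Optional, Tuple


def parse_action(raw: str, n_actions: int = 5) -> Optional[Tuple[int, str]]:
    """Extract (action, reason) from an LLM reply, or None if it cannot be trusted."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
        action = int(parsed.get("action", -1))
    except (ValueError, TypeError, AttributeError):
        # json.JSONDecodeError is a ValueError subclass.
        return None
    if not 0 <= action < n_actions:
        return None
    return action, str(parsed.get("reason", "")) or "llm-selected action"


if __name__ == "__main__":
    reply = '```json\n{"action": 2, "reason": "append loop can be a comprehension"}\n```'
    print(parse_action(reply))
```

The greedy pattern spans from the first `{` to the last `}`, so a reply containing two separate JSON objects fails to parse and is treated like any other malformed reply.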

ACRE_FINAL/models.py DELETED Viewed

@@ -1,156 +0,0 @@
-from __future__ import annotations
-from typing import Any, Dict, List, Optional, Sequence
-from pydantic import BaseModel, Field
-class ObservationModel(BaseModel):
-    code_length: float
-    complexity_score: float
-    runtime_s: float
-    error_flag: bool
-    @classmethod
-    def from_vector(cls, values: Sequence[float]) -> "ObservationModel":
-        vector = list(values)
-        if len(vector) != 4:
-            raise ValueError(f"observation vector must have length 4, got {len(vector)}")
-        return cls(
-            code_length=float(vector[0]),
-            complexity_score=float(vector[1]),
-            runtime_s=float(vector[2]),
-            error_flag=bool(vector[3]),
-        )
-    def to_vector(self) -> List[float]:
-        return [
-            float(self.code_length),
-            float(self.complexity_score),
-            float(self.runtime_s),
-            float(int(self.error_flag)),
-        ]
-class ActionModel(BaseModel):
-    action: int = Field(ge=0, le=4)
-    action_name: Optional[str] = None
-class RewardModel(BaseModel):
-    raw: float
-    normalized: float = Field(ge=0.0, le=1.0)
-    components: Dict[str, float]
-class HealthResponse(BaseModel):
-    status: str
-    env: str
-    version: str
-class CompatibilityHealthResponse(BaseModel):
-    status: str
-    service: str
-class ResetRequest(BaseModel):
-    task_id: Optional[str] = None
-    seed: Optional[int] = None
-    code: Optional[str] = None
-class StepRequest(BaseModel):
-    action: int = Field(ge=0, le=4)
-class GradeRequest(BaseModel):
-    code: str
-class TaskInfo(BaseModel):
-    id: str
-    name: str
-    description: str
-    difficulty: str
-    initial_code: str
-class TasksResponse(BaseModel):
-    tasks: List[TaskInfo]
-class GradeResponse(BaseModel):
-    task_id: str
-    score: float
-    passed: bool
-class StateResponse(BaseModel):
-    current_code: str
-    episode_steps: int
-    max_steps: int
-    complexity: float
-    last_runtime: float
-    last_error: bool
-    sample_id: Optional[str]
-    language: Optional[str]
-    task_id: Optional[str]
-    observation: ObservationModel
-    observation_vector: List[float]
-    action_meanings: Dict[int, str]
-class ResetResponse(BaseModel):
-    observation: ObservationModel
-    observation_vector: List[float]
-    info: Dict[str, Any]
-    task_id: Optional[str]
-    state: StateResponse
-class StepResponse(BaseModel):
-    action: ActionModel
-    observation: ObservationModel
-    observation_vector: List[float]
-    reward: RewardModel
-    done: bool
-    terminated: bool
-    truncated: bool
-    info: Dict[str, Any]
-    state: StateResponse
-class OptimizeRequest(BaseModel):
-    code: str
-    task_id: Optional[str] = None
-    max_steps: int = Field(default=5, ge=1, le=5)
-    use_rl: bool = True
-    use_llm: bool = False
-    fallback_to_llm: bool = True
-    rl_model_path: Optional[str] = None
-    api_base_url: Optional[str] = None
-    model_name: Optional[str] = None
-    api_token: Optional[str] = None
-class OptimizationStep(BaseModel):
-    step: int
-    action: int
-    action_name: str
-    reason: str
-    source: str
-    reward: float
-    normalized_reward: float
-    changed: bool
-    complexity: float
-class OptimizeResponse(BaseModel):
-    original_code: str
-    optimized_code: str
-    diff: str
-    steps: List[OptimizationStep]
-    cumulative_reward: float
-    task_id: Optional[str]
-    task_score: Optional[float]

ACRE_FINAL/openenv.yaml DELETED Viewed

@@ -1,85 +0,0 @@
-name: ACRE
-version: "1.0.0"
-description: >
-  Autonomous Code Refactoring Environment - an RL environment where an
-  agent improves Python code quality using AST-level transformations.
-author: "Nikhil Pratap Singh, Pranav Mangal, Ananya Gupta"
-entrypoint: "openenv_interface:OpenEnvRefactorEnv"
-tags:
-  - openenv
-tasks:
-  - id: rename_variables
-    name: "Rename Variables (Easy)"
-    description: "Rename generic variable names (x, tmp) to descriptive ones"
-    difficulty: easy
-    reward_range: [0.0, 1.0]
-    max_steps: 5
-  - id: remove_dead_code
-    name: "Remove Dead Code (Medium)"
-    description: "Remove unreachable statements, if-False blocks, and unused assignments"
-    difficulty: medium
-    reward_range: [0.0, 1.0]
-    max_steps: 5
-  - id: full_refactor
-    name: "Full Refactor (Hard)"
-    description: "Apply all transformations - rename, dead code removal, loop simplification, condition optimization, and function inlining"
-    difficulty: hard
-    reward_range: [0.0, 1.0]
-    max_steps: 5
-observation_space:
-  type: Box
-  shape: [4]
-  dtype: float32
-  low: [0.0, 0.0, 0.0, 0.0]
-  high: [.inf, .inf, .inf, 1.0]
-  fields:
-    - code_length
-    - complexity_score
-    - runtime_s
-    - error_flag
-action_space:
-  type: Discrete
-  n: 5
-  actions:
-    0: rename_variable
-    1: remove_dead_code
-    2: simplify_loop
-    3: optimize_condition
-    4: inline_function
-api:
-  health: "GET /"
-  reset: "POST /reset"
-  step: "POST /step"
-  state: "GET /state"
-  tasks: "GET /tasks"
-  grade: "POST /tasks/{task_id}/grade"
-reward:
-  raw_range: [-32, 20]
-  normalized_range: [0.0, 1.0]
-  formula: "(raw + 32) / 52"
-  components:
-    success: { max: 10, min: -10 }
-    complexity: { max: 5, min: -5 }
-    performance: { max: 5, min: -2 }
-    error: { max: 0, min: -15 }
-    no_change: { max: 0, min: -2 }
-validation:
-  python_api:
-    reset: "ObservationModel"
-    step: "(ObservationModel, RewardModel, done, info)"
-    state: "StateResponse"
-  http_api:
-    health: "GET /"
-    reset: "POST /reset"
-    step: "POST /step"
-    state: "GET /state"
-    tasks: "GET /tasks"
-    grade: "POST /tasks/{task_id}/grade"
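
Note on the `reward` section of the deleted openenv.yaml: the stated formula `(raw + 32) / 52` maps the raw range [-32, 20] linearly onto [0.0, 1.0]. A short sketch of that mapping follows. The clamp to [0, 1] is an added assumption (the YAML only gives the formula), and `normalize_reward` is a hypothetical name.

```python
RAW_MIN = -32.0
RAW_MAX = 20.0


def normalize_reward(raw: float) -> float:
    """Linear map of a raw step reward onto [0, 1], i.e. (raw + 32) / 52."""
    scaled = (raw - RAW_MIN) / (RAW_MAX - RAW_MIN)
    # Assumption: out-of-range values are clamped rather than rejected.
    return min(1.0, max(0.0, scaled))


if __name__ == "__main__":
    for raw in (-32.0, -6.0, 0.0, 20.0):
        print(raw, round(normalize_reward(raw), 4))
```

The range describes a single step, so applying the same formula to a reward summed over a multi-step episode can produce values outside [0, 1] unless they are clamped.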

ACRE_FINAL/openenv_interface.py DELETED Viewed

@@ -1,116 +0,0 @@
-from __future__ import annotations
-from typing import Any, Dict, Optional, Tuple
-try:
-    from openenv.env import Env as OpenEnvBase
-except Exception:  # pragma: no cover
-    class OpenEnvBase:
-        def __init__(self, *args: Any, **kwargs: Any) -> None:
-            return None
-from acre.datasets.code_samples import CodeSample, CodeSampleDataset
-from acre.env.refactor_env import RefactorEnv
-from acre.tasks.task_registry import TaskRegistry
-from models import ActionModel, ObservationModel, RewardModel, StateResponse
-class OpenEnvRefactorEnv(OpenEnvBase):
-    """
-    Canonical OpenEnv interface for ACRE.
-    This wrapper keeps the strict hackathon contract:
-    - reset() -> ObservationModel
-    - step(action) -> (ObservationModel, RewardModel, done, info)
-    - state() -> StateResponse
-    """
-    def __init__(
-        self,
-        *,
-        env: Optional[RefactorEnv] = None,
-        registry: Optional[TaskRegistry] = None,
-    ) -> None:
-        super().__init__(
-            name="ACRE",
-            state_space="ObservationModel",
-            action_space="ActionModel",
-            episode_max_length=RefactorEnv.MAX_STEPS,
-        )
-        self._env = env or RefactorEnv()
-        self._registry = registry or TaskRegistry()
-        self._task_id: Optional[str] = None
-        self._last_reset_info: Dict[str, Any] = {}
-    @property
-    def action_meanings(self) -> Dict[int, str]:
-        return self._env.ACTION_MEANINGS
-    @property
-    def last_reset_info(self) -> Dict[str, Any]:
-        return dict(self._last_reset_info)
-    def _load_episode_source(self, *, task_id: Optional[str], code: Optional[str]) -> None:
-        initial_code = code
-        if initial_code is None and task_id:
-            task = self._registry.get_task(task_id)
-            if task is None:
-                raise ValueError(f"Task '{task_id}' not found")
-            initial_code = task.initial_code
-        if initial_code is None:
-            return None
-        self._env.dataset = CodeSampleDataset(
-            [
-                CodeSample(
-                    id=task_id or "custom",
-                    language="python",
-                    code=initial_code,
-                )
-            ]
-        )
-        return None
-    def reset(
-        self,
-        *,
-        seed: Optional[int] = None,
-        task_id: Optional[str] = None,
-        code: Optional[str] = None,
-    ) -> ObservationModel:
-        self._task_id = task_id
-        self._load_episode_source(task_id=task_id, code=code)
-        observation, info = self._env.reset(seed=seed)
-        self._last_reset_info = dict(info)
-        return ObservationModel.from_vector(observation.tolist())
-    def step(self, action: int | ActionModel) -> Tuple[ObservationModel, RewardModel, bool, Dict[str, Any]]:
-        action_value = action.action if isinstance(action, ActionModel) else int(action)
-        observation, raw_reward, terminated, truncated, info = self._env.step(action_value)
-        reward = RewardModel(
-            raw=float(raw_reward),
-            normalized=float(info.get("normalized_reward", 0.0)),
-            components=dict(info.get("reward_components", {})),
-        )
-        done = bool(terminated or truncated)
-        return ObservationModel.from_vector(observation.tolist()), reward, done, dict(info)
-    def state(self) -> StateResponse:
-        raw_state = self._env.state()
-        observation_vector = list(raw_state.get("observation", [0.0, 0.0, 0.0, 0.0]))
-        observation = ObservationModel.from_vector(observation_vector)
-        return StateResponse(
-            current_code=str(raw_state.get("current_code", "")),
-            episode_steps=int(raw_state.get("episode_steps", 0)),
-            max_steps=int(raw_state.get("max_steps", RefactorEnv.MAX_STEPS)),
-            complexity=float(raw_state.get("complexity", 0.0)),
-            last_runtime=float(raw_state.get("last_runtime", 0.0)),
-            last_error=bool(raw_state.get("last_error", False)),
-            sample_id=raw_state.get("sample_id"),
-            language=raw_state.get("language"),
-            task_id=self._task_id,
-            observation=observation,
-            observation_vector=observation.to_vector(),
-            action_meanings=dict(raw_state.get("action_meanings", {})),
-        )
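
Note on the removed `step()` wrapper: it collapses the underlying environment's `terminated` and `truncated` flags into one `done` value, and the `/step` route in `server.py` then reports `terminated=done, truncated=False`, so a caller cannot tell a natural finish from an exhausted step budget. Below is a minimal sketch of keeping both flags available through `info`. It assumes the wrapped environment follows the five-value step convention visible in the diff. `StubEnv` and `step_with_done` are hypothetical names for illustration, not part of ACRE.

```python
from typing import Any, Dict, List, Tuple


class StubEnv:
    """Hypothetical stand-in for RefactorEnv: truncates after max_steps steps."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def step(self, action: int) -> Tuple[List[float], float, bool, bool, Dict[str, Any]]:
        self.steps += 1
        terminated = False  # this stub never reaches a natural terminal state
        truncated = self.steps >= self.max_steps  # step budget exhausted
        info = {"normalized_reward": 0.5}
        return [0.0, 0.0, 0.0, 0.0], 1.0, terminated, truncated, info


def step_with_done(env: StubEnv, action: int) -> Tuple[List[float], float, bool, Dict[str, Any]]:
    """Return a four-value step result while preserving both end-of-episode flags."""
    obs, reward, terminated, truncated, info = env.step(action)
    info = dict(info)  # copy so the environment's own dict is not mutated
    info["terminated"] = terminated
    info["truncated"] = truncated
    return obs, reward, bool(terminated or truncated), info
```

With this shape, an HTTP layer could fill `terminated` and `truncated` from `info` instead of hard-coding `truncated=False`.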

ACRE_FINAL/requirements.txt DELETED

@@ -1,11 +0,0 @@
-fastapi>=0.109.0
-uvicorn[standard]>=0.27.0
-numpy>=1.26
-gymnasium
-stable-baselines3
-radon>=6.0.1
-openai>=1.0.0
-openenv>=0.1.13
-requests>=2.31.0
-pydantic>=2.0.0
-typing_extensions>=4.0.0

ACRE_FINAL/server.py DELETED

@@ -1,667 +0,0 @@
-"""
-ACRE OpenEnv HTTP server.
-Endpoints (all required by OpenEnv spec):
-  GET  /          — health check (must return HTTP 200)
-  POST /reset     — reset environment, returns observation + info
-  POST /step      — take one step, returns obs/reward/done/info
-  GET  /state     — full current state snapshot
-  GET  /tasks     — list all tasks with initial code
-  POST /tasks/{task_id}/grade  — grade code for a specific task
-"""
-from __future__ import annotations
-import difflib
-import os
-import re
-import json
-from typing import Optional
-import uvicorn
-import numpy as np
-from fastapi import FastAPI, HTTPException
-from fastapi.middleware.cors import CORSMiddleware
-from fastapi.responses import HTMLResponse
-from openai import OpenAI
-try:
-    from stable_baselines3 import PPO
-except Exception:
-    PPO = None  # type: ignore[assignment]
-from acre.tasks.task_registry import TaskRegistry
-from models import (
-    ActionModel,
-    CompatibilityHealthResponse,
-    GradeRequest,
-    GradeResponse,
-    HealthResponse,
-    OptimizationStep,
-    OptimizeRequest,
-    OptimizeResponse,
-    ResetRequest,
-    ResetResponse,
-    StateResponse,
-    StepRequest,
-    StepResponse,
-    TaskInfo,
-    TasksResponse,
-)
-from openenv_interface import OpenEnvRefactorEnv
-DEFAULT_API_BASE_URL = os.getenv("API_BASE_URL", "https://api.openai.com/v1")
-DEFAULT_MODEL_NAME = os.getenv("MODEL_NAME", "gpt-4o-mini")
-DEFAULT_RL_MODEL_PATH = os.getenv("RL_MODEL_PATH", "acre_agent.zip")
-# ---------------------------------------------------------------------------
-# App setup
-# ---------------------------------------------------------------------------
-app = FastAPI(
-    title="ACRE — Autonomous Code Refactoring Environment",
-    description="OpenEnv-compatible RL environment for Python code refactoring.",
-    version="1.0.0",
-)
-app.add_middleware(
-    CORSMiddleware,
-    allow_origins=["*"],
-    allow_methods=["*"],
-    allow_headers=["*"],
-)
-# Global singletons
-registry = TaskRegistry()
-_env: Optional[OpenEnvRefactorEnv] = None
-_rl_model_cache: dict[str, object] = {}
-def get_env() -> OpenEnvRefactorEnv:
-    global _env
-    if _env is None:
-        _env = OpenEnvRefactorEnv(registry=registry)
-    return _env
-def _state_response() -> StateResponse:
-    return get_env().state()
-def _choose_action_heuristic(code: str, task_id: Optional[str]) -> int:
-    has_generic = re.search(r"\b(x|tmp|i)\b", code) is not None
-    has_if_false = re.search(r"\bif\s+False\b", code) is not None
-    has_if_true = re.search(r"\bif\s+True\b", code) is not None
-    has_append_loop = ".append(" in code and "for " in code
-    has_double_not = "not not" in code
-    has_add_call = "add(" in code
-    if task_id == "rename_variables":
-        if has_generic:
-            return 0
-        if has_if_false or "unused" in code:
-            return 1
-        if has_append_loop:
-            return 2
-        if has_if_true or has_double_not:
-            return 3
-        return 4
-    if task_id == "remove_dead_code":
-        if has_if_false or "unused" in code:
-            return 1
-        if has_append_loop:
-            return 2
-        if has_if_true or has_double_not:
-            return 3
-        if has_generic:
-            return 0
-        return 4
-    if has_generic:
-        return 0
-    if has_append_loop:
-        return 2
-    if has_if_false or has_if_true or has_double_not:
-        return 3
-    if has_add_call:
-        return 4
-    return 1
-def _choose_action_llm(
-    *,
-    code: str,
-    task_id: Optional[str],
-    step_index: int,
-    max_steps: int,
-    api_base_url: str,
-    model_name: str,
-    api_token: str,
-) -> tuple[int, str, str]:
-    if not api_token.strip():
-        return _choose_action_heuristic(code, task_id), "empty token -> heuristic", "heuristic"
-    client = OpenAI(base_url=api_base_url, api_key=api_token)
-    messages = [
-        {
-            "role": "system",
-            "content": (
-                "You are a code-refactoring action selector. Return ONLY compact JSON: "
-                '{"action": <0-4>, "reason": "..."}.\n'
-                "Actions: 0=rename_variable,1=remove_dead_code,2=simplify_loop,3=optimize_condition,4=inline_function"
-            ),
-        },
-        {
-            "role": "user",
-            "content": (
-                f"task_id={task_id or 'auto'}\n"
-                f"step={step_index}/{max_steps}\n"
-                "Current code:\n"
-                f"```python\n{code}\n```"
-            ),
-        },
-    ]
-    try:
-        resp = client.chat.completions.create(
-            model=model_name,
-            messages=messages,
-            temperature=0.0,
-            max_tokens=120,
-        )
-        raw = (resp.choices[0].message.content or "").strip()
-        m = re.search(r"\{.*\}", raw, flags=re.DOTALL)
-        blob = m.group(0) if m else raw
-        parsed = json.loads(blob)
-        action = int(parsed.get("action", -1))
-        reason = str(parsed.get("reason", "llm-selected action"))
-        if 0 <= action <= 4:
-            return action, reason, "llm"
-    except Exception as exc:
-        return _choose_action_heuristic(code, task_id), f"llm error -> heuristic: {exc}", "heuristic"
-    return _choose_action_heuristic(code, task_id), "invalid llm output -> heuristic", "heuristic"
-def _choose_action_rl(observation: list[float], model_path: str) -> tuple[Optional[int], str, str]:
-    if PPO is None:
-        return None, "stable-baselines3 unavailable", "rl"
-    if not os.path.exists(model_path):
-        return None, f"rl model not found: {model_path}", "rl"
-    try:
-        model = _rl_model_cache.get(model_path)
-        if model is None:
-            model = PPO.load(model_path)
-            _rl_model_cache[model_path] = model
-        obs = np.asarray(observation, dtype=np.float32)
-        action, _ = model.predict(obs, deterministic=True)
-        action_i = int(action)
-        if 0 <= action_i <= 4:
-            return action_i, "rl policy action", "rl"
-        return None, f"invalid rl action: {action_i}", "rl"
-    except Exception as exc:
-        return None, f"rl failure: {exc}", "rl"
-def _demo_html() -> str:
-    return """<!doctype html>
-<html lang=\"en\">
-<head>
-    <meta charset=\"utf-8\" />
-    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />
-    <title>ACRE Refactor Demo</title>
-    <style>
-        @import url('https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;600;700&display=swap');
-        :root {
-            --bg0: #0b1f2a;
-            --bg1: #14344a;
-            --ink: #eaf7ff;
-            --muted: #a7c8db;
-            --brand: #1ec28b;
-            --warn: #ffcb47;
-            --panel: rgba(8, 24, 36, 0.72);
-            --stroke: rgba(140, 197, 225, 0.35);
-        }
-        * { box-sizing: border-box; }
-        body {
-            margin: 0;
-            color: var(--ink);
-            font-family: 'Space Grotesk', sans-serif;
-            background:
-                radial-gradient(circle at 12% 18%, rgba(30, 194, 139, 0.28), transparent 35%),
-                radial-gradient(circle at 88% 8%, rgba(255, 203, 71, 0.22), transparent 30%),
-                linear-gradient(150deg, var(--bg0), var(--bg1));
-            min-height: 100vh;
-        }
-        .wrap {
-            max-width: 1200px;
-            margin: 0 auto;
-            padding: 28px 20px 40px;
-        }
-        h1 {
-            margin: 0 0 6px;
-            font-size: clamp(1.6rem, 2vw + 1rem, 2.6rem);
-            letter-spacing: 0.2px;
-        }
-        .sub { margin: 0 0 20px; color: var(--muted); }
-        .grid {
-            display: grid;
-            grid-template-columns: 1fr;
-            gap: 16px;
-        }
-        .panel {
-            border: 1px solid var(--stroke);
-            border-radius: 14px;
-            background: var(--panel);
-            backdrop-filter: blur(4px);
-            padding: 14px;
-        }
-        .controls {
-            display: grid;
-            grid-template-columns: 1fr 1fr;
-            gap: 8px;
-            margin-bottom: 10px;
-        }
-        textarea, pre {
-            width: 100%;
-            min-height: 260px;
-            border: 1px solid var(--stroke);
-            border-radius: 10px;
-            padding: 12px;
-            background: rgba(1, 13, 24, 0.82);
-            color: #dcf4ff;
-            font-family: Consolas, 'Courier New', monospace;
-            font-size: 13px;
-            line-height: 1.4;
-            overflow: auto;
-            white-space: pre;
-        }
-        button, select {
-            border: 1px solid var(--stroke);
-            border-radius: 10px;
-            padding: 10px 12px;
-            background: rgba(11, 36, 52, 0.9);
-            color: var(--ink);
-            font-weight: 600;
-        }
-        button.primary {
-            background: linear-gradient(120deg, #19a7ff, #1ec28b);
-            color: #032235;
-            border: none;
-        }
-        .cols {
-            display: grid;
-            grid-template-columns: 1fr;
-            gap: 14px;
-        }
-        .meta {
-            color: var(--muted);
-            font-size: 0.92rem;
-            margin-top: 8px;
-        }
-        .badge {
-            color: #082b22;
-            background: var(--brand);
-            border-radius: 999px;
-            padding: 2px 9px;
-            font-size: 12px;
-            font-weight: 700;
-        }
-        .warn {
-            color: #2a1c00;
-            background: var(--warn);
-        }
-        @media (min-width: 900px) {
-            .cols { grid-template-columns: 1fr 1fr; }
-        }
-    </style>
-</head>
-<body>
-    <div class=\"wrap\">
-        <h1>ACRE Live Refactor Arena</h1>
-        <p class=\"sub\">Paste old code, run the agent, and compare before and after with a full diff and step-by-step rewards.</p>
-        <div class=\"panel\">
-            <div class=\"controls\">
-                <button onclick=\"loadExample(1)\">Load Example 1</button>
-                <button onclick=\"loadExample(2)\">Load Example 2</button>
-                <select id=\"task\">
-                    <option value=\"\">Auto strategy</option>
-                    <option value=\"rename_variables\">rename_variables</option>
-                    <option value=\"remove_dead_code\">remove_dead_code</option>
-                    <option value=\"full_refactor\">full_refactor</option>
-                </select>
-                <button class=\"primary\" onclick=\"runOptimize()\">Run Optimization</button>
-            </div>
-            <div class=\"controls\" style=\"margin-bottom: 10px;\">
-                <select id=\"mode\">
-                    <option value=\"rl_then_llm\">RL First -> LLM Fallback</option>
-                    <option value=\"heuristic\">Heuristic Agent (no API key)</option>
-                    <option value=\"llm\">LLM Agent (OpenAI-compatible API)</option>
-                </select>
-                <input id=\"rlModelPath\" placeholder=\"RL model path\" value=\"acre_agent.zip\" style=\"border:1px solid var(--stroke);border-radius:10px;padding:10px 12px;background:rgba(1,13,24,0.82);color:#dcf4ff;\" />
-                <input id=\"baseUrl\" placeholder=\"API base URL (optional)\" value=\"https://api.openai.com/v1\" style=\"border:1px solid var(--stroke);border-radius:10px;padding:10px 12px;background:rgba(1,13,24,0.82);color:#dcf4ff;\" />
-                <input id=\"modelName\" placeholder=\"Model name (optional)\" value=\"gpt-4o-mini\" style=\"border:1px solid var(--stroke);border-radius:10px;padding:10px 12px;background:rgba(1,13,24,0.82);color:#dcf4ff;\" />
-                <input id=\"apiToken\" type=\"password\" placeholder=\"Paste API token here for LLM mode\" style=\"border:1px solid var(--stroke);border-radius:10px;padding:10px 12px;background:rgba(1,13,24,0.82);color:#dcf4ff;\" />
-            </div>
-            <div class=\"controls\" style=\"margin-bottom: 10px;\">
-                <label style=\"display:flex;align-items:center;gap:8px;padding:8px 10px;border:1px solid var(--stroke);border-radius:10px;\">
-                    <input id=\"autoSuggest\" type=\"checkbox\" />
-                    Auto suggest after typing pause
-                </label>
-            </div>
-            <textarea id=\"input\" spellcheck=\"false\" placeholder=\"Paste your Python code here...\"></textarea>
-            <p class=\"meta\" id=\"status\">Status: ready</p>
-        </div>
-        <div class=\"cols\" style=\"margin-top: 14px\">
-            <div class=\"panel\">
-                <h3>Original Code</h3>
-                <pre id=\"original\"></pre>
-            </div>
-            <div class=\"panel\">
-                <h3>Optimized Code</h3>
-                <pre id=\"optimized\"></pre>
-            </div>
-        </div>
-        <div class=\"panel\" style=\"margin-top: 14px\">
-            <h3>Diff</h3>
-            <pre id=\"diff\"></pre>
-        </div>
-        <div class=\"panel\" style=\"margin-top: 14px\">
-            <h3>Step Logs</h3>
-            <pre id=\"steps\"></pre>
-        </div>
-    </div>
-    <script>
-        const EX1 = `def compute(x, y, tmp):\n    tmp = x + y\n    x = tmp * 2\n    result = x\n    return result\n`;
-        const EX2 = `def add(p, q):\n    return p + q\n\ndef compute(x, data, tmp):\n    result = []\n    for item in data:\n        result.append(item * 2)\n    if False:\n        y = 999\n    if True:\n        val = add(x, tmp)\n    unused = 0\n    flag = not not True\n    return val\n    print(\"dead\")\n`;
-        let autoTimer = null;
-        function loadExample(i) {
-            document.getElementById('input').value = i === 1 ? EX1 : EX2;
-            document.getElementById('status').textContent = `Status: loaded example ${i}`;
-        }
-        async function runOptimize() {
-            const code = document.getElementById('input').value;
-            const task = document.getElementById('task').value || null;
-            const mode = document.getElementById('mode').value;
-            const useRl = mode === 'rl_then_llm';
-            const useLlm = mode === 'llm' || mode === 'rl_then_llm';
-            const fallbackToLlm = mode === 'rl_then_llm';
-            const rlModelPath = document.getElementById('rlModelPath').value || null;
-            const apiToken = document.getElementById('apiToken').value || null;
-            const apiBaseUrl = document.getElementById('baseUrl').value || null;
-            const modelName = document.getElementById('modelName').value || null;
-            if (!code.trim()) {
-                document.getElementById('status').innerHTML = 'Status: <span class=\"badge warn\">please paste code first</span>';
-                return;
-            }
-            if (mode === 'llm' && (!apiToken || !apiToken.trim())) {
-                document.getElementById('status').innerHTML = 'Status: <span class=\"badge warn\">paste API token for LLM mode</span>';
-                return;
-            }
-            document.getElementById('status').textContent = 'Status: running optimization...';
-            try {
-                const res = await fetch('/optimize', {
-                    method: 'POST',
-                    headers: {'Content-Type': 'application/json'},
-                    body: JSON.stringify({
-                        code,
-                        task_id: task,
-                        max_steps: 5,
-                        use_rl: useRl,
-                        use_llm: useLlm,
-                        fallback_to_llm: fallbackToLlm,
-                        rl_model_path: rlModelPath,
-                        api_base_url: apiBaseUrl,
-                        model_name: modelName,
-                        api_token: apiToken,
-                    })
-                });
-                const data = await res.json();
-                if (!res.ok) {
-                    throw new Error(data.detail || 'request failed');
-                }
-                document.getElementById('original').textContent = data.original_code;
-                document.getElementById('optimized').textContent = data.optimized_code;
-                document.getElementById('diff').textContent = data.diff || '(no diff)';
-                document.getElementById('steps').textContent = JSON.stringify(data.steps, null, 2);
-                const scoreText = data.task_score === null ? 'n/a' : data.task_score;
-                document.getElementById('status').innerHTML = `Status: <span class=\"badge\">done</span> cumulative_reward=${data.cumulative_reward.toFixed(2)} task_score=${scoreText}`;
-            } catch (err) {
-                document.getElementById('status').innerHTML = `Status: <span class=\"badge warn\">error</span> ${err.message}`;
-            }
-        }
-        loadExample(1);
-        document.getElementById('input').addEventListener('input', () => {
-            if (!document.getElementById('autoSuggest').checked) {
-                return;
-            }
-            if (autoTimer) {
-                clearTimeout(autoTimer);
-            }
-            autoTimer = setTimeout(() => {
-                runOptimize();
-            }, 1200);
-        });
-    </script>
-</body>
-</html>"""
-# ---------------------------------------------------------------------------
-# Routes
-# ---------------------------------------------------------------------------
-@app.get("/", response_model=HealthResponse)
-def health() -> HealthResponse:
-    """Health check — OpenEnv pings this URL to verify the Space is live."""
-    return HealthResponse(status="ok", env="ACRE", version="1.0.0")
-@app.get("/health", response_model=CompatibilityHealthResponse)
-def health_compat() -> CompatibilityHealthResponse:
-    """Compatibility health route used by some OpenEnv reference environments."""
-    return CompatibilityHealthResponse(status="healthy", service="acre-env")
-@app.get("/demo", response_class=HTMLResponse)
-def demo_ui() -> HTMLResponse:
-    """Simple UI to compare original and optimized code side-by-side."""
-    return HTMLResponse(content=_demo_html())
-@app.post("/reset", response_model=ResetResponse)
-def reset(req: ResetRequest = ResetRequest()) -> ResetResponse:
-    """Reset the environment. Optionally load a task's initial code."""
-    env = get_env()
-    try:
-        obs = env.reset(seed=req.seed, task_id=req.task_id, code=req.code)
-    except ValueError as exc:
-        raise HTTPException(status_code=404, detail=str(exc)) from exc
-    return ResetResponse(
-        observation=obs,
-        observation_vector=obs.to_vector(),
-        info=env.last_reset_info,
-        task_id=req.task_id,
-        state=_state_response(),
-    )
-@app.post("/step", response_model=StepResponse)
-def step(req: StepRequest) -> StepResponse:
-    """Take one refactoring step."""
-    env = get_env()
-    if not (0 <= req.action <= 4):
-        raise HTTPException(status_code=400, detail="action must be 0–4")
-    obs, reward, done, info = env.step(req.action)
-    action_name = str(info.get("action_name", env.action_meanings.get(req.action, "unknown")))
-    return StepResponse(
-        action=ActionModel(action=req.action, action_name=action_name),
-        observation=obs,
-        observation_vector=obs.to_vector(),
-        reward=reward,
-        done=done,
-        terminated=done,
-        truncated=False,
-        info=info,
-        state=_state_response(),
-    )
-@app.get("/state", response_model=StateResponse)
-def state() -> StateResponse:
-    """Return full current environment state (OpenEnv spec requirement)."""
-    return _state_response()
-@app.get("/tasks", response_model=TasksResponse)
-def list_tasks() -> TasksResponse:
-    """Enumerate all tasks (easy → medium → hard)."""
-    return TasksResponse(tasks=[TaskInfo.model_validate(t) for t in registry.list_tasks()])
-@app.post("/tasks/{task_id}/grade", response_model=GradeResponse)
-def grade(task_id: str, req: GradeRequest) -> GradeResponse:
-    """Grade submitted code against a task's grader (returns score 0.0–1.0)."""
-    task = registry.get_task(task_id)
-    if task is None:
-        raise HTTPException(status_code=404, detail=f"Task '{task_id}' not found")
-    score = task.grade(req.code)
-    return GradeResponse(
-        task_id=task_id,
-        score=round(score, 4),
-        passed=score >= 0.8,
-    )
-@app.post("/optimize", response_model=OptimizeResponse)
-def optimize(req: OptimizeRequest) -> OptimizeResponse:
-    """Run a full optimization episode and return code comparison artifacts."""
-    code = req.code.strip("\n")
-    if not code.strip():
-        raise HTTPException(status_code=400, detail="code must be non-empty")
-    env = get_env()
-    try:
-        env.reset(task_id=req.task_id, code=code)
-    except ValueError as exc:
-        raise HTTPException(status_code=404, detail=str(exc)) from exc
-    steps: list[OptimizationStep] = []
-    cumulative_reward = 0.0
-    for step_idx in range(1, req.max_steps + 1):
-        state_now = env.state()
-        current_code = state_now.current_code
-        obs_list = [float(x) for x in state_now.observation_vector]
-        action: int
-        reason: str
-        source: str
-        if req.use_rl:
-            rl_action, rl_reason, rl_source = _choose_action_rl(
-                observation=obs_list,
-                model_path=req.rl_model_path or DEFAULT_RL_MODEL_PATH,
-            )
-            if rl_action is not None:
-                action, reason, source = rl_action, rl_reason, rl_source
-            elif req.fallback_to_llm and req.use_llm:
-                action, reason, source = _choose_action_llm(
-                    code=current_code,
-                    task_id=req.task_id,
-                    step_index=step_idx,
-                    max_steps=req.max_steps,
-                    api_base_url=req.api_base_url or DEFAULT_API_BASE_URL,
-                    model_name=req.model_name or DEFAULT_MODEL_NAME,
-                    api_token=req.api_token or "",
-                )
-                reason = f"{rl_reason}; {reason}"
-            else:
-                action = _choose_action_heuristic(current_code, req.task_id)
-                reason = f"{rl_reason}; heuristic fallback"
-                source = "heuristic"
-        elif req.use_llm:
-            action, reason, source = _choose_action_llm(
-                code=current_code,
-                task_id=req.task_id,
-                step_index=step_idx,
-                max_steps=req.max_steps,
-                api_base_url=req.api_base_url or DEFAULT_API_BASE_URL,
-                model_name=req.model_name or DEFAULT_MODEL_NAME,
-                api_token=req.api_token or "",
-            )
-        else:
-            action = _choose_action_heuristic(current_code, req.task_id)
-            reason = "heuristic policy"
-            source = "heuristic"
-        _, reward, done, info = env.step(action)
-        state_now = env.state()
-        cumulative_reward += float(reward.raw)
-        steps.append(
-            OptimizationStep(
-                step=step_idx,
-                action=action,
-                action_name=info.get("action_name", "unknown"),
-                reason=reason,
-                source=source,
-                reward=float(reward.raw),
-                normalized_reward=float(reward.normalized),
-                changed=bool(info.get("changed", False)),
-                complexity=float(state_now.complexity),
-            )
-        )
-        if done:
-            break
-    final_code = str(env.state().current_code)
-    diff_lines = difflib.unified_diff(
-        code.splitlines(),
-        final_code.splitlines(),
-        fromfile="original.py",
-        tofile="optimized.py",
-        lineterm="",
-    )
-    diff_text = "\n".join(diff_lines)
-    task_score: Optional[float] = None
-    if req.task_id:
-        task = registry.get_task(req.task_id)
-        if task is None:
-            raise HTTPException(status_code=404, detail=f"Task '{req.task_id}' not found")
-        task_score = round(task.grade(final_code), 4)
-    return OptimizeResponse(
-        original_code=code,
-        optimized_code=final_code,
-        diff=diff_text,
-        steps=steps,
-        cumulative_reward=round(cumulative_reward, 4),
-        task_id=req.task_id,
-        task_score=task_score,
-    )
-# ---------------------------------------------------------------------------
-# Entry point
-# ---------------------------------------------------------------------------
-if __name__ == "__main__":
-    port = int(os.getenv("PORT", 7860))
-    uvicorn.run(app, host="0.0.0.0", port=port)
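
Note on the removed `_choose_action_llm`: it extracts the first-to-last brace span from the model reply, parses it as JSON, and falls back to the heuristic when the action is missing or out of range. The same parsing step can be sketched as a standalone function, which makes the fallback conditions easier to test in isolation. `parse_action` and `n_actions` are hypothetical names, and the narrower exception handling is an assumption of this sketch (the original catches every `Exception`).

```python
import json
import re
from typing import Optional


def parse_action(raw: str, n_actions: int = 5) -> Optional[int]:
    """Return a valid action index from a model reply, or None if unusable."""
    # Tolerate chatter around the JSON object by grabbing the brace-delimited span.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    blob = match.group(0) if match else raw
    try:
        parsed = json.loads(blob)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    try:
        action = int(parsed.get("action", -1))
    except (TypeError, ValueError):
        return None
    # Only indices inside the action space are accepted.
    return action if 0 <= action < n_actions else None


if __name__ == "__main__":
    print(parse_action('Sure! {"action": 2, "reason": "loop"}'))
    print(parse_action('{"action": 9}'))
    print(parse_action("no json here"))
```

A caller would treat `None` as the signal to use the heuristic policy, mirroring the "invalid llm output -> heuristic" branch in the removed code.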

ACRE_FINAL/validate.py DELETED

@@ -1,281 +0,0 @@
-"""
-ACRE pre-submission validator.
-Checks the repository against the submission checklist and, when a server URL is
-available, probes the HTTP API as well.
-Run:
-    python validate.py --url http://localhost:7860
-"""
-from __future__ import annotations
-import argparse
-import ast
-import re
-import sys
-from typing import Any, Tuple
-try:
-    import requests
-except ImportError:
-    print("[ERROR] requests is required. Run: pip install requests")
-    sys.exit(1)
-PASS = "\033[92m[PASS]\033[0m"
-FAIL = "\033[91m[FAIL]\033[0m"
-def check(label: str, ok: bool, detail: str = "") -> bool:
-    status = PASS if ok else FAIL
-    message = f"  {status}  {label}"
-    if detail:
-        message += f" - {detail}"
-    print(message)
-    return ok
-def get(url: str, path: str, timeout: int = 15) -> Tuple[bool, Any]:
-    try:
-        response = requests.get(f"{url}{path}", timeout=timeout)
-        response.raise_for_status()
-        return True, response.json()
-    except Exception as exc:
-        return False, str(exc)
-def post(url: str, path: str, payload: dict, timeout: int = 15) -> Tuple[bool, Any]:
-    try:
-        response = requests.post(f"{url}{path}", json=payload, timeout=timeout)
-        response.raise_for_status()
-        return True, response.json()
-    except Exception as exc:
-        return False, str(exc)
-def read_text(path: str) -> str:
-    with open(path, encoding="utf-8") as handle:
-        return handle.read()
-def run_validation(base_url: str) -> int:
-    failures = 0
-    print("\n" + "=" * 60)
-    print("  ACRE Pre-Submission Validator")
-    print("=" * 60)
-    print(f"  Target: {base_url}\n")
-    print("1. Static repository checks")
-    try:
-        interface_src = read_text("openenv_interface.py")
-        tree = ast.parse(interface_src)
-        classes = {node.name: node for node in tree.body if isinstance(node, ast.ClassDef)}
-        env_cls = classes.get("OpenEnvRefactorEnv")
-        failures += 0 if check("openenv_interface.py exists", True) else 1
-        failures += 0 if check("OpenEnvRefactorEnv is defined", env_cls is not None) else 1
-        if env_cls is not None:
-            methods = {node.name for node in env_cls.body if isinstance(node, ast.FunctionDef)}
-            for method_name in ["reset", "step", "state"]:
-                failures += 0 if check(
-                    f"OpenEnvRefactorEnv implements {method_name}()",
-                    method_name in methods,
-                ) else 1
-    except FileNotFoundError:
-        failures += 1
-        check("openenv_interface.py exists", False, "file not found")
-    try:
-        models_src = read_text("models.py")
-        for name in ["ObservationModel", "ActionModel", "RewardModel"]:
-            failures += 0 if check(
-                f"{name} is defined in models.py",
-                f"class {name}" in models_src,
-            ) else 1
-    except FileNotFoundError:
-        failures += 1
-        check("models.py exists", False, "file not found")
-    print("\n2. Health check (GET /)")
-    ok, data = get(base_url, "/")
-    failures += 0 if check("GET / returns HTTP 200", ok) else 1
-    if ok:
-        failures += 0 if check(
-            "Response has status field",
-            isinstance(data, dict) and "status" in data,
-            str(data),
-        ) else 1
-    print("\n3. Tasks (GET /tasks)")
-    ok, data = get(base_url, "/tasks")
-    failures += 0 if check("GET /tasks returns 200", ok) else 1
-    if ok:
-        tasks = data.get("tasks", []) if isinstance(data, dict) else []
-        failures += 0 if check("At least 3 tasks defined", len(tasks) >= 3, f"found {len(tasks)}") else 1
-        difficulties = [t.get("difficulty", "") for t in tasks]
-        for diff in ["easy", "medium", "hard"]:
-            failures += 0 if check(f"Task with difficulty '{diff}' exists", diff in difficulties) else 1
-        for task in tasks:
-            failures += 0 if check(
-                f"Task '{task.get('id')}' has initial_code",
-                bool(task.get("initial_code")),
-            ) else 1
-    print("\n4. Reset (POST /reset)")
-    ok, data = post(base_url, "/reset", {})
-    failures += 0 if check("POST /reset returns 200", ok) else 1
-    if ok:
-        observation = data.get("observation", {})
-        failures += 0 if check("Response has observation field", isinstance(observation, dict)) else 1
-        failures += 0 if check(
-            "Observation is typed with 4 fields",
-            {"code_length", "complexity_score", "runtime_s", "error_flag"}.issubset(observation),
-            str(observation),
-        ) else 1
-    ok, _ = post(base_url, "/reset", {"task_id": "rename_variables"})
-    failures += 0 if check("POST /reset with task_id works", ok) else 1
-    print("\n5. State (GET /state)")
-    ok, data = get(base_url, "/state")
-    failures += 0 if check("GET /state returns 200", ok) else 1
-    if ok:
-        required_keys = [
-            "current_code",
-            "episode_steps",
-            "max_steps",
-            "complexity",
-            "observation",
-            "observation_vector",
-            "action_meanings",
-        ]
-        for key in required_keys:
-            failures += 0 if check(f"State has '{key}' field", key in data) else 1
-    print("\n6. Step (POST /step)")
-    post(base_url, "/reset", {"task_id": "rename_variables"})
-    for action in range(5):
-        ok, data = post(base_url, "/step", {"action": action})
-        failures += 0 if check(
-            f"Action {action} executes without error",
-            ok and isinstance(data, dict) and "reward" in data and "done" in data,
-        ) else 1
-        if ok:
-            reward_payload = data.get("reward", {})
-            norm = reward_payload.get("normalized", -1)
-            failures += 0 if check(
-                f"Action {action} returns typed reward payload",
-                {"raw", "normalized", "components"}.issubset(reward_payload),
-                str(reward_payload),
-            ) else 1
-            failures += 0 if check(
-                f"Action {action} normalized_reward in [0,1]",
-                isinstance(norm, (int, float)) and 0.0 <= float(norm) <= 1.0,
-                f"got {norm}",
-            ) else 1
-            if data.get("done"):
-                break
-    ok, data = post(base_url, "/step", {"action": 99})
-    check("Invalid action returns error (not crash)", not ok or "detail" in str(data), "(expected 4xx)")
-    print("\n7. Task graders (POST /tasks/{id}/grade)")
-    for task_id in ["rename_variables", "remove_dead_code", "full_refactor"]:
-        ok, data = post(base_url, f"/tasks/{task_id}/grade", {"code": "def f(): pass"})
-        failures += 0 if check(f"Grade endpoint for '{task_id}' works", ok) else 1
-        if ok:
-            score = data.get("score", -1)
-            failures += 0 if check(
-                f"Score for '{task_id}' in [0.0, 1.0]",
-                isinstance(score, (int, float)) and 0.0 <= float(score) <= 1.0,
-                f"got {score}",
-            ) else 1
-    print("\n8. openenv.yaml")
-    try:
-        openenv_yaml = read_text("openenv.yaml")
-        failures += 0 if check("openenv.yaml exists", True) else 1
-        for field in ["tasks:", "action_space:", "observation_space:", "reward:", "entrypoint:", "validation:"]:
-            failures += 0 if check(f"openenv.yaml has '{field}' section", field in openenv_yaml) else 1
-    except FileNotFoundError:
-        failures += 1
-        check("openenv.yaml exists", False, "file not found")
-    print("\n9. inference.py")
-    try:
-        inference_src = read_text("inference.py")
-        failures += 0 if check("inference.py exists", True) else 1
-        for marker in ['"event": "START"', '"event": "STEP"', '"event": "END"']:
-            failures += 0 if check(f"inference.py emits {marker}", marker in inference_src) else 1
-        failures += 0 if check(
-            "Uses OpenAI client",
-            "from openai import OpenAI" in inference_src,
-        ) else 1
-        for var in ["API_BASE_URL", "MODEL_NAME", "HF_TOKEN", "ENV_URL", "LOCAL_IMAGE_NAME"]:
-            failures += 0 if check(f"inference.py reads {var} from env", var in inference_src) else 1
-        failures += 0 if check(
-            "API_BASE_URL has a default",
-            'os.getenv("API_BASE_URL", "https://api.openai.com/v1")' in inference_src,
-        ) else 1
-        failures += 0 if check(
-            "MODEL_NAME has a default",
-            'os.getenv("MODEL_NAME", "gpt-4o-mini")' in inference_src,
-        ) else 1
-        failures += 0 if check(
-            "HF_TOKEN has no default",
-            re.search(r'HF_TOKEN\s*:\s*.*os\.getenv\("HF_TOKEN"\)', inference_src) is not None,
-        ) else 1
-    except FileNotFoundError:
-        failures += 1
-        check("inference.py exists", False, "file not found")
-    print("\n10. Dockerfile")
-    try:
-        dockerfile = read_text("Dockerfile")
-        failures += 0 if check("Dockerfile exists", True) else 1
-        failures += 0 if check("Exposes port 7860", "7860" in dockerfile) else 1
-        failures += 0 if check("Has CMD/ENTRYPOINT", "CMD" in dockerfile or "ENTRYPOINT" in dockerfile) else 1
-        failures += 0 if check("Does not set a default HF_TOKEN", "ENV HF_TOKEN" not in dockerfile) else 1
-    except FileNotFoundError:
-        failures += 1
-        check("Dockerfile exists", False, "file not found")
-    print("\n11. README / Hugging Face metadata")
-    try:
-        readme = read_text("README.md")
-        failures += 0 if check("README has docker SDK front matter", "sdk: docker" in readme) else 1
-        failures += 0 if check("README includes openenv tag", "openenv" in readme) else 1
-        for section in [
-            "Environment Overview and Motivation",
-            "Definitions of Action and Observation Spaces",
-            "Task Descriptions with Expected Difficulty Levels",
-            "Setup and Usage Instructions",
-            "Baseline Performance Scores",
-        ]:
-            failures += 0 if check(f"README includes '{section}'", section in readme) else 1
-    except FileNotFoundError:
-        failures += 1
-        check("README.md exists", False, "file not found")
-    print("\n" + "=" * 60)
-    if failures == 0:
-        print(f"  {PASS}  All checks passed. Repository is submission-ready.")
-    else:
-        print(f"  {FAIL}  {failures} check(s) failed. Fix before submitting.")
-    print("=" * 60 + "\n")
-    return failures
-def main() -> None:
-    parser = argparse.ArgumentParser(description="ACRE pre-submission validator")
-    parser.add_argument(
-        "--url",
-        default="http://localhost:7860",
-        help="Base URL of the running ACRE server",
-    )
-    args = parser.parse_args()
-    sys.exit(run_validation(args.url))
-if __name__ == "__main__":
-    main()

README.md CHANGED

@@ -167,9 +167,9 @@ The deterministic fallback policy used by `inference.py` produces the following
 | Task | Score |
 |---|---|
-| `rename_variables` | 1.0 |
-| `remove_dead_code` | 1.0 |
-| `full_refactor` | 1.0 |
-| Average | 1.0 |
 These scores come from the built-in heuristic policy with `HF_TOKEN` unset, which keeps the baseline reproducible across runs.

 | Task | Score |
 |---|---|
+| `rename_variables` | 1.0000 |
+| `remove_dead_code` | 0.2500 |
+| `full_refactor` | 0.7143 |
+| Average | 0.6548 |
 These scores come from the built-in heuristic policy with `HF_TOKEN` unset, which keeps the baseline reproducible across runs.
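
The new Average row can be re-derived from the three per-task scores. The check below is a minimal sketch that assumes the average is an unweighted mean and that the `full_refactor` score of 0.7143 corresponds to 5 of the 7 hard-task checks passing (5/7 rounds to 0.7143; the commit does not say which checks passed).

```python
# Re-derive the README "Average" row from the per-task baseline scores.
# Assumption: 0.7143 is 5/7, i.e. five of the seven hard-task checks passed.
scores = {
    "rename_variables": 1.0,
    "remove_dead_code": 0.25,
    "full_refactor": 5 / 7,
}

# Assumption: the average is the unweighted mean over the three tasks.
average = sum(scores.values()) / len(scores)

for task_id, score in scores.items():
    print(f"{task_id}: {score:.4f}")
print(f"Average: {average:.4f}")
```
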

acre/tasks/task_registry.py CHANGED

@@ -5,7 +5,7 @@ from __future__ import annotations
 import ast
 from dataclasses import dataclass
-from typing import Callable, Dict, List, Optional
 @dataclass
@@ -14,9 +14,13 @@ class Task:
     name: str
     description: str
     difficulty: str
-    initial_code: str
     _grade_fn: Callable[[str], float]
     def grade(self, code: str) -> float:
         """Return a score in [0.0, 1.0]."""
         try:
@@ -25,21 +29,90 @@ class Task:
             return 0.0
 # ---------------------------------------------------------------------------
 # Task 1 — Easy: Rename generic variables
 # ---------------------------------------------------------------------------
-_EASY_CODE = """\
 def compute(x, y, tmp):
     tmp = x + y
     x = tmp * 2
     result = x
     return result
-"""
 def _grade_easy(code: str) -> float:
-    """Score = fraction of generic names (x, tmp) removed from all scopes."""
-    generic = {"x", "tmp"}
     try:
         tree = ast.parse(code)
     except SyntaxError:
@@ -66,7 +139,8 @@ def _grade_easy(code: str) -> float:
 # ---------------------------------------------------------------------------
 # Task 2 — Medium: Remove dead code
 # ---------------------------------------------------------------------------
-_MEDIUM_CODE = """\
 def process(data):
     result = []
     for item in data:
@@ -76,31 +150,74 @@ def process(data):
     unused_var = 42
     return result
     print("unreachable")
-"""
 def _grade_medium(code: str) -> float:
-    """Score = fraction of dead-code patterns eliminated (3 checks, ~0.33 each)."""
     try:
         tree = ast.parse(code)
     except SyntaxError:
         return 0.0
-    source = ast.unparse(tree)
     score = 0.0
-    # Check 1: if-False block removed
-    if "if False" not in source:
-        score += 1 / 3
-    # Check 2: unused_var assignment removed
-    if "unused_var" not in source:
-        score += 1 / 3
     # Check 3: list comprehension used (loop simplified)
     has_listcomp = any(isinstance(n, ast.ListComp) for n in ast.walk(tree))
     if has_listcomp:
-        score += 1 / 3
     return score
@@ -108,7 +225,8 @@ def _grade_medium(code: str) -> float:
 # ---------------------------------------------------------------------------
 # Task 3 — Hard: Full refactor
 # ---------------------------------------------------------------------------
-_HARD_CODE = """\
 def add(p, q):
     return p + q
@@ -124,34 +242,89 @@ def compute(x, data, tmp):
     flag = not not True
     return val
     print("dead")
-"""
 def _grade_hard(code: str) -> float:
-    """Score = fraction of 5 quality checks passed."""
     try:
         tree = ast.parse(code)
     except SyntaxError:
         return 0.0
-    source = ast.unparse(tree)
     checks = 0
-    # 1. No generic variable names x/tmp in function signature or body
     has_generic = False
     class _GenCheck(ast.NodeVisitor):
         def visit_arg(self, node: ast.arg) -> None:
             nonlocal has_generic
-            if node.arg in {"x", "tmp"}:
                 has_generic = True
     _GenCheck().visit(tree)
     if not has_generic:
         checks += 1
-    # 2. No if False block
-    if "if False" not in source:
         checks += 1
     # 3. if True removed (body inlined)
@@ -162,13 +335,21 @@ def _grade_hard(code: str) -> float:
     if any(isinstance(n, ast.ListComp) for n in ast.walk(tree)):
         checks += 1
-    # 5. add() call inlined (no call to 'add')
     calls = [n for n in ast.walk(tree) if isinstance(n, ast.Call)]
     fn_names = {c.func.id for c in calls if isinstance(c.func, ast.Name)}
-    if "add" not in fn_names:
         checks += 1
-    return checks / 5
 # ---------------------------------------------------------------------------
@@ -186,7 +367,7 @@ class TaskRegistry:
             name="Rename Variables (Easy)",
             description="Rename generic variable names (x, tmp) to descriptive ones",
             difficulty="easy",
-            initial_code=_EASY_CODE,
             _grade_fn=_grade_easy,
         )
         self._tasks["remove_dead_code"] = Task(
@@ -194,7 +375,7 @@ class TaskRegistry:
             name="Remove Dead Code (Medium)",
             description="Remove unreachable code, if False blocks, and unused variables",
             difficulty="medium",
-            initial_code=_MEDIUM_CODE,
             _grade_fn=_grade_medium,
         )
         self._tasks["full_refactor"] = Task(
@@ -202,7 +383,7 @@ class TaskRegistry:
             name="Full Refactor (Hard)",
             description="Apply all transformations: rename, dead code, loops, conditions, inlining",
             difficulty="hard",
-            initial_code=_HARD_CODE,
             _grade_fn=_grade_hard,
         )

 import ast
 from dataclasses import dataclass
+from typing import Callable, Dict, List, Optional, Sequence
 @dataclass
     name: str
     description: str
     difficulty: str
+    samples: List[str]
     _grade_fn: Callable[[str], float]
+    @property
+    def initial_code(self) -> str:
+        return str(self.samples[0]) if self.samples else ""
     def grade(self, code: str) -> float:
         """Return a score in [0.0, 1.0]."""
         try:
             return 0.0
+def _safe_unparse(tree: ast.AST) -> str:
+    try:
+        return ast.unparse(tree)
+    except Exception:
+        return ""
+def _has_unreachable_after_terminator(stmts: Sequence[ast.stmt]) -> bool:
+    unreachable = False
+    for s in stmts:
+        if unreachable:
+            # ignore empty docstrings as "unreachable" noise
+            if isinstance(s, ast.Expr) and isinstance(s.value, ast.Constant) and isinstance(s.value.value, str):
+                continue
+            return True
+        if isinstance(s, (ast.Return, ast.Raise)):
+            unreachable = True
+    return False
+def _tree_has_unreachable(tree: ast.AST) -> bool:
+    class _Scan(ast.NodeVisitor):
+        def __init__(self) -> None:
+            self.bad = False
+        def visit_FunctionDef(self, node: ast.FunctionDef) -> None:  # noqa: N802
+            if _has_unreachable_after_terminator(node.body):
+                self.bad = True
+            self.generic_visit(node)
+        def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:  # noqa: N802
+            if _has_unreachable_after_terminator(node.body):
+                self.bad = True
+            self.generic_visit(node)
+    s = _Scan()
+    s.visit(tree)
+    return bool(s.bad)
 # ---------------------------------------------------------------------------
 # Task 1 — Easy: Rename generic variables
 # ---------------------------------------------------------------------------
+_EASY_SAMPLES: List[str] = [
+    """\
 def compute(x, y, tmp):
     tmp = x + y
     x = tmp * 2
     result = x
     return result
+""",
+    """\
+def normalize(tmp, x):
+    for i in range(3):
+        tmp = tmp + i
+    return tmp * x
+""",
+    """\
+def score(items):
+    tmp = 0
+    for i in items:
+        tmp += i
+    x = tmp
+    return x
+""",
+    """\
+def transform(x):
+    tmp = x
+    if tmp > 10:
+        tmp = tmp - 1
+    return tmp
+""",
+    """\
+def merge(a, b):
+    x = a
+    tmp = b
+    return x + tmp
+""",
+]
 def _grade_easy(code: str) -> float:
+    """Score = fraction of generic names removed from all scopes."""
+    generic = {"x", "tmp", "i"}
     try:
         tree = ast.parse(code)
     except SyntaxError:
 # ---------------------------------------------------------------------------
 # Task 2 — Medium: Remove dead code
 # ---------------------------------------------------------------------------
+_MEDIUM_SAMPLES: List[str] = [
+    """\
 def process(data):
     result = []
     for item in data:
     unused_var = 42
     return result
     print("unreachable")
+""",
+    """\
+def build(values):
+    out = []
+    for v in values:
+        out.append(v + 1)
+    while False:
+        out.append(999)
+    dead = 0
+    return out
+    dead += 1
+""",
+    """\
+def route(flag):
+    if False:
+        return 1
+    if True:
+        x = 2
+    y = x
+    return y
+""",
+    """\
+def clean(xs):
+    res = []
+    for x in xs:
+        res.append(x * 2)
+    unused = "remove me"
+    if False:
+        unused2 = 123
+    return res
+""",
+    """\
+def calc(n):
+    total = 0
+    for i in range(n):
+        total += i
+    return total
+    print("dead")
+""",
+]
 def _grade_medium(code: str) -> float:
+    """Score = fraction of dead-code patterns eliminated (4 checks, 0.25 each)."""
     try:
         tree = ast.parse(code)
     except SyntaxError:
         return 0.0
+    source = _safe_unparse(tree)
     score = 0.0
+    # Check 1: if/while-False removed
+    if ("if False" not in source) and ("while False" not in source):
+        score += 0.25
+    # Check 2: no unreachable statements after return/raise
+    if not _tree_has_unreachable(tree):
+        score += 0.25
     # Check 3: list comprehension used (loop simplified)
     has_listcomp = any(isinstance(n, ast.ListComp) for n in ast.walk(tree))
     if has_listcomp:
+        score += 0.25
+    # Check 4: obvious dead/unused sentinel names removed
+    if all(name not in source for name in ["unused_var", "unused", "dead", "unused2"]):
+        score += 0.25
     return score
 # ---------------------------------------------------------------------------
 # Task 3 — Hard: Full refactor
 # ---------------------------------------------------------------------------
+_HARD_SAMPLES: List[str] = [
+    """\
 def add(p, q):
     return p + q
     flag = not not True
     return val
     print("dead")
+""",
+    """\
+def helper(a, b):
+    return a + b
+def pipeline(tmp, xs, x):
+    out = []
+    for i in xs:
+        out.append(i * 2)
+    if True:
+        y = helper(tmp, x)
+    if False:
+        y = 0
+    return y
+    y = 123
+""",
+    """\
+def add(p, q):
+    return p + q
+def compute(x, data, tmp):
+    result = []
+    for item in data:
+        result.append(item * 2)
+    if False:
+        print("never")
+    val = add(x, tmp)
+    return val
+""",
+    """\
+def add(p, q):
+    return p + q
+def compute(x, data, tmp):
+    res = []
+    for item in data:
+        res.append(item * 2)
+    flag = not not True
+    if True:
+        return add(x, tmp)
+""",
+    """\
+def plus(p, q):
+    return p + q
+def compute(tmp, data, x):
+    out = []
+    for item in data:
+        out.append(item * 2)
+    if False:
+        tmp = 999
+    if True:
+        val = plus(x, tmp)
+    return val
+""",
+]
 def _grade_hard(code: str) -> float:
+    """Score = fraction of 7 quality checks passed."""
     try:
         tree = ast.parse(code)
     except SyntaxError:
         return 0.0
+    source = _safe_unparse(tree)
     checks = 0
+    # 1. No generic variable names x/tmp/i in function signature
     has_generic = False
     class _GenCheck(ast.NodeVisitor):
         def visit_arg(self, node: ast.arg) -> None:
             nonlocal has_generic
+            if node.arg in {"x", "tmp", "i"}:
                 has_generic = True
     _GenCheck().visit(tree)
     if not has_generic:
         checks += 1
+    # 2. No if/while False block
+    if ("if False" not in source) and ("while False" not in source):
         checks += 1
     # 3. if True removed (body inlined)
     if any(isinstance(n, ast.ListComp) for n in ast.walk(tree)):
         checks += 1
+    # 5. helper calls inlined (no call sites remain)
     calls = [n for n in ast.walk(tree) if isinstance(n, ast.Call)]
     fn_names = {c.func.id for c in calls if isinstance(c.func, ast.Name)}
+    if not ({"add", "plus", "helper"} & fn_names):
+        checks += 1
+    # 6. no unreachable after return/raise
+    if not _tree_has_unreachable(tree):
+        checks += 1
+    # 7. remove double-not
+    if "not not" not in source:
         checks += 1
+    return checks / 7
 # ---------------------------------------------------------------------------
             name="Rename Variables (Easy)",
             description="Rename generic variable names (x, tmp) to descriptive ones",
             difficulty="easy",
+            samples=_EASY_SAMPLES,
             _grade_fn=_grade_easy,
         )
         self._tasks["remove_dead_code"] = Task(
             name="Remove Dead Code (Medium)",
             description="Remove unreachable code, if False blocks, and unused variables",
             difficulty="medium",
+            samples=_MEDIUM_SAMPLES,
             _grade_fn=_grade_medium,
         )
         self._tasks["full_refactor"] = Task(
             name="Full Refactor (Hard)",
             description="Apply all transformations: rename, dead code, loops, conditions, inlining",
             difficulty="hard",
+            samples=_HARD_SAMPLES,
             _grade_fn=_grade_hard,
         )
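
The unreachable-code check added in this file can be tried in isolation. The following is a simplified sketch, not the committed code: `has_dead_tail` and `function_has_dead_tail` are hypothetical names, the string-literal exemption from `_has_unreachable_after_terminator` is left out, and, like the helper in the diff, it only inspects each function's top-level statements.

```python
import ast
from typing import Sequence


def has_dead_tail(stmts: Sequence[ast.stmt]) -> bool:
    """Return True if any statement follows a return/raise in this block."""
    seen_terminator = False
    for stmt in stmts:
        if seen_terminator:
            return True
        if isinstance(stmt, (ast.Return, ast.Raise)):
            seen_terminator = True
    return False


def function_has_dead_tail(source: str) -> bool:
    """Check the top-level body of every (async) function in `source`."""
    tree = ast.parse(source)
    return any(
        has_dead_tail(node.body)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    )


# Fifth medium sample from the diff, and one possible cleaned-up version.
BEFORE = (
    "def calc(n):\n"
    "    total = 0\n"
    "    for i in range(n):\n"
    "        total += i\n"
    "    return total\n"
    "    print('dead')\n"
)
AFTER = "def calc(n):\n    return sum(range(n))\n"

# Dead code nested inside an `if` block is not seen by a top-level scan.
NESTED = (
    "def f(c):\n"
    "    if c:\n"
    "        return 1\n"
    "        print('dead')\n"
    "    return 2\n"
)

print(function_has_dead_tail(BEFORE))
print(function_has_dead_tail(AFTER))
print(function_has_dead_tail(NESTED))
```

The `NESTED` case shows the trade-off of scanning only `node.body`: statements after a `return` inside an `if`, `for`, or `try` block go undetected unless the scan recurses into nested statement lists.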

inference.py CHANGED

@@ -1,17 +1,18 @@
 """
 ACRE inference script for OpenEnv submission evaluation.
-Required environment variables:
-  API_BASE_URL: LLM API endpoint (default allowed)
-  MODEL_NAME: model identifier (default allowed)
-  HF_TOKEN: API token for the OpenAI-compatible endpoint
-  ENV_URL: running ACRE server base URL
-Optional:
-  LOCAL_IMAGE_NAME: present for evaluator compatibility when using a local
-  Docker image launcher.
-Stdout format uses strict START / STEP / END event markers.
 """
 from __future__ import annotations
@@ -20,7 +21,7 @@ import os
 import re
 import sys
 import time
-from typing import Dict, List, Tuple
 import requests
 from openai import OpenAI
@@ -95,7 +96,7 @@ def grade(task_id: str, code: str) -> float:
     return float(response.json().get("score", 0.0))
-def choose_action(client: OpenAI, state: dict, task_id: str) -> Tuple[int, str]:
     def heuristic_action() -> Tuple[int, str]:
         code = str(state.get("current_code", ""))
         step_i = int(state.get("episode_steps", 0))
@@ -141,7 +142,8 @@ def choose_action(client: OpenAI, state: dict, task_id: str) -> Tuple[int, str]:
             return 1, "heuristic: remove remaining dead code"
         return 3, "heuristic: condition optimization as safe default"
-    if not HF_TOKEN:
         return heuristic_action()
     messages = [
@@ -184,23 +186,12 @@ def choose_action(client: OpenAI, state: dict, task_id: str) -> Tuple[int, str]:
         return heuristic_action()
-def run_episode(client: OpenAI, task_id: str, episode_num: int) -> float:
     reset_env(task_id)
     state = get_state()
-    print(
-        json.dumps(
-            {
-                "event": "START",
-                "episode": episode_num,
-                "task_id": task_id,
-                "initial_complexity": state.get("complexity", 0),
-                "initial_code_length": len(state.get("current_code", "")),
-                "timestamp": time.time(),
-            }
-        ),
-        flush=True,
-    )
     cumulative_reward = 0.0
@@ -214,25 +205,8 @@ def run_episode(client: OpenAI, task_id: str, episode_num: int) -> float:
         norm_reward = float(reward_payload.get("normalized", (raw_reward + 32) / 52))
         cumulative_reward += raw_reward
-        print(
-            json.dumps(
-                {
-                    "event": "STEP",
-                    "episode": episode_num,
-                    "step": step_num,
-                    "action": action,
-                    "action_name": ACTION_MEANINGS.get(action, "unknown"),
-                    "reason": reason,
-                    "reward": round(raw_reward, 4),
-                    "normalized_reward": round(norm_reward, 4),
-                    "cumulative_reward": round(cumulative_reward, 4),
-                    "changed": result.get("info", {}).get("changed", False),
-                    "reward_components": reward_payload.get("components", {}),
-                    "done": result.get("done", False),
-                }
-            ),
-            flush=True,
-        )
         if result.get("done") or result.get("terminated") or result.get("truncated"):
             break
@@ -240,21 +214,8 @@ def run_episode(client: OpenAI, task_id: str, episode_num: int) -> float:
     final_state = get_state()
     task_score = grade(task_id, final_state.get("current_code", ""))
-    print(
-        json.dumps(
-            {
-                "event": "END",
-                "episode": episode_num,
-                "task_id": task_id,
-                "cumulative_reward": round(cumulative_reward, 4),
-                "normalized_cumulative": round((cumulative_reward + 32) / 52, 4),
-                "task_score": round(task_score, 4),
-                "final_complexity": final_state.get("complexity", 0),
-                "timestamp": time.time(),
-            }
-        ),
-        flush=True,
-    )
     return task_score
@@ -263,7 +224,9 @@ def main() -> None:
     if not ENV_URL:
         raise SystemExit("ENV_URL is required. Example: ENV_URL=http://localhost:7860")
-    client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN or "dummy")
     scores: List[float] = []
     for i, task_id in enumerate(TASKS, start=1):

 """
 ACRE inference script for OpenEnv submission evaluation.
+Environment variables:
+  - API_BASE_URL: LLM API endpoint (default allowed)
+  - MODEL_NAME: model identifier (default allowed)
+  - HF_TOKEN: API token for the OpenAI-compatible endpoint (NO default)
+  - ENV_URL: running ACRE server base URL (required)
+  - LOCAL_IMAGE_NAME: present for evaluator compatibility (optional)
+  - USE_LLM: set to "1" to enable LLM action selection when HF_TOKEN is set
+STRICT stdout format (do not change):
+  START <task_id>
+  STEP <action_int>
+  END <score_float>
 """
 from __future__ import annotations
 import re
 import sys
 import time
+from typing import Dict, List, Optional, Tuple
 import requests
 from openai import OpenAI
     return float(response.json().get("score", 0.0))
+def choose_action(client: Optional[OpenAI], state: dict, task_id: str) -> Tuple[int, str]:
     def heuristic_action() -> Tuple[int, str]:
         code = str(state.get("current_code", ""))
         step_i = int(state.get("episode_steps", 0))
             return 1, "heuristic: remove remaining dead code"
         return 3, "heuristic: condition optimization as safe default"
+    use_llm = bool(HF_TOKEN) and os.getenv("USE_LLM", "0") == "1"
+    if (not use_llm) or client is None:
         return heuristic_action()
     messages = [
         return heuristic_action()
+def run_episode(client: Optional[OpenAI], task_id: str, episode_num: int) -> float:
     reset_env(task_id)
     state = get_state()
+    # STRICT logging format required by evaluator.
+    print(f"START {task_id}", flush=True)
     cumulative_reward = 0.0
         norm_reward = float(reward_payload.get("normalized", (raw_reward + 32) / 52))
         cumulative_reward += raw_reward
+        # STRICT logging format required by evaluator.
+        print(f"STEP {int(action)}", flush=True)
         if result.get("done") or result.get("terminated") or result.get("truncated"):
             break
     final_state = get_state()
     task_score = grade(task_id, final_state.get("current_code", ""))
+    # STRICT logging format required by evaluator.
+    print(f"END {task_score:.4f}", flush=True)
     return task_score
     if not ENV_URL:
         raise SystemExit("ENV_URL is required. Example: ENV_URL=http://localhost:7860")
+    client: Optional[OpenAI] = None
+    if HF_TOKEN and os.getenv("USE_LLM", "0") == "1":
+        client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)
     scores: List[float] = []
     for i, task_id in enumerate(TASKS, start=1):
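
The line-based stdout format introduced here (`START <task_id>`, `STEP <action_int>`, `END <score_float>`) is simple to consume. The parser below is a sketch of how a consumer might read it; `parse_run_log` is a hypothetical helper, the evaluator's actual parser is not shown in this commit, and the action numbers in the sample log are made up for illustration.

```python
import re
from typing import Dict, List, Optional

# One marker per line: START <task_id>, STEP <action_int>, END <score_float>.
LINE_RE = re.compile(r"^(START|STEP|END) (\S+)$")


def parse_run_log(text: str) -> List[Dict[str, object]]:
    """Group START/STEP/END lines into one record per episode."""
    episodes: List[Dict[str, object]] = []
    current: Optional[Dict[str, object]] = None
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything that is not a marker line
        kind, value = match.groups()
        if kind == "START":
            current = {"task_id": value, "actions": [], "score": None}
            episodes.append(current)
        elif current is None:
            continue  # STEP/END without a preceding START
        elif kind == "STEP":
            current["actions"].append(int(value))
        else:  # END closes the current episode
            current["score"] = float(value)
            current = None
    return episodes


# Illustrative log only; the action sequence is not taken from a real run.
SAMPLE_LOG = (
    "START rename_variables\n"
    "STEP 0\n"
    "STEP 3\n"
    "END 1.0000\n"
    "START remove_dead_code\n"
    "STEP 1\n"
    "END 0.2500\n"
)

for episode in parse_run_log(SAMPLE_LOG):
    print(episode)
```
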

openenv_interface.py CHANGED

@@ -62,7 +62,23 @@ class OpenEnvRefactorEnv(OpenEnvBase):
             task = self._registry.get_task(task_id)
             if task is None:
                 raise ValueError(f"Task '{task_id}' not found")
-            initial_code = task.initial_code
         if initial_code is None:
             return None

             task = self._registry.get_task(task_id)
             if task is None:
                 raise ValueError(f"Task '{task_id}' not found")
+            # Load a multi-sample dataset for this task. Sample selection is
+            # deterministic given the `seed` passed to `reset()`.
+            samples = list(getattr(task, "samples", []) or [])
+            if not samples:
+                initial_code = task.initial_code
+            else:
+                self._env.dataset = CodeSampleDataset(
+                    [
+                        CodeSample(
+                            id=f"{task_id}:{i}",
+                            language="python",
+                            code=str(src),
+                        )
+                        for i, src in enumerate(samples)
+                    ]
+                )
+                return None
         if initial_code is None:
             return None
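
The comment in this hunk says sample selection is deterministic given the `seed` passed to `reset()`, but the selection code itself is outside the diff. One way such behaviour could be implemented is sketched below; `pick_sample` is a hypothetical helper and the use of `random.Random` is an assumption, not the environment's confirmed mechanism.

```python
import random
from typing import List, Optional


def pick_sample(samples: List[str], seed: Optional[int]) -> str:
    """Pick one sample; the same seed always yields the same sample.

    Hypothetical helper: the environment's real selection logic is not
    part of this commit's diff.
    """
    if not samples:
        return ""
    # A private Random instance avoids touching the global RNG state.
    # With seed=None the choice is seeded from the OS and is not reproducible.
    rng = random.Random(seed)
    return samples[rng.randrange(len(samples))]


samples = ["def a(): pass\n", "def b(): pass\n", "def c(): pass\n"]
first = pick_sample(samples, seed=7)
second = pick_sample(samples, seed=7)
print(first == second)
```
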

validate.py CHANGED

@@ -204,8 +204,16 @@ def run_validation(base_url: str) -> int:
     try:
         inference_src = read_text("inference.py")
         failures += 0 if check("inference.py exists", True) else 1
-        for marker in ['"event": "START"', '"event": "STEP"', '"event": "END"']:
-            failures += 0 if check(f"inference.py emits {marker}", marker in inference_src) else 1
         failures += 0 if check(
             "Uses OpenAI client",
             "from openai import OpenAI" in inference_src,

     try:
         inference_src = read_text("inference.py")
         failures += 0 if check("inference.py exists", True) else 1
+        # Accept either the older JSON event markers or the strict hackathon
+        # line-based format:
+        #   START <task_id>
+        #   STEP <action>
+        #   END <score>
+        json_markers_ok = all(m in inference_src for m in ['"event": "START"', '"event": "STEP"', '"event": "END"'])
+        line_markers_ok = all(m in inference_src for m in ["START ", "STEP ", "END "])
+        failures += 0 if check("inference.py emits START marker", json_markers_ok or line_markers_ok) else 1
+        failures += 0 if check("inference.py emits STEP marker", json_markers_ok or line_markers_ok) else 1
+        failures += 0 if check("inference.py emits END marker", json_markers_ok or line_markers_ok) else 1
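
Two properties of the marker check above are worth noting: all three `check(...)` calls share the same boolean, and a plain substring test such as `"START " in inference_src` is also satisfied by the docstring in `inference.py`, which lists the three markers. The sketch below shows one possible stricter variant that tests each marker separately and only counts `print(...)` calls; `line_markers_present` is a hypothetical helper, not part of the commit.

```python
import re
from typing import Dict

# Matches either quote character that can open the printed string.
QUOTE = "[\"']"


def line_markers_present(source: str) -> Dict[str, bool]:
    """Report, per marker, whether `source` prints a line starting with it."""
    result: Dict[str, bool] = {}
    for marker in ("START", "STEP", "END"):
        # e.g. print(f"START {task_id}", flush=True) or print("END ...")
        pattern = re.compile(r"print\(\s*f?" + QUOTE + marker + " ")
        result[marker] = pattern.search(source) is not None
    return result


# Docstring text alone should not count as emitting the markers.
DOC_ONLY = '"""\n  START <task_id>\n  STEP <action_int>\n  END <score_float>\n"""\n'

# Source that prints two of the three markers.
PARTIAL = (
    'print(f"START {task_id}", flush=True)\n'
    'print(f"STEP {int(action)}", flush=True)\n'
)

print(line_markers_present(DOC_ONLY))
print(line_markers_present(PARTIAL))
```
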
         failures += 0 if check(
             "Uses OpenAI client",
             "from openai import OpenAI" in inference_src,