title: ACRE - Autonomous Code Refactoring Environment
colorFrom: blue
colorTo: green
sdk: docker
app_file: server.py
app_port: 7860
pinned: false
license: mit
tags:
  - openenv

🚀 ACRE - Autonomous Code Refactoring Environment

OpenEnv-powered AI system for real-world code optimization, refactoring, and evaluation.



🔥 Overview

ACRE is an OpenEnv-compliant environment designed to simulate real-world software engineering workflows such as code cleanup, optimization, and refactoring using AI agents.

It enables agents to iteratively improve code through structured actions while receiving dense, step-wise reward feedback.

Environment Overview and Motivation

ACRE models a realistic developer workflow where an agent incrementally improves Python code quality under a fixed action budget. The environment is designed for OpenEnv Round 1 requirements: typed APIs, deterministic grading, multi-difficulty tasks, and reproducible inference behavior.


💡 Why This Matters

Modern software systems require automated code optimization and intelligent tooling.

ACRE enables:

  • 🤖 AI coding assistants
  • 🔍 Automated code review systems
  • ⚡ Reinforcement learning-based optimization agents
  • 🧠 Learning real developer workflows

🔄 How It Works

Code → Action → Refactor → Reward → Repeat

  1. Load messy code
  2. Apply transformation
  3. Evaluate using grader
  4. Compute reward
  5. Iterate until optimal
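
The loop above can be sketched in Python as follows. `ToyEnv` is a hypothetical stand-in, not ACRE's environment class: it only mimics the `reset()` / `step()` call shape shown in this README, and its issue counter and reward values are invented for illustration.

```python
class ToyEnv:
    """Hypothetical stand-in for the ACRE environment (not the real class)."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps  # fixed action budget
        self.steps = 0
        self.issues = 3  # pretend the messy code has three problems left

    def reset(self):
        self.steps = 0
        self.issues = 3
        return {"issues": self.issues}

    def step(self, action):
        self.steps += 1
        # Assumption: actions 0-2 each fix one issue, anything else does nothing.
        if action in (0, 1, 2) and self.issues > 0:
            self.issues -= 1
            reward = 1.0   # progress
        else:
            reward = -1.0  # no progress
        done = self.issues == 0 or self.steps >= self.max_steps
        return {"issues": self.issues}, reward, done, {}


def run_episode(env, policy):
    """Code -> Action -> Refactor -> Reward -> Repeat."""
    observation = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total += reward
    return total
```

A policy here is any callable that maps an observation to an action index, so `run_episode(ToyEnv(), lambda obs: 0)` runs one full episode with a constant policy.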

🧠 Key Features

  • ✅ Autonomous code refactoring
  • ⚡ Step-wise reward feedback
  • 🧪 OpenEnv compliant interface
  • 📊 Deterministic grading system
  • 🔁 Reproducible inference pipeline
  • 🐳 Fully containerized (Docker + Hugging Face Spaces)

📂 Tasks

| Task ID | Difficulty | Objective |
|---|---|---|
| rename_variables | Easy | Replace generic variable names |
| remove_dead_code | Medium | Remove unreachable logic |
| full_refactor | Hard | Combine multiple optimizations |

Each task uses AST-based transformations and deterministic grading.

Task Descriptions with Expected Difficulty Levels

  • Easy (rename_variables): rename generic names like x, tmp, i into descriptive identifiers.
  • Medium (remove_dead_code): remove unreachable branches and unused assignments while preserving behavior.
  • Hard (full_refactor): combine renaming, dead-code elimination, loop simplification, condition cleanup, and helper inlining.
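
To make "AST-based transformation" concrete, here is a minimal sketch built on the standard-library `ast` module. It is not ACRE's implementation: the class names, the fixed rename mapping, and the "drop everything after a top-level `return`" rule are assumptions chosen to keep the example short. It needs Python 3.9+ for `ast.unparse`.

```python
import ast


class RenameVariables(ast.NodeTransformer):
    """Rename identifiers according to a fixed mapping (scope-unaware)."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Covers both reads and writes of a plain variable.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        # Covers function parameters.
        node.arg = self.mapping.get(node.arg, node.arg)
        return node


class RemoveCodeAfterReturn(ast.NodeTransformer):
    """Drop statements that follow a `return` directly in a function body.

    Only the function's top-level statement list is checked; returns
    nested inside `if` or loop bodies are left alone in this sketch.
    """

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # handle nested functions first
        for index, statement in enumerate(node.body):
            if isinstance(statement, ast.Return):
                node.body = node.body[: index + 1]
                break
        return node


def refactor(source, mapping):
    """Apply both transformations and return the new source text."""
    tree = ast.parse(source)
    tree = RenameVariables(mapping).visit(tree)
    tree = RemoveCodeAfterReturn().visit(tree)
    return ast.unparse(tree)


messy = "def f(x, tmp):\n    y = x + tmp\n    return y\n    print('never runs')\n"
clean = refactor(messy, {"x": "price", "tmp": "tax", "y": "total"})
```

Here `clean` holds a three-line function, `def f(price, tax)`, whose body assigns `total = price + tax` and returns it; the unreachable `print` call is gone. A real renamer would have to respect scoping rules, which this mapping-based version ignores.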

🎯 Reward System

Rewards are computed at every step:

  • ✅ Valid executable code → positive reward
  • 📉 Reduced complexity → reward
  • ⚡ Improved performance → reward
  • ❌ Errors or invalid code → penalty
  • 🔁 No progress → penalty

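Normalization:

normalized_reward = (raw_reward + 32) / 52

This maps a raw reward of -32 to 0 and a raw reward of 20 to 1, so raw rewards in [-32, 20] land in [0, 1].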
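
The normalization formula can be written as a small Python helper. The constants come straight from the formula; the function name and the final clamp to [0, 1] are assumptions added for safety, since this README does not say how out-of-range raw rewards are handled.

```python
RAW_MIN = -32.0   # raw reward that maps to 0.0
RAW_SPAN = 52.0   # width of the raw range, so -32 + 52 = 20 maps to 1.0


def normalize_reward(raw_reward):
    """Scale a raw reward with (raw_reward + 32) / 52."""
    scaled = (raw_reward - RAW_MIN) / RAW_SPAN
    # Assumption: clamp so unexpected raw values cannot leave [0, 1].
    return min(1.0, max(0.0, scaled))
```

For example, `normalize_reward(-6)` returns 0.5, because -6 sits halfway between -32 and 20.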


📊 Example Execution

[START] task=rename_variables
[STEP] action=0
[END] task=rename_variables score=1.00

[START] task=remove_dead_code
[STEP] action=1
[END] task=remove_dead_code score=0.25

[START] task=full_refactor
[STEP] action=3
[END] task=full_refactor score=0.71

Final Score: 0.65
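
As a rough illustration of how these structured logs could be consumed, the sketch below pulls the per-task scores out of the `[END]` lines and averages them. It assumes the final score is the unweighted mean of the task scores, which matches the "Average" row in the baseline table; the regular expression and function name are illustrative and not part of ACRE.

```python
import re
from statistics import mean

# Matches lines such as "[END] task=rename_variables score=1.00".
END_LINE = re.compile(r"^\[END\] task=(?P<task>\S+) score=(?P<score>[0-9.]+)$")


def parse_scores(log_text):
    """Return {task_id: score} for every [END] line in the log."""
    scores = {}
    for line in log_text.splitlines():
        match = END_LINE.match(line.strip())
        if match:
            scores[match["task"]] = float(match["score"])
    return scores


log = """\
[START] task=rename_variables
[STEP] action=0
[END] task=rename_variables score=1.00
[START] task=remove_dead_code
[STEP] action=1
[END] task=remove_dead_code score=0.25
[START] task=full_refactor
[STEP] action=3
[END] task=full_refactor score=0.71
"""

scores = parse_scores(log)
final = mean(list(scores.values()))  # assumed: unweighted average
```

Formatting `final` with two decimals (`f"{final:.2f}"`) gives `0.65`, in line with the final score in the log.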

🏗️ Architecture

  • server/app.py → FastAPI entry point used by OpenEnv + Docker
  • server.py → legacy local runner / UI helper
  • openenv_interface.py → OpenEnv wrapper
  • acre/env/ → Core environment logic
  • acre/tasks/ → Task definitions
  • acre/utils/ → Metrics and helpers
  • inference.py → Evaluation pipeline

⚙️ OpenEnv Interface

observation = env.reset()
observation, reward, done, info = env.step(action)
state = env.state()

Uses Pydantic models:

  • ObservationModel
  • ActionModel
  • RewardModel

Definitions of Action and Observation Spaces

  • Observation space: Box(4) with fields code_length, complexity_score, runtime_s, error_flag.
  • Action space: Discrete(5) with actions rename_variable, remove_dead_code, simplify_loop, optimize_condition, inline_function.
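
The two spaces could be modelled roughly as below. This sketch uses standard-library dataclasses and enums to stay dependency-free, whereas the project itself uses Pydantic models; the class names are hypothetical, and the action indices assume the actions are numbered in the order listed above (consistent with the example logs, where action=0 is used for renaming and action=1 for dead-code removal).

```python
from dataclasses import astuple, dataclass
from enum import IntEnum


class Action(IntEnum):
    """Discrete(5) action space; index order is an assumption."""
    RENAME_VARIABLE = 0
    REMOVE_DEAD_CODE = 1
    SIMPLIFY_LOOP = 2
    OPTIMIZE_CONDITION = 3
    INLINE_FUNCTION = 4


@dataclass(frozen=True)
class Observation:
    """Box(4) observation with the four fields named in this README."""
    code_length: float
    complexity_score: float
    runtime_s: float
    error_flag: float

    def as_vector(self):
        # Flatten to the 4-element vector a Box(4) space would carry.
        return [float(value) for value in astuple(self)]
```

With these definitions, `Action(3).name` is `"OPTIMIZE_CONDITION"` and `Observation(120, 7, 0.002, 0).as_vector()` yields a four-element list of floats.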

🌐 HTTP API

| Method | Endpoint | Description |
|---|---|---|
| GET | / | Health check |
| GET | /health | Compatibility check |
| POST | /reset | Reset environment |
| POST | /step | Execute action |
| GET | /state | Get state |
| GET | /tasks | List tasks |
| POST | /tasks/{task_id}/grade | Grade code |
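
A client for these endpoints might look like the sketch below, which uses only the standard library. The endpoint paths come from the table; the JSON body `{"action": 1}` for `/step` and the empty body for `/reset` are assumptions, because the request schemas are not documented here. Check the server's Pydantic models before relying on them.

```python
import json
from urllib.request import Request, urlopen


def build_request(base_url, method, path, payload=None):
    """Build (but do not send) a request, with an optional JSON body."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(base_url.rstrip("/") + path, data=data,
                   headers=headers, method=method)


def call(request, timeout=10):
    """Send a request and decode the JSON response."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


BASE_URL = "http://localhost:7860"
reset_request = build_request(BASE_URL, "POST", "/reset")
# Assumed payload shape: a single integer action index.
step_request = build_request(BASE_URL, "POST", "/step", {"action": 1})
state_request = build_request(BASE_URL, "GET", "/state")
```

With the server running locally, `call(reset_request)` followed by `call(step_request)` would drive one environment step; nothing is sent until `call` is invoked.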

🚀 Run Locally

Setup and Usage Instructions

pip install -r requirements.txt
uvicorn server.app:app --host 0.0.0.0 --port 7860

🐳 Docker / Hugging Face Spaces

docker build -t acre .
docker run -p 7860:7860 \
  -e API_BASE_URL=https://api.openai.com/v1 \
  -e MODEL_NAME=gpt-4o-mini \
  -e API_KEY=your_key \
  -e ENV_URL=http://localhost:7860 \
  acre

🧪 Inference

Set environment variables:

export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4o-mini
export API_KEY=your_key
export ENV_URL=http://localhost:7860

Run:

python inference.py

Expected output:

Easy: 1.00
Medium: 0.25
Hard: 0.71
Final: 0.65
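
For reference, a script could read the four environment variables set at the start of this section along these lines. This is a sketch, not the contents of `inference.py`: the `InferenceConfig` class and `load_config` function are hypothetical names, and failing fast on a missing variable is a design choice assumed here.

```python
import os
from dataclasses import dataclass

REQUIRED = ("API_BASE_URL", "MODEL_NAME", "API_KEY", "ENV_URL")


@dataclass(frozen=True)
class InferenceConfig:
    api_base_url: str
    model_name: str
    api_key: str
    env_url: str


def load_config(environ=None):
    """Read the required variables, raising if any is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return InferenceConfig(
        api_base_url=environ["API_BASE_URL"],
        model_name=environ["MODEL_NAME"],
        api_key=environ["API_KEY"],
        env_url=environ["ENV_URL"],
    )
```

Passing a plain dict instead of relying on `os.environ` makes the loader easy to exercise in tests without touching the real environment.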

📌 OpenEnv Compliance

  • ✔ step() implemented
  • ✔ reset() implemented
  • ✔ state() implemented
  • ✔ reward shaping
  • ✔ deterministic grading
  • ✔ structured logs

🧪 Validation

python validate.py --url http://localhost:7860

Or:

openenv validate

🌐 Live Demo

👉 Running on Hugging Face Spaces


📊 Baseline Performance

Baseline Performance Scores

| Task | Score |
|---|---|
| rename_variables | 1.0000 |
| remove_dead_code | 0.2500 |
| full_refactor | 0.7143 |
| Average | 0.6548 |

🏆 Use Cases

  • AI-powered code optimization
  • Automated refactoring tools
  • Reinforcement learning environments
  • Developer productivity systems

📜 License

MIT License