Spaces:

PRANAV05092003
/

autonomous-code-refactoring-env

Sleeping

App Files Files Community

autonomous-code-refactoring-env / README.md

PRANAV05092003

Final multi-mode OpenEnv fix

19e4a1d about 2 months ago

preview code

raw

history blame contribute delete

5.98 kB

metadata

title: ACRE - Autonomous Code Refactoring Environment
colorFrom: blue
colorTo: green
sdk: docker
app_file: server.py
app_port: 7860
pinned: false
license: mit
tags:
  - openenv

🚀 ACRE — Autonomous Code Refactoring Environment

OpenEnv-powered AI system for real-world code optimization, refactoring, and evaluation.

🔥 Overview

ACRE is an OpenEnv-compliant environment designed to simulate real-world software engineering workflows such as code cleanup, optimization, and refactoring using AI agents.

It enables agents to iteratively improve code through structured actions while receiving dense, step-wise reward feedback.

Environment Overview and Motivation

ACRE models a realistic developer workflow where an agent incrementally improves Python code quality under a fixed action budget. The environment is designed for OpenEnv Round 1 requirements: typed APIs, deterministic grading, multi-difficulty tasks, and reproducible inference behavior.

💡 Why This Matters

Modern software systems require automated code optimization and intelligent tooling.

ACRE enables:

🤖 AI coding assistants
🔍 Automated code review systems
⚡ Reinforcement learning-based optimization agents
🧠 Learning real developer workflows

🔄 How It Works

Code → Action → Refactor → Reward → Repeat

Load messy code
Apply transformation
Evaluate using grader
Compute reward
Iterate until optimal

🧠 Key Features

✅ Autonomous code refactoring
⚡ Step-wise reward feedback
🧪 OpenEnv compliant interface
📊 Deterministic grading system
🔁 Reproducible inference pipeline
🐳 Fully containerized (Docker + Hugging Face Spaces)

📂 Tasks

Task ID	Difficulty	Objective
`rename_variables`	Easy	Replace generic variable names
`remove_dead_code`	Medium	Remove unreachable logic
`full_refactor`	Hard	Combine multiple optimizations

Each task uses AST-based transformations and deterministic grading.

Task Descriptions with Expected Difficulty Levels

Easy (rename_variables): rename generic names like x, tmp, i into descriptive identifiers.
Medium (remove_dead_code): remove unreachable branches and unused assignments while preserving behavior.
Hard (full_refactor): combine renaming, dead-code elimination, loop simplification, condition cleanup, and helper inlining.

🎯 Reward System

Rewards are computed at every step:

✅ Valid executable code → positive reward
📉 Reduced complexity → reward
⚡ Improved performance → reward
❌ Errors or invalid code → penalty
🔁 No progress → penalty

Normalization:

(raw_reward + 32) / 52 → [0, 1]

📊 Example Execution

[START] task=rename_variables
[STEP] action=0
[END] task=rename_variables score=1.00

[START] task=remove_dead_code
[STEP] action=1
[END] task=remove_dead_code score=0.25

[START] task=full_refactor
[STEP] action=3
[END] task=full_refactor score=0.71

Final Score: 0.65

🏗️ Architecture

server/app.py → FastAPI entry point used by OpenEnv + Docker
server.py → legacy local runner / UI helper
openenv_interface.py → OpenEnv wrapper
acre/env/ → Core environment logic
acre/tasks/ → Task definitions
acre/utils/ → Metrics and helpers
inference.py → Evaluation pipeline

⚙️ OpenEnv Interface

observation = env.reset()
observation, reward, done, info = env.step(action)
state = env.state()

Uses Pydantic models:

ObservationModel
ActionModel
RewardModel

Definitions of Action and Observation Spaces

Observation space: Box(4) with fields code_length, complexity_score, runtime_s, error_flag.
Action space: Discrete(5) with actions rename_variable, remove_dead_code, simplify_loop, optimize_condition, inline_function.

🌐 HTTP API

Method	Endpoint	Description
GET	`/`	Health check
GET	`/health`	Compatibility check
POST	`/reset`	Reset environment
POST	`/step`	Execute action
GET	`/state`	Get state
GET	`/tasks`	List tasks
POST	`/tasks/{task_id}/grade`	Grade code

🚀 Run Locally

Setup and Usage Instructions

pip install -r requirements.txt
uvicorn server.app:app --host 0.0.0.0 --port 7860

🐳 Docker / Hugging Face Spaces

docker build -t acre .
docker run -p 7860:7860 \
  -e API_BASE_URL=https://api.openai.com/v1 \
  -e MODEL_NAME=gpt-4o-mini \
  -e API_KEY=your_key \
  -e ENV_URL=http://localhost:7860 \
  acre

🧪 Inference

Set environment variables:

export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4o-mini
export API_KEY=your_key
export ENV_URL=http://localhost:7860

Run:

python inference.py

Expected output:

Easy: 1.00
Medium: 0.25
Hard: 0.71
Final: 0.65

📌 OpenEnv Compliance

✔ step() implemented
✔ reset() implemented
✔ state() implemented
✔ reward shaping
✔ deterministic grading
✔ structured logs

🧪 Validation

python validate.py --url http://localhost:7860

Or:

openenv validate

🌐 Live Demo

👉 Running on Hugging Face Spaces

📊 Baseline Performance

Baseline Performance Scores

Task	Score
`rename_variables`	1.0000
`remove_dead_code`	0.2500
`full_refactor`	0.7143
Average	0.6548

🏆 Use Cases

AI-powered code optimization
Automated refactoring tools
Reinforcement learning environments
Developer productivity systems

📜 License

MIT License