---
title: ACRE - Autonomous Code Refactoring Environment
colorFrom: blue
colorTo: green
sdk: docker
app_file: server.py
app_port: 7860
pinned: false
license: mit
tags:
  - openenv
---

# 🚀 ACRE - Autonomous Code Refactoring Environment

> OpenEnv-powered AI system for real-world code optimization, refactoring, and evaluation.

![Status](https://img.shields.io/badge/Status-Running-success)
![OpenEnv](https://img.shields.io/badge/OpenEnv-Compatible-blue)
![Docker](https://img.shields.io/badge/Docker-Ready-green)

---

## 🔥 Overview

ACRE is an OpenEnv-compliant environment designed to simulate real-world software engineering workflows such as code cleanup, optimization, and refactoring using AI agents.

It enables agents to iteratively improve code through structured actions while receiving dense, step-wise reward feedback.

## Environment Overview and Motivation

ACRE models a realistic developer workflow where an agent incrementally improves Python code quality under a fixed action budget.
The environment is designed for OpenEnv Round 1 requirements: typed APIs, deterministic grading, multi-difficulty tasks, and reproducible inference behavior.

---

## 💡 Why This Matters

Modern software systems require automated code optimization and intelligent tooling.

ACRE enables:
- 🤖 AI coding assistants
- 🔍 Automated code review systems
- ⚡ Reinforcement learning-based optimization agents
- 🧠 Learning real developer workflows

---

## 🔄 How It Works

Code → Action → Refactor → Reward → Repeat

1. Load messy code
2. Apply transformation
3. Evaluate using grader
4. Compute reward
5. Iterate until optimal
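
The loop above can be sketched as follows. `ToyEnv` is a hypothetical stand-in written only for this illustration (it is not ACRE's environment class); it mirrors the `reset()` / `step()` call shape and a fixed action budget, and its reward values are made up.

```python
class ToyEnv:
    """Hypothetical stand-in environment; only mirrors the reset/step shape."""

    def __init__(self, budget=5):
        self.budget = budget          # maximum number of actions per episode
        self.steps = 0
        self.complexity = 10          # pretend complexity score of the code

    def reset(self):
        self.steps = 0
        self.complexity = 10
        return {"complexity_score": self.complexity}

    def step(self, action):
        self.steps += 1
        before = self.complexity
        # Toy rule: even-numbered actions simplify the code by one unit.
        if action % 2 == 0 and self.complexity > 0:
            self.complexity -= 1
        # Toy reward: progress is rewarded, lack of progress is penalized.
        reward = 1.0 if self.complexity < before else -0.5
        done = self.steps >= self.budget or self.complexity == 0
        return {"complexity_score": self.complexity}, reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the summed step-wise reward."""
    observation = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total += reward
    return total


if __name__ == "__main__":
    print(run_episode(ToyEnv(budget=4), lambda obs: 0))
```

A real agent would replace the lambda with a policy that inspects the observation before choosing an action.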

---

## 🧠 Key Features

- ✅ Autonomous code refactoring
- ⚡ Step-wise reward feedback
- 🧪 OpenEnv compliant interface
- 📊 Deterministic grading system
- 🔁 Reproducible inference pipeline
- 🐳 Fully containerized (Docker + Hugging Face Spaces)

---

## 📂 Tasks

| Task ID | Difficulty | Objective |
|--------|----------|----------|
| `rename_variables` | Easy | Replace generic variable names |
| `remove_dead_code` | Medium | Remove unreachable logic |
| `full_refactor` | Hard | Combine multiple optimizations |

Each task uses AST-based transformations and deterministic grading.

## Task Descriptions with Expected Difficulty Levels

- Easy (`rename_variables`): rename generic names like `x`, `tmp`, `i` into descriptive identifiers.
- Medium (`remove_dead_code`): remove unreachable branches and unused assignments while preserving behavior.
- Hard (`full_refactor`): combine renaming, dead-code elimination, loop simplification, condition cleanup, and helper inlining.
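
As an illustration of what an AST-based rename might look like, here is a minimal sketch using Python's standard `ast` module (Python 3.9+ for `ast.unparse`). The `RenameVariables` class, the `rename_variables` helper, and the name mapping are assumptions made for this example; ACRE's own transformation code may differ.

```python
import ast


class RenameVariables(ast.NodeTransformer):
    """Rename identifiers according to a mapping (illustrative only)."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Covers both reads and writes of a variable.
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

    def visit_arg(self, node):
        # Covers function parameters.
        if node.arg in self.mapping:
            node.arg = self.mapping[node.arg]
        return node


def rename_variables(source, mapping):
    """Parse the source, apply the renames, and return new source text."""
    tree = ast.parse(source)
    tree = RenameVariables(mapping).visit(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    messy = "def f(x):\n    tmp = x * 2\n    return tmp\n"
    print(rename_variables(messy, {"x": "value", "tmp": "doubled"}))
```

This sketch renames every matching identifier regardless of scope, so a production transform would also need scope analysis to avoid collisions.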

---

## 🎯 Reward System

Rewards are computed at every step:

- ✅ Valid executable code → positive reward
- 📉 Reduced complexity → reward
- ⚡ Improved performance → reward
- ❌ Errors or invalid code → penalty
- 🔁 No progress → penalty

**Normalization:**

`(raw_reward + 32) / 52 → [0, 1]`
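
The formula maps a raw reward of -32 to 0 and a raw reward of 20 to 1, which implies a raw range of [-32, 20]. A minimal sketch of that mapping follows; the function name and the clamping of out-of-range values are assumptions added for the example, not something the README specifies.

```python
RAW_OFFSET = 32.0   # shifts the lowest raw reward (-32) to zero
RAW_SPAN = 52.0     # width of the implied raw range [-32, 20]


def normalize_reward(raw_reward):
    """Map a raw reward onto [0, 1] using (raw + 32) / 52."""
    scaled = (raw_reward + RAW_OFFSET) / RAW_SPAN
    # Assumption: values outside the implied raw range are clamped.
    return min(1.0, max(0.0, scaled))


if __name__ == "__main__":
    for raw in (-32, -6, 20):
        print(raw, normalize_reward(raw))
```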

---

## 📊 Example Execution

```text
[START] task=rename_variables
[STEP] action=0
[END] task=rename_variables score=1.00

[START] task=remove_dead_code
[STEP] action=1
[END] task=remove_dead_code score=0.25

[START] task=full_refactor
[STEP] action=3
[END] task=full_refactor score=0.71

Final Score: 0.65
```
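
For anyone post-processing these logs, the sketch below extracts per-task scores from the `[END]` lines. It assumes the exact line format shown above; the `parse_scores` helper is hypothetical and not part of the repository.

```python
import re

# Matches lines such as: [END] task=rename_variables score=1.00
END_LINE = re.compile(r"^\[END\] task=(\w+) score=([0-9.]+)$")


def parse_scores(log_text):
    """Return a {task_id: score} dict built from the [END] lines."""
    scores = {}
    for line in log_text.splitlines():
        match = END_LINE.match(line.strip())
        if match:
            scores[match.group(1)] = float(match.group(2))
    return scores


if __name__ == "__main__":
    sample = (
        "[START] task=rename_variables\n"
        "[STEP] action=0\n"
        "[END] task=rename_variables score=1.00\n"
    )
    print(parse_scores(sample))
```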

---

## πŸ—οΈ Architecture

- `server/app.py` β†’ FastAPI entry point used by OpenEnv + Docker
- `server.py` β†’ legacy local runner / UI helper
- `openenv_interface.py` β†’ OpenEnv wrapper
- `acre/env/` β†’ Core environment logic
- `acre/tasks/` β†’ Task definitions
- `acre/utils/` β†’ Metrics and helpers
- `inference.py` β†’ Evaluation pipeline

---

## βš™οΈ OpenEnv Interface

```python
observation = env.reset()
observation, reward, done, info = env.step(action)
state = env.state()
```

Uses Pydantic models:

- `ObservationModel`
- `ActionModel`
- `RewardModel`

## Definitions of Action and Observation Spaces

- Observation space: Box(4) with fields `code_length`, `complexity_score`, `runtime_s`, `error_flag`.
- Action space: Discrete(5) with actions `rename_variable`, `remove_dead_code`, `simplify_loop`, `optimize_condition`, `inline_function`.
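
The two spaces can be written down as plain Python types, as in the sketch below. The integer-to-action mapping simply follows the order in which the actions are listed above and is an assumption, as are the `Action` and `Observation` names; the project itself uses Pydantic models, which this standard-library sketch does not reproduce.

```python
from dataclasses import dataclass, astuple
from enum import IntEnum


class Action(IntEnum):
    """Discrete(5) action space; index order is assumed from the list above."""
    RENAME_VARIABLE = 0
    REMOVE_DEAD_CODE = 1
    SIMPLIFY_LOOP = 2
    OPTIMIZE_CONDITION = 3
    INLINE_FUNCTION = 4


@dataclass(frozen=True)
class Observation:
    """Box(4) observation with the four documented fields."""
    code_length: float
    complexity_score: float
    runtime_s: float
    error_flag: float

    def as_vector(self):
        # Flatten to the 4-element vector an agent would consume.
        return list(astuple(self))


if __name__ == "__main__":
    obs = Observation(code_length=120, complexity_score=7, runtime_s=0.01, error_flag=0)
    print(Action(3).name, obs.as_vector())
```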

---

## 🌐 HTTP API

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Health check |
| GET | `/health` | Compatibility check |
| POST | `/reset` | Reset environment |
| POST | `/step` | Execute action |
| GET | `/state` | Get state |
| GET | `/tasks` | List tasks |
| POST | `/tasks/{task_id}/grade` | Grade code |
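
A client call against `/step` might be built as in the sketch below, using only the standard library. The JSON body shape (`{"action": <int>}`) is an assumption; check the server's request model for the actual schema. `build_step_request` and `send` are hypothetical helpers.

```python
import json
import urllib.request


def build_step_request(base_url, action):
    """Build (but do not send) a POST request for the /step endpoint."""
    # Assumed payload shape; the real schema is defined by the server.
    body = json.dumps({"action": action}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_step_request("http://localhost:7860", 3)
    print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the payload easy to inspect without a running server.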

---

## 🚀 Run Locally

## Setup and Usage Instructions

```bash
pip install -r requirements.txt
uvicorn server.app:app --host 0.0.0.0 --port 7860
```

---

## 🐳 Docker / Hugging Face Spaces

```bash
docker build -t acre .
docker run -p 7860:7860 \
  -e API_BASE_URL=https://api.openai.com/v1 \
  -e MODEL_NAME=gpt-4o-mini \
  -e API_KEY=your_key \
  -e ENV_URL=http://localhost:7860 \
  acre
```

---

## 🧪 Inference

Set environment variables:

```bash
export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4o-mini
export API_KEY=your_key
export ENV_URL=http://localhost:7860
```

Run:

```bash
python inference.py
```

Expected output:

```text
Easy: 1.00
Medium: 0.25
Hard: 0.71
Final: 0.65
```

---

## 📌 OpenEnv Compliance

- ✔ `step()` implemented
- ✔ `reset()` implemented
- ✔ `state()` implemented
- ✔ reward shaping
- ✔ deterministic grading
- ✔ structured logs

---

## 🧪 Validation

```bash
python validate.py --url http://localhost:7860
```

Or:

```bash
openenv validate
```

---

## 🌐 Live Demo

👉 Running on Hugging Face Spaces

---

## 📊 Baseline Performance

## Baseline Performance Scores

| Task | Score |
|---|---|
| `rename_variables` | 1.0000 |
| `remove_dead_code` | 0.2500 |
| `full_refactor` | 0.7143 |
| Average | 0.6548 |
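
The reported average is consistent with an unweighted mean of the three task scores, which the following check reproduces. The dictionary simply restates the table; treating the aggregate as a plain mean is an inference from the numbers, not a documented rule.

```python
from statistics import mean

# Scores copied from the baseline table above.
baseline_scores = {
    "rename_variables": 1.0000,
    "remove_dead_code": 0.2500,
    "full_refactor": 0.7143,
}

# Unweighted mean across tasks, rounded to four decimal places.
average = round(mean(baseline_scores.values()), 4)

if __name__ == "__main__":
    print(average)
```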

---

## πŸ† Use Cases

- AI-powered code optimization
- Automated refactoring tools
- Reinforcement learning environments
- Developer productivity systems

---

## 📜 License

MIT License