PRANAV05092003 committed on
Commit 0cca1b6 · 1 Parent(s): e93bbca

Update README file

Files changed (1)
  1. README.md +175 -90
README.md CHANGED
@@ -11,159 +11,231 @@ tags:
  - openenv
  ---

- # ACRE - Autonomous Code Refactoring Environment

- ACRE is an OpenEnv-compatible environment for autonomous Python code refactoring. An agent receives real code-cleanup tasks and must improve the code through AST-based transformations while receiving dense reward feedback for correctness, simplification, and performance.

- ## Environment Overview and Motivation

- This project simulates a realistic developer workflow: cleaning up messy Python code, removing dead logic, simplifying loops, and inlining trivial helpers. The canonical OpenEnv wrapper lives in `openenv_interface.py`, while the original Gymnasium-compatible environment remains available for RL training and demos.

- ## Definitions of Action and Observation Spaces

- ### Action Space - Discrete(5)

- | Action | Name | Description |
- |---|---|---|
- | 0 | rename_variable | Rename generic variables like `x`, `tmp`, and `i` |
- | 1 | remove_dead_code | Remove unreachable statements, `if False` branches, and unused assignments |
- | 2 | simplify_loop | Convert append-loops into list comprehensions |
- | 3 | optimize_condition | Simplify `not not x`, `if True`, `if False`, and boolean comparisons |
- | 4 | inline_function | Inline simple single-return module-level functions |

- ### Observation Space - Box(4,)

- The environment tracks:

- - `code_length`
- - `complexity_score`
- - `runtime_s`
- - `error_flag`

- ### Typed OpenEnv Models

- The submission-facing interface uses Pydantic models in `models.py`:

- - `ObservationModel`
- - `ActionModel`
- - `RewardModel`
- - `StateResponse`

- The canonical interface is:

- ```python
- observation = env.reset(...)
- observation, reward, done, info = env.step(action)
- state = env.state()
- ```

- ## Task Descriptions with Expected Difficulty Levels

  | Task ID | Difficulty | Objective |
- |---|---|---|
- | `rename_variables` | Easy | Remove generic variable names from the snippet |
- | `remove_dead_code` | Medium | Eliminate dead branches, unreachable code, and unused assignments |
- | `full_refactor` | Hard | Combine renaming, dead-code removal, loop simplification, condition optimization, and inlining |

- Each task includes a deterministic AST-based grader returning a score in `[0.0, 1.0]`.

- ## Reward Design

- Rewards are shaped throughout the trajectory instead of only at the end.

- - Success reward for syntactically valid, executable output
- - Complexity reward when control-flow complexity decreases
- - Performance reward when runtime improves
- - Error penalty for invalid or failing code
- - No-change penalty to discourage loops and unproductive actions

- Raw reward range is `[-32, 20]`, normalized to `[0.0, 1.0]` with `(raw + 32) / 52`.

- ## HTTP API

- | Method | Path | Purpose |
  |---|---|---|
  | GET | `/` | Health check |
- | GET | `/health` | Compatibility health check |
- | POST | `/reset` | Reset environment and return typed observation/state |
- | POST | `/step` | Apply one action and return typed observation/reward/done |
- | GET | `/state` | Return the current typed state |
- | GET | `/tasks` | List available tasks |
- | POST | `/tasks/{task_id}/grade` | Grade submitted code |

- ## Setup and Usage Instructions

- ### Local setup

  ```bash
  pip install -r requirements.txt
  python server.py
  ```

- ### Baseline inference

- Set environment variables before running:

  ```bash
  export API_BASE_URL=https://api.openai.com/v1
  export MODEL_NAME=gpt-4o-mini
  export HF_TOKEN=your_key
  export ENV_URL=http://localhost:7860
- python inference.py
  ```

- Notes:

- - `API_BASE_URL` and `MODEL_NAME` have defaults in `inference.py`
- - `HF_TOKEN` is optional because the script falls back to a deterministic heuristic baseline
- - `LOCAL_IMAGE_NAME` is read for evaluator compatibility when using a local Docker image launcher

- ### Docker / Hugging Face Spaces

- ```bash
- docker build -t acre .
- docker run -p 7860:7860 \
- -e API_BASE_URL=https://api.openai.com/v1 \
- -e MODEL_NAME=gpt-4o-mini \
- -e HF_TOKEN=your_key \
- -e ENV_URL=http://localhost:7860 \
- acre
  ```

- The repository is configured for a Docker-based Hugging Face Space and includes the `openenv` tag in the front matter.

- ## Validation

- Run the repository validator:

  ```bash
  python validate.py --url http://localhost:7860
  ```

- When using the official hackathon tooling, also run:

  ```bash
  openenv validate
  ```

- ## Interactive Demo
-
- Start the server and open:
-
- ```text
- http://localhost:7860/demo
- ```

- The demo shows:

- - Original code
- - Optimized code
- - Unified diff
- - Per-step action and reward logs

- ## Baseline Performance Scores

- The deterministic fallback policy used by `inference.py` produces the following reproducible task scores:

  | Task | Score |
  |---|---|
@@ -172,4 +244,17 @@ The deterministic fallback policy used by `inference.py` produces the following
  | `full_refactor` | 0.7143 |
  | Average | 0.6548 |

- These scores come from the built-in heuristic policy with `HF_TOKEN` unset, which keeps the baseline reproducible across runs.
  - openenv
  ---

+ # 🚀 ACRE - Autonomous Code Refactoring Environment

+ > OpenEnv-powered AI system for real-world code optimization, refactoring, and evaluation.

+ ![Status](https://img.shields.io/badge/Status-Running-success)
+ ![OpenEnv](https://img.shields.io/badge/OpenEnv-Compatible-blue)
+ ![Docker](https://img.shields.io/badge/Docker-Ready-green)

+ ---

+ ## 🔥 Overview

+ ACRE is an OpenEnv-compliant environment designed to simulate real-world software engineering workflows such as code cleanup, optimization, and refactoring using AI agents.

+ It enables agents to iteratively improve code through structured actions while receiving dense, step-wise reward feedback.

+ ---

+ ## 💡 Why This Matters

+ Modern software systems require automated code optimization and intelligent tooling.

+ ACRE enables:
+ - 🤖 AI coding assistants
+ - 🔍 Automated code review systems
+ - ⚡ Reinforcement learning-based optimization agents
+ - 🧠 Learning real developer workflows

+ ---

+ ## 🔄 How It Works

+ Code → Action → Refactor → Reward → Repeat

+ 1. Load messy code
+ 2. Apply transformation
+ 3. Evaluate using grader
+ 4. Compute reward
+ 5. Iterate until optimal
+
+ ---
+
+ ## 🧠 Key Features
+
+ - ✅ Autonomous code refactoring
+ - ⚡ Step-wise reward feedback
+ - 🧪 OpenEnv compliant interface
+ - 📊 Deterministic grading system
+ - 🔁 Reproducible inference pipeline
+ - 🐳 Fully containerized (Docker + Hugging Face Spaces)

+ ---
+
+ ## 📂 Tasks

  | Task ID | Difficulty | Objective |
+ |--------|----------|----------|
+ | `rename_variables` | Easy | Replace generic variable names |
+ | `remove_dead_code` | Medium | Remove unreachable logic |
+ | `full_refactor` | Hard | Combine multiple optimizations |
+
+ Each task uses AST-based transformations and deterministic grading.
+
+ ---
+
+ ## 🎯 Reward System
+
+ Rewards are computed at every step:
+
+ - ✅ Valid executable code → positive reward
+ - 📉 Reduced complexity → reward
+ - ⚡ Improved performance → reward
+ - ❌ Errors or invalid code → penalty
+ - 🔁 No progress → penalty

+ **Normalization:**

+ `(raw_reward + 32) / 52 → [0, 1]`
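
The normalization formula can be sketched in Python as below. This is an illustration only, not code from the repository: the name `normalize_reward` and the clipping step are assumptions, while the raw bounds of -32 and 20 are the ones stated in the previous README revision.

```python
RAW_MIN = -32.0  # lowest raw reward, per the previous README revision
RAW_MAX = 20.0   # highest raw reward, per the previous README revision


def normalize_reward(raw: float) -> float:
    """Map a raw reward onto [0, 1] using (raw + 32) / 52."""
    # Clipping out-of-range values is an assumption, not documented behavior.
    clipped = min(max(raw, RAW_MIN), RAW_MAX)
    return (clipped - RAW_MIN) / (RAW_MAX - RAW_MIN)


print(normalize_reward(-32))  # 0.0
print(normalize_reward(-6))   # 0.5
print(normalize_reward(20))   # 1.0
```

With these bounds, a raw reward of -6 sits exactly halfway through the range, which is why it maps to 0.5.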

+ ---
+
+ ## 📊 Example Execution
+
+ ```text
+ START rename_variables
+ STEP 0
+ END 1.00
+
+ START remove_dead_code
+ STEP 1
+ END 0.25
+
+ START full_refactor
+ STEP 3
+ END 0.71
+
+ Final Score: 0.65
+ ```
+
+ ---
+
+ ## 🏗️ Architecture
+
+ - `server.py` → FastAPI entry point
+ - `openenv_interface.py` → OpenEnv wrapper
+ - `acre/env/` → Core environment logic
+ - `acre/tasks/` → Task definitions
+ - `acre/utils/` → Metrics and helpers
+ - `inference.py` → Evaluation pipeline

+ ---
+
+ ## ⚙️ OpenEnv Interface
+
+ ```python
+ observation = env.reset()
+ observation, reward, done, info = env.step(action)
+ state = env.state()
+ ```
+
+ Uses Pydantic models:

+ - `ObservationModel`
+ - `ActionModel`
+ - `RewardModel`
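
To show how the `reset()` / `step()` / `state()` calls fit together in an episode loop, here is a small self-contained sketch. `StubEnv`, `run_episode`, the constant reward of 0.5, and the dictionary observations are all hypothetical stand-ins; the real environment returns Pydantic models and its own rewards.

```python
from typing import Any, Callable, Dict, Tuple

Observation = Dict[str, Any]


class StubEnv:
    """Hypothetical stand-in that only mimics the call shapes of the interface."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> Observation:
        self.steps = 0
        return {"code_length": 120, "error_flag": 0}

    def step(self, action: int) -> Tuple[Observation, float, bool, Dict[str, Any]]:
        self.steps += 1
        observation = {"code_length": 120 - 10 * self.steps, "error_flag": 0}
        reward = 0.5  # placeholder value, not the real reward function
        done = self.steps >= self.max_steps
        return observation, reward, done, {"action": action}

    def state(self) -> Dict[str, Any]:
        return {"steps": self.steps}


def run_episode(env: StubEnv, policy: Callable[[Observation], int]) -> float:
    """Reset the environment, then step until it reports done."""
    observation = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total += reward
    return total


env = StubEnv()
total = run_episode(env, policy=lambda obs: 1)  # always pick action 1
print(total)        # 1.5
print(env.state())  # {'steps': 3}
```

The policy here is a constant function; an agent would instead choose the action from the observation it receives.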

+ ---

+ ## 🌐 HTTP API
+
+ | Method | Endpoint | Description |
  |---|---|---|
  | GET | `/` | Health check |
+ | GET | `/health` | Compatibility check |
+ | POST | `/reset` | Reset environment |
+ | POST | `/step` | Execute action |
+ | GET | `/state` | Get state |
+ | GET | `/tasks` | List tasks |
+ | POST | `/tasks/{task_id}/grade` | Grade code |
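
A client for the endpoints in this table might look roughly like the sketch below, using only the Python standard library. The request bodies are assumptions: the README does not document the JSON schema, so the `task_id` and `action` field names and the helpers `build_request` and `call` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Optional

ENV_URL = "http://localhost:7860"  # same value as ENV_URL in the README


def build_request(method: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a JSON request for one endpoint without sending it."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(ENV_URL + path, data=data, headers=headers, method=method)


def call(method: str, path: str, payload: Optional[dict] = None) -> Any:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(method, path, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Field names below are assumed for illustration, not taken from the repository.
reset_req = build_request("POST", "/reset", {"task_id": "rename_variables"})
step_req = build_request("POST", "/step", {"action": 0})
state_req = build_request("GET", "/state")

print(reset_req.get_method(), reset_req.full_url)
```

With the server started via `python server.py`, `call("GET", "/tasks")` would fetch the task list; the exact response shape depends on the server's models.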

+ ---

+ ## 🚀 Run Locally

  ```bash
  pip install -r requirements.txt
  python server.py
  ```

+ ---
+
+ ## 🐳 Docker / Hugging Face Spaces
+
+ ```bash
+ docker build -t acre .
+ docker run -p 7860:7860 \
+ -e API_BASE_URL=https://api.openai.com/v1 \
+ -e MODEL_NAME=gpt-4o-mini \
+ -e HF_TOKEN=your_key \
+ -e ENV_URL=http://localhost:7860 \
+ acre
+ ```
+
+ ---
+
+ ## 🧪 Inference

+ Set environment variables:

  ```bash
  export API_BASE_URL=https://api.openai.com/v1
  export MODEL_NAME=gpt-4o-mini
  export HF_TOKEN=your_key
  export ENV_URL=http://localhost:7860
  ```

+ Run:

+ ```bash
+ python inference.py
+ ```

+ Expected output:

+ ```text
+ Easy: 1.00
+ Medium: 0.25
+ Hard: 0.71
+ Final: 0.65
  ```

+ ---
+
+ ## 📌 OpenEnv Compliance
+
+ - ✔ `step()` implemented
+ - ✔ `reset()` implemented
+ - ✔ `state()` implemented
+ - ✔ reward shaping
+ - ✔ deterministic grading
+ - ✔ structured logs

+ ---

+ ## 🧪 Validation

  ```bash
  python validate.py --url http://localhost:7860
  ```

+ Or:

  ```bash
  openenv validate
  ```

+ ---

+ ## 🌐 Live Demo

+ 👉 Running on Hugging Face Spaces

+ ---

+ ## 📊 Baseline Performance

  | Task | Score |
  |---|---|

  | `full_refactor` | 0.7143 |
  | Average | 0.6548 |

+ ---
+
+ ## 🏆 Use Cases
+
+ - AI-powered code optimization
+ - Automated refactoring tools
+ - Reinforcement learning environments
+ - Developer productivity systems
+
+ ---
+
+ ## 📜 License
+
+ MIT License