Initial release: OpenEnv Flow Debugger
- README.md +130 -0
- demo.py +41 -0
- flow_debugger_env/__init__.py +0 -0
- flow_debugger_env/data/cases.json +152 -0
- flow_debugger_env/env.py +117 -0
README.md
ADDED
@@ -0,0 +1,130 @@
# OpenEnv Flow Debugger (Just a Simple Version for Now!)

This project is a small, easy-to-use debugging environment built with OpenEnv. It's inspired by the tricky real-world problems we hit in tools like Power Automate.

Our environment focuses on one very common issue: the annoying '400 BadRequest' errors that pop up when a condition in your automation flow has a syntax mistake.

The main idea here isn't to build a perfect smart agent right away. Instead, we want to create a clear, realistic, and expandable way to test and improve how agents fix bugs.

---

## What You Need to Do

Imagine you have a Power Automate Flow that just failed.

It failed because of an "HTTP 400 BadRequest" error.
This error happened in a "Condition" step.
And the condition expression has a tiny syntax error.

Your job as the agent is to fix that broken condition expression so the flow can run successfully.

Each time you play (each "episode"), it's like facing a real-life debugging puzzle that automation engineers deal with all the time.

---

## What You See (Observation Space)

At each step, you get an observation as a JSON-like dictionary. It includes:

- `case_id`: A unique ID for this specific problem.
- `run_status`: Whether the flow is still 'Failed' or has 'Succeeded'.
- `failed_step`: Which step caused the problem.
- `error`: Details about the error, like the code and a message.
- `steps`: A list of all the steps in the flow, showing their inputs and outputs.
- `attempts_left`: How many more tries you have to fix it.

**Example observation (kept simple):**

```
case_id: CASE_001
run_status: Failed
failed_step: Condition_Check
error: code=400, message=BadRequest, details=InvalidTemplate: The expression is invalid
steps:
- Compose_Ext (Succeeded, outputs: xlsx)
- Condition_Check (Failed, expression: @equals(outputs('Compose_Ext'),'xlsx')
attempts_left: 3
```
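
Here is a rough sketch of how an agent might pull the broken expression out of an observation. It assumes the observation arrives as a Python dict shaped like the entries in `flow_debugger_env/data/cases.json`. The helper name `failing_expression` and the hand-written `sample_obs` are made up for this example and are not part of the environment.

```python
from typing import Any, Dict


def failing_expression(obs: Dict[str, Any]) -> str:
    """Return the expression of the step named by obs['failed_step']."""
    for step in obs["steps"]:
        if step["name"] == obs["failed_step"]:
            return step["inputs"]["expression"]
    raise KeyError(f"step {obs['failed_step']!r} not found in observation")


# Hand-written observation that mirrors CASE_001 (missing closing parenthesis).
sample_obs = {
    "case_id": "CASE_001",
    "run_status": "Failed",
    "failed_step": "Condition_Check",
    "error": {
        "code": 400,
        "message": "BadRequest",
        "details": "InvalidTemplate: The expression is invalid (missing closing parenthesis).",
    },
    "steps": [
        {"name": "Compose_Ext", "type": "compose", "status": "Succeeded", "outputs": "xlsx"},
        {
            "name": "Condition_Check",
            "type": "condition",
            "status": "Failed",
            "inputs": {"expression": "@equals(outputs('Compose_Ext'),'xlsx'"},
        },
    ],
    "attempts_left": 3,
}

broken = failing_expression(sample_obs)
print(broken)
# A cheap first check: do the opening and closing parentheses balance?
print("unbalanced parentheses:", broken.count("(") != broken.count(")"))
```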

---

## What You Can Do (Action Space - Just Starting!)

Right now, in this simple version, there is only one type of action.

You can submit a `patch_step` action. This action targets the `Condition_Check` step and updates its `inputs.expression` field.

**Example action:**

```
action = patch_step
step = Condition_Check
field = inputs.expression
value = @equals(outputs('Compose_Ext'),'xlsx')
```

For now, your fix needs to be an *exact* string match with the expected expression for it to count as correct.
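
To make the exact-match rule concrete, here is a small sketch. It uses the dict shape that `demo.py` sends for a `patch_step` action. The helper `make_patch` is a hypothetical convenience, and `gold` is hard-coded here for illustration only (a real agent never sees the expected value).

```python
from typing import Dict


def make_patch(value: str) -> Dict[str, str]:
    """Build a patch_step action for the Condition_Check expression."""
    return {
        "action": "patch_step",
        "step": "Condition_Check",
        "field": "inputs.expression",
        "value": value,
    }


# Stand-in for the hidden expected expression, shown here only for illustration.
gold = "@equals(outputs('Compose_Ext'),'xlsx')"

exact = make_patch("@equals(outputs('Compose_Ext'),'xlsx')")
spaced = make_patch("@equals(outputs('Compose_Ext'), 'xlsx')")  # extra space after the comma

print(exact["value"] == gold)   # True: character-for-character match
print(spaced["value"] == gold)  # False: plain string comparison treats the space as a difference
```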

---

## How You Get Graded (Reward Function)

Our scoring system is pretty straightforward:

- **+1.0** if you successfully fix the flow.
- **-0.1** for trying an incorrect fix (but you still have tries left).
- **-0.2** if you run out of tries without fixing it.

The game (episode) ends when the flow is fixed, or when you run out of chances. Invalid actions (an unsupported action type, or an unknown step or field) use up a try as well.
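
As a worked example of the scoring above, the sketch below adds up the rewards for a few episodes. It assumes `max_attempts=3` (the value `demo.py` uses) and that every miss is a valid but wrong patch. The function name `episode_rewards` and the constants are illustrative, not part of the environment.

```python
from typing import List

SUCCESS, WRONG_FIX, OUT_OF_TRIES = 1.0, -0.1, -0.2


def episode_rewards(solved_on: int, max_attempts: int = 3) -> List[float]:
    """Per-step rewards when the fix lands on attempt `solved_on` (1-based).

    Pass solved_on=0 for an episode that never finds the fix.
    """
    if 1 <= solved_on <= max_attempts:
        return [WRONG_FIX] * (solved_on - 1) + [SUCCESS]
    # Never solved: every attempt but the last costs -0.1, the last one costs -0.2.
    return [WRONG_FIX] * (max_attempts - 1) + [OUT_OF_TRIES]


for solved_on in (1, 2, 3, 0):
    rewards = episode_rewards(solved_on)
    # round() only tidies floating-point noise for display.
    print(solved_on, rewards, round(sum(rewards), 2))
```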

---

## The Problems (Dataset)

The specific bugs we're trying to fix are stored in a JSON file here:

`flow_debugger_env/data/cases.json`

Each problem includes the broken flow state, error details, and a hidden 'gold_fix' (the right answer) that the environment uses to check your work. You, the agent, never see this 'gold_fix'.
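
If you add your own cases, a quick shape check can catch mistakes early. The sketch below is one possible way to do it. The key names come from `cases.json`, but the helper `validate_case` and its messages are hypothetical and not something the environment ships.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("case_id", "failed_step", "error", "steps", "gold_fix")


def validate_case(case: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one case (empty list means it looks fine)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in case]
    if problems:
        return problems
    step_names = {step["name"] for step in case["steps"]}
    if case["failed_step"] not in step_names:
        problems.append("failed_step does not name a step")
    if case["gold_fix"]["step"] not in step_names:
        problems.append("gold_fix.step does not name a step")
    return problems


# Inline copy of CASE_003 (missing quotes around the string literal).
sample_case = {
    "case_id": "CASE_003",
    "failed_step": "Condition_Check",
    "error": {
        "code": 400,
        "message": "BadRequest",
        "details": "InvalidTemplate: The expression is invalid (missing quotes around string literal).",
    },
    "steps": [
        {"name": "Compose_Ext", "type": "compose", "status": "Succeeded", "outputs": "xlsx"},
        {
            "name": "Condition_Check",
            "type": "condition",
            "status": "Failed",
            "inputs": {"expression": "@equals(outputs('Compose_Ext'),xlsx)"},
        },
    ],
    "gold_fix": {
        "step": "Condition_Check",
        "field": "inputs.expression",
        "value": "@equals(outputs('Compose_Ext'),'xlsx')",
    },
}

print(validate_case(sample_case))
```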

---

## How to Run the Example

Just run the `demo.py` file from the main project folder like this:

`python demo.py`

The demo picks a bug from the dataset (the seed is fixed at 42, so the pick is repeatable), uses a basic rule-based agent to try to fix the condition expression, and then shows you how it went.

---

## What This Can't Do Yet (Limitations)

This simple version is kept small on purpose:

- It only deals with syntax errors in Condition expressions.
- It doesn't actually run real Power Automate flows.
- It doesn't connect to any outside services or APIs.
- It's not doing fancy AI learning (like reinforcement learning) yet.

Keeping things simple means it's fast, predictable, and easy for us to build on later.

---

## What's Next?

We could add more cool stuff later, like:

- Figuring out errors in 'filter array' settings.
- Dealing with 'null' values or wrong data types.
- Fixing multiple steps at once.
- Using smarter, AI-powered agents.
- Training AI using special tools like TRL or Unsloth.
- Adding 'Green Agent' wrappers.

---

## Why We Made This

Debugging Power Automate flows is a real headache for a lot of people. This environment turns those everyday automation failures into a structured task for agents and a useful testbed for learning and experimenting with OpenEnv.
demo.py
ADDED
@@ -0,0 +1,41 @@
import re
from flow_debugger_env.env import FlowDebugEnv

def rule_based_agent(obs):
    condition_step = next(s for s in obs["steps"] if s["name"] == "Condition_Check")
    expr = condition_step["inputs"]["expression"]

    fixed = expr
    fixed = fixed.replace("@equal(", "@equals(")  # typo in the function name
    fixed = re.sub(r",\s*xlsx\s*\)", r",'xlsx')", fixed)  # unquoted string literal
    fixed = re.sub(r"\)\s*'xlsx'\s*\)", r"),'xlsx')", fixed)  # missing comma between args

    if fixed.count("(") > fixed.count(")"):  # missing closing parenthesis
        fixed = fixed + (")" * (fixed.count("(") - fixed.count(")")))

    while fixed.endswith("))") and fixed.count(")") > fixed.count("("):  # extra closing parenthesis
        fixed = fixed[:-1]

    return {
        "action": "patch_step",
        "step": "Condition_Check",
        "field": "inputs.expression",
        "value": fixed
    }

def main():
    env = FlowDebugEnv.from_json("flow_debugger_env/data/cases.json", max_attempts=3, seed=42)
    obs = env.reset()
    done = False
    total = 0.0

    while not done:
        action = rule_based_agent(obs)
        result = env.step(action)
        obs, reward, done, info = result.obs, result.reward, result.done, result.info
        total += reward

    print("Finished:", info, "total_reward:", total)

if __name__ == "__main__":
    main()
flow_debugger_env/__init__.py
ADDED
File without changes
flow_debugger_env/data/cases.json
ADDED
@@ -0,0 +1,152 @@
[
  {
    "case_id": "CASE_001",
    "failed_step": "Condition_Check",
    "error": {
      "code": 400,
      "message": "BadRequest",
      "details": "InvalidTemplate: The expression is invalid (missing closing parenthesis)."
    },
    "steps": [
      {
        "name": "Compose_Ext",
        "type": "compose",
        "status": "Succeeded",
        "outputs": "xlsx"
      },
      {
        "name": "Condition_Check",
        "type": "condition",
        "status": "Failed",
        "inputs": {
          "expression": "@equals(outputs('Compose_Ext'),'xlsx'"
        }
      }
    ],
    "gold_fix": {
      "step": "Condition_Check",
      "field": "inputs.expression",
      "value": "@equals(outputs('Compose_Ext'),'xlsx')"
    }
  },
  {
    "case_id": "CASE_002",
    "failed_step": "Condition_Check",
    "error": {
      "code": 400,
      "message": "BadRequest",
      "details": "InvalidTemplate: The expression is invalid (extra closing parenthesis)."
    },
    "steps": [
      {
        "name": "Compose_Ext",
        "type": "compose",
        "status": "Succeeded",
        "outputs": "xlsx"
      },
      {
        "name": "Condition_Check",
        "type": "condition",
        "status": "Failed",
        "inputs": {
          "expression": "@equals(outputs('Compose_Ext'),'xlsx'))"
        }
      }
    ],
    "gold_fix": {
      "step": "Condition_Check",
      "field": "inputs.expression",
      "value": "@equals(outputs('Compose_Ext'),'xlsx')"
    }
  },
  {
    "case_id": "CASE_003",
    "failed_step": "Condition_Check",
    "error": {
      "code": 400,
      "message": "BadRequest",
      "details": "InvalidTemplate: The expression is invalid (missing quotes around string literal)."
    },
    "steps": [
      {
        "name": "Compose_Ext",
        "type": "compose",
        "status": "Succeeded",
        "outputs": "xlsx"
      },
      {
        "name": "Condition_Check",
        "type": "condition",
        "status": "Failed",
        "inputs": {
          "expression": "@equals(outputs('Compose_Ext'),xlsx)"
        }
      }
    ],
    "gold_fix": {
      "step": "Condition_Check",
      "field": "inputs.expression",
      "value": "@equals(outputs('Compose_Ext'),'xlsx')"
    }
  },
  {
    "case_id": "CASE_004",
    "failed_step": "Condition_Check",
    "error": {
      "code": 400,
      "message": "BadRequest",
      "details": "InvalidTemplate: Unknown function 'equal' (typo)."
    },
    "steps": [
      {
        "name": "Compose_Ext",
        "type": "compose",
        "status": "Succeeded",
        "outputs": "xlsx"
      },
      {
        "name": "Condition_Check",
        "type": "condition",
        "status": "Failed",
        "inputs": {
          "expression": "@equal(outputs('Compose_Ext'),'xlsx')"
        }
      }
    ],
    "gold_fix": {
      "step": "Condition_Check",
      "field": "inputs.expression",
      "value": "@equals(outputs('Compose_Ext'),'xlsx')"
    }
  },
  {
    "case_id": "CASE_005",
    "failed_step": "Condition_Check",
    "error": {
      "code": 400,
      "message": "BadRequest",
      "details": "InvalidTemplate: The expression is invalid (missing comma between args)."
    },
    "steps": [
      {
        "name": "Compose_Ext",
        "type": "compose",
        "status": "Succeeded",
        "outputs": "xlsx"
      },
      {
        "name": "Condition_Check",
        "type": "condition",
        "status": "Failed",
        "inputs": {
          "expression": "@equals(outputs('Compose_Ext') 'xlsx')"
        }
      }
    ],
    "gold_fix": {
      "step": "Condition_Check",
      "field": "inputs.expression",
      "value": "@equals(outputs('Compose_Ext'),'xlsx')"
    }
  }
]
flow_debugger_env/env.py
ADDED
@@ -0,0 +1,117 @@
import copy
import json
import random
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Optional, List


@dataclass
class StepResult:
    obs: Dict[str, Any]
    reward: float
    done: bool
    info: Dict[str, Any]


class FlowDebugEnv:
    """
    This is a simple environment made for OpenEnv.
    Here's what it does:
    - It gives you information in text or JSON.
    - You can only do one thing: fix the 'inputs.expression' in a 'Condition_Check' step.
    - If your fix is exactly right, you win!
    """
    def __init__(self, cases: List[Dict[str, Any]], max_attempts: int = 3, seed: Optional[int] = None):
        self.cases = cases
        self.max_attempts = max_attempts
        self.rng = random.Random(seed)  # seeded RNG so case selection is repeatable
        self.current_case: Optional[Dict[str, Any]] = None
        self.attempts_left = max_attempts

    @classmethod
    def from_json(cls, cases_json_path: str, max_attempts: int = 3, seed: Optional[int] = None):
        path = Path(cases_json_path)
        with open(path, "r", encoding="utf-8") as f:
            cases = json.load(f)
        return cls(cases=cases, max_attempts=max_attempts, seed=seed)

    def reset(self) -> Dict[str, Any]:
        self.current_case = copy.deepcopy(self.rng.choice(self.cases))  # copy so patches never touch self.cases
        self.attempts_left = self.max_attempts
        return self._make_observation()

    def step(self, action: Dict[str, Any]) -> StepResult:
        if self.current_case is None:
            raise RuntimeError("Call reset() before step().")

        self.attempts_left -= 1  # every call costs an attempt, including invalid actions

        if action.get("action") != "patch_step":
            return self._invalid_action("Unsupported action type")

        step_name = action.get("step")
        field = action.get("field")
        value = action.get("value")

        patched_ok = self._apply_patch(step_name, field, value)
        if not patched_ok:
            return self._invalid_action("Patch failed (step/field not found)")

        gold = self.current_case["gold_fix"]
        solved = (step_name == gold["step"] and field == gold["field"] and value == gold["value"])

        if solved:
            self._mark_success()
            obs = self._make_observation(run_status="Succeeded", error=None, failed_step=None)
            return StepResult(obs=obs, reward=1.0, done=True,
                              info={"result": "success", "case_id": self.current_case["case_id"]})

        if self.attempts_left <= 0:
            obs = self._make_observation()
            return StepResult(obs=obs, reward=-0.2, done=True,
                              info={"result": "out_of_attempts", "case_id": self.current_case["case_id"]})

        obs = self._make_observation()
        return StepResult(obs=obs, reward=-0.1, done=False,
                          info={"result": "still_failed", "case_id": self.current_case["case_id"]})

    # --------- helpers ----------
    def _apply_patch(self, step_name: str, field: str, value: str) -> bool:
        for step in self.current_case["steps"]:
            if step["name"] == step_name:
                if field == "inputs.expression":
                    step.setdefault("inputs", {})
                    step["inputs"]["expression"] = value
                    return True
        return False

    def _mark_success(self):
        for step in self.current_case["steps"]:
            step["status"] = "Succeeded"

    def _make_observation(self, run_status="Failed", error="keep", failed_step="keep"):
        if error == "keep":  # "keep" is a sentinel: reuse the case's original error
            err_obj = self.current_case["error"]
        else:
            err_obj = error

        if failed_step == "keep":
            failed = self.current_case["failed_step"]
        else:
            failed = failed_step

        return {
            "case_id": self.current_case["case_id"],
            "run_status": run_status,
            "failed_step": failed,
            "error": err_obj,
            "steps": copy.deepcopy(self.current_case["steps"]),  # copy so agents cannot mutate env state
            "attempts_left": self.attempts_left
        }

    def _invalid_action(self, msg: str) -> StepResult:
        obs = self._make_observation()
        done = (self.attempts_left <= 0)
        return StepResult(obs=obs, reward=(-0.2 if done else -0.1), done=done,  # -0.2 on the last try, as the README states
                          info={"result": "invalid_action", "message": msg, "case_id": self.current_case["case_id"]})