Akarztrk committed on
Commit 6aaa5e3 · verified · 1 Parent(s): 9c6bef8

Initial release: OpenEnv Flow Debugger

README.md ADDED
@@ -0,0 +1,130 @@
+ # OpenEnv Flow Debugger (Just a Simple Version for Now!)
+
+ This project is a small, easy-to-use debugging environment built with OpenEnv. It's inspired by the tricky real-world problems we hit in tools like Power Automate.
+
+ Our environment focuses on a super common issue: the '400 BadRequest' error that pops up when a condition in your automation flow has a syntax mistake.
+
+ The main idea here isn't to build a perfect smart agent right away. Instead, we want to create a clear, realistic, and expandable way to test and improve how agents fix bugs.
+
+ ---
+
+ ## What You Need to Do
+
+ Imagine you have a Power Automate flow that just failed:
+
+ - It failed with an "HTTP 400 BadRequest" error.
+ - The error happened in a "Condition" step.
+ - The condition expression has a tiny syntax error.
+
+ Your job as the agent is to fix that broken condition expression so the flow can run successfully.
+
+ Each episode is like facing a real-life debugging puzzle that automation engineers deal with all the time.
+
+ ---
+
+ ## What You See (Observation Space)
+
+ At each step, you get an observation as a JSON-style dictionary. It includes:
+
+ - `case_id`: A unique ID for this specific problem.
+ - `run_status`: Whether the flow is currently 'Failed' or 'Succeeded'.
+ - `failed_step`: The name of the step that caused the failure.
+ - `error`: Details about the error: a `code`, a `message`, and `details`.
+ - `steps`: A list of all the steps in the flow, with their status and their inputs or outputs.
+ - `attempts_left`: How many more tries you have to fix it.
+
+ **Example observation (kept simple):**
+
+ ```
+ case_id: CASE_001
+ run_status: Failed
+ failed_step: Condition_Check
+ error: code=400, message=BadRequest, details=InvalidTemplate: The expression is invalid
+ steps:
+   - Compose_Ext (Succeeded, outputs: xlsx)
+   - Condition_Check (Failed, expression: @equals(outputs('Compose_Ext'),'xlsx')
+ attempts_left: 3
+ ```
+
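As a rough illustration of the fields above, here is the same observation written out as a Python dict, with a small helper that pulls out the broken expression. The nested layout of `error` and `steps` follows `cases.json`; the helper name `failing_expression` is hypothetical and not part of the repo, so treat this as a sketch rather than a spec.

```python
# Sample observation, using the CASE_001 values from cases.json.
obs = {
    "case_id": "CASE_001",
    "run_status": "Failed",
    "failed_step": "Condition_Check",
    "error": {
        "code": 400,
        "message": "BadRequest",
        "details": "InvalidTemplate: The expression is invalid (missing closing parenthesis).",
    },
    "steps": [
        {"name": "Compose_Ext", "type": "compose", "status": "Succeeded", "outputs": "xlsx"},
        {
            "name": "Condition_Check",
            "type": "condition",
            "status": "Failed",
            "inputs": {"expression": "@equals(outputs('Compose_Ext'),'xlsx'"},
        },
    ],
    "attempts_left": 3,
}


def failing_expression(obs):
    """Return the expression of the step named in obs['failed_step'], or None."""
    for step in obs["steps"]:
        if step["name"] == obs["failed_step"]:
            return step.get("inputs", {}).get("expression")
    return None


print(failing_expression(obs))
```

Looking the step up through `failed_step` instead of hard-coding `Condition_Check` is a design choice of this sketch; it would keep working if later cases fail in a different step.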
+ ---
+
+ ## What You Can Do (Action Space - Just Starting!)
+
+ Right now, in this simple version, there is only one type of action.
+
+ You can submit a `patch_step` action. It targets the `Condition_Check` step and replaces its `inputs.expression` field with the value you provide.
+
+ **Example action:**
+
+ ```
+ action = patch_step
+ step = Condition_Check
+ field = inputs.expression
+ value = @equals(outputs('Compose_Ext'),'xlsx')
+ ```
+
+ For now, your fix needs to be an *exact* string match with the expected expression to count as correct.
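To make the exact-match rule concrete, here is a minimal sketch of building a `patch_step` action as a Python dict and comparing it with a gold fix. The helper names `make_patch_action` and `is_exact_fix` are hypothetical; the dict keys and the gold fix values are the ones used in `demo.py` and `cases.json`.

```python
def make_patch_action(expression):
    """Build the one supported action type for the Condition_Check step."""
    return {
        "action": "patch_step",
        "step": "Condition_Check",
        "field": "inputs.expression",
        "value": expression,
    }


def is_exact_fix(action, gold_fix):
    """Sketch of the exact-match check: step, field, and value must all be equal."""
    return all(action.get(key) == gold_fix[key] for key in ("step", "field", "value"))


gold_fix = {
    "step": "Condition_Check",
    "field": "inputs.expression",
    "value": "@equals(outputs('Compose_Ext'),'xlsx')",
}

good = make_patch_action("@equals(outputs('Compose_Ext'),'xlsx')")
spaced = make_patch_action("@equals(outputs('Compose_Ext'), 'xlsx')")  # extra space after the comma

print(is_exact_fix(good, gold_fix))    # True
print(is_exact_fix(spaced, gold_fix))  # False
```

The second call shows the cost of plain string comparison: an expression that differs only by a space after the comma is still counted as wrong.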
+
+ ---
+
+ ## How You Get Graded (Reward Function)
+
+ Our scoring system is pretty straightforward:
+
+ - **+1.0** if you successfully fix the flow.
+ - **-0.1** for an incorrect fix while you still have tries left.
+ - **-0.2** if your last try is also incorrect, so you run out of tries without fixing it.
+
+ The episode ends when the flow is fixed or when you run out of tries.
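The three reward rules can be sketched as one small function. This is an illustration only: `reward_for_attempt` and `episode_return` are hypothetical names, and the sketch assumes the attempt counter has already been decremented for the current try, which is how `env.py` orders things.

```python
import math


def reward_for_attempt(solved, attempts_left_after):
    """Reward for one try, given the attempts remaining after that try."""
    if solved:
        return 1.0
    if attempts_left_after <= 0:
        return -0.2  # last try used up without a fix
    return -0.1      # wrong fix, but tries remain


def episode_return(outcomes, max_attempts=3):
    """Sum rewards over a list of booleans (True = that try fixed the flow)."""
    total = 0.0
    attempts_left = max_attempts
    for solved in outcomes:
        attempts_left -= 1
        total += reward_for_attempt(solved, attempts_left)
        if solved or attempts_left <= 0:
            break  # the episode is over
    return total


# Two misses, then a fix on the last try: -0.1 - 0.1 + 1.0
print(math.isclose(episode_return([False, False, True]), 0.8))
# Three misses: -0.1 - 0.1 - 0.2
print(math.isclose(episode_return([False, False, False]), -0.4))
```

With `max_attempts=3`, the best possible return is 1.0 (fixed on the first try) and the worst is about -0.4.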
+
+ ---
+
+ ## The Problems (Dataset)
+
+ The bugs we're trying to fix are stored in a single JSON file:
+
+ `flow_debugger_env/data/cases.json`
+
+ Each case includes the broken flow state, the error details, and a hidden `gold_fix` (the right answer) that the environment uses to check your work. You, the agent, never see this `gold_fix`.
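One way to picture the hidden answer is a helper that drops `gold_fix` before anything reaches the agent. This is a sketch under assumptions: `agent_view` is a hypothetical name, and the real environment builds its observation field by field instead of filtering the case. The sample case is a copy of CASE_003 from `cases.json`.

```python
import json

# One case in the same shape as the entries in cases.json (copied from CASE_003).
case = json.loads("""
{
  "case_id": "CASE_003",
  "failed_step": "Condition_Check",
  "error": {
    "code": 400,
    "message": "BadRequest",
    "details": "InvalidTemplate: The expression is invalid (missing quotes around string literal)."
  },
  "steps": [
    {"name": "Compose_Ext", "type": "compose", "status": "Succeeded", "outputs": "xlsx"},
    {"name": "Condition_Check", "type": "condition", "status": "Failed",
     "inputs": {"expression": "@equals(outputs('Compose_Ext'),xlsx)"}}
  ],
  "gold_fix": {
    "step": "Condition_Check",
    "field": "inputs.expression",
    "value": "@equals(outputs('Compose_Ext'),'xlsx')"
  }
}
""")


def agent_view(case):
    """Return a shallow copy of the case without the hidden answer."""
    return {key: value for key, value in case.items() if key != "gold_fix"}


visible = agent_view(case)
print(sorted(visible))        # keys the agent may see
print("gold_fix" in visible)  # False
```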
+
+ ---
+
+ ## How to Run the Example
+
+ Run the `demo.py` file from the main project folder like this:
+
+ `python demo.py`
+
+ The demo picks a case at random (the seed is fixed at 42, so every run picks the same one), uses a basic rule-based agent to try to fix the condition expression, and then prints the result and the total reward.
+
+ ---
+
+ ## What This Can't Do Yet (Limitations)
+
+ This simple version is kept small on purpose:
+
+ - It only deals with syntax errors in Condition expressions.
+ - It doesn't actually run real Power Automate flows.
+ - It doesn't connect to any outside services or APIs.
+ - It doesn't train anything yet (no reinforcement learning, for example).
+
+ Keeping things simple means it's fast, predictable, and easy for us to build on later.
+
+ ---
+
+ ## What's Next?
+
+ We could add more later, like:
+
+ - Diagnosing errors in 'filter array' settings.
+ - Dealing with 'null' values or wrong data types.
+ - Fixing multiple steps at once.
+ - Using smarter, AI-powered agents.
+ - Training agents with libraries like TRL or Unsloth.
+ - Adding 'Green Agent' wrappers.
+
+ ---
+
+ ## Why We Made This
+
+ Debugging Power Automate flows is a real headache for many people. This environment turns those everyday automation failures into a structured task for agents and a useful testbed for learning and experimenting with OpenEnv.
demo.py ADDED
@@ -0,0 +1,41 @@
+ import re
+ from flow_debugger_env.env import FlowDebugEnv
+
+ def rule_based_agent(obs):
+     condition_step = next(s for s in obs["steps"] if s["name"] == "Condition_Check")
+     expr = condition_step["inputs"]["expression"]
+
+     fixed = expr
+     fixed = fixed.replace("@equal(", "@equals(")  # typo in the function name
+     fixed = re.sub(r",\s*xlsx\s*\)", r",'xlsx')", fixed)  # unquoted string literal
+     fixed = re.sub(r"\)\s*'xlsx'\s*\)", r"),'xlsx')", fixed)  # missing comma between args
+
+     if fixed.count("(") > fixed.count(")"):  # missing closing parentheses
+         fixed = fixed + (")" * (fixed.count("(") - fixed.count(")")))
+
+     while fixed.endswith("))") and fixed.count(")") > fixed.count("("):  # extra closing parentheses
+         fixed = fixed[:-1]
+
+     return {
+         "action": "patch_step",
+         "step": "Condition_Check",
+         "field": "inputs.expression",
+         "value": fixed
+     }
+
+ def main():
+     env = FlowDebugEnv.from_json("flow_debugger_env/data/cases.json", max_attempts=3, seed=42)
+     obs = env.reset()
+     done = False
+     total = 0.0
+
+     while not done:
+         action = rule_based_agent(obs)
+         result = env.step(action)
+         obs, reward, done, info = result.obs, result.reward, result.done, result.info
+         total += reward
+
+     print("Finished:", info, "total_reward:", total)
+
+ if __name__ == "__main__":
+     main()
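The repair rules in `rule_based_agent` can be checked offline, without the environment, by running them over the five broken expressions from `cases.json`. The sketch below copies the rules into a standalone function (`fix_expression` is a hypothetical name) so that it runs on its own.

```python
import re

GOLD = "@equals(outputs('Compose_Ext'),'xlsx')"

# The five broken expressions, copied from cases.json.
BROKEN = {
    "CASE_001": "@equals(outputs('Compose_Ext'),'xlsx'",     # missing closing parenthesis
    "CASE_002": "@equals(outputs('Compose_Ext'),'xlsx'))",   # extra closing parenthesis
    "CASE_003": "@equals(outputs('Compose_Ext'),xlsx)",      # missing quotes
    "CASE_004": "@equal(outputs('Compose_Ext'),'xlsx')",     # function name typo
    "CASE_005": "@equals(outputs('Compose_Ext') 'xlsx')",    # missing comma
}


def fix_expression(expr):
    """Apply the same rewrite rules as rule_based_agent in demo.py."""
    fixed = expr.replace("@equal(", "@equals(")
    fixed = re.sub(r",\s*xlsx\s*\)", r",'xlsx')", fixed)
    fixed = re.sub(r"\)\s*'xlsx'\s*\)", r"),'xlsx')", fixed)
    opens, closes = fixed.count("("), fixed.count(")")
    if opens > closes:
        fixed += ")" * (opens - closes)
    while fixed.endswith("))") and fixed.count(")") > fixed.count("("):
        fixed = fixed[:-1]
    return fixed


for case_id, expr in BROKEN.items():
    print(case_id, fix_expression(expr) == GOLD)
```

Every case should print `True`, which means the demo agent solves each bundled case on its first try. The rules are tied to the literal `xlsx`, so they would need generalizing for any new cases.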
flow_debugger_env/__init__.py ADDED
File without changes
flow_debugger_env/data/cases.json ADDED
@@ -0,0 +1,152 @@
+ [
+   {
+     "case_id": "CASE_001",
+     "failed_step": "Condition_Check",
+     "error": {
+       "code": 400,
+       "message": "BadRequest",
+       "details": "InvalidTemplate: The expression is invalid (missing closing parenthesis)."
+     },
+     "steps": [
+       {
+         "name": "Compose_Ext",
+         "type": "compose",
+         "status": "Succeeded",
+         "outputs": "xlsx"
+       },
+       {
+         "name": "Condition_Check",
+         "type": "condition",
+         "status": "Failed",
+         "inputs": {
+           "expression": "@equals(outputs('Compose_Ext'),'xlsx'"
+         }
+       }
+     ],
+     "gold_fix": {
+       "step": "Condition_Check",
+       "field": "inputs.expression",
+       "value": "@equals(outputs('Compose_Ext'),'xlsx')"
+     }
+   },
+   {
+     "case_id": "CASE_002",
+     "failed_step": "Condition_Check",
+     "error": {
+       "code": 400,
+       "message": "BadRequest",
+       "details": "InvalidTemplate: The expression is invalid (extra closing parenthesis)."
+     },
+     "steps": [
+       {
+         "name": "Compose_Ext",
+         "type": "compose",
+         "status": "Succeeded",
+         "outputs": "xlsx"
+       },
+       {
+         "name": "Condition_Check",
+         "type": "condition",
+         "status": "Failed",
+         "inputs": {
+           "expression": "@equals(outputs('Compose_Ext'),'xlsx'))"
+         }
+       }
+     ],
+     "gold_fix": {
+       "step": "Condition_Check",
+       "field": "inputs.expression",
+       "value": "@equals(outputs('Compose_Ext'),'xlsx')"
+     }
+   },
+   {
+     "case_id": "CASE_003",
+     "failed_step": "Condition_Check",
+     "error": {
+       "code": 400,
+       "message": "BadRequest",
+       "details": "InvalidTemplate: The expression is invalid (missing quotes around string literal)."
+     },
+     "steps": [
+       {
+         "name": "Compose_Ext",
+         "type": "compose",
+         "status": "Succeeded",
+         "outputs": "xlsx"
+       },
+       {
+         "name": "Condition_Check",
+         "type": "condition",
+         "status": "Failed",
+         "inputs": {
+           "expression": "@equals(outputs('Compose_Ext'),xlsx)"
+         }
+       }
+     ],
+     "gold_fix": {
+       "step": "Condition_Check",
+       "field": "inputs.expression",
+       "value": "@equals(outputs('Compose_Ext'),'xlsx')"
+     }
+   },
+   {
+     "case_id": "CASE_004",
+     "failed_step": "Condition_Check",
+     "error": {
+       "code": 400,
+       "message": "BadRequest",
+       "details": "InvalidTemplate: Unknown function 'equal' (typo)."
+     },
+     "steps": [
+       {
+         "name": "Compose_Ext",
+         "type": "compose",
+         "status": "Succeeded",
+         "outputs": "xlsx"
+       },
+       {
+         "name": "Condition_Check",
+         "type": "condition",
+         "status": "Failed",
+         "inputs": {
+           "expression": "@equal(outputs('Compose_Ext'),'xlsx')"
+         }
+       }
+     ],
+     "gold_fix": {
+       "step": "Condition_Check",
+       "field": "inputs.expression",
+       "value": "@equals(outputs('Compose_Ext'),'xlsx')"
+     }
+   },
+   {
+     "case_id": "CASE_005",
+     "failed_step": "Condition_Check",
+     "error": {
+       "code": 400,
+       "message": "BadRequest",
+       "details": "InvalidTemplate: The expression is invalid (missing comma between args)."
+     },
+     "steps": [
+       {
+         "name": "Compose_Ext",
+         "type": "compose",
+         "status": "Succeeded",
+         "outputs": "xlsx"
+       },
+       {
+         "name": "Condition_Check",
+         "type": "condition",
+         "status": "Failed",
+         "inputs": {
+           "expression": "@equals(outputs('Compose_Ext') 'xlsx')"
+         }
+       }
+     ],
+     "gold_fix": {
+       "step": "Condition_Check",
+       "field": "inputs.expression",
+       "value": "@equals(outputs('Compose_Ext'),'xlsx')"
+     }
+   }
+ ]
flow_debugger_env/env.py ADDED
@@ -0,0 +1,117 @@
+ import copy
+ import json
+ import random
+ from dataclasses import dataclass
+ from pathlib import Path
+ from typing import Any, Dict, Optional, List
+
+
+ @dataclass
+ class StepResult:
+     obs: Dict[str, Any]
+     reward: float
+     done: bool
+     info: Dict[str, Any]
+
+
+ class FlowDebugEnv:
+     """
+     This is a simple environment made for OpenEnv.
+     Here's what it does:
+     - It gives you observations as plain, JSON-style dicts.
+     - You can only do one thing: patch the 'inputs.expression' of the 'Condition_Check' step.
+     - If your fix matches the gold fix exactly, you win!
+     """
+     def __init__(self, cases: List[Dict[str, Any]], max_attempts: int = 3, seed: Optional[int] = None):
+         self.cases = cases
+         self.max_attempts = max_attempts
+         self.rng = random.Random(seed)
+         self.current_case: Optional[Dict[str, Any]] = None
+         self.attempts_left = max_attempts
+
+     @classmethod
+     def from_json(cls, cases_json_path: str, max_attempts: int = 3, seed: Optional[int] = None):
+         path = Path(cases_json_path)
+         with open(path, "r", encoding="utf-8") as f:
+             cases = json.load(f)
+         return cls(cases=cases, max_attempts=max_attempts, seed=seed)
+
+     def reset(self) -> Dict[str, Any]:
+         self.current_case = copy.deepcopy(self.rng.choice(self.cases))
+         self.attempts_left = self.max_attempts
+         return self._make_observation()
+
+     def step(self, action: Dict[str, Any]) -> StepResult:
+         if self.current_case is None:
+             raise RuntimeError("Call reset() before step().")
+
+         self.attempts_left -= 1
+
+         if action.get("action") != "patch_step":
+             return self._invalid_action("Unsupported action type")
+
+         step_name = action.get("step")
+         field = action.get("field")
+         value = action.get("value")
+
+         patched_ok = self._apply_patch(step_name, field, value)
+         if not patched_ok:
+             return self._invalid_action("Patch failed (step/field not found)")
+
+         gold = self.current_case["gold_fix"]
+         solved = (step_name == gold["step"] and field == gold["field"] and value == gold["value"])
+
+         if solved:
+             self._mark_success()
+             obs = self._make_observation(run_status="Succeeded", error=None, failed_step=None)
+             return StepResult(obs=obs, reward=1.0, done=True,
+                               info={"result": "success", "case_id": self.current_case["case_id"]})
+
+         if self.attempts_left <= 0:
+             obs = self._make_observation()
+             return StepResult(obs=obs, reward=-0.2, done=True,
+                               info={"result": "out_of_attempts", "case_id": self.current_case["case_id"]})
+
+         obs = self._make_observation()
+         return StepResult(obs=obs, reward=-0.1, done=False,
+                           info={"result": "still_failed", "case_id": self.current_case["case_id"]})
+
+     # --------- helpers ----------
+     def _apply_patch(self, step_name: str, field: str, value: str) -> bool:
+         for step in self.current_case["steps"]:
+             if step["name"] == step_name:
+                 if field == "inputs.expression":
+                     step.setdefault("inputs", {})
+                     step["inputs"]["expression"] = value
+                     return True
+         return False
+
+     def _mark_success(self):
+         for step in self.current_case["steps"]:
+             step["status"] = "Succeeded"
+
+     def _make_observation(self, run_status="Failed", error="keep", failed_step="keep"):
+         if error == "keep":
+             err_obj = self.current_case["error"]
+         else:
+             err_obj = error
+
+         if failed_step == "keep":
+             failed = self.current_case["failed_step"]
+         else:
+             failed = failed_step
+
+         return {
+             "case_id": self.current_case["case_id"],
+             "run_status": run_status,
+             "failed_step": failed,
+             "error": err_obj,
+             "steps": copy.deepcopy(self.current_case["steps"]),  # copy so agents cannot mutate env state
+             "attempts_left": self.attempts_left
+         }
+
+     def _invalid_action(self, msg: str) -> StepResult:
+         obs = self._make_observation()
+         done = (self.attempts_left <= 0)  # an invalid action on the last attempt ends the episode
+         return StepResult(obs=obs, reward=-0.2 if done else -0.1, done=done,  # -0.2 matches the README
+                           info={"result": "invalid_action", "message": msg, "case_id": self.current_case["case_id"]})
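The `seed` argument feeds a private `random.Random` instance, which is what makes case selection repeatable. A minimal sketch of that behavior, separate from the environment (the `first_pick` helper and the list of IDs are illustrative only):

```python
import random

CASE_IDS = ["CASE_001", "CASE_002", "CASE_003", "CASE_004", "CASE_005"]


def first_pick(seed):
    """Return the first case a freshly seeded generator would choose."""
    rng = random.Random(seed)  # private generator, independent of the global random state
    return rng.choice(CASE_IDS)


# The same seed always gives the same first case.
print(first_pick(42) == first_pick(42))  # True
```

Because the generator is created once in `__init__` and reused, later `reset()` calls keep drawing from the same stream, so a fixed seed should reproduce the whole sequence of cases and not only the first one.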