| name: distributed-systems-debug-env | |
| version: "1.0.0" | |
| description: > | |
| An RL environment for debugging a distributed job processing pipeline with | |
| five deterministic fault scenarios: cascading timeout, byzantine queue fault, | |
| distributed lock starvation, backpressure cascade, and route partition. | |
| author: Team Rocket | |
| tags: | |
| - openenv | |
| - distributed-systems | |
| - debugging | |
| - reinforcement-learning | |
| reward_range: [0.0, 1.0] | |
| tasks: | |
| - name: cascading-timeout | |
| description: Synchronous upstream delay exceeds gateway timeout. | |
| difficulty: easy | |
| max_steps: 15 | |
| - name: byzantine-queue-fault | |
| description: Poison pill message causes worker parse crash-loop. | |
| difficulty: medium | |
| max_steps: 18 | |
| - name: distributed-lock-starvation | |
| description: Stale distributed lock prevents queue consumption. | |
| difficulty: hard | |
| max_steps: 20 | |
| - name: backpressure-cascade | |
| description: Consumer throughput lower than producer enqueue rate. | |
| difficulty: hard | |
| max_steps: 20 | |
| - name: route-partition | |
| description: Route policy blocks gateway to redis communication. | |
| difficulty: hard | |
| max_steps: 20 | |
| - name: registry-corruption | |
| description: Gateway reloads a corrupted auth registry entry and fails requests. | |
| difficulty: medium | |
| max_steps: 18 | |
| - name: job-generator-runaway | |
| description: Runaway enqueue rate overwhelms worker throughput and grows backlog. | |
| difficulty: hard | |
| max_steps: 20 | |
| observation_space: | |
| type: object | |
| properties: | |
| command_output: | |
| type: string | |
| metrics: | |
| type: object | |
| properties: | |
| gateway_success_rate: | |
| type: number | |
| minimum: 0.0 | |
| maximum: 1.0 | |
| gateway_p99_latency_ms: | |
| type: number | |
| minimum: 0.0 | |
| queue_depth: | |
| type: integer | |
| minimum: 0 | |
| worker_restart_count: | |
| type: integer | |
| minimum: 0 | |
| consumer_stall_count: | |
| type: integer | |
| minimum: 0 | |
| process_status: | |
| type: object | |
| additionalProperties: | |
| type: string | |
| action_space: | |
| type: object | |
| properties: | |
| command: | |
| type: string | |
| description: Single bash command executed in the debug sandbox. | |