# Cloud Queue Env - High Severity Analysis (Updated)

Date: 2026-04-12

This note captures the two highest-impact issues still present in the environment logic.

## 1) Arrival Modeling and Arrival Metrics Mismatch

Files and lines:
- cloud_queue_env/server/cloud_queue_env_environment.py:240
- cloud_queue_env/server/cloud_queue_env_environment.py:241
- cloud_queue_env/server/cloud_queue_env_environment.py:248
- cloud_queue_env/server/cloud_queue_env_environment.py:259
What happens now:
- The simulator samples Poisson arrivals each step.
- If sampled arrivals are greater than 1, the code still creates only one incoming job object.
- The arrivals metric is incremented by 1.0, not by the sampled arrival count.

Why this is high severity:
- Burst behavior is compressed into a single-event stream, so load spikes are underrepresented.
- Several business metrics and grader components become biased (rejections, abandonment, SLA pressure).
- Policy ranking can drift because the environment under-penalizes burst scenarios.
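
To give a rough sense of scale, the sketch below estimates what fraction of sampled arrivals is lost when each step admits at most one job. It assumes that a step with zero sampled arrivals creates no job and that any step with one or more creates exactly one. The function name and the rates are illustrative and are not taken from the environment code.

```python
import math


def expected_dropped_fraction(rate: float) -> float:
    """Expected share of Poisson(rate) arrivals lost per step.

    Assumption: a step creates one job if at least one arrival was
    sampled and no job otherwise, so expected jobs = P(N >= 1).
    """
    if rate <= 0:
        return 0.0
    admitted = 1.0 - math.exp(-rate)  # P(N >= 1) for N ~ Poisson(rate)
    return 1.0 - admitted / rate      # expected arrivals per step is `rate`


if __name__ == "__main__":
    # Hypothetical per-step arrival rates, for illustration only.
    for rate in (0.2, 0.5, 1.0, 2.0):
        print(f"rate={rate:.1f}  dropped={expected_dropped_fraction(rate):.1%}")
```

Under these assumptions, a rate of 1.0 arrivals per step loses roughly 37% of the sampled load, and the loss grows with the rate, which is consistent with the concern that burst scenarios are under-penalized.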

Impact on benchmark credibility:
- High. This directly affects realism, grading fairness, and any reproducibility claims.

Recommended fix direction:
- Track all sampled arrivals each step.
- Either queue all arrivals or maintain an explicit backlog of pending incoming jobs.
- Increment arrivals metric using true sampled count.
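
One way the fix direction above could look is sketched here. It is a minimal standalone example, not the environment's actual code: `ArrivalTracker`, its fields, and the bounded-queue behavior are hypothetical, and the Poisson sampler uses Knuth's multiplication method only to keep the example dependency-free.

```python
import math
import random
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ArrivalTracker:
    """Hypothetical helper that keeps every sampled arrival."""

    rate: float                      # mean arrivals per step (assumed)
    queue_capacity: int              # max jobs admitted to the queue (assumed)
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    pending: deque = field(default_factory=deque)  # backlog awaiting admission
    arrivals: float = 0.0            # metric: true sampled arrival count
    next_job_id: int = 0

    def sample_poisson(self) -> int:
        # Knuth's method: count uniforms until their product drops to e^-rate.
        threshold = math.exp(-self.rate)
        count, product = 0, self.rng.random()
        while product > threshold:
            count += 1
            product *= self.rng.random()
        return count

    def step(self, queue: list) -> int:
        sampled = self.sample_poisson()
        self.arrivals += sampled     # count every arrival, not a flat 1.0
        for _ in range(sampled):
            self.pending.append(self.next_job_id)
            self.next_job_id += 1
        # Admit backlog in arrival order while the queue has room.
        while self.pending and len(queue) < self.queue_capacity:
            queue.append(self.pending.popleft())
        return sampled
```

Whether overflow should wait in a backlog, as here, or be rejected immediately is a design choice the note leaves open; either way the arrivals metric reflects the sampled count.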

## 2) Agent Dispatch Control Is Partially Bypassed by Autodispatch

Files and lines:
- cloud_queue_env/server/cloud_queue_env_environment.py:353
- cloud_queue_env/server/cloud_queue_env_environment.py:391
- cloud_queue_env/server/cloud_queue_env_environment.py:738

What happens now:
- The agent may choose an action that is not dispatch.
- After action application, the environment still runs autodispatch and moves work to idle servers.

Why this is high severity:
- It weakens action-to-outcome causality for dispatch decisions.
- A policy can look better than it should because server assignment still happens automatically.
- It reduces benchmark difficulty in exactly the control surface the task is evaluating.

Impact on benchmark credibility:
- High. This can alter policy comparisons and invalidate assumptions about explicit control.

Recommended fix direction:
- Make dispatch behavior explicit by mode:
  - strict-control mode: only agent dispatches.
  - assisted mode: autodispatch on, but document this clearly and score accordingly.
- Keep one consistent mode for official benchmark scoring.
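
The two modes could be made explicit along the lines of the sketch below. It is a simplified illustration under stated assumptions: `DispatchMode`, `apply_step`, the `"dispatch"` and `"noop"` action strings, and the list-based queue and server pool are all hypothetical stand-ins for whatever the environment actually uses.

```python
from enum import Enum


class DispatchMode(Enum):
    STRICT_CONTROL = "strict-control"  # only the agent dispatches
    ASSISTED = "assisted"              # autodispatch also fills idle servers


def apply_step(mode, action, queue, idle_servers):
    """Return the (job, server) assignments made in this step.

    `queue` and `idle_servers` are plain lists mutated in place
    (an assumption made to keep the example short).
    """
    assignments = []
    # The agent's explicit dispatch is honored in both modes.
    if action == "dispatch" and queue and idle_servers:
        assignments.append((queue.pop(0), idle_servers.pop(0)))
    # Autodispatch runs only when the mode allows it.
    if mode is DispatchMode.ASSISTED:
        while queue and idle_servers:
            assignments.append((queue.pop(0), idle_servers.pop(0)))
    return assignments


if __name__ == "__main__":
    # A non-dispatch action assigns nothing under strict control.
    print(apply_step(DispatchMode.STRICT_CONTROL, "noop", ["j1", "j2"], ["s1"]))
    # The same action still moves work under assisted mode.
    print(apply_step(DispatchMode.ASSISTED, "noop", ["j1", "j2"], ["s1"]))
```

Gating autodispatch behind a single mode value keeps the scoring path unambiguous: an official run can fix the mode once and record it alongside the results.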

## Priority Summary

1. Fix arrival accounting and multi-arrival handling first.
2. Fix dispatch authority semantics second.

Both should be addressed before claiming benchmark-grade reliability.