adityss committed on
Commit 74dc7b5 · 1 parent: ebe8fa5

docs: add reward weight rationale table to README

Files changed (1)
  1. README.md +18 -0
README.md CHANGED
@@ -57,6 +57,24 @@ GridMind-RL closes this gap by simulating a complete building energy system wher
 | **Episode** | 96 steps = 24 simulated hours @ 15-min resolution |
 | **Tasks** | 4 tasks: (1) cost, (2) temperature, (3) demand_response, (4) instruction_following |
 
+### Reward Weight Rationale
+
+Weights reflect real-world building operator priorities — not arbitrary values:
+
+| Component | Weight | Rationale |
+|---|---|---|
+| `cost_savings` | 0.28 | Primary operator KPI — energy spend is the main business metric |
+| `carbon_reward` | 0.20 | ESG compliance — increasingly mandatory for industrial operators |
+| `temp_constraint` | 0.20 | Hard safety constraint — comfort SLA violations incur penalties |
+| `grid_response` | 0.20 | Regulatory SLA — demand response programs pay operators to shed load |
+| `batch_deadline` | 0.12 | Production continuity — missing batch deadlines causes downstream losses |
+| `efficiency_bonus` | 0.05 | Storage arbitrage — incentivises smart charge/discharge timing |
+| `stability_penalty` | -0.05 | Anti-cycling — prevents HVAC thrashing that causes equipment wear |
+| `fault_mitigation` | 0.05 | Emergency response — correct fault handling prevents costly outages |
+| `instruction_reward` | 0.50* | Task 4 only — weighted per the episode's instruction card |
+
+> *Task 4 instruction reward weight comes from the sampled instruction card, not a fixed value.
+
 ### Observation Fields
 
 | Field | Type | Description |
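One way to read the weight table is as the coefficients of a linear per-step reward. The sketch below assumes exactly that: each component produces a raw score, the fixed weights multiply those scores, and `stability_penalty` is a non-negative magnitude that its -0.05 weight turns into a deduction. `REWARD_WEIGHTS` and `combine_reward` are hypothetical names, not the GridMind-RL API, and the real environment may normalise, clip, or gate components per task. The table lists 0.50 for `instruction_reward`, but because the footnote says the Task 4 weight comes from the sampled instruction card, the sketch takes it as a parameter instead of hard-coding it.

```python
from collections.abc import Mapping
from typing import Optional

# Fixed weights copied from the README table.
# Assumption (not confirmed by the README): the per-step reward is a plain
# weighted sum of component scores, with no normalisation or clipping.
REWARD_WEIGHTS: dict[str, float] = {
    "cost_savings": 0.28,
    "carbon_reward": 0.20,
    "temp_constraint": 0.20,
    "grid_response": 0.20,
    "batch_deadline": 0.12,
    "efficiency_bonus": 0.05,
    "stability_penalty": -0.05,  # negative weight: a larger cycling penalty lowers the reward
    "fault_mitigation": 0.05,
}


def combine_reward(
    components: Mapping[str, float],
    instruction_weight: Optional[float] = None,
) -> float:
    """Hypothetical helper: weighted sum of per-step reward components.

    Missing components count as 0.0. `instruction_weight` is only passed for
    Task 4, where it comes from the sampled instruction card.
    """
    total = sum(
        weight * components.get(name, 0.0) for name, weight in REWARD_WEIGHTS.items()
    )
    if instruction_weight is not None:
        total += instruction_weight * components.get("instruction_reward", 0.0)
    return total


# Good cost and comfort scores, but the HVAC cycled (penalty magnitude 1.0).
step = {"cost_savings": 1.0, "temp_constraint": 1.0, "stability_penalty": 1.0}
print(round(combine_reward(step), 2))  # 0.43

# Task 4: the instruction weight is supplied from the episode's card (0.5 here for illustration).
print(combine_reward({"instruction_reward": 1.0}, instruction_weight=0.5))  # 0.5
```

Under this linear reading, a step that scores 1.0 on every fixed component yields 1.05: the positive weights sum to 1.10 and the stability penalty subtracts 0.05. The README does not say whether the environment rescales this to a unit range.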