docs: add reward weight rationale table to README
README.md
CHANGED
@@ -57,6 +57,24 @@ GridMind-RL closes this gap by simulating a complete building energy system wher
 | **Episode** | 96 steps = 24 simulated hours @ 15-min resolution |
 | **Tasks** | 4 tasks: (1) cost, (2) temperature, (3) demand_response, (4) instruction_following |

+### Reward Weight Rationale
+
+Weights reflect real-world building operator priorities, not arbitrary values:
+
+| Component | Weight | Rationale |
+|---|---|---|
+| `cost_savings` | 0.28 | Primary operator KPI: energy spend is the main business metric |
+| `carbon_reward` | 0.20 | ESG compliance: increasingly mandatory for industrial operators |
+| `temp_constraint` | 0.20 | Hard safety constraint: comfort SLA violations incur penalties |
+| `grid_response` | 0.20 | Regulatory SLA: demand response programs pay operators to shed load |
+| `batch_deadline` | 0.12 | Production continuity: missing batch deadlines causes downstream losses |
+| `efficiency_bonus` | 0.05 | Storage arbitrage: incentivises smart charge/discharge timing |
+| `stability_penalty` | -0.05 | Anti-cycling: prevents HVAC thrashing that causes equipment wear |
+| `fault_mitigation` | 0.05 | Emergency response: correct fault handling prevents costly outages |
+| `instruction_reward` | 0.50* | Task 4 only: weighted per the episode's instruction card |
+
+> *Task 4 instruction reward weight comes from the sampled instruction card, not a fixed value.
+
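As a reading aid for the table above, here is a minimal sketch of how such weights could be combined into one scalar reward. It assumes a plain linear weighted sum, that each component score is already normalised, and that `stability_penalty` is supplied as a non-negative magnitude. None of this is confirmed by the README. `combine_reward` and its `instruction_weight` parameter are hypothetical names, not part of GridMind-RL.

```python
# Weights copied from the Reward Weight Rationale table.
# ASSUMPTION: the environment combines components as a linear weighted sum.
REWARD_WEIGHTS = {
    "cost_savings": 0.28,
    "carbon_reward": 0.20,
    "temp_constraint": 0.20,
    "grid_response": 0.20,
    "batch_deadline": 0.12,
    "efficiency_bonus": 0.05,
    "stability_penalty": -0.05,  # negative weight: larger magnitude lowers the reward
    "fault_mitigation": 0.05,
}


def combine_reward(components, instruction_weight=None):
    """Hypothetical helper: weighted sum of per-component scores.

    components: dict mapping component name -> score (missing names count as 0.0).
    instruction_weight: weight taken from the sampled instruction card
        (Task 4 only); None means the instruction term is skipped.
    """
    total = sum(
        weight * components.get(name, 0.0)
        for name, weight in REWARD_WEIGHTS.items()
    )
    if instruction_weight is not None:
        total += instruction_weight * components.get("instruction_reward", 0.0)
    return total


if __name__ == "__main__":
    # With every fixed component at 1.0 the result is the sum of the weights.
    all_ones = {name: 1.0 for name in REWARD_WEIGHTS}
    print(round(combine_reward(all_ones), 2))  # 1.05
```

The eight fixed weights in the table sum to 1.05 (1.10 before the `stability_penalty` term), so under this sketch the reward is not normalised to 1. Whether the project rescales the total elsewhere is not stated in the README.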
 ### Observation Fields

 | Field | Type | Description |