ShreeshantXD committed on
Commit
a4be35d
·
1 Parent(s): 32565e1

updated readme

Files changed (1)
  1. README.md +96 -38
README.md CHANGED
@@ -21,8 +21,6 @@ license: mit
21
 
22
  ---
23
 
24
- ---
25
-
26
  ## 🚀 Live Demo
27
 
28
  | | URL |
@@ -58,13 +56,13 @@ GridMind-RL consists of three tightly integrated components:
58
 
59
  ```
60
  Agent (python/inference.py)
61
- ?? HTTP POST /step, /reset, /grade
62
- ?
63
- Go Environment Server (main.go) Port 7860
64
- ?
65
  Physics Engine (env/environment.go) + Rewards (env/rewards.go) + Tasks (env/tasks.go)
66
- ?
67
- Web Dashboard (dashboard/server.py) Port 7861
68
  ```
69
 
70
  **Design philosophy:**
@@ -81,7 +79,7 @@ Web Dashboard (dashboard/server.py) Port 7861
81
 
82
  | Field | Type | Range | Description |
83
  |-------|------|-------|-------------|
84
- | `indoor_temperature` | float | [15-27] C | Building indoor temperature |
85
  | `thermal_storage_level` | float | [0-1] | Thermal storage charge (0=empty, 1=full) |
86
  | `process_demand` | float | [5-50] kW | Baseline demand |
87
  | `current_price` | float | [0.03-0.25] $/kWh | Electricity price |
@@ -98,35 +96,86 @@ Web Dashboard (dashboard/server.py) Port 7861
98
  | Field | Type | Range | Description |
99
  |-------|------|-------|-------------|
100
  | `hvac_power_level` | float | [0-1] | HVAC power (0=off, 1=max) |
101
- | `thermal_charge_rate` | float | [-1-1] | Storage charge/discharge rate |
102
- | `batch_job_slot` | int | [0-4] | Batch job scheduling slot |
103
- | `load_shed_fraction` | float | [0-0.5] | Load shedding fraction |
104
  | `building_id` | int | {0} | Building identifier |
105
 
106
- ### Reward Function (7 Components)
 
 
107
 
108
  | Component | Description |
109
  |-----------|-------------|
110
  | **Cost Savings** | Negative cost per energy consumed |
111
- | **Temperature Constraint** | Penalty if T outside [19-23]C |
112
  | **Grid Response** | Bonus for load shedding during stress |
113
  | **Deadline Penalty** | Penalty for missed batch deadlines |
114
  | **Efficiency Bonus** | Bonus for off-peak charging |
115
  | **Stability Penalty** | Penalty for rapid control changes |
116
  | **Carbon Reward** | Bonus for low-carbon periods |
117
 
118
  ---
119
 
120
  ## Tasks
121
 
122
  | Task | Difficulty | Objective | Baseline Score |
123
- |------|-----------|-----------|-----------------|
124
  | Task 1 | Easy | Minimize cost only | **0.708** |
125
  | Task 2 | Medium | Minimize cost + maintain comfort | **0.633** |
126
  | Task 3 | Hard | Full demand response + scheduling | **0.598** |
127
 
128
  **Task 1 (Easy)**: Cost minimization, no constraints
129
- **Task 2 (Medium)**: Cost + temperature comfort (19-23C)
130
  **Task 3 (Hard)**: Cost + comfort + grid response + batch scheduling + carbon
131
 
132
  ---
@@ -149,26 +198,35 @@ go run main.go
149
 
150
  **Terminal 2: Run agent**
151
  ```bash
152
- export HF_TOKEN="your_api_key"
153
- export API_BASE_URL="https://openrouter.ai/api/v1"
154
- export MODEL_NAME="meta-llama/llama-3.3-70b-instruct:free"
155
 
156
- # Heuristic policy (no LLM)
157
  python inference.py --fast-mode --episodes 1
158
 
159
- # LLM agent
160
  python inference.py --episodes 1
 
 
 
161
  ```
162
 
163
  ### Environment Variables
164
 
165
  | Variable | Required | Default | Description |
166
  |----------|----------|---------|-------------|
167
- | `HF_TOKEN` | Yes | | LLM API key |
168
- | `API_BASE_URL` | No | `https://openrouter.ai/api/v1` | LLM endpoint |
169
- | `MODEL_NAME` | No | `meta-llama/llama-3.3-70b-instruct:free` | Model ID |
170
  | `ENV_URL` | No | `http://localhost:7860` | Environment server URL |
171
- | `OPENAI_API_KEY` | No | | Alternative to HF_TOKEN |
172
 
173
  ---
174
 
@@ -213,23 +271,23 @@ gridmind-rl/
213
  +-- openenv.yaml # OpenEnv spec
214
  +-- Dockerfile # Container build
215
  +-- env/
216
- +-- environment.go # Physics simulation
217
- +-- models.go # Data models
218
- +-- rewards.go # Reward computation
219
- +-- tasks.go # Task grading
220
  +-- python/
221
- +-- inference.py # LLM agent
222
- +-- models.py # Pydantic models
223
- +-- requirements.txt
224
  +-- dashboard/
225
- +-- server.py # Web server (port 7861)
226
- +-- static/ # Frontend assets
227
  +-- data/
228
- +-- price_curves.json # Price data
229
- +-- generate_prices.py # Price generator
230
  +-- tests/
231
- +-- test_graders.py # Python tests
232
- +-- environment_test.go # Go tests
233
  +-- baseline_scores.json # Reference scores
234
  +-- .env.example # Environment template
235
  +-- LICENSE # MIT License
 
21
 
22
  ---
23
 
 
 
24
  ## 🚀 Live Demo
25
 
26
  | | URL |
 
56
 
57
  ```
58
  Agent (python/inference.py)
59
+ HTTP POST /step, /reset, /grade
60
+
61
+ Go Environment Server (main.go) Port 7860
62
+
63
  Physics Engine (env/environment.go) + Rewards (env/rewards.go) + Tasks (env/tasks.go)
64
+
65
+ Web Dashboard (dashboard/server.py) Port 7861
66
  ```
67
 
68
  **Design philosophy:**
 
79
 
80
  | Field | Type | Range | Description |
81
  |-------|------|-------|-------------|
82
+ | `indoor_temperature` | float | [15-27] °C | Building indoor temperature |
83
  | `thermal_storage_level` | float | [0-1] | Thermal storage charge (0=empty, 1=full) |
84
  | `process_demand` | float | [5-50] kW | Baseline demand |
85
  | `current_price` | float | [0.03-0.25] $/kWh | Electricity price |
 
96
  | Field | Type | Range | Description |
97
  |-------|------|-------|-------------|
98
  | `hvac_power_level` | float | [0-1] | HVAC power (0=off, 1=max) |
99
+ | `thermal_charge_rate` | float | [-1 to 1] | Storage charge/discharge rate |
100
+ | `batch_job_slot` | int | [0 to 4] | Batch job scheduling slot |
101
+ | `load_shed_fraction` | float | [0 to 0.5] | Load shedding fraction |
102
  | `building_id` | int | {0} | Building identifier |
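
As an illustration of the action ranges in the table above, the following minimal sketch clamps a raw action dictionary into those bounds before it is sent to `/step`. The `Action` dataclass and `clamp_action` helper are hypothetical names for this example and are not taken from `python/models.py`; defaulting missing fields to 0 is also an assumption.

```python
from dataclasses import dataclass


def _clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass
class Action:
    # Field names and ranges follow the action table; the class is illustrative.
    hvac_power_level: float      # [0, 1]
    thermal_charge_rate: float   # [-1, 1]
    batch_job_slot: int          # [0, 4]
    load_shed_fraction: float    # [0, 0.5]
    building_id: int = 0         # {0}


def clamp_action(raw: dict) -> Action:
    """Build an Action whose fields are forced into the documented ranges.

    Missing keys default to 0 (an assumption made for this sketch).
    """
    return Action(
        hvac_power_level=_clamp(float(raw.get("hvac_power_level", 0.0)), 0.0, 1.0),
        thermal_charge_rate=_clamp(float(raw.get("thermal_charge_rate", 0.0)), -1.0, 1.0),
        batch_job_slot=int(_clamp(round(float(raw.get("batch_job_slot", 0))), 0, 4)),
        load_shed_fraction=_clamp(float(raw.get("load_shed_fraction", 0.0)), 0.0, 0.5),
        building_id=0,
    )
```

Clamping on the client side is one way to keep out-of-range LLM outputs from reaching the environment; how the Go server itself treats out-of-range values is not stated here.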
103
 
104
+ ### Reward System
105
+
106
+ #### Raw Reward Components (7 Components)
107
 
108
  | Component | Description |
109
  |-----------|-------------|
110
  | **Cost Savings** | Negative cost per energy consumed |
111
+ | **Temperature Constraint** | Penalty if T outside [19-23]°C |
112
  | **Grid Response** | Bonus for load shedding during stress |
113
  | **Deadline Penalty** | Penalty for missed batch deadlines |
114
  | **Efficiency Bonus** | Bonus for off-peak charging |
115
  | **Stability Penalty** | Penalty for rapid control changes |
116
  | **Carbon Reward** | Bonus for low-carbon periods |
117
 
118
+ #### Reward Normalization
119
+
120
+ The inference script normalizes rewards to a standardized range for consistent scoring:
121
+
122
+ | Metric | Range | Description |
123
+ |--------|-------|-------------|
124
+ | **Per-step reward** | [0.10, 0.90] | Worst action → 0.10, Best action → 0.90 |
125
+ | **Episode score** | (0.01, 0.99) | Clamped to avoid exact 0.0 or 1.0 |
126
+
127
+ **Normalization formula:**
128
+ ```
129
+ normalized_reward = ((raw_reward - raw_min) / (raw_max - raw_min)) * 0.80 + 0.10
130
+ episode_score = clamp(mean(normalized_rewards), 0.01, 0.99)
131
+ ```
132
+
133
+ This ensures:
134
+ - Scores are strictly between 0 and 1 (never exactly 0.0 or 1.0)
135
+ - Relative performance matters more than absolute values
136
+ - Fair comparison across different episodes and tasks
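
The normalization formula above can be sketched in Python as follows. This is a minimal illustration, not the code in `inference.py`: the function names are hypothetical, `raw_min` and `raw_max` are assumed to be supplied by the caller, and the handling of a degenerate range, out-of-range rewards, and an empty episode are assumptions added for robustness.

```python
from statistics import mean


def normalize_reward(raw_reward, raw_min, raw_max):
    """Map a raw reward into [0.10, 0.90] using the documented formula."""
    if raw_max <= raw_min:
        # Assumption: with no usable range, fall back to the midpoint.
        return 0.50
    scaled = (raw_reward - raw_min) / (raw_max - raw_min)
    # Assumption: clip rewards that fall outside [raw_min, raw_max].
    scaled = max(0.0, min(1.0, scaled))
    return scaled * 0.80 + 0.10


def episode_score(normalized_rewards):
    """Average the per-step rewards and clamp the result to [0.01, 0.99]."""
    if not normalized_rewards:
        # Assumption: an empty episode gets the lowest allowed score.
        return 0.01
    return max(0.01, min(0.99, mean(normalized_rewards)))
```

Because per-step rewards already lie in [0.10, 0.90], the final clamp only matters for edge cases such as an empty episode.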
137
+
138
+ ---
139
+
140
+ ## Output Format
141
+
142
+ The inference script emits machine-parsed stdout for judge evaluation:
143
+
144
+ ```
145
+ [START] task=<task_name> env=<benchmark> model=<model_name>
146
+ [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
147
+ [END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
148
+ ```
149
+
150
+ **Rules:**
151
+ - One `[START]` line at episode begin
152
+ - One `[STEP]` line per step, immediately after `env.step()` returns
153
+ - One `[END]` line after `env.close()`, always emitted (even on exception)
154
+ - `reward` and `rewards` are formatted to 2 decimal places
155
+ - `done` and `success` are lowercase booleans: `true` or `false`
156
+ - `error` is the raw `last_action_error` string, or `null` if none
157
+
158
+ **Example:**
159
+ ```
160
+ [START] task=gridmind-task-1 env=gridmind model=Qwen2.5-7B-Instruct
161
+ [STEP] step=1 action={"hvac_power_level":0.7,"thermal_charge_rate":0.5,...} reward=0.50 done=false error=null
162
+ [STEP] step=2 action={"hvac_power_level":0.5,"thermal_charge_rate":-0.3,...} reward=0.83 done=false error=null
163
+ [STEP] step=96 action={"hvac_power_level":0.3,"thermal_charge_rate":0.0,...} reward=0.90 done=true error=null
164
+ [END] success=true steps=96 score=0.683 rewards=0.50,0.55,0.83,...,0.90
165
+ ```
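
The formatting rules above could be implemented along these lines. The helper names are hypothetical, and two details are assumptions inferred from the example rather than stated rules: the action is serialized as compact JSON, and `score` is printed with three decimal places.

```python
import json


def _bool(flag):
    """Render a Python bool as the lowercase literal the format requires."""
    return "true" if flag else "false"


def format_start(task, env, model):
    return f"[START] task={task} env={env} model={model}"


def format_step(step, action, reward, done, error=None):
    # Assumption: compact JSON without spaces, as in the example output.
    action_str = json.dumps(action, separators=(",", ":"))
    err = error if error else "null"
    return (
        f"[STEP] step={step} action={action_str} "
        f"reward={reward:.2f} done={_bool(done)} error={err}"
    )


def format_end(success, steps, score, rewards):
    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
    # Assumption: three decimals for score, matching score=0.683 in the example.
    return (
        f"[END] success={_bool(success)} steps={steps} "
        f"score={score:.3f} rewards={rewards_str}"
    )
```

One way to satisfy the "always emitted" rule for `[END]` is to print it from a `finally` block that wraps the episode loop.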
166
+
167
  ---
168
 
169
  ## Tasks
170
 
171
  | Task | Difficulty | Objective | Baseline Score |
172
+ |------|-----------|-----------|----------------|
173
  | Task 1 | Easy | Minimize cost only | **0.708** |
174
  | Task 2 | Medium | Minimize cost + maintain comfort | **0.633** |
175
  | Task 3 | Hard | Full demand response + scheduling | **0.598** |
176
 
177
  **Task 1 (Easy)**: Cost minimization, no constraints
178
+ **Task 2 (Medium)**: Cost + temperature comfort (19-23°C)
179
  **Task 3 (Hard)**: Cost + comfort + grid response + batch scheduling + carbon
180
 
181
  ---
 
198
 
199
  **Terminal 2: Run agent**
200
  ```bash
201
+ # Copy and configure .env file
202
+ cp .env.example .env
203
+ # Edit .env with your API keys
204
 
205
+ # Heuristic policy (no LLM, fastest)
206
  python inference.py --fast-mode --episodes 1
207
 
208
+ # LLM agent (default: reuses action for 8 steps)
209
  python inference.py --episodes 1
210
+
211
+ # LLM agent (custom reuse interval)
212
+ python inference.py --llm-every 4 --episodes 1
213
  ```
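
To illustrate what the reuse interval means, here is a hedged sketch of an action-reuse loop. It assumes that `--llm-every N` queries the LLM once every `N` steps and repeats the last action in between; the actual logic in `inference.py` is not shown in this diff, and `run_with_reuse` and `query_llm` are hypothetical names.

```python
def run_with_reuse(query_llm, num_steps, llm_every=8):
    """Return the action used at each step, querying the LLM every llm_every steps.

    query_llm is any callable that takes the step index and returns an action.
    """
    actions = []
    current = None
    for step in range(num_steps):
        # Refresh the action at step 0 and then at every llm_every-th step.
        if current is None or step % llm_every == 0:
            current = query_llm(step)
        actions.append(current)
    return actions
```

Under this assumption, a 96-step episode needs 12 LLM calls at the default interval of 8, and 24 calls with `--llm-every 4`.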
214
 
215
  ### Environment Variables
216
 
217
  | Variable | Required | Default | Description |
218
  |----------|----------|---------|-------------|
219
+ | `HF_TOKEN` | **Yes** | | Hugging Face / LLM API token |
220
+ | `API_BASE_URL` | No | `https://api-inference.huggingface.co/v1` | LLM endpoint |
221
+ | `MODEL_NAME` | No | `Qwen/Qwen2.5-7B-Instruct` | Model identifier |
222
  | `ENV_URL` | No | `http://localhost:7860` | Environment server URL |
223
+
224
+ **Example `.env` file:**
225
+ ```bash
226
+ HF_TOKEN=hf_your_token_here
227
+ API_BASE_URL=https://api-inference.huggingface.co/v1
228
+ MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
229
+ ```
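
A minimal sketch of how these variables might be read in Python, using the defaults from the table above. The `load_config` function and the dictionary keys are hypothetical, and whether `inference.py` loads `.env` itself or expects the variables to be exported is not stated here.

```python
import os

# Defaults copied from the environment variable table.
DEFAULTS = {
    "API_BASE_URL": "https://api-inference.huggingface.co/v1",
    "MODEL_NAME": "Qwen/Qwen2.5-7B-Instruct",
    "ENV_URL": "http://localhost:7860",
}


def load_config(environ=None):
    """Read settings from the environment, failing fast if HF_TOKEN is missing."""
    env = os.environ if environ is None else environ
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is required but not set")
    return {
        "hf_token": token,
        "api_base_url": env.get("API_BASE_URL", DEFAULTS["API_BASE_URL"]),
        "model_name": env.get("MODEL_NAME", DEFAULTS["MODEL_NAME"]),
        "env_url": env.get("ENV_URL", DEFAULTS["ENV_URL"]),
    }
```

Passing a plain dictionary as `environ` makes the function easy to test without touching the real process environment.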
230
 
231
  ---
232
 
 
271
  +-- openenv.yaml # OpenEnv spec
272
  +-- Dockerfile # Container build
273
  +-- env/
274
+ +-- environment.go # Physics simulation
275
+ +-- models.go # Data models
276
+ +-- rewards.go # Reward computation
277
+ +-- tasks.go # Task grading
278
  +-- python/
279
+ +-- inference.py # LLM agent
280
+ +-- models.py # Pydantic models
281
+ +-- requirements.txt
282
  +-- dashboard/
283
+ +-- server.py # Web server (port 7861)
284
+ +-- static/ # Frontend assets
285
  +-- data/
286
+ +-- price_curves.json # Price data
287
+ +-- generate_prices.py # Price generator
288
  +-- tests/
289
+ +-- test_graders.py # Python tests
290
+ +-- environment_test.go # Go tests
291
  +-- baseline_scores.json # Reference scores
292
  +-- .env.example # Environment template
293
  +-- LICENSE # MIT License