Spaces:
Running
Running
feat: add GridMind-RL inference script and update documentation
Browse files- README.md +250 -45
- python/inference.py +7 -2
README.md
CHANGED
|
@@ -4,13 +4,84 @@ GridMind-RL is an OpenEnv-compliant reinforcement learning environment simulatin
|
|
| 4 |
|
| 5 |
An RL agent acts as the energy controller, shaping electrical load profiles by adjusting HVAC setpoints, managing thermal storage, and scheduling batch processes. The goal is to optimize operations in response to real-time electricity prices, grid carbon intensity, and utility demand-response signals.
|
| 6 |
|
| 7 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 8 |
|
| 9 |
```text
|
| 10 |
┌──────────────────────┐ ┌─────────────────────────────┐
|
| 11 |
│ │ │ │
|
| 12 |
│ LLM RL Agent │◄───────┤ GridMind-RL Server │
|
| 13 |
-
│ (
|
| 14 |
│ ├───────►│ Port 7860 │
|
| 15 |
└──────────────────────┘ Action │ │
|
| 16 |
└──────────────┬──────────────┘
|
|
@@ -24,7 +95,162 @@ An RL agent acts as the energy controller, shaping electrical load profiles by a
|
|
| 24 |
└─────────────────────────────┘
|
| 25 |
```
|
| 26 |
|
| 27 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 28 |
|
| 29 |
| Name | Type | Range | Description |
|
| 30 |
|------|------|-------|-------------|
|
|
@@ -40,7 +266,11 @@ An RL agent acts as the energy controller, shaping electrical load profiles by a
|
|
| 40 |
| `step` | int | [0, 95] | Current episode timestep (15-min intervals over 24h). |
|
| 41 |
| `building_id` | int | [0, 2] | ID of the building in multi-building federated mode. |
|
| 42 |
|
| 43 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 44 |
|
| 45 |
| Name | Type | Range | Description |
|
| 46 |
|------|------|-------|-------------|
|
|
@@ -50,18 +280,9 @@ An RL agent acts as the energy controller, shaping electrical load profiles by a
|
|
| 50 |
| `load_shed_fraction` | float | [0.0, 0.5] | Fraction of non-critical load to shed (max 50%). |
|
| 51 |
| `building_id` | int | [0, 2] | Select which building to apply this action to (federation). |
|
| 52 |
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
GridMind-RL features 3 progressively difficult tasks:
|
| 56 |
|
| 57 |
-
|
| 58 |
-
Minimize total energy costs by moving load to off-peak periods using thermal storage. No temperature constraints.
|
| 59 |
-
2. **Task 2: Temperature Management (Medium)**
|
| 60 |
-
Minimize costs while keeping indoor temperatures strictly within 19°C – 23°C.
|
| 61 |
-
3. **Task 3: Full Demand Response (Hard)**
|
| 62 |
-
Minimize cost, maintain temperature, successfully schedule batch jobs before deadlines, and shed loads when the grid stress signal exceeds 0.7.
|
| 63 |
-
|
| 64 |
-
## Reward Function
|
| 65 |
|
| 66 |
The dense reward includes several components:
|
| 67 |
* **Cost Savings:** Proportional to energy savings vs the baseline flat tariff policy.
|
|
@@ -73,38 +294,22 @@ The dense reward includes several components:
|
|
| 73 |
|
| 74 |
*Exploit Detection:* The grader detects degenerate strategies (e.g. permanently shedding 40% load) and applies up to a 30% score penalty.
|
| 75 |
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
### Local Docker Build
|
| 79 |
|
| 80 |
-
|
| 81 |
-
docker build -t gridmind-rl .
|
| 82 |
-
docker run -p 7860:7860 -p 7861:7861 gridmind-rl
|
| 83 |
-
```
|
| 84 |
|
| 85 |
-
*
|
| 86 |
-
*
|
| 87 |
-
|
| 88 |
-
### Validating the Environment
|
| 89 |
-
|
| 90 |
-
```bash
|
| 91 |
-
python python/validate.py --env-url http://localhost:7860
|
| 92 |
-
```
|
| 93 |
-
|
| 94 |
-
### Running Baseline Inference
|
| 95 |
-
|
| 96 |
-
```bash
|
| 97 |
-
export API_BASE_URL=https://api-inference.huggingface.co/v1
|
| 98 |
-
export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
|
| 99 |
-
export HF_TOKEN=your_token
|
| 100 |
|
| 101 |
-
|
| 102 |
-
pip install -r python/requirements.txt
|
| 103 |
|
| 104 |
-
#
|
| 105 |
-
python python/inference.py --episodes 3
|
| 106 |
-
```
|
| 107 |
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
|
| 5 |
An RL agent acts as the energy controller, shaping electrical load profiles by adjusting HVAC setpoints, managing thermal storage, and scheduling batch processes. The goal is to optimize operations in response to real-time electricity prices, grid carbon intensity, and utility demand-response signals.
|
| 6 |
|
| 7 |
+
---
|
| 8 |
+
|
| 9 |
+
## 🙋 Beginner? Start Here
|
| 10 |
+
|
| 11 |
+
If you're new to this project, you probably have these questions:
|
| 12 |
+
|
| 13 |
+
### ❓ Why do I need an API?
|
| 14 |
+
|
| 15 |
+
In this project, the "brain" that makes energy decisions is an **AI language model (LLM)** — like Llama.
|
| 16 |
+
|
| 17 |
+
Instead of running the full AI model on your own computer (which requires a powerful GPU), you connect to an **API** (Application Programming Interface) — a remote server that already has the model running. You send it the current building state (temperature, price, etc.) and it sends back what action to take (e.g. "charge thermal storage").
|
| 18 |
+
|
| 19 |
+
Think of it like this:
|
| 20 |
+
```
|
| 21 |
+
Your Computer ──(asks question)──► API Server (has the AI) ──(sends answer)──► Your Computer
|
| 22 |
+
```
|
| 23 |
+
|
| 24 |
+
Without an API key, your script has no way to reach the AI model and the inference won't work.
|
| 25 |
+
|
| 26 |
+
---
|
| 27 |
+
|
| 28 |
+
### ❓ How do I get an API key?
|
| 29 |
+
|
| 30 |
+
This project uses **Hugging Face** — a free platform that hosts AI models.
|
| 31 |
+
|
| 32 |
+
#### Step-by-step:
|
| 33 |
+
|
| 34 |
+
1. **Create a free account** at [https://huggingface.co/join](https://huggingface.co/join)
|
| 35 |
+
|
| 36 |
+
2. **Go to your profile → Settings → Access Tokens**
|
| 37 |
+
Direct link: [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
|
| 38 |
+
|
| 39 |
+
3. Click **"New token"**, give it any name (e.g. `gridmind`), and select role **"Read"**
|
| 40 |
+
|
| 41 |
+
4. Copy the token — it looks like: `hf_aBcDeFgHiJkLmNoPqRsTuVwXyZ`
|
| 42 |
+
|
| 43 |
+
5. You'll paste this token in the terminal when running the project (shown below)
|
| 44 |
+
|
| 45 |
+
> **💡 It's free!** Hugging Face's inference API has a free tier that's enough to run this project.
|
| 46 |
+
|
| 47 |
+
---
|
| 48 |
+
|
| 49 |
+
### ❓ Why Llama? What even is Llama?
|
| 50 |
+
|
| 51 |
+
**Llama** (Large Language Model Meta AI) is an open-source AI model made by Meta (Facebook). Think of it like a smarter, programmable version of ChatGPT that you can use via an API.
|
| 52 |
+
|
| 53 |
+
**Why this project uses Llama specifically:**
|
| 54 |
+
|
| 55 |
+
| Reason | Explanation |
|
| 56 |
+
|--------|-------------|
|
| 57 |
+
| 🆓 Free to use | Available on Hugging Face at no cost |
|
| 58 |
+
| 📖 Open-source | The weights and code are public — no black box |
|
| 59 |
+
| 🧠 Smart enough | Llama 3.1 8B is capable of reading sensor data and outputting valid JSON actions |
|
| 60 |
+
| ⚡ Fast | The 8B (8 billion parameter) version is small enough to run quickly on Hugging Face's servers |
|
| 61 |
+
| 🔄 OpenAI-compatible | It uses the same API format as OpenAI, so the code works with many models |
|
| 62 |
+
|
| 63 |
+
The model reads the building state (temperature, electricity price, grid stress) and outputs a JSON action like:
|
| 64 |
+
```json
|
| 65 |
+
{
|
| 66 |
+
"hvac_power_level": 0.4,
|
| 67 |
+
"thermal_charge_rate": 0.5,
|
| 68 |
+
"batch_job_slot": 2,
|
| 69 |
+
"load_shed_fraction": 0.0,
|
| 70 |
+
"building_id": 0
|
| 71 |
+
}
|
| 72 |
+
```
|
| 73 |
+
|
| 74 |
+
> **You can also swap Llama for any other OpenAI-compatible model** (GPT-4, Mistral, etc.) by changing the environment variables.
|
| 75 |
+
|
| 76 |
+
---
|
| 77 |
+
|
| 78 |
+
## 🏗️ Architecture
|
| 79 |
|
| 80 |
```text
|
| 81 |
┌──────────────────────┐ ┌─────────────────────────────┐
|
| 82 |
│ │ │ │
|
| 83 |
│ LLM RL Agent │◄───────┤ GridMind-RL Server │
|
| 84 |
+
│ (Python Script) │ POST │ (Go OpenEnv Backend) │
|
| 85 |
│ ├───────►│ Port 7860 │
|
| 86 |
└──────────────────────┘ Action │ │
|
| 87 |
└──────────────┬──────────────┘
|
|
|
|
| 95 |
└─────────────────────────────┘
|
| 96 |
```
|
| 97 |
|
| 98 |
+
---
|
| 99 |
+
|
| 100 |
+
## 🚀 How to Run the Project (Step by Step)
|
| 101 |
+
|
| 102 |
+
There are **two ways** to run this project:
|
| 103 |
+
- **Option A** — Using Docker (recommended, easiest)
|
| 104 |
+
- **Option B** — Running manually without Docker
|
| 105 |
+
|
| 106 |
+
---
|
| 107 |
+
|
| 108 |
+
### Option A: Docker (Recommended)
|
| 109 |
+
|
| 110 |
+
Docker packages everything into a container so you don't need to install Go, Python versions, etc. separately.
|
| 111 |
+
|
| 112 |
+
#### Prerequisites
|
| 113 |
+
|
| 114 |
+
- Install Docker Desktop: [https://www.docker.com/products/docker-desktop](https://www.docker.com/products/docker-desktop)
|
| 115 |
+
- A Hugging Face API token (see above ☝️)
|
| 116 |
+
|
| 117 |
+
#### Step 1 — Build the Docker image
|
| 118 |
+
|
| 119 |
+
Open a terminal in the project folder and run:
|
| 120 |
+
|
| 121 |
+
```bash
|
| 122 |
+
docker build -t gridmind-rl .
|
| 123 |
+
```
|
| 124 |
+
|
| 125 |
+
This may take a few minutes the first time.
|
| 126 |
+
|
| 127 |
+
#### Step 2 — Start the environment server
|
| 128 |
+
|
| 129 |
+
```bash
|
| 130 |
+
docker run -p 7860:7860 -p 7861:7861 gridmind-rl
|
| 131 |
+
```
|
| 132 |
+
|
| 133 |
+
You should see the server start. Keep this terminal open.
|
| 134 |
+
|
| 135 |
+
- **Environment API:** http://localhost:7860
|
| 136 |
+
- **Visualization Dashboard:** http://localhost:7861
|
| 137 |
+
|
| 138 |
+
#### Step 3 — Install Python dependencies
|
| 139 |
+
|
| 140 |
+
Open a **new terminal** (keep the Docker one running) and run:
|
| 141 |
+
|
| 142 |
+
```bash
|
| 143 |
+
pip install -r python/requirements.txt
|
| 144 |
+
```
|
| 145 |
+
|
| 146 |
+
#### Step 4 — Set your API credentials
|
| 147 |
+
|
| 148 |
+
**On Windows (Command Prompt):**
|
| 149 |
+
```cmd
|
| 150 |
+
set API_BASE_URL=https://router.huggingface.co/v1
|
| 151 |
+
set MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
|
| 152 |
+
set HF_TOKEN=hf_your_token_here
|
| 153 |
+
```
|
| 154 |
+
|
| 155 |
+
**On Windows (PowerShell):**
|
| 156 |
+
```powershell
|
| 157 |
+
$env:API_BASE_URL = "https://router.huggingface.co/v1"
|
| 158 |
+
$env:MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
|
| 159 |
+
$env:HF_TOKEN = "hf_your_token_here"
|
| 160 |
+
```
|
| 161 |
+
|
| 162 |
+
**On Mac/Linux:**
|
| 163 |
+
```bash
|
| 164 |
+
export API_BASE_URL=https://router.huggingface.co/v1
|
| 165 |
+
export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
|
| 166 |
+
export HF_TOKEN=hf_your_token_here
|
| 167 |
+
```
|
| 168 |
+
|
| 169 |
+
Replace `hf_your_token_here` with your actual Hugging Face token.
|
| 170 |
+
|
| 171 |
+
#### Step 5 — Run the AI agent
|
| 172 |
+
|
| 173 |
+
```bash
|
| 174 |
+
python python/inference.py --episodes 3
|
| 175 |
+
```
|
| 176 |
+
|
| 177 |
+
You'll see the agent play through 3 episodes across all 3 tasks and print scores.
|
| 178 |
+
|
| 179 |
+
---
|
| 180 |
+
|
| 181 |
+
### Option B: Manual (Without Docker)
|
| 182 |
+
|
| 183 |
+
Use this if you don't have Docker installed.
|
| 184 |
+
|
| 185 |
+
#### Prerequisites
|
| 186 |
+
|
| 187 |
+
- [Go 1.21+](https://go.dev/dl/) — for running the environment server
|
| 188 |
+
- [Python 3.9+](https://www.python.org/downloads/) — for the AI agent script
|
| 189 |
+
- A Hugging Face API token (see above ☝️)
|
| 190 |
+
|
| 191 |
+
#### Step 1 — Start the Go environment server
|
| 192 |
+
|
| 193 |
+
```bash
|
| 194 |
+
go run main.go
|
| 195 |
+
```
|
| 196 |
+
|
| 197 |
+
The server starts on port `7860`. Keep this terminal open.
|
| 198 |
+
|
| 199 |
+
#### Step 2 — Open a new terminal and install Python dependencies
|
| 200 |
+
|
| 201 |
+
```bash
|
| 202 |
+
pip install -r python/requirements.txt
|
| 203 |
+
```
|
| 204 |
+
|
| 205 |
+
#### Step 3 — Set your API credentials (same as Option A, Step 4 above)
|
| 206 |
+
|
| 207 |
+
#### Step 4 — Validate the environment is working
|
| 208 |
+
|
| 209 |
+
```bash
|
| 210 |
+
python python/validate.py --env-url http://localhost:7860
|
| 211 |
+
```
|
| 212 |
+
|
| 213 |
+
You should see a series of checks pass. If they do, you're good to go.
|
| 214 |
+
|
| 215 |
+
#### Step 5 — Run the AI agent
|
| 216 |
+
|
| 217 |
+
```bash
|
| 218 |
+
python python/inference.py --episodes 3
|
| 219 |
+
```
|
| 220 |
+
|
| 221 |
+
---
|
| 222 |
+
|
| 223 |
+
## 📊 What Happens When You Run It
|
| 224 |
+
|
| 225 |
+
The agent runs through **3 tasks** (Easy → Medium → Hard), each for the number of episodes you specify:
|
| 226 |
+
|
| 227 |
+
| Task | Difficulty | Goal |
|
| 228 |
+
|------|-----------|------|
|
| 229 |
+
| Task 1 | Easy | Minimize energy costs only |
|
| 230 |
+
| Task 2 | Medium | Minimize costs + keep temperature 19°C–23°C |
|
| 231 |
+
| Task 3 | Hard | Costs + temperature + batch job deadlines + grid stress response |
|
| 232 |
+
|
| 233 |
+
At the end, you'll see a score table like:
|
| 234 |
+
```
|
| 235 |
+
============================================================
|
| 236 |
+
BASELINE SCORES SUMMARY
|
| 237 |
+
============================================================
|
| 238 |
+
Task Model Score Episodes
|
| 239 |
+
------------------------------------------------------------
|
| 240 |
+
Task 1 meta-llama/Llama-3.1-8B-Instruct 0.7823 3
|
| 241 |
+
Task 2 meta-llama/Llama-3.1-8B-Instruct 0.6541 3
|
| 242 |
+
Task 3 meta-llama/Llama-3.1-8B-Instruct 0.5102 3
|
| 243 |
+
------------------------------------------------------------
|
| 244 |
+
Overall 0.6489
|
| 245 |
+
```
|
| 246 |
+
|
| 247 |
+
Results are also saved to `baseline_scores.json`.
|
| 248 |
+
|
| 249 |
+
---
|
| 250 |
+
|
| 251 |
+
## 📐 Observation Space
|
| 252 |
+
|
| 253 |
+
These are the sensor readings the agent sees at each step:
|
| 254 |
|
| 255 |
| Name | Type | Range | Description |
|
| 256 |
|------|------|-------|-------------|
|
|
|
|
| 266 |
| `step` | int | [0, 95] | Current episode timestep (15-min intervals over 24h). |
|
| 267 |
| `building_id` | int | [0, 2] | ID of the building in multi-building federated mode. |
|
| 268 |
|
| 269 |
+
---
|
| 270 |
+
|
| 271 |
+
## 🕹️ Action Space
|
| 272 |
+
|
| 273 |
+
These are the controls the agent outputs at each step:
|
| 274 |
|
| 275 |
| Name | Type | Range | Description |
|
| 276 |
|------|------|-------|-------------|
|
|
|
|
| 280 |
| `load_shed_fraction` | float | [0.0, 0.5] | Fraction of non-critical load to shed (max 50%). |
|
| 281 |
| `building_id` | int | [0, 2] | Select which building to apply this action to (federation). |
|
| 282 |
|
| 283 |
+
---
|
|
|
|
|
|
|
| 284 |
|
| 285 |
+
## 🏆 Reward Function
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 286 |
|
| 287 |
The dense reward includes several components:
|
| 288 |
* **Cost Savings:** Proportional to energy savings vs the baseline flat tariff policy.
|
|
|
|
| 294 |
|
| 295 |
*Exploit Detection:* The grader detects degenerate strategies (e.g. permanently shedding 40% load) and applies up to a 30% score penalty.
|
| 296 |
|
| 297 |
+
---
|
|
|
|
|
|
|
| 298 |
|
| 299 |
+
## 🔧 Extensions
|
|
|
|
|
|
|
|
|
|
| 300 |
|
| 301 |
+
* **Multi-building mode:** Switch the environment to 3 buildings via `POST /reset {"num_buildings": 3}` and output action arrays for coordinated dispatch.
|
| 302 |
+
* **Use a different model:** Just change `MODEL_NAME` to any OpenAI-compatible model (e.g. `mistralai/Mistral-7B-Instruct-v0.3`).
|
| 303 |
+
* **Add new tasks:** Edit `env/tasks.go` and implement a new `gradeTaskX` component.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 304 |
|
| 305 |
+
---
|
|
|
|
| 306 |
|
| 307 |
+
## ❓ Troubleshooting
|
|
|
|
|
|
|
| 308 |
|
| 309 |
+
| Problem | Fix |
|
| 310 |
+
|---------|-----|
|
| 311 |
+
| `Connection refused` on port 7860 | Make sure the Docker container or `go run main.go` is still running |
|
| 312 |
+
| `401 Unauthorized` from Hugging Face | Your `HF_TOKEN` is wrong or expired — generate a new one |
|
| 313 |
+
| `Model not found` error | Some large models require you to accept terms on Hugging Face first. Go to the model page and click "Agree to terms" |
|
| 314 |
+
| Python package errors | Make sure you ran `pip install -r python/requirements.txt` |
|
| 315 |
+
| `docker: command not found` | Install Docker Desktop from [docker.com](https://www.docker.com/products/docker-desktop) |
|
python/inference.py
CHANGED
|
@@ -5,7 +5,7 @@ Runs an LLM agent against all 3 tasks for N episodes each.
|
|
| 5 |
Uses OpenAI-compatible API via API_BASE_URL / MODEL_NAME / HF_TOKEN environment variables.
|
| 6 |
|
| 7 |
Usage:
|
| 8 |
-
export API_BASE_URL=https://
|
| 9 |
export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
|
| 10 |
export HF_TOKEN=hf_xxxx
|
| 11 |
python python/inference.py [--episodes 3] [--env-url http://localhost:7860]
|
|
@@ -26,7 +26,7 @@ from openai import OpenAI
|
|
| 26 |
# ── Constants ──────────────────────────────────────────────────────────────
|
| 27 |
|
| 28 |
ENV_URL = os.getenv("ENV_URL", "http://localhost:7860")
|
| 29 |
-
API_BASE_URL = os.getenv("API_BASE_URL", "https://
|
| 30 |
MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.1-8B-Instruct")
|
| 31 |
HF_TOKEN = os.getenv("HF_TOKEN", "")
|
| 32 |
DEFAULT_EPISODES = 3
|
|
@@ -245,6 +245,11 @@ def run_episode(env_client: GridMindEnvClient, agent: LLMAgent,
|
|
| 245 |
action = agent.choose_action(obs, task_id)
|
| 246 |
step_resp = env_client.step(action)
|
| 247 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 248 |
obs = step_resp["observation"]
|
| 249 |
total_reward += step_resp["reward"]
|
| 250 |
total_steps += 1
|
|
|
|
| 5 |
Uses OpenAI-compatible API via API_BASE_URL / MODEL_NAME / HF_TOKEN environment variables.
|
| 6 |
|
| 7 |
Usage:
|
| 8 |
+
export API_BASE_URL=https://router.huggingface.co/v1
|
| 9 |
export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
|
| 10 |
export HF_TOKEN=hf_xxxx
|
| 11 |
python python/inference.py [--episodes 3] [--env-url http://localhost:7860]
|
|
|
|
| 26 |
# ── Constants ──────────────────────────────────────────────────────────────
|
| 27 |
|
| 28 |
ENV_URL = os.getenv("ENV_URL", "http://localhost:7860")
|
| 29 |
+
API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
|
| 30 |
MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.1-8B-Instruct")
|
| 31 |
HF_TOKEN = os.getenv("HF_TOKEN", "")
|
| 32 |
DEFAULT_EPISODES = 3
|
|
|
|
| 245 |
action = agent.choose_action(obs, task_id)
|
| 246 |
step_resp = env_client.step(action)
|
| 247 |
|
| 248 |
+
if step_resp is None or "observation" not in step_resp:
|
| 249 |
+
print(f" [WARN] step {_step}: server returned invalid response, skipping step")
|
| 250 |
+
_step += 1
|
| 251 |
+
break
|
| 252 |
+
|
| 253 |
obs = step_resp["observation"]
|
| 254 |
total_reward += step_resp["reward"]
|
| 255 |
total_steps += 1
|