---
license: mit
tags:
- Decision Transformer
- Heating, Ventilation and Air Conditioning (HVAC)
- Docker
- EnergyPlus
- Generalisation, General Models
- Transfer Learning and Zero Shot
---
# One for All: LLM guided zero-shot HVAC control
## Abstract
HVAC controllers are widely deployed across buildings with different layouts, sensing configurations, climates, and occupancy patterns. In practice, controllers tuned for one building often degrade when applied to another, leading to inconsistent energy efficiency and occupant comfort. Many learning-based HVAC control methods rely on building-specific training, retraining, or expert intervention, which is often impractical or costly at scale.
To address these challenges, we present Gen-HVAC, an LLM-guided, zero-shot HVAC control platform for multi-zone buildings that is trained once and deployed across diverse buildings without retraining. We design a transformer-based HVAC controller that is trained on historical building operation data collected across multiple buildings and generates control actions by conditioning on recent system behavior rather than building-specific models. By using recent temperature measurements, past control actions, and observed system responses, the controller generates HVAC control actions that transfer across buildings and climates without retraining, enabling the same model to scale to new buildings.
To further improve occupant comfort, we integrate a lightweight
language model that allows users to specify comfort preferences
directly, without requiring human expertise, manual rule design,
or paid external APIs. The system translates these preferences into
control objectives that guide the controller without interfering with
system dynamics or real-time control. By conditioning on these
objectives, the controller switches between operating modes, such
as energy-focused or comfort-focused behavior.
We evaluate Gen-HVAC across multiple climates and building scenarios using EnergyPlus and validate the system in a real building deployment. Results show consistent improvements over rule-based control, achieving 36.8% energy savings with 28% comfort performance under zero-shot deployment. We also release our platform to support reproducibility and enable future research on scalable, data-driven HVAC control.
----

## System Architecture, Training and Implementation
We divided this project into four phases. Please go through each step to use our system.
### EnergyPlus Setup
For this project we use [Sinergym](https://sinergym.readthedocs.io/en/latest/pages/installation.html) and [EnergyPlus](https://energyplus.net). Sinergym also provides a prebuilt Docker image; you can find it [here](https://ugr-sail.github.io/sinergym/compilation/v2.1.0/pages/installation.html). For installation:
```bash
docker pull ghcr.io/ugr-sail/sinergym:2.4.0
```
After this, run the Docker container and proceed to the next steps:
```bash
docker run -it \
  --name genhvac_container \
  -v $(pwd):/workspace \
  ghcr.io/ugr-sail/sinergym:2.4.0 \
  /bin/bash
```

### Data generation
Trajectory generation is executed via the rollout runner coupled with a behavior policy. Use the data-generation script together with the rollout runner to generate
temporally consistent data across different buildings, climates, and envelope/occupancy variants. You can generate datasets using rule-based controllers,
learned policies, MPC-style rules, or real-building logs such as Ecobee traces, and the same pipeline will serialize them into a unified trajectory format.
The provided rollout utilities support targeted generation for a specific location or building type, as well as generation that mixes envelope variants, weather files,
and building archetypes to construct large, diverse training corpora.
All of this must run inside a Docker container that has EnergyPlus. If you already have EnergyPlus installed natively, you can simply adapt the Sinergym setup and generate the sequential training data.
```bash
# Inside Docker container
cd /workspace
python trajectory_generator.py \
  --manifest patched_reference_data_base/OfficeSmall/reference_database.json \
  --output_dir dataset \
  --behavior seasonal_reactive \
  --time_freq 900
```
Optional multi-building combinations:
```bash
python trajectory_generator.py \
  --manifest patched_reference_data_base/OfficeMedium/reference_database.json \
  --combine_climates True \
  --combine_envelopes True \
  --output_dir dataset_large
```
Each episode is stored as a compressed `.npz` file:
```
dataset/
├── OfficeSmall__Buffalo__standard__episode_001.npz
├── OfficeSmall__Dubai__high_internal__episode_002.npz
└── metadata.json
```
Each file contains:
```python
{
    "observations": np.ndarray,  # shape (T, state_dim)
    "actions": np.ndarray,       # shape (T, action_dim)
    "rewards": np.ndarray,       # shape (T,)
    "state_keys": list,
    "action_keys": list,
    "meta": dict
}
```
- Temporal resolution: 15 minutes
- Episode length: 35,040 timesteps (1 simulation year)
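The stored episodes can be inspected directly with NumPy. A minimal sketch (the helper name `load_episode` is ours, not part of the released code):

```python
import numpy as np

def load_episode(path):
    """Load one compressed .npz episode into a dict of arrays/metadata."""
    with np.load(path, allow_pickle=True) as f:
        return {key: f[key] for key in f.files}

# Example (path follows the naming scheme above):
# ep = load_episode("dataset/OfficeSmall__Buffalo__standard__episode_001.npz")
# ep["observations"].shape  # (T, state_dim), T = 35040 at 15-minute resolution
```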
### Training Phase
After data generation, you can proceed to the training phase. In our experiments, we generated 2,300+ building-weather-policy combinations, yielding 3M+ sequential state-action transitions. The training pipeline is modular and consists of the dataloader, the Decision Transformer model, the loss modules, and the main training loop.
In most cases, the only required adaptation is mapping your raw sensor observations to the expected schema and defining the corresponding action keys; we provide validated mappings for OfficeSmall STD2013 (5-zone) and OfficeMedium STD2013 (15-zone), and the same interface extends directly to other HOT buildings as well as Ecobee or other real-building datasets. The training implementation is designed for generalization and zero-shot transfer: it supports heterogeneous buildings, zone counts, and sensing modalities. Scaling to larger and more diverse building types primarily requires increasing model capacity (d_model, layers, heads); the embedding and loss structure can remain unchanged.
We condition the policy on multi-objective return-to-go (RTG) targets for energy and comfort, and optionally apply Top-K filtering/selection by RTG to bias training
toward higher-quality sub-trajectories, enabling the model to learn how different action sequences causally trade off energy consumption and comfort outcomes.
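The RTG conditioning described above can be sketched as follows; the function names and the per-objective reward split are our own illustration, assuming one scalar reward stream per objective:

```python
import numpy as np

def returns_to_go(rewards):
    """RTG[t] = sum of rewards from step t to the end of the episode (suffix sum)."""
    return np.cumsum(rewards[::-1])[::-1]

def multi_objective_rtg(energy_rewards, comfort_rewards):
    """Stack per-objective RTG channels into shape (T, 2) conditioning targets."""
    return np.stack(
        [returns_to_go(energy_rewards), returns_to_go(comfort_rewards)], axis=-1
    )

def topk_by_return(episodes, k):
    """Top-K filtering: keep the k episodes with the highest total reward."""
    totals = np.array([ep["rewards"].sum() for ep in episodes])
    keep = np.argsort(totals)[::-1][:k]
    return [episodes[i] for i in keep]
```

Conditioning the policy on both RTG channels is what lets a single prompt-time target shift the controller between energy-focused and comfort-focused behavior.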
### LLM deployment phase
Gen-HVAC supports an LLM + Digital Human-in-the-Loop (DHIL) layer that modulates preference/RTG targets and high-level constraints. For local LLM hosting, install Ollama, pull a quantized model, and launch the service.
On Linux/macOS, install Ollama with `curl -fsSL https://ollama.com/install.sh | sh` and start the daemon with `ollama serve` (leave it running). Then pull one of the recommended models:
- `ollama pull deepseek-r1:7b` (lightweight reasoning)
- `ollama pull llama3.1:8b` (strong general instruction-following)
- `ollama pull qwen2.5:7b` (efficient general model)
- `ollama pull mistral:instruct` (fast instruct model)

If you want a slightly heavier but still practical model, try `ollama pull deepseek-r1:14b` or `ollama pull qwen2.5:14b`.
In our testing, we chose DeepSeek R1.
Once pulled, run `deepseek-r1:7b` with Ollama, then in another terminal point your Gen-HVAC LLM client to the default endpoint and run your integration from the `llm/` folder (e.g., `python -m llm.server --host 0.0.0.0 --port 8000` and `python -m llm.client --base_url http://localhost:xxxx --model deepseek-r1:7b`).
After the LLM endpoint is up, you can proceed to the inference server step to bind the persona/prompt layer to RTG conditioning and the control loop in one end-to-end pipeline.
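The DHIL layer's LLM calls can go through Ollama's standard local REST API (`POST /api/generate` on port 11434 by default). A minimal client sketch; the helper names and the preference prompt are our own illustration:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(prompt, model="deepseek-r1:7b"):
    """JSON body for a single non-streaming Ollama generation call."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_llm(prompt, model="deepseek-r1:7b"):
    """Send the prompt to the local Ollama daemon and return its text response."""
    body = json.dumps(build_request(prompt, model)).encode()
    req = request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:  # requires `ollama serve` to be running
        return json.loads(resp.read())["response"]
```

The DHIL prompt would then ask the model to map a user's comfort preference (e.g., "I feel cold in the afternoons") onto an updated comfort RTG target.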
### Inference
During inference, we deploy Gen-HVAC as a lightweight HTTP microservice that loads the trained Decision Transformer checkpoint and normalization statistics at startup, maintains a short autoregressive context window internally, and returns multi-zone heating/cooling setpoints per control step.
In our experiments, EnergyPlus/Sinergym executes inside the Docker container while the inference service runs on the host/server (CPU/GPU). The simulator streams observation vectors to POST /predict (payload: `{step, obs, info}`) and receives an action vector in the response; POST /reset clears policy history at episode boundaries. When enabled, the DHIL module queries a local Ollama endpoint and updates the comfort RTG target at a low frequency (e.g., every 4 steps).
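The request/response cycle above can be sketched as a small client. Only the `/predict` and `/reset` routes and the `{step, obs, info}` payload come from the description above; the helper names and the port are assumptions:

```python
import json
from urllib import request

BASE_URL = "http://localhost:5000"  # assumed inference-server address

def build_predict_payload(step, obs, info=None):
    """One control step serialized into the /predict request body."""
    return {"step": step, "obs": list(obs), "info": info or {}}

def post(path, payload):
    """POST a JSON payload to the inference service and decode the reply."""
    req = request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# Per episode (requires the inference server to be running):
# post("/reset", {})  # clear the policy's autoregressive history
# for step, obs in enumerate(observation_stream):
#     action = post("/predict", build_predict_payload(step, obs))
```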
## Repository Structure
```text
Gen-HVAC:Controller/
│
├── data_generator/              # Data generation pipeline (EnergyPlus/Sinergym rollouts)
│   ├── rollout_runner.py        # Runs rollouts for selected building-weather configs and logs outputs
│   └── trajectory_generator.py  # Creates trajectory datasets from rollouts
│
├── evaluation/                  # Evaluation scripts
│
├── Inference_&_LLM/             # Inference + LLM/DHIL (Digital Human-in-the-Loop) components
│   ├── inference.py             # Runs local inference (loads model + produces actions)
│   ├── inference_server.py      # Server wrapper for inference (API-based deployment)
│   ├── digital_human_in_the_loop.py  # DHIL logic
│   ├── llm_client/              # LLM client utilities
│   └── ...
│
├── Model/                       # Saved model checkpoints + configs
│   ├── Model_V1/
│   │   ├── last.pt              # Model checkpoint
│   │   ├── model_config.json    # Training/model parameters
│   │   └── report.json
│   ├── Model_V2/ ...
│   └── Model_V3/ ...
│
├── training/                    # Training code (DT model, embeddings, losses, trainer)
│   ├── data_loader.py           # Loads trajectories, builds tokens/batches, normalization, RTG, etc.
│   ├── embeddings.py            # Feature encoders + token embeddings (zone/global/RTG encodings)
│   ├── losses.py                # Action loss + auxiliary losses (physics/value/etc. if enabled)
│   └── training.py              # Main training entry point (train loop, checkpoints, logging)
│
└── utilities/                   # Shared utilities used across data-gen/training/eval/inference
    ├── comfort.py               # Comfort metric helpers
    ├── data_generator.py        # Shared dataset helpers / schema utilities
    ├── policy.py                # Policy wrappers (DT policy interface, action post-processing)
    ├── rewards.py
    ├── rollout.py               # Rollout utilities (env stepping, logging, post-processing)
    └── tables.py
```