---
license: mit
tags:
  - Decision Transformer
  - Heating, Ventilation, and Air Conditioning (HVAC)
  - Docker
  - EnergyPlus
  - Generalisation, General Models
  - Transfer Learning and Zero-Shot
---

One for All: LLM-guided zero-shot HVAC control

Abstract

HVAC controllers are widely deployed across buildings with different layouts, sensing configurations, climates, and occupancy patterns. In practice, controllers tuned for one building often degrade when applied to another, leading to inconsistent energy efficiency and occupant comfort. Many learning-based HVAC control methods rely on building-specific training, retraining, or expert intervention, which is often impractical or costly at scale. To address these challenges, we present Gen-HVAC, an LLM-guided, zero-shot HVAC control platform for multi-zone buildings that is trained once and deployed across diverse buildings without retraining. We design a transformer-based HVAC controller that is trained on historical building operation data collected across multiple buildings and generates control actions by conditioning on recent system behavior rather than building-specific models. By using recent temperature measurements, past control actions, and observed system responses, the controller generates HVAC control actions that transfer across buildings and climates without retraining, enabling the same model to scale to new buildings. To further improve occupant comfort, we integrate a lightweight language model that allows users to specify comfort preferences directly, without requiring human expertise, manual rule design, or paid external APIs. The system translates these preferences into control objectives that guide the controller without interfering with system dynamics or real-time control. By conditioning on these objectives, the controller switches between operating modes, such as energy-focused or comfort-focused behavior. We evaluate Gen-HVAC across multiple climates and building scenarios using EnergyPlus and validate the system in a real building deployment. Results show consistent improvements over rule-based control, achieving 36.8% energy savings with 28% comfort performance under zero-shot deployment. We also release our platform to support reproducibility and enable future research on scalable, data-driven HVAC control.


Gen-HVAC

System Architecture, Training and Implementation

We divided this project into four phases. Please go through each step to use our system.

EnergyPlus Setup

For this project we use Sinergym and EnergyPlus. Sinergym also provides a prebuilt Docker image, which you can install with:

docker pull ghcr.io/ugr-sail/sinergym:2.4.0

After pulling the image, run the Docker container and continue with the next steps:

docker run -it \
  --name genhvac_container \
  -v $(pwd):/workspace \
  ghcr.io/ugr-sail/sinergym:2.4.0 \
  /bin/bash

Data generation

Trajectory generation is executed through the rollout runner combined with a behavior policy. The framework is policy-based: any controller that maps observations to control actions can serve as the behavior policy. Use the data generation script along with the rollout runner to generate sequential data.

Our architecture works with any kind of policy, and you can try different patterns for generating data. If you have Ecobee data, that can work too; if you have MPC rules for a particular building model, that will work excellently as well. This serves as a general framework for data generation.

We provide rollouts that you can use to generate data for a specific building location or building type, or to combine different envelope types, locations, weather files, and building types.
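The policy-based interface this implies can be sketched as follows; the class, method names, and setpoint logic below are illustrative assumptions, not the repository's actual API:

```python
import numpy as np

class SeasonalReactivePolicy:
    """Hypothetical behavior policy: any object with an act(obs) method
    mapping an observation vector to heating/cooling setpoints could be
    plugged into the rollout runner."""

    def __init__(self, heat_sp=20.0, cool_sp=24.0, deadband=1.0):
        self.heat_sp = heat_sp      # nominal heating setpoint (deg C)
        self.cool_sp = cool_sp      # nominal cooling setpoint (deg C)
        self.deadband = deadband    # comfort tolerance around setpoints

    def act(self, obs):
        # obs[0] is assumed to be the zone air temperature (deg C)
        zone_temp = obs[0]
        if zone_temp < self.heat_sp - self.deadband:
            # too cold: raise the heating setpoint, relax cooling
            return np.array([self.heat_sp + 1.0, self.cool_sp + 2.0])
        if zone_temp > self.cool_sp + self.deadband:
            # too hot: lower the cooling setpoint, relax heating
            return np.array([self.heat_sp - 2.0, self.cool_sp - 1.0])
        # within the comfort band: hold nominal setpoints
        return np.array([self.heat_sp, self.cool_sp])

policy = SeasonalReactivePolicy()
action = policy.act(np.array([18.0]))  # a cold zone triggers heating
```

Any policy with the same observation-in, action-out shape (an MPC rule set, a replay of Ecobee logs) can be substituted without touching the rollout code.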

# Inside Docker container
cd /workspace

python trajectory_generator.py \
    --manifest patched_reference_data_base/OfficeSmall/reference_database.json \
    --output_dir dataset \
    --behavior seasonal_reactive \
    --time_freq 900

Optional multi-building combinations:

python trajectory_generator.py \
    --manifest patched_reference_data_base/OfficeMedium/reference_database.json \
    --combine_climates True \
    --combine_envelopes True \
    --output_dir dataset_large

Each episode is stored as compressed .npz:

dataset/
 β”œβ”€β”€ OfficeSmall__Buffalo__standard__episode_001.npz
 β”œβ”€β”€ OfficeSmall__Dubai__high_internal__episode_002.npz
 └── metadata.json

Each file contains:

{
  "observations": np.ndarray(T, state_dim),
  "actions": np.ndarray(T, action_dim),
  "rewards": np.ndarray(T),
  "state_keys": list,
  "action_keys": list,
  "meta": dict
}

Temporal resolution: 15 minutes
Episode length: 35040 timesteps (1 simulation year)
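A stored episode can be read back with plain NumPy; the sketch below round-trips an in-memory buffer using the key names from the schema above (the dimensions are illustrative, not the actual state/action sizes):

```python
import io
import numpy as np

# Write one episode in the .npz layout described above, then load it.
T, state_dim, action_dim = 96, 12, 4
buf = io.BytesIO()
np.savez_compressed(
    buf,
    observations=np.zeros((T, state_dim), dtype=np.float32),
    actions=np.zeros((T, action_dim), dtype=np.float32),
    rewards=np.zeros(T, dtype=np.float32),
)
buf.seek(0)

ep = np.load(buf)
obs, acts, rews = ep["observations"], ep["actions"], ep["rewards"]
```

For a real file, replace the buffer with the episode path, e.g. np.load("dataset/OfficeSmall__Buffalo__standard__episode_001.npz").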

Training Phase

After you have generated data, you can move on to the training phase. For our experiments we generated more than 2,300 sequential data combinations, resulting in more than 3 million trajectories.

The training phase is divided into three parts: the dataloader, the decision transformer and its losses, and finally the main training code.

The only changes needed are the mapping of the observation data from the sensors and the action keys. We have already done this for OfficeSmall STD2013 and OfficeMedium STD2013. The same architecture can be extended to other buildings in the HOT dataset, the Ecobee dataset, and any real building dataset.
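This key-mapping step can be sketched as a pair of dictionaries renaming raw sensor channels to the model's canonical feature names; the variable names below are illustrative, not the repository's actual keys:

```python
# Hypothetical sensor-to-model key maps: adapting the dataloader to a new
# building only requires editing these two dictionaries.
OBS_KEY_MAP = {
    "Zone Air Temperature": "zone_temp",
    "Site Outdoor Air Drybulb Temperature": "outdoor_temp",
    "Zone People Occupant Count": "occupancy",
}
ACTION_KEY_MAP = {
    "Heating Setpoint": "heat_sp",
    "Cooling Setpoint": "cool_sp",
}

def remap(raw, key_map):
    # rename raw sensor keys to canonical feature names, dropping extras
    return {key_map[k]: v for k, v in raw.items() if k in key_map}

obs = remap({"Zone Air Temperature": 22.5,
             "Site Outdoor Air Drybulb Temperature": 5.0}, OBS_KEY_MAP)
```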

Next comes the training code. We aimed to build a system that can be framed as a general zero-shot system. However, the novelty also lies in the system as a whole, since it can be extended to cover vastly more data, at least 1,000 to 10,000 times more. In the training code you simply have to increase the size of the transformer model, and our losses and embedding layers will generalize over more and more buildings, residential homes, and so on.

We condition on different RTG (return-to-go) targets for comfort and energy savings. Any kind of data will already be filtered on different RTG values, with top-k filtering helping the model understand which kinds of actions lead to which kinds of consequences.
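The two ideas in this step can be sketched in a few lines: RTG is the reward remaining from each timestep onward, and top-k filtering keeps only the highest-return episodes. The function names are illustrative, not the repository's actual API:

```python
import numpy as np

def returns_to_go(rewards):
    # RTG[t] = sum of rewards from step t to the end of the episode
    return np.cumsum(rewards[::-1])[::-1]

def topk_filter(episodes, k):
    # keep the k episodes with the highest total return, so the model
    # conditions on trajectories whose outcomes are known to be good
    ranked = sorted(episodes, key=lambda ep: ep["rewards"].sum(),
                    reverse=True)
    return ranked[:k]

episodes = [{"rewards": np.array(r, dtype=float)}
            for r in ([1, 1, 1], [0, 0, 1], [2, 2, 2])]
best = topk_filter(episodes, k=2)
rtg = returns_to_go(best[0]["rewards"])  # [6., 4., 2.]
```

At inference time, setting a high initial RTG target steers the model toward comfort- or energy-favoring behavior, depending on which reward the target refers to.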

LLM Deployment Phase

Gen-HVAC supports an optional LLM + Digital Human-in-the-Loop (DHIL) layer that modulates preference/RTG targets and high-level constraints. For local LLM hosting, install Ollama, pull a quantized model, and launch the service.

On Linux/macOS you can install Ollama via curl -fsSL https://ollama.com/install.sh | sh, start the daemon with ollama serve (leave it running), and pull recommended models using ollama pull deepseek-r1:7b (lightweight reasoning), ollama pull llama3.1:8b (strong general instruction following), ollama pull qwen2.5:7b (efficient general model), or ollama pull mistral:instruct (fast instruct model). If you want a slightly heavier but still practical model, use ollama pull deepseek-r1:14b or ollama pull qwen2.5:14b. In our testing we chose DeepSeek-R1.

Once pulled, sanity-check locally with ollama run deepseek-r1:7b, then in another terminal point your Gen-HVAC LLM client to the default endpoint and run your integration from the llm/ folder (e.g., python -m llm.server --host 0.0.0.0 --port 8000 and python -m llm.client --base_url http://localhost:xxxx --model deepseek-r1:7b). After the LLM endpoint is up, you can proceed to the inference server step to bind the persona/prompt layer to RTG conditioning and the control loop in one end-to-end pipeline.
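A minimal sketch of querying the local Ollama endpoint directly (default port 11434, /api/generate route); the prompt wording and the idea of mapping the reply to a comfort/energy preference are our assumptions, not the repository's actual client:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(preference, model="deepseek-r1:7b"):
    # one-shot (non-streaming) generation request body
    payload = {
        "model": model,
        "prompt": f"User comfort preference: {preference}. "
                  "Reply with 'comfort' or 'energy'.",
        "stream": False,
    }
    return json.dumps(payload).encode("utf-8")

def query(preference):
    # requires a running `ollama serve` daemon
    req = urllib.request.Request(
        OLLAMA_URL, data=build_request(preference),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

body = json.loads(build_request("I feel cold in the mornings"))
```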

Inference

During inference, we deploy Gen-HVAC as a stateless HTTP microservice that loads the trained Decision Transformer checkpoint and normalization statistics at startup, maintains a short autoregressive context window internally, and returns multi-zone heating/cooling setpoints per control step. In our experiments, EnergyPlus/Sinergym executes inside the Docker container, while the inference service runs on the host/server (CPU/GPU), so the simulator can stream observation vectors to POST /predict (payload: {step, obs, info}) and receive an action vector in the response, with POST /reset used to clear policy history at episode boundaries.
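The simulator-side client loop can be sketched as follows; the host/port, payload fields, and route names follow the description above, while the helper names and the "action" response key are assumptions:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed inference service address

def make_payload(step, obs, info=None):
    # payload shape from the README: {step, obs, info}
    return json.dumps({"step": step, "obs": list(obs),
                       "info": info or {}}).encode("utf-8")

def post(path, data=b"{}"):
    req = urllib.request.Request(
        BASE_URL + path, data=data,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Per-episode loop (not executed here; requires the running service):
# post("/reset")                       # clear policy history
# for step in range(35040):            # one simulation year at 15 min
#     action = post("/predict", make_payload(step, obs))["action"]

payload = json.loads(make_payload(0, [21.5, 3.0]))
```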

When enabled, the DHIL module queries a local Ollama endpoint and updates the comfort RTG target at a low frequency (e.g., every 4 steps).
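The low-frequency schedule can be sketched as a simple guard around the LLM call, so the real-time control loop never blocks on the model; the function names are illustrative:

```python
def maybe_update_rtg(step, current_rtg, query_llm, update_every=4):
    # consult the LLM only on scheduled steps; otherwise keep the
    # previous comfort RTG target so control stays real-time
    if step % update_every == 0:
        return query_llm()
    return current_rtg

# query_llm stubbed out here; in deployment it would call the Ollama
# endpoint and parse the reply into a new target
rtg = 0.5
history = [maybe_update_rtg(s, rtg, lambda: 0.9) for s in range(6)]
# steps 0 and 4 pick up the new target; the others keep the old one
```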