### LLM deployment phase

Gen-HVAC supports an LLM + Digital Human-in-the-Loop (DHIL) layer that modulates preference/RTG targets and high-level constraints. For local LLM hosting, install Ollama, pull a quantized model, and launch the service.

On Linux/macOS you can install Ollama via `curl -fsSL https://ollama.com/install.sh | sh`, start the daemon with `ollama serve` (leave it running), and pull a recommended model: `ollama pull deepseek-r1:7b` (lightweight reasoning), `ollama pull llama3.1:8b` (strong general instruction following), `ollama pull qwen2.5:7b` (efficient general model), or `ollama pull mistral:instruct` (fast instruct model). If you want a slightly heavier but still practical model, try `ollama pull deepseek-r1:14b` or `ollama pull qwen2.5:14b`. In our testing we chose DeepSeek R1.

Once pulled, run `deepseek-r1:7b` with Ollama; then, in another terminal, point your Gen-HVAC LLM client at the default endpoint and run your integration from the `llm/` folder (e.g., `python -m llm.server --host 0.0.0.0 --port 8000` and `python -m llm.client --base_url http://localhost:xxxx --model deepseek-r1:7b`).

After the LLM endpoint is up, you can proceed to the inference-server step, which binds the persona/prompt layer to RTG conditioning and the control loop in one end-to-end pipeline.

### Inference

During inference, we deploy Gen-HVAC as a stateless HTTP microservice that loads the trained Decision Transformer checkpoint and normalization statistics at startup, maintains a short autoregressive context window internally, and returns multi-zone heating/cooling setpoints per control step.
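
The bounded autoregressive context can be sketched as a fixed-length history buffer. The class name, window length, and field layout below are illustrative assumptions, not the repository's actual implementation:

```python
from collections import deque

class RollingContext:
    """Illustrative fixed-length history for autoregressive decoding:
    keeps only the most recent K (rtg, obs, action) steps."""

    def __init__(self, max_len=20):
        # deque(maxlen=...) discards the oldest entry automatically.
        self.steps = deque(maxlen=max_len)

    def append(self, rtg, obs, action):
        self.steps.append((rtg, obs, action))

    def as_lists(self):
        # Split into parallel sequences for the Decision Transformer input.
        rtgs, obss, acts = zip(*self.steps) if self.steps else ((), (), ())
        return list(rtgs), list(obss), list(acts)
```

A `POST /reset` handler would then simply clear this buffer (`self.steps.clear()`) at an episode boundary.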

In our experiments, EnergyPlus/Sinergym executes inside the Docker container while the inference service runs on the host/server (CPU/GPU), so the simulator can stream observation vectors to `POST /predict` (payload: `{step, obs, info}`) and receive an action vector in the response, with `POST /reset` used to clear policy history at episode boundaries.

When enabled, the DHIL module queries a local Ollama endpoint and updates the comfort RTG target at a low frequency (e.g., every 4 steps).
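
A minimal simulator-side client for this loop might look like the sketch below. The base URL, the `action` response key, and the helper names are assumptions for illustration, not the repository's actual client:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed host/port of the inference service

def build_predict_payload(step, obs, info):
    """Assemble the {step, obs, info} payload for POST /predict."""
    return {"step": step, "obs": list(obs), "info": info}

def post_json(path, payload):
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Per-episode sketch: clear policy history, then stream observations step by step.
# post_json("/reset", {})
# action = post_json("/predict", build_predict_payload(0, obs, info))["action"]
```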
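
One way the low-frequency DHIL update can work is to prompt the local model and extract a number from its free-form reply, keeping the current target as a fallback. The sketch below uses Ollama's standard `/api/generate` endpoint; the prompt, parsing rule, and function names are illustrative assumptions, not the actual DHIL logic:

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def parse_comfort_rtg(reply, fallback):
    """Extract the first number from a free-form LLM reply;
    keep the previous target if none is found."""
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    return float(match.group()) if match else fallback

def query_comfort_rtg(prompt, current_rtg, model="deepseek-r1:7b"):
    """Ask the local model for a new comfort RTG target (non-streaming)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    req = urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())["response"]
    return parse_comfort_rtg(reply, current_rtg)

# Low-frequency update sketch: query only every N control steps.
# if step % 4 == 0:
#     rtg = query_comfort_rtg("Occupant reports it is too warm; new comfort RTG?", rtg)
```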