GangJiang committed
Commit c3b5f03 · verified · 1 Parent(s): 1fe4dfd

Update LLM-BEM-Engineer_Benchmark/README.md

Files changed (1)
  1. LLM-BEM-Engineer_Benchmark/README.md +158 -1
LLM-BEM-Engineer_Benchmark/README.md CHANGED
@@ -1 +1,158 @@
- **User instruction**
+ # 🏗️ LLM Benchmark for Automated Building Energy Model Generation
+
+ This repository provides a benchmark dataset designed to evaluate the capability of **Large Language Models (LLMs)** in generating **Building Energy Models (BEMs)** from natural language descriptions.
+
+ The benchmark focuses on two essential aspects of real-world applicability:
+
+ - **Scalability**: The ability of LLMs to handle a wide range of building configurations and system complexities.
+ - **Robustness**: The ability of LLMs to correctly infer user intent under noisy, ambiguous, or incomplete inputs.
+
+ ---
+
+ ## 📦 Dataset Overview
+
+ The benchmark consists of **two complementary test sets**:
+
+ | Dataset | Purpose | Description |
+ |-------|--------|-------------|
+ | `detailed_prompt_test` | Scalability benchmark | Well-specified, detailed building modeling prompts |
+ | `robust_prompt_test` | Robustness benchmark | Noisy and ambiguous user input prompts |
+
+ ---
+
+ ## 1️⃣ detailed_prompt_test — Scalability Benchmark
+
+ The `detailed_prompt_test` dataset contains **126 building energy modeling scenarios**, designed to test whether LLMs can scale across diverse modeling requirements.
+
+ ### Covered Modeling Dimensions
+
+ Each prompt may include combinations of the following specifications:
+
+ - Building geometry
+ - HVAC systems (heating, ventilation, and air-conditioning)
+ - Number of stories
+ - Envelope constructions and materials
+ - Occupancy and operational schedules
+ - Thermostat setpoints
+ - Space types
+ - Building orientation
+ - Window-to-wall ratios (WWRs)
+ - Zoning strategies
+
+ This dataset reflects realistic complexity encountered in professional building energy modeling workflows.
+
+ ---
+
+ ### 📄 File Naming Convention
+
+ Each file name ends with **two numeric codes**: the first encodes the HVAC system type and the second encodes the building geometry type, as listed in the tables below.
+
+ ---
+ ### 🔢 First Code — HVAC System Type
+
+ | Code | HVAC System |
+ |----|------------|
+ | 1 | DX system with electric heater |
+ | 2 | DX system with fuel burner |
+ | 3 | Heat pump |
+ | 4 | VRF system |
+ | 5 | DOAS + VRF, with multiple AHU units |
+ | 6 | DOAS + FCU, with multiple AHU units |
+ | 7 | FCU system |
+ | 8 | VAV system, with multiple AHU units |
+ | 9 | Hybrid VAV + FCU system |
+
+ ---
+
+ ### 🔢 Second Code — Building Geometry Type
+
+ | Code | Geometry Description |
+ |----|---------------------|
+ | 1 | U-shaped building with gable roof |
+ | 2 | U-shaped building with flat roof |
+ | 3 | T-shaped building with gable roof |
+ | 4 | T-shaped building with flat roof |
+ | 5 | Rectangular building with hip roof |
+ | 6 | Rectangular building with gable roof |
+ | 7 | Rectangular building with flat roof |
+ | 8 | Rectangular building with core–perimeter zoning and hip roof |
+ | 9 | Rectangular building with core–perimeter zoning and gable roof |
+ | 10 | Rectangular building with core–perimeter zoning and flat roof |
+ | 11 | L-shaped building with gable roof |
+ | 12 | Flat-shaped building |
+ | 13 | Hollow square (courtyard) building with gable roof |
+ | 14 | Hollow square (courtyard) building with flat roof |
+
+ ---
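
The two code tables above can be captured as lookup dictionaries. The following is a minimal Python sketch, assuming the HVAC and geometry codes have already been extracted from a file name; the exact file name pattern is not shown in this README, so no parsing is attempted. The names `HVAC_SYSTEMS`, `GEOMETRIES`, `describe_scenario`, and `ALL_PAIRS` are hypothetical and not part of the dataset.

```python
from itertools import product

# Code tables transcribed from the README; the dictionary names are hypothetical.
HVAC_SYSTEMS = {
    1: "DX system with electric heater",
    2: "DX system with fuel burner",
    3: "Heat pump",
    4: "VRF system",
    5: "DOAS + VRF, with multiple AHU units",
    6: "DOAS + FCU, with multiple AHU units",
    7: "FCU system",
    8: "VAV system, with multiple AHU units",
    9: "Hybrid VAV + FCU system",
}

GEOMETRIES = {
    1: "U-shaped building with gable roof",
    2: "U-shaped building with flat roof",
    3: "T-shaped building with gable roof",
    4: "T-shaped building with flat roof",
    5: "Rectangular building with hip roof",
    6: "Rectangular building with gable roof",
    7: "Rectangular building with flat roof",
    8: "Rectangular building with core–perimeter zoning and hip roof",
    9: "Rectangular building with core–perimeter zoning and gable roof",
    10: "Rectangular building with core–perimeter zoning and flat roof",
    11: "L-shaped building with gable roof",
    12: "Flat-shaped building",
    13: "Hollow square (courtyard) building with gable roof",
    14: "Hollow square (courtyard) building with flat roof",
}


def describe_scenario(hvac_code: int, geometry_code: int) -> str:
    """Return a readable label for one (HVAC, geometry) code pair."""
    if hvac_code not in HVAC_SYSTEMS:
        raise ValueError(f"unknown HVAC code: {hvac_code}")
    if geometry_code not in GEOMETRIES:
        raise ValueError(f"unknown geometry code: {geometry_code}")
    return f"{HVAC_SYSTEMS[hvac_code]} | {GEOMETRIES[geometry_code]}"


# Every pairing of the 9 HVAC codes with the 14 geometry codes.
ALL_PAIRS = list(product(HVAC_SYSTEMS, GEOMETRIES))
```

For example, `describe_scenario(3, 7)` returns `"Heat pump | Rectangular building with flat roof"`. `ALL_PAIRS` contains 9 × 14 = 126 entries, which matches the stated number of scenarios in `detailed_prompt_test`, although the README does not say explicitly that each pair appears exactly once.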
+
+ ## 2️⃣ robust_prompt_test — Robustness Benchmark
+
+ The `robust_prompt_test` dataset evaluates the robustness of LLMs to **noisy and imperfect user inputs**, simulating real-world interactions.
+
+ ### Noise Characteristics
+
+ Prompts include various types of input noise, such as:
+
+ - Spelling errors
+ - Ambiguous or vague descriptions
+ - Incomplete or missing specifications
+ - Diverse sentence structures
+ - Informal or unstructured language
+
+ All prompts are **synthetically generated by GPT-5** to simulate noisy user input.
+
+ ---
+
+ ### 📄 File Naming Convention
+
+ Each file name ends with a numeric suffix identifying a **distinct robustness test case**.
+
+ Each case corresponds to a unique noisy user input scenario.
+
+ ---
+
+ ## 🎯 Benchmark Objectives
+
+ This benchmark is designed to support the evaluation of:
+
+ - LLM generalization across building types and HVAC systems
+ - Accuracy of system and geometry inference
+ - Completeness and validity of generated building models
+ - Robustness to noisy, ambiguous, or incomplete user intent
+ - Failure modes under increasing modeling complexity
+
+ ---
+
+ ## 🧪 Suggested Evaluation Criteria (Optional)
+
+ Users may evaluate LLM outputs using one or more of the following criteria:
+
+ - Geometry correctness
+ - HVAC system selection accuracy
+ - Completeness of generated model components
+ - Constraint violations
+ - Simulation success rate (e.g., EnergyPlus error-free execution)
+ - Robust intent inference under noisy prompts
+
+ ---
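
Several of the criteria above reduce to pass rates over the test cases. The following is a minimal Python sketch of such an aggregation, assuming each case has already been judged and recorded as booleans. The `CaseResult` record layout and the `summarize` function are hypothetical; the README does not prescribe a scoring format or a way to judge each criterion.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Outcome of one benchmark prompt (hypothetical record layout)."""
    case_id: str
    geometry_correct: bool
    hvac_correct: bool
    simulation_ok: bool  # e.g. the simulation run finished without errors


def summarize(results: list[CaseResult]) -> dict[str, float]:
    """Return the fraction of cases that pass each criterion."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    return {
        "geometry_correctness": sum(r.geometry_correct for r in results) / n,
        "hvac_accuracy": sum(r.hvac_correct for r in results) / n,
        "simulation_success_rate": sum(r.simulation_ok for r in results) / n,
    }
```

Reporting each rate separately, rather than a single combined score, keeps failure modes visible: a model can select the right HVAC system and still produce a model that fails to simulate.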
+
+ ## 📌 Intended Use
+
+ This dataset is suitable for:
+
+ - Benchmarking LLMs in building energy modeling tasks
+ - Research on AI-assisted building simulation workflows
+ - Robustness testing of natural language interfaces
+ - Comparative evaluation of different LLM architectures
+
+ ---
+
+ ## 📄 License & Citation
+
+ Please cite this repository if used in academic or technical work.
+
+ (You may add license information, BibTeX, or DOI here.)
+
+ ---
+
+