<p><em>LLM-BEM-Engineer for automated editing.</em></p>
</div>

## 📁 LLM-BEM-Engineer Benchmark Dataset

This benchmark dataset is designed to evaluate the capability of **Large Language Models (LLMs)** in generating **Building Energy Models (BEMs)** from natural language descriptions.

The benchmark focuses on two essential aspects of real-world applicability:

- **Scalability**: The ability of LLMs to handle a wide range of building configurations and system complexities.
- **Robustness**: The ability of LLMs to correctly infer user intent under noisy, ambiguous, or incomplete inputs.
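
To make the two categories concrete, here is a purely hypothetical illustration of the contrast between a well-specified prompt and a noisy, high-level one. These strings are invented for illustration and are **not** actual entries from the benchmark dataset:

```python
# Hypothetical examples of the two prompt styles evaluated by the benchmark.
# These are illustrative only -- NOT actual benchmark entries.
prompt_styles = {
    # Scalability (detailed_prompt_test): well-specified, parameter-rich input
    "detailed": (
        "Model a 3-story office building, 1500 m2 per floor, "
        "window-to-wall ratio 0.4, with a VAV system, in a cold climate."
    ),
    # Robustness (robust_prompt_test): noisy, high-level, underspecified input
    "robust": "make me an energy model for a smallish office, somewhere cold",
}

for style, prompt in prompt_styles.items():
    print(f"{style}: {prompt}")
```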

The benchmark consists of **two complementary test sets**, each designed to evaluate a different capability of LLMs in automated building energy model generation.

| Dataset | Purpose | Description |
|---------|---------|-------------|
| `detailed_prompt_test` | Scalability benchmark | Well-specified and detailed building modeling prompts |
| `robust_prompt_test` | Robustness benchmark | Noisy and high-level user input prompts |

For details, please refer to the *LLM-BEM-Engineer Benchmark* folder in this repository.
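
As a rough sketch, the two test sets could be iterated over like this. The folder and file layout assumed below (one subfolder per test set, plain-text prompt files) is an assumption for illustration, not the repository's documented structure; adjust the paths to the actual contents of the *LLM-BEM-Engineer Benchmark* folder:

```python
from pathlib import Path

# Assumed layout: "LLM-BEM-Engineer Benchmark/<test_set>/<name>.txt".
# This layout is a guess -- check the repository's benchmark folder.
BENCHMARK_DIR = Path("LLM-BEM-Engineer Benchmark")

def load_prompts(test_set: str) -> dict:
    """Return a {prompt_name: prompt_text} mapping for one test set."""
    test_dir = BENCHMARK_DIR / test_set
    prompts = {}
    if test_dir.is_dir():
        for path in sorted(test_dir.glob("*.txt")):  # assumed plain-text prompts
            prompts[path.stem] = path.read_text(encoding="utf-8")
    return prompts

for test_set in ("detailed_prompt_test", "robust_prompt_test"):
    print(f"{test_set}: {len(load_prompts(test_set))} prompts")
```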

## 🚀 Quick Start

The following code snippet shows how to run LLM-BEM-Engineer.