Commit 33264ad · 1 Parent(s): 3708a78
README.md CHANGED
@@ -32,16 +32,33 @@ VEGA_AE
 ```
 ## 2. Hardware Dependency
 
- - Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
 - 8 Nvidia Tesla V100 GPUs, each with 16 GB memory
 
 ## 3. Software Dependency
 - CUDA == 11.4
 - Python version == 3.8.1
- - pip install -r requirements.txt
 
- ## 4. Code Generation
 
  We have provided a fine-tuned model using data from ```./dataset/train.jsonl``` and ```./dataset/valid.jsonl``` in ```./models/FT_Model```. The ```train.jsonl``` and ```valid.jsonl``` files contain function templates, feature vectors and ground truth for 98 backends in our dataset.
 
@@ -123,7 +140,7 @@ The inference result will be saved in ```./models/FT_Model/result.jsonl```.
 
 Note that if a ```./models/FT_Model/result.jsonl``` file already exists, it will be **overwritten** after the execution of ```run_function_test.sh``` or ```run_test.sh```.
 
- ## 5. Fine-Tuning (**Optional**)
 
 
 We provide the original UnixCoder-base-nine in ```./models/UnixCoder```. The original UnixCoder-base-nine can also be downloaded from HuggingFace: https://huggingface.co/microsoft/unixcoder-base-nine.
@@ -153,29 +170,29 @@ Customize parameters for fine-tuning by modifying following options in the ```ru
 The fine-tuned model will be saved in ```--output_dir```.
 
 
- ## 6. Reproducing Results in the Experiment
 
 We provide the scripts to reproduce each Figure/Table from the paper, along with the corresponding output result files, in the following table:
 
 
 | Script | Description | Output | Figure/Table |
 | --- | --- | --- | --- |
- | ./Scripts/Exp/Time/calculate_time.py | Calculate the time overhead. | ./Scripts/Exp/Time/Fig7.csv | Fig. 7 |
- | ./Scripts/Exp/Acc/calculate_accuracy.py | Calculate the function-level accuracy. | ./Scripts/Exp/Acc/Fig8_Acc.csv | Fig. 8 |
- | ./Scripts/Exp/Acc/calculate_purple.py | Calculate the percentage of functions accurately synthesized from the statements of various existing targets (purple bar in Fig. 8). | ./Scripts/Exp/Acc/Fig8_Purple.csv | Fig. 8 |
- | ./Scripts/Exp/Acc/calculate_accuracy.py | Calculate the percentage of the three types of errors. | ./Scripts/Exp/Acc/Table2.csv | Table 2 |
- | ./Scripts/Exp/ForkFlow/calculate_forkflow.py | Calculate the statement-level accuracy of VEGA and ForkFlow. | ./Scripts/Exp/ForkFlow/Fig9.csv | Fig. 9 |
- | ./Scripts/Exp/ForkFlow/calculate_forkflow.py | Calculate the number of accurate statements of VEGA. | ./Scripts/Exp/ForkFlow/Table3.csv | Table 3 |
- | ./Scripts/Exp/Correction/calculate_correction.py | Calculate the time required by two developers to modify the VEGA-generated RISC-V backend. | ./Scripts/Exp/Correction/Table4.csv | Table 4 |
- | ./Scripts/Exp/Perf/calculate_perf.py | Calculate the speedup of LLVM-Base (-O3) and LLVM-VEGA (-O3) over LLVM-Base (-O0). | ./Scripts/Exp/Perf/Fig10.csv | Fig. 10 |
 
- ### 6.1 Results for Fig. 7
 
 In the code generation process, we set a batch size of 256 on 8 Nvidia Tesla V100 GPUs (each with 16 GB memory), meaning each batch contains 256 statements. Since each batch may include statements from different function modules, we did not directly measure the generation time for each function module of the three targets (RISC-V, RI5CY, xCORE) during execution. Instead, we calculated the average inference time of each batch (25 seconds) and then derived the inference time of each statement (25/256 seconds). With the total number of statements within each function module of each target, we subsequently calculated the total inference time required for each function module of each target.
 
 
 - Command:
 ```
- $ python ./Scripts/Exp/Time/calculate_time.py
 ```
 
 
@@ -184,7 +201,7 @@ $ python ./Scripts/Exp/Time/calculate_time.py
 $ cat ./Scripts/Exp/Time/Fig7.csv
 ```
 
- ### 6.2 Results for Fig. 8
 
 
  In our experiment, we employed the Pass@1 evaluation metric, which involves replacing each VEGA-generated function individually within the official LLVM (LLVM-Base), then running regression tests to verify the correctness of the replaced function. This process is highly time-intensive, as a single regression test run generally takes about half an hour. Thus, sequentially testing all 1,454 VEGA-generated functions across three targets would require approximately 727 hours.
@@ -196,7 +213,7 @@ In this Exact Match evaluation, each statement is deemed correct if the VEGA-gen
 - Command:
 ```
 $ cp ./models/FT_Model/result.jsonl ./Scripts/Exp/Acc
- $ python ./Scripts/Exp/Acc/calculate_accuracy.py
 ```
 
 This script will automatically analyze VEGA's output from ```result.jsonl``` and compare the generated code and confidence scores with the ground truth. Based on this comparison, it will determine whether each function is correct.
@@ -212,7 +229,7 @@ We also provide a script for calculating the proportion of "Accurate Functions w
 
 - Command:
 ```
- $ python ./Scripts/Exp/Acc/calculate_purple.py
 ```
 
@@ -223,14 +240,14 @@ $ cat ./Scripts/Exp/Acc/Fig8_Purple.csv
 
 
 
- ### 6.3 Results for Table 2
 
- Executing the script in 6.2 will also yield the proportion of the three types of errors for each target.
 
 
 - Command:
 ```
- $ python ./Scripts/Exp/Acc/calculate_accuracy.py
 ```
 
 
@@ -240,13 +257,13 @@ $ cat ./Scripts/Exp/Acc/Table2.csv
 ```
 
 
- ### 6.4 Results for Fig. 9
 
- We modified the functions generated by VEGA and the functions in the MIPS backend (ForkFlow) to ensure they can run correctly on the RISC-V, RI5CY, and xCORE backends, respectively. We have reserved the function code for the MIPS backend in the ```./Scripts/Exp/ForkFlow/Mips_Code``` directory, along with manually modified code for the RISC-V, RI5CY, and xCORE LLVM backends in ```./Scripts/Exp/ForkFlow/Std_Code```. Additionally, the script in 6.2 will automatically write the VEGA-generated code from ```result.jsonl``` into the ```./Scripts/Exp/ForkFlow/VEGA_Code``` directory for comparison. By executing the following script, the proportion of accurate and modified statements of the VEGA-generated functions and the ForkFlow process will be automatically calculated.
 
 - Command:
 ```
- $ python ./Scripts/Exp/ForkFlow/calculate_forkflow.py
 ```
 
 
@@ -255,14 +272,14 @@ $ python ./Scripts/Exp/ForkFlow/calculate_forkflow.py
 $ cat ./Scripts/Exp/ForkFlow/Fig9.csv
 ```
 
- ### 6.5 Results for Table 3
 
- Executing the script in 6.4 will also output the number of statements accurately generated by VEGA and the number requiring manual correction, across seven function modules for RISC-V, RI5CY, and xCORE.
 
 
 - Command:
 ```
- $ python ./Scripts/Exp/ForkFlow/calculate_forkflow.py
 ```
 
 
@@ -272,7 +289,7 @@ $ cat ./Scripts/Exp/ForkFlow/Table3.csv
 ```
 
 
- ### 6.6 Results for Table 4
 
 The data in Table 4 show the time two developers needed to modify the VEGA-generated RISC-V backend. As a human-based experiment, only the recorded modification time for each function is reported.
 
@@ -281,7 +298,7 @@ The following script computes the total time spent by Developers A and B to modi
 - Command:
 
 ```
- $ python ./Scripts/Exp/Correction/calculate_correction.py
 ```
 
 - Results:
@@ -289,7 +306,7 @@ $ python ./Scripts/Exp/Correction/calculate_correction.py
 $ cat ./Scripts/Exp/Correction/Table4.csv
 ```
 
- ### 6.7 Results for Fig. 10
 
  Due to commercial licensing restrictions, we cannot provide the source code for the SPEC 2017 CPU benchmark used in this experiment. Additionally, testing all benchmarks including SPEC 2017 CPU is time-intensive, requiring around 565 hours in total. To address these constraints, we provide our recorded experimental data.
 
@@ -298,7 +315,7 @@ By executing the following script, the speedup for VEGA-generated LLVM backend (
 
 - Command:
 ```
- $ python ./Scripts/Exp/Perf/calculate_perf.py
 ```
 
 - Results:
@@ -308,6 +325,6 @@ $ cat ./Scripts/Exp/Perf/Fig10.csv
 
 
 
- ## 7. Experiment Customization
 
 Users can run this experiment in different environments, but they must ensure that the PyTorch version is compatible with the CUDA version in those environments. The experiment can also be conducted on different hardware, but the batch size for fine-tuning and inference must be adjusted according to the available GPU memory. We have fixed the random seed and parameters in the provided scripts to ensure consistent code generation accuracy within the same hardware and software environment. However, when the experiment is executed in a different hardware or software environment, the accuracy may fluctuate slightly.
 
 ```
 ## 2. Hardware Dependency
 
 - 8 Nvidia Tesla V100 GPUs, each with 16 GB memory
 
 ## 3. Software Dependency
 - CUDA == 11.4
 - Python version == 3.8.1
+ - Conda (any version that supports the installation of Python 3.8.1)
 
+ ## 4. Installation
 
+ - Download the artifact from https://huggingface.co/docz-ict/VEGA_AE.
+ 
+ ```
+ $ git lfs clone https://huggingface.co/docz-ict/VEGA_AE
+ $ cd VEGA_AE
+ ```
+ 
+ - Install Python (version 3.8.1) in a Conda environment:
+ 
+ ```
+ $ conda create -n vega_ae python=3.8.1
+ $ conda activate vega_ae
+ $ pip install -r requirements.txt
+ ```
+ 
+ 
+ ## 5. Code Generation
 
  We have provided a fine-tuned model using data from ```./dataset/train.jsonl``` and ```./dataset/valid.jsonl``` in ```./models/FT_Model```. The ```train.jsonl``` and ```valid.jsonl``` files contain function templates, feature vectors and ground truth for 98 backends in our dataset.
 
 
 Note that if a ```./models/FT_Model/result.jsonl``` file already exists, it will be **overwritten** after the execution of ```run_function_test.sh``` or ```run_test.sh```.
 
+ ## 6. Fine-Tuning (**Optional**)
 
 
 We provide the original UnixCoder-base-nine in ```./models/UnixCoder```. The original UnixCoder-base-nine can also be downloaded from HuggingFace: https://huggingface.co/microsoft/unixcoder-base-nine.
 
 The fine-tuned model will be saved in ```--output_dir```.
 
 
+ ## 7. Reproducing Results in the Experiment
 
 We provide the scripts to reproduce each Figure/Table from the paper, along with the corresponding output result files, in the following table:
 
 
 | Script | Description | Output | Figure/Table |
 | --- | --- | --- | --- |
+ | ./Scripts/Exp/Time/gen_time.py | Calculate the time overhead. | ./Scripts/Exp/Time/Fig7.csv | Fig. 7 |
+ | ./Scripts/Exp/Acc/gen_accuracy.py | Calculate the function-level accuracy. | ./Scripts/Exp/Acc/Fig8_Acc.csv | Fig. 8 |
+ | ./Scripts/Exp/Acc/gen_purple.py | Calculate the percentage of functions accurately synthesized from the statements of various existing targets (purple bar in Fig. 8). | ./Scripts/Exp/Acc/Fig8_Purple.csv | Fig. 8 |
+ | ./Scripts/Exp/Acc/gen_accuracy.py | Calculate the percentage of the three types of errors. | ./Scripts/Exp/Acc/Table2.csv | Table 2 |
+ | ./Scripts/Exp/ForkFlow/gen_forkflow.py | Calculate the statement-level accuracy of VEGA and ForkFlow. | ./Scripts/Exp/ForkFlow/Fig9.csv | Fig. 9 |
+ | ./Scripts/Exp/ForkFlow/gen_forkflow.py | Calculate the number of accurate statements of VEGA. | ./Scripts/Exp/ForkFlow/Table3.csv | Table 3 |
+ | ./Scripts/Exp/Correction/gen_correct.py | Calculate the time required by two developers to modify the VEGA-generated RISC-V backend. | ./Scripts/Exp/Correction/Table4.csv | Table 4 |
+ | ./Scripts/Exp/Perf/gen_perf.py | Calculate the speedup of LLVM-Base (-O3) and LLVM-VEGA (-O3) over LLVM-Base (-O0). | ./Scripts/Exp/Perf/Fig10.csv | Fig. 10 |
 
+ ### 7.1 Results for Fig. 7
 
 In the code generation process, we set a batch size of 256 on 8 Nvidia Tesla V100 GPUs (each with 16 GB memory), meaning each batch contains 256 statements. Since each batch may include statements from different function modules, we did not directly measure the generation time for each function module of the three targets (RISC-V, RI5CY, xCORE) during execution. Instead, we calculated the average inference time of each batch (25 seconds) and then derived the inference time of each statement (25/256 seconds). With the total number of statements within each function module of each target, we subsequently calculated the total inference time required for each function module of each target.
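The arithmetic above can be sketched in a few lines of Python. The module names and statement counts below are made-up placeholders, not the paper's actual numbers (those are embedded in the artifact's script):

```python
# Sketch of the Fig. 7 time derivation described above.
BATCH_TIME_S = 25.0      # measured average inference time per batch
BATCH_SIZE = 256         # statements per batch
per_stmt_s = BATCH_TIME_S / BATCH_SIZE  # ~0.0977 s per statement

# Hypothetical statement counts per function module of one target.
module_stmts = {"Frame Lowering": 1200, "ISel": 3400, "MC Layer": 2100}

# Total inference time per module = statements x per-statement time.
module_time_s = {m: n * per_stmt_s for m, n in module_stmts.items()}
total_s = sum(module_time_s.values())
print(f"{per_stmt_s:.4f} s/statement, total {total_s:.1f} s")
```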
 
 
 - Command:
 ```
+ $ python ./Scripts/Exp/Time/gen_time.py
 ```
 
 
 $ cat ./Scripts/Exp/Time/Fig7.csv
 ```
 
+ ### 7.2 Results for Fig. 8
 
 
  In our experiment, we employed the Pass@1 evaluation metric, which involves replacing each VEGA-generated function individually within the official LLVM (LLVM-Base), then running regression tests to verify the correctness of the replaced function. This process is highly time-intensive, as a single regression test run generally takes about half an hour. Thus, sequentially testing all 1,454 VEGA-generated functions across three targets would require approximately 727 hours.
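The 727-hour estimate follows directly from the numbers above:

```python
# Sanity check on the Pass@1 cost estimate quoted above:
# 1,454 generated functions, one regression-test run (~0.5 h) each.
functions = 1454
hours_per_run = 0.5
total_hours = functions * hours_per_run
print(total_hours)  # 727.0
```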
 
 - Command:
 ```
 $ cp ./models/FT_Model/result.jsonl ./Scripts/Exp/Acc
+ $ python ./Scripts/Exp/Acc/gen_accuracy.py
 ```
 
 This script will automatically analyze VEGA's output from ```result.jsonl``` and compare the generated code and confidence scores with the ground truth. Based on this comparison, it will determine whether each function is correct.
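Conceptually, the Exact Match check boils down to comparing each generated statement against its ground-truth counterpart. The sketch below illustrates the idea only; the field names (`generated`, `ground_truth`) are hypothetical, and the real ```result.jsonl``` schema and the confidence-score handling live in ```gen_accuracy.py```:

```python
import json

# Illustrative sketch of an Exact Match function-level check.
# Field names here are hypothetical, not the artifact's actual schema.
def function_is_correct(record: dict) -> bool:
    gen = [s.strip() for s in record["generated"]]
    gt = [s.strip() for s in record["ground_truth"]]
    return gen == gt  # every statement must match exactly

sample = json.loads('{"generated": ["return 4;"], "ground_truth": ["return 4;"]}')
print(function_is_correct(sample))  # True
```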
 
 
 - Command:
 ```
+ $ python ./Scripts/Exp/Acc/gen_purple.py
 ```
 
 
 
 
 
+ ### 7.3 Results for Table 2
 
+ Executing the script in 7.2 will also yield the proportion of the three types of errors for each target.
 
 
 - Command:
 ```
+ $ python ./Scripts/Exp/Acc/gen_accuracy.py
 ```
 
 
 ```
 
+ ### 7.4 Results for Fig. 9
 
+ We modified the functions generated by VEGA and the functions in the MIPS backend (ForkFlow) to ensure they can run correctly on the RISC-V, RI5CY, and xCORE backends, respectively. We have reserved the function code for the MIPS backend in the ```./Scripts/Exp/ForkFlow/Mips_Code``` directory, along with manually modified code for the RISC-V, RI5CY, and xCORE LLVM backends in ```./Scripts/Exp/ForkFlow/Std_Code```. Additionally, the script in 7.2 will automatically write the VEGA-generated code from ```result.jsonl``` into the ```./Scripts/Exp/ForkFlow/VEGA_Code``` directory for comparison. By executing the following script, the proportion of accurate and modified statements of the VEGA-generated functions and the ForkFlow process will be automatically calculated.
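The comparison between a generated file (e.g. from ```VEGA_Code```) and its hand-corrected counterpart in ```Std_Code``` can be pictured roughly as follows; ```gen_forkflow.py``` presumably normalizes and aligns statements more carefully than this line-for-line sketch:

```python
# Rough sketch of statement-level accuracy: the fraction of reference
# statements that the generated code reproduces exactly (after stripping
# whitespace). This is an illustration, not the artifact's actual logic.
def statement_accuracy(generated: str, reference: str) -> float:
    gen = [l.strip() for l in generated.splitlines() if l.strip()]
    ref = [l.strip() for l in reference.splitlines() if l.strip()]
    matches = sum(g == r for g, r in zip(gen, ref))
    return matches / max(len(ref), 1)

gen_src = "int x = 0;\nreturn x;\n"
ref_src = "int x = 1;\nreturn x;\n"
print(statement_accuracy(gen_src, ref_src))  # 0.5
```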
 
 - Command:
 ```
+ $ python ./Scripts/Exp/ForkFlow/gen_forkflow.py
 ```
 
 
 $ cat ./Scripts/Exp/ForkFlow/Fig9.csv
 ```
 
+ ### 7.5 Results for Table 3
 
+ Executing the script in 7.4 will also output the number of statements accurately generated by VEGA and the number requiring manual correction, across seven function modules for RISC-V, RI5CY, and xCORE.
 
 
 - Command:
 ```
+ $ python ./Scripts/Exp/ForkFlow/gen_forkflow.py
 ```
 
 
 ```
 
 
+ ### 7.6 Results for Table 4
 
 The data in Table 4 show the time two developers needed to modify the VEGA-generated RISC-V backend. As a human-based experiment, only the recorded modification time for each function is reported.
 
 
 - Command:
 
 ```
+ $ python ./Scripts/Exp/Correction/gen_correct.py
 ```
 
 - Results:
 
 $ cat ./Scripts/Exp/Correction/Table4.csv
 ```
 
+ ### 7.7 Results for Fig. 10
 
  Due to commercial licensing restrictions, we cannot provide the source code for the SPEC 2017 CPU benchmark used in this experiment. Additionally, testing all benchmarks including SPEC 2017 CPU is time-intensive, requiring around 565 hours in total. To address these constraints, we provide our recorded experimental data.
 
 
 - Command:
 ```
+ $ python ./Scripts/Exp/Perf/gen_perf.py
 ```
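The speedup metric in Fig. 10 is the -O0 baseline runtime divided by the runtime of each -O3 build. The timings below are invented placeholders; the real measurements ship with the artifact:

```python
# Sketch of the Fig. 10 speedup metric (baseline time / optimized time).
def speedup(baseline_s: float, optimized_s: float) -> float:
    return baseline_s / optimized_s

# Hypothetical runtimes in seconds, NOT measured data.
o0_base, o3_base, o3_vega = 100.0, 40.0, 42.0
print(speedup(o0_base, o3_base))  # 2.5   (LLVM-Base -O3 over LLVM-Base -O0)
print(speedup(o0_base, o3_vega))  # ~2.38 (LLVM-VEGA -O3 over LLVM-Base -O0)
```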
 
 - Results:
 
 
+ ## 8. Experiment Customization
 
 Users can run this experiment in different environments, but they must ensure that the PyTorch version is compatible with the CUDA version in those environments. The experiment can also be conducted on different hardware, but the batch size for fine-tuning and inference must be adjusted according to the available GPU memory. We have fixed the random seed and parameters in the provided scripts to ensure consistent code generation accuracy within the same hardware and software environment. However, when the experiment is executed in a different hardware or software environment, the accuracy may fluctuate slightly.
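The reproducibility claim above rests on seeding every pseudo-random generator. A minimal sketch of the idea, using only the standard library (the artifact's scripts additionally seed numpy and torch, which is not shown here):

```python
import random

# With a fixed seed, the same environment always produces the same
# pseudo-random sequence, so repeated runs generate identical output.
def sample_with_seed(seed: int, n: int = 3) -> list:
    rng = random.Random(seed)  # isolated generator with a fixed seed
    return [rng.randint(0, 9) for _ in range(n)]

print(sample_with_seed(42) == sample_with_seed(42))  # True: runs are identical
```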
Scripts/Exp/Acc/{calculate_accuracy.py → gen_accuracy.py} RENAMED
File without changes
Scripts/Exp/Acc/{calculate_purple.py → gen_purple.py} RENAMED
File without changes
Scripts/Exp/Correction/{calculate_correction.py → gen_correct.py} RENAMED
File without changes
Scripts/Exp/ForkFlow/{calculate_forkflow.py → gen_forkflow.py} RENAMED
File without changes
Scripts/Exp/Perf/{calculate_perf.py → gen_perf.py} RENAMED
File without changes
Scripts/Exp/Time/{calculate_time.py → gen_time.py} RENAMED
File without changes