Commit 7a9df24 · 1 Parent(s): 33264ad

Files changed (1): README.md (+20 −20)
# VEGA_AE

- 8 Nvidia Tesla V100 GPUs, each with 16 GB memory

## 3. Software Dependency
- CUDA == 11.7
- Python version == 3.8.1
- Conda (any version that supports the installation of Python 3.8.1)
## 5. Code Generation

We have provided a fine-tuned model in ```./models/FT_Model```, fine-tuned with ```./dataset/train.jsonl``` and ```./dataset/valid.jsonl```. The ```train.jsonl``` and ```valid.jsonl``` files contain function templates, feature vectors, and ground truth for the 98 backends in our dataset (excluding RISC-V, RI5CY, and xCORE).

We have also provided a script for a functionality test, which generates only a single function for RI5CY (recorded as PULP in our dataset), taking less than 3 minutes with 8 Nvidia Tesla V100 GPUs.

- **Run functionality test with:**
Check the generated code with:
```
$ cat ./models/FT_Model/result.jsonl
```

In the `result.jsonl` file, the meaning of each item in an entry can be found in the following table:

| Item | Description |
| ---- | ----------- |
| vega_code | The model-generated code. |
| ans_code | The ground truth of the code. |
| vega_pre | The model-generated confidence score. |
| ans_pre | The ground truth of the confidence score. |
| File | The file to which this item belongs. |
| Function | The function to which this item belongs. |
| Module | The function module to which this item belongs. |
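As an illustration of how these fields could be consumed, here is a minimal sketch (assuming `result.jsonl` holds one JSON object per line with the keys listed above) that computes the fraction of entries whose generated code exactly matches the ground truth:

```python
import json

def exact_match_rate(path):
    """Fraction of entries whose 'vega_code' matches 'ans_code'.

    Assumes one JSON object per line with at least the keys
    'vega_code' and 'ans_code', as described in the table above.
    """
    total = matched = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            total += 1
            if entry["vega_code"].strip() == entry["ans_code"].strip():
                matched += 1
    return matched / total if total else 0.0
```

This is only a reading sketch; the repository's own scripts (Section 7) perform the full evaluation.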
- Command:
```
$ bash run_test.sh
```

Customize parameters for code generation by modifying the following options in ```run_test.sh```:
```
--model_name_or_path ../../models/UnixCoder \
--test_filename ../../dataset/test.jsonl \
```
| Script | Description | Output | Figure/Table |
| ------ | ----------- | ------ | ------------ |
| ./Scripts/Exp/Time/gen_time.py | Calculate the time overhead for VEGA to generate three backends. | ./Scripts/Exp/Time/Fig7.csv | Fig. 7 |
| ./Scripts/Exp/Acc/gen_accuracy.py | Calculate the function-level accuracy of the three VEGA-generated backends. | ./Scripts/Exp/Acc/Fig8_Acc.csv | Fig. 8 |
| ./Scripts/Exp/Acc/gen_purple.py | Calculate the percentage of functions accurately synthesized from the statements of various existing targets (purple bar in Fig. 8). | ./Scripts/Exp/Acc/Fig8_Purple.csv | Fig. 8 |
| ./Scripts/Exp/Acc/gen_accuracy.py | Calculate the percentage of the three types of errors in the three VEGA-generated backends. | ./Scripts/Exp/Acc/Table2.csv | Table 2 |
| ./Scripts/Exp/ForkFlow/gen_forkflow.py | Calculate the statement-level accuracy of the VEGA-generated and ForkFlow-generated backends. | ./Scripts/Exp/ForkFlow/Fig9.csv | Fig. 9 |
| ./Scripts/Exp/ForkFlow/gen_forkflow.py | Calculate the number of statements accurately generated by VEGA and those requiring manual correction, for three backends. | ./Scripts/Exp/ForkFlow/Table3.csv | Table 3 |
| ./Scripts/Exp/Correction/gen_correct.py | Calculate the time required by two developers to modify the VEGA-generated RISC-V backend. | ./Scripts/Exp/Correction/Table4.csv | Table 4 |
| ./Scripts/Exp/Perf/gen_perf.py | Calculate the speedup of LLVM-Base (-O3) and LLVM-VEGA (-O3) over LLVM-Base (-O0) on three benchmarks. | ./Scripts/Exp/Perf/Fig10.csv | Fig. 10 |

### 7.1 Results for Fig. 7

In the code generation process, we set a batch size of 256 on 8 Nvidia Tesla V100 GPUs (each with 16 GB memory), meaning each batch contains 256 statements. Since each batch may include statements from different function modules, we did not directly measure the generation time for each function module of the three targets (RISC-V, RI5CY, xCORE) during execution. Instead, we calculated the average inference time of each batch (25 seconds) and then derived the inference time of each statement (25/256 seconds). With the total number of statements within each function module of each target, we then calculated the total inference time required for each function module of each target.
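The derivation above amounts to a couple of lines of arithmetic; the sketch below reproduces it (the 1,024-statement module is an illustrative placeholder, not a number from the paper):

```python
BATCH_TIME_S = 25   # average inference time per batch, from the text
BATCH_SIZE = 256    # statements per batch, from the text

# Per-statement inference time: 25 / 256 ~= 0.0977 seconds.
per_statement_s = BATCH_TIME_S / BATCH_SIZE

def module_inference_time(num_statements):
    """Total inference time (seconds) for one function module,
    given its total number of statements."""
    return num_statements * per_statement_s

# Illustrative only: a module with 1,024 statements would take
# 1024 * 25 / 256 = 100 seconds.
```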
### 7.2 Results for Fig. 8

In our experiment, we employed the Pass@1 evaluation metric, which involves replacing each VEGA-generated function individually within the official LLVM (LLVM-Base), then running regression tests to verify the correctness of the replaced function. This process is highly time-consuming, as a single regression test run generally takes about half an hour. Thus, sequentially testing all 1,454 VEGA-generated functions across the three targets would require approximately 727 hours.

To simplify this process, we recorded the ground truth for each statement based on the Pass@1 experiment results. Additionally, we documented a list of functions containing Err-Def errors (i.e., errors due to missing necessary statements in the function template; functions with Err-Def errors cannot pass all regression tests). This allowed us to transform the Pass@1 testing process into an Exact Match evaluation.
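A minimal sketch of this Exact Match stand-in (all names here are illustrative, not the repository's actual code): a function counts as passing only if it is not on the Err-Def list and every one of its statements matches the recorded ground truth.

```python
def function_passes(statements, err_def_functions, function_name):
    """Exact Match stand-in for Pass@1, as described above.

    'statements' is a list of (generated, ground_truth) string pairs for
    one function; 'err_def_functions' is the recorded set of functions
    with Err-Def errors, which can never pass all regression tests.
    """
    if function_name in err_def_functions:
        return False
    return all(gen.strip() == truth.strip() for gen, truth in statements)

# The text's cost estimate for the avoided approach:
# 1,454 functions * ~0.5 h per regression run = ~727 hours.
```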
### 7.4 Results for Fig. 9

We modified the functions generated by VEGA and the functions in the MIPS backend (ForkFlow) to ensure they run correctly on the RISC-V, RI5CY, and xCORE backends respectively. We have preserved the function code for the MIPS backend in the ```./Scripts/Exp/ForkFlow/Mips_Code``` directory, along with manually fixed code for the RISC-V, RI5CY, and xCORE LLVM backends in ```./Scripts/Exp/ForkFlow/Std_Code```. Additionally, the script in 7.2 automatically writes the VEGA-generated code from ```result.jsonl``` into the ```./Scripts/Exp/ForkFlow/VEGA_Code``` directory for comparison. Executing the following script automatically calculates the proportion of accurate and modified statements for the VEGA-generated functions and the ForkFlow process.
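As a rough illustration of this comparison (using `difflib` as a stand-in; the actual logic in `gen_forkflow.py` may differ), the proportion of statements that survive unchanged from the generated code to the manually fixed code can be sketched as:

```python
import difflib

def statement_accuracy(generated_lines, fixed_lines):
    """Fraction of generated statements that appear unchanged in the
    manually fixed version of the same function.

    Illustrative sketch only: compares line lists with difflib rather
    than the repository's own statement matching.
    """
    matcher = difflib.SequenceMatcher(a=generated_lines, b=fixed_lines)
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return unchanged / len(generated_lines) if generated_lines else 1.0
```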
- Command:
### 7.6 Results for Table 4

The data in Table 4 show the time two developers needed to modify the VEGA-generated RISC-V backend. As a human-based experiment, only the recorded modification times for each function are provided.

The following script computes the total time spent by Developers A and B to modify each **function module** in the VEGA-generated RISC-V backend, based on the recorded times for each **function**.
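The aggregation that script performs can be sketched as follows (the record layout and the module name in the example are illustrative assumptions, not the actual data schema):

```python
from collections import defaultdict

def total_time_per_module(records):
    """Sum per-function modification times into per-module totals
    for each developer.

    'records' is a list of (developer, module, function, minutes)
    tuples; these field names are illustrative placeholders.
    """
    totals = defaultdict(float)  # (developer, module) -> total minutes
    for developer, module, _function, minutes in records:
        totals[(developer, module)] += minutes
    return dict(totals)
```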
Due to commercial licensing restrictions, we cannot provide the source code for the SPEC 2017 CPU benchmark used in this experiment. Additionally, testing all benchmarks including SPEC 2017 CPU is time-intensive, requiring around 565 hours in total. To address these constraints, we provide our recorded experimental data.

Running the following script automatically calculates the speedup of the VEGA-generated LLVM backend (LLVM-VEGA) with the "-O3" optimization over the official LLVM backend (LLVM-Base) with "-O0", as well as the speedup of LLVM-Base with "-O3" over its own performance with "-O0".
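The speedup computation itself is a simple ratio of execution times; a sketch with illustrative (not measured) numbers:

```python
def speedup(baseline_seconds, optimized_seconds):
    """Speedup of an optimized run over the baseline:
    execution time at -O0 divided by execution time at -O3."""
    return baseline_seconds / optimized_seconds

# Illustrative numbers only, not recorded data:
#   base_O0, base_O3, vega_O3 = 100.0, 40.0, 42.0
#   speedup(base_O0, base_O3)  -> LLVM-Base -O3 over LLVM-Base -O0
#   speedup(base_O0, vega_O3)  -> LLVM-VEGA -O3 over LLVM-Base -O0
```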
- Command:
## 8. Experiment Customization

Users can run this experiment in different software environments, but they must ensure that the PyTorch version is compatible with the CUDA version in those environments. The experiment can also be conducted in different hardware environments, but adjustments to the batch size for fine-tuning and inference are necessary based on the available GPU memory. We have fixed the random seed and parameters in the provided scripts to ensure consistent code generation accuracy within the same hardware and software environment. However, when the experiment is executed in different hardware or software environments, the accuracy may fluctuate slightly.
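One possible way to pick a batch size for a GPU with less memory (a heuristic of our own for illustration, not a rule from this repository) is to scale the reference setting linearly and round down to a power of two:

```python
def scaled_batch_size(reference_batch, reference_mem_gb, available_mem_gb):
    """Illustrative heuristic only: scale the reference batch size
    linearly with available GPU memory, rounding down to a power of
    two so batch shapes stay regular."""
    scaled = reference_batch * available_mem_gb / reference_mem_gb
    power = 1
    while power * 2 <= scaled:
        power *= 2
    return power

# The setting above: batch size 256 with 16 GB GPUs.
# On 8 GB GPUs this heuristic suggests a batch size of 128.
```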
 