# Bitcoin Forum analysis underscores enduring and substantial carbon footprint of Bitcoin

- Cyrille Grumbach, ETH Zurich, Switzerland (cgrumbach@ethz.ch)
- Didier Sornette, Southern University of Science and Technology, China (didier@sustech.edu.cn)

# Folders

## scraper

File main.ipynb is used to scrape the forum.
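
As a rough illustration of the pagination involved, bitcointalk boards list topics in fixed-size pages addressed by an offset in the URL. The sketch below is a guess at that scheme, not code from main.ipynb; the board id and page size are assumptions.

```python
# Hypothetical sketch of bitcointalk.org board pagination.
# BOARD_ID and TOPICS_PER_PAGE are assumptions, not values from main.ipynb.

BASE = "https://bitcointalk.org/index.php"
BOARD_ID = 14          # assumed id of the Mining board
TOPICS_PER_PAGE = 40   # assumed number of topics per board page

def board_page_urls(n_pages: int) -> list[str]:
    """Build the URLs of the first n_pages of a board's topic listing."""
    return [
        f"{BASE}?board={BOARD_ID}.{page * TOPICS_PER_PAGE}"
        for page in range(n_pages)
    ]

urls = board_page_urls(3)
# ['https://bitcointalk.org/index.php?board=14.0',
#  'https://bitcointalk.org/index.php?board=14.40',
#  'https://bitcointalk.org/index.php?board=14.80']
```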

## hardwarelist

Includes the following:

- pmaxv1 folder: contains the maximum hardware efficiency for each date, alongside some manually added updates originally made by Cyrille.
- get_hwd_asicminervalue.js and get_hwd_bitcoinwiki.js: scripts to paste into the browser console on the URLs listed within the files; they extract the hardware efficiency table.
- hardware_asicminervalue.txt and hardware_bitcoinwiki.txt: the raw output of the scripts above.
- 1_cleanup_hardware_table.ipynb: cleans up the raw output to create hardware_asicminervalue.csv and hardware_bitcoinwiki.csv.
- 2_merge_tables.ipynb: merges the two tables into hardware_merged.csv.
- 3_paper_list.ipynb: creates four things: (1) the hardware table in the appendix; (2) pmaxv2.csv, which uses hardware_merged.csv to build an improved table of the maximum hardware efficiency for each date; (3) the pmax evolution table for the paper; (4) paper_list.csv, which is used later to create an Excel sheet.
- 4_create_pmaxv3.ipynb: creates pmaxv3.csv, which is the max between the pmaxv1.csv and pmaxv2.csv files.
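
The 4_create_pmaxv3 step amounts to a per-date maximum over the two tables. A minimal sketch of that logic, with made-up dates and values rather than the actual file contents:

```python
# Combine two {date: efficiency} tables by taking, for each date, the
# larger of the two values -- the idea behind pmaxv3.csv.
# All dates and values below are invented for illustration.

pmax_v1 = {"2015-01": 0.5, "2016-01": 0.25}
pmax_v2 = {"2015-01": 0.45, "2016-01": 0.25, "2017-01": 1.0}

pmax_v3 = {
    date: max(pmax_v1.get(date, 0.0), pmax_v2.get(date, 0.0))
    for date in sorted(pmax_v1.keys() | pmax_v2.keys())
}
# {'2015-01': 0.5, '2016-01': 0.25, '2017-01': 1.0}
```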

## bitcoinforum

### 1_forum_dataset

Contains the raw HTML from the forum and the code to parse it and combine it into data frames.

### 2_train_set_creation

Combines the forum sections into one, truncates long threads, and passes a random sample to GPT-4 to build the training set for Mistral 7B. It also creates the inputs that will be given to Mistral 7B after training.
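
The thread-truncation step can be sketched as a simple budget applied at post boundaries. This is an illustration of the idea only; the actual limit and truncation rule in the notebook may differ.

```python
# Truncate a thread to at most max_chars characters, cutting at a post
# boundary so no half post is kept. The limit is a placeholder.

def truncate_thread(posts: list[str], max_chars: int = 8000) -> list[str]:
    kept, total = [], 0
    for post in posts:
        if total + len(post) > max_chars:
            break
        kept.append(post)
        total += len(post)
    return kept
```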

### 3_training

Trains Mistral 7B with LoRA on the dataset generated earlier and saves the merged model.

### 4_inference

Runs inference with the trained Mistral 7B on inputs.csv created in part 2.

### 5_processing_extracted_data

Includes the following files:

- 1_processing.ipynb: takes the raw output from Mistral 7B and converts it into hardware_instances.csv.
- 2_create_mapping.ipynb: uses GPT-4 to map the hardware names to those of the efficiency table.
- 3_add_efficiency.ipynb: merges the mapped hardware instances with the efficiency table to produce hardware_instances_with_efficiency.csv.
- 4_visualizations.ipynb, not_usable_threads.txt, hardware_instances_inc_threads.csv: only used for debugging.
- hardware_mapping.py: automatically generated by step 3.

### 6_merging

Averages the forum efficiency on a monthly basis, then merges it with the Bitcoin price, hashrate, coins per block, and maximum hardware efficiency to create monthly_stuff.csv.

monthly_stuff.csv contains the columns: date, price, hashrate, coins_per_block, efficiency, max_efficiency.
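
The monthly averaging can be sketched in plain Python; the (date, efficiency) pairs below are invented, standing in for rows of hardware_instances_with_efficiency.csv:

```python
from collections import defaultdict

# Average per-instance efficiencies by calendar month -- the first step
# of 6_merging. Input pairs are invented for illustration.

instances = [
    ("2017-03-02", 0.5),
    ("2017-03-25", 1.5),
    ("2017-04-01", 2.0),
]

by_month = defaultdict(list)
for date, eff in instances:
    by_month[date[:7]].append(eff)          # key on "YYYY-MM"

monthly_efficiency = {m: sum(v) / len(v) for m, v in by_month.items()}
# {'2017-03': 1.0, '2017-04': 2.0}
```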

## plots

Includes the following:

- carbon-comparison folder: contains the 17 sources used to create the carbon comparison table.
- carbonintensity.html: Cambridge's table of the yearly gCO2e/kWh values, found at https://ccaf.io/cbnsi/cbeci/ghg/methodology
- appendix2.ipynb: creates all plots needed for the paper.
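
The merged quantities combine into an emissions estimate in the standard way: hashrate times energy-per-hash gives network power, and electricity times carbon intensity gives CO2. A back-of-envelope sketch with invented numbers (not the paper's inputs or results):

```python
# Back-of-envelope emissions arithmetic. All numbers are invented for
# illustration; units are noted in the comments.

hashrate_th_s = 100e6        # network hashrate, TH/s
efficiency_j_th = 50.0       # assumed fleet efficiency, J/TH
intensity_g_kwh = 500.0      # grid carbon intensity, gCO2e/kWh

power_w = hashrate_th_s * efficiency_j_th            # (TH/s)*(J/TH) = W
energy_kwh_year = power_w * 24 * 365 / 1000          # W -> kWh per year
co2_t_year = energy_kwh_year * intensity_g_kwh / 1e6 # g -> tonnes CO2e
```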
68
+
69
+
70
+ # System requirements
71
+
72
+ Running Mistral 7 B's training or inference requires an NVIDIA GPU with at least 24GB of VRAM (it can also be a Runpod instance).
73
+
74
+ Everything else can be run on a normal desktop/laptop computer with Python 3.10 installed.

# Operating system

Code not related to training or inference of Mistral 7B has been tested on Windows 10.

Code for Mistral 7B training and inference has been tested on Runpod instances.

# Installation guide for software dependencies

For the code not related to training or inference of Mistral 7B, install the packages listed in requirements.txt.

## Installation guide for Mistral 7B training and inference

Set up a Runpod instance with the axolotl Docker image, then install unsloth using the instructions at https://github.com/unslothai/unsloth

Also, install SGLang for inference.

## Typical install time on a "normal" desktop computer

For the code unrelated to training or inference of Mistral 7B, the install time is around 5 minutes.

For Mistral 7B training and inference, the install time is around 1 hour.

# Demo

## Instructions to run on data

Run the code in the order listed in the Folders section above.

Note: three files normally take a long time to run. A DEMO_MODE constant is included at the top of each of them; when turned on, the file runs on a tiny subset of the data. The original runtimes are as follows:

- The scraper takes over 12 hours to run.
- Creating the training set for Mistral 7B takes around 3 hours and costs about $10 of OpenAI credits.
- Mapping the hardware names to those of the efficiency table takes around 3 hours and also costs about $10 of OpenAI credits.

All other files can be run in a few minutes.
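
The DEMO_MODE switch follows a simple pattern: gate the expensive workload behind a module-level flag. A sketch of the idea (the subset size here is a placeholder, not the one used in the notebooks):

```python
# Gate the expensive path behind a module-level flag, as in the three
# long-running notebooks. The subset size is illustrative.

DEMO_MODE = True
DEMO_SAMPLE_SIZE = 25

def select_workload(items: list) -> list:
    """Return the full workload, or a tiny head slice in demo mode."""
    return items[:DEMO_SAMPLE_SIZE] if DEMO_MODE else items

threads = [f"thread_{i}" for i in range(1000)]
subset = select_workload(threads)   # 25 items while DEMO_MODE is on
```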

## Expected output

You should re-obtain the CSV files already in the folders and the plots used in the paper.

## Expected run time for demo on a "normal" desktop computer

Running every notebook on a "normal" desktop computer takes around 10 minutes (excluding the training and inference of Mistral 7B).

## Instructions for use on custom data

The code is designed only to analyze the mining section of bitcointalk.org.

# Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Grants No. T2350710802 and No. U2039202), the Shenzhen Science and Technology Innovation Commission Project (Grants No. GJHZ20210705141805017 and No. K23405006), and the Center for Computational Science and Engineering at Southern University of Science and Technology. The authors acknowledge T. Laborie for excellent research assistance and Y. Cui and M. von Krosigk for helpful comments. Any errors are our own.