cgrumbach committed · Commit 9df3062 (verified) · Parent: e738466

Upload README.md

# Title

Analyzing Bitcointalk.org with Large Language Models

- Researchers: Cyrille Grumbach, Didier Sornette
- Research Assistant: Timothé Laborie

  # Folders
  File main.ipynb is used to scrape the forum.
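The scraping itself depends on bitcointalk.org's HTML layout. As a minimal illustration of the parsing step (the `class="post"` selector and the absence of nested `<div>`s are assumptions about the forum's markup, not necessarily what main.ipynb uses), post bodies can be extracted with the standard library alone:

```python
from html.parser import HTMLParser

class PostExtractor(HTMLParser):
    """Collects the text of every <div class="post"> in a page.

    Assumes post bodies are plain <div class="post"> elements with no
    nested <div>s; the real forum markup may differ.
    """
    def __init__(self):
        super().__init__()
        self.in_post = False
        self.posts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "post") in attrs:
            self.in_post = True
            self.posts.append("")

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_post = False

    def handle_data(self, data):
        if self.in_post:
            self.posts[-1] += data

page = '<div class="post">Efficiency is 0.1 J/GH</div><div class="post">New ASIC out</div>'
parser = PostExtractor()
parser.feed(page)
print(parser.posts)  # ['Efficiency is 0.1 J/GH', 'New ASIC out']
```

The real notebook additionally has to handle pagination and request throttling, which this sketch omits.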
  ## hardwarelist
  Includes the following:
- pmaxv1 folder: Contains the maximum hardware efficiency for each date, alongside some manually added updates originally made by Cyrille
- get_hwd_asicminervalue.js and get_hwd_bitcoinwiki.js: Can be pasted into the browser console at the URLs listed within the files; used to extract the hardware efficiency tables
- hardware_asicminervalue.txt and hardware_bitcoinwiki.txt: The raw output from the above scripts
- 1_cleanup_hardware_table.ipynb: Used to clean up the raw output and create hardware_asicminervalue.csv and hardware_bitcoinwiki.csv
- 2_merge_tables.ipynb: Merges the two tables into hardware_merged.csv
- 3_paper_list.ipynb: Creates four things: 1) the hardware table in the appendix; 2) the pmaxv2.csv file, which uses hardware_merged.csv to build an improved table with the maximum hardware efficiency for each date; 3) the pmax evolution table for the paper; 4) the paper_list.csv file, which is used to create an Excel sheet later
- 4_create_pmaxv3.ipynb: Creates pmaxv3.csv, which takes, for each date, the maximum of the pmaxv1.csv and pmaxv2.csv values
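The final step above reduces to a per-date maximum over the two tables. A minimal sketch of that idea, with invented dates and efficiency values (the real inputs are CSV files, not dicts):

```python
# Hypothetical miniature versions of pmaxv1.csv and pmaxv2.csv, as
# {date: max efficiency} maps; values are invented for illustration.
pmaxv1 = {"2014-01": 0.5, "2014-02": 0.9}
pmaxv2 = {"2014-01": 0.7, "2014-02": 0.8}

# pmaxv3 keeps, for each date, the larger of the two estimates.
pmaxv3 = {date: max(pmaxv1[date], pmaxv2[date]) for date in pmaxv1}
print(pmaxv3)  # {'2014-01': 0.7, '2014-02': 0.9}
```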
  ## bitcoinforum
  ### 1_forum_dataset
Contains the raw HTML from the forum, and the code to parse it and combine it into data frames.
  ### 2_train_set_creation
Combines the forum sections into one, truncates long threads, passes a random sample to GPT-4 to create the training set for Mistral 7B, and also creates the inputs that will be given to Mistral 7B after training.
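The truncation step can be sketched as a simple length budget per thread. The budget value, the function name, and the character-based count are all illustrative; the notebook may measure length in tokens instead:

```python
def truncate_thread(posts, budget=2000):
    """Keep a thread's earliest posts until a character budget is hit.

    The budget and the character-based count are illustrative; the
    notebook may measure length in tokens instead.
    """
    kept, used = [], 0
    for post in posts:
        if used + len(post) > budget:
            break
        kept.append(post)
        used += len(post)
    return kept

thread = ["a" * 1500, "b" * 400, "c" * 300]  # post lengths: 1500, 400, 300
print([len(p) for p in truncate_thread(thread)])  # [1500, 400]
```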
  ### 3_training
Trains Mistral 7B with LoRA on the dataset generated earlier, and saves the merged model.
  ### 4_inference
Averages the forum efficiency on a monthly basis, then merges it with the Bitcoin price, hashrate, coins per block, and maximum hardware efficiency to create monthly_stuff.csv.
monthly_stuff.csv contains the columns: date, price, hashrate, coins_per_block, efficiency, max_efficiency.
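The monthly-averaging step described above can be sketched as bucketing daily estimates by calendar month; the dates and efficiency values below are invented for illustration:

```python
from collections import defaultdict

# Invented daily forum-efficiency estimates (date, value) for illustration.
daily = [("2015-03-02", 0.8), ("2015-03-20", 1.2), ("2015-04-05", 2.0)]

buckets = defaultdict(list)
for date, eff in daily:
    buckets[date[:7]].append(eff)  # bucket by "YYYY-MM"

monthly = {month: sum(vals) / len(vals) for month, vals in buckets.items()}
print(monthly)  # {'2015-03': 1.0, '2015-04': 2.0}
```

The notebook then joins this monthly series with price, hashrate, coins per block, and maximum hardware efficiency before writing monthly_stuff.csv.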
  ## plots
  Includes the following:
- carboncomparison folder: Contains the 17 sources used to create the carbon comparison table
- carbonintensity.html: Cambridge's table for the yearly gCO2e/kWh values, found at https://ccaf.io/cbnsi/cbeci/ghg/methodology
- appendix2.ipynb: Creates all plots from appendix2
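For intuition only, here is a back-of-envelope of how quantities like network hashrate, hardware efficiency, and the gCO2e/kWh intensity combine into a footprint estimate. Both the numbers and the formula are illustrative assumptions, not the paper's methodology:

```python
# All numbers and the formula itself are illustrative assumptions,
# not the paper's methodology.
hashrate_ths = 100e6         # network hashrate, TH/s
efficiency_j_per_th = 50     # fleet efficiency, J/TH
intensity_g_per_kwh = 500    # grid carbon intensity, gCO2e/kWh

power_w = hashrate_ths * efficiency_j_per_th      # J/s, i.e. watts
energy_kwh_per_day = power_w * 24 / 1000          # W -> kWh over 24 h
co2_tonnes_per_day = energy_kwh_per_day * intensity_g_per_kwh / 1e6  # g -> t

print(power_w / 1e9)       # 5.0 (GW)
print(co2_tonnes_per_day)  # 60000.0 (tCO2e per day)
```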
 
 
  # System requirements
Running Mistral 7B training or inference requires an NVIDIA GPU with at least 24 GB of VRAM (a Runpod instance also works).
Everything else can be run on a normal desktop/laptop computer with Python 3.10 installed.
  # Operating system
Code that is not related to training or inference of Mistral 7B has been tested on Windows 10.
  Code for Mistral 7B training and inference has been tested on Runpod instances.
  # Installation guide for software dependencies
For the code that is not related to training or inference of Mistral 7B, use the packages listed in requirements.txt.
  ## Installation guide for Mistral 7B training and inference
Set up a Runpod instance with the axolotl Docker image, then install unsloth using the instructions at https://github.com/unslothai/unsloth
Also install SGLang for inference.
  ## Typical install time on a "normal" desktop computer
For the code unrelated to training or inference of Mistral 7B, the install time is around 5 minutes.
  For Mistral 7B training and inference, the install time is around 1 hour.
  Run the code in the order listed in the folders section above.
Note: Three files normally take a long time to run. I have included a "DEMO_MODE" constant at the top of each of these files; when turned on, the files run on a tiny subset of the data. The original runtimes are as follows:
- The scraper takes over 12 hours to run.
- The process of creating the training set for Mistral 7B takes around 3 hours and costs about $10 of OpenAI credits.
- The process of mapping the hardware names to those in the efficiency table takes around 3 hours and also costs about $10 of OpenAI credits.
  All other files can be run in a few minutes.
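The DEMO_MODE switch described above follows a simple pattern; the variable names and sizes here are illustrative, not the notebooks' exact code:

```python
DEMO_MODE = True  # set to False to reproduce the full (multi-hour) runs

items = list(range(10_000))  # stand-in for the threads/posts to process
if DEMO_MODE:
    items = items[:100]      # tiny subset, enough to exercise the code path

print(len(items))  # 100
```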
  ## Expected output
You should re-obtain the CSV files already in the folders, as well as the plots used in the paper.
  ## Expected run time for demo on a "normal" desktop computer
The expected run time to run every notebook on a "normal" desktop computer is around …

  ## Instructions for use on custom data
The code is designed only to analyze the mining section of bitcointalk.org.

**Acknowledgments**

This work was partially supported by the National Natural Science Foundation of China (Grants No. T2350710802 and No. U2039202), the Shenzhen Science and Technology Innovation Commission Project (Grants No. GJHZ20210705141805017 and No. K23405006), and the Center for Computational Science and Engineering at Southern University of Science and Technology. The authors acknowledge T. Laborie for excellent research assistance and Y. Cui and M. von Krosigk for helpful comments. Any errors are our own.
 