# Title
Bitcoin Forum analysis underscores enduring and substantial carbon footprint of Bitcoin

- Cyrille Grumbach, ETH Zurich, Switzerland (cgrumbach@ethz.ch)
- Didier Sornette, Southern University of Science and Technology, China (didier@sustech.edu.cn)

# Folders

## scraper

The notebook main.ipynb scrapes the forum.
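
As a rough illustration of this step (a minimal sketch, not the notebook's exact code: the topic id, request headers, and CSS class name are all assumptions), one page of a bitcointalk thread can be fetched and parsed like this:

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical example: fetch the first page of one thread.
# The topic id, headers, and the "post" class name are illustrative assumptions.
url = "https://bitcointalk.org/index.php?topic=1234.0"
resp = requests.get(url, headers={"User-Agent": "research-scraper"}, timeout=30)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
posts = [div.get_text(" ", strip=True) for div in soup.find_all("div", class_="post")]
print(f"Extracted {len(posts)} posts from {url}")
```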

## hardwarelist

Includes the following:

- pmaxv1 folder: Contains the maximum hardware efficiency for each date, alongside some manually added updates originally made by Cyrille.
- get_hwd_asicminervalue.js and get_hwd_bitcoinwiki.js: Scripts to paste into the browser console at the URLs listed within each file; they extract the hardware efficiency tables.
- hardware_asicminervalue.txt and hardware_bitcoinwiki.txt: The raw output of the above scripts.
- 1_cleanup_hardware_table.ipynb: Cleans up the raw output to create hardware_asicminervalue.csv and hardware_bitcoinwiki.csv.
- 2_merge_tables.ipynb: Merges the two tables into hardware_merged.csv.
- 3_paper_list.ipynb: Creates four outputs: (1) the hardware table in the appendix; (2) pmaxv2.csv, which uses hardware_merged.csv to build an improved table of the maximum hardware efficiency for each date; (3) the pmax evolution table for the paper; and (4) paper_list.csv, which is later used to create an Excel sheet.
- 4_create_pmaxv3.ipynb: Creates pmaxv3.csv, which takes the maximum of pmaxv1.csv and pmaxv2.csv for each date (see the sketch after this list).
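
A minimal sketch of that combination step, assuming each file has a date column and a single efficiency column named pmax (the column names are assumptions, not the notebook's exact schema):

```python
import pandas as pd

# Column names ("date", "pmax") are assumptions.
v1 = pd.read_csv("pmaxv1.csv", parse_dates=["date"]).set_index("date")
v2 = pd.read_csv("pmaxv2.csv", parse_dates=["date"]).set_index("date")

# Align the two tables on date and keep the higher efficiency for each date.
merged = v1.join(v2, how="outer", lsuffix="_v1", rsuffix="_v2")
merged["pmax"] = merged[["pmax_v1", "pmax_v2"]].max(axis=1)  # NaN-safe row-wise max
merged[["pmax"]].to_csv("pmaxv3.csv")
```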

## bitcoinforum

### 1_forum_dataset

Contains the raw HTML from the forum, plus the code to parse it and combine it into data frames.

### 2_train_set_creation

Combines the forum sections into one, truncates long threads, and passes a random sample to GPT-4 to build the training set for Mistral 7B. It also creates the inputs (inputs.csv) that will be given to Mistral 7B after training.
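
The GPT-4 call presumably resembles this sketch, using the openai Python client; the model name, system prompt, and helper function are assumptions, not the notebook's exact code:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def label_thread(thread_text: str) -> str:
    """Hypothetical helper: ask GPT-4 to extract hardware mentions from one thread."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumed; the README only says GPT-4
        messages=[
            {"role": "system",
             "content": "List every Bitcoin mining hardware model mentioned in this thread."},
            {"role": "user", "content": thread_text},
        ],
        temperature=0,
    )
    return response.choices[0].message.content
```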

### 3_training

Trains Mistral 7B with LoRA on the dataset generated earlier, then saves the merged model.
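
With Unsloth (see the installation section below), the training setup plausibly looks like this sketch; the base-model name, LoRA rank, target modules, and output path are assumptions:

```python
from unsloth import FastLanguageModel

# Load the base model in 4-bit so it fits a 24 GB GPU (model name is an assumption).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mistralai/Mistral-7B-v0.1",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and target modules are illustrative defaults.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# ... fine-tune on the GPT-4-generated dataset (e.g. with trl's SFTTrainer) ...

# Save the LoRA weights merged back into the base model.
model.save_pretrained_merged("mistral7b-merged", tokenizer, save_method="merged_16bit")
```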

### 4_inference

Runs inference with the trained Mistral 7B on inputs.csv created in part 2.
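
Since SGLang is the inference stack (see installation below), the batch run plausibly uses its offline engine; the model path, column name, and sampling parameters below are assumptions:

```python
import pandas as pd
import sglang as sgl

# Load the merged model saved in part 3 (path is an assumption).
llm = sgl.Engine(model_path="mistral7b-merged")

prompts = pd.read_csv("inputs.csv")["prompt"].tolist()  # column name is an assumption

# Deterministic sampling suits an extraction task.
outputs = llm.generate(prompts, {"temperature": 0.0, "max_new_tokens": 256})
for out in outputs:
    print(out["text"])
```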

### 5_processing_extracted_data

Includes the following files:

- 1_processing.ipynb: Takes the raw output from Mistral 7B and converts it into hardware_instances.csv.
- 2_create_mapping.ipynb: Uses GPT-4 to map the hardware names to those of the efficiency table.
- 3_add_efficiency.ipynb: Merges the mapped hardware instances and the efficiency table to get hardware_instances_with_efficiency.csv (see the sketch after this list).
- 4_visualizations.ipynb, not_usable_threads.txt, hardware_instances_inc_threads.csv: Only used for debugging.
- hardware_mapping.py: Automatically generated by step 3.
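
A minimal sketch of that merge (the dictionary name inside hardware_mapping.py and all column names are assumptions):

```python
import pandas as pd

# hardware_mapping.py is assumed to expose a plain dict from forum spellings
# to the canonical names of the efficiency table, e.g. {"s9": "Antminer S9"}.
from hardware_mapping import HARDWARE_MAPPING

instances = pd.read_csv("hardware_instances.csv")
efficiency = pd.read_csv("hardware_merged.csv")

# Normalize forum hardware names, then attach each model's efficiency.
instances["hardware"] = instances["hardware"].map(HARDWARE_MAPPING)
result = instances.merge(efficiency, on="hardware", how="left")
result.to_csv("hardware_instances_with_efficiency.csv", index=False)
```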

### 6_merging

Averages the forum efficiency by month, then merges it with the Bitcoin price, hashrate, coins per block, and maximum hardware efficiency to create monthly_stuff.csv.

monthly_stuff.csv contains the columns: date, price, hashrate, coins_per_block, efficiency, max_efficiency.
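
The monthly aggregation plausibly reduces to a pandas resample plus a join; every file name except monthly_stuff.csv, and any column not listed above, is an assumption:

```python
import pandas as pd

# Per-instance efficiencies extracted from the forum (output of part 5).
forum = pd.read_csv("hardware_instances_with_efficiency.csv", parse_dates=["date"])

# Average forum efficiency per calendar month (month-start bins).
monthly_eff = forum.set_index("date")["efficiency"].resample("MS").mean()

# Monthly market data: price, hashrate, coins_per_block, max_efficiency (assumed file).
market = pd.read_csv("market_monthly.csv", parse_dates=["date"]).set_index("date")

monthly = market.join(monthly_eff.rename("efficiency"), how="left")
monthly.to_csv("monthly_stuff.csv")
```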

## plots

Includes the following:

- carbon-comparison folder: Contains the 17 sources used to create the carbon comparison table.
- carbonintensity.html: Cambridge's table of yearly gCO2e/kWh values, found at https://ccaf.io/cbnsi/cbeci/ghg/methodology
- appendix2.ipynb: Creates all plots needed for the paper.

# System requirements

Training or running inference with Mistral 7B requires an NVIDIA GPU with at least 24 GB of VRAM (a Runpod instance also works).

Everything else runs on a normal desktop or laptop computer with Python 3.10 installed.

# Operating system

Code not related to Mistral 7B training or inference has been tested on Windows 10.

Code for Mistral 7B training and inference has been tested on Runpod instances.

# Installation guide for software dependencies

For the code not related to Mistral 7B training or inference, install the packages listed in requirements.txt.

## Installation guide for Mistral 7B training and inference

Set up a Runpod instance with the axolotl Docker image, then install Unsloth following the instructions at https://github.com/unslothai/unsloth

Also install SGLang for inference.

## Typical install time on a "normal" desktop computer

For the code unrelated to Mistral 7B training or inference, the install time is around 5 minutes.

For Mistral 7B training and inference, the install time is around 1 hour.

# Demo

## Instructions to run on data

Run the code in the order listed in the Folders section above.

Note: three files normally take a long time to run. I have included a DEMO_MODE constant at the top of each; when it is enabled, the files run on a tiny subset of the data (see the sketch after this list). The full runtimes are as follows:

- The scraper takes over 12 hours to run.
- Creating the training set for Mistral 7B takes around 3 hours and costs about $10 of OpenAI credits.
- Mapping the hardware names to those of the efficiency table takes around 3 hours and also costs about $10 of OpenAI credits.
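
The DEMO_MODE switch works along these lines (the sample size and input file are illustrative, not the notebooks' exact code):

```python
import pandas as pd

DEMO_MODE = True  # set to False to reproduce the full, multi-hour run

threads = pd.read_csv("threads.csv")  # hypothetical input
if DEMO_MODE:
    # Keep a tiny, reproducible subset so the notebook finishes in minutes.
    threads = threads.sample(n=50, random_state=0)
```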

All other files run in a few minutes.

## Expected output

You should re-obtain the CSV files already in the folders and the plots used in the paper.

## Expected run time for demo on a "normal" desktop computer

Running every notebook on a "normal" desktop computer takes around 10 minutes in total (excluding Mistral 7B training and inference).

## Instructions for use on custom data

The code is designed only to analyze the mining section of bitcointalk.org.

# Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Grant No. T2350710802 and No. U2039202), Shenzhen Science and Technology Innovation Commission Project (Grants No. GJHZ20210705141805017 and No. K23405006), and the Center for Computational Science and Engineering at Southern University of Science and Technology. The authors acknowledge T. Laborie for excellent research assistance and Y. Cui and M. von Krosigk for helpful comments. Any errors are our own.