Commit e02aba1 (parent: ab93dff): Updated model card

README.md CHANGED
@@ -19,8 +19,6 @@ It is trained on the following three financial communication corpus. The total c
 - Corporate Reports 10-K & 10-Q: 2.5B tokens
 - Earnings Call Transcripts: 1.3B tokens
 - Analyst Reports: 1.1B tokens
-- Demo.org Proprietary Reports
-- Additional purchased data from Factset
 
 The entire training is done using an **NVIDIA DGX-1** machine. The server has 4 Tesla P100 GPUs, providing a total of 128 GB of GPU memory. This machine enables us to train the BERT models using a batch size of 128. We utilize the Horovod framework for multi-GPU training. Overall, the total time taken to perform pretraining for one model is approximately **2 days**.
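The card's note on Horovod multi-GPU training can be made concrete with a small sketch. Under Horovod-style data parallelism each worker (GPU) processes an equal shard of every batch; assuming, as the card does not specify, that the reported batch size of 128 is the global batch, the per-GPU share works out as follows:

```python
# Hedged sketch of data-parallel batch sharding, Horovod-style.
# Assumption (not stated in the card): the batch size of 128 is the
# global batch, split evenly across the DGX-1's 4 Tesla P100 GPUs.
num_gpus = 4          # GPUs reported in the card
global_batch = 128    # batch size reported in the card

per_gpu_batch, remainder = divmod(global_batch, num_gpus)
assert remainder == 0, "global batch must divide evenly across GPUs"
print(per_gpu_batch)  # -> 32 sequences per GPU per step
```

If instead 128 were the per-GPU batch, the effective global batch would be 128 × 4 = 512; the card does not say which reading is intended.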