Question Answering
Transformers
PyTorch
German
bert
scherrmann committed · Commit ff1be4a · 1 Parent(s): d3c27dc

Update README.md

Files changed (1): README.md (+1, -6)
README.md CHANGED
@@ -17,14 +17,9 @@ This model is the [further-pretrained version of German FinBERT](https://hugging
 **Specialization:** Financial question answering
 **Base model:** [German_FinBert_FP](https://huggingface.co/scherrmann/GermanFinBert_FP)
 
-## Pre-training
-German FinBERT's pre-training corpus includes a diverse range of financial documents, such as Bundesanzeiger reports, Handelsblatt articles, MarketScreener data, and additional sources including FAZ, ad-hoc announcements, LexisNexis & Event Registry content, Zeit Online articles, Wikipedia entries, and Gabler Wirtschaftslexikon. In total, the corpus spans from 1996 to 2023, consisting of 12.15 million documents with 10.12 billion tokens over 53.19 GB.
 
-I further pre-train the model for 10,400 steps with a batch size of 4096, which is one epoch. I use an Adam optimizer with decoupled weight decay regularization, with Adam parameters β1 = 0.9, β2 = 0.98 and ε = 1e-6, a weight
-decay of 1e-5 and a maximal learning rate of 1e-4. I train the model using an Nvidia DGX A100 node consisting of 8 A100 GPUs with 80 GB of memory each.
-
-## Performance
 ### Fine-tuning
+
 I fine-tune all models on all downstream tasks using the 1cycle policy of [Smith and Topin (2019)](https://arxiv.org/abs/1708.07120). I use the Adam optimization method of [Kingma and Ba (2014)](https://arxiv.org/abs/1412.6980) with
 standard parameters. For every model, I run a separate grid search on the respective evaluation set for each task to find the best hyper-parameter setup. I test different
 values for learning rate, batch size and number of epochs, following the suggestions of [Chalkidis et al. (2020)](https://aclanthology.org/2020.findings-emnlp.261/). After that, I report the results for all models on the respective
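The pre-training setup removed in this commit describes Adam with decoupled weight decay regularization. A minimal PyTorch sketch of how those stated hyper-parameters map onto `torch.optim.AdamW`; the `Linear` module is a hypothetical stand-in, since German FinBERT itself is not loaded here:

```python
import torch

# Hypothetical stand-in module; the actual model is German FinBERT.
model = torch.nn.Linear(768, 768)

# Adam with decoupled weight decay (AdamW), using the hyper-parameters
# stated in the card: betas (0.9, 0.98), eps 1e-6, weight decay 1e-5,
# and a maximal learning rate of 1e-4.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,
    betas=(0.9, 0.98),
    eps=1e-6,
    weight_decay=1e-5,
)
```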
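The grid search over learning rate, batch size and number of epochs described for fine-tuning can be sketched in plain Python. The candidate grids and the `evaluate` scoring function below are illustrative assumptions, not values from the card:

```python
from itertools import product

# Illustrative candidate grids; the actual search spaces are not given here.
learning_rates = [1e-5, 3e-5, 5e-5]
batch_sizes = [16, 32]
epoch_counts = [2, 3, 4]

def evaluate(lr, bs, ep):
    # Hypothetical stand-in for fine-tuning a model with this configuration
    # and scoring it on the task's evaluation set (higher is better).
    return -abs(lr - 3e-5) - abs(bs - 32) / 1000 - abs(ep - 3) / 100

# Pick the configuration with the best evaluation-set score.
best = max(product(learning_rates, batch_sizes, epoch_counts),
           key=lambda cfg: evaluate(*cfg))
```

In the described setup, each configuration would additionally be trained under the 1cycle learning-rate schedule (e.g. `torch.optim.lr_scheduler.OneCycleLR`) before scoring, and the winning configuration's model would be evaluated on the held-out test set.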