Question Answering · Transformers · PyTorch · German · bert
scherrmann committed · Commit d280b81 · 1 Parent(s): ff1be4a

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -20,10 +20,10 @@ This model is the [further-pretrained version of German FinBERT](https://hugging
 
 ### Fine-tuning
 
-I fine-tune all models on all downstream tasks using the 1cycle policy of [Smith and Topin (2019)](https://arxiv.org/abs/1708.07120). I use the Adam optimization method of [Kingma and Ba (2014)](https://arxiv.org/abs/1412.6980) with
-standard parameters. For every model, I run a separate grid search on the respective evaluation set for each task to find the best hyper-parameter setup. I test different
-values for learning rate, batch size and number of epochs, following the suggestions of [Chalkidis et al. (2020)](https://aclanthology.org/2020.findings-emnlp.261/). After that, I report the results for all models on the respective
-test set, using the tuned hyper-parameters.
+I fine-tune the model using the 1cycle policy of [Smith and Topin (2019)](https://arxiv.org/abs/1708.07120). I use the Adam optimization method of [Kingma and Ba (2014)](https://arxiv.org/abs/1412.6980) with
+standard parameters. I run a grid search on the evaluation set to find the best hyper-parameter setup. I test different
+values for learning rate, batch size and number of epochs, following the suggestions of [Chalkidis et al. (2020)](https://aclanthology.org/2020.findings-emnlp.261/). I repeat the fine-tuning for each setup five times with different seeds, to avoid getting good results by chance.
+After finding the best model w.r.t. the evaluation set, I report the mean result across seeds for that model on the test set.
 
 ### Results
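For illustration, here is a minimal PyTorch sketch of the recipe the updated paragraph describes: Adam with standard parameters, a 1cycle learning-rate schedule, a grid search over learning rate, batch size and number of epochs, and five seeds per setup with the mean evaluation score used for model selection. The checkpoint name, the grid values, and the `run_one`/`grid_search` helpers are assumptions made for the sketch, not the author's actual training code.

```python
# Hypothetical sketch of the fine-tuning procedure described above.
# Checkpoint, grid values, helper names, and data loaders are assumptions.
from itertools import product
from statistics import mean

from torch.optim import Adam
from torch.optim.lr_scheduler import OneCycleLR
from transformers import AutoModelForQuestionAnswering, set_seed

MODEL_NAME = "bert-base-german-cased"  # placeholder, not the actual checkpoint
GRID = {"lr": [2e-5, 3e-5, 5e-5], "batch_size": [16, 32], "epochs": [2, 3, 4]}
SEEDS = [0, 1, 2, 3, 4]  # five repetitions per setup, as the README states


def run_one(lr, epochs, seed, train_loader, eval_fn, test_fn):
    """Fine-tune once with a fixed seed; return (eval score, test score).

    Batches are assumed to hold tokenized inputs plus start/end positions,
    so the model returns a loss directly.
    """
    set_seed(seed)
    model = AutoModelForQuestionAnswering.from_pretrained(MODEL_NAME)
    optimizer = Adam(model.parameters(), lr=lr)  # standard Adam parameters
    scheduler = OneCycleLR(  # 1cycle policy of Smith and Topin (2019)
        optimizer, max_lr=lr, total_steps=epochs * len(train_loader)
    )
    model.train()
    for _ in range(epochs):
        for batch in train_loader:
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()
    return eval_fn(model), test_fn(model)


def grid_search(train_loaders, eval_fn, test_fn):
    """Average eval and test scores over seeds for each setup, select the
    setup with the best mean eval score, and report its mean test score
    (the test set is never used for selection)."""
    best = None
    for lr, bs, ep in product(GRID["lr"], GRID["batch_size"], GRID["epochs"]):
        runs = [run_one(lr, ep, s, train_loaders[bs], eval_fn, test_fn)
                for s in SEEDS]
        mean_eval = mean(r[0] for r in runs)
        mean_test = mean(r[1] for r in runs)
        if best is None or mean_eval > best[0]:
            best = (mean_eval, mean_test, (lr, bs, ep))
    return best  # (best eval mean, reported test mean, chosen setup)
```

Averaging over seeds before selecting, rather than picking the single best run, is what guards against the "good results by chance" the commit message alludes to.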