Commit a932602
Parent(s): ae37ee5
Update README.md

README.md CHANGED
---
license: apache-2.0
language:
- de
---

# German FinBERT (Further Pre-trained Version)

German FinBERT is a BERT language model focusing on the financial domain within the German language. In my [paper](https://arxiv.org/pdf/2311.08793.pdf), I describe in more detail the steps taken to train the model and show that it outperforms its generic benchmarks for finance-specific downstream tasks.

## Overview
**Author:** Moritz Scherrmann
**Paper:** [here](https://arxiv.org/pdf/2311.08793.pdf)
**Architecture:** BERT base
**Language:** German
**Specialization:** Financial
**Framework:** [MosaicML](https://github.com/mosaicml/examples/tree/main/examples/benchmarks/bert)

## Pre-training
German FinBERT's pre-training corpus includes a diverse range of financial documents, such as Bundesanzeiger reports, Handelsblatt articles, MarketScreener data, and additional sources including FAZ, ad-hoc announcements, LexisNexis & Event Registry content, Zeit Online articles, Wikipedia entries, and Gabler Wirtschaftslexikon. In total, the corpus spans from 1996 to 2023, consisting of 12.15 million documents with 10.12 billion tokens over 53.19 GB.

I further pre-train the model for 10,400 steps with a batch size of 4096, which […] decay of 1e−5 and a maximal learning rate of 1e−4. I train the model using an Nvidia DGX A100 node consisting of 8 A100 GPUs with 80 GB of memory each.

## Performance

### Fine-tuning

### Benchmark Results
The further pre-trained German FinBERT model demonstrated the following performances on finance-specific downstream tasks:

- Macro F1: 86.08%
- Micro F1: 85.65%

Ad-Hoc QuAD (Question Answering):
- Exact Match (EM): 52.50%
- F1 Score: 74.61%

- Accuracy: 95.41%
- Macro F1: 91.49%

## Authors
Moritz Scherrmann: `scherrmann [at] lmu.de`

For additional details regarding the performance on fine-tune datasets and benchmarks […]

See also:
- scherrmann/GermanFinBERT_SC
- scherrmann/
- scherrmann/GermanFinBERT_SC_Sentiment
---
license: apache-2.0
language:
- de
---

# German FinBERT For QuAD (Further Pre-trained Version, Fine-Tuned for Financial Question Answering)

German FinBERT is a BERT language model focusing on the financial domain within the German language. In my [paper](https://arxiv.org/pdf/2311.08793.pdf), I describe in more detail the steps taken to train the model and show that it outperforms its generic benchmarks for finance-specific downstream tasks.

This model is the [further-pretrained version of German FinBERT](https://huggingface.co/scherrmann/GermanFinBert_FP), after fine-tuning on the [German Ad-Hoc QuAD dataset](https://huggingface.co/datasets/scherrmann/adhoc_quad).
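Under the hood, a BERT-style extractive QA model like this one scores every context token as a potential answer start and end, and the predicted answer is the highest-scoring valid span. The decoding step can be sketched as follows; this is a minimal, checkpoint-independent illustration with made-up logits, not the exact post-processing used for the reported scores:

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Pick the (start, end) token indices that maximize the combined
    start+end score over all valid spans (end >= start, bounded length)."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score

# Illustrative logits for a 6-token context window.
start = [0.1, 2.5, 0.3, 0.2, 0.1, 0.0]
end = [0.0, 0.4, 1.9, 3.1, 0.2, 0.1]
span, score = best_span(start, end)
print(span)  # (1, 3): tokens 1..3 form the best-scoring answer span
```

In practice you would not decode by hand: the `transformers` question-answering pipeline performs this span selection (plus tokenization and answer detokenization) when given this model's repository id.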

## Overview
**Author:** Moritz Scherrmann
**Paper:** [here](https://arxiv.org/pdf/2311.08793.pdf)
**Architecture:** BERT base
**Language:** German
**Specialization:** Financial question answering
**Base model:** [German_FinBert_FP](https://huggingface.co/scherrmann/GermanFinBert_FP)

## Pre-training
German FinBERT's pre-training corpus includes a diverse range of financial documents, such as Bundesanzeiger reports, Handelsblatt articles, MarketScreener data, and additional sources including FAZ, ad-hoc announcements, LexisNexis & Event Registry content, Zeit Online articles, Wikipedia entries, and Gabler Wirtschaftslexikon. In total, the corpus spans from 1996 to 2023, consisting of 12.15 million documents with 10.12 billion tokens over 53.19 GB.

I further pre-train the model for 10,400 steps with a batch size of 4096, which […] decay of 1e−5 and a maximal learning rate of 1e−4. I train the model using an Nvidia DGX A100 node consisting of 8 A100 GPUs with 80 GB of memory each.
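For a rough sense of scale, a few back-of-the-envelope figures follow directly from the numbers quoted above (nothing here beyond simple arithmetic on the stated corpus and training statistics):

```python
docs = 12.15e6   # documents in the pre-training corpus
tokens = 10.12e9  # tokens in the pre-training corpus
size_gb = 53.19   # corpus size in GB
steps, batch = 10_400, 4096  # further pre-training schedule

print(round(tokens / docs))              # ~833 tokens per document on average
print(round(size_gb * 1e9 / tokens, 2))  # ~5.26 bytes per token
print(steps * batch)                     # 42598400 sequences seen in further pre-training
```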

## Performance

### Fine-tuning
I fine-tune all models on all downstream tasks using the 1cycle policy of [Smith and Topin (2019)](https://arxiv.org/abs/1708.07120). I use the Adam optimization method of [Kingma and Ba (2014)](https://arxiv.org/abs/1412.6980) with standard parameters. For every model, I run a separate grid search on the respective evaluation set for each task to find the best hyper-parameter setup. I test different values for learning rate, batch size and number of epochs, following the suggestions of [Chalkidis et al. (2020)](https://aclanthology.org/2020.findings-emnlp.261/). After that, I report the results for all models on the respective test set, using the tuned hyper-parameters.
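The grid search described above can be sketched as below. The grid values and the `evaluate` function are placeholders for illustration, not the grids or metric actually used in the paper; in a real run, `evaluate` would fine-tune the model with that configuration and return the evaluation-set score:

```python
from itertools import product

# Hypothetical grids in the spirit of Chalkidis et al. (2020); not the paper's values.
learning_rates = [2e-5, 3e-5, 5e-5]
batch_sizes = [16, 32]
epochs = [2, 3, 4]

def evaluate(lr, bs, ep):
    """Placeholder for: fine-tune with (lr, bs, ep), score on the evaluation set.
    This dummy metric just peaks at one configuration."""
    return -abs(lr - 3e-5) * 1e4 - abs(bs - 32) / 100 - abs(ep - 3)

# Exhaustively try every combination; keep the best evaluation-set score.
best_cfg = max(product(learning_rates, batch_sizes, epochs),
               key=lambda cfg: evaluate(*cfg))
print(best_cfg)  # the (lr, batch size, epochs) triple that maximizes the dummy metric
```

The selected configuration is then used for one final fine-tuning run, and only the test-set score of that run is reported.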

### Results

Ad-Hoc QuAD (Question Answering):
- Exact Match (EM): 52.50%
- F1 Score: 74.61%
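These are the standard SQuAD-style extractive-QA metrics: exact match compares the normalized predicted and gold answer strings as wholes, while F1 measures token overlap between them. A minimal sketch (whitespace tokenization only; the official evaluation additionally strips punctuation and articles):

```python
from collections import Counter

def exact_match(pred: str, gold: str) -> bool:
    """Strict string equality after light normalization."""
    return pred.strip().lower() == gold.strip().lower()

def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    p, g = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("im Jahr 2023", "Im Jahr 2023"))                    # True
print(round(token_f1("der Umsatz stieg stark", "Umsatz stieg"), 2))   # 0.67
```

Per-question scores are averaged over the test set, which is how the 52.50% EM and 74.61% F1 figures above are aggregated.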

## Authors
Moritz Scherrmann: `scherrmann [at] lmu.de`

For additional details regarding the performance on fine-tune datasets and benchmarks […]

See also:
- scherrmann/GermanFinBERT_SC
- scherrmann/GermanFinBERT_FP
- scherrmann/GermanFinBERT_SC_Sentiment