habiakl committed on
Commit · 3cbe94e
1 Parent(s): eaf800d
Add model weights
- README.md +32 -0
- config.json +3 -0
- pytorch_model.bin +3 -0
- special_tokens_map.json +3 -0
- tokenizer.json +3 -0
- tokenizer_config.json +3 -0
- training_args.bin +3 -0
- vocab.txt +0 -0
README.md
ADDED
@@ -0,0 +1,32 @@
# Financial Relation Extraction

## Process

The task is to detect whether a relationship exists between financial terms in a sentence and, if one does, to qualify its type. Example use cases:

* An A-B trust is a joint trust created by a married couple for the purpose of minimizing estate taxes. (<em>Relationship **exists**, type: **is**</em>)
* There are no withdrawal penalties. (<em>Relationship **does not exist**, type: **x**</em>)
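
The two-part answer in the examples above (does a relationship exist, and of which type) can be read off a single predicted label. The following is a minimal sketch under that assumption; `RelationResult` and `interpret_label` are hypothetical names used for illustration and are not part of the released model.

```python
from typing import NamedTuple

# Label set as listed in this README; "x" marks "no relationship".
NO_RELATION = "x"
RELATION_LABELS = ("x", "has", "is in", "is", "are")


class RelationResult(NamedTuple):
    exists: bool   # step 1: is a relationship present at all?
    relation: str  # step 2: which type ("x" when none is present)


def interpret_label(label: str) -> RelationResult:
    """Turn one predicted label into the two-part answer (hypothetical helper)."""
    if label not in RELATION_LABELS:
        raise ValueError(f"unknown relation label: {label!r}")
    return RelationResult(exists=(label != NO_RELATION), relation=label)


# The two example sentences above carry the labels "is" and "x".
print(interpret_label("is"))
print(interpret_label("x"))
```

Treating "no relationship" as one more class keeps the task a single five-way classification rather than two chained models.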

## Data

The data consists of definitions of financial indicators collected from several sources (Wikimedia, IFRS, Investopedia). Each definition was split into sentences, and the term relationships in each sentence were extracted with the [Stanford Open Information Extraction](https://nlp.stanford.edu/software/openie.html) module.
A typical row in the dataset consists of a definition sentence and its corresponding relationship label.
The labels were restricted to the five most frequently identified relationships: **x** (no relationship), **has**, **is in**, **is**, and **are**.
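
As an illustration of restricting a dataset to its most frequent labels, here is a minimal sketch. It assumes rows are `(sentence, label)` pairs and that rows with rarer labels are simply dropped; the README does not say how such rows were actually handled. The function name `keep_most_common_labels`, the label "was founded by", and the toy sentences other than the two README examples are made up for illustration.

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[str, str]  # (definition sentence, relation label)


def keep_most_common_labels(rows: List[Row], k: int = 5) -> List[Row]:
    """Keep only rows whose label is among the k most frequent labels."""
    counts = Counter(label for _, label in rows)
    kept = {label for label, _ in counts.most_common(k)}
    return [row for row in rows if row[1] in kept]


# Toy data: the first two sentences come from this README, the rest are invented.
toy_rows = [
    ("An A-B trust is a joint trust created by a married couple "
     "for the purpose of minimizing estate taxes.", "is"),
    ("There are no withdrawal penalties.", "x"),
    ("A made-up sentence with a rarely extracted relation.", "was founded by"),
    ("A second made-up sentence with the same relation as the first.", "is"),
    ("A second made-up sentence without any relation.", "x"),
]

filtered = keep_most_common_labels(toy_rows, k=2)
print(len(filtered), sorted({label for _, label in filtered}))
```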

## Model

The model used is a standard DistilBERT base transformer model from the Hugging Face library. See [Hugging Face DistilBERT base model](https://huggingface.co/distilbert-base-uncased) for more details about the model.
In addition, the model has been pretrained to initialize weights that would otherwise be unused when loading an existing stock pretrained model.

## Metrics

The evaluation metrics used are: Precision, Recall and F1-score. The following is the classification report on the test set.

| relation | precision | recall | f1-score | support |
| ------------- |:-------------:|:-------------:|:-------------:| -----:|
| has | 0.7416 | 0.9674 | 0.8396 | 2362 |
| is in | 0.7813 | 0.7925 | 0.7869 | 2362 |
| is | 0.8650 | 0.6863 | 0.7653 | 2362 |
| are | 0.8365 | 0.8493 | 0.8429 | 2362 |
| x | 0.9515 | 0.8302 | 0.8867 | 2362 |
| | | | | |
| macro avg | 0.8352 | 0.8251 | 0.8243 | 11810 |
| weighted avg | 0.8352 | 0.8251 | 0.8243 | 11810 |
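
To make the relationship between the columns concrete, the sketch below recomputes each F1-score as the harmonic mean of precision and recall and then takes the unweighted (macro) average over the five relations. The numbers are copied from the table above; the helper names are illustrative only. Because the table's precision and recall are already rounded to four decimals, the recomputed values can differ from the reported ones in the last digit.

```python
# (precision, recall, reported F1) per relation, copied from the table above.
REPORT = {
    "has":   (0.7416, 0.9674, 0.8396),
    "is in": (0.7813, 0.7925, 0.7869),
    "is":    (0.8650, 0.6863, 0.7653),
    "are":   (0.8365, 0.8493, 0.8429),
    "x":     (0.9515, 0.8302, 0.8867),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values) -> float:
    """Unweighted mean over classes."""
    values = list(values)
    return sum(values) / len(values)


recomputed_f1 = {label: f1_score(p, r) for label, (p, r, _) in REPORT.items()}
macro_precision = macro_average(p for p, _, _ in REPORT.values())
macro_recall = macro_average(r for _, r, _ in REPORT.values())
macro_f1 = macro_average(recomputed_f1.values())

for label, (_, _, reported) in REPORT.items():
    print(f"{label:>5}: recomputed {recomputed_f1[label]:.4f}, reported {reported:.4f}")
print(f"macro avg: {macro_precision:.4f} {macro_recall:.4f} {macro_f1:.4f}")
```

Every relation has the same support (2362 test sentences, 11810 in total), so weighting by support changes nothing. That is why the macro and weighted averages in the table are identical.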

config.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:07f7615afabda7ff754ea77e3a06a2d218132bfcc3aa42e22f22ac1585bd7718
size 774
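
The three added lines are a Git LFS pointer rather than the file's real contents: they record the pointer spec version, the content hash, and the size in bytes of the object stored in LFS. The other binary and JSON files in this commit follow the same pattern. A minimal sketch of reading such a pointer, assuming the simple `key value` layout shown here (`parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dictionary."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer text copied from the config.json entry in this commit.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:07f7615afabda7ff754ea77e3a06a2d218132bfcc3aa42e22f22ac1585bd7718
size 774
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algorithm"], pointer["size_bytes"])
```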

pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ff72da51e34eb2d892c303c6f5f7beed57b13965a4c9a0e1379fb95654da8d30
size 267872407

special_tokens_map.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:303df45a03609e4ead04bc3dc1536d0ab19b5358db685b6f3da123d05ec200e3
size 112

tokenizer.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ecafc709dc78a0d00e3bc20477606e97ccfd239fdfddd7e53fcb24300ba0bc13
size 466247

tokenizer_config.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:87fab29eb94840215d6b277994841550362ceff337f16bbf44e9af30fd2fb62d
size 291

training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e2ab6d3f261a834531ac404acb765265a52a8016338c732b78daa1f299bf6002
size 2415

vocab.txt
ADDED
The diff for this file is too large to render.