Update README.md
README.md
CHANGED
@@ -11,7 +11,7 @@ widget:
 - text: "Total net sales decreased [X]% or $[MASK] billion during [XXXX] compared to [XXXX]."
 - text: "Total net sales decreased [X]% or $[X.X] billion during [MASK] compared to [XXXX]."
 - text: "During [MASK], the Company repurchased $[XX.X] billion of its common stock and paid dividend equivalents of $[XX.X] billion."
-- text: "During 2019, the Company repurchased $[MASK] billion of its common stock and paid
+- text: "During 2019, the Company repurchased $[MASK] billion of its common stock and paid dividend equivalents of $[XX.X] billion."
 ---
 
 # SEC-BERT
@@ -48,7 +48,7 @@ model = AutoModel.from_pretrained("nlpaueb/sec-bert-base")
 
 ## Pre-process Text
 
-To use SEC-BERT-SHAPE, you have to pre-process texts replacing every numerical token with the corresponding shape pseudo-token from a list of 214 predefined shape pseudo-tokens. If the numerical token does not correspond to any shape pseudo token we replace it with the [NUM] pseudo-token.
+To use SEC-BERT-SHAPE, you have to pre-process texts replacing every numerical token with the corresponding shape pseudo-token, from a list of 214 predefined shape pseudo-tokens. If the numerical token does not correspond to any shape pseudo token we replace it with the [NUM] pseudo-token.
 Below there is an example of how you can pre-process a simple sentence. This approach is quite simple; feel free to modify it as you see fit.
 
 ```python