model = AutoModel.from_pretrained("nlpaueb/sec-bert-base")
```
## Pre-process Text
In order to use SEC-BERT-NUM, you have to pre-process texts, replacing every numerical token with the corresponding shape pseudo-token from a list of 214 predefined shape pseudo-tokens. If a numerical token does not correspond to any shape pseudo-token, we replace it with the `[NUM]` pseudo-token.
Below is an example of how you can pre-process a simple sentence. The approach is quite simple; feel free to modify it as you see fit.
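As a rough illustration of the idea, here is a minimal, self-contained sketch. It assumes shape pseudo-tokens are formed by replacing each digit of a numeric token with `X` (e.g. `53.2` → `[XX.X]`), uses a small hypothetical subset of the 214 predefined shape pseudo-tokens (in practice you would check against the model tokenizer's vocabulary), and uses naive whitespace tokenization rather than a proper tokenizer:

```python
import re

# Hypothetical subset of the 214 predefined shape pseudo-tokens;
# the full list lives in the model tokenizer's vocabulary.
SHAPE_TOKENS = {"[X]", "[XX]", "[XXX]", "[X.X]", "[XX.X]", "[X,XXX]"}

def token_shape(token):
    """Map a numeric token to its shape pseudo-token, e.g. '53.2' -> '[XX.X]'."""
    return "[" + re.sub(r"\d", "X", token) + "]"

def preprocess(sentence):
    out = []
    for token in sentence.split():
        # Treat a token as numeric if it consists of digits, dots, and commas
        # and contains at least one digit.
        if re.fullmatch(r"[\d.,]+", token) and any(c.isdigit() for c in token):
            shape = token_shape(token)
            # Fall back to [NUM] when the shape is not a predefined pseudo-token.
            out.append(shape if shape in SHAPE_TOKENS else "[NUM]")
        else:
            out.append(token)
    return " ".join(out)

print(preprocess("Total revenue was 9,215 million in 2021 up 53.2 %"))
# -> Total revenue was [X,XXX] million in [NUM] up [XX.X] %
```

Here `2021` becomes `[NUM]` only because the toy set above happens to omit `[XXXX]`; with the full list of 214 pseudo-tokens it would map to its shape token instead.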