model = AutoModel.from_pretrained("nlpaueb/sec-bert-base")
```
## Pre-process Text
In order to use SEC-BERT-NUM, you have to pre-process texts, replacing every numerical token with the corresponding shape pseudo-token from a list of 214 predefined shape pseudo-tokens. If a numerical token does not correspond to any shape pseudo-token, we replace it with the `[NUM]` pseudo-token.
Below is an example of how you can pre-process a simple sentence. The approach is quite simple; feel free to modify it as you see fit.
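As a rough illustration of the idea, here is a minimal, self-contained sketch. It assumes shape pseudo-tokens are formed by replacing each digit of a numeric token with `X` (e.g. `53.2` → `[XX.X]`), uses a small hypothetical subset of the 214 predefined shape pseudo-tokens (in practice you would check against the model tokenizer's vocabulary), and uses naive whitespace tokenization rather than a proper tokenizer:

```python
import re

# Hypothetical subset of the 214 predefined shape pseudo-tokens;
# the full list lives in the model tokenizer's vocabulary.
SHAPE_TOKENS = {"[X]", "[XX]", "[XXX]", "[X.X]", "[XX.X]", "[X,XXX]"}

def token_shape(token):
    """Map a numeric token to its shape pseudo-token, e.g. '53.2' -> '[XX.X]'."""
    return "[" + re.sub(r"\d", "X", token) + "]"

def preprocess(sentence):
    out = []
    for token in sentence.split():
        # Treat a token as numeric if it consists of digits, dots, and commas
        # and contains at least one digit.
        if re.fullmatch(r"[\d.,]+", token) and any(c.isdigit() for c in token):
            shape = token_shape(token)
            # Fall back to [NUM] when the shape is not a predefined pseudo-token.
            out.append(shape if shape in SHAPE_TOKENS else "[NUM]")
        else:
            out.append(token)
    return " ".join(out)

print(preprocess("Total revenue was 9,215 million in 2021 up 53.2 %"))
# -> Total revenue was [X,XXX] million in [NUM] up [XX.X] %
```

Here `2021` becomes `[NUM]` only because the toy set above happens to omit `[XXXX]`; with the full list of 214 pseudo-tokens it would map to its shape token instead.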