Commit 4b41416
Parent(s): d7dd8d6

Update README.md

README.md CHANGED
```diff
@@ -31,17 +31,6 @@ model = AutoModel.from_pretrained("dragosnicolae555/ALR_BERT")
 
 #Here add your magic
 
-```
-
-As in the case of other romanian models Remember to always sanitize your text! Replace ``s`` and ``t`` cedilla-letters to comma-letters with :
-
-```
-
-text = text.replace("ţ", "ț").replace("ş", "ș").replace("Ţ", "Ț").replace("Ş", "Ș")
-
-```
-
-because the model was **NOT** trained on cedilla ``s`` and ``t``s. If you don't, you will have decreased performance due to <UNK>s and increased number of tokens per word.
 
 ### Evaluation
 
@@ -66,3 +55,15 @@ The model is trained on the following corpora (stats in the table below are afte
 | OSCAR | 33.56 | 1725.82 | 11.411 | 11 |
 | Wikipedia | 1.54 | 60.47 | 0.411 | 0.4 |
 | **Total** | **90.15** | **2421.33** | **15.867** | **15.2** |
+
+
+
+
+
+```
+
+Remember to always sanitize your text! Replace ``s`` and ``t`` cedilla-letters to comma-letters with :
+```
+text = text.replace("ţ", "ț").replace("ş", "ș").replace("Ţ", "Ț").replace("Ş", "Ș")
+```
+because the model was **NOT** trained on cedilla ``s`` and ``t``s. If you don't, you will have decreased performance due to <UNK>s and increased number of tokens per word.
```