dragosnicolae555 committed on
Commit
4b41416
·
1 Parent(s): d7dd8d6

Update README.md

Files changed (1)
  1. README.md +12 -11
README.md CHANGED
@@ -31,17 +31,6 @@ model = AutoModel.from_pretrained("dragosnicolae555/ALR_BERT")
31
 
32
  #Here add your magic
33
 
34
- ```
35
-
36
- As with other Romanian models, remember to always sanitize your text! Replace the cedilla letters ``ş`` and ``ţ`` with their comma-below equivalents:
37
-
38
- ```
39
-
40
- text = text.replace("ţ", "ț").replace("ş", "ș").replace("Ţ", "Ț").replace("Ş", "Ș")
41
-
42
- ```
43
-
44
- because the model was **NOT** trained on the cedilla forms of ``s`` and ``t``. If you don't, performance will drop due to ``<UNK>`` tokens and a higher number of tokens per word.
45
 
46
  ### Evaluation
47
 
@@ -66,3 +55,15 @@ The model is trained on the following corpora (stats in the table below are afte
66
  | OSCAR | 33.56 | 1725.82 | 11.411 | 11 |
67
  | Wikipedia | 1.54 | 60.47 | 0.411 | 0.4 |
68
  | **Total** | **90.15** | **2421.33** | **15.867** | **15.2** |
31
 
32
  #Here add your magic
33
 
34
 
35
  ### Evaluation
36
 
 
55
  | OSCAR | 33.56 | 1725.82 | 11.411 | 11 |
56
  | Wikipedia | 1.54 | 60.47 | 0.411 | 0.4 |
57
  | **Total** | **90.15** | **2421.33** | **15.867** | **15.2** |
58
+
59
+
60
+
61
+
62
+
63
+ ```
64
+
65
+ Remember to always sanitize your text! Replace the cedilla letters ``ş`` and ``ţ`` with their comma-below equivalents:
66
+ ```
67
+ text = text.replace("ţ", "ț").replace("ş", "ș").replace("Ţ", "Ț").replace("Ş", "Ș")
68
+ ```
69
+ because the model was **NOT** trained on the cedilla forms of ``s`` and ``t``. If you don't, performance will drop due to ``<UNK>`` tokens and a higher number of tokens per word.
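
The chained `replace` calls in the README can also be written as a single translation table, which makes the four cedilla-to-comma mappings explicit and handles them in one pass. This is only a sketch of an equivalent approach, not something the README itself prescribes; the names `CEDILLA_TO_COMMA` and `sanitize` are hypothetical and chosen here for illustration.

```python
# Map each cedilla code point to its comma-below equivalent.
# CEDILLA_TO_COMMA and sanitize are illustrative names, not part of the model's API.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # ş (s with cedilla) -> ș (s with comma below)
    "\u0163": "\u021b",  # ţ (t with cedilla) -> ț (t with comma below)
    "\u015e": "\u0218",  # Ş -> Ș
    "\u0162": "\u021a",  # Ţ -> Ț
})


def sanitize(text: str) -> str:
    """Return text with cedilla s/t replaced by comma-below s/t."""
    return text.translate(CEDILLA_TO_COMMA)
```

Calling `sanitize(text)` on each input before it reaches the tokenizer has the same effect as the `replace` chain above; text that already uses comma-below letters passes through unchanged.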