ufal/robeczech-base

Milan Straka committed on Jan 5, 2024

Commit 41cfe08 · 1 Parent(s): 354716a

Better formulation.

README.md +4 -2

@@ -14,7 +14,9 @@ tags:
 ## Version History
 - **version 1.1**: Version 1.1 was released in Jan 2024, with a change to the
-  tokenizer; the weights are unmodified.
   The tokenizer in the initial release (a) contained a hole (51959 did not
   correspond to any token), and (b) mapped several tokens (unseen during training
@@ -25,7 +27,7 @@ tags:
   In version 1.1, the tokenizer was modified by (a) removing the hole, (b)
   mapping all tokens to a unique ID. That also required increasing the
-  vocabulary sizes and embeddings weights (by replicating the embedding of the
   `[UNK]` token). Without finetuning, version 1.1 and version 1.0 gives exactly
   the same results on any input, and the tokens in version 1.0 that mapped to
   a different ID than the `[UNK]` token map to the same ID in version 1.1.

 ## Version History
 - **version 1.1**: Version 1.1 was released in Jan 2024, with a change to the
+  tokenizer; the model parameters were mostly kept the same, but the embeddings
+  were enlarged (by copying suitable rows) to correspond to the updated
+  tokenizer.
   The tokenizer in the initial release (a) contained a hole (51959 did not
   correspond to any token), and (b) mapped several tokens (unseen during training
   In version 1.1, the tokenizer was modified by (a) removing the hole, (b)
   mapping all tokens to a unique ID. That also required increasing the
+  vocabulary size and embeddings weights (by replicating the embedding of the
   `[UNK]` token). Without finetuning, version 1.1 and version 1.0 gives exactly
   the same results on any input, and the tokens in version 1.0 that mapped to
   a different ID than the `[UNK]` token map to the same ID in version 1.1.
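
The embedding enlargement described in the README text can be illustrated with a small sketch. This is a toy example in plain Python, not the script used to produce version 1.1: the function name `enlarge_embeddings`, the 2-dimensional vectors, and the `[UNK]` ID of 3 are all hypothetical. It only shows why tokens that previously shared the `[UNK]` ID keep the same vector once they receive unique IDs in appended rows.

```python
def enlarge_embeddings(embeddings, new_vocab_size, unk_id):
    """Return a copy of `embeddings` grown to `new_vocab_size` rows.

    Every added row is a copy of the `[UNK]` row, so a token that used to be
    mapped to `[UNK]` gets the same vector under its new unique ID.
    """
    if new_vocab_size < len(embeddings):
        raise ValueError("new_vocab_size must not be smaller than the current size")
    unk_row = embeddings[unk_id]
    # Copy the existing rows unchanged, then append copies of the [UNK] row.
    enlarged = [list(row) for row in embeddings]
    enlarged.extend(list(unk_row) for _ in range(new_vocab_size - len(embeddings)))
    return enlarged


# Toy data: 4 tokens, 2-dimensional embeddings, [UNK] at hypothetical ID 3.
old = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.9, 0.9]]
new = enlarge_embeddings(old, new_vocab_size=6, unk_id=3)
```

Here the rows at IDs 4 and 5 in `new` equal the `[UNK]` row, and the first four rows are unchanged, which mirrors the README's claim that the two versions give the same results without finetuning.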