jed351/gpt2-tiny-zh-hk

Feature Extraction


jed351 committed on Jan 27, 2023

Commit f6a57de · 1 parent: 351b0c8

Update README.md

Files changed (1)

README.md +6 -5

README.md CHANGED

@@ -7,16 +7,17 @@ It is simply a base model in which the embeddings and tokenizer were patched wit
-I used this repo to identify missing Cantonese characters
-https://github.com/ayaka14732/bert-tokenizer-cantonese
-My forked and modified version: https://github.com/jedcheng/bert-tokenizer-cantonese
-After identifying the missing characters, the huggingface library provides very high level API to modify the tokenizer and embeddings.
+I used this [repo](https://github.com/ayaka14732/bert-tokenizer-cantonese) to identify missing Cantonese characters
+[My forked and modified version](https://github.com/jedcheng/bert-tokenizer-cantonese)
+After identifying the missing characters, the Huggingface library provides very high level API to modify the tokenizer and embeddings.
 ```
-Download your model from the Huggingface library
+Download a tokenizer and a model from the Huggingface library. Then:
 tokenizer.add_tokens("your new tokens")
 model.resize_token_embeddings(len(tokenizer))
+tokenizer.push_to_hub("your model name")
 ```
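The README describes the workflow only at a high level: find characters the base vocabulary lacks, register them with `tokenizer.add_tokens`, then grow the embedding matrix with `model.resize_token_embeddings(len(tokenizer))`. The sketch below mimics those steps on a toy character vocabulary in plain Python, so it runs without downloading a model. It is an illustration, not the transformers implementation: `find_missing_chars`, `add_tokens` and `resize_embeddings` are hypothetical helpers, the vocabulary and embedding values are made up, and initializing new rows to the mean of the existing rows is an assumption (the library's own initialization may differ).

```python
from typing import Dict, Iterable, List


def find_missing_chars(corpus: Iterable[str], vocab: Dict[str, int]) -> List[str]:
    """Hypothetical helper: characters seen in the corpus but absent from the
    vocabulary, in first-seen order. Checks single characters, which suits a
    character-level Chinese/Cantonese vocabulary."""
    missing: List[str] = []
    seen = set()
    for text in corpus:
        for ch in text:
            if ch.isspace() or ch in vocab or ch in seen:
                continue
            seen.add(ch)
            missing.append(ch)
    return missing


def add_tokens(vocab: Dict[str, int], new_tokens: List[str]) -> int:
    """Hypothetical stand-in for tokenizer.add_tokens: give each unseen token the
    next free id (assumes ids are 0..len(vocab)-1) and return how many were added."""
    added = 0
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
            added += 1
    return added


def resize_embeddings(table: List[List[float]], new_size: int) -> List[List[float]]:
    """Hypothetical stand-in for model.resize_token_embeddings: keep the existing
    rows and append rows until there are new_size of them. New rows start as the
    mean of the old rows (an assumed initialization, chosen for illustration)."""
    if new_size <= len(table):
        return [list(row) for row in table[:new_size]]
    dim = len(table[0])
    mean = [sum(row[i] for row in table) / len(table) for i in range(dim)]
    extra = [list(mean) for _ in range(new_size - len(table))]
    return [list(row) for row in table] + extra


# Toy data: a 4-token vocabulary with 2-dimensional embeddings.
vocab = {"[PAD]": 0, "[UNK]": 1, "我": 2, "你": 3}
table = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [1.0, 3.0]]
corpus = ["我哋", "你喺度"]

missing = find_missing_chars(corpus, vocab)       # ['哋', '喺', '度']
added = add_tokens(vocab, missing)                # 3
new_table = resize_embeddings(table, len(vocab))  # mirrors len(tokenizer)
print(missing, added, len(new_table))             # ['哋', '喺', '度'] 3 7
```

Existing rows are copied unchanged, so tokens the base model already knew keep their vectors; only the appended rows are new, and they would still need fine-tuning before they carry useful meaning.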