mrcha033 committed
Commit 072c33a · verified · 1 Parent(s): 6d6ef59

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +3 -17
README.md CHANGED
@@ -26,13 +26,13 @@ A Korean language tokenizer with 96,000 vocabulary size, optimized for Korean te
 
 ## Usage
 
-### With Transformers Library
+### From Hugging Face Hub
 
 ```python
 from transformers import PreTrainedTokenizerFast
 
-# Load the tokenizer
-tokenizer = PreTrainedTokenizerFast.from_pretrained("./tokenizer_hf-96k")
+# Load the tokenizer from Hugging Face Hub
+tokenizer = PreTrainedTokenizerFast.from_pretrained("mrcha033/YunMin-tokenizer-96k")
 
 # Tokenize Korean text
 text = "안녕하세요, 한국어 토크나이저입니다."
@@ -47,20 +47,6 @@ decoded_text = tokenizer.decode(token_ids)
 print(f"Decoded: {decoded_text}")
 ```
 
-### With Tokenizers Library
-
-```python
-from tokenizers import Tokenizer
-
-# Load tokenizer
-tokenizer = Tokenizer.from_file("./tokenizer_hf-96k/tokenizer.json")
-
-# Encode text
-encoding = tokenizer.encode("안녕하세요, 한국어 토크나이저입니다.")
-print(f"Tokens: {encoding.tokens}")
-print(f"IDs: {encoding.ids}")
-```
-
 ## Special Tokens
 
 - `<unk>` - Unknown token
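The net effect of the diff is that the README now loads the tokenizer from the Hub repo instead of a local folder. A minimal sketch to sanity-check the updated snippet, assuming the `mrcha033/YunMin-tokenizer-96k` repo is publicly readable and `transformers` is installed:

```python
from transformers import PreTrainedTokenizerFast

# Load directly from the Hub, as the updated README instructs
# (repo ID taken from the diff above; assumes the repo is public).
tokenizer = PreTrainedTokenizerFast.from_pretrained("mrcha033/YunMin-tokenizer-96k")

# Round trip: "Hello, this is a Korean tokenizer."
text = "안녕하세요, 한국어 토크나이저입니다."
token_ids = tokenizer.encode(text)
print(f"IDs: {token_ids}")
print(f"Decoded: {tokenizer.decode(token_ids)}")  # should reproduce `text`
```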
 
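For anyone who relied on the removed Tokenizers-library section, the `tokenizers` package can also load from the Hub rather than a local `tokenizer.json`. A hedged sketch of that alternative, assuming the file sits at the repo root as usual:

```python
from tokenizers import Tokenizer

# Hub-based counterpart of the removed from_file() snippet
# (assumes tokenizer.json lives at the root of the public repo).
tokenizer = Tokenizer.from_pretrained("mrcha033/YunMin-tokenizer-96k")

encoding = tokenizer.encode("안녕하세요, 한국어 토크나이저입니다.")
print(f"Tokens: {encoding.tokens}")
print(f"IDs: {encoding.ids}")
```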