J-Raposo committed · Commit ffec9f2 · verified · 1 Parent(s): 9c6c107

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -38,7 +38,7 @@ This tokenizer is a byte-level BPE tokenizer (GPT-2 style) retrained on the Code
 
 ## Tokenizer details / configuration
 - **Tokenizer type:** Byte-level BPE (GPT-2–style / `tokenizers` fast API).
-- **Vocabulary size:** 50,257 (GPT-2 default) — **replace with the actual vocab size if different**.
+- **Vocabulary size:** 50,257 (GPT-2 default)
 - **Special tokens:** standard GPT-2 tokens (e.g., ``) or custom tokens if you added any. Ensure `tokenizer_config.json` in the repo lists them.
 - **Normalization:** Byte-level normalization (works with arbitrary byte sequences / UTF-8).
 - **Files included:** `tokenizer.json` (preferred `tokenizers` fast format) or `vocab.json` + `merges.txt` (legacy), and `tokenizer_config.json`.
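The "byte-level normalization" bullet in the README refers to GPT-2's trick of mapping each of the 256 byte values to a printable Unicode character, so BPE merges operate on visible "characters" while still representing arbitrary UTF-8 input losslessly. A minimal stdlib-only sketch of that mapping (the helper names `to_byte_chars` / `from_byte_chars` are illustrative, not part of any library API):

```python
def bytes_to_unicode():
    # Printable byte values keep their own codepoint; the remaining bytes
    # (control chars, space, etc.) are shifted up past 255 so every byte
    # maps to a distinct, visible character.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, [chr(c) for c in cs]))

BYTE_ENCODER = bytes_to_unicode()
BYTE_DECODER = {v: k for k, v in BYTE_ENCODER.items()}

def to_byte_chars(text: str) -> str:
    """Represent text as the printable alias of each of its UTF-8 bytes."""
    return "".join(BYTE_ENCODER[b] for b in text.encode("utf-8"))

def from_byte_chars(s: str) -> str:
    """Invert the aliasing and decode the recovered bytes as UTF-8."""
    return bytes(BYTE_DECODER[c] for c in s).decode("utf-8")
```

This is why a byte-level tokenizer needs no normalizer that can fail on odd input: any byte sequence round-trips. For example, a leading space becomes the visible character `Ġ`, which is why GPT-2-style vocabularies are full of `Ġ`-prefixed tokens.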