mjbommar committed · Commit c485aa9 · verified · 1 Parent(s): 5712dec

Add tokenizer card

  1. README.md +7 -7
README.md CHANGED
@@ -27,13 +27,13 @@ This tokenizer is designed for binary file classification and analysis tasks.
 
 | Token | ID | Purpose |
 |-------|-----|---------|
-| `<|start|>` | 0 | Beginning of sequence (BOS) |
-| `<|end|>` | 1 | End of sequence (EOS) |
-| `<|pad|>` | 2 | Padding |
-| `<|unk|>` | 3 | Unknown token |
-| `<|cls|>` | 4 | Classification token |
-| `<|sep|>` | 5 | Separator token |
-| `<|mask|>` | 6 | Mask token (for MLM) |
+| `<\|start\|>` | 0 | Beginning of sequence (BOS) |
+| `<\|end\|>` | 1 | End of sequence (EOS) |
+| `<\|pad\|>` | 2 | Padding |
+| `<\|unk\|>` | 3 | Unknown token |
+| `<\|cls\|>` | 4 | Classification token |
+| `<\|sep\|>` | 5 | Separator token |
+| `<\|mask\|>` | 6 | Mask token (for MLM) |
 
 ## Usage
 
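
Note on the change: the commit escapes each `|` inside the token names because, in GitHub Flavored Markdown tables, an unescaped pipe splits the cell even inside a code span. As a minimal sketch only (the helper names `escape_cell` and `render_table` are hypothetical and not part of this repository), the corrected rows could be generated like this, using the token names and IDs from the table in this diff:

```python
def escape_cell(text: str) -> str:
    """Escape pipe characters so they do not split a Markdown table cell."""
    return text.replace("|", "\\|")


# Token names, IDs, and purposes copied from the table in this commit.
SPECIAL_TOKENS = {
    "<|start|>": (0, "Beginning of sequence (BOS)"),
    "<|end|>": (1, "End of sequence (EOS)"),
    "<|pad|>": (2, "Padding"),
    "<|unk|>": (3, "Unknown token"),
    "<|cls|>": (4, "Classification token"),
    "<|sep|>": (5, "Separator token"),
    "<|mask|>": (6, "Mask token (for MLM)"),
}


def render_table(tokens: dict) -> str:
    """Render the special-token table with pipes escaped in the token column."""
    lines = ["| Token | ID | Purpose |", "|-------|-----|---------|"]
    for token, (token_id, purpose) in tokens.items():
        lines.append(f"| `{escape_cell(token)}` | {token_id} | {purpose} |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table(SPECIAL_TOKENS))
```

Generating the table from one mapping keeps the README rows and the escaped form in step, though whether the README is generated or hand-edited is not stated in this commit.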