Add tokenizer card
README.md CHANGED
@@ -27,13 +27,13 @@ This tokenizer is designed for binary file classification and analysis tasks.
 
 | Token | ID | Purpose |
 |-------|-----|---------|
-
-
-
-
-
-
-
+| `<\|start\|>` | 0 | Beginning of sequence (BOS) |
+| `<\|end\|>` | 1 | End of sequence (EOS) |
+| `<\|pad\|>` | 2 | Padding |
+| `<\|unk\|>` | 3 | Unknown token |
+| `<\|cls\|>` | 4 | Classification token |
+| `<\|sep\|>` | 5 | Separator token |
+| `<\|mask\|>` | 6 | Mask token (for MLM) |
 
 ## Usage
 