Merlin
tokenizer
tsuberim commited on
Commit
e4b6feb
·
verified ·
1 Parent(s): ad6208f

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +35 -18
README.md CHANGED
@@ -27,28 +27,45 @@ BPE tokenizer for [Merlin](https://github.com/tsuberim/merlin) — a small LM pu
27
 
28
  ## Special tokens
29
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
30
  | ID | Token | Role |
31
  |---|---|---|
32
- | 0 | `<\|bos\|>` | beginning of sequence |
33
- | 1 | `<\|eos\|>` | end of sequence / document separator |
34
- | 32000 | `<\|task\|>` | agent protocol |
35
- | 32001 | `<\|/task\|>` | agent protocol |
36
  | 32002 | `<\|think\|>` | thinking open |
37
  | 32003 | `<\|/think\|>` | thinking close |
38
- | 32004 | `<\|tool_call\|>` | tool call open |
39
- | 32005 | `<\|/tool_call\|>` | tool call close |
40
- | 32006 | `<\|tool_result\|>` | tool result open |
41
- | 32007 | `<\|/tool_result\|>` | tool result close |
42
- | 32008 | `<\|spawn\|>` | spawn agent |
43
- | 32009 | `<\|/spawn\|>` | spawn close |
44
- | 32010 | `<\|agent_id\|>` | agent ID open |
45
- | 32011 | `<\|/agent_id\|>` | agent ID close |
46
- | 32012 | `<\|wait\|>` | wait open |
47
- | 32013 | `<\|/wait\|>` | wait close |
48
- | 32014 | `<\|wait_result\|>` | wait result open |
49
- | 32015 | `<\|/wait_result\|>` | wait result close |
50
- | 32016 | `<\|done\|>` | task done open |
51
- | 32017 | `<\|/done\|>` | task done close |
52
 
53
  ## Usage
54
 
 
27
 
28
  ## Special tokens
29
 
30
+ Legacy tokens (original BPE training, IDs 0–13):
31
+
32
+ | ID | Token |
33
+ |---|---|
34
+ | 0 | `<\|bos\|>` |
35
+ | 1 | `<\|eos\|>` |
36
+ | 2 | `<\|pad\|>` |
37
+ | 3 | `<\|unk\|>` |
38
+ | 4 | `<\|user\|>` |
39
+ | 5 | `<\|assistant\|>` |
40
+ | 6 | `<\|tool_call\|>` |
41
+ | 7 | `<\|end_tool_call\|>` |
42
+ | 8 | `<\|tool_result\|>` |
43
+ | 9 | `<\|sep\|>` |
44
+ | 10 | `<\|end\|>` |
45
+ | 11 | `<\|python\|>` |
46
+ | 12 | `<\|bash\|>` |
47
+ | 13 | `<\|markdown\|>` |
48
+
49
+ Agent protocol tokens (patched in, IDs 32000–32015):
50
+
51
  | ID | Token | Role |
52
  |---|---|---|
53
+ | 32000 | `<\|task\|>` | task open |
54
+ | 32001 | `<\|/task\|>` | task close |
 
 
55
  | 32002 | `<\|think\|>` | thinking open |
56
  | 32003 | `<\|/think\|>` | thinking close |
57
+ | 32004 | `<\|/tool_call\|>` | tool call close |
58
+ | 32005 | `<\|/tool_result\|>` | tool result close |
59
+ | 32006 | `<\|spawn\|>` | spawn agent open |
60
+ | 32007 | `<\|/spawn\|>` | spawn agent close |
61
+ | 32008 | `<\|agent_id\|>` | agent ID open |
62
+ | 32009 | `<\|/agent_id\|>` | agent ID close |
63
+ | 32010 | `<\|wait\|>` | wait open |
64
+ | 32011 | `<\|/wait\|>` | wait close |
65
+ | 32012 | `<\|wait_result\|>` | wait result open |
66
+ | 32013 | `<\|/wait_result\|>` | wait result close |
67
+ | 32014 | `<\|done\|>` | done open |
68
+ | 32015 | `<\|/done\|>` | done close |
 
 
69
 
70
  ## Usage
71