forthezero committed on
Commit 2651102 · verified · 1 Parent(s): d0ea892

Upload 28 files

.cache/tokenizer_en.json ADDED
The diff for this file is too large to render. See raw diff
 
.cache/tokenizer_zh.json ADDED
@@ -0,0 +1,5631 @@
+ {
+ "vocab_size": 8000,
+ "lang": "zh",
+ "token_to_id": {
+ "<pad>": 0,
+ "<sos>": 1,
+ "<eos>": 2,
+ "<unk>": 3,
+ "<mask>": 4,
+ "!</w>": 5,
+ "\"</w>": 6,
+ ",</w>": 7,
+ ".</w>": 8,
+ "0</w>": 9,
+ "10": 10,
+ "100</w>": 11,
+ "10</w>": 12,
+ "18": 13,
+ "18</w>": 14,
+ "1</w>": 15,
+ "20</w>": 16,
+ "21</w>": 17,
+ "2</w>": 18,
+ "3</w>": 19,
+ "40</w>": 20,
+ "56</w>": 21,
+ "5</w>": 22,
+ "6</w>": 23,
+ "7</w>": 24,
+ "?</w>": 25,
+ "ali": 26,
+ "alice</w>": 27,
+ "ancy</w>": 28,
+ "ce</w>": 29,
+ "cy</w>": 30,
+ "e</w>": 31,
+ "el": 32,
+ "iel": 33,
+ "ir": 34,
+ "ja": 35,
+ "jac": 36,
+ "jack</w>": 37,
+ "jake</w>": 38,
+ "k</w>": 39,
+ "ka": 40,
+ "kate</w>": 41,
+ "ke": 42,
+ "ke</w>": 43,
+ "ken</w>": 44,
+ "li": 45,
+ "m</w>": 46,
+ "muir": 47,
+ "muiriel": 48,
+ "muiriel</w>": 49,
+ "n</w>": 50,
+ "nancy</w>": 51,
+ "ncy</w>": 52,
+ "om</w>": 53,
+ "te</w>": 54,
+ "tom</w>": 55,
+ "uir": 56,
+ "y</w>": 57,
+ "“</w>": 58,
+ "”</w>": 59,
+ "…</w>": 60,
+ "、</w>": 61,
+ "。</w>": 62,
+ "一</w>": 63,
+ "七</w>": 64,
+ "万</w>": 65,
+ "三</w>": 66,
+ "上</w>": 67,
+ "下</w>": 68,
+ "不</w>": 69,
+ "丑</w>": 70,
+ "世</w>": 71,
+ "业</w>": 72,
+ "两</w>": 73,
+ "严</w>": 74,
+ "个</w>": 75,
+ "中</w>": 76,
+ "丰</w>": 77,
+ "为</w>": 78,
+ "举</w>": 79,
+ "久</w>": 80,
+ "么</w>": 81,
+ "义</w>": 82,
+ "之</w>": 83,
+ "乎</w>": 84,
+ "乐</w>": 85,
+ "乘</w>": 86,
+ "九</w>": 87,
+ "也</w>": 88,
+ "习</w>": 89,
+ "书</w>": 90,
+ "买</w>": 91,
+ "了</w>": 92,
+ "予</w>": 93,
+ "争</w>": 94,
+ "事</w>": 95,
+ "于</w>": 96,
+ "互</w>": 97,
+ "些</w>": 98,
+ "交</w>": 99,
+ "亲</w>": 100,
+ "人</w>": 101,
+ "什</w>": 102,
+ "仅</w>": 103,
+ "今</w>": 104,
+ "从</w>": 105,
+ "他</w>": 106,
+ "付</w>": 107,
+ "代</w>": 108,
+ "以</w>": 109,
+ "仪</w>": 110,
+ "们</w>": 111,
+ "件</w>": 112,
+ "价</w>": 113,
+ "任</w>": 114,
+ "份</w>": 115,
+ "休</w>": 116,
+ "众</w>": 117,
+ "会</w>": 118,
+ "伟</w>": 119,
+ "传</w>": 120,
+ "伦</w>": 121,
+ "似</w>": 122,
+ "但</w>": 123,
+ "位</w>": 124,
+ "低</w>": 125,
+ "住</w>": 126,
+ "体</w>": 127,
+ "何</w>": 128,
+ "作</w>": 129,
+ "你</w>": 130,
+ "使</w>": 131,
+ "來</w>": 132,
+ "例</w>": 133,
+ "保</w>": 134,
+ "信</w>": 135,
+ "俱</w>": 136,
+ "個</w>": 137,
+ "們</w>": 138,
+ "候</w>": 139,
+ "借</w>": 140,
+ "倦</w>": 141,
+ "债</w>": 142,
+ "值</w>": 143,
+ "假</w>": 144,
+ "偏</w>": 145,
+ "做</w>": 146,
+ "停</w>": 147,
+ "偶</w>": 148,
+ "偷</w>": 149,
+ "像</w>": 150,
+ "僵</w>": 151,
+ "儿</w>": 152,
+ "元</w>": 153,
+ "先</w>": 154,
+ "光</w>": 155,
+ "克</w>": 156,
+ "免</w>": 157,
+ "兔</w>": 158,
+ "入</w>": 159,
+ "全</w>": 160,
+ "公</w>": 161,
+ "六</w>": 162,
+ "兰</w>": 163,
+ "关</w>": 164,
+ "兴</w>": 165,
+ "其</w>": 166,
+ "兼</w>": 167,
+ "内</w>": 168,
+ "再</w>": 169,
+ "冒</w>": 170,
+ "写</w>": 171,
+ "冰</w>": 172,
+ "冲</w>": 173,
+ "决</w>": 174,
+ "况</w>": 175,
+ "冷</w>": 176,
+ "准</w>": 177,
+ "几</w>": 178,
+ "出</w>": 179,
+ "分</w>": 180,
+ "切</w>": 181,
+ "划</w>": 182,
+ "则</w>": 183,
+ "创</w>": 184,
+ "利</w>": 185,
+ "到</w>": 186,
+ "制</w>": 187,
+ "前</w>": 188,
+ "劃</w>": 189,
+ "力</w>": 190,
+ "办</w>": 191,
+ "功</w>": 192,
+ "加</w>": 193,
+ "务</w>": 194,
+ "动</w>": 195,
+ "助</w>": 196,
+ "努</w>": 197,
+ "劳</w>": 198,
+ "勃</w>": 199,
+ "包</w>": 200,
+ "化</w>": 201,
+ "医</w>": 202,
+ "十</w>": 203,
+ "千</w>": 204,
+ "升</w>": 205,
+ "午</w>": 206,
+ "半</w>": 207,
+ "华</w>": 208,
+ "单</w>": 209,
+ "卖</w>": 210,
+ "卫</w>": 211,
+ "危</w>": 212,
+ "即</w>": 213,
+ "却</w>": 214,
+ "历</w>": 215,
+ "厌</w>": 216,
+ "厕</w>": 217,
+ "去</w>": 218,
+ "参</w>": 219,
+ "又</w>": 220,
+ "友</w>": 221,
+ "反</w>": 222,
+ "发</w>": 223,
+ "叔</w>": 224,
+ "取</w>": 225,
+ "受</w>": 226,
+ "变</w>": 227,
+ "口</w>": 228,
+ "古</w>": 229,
+ "另</w>": 230,
+ "只</w>": 231,
+ "叫</w>": 232,
+ "可</w>": 233,
+ "史</w>": 234,
+ "右</w>": 235,
+ "号</w>": 236,
+ "吃</w>": 237,
+ "��</w>": 238,
+ "同</w>": 239,
+ "名</w>": 240,
+ "后</w>": 241,
+ "向</w>": 242,
+ "吗</w>": 243,
+ "吧</w>": 244,
+ "听</w>": 245,
+ "告</w>": 246,
+ "员</w>": 247,
+ "呢</w>": 248,
+ "周</w>": 249,
+ "味</w>": 250,
+ "命</w>": 251,
+ "和</w>": 252,
+ "咖</w>": 253,
+ "品</w>": 254,
+ "响</w>": 255,
+ "哥</w>": 256,
+ "哦</w>": 257,
+ "哪</w>": 258,
+ "售</w>": 259,
+ "唯</w>": 260,
+ "唱</w>": 261,
+ "啊</w>": 262,
+ "問</w>": 263,
+ "啡</w>": 264,
+ "喜</w>": 265,
+ "喝</w>": 266,
+ "嗨</w>": 267,
+ "囚</w>": 268,
+ "回</w>": 269,
+ "因</w>": 270,
+ "团</w>": 271,
+ "园</w>": 272,
+ "困</w>": 273,
+ "国</w>": 274,
+ "图</w>": 275,
+ "圈</w>": 276,
+ "國</w>": 277,
+ "圣</w>": 278,
+ "在</w>": 279,
+ "地</w>": 280,
+ "场</w>": 281,
+ "坐</w>": 282,
+ "块</w>": 283,
+ "坚</w>": 284,
+ "城</w>": 285,
+ "堡</w>": 286,
+ "增</w>": 287,
+ "士</w>": 288,
+ "声</w>": 289,
+ "处</w>": 290,
+ "备</w>": 291,
+ "复</w>": 292,
+ "夏</w>": 293,
+ "外</w>": 294,
+ "多</w>": 295,
+ "夜</w>": 296,
+ "够</w>": 297,
+ "大</w>": 298,
+ "天</w>": 299,
+ "太</w>": 300,
+ "失</w>": 301,
+ "头</w>": 302,
+ "奇</w>": 303,
+ "奶</w>": 304,
+ "她</w>": 305,
+ "好</w>": 306,
+ "如</w>": 307,
+ "妈</w>": 308,
+ "妹</w>": 309,
+ "妻</w>": 310,
+ "始</w>": 311,
+ "姐</w>": 312,
+ "威</w>": 313,
+ "婚</w>": 314,
+ "子</w>": 315,
+ "字</w>": 316,
+ "季</w>": 317,
+ "学</w>": 318,
+ "孩</w>": 319,
+ "學</w>": 320,
+ "它</w>": 321,
+ "宇</w>": 322,
+ "守</w>": 323,
+ "安</w>": 324,
+ "完</w>": 325,
+ "宙</w>": 326,
+ "定</w>": 327,
+ "宝</w>": 328,
+ "实</w>": 329,
+ "客</w>": 330,
+ "宣</w>": 331,
+ "室</w>": 332,
+ "宵</w>": 333,
+ "家</w>": 334,
+ "寄</w>": 335,
+ "密</w>": 336,
+ "富</w>": 337,
+ "对</w>": 338,
+ "寻</w>": 339,
+ "将</w>": 340,
+ "尊</w>": 341,
+ "小</w>": 342,
+ "少</w>": 343,
+ "就</w>": 344,
+ "尼</w>": 345,
+ "局</w>": 346,
+ "屈</w>": 347,
+ "属</w>": 348,
+ "山</w>": 349,
+ "岁</w>": 350,
+ "岩</w>": 351,
+ "工</w>": 352,
+ "己</w>": 353,
+ "已</w>": 354,
+ "市</w>": 355,
+ "布</w>": 356,
+ "师</w>": 357,
+ "帖</w>": 358,
+ "带</w>": 359,
+ "席</w>": 360,
+ "帮</w>": 361,
+ "常</w>": 362,
+ "帽</w>": 363,
+ "干</w>": 364,
+ "平</w>": 365,
+ "年</w>": 366,
+ "幸</w>": 367,
+ "幹</w>": 368,
+ "广</w>": 369,
+ "庄</w>": 370,
+ "庆</w>": 371,
+ "床</w>": 372,
+ "应</w>": 373,
+ "底</w>": 374,
+ "庙</w>": 375,
+ "庞</w>": 376,
+ "度</w>": 377,
+ "座</w>": 378,
+ "庭</w>": 379,
+ "延</w>": 380,
+ "建</w>": 381,
+ "开</w>": 382,
+ "弃</w>": 383,
+ "式</w>": 384,
+ "弟</w>": 385,
+ "张</w>": 386,
+ "張</w>": 387,
+ "强</w>": 388,
+ "当</w>": 389,
+ "影</w>": 390,
+ "彻</w>": 391,
+ "往</w>": 392,
+ "径</w>": 393,
+ "待</w>": 394,
+ "很</w>": 395,
+ "後</w>": 396,
+ "徒</w>": 397,
+ "得</w>": 398,
+ "從</w>": 399,
+ "微</w>": 400,
+ "德</w>": 401,
+ "心</w>": 402,
+ "必</w>": 403,
+ "志</w>": 404,
+ "忙</w>": 405,
+ "快</w>": 406,
+ "念</w>": 407,
+ "怀</w>": 408,
+ "怎</w>": 409,
+ "急</w>": 410,
+ "总</w>": 411,
+ "息</w>": 412,
+ "悔</w>": 413,
+ "情</w>": 414,
+ "惊</w>": 415,
+ "惜</w>": 416,
+ "惡</w>": 417,
+ "想</w>": 418,
+ "愉</w>": 419,
+ "意</w>": 420,
+ "感</w>": 421,
+ "慢</w>": 422,
+ "應</w>": 423,
+ "戏</w>": 424,
+ "成</w>": 425,
+ "我</w>": 426,
+ "戒</w>": 427,
+ "或</w>": 428,
+ "戴</w>": 429,
+ "户</w>": 430,
+ "房</w>": 431,
+ "所</w>": 432,
+ "扇</w>": 433,
+ "手</w>": 434,
+ "才</w>": 435,
+ "打</w>": 436,
+ "托</w>": 437,
+ "扰</w>": 438,
+ "批</w>": 439,
+ "找</w>": 440,
+ "把</w>": 441,
+ "抓</w>": 442,
+ "护</w>": 443,
+ "报</w>": 444,
+ "抱</w>": 445,
+ "拆</w>": 446,
+ "拉</w>": 447,
+ "拜</w>": 448,
+ "拥</w>": 449,
+ "择</w>": 450,
+ "持</w>": 451,
+ "指</w>": 452,
+ "按</w>": 453,
+ "挑</w>": 454,
+ "挤</w>": 455,
+ "挥</w>": 456,
+ "据</w>": 457,
+ "接</w>": 458,
+ "推</w>": 459,
+ "措</w>": 460,
+ "揉</w>": 461,
+ "插</w>": 462,
+ "揭</w>": 463,
+ "携</w>": 464,
+ "摄</w>": 465,
+ "摇</w>": 466,
+ "摩</w>": 467,
+ "撒</w>": 468,
+ "播</w>": 469,
+ "擔</w>": 470,
+ "支</w>": 471,
+ "收</w>": 472,
+ "改</w>": 473,
+ "放</w>": 474,
+ "故</w>": 475,
+ "救</w>": 476,
+ "教</w>": 477,
+ "散</w>": 478,
+ "敦</w>": 479,
+ "敬</w>": 480,
+ "数</w>": 481,
+ "整</w>": 482,
+ "斯</w>": 483,
+ "新</w>": 484,
+ "方</w>": 485,
+ "施</w>": 486,
+ "旅</w>": 487,
+ "无</w>": 488,
+ "日</w>": 489,
+ "旦</w>": 490,
+ "早</w>": 491,
+ "时</w>": 492,
+ "明</w>": 493,
+ "星</w>": 494,
+ "昨</w>": 495,
+ "是</w>": 496,
+ "時</w>": 497,
+ "晃</w>": 498,
+ "晚</w>": 499,
+ "景</w>": 500,
+ "更</w>": 501,
+ "曾</w>": 502,
+ "最</w>": 503,
+ "會</w>": 504,
+ "月</w>": 505,
+ "有</w>": 506,
+ "朋</w>": 507,
+ "服</w>": 508,
+ "望</w>": 509,
+ "朝</w>": 510,
+ "期</w>": 511,
+ "本</w>": 512,
+ "术</w>": 513,
+ "机</w>": 514,
+ "杀</w>": 515,
+ "杂</w>": 516,
+ "权</w>": 517,
+ "村</w>": 518,
+ "条</w>": 519,
+ "来</w>": 520,
+ "杯</w>": 521,
+ "杰</w>": 522,
+ "松</w>": 523,
+ "果</w>": 524,
+ "架</w>": 525,
+ "某</w>": 526,
+ "标</w>": 527,
+ "栋</w>": 528,
+ "校</w>": 529,
+ "样</w>": 530,
+ "格</w>": 531,
+ "桌</w>": 532,
+ "桥</w>": 533,
+ "楼</w>": 534,
+ "概</w>": 535,
+ "樣</w>": 536,
+ "欠</w>": 537,
+ "次</w>": 538,
+ "欢</w>": 539,
+ "欲</w>": 540,
+ "款</w>": 541,
+ "歉</w>": 542,
+ "歌</w>": 543,
+ "歐</w>": 544,
+ "歡</w>": 545,
+ "止</w>": 546,
+ "正</w>": 547,
+ "步</w>": 548,
+ "死</w>": 549,
+ "段</w>": 550,
+ "母</w>": 551,
+ "每</w>": 552,
+ "比</w>": 553,
+ "毕</w>": 554,
+ "毛</w>": 555,
+ "毫</w>": 556,
+ "气</w>": 557,
+ "水</w>": 558,
+ "永</w>": 559,
+ "池</w>": 560,
+ "汽</w>": 561,
+ "沒</w>": 562,
+ "没</w>": 563,
+ "河</w>": 564,
+ "沸</w>": 565,
+ "油</w>": 566,
+ "沿</w>": 567,
+ "法</w>": 568,
+ "泪</w>": 569,
+ "泳</w>": 570,
+ "洗</w>": 571,
+ "津</w>": 572,
+ "活</w>": 573,
+ "派</w>": 574,
+ "流</w>": 575,
+ "济</w>": 576,
+ "消</w>": 577,
+ "涌</w>": 578,
+ "涨</w>": 579,
+ "清</w>": 580,
+ "温</w>": 581,
+ "港</w>": 582,
+ "游</w>": 583,
+ "湖</w>": 584,
+ "溜</w>": 585,
+ "滑</w>": 586,
+ "满</w>": 587,
+ "演</w>": 588,
+ "澄</w>": 589,
+ "澡</w>": 590,
+ "火</w>": 591,
+ "灯</w>": 592,
+ "灰</w>": 593,
+ "点</w>": 594,
+ "烟</w>": 595,
+ "烦</w>": 596,
+ "热</w>": 597,
+ "然</w>": 598,
+ "照</w>": 599,
+ "爱</w>": 600,
+ "父</w>": 601,
+ "爸</w>": 602,
+ "片</w>": 603,
+ "牛</w>": 604,
+ "物</w>": 605,
+ "狗</w>": 606,
+ "独</w>": 607,
+ "猫</w>": 608,
+ "王</w>": 609,
+ "玩</w>": 610,
+ "环</w>": 611,
+ "现</w>": 612,
+ "班</w>": 613,
+ "球</w>": 614,
+ "理</w>": 615,
+ "生</w>": 616,
+ "用</w>": 617,
+ "由</w>": 618,
+ "电</w>": 619,
+ "男</w>": 620,
+ "界</w>": 621,
+ "留</w>": 622,
+ "當</w>": 623,
+ "疑</w>": 624,
+ "疯</w>": 625,
+ "病</w>": 626,
+ "痛</w>": 627,
+ "瘋</w>": 628,
+ "發</w>": 629,
+ "白</w>": 630,
+ "百</w>": 631,
+ "的</w>": 632,
+ "盐</w>": 633,
+ "盖</w>": 634,
+ "盛</w>": 635,
+ "目</w>": 636,
+ "直</w>": 637,
+ "相</w>": 638,
+ "盹</w>": 639,
+ "看</w>": 640,
+ "真</w>": 641,
+ "眠</w>": 642,
+ "眼</w>": 643,
+ "着</w>": 644,
+ "睛</w>": 645,
+ "睡</w>": 646,
+ "知</w>": 647,
+ "短</w>": 648,
+ "石</w>": 649,
+ "码</w>": 650,
+ "破</w>": 651,
+ "确</w>": 652,
+ "碎</w>": 653,
+ "示</w>": 654,
+ "社</w>": 655,
+ "祝</w>": 656,
+ "神</w>": 657,
+ "票</w>": 658,
+ "福</w>": 659,
+ "离</w>": 660,
+ "私</w>": 661,
+ "种</w>": 662,
+ "秘</w>": 663,
+ "移</w>": 664,
+ "程</w>": 665,
+ "空</w>": 666,
+ "窗</w>": 667,
+ "窜</w>": 668,
+ "站</w>": 669,
+ "童</w>": 670,
+ "笑</w>": 671,
+ "笔</w>": 672,
+ "笛</w>": 673,
+ "第</w>": 674,
+ "笼</w>": 675,
+ "等</w>": 676,
+ "筑</w>": 677,
+ "答</w>": 678,
+ "简</w>": 679,
+ "籍</w>": 680,
+ "粗</w>": 681,
+ "精</w>": 682,
+ "糕</w>": 683,
+ "糟</w>": 684,
+ "素</w>": 685,
+ "索</w>": 686,
+ "給</w>": 687,
+ "經</w>": 688,
+ "總</w>": 689,
+ "红</w>": 690,
+ "纪</w>": 691,
+ "纯</w>": 692,
+ "纸</w>": 693,
+ "线</w>": 694,
+ "绅</w>": 695,
+ "终</w>": 696,
+ "经</w>": 697,
+ "结</w>": 698,
+ "给</w>": 699,
+ "统</w>": 700,
+ "绿</w>": 701,
+ "缺</w>": 702,
+ "网</w>": 703,
+ "罗</w>": 704,
+ "罚</w>": 705,
+ "置</w>": 706,
+ "美</w>": 707,
+ "群</w>": 708,
+ "習</w>": 709,
+ "老</w>": 710,
+ "考</w>": 711,
+ "者</w>": 712,
+ "而</w>": 713,
+ "耍</w>": 714,
+ "耗</w>": 715,
+ "职</w>": 716,
+ "肯</w>": 717,
+ "胖</w>": 718,
+ "能</w>": 719,
+ "脑</w>": 720,
+ "脚</w>": 721,
+ "脸</w>": 722,
+ "腾</w>": 723,
+ "腿</w>": 724,
+ "自</w>": 725,
+ "至</w>": 726,
+ "船</w>": 727,
+ "艰</w>": 728,
+ "色</w>": 729,
+ "艺</w>": 730,
+ "花</w>": 731,
+ "苏</w>": 732,
+ "英</w>": 733,
+ "茶</w>": 734,
+ "药</w>": 735,
+ "落</w>": 736,
+ "著</w>": 737,
+ "虑</w>": 738,
+ "虾</w>": 739,
+ "蜂</w>": 740,
+ "蝴</w>": 741,
+ "蝶</w>": 742,
+ "蠢</w>": 743,
+ "血</w>": 744,
+ "行</w>": 745,
+ "衣</w>": 746,
+ "表</w>": 747,
+ "被</w>": 748,
+ "裡</w>": 749,
+ "要</w>": 750,
+ "覆</w>": 751,
+ "覺</w>": 752,
+ "见</w>": 753,
+ "观</w>": 754,
+ "规</w>": 755,
+ "视</w>": 756,
+ "觉</w>": 757,
+ "解</w>": 758,
+ "言</w>": 759,
+ "計</w>": 760,
+ "試</w>": 761,
+ "話</w>": 762,
+ "該</w>": 763,
+ "誓</w>": 764,
+ "說</w>": 765,
+ "請</w>": 766,
+ "讀</w>": 767,
+ "變</w>": 768,
+ "计</w>": 769,
+ "订</w>": 770,
+ "认</w>": 771,
+ "让</w>": 772,
+ "训</w>": 773,
+ "议</w>": 774,
+ "记</w>": 775,
+ "讲</w>": 776,
+ "讶</w>": 777,
+ "许</w>": 778,
+ "论</w>": 779,
+ "设</w>": 780,
+ "访</w>": 781,
+ "证</w>": 782,
+ "评</w>": 783,
+ "识</w>": 784,
+ "诉</w>": 785,
+ "试</w>": 786,
+ "诗</w>": 787,
+ "诚</w>": 788,
+ "话</w>": 789,
+ "该</w>": 790,
+ "语</w>": 791,
+ "误</w>": 792,
+ "说</w>": 793,
+ "请</w>": 794,
+ "诺</w>": 795,
+ "读</w>": 796,
+ "课</w>": 797,
+ "谁</w>": 798,
+ "谈</w>": 799,
+ "谎</w>": 800,
+ "谢</w>": 801,
+ "象</w>": 802,
+ "賺</w>": 803,
+ "负</w>": 804,
+ "货</w>": 805,
+ "购</w>": 806,
+ "贷</w>": 807,
+ "费</w>": 808,
+ "赛</w>": 809,
+ "赢</w>": 810,
+ "走</w>": 811,
+ "赶</w>": 812,
+ "起</w>": 813,
+ "趕</w>": 814,
+ "趣</w>": 815,
+ "足</w>": 816,
+ "跑</w>": 817,
+ "跟</w>": 818,
+ "路</w>": 819,
+ "踢</w>": 820,
+ "躲</w>": 821,
+ "較</w>": 822,
+ "车</w>": 823,
+ "轨</w>": 824,
+ "转</w>": 825,
+ "轻</w>": 826,
+ "较</w>": 827,
+ "辆</w>": 828,
+ "辈</w>": 829,
+ "辜</w>": 830,
+ "辩</w>": 831,
+ "达</w>": 832,
+ "迅</w>": 833,
+ "过</w>": 834,
+ "近</w>": 835,
+ "还</w>": 836,
+ "这</w>": 837,
+ "进</w>": 838,
+ "远</w>": 839,
+ "迟</w>": 840,
+ "述</w>": 841,
+ "迷</w>": 842,
+ "迹</w>": 843,
+ "送</w>": 844,
+ "适</w>": 845,
+ "逃</w>": 846,
+ "选</w>": 847,
+ "透</w>": 848,
+ "递</w>": 849,
+ "途</w>": 850,
+ "這</w>": 851,
+ "通</w>": 852,
+ "速</w>": 853,
+ "造</w>": 854,
+ "進</w>": 855,
+ "過</w>": 856,
+ "道</w>": 857,
+ "遛</w>": 858,
+ "遠</w>": 859,
+ "邀</w>": 860,
+ "那</w>": 861,
+ "邻</w>": 862,
+ "部</w>": 863,
+ "都</w>": 864,
+ "酒</w>": 865,
+ "采</w>": 866,
+ "里</w>": 867,
+ "重</w>": 868,
+ "金</w>": 869,
+ "钟</w>": 870,
+ "钱</w>": 871,
+ "铁</w>": 872,
+ "铃</w>": 873,
+ "铭</w>": 874,
+ "银</w>": 875,
+ "销</w>": 876,
+ "错</w>": 877,
+ "镜</w>": 878,
+ "長</w>": 879,
+ "长</w>": 880,
+ "間</w>": 881,
+ "问</w>": 882,
+ "间</w>": 883,
+ "闻</w>": 884,
+ "阅</w>": 885,
+ "阐</w>": 886,
+ "防</w>": 887,
+ "阳</w>": 888,
+ "附</w>": 889,
+ "限</w>": 890,
+ "除</w>": 891,
+ "险</w>": 892,
+ "随</w>": 893,
+ "隻</w>": 894,
+ "难</w>": 895,
+ "雨</w>": 896,
+ "雪</w>": 897,
+ "零</w>": 898,
+ "雹</w>": 899,
+ "需</w>": 900,
+ "震</w>": 901,
+ "露</w>": 902,
+ "非</w>": 903,
+ "靠</w>": 904,
+ "面</w>": 905,
+ "音</w>": 906,
+ "題</w>": 907,
+ "项</w>": 908,
+ "须</w>": 909,
+ "顾</w>": 910,
+ "预</w>": 911,
+ "题</w>": 912,
+ "风</w>": 913,
+ "飞</w>": 914,
+ "食</w>": 915,
+ "餐</w>": 916,
+ "饭</w>": 917,
+ "饿</w>": 918,
+ "首</w>": 919,
+ "马</w>": 920,
+ "驶</w>": 921,
+ "验</w>": 922,
+ "骑</w>": 923,
+ "骗</w>": 924,
+ "高</w>": 925,
+ "鬼</w>": 926,
+ "鱼</w>": 927,
+ "鲍</w>": 928,
+ "鲜</w>": 929,
+ "麻</w>": 930,
+ "麼</w>": 931,
+ "點</w>": 932,
+ "鼠</w>": 933,
+ "龙</w>": 934,
+ "﹐</w>": 935,
+ "！</w>": 936,
+ "，</w>": 937,
+ "？</w>": 938
+ },
+ "id_to_token": {
+ "0": "<pad>",
+ "1": "<sos>",
+ "2": "<eos>",
+ "3": "<unk>",
+ "4": "<mask>",
+ "5": "!</w>",
+ "6": "\"</w>",
+ "7": ",</w>",
+ "8": ".</w>",
+ "9": "0</w>",
+ "10": "10",
+ "11": "100</w>",
+ "12": "10</w>",
+ "13": "18",
+ "14": "18</w>",
+ "15": "1</w>",
+ "16": "20</w>",
+ "17": "21</w>",
+ "18": "2</w>",
+ "19": "3</w>",
+ "20": "40</w>",
+ "21": "56</w>",
+ "22": "5</w>",
+ "23": "6</w>",
+ "24": "7</w>",
+ "25": "?</w>",
+ "26": "ali",
+ "27": "alice</w>",
+ "28": "ancy</w>",
+ "29": "ce</w>",
+ "30": "cy</w>",
+ "31": "e</w>",
+ "32": "el",
+ "33": "iel",
+ "34": "ir",
+ "35": "ja",
+ "36": "jac",
+ "37": "jack</w>",
+ "38": "jake</w>",
+ "39": "k</w>",
+ "40": "ka",
+ "41": "kate</w>",
+ "42": "ke",
+ "43": "ke</w>",
+ "44": "ken</w>",
+ "45": "li",
+ "46": "m</w>",
+ "47": "muir",
+ "48": "muiriel",
+ "49": "muiriel</w>",
+ "50": "n</w>",
+ "51": "nancy</w>",
+ "52": "ncy</w>",
+ "53": "om</w>",
+ "54": "te</w>",
+ "55": "tom</w>",
+ "56": "uir",
+ "57": "y</w>",
+ "58": "“</w>",
+ "59": "”</w>",
+ "60": "…</w>",
+ "61": "、</w>",
+ "62": "。</w>",
+ "63": "一</w>",
+ "64": "七</w>",
+ "65": "万</w>",
+ "66": "三</w>",
+ "67": "上</w>",
+ "68": "下</w>",
+ "69": "不</w>",
+ "70": "丑</w>",
+ "71": "世</w>",
+ "72": "业</w>",
+ "73": "两</w>",
+ "74": "严</w>",
+ "75": "个</w>",
+ "76": "中</w>",
+ "77": "丰</w>",
+ "78": "为</w>",
+ "79": "举</w>",
+ "80": "久</w>",
+ "81": "么</w>",
+ "82": "义</w>",
+ "83": "之</w>",
+ "84": "乎</w>",
+ "85": "乐</w>",
+ "86": "乘</w>",
+ "87": "九</w>",
+ "88": "也</w>",
+ "89": "习</w>",
+ "90": "书</w>",
+ "91": "买</w>",
+ "92": "了</w>",
+ "93": "予</w>",
+ "94": "争</w>",
+ "95": "事</w>",
+ "96": "于</w>",
+ "97": "互</w>",
+ "98": "些</w>",
+ "99": "交</w>",
+ "100": "亲</w>",
+ "101": "人</w>",
+ "102": "什</w>",
+ "103": "仅</w>",
+ "104": "今</w>",
+ "105": "从</w>",
+ "106": "他</w>",
+ "107": "付</w>",
+ "108": "代</w>",
+ "109": "以</w>",
+ "110": "仪</w>",
+ "111": "们</w>",
+ "112": "件</w>",
+ "113": "价</w>",
+ "114": "任</w>",
+ "115": "份</w>",
+ "116": "休</w>",
+ "117": "众</w>",
+ "118": "会</w>",
+ "119": "伟</w>",
+ "120": "传</w>",
+ "121": "伦</w>",
+ "122": "似</w>",
+ "123": "但</w>",
+ "124": "位</w>",
+ "125": "低</w>",
+ "126": "住</w>",
+ "127": "体</w>",
+ "128": "何</w>",
+ "129": "作</w>",
+ "130": "你</w>",
+ "131": "使</w>",
+ "132": "來</w>",
+ "133": "例</w>",
+ "134": "保</w>",
+ "135": "信</w>",
+ "136": "俱</w>",
+ "137": "個</w>",
+ "138": "們</w>",
+ "139": "候</w>",
+ "140": "借</w>",
+ "141": "倦</w>",
+ "142": "债</w>",
+ "143": "值</w>",
+ "144": "假</w>",
+ "145": "偏</w>",
+ "146": "做</w>",
+ "147": "停</w>",
+ "148": "偶</w>",
+ "149": "偷</w>",
+ "150": "像</w>",
+ "151": "僵</w>",
+ "152": "儿</w>",
+ "153": "元</w>",
+ "154": "先</w>",
+ "155": "光</w>",
+ "156": "克</w>",
+ "157": "免</w>",
+ "158": "兔</w>",
+ "159": "入</w>",
+ "160": "全</w>",
+ "161": "公</w>",
+ "162": "六</w>",
+ "163": "兰</w>",
+ "164": "关</w>",
+ "165": "兴</w>",
+ "166": "其</w>",
+ "167": "兼</w>",
+ "168": "内</w>",
+ "169": "再</w>",
+ "170": "冒</w>",
+ "171": "写</w>",
1118
+ "172": "冰</w>",
1119
+ "173": "冲</w>",
1120
+ "174": "决</w>",
1121
+ "175": "况</w>",
1122
+ "176": "冷</w>",
1123
+ "177": "准</w>",
1124
+ "178": "几</w>",
1125
+ "179": "出</w>",
1126
+ "180": "分</w>",
1127
+ "181": "切</w>",
1128
+ "182": "划</w>",
1129
+ "183": "则</w>",
1130
+ "184": "创</w>",
1131
+ "185": "利</w>",
1132
+ "186": "到</w>",
1133
+ "187": "制</w>",
1134
+ "188": "前</w>",
1135
+ "189": "劃</w>",
1136
+ "190": "力</w>",
1137
+ "191": "办</w>",
1138
+ "192": "功</w>",
1139
+ "193": "加</w>",
1140
+ "194": "务</w>",
1141
+ "195": "动</w>",
1142
+ "196": "助</w>",
1143
+ "197": "努</w>",
1144
+ "198": "劳</w>",
1145
+ "199": "勃</w>",
1146
+ "200": "包</w>",
1147
+ "201": "化</w>",
1148
+ "202": "医</w>",
1149
+ "203": "十</w>",
1150
+ "204": "千</w>",
1151
+ "205": "升</w>",
1152
+ "206": "午</w>",
1153
+ "207": "半</w>",
1154
+ "208": "华</w>",
1155
+ "209": "单</w>",
1156
+ "210": "卖</w>",
1157
+ "211": "卫</w>",
1158
+ "212": "危</w>",
1159
+ "213": "即</w>",
1160
+ "214": "却</w>",
1161
+ "215": "历</w>",
1162
+ "216": "厌</w>",
1163
+ "217": "厕</w>",
1164
+ "218": "去</w>",
1165
+ "219": "参</w>",
1166
+ "220": "又</w>",
1167
+ "221": "友</w>",
1168
+ "222": "反</w>",
1169
+ "223": "发</w>",
1170
+ "224": "叔</w>",
1171
+ "225": "取</w>",
1172
+ "226": "受</w>",
1173
+ "227": "变</w>",
1174
+ "228": "口</w>",
1175
+ "229": "古</w>",
1176
+ "230": "另</w>",
1177
+ "231": "只</w>",
1178
+ "232": "叫</w>",
1179
+ "233": "可</w>",
1180
+ "234": "史</w>",
1181
+ "235": "右</w>",
1182
+ "236": "号</w>",
1183
+ "237": "吃</w>",
1184
+ "238": "合</w>",
1185
+ "239": "同</w>",
1186
+ "240": "名</w>",
1187
+ "241": "后</w>",
1188
+ "242": "向</w>",
1189
+ "243": "吗</w>",
1190
+ "244": "吧</w>",
1191
+ "245": "听</w>",
1192
+ "246": "告</w>",
1193
+ "247": "员</w>",
1194
+ "248": "呢</w>",
1195
+ "249": "周</w>",
1196
+ "250": "味</w>",
1197
+ "251": "命</w>",
1198
+ "252": "和</w>",
1199
+ "253": "咖</w>",
1200
+ "254": "品</w>",
1201
+ "255": "响</w>",
1202
+ "256": "哥</w>",
1203
+ "257": "哦</w>",
1204
+ "258": "哪</w>",
1205
+ "259": "售</w>",
1206
+ "260": "唯</w>",
1207
+ "261": "唱</w>",
1208
+ "262": "啊</w>",
1209
+ "263": "問</w>",
1210
+ "264": "啡</w>",
1211
+ "265": "喜</w>",
1212
+ "266": "喝</w>",
1213
+ "267": "嗨</w>",
1214
+ "268": "囚</w>",
1215
+ "269": "回</w>",
1216
+ "270": "因</w>",
1217
+ "271": "团</w>",
1218
+ "272": "园</w>",
1219
+ "273": "困</w>",
1220
+ "274": "国</w>",
1221
+ "275": "图</w>",
1222
+ "276": "圈</w>",
1223
+ "277": "國</w>",
1224
+ "278": "圣</w>",
1225
+ "279": "在</w>",
1226
+ "280": "地</w>",
1227
+ "281": "场</w>",
1228
+ "282": "坐</w>",
1229
+ "283": "块</w>",
1230
+ "284": "坚</w>",
1231
+ "285": "城</w>",
1232
+ "286": "堡</w>",
1233
+ "287": "增</w>",
1234
+ "288": "士</w>",
1235
+ "289": "声</w>",
1236
+ "290": "处</w>",
1237
+ "291": "备</w>",
1238
+ "292": "复</w>",
1239
+ "293": "夏</w>",
1240
+ "294": "外</w>",
1241
+ "295": "多</w>",
1242
+ "296": "夜</w>",
1243
+ "297": "够</w>",
1244
+ "298": "大</w>",
1245
+ "299": "天</w>",
1246
+ "300": "太</w>",
1247
+ "301": "失</w>",
1248
+ "302": "头</w>",
1249
+ "303": "奇</w>",
1250
+ "304": "奶</w>",
1251
+ "305": "她</w>",
1252
+ "306": "好</w>",
1253
+ "307": "如</w>",
1254
+ "308": "妈</w>",
1255
+ "309": "妹</w>",
1256
+ "310": "妻</w>",
1257
+ "311": "始</w>",
1258
+ "312": "姐</w>",
1259
+ "313": "威</w>",
1260
+ "314": "婚</w>",
1261
+ "315": "子</w>",
1262
+ "316": "字</w>",
1263
+ "317": "季</w>",
1264
+ "318": "学</w>",
1265
+ "319": "孩</w>",
1266
+ "320": "學</w>",
1267
+ "321": "它</w>",
1268
+ "322": "宇</w>",
1269
+ "323": "守</w>",
1270
+ "324": "安</w>",
1271
+ "325": "完</w>",
1272
+ "326": "宙</w>",
1273
+ "327": "定</w>",
1274
+ "328": "宝</w>",
1275
+ "329": "实</w>",
1276
+ "330": "客</w>",
1277
+ "331": "宣</w>",
1278
+ "332": "室</w>",
1279
+ "333": "宵</w>",
1280
+ "334": "家</w>",
1281
+ "335": "寄</w>",
1282
+ "336": "密</w>",
1283
+ "337": "富</w>",
1284
+ "338": "对</w>",
1285
+ "339": "寻</w>",
1286
+ "340": "将</w>",
1287
+ "341": "尊</w>",
1288
+ "342": "小</w>",
1289
+ "343": "少</w>",
1290
+ "344": "就</w>",
1291
+ "345": "尼</w>",
1292
+ "346": "局</w>",
1293
+ "347": "屈</w>",
1294
+ "348": "属</w>",
1295
+ "349": "山</w>",
1296
+ "350": "岁</w>",
1297
+ "351": "岩</w>",
1298
+ "352": "工</w>",
1299
+ "353": "己</w>",
1300
+ "354": "已</w>",
1301
+ "355": "市</w>",
1302
+ "356": "布</w>",
1303
+ "357": "师</w>",
1304
+ "358": "帖</w>",
1305
+ "359": "带</w>",
1306
+ "360": "席</w>",
1307
+ "361": "帮</w>",
1308
+ "362": "常</w>",
1309
+ "363": "帽</w>",
1310
+ "364": "干</w>",
1311
+ "365": "平</w>",
1312
+ "366": "年</w>",
1313
+ "367": "幸</w>",
1314
+ "368": "幹</w>",
1315
+ "369": "广</w>",
1316
+ "370": "庄</w>",
1317
+ "371": "庆</w>",
1318
+ "372": "床</w>",
1319
+ "373": "应</w>",
1320
+ "374": "底</w>",
1321
+ "375": "庙</w>",
1322
+ "376": "庞</w>",
1323
+ "377": "度</w>",
1324
+ "378": "座</w>",
1325
+ "379": "庭</w>",
1326
+ "380": "延</w>",
1327
+ "381": "建</w>",
1328
+ "382": "开</w>",
1329
+ "383": "弃</w>",
1330
+ "384": "式</w>",
1331
+ "385": "弟</w>",
1332
+ "386": "张</w>",
1333
+ "387": "張</w>",
1334
+ "388": "强</w>",
1335
+ "389": "当</w>",
1336
+ "390": "影</w>",
1337
+ "391": "彻</w>",
1338
+ "392": "往</w>",
1339
+ "393": "径</w>",
1340
+ "394": "待</w>",
1341
+ "395": "很</w>",
1342
+ "396": "後</w>",
1343
+ "397": "徒</w>",
1344
+ "398": "得</w>",
1345
+ "399": "從</w>",
1346
+ "400": "微</w>",
1347
+ "401": "德</w>",
1348
+ "402": "心</w>",
1349
+ "403": "必</w>",
1350
+ "404": "志</w>",
1351
+ "405": "忙</w>",
1352
+ "406": "快</w>",
1353
+ "407": "念</w>",
1354
+ "408": "怀</w>",
1355
+ "409": "怎</w>",
1356
+ "410": "急</w>",
1357
+ "411": "总</w>",
1358
+ "412": "息</w>",
1359
+ "413": "悔</w>",
1360
+ "414": "情</w>",
1361
+ "415": "惊</w>",
1362
+ "416": "惜</w>",
1363
+ "417": "惡</w>",
1364
+ "418": "想</w>",
1365
+ "419": "愉</w>",
1366
+ "420": "意</w>",
1367
+ "421": "感</w>",
1368
+ "422": "慢</w>",
1369
+ "423": "應</w>",
1370
+ "424": "戏</w>",
1371
+ "425": "成</w>",
1372
+ "426": "我</w>",
1373
+ "427": "戒</w>",
1374
+ "428": "或</w>",
1375
+ "429": "戴</w>",
1376
+ "430": "户</w>",
1377
+ "431": "房</w>",
1378
+ "432": "所</w>",
1379
+ "433": "扇</w>",
1380
+ "434": "手</w>",
1381
+ "435": "才</w>",
1382
+ "436": "打</w>",
1383
+ "437": "托</w>",
1384
+ "438": "扰</w>",
1385
+ "439": "批</w>",
1386
+ "440": "找</w>",
1387
+ "441": "把</w>",
1388
+ "442": "抓</w>",
1389
+ "443": "护</w>",
1390
+ "444": "报</w>",
1391
+ "445": "抱</w>",
1392
+ "446": "拆</w>",
1393
+ "447": "拉</w>",
1394
+ "448": "拜</w>",
1395
+ "449": "拥</w>",
1396
+ "450": "择</w>",
1397
+ "451": "持</w>",
1398
+ "452": "指</w>",
1399
+ "453": "按</w>",
1400
+ "454": "挑</w>",
1401
+ "455": "挤</w>",
1402
+ "456": "挥</w>",
1403
+ "457": "据</w>",
1404
+ "458": "接</w>",
1405
+ "459": "推</w>",
1406
+ "460": "措</w>",
1407
+ "461": "揉</w>",
1408
+ "462": "插</w>",
1409
+ "463": "揭</w>",
1410
+ "464": "携</w>",
1411
+ "465": "摄</w>",
1412
+ "466": "摇</w>",
1413
+ "467": "摩</w>",
1414
+ "468": "撒</w>",
1415
+ "469": "播</w>",
1416
+ "470": "擔</w>",
1417
+ "471": "支</w>",
1418
+ "472": "收</w>",
1419
+ "473": "改</w>",
1420
+ "474": "放</w>",
1421
+ "475": "故</w>",
1422
+ "476": "救</w>",
1423
+ "477": "教</w>",
1424
+ "478": "散</w>",
1425
+ "479": "敦</w>",
1426
+ "480": "敬</w>",
1427
+ "481": "数</w>",
1428
+ "482": "整</w>",
1429
+ "483": "斯</w>",
1430
+ "484": "新</w>",
1431
+ "485": "方</w>",
1432
+ "486": "施</w>",
1433
+ "487": "旅</w>",
1434
+ "488": "无</w>",
1435
+ "489": "日</w>",
1436
+ "490": "旦</w>",
1437
+ "491": "早</w>",
1438
+ "492": "时</w>",
1439
+ "493": "明</w>",
1440
+ "494": "星</w>",
1441
+ "495": "昨</w>",
1442
+ "496": "是</w>",
1443
+ "497": "時</w>",
1444
+ "498": "晃</w>",
1445
+ "499": "晚</w>",
1446
+ "500": "景</w>",
1447
+ "501": "更</w>",
1448
+ "502": "曾</w>",
1449
+ "503": "最</w>",
1450
+ "504": "會</w>",
1451
+ "505": "月</w>",
1452
+ "506": "有</w>",
1453
+ "507": "朋</w>",
1454
+ "508": "服</w>",
1455
+ "509": "望</w>",
1456
+ "510": "朝</w>",
1457
+ "511": "期</w>",
1458
+ "512": "本</w>",
1459
+ "513": "术</w>",
1460
+ "514": "机</w>",
1461
+ "515": "杀</w>",
1462
+ "516": "杂</w>",
1463
+ "517": "权</w>",
1464
+ "518": "村</w>",
1465
+ "519": "条</w>",
1466
+ "520": "来</w>",
1467
+ "521": "杯</w>",
1468
+ "522": "杰</w>",
1469
+ "523": "松</w>",
1470
+ "524": "果</w>",
1471
+ "525": "架</w>",
1472
+ "526": "某</w>",
1473
+ "527": "标</w>",
1474
+ "528": "栋</w>",
1475
+ "529": "校</w>",
1476
+ "530": "样</w>",
1477
+ "531": "格</w>",
1478
+ "532": "桌</w>",
1479
+ "533": "桥</w>",
1480
+ "534": "楼</w>",
1481
+ "535": "概</w>",
1482
+ "536": "樣</w>",
1483
+ "537": "欠</w>",
1484
+ "538": "次</w>",
1485
+ "539": "欢</w>",
1486
+ "540": "欲</w>",
1487
+ "541": "款</w>",
1488
+ "542": "歉</w>",
1489
+ "543": "歌</w>",
1490
+ "544": "歐</w>",
1491
+ "545": "歡</w>",
1492
+ "546": "止</w>",
1493
+ "547": "正</w>",
1494
+ "548": "步</w>",
1495
+ "549": "死</w>",
1496
+ "550": "段</w>",
1497
+ "551": "母</w>",
1498
+ "552": "每</w>",
1499
+ "553": "比</w>",
1500
+ "554": "毕</w>",
1501
+ "555": "毛</w>",
1502
+ "556": "毫</w>",
1503
+ "557": "气</w>",
1504
+ "558": "水</w>",
1505
+ "559": "永</w>",
1506
+ "560": "池</w>",
1507
+ "561": "汽</w>",
1508
+ "562": "沒</w>",
1509
+ "563": "没</w>",
1510
+ "564": "河</w>",
1511
+ "565": "沸</w>",
1512
+ "566": "油</w>",
1513
+ "567": "沿</w>",
1514
+ "568": "法</w>",
1515
+ "569": "泪</w>",
1516
+ "570": "泳</w>",
1517
+ "571": "洗</w>",
1518
+ "572": "津</w>",
1519
+ "573": "活</w>",
1520
+ "574": "派</w>",
1521
+ "575": "流</w>",
1522
+ "576": "济</w>",
1523
+ "577": "消</w>",
1524
+ "578": "涌</w>",
1525
+ "579": "涨</w>",
1526
+ "580": "清</w>",
1527
+ "581": "温</w>",
1528
+ "582": "港</w>",
1529
+ "583": "游</w>",
1530
+ "584": "湖</w>",
1531
+ "585": "溜</w>",
1532
+ "586": "滑</w>",
1533
+ "587": "满</w>",
1534
+ "588": "演</w>",
1535
+ "589": "澄</w>",
1536
+ "590": "澡</w>",
1537
+ "591": "火</w>",
1538
+ "592": "灯</w>",
1539
+ "593": "灰</w>",
1540
+ "594": "点</w>",
1541
+ "595": "烟</w>",
1542
+ "596": "烦</w>",
1543
+ "597": "热</w>",
1544
+ "598": "然</w>",
1545
+ "599": "照</w>",
1546
+ "600": "爱</w>",
1547
+ "601": "父</w>",
1548
+ "602": "爸</w>",
1549
+ "603": "片</w>",
1550
+ "604": "牛</w>",
1551
+ "605": "物</w>",
1552
+ "606": "狗</w>",
1553
+ "607": "独</w>",
1554
+ "608": "猫</w>",
1555
+ "609": "王</w>",
1556
+ "610": "玩</w>",
1557
+ "611": "环</w>",
1558
+ "612": "现</w>",
1559
+ "613": "班</w>",
1560
+ "614": "球</w>",
1561
+ "615": "理</w>",
1562
+ "616": "生</w>",
1563
+ "617": "用</w>",
1564
+ "618": "由</w>",
1565
+ "619": "电</w>",
1566
+ "620": "男</w>",
1567
+ "621": "界</w>",
1568
+ "622": "留</w>",
1569
+ "623": "當</w>",
1570
+ "624": "疑</w>",
1571
+ "625": "疯</w>",
1572
+ "626": "病</w>",
1573
+ "627": "痛</w>",
1574
+ "628": "瘋</w>",
1575
+ "629": "發</w>",
1576
+ "630": "白</w>",
1577
+ "631": "百</w>",
1578
+ "632": "的</w>",
1579
+ "633": "盐</w>",
1580
+ "634": "盖</w>",
1581
+ "635": "盛</w>",
1582
+ "636": "目</w>",
1583
+ "637": "直</w>",
1584
+ "638": "相</w>",
1585
+ "639": "盹</w>",
1586
+ "640": "看</w>",
1587
+ "641": "真</w>",
1588
+ "642": "眠</w>",
1589
+ "643": "眼</w>",
1590
+ "644": "着</w>",
1591
+ "645": "睛</w>",
1592
+ "646": "睡</w>",
1593
+ "647": "知</w>",
1594
+ "648": "短</w>",
1595
+ "649": "石</w>",
1596
+ "650": "码</w>",
1597
+ "651": "破</w>",
1598
+ "652": "确</w>",
1599
+ "653": "碎</w>",
1600
+ "654": "示</w>",
1601
+ "655": "社</w>",
1602
+ "656": "祝</w>",
1603
+ "657": "神</w>",
1604
+ "658": "票</w>",
1605
+ "659": "福</w>",
1606
+ "660": "离</w>",
1607
+ "661": "私</w>",
1608
+ "662": "种</w>",
1609
+ "663": "秘</w>",
1610
+ "664": "移</w>",
1611
+ "665": "程</w>",
1612
+ "666": "空</w>",
1613
+ "667": "窗</w>",
1614
+ "668": "窜</w>",
1615
+ "669": "站</w>",
1616
+ "670": "童</w>",
1617
+ "671": "笑</w>",
1618
+ "672": "笔</w>",
1619
+ "673": "笛</w>",
1620
+ "674": "第</w>",
1621
+ "675": "笼</w>",
1622
+ "676": "等</w>",
1623
+ "677": "筑</w>",
1624
+ "678": "答</w>",
1625
+ "679": "简</w>",
1626
+ "680": "籍</w>",
1627
+ "681": "粗</w>",
1628
+ "682": "精</w>",
1629
+ "683": "糕</w>",
1630
+ "684": "糟</w>",
1631
+ "685": "素</w>",
1632
+ "686": "索</w>",
1633
+ "687": "給</w>",
1634
+ "688": "經</w>",
1635
+ "689": "總</w>",
1636
+ "690": "红</w>",
1637
+ "691": "纪</w>",
1638
+ "692": "纯</w>",
1639
+ "693": "纸</w>",
1640
+ "694": "线</w>",
1641
+ "695": "绅</w>",
1642
+ "696": "终</w>",
1643
+ "697": "经</w>",
1644
+ "698": "结</w>",
1645
+ "699": "给</w>",
1646
+ "700": "统</w>",
1647
+ "701": "绿</w>",
1648
+ "702": "缺</w>",
1649
+ "703": "网</w>",
1650
+ "704": "罗</w>",
1651
+ "705": "罚</w>",
1652
+ "706": "置</w>",
1653
+ "707": "美</w>",
1654
+ "708": "群</w>",
1655
+ "709": "習</w>",
1656
+ "710": "老</w>",
1657
+ "711": "考</w>",
1658
+ "712": "者</w>",
1659
+ "713": "而</w>",
1660
+ "714": "耍</w>",
1661
+ "715": "耗</w>",
1662
+ "716": "职</w>",
1663
+ "717": "肯</w>",
1664
+ "718": "胖</w>",
1665
+ "719": "能</w>",
1666
+ "720": "脑</w>",
1667
+ "721": "脚</w>",
1668
+ "722": "脸</w>",
1669
+ "723": "腾</w>",
1670
+ "724": "腿</w>",
1671
+ "725": "自</w>",
1672
+ "726": "至</w>",
1673
+ "727": "船</w>",
1674
+ "728": "艰</w>",
1675
+ "729": "色</w>",
1676
+ "730": "艺</w>",
1677
+ "731": "花</w>",
1678
+ "732": "苏</w>",
1679
+ "733": "英</w>",
1680
+ "734": "茶</w>",
1681
+ "735": "药</w>",
1682
+ "736": "落</w>",
1683
+ "737": "著</w>",
1684
+ "738": "虑</w>",
1685
+ "739": "虾</w>",
1686
+ "740": "蜂</w>",
1687
+ "741": "蝴</w>",
1688
+ "742": "蝶</w>",
1689
+ "743": "蠢</w>",
1690
+ "744": "血</w>",
1691
+ "745": "行</w>",
1692
+ "746": "衣</w>",
1693
+ "747": "表</w>",
1694
+ "748": "被</w>",
1695
+ "749": "裡</w>",
1696
+ "750": "要</w>",
1697
+ "751": "覆</w>",
1698
+ "752": "覺</w>",
1699
+ "753": "见</w>",
1700
+ "754": "观</w>",
1701
+ "755": "规</w>",
1702
+ "756": "视</w>",
1703
+ "757": "觉</w>",
1704
+ "758": "解</w>",
1705
+ "759": "言</w>",
1706
+ "760": "計</w>",
1707
+ "761": "試</w>",
1708
+ "762": "話</w>",
1709
+ "763": "該</w>",
1710
+ "764": "誓</w>",
1711
+ "765": "說</w>",
1712
+ "766": "請</w>",
1713
+ "767": "讀</w>",
1714
+ "768": "變</w>",
1715
+ "769": "计</w>",
1716
+ "770": "订</w>",
1717
+ "771": "认</w>",
1718
+ "772": "让</w>",
1719
+ "773": "训</w>",
1720
+ "774": "议</w>",
1721
+ "775": "记</w>",
1722
+ "776": "讲</w>",
1723
+ "777": "讶</w>",
1724
+ "778": "许</w>",
1725
+ "779": "论</w>",
1726
+ "780": "设</w>",
1727
+ "781": "访</w>",
1728
+ "782": "证</w>",
1729
+ "783": "评</w>",
1730
+ "784": "识</w>",
1731
+ "785": "诉</w>",
1732
+ "786": "试</w>",
1733
+ "787": "诗</w>",
1734
+ "788": "诚</w>",
1735
+ "789": "话</w>",
1736
+ "790": "该</w>",
1737
+ "791": "语</w>",
1738
+ "792": "误</w>",
1739
+ "793": "说</w>",
1740
+ "794": "请</w>",
1741
+ "795": "诺</w>",
1742
+ "796": "读</w>",
1743
+ "797": "课</w>",
1744
+ "798": "谁</w>",
1745
+ "799": "谈</w>",
1746
+ "800": "谎</w>",
1747
+ "801": "谢</w>",
1748
+ "802": "象</w>",
1749
+ "803": "賺</w>",
1750
+ "804": "负</w>",
1751
+ "805": "货</w>",
1752
+ "806": "购</w>",
1753
+ "807": "贷</w>",
1754
+ "808": "费</w>",
1755
+ "809": "赛</w>",
1756
+ "810": "赢</w>",
1757
+ "811": "走</w>",
1758
+ "812": "赶</w>",
1759
+ "813": "起</w>",
1760
+ "814": "趕</w>",
1761
+ "815": "趣</w>",
1762
+ "816": "足</w>",
1763
+ "817": "跑</w>",
1764
+ "818": "跟</w>",
1765
+ "819": "路</w>",
1766
+ "820": "踢</w>",
1767
+ "821": "躲</w>",
1768
+ "822": "較</w>",
1769
+ "823": "车</w>",
1770
+ "824": "轨</w>",
1771
+ "825": "转</w>",
1772
+ "826": "轻</w>",
1773
+ "827": "较</w>",
1774
+ "828": "辆</w>",
1775
+ "829": "辈</w>",
1776
+ "830": "辜</w>",
1777
+ "831": "辩</w>",
1778
+ "832": "达</w>",
1779
+ "833": "迅</w>",
1780
+ "834": "过</w>",
1781
+ "835": "近</w>",
1782
+ "836": "还</w>",
1783
+ "837": "这</w>",
1784
+ "838": "进</w>",
1785
+ "839": "远</w>",
1786
+ "840": "迟</w>",
1787
+ "841": "述</w>",
1788
+ "842": "迷</w>",
1789
+ "843": "迹</w>",
1790
+ "844": "送</w>",
1791
+ "845": "适</w>",
1792
+ "846": "逃</w>",
1793
+ "847": "选</w>",
1794
+ "848": "透</w>",
1795
+ "849": "递</w>",
1796
+ "850": "途</w>",
1797
+ "851": "這</w>",
1798
+ "852": "通</w>",
1799
+ "853": "速</w>",
1800
+ "854": "造</w>",
1801
+ "855": "進</w>",
1802
+ "856": "過</w>",
1803
+ "857": "道</w>",
1804
+ "858": "遛</w>",
1805
+ "859": "遠</w>",
1806
+ "860": "邀</w>",
1807
+ "861": "那</w>",
1808
+ "862": "邻</w>",
1809
+ "863": "部</w>",
1810
+ "864": "都</w>",
1811
+ "865": "酒</w>",
1812
+ "866": "采</w>",
1813
+ "867": "里</w>",
1814
+ "868": "重</w>",
1815
+ "869": "金</w>",
1816
+ "870": "钟</w>",
1817
+ "871": "钱</w>",
1818
+ "872": "铁</w>",
1819
+ "873": "铃</w>",
1820
+ "874": "铭</w>",
1821
+ "875": "银</w>",
1822
+ "876": "销</w>",
1823
+ "877": "错</w>",
1824
+ "878": "镜</w>",
1825
+ "879": "長</w>",
1826
+ "880": "长</w>",
1827
+ "881": "間</w>",
1828
+ "882": "问</w>",
1829
+ "883": "间</w>",
1830
+ "884": "闻</w>",
1831
+ "885": "阅</w>",
1832
+ "886": "阐</w>",
1833
+ "887": "防</w>",
1834
+ "888": "阳</w>",
1835
+ "889": "附</w>",
1836
+ "890": "限</w>",
1837
+ "891": "除</w>",
1838
+ "892": "险</w>",
1839
+ "893": "随</w>",
1840
+ "894": "隻</w>",
1841
+ "895": "难</w>",
1842
+ "896": "雨</w>",
1843
+ "897": "雪</w>",
1844
+ "898": "零</w>",
1845
+ "899": "雹</w>",
1846
+ "900": "需</w>",
1847
+ "901": "震</w>",
1848
+ "902": "露</w>",
1849
+ "903": "非</w>",
1850
+ "904": "靠</w>",
1851
+ "905": "面</w>",
1852
+ "906": "音</w>",
1853
+ "907": "題</w>",
1854
+ "908": "项</w>",
1855
+ "909": "须</w>",
1856
+ "910": "顾</w>",
1857
+ "911": "预</w>",
1858
+ "912": "题</w>",
1859
+ "913": "风</w>",
1860
+ "914": "飞</w>",
1861
+ "915": "食</w>",
1862
+ "916": "餐</w>",
1863
+ "917": "饭</w>",
1864
+ "918": "饿</w>",
1865
+ "919": "首</w>",
1866
+ "920": "马</w>",
1867
+ "921": "驶</w>",
1868
+ "922": "验</w>",
1869
+ "923": "骑</w>",
1870
+ "924": "骗</w>",
1871
+ "925": "高</w>",
1872
+ "926": "鬼</w>",
1873
+ "927": "鱼</w>",
1874
+ "928": "鲍</w>",
1875
+ "929": "鲜</w>",
1876
+ "930": "麻</w>",
1877
+ "931": "麼</w>",
1878
+ "932": "點</w>",
1879
+ "933": "鼠</w>",
1880
+ "934": "龙</w>",
1881
+ "935": "﹐</w>",
1882
+ "936": "!</w>",
1883
+ "937": ",</w>",
1884
+ "938": "?</w>"
1885
+ },
1886
+ "merges": [
1887
+ [
1888
+ "。",
1889
+ "</w>"
1890
+ ],
1891
+ [
1892
+ "我",
1893
+ "</w>"
1894
+ ],
1895
+ [
1896
+ "的",
1897
+ "</w>"
1898
+ ],
1899
+ [
1900
+ "了",
1901
+ "</w>"
1902
+ ],
1903
+ [
1904
+ "他",
1905
+ "</w>"
1906
+ ],
1907
+ [
1908
+ "是",
1909
+ "</w>"
1910
+ ],
1911
+ [
1912
+ "你",
1913
+ "</w>"
1914
+ ],
1915
+ [
1916
+ "这",
1917
+ "</w>"
1918
+ ],
1919
+ [
1920
+ "一",
1921
+ "</w>"
1922
+ ],
1923
+ [
1924
+ ",",
1925
+ "</w>"
1926
+ ],
1927
+ [
1928
+ "不",
1929
+ "</w>"
1930
+ ],
1931
+ [
1932
+ "在",
1933
+ "</w>"
1934
+ ],
1935
+ [
1936
+ "们",
1937
+ "</w>"
1938
+ ],
1939
+ [
1940
+ "有",
1941
+ "</w>"
1942
+ ],
1943
+ [
1944
+ "个",
1945
+ "</w>"
1946
+ ],
1947
+ [
1948
+ "?",
1949
+ "</w>"
1950
+ ],
1951
+ [
1952
+ "她",
1953
+ "</w>"
1954
+ ],
1955
+ [
1956
+ "很",
1957
+ "</w>"
1958
+ ],
1959
+ [
1960
+ "会",
1961
+ "</w>"
1962
+ ],
1963
+ [
1964
+ "去",
1965
+ "</w>"
1966
+ ],
1967
+ [
1968
+ "人",
1969
+ "</w>"
1970
+ ],
1971
+ [
1972
+ "要",
1973
+ "</w>"
1974
+ ],
1975
+ [
1976
+ "来",
1977
+ "</w>"
1978
+ ],
1979
+ [
1980
+ "生",
1981
+ "</w>"
1982
+ ],
1983
+ [
1984
+ "得",
1985
+ "</w>"
1986
+ ],
1987
+ [
1988
+ "上",
1989
+ "</w>"
1990
+ ],
1991
+ [
1992
+ "天",
1993
+ "</w>"
1994
+ ],
1995
+ [
1996
+ "就",
1997
+ "</w>"
1998
+ ],
1999
+ [
2000
+ "子",
2001
+ "</w>"
2002
+ ],
2003
+ [
2004
+ "到",
2005
+ "</w>"
2006
+ ],
2007
+ [
2008
+ "车",
2009
+ "</w>"
2010
+ ],
2011
+ [
2012
+ "么",
2013
+ "</w>"
2014
+ ],
2015
+ [
2016
+ "吗",
2017
+ "</w>"
2018
+ ],
2019
+ [
2020
+ "没",
2021
+ "</w>"
2022
+ ],
2023
+ [
2024
+ "里",
2025
+ "</w>"
2026
+ ],
2027
+ [
2028
+ "能",
2029
+ "</w>"
2030
+ ],
2031
+ [
2032
+ "想",
2033
+ "</w>"
2034
+ ],
2035
+ [
2036
+ "大",
2037
+ "</w>"
2038
+ ],
2039
+ [
2040
+ "可",
2041
+ "</w>"
2042
+ ],
2043
+ [
2044
+ "说",
2045
+ "</w>"
2046
+ ],
2047
+ [
2048
+ "那",
2049
+ "</w>"
2050
+ ],
2051
+ [
2052
+ "什",
2053
+ "</w>"
2054
+ ],
2055
+ [
2056
+ "下",
2057
+ "</w>"
2058
+ ],
2059
+ [
2060
+ "对",
2061
+ "</w>"
2062
+ ],
2063
+ [
2064
+ "看",
2065
+ "</w>"
2066
+ ],
2067
+ [
2068
+ "多",
2069
+ "</w>"
2070
+ ],
2071
+ [
2072
+ "!",
2073
+ "</w>"
2074
+ ],
2075
+ [
2076
+ "喜",
2077
+ "</w>"
2078
+ ],
2079
+ [
2080
+ "以",
2081
+ "</w>"
2082
+ ],
2083
+ [
2084
+ "学",
2085
+ "</w>"
2086
+ ],
2087
+ [
2088
+ "过",
2089
+ "</w>"
2090
+ ],
2091
+ [
2092
+ "知",
2093
+ "</w>"
2094
+ ],
2095
+ [
2096
+ "给",
2097
+ "</w>"
2098
+ ],
2099
+ [
2100
+ "都",
2101
+ "</w>"
2102
+ ],
2103
+ [
2104
+ "日",
2105
+ "</w>"
2106
+ ],
2107
+ [
2108
+ "家",
2109
+ "</w>"
2110
+ ],
2111
+ [
2112
+ "事",
2113
+ "</w>"
2114
+ ],
2115
+ [
2116
+ "好",
2117
+ "</w>"
2118
+ ],
2119
+ [
2120
+ "为",
2121
+ "</w>"
2122
+ ],
2123
+ [
2124
+ "行",
2125
+ "</w>"
2126
+ ],
2127
+ [
2128
+ "成",
2129
+ "</w>"
2130
+ ],
2131
+ [
2132
+ "欢",
2133
+ "</w>"
2134
+ ],
2135
+ [
2136
+ "时",
2137
+ "</w>"
2138
+ ],
2139
+ [
2140
+ "也",
2141
+ "</w>"
2142
+ ],
2143
+ [
2144
+ "道",
2145
+ "</w>"
2146
+ ],
2147
+ [
2148
+ "问",
2149
+ "</w>"
2150
+ ],
2151
+ [
2152
+ "开",
2153
+ "</w>"
2154
+ ],
2155
+ [
2156
+ "和",
2157
+ "</w>"
2158
+ ],
2159
+ [
2160
+ "孩",
2161
+ "</w>"
2162
+ ],
2163
+ [
2164
+ "出",
2165
+ "</w>"
2166
+ ],
2167
+ [
2168
+ "快",
2169
+ "</w>"
2170
+ ],
2171
+ [
2172
+ "常",
2173
+ "</w>"
2174
+ ],
2175
+ [
2176
+ "现",
2177
+ "</w>"
2178
+ ],
2179
+ [
2180
+ "间",
2181
+ "</w>"
2182
+ ],
2183
+ [
2184
+ "如",
2185
+ "</w>"
2186
+ ],
2187
+ [
2188
+ "无",
2189
+ "</w>"
2190
+ ],
2191
+ [
2192
+ "法",
2193
+ "</w>"
2194
+ ],
2195
+ [
2196
+ "地",
2197
+ "</w>"
2198
+ ],
2199
+ [
2200
+ "比",
2201
+ "</w>"
2202
+ ],
2203
+ [
2204
+ "回",
2205
+ "</w>"
2206
+ ],
2207
+ [
2208
+ "果",
2209
+ "</w>"
2210
+ ],
2211
+ [
2212
+ "“",
2213
+ "</w>"
2214
+ ],
2215
+ [
2216
+ "样",
2217
+ "</w>"
2218
+ ],
2219
+ [
2220
+ "”",
2221
+ "</w>"
2222
+ ],
2223
+ [
2224
+ "試",
2225
+ "</w>"
2226
+ ],
2227
+ [
2228
+ "从",
2229
+ "</w>"
2230
+ ],
2231
+ [
2232
+ "把",
2233
+ "</w>"
2234
+ ],
2235
+ [
2236
+ "做",
2237
+ "</w>"
2238
+ ],
2239
+ [
2240
+ "老",
2241
+ "</w>"
2242
+ ],
2243
+ [
2244
+ "?",
2245
+ "</w>"
2246
+ ],
2247
+ [
2248
+ "听",
2249
+ "</w>"
2250
+ ],
2251
+ [
2252
+ "本",
2253
+ "</w>"
2254
+ ],
2255
+ [
2256
+ "爸",
2257
+ "</w>"
2258
+ ],
2259
+ [
2260
+ "妈",
2261
+ "</w>"
2262
+ ],
2263
+ [
2264
+ "还",
2265
+ "</w>"
2266
+ ],
2267
+ [
2268
+ "這",
2269
+ "</w>"
2270
+ ],
2271
+ [
2272
+ "年",
2273
+ "</w>"
2274
+ ],
2275
+ [
2276
+ "用",
2277
+ "</w>"
2278
+ ],
2279
+ [
2280
+ "话",
2281
+ "</w>"
2282
+ ],
2283
+ [
2284
+ "旅",
2285
+ "</w>"
2286
+ ],
2287
+ [
2288
+ "明",
2289
+ "</w>"
2290
+ ],
2291
+ [
2292
+ "点",
2293
+ "</w>"
2294
+ ],
2295
+ [
2296
+ "完",
2297
+ "</w>"
2298
+ ],
2299
+ [
2300
+ "月",
2301
+ "</w>"
2302
+ ],
2303
+ [
2304
+ "着",
2305
+ "</w>"
2306
+ ],
2307
+ [
2308
+ "之",
2309
+ "</w>"
2310
+ ],
2311
+ [
2312
+ "周",
2313
+ "</w>"
2314
+ ],
2315
+ [
2316
+ "怎",
2317
+ "</w>"
2318
+ ],
2319
+ [
2320
+ "意",
2321
+ "</w>"
2322
+ ],
2323
+ [
2324
+ "重",
2325
+ "</w>"
2326
+ ],
2327
+ [
2328
+ "工",
2329
+ "</w>"
2330
+ ],
2331
+ [
2332
+ "哪",
2333
+ "</w>"
2334
+ ],
2335
+ [
2336
+ "国",
2337
+ "</w>"
2338
+ ],
2339
+ [
2340
+ "正",
2341
+ "</w>"
2342
+ ],
2343
+ [
2344
+ "游",
2345
+ "</w>"
2346
+ ],
2347
+ [
2348
+ "发",
2349
+ "</w>"
2350
+ ],
2351
+ [
2352
+ "起",
2353
+ "</w>"
2354
+ ],
2355
+ [
2356
+ "作",
2357
+ "</w>"
2358
+ ],
2359
+ [
2360
+ "些",
2361
+ "</w>"
2362
+ ],
2363
+ [
2364
+ "麼",
2365
+ "</w>"
2366
+ ],
2367
+ [
2368
+ "走",
2369
+ "</w>"
2370
+ ],
2371
+ [
2372
+ "后",
2373
+ "</w>"
2374
+ ],
2375
+ [
2376
+ "认",
2377
+ "</w>"
2378
+ ],
2379
+ [
2380
+ "前",
2381
+ "</w>"
2382
+ ],
2383
+ [
2384
+ ".",
2385
+ "</w>"
2386
+ ],
2387
+ [
2388
+ "物",
2389
+ "</w>"
2390
+ ],
2391
+ [
2392
+ "0",
2393
+ "</w>"
2394
+ ],
2395
+ [
2396
+ "美",
2397
+ "</w>"
2398
+ ],
2399
+ [
2400
+ "元",
2401
+ "</w>"
2402
+ ],
2403
+ [
2404
+ "它",
2405
+ "</w>"
2406
+ ],
2407
+ [
2408
+ "房",
2409
+ "</w>"
2410
+ ],
2411
+ [
2412
+ "员",
2413
+ "</w>"
2414
+ ],
2415
+ [
2416
+ "太",
2417
+ "</w>"
2418
+ ],
2419
+ [
2420
+ "几",
2421
+ "</w>"
2422
+ ],
2423
+ [
2424
+ "期",
2425
+ "</w>"
2426
+ ],
2427
+ [
2428
+ "球",
2429
+ "</w>"
2430
+ ],
2431
+ [
2432
+ "乐",
2433
+ "</w>"
2434
+ ],
2435
+ [
2436
+ "部",
2437
+ "</w>"
2438
+ ],
2439
+ [
2440
+ "书",
2441
+ "</w>"
2442
+ ],
2443
+ [
2444
+ "候",
2445
+ "</w>"
2446
+ ],
2447
+ [
2448
+ "但",
2449
+ "</w>"
2450
+ ],
2451
+ [
2452
+ "小",
2453
+ "</w>"
2454
+ ],
2455
+ [
2456
+ "自",
2457
+ "</w>"
2458
+ ],
2459
+ [
2460
+ "情",
2461
+ "</w>"
2462
+ ],
2463
+ [
2464
+ "讲",
2465
+ "</w>"
2466
+ ],
2467
+ [
2468
+ "经",
2469
+ "</w>"
2470
+ ],
2471
+ [
2472
+ "电",
2473
+ "</w>"
2474
+ ],
2475
+ [
2476
+ "高",
2477
+ "</w>"
2478
+ ],
2479
+ [
2480
+ "觉",
2481
+ "</w>"
2482
+ ],
2483
+ [
2484
+ "感",
2485
+ "</w>"
2486
+ ],
2487
+ [
2488
+ "直",
2489
+ "</w>"
2490
+ ],
2491
+ [
2492
+ "请",
2493
+ "</w>"
2494
+ ],
2495
+ [
2496
+ "告",
2497
+ "</w>"
2498
+ ],
2499
+ [
2500
+ "妹",
2501
+ "</w>"
2502
+ ],
2503
+ [
2504
+ "住",
2505
+ "</w>"
2506
+ ],
2507
+ [
2508
+ "让",
2509
+ "</w>"
2510
+ ],
2511
+ [
2512
+ "活",
2513
+ "</w>"
2514
+ ],
2515
+ [
2516
+ "真",
2517
+ "</w>"
2518
+ ],
2519
+ [
2520
+ "個",
2521
+ "</w>"
2522
+ ],
2523
+ [
2524
+ "始",
2525
+ "</w>"
2526
+ ],
2527
+ [
2528
+ "信",
2529
+ "</w>"
2530
+ ],
2531
+ [
2532
+ "更",
2533
+ "</w>"
2534
+ ],
2535
+ [
2536
+ "号",
2537
+ "</w>"
2538
+ ],
2539
+ [
2540
+ "們",
2541
+ "</w>"
2542
+ ],
2543
+ [
2544
+ "件",
2545
+ "</w>"
2546
+ ],
2547
+ [
2548
+ "外",
2549
+ "</w>"
2550
+ ],
2551
+ [
2552
+ "见",
2553
+ "</w>"
2554
+ ],
2555
+ [
2556
+ "于",
2557
+ "</w>"
2558
+ ],
2559
+ [
2560
+ "喝",
2561
+ "</w>"
2562
+ ],
2563
+ [
2564
+ "爱",
2565
+ "</w>"
2566
+ ],
2567
+ [
2568
+ "班",
2569
+ "</w>"
2570
+ ],
2571
+ [
2572
+ "少",
2573
+ "</w>"
2574
+ ],
2575
+ [
2576
+ "单",
2577
+ "</w>"
2578
+ ],
2579
+ [
2580
+ "世",
2581
+ "</w>"
2582
+ ],
2583
+ [
2584
+ "校",
2585
+ "</w>"
2586
+ ],
2587
+ [
2588
+ "最",
2589
+ "</w>"
2590
+ ],
2591
+ [
2592
+ "定",
2593
+ "</w>"
2594
+ ],
2595
+ [
2596
+ "力",
2597
+ "</w>"
2598
+ ],
2599
+ [
2600
+ "何",
2601
+ "</w>"
2602
+ ],
2603
+ [
2604
+ "吧",
2605
+ "</w>"
2606
+ ],
2607
+ [
2608
+ "该",
2609
+ "</w>"
2610
+ ],
2611
+ [
2612
+ "接",
2613
+ "</w>"
2614
+ ],
2615
+ [
2616
+ "将",
2617
+ "</w>"
2618
+ ],
2619
+ [
2620
+ "难",
2621
+ "</w>"
2622
+ ],
2623
+ [
2624
+ "识",
2625
+ "</w>"
2626
+ ],
2627
+ [
2628
+ "密",
2629
+ "</w>"
2630
+ ],
2631
+ [
2632
+ "打",
2633
+ "</w>"
2634
+ ],
2635
+ [
2636
+ "非",
2637
+ "</w>"
2638
+ ],
2639
+ [
2640
+ "中",
2641
+ "</w>"
2642
+ ],
2643
+ [
2644
+ "诉",
2645
+ "</w>"
2646
+ ],
2647
+ [
2648
+ "许",
2649
+ "</w>"
2650
+ ],
2651
+ [
2652
+ "i",
2653
+ "r"
2654
+ ],
2655
+ [
2656
+ "u",
2657
+ "ir"
2658
+ ],
2659
+ [
2660
+ "e",
2661
+ "l"
2662
+ ],
2663
+ [
2664
+ "m",
2665
+ "uir"
2666
+ ],
2667
+ [
2668
+ "i",
2669
+ "el"
2670
+ ],
2671
+ [
2672
+ "muir",
2673
+ "iel"
2674
+ ],
2675
+ [
2676
+ "muiriel",
2677
+ "</w>"
2678
+ ],
2679
+ [
2680
+ "再",
2681
+ "</w>"
2682
+ ],
2683
+ [
2684
+ "相",
2685
+ "</w>"
2686
+ ],
2687
+ [
2688
+ "其",
2689
+ "</w>"
2690
+ ],
2691
+ [
2692
+ "心",
2693
+ "</w>"
2694
+ ],
2695
+ [
2696
+ "长",
2697
+ "</w>"
2698
+ ],
2699
+ [
2700
+ "取",
2701
+ "</w>"
2702
+ ],
2703
+ [
2704
+ "语",
2705
+ "</w>"
2706
+ ],
2707
+ [
2708
+ "网",
2709
+ "</w>"
2710
+ ],
2711
+ [
2712
+ "消",
2713
+ "</w>"
2714
+ ],
2715
+ [
2716
+ "息",
2717
+ "</w>"
2718
+ ],
2719
+ [
2720
+ "惊",
2721
+ "</w>"
2722
+ ],
2723
+ [
2724
+ "等",
2725
+ "</w>"
2726
+ ],
2727
+ [
2728
+ "公",
2729
+ "</w>"
2730
+ ],
2731
+ [
2732
+ "简",
2733
+ "</w>"
2734
+ ],
2735
+ [
2736
+ "被",
2737
+ "</w>"
2738
+ ],
2739
+ [
2740
+ "种",
2741
+ "</w>"
2742
+ ],
2743
+ [
2744
+ "趣",
2745
+ "</w>"
2746
+ ],
2747
+ [
2748
+ "已",
2749
+ "</w>"
2750
+ ],
2751
+ [
2752
+ "影",
2753
+ "</w>"
2754
+ ],
2755
+ [
2756
+ "疑",
2757
+ "</w>"
2758
+ ],
2759
+ [
2760
+ "史",
2761
+ "</w>"
2762
+ ],
2763
+ [
2764
+ "题",
2765
+ "</w>"
2766
+ ],
2767
+ [
2768
+ "啊",
2769
+ "</w>"
2770
+ ],
2771
+ [
2772
+ "同",
2773
+ "</w>"
2774
+ ],
2775
+ [
2776
+ "睡",
2777
+ "</w>"
2778
+ ],
2779
+ [
2780
+ "离",
2781
+ "</w>"
2782
+ ],
2783
+ [
2784
+ "三",
2785
+ "</w>"
2786
+ ],
2787
+ [
2788
+ "方",
2789
+ "</w>"
2790
+ ],
2791
+ [
2792
+ "响",
2793
+ "</w>"
2794
+ ],
2795
+ [
2796
+ "兴",
2797
+ "</w>"
2798
+ ],
2799
+ [
2800
+ "医",
2801
+ "</w>"
2802
+ ],
2803
+ [
2804
+ "建",
2805
+ "</w>"
2806
+ ],
2807
+ [
2808
+ "议",
2809
+ "</w>"
2810
+ ],
2811
+ [
2812
+ "戒",
2813
+ "</w>"
2814
+ ],
2815
+ [
2816
+ "坐",
2817
+ "</w>"
2818
+ ],
2819
+ [
2820
+ "向",
2821
+ "</w>"
2822
+ ],
2823
+ [
2824
+ "切",
2825
+ "</w>"
2826
+ ],
2827
+ [
2828
+ "读",
2829
+ "</w>"
2830
+ ],
2831
+ [
2832
+ "火",
2833
+ "</w>"
2834
+ ],
2835
+ [
2836
+ "斯",
2837
+ "</w>"
2838
+ ],
2839
+ [
2840
+ "计",
2841
+ "</w>"
2842
+ ],
2843
+ [
2844
+ "往",
2845
+ "</w>"
2846
+ ],
2847
+ [
2848
+ "問",
2849
+ "</w>"
2850
+ ],
2851
+ [
2852
+ "除",
2853
+ "</w>"
2854
+ ],
2855
+ [
2856
+ "罗",
2857
+ "</w>"
2858
+ ],
2859
+ [
2860
+ "马",
2861
+ "</w>"
2862
+ ],
2863
+ [
2864
+ "任",
2865
+ "</w>"
2866
+ ],
2867
+ [
2868
+ "必",
2869
+ "</w>"
2870
+ ],
2871
+ [
2872
+ "须",
2873
+ "</w>"
2874
+ ],
2875
+ [
2876
+ "新",
2877
+ "</w>"
2878
+ ],
2879
+ [
2880
+ "客",
2881
+ "</w>"
2882
+ ],
2883
+ [
2884
+ "今",
2885
+ "</w>"
2886
+ ],
2887
+ [
2888
+ "而",
2889
+ "</w>"
2890
+ ],
2891
+ [
2892
+ "水",
2893
+ "</w>"
2894
+ ],
2895
+ [
2896
+ "名",
2897
+ "</w>"
2898
+ ],
2899
+ [
2900
+ "变",
2901
+ "</w>"
2902
+ ],
2903
+ [
2904
+ "界",
2905
+ "</w>"
2906
+ ],
2907
+ [
2908
+ "加",
2909
+ "</w>"
2910
+ ],
2911
+ [
2912
+ "使",
2913
+ "</w>"
2914
+ ],
2915
+ [
2916
+ "毫",
2917
+ "</w>"
2918
+ ],
2919
+ [
2920
+ "习",
2921
+ "</w>"
2922
+ ],
2923
+ [
2924
+ "玩",
2925
+ "</w>"
2926
+ ],
2927
+ [
2928
+ "耍",
2929
+ "</w>"
2930
+ ],
2931
+ [
2932
+ "记",
2933
+ "</w>"
2934
+ ],
2935
+ [
2936
+ "分",
2937
+ "</w>"
2938
+ ],
2939
+ [
2940
+ "待",
2941
+ "</w>"
2942
+ ],
2943
+ [
2944
+ "男",
2945
+ "</w>"
2946
+ ],
2947
+ [
2948
+ "俱",
2949
+ "</w>"
2950
+ ],
2951
+ [
2952
+ "图",
2953
+ "</w>"
2954
+ ],
2955
+ [
2956
+ "笑",
2957
+ "</w>"
2958
+ ],
2959
+ [
2960
+ "述",
2961
+ "</w>"
2962
+ ],
2963
+ [
2964
+ "理",
2965
+ "</w>"
2966
+ ],
2967
+ [
2968
+ "由",
2969
+ "</w>"
2970
+ ],
2971
+ [
2972
+ "山",
2973
+ "</w>"
2974
+ ],
2975
+ [
2976
+ "式",
2977
+ "</w>"
2978
+ ],
2979
+ [
2980
+ "己",
2981
+ "</w>"
2982
+ ],
2983
+ [
2984
+ "學",
2985
+ "</w>"
2986
+ ],
2987
+ [
2988
+ "目",
2989
+ "</w>"
2990
+ ],
2991
+ [
2992
+ "面",
2993
+ "</w>"
2994
+ ],
2995
+ [
2996
+ "骑",
2997
+ "</w>"
2998
+ ],
2999
+ [
3000
+ "实",
3001
+ "</w>"
3002
+ ],
3003
+ [
3004
+ "時",
3005
+ "</w>"
3006
+ ],
3007
+ [
3008
+ "服",
3009
+ "</w>"
3010
+ ],
3011
+ [
3012
+ "合",
3013
+ "</w>"
3014
+ ],
3015
+ [
3016
+ "手",
3017
+ "</w>"
3018
+ ],
3019
+ [
3020
+ "第",
3021
+ "</w>"
3022
+ ],
3023
+ [
3024
+ "母",
3025
+ "</w>"
3026
+ ],
3027
+ [
3028
+ "留",
3029
+ "</w>"
3030
+ ],
3031
+ [
3032
+ "买",
3033
+ "</w>"
3034
+ ],
3035
+ [
3036
+ "准",
3037
+ "</w>"
3038
+ ],
3039
+ [
3040
+ "权",
3041
+ "</w>"
3042
+ ],
3043
+ [
3044
+ "烟",
3045
+ "</w>"
3046
+ ],
3047
+ [
3048
+ "忙",
3049
+ "</w>"
3050
+ ],
3051
+ [
3052
+ "找",
3053
+ "</w>"
3054
+ ],
3055
+ [
3056
+ "應",
3057
+ "</w>"
3058
+ ],
3059
+ [
3060
+ "該",
3061
+ "</w>"
3062
+ ],
3063
+ [
3064
+ "乎",
3065
+ "</w>"
3066
+ ],
3067
+ [
3068
+ "放",
3069
+ "</w>"
3070
+ ],
3071
+ [
3072
+ "站",
3073
+ "</w>"
3074
+ ],
3075
+ [
3076
+ "早",
3077
+ "</w>"
3078
+ ],
3079
+ [
3080
+ "度",
3081
+ "</w>"
3082
+ ],
3083
+ [
3084
+ "交",
3085
+ "</w>"
3086
+ ],
3087
+ [
3088
+ "樣",
3089
+ "</w>"
3090
+ ],
3091
+ [
3092
+ "十",
3093
+ "</w>"
3094
+ ],
3095
+ [
3096
+ "足",
3097
+ "</w>"
3098
+ ],
3099
+ [
3100
+ "解",
3101
+ "</w>"
3102
+ ],
3103
+ [
3104
+ "底",
3105
+ "</w>"
3106
+ ],
3107
+ [
3108
+ "題",
3109
+ "</w>"
3110
+ ],
3111
+ [
3112
+ "死",
3113
+ "</w>"
3114
+ ],
3115
+ [
3116
+ "宇",
3117
+ "</w>"
3118
+ ],
3119
+ [
3120
+ "限",
3121
+ "</w>"
3122
+ ],
3123
+ [
3124
+ "通",
3125
+ "</w>"
3126
+ ],
3127
+ [
3128
+ "庭",
3129
+ "</w>"
3130
+ ],
3131
+ [
3132
+ "秘",
3133
+ "</w>"
3134
+ ],
3135
+ [
3136
+ "光",
3137
+ "</w>"
3138
+ ],
3139
+ [
3140
+ "错",
3141
+ "</w>"
3142
+ ],
3143
+ [
3144
+ "务",
3145
+ "</w>"
3146
+ ],
3147
+ [
3148
+ "當",
3149
+ "</w>"
3150
+ ],
3151
+ [
3152
+ "广",
3153
+ "</w>"
3154
+ ],
3155
+ [
3156
+ "场",
3157
+ "</w>"
3158
+ ],
3159
+ [
3160
+ "险",
3161
+ "</w>"
3162
+ ],
3163
+ [
3164
+ "昨",
3165
+ "</w>"
3166
+ ],
3167
+ [
3168
+ "e",
3169
+ "</w>"
3170
+ ],
3171
+ [
3172
+ "望",
3173
+ "</w>"
3174
+ ],
3175
+ [
3176
+ "轻",
3177
+ "</w>"
3178
+ ],
3179
+ [
3180
+ "所",
3181
+ "</w>"
3182
+ ],
3183
+ [
3184
+ "需",
3185
+ "</w>"
3186
+ ],
3187
+ [
3188
+ "帮",
3189
+ "</w>"
3190
+ ],
3191
+ [
3192
+ "偷",
3193
+ "</w>"
3194
+ ],
3195
+ [
3196
+ "岁",
3197
+ "</w>"
3198
+ ],
3199
+ [
3200
+ "酒",
3201
+ "</w>"
3202
+ ],
3203
+ [
3204
+ "园",
3205
+ "</w>"
3206
+ ],
3207
+ [
3208
+ "雨",
3209
+ "</w>"
3210
+ ],
3211
+ [
3212
+ "然",
3213
+ "</w>"
3214
+ ],
3215
+ [
3216
+ "每",
3217
+ "</w>"
3218
+ ],
3219
+ [
3220
+ "像",
3221
+ "</w>"
3222
+ ],
3223
+ [
3224
+ "功",
3225
+ "</w>"
3226
+ ],
3227
+ [
3228
+ "6",
3229
+ "</w>"
3230
+ ],
3231
+ [
3232
+ "写",
3233
+ "</w>"
3234
+ ],
3235
+ [
3236
+ "照",
3237
+ "</w>"
3238
+ ],
3239
+ [
3240
+ "猫",
3241
+ "</w>"
3242
+ ],
3243
+ [
3244
+ "划",
3245
+ "</w>"
3246
+ ],
3247
+ [
3248
+ "赛",
3249
+ "</w>"
3250
+ ],
3251
+ [
3252
+ "增",
3253
+ "</w>"
3254
+ ],
3255
+ [
3256
+ "则",
3257
+ "</w>"
3258
+ ],
3259
+ [
3260
+ "全",
3261
+ "</w>"
3262
+ ],
3263
+ [
3264
+ "洗",
3265
+ "</w>"
3266
+ ],
3267
+ [
3268
+ "1",
3269
+ "0</w>"
3270
+ ],
3271
+ [
3272
+ "义",
3273
+ "</w>"
3274
+ ],
3275
+ [
3276
+ "儿",
3277
+ "</w>"
3278
+ ],
3279
+ [
3280
+ "籍",
3281
+ "</w>"
3282
+ ],
3283
+ [
3284
+ "哦",
3285
+ "</w>"
3286
+ ],
3287
+ [
3288
+ "尊",
3289
+ "</w>"
3290
+ ],
3291
+ [
3292
+ "敬",
3293
+ "</w>"
3294
+ ],
3295
+ [
3296
+ "辈",
3297
+ "</w>"
3298
+ ],
3299
+ [
3300
+ "另",
3301
+ "</w>"
3302
+ ],
3303
+ [
3304
+ "程",
3305
+ "</w>"
3306
+ ],
3307
+ [
3308
+ "英",
3309
+ "</w>"
3310
+ ],
3311
+ [
3312
+ "师",
3313
+ "</w>"
3314
+ ],
3315
+ [
3316
+ "例",
3317
+ "</w>"
3318
+ ],
3319
+ [
3320
+ "腾",
3321
+ "</w>"
3322
+ ],
3323
+ [
3324
+ "钟",
3325
+ "</w>"
3326
+ ],
3327
+ [
3328
+ "吃",
3329
+ "</w>"
3330
+ ],
3331
+ [
3332
+ "脸",
3333
+ "</w>"
3334
+ ],
3335
+ [
3336
+ "据",
3337
+ "</w>"
3338
+ ],
3339
+ [
3340
+ "座",
3341
+ "</w>"
3342
+ ],
3343
+ [
3344
+ "雪",
3345
+ "</w>"
3346
+ ],
3347
+ [
3348
+ "款",
3349
+ "</w>"
3350
+ ],
3351
+ [
3352
+ "帽",
3353
+ "</w>"
3354
+ ],
3355
+ [
3356
+ "当",
3357
+ "</w>"
3358
+ ],
3359
+ [
3360
+ "办",
3361
+ "</w>"
3362
+ ],
3363
+ [
3364
+ "後",
3365
+ "</w>"
3366
+ ],
3367
+ [
3368
+ "厌",
3369
+ "</w>"
3370
+ ],
3371
+ [
3372
+ "倦",
3373
+ "</w>"
3374
+ ],
3375
+ [
3376
+ "观",
3377
+ "</w>"
3378
+ ],
3379
+ [
3380
+ "众",
3381
+ "</w>"
3382
+ ],
3383
+ [
3384
+ "制",
3385
+ "</w>"
3386
+ ],
3387
+ [
3388
+ "造",
3389
+ "</w>"
3390
+ ],
3391
+ [
3392
+ "借",
3393
+ "</w>"
3394
+ ],
3395
+ [
3396
+ "口",
3397
+ "</w>"
3398
+ ],
3399
+ [
3400
+ "石",
3401
+ "</w>"
3402
+ ],
3403
+ [
3404
+ "故",
3405
+ "</w>"
3406
+ ],
3407
+ [
3408
+ "艺",
3409
+ "</w>"
3410
+ ],
3411
+ [
3412
+ "术",
3413
+ "</w>"
3414
+ ],
3415
+ [
3416
+ "采",
3417
+ "</w>"
3418
+ ],
3419
+ [
3420
+ "预",
3421
+ "</w>"
3422
+ ],
3423
+ [
3424
+ "沒",
3425
+ "</w>"
3426
+ ],
3427
+ [
3428
+ "历",
3429
+ "</w>"
3430
+ ],
3431
+ [
3432
+ "肯",
3433
+ "</w>"
3434
+ ],
3435
+ [
3436
+ "毛",
3437
+ "</w>"
3438
+ ],
3439
+ [
3440
+ "条",
3441
+ "</w>"
3442
+ ],
3443
+ [
3444
+ "路",
3445
+ "</w>"
3446
+ ],
3447
+ [
3448
+ "父",
3449
+ "</w>"
3450
+ ],
3451
+ [
3452
+ "两",
3453
+ "</w>"
3454
+ ],
3455
+ [
3456
+ "受",
3457
+ "</w>"
3458
+ ],
3459
+ [
3460
+ "船",
3461
+ "</w>"
3462
+ ],
3463
+ [
3464
+ "朝",
3465
+ "</w>"
3466
+ ],
3467
+ [
3468
+ "确",
3469
+ "</w>"
3470
+ ],
3471
+ [
3472
+ "保",
3473
+ "</w>"
3474
+ ],
3475
+ [
3476
+ "覺",
3477
+ "</w>"
3478
+ ],
3479
+ [
3480
+ "先",
3481
+ "</w>"
3482
+ ],
3483
+ [
3484
+ "示",
3485
+ "</w>"
3486
+ ],
3487
+ [
3488
+ "温",
3489
+ "</w>"
3490
+ ],
3491
+ [
3492
+ "零",
3493
+ "</w>"
3494
+ ],
3495
+ [
3496
+ "报",
3497
+ "</w>"
3498
+ ],
3499
+ [
3500
+ "失",
3501
+ "</w>"
3502
+ ],
3503
+ [
3504
+ "视",
3505
+ "</w>"
3506
+ ],
3507
+ [
3508
+ "线",
3509
+ "</w>"
3510
+ ],
3511
+ [
3512
+ "士",
3513
+ "</w>"
3514
+ ],
3515
+ [
3516
+ "只",
3517
+ "</w>"
3518
+ ],
3519
+ [
3520
+ "宙",
3521
+ "</w>"
3522
+ ],
3523
+ [
3524
+ "晚",
3525
+ "</w>"
3526
+ ],
3527
+ [
3528
+ "声",
3529
+ "</w>"
3530
+ ],
3531
+ [
3532
+ "星",
3533
+ "</w>"
3534
+ ],
3535
+ [
3536
+ "歐",
3537
+ "</w>"
3538
+ ],
3539
+ [
3540
+ "歡",
3541
+ "</w>"
3542
+ ],
3543
+ [
3544
+ "神",
3545
+ "</w>"
3546
+ ],
3547
+ [
3548
+ "點",
3549
+ "</w>"
3550
+ ],
3551
+ [
3552
+ "热",
3553
+ "</w>"
3554
+ ],
3555
+ [
3556
+ "收",
3557
+ "</w>"
3558
+ ],
3559
+ [
3560
+ "短",
3561
+ "</w>"
3562
+ ],
3563
+ [
3564
+ "食",
3565
+ "</w>"
3566
+ ],
3567
+ [
3568
+ "欲",
3569
+ "</w>"
3570
+ ],
3571
+ [
3572
+ "钱",
3573
+ "</w>"
3574
+ ],
3575
+ [
3576
+ "圣",
3577
+ "</w>"
3578
+ ],
3579
+ [
3580
+ "夏",
3581
+ "</w>"
3582
+ ],
3583
+ [
3584
+ "总",
3585
+ "</w>"
3586
+ ],
3587
+ [
3588
+ "满",
3589
+ "</w>"
3590
+ ],
3591
+ [
3592
+ "室",
3593
+ "</w>"
3594
+ ],
3595
+ [
3596
+ "河",
3597
+ "</w>"
3598
+ ],
3599
+ [
3600
+ "危",
3601
+ "</w>"
3602
+ ],
3603
+ [
3604
+ "破",
3605
+ "</w>"
3606
+ ],
3607
+ [
3608
+ "惜",
3609
+ "</w>"
3610
+ ],
3611
+ [
3612
+ "蠢",
3613
+ "</w>"
3614
+ ],
3615
+ [
3616
+ "來",
3617
+ "</w>"
3618
+ ],
3619
+ [
3620
+ "過",
3621
+ "</w>"
3622
+ ],
3623
+ [
3624
+ "拥",
3625
+ "</w>"
3626
+ ],
3627
+ [
3628
+ "位",
3629
+ "</w>"
3630
+ ],
3631
+ [
3632
+ "冰",
3633
+ "</w>"
3634
+ ],
3635
+ [
3636
+ "乘",
3637
+ "</w>"
3638
+ ],
3639
+ [
3640
+ "备",
3641
+ "</w>"
3642
+ ],
3643
+ [
3644
+ "杯",
3645
+ "</w>"
3646
+ ],
3647
+ [
3648
+ "床",
3649
+ "</w>"
3650
+ ],
3651
+ [
3652
+ "說",
3653
+ "</w>"
3654
+ ],
3655
+ [
3656
+ "才",
3657
+ "</w>"
3658
+ ],
3659
+ [
3660
+ "支",
3661
+ "</w>"
3662
+ ],
3663
+ [
3664
+ "布",
3665
+ "</w>"
3666
+ ],
3667
+ [
3668
+ "订",
3669
+ "</w>"
3670
+ ],
3671
+ [
3672
+ "慢",
3673
+ "</w>"
3674
+ ],
3675
+ [
3676
+ "半",
3677
+ "</w>"
3678
+ ],
3679
+ [
3680
+ "會",
3681
+ "</w>"
3682
+ ],
3683
+ [
3684
+ "决",
3685
+ "</w>"
3686
+ ],
3687
+ [
3688
+ "某",
3689
+ "</w>"
3690
+ ],
3691
+ [
3692
+ "业",
3693
+ "</w>"
3694
+ ],
3695
+ [
3696
+ "城",
3697
+ "</w>"
3698
+ ],
3699
+ [
3700
+ "市",
3701
+ "</w>"
3702
+ ],
3703
+ [
3704
+ "应",
3705
+ "</w>"
3706
+ ],
3707
+ [
3708
+ "付",
3709
+ "</w>"
3710
+ ],
3711
+ [
3712
+ "2",
3713
+ "0</w>"
3714
+ ],
3715
+ [
3716
+ "隻",
3717
+ "</w>"
3718
+ ],
3719
+ [
3720
+ "严",
3721
+ "</w>"
3722
+ ],
3723
+ [
3724
+ "庙",
3725
+ "</w>"
3726
+ ],
3727
+ [
3728
+ "考",
3729
+ "</w>"
3730
+ ],
3731
+ [
3732
+ "虑",
3733
+ "</w>"
3734
+ ],
3735
+ [
3736
+ "停",
3737
+ "</w>"
3738
+ ],
3739
+ [
3740
+ "码",
3741
+ "</w>"
3742
+ ],
3743
+ [
3744
+ "眼",
3745
+ "</w>"
3746
+ ],
3747
+ [
3748
+ "色",
3749
+ "</w>"
3750
+ ],
3751
+ [
3752
+ "弟",
3753
+ "</w>"
3754
+ ],
3755
+ [
3756
+ "夜",
3757
+ "</w>"
3758
+ ],
3759
+ [
3760
+ "話",
3761
+ "</w>"
3762
+ ],
3763
+ [
3764
+ "缺",
3765
+ "</w>"
3766
+ ],
3767
+ [
3768
+ "验",
3769
+ "</w>"
3770
+ ],
3771
+ [
3772
+ "费",
3773
+ "</w>"
3774
+ ],
3775
+ [
3776
+ "票",
3777
+ "</w>"
3778
+ ],
3779
+ [
3780
+ "格",
3781
+ "</w>"
3782
+ ],
3783
+ [
3784
+ "批",
3785
+ "</w>"
3786
+ ],
3787
+ [
3788
+ "评",
3789
+ "</w>"
3790
+ ],
3791
+ [
3792
+ "达",
3793
+ "</w>"
3794
+ ],
3795
+ [
3796
+ "干",
3797
+ "</w>"
3798
+ ],
3799
+ [
3800
+ "…",
3801
+ "</w>"
3802
+ ],
3803
+ [
3804
+ "架",
3805
+ "</w>"
3806
+ ],
3807
+ [
3808
+ "次",
3809
+ "</w>"
3810
+ ],
3811
+ [
3812
+ "跑",
3813
+ "</w>"
3814
+ ],
3815
+ [
3816
+ "金",
3817
+ "</w>"
3818
+ ],
3819
+ [
3820
+ "屈",
3821
+ "</w>"
3822
+ ],
3823
+ [
3824
+ "止",
3825
+ "</w>"
3826
+ ],
3827
+ [
3828
+ "松",
3829
+ "</w>"
3830
+ ],
3831
+ [
3832
+ "牛",
3833
+ "</w>"
3834
+ ],
3835
+ [
3836
+ "j",
3837
+ "a"
3838
+ ],
3839
+ [
3840
+ "教",
3841
+ "</w>"
3842
+ ],
3843
+ [
3844
+ "言",
3845
+ "</w>"
3846
+ ],
3847
+ [
3848
+ "终",
3849
+ "</w>"
3850
+ ],
3851
+ [
3852
+ "讶",
3853
+ "</w>"
3854
+ ],
3855
+ [
3856
+ "、",
3857
+ "</w>"
3858
+ ],
3859
+ [
3860
+ "奇",
3861
+ "</w>"
3862
+ ],
3863
+ [
3864
+ "白",
3865
+ "</w>"
3866
+ ],
3867
+ [
3868
+ "谢",
3869
+ "</w>"
3870
+ ],
3871
+ [
3872
+ "况",
3873
+ "</w>"
3874
+ ],
3875
+ [
3876
+ "念",
3877
+ "</w>"
3878
+ ],
3879
+ [
3880
+ "裡",
3881
+ "</w>"
3882
+ ],
3883
+ [
3884
+ "\"",
3885
+ "</w>"
3886
+ ],
3887
+ [
3888
+ "参",
3889
+ "</w>"
3890
+ ],
3891
+ [
3892
+ "动",
3893
+ "</w>"
3894
+ ],
3895
+ [
3896
+ "茶",
3897
+ "</w>"
3898
+ ],
3899
+ [
3900
+ "午",
3901
+ "</w>"
3902
+ ],
3903
+ [
3904
+ "疯",
3905
+ "</w>"
3906
+ ],
3907
+ [
3908
+ "囚",
3909
+ "</w>"
3910
+ ],
3911
+ [
3912
+ "笼",
3913
+ "</w>"
3914
+ ],
3915
+ [
3916
+ "叔",
3917
+ "</w>"
3918
+ ],
3919
+ [
3920
+ "幸",
3921
+ "</w>"
3922
+ ],
3923
+ [
3924
+ "!",
3925
+ "</w>"
3926
+ ],
3927
+ [
3928
+ "狗",
3929
+ "</w>"
3930
+ ],
3931
+ [
3932
+ "字",
3933
+ "</w>"
3934
+ ],
3935
+ [
3936
+ "迟",
3937
+ "</w>"
3938
+ ],
3939
+ [
3940
+ "改",
3941
+ "</w>"
3942
+ ],
3943
+ [
3944
+ "宝",
3945
+ "</w>"
3946
+ ],
3947
+ [
3948
+ "随",
3949
+ "</w>"
3950
+ ],
3951
+ [
3952
+ "推",
3953
+ "</w>"
3954
+ ],
3955
+ [
3956
+ "移",
3957
+ "</w>"
3958
+ ],
3959
+ [
3960
+ "规",
3961
+ "</w>"
3962
+ ],
3963
+ [
3964
+ "安",
3965
+ "</w>"
3966
+ ],
3967
+ [
3968
+ "脚",
3969
+ "</w>"
3970
+ ],
3971
+ [
3972
+ "欠",
3973
+ "</w>"
3974
+ ],
3975
+ [
3976
+ "嗨",
3977
+ "</w>"
3978
+ ],
3979
+ [
3980
+ "至",
3981
+ "</w>"
3982
+ ],
3983
+ [
3984
+ "关",
3985
+ "</w>"
3986
+ ],
3987
+ [
3988
+ "偏",
3989
+ "</w>"
3990
+ ],
3991
+ [
3992
+ "胖",
3993
+ "</w>"
3994
+ ],
3995
+ [
3996
+ "铭",
3997
+ "</w>"
3998
+ ],
3999
+ [
4000
+ "咖",
4001
+ "</w>"
4002
+ ],
4003
+ [
4004
+ "啡",
4005
+ "</w>"
4006
+ ],
4007
+ [
4008
+ "揉",
4009
+ "</w>"
4010
+ ],
4011
+ [
4012
+ "碎",
4013
+ "</w>"
4014
+ ],
4015
+ [
4016
+ "代",
4017
+ "</w>"
4018
+ ],
4019
+ [
4020
+ "雹",
4021
+ "</w>"
4022
+ ],
4023
+ [
4024
+ "按",
4025
+ "</w>"
4026
+ ],
4027
+ [
4028
+ "处",
4029
+ "</w>"
4030
+ ],
4031
+ [
4032
+ "罚",
4033
+ "</w>"
4034
+ ],
4035
+ [
4036
+ "送",
4037
+ "</w>"
4038
+ ],
4039
+ [
4040
+ "货",
4041
+ "</w>"
4042
+ ],
4043
+ [
4044
+ "精",
4045
+ "</w>"
4046
+ ],
4047
+ [
4048
+ "插",
4049
+ "</w>"
4050
+ ],
4051
+ [
4052
+ "微",
4053
+ "</w>"
4054
+ ],
4055
+ [
4056
+ "试",
4057
+ "</w>"
4058
+ ],
4059
+ [
4060
+ "5",
4061
+ "</w>"
4062
+ ],
4063
+ [
4064
+ "丑",
4065
+ "</w>"
4066
+ ],
4067
+ [
4068
+ "鬼",
4069
+ "</w>"
4070
+ ],
4071
+ [
4072
+ "拉",
4073
+ "</w>"
4074
+ ],
4075
+ [
4076
+ "腿",
4077
+ "</w>"
4078
+ ],
4079
+ [
4080
+ "阐",
4081
+ "</w>"
4082
+ ],
4083
+ [
4084
+ "撒",
4085
+ "</w>"
4086
+ ],
4087
+ [
4088
+ "谎",
4089
+ "</w>"
4090
+ ],
4091
+ [
4092
+ "覆",
4093
+ "</w>"
4094
+ ],
4095
+ [
4096
+ "盖",
4097
+ "</w>"
4098
+ ],
4099
+ [
4100
+ "流",
4101
+ "</w>"
4102
+ ],
4103
+ [
4104
+ "靠",
4105
+ "</w>"
4106
+ ],
4107
+ [
4108
+ "習",
4109
+ "</w>"
4110
+ ],
4111
+ [
4112
+ "坚",
4113
+ "</w>"
4114
+ ],
4115
+ [
4116
+ "标",
4117
+ "</w>"
4118
+ ],
4119
+ [
4120
+ "数",
4121
+ "</w>"
4122
+ ],
4123
+ [
4124
+ "庞",
4125
+ "</w>"
4126
+ ],
4127
+ [
4128
+ "块",
4129
+ "</w>"
4130
+ ],
4131
+ [
4132
+ "岩",
4133
+ "</w>"
4134
+ ],
4135
+ [
4136
+ "落",
4137
+ "</w>"
4138
+ ],
4139
+ [
4140
+ "徒",
4141
+ "</w>"
4142
+ ],
4143
+ [
4144
+ "劳",
4145
+ "</w>"
4146
+ ],
4147
+ [
4148
+ "努",
4149
+ "</w>"
4150
+ ],
4151
+ [
4152
+ "伟",
4153
+ "</w>"
4154
+ ],
4155
+ [
4156
+ "强",
4157
+ "</w>"
4158
+ ],
4159
+ [
4160
+ "防",
4161
+ "</w>"
4162
+ ],
4163
+ [
4164
+ "措",
4165
+ "</w>"
4166
+ ],
4167
+ [
4168
+ "施",
4169
+ "</w>"
4170
+ ],
4171
+ [
4172
+ "摩",
4173
+ "</w>"
4174
+ ],
4175
+ [
4176
+ "托",
4177
+ "</w>"
4178
+ ],
4179
+ [
4180
+ "遛",
4181
+ "</w>"
4182
+ ],
4183
+ [
4184
+ "圈",
4185
+ "</w>"
4186
+ ],
4187
+ [
4188
+ "证",
4189
+ "</w>"
4190
+ ],
4191
+ [
4192
+ "怀",
4193
+ "</w>"
4194
+ ],
4195
+ [
4196
+ "間",
4197
+ "</w>"
4198
+ ],
4199
+ [
4200
+ "克",
4201
+ "</w>"
4202
+ ],
4203
+ [
4204
+ "升",
4205
+ "</w>"
4206
+ ],
4207
+ [
4208
+ "庆",
4209
+ "</w>"
4210
+ ],
4211
+ [
4212
+ "祝",
4213
+ "</w>"
4214
+ ],
4215
+ [
4216
+ "衣",
4217
+ "</w>"
4218
+ ],
4219
+ [
4220
+ "拜",
4221
+ "</w>"
4222
+ ],
4223
+ [
4224
+ "访",
4225
+ "</w>"
4226
+ ],
4227
+ [
4228
+ "因",
4229
+ "</w>"
4230
+ ],
4231
+ [
4232
+ "冒",
4233
+ "</w>"
4234
+ ],
4235
+ [
4236
+ "沿",
4237
+ "</w>"
4238
+ ],
4239
+ [
4240
+ "红",
4241
+ "</w>"
4242
+ ],
4243
+ [
4244
+ "绿",
4245
+ "</w>"
4246
+ ],
4247
+ [
4248
+ "灯",
4249
+ "</w>"
4250
+ ],
4251
+ [
4252
+ "右",
4253
+ "</w>"
4254
+ ],
4255
+ [
4256
+ "转",
4257
+ "</w>"
4258
+ ],
4259
+ [
4260
+ "跟",
4261
+ "</w>"
4262
+ ],
4263
+ [
4264
+ "千",
4265
+ "</w>"
4266
+ ],
4267
+ [
4268
+ "杀",
4269
+ "</w>"
4270
+ ],
4271
+ [
4272
+ "予",
4273
+ "</w>"
4274
+ ],
4275
+ [
4276
+ "寻",
4277
+ "</w>"
4278
+ ],
4279
+ [
4280
+ "逃",
4281
+ "</w>"
4282
+ ],
4283
+ [
4284
+ "途",
4285
+ "</w>"
4286
+ ],
4287
+ [
4288
+ "径",
4289
+ "</w>"
4290
+ ],
4291
+ [
4292
+ "伦",
4293
+ "</w>"
4294
+ ],
4295
+ [
4296
+ "敦",
4297
+ "</w>"
4298
+ ],
4299
+ [
4300
+ "似",
4301
+ "</w>"
4302
+ ],
4303
+ [
4304
+ "派",
4305
+ "</w>"
4306
+ ],
4307
+ [
4308
+ "头",
4309
+ "</w>"
4310
+ ],
4311
+ [
4312
+ "痛",
4313
+ "</w>"
4314
+ ],
4315
+ [
4316
+ "盐",
4317
+ "</w>"
4318
+ ],
4319
+ [
4320
+ "递",
4321
+ "</w>"
4322
+ ],
4323
+ [
4324
+ "指",
4325
+ "</w>"
4326
+ ],
4327
+ [
4328
+ "九",
4329
+ "</w>"
4330
+ ],
4331
+ [
4332
+ "低",
4333
+ "</w>"
4334
+ ],
4335
+ [
4336
+ "挥",
4337
+ "</w>"
4338
+ ],
4339
+ [
4340
+ "段",
4341
+ "</w>"
4342
+ ],
4343
+ [
4344
+ "y",
4345
+ "</w>"
4346
+ ],
4347
+ [
4348
+ "c",
4349
+ "y</w>"
4350
+ ],
4351
+ [
4352
+ "n",
4353
+ "cy</w>"
4354
+ ],
4355
+ [
4356
+ "a",
4357
+ "ncy</w>"
4358
+ ],
4359
+ [
4360
+ "n",
4361
+ "ancy</w>"
4362
+ ],
4363
+ [
4364
+ "私",
4365
+ "</w>"
4366
+ ],
4367
+ [
4368
+ "谈",
4369
+ "</w>"
4370
+ ],
4371
+ [
4372
+ "又",
4373
+ "</w>"
4374
+ ],
4375
+ [
4376
+ "绅",
4377
+ "</w>"
4378
+ ],
4379
+ [
4380
+ "味",
4381
+ "</w>"
4382
+ ],
4383
+ [
4384
+ "哥",
4385
+ "</w>"
4386
+ ],
4387
+ [
4388
+ "华",
4389
+ "</w>"
4390
+ ],
4391
+ [
4392
+ "m",
4393
+ "</w>"
4394
+ ],
4395
+ [
4396
+ "o",
4397
+ "m</w>"
4398
+ ],
4399
+ [
4400
+ "t",
4401
+ "om</w>"
4402
+ ],
4403
+ [
4404
+ "躲",
4405
+ "</w>"
4406
+ ],
4407
+ [
4408
+ "桌",
4409
+ "</w>"
4410
+ ],
4411
+ [
4412
+ "表",
4413
+ "</w>"
4414
+ ],
4415
+ [
4416
+ "澡",
4417
+ "</w>"
4418
+ ],
4419
+ [
4420
+ "筑",
4421
+ "</w>"
4422
+ ],
4423
+ [
4424
+ "震",
4425
+ "</w>"
4426
+ ],
4427
+ [
4428
+ "摇",
4429
+ "</w>"
4430
+ ],
4431
+ [
4432
+ "晃",
4433
+ "</w>"
4434
+ ],
4435
+ [
4436
+ "戴",
4437
+ "</w>"
4438
+ ],
4439
+ [
4440
+ "麻",
4441
+ "</w>"
4442
+ ],
4443
+ [
4444
+ "烦",
4445
+ "</w>"
4446
+ ],
4447
+ [
4448
+ "邻",
4449
+ "</w>"
4450
+ ],
4451
+ [
4452
+ "村",
4453
+ "</w>"
4454
+ ],
4455
+ [
4456
+ "象",
4457
+ "</w>"
4458
+ ],
4459
+ [
4460
+ "賺",
4461
+ "</w>"
4462
+ ],
4463
+ [
4464
+ "百",
4465
+ "</w>"
4466
+ ],
4467
+ [
4468
+ "較",
4469
+ "</w>"
4470
+ ],
4471
+ [
4472
+ "仅",
4473
+ "</w>"
4474
+ ],
4475
+ [
4476
+ "席",
4477
+ "</w>"
4478
+ ],
4479
+ [
4480
+ "血",
4481
+ "</w>"
4482
+ ],
4483
+ [
4484
+ "沸",
4485
+ "</w>"
4486
+ ],
4487
+ [
4488
+ "帖",
4489
+ "</w>"
4490
+ ],
4491
+ [
4492
+ "2",
4493
+ "</w>"
4494
+ ],
4495
+ [
4496
+ "休",
4497
+ "</w>"
4498
+ ],
4499
+ [
4500
+ "假",
4501
+ "</w>"
4502
+ ],
4503
+ [
4504
+ "阳",
4505
+ "</w>"
4506
+ ],
4507
+ [
4508
+ "选",
4509
+ "</w>"
4510
+ ],
4511
+ [
4512
+ "择",
4513
+ "</w>"
4514
+ ],
4515
+ [
4516
+ "或",
4517
+ "</w>"
4518
+ ],
4519
+ [
4520
+ "项",
4521
+ "</w>"
4522
+ ],
4523
+ [
4524
+ "艰",
4525
+ "</w>"
4526
+ ],
4527
+ [
4528
+ "却",
4529
+ "</w>"
4530
+ ],
4531
+ [
4532
+ "鲜",
4533
+ "</w>"
4534
+ ],
4535
+ [
4536
+ "龙",
4537
+ "</w>"
4538
+ ],
4539
+ [
4540
+ "虾",
4541
+ "</w>"
4542
+ ],
4543
+ [
4544
+ "著",
4545
+ "</w>"
4546
+ ],
4547
+ [
4548
+ "進",
4549
+ "</w>"
4550
+ ],
4551
+ [
4552
+ "計",
4553
+ "</w>"
4554
+ ],
4555
+ [
4556
+ "劃",
4557
+ "</w>"
4558
+ ],
4559
+ [
4560
+ "總",
4561
+ "</w>"
4562
+ ],
4563
+ [
4564
+ "發",
4565
+ "</w>"
4566
+ ],
4567
+ [
4568
+ "够",
4569
+ "</w>"
4570
+ ],
4571
+ [
4572
+ "威",
4573
+ "</w>"
4574
+ ],
4575
+ [
4576
+ "尼",
4577
+ "</w>"
4578
+ ],
4579
+ [
4580
+ "季",
4581
+ "</w>"
4582
+ ],
4583
+ [
4584
+ "挤",
4585
+ "</w>"
4586
+ ],
4587
+ [
4588
+ "诗",
4589
+ "</w>"
4590
+ ],
4591
+ [
4592
+ "兼",
4593
+ "</w>"
4594
+ ],
4595
+ [
4596
+ "者",
4597
+ "</w>"
4598
+ ],
4599
+ [
4600
+ "泳",
4601
+ "</w>"
4602
+ ],
4603
+ [
4604
+ "持",
4605
+ "</w>"
4606
+ ],
4607
+ [
4608
+ "传",
4609
+ "</w>"
4610
+ ],
4611
+ [
4612
+ "统",
4613
+ "</w>"
4614
+ ],
4615
+ [
4616
+ "设",
4617
+ "</w>"
4618
+ ],
4619
+ [
4620
+ "僵",
4621
+ "</w>"
4622
+ ],
4623
+ [
4624
+ "局",
4625
+ "</w>"
4626
+ ],
4627
+ [
4628
+ "從",
4629
+ "</w>"
4630
+ ],
4631
+ [
4632
+ "c",
4633
+ "e</w>"
4634
+ ],
4635
+ [
4636
+ "l",
4637
+ "i"
4638
+ ],
4639
+ [
4640
+ "a",
4641
+ "li"
4642
+ ],
4643
+ [
4644
+ "ali",
4645
+ "ce</w>"
4646
+ ],
4647
+ [
4648
+ "演",
4649
+ "</w>"
4650
+ ],
4651
+ [
4652
+ "唱",
4653
+ "</w>"
4654
+ ],
4655
+ [
4656
+ "骗",
4657
+ "</w>"
4658
+ ],
4659
+ [
4660
+ "争",
4661
+ "</w>"
4662
+ ],
4663
+ [
4664
+ "辩",
4665
+ "</w>"
4666
+ ],
4667
+ [
4668
+ "适",
4669
+ "</w>"
4670
+ ],
4671
+ [
4672
+ "职",
4673
+ "</w>"
4674
+ ],
4675
+ [
4676
+ "溜",
4677
+ "</w>"
4678
+ ],
4679
+ [
4680
+ "7",
4681
+ "</w>"
4682
+ ],
4683
+ [
4684
+ "铁",
4685
+ "</w>"
4686
+ ],
4687
+ [
4688
+ "摄",
4689
+ "</w>"
4690
+ ],
4691
+ [
4692
+ "糟",
4693
+ "</w>"
4694
+ ],
4695
+ [
4696
+ "糕",
4697
+ "</w>"
4698
+ ],
4699
+ [
4700
+ "透",
4701
+ "</w>"
4702
+ ],
4703
+ [
4704
+ "t",
4705
+ "e</w>"
4706
+ ],
4707
+ [
4708
+ "k",
4709
+ "a"
4710
+ ],
4711
+ [
4712
+ "ka",
4713
+ "te</w>"
4714
+ ],
4715
+ [
4716
+ ",",
4717
+ "</w>"
4718
+ ],
4719
+ [
4720
+ "急",
4721
+ "</w>"
4722
+ ],
4723
+ [
4724
+ "救",
4725
+ "</w>"
4726
+ ],
4727
+ [
4728
+ "池",
4729
+ "</w>"
4730
+ ],
4731
+ [
4732
+ "鱼",
4733
+ "</w>"
4734
+ ],
4735
+ [
4736
+ "挑",
4737
+ "</w>"
4738
+ ],
4739
+ [
4740
+ "病",
4741
+ "</w>"
4742
+ ],
4743
+ [
4744
+ "笔",
4745
+ "</w>"
4746
+ ],
4747
+ [
4748
+ "曾",
4749
+ "</w>"
4750
+ ],
4751
+ [
4752
+ "經",
4753
+ "</w>"
4754
+ ],
4755
+ [
4756
+ "空",
4757
+ "</w>"
4758
+ ],
4759
+ [
4760
+ "整",
4761
+ "</w>"
4762
+ ],
4763
+ [
4764
+ "愉",
4765
+ "</w>"
4766
+ ],
4767
+ [
4768
+ "杰",
4769
+ "</w>"
4770
+ ],
4771
+ [
4772
+ "姐",
4773
+ "</w>"
4774
+ ],
4775
+ [
4776
+ "��",
4777
+ "</w>"
4778
+ ],
4779
+ [
4780
+ "婚",
4781
+ "</w>"
4782
+ ],
4783
+ [
4784
+ "汽",
4785
+ "</w>"
4786
+ ],
4787
+ [
4788
+ "笛",
4789
+ "</w>"
4790
+ ],
4791
+ [
4792
+ "驶",
4793
+ "</w>"
4794
+ ],
4795
+ [
4796
+ "港",
4797
+ "</w>"
4798
+ ],
4799
+ [
4800
+ "包",
4801
+ "</w>"
4802
+ ],
4803
+ [
4804
+ "眠",
4805
+ "</w>"
4806
+ ],
4807
+ [
4808
+ "命",
4809
+ "</w>"
4810
+ ],
4811
+ [
4812
+ "困",
4813
+ "</w>"
4814
+ ],
4815
+ [
4816
+ "蝴",
4817
+ "</w>"
4818
+ ],
4819
+ [
4820
+ "蝶",
4821
+ "</w>"
4822
+ ],
4823
+ [
4824
+ "滑",
4825
+ "</w>"
4826
+ ],
4827
+ [
4828
+ "诚",
4829
+ "</w>"
4830
+ ],
4831
+ [
4832
+ "德",
4833
+ "</w>"
4834
+ ],
4835
+ [
4836
+ "仪",
4837
+ "</w>"
4838
+ ],
4839
+ [
4840
+ "庄",
4841
+ "</w>"
4842
+ ],
4843
+ [
4844
+ "举",
4845
+ "</w>"
4846
+ ],
4847
+ [
4848
+ "内",
4849
+ "</w>"
4850
+ ],
4851
+ [
4852
+ "反",
4853
+ "</w>"
4854
+ ],
4855
+ [
4856
+ "论",
4857
+ "</w>"
4858
+ ],
4859
+ [
4860
+ "擔",
4861
+ "</w>"
4862
+ ],
4863
+ [
4864
+ "揭",
4865
+ "</w>"
4866
+ ],
4867
+ [
4868
+ "露",
4869
+ "</w>"
4870
+ ],
4871
+ [
4872
+ "平",
4873
+ "</w>"
4874
+ ],
4875
+ [
4876
+ "涌",
4877
+ "</w>"
4878
+ ],
4879
+ [
4880
+ "泪",
4881
+ "</w>"
4882
+ ],
4883
+ [
4884
+ "景",
4885
+ "</w>"
4886
+ ],
4887
+ [
4888
+ "誓",
4889
+ "</w>"
4890
+ ],
4891
+ [
4892
+ "赢",
4893
+ "</w>"
4894
+ ],
4895
+ [
4896
+ "彻",
4897
+ "</w>"
4898
+ ],
4899
+ [
4900
+ "进",
4901
+ "</w>"
4902
+ ],
4903
+ [
4904
+ "铃",
4905
+ "</w>"
4906
+ ],
4907
+ [
4908
+ "亲",
4909
+ "</w>"
4910
+ ],
4911
+ [
4912
+ "独",
4913
+ "</w>"
4914
+ ],
4915
+ [
4916
+ "赶",
4917
+ "</w>"
4918
+ ],
4919
+ [
4920
+ "份",
4921
+ "</w>"
4922
+ ],
4923
+ [
4924
+ "瘋",
4925
+ "</w>"
4926
+ ],
4927
+ [
4928
+ "永",
4929
+ "</w>"
4930
+ ],
4931
+ [
4932
+ "遠",
4933
+ "</w>"
4934
+ ],
4935
+ [
4936
+ "踢",
4937
+ "</w>"
4938
+ ],
4939
+ [
4940
+ "長",
4941
+ "</w>"
4942
+ ],
4943
+ [
4944
+ "國",
4945
+ "</w>"
4946
+ ],
4947
+ [
4948
+ "王",
4949
+ "</w>"
4950
+ ],
4951
+ [
4952
+ "1",
4953
+ "</w>"
4954
+ ],
4955
+ [
4956
+ "2",
4957
+ "1</w>"
4958
+ ],
4959
+ [
4960
+ "惡",
4961
+ "</w>"
4962
+ ],
4963
+ [
4964
+ "兔",
4965
+ "</w>"
4966
+ ],
4967
+ [
4968
+ "免",
4969
+ "</w>"
4970
+ ],
4971
+ [
4972
+ "辜",
4973
+ "</w>"
4974
+ ],
4975
+ [
4976
+ "负",
4977
+ "</w>"
4978
+ ],
4979
+ [
4980
+ "饿",
4981
+ "</w>"
4982
+ ],
4983
+ [
4984
+ "請",
4985
+ "</w>"
4986
+ ],
4987
+ [
4988
+ "寄",
4989
+ "</w>"
4990
+ ],
4991
+ [
4992
+ "給",
4993
+ "</w>"
4994
+ ],
4995
+ [
4996
+ "張",
4997
+ "</w>"
4998
+ ],
4999
+ [
5000
+ "远",
5001
+ "</w>"
5002
+ ],
5003
+ [
5004
+ "银",
5005
+ "</w>"
5006
+ ],
5007
+ [
5008
+ "风",
5009
+ "</w>"
5010
+ ],
5011
+ [
5012
+ "户",
5013
+ "</w>"
5014
+ ],
5015
+ [
5016
+ "较",
5017
+ "</w>"
5018
+ ],
5019
+ [
5020
+ "贷",
5021
+ "</w>"
5022
+ ],
5023
+ [
5024
+ "利",
5025
+ "</w>"
5026
+ ],
5027
+ [
5028
+ "课",
5029
+ "</w>"
5030
+ ],
5031
+ [
5032
+ "济",
5033
+ "</w>"
5034
+ ],
5035
+ [
5036
+ "蜂",
5037
+ "</w>"
5038
+ ],
5039
+ [
5040
+ "即",
5041
+ "</w>"
5042
+ ],
5043
+ [
5044
+ "餐",
5045
+ "</w>"
5046
+ ],
5047
+ [
5048
+ "体",
5049
+ "</w>"
5050
+ ],
5051
+ [
5052
+ "销",
5053
+ "</w>"
5054
+ ],
5055
+ [
5056
+ "售",
5057
+ "</w>"
5058
+ ],
5059
+ [
5060
+ "宵",
5061
+ "</w>"
5062
+ ],
5063
+ [
5064
+ "旦",
5065
+ "</w>"
5066
+ ],
5067
+ [
5068
+ "花",
5069
+ "</w>"
5070
+ ],
5071
+ [
5072
+ "k",
5073
+ "e"
5074
+ ],
5075
+ [
5076
+ "n",
5077
+ "</w>"
5078
+ ],
5079
+ [
5080
+ "ke",
5081
+ "n</w>"
5082
+ ],
5083
+ [
5084
+ "七",
5085
+ "</w>"
5086
+ ],
5087
+ [
5088
+ "拆",
5089
+ "</w>"
5090
+ ],
5091
+ [
5092
+ "桥",
5093
+ "</w>"
5094
+ ],
5095
+ [
5096
+ "朋",
5097
+ "</w>"
5098
+ ],
5099
+ [
5100
+ "友",
5101
+ "</w>"
5102
+ ],
5103
+ [
5104
+ "讀",
5105
+ "</w>"
5106
+ ],
5107
+ [
5108
+ "﹐",
5109
+ "</w>"
5110
+ ],
5111
+ [
5112
+ "六",
5113
+ "</w>"
5114
+ ],
5115
+ [
5116
+ "弃",
5117
+ "</w>"
5118
+ ],
5119
+ [
5120
+ "盹",
5121
+ "</w>"
5122
+ ],
5123
+ [
5124
+ "飞",
5125
+ "</w>"
5126
+ ],
5127
+ [
5128
+ "机",
5129
+ "</w>"
5130
+ ],
5131
+ [
5132
+ "携",
5133
+ "</w>"
5134
+ ],
5135
+ [
5136
+ "带",
5137
+ "</w>"
5138
+ ],
5139
+ [
5140
+ "4",
5141
+ "0</w>"
5142
+ ],
5143
+ [
5144
+ "护",
5145
+ "</w>"
5146
+ ],
5147
+ [
5148
+ "扰",
5149
+ "</w>"
5150
+ ],
5151
+ [
5152
+ "唯",
5153
+ "</w>"
5154
+ ],
5155
+ [
5156
+ "卫",
5157
+ "</w>"
5158
+ ],
5159
+ [
5160
+ "3",
5161
+ "</w>"
5162
+ ],
5163
+ [
5164
+ "纯",
5165
+ "</w>"
5166
+ ],
5167
+ [
5168
+ "属",
5169
+ "</w>"
5170
+ ],
5171
+ [
5172
+ "偶",
5173
+ "</w>"
5174
+ ],
5175
+ [
5176
+ "津",
5177
+ "</w>"
5178
+ ],
5179
+ [
5180
+ "音",
5181
+ "</w>"
5182
+ ],
5183
+ [
5184
+ "值",
5185
+ "</w>"
5186
+ ],
5187
+ [
5188
+ "睛",
5189
+ "</w>"
5190
+ ],
5191
+ [
5192
+ "k",
5193
+ "e</w>"
5194
+ ],
5195
+ [
5196
+ "ja",
5197
+ "ke</w>"
5198
+ ],
5199
+ [
5200
+ "扇",
5201
+ "</w>"
5202
+ ],
5203
+ [
5204
+ "窗",
5205
+ "</w>"
5206
+ ],
5207
+ [
5208
+ "叫",
5209
+ "</w>"
5210
+ ],
5211
+ [
5212
+ "ja",
5213
+ "c"
5214
+ ],
5215
+ [
5216
+ "k",
5217
+ "</w>"
5218
+ ],
5219
+ [
5220
+ "jac",
5221
+ "k</w>"
5222
+ ],
5223
+ [
5224
+ "幹",
5225
+ "</w>"
5226
+ ],
5227
+ [
5228
+ "鲍",
5229
+ "</w>"
5230
+ ],
5231
+ [
5232
+ "勃",
5233
+ "</w>"
5234
+ ],
5235
+ [
5236
+ "丰",
5237
+ "</w>"
5238
+ ],
5239
+ [
5240
+ "富",
5241
+ "</w>"
5242
+ ],
5243
+ [
5244
+ "答",
5245
+ "</w>"
5246
+ ],
5247
+ [
5248
+ "复",
5249
+ "</w>"
5250
+ ],
5251
+ [
5252
+ "悔",
5253
+ "</w>"
5254
+ ],
5255
+ [
5256
+ "概",
5257
+ "</w>"
5258
+ ],
5259
+ [
5260
+ "澄",
5261
+ "</w>"
5262
+ ],
5263
+ [
5264
+ "清",
5265
+ "</w>"
5266
+ ],
5267
+ [
5268
+ "价",
5269
+ "</w>"
5270
+ ],
5271
+ [
5272
+ "涨",
5273
+ "</w>"
5274
+ ],
5275
+ [
5276
+ "守",
5277
+ "</w>"
5278
+ ],
5279
+ [
5280
+ "诺",
5281
+ "</w>"
5282
+ ],
5283
+ [
5284
+ "顾",
5285
+ "</w>"
5286
+ ],
5287
+ [
5288
+ "迷",
5289
+ "</w>"
5290
+ ],
5291
+ [
5292
+ "社",
5293
+ "</w>"
5294
+ ],
5295
+ [
5296
+ "团",
5297
+ "</w>"
5298
+ ],
5299
+ [
5300
+ "抓",
5301
+ "</w>"
5302
+ ],
5303
+ [
5304
+ "鼠",
5305
+ "</w>"
5306
+ ],
5307
+ [
5308
+ "纪",
5309
+ "</w>"
5310
+ ],
5311
+ [
5312
+ "品",
5313
+ "</w>"
5314
+ ],
5315
+ [
5316
+ "阅",
5317
+ "</w>"
5318
+ ],
5319
+ [
5320
+ "饭",
5321
+ "</w>"
5322
+ ],
5323
+ [
5324
+ "购",
5325
+ "</w>"
5326
+ ],
5327
+ [
5328
+ "镜",
5329
+ "</w>"
5330
+ ],
5331
+ [
5332
+ "迅",
5333
+ "</w>"
5334
+ ],
5335
+ [
5336
+ "速",
5337
+ "</w>"
5338
+ ],
5339
+ [
5340
+ "窜",
5341
+ "</w>"
5342
+ ],
5343
+ [
5344
+ "入",
5345
+ "</w>"
5346
+ ],
5347
+ [
5348
+ "群",
5349
+ "</w>"
5350
+ ],
5351
+ [
5352
+ "耗",
5353
+ "</w>"
5354
+ ],
5355
+ [
5356
+ "气",
5357
+ "</w>"
5358
+ ],
5359
+ [
5360
+ "化",
5361
+ "</w>"
5362
+ ],
5363
+ [
5364
+ "附",
5365
+ "</w>"
5366
+ ],
5367
+ [
5368
+ "近",
5369
+ "</w>"
5370
+ ],
5371
+ [
5372
+ "张",
5373
+ "</w>"
5374
+ ],
5375
+ [
5376
+ "片",
5377
+ "</w>"
5378
+ ],
5379
+ [
5380
+ "童",
5381
+ "</w>"
5382
+ ],
5383
+ [
5384
+ "福",
5385
+ "</w>"
5386
+ ],
5387
+ [
5388
+ "药",
5389
+ "</w>"
5390
+ ],
5391
+ [
5392
+ "创",
5393
+ "</w>"
5394
+ ],
5395
+ [
5396
+ "迹",
5397
+ "</w>"
5398
+ ],
5399
+ [
5400
+ "厕",
5401
+ "</w>"
5402
+ ],
5403
+ [
5404
+ "冲",
5405
+ "</w>"
5406
+ ],
5407
+ [
5408
+ "轨",
5409
+ "</w>"
5410
+ ],
5411
+ [
5412
+ "1",
5413
+ "8"
5414
+ ],
5415
+ [
5416
+ "18",
5417
+ "</w>"
5418
+ ],
5419
+ [
5420
+ "环",
5421
+ "</w>"
5422
+ ],
5423
+ [
5424
+ "素",
5425
+ "</w>"
5426
+ ],
5427
+ [
5428
+ "5",
5429
+ "6</w>"
5430
+ ],
5431
+ [
5432
+ "粗",
5433
+ "</w>"
5434
+ ],
5435
+ [
5436
+ "趕",
5437
+ "</w>"
5438
+ ],
5439
+ [
5440
+ "久",
5441
+ "</w>"
5442
+ ],
5443
+ [
5444
+ "妻",
5445
+ "</w>"
5446
+ ],
5447
+ [
5448
+ "互",
5449
+ "</w>"
5450
+ ],
5451
+ [
5452
+ "助",
5453
+ "</w>"
5454
+ ],
5455
+ [
5456
+ "训",
5457
+ "</w>"
5458
+ ],
5459
+ [
5460
+ "脑",
5461
+ "</w>"
5462
+ ],
5463
+ [
5464
+ "戏",
5465
+ "</w>"
5466
+ ],
5467
+ [
5468
+ "散",
5469
+ "</w>"
5470
+ ],
5471
+ [
5472
+ "步",
5473
+ "</w>"
5474
+ ],
5475
+ [
5476
+ "油",
5477
+ "</w>"
5478
+ ],
5479
+ [
5480
+ "置",
5481
+ "</w>"
5482
+ ],
5483
+ [
5484
+ "债",
5485
+ "</w>"
5486
+ ],
5487
+ [
5488
+ "冷",
5489
+ "</w>"
5490
+ ],
5491
+ [
5492
+ "湖",
5493
+ "</w>"
5494
+ ],
5495
+ [
5496
+ "结",
5497
+ "</w>"
5498
+ ],
5499
+ [
5500
+ "首",
5501
+ "</w>"
5502
+ ],
5503
+ [
5504
+ "歌",
5505
+ "</w>"
5506
+ ],
5507
+ [
5508
+ "1",
5509
+ "0"
5510
+ ],
5511
+ [
5512
+ "10",
5513
+ "0</w>"
5514
+ ],
5515
+ [
5516
+ "万",
5517
+ "</w>"
5518
+ ],
5519
+ [
5520
+ "辆",
5521
+ "</w>"
5522
+ ],
5523
+ [
5524
+ "呢",
5525
+ "</w>"
5526
+ ],
5527
+ [
5528
+ "變",
5529
+ "</w>"
5530
+ ],
5531
+ [
5532
+ "卖",
5533
+ "</w>"
5534
+ ],
5535
+ [
5536
+ "栋",
5537
+ "</w>"
5538
+ ],
5539
+ [
5540
+ "灰",
5541
+ "</w>"
5542
+ ],
5543
+ [
5544
+ "楼",
5545
+ "</w>"
5546
+ ],
5547
+ [
5548
+ "毕",
5549
+ "</w>"
5550
+ ],
5551
+ [
5552
+ "索",
5553
+ "</w>"
5554
+ ],
5555
+ [
5556
+ "抱",
5557
+ "</w>"
5558
+ ],
5559
+ [
5560
+ "歉",
5561
+ "</w>"
5562
+ ],
5563
+ [
5564
+ "盛",
5565
+ "</w>"
5566
+ ],
5567
+ [
5568
+ "邀",
5569
+ "</w>"
5570
+ ],
5571
+ [
5572
+ "延",
5573
+ "</w>"
5574
+ ],
5575
+ [
5576
+ "误",
5577
+ "</w>"
5578
+ ],
5579
+ [
5580
+ "苏",
5581
+ "</w>"
5582
+ ],
5583
+ [
5584
+ "兰",
5585
+ "</w>"
5586
+ ],
5587
+ [
5588
+ "古",
5589
+ "</w>"
5590
+ ],
5591
+ [
5592
+ "堡",
5593
+ "</w>"
5594
+ ],
5595
+ [
5596
+ "谁",
5597
+ "</w>"
5598
+ ],
5599
+ [
5600
+ "纸",
5601
+ "</w>"
5602
+ ],
5603
+ [
5604
+ "杂",
5605
+ "</w>"
5606
+ ],
5607
+ [
5608
+ "志",
5609
+ "</w>"
5610
+ ],
5611
+ [
5612
+ "闻",
5613
+ "</w>"
5614
+ ],
5615
+ [
5616
+ "播",
5617
+ "</w>"
5618
+ ],
5619
+ [
5620
+ "奶",
5621
+ "</w>"
5622
+ ]
5623
+ ],
5624
+ "special_tokens": [
5625
+ "<pad>",
5626
+ "<sos>",
5627
+ "<eos>",
5628
+ "<unk>",
5629
+ "<mask>"
5630
+ ]
5631
+ }
PLAN.md ADDED
@@ -0,0 +1,299 @@
+ # Diffutslator Implementation Plan
+
+ A Chinese-English bidirectional translation system based on diffusion models
+
+ ## 1. Architecture Overview
+
+ ```
+ ┌─────────────────────────────────────────────────────────────────┐
+ │                      Noise space (shared)                       │
+ │                              [L×D]                              │
+ │                   ┌─────────────────────────┐                   │
+ │                   │                         │                   │
+ │ Chinese diffusion ↗    Language switcher    ↖ English diffusion │
+ │    (add noise)           [classifier]            (add noise)    │
+ │                   │                         │                   │
+ │                   └─────────────────────────┘                   │
+ │                   ↓                         ↓                   │
+ │       Chinese reverse diffusion English reverse diffusion       │
+ │               (denoise)                 (denoise)               │
+ │                   ↓                         ↓                   │
+ │          ┌─────────────────┐       ┌─────────────────┐          │
+ │          │ Chinese decoder │       │ English decoder │          │
+ │          └─────────────────┘       └─────────────────┘          │
+ │                   ↓                         ↓                   │
+ │            Chinese output            English output             │
+ └─────────────────────────────────────────────────────────────────┘
+ ```
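+
+ ### Core Design Decisions
+
+ | Question | Decision | Rationale |
+ |------|------|------|
+ | Diffusion space | Continuous word-embedding space | Mature implementations, CPU-friendly, stable training |
+ | Length handling | Variable-length sequences + length embedding | Diffusion can operate on variable lengths; reverse diffusion converges to the target length |
+ | Bidirectional switching | Learnable classifier | Lets the model decide for itself when to switch |
+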
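+ ## 2. Module Design
+
+ ### 2.1 Tokenizer (tokenizer.py)
+
+ **Chinese tokenization**: character-level + BPE
+ - Chinese characters are handled at the character level
+ - BPE handles rare words and digits
+
+ **English tokenization**: BPE
+ - Uses the same BPE algorithm
+ - Shares the vocabulary-size setting with Chinese
+
+ **Vocabulary**:
+ - Chinese vocabulary: 8000 tokens
+ - English vocabulary: 8000 tokens
+ - Special tokens: `<pad>`, `<sos>`, `<eos>`, `<mask>`, `<unk>`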
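
The merge pairs stored in `tokenizer_zh.json` (for example `["ali", "ce</w>"]`) suggest that `</w>` is an end-of-word symbol that merges gradually absorb. A minimal sketch of applying such merges follows. It assumes each rule is applied once, in list order, which is a simplification of full BPE encoding; `apply_bpe` is a hypothetical helper, not a function from this repository.

```python
def apply_bpe(word, merges):
    """Apply merge rules to one word, each rule once, in list order (simplified BPE)."""
    symbols = list(word) + ["</w>"]  # start from characters plus the end-of-word marker
    for left, right in merges:  # earlier rules are treated as higher priority
        merged = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)  # fuse the adjacent pair into one symbol
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# A small subset of merge pairs copied from the tokenizer file in this commit.
merges = [("e", "</w>"), ("c", "e</w>"), ("l", "i"), ("a", "li"), ("ali", "ce</w>")]
alice_tokens = apply_bpe("alice", merges)
fire_tokens = apply_bpe("火", [("火", "</w>")])
```

With these rules, a Latin-script name collapses into a single token, while a Chinese character simply pairs with `</w>`, which fits the "character-level + BPE" description above.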
+
+ ### 2.2 Embedding Layer (embedding.py)
+
+ ```python
+ class LanguageEmbedding:
+     """Language-specific embedding layer"""
+     # token_embedding: [vocab_size, d_model]
+     # position_embedding: [max_len, d_model]
+     # length_embedding: [max_len, d_model]  # length encoding
+ ```
+
+ **Parameters**:
+ - `d_model = 256` (a moderate size for a CPU environment)
+ - `max_len = 128` (maximum sequence length)
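
The plan lists three embedding tables but does not say how they are combined. The sketch below assumes they are summed per position, with the length vector added at every position. That choice, the class name `LanguageEmbeddingSketch`, and the random initialization are all assumptions made for illustration. Plain Python lists stand in for tensors.

```python
import random


def make_table(rows, cols, rng):
    """Randomly initialized [rows][cols] table standing in for a learned embedding matrix."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class LanguageEmbeddingSketch:
    """Toy stand-in for LanguageEmbedding: token + position + length embeddings, summed."""

    def __init__(self, vocab_size, d_model, max_len, seed=0):
        rng = random.Random(seed)
        self.max_len = max_len
        self.token_embedding = make_table(vocab_size, d_model, rng)
        self.position_embedding = make_table(max_len, d_model, rng)
        self.length_embedding = make_table(max_len, d_model, rng)  # row n-1 encodes length n

    def embed(self, token_ids):
        n = len(token_ids)
        if not 1 <= n <= self.max_len:
            raise ValueError("sequence length must be between 1 and max_len")
        length_vec = self.length_embedding[n - 1]  # same vector added at every position
        return [
            [tok + pos + ln for tok, pos, ln in zip(
                self.token_embedding[tid], self.position_embedding[i], length_vec)]
            for i, tid in enumerate(token_ids)
        ]


# Toy sizes so the demo runs instantly; the plan uses d_model = 256 and max_len = 128.
emb = LanguageEmbeddingSketch(vocab_size=50, d_model=8, max_len=16)
x_0 = emb.embed([3, 7, 7])  # list of 3 vectors, each of size 8
```

The repeated token id 7 gets two different vectors because its position differs, which is what lets the later Transformer layers distinguish word order.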
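+
+ ### 2.3 Diffusion Core (diffusion.py)
+
+ **Forward diffusion (adding noise)**:
+ ```python
+ import torch  # assumption: PyTorch, as suggested by `randn_like`
+
+ T = 1000  # number of training timesteps
+
+ def forward_diffusion(x_0, t):
+     """
+     x_0: initial embeddings [batch, len, d_model]
+     t: timesteps [batch]
+     Returns: x_t, noise
+     """
+     # Linear noise schedule (simplified); reshaped to broadcast over len and d_model
+     alpha_t = (1 - t.float() / T).view(-1, 1, 1)
+     noise = torch.randn_like(x_0)
+     x_t = torch.sqrt(alpha_t) * x_0 + torch.sqrt(1 - alpha_t) * noise
+     return x_t, noise
+ ```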
84
+
85
+ **反向扩散(去噪)**:
86
+ ```python
87
+ def reverse_diffusion(x_t, t, model):
88
+ """
89
+ x_t: 当前噪声状态
90
+ t: 当前时间步
91
+ model: 噪声预测网络
92
+ """
93
+ predicted_noise = model(x_t, t)
94
+ x_t_minus_1 = denoise_step(x_t, predicted_noise, t)
95
+ return x_t_minus_1
96
+ ```
97
+
98
+ **时间调度**:
99
+ - 训练时:T = 1000 步
100
+ - 推理时:DDIM加速,可降到 10-50 步
101
+
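To make the step reduction concrete, here is a minimal sketch of one common way to choose the inference timesteps: an evenly strided subsequence of the T training steps, visited from noisiest to cleanest. The helper name `ddim_timesteps` is hypothetical and the code is plain Python, so it illustrates the idea rather than the project's sampler.

```python
def ddim_timesteps(train_steps: int = 1000, sample_steps: int = 50) -> list[int]:
    """Evenly strided timesteps, returned from noisiest to cleanest."""
    if not 0 < sample_steps <= train_steps:
        raise ValueError("sample_steps must be in 1..train_steps")
    stride = train_steps // sample_steps
    return [i * stride for i in reversed(range(sample_steps))]


steps = ddim_timesteps(1000, 50)
print(steps[:3], "...", steps[-3:])  # [980, 960, 940] ... [40, 20, 0]
```

With this striding the first sampled step is 980 rather than 999, so the starting noise level is slightly below the maximum the forward process can reach.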
+ ### 2.4 Noise-Prediction Network (model.py)
+
+ ```python
+ class DiffusionTransformer:
+     """Transformer that predicts the noise."""
+     # Input:  x_t [batch, len, d_model], t [batch]
+     # Output: predicted_noise [batch, len, d_model]
+     #
+     # Structure:
+     # - language-specific input projection
+     # - timestep embedding (sinusoidal)
+     # - N Transformer blocks
+     # - language-specific output projection
+ ```
+
+ **Parameters** (tuned for CPU):
+ - `n_layers = 4`
+ - `n_heads = 4`
+ - `d_ff = 512`
+ - Total parameter count: about 2M
+
+ ### 2.5 Language Switcher (switcher.py)
+
+ ```python
+ class LanguageSwitcher:
+     """Decides which language the current noisy state is closer to."""
+     # Input:  x_t [batch, len, d_model]
+     # Output: language probabilities [batch, 2]  # [Chinese, English]
+     #
+     # Structure:
+     # - global average pooling
+     # - 2-layer MLP
+     # - softmax output
+ ```
+
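The switcher structure listed above (global average pooling, a two-layer MLP, softmax) can be sketched in plain Python as follows. The function names, the ReLU activation, and the random toy weights are all assumptions made for illustration; the plan does not specify them.

```python
import math
import random


def mean_pool(x_t):
    """Average a [len][d_model] sequence over the length axis -> [d_model]."""
    n = len(x_t)
    return [sum(col) / n for col in zip(*x_t)]


def linear(vec, weight, bias):
    """Dense layer: weight is [out][in], bias is [out]."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def switcher_forward(x_t, w1, b1, w2, b2):
    """Mean-pool -> hidden layer (ReLU, assumed) -> 2 logits -> [p_zh, p_en]."""
    hidden = [max(0.0, h) for h in linear(mean_pool(x_t), w1, b1)]
    return softmax(linear(hidden, w2, b2))


random.seed(0)
d_model, d_hidden, seq_len = 8, 4, 5
rand = lambda rows, cols: [[random.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]
x_t = rand(seq_len, d_model)
probs = switcher_forward(x_t, rand(d_hidden, d_model), [0.0] * d_hidden,
                         rand(2, d_hidden), [0.0, 0.0])
print(probs)  # two probabilities that sum to 1
```

Mean pooling makes the classifier independent of sequence length, which matters here because the two languages generally produce different token counts for the same sentence.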
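+ ### 2.6 Training Flow (train.py)
+
+ ```
+ Training steps:
+ 1. Load Chinese-English parallel sentence pairs (zh, en)
+ 2. Embed each side into the continuous space
+ 3. Randomly sample a timestep t
+ 4. Forward-diffuse the Chinese embeddings to step t → zh_t
+ 5. Forward-diffuse the English embeddings to step t → en_t
+ 6. Train the noise-prediction network to predict the noise
+ 7. Train the switcher to classify the language
+ 8. Backpropagate and update the parameters
+ ```
+
+ **Loss function**:
+ ```python
+ L_total = L_noise_zh + L_noise_en + λ * L_switcher
+
+ # L_noise: MSE loss on the predicted noise
+ # L_switcher: cross-entropy loss for language classification
+ ```
+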
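As a worked instance of the loss formula above, the following sketch combines two noise-prediction MSE terms with a switcher cross-entropy term on toy numbers. The helper names, the averaging of the two switcher terms, and the weight `lam=0.1` are assumptions; the plan leaves λ unspecified.

```python
import math


def mse(pred, target):
    """Mean squared error over two equally sized flat lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class (0 = zh, 1 = en)."""
    return -math.log(probs[label])


def total_loss(noise_zh, pred_zh, noise_en, pred_en, sw_probs_zh, sw_probs_en, lam=0.1):
    l_noise_zh = mse(pred_zh, noise_zh)
    l_noise_en = mse(pred_en, noise_en)
    # The switcher sees one noised Chinese sample and one noised English sample.
    l_switcher = 0.5 * (cross_entropy(sw_probs_zh, 0) + cross_entropy(sw_probs_en, 1))
    return l_noise_zh + l_noise_en + lam * l_switcher


loss = total_loss(
    noise_zh=[0.5, -1.0], pred_zh=[0.4, -0.8],
    noise_en=[1.0, 0.0], pred_en=[1.0, 0.2],
    sw_probs_zh=[0.9, 0.1], sw_probs_en=[0.2, 0.8],
)
print(round(loss, 4))
```

Here the two MSE terms are 0.025 and 0.02, and the switcher term contributes about 0.016 after weighting, so the noise-prediction losses dominate with this choice of λ.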
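+ ### 2.7 Inference Flow (inference.py)
+
+ **Chinese → English translation**:
+ ```
+ 1. Chinese input → Chinese embeddings
+ 2. Full forward diffusion to pure noise (T steps)
+ 3. Iterative reverse diffusion:
+    for t in [T, T-1, ..., 1]:
+      - the switcher classifies the current language
+      - if classified as Chinese → denoise with the Chinese path
+      - if classified as English → switch to the English path
+      - print the state of the current step (visualization)
+ 4. Final state → English decoder → English output
+ ```
+
+ **English → Chinese translation**: the symmetric process
+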
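The reverse loop described above can be sketched with stand-in callables as below. Everything here is hypothetical scaffolding: the function name, the one-element "state", the toy denoisers, and in particular the choice to stay on the English path once the switcher has preferred it, which the plan does not spell out.

```python
from typing import Callable, List, Tuple

State = List[float]


def translate_zh_to_en(
    x_T: State,
    timesteps: List[int],
    denoise_zh: Callable[[State, int], State],
    denoise_en: Callable[[State, int], State],
    switcher: Callable[[State], Tuple[float, float]],
) -> Tuple[State, List[str]]:
    """Run the reverse loop, picking the denoiser from the switcher's vote.

    Assumption: once the switcher prefers English, the loop stays on the
    English path, so the trajectory cannot flip back to the source language.
    """
    x_t, lang, trace = x_T, "zh", []
    for t in timesteps:
        p_zh, p_en = switcher(x_t)
        if lang == "zh" and p_en > p_zh:
            lang = "en"
        x_t = (denoise_en if lang == "en" else denoise_zh)(x_t, t)
        trace.append(f"step {t}: {lang} (p_en={p_en:.2f})")
    return x_t, trace


# Toy stand-ins: each denoiser pulls the state halfway toward its own target.
toward = lambda target: (lambda x, t: [xi + 0.5 * (target - xi) for xi in x])
toy_switcher = lambda x: (0.7, 0.3) if x[0] > 0.6 else (0.3, 0.7)

final, trace = translate_zh_to_en([1.0], [3, 2, 1, 0], toward(0.5), toward(-1.0), toy_switcher)
print("\n".join(trace))
```

Printing one trace line per step mirrors the requirement that every diffusion step be visible on the command line.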
+ ## 3. File Structure
+
+ ```
+ diffutslator/
+ ├── TASK.md          # task description
+ ├── PLAN.md          # this file
+ ├── config.py        # hyperparameter configuration
+ ├── tokenizer.py     # tokenizer
+ ├── embedding.py     # embedding layer
+ ├── model.py         # diffusion model
+ ├── diffusion.py     # diffusion process
+ ├── switcher.py      # language switcher
+ ├── dataset.py       # dataset loading
+ ├── train.py         # training script
+ ├── inference.py     # inference script
+ ├── main.py          # main entry point
+ ├── utils.py         # utility functions
+ └── checkpoints/     # model checkpoints
+ ```
+
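+ ## 4. Implementation Steps
+
+ ### Phase 1: Basic framework (make sure it trains)
+
+ 1. **Config file** - define all hyperparameters
+ 2. **Tokenizer** - implement Chinese and English tokenization
+ 3. **Dataset** - load the tatoeba data
+ 4. **Embedding layer** - simple token embeddings
+ 5. **Diffusion core** - forward and reverse diffusion
+ 6. **Simple model** - a basic noise-prediction network
+ 7. **Training script** - training loop with a progress bar
+
+ **Validation goal**: training runs end to end on a small amount of data and the loss goes down
+
+ ### Phase 2: Full architecture
+
+ 1. **Language switcher** - implement the switching decision
+ 2. **Variable-length handling** - implement the length embedding
+ 3. **Full model** - integrate all modules
+ 4. **Inference script** - visualize the diffusion process
+
+ **Validation goal**: the full training pipeline runs and produces translations
+
+ ### Phase 3: Optimization and speed-up
+
+ 1. **DDIM sampling** - reduce the number of inference steps
+ 2. **Faster training** - mixed precision, gradient accumulation
+ 3. **Model tuning** - adjust hyperparameters
+
+ **Validation goal**: faster training and inference, better translation quality
+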
+ ## 5. Training Strategy
+
+ ### Quick validation mode
+
+ ```bash
+ # Use the first 1000 tatoeba pairs
+ # batch_size=8, epochs=10
+ python train.py --quick --samples 1000
+ ```
+
+ ### Full training mode
+
+ ```bash
+ # Use all the data
+ # Supports pause/resume
+ python train.py --full
+ # Ctrl+C pauses and saves a checkpoint automatically
+ # python train.py --resume continues training
+ ```
+
+ ### Training output
+
+ ```
+ Epoch 1/10: 100%|████████| 125/125 [02:30<00:00, loss=0.452]
+ Estimated remaining: 22:30 | Speed: 0.83 it/s
+ Latest checkpoint: checkpoints/model_epoch1.pt
+
+ Press Ctrl+C to stop training (saves automatically)
+ ```
+
+ ## 6. Inference Display
+
+ ```
+ $ python inference.py --zh "你好世界"
+
+ Translation mode: Chinese → English
+ Input: 你好世界
+
+ Diffusion process:
+ Step 1000: [noisy state - switcher: Chinese 95%]
+ Step 900: [noisy state - switcher: Chinese 78%]
+ Step 800: [noisy state - switcher: Chinese 52%]
+ Step 700: [noisy state - switcher: English 61%] ← language switch!
+ Step 600: [noisy state - switcher: English 89%]
+ ...
+ Step 50: [nearly complete sentence - switcher: English 99%]
+ Step 1: [complete sentence]
+
+ Output: Hello world
+ ```
+
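+ ## 7. Environment Adaptation
+
+ Optimizations for a CPU environment:
+
+ 1. **Small model**: keep the parameter count at 2-5M
+ 2. **Small batches**: batch_size = 4-16
+ 3. **Gradient accumulation**: simulate a larger batch
+ 4. **Simple architecture**: fewer layers and smaller dimensions
+ 5. **Memory optimization**: release intermediate variables promptly
+
+ ## 8. Expected Results
+
+ | Metric | Target |
+ |--------|--------|
+ | Training speed | 1-2 it/s (CPU) |
+ | Inference speed | 1-5 s per sentence (DDIM, 50 steps) |
+ | Translation quality | Simple sentences are understandable |
+ | Model size | < 50MB |
+
+ ---
+
+ *Plan complete; implementation starts once the user confirms*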
README.md ADDED
@@ -0,0 +1,284 @@
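+ # Diffutslator
+
+ A Chinese-English bidirectional translation system based on diffusion models. It generates tokens in parallel (non-autoregressively) and uses DDIM to speed up inference.
+
+ ## How It Works
+
+ ### The core idea of diffusion translation
+
+ Conventional translation models (such as the standard Transformer encoder-decoder) decode autoregressively, one token at a time. A diffusion model is non-autoregressive: it refines all tokens in parallel over a series of denoising steps:
+
+ ```
+ Autoregressive: [SOS] → [token1] → [token2] → [token3] → [EOS]
+                            ↓          ↓          ↓
+ Diffusion:      noise ──denoise all positions together──→ full sentence (every step updates all tokens)
+ ```
+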
+ ### Bidirectional translation architecture
+
+ ```
+ ┌─────────────────────────────────────────────────────────────────────────┐
+ │                          Noise space (shared)                           │
+ │                                 [L × D]                                 │
+ │                                                                         │
+ │  Chinese embedding ──q_sample──→ noise ←──q_sample── English embedding  │
+ │                 (forward diffusion) (forward diffusion)                 │
+ │          ↓                                                   ↓          │
+ │  Chinese denoiser                                    English denoiser   │
+ │          ↓                                                   ↓          │
+ │  reverse diffusion                                   reverse diffusion  │
+ │          ↓                                                   ↓          │
+ │   Chinese output                                      English output    │
+ └─────────────────────────────────────────────────────────────────────────┘
+ ```
+
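+ ### Translation flow
+
+ Taking **Chinese to English** as the example:
+
+ 1. **Encode**: Chinese sentence → Chinese tokens → Chinese embedding vectors
+ 2. **Forward diffusion**: add noise to the Chinese embeddings up to a chosen timestep (or all the way to pure noise)
+ 3. **Reverse diffusion (denoising)**:
+    - First half: use the Chinese denoising network (preserves source-language features)
+    - Second half: switch to the English denoising network (moves toward the target language)
+ 4. **Decode**: final embeddings → English tokens → English sentence
+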
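+ ### Why can diffusion translate?
+
+ Forward diffusion gradually adds noise until the data becomes pure noise; reverse diffusion recovers data from noise. The key insights:
+
+ - After enough noise has been added, the embeddings of the two languages become "indistinguishable" in noise space
+ - Starting from this shared noise space and following a different language's denoising path recovers a different language
+ - Analogy: break both Chinese and English down into the same building blocks, then reassemble them using the English instruction manual
+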
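How "indistinguishable" the noised embeddings become can be checked numerically. The sketch below assumes the linear beta schedule with the default values from this repository's configuration (1000 steps, beta from 0.0001 to 0.02) and prints the weight that the closed-form forward process puts on the original embedding versus on the Gaussian noise. The function name `linear_alpha_bar` is made up for this example.

```python
import math


def linear_alpha_bar(timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    alpha_bar, out = 1.0, []
    for i in range(timesteps):
        beta = beta_start + (beta_end - beta_start) * i / (timesteps - 1)
        alpha_bar *= 1.0 - beta
        out.append(alpha_bar)
    return out


alpha_bar = linear_alpha_bar()
for t in (0, 250, 500, 999):
    signal = math.sqrt(alpha_bar[t])        # weight on the original embedding
    noise = math.sqrt(1.0 - alpha_bar[t])   # weight on the Gaussian noise
    print(f"t={t:4d}  signal={signal:.4f}  noise={noise:.4f}")
```

At the final step the signal weight is well below 0.01, so almost nothing of the source sentence survives in the fully noised state. That supports the shared-noise-space picture, and it also means that a translation which must preserve meaning has to rely on diffusing only part of the way or on some other form of conditioning.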
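+ ## Installation
+
+ ### Dependencies
+
+ ```bash
+ # psutil is imported by dataset.py
+ pip install torch tqdm psutil
+ ```
+
+ ### Hardware requirements
+
+ - Training on CPU works (the project is tuned for CPU)
+ - Memory: at least 4GB
+ - Recommended: a GPU speeds things up considerably
+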
+ ## Quick Start
+
+ ### Training
+
+ ```bash
+ # Quick validation mode (1000 samples, 5 epochs)
+ python train.py --quick
+
+ # Full training
+ python train.py
+
+ # Resume from a checkpoint
+ python train.py --resume checkpoints/epoch_1.pt
+ ```
+
+ Pressing `Ctrl+C` during training interrupts it safely and automatically saves `checkpoints/interrupted.pt`.
+
+ ### Inference
+
+ ```bash
+ # Chinese to English
+ python inference.py --text "你好世界" --zh
+
+ # English to Chinese
+ python inference.py --text "Hello world" --en
+
+ # Interactive mode
+ python inference.py --interactive
+ ```
+
+ ## Detailed Usage
+
+ ### Training commands
+
+ ```bash
+ # Basic training
+ python train.py
+
+ # Quick validation (small dataset, few epochs)
+ python train.py --quick
+
+ # Resume from a checkpoint
+ python train.py --resume checkpoints/best.pt
+
+ # Use more data
+ python train.py --max-samples 10000
+
+ # Set the number of epochs and the batch size
+ python train.py --epochs 20 --batch-size 32
+ ```
+
+ ### Inference commands
+
+ ```bash
+ # Basic inference (Chinese to English)
+ python inference.py --text "今天天气很好" --zh
+
+ # English to Chinese
+ python inference.py --text "The weather is nice today" --en
+
+ # Use DDPM (slower, but possibly more accurate)
+ python inference.py --text "你好" --zh --ddpm
+
+ # Interactive mode
+ python inference.py --interactive
+
+ # Specify a checkpoint
+ python inference.py --text "你好" --zh --checkpoint checkpoints/best.pt
+
+ # Quiet mode (do not print the diffusion process)
+ python inference.py --text "你好" --zh --quiet
+ ```
+
+ ## Configuration Parameters
+
+ ### Model configuration (ModelConfig)
+
+ | Parameter | Default | Description |
+ |-----------|---------|-------------|
+ | `d_model` | 256 | Embedding dimension; determines model capacity |
+ | `n_heads` | 4 | Number of attention heads |
+ | `n_layers` | 4 | Number of Transformer encoder layers |
+ | `d_ff` | 512 | Hidden dimension of the feed-forward network |
+ | `max_len` | 128 | Maximum sequence length |
+ | `dropout` | 0.1 | Dropout rate |
+ | `vocab_size_zh` | 8000 | Chinese vocabulary size |
+ | `vocab_size_en` | 8000 | English vocabulary size |
+
+ ### Diffusion configuration (DiffusionConfig)
+
+ | Parameter | Default | Description |
+ |-----------|---------|-------------|
+ | `timesteps` | 1000 | Total number of diffusion steps during training |
+ | `ddim_steps` | 50 | Number of DDIM sampling steps at inference |
+ | `beta_start` | 0.0001 | Starting value of the noise schedule |
+ | `beta_end` | 0.02 | Final value of the noise schedule |
+
+ ### Training configuration (TrainingConfig)
+
+ | Parameter | Default | Description |
+ |-----------|---------|-------------|
+ | `batch_size` | 64 | Batch size |
+ | `learning_rate` | 1e-4 | Learning rate |
+ | `weight_decay` | 0.01 | Weight decay |
+ | `warmup_steps` | 500 | Number of learning-rate warmup steps |
+ | `epochs` | 10 | Number of training epochs |
+ | `save_every` | 1 | Save a checkpoint every N epochs |
+
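The `warmup_steps` setting is easiest to read with a formula in hand. The training script is not shown here, so the following is only an assumed shape (linear warmup to the base rate, then constant) using the default values from the table; the function name `warmup_lr` is hypothetical and the real scheduler may differ, for example by decaying afterwards.

```python
def warmup_lr(step: int, base_lr: float = 1e-4, warmup_steps: int = 500) -> float:
    """Linear warmup from near zero to base_lr, then constant (assumed shape)."""
    if warmup_steps <= 0:
        return base_lr
    return base_lr * min(1.0, (step + 1) / warmup_steps)


for step in (0, 249, 499, 5000):
    print(step, warmup_lr(step))
```

With the quick-mode settings (1000 samples, batch size 32, 5 epochs) there are far fewer than 500 optimizer steps in total, so under this assumed shape a quick run would never leave the warmup phase.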
+ ### Data configuration (DataConfig)
+
+ | Parameter | Default | Description |
+ |-----------|---------|-------------|
+ | `max_samples` | None | Maximum number of samples (None = all) |
+ | `min_len` | 2 | Minimum sentence length |
+ | `max_len` | 128 | Maximum sentence length |
+
+ ## Architecture Notes
+
+ ### Tokenizer (tokenizer.py)
+
+ Uses the BPE (Byte Pair Encoding) algorithm:
+
+ - **Chinese**: mostly character-level, with BPE for rare words and digits
+ - **English**: standard BPE subword segmentation
+ - Vocabulary size: 8000 tokens each
+ - Special tokens: `<pad>`, `<sos>`, `<eos>`, `<unk>`, `<mask>`
+
+ ```python
+ # Example (the ids shown are illustrative)
+ tokenizer_zh.encode("你好世界")  # [123, 456, 789]
+ tokenizer_en.encode("hello world")  # [234, 567]
+ ```
+
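For readers who want something runnable, here is a toy character-level tokenizer that mimics the `encode(text, add_sos=..., add_eos=...)` call used in dataset.py. The class `CharTokenizer` and its internals are invented for this sketch; the project's tokenizer uses BPE and an 8000-entry vocabulary, neither of which is reproduced here.

```python
class CharTokenizer:
    """Toy character-level tokenizer with the special tokens listed above."""

    SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>", "<mask>"]

    def __init__(self, corpus):
        chars = sorted({ch for text in corpus for ch in text})
        self.id_of = {tok: i for i, tok in enumerate(self.SPECIALS + chars)}
        self.tok_of = {i: tok for tok, i in self.id_of.items()}

    def encode(self, text, add_sos=False, add_eos=False):
        # Characters outside the vocabulary map to <unk>.
        ids = [self.id_of.get(ch, self.id_of["<unk>"]) for ch in text]
        if add_sos:
            ids.insert(0, self.id_of["<sos>"])
        if add_eos:
            ids.append(self.id_of["<eos>"])
        return ids

    def decode(self, ids):
        # Special tokens (including <unk>) are dropped from the output text.
        return "".join(self.tok_of[i] for i in ids if self.tok_of[i] not in self.SPECIALS)


tok = CharTokenizer(["你好世界"])
ids = tok.encode("你好吗", add_sos=True, add_eos=True)
print(ids, tok.decode(ids))
```

Placing `<pad>` at id 0 matches the zero-filled padding that `collate_fn` in dataset.py produces, which is the main reason the special tokens come first in this sketch.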
+ ### Embedding layer (embedding.py)
+
+ ```python
+ class LanguageEmbedding:
+     token_embedding     # [vocab_size, d_model]
+     position_embedding  # [max_len, d_model]
+     length_embedding    # [max_len, d_model]
+ ```
+
+ Converts discrete tokens into continuous vectors and adds positional information.
+
+ ### Noise-prediction network (model.py)
+
+ ```python
+ class DiffusionTransformer:
+     """Transformer-based noise-prediction network."""
+
+     # Input:  x_t [batch, len, d_model], t [batch], lang [str]
+     # Output: predicted_noise [batch, len, d_model]
+
+     # Structure:
+     # 1. Timestep embedding (sinusoidal)
+     # 2. Language-specific input projection
+     # 3. N Transformer blocks
+     # 4. Language-specific output projection
+ ```
+
+ ### Diffusion process (diffusion.py)
+
+ ```python
+ # Forward diffusion (add noise)
+ x_t, noise = diffusion.q_sample(x_0, t)  # x_0 → x_t
+
+ # Reverse diffusion (denoise)
+ x_t_minus_1 = diffusion.p_sample(x_t, t, predicted_noise)
+ ```
+
+ Uses a linear noise schedule and supports DDIM accelerated sampling.
+
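A quick way to build intuition for the DDIM update is to run it on a single scalar with a perfect noise prediction. The sketch below is independent of the repository's classes (the function names and the two cumulative-alpha values are arbitrary choices for illustration) and shows that the deterministic update (eta = 0) lands exactly on the forward-process sample at the earlier timestep.

```python
import math


def q_sample(x0, eps, alpha_bar_t):
    """Closed-form forward diffusion for a single scalar."""
    return math.sqrt(alpha_bar_t) * x0 + math.sqrt(1.0 - alpha_bar_t) * eps


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """Deterministic DDIM update (eta = 0) from t to an earlier timestep."""
    pred_x0 = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_bar_t)
    return math.sqrt(alpha_bar_prev) * pred_x0 + math.sqrt(1.0 - alpha_bar_prev) * eps_pred


x0, eps = 0.8, -1.3
x_t = q_sample(x0, eps, alpha_bar_t=0.2)
x_prev = ddim_step(x_t, eps, alpha_bar_t=0.2, alpha_bar_prev=0.6)
print(abs(x_prev - q_sample(x0, eps, 0.6)) < 1e-12)
```

Because the update only needs the cumulative alphas at the two endpoints, it can jump across many training timesteps at once, which is what allows sampling in 50 steps instead of 1000.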
+ ### Language switcher (switcher.py)
+
+ ```python
+ class LanguageSwitcher:
+     """Decides which language the current noisy state is closer to."""
+
+     # Input:  x_t [batch, len, d_model]
+     # Output: lang_prob [batch, 2]  # [P(Chinese), P(English)]
+ ```
+
+ At inference time it decides when to switch the denoising path.
+
+ ## File Structure
+
+ ```
+ diffutslator/
+ ├── config.py        # hyperparameter configuration
+ ├── tokenizer.py     # BPE tokenizer
+ ├── embedding.py     # embedding layer
+ ├── model.py         # noise-prediction network (Transformer)
+ ├── diffusion.py     # diffusion process + DDIM sampling
+ ├── switcher.py      # language-switch classifier
+ ├── dataset.py       # data loading (lazy tokenization)
+ ├── train.py         # training script
+ ├── inference.py     # inference script
+ ├── main.py          # main entry point
+ ├── utils.py         # utility functions
+ ├── .cache/          # tokenizer cache
+ │   ├── tokenizer_zh.json
+ │   └── tokenizer_en.json
+ └── checkpoints/     # model checkpoints
+     ├── best.pt
+     ├── epoch_1.pt
+     └── interrupted.pt
+ ```
+
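+ ## Datasets
+
+ - `_dataset/cveto/`
+ - `_dataset/tatoeba.tsv`
+
+ ---
+
+ Everything above was AI-generated; I'm adding a note of my own here.
+
+ The model that generated this project is GLM-5, driven through the iflow CLI. It was trained on my computer for nine and a half hours on 28,000 sentence pairs, and the weights are under checkpoints.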
TASK.md ADDED
@@ -0,0 +1,40 @@
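+ ## Project Goal
+
+ A translation AI based on diffusion models that translates between two languages in both directions.
+
+ - Parallel generation: non-autoregressive, all tokens generated simultaneously
+ - Inference efficiency: DDIM acceleration to reduce the number of sampling steps
+
+ Requirements:
+
+ 1. First and foremost, make sure the model can actually be trained
+ 2. Preferably improve training and inference speed
+ 3. After that, improve quality
+
+ ---
+
+ ## Architecture Design
+
+ Split into two parts: Chinese processing and English processing
+
+ ```
+ Chinese processing ←→ English processing
+ Chinese source/translated text ←diffusion/reverse diffusion→ noise ←diffusion/reverse diffusion→ English source/translated text
+ ```
+
+ Taking Chinese-to-English as the example: the Chinese part first diffuses the Chinese input into noise. Meanwhile, the Chinese and English parts each assess which language the noise is closer to. Once the English part judges the noise to be closer to English than to Chinese, reverse diffusion switches over to the English part. In other words, diffusion is the bridge for converting between the two languages.
+
+ Diffusion may expand one character into several, shrink several into fewer, change relative positions, or insert and delete.
+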
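+ ## Interaction
+
+ Training: command-line operation with a progress bar that shows progress through the dataset along with estimated time and speed. Training can be paused or stopped at any time, and the weights are saved when it stops.
+
+ First build a mode that trains on a small amount of data to confirm everything works, then let me run the full training in another terminal.
+
+ Running: also command-line. Every diffusion step must print one line so that each step is visible.
+
+ ## Datasets
+
+ - `../_dataset/tatoeba.tsv`: a TSV file with one sentence pair per line, in the format `ID (ignore)\tChinese\tID\tEnglish`
+ - `../_dataset/cveto/train.en` and `../_dataset/cveto/train.zh`: one clean sentence per line; the same line number in the two files holds sentences with the same meaning, i.e. line 123 of `train.en` corresponds to line 123 of `train.zh`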
TASK_NEXT.md ADDED
@@ -0,0 +1,13 @@
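+ # TASK Part 2
+
+ Create a front-end interactive page; do not touch any other existing files.
+
+ Every tunable parameter should be adjustable from the web page.
+
+ ## HF Space web page
+
+ Create the folder `hfspace`
+
+ A demo site that runs on HF Spaces. There are no special requirements; inference uses HF's resources.
+
+ Tell me the launch command so that I can debug locally.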
checkpoints/best.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:283c995651aa11ebde09858ad5002cd932c1dd6dd5ede16be733c16cbb5c4c55
+ size 47986610
checkpoints/epoch_1.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ebb5b81a9af19c5db8f418954484be00a0a238d2f8c297c76211f0e6600d21e2
+ size 48004138
checkpoints/interrupted.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:47d11cef59cb881a28ae4ecfcfe787babca789843354bcce9946b9bc04cc397d
+ size 48006090
config.py ADDED
@@ -0,0 +1,128 @@
+ """
+ Diffutslator configuration.
+ All hyperparameters are managed in one place.
+ """
+
+ from dataclasses import dataclass, field
+ from typing import Optional
+ import os
+
+
+ @dataclass
+ class ModelConfig:
+     """Model configuration."""
+     d_model: int = 256          # embedding dimension
+     n_heads: int = 4            # number of attention heads
+     n_layers: int = 4           # number of Transformer layers
+     d_ff: int = 512             # feed-forward dimension
+     max_len: int = 128          # maximum sequence length
+     dropout: float = 0.1        # dropout rate
+
+     # Vocabulary
+     vocab_size_zh: int = 8000   # Chinese vocabulary size
+     vocab_size_en: int = 8000   # English vocabulary size
+
+     # Special tokens
+     pad_token: str = "<pad>"
+     sos_token: str = "<sos>"
+     eos_token: str = "<eos>"
+     unk_token: str = "<unk>"
+     mask_token: str = "<mask>"
+
+
+ @dataclass
+ class DiffusionConfig:
+     """Diffusion process configuration."""
+     timesteps: int = 1000       # diffusion steps during training
+     ddim_steps: int = 50        # DDIM inference steps
+
+     # Noise schedule (linear)
+     beta_start: float = 0.0001
+     beta_end: float = 0.02
+
+     # Length variation
+     length_noise_scale: float = 0.3  # amount of noise applied to the length during diffusion
+
+
+ @dataclass
+ class TrainingConfig:
+     """Training configuration."""
+     batch_size: int = 64            # batch size (larger batches suit CPU training)
+     gradient_accumulation: int = 1  # gradient accumulation steps
+
+     learning_rate: float = 1e-4
+     weight_decay: float = 0.01
+     warmup_steps: int = 500
+
+     epochs: int = 10
+     save_every: int = 1             # save a checkpoint every N epochs
+     eval_every: int = 100           # evaluate every N steps
+
+     # Quick validation mode
+     quick_mode: bool = False
+     quick_samples: int = 1000
+
+     # Checkpoints
+     checkpoint_dir: str = "checkpoints"
+     resume: Optional[str] = None    # path of the checkpoint to resume from
+
+
+ @dataclass
+ class DataConfig:
+     """Data configuration."""
+     # Dataset paths (relative to the project directory)
+     tatoeba_path: str = "../_dataset/tatoeba.tsv"
+     cveto_zh_path: str = "../_dataset/cveto/train.zh"
+     cveto_en_path: str = "../_dataset/cveto/train.en"
+
+     # Data processing
+     max_samples: Optional[int] = None  # maximum number of samples (None = all)
+     min_len: int = 2                   # minimum sentence length
+     max_len: int = 128                 # maximum sentence length
+
+     # Cache
+     use_cache: bool = True             # whether to cache preprocessed data
+     cache_dir: str = ".cache"
+
+
+ @dataclass
+ class Config:
+     """Top-level configuration."""
+     model: ModelConfig = field(default_factory=ModelConfig)
+     diffusion: DiffusionConfig = field(default_factory=DiffusionConfig)
+     training: TrainingConfig = field(default_factory=TrainingConfig)
+     data: DataConfig = field(default_factory=DataConfig)
+
+     # Project root directory
+     project_dir: str = ""
+
+     def __post_init__(self):
+         # Set the project root to the directory containing this file
+         self.project_dir = os.path.dirname(os.path.abspath(__file__))
+
+         # Turn relative dataset paths into absolute paths
+         if not os.path.isabs(self.data.tatoeba_path):
+             self.data.tatoeba_path = os.path.join(self.project_dir, self.data.tatoeba_path)
+         if not os.path.isabs(self.data.cveto_zh_path):
+             self.data.cveto_zh_path = os.path.join(self.project_dir, self.data.cveto_zh_path)
+         if not os.path.isabs(self.data.cveto_en_path):
+             self.data.cveto_en_path = os.path.join(self.project_dir, self.data.cveto_en_path)
+
+         # Create the required directories
+         os.makedirs(os.path.join(self.project_dir, self.training.checkpoint_dir), exist_ok=True)
+         os.makedirs(os.path.join(self.project_dir, self.data.cache_dir), exist_ok=True)
+
+     @classmethod
+     def quick(cls) -> "Config":
+         """Configuration for quick validation mode."""
+         config = cls()
+         config.training.quick_mode = True
+         config.training.quick_samples = 1000
+         config.training.epochs = 5
+         config.training.batch_size = 32  # larger batches suit CPU training
+         config.data.max_samples = 1000
+         return config
+
+
+ # Default configuration instance
+ default_config = Config()
dataset.py ADDED
@@ -0,0 +1,300 @@
+ """
+ Dataset loading.
+ Supports the tatoeba and cveto datasets.
+ """
+
+ import os
+ import sys
+ import random
+ import psutil
+ from typing import List, Tuple, Optional, Dict, Any
+ from dataclasses import dataclass
+ import torch
+ from torch.utils.data import Dataset, DataLoader
+
+ from tokenizer import Tokenizer
+
+
+ def check_memory():
+     """Return the available system memory in GiB."""
+     mem = psutil.virtual_memory()
+     available_gb = mem.available / (1024**3)
+     return available_gb
+
+
+ @dataclass
+ class TranslationPair:
+     """A parallel sentence pair."""
+     zh: str
+     en: str
+
+
+ class TranslationDataset(Dataset):
+     """Translation dataset that tokenizes lazily on access (memory-friendly)."""
+
+     def __init__(
+         self,
+         pairs: List[TranslationPair],
+         zh_tokenizer: Tokenizer,
+         en_tokenizer: Tokenizer,
+         max_len: int = 128,
+         cache_tokenized: bool = True,
+     ):
+         self.pairs = pairs
+         self.zh_tokenizer = zh_tokenizer
+         self.en_tokenizer = en_tokenizer
+         self.max_len = max_len
+
+         # Small cache: holds at most 10% of the items, capped at 5000.
+         # It fills on first access and nothing is evicted afterwards.
+         self._cache: Dict[int, Dict[str, Any]] = {}
+         self._cache_size = min(5000, len(pairs) // 10) if cache_tokenized else 0
+
+         print(f" Dataset: {len(pairs)} pairs (lazy tokenization)")
+
+     def __len__(self) -> int:
+         return len(self.pairs)
+
+     def __getitem__(self, idx: int) -> Dict[str, Any]:
+         # Check the cache
+         if idx in self._cache:
+             return self._cache[idx]
+
+         # Tokenize and truncate to max_len
+         pair = self.pairs[idx]
+         zh_ids = self.zh_tokenizer.encode(pair.zh, add_sos=True, add_eos=True)[:self.max_len]
+         en_ids = self.en_tokenizer.encode(pair.en, add_sos=True, add_eos=True)[:self.max_len]
+
+         result = {
+             'zh_ids': torch.tensor(zh_ids, dtype=torch.long),
+             'en_ids': torch.tensor(en_ids, dtype=torch.long),
+             'zh_len': len(zh_ids),
+             'en_len': len(en_ids),
+             'zh_text': pair.zh,
+             'en_text': pair.en,
+         }
+
+         # Add to the cache while there is room
+         if len(self._cache) < self._cache_size:
+             self._cache[idx] = result
+
+         return result
+
+
+ def collate_fn(batch: List[Dict[str, Any]]) -> Dict[str, Any]:
+     """Collate function with dynamic padding."""
+     zh_ids_list = [item['zh_ids'] for item in batch]
+     en_ids_list = [item['en_ids'] for item in batch]
+
+     # Find the longest sequence on each side
+     max_zh_len = max(len(ids) for ids in zh_ids_list)
+     max_en_len = max(len(ids) for ids in en_ids_list)
+
+     # Padding (zero-filled, so this assumes <pad> has id 0)
+     zh_padded = torch.zeros(len(batch), max_zh_len, dtype=torch.long)
+     en_padded = torch.zeros(len(batch), max_en_len, dtype=torch.long)
+
+     zh_lens = []
+     en_lens = []
+
+     for i, (zh_ids, en_ids) in enumerate(zip(zh_ids_list, en_ids_list)):
+         zh_padded[i, :len(zh_ids)] = zh_ids
+         en_padded[i, :len(en_ids)] = en_ids
+         zh_lens.append(len(zh_ids))
+         en_lens.append(len(en_ids))
+
+     return {
+         'zh_ids': zh_padded,
+         'en_ids': en_padded,
+         'zh_lens': torch.tensor(zh_lens, dtype=torch.long),
+         'en_lens': torch.tensor(en_lens, dtype=torch.long),
+         'zh_texts': [item['zh_text'] for item in batch],
+         'en_texts': [item['en_text'] for item in batch],
+     }
+
+
+ def load_tatoeba(path: str, max_samples: Optional[int] = None) -> List[TranslationPair]:
+     """Load the tatoeba dataset.
+
+     Format: id\tChinese\tid\tEnglish
+     """
+     pairs = []
+     seen = set()
+
+     with open(path, 'r', encoding='utf-8') as f:
+         for line in f:
+             line = line.strip()
+             if not line:
+                 continue
+
+             parts = line.split('\t')
+             if len(parts) < 4:
+                 continue
+
+             zh = parts[1].strip()
+             en = parts[3].strip()
+
+             # Deduplicate
+             key = (zh, en)
+             if key in seen:
+                 continue
+             seen.add(key)
+
+             pairs.append(TranslationPair(zh=zh, en=en))
+
+             if max_samples and len(pairs) >= max_samples:
+                 break
+
+     return pairs
+
+
+ def load_cveto(zh_path: str, en_path: str, max_samples: Optional[int] = None) -> List[TranslationPair]:
+     """Load the cveto dataset.
+
+     Two files whose line numbers correspond.
+     """
+     pairs = []
+
+     # Count the total number of lines first
+     print(" Counting lines...", end="", flush=True)
+     with open(zh_path, 'r', encoding='utf-8') as f:
+         total_lines = sum(1 for _ in f)
+     print(f" {total_lines:,} lines")
+
+     print(" Reading data...", end="", flush=True)
+     last_print = 0
+     with open(zh_path, 'r', encoding='utf-8') as zh_f, \
+          open(en_path, 'r', encoding='utf-8') as en_f:
+
+         for i, (zh_line, en_line) in enumerate(zip(zh_f, en_f)):
+             zh = zh_line.strip()
+             en = en_line.strip()
+
+             if zh and en:
+                 pairs.append(TranslationPair(zh=zh, en=en))
+
+             # Print progress every 100,000 lines
+             if i - last_print >= 100000:
+                 print(f".{i//100000}", end="", flush=True)
+                 last_print = i
+
+             if max_samples and len(pairs) >= max_samples:
+                 break
+
+     print(f" done")
+     return pairs
+
+
+ def load_all_data(config) -> Tuple[List[TranslationPair], List[TranslationPair], List[TranslationPair]]:
+     """Load all data and return the training, validation, and test sets."""
+     print("Loading datasets...")
+
+     # Load tatoeba
+     tatoeba_path = config.data.tatoeba_path
+     if os.path.exists(tatoeba_path):
+         print(f" Loading tatoeba: {tatoeba_path}")
+         tatoeba_pairs = load_tatoeba(tatoeba_path, max_samples=config.data.max_samples)
+         print(f" Sentence pairs: {len(tatoeba_pairs)}")
+     else:
+         tatoeba_pairs = []
+         print(f" Warning: tatoeba path does not exist: {tatoeba_path}")
+
+     # Merge all data
+     all_pairs = tatoeba_pairs.copy()
+
+     # If more data is still needed, load cveto
+     if config.data.max_samples is None or len(all_pairs) < config.data.max_samples:
+         cveto_zh_path = config.data.cveto_zh_path
+         cveto_en_path = config.data.cveto_en_path
+
+         if os.path.exists(cveto_zh_path) and os.path.exists(cveto_en_path):
+             print(f" Loading cveto...")
+             remaining = None
+             if config.data.max_samples:
+                 remaining = config.data.max_samples - len(all_pairs)
+
+             cveto_pairs = load_cveto(cveto_zh_path, cveto_en_path, max_samples=remaining)
+             print(f" Sentence pairs: {len(cveto_pairs)}")
+             all_pairs.extend(cveto_pairs)
+
+     # Filter by length (measured in characters, not tokens)
+     print(f" Filtering data...", end="", flush=True)
+     filtered_pairs = []
+     total = len(all_pairs)
+     last_print = 0
+     for i, pair in enumerate(all_pairs):
+         zh_len = len(pair.zh)
+         en_len = len(pair.en)
+         if config.data.min_len <= zh_len <= config.data.max_len and \
+            config.data.min_len <= en_len <= config.data.max_len:
+             filtered_pairs.append(pair)
+
+         # Print progress every 100,000 pairs
+         if i - last_print >= 100000:
+             progress = (i + 1) / total * 100
+             print(f".{progress:.0f}%", end="", flush=True)
+             last_print = i
+
+     print(f" done")
+
+     print(f" Pairs after filtering: {len(filtered_pairs)}")
+
+     # Shuffle and split (the shuffle is unseeded, so the split differs between runs)
+     random.shuffle(filtered_pairs)
+     n = len(filtered_pairs)
+
+     # 80% train, 10% validation, 10% test
+     train_end = int(n * 0.8)
+     val_end = int(n * 0.9)
+
+     train_pairs = filtered_pairs[:train_end]
+     val_pairs = filtered_pairs[train_end:val_end]
+     test_pairs = filtered_pairs[val_end:]
+
+     print(f" Training set: {len(train_pairs)}")
+     print(f" Validation set: {len(val_pairs)}")
+     print(f" Test set: {len(test_pairs)}")
+
+     return train_pairs, val_pairs, test_pairs
+
+
+ def create_dataloaders(
+     train_pairs: List[TranslationPair],
+     val_pairs: List[TranslationPair],
+     zh_tokenizer: Tokenizer,
+     en_tokenizer: Tokenizer,
+     config,
+ ) -> Tuple[DataLoader, DataLoader]:
+     """Create the training and validation data loaders."""
+     train_dataset = TranslationDataset(
+         train_pairs,
+         zh_tokenizer,
+         en_tokenizer,
+         max_len=config.model.max_len,
+     )
+
+     val_dataset = TranslationDataset(
+         val_pairs,
+         zh_tokenizer,
+         en_tokenizer,
+         max_len=config.model.max_len,
+     )
+
+     train_loader = DataLoader(
+         train_dataset,
+         batch_size=config.training.batch_size,
+         shuffle=True,
+         collate_fn=collate_fn,
+         num_workers=0,  # no worker processes in the CPU setup
+         pin_memory=False,
+     )
+
+     val_loader = DataLoader(
+         val_dataset,
+         batch_size=config.training.batch_size,
+         shuffle=False,
+         collate_fn=collate_fn,
+         num_workers=0,
+         pin_memory=False,
+     )
+
+     return train_loader, val_loader
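The dynamic padding done by `collate_fn` in dataset.py is easy to check on plain lists. The helper below is a torch-free sketch of the same idea; the name `pad_batch` and the extra attention-style mask it returns are additions for illustration and are not part of the file above.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad variable-length id lists to the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    lengths = [len(seq) for seq in sequences]
    # 1 marks a real token, 0 marks padding.
    mask = [[1] * n + [0] * (max_len - n) for n in lengths]
    return padded, lengths, mask


padded, lengths, mask = pad_batch([[1, 6, 2], [1, 6, 7, 8, 2]])
print(padded)
print(lengths)
print(mask)
```

Padding to the longest sequence in each batch, rather than to `max_len`, keeps tensors small when most sentences are short. The stored lengths (or a mask like the one above) are what let later stages ignore the padded positions.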
diffusion.py ADDED
@@ -0,0 +1,290 @@
+ """
+ Diffusion core.
+ Implements forward and reverse diffusion, with DDIM acceleration.
+ """
+
+ import math
+ import torch
+ import torch.nn as nn
+ from typing import Tuple, Optional, List, Callable
+
+
+ class NoiseScheduler:
+     """Noise scheduler."""
+
+     def __init__(
+         self,
+         timesteps: int = 1000,
+         beta_start: float = 0.0001,
+         beta_end: float = 0.02,
+         schedule: str = "linear",
+     ):
+         self.timesteps = timesteps
+
+         # Compute the betas
+         if schedule == "linear":
+             self.betas = torch.linspace(beta_start, beta_end, timesteps)
+         elif schedule == "cosine":
+             # Cosine schedule
+             steps = timesteps + 1
+             x = torch.linspace(0, timesteps, steps)
+             alphas_cumprod = torch.cos(((x / timesteps) + 0.008) / 1.008 * math.pi * 0.5) ** 2
+             alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
+             self.betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
+             self.betas = torch.clip(self.betas, 0.0001, 0.9999)
+         else:
+             # Unknown schedule names fall back to the linear schedule
+             self.betas = torch.linspace(beta_start, beta_end, timesteps)
+
+         # Compute the alphas
+         self.alphas = 1.0 - self.betas
+         self.alphas_cumprod = torch.cumprod(self.alphas, dim=0)
+         self.alphas_cumprod_prev = torch.cat([torch.tensor([1.0]), self.alphas_cumprod[:-1]])
+
+         # Forward-diffusion coefficients
+         self.sqrt_alphas_cumprod = torch.sqrt(self.alphas_cumprod)
+         self.sqrt_one_minus_alphas_cumprod = torch.sqrt(1.0 - self.alphas_cumprod)
+
+         # Reverse-diffusion coefficients
+         self.sqrt_recip_alphas = torch.sqrt(1.0 / self.alphas)
+         self.posterior_variance = self.betas * (1.0 - self.alphas_cumprod_prev) / (1.0 - self.alphas_cumprod)
+
+     def to(self, device: torch.device) -> "NoiseScheduler":
+         """Move all schedule tensors to the given device."""
+         self.betas = self.betas.to(device)
+         self.alphas = self.alphas.to(device)
+         self.alphas_cumprod = self.alphas_cumprod.to(device)
+         self.alphas_cumprod_prev = self.alphas_cumprod_prev.to(device)
+         self.sqrt_alphas_cumprod = self.sqrt_alphas_cumprod.to(device)
+         self.sqrt_one_minus_alphas_cumprod = self.sqrt_one_minus_alphas_cumprod.to(device)
+         self.sqrt_recip_alphas = self.sqrt_recip_alphas.to(device)
+         self.posterior_variance = self.posterior_variance.to(device)
+         return self
+
+
+ class DiffusionProcess:
+     """Diffusion process."""
+
+     def __init__(self, scheduler: NoiseScheduler):
+         self.scheduler = scheduler
+         self.timesteps = scheduler.timesteps
+
+     def q_sample(
+         self,
+         x_0: torch.Tensor,
+         t: torch.Tensor,
+         noise: Optional[torch.Tensor] = None,
+     ) -> Tuple[torch.Tensor, torch.Tensor]:
+         """Forward diffusion: sample x_t from x_0.
+
+         Args:
+             x_0: initial embeddings [batch, seq_len, d_model]
+             t: timesteps [batch]
+             noise: optional noise to use
+
+         Returns:
+             x_t: the noised embeddings
+             noise: the noise that was used
+         """
+         if noise is None:
+             noise = torch.randn_like(x_0)
+
+         # Look up the coefficients
+         sqrt_alpha = self.scheduler.sqrt_alphas_cumprod[t]
+         sqrt_one_minus_alpha = self.scheduler.sqrt_one_minus_alphas_cumprod[t]
+
+         # Reshape so they broadcast over the sequence and feature axes
+         sqrt_alpha = sqrt_alpha.view(-1, 1, 1)
+         sqrt_one_minus_alpha = sqrt_one_minus_alpha.view(-1, 1, 1)
+
+         # Add noise
+         x_t = sqrt_alpha * x_0 + sqrt_one_minus_alpha * noise
+
+         return x_t, noise
+
+     def p_sample(
+         self,
+         x_t: torch.Tensor,
+         t: torch.Tensor,
+         predicted_noise: torch.Tensor,
+     ) -> torch.Tensor:
+         """Reverse diffusion: sample x_{t-1} from x_t.
+
+         Args:
+             x_t: current noisy state [batch, seq_len, d_model]
+             t: current timesteps [batch]
+             predicted_noise: the predicted noise
+
+         Returns:
+             x_{t-1}
+         """
+         # Look up the coefficients
+         sqrt_recip_alpha = self.scheduler.sqrt_recip_alphas[t]
+         sqrt_one_minus_alpha = self.scheduler.sqrt_one_minus_alphas_cumprod[t]
+         beta = self.scheduler.betas[t]
+
+         # Reshape for broadcasting
+         sqrt_recip_alpha = sqrt_recip_alpha.view(-1, 1, 1)
+         sqrt_one_minus_alpha = sqrt_one_minus_alpha.view(-1, 1, 1)
+         beta = beta.view(-1, 1, 1)
+
+         # Compute the posterior mean
+         mean = sqrt_recip_alpha * (x_t - beta * predicted_noise / sqrt_one_minus_alpha)
+
+         # Add noise to every sample except those at t = 0.
+         # The mask is per sample, so a batch may mix different timesteps.
+         posterior_var = self.scheduler.posterior_variance[t].view(-1, 1, 1)
+         noise = torch.randn_like(x_t)
+         nonzero_mask = (t > 0).to(x_t.dtype).view(-1, 1, 1)
+         x_t_minus_1 = mean + nonzero_mask * torch.sqrt(posterior_var) * noise
+
+         return x_t_minus_1
+
+     def q_sample_full(
+         self,
+         x_0: torch.Tensor,
+         target_len: Optional[int] = None,
+     ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
+         """Full forward diffusion to (nearly) pure noise.
+
+         Args:
+             x_0: initial embeddings
+             target_len: target length for variable-length sequences (currently unused)
+
+         Returns:
+             x_T: the state at the last timestep
+             noise: the single noise tensor that was used
+             t: the final timestep [batch]
+         """
+         batch_size = x_0.size(0)
+         t = torch.full((batch_size,), self.timesteps - 1, dtype=torch.long, device=x_0.device)
+
+         noise = torch.randn_like(x_0)
+         x_T, _ = self.q_sample(x_0, t, noise)
+
+         return x_T, noise, t
+
+
+ class DDIMSampler:
+     """DDIM sampler for faster inference."""
+
+     def __init__(self, scheduler: NoiseScheduler, ddim_steps: int = 50):
+         self.scheduler = scheduler
+         self.timesteps = scheduler.timesteps
+         self.ddim_steps = ddim_steps
+
+         # Compute the DDIM timesteps
+         self.ddim_timesteps = self._get_ddim_timesteps()
+
+     def _get_ddim_timesteps(self) -> List[int]:
+         """Return the timesteps used for DDIM sampling, noisiest first."""
+         # e.g. timesteps=1000, ddim_steps=50 -> [980, 960, ..., 20, 0]
+         c = self.timesteps // self.ddim_steps
+         ddim_timesteps = [i * c for i in range(self.ddim_steps)]
+         ddim_timesteps = list(reversed(ddim_timesteps))
+         return ddim_timesteps
+
186
+ def ddim_step(
187
+ self,
188
+ x_t: torch.Tensor,
189
+ t: int,
190
+ t_prev: int,
191
+ predicted_noise: torch.Tensor,
192
+ eta: float = 0.0,
193
+ ) -> torch.Tensor:
194
+ """DDIM单步采样
195
+
196
+ Args:
197
+ x_t: 当前状态
198
+ t: 当前时间步
199
+ t_prev: 前一时间步
200
+ predicted_noise: 预测的噪声
201
+ eta: 随机性参数 (0=deterministic, 1=DDPM)
202
+
203
+ Returns:
204
+ x_{t-1}
205
+ """
206
+ device = x_t.device
207
+ batch_size = x_t.size(0)
208
+
209
+ # 获取alpha
210
+ alpha_t = self.scheduler.alphas_cumprod[t]
211
+ alpha_t_prev = self.scheduler.alphas_cumprod[t_prev] if t_prev >= 0 else torch.tensor(1.0).to(device)
212
+
213
+ # 预测x_0
214
+ sqrt_alpha_t = torch.sqrt(alpha_t)
215
+ sqrt_one_minus_alpha_t = torch.sqrt(1 - alpha_t)
216
+
217
+ sqrt_alpha_t = sqrt_alpha_t.view(1, 1, 1)
218
+ sqrt_one_minus_alpha_t = sqrt_one_minus_alpha_t.view(1, 1, 1)
219
+
220
+ pred_x0 = (x_t - sqrt_one_minus_alpha_t * predicted_noise) / sqrt_alpha_t
221
+
222
+ # 计算方差
223
+ sigma = eta * torch.sqrt(
224
+ (1 - alpha_t_prev) / (1 - alpha_t) * (1 - alpha_t / alpha_t_prev)
225
+ )
226
+
227
+ # 计算方向指向x_t
228
+ sqrt_one_minus_alpha_t_prev = torch.sqrt(1 - alpha_t_prev - sigma ** 2)
229
+ sqrt_one_minus_alpha_t_prev = sqrt_one_minus_alpha_t_prev.view(1, 1, 1)
230
+
231
+ # 计算均值
232
+ sqrt_alpha_t_prev = torch.sqrt(alpha_t_prev).view(1, 1, 1)
233
+ mean = sqrt_alpha_t_prev * pred_x0 + sqrt_one_minus_alpha_t_prev * predicted_noise
234
+
235
+ # 添加噪声
236
+ if eta > 0:
237
+ noise = torch.randn_like(x_t)
238
+ x_t_prev = mean + sigma.view(1, 1, 1) * noise
239
+ else:
240
+ x_t_prev = mean
241
+
242
+ return x_t_prev
243
+
244
+ def sample(
245
+ self,
246
+ x_T: torch.Tensor,
247
+ predict_noise_fn: Callable,
248
+ callback: Optional[Callable] = None,
249
+ ) -> torch.Tensor:
250
+ """完整DDIM采样
251
+
252
+ Args:
253
+ x_T: 纯噪声
254
+ predict_noise_fn: 噪声预测函数 (x_t, t) -> noise
255
+ callback: 回调函数,用于可视化
256
+
257
+ Returns:
258
+ x_0
259
+ """
260
+ x_t = x_T
261
+
262
+ for i, t in enumerate(self.ddim_timesteps[:-1]):
263
+ t_prev = self.ddim_timesteps[i + 1]
264
+
265
+ # 预测噪声
266
+ t_tensor = torch.full((x_t.size(0),), t, dtype=torch.long, device=x_t.device)
267
+ predicted_noise = predict_noise_fn(x_t, t_tensor)
268
+
269
+ # DDIM步骤
270
+ x_t = self.ddim_step(x_t, t, t_prev, predicted_noise, eta=0.0)
271
+
272
+ # 回调
273
+ if callback:
274
+ callback(t, x_t)
275
+
276
+ return x_t
277
+
278
+
279
+ def get_diffusion(config) -> Tuple[DiffusionProcess, DDIMSampler]:
280
+ """创建扩散过程和采样器"""
281
+ scheduler = NoiseScheduler(
282
+ timesteps=config.diffusion.timesteps,
283
+ beta_start=config.diffusion.beta_start,
284
+ beta_end=config.diffusion.beta_end,
285
+ )
286
+
287
+ diffusion = DiffusionProcess(scheduler)
288
+ ddim_sampler = DDIMSampler(scheduler, ddim_steps=config.diffusion.ddim_steps)
289
+
290
+ return diffusion, ddim_sampler
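The timestep subsequence and the per-step coefficients used by `DDIMSampler` can be checked without PyTorch. The following is a plain-Python sketch, not the repository's code. It assumes a linear beta schedule between `beta_start` and `beta_end`, because the `NoiseScheduler` definition is not part of this hunk. The helper names `linear_alphas_cumprod`, `ddim_timesteps`, and `ddim_coefficients` are hypothetical.

```python
import math
from typing import List, Tuple


def linear_alphas_cumprod(timesteps: int, beta_start: float, beta_end: float) -> List[float]:
    """Cumulative product of (1 - beta_t) for a linear beta schedule (an assumption)."""
    out, prod = [], 1.0
    for i in range(timesteps):
        beta = beta_start + (beta_end - beta_start) * i / (timesteps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def ddim_timesteps(timesteps: int, ddim_steps: int) -> List[int]:
    """Same rule as DDIMSampler._get_ddim_timesteps: stride c, descending order."""
    c = timesteps // ddim_steps
    return list(reversed([i * c for i in range(ddim_steps)]))


def ddim_coefficients(alpha_t: float, alpha_prev: float, eta: float = 0.0) -> Tuple[float, float, float]:
    """Return (sqrt(alpha_prev), direction coefficient, sigma) for one DDIM step."""
    sigma = eta * math.sqrt((1 - alpha_prev) / (1 - alpha_t) * (1 - alpha_t / alpha_prev))
    # Clamp at zero to guard against tiny negative values from rounding
    direction = math.sqrt(max(1 - alpha_prev - sigma ** 2, 0.0))
    return math.sqrt(alpha_prev), direction, sigma


if __name__ == "__main__":
    steps = ddim_timesteps(1000, 50)
    abar = linear_alphas_cumprod(1000, 1e-4, 0.02)
    print(steps[:3], steps[-1])
    print(ddim_coefficients(abar[steps[0]], abar[steps[1]]))
```

With 1000 training steps and 50 DDIM steps the stride is 20, so sampling starts at t = 980 rather than t = 999 and the last visited timestep is t = 0.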
embedding.py ADDED
@@ -0,0 +1,203 @@
+ """
+ Embedding layers.
+ Language-specific embeddings with positional and length encodings.
+ """
+
+ import math
+ import torch
+ import torch.nn as nn
+ from typing import Optional
+
+
+ class PositionalEncoding(nn.Module):
+     """Sinusoidal positional encoding."""
+
+     def __init__(self, d_model: int, max_len: int = 128, dropout: float = 0.1):
+         super().__init__()
+         self.dropout = nn.Dropout(p=dropout)
+
+         # Precompute the positional encoding table
+         pe = torch.zeros(max_len, d_model)
+         position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
+         div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
+
+         pe[:, 0::2] = torch.sin(position * div_term)
+         pe[:, 1::2] = torch.cos(position * div_term)
+
+         pe = pe.unsqueeze(0)  # [1, max_len, d_model]
+         self.register_buffer('pe', pe)
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         """
+         x: [batch, seq_len, d_model]
+         """
+         x = x + self.pe[:, :x.size(1), :]
+         return self.dropout(x)
+
+
+ class SinusoidalTimeEmbedding(nn.Module):
+     """Sinusoidal embedding of the diffusion timestep."""
+
+     def __init__(self, d_model: int):
+         super().__init__()
+         self.d_model = d_model
+
+     def forward(self, t: torch.Tensor) -> torch.Tensor:
+         """
+         t: [batch] timesteps in the range [0, T]
+         Returns: [batch, d_model]
+         """
+         # Add a trailing dimension; the raw timestep is used without normalization
+         t = t.float().unsqueeze(-1)  # [batch, 1]
+
+         half_dim = self.d_model // 2
+         emb = math.log(10000) / (half_dim - 1)
+         emb = torch.exp(torch.arange(half_dim, device=t.device) * -emb)
+         emb = t * emb.unsqueeze(0)  # [batch, half_dim]
+         emb = torch.cat([torch.sin(emb), torch.cos(emb)], dim=-1)
+
+         return emb
+
+
+ class LanguageEmbedding(nn.Module):
+     """Language-specific embedding layer."""
+
+     def __init__(
+         self,
+         vocab_size: int,
+         d_model: int,
+         max_len: int = 128,
+         dropout: float = 0.1,
+     ):
+         super().__init__()
+
+         self.d_model = d_model
+
+         # Token embedding
+         self.token_embedding = nn.Embedding(vocab_size, d_model)
+
+         # Positional encoding
+         self.position_encoding = PositionalEncoding(d_model, max_len, dropout)
+
+         # Length embedding (for variable-length sequences)
+         self.length_embedding = nn.Embedding(max_len + 1, d_model)
+
+         # Scale factor applied to token embeddings
+         self.scale = math.sqrt(d_model)
+
+         # Weight initialization
+         nn.init.normal_(self.token_embedding.weight, mean=0.0, std=0.02)
+         nn.init.normal_(self.length_embedding.weight, mean=0.0, std=0.02)
+
+     def forward(
+         self,
+         token_ids: torch.Tensor,
+         lengths: Optional[torch.Tensor] = None,
+     ) -> torch.Tensor:
+         """
+         token_ids: [batch, seq_len]
+         lengths: [batch], optional, actual sequence lengths
+         Returns: [batch, seq_len, d_model]
+         """
+         # Token embedding
+         x = self.token_embedding(token_ids) * self.scale
+
+         # Positional encoding
+         x = self.position_encoding(x)
+
+         # Length embedding
+         if lengths is not None:
+             # Add the same length vector to every position
+             len_emb = self.length_embedding(lengths)  # [batch, d_model]
+             x = x + len_emb.unsqueeze(1)  # broadcast over seq_len
+
+         return x
+
+     def embed_noise(self, shape: tuple, device: torch.device) -> torch.Tensor:
+         """Generate a pure-noise embedding.
+
+         shape: (batch, seq_len, d_model)
+         """
+         return torch.randn(shape, device=device)
+
+
+ class DualLanguageEmbedding(nn.Module):
+     """Bilingual embedding layer holding the Chinese and English embeddings."""
+
+     def __init__(
+         self,
+         vocab_size_zh: int,
+         vocab_size_en: int,
+         d_model: int,
+         max_len: int = 128,
+         dropout: float = 0.1,
+     ):
+         super().__init__()
+
+         self.d_model = d_model
+
+         self.zh_embedding = LanguageEmbedding(vocab_size_zh, d_model, max_len, dropout)
+         self.en_embedding = LanguageEmbedding(vocab_size_en, d_model, max_len, dropout)
+
+     def forward(
+         self,
+         token_ids: torch.Tensor,
+         lang: str,
+         lengths: Optional[torch.Tensor] = None,
+     ) -> torch.Tensor:
+         """
+         lang: 'zh' or 'en'
+         """
+         if lang == 'zh':
+             return self.zh_embedding(token_ids, lengths)
+         else:
+             return self.en_embedding(token_ids, lengths)
+
+     def embed_tokens(
+         self,
+         zh_ids: Optional[torch.Tensor] = None,
+         en_ids: Optional[torch.Tensor] = None,
+         zh_lens: Optional[torch.Tensor] = None,
+         en_lens: Optional[torch.Tensor] = None,
+     ) -> tuple:
+         """Embed Chinese and English inputs in one call; a missing input yields None."""
+         zh_emb = None
+         en_emb = None
+
+         if zh_ids is not None:
+             zh_emb = self.zh_embedding(zh_ids, zh_lens)
+         if en_ids is not None:
+             en_emb = self.en_embedding(en_ids, en_lens)
+
+         return zh_emb, en_emb
+
+
+ class OutputProjection(nn.Module):
+     """Output projection that maps hidden states back to vocabulary space."""
+
+     def __init__(self, d_model: int, vocab_size: int):
+         super().__init__()
+         self.projection = nn.Linear(d_model, vocab_size, bias=False)
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         """
+         x: [batch, seq_len, d_model]
+         Returns: [batch, seq_len, vocab_size] logits
+         """
+         return self.projection(x)
+
+
+ class DualOutputProjection(nn.Module):
+     """Bilingual output projection layer."""
+
+     def __init__(self, d_model: int, vocab_size_zh: int, vocab_size_en: int):
+         super().__init__()
+
+         self.zh_projection = OutputProjection(d_model, vocab_size_zh)
+         self.en_projection = OutputProjection(d_model, vocab_size_en)
+
+     def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
+         if lang == 'zh':
+             return self.zh_projection(x)
+         else:
+             return self.en_projection(x)
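The frequency layout in `SinusoidalTimeEmbedding.forward` is easier to inspect on a single scalar timestep. The sketch below mirrors the same arithmetic in plain Python. It is an illustration only, and the function name `time_embedding` is hypothetical.

```python
import math
from typing import List


def time_embedding(t: float, d_model: int) -> List[float]:
    """Plain-Python mirror of SinusoidalTimeEmbedding.forward for one timestep."""
    half_dim = d_model // 2
    # Frequencies fall geometrically from 1 down to 1/10000 across half_dim entries
    step = math.log(10000) / (half_dim - 1)
    freqs = [math.exp(-step * i) for i in range(half_dim)]
    angles = [t * f for f in freqs]
    # All sines first, then all cosines (PositionalEncoding interleaves them instead)
    return [math.sin(a) for a in angles] + [math.cos(a) for a in angles]


if __name__ == "__main__":
    print(time_embedding(0, 8))
    print(len(time_embedding(5, 8)), len(time_embedding(5, 7)))
```

The output has `2 * (d_model // 2)` entries, so an odd `d_model` produces one entry fewer than requested, and `d_model = 2` divides by zero in the `half_dim - 1` term. An even `d_model` of at least 4, such as the default 256 used by this project, avoids both cases.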
export_onnx.py ADDED
@@ -0,0 +1,245 @@
1
+ """
2
+ Export the model as JSON for WebGPU inference.
3
+ """
4
+
5
+ import os
6
+ import json
7
+ import argparse
8
+ import torch
9
+ import torch.nn as nn
10
+ import numpy as np
11
+ from typing import Dict, Any, List
12
+
13
+ from config import Config
14
+ from tokenizer import Tokenizer
15
+ from embedding import DualLanguageEmbedding, DualOutputProjection
16
+ from model import create_model
17
+ from diffusion import get_diffusion
18
+
19
+
20
+ def tensor_to_list(t) -> list:
21
+ """将tensor转换为list"""
22
+ if isinstance(t, torch.Tensor):
23
+ return t.detach().cpu().numpy().tolist()
24
+ return t
25
+
26
+
27
+ def export_model(config: Config, checkpoint_path: str, output_dir: str):
28
+ """导出模型为JSON格式"""
29
+
30
+ print(f"加载检查点: {checkpoint_path}")
31
+ state = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
32
+
33
+ # 加载分词器
34
+ cache_dir = os.path.join(config.project_dir, config.data.cache_dir)
35
+ zh_tokenizer = Tokenizer.load(os.path.join(cache_dir, "tokenizer_zh.json"))
36
+ en_tokenizer = Tokenizer.load(os.path.join(cache_dir, "tokenizer_en.json"))
37
+
38
+ # 创建模型
39
+ embedding = DualLanguageEmbedding(
40
+ vocab_size_zh=zh_tokenizer.vocab_size_actual,
41
+ vocab_size_en=en_tokenizer.vocab_size_actual,
42
+ d_model=config.model.d_model,
43
+ max_len=config.model.max_len,
44
+ dropout=0.0,
45
+ )
46
+
47
+ output_proj = DualOutputProjection(
48
+ d_model=config.model.d_model,
49
+ vocab_size_zh=zh_tokenizer.vocab_size_actual,
50
+ vocab_size_en=en_tokenizer.vocab_size_actual,
51
+ )
52
+
53
+ model = create_model(config)
54
+
55
+ # 加载权重
56
+ embedding.load_state_dict(state['embedding'])
57
+ output_proj.load_state_dict(state['output_proj'])
58
+ model.load_state_dict(state['model'])
59
+
60
+ embedding.eval()
61
+ output_proj.eval()
62
+ model.eval()
63
+
64
+ # 创建输出目录
65
+ os.makedirs(output_dir, exist_ok=True)
66
+
67
+ # 导出扩散参数
68
+ diffusion, ddim_sampler = get_diffusion(config)
69
+ scheduler = diffusion.scheduler
70
+
71
+ diffusion_params = {
72
+ 'timesteps': config.diffusion.timesteps,
73
+ 'ddim_steps': config.diffusion.ddim_steps,
74
+ 'betas': tensor_to_list(scheduler.betas),
75
+ 'alphas': tensor_to_list(scheduler.alphas),
76
+ 'alphas_cumprod': tensor_to_list(scheduler.alphas_cumprod),
77
+ 'sqrt_alphas_cumprod': tensor_to_list(scheduler.sqrt_alphas_cumprod),
78
+ 'sqrt_one_minus_alphas_cumprod': tensor_to_list(scheduler.sqrt_one_minus_alphas_cumprod),
79
+ 'ddim_timesteps': ddim_sampler.ddim_timesteps,
80
+ }
81
+
82
+ with open(os.path.join(output_dir, 'diffusion_params.json'), 'w') as f:
83
+ json.dump(diffusion_params, f)
84
+
85
+ print("导出扩散参数完成")
86
+
87
+ # 导出分词器
88
+ zh_vocab = {
89
+ 'token_to_id': zh_tokenizer.token_to_id,
90
+ 'id_to_token': {str(k): v for k, v in zh_tokenizer.id_to_token.items()},
91
+ 'merges': zh_tokenizer.merges,
92
+ 'special_tokens': zh_tokenizer.special_tokens,
93
+ 'lang': 'zh',
94
+ }
95
+
96
+ en_vocab = {
97
+ 'token_to_id': en_tokenizer.token_to_id,
98
+ 'id_to_token': {str(k): v for k, v in en_tokenizer.id_to_token.items()},
99
+ 'merges': en_tokenizer.merges,
100
+ 'special_tokens': en_tokenizer.special_tokens,
101
+ 'lang': 'en',
102
+ }
103
+
104
+ with open(os.path.join(output_dir, 'tokenizer_zh.json'), 'w', encoding='utf-8') as f:
105
+ json.dump(zh_vocab, f, ensure_ascii=False)
106
+
107
+ with open(os.path.join(output_dir, 'tokenizer_en.json'), 'w', encoding='utf-8') as f:
108
+ json.dump(en_vocab, f, ensure_ascii=False)
109
+
110
+ print("导出分词器完成")
111
+
112
+ # 导出嵌入层权重为JSON
113
+ def extract_embedding_weights(lang_emb):
114
+ """提取嵌入层权重"""
115
+ return {
116
+ 'token_embedding': tensor_to_list(lang_emb.token_embedding.weight),
117
+ 'position_encoding': tensor_to_list(lang_emb.position_encoding.pe),
118
+ 'length_embedding': tensor_to_list(lang_emb.length_embedding.weight),
119
+ 'scale': lang_emb.scale,
120
+ }
121
+
122
+ embedding_weights = {
123
+ 'zh': extract_embedding_weights(embedding.zh_embedding),
124
+ 'en': extract_embedding_weights(embedding.en_embedding),
125
+ }
126
+
127
+ with open(os.path.join(output_dir, 'embedding.json'), 'w') as f:
128
+ json.dump(embedding_weights, f)
129
+
130
+ print("导出嵌入层完成")
131
+
132
+ # 导出输出投影权重
133
+ output_weights = {
134
+ 'zh_projection': tensor_to_list(output_proj.zh_projection.projection.weight),
135
+ 'en_projection': tensor_to_list(output_proj.en_projection.projection.weight),
136
+ }
137
+
138
+ with open(os.path.join(output_dir, 'output_proj.json'), 'w') as f:
139
+ json.dump(output_weights, f)
140
+
141
+ print("导出输出投影完成")
142
+
143
+ # 导出噪声预测模型权重
144
+ def extract_model_weights(model):
145
+ """提取模型权重"""
146
+ weights = {}
147
+
148
+ # 时间嵌入
149
+ weights['time_mlp'] = {
150
+ '0.weight': tensor_to_list(model.time_mlp[0].weight),
151
+ '0.bias': tensor_to_list(model.time_mlp[0].bias),
152
+ '2.weight': tensor_to_list(model.time_mlp[2].weight),
153
+ '2.bias': tensor_to_list(model.time_mlp[2].bias),
154
+ }
155
+
156
+ # 语言特定投影
157
+ weights['zh_input_proj'] = {
158
+ 'weight': tensor_to_list(model.zh_input_proj.weight),
159
+ 'bias': tensor_to_list(model.zh_input_proj.bias),
160
+ }
161
+ weights['en_input_proj'] = {
162
+ 'weight': tensor_to_list(model.en_input_proj.weight),
163
+ 'bias': tensor_to_list(model.en_input_proj.bias),
164
+ }
165
+ weights['zh_output_proj'] = {
166
+ 'weight': tensor_to_list(model.zh_output_proj.weight),
167
+ 'bias': tensor_to_list(model.zh_output_proj.bias),
168
+ }
169
+ weights['en_output_proj'] = {
170
+ 'weight': tensor_to_list(model.en_output_proj.weight),
171
+ 'bias': tensor_to_list(model.en_output_proj.bias),
172
+ }
173
+
174
+ # 输出归一化
175
+ weights['output_norm'] = {
176
+ 'weight': tensor_to_list(model.output_norm.weight),
177
+ 'bias': tensor_to_list(model.output_norm.bias),
178
+ }
179
+
180
+ # Transformer层
181
+ weights['layers'] = []
182
+ for i, layer in enumerate(model.layers):
183
+ layer_weights = {
184
+ # 自注意力
185
+ 'w_q.weight': tensor_to_list(layer.attn.w_q.weight),
186
+ 'w_q.bias': tensor_to_list(layer.attn.w_q.bias),
187
+ 'w_k.weight': tensor_to_list(layer.attn.w_k.weight),
188
+ 'w_k.bias': tensor_to_list(layer.attn.w_k.bias),
189
+ 'w_v.weight': tensor_to_list(layer.attn.w_v.weight),
190
+ 'w_v.bias': tensor_to_list(layer.attn.w_v.bias),
191
+ 'w_o.weight': tensor_to_list(layer.attn.w_o.weight),
192
+ 'w_o.bias': tensor_to_list(layer.attn.w_o.bias),
193
+ # 前馈网络
194
+ 'w1.weight': tensor_to_list(layer.ff.w1.weight),
195
+ 'w1.bias': tensor_to_list(layer.ff.w1.bias),
196
+ 'w2.weight': tensor_to_list(layer.ff.w2.weight),
197
+ 'w2.bias': tensor_to_list(layer.ff.w2.bias),
198
+ # LayerNorm
199
+ 'norm1.weight': tensor_to_list(layer.norm1.weight),
200
+ 'norm1.bias': tensor_to_list(layer.norm1.bias),
201
+ 'norm2.weight': tensor_to_list(layer.norm2.weight),
202
+ 'norm2.bias': tensor_to_list(layer.norm2.bias),
203
+ }
204
+ weights['layers'].append(layer_weights)
205
+
206
+ return weights
207
+
208
+ model_weights = extract_model_weights(model)
209
+
210
+ with open(os.path.join(output_dir, 'model.json'), 'w') as f:
211
+ json.dump(model_weights, f)
212
+
213
+ print("导出模型权重完成")
214
+
215
+ # 导出配置
216
+ config_dict = {
217
+ 'd_model': config.model.d_model,
218
+ 'n_heads': config.model.n_heads,
219
+ 'n_layers': config.model.n_layers,
220
+ 'd_ff': config.model.d_ff,
221
+ 'max_len': config.model.max_len,
222
+ 'vocab_size_zh': zh_tokenizer.vocab_size_actual,
223
+ 'vocab_size_en': en_tokenizer.vocab_size_actual,
224
+ }
225
+
226
+ with open(os.path.join(output_dir, 'config.json'), 'w') as f:
227
+ json.dump(config_dict, f)
228
+
229
+ print(f"\n导出完成! 文件保存在: {output_dir}")
230
+ print("文件列表:")
231
+ for f in os.listdir(output_dir):
232
+ path = os.path.join(output_dir, f)
233
+ size = os.path.getsize(path) / 1024 / 1024
234
+ print(f" {f}: {size:.2f} MB")
235
+
236
+
237
+ if __name__ == "__main__":
238
+ parser = argparse.ArgumentParser(description="导出模型为JSON格式")
239
+ parser.add_argument("--checkpoint", type=str, default="checkpoints/best.pt", help="检查点路径")
240
+ parser.add_argument("--output", type=str, default="web/models", help="输出目录")
241
+
242
+ args = parser.parse_args()
243
+
244
+ config = Config()
245
+ export_model(config, args.checkpoint, args.output)
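A consumer of the exported files may want to confirm that `diffusion_params.json` is internally consistent before sampling. The sketch below is one way to do that with the standard library. It assumes only the keys written by `export_model` above, and the helper name `check_diffusion_params` is hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Dict


def check_diffusion_params(path: str) -> Dict[str, Any]:
    """Load diffusion_params.json and verify that the schedule arrays are consistent."""
    with open(path, "r", encoding="utf-8") as f:
        params = json.load(f)

    n = params["timesteps"]
    # Every per-timestep array must have exactly one entry per training timestep
    for key in ("betas", "alphas", "alphas_cumprod",
                "sqrt_alphas_cumprod", "sqrt_one_minus_alphas_cumprod"):
        if len(params[key]) != n:
            raise ValueError(f"{key} has {len(params[key])} entries, expected {n}")

    # The DDIM subsequence must match ddim_steps and index into the arrays above
    if len(params["ddim_timesteps"]) != params["ddim_steps"]:
        raise ValueError("ddim_timesteps length does not match ddim_steps")
    if any(not 0 <= t < n for t in params["ddim_timesteps"]):
        raise ValueError("ddim_timesteps contains an out-of-range index")
    return params
```

A call such as `check_diffusion_params("web/models/diffusion_params.json")` would use the script's default output directory. The function raises `ValueError` on the first inconsistency and otherwise returns the parsed dictionary.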
hfspace/README.md ADDED
@@ -0,0 +1,28 @@
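+ ---
+ title: Diffutslator
+ emoji: 🌐
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 4.44.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ # Diffutslator: Diffusion Translator
+
+ A machine translation system based on a diffusion model. While translating, it visualizes how the language embedding space changes step by step.
+
+ ## Features
+
+ - Chinese-English translation in both directions
+ - Adjustable number of DDIM inference steps
+ - Visualization of the diffusion process
+
+ ## Usage
+
+ 1. Enter the text to translate
+ 2. Choose the translation direction (or use auto-detection)
+ 3. Adjust the number of DDIM steps (more steps give higher quality but run slower)
+ 4. Click Translate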
hfspace/__pycache__/app.cpython-312.pyc ADDED
Binary file (48.1 kB).
hfspace/app.py ADDED
@@ -0,0 +1,889 @@
1
+ """
2
+ Diffutslator Hugging Face Space app
3
+ Demo of machine translation based on a diffusion model
4
+ """
5
+
6
+ import os
7
+ import sys
8
+ import torch
9
+ import torch.nn as nn
10
+ import torch.nn.functional as F
11
+ import math
12
+ import gradio as gr
13
+ from typing import Optional, Tuple, List
14
+ from dataclasses import dataclass, field
15
+ import json
16
+
17
+
18
+ # ==================== Config (kept in sync with config.py; needed to load checkpoints) ====================
19
+ @dataclass
20
+ class ModelConfig:
21
+ d_model: int = 256
22
+ n_heads: int = 4
23
+ n_layers: int = 4
24
+ d_ff: int = 512
25
+ max_len: int = 128
26
+ dropout: float = 0.1
27
+ vocab_size_zh: int = 8000
28
+ vocab_size_en: int = 8000
29
+ pad_token: str = "<pad>"
30
+ sos_token: str = "<sos>"
31
+ eos_token: str = "<eos>"
32
+ unk_token: str = "<unk>"
33
+ mask_token: str = "<mask>"
34
+
35
+
36
+ @dataclass
37
+ class DiffusionConfig:
38
+ timesteps: int = 1000
39
+ ddim_steps: int = 50
40
+ beta_start: float = 0.0001
41
+ beta_end: float = 0.02
42
+ length_noise_scale: float = 0.3
43
+
44
+
45
+ @dataclass
46
+ class TrainingConfig:
47
+ batch_size: int = 64
48
+ gradient_accumulation: int = 1
49
+ learning_rate: float = 1e-4
50
+ weight_decay: float = 0.01
51
+ warmup_steps: int = 500
52
+ epochs: int = 10
53
+ save_every: int = 1
54
+ eval_every: int = 100
55
+ quick_mode: bool = False
56
+ quick_samples: int = 1000
57
+ checkpoint_dir: str = "checkpoints"
58
+ resume: Optional[str] = None
59
+
60
+
61
+ @dataclass
62
+ class DataConfig:
63
+ tatoeba_path: str = ""
64
+ cveto_zh_path: str = ""
65
+ cveto_en_path: str = ""
66
+ max_samples: Optional[int] = None
67
+ min_len: int = 2
68
+ max_len: int = 128
69
+ use_cache: bool = True
70
+ cache_dir: str = ".cache"
71
+
72
+
73
+ @dataclass
74
+ class Config:
75
+ model: ModelConfig = field(default_factory=ModelConfig)
76
+ diffusion: DiffusionConfig = field(default_factory=DiffusionConfig)
77
+ training: TrainingConfig = field(default_factory=TrainingConfig)
78
+ data: DataConfig = field(default_factory=DataConfig)
79
+ project_dir: str = ""
80
+
81
+
82
+ # Stand-in for the config module, so checkpoints that reference config.* classes can be deserialized
83
+ class _FakeConfigModule:
84
+ Config = Config
85
+ ModelConfig = ModelConfig
86
+ DiffusionConfig = DiffusionConfig
87
+ TrainingConfig = TrainingConfig
88
+ DataConfig = DataConfig
89
+
90
+
91
+ # Register the stand-in in sys.modules under the name 'config'
92
+ sys.modules['config'] = _FakeConfigModule()
93
+
94
+
95
+ # ==================== Tokenizer ====================
96
+ import re
97
+
98
+ class Tokenizer:
99
+ """BPE分词器(与tokenizer.py兼容)"""
100
+
101
+ def __init__(self, vocab_size: int = 8000, lang: str = "zh"):
102
+ self.vocab_size = vocab_size
103
+ self.lang = lang
104
+
105
+ # 特殊token
106
+ self.pad_token = "<pad>"
107
+ self.sos_token = "<sos>"
108
+ self.eos_token = "<eos>"
109
+ self.unk_token = "<unk>"
110
+ self.mask_token = "<mask>"
111
+
112
+ self.special_tokens = [self.pad_token, self.sos_token, self.eos_token, self.unk_token, self.mask_token]
113
+
114
+ # 词表
115
+ self.token_to_id: dict = {}
116
+ self.id_to_token: dict = {}
117
+
118
+ # BPE合并规则
119
+ self.merges: list = []
120
+ self.bpe_ranks: dict = {}
121
+
122
+ @property
123
+ def vocab_size_actual(self) -> int:
124
+ return len(self.token_to_id)
125
+
126
+ @property
127
+ def pad_id(self) -> int:
128
+ return self.token_to_id[self.pad_token]
129
+
130
+ @property
131
+ def sos_id(self) -> int:
132
+ return self.token_to_id[self.sos_token]
133
+
134
+ @property
135
+ def eos_id(self) -> int:
136
+ return self.token_to_id[self.eos_token]
137
+
138
+ @property
139
+ def unk_id(self) -> int:
140
+ return self.token_to_id[self.unk_token]
141
+
142
+ def _is_chinese(self, char: str) -> bool:
143
+ return '\u4e00' <= char <= '\u9fff'
144
+
145
+ def _pre_tokenize(self, text: str) -> List[str]:
146
+ """预分词"""
147
+ if self.lang == "zh":
148
+ tokens = []
149
+ current = ""
150
+ for char in text:
151
+ if self._is_chinese(char):
152
+ if current:
153
+ tokens.append(current)
154
+ current = ""
155
+ tokens.append(char)
156
+ elif char.isalnum():
157
+ current += char.lower()
158
+ else:
159
+ if current:
160
+ tokens.append(current)
161
+ current = ""
162
+ if char.strip():
163
+ tokens.append(char)
164
+ if current:
165
+ tokens.append(current)
166
+ return tokens
167
+ else:
168
+ text = text.lower()
169
+ tokens = re.findall(r"\w+|[^\w\s]", text)
170
+ return tokens
171
+
172
+ def _get_pairs(self, word: tuple) -> set:
173
+ """获取词中的所有相邻字符对"""
174
+ pairs = set()
175
+ prev = word[0]
176
+ for char in word[1:]:
177
+ pairs.add((prev, char))
178
+ prev = char
179
+ return pairs
180
+
181
+ def _apply_bpe(self, token: str) -> List[str]:
182
+ """对单个token应用BPE"""
183
+ if not token:
184
+ return []
185
+
186
+ word = tuple(token) + ('</w>',)
187
+
188
+ while True:
189
+ pairs = self._get_pairs(word)
190
+ if not pairs:
191
+ break
192
+
193
+ # Pick the pair with the lowest rank, i.e. the merge learned earliest
194
+ min_pair = None
195
+ min_rank = float('inf')
196
+ for pair in pairs:
197
+ rank = self.bpe_ranks.get(pair, float('inf'))
198
+ if rank < min_rank:
199
+ min_rank = rank
200
+ min_pair = pair
201
+
202
+ if min_pair is None or min_rank == float('inf'):
203
+ break
204
+
205
+ # Merge every occurrence of the selected pair
206
+ new_word = []
207
+ i = 0
208
+ while i < len(word):
209
+ if i < len(word) - 1 and word[i] == min_pair[0] and word[i + 1] == min_pair[1]:
210
+ new_word.append(min_pair[0] + min_pair[1])
211
+ i += 2
212
+ else:
213
+ new_word.append(word[i])
214
+ i += 1
215
+ word = tuple(new_word)
216
+
217
+ return [t for t in word if t != '</w>']
218
+
219
+ def encode(self, text: str, add_sos: bool = True, add_eos: bool = True) -> List[int]:
220
+ """编码文本为token id序列"""
221
+ tokens = self._pre_tokenize(text)
222
+
223
+ ids = []
224
+ if add_sos:
225
+ ids.append(self.sos_id)
226
+
227
+ for token in tokens:
228
+ bpe_tokens = self._apply_bpe(token)
229
+ for t in bpe_tokens:
230
+ ids.append(self.token_to_id.get(t, self.unk_id))
231
+
232
+ if add_eos:
233
+ ids.append(self.eos_id)
234
+
235
+ return ids
236
+
237
+ def decode(self, ids: List[int], skip_special: bool = True) -> str:
238
+ """解码token id序列为文本"""
239
+ tokens = []
240
+ for id in ids:
241
+ token = self.id_to_token.get(id, self.unk_token)
242
+ if skip_special and token in self.special_tokens:
243
+ continue
244
+ token = token.replace('</w>', '')
245
+ if token:
246
+ tokens.append(token)
247
+
248
+ if self.lang == "en":
249
+ text = ' '.join(tokens)
250
+ text = re.sub(r'\s+([.,!?;:\'\"])', r'\1', text)
251
+ text = re.sub(r'([.,!?;:])([a-zA-Z])', r'\1 \2', text)
252
+ text = re.sub(r'\s+', ' ', text).strip()
253
+ else:
254
+ text = ''.join(tokens)
255
+
256
+ return text
257
+
258
+ @classmethod
259
+ def load(cls, path: str) -> "Tokenizer":
260
+ """加载分词器"""
261
+ with open(path, "r", encoding="utf-8") as f:
262
+ data = json.load(f)
263
+
264
+ tokenizer = cls(vocab_size=data["vocab_size"], lang=data["lang"])
265
+ tokenizer.token_to_id = data["token_to_id"]
266
+ tokenizer.id_to_token = {int(k): v for k, v in data["id_to_token"].items()}
267
+ tokenizer.merges = [tuple(m) for m in data["merges"]]
268
+ tokenizer.bpe_ranks = {pair: i for i, pair in enumerate(tokenizer.merges)}
269
+ tokenizer.special_tokens = data["special_tokens"]
270
+
271
+ return tokenizer
272
+
273
+
274
+ # ==================== Model components ====================
275
+ class PositionalEncoding(nn.Module):
276
+ """正弦位置编码"""
277
+
278
+ def __init__(self, d_model: int, max_len: int = 128, dropout: float = 0.1):
279
+ super().__init__()
280
+ self.dropout = nn.Dropout(p=dropout)
281
+
282
+ pe = torch.zeros(max_len, d_model)
283
+ position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
284
+ div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
285
+ pe[:, 0::2] = torch.sin(position * div_term)
286
+ pe[:, 1::2] = torch.cos(position * div_term)
287
+ pe = pe.unsqueeze(0)
288
+ self.register_buffer("pe", pe)
289
+
290
+ def forward(self, x: torch.Tensor) -> torch.Tensor:
291
+ x = x + self.pe[:, :x.size(1), :]
292
+ return self.dropout(x)
293
+
294
+
295
+ class SinusoidalTimeEmbedding(nn.Module):
296
+ """时间步的正弦嵌入(用于扩散)"""
297
+
298
+ def __init__(self, d_model: int):
299
+ super().__init__()
300
+ self.d_model = d_model
301
+
302
+ def forward(self, t: torch.Tensor) -> torch.Tensor:
303
+ t = t.float().unsqueeze(-1)
304
+ half_dim = self.d_model // 2
305
+ emb = math.log(10000) / (half_dim - 1)
306
+ emb = torch.exp(torch.arange(half_dim, device=t.device) * -emb)
307
+ emb = t * emb.unsqueeze(0)
308
+ emb = torch.cat([torch.sin(emb), torch.cos(emb)], dim=-1)
309
+ return emb
310
+
311
+
312
+ class LanguageEmbedding(nn.Module):
313
+ """语言特定的嵌入层"""
314
+
315
+ def __init__(self, vocab_size: int, d_model: int, max_len: int = 128, dropout: float = 0.1):
316
+ super().__init__()
317
+ self.d_model = d_model
318
+ self.token_embedding = nn.Embedding(vocab_size, d_model)
319
+ self.position_encoding = PositionalEncoding(d_model, max_len, dropout)
320
+ self.length_embedding = nn.Embedding(max_len + 1, d_model)
321
+ self.scale = math.sqrt(d_model)
322
+
323
+ def forward(self, token_ids: torch.Tensor, lengths: Optional[torch.Tensor] = None) -> torch.Tensor:
324
+ x = self.token_embedding(token_ids) * self.scale
325
+ x = self.position_encoding(x)
326
+ if lengths is not None:
327
+ len_emb = self.length_embedding(lengths)
328
+ x = x + len_emb.unsqueeze(1)
329
+ return x
330
+
331
+
332
+ class DualLanguageEmbedding(nn.Module):
333
+ """双语嵌入层"""
334
+
335
+ def __init__(self, vocab_size_zh: int, vocab_size_en: int, d_model: int, max_len: int = 128, dropout: float = 0.1):
336
+ super().__init__()
337
+ self.d_model = d_model
338
+ self.zh_embedding = LanguageEmbedding(vocab_size_zh, d_model, max_len, dropout)
339
+ self.en_embedding = LanguageEmbedding(vocab_size_en, d_model, max_len, dropout)
340
+
341
+ def forward(self, token_ids: torch.Tensor, lang: str, lengths: Optional[torch.Tensor] = None) -> torch.Tensor:
342
+ if lang == 'zh':
343
+ return self.zh_embedding(token_ids, lengths)
344
+ else:
345
+ return self.en_embedding(token_ids, lengths)
346
+
347
+
348
+ class OutputProjection(nn.Module):
349
+ """输出投影层"""
350
+
351
+ def __init__(self, d_model: int, vocab_size: int):
352
+ super().__init__()
353
+ self.projection = nn.Linear(d_model, vocab_size, bias=False)
354
+
355
+ def forward(self, x: torch.Tensor) -> torch.Tensor:
356
+ return self.projection(x)
357
+
358
+
359
+ class DualOutputProjection(nn.Module):
360
+ """双语输出投影层"""
361
+
362
+ def __init__(self, d_model: int, vocab_size_zh: int, vocab_size_en: int):
363
+ super().__init__()
364
+ self.zh_projection = OutputProjection(d_model, vocab_size_zh)
365
+ self.en_projection = OutputProjection(d_model, vocab_size_en)
366
+
367
+ def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
368
+ if lang == 'zh':
369
+ return self.zh_projection(x)
370
+ else:
371
+ return self.en_projection(x)
+
+
+ class MultiHeadAttention(nn.Module):
+     """Multi-head self-attention."""
+
+     def __init__(self, d_model: int, n_heads: int, dropout: float = 0.1):
+         super().__init__()
+         assert d_model % n_heads == 0
+         self.d_model = d_model
+         self.n_heads = n_heads
+         self.d_k = d_model // n_heads
+         self.w_q = nn.Linear(d_model, d_model)
+         self.w_k = nn.Linear(d_model, d_model)
+         self.w_v = nn.Linear(d_model, d_model)
+         self.w_o = nn.Linear(d_model, d_model)
+         self.dropout = nn.Dropout(dropout)
+
+     def forward(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
+         batch_size = q.size(0)
+         # Project, then split into heads: (batch, n_heads, seq_len, d_k)
+         q = self.w_q(q).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
+         k = self.w_k(k).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
+         v = self.w_v(v).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
+         # Scaled dot-product attention scores
+         scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_k)
+         if mask is not None:
+             # Positions where mask == 0 are excluded from attention
+             scores = scores.masked_fill(mask == 0, float('-inf'))
+         attn = F.softmax(scores, dim=-1)
+         attn = self.dropout(attn)
+         out = torch.matmul(attn, v)
+         # Merge heads back to (batch, seq_len, d_model)
+         out = out.transpose(1, 2).contiguous().view(batch_size, -1, self.d_model)
+         return self.w_o(out)
+
+
+ class FeedForward(nn.Module):
+     """Feed-forward network."""
+
+     def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1):
+         super().__init__()
+         self.w1 = nn.Linear(d_model, d_ff)
+         self.w2 = nn.Linear(d_ff, d_model)
+         self.dropout = nn.Dropout(dropout)
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         return self.dropout(self.w2(F.gelu(self.w1(x))))
+
+
+ class TransformerBlock(nn.Module):
+     """Transformer block (pre-norm residual layout)."""
+
+     def __init__(self, d_model: int, n_heads: int, d_ff: int, dropout: float = 0.1):
+         super().__init__()
+         self.attn = MultiHeadAttention(d_model, n_heads, dropout)
+         self.ff = FeedForward(d_model, d_ff, dropout)
+         self.norm1 = nn.LayerNorm(d_model)
+         self.norm2 = nn.LayerNorm(d_model)
+         self.dropout = nn.Dropout(dropout)
+
+     def forward(self, x: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
+         # Normalize once and reuse the result as query, key, and value
+         h = self.norm1(x)
+         x = x + self.dropout(self.attn(h, h, h, mask))
+         x = x + self.dropout(self.ff(self.norm2(x)))
+         return x
+
+
+ class DualNoisePredictor(nn.Module):
+     """Dual-language noise predictor."""
+
+     def __init__(self, d_model: int = 256, n_heads: int = 4, n_layers: int = 4, d_ff: int = 512, max_len: int = 128, dropout: float = 0.1):
+         super().__init__()
+         self.d_model = d_model
+
+         # Timestep embedding (shared)
+         self.time_embedding = SinusoidalTimeEmbedding(d_model)
+         self.time_mlp = nn.Sequential(
+             nn.Linear(d_model, d_model * 4),
+             nn.GELU(),
+             nn.Linear(d_model * 4, d_model),
+         )
+
+         # Language-specific input projections
+         self.zh_input_proj = nn.Linear(d_model, d_model)
+         self.en_input_proj = nn.Linear(d_model, d_model)
+
+         # Shared Transformer layers
+         self.layers = nn.ModuleList([
+             TransformerBlock(d_model, n_heads, d_ff, dropout)
+             for _ in range(n_layers)
+         ])
+
+         # Language-specific output projections
+         self.zh_output_proj = nn.Linear(d_model, d_model)
+         self.en_output_proj = nn.Linear(d_model, d_model)
+
+         self.output_norm = nn.LayerNorm(d_model)
+
+     def forward(self, x_t: torch.Tensor, t: torch.Tensor, lang: str = "zh", mask: Optional[torch.Tensor] = None) -> torch.Tensor:
+         # Timestep embedding
+         t_emb = self.time_embedding(t)
+         t_emb = self.time_mlp(t_emb)
+
+         # Language-specific input projection
+         if lang == "zh":
+             x = self.zh_input_proj(x_t)
+         else:
+             x = self.en_input_proj(x_t)
+
+         # Add timestep information (broadcast over sequence positions)
+         x = x + t_emb.unsqueeze(1)
+
+         # Shared Transformer
+         for layer in self.layers:
+             x = layer(x, mask)
+
+         # Output normalization
+         x = self.output_norm(x)
+
+         # Language-specific output projection
+         if lang == "zh":
+             noise_pred = self.zh_output_proj(x)
+         else:
+             noise_pred = self.en_output_proj(x)
+
+         return noise_pred
+
+
+ class LanguageSwitcher(nn.Module):
+     """Language-switch classifier (two classes: zh and en)."""
+
+     def __init__(self, d_model: int = 256, hidden_dim: int = 128, dropout: float = 0.1):
+         super().__init__()
+         self.global_pool = nn.AdaptiveAvgPool1d(1)
+         self.classifier = nn.Sequential(
+             nn.Linear(d_model, hidden_dim),
+             nn.GELU(),
+             nn.Dropout(dropout),
+             nn.Linear(hidden_dim, hidden_dim),
+             nn.GELU(),
+             nn.Dropout(dropout),
+             nn.Linear(hidden_dim, 2),
+         )
+
+     def forward(self, x_t: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
+         if mask is not None:
+             # Zero out masked positions; the pooling below still averages over the full length
+             x_t = x_t * mask.unsqueeze(-1)
+         # (batch, seq_len, d_model) -> (batch, d_model, seq_len) for 1-D pooling
+         x = x_t.transpose(1, 2)
+         x = self.global_pool(x).squeeze(-1)
+         logits = self.classifier(x)
+         return logits
+
+     def predict(self, x_t: torch.Tensor, mask: Optional[torch.Tensor] = None) -> Tuple[str, float]:
+         self.eval()
+         with torch.no_grad():
+             logits = self.forward(x_t, mask)
+             probs = F.softmax(logits, dim=-1)
+             # Only the first item in the batch is reported
+             zh_prob = probs[0, 0].item()
+             en_prob = probs[0, 1].item()
+             if zh_prob > en_prob:
+                 return "zh", zh_prob
+             else:
+                 return "en", en_prob
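The pooling in `LanguageSwitcher.forward` multiplies by the mask and then averages over the full sequence length, so padded positions still count in the denominator. A minimal pure-Python sketch of how that differs from a length-normalized masked mean; the helper names `zeroed_mean` and `masked_mean` are hypothetical and not part of this commit:

```python
from typing import List


def zeroed_mean(values: List[float], mask: List[int]) -> float:
    """Zero out masked positions, then average over the full length."""
    return sum(v * m for v, m in zip(values, mask)) / len(values)


def masked_mean(values: List[float], mask: List[int]) -> float:
    """Average only over positions where mask == 1."""
    kept = sum(mask)
    if kept == 0:
        return 0.0
    return sum(v * m for v, m in zip(values, mask)) / kept


values = [2.0, 4.0, 6.0, 0.5]  # one feature across four positions
mask = [1, 1, 1, 0]            # the last position is padding

print(zeroed_mean(values, mask))  # 3.0
print(masked_mean(values, mask))  # 4.0
```

Which behavior is appropriate depends on how the switcher was trained; changing the pooling at inference time without retraining would shift the classifier's input distribution.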
+
+
+ # ==================== Diffusion process ====================
+ class Diffusion:
+     def __init__(self, config: DiffusionConfig):
+         self.config = config
+         self.timesteps = config.timesteps
+
+         # Beta schedule (linear)
+         betas = torch.linspace(config.beta_start, config.beta_end, self.timesteps)
+         alphas = 1.0 - betas
+         alphas_cumprod = torch.cumprod(alphas, dim=0)
+
+         self.register_buffer("betas", betas)
+         self.register_buffer("alphas", alphas)
+         self.register_buffer("alphas_cumprod", alphas_cumprod)
+         self.register_buffer("sqrt_alphas_cumprod", torch.sqrt(alphas_cumprod))
+         self.register_buffer("sqrt_one_minus_alphas_cumprod", torch.sqrt(1 - alphas_cumprod))
+
+     def register_buffer(self, name: str, tensor: torch.Tensor):
+         # Diffusion is a plain class rather than an nn.Module, so store the tensor as an attribute
+         setattr(self, name, tensor)
+
+     def q_sample(self, x_0: torch.Tensor, t: torch.Tensor, noise: Optional[torch.Tensor] = None) -> Tuple[torch.Tensor, torch.Tensor]:
+         # Closed-form forward process: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
+         if noise is None:
+             noise = torch.randn_like(x_0)
+         sqrt_alpha = self.sqrt_alphas_cumprod[t]
+         sqrt_one_minus_alpha = self.sqrt_one_minus_alphas_cumprod[t]
+         x_t = sqrt_alpha.view(-1, 1, 1) * x_0 + sqrt_one_minus_alpha.view(-1, 1, 1) * noise
+         return x_t, noise
+
+     def p_sample(self, x_t: torch.Tensor, t: torch.Tensor, predicted_noise: torch.Tensor) -> torch.Tensor:
+         beta = self.betas[t]
+         sqrt_one_minus_alpha_bar = self.sqrt_one_minus_alphas_cumprod[t]
+         sqrt_recip_alpha = 1.0 / torch.sqrt(self.alphas[t])
+
+         # Denoise with the DDPM posterior mean:
+         # (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
+         coef = (beta / sqrt_one_minus_alpha_bar).view(-1, 1, 1)
+         mean = sqrt_recip_alpha.view(-1, 1, 1) * (x_t - coef * predicted_noise)
+
+         # Add noise on every step except the last (assumes the whole batch shares one timestep)
+         if t[0] > 0:
+             noise = torch.randn_like(x_t)
+             x_prev = mean + torch.sqrt(beta).view(-1, 1, 1) * noise
+         else:
+             x_prev = mean
+
+         return x_prev
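To make the linear schedule in `Diffusion.__init__` and the closed form used by `q_sample` concrete, here is a small pure-Python sketch. The values `beta_start=1e-4`, `beta_end=0.02`, and `timesteps=1000` are assumptions for illustration (the actual `DiffusionConfig` defaults are not shown in this part of the diff), and `linear_schedule` and `q_sample_scalar` are hypothetical helpers:

```python
import math
from typing import List, Tuple


def linear_schedule(beta_start: float, beta_end: float, timesteps: int) -> Tuple[List[float], List[float]]:
    """Return (betas, alphas_cumprod) for a linear beta schedule (timesteps >= 2)."""
    step = (beta_end - beta_start) / (timesteps - 1)
    betas = [beta_start + i * step for i in range(timesteps)]
    alphas_cumprod: List[float] = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta  # cumulative product of alpha_t = 1 - beta_t
        alphas_cumprod.append(running)
    return betas, alphas_cumprod


def q_sample_scalar(x_0: float, noise: float, alpha_bar_t: float) -> float:
    """Closed-form forward step for one scalar: sqrt(a) * x_0 + sqrt(1 - a) * noise."""
    return math.sqrt(alpha_bar_t) * x_0 + math.sqrt(1.0 - alpha_bar_t) * noise


# Assumed values, for illustration only
betas, alphas_cumprod = linear_schedule(1e-4, 0.02, 1000)

# The signal weight shrinks and the noise weight grows as t increases
for t in (0, 499, 999):
    a = alphas_cumprod[t]
    print(t, round(math.sqrt(a), 4), round(math.sqrt(1.0 - a), 4))
```

Under these assumed values the cumulative product at the last timestep is close to zero, which is why sampling `x_t` at `timesteps - 1` is treated as nearly pure noise.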
+
+
+ class DDIMSampler:
+     def __init__(self, diffusion: Diffusion, ddim_steps: int = 50):
+         self.diffusion = diffusion
+         self.ddim_steps = ddim_steps
+
+         # Pick evenly spaced timesteps, ordered high to low (from noise to clean)
+         c = self.diffusion.timesteps // ddim_steps
+         ddim_timesteps = [i * c for i in range(ddim_steps)]
+         self.ddim_timesteps = torch.tensor(list(reversed(ddim_timesteps)))
+
+     def ddim_step(self, x_t: torch.Tensor, t: int, t_prev: int,
+                   predicted_noise: torch.Tensor, eta: float = 0.0) -> torch.Tensor:
+         """Single DDIM step."""
+         alpha_t = self.diffusion.alphas_cumprod[t]
+         alpha_prev = self.diffusion.alphas_cumprod[t_prev] if t_prev >= 0 else torch.tensor(1.0)
+
+         # Predict x_0
+         x_0_pred = (x_t - torch.sqrt(1 - alpha_t) * predicted_noise) / torch.sqrt(alpha_t)
+
+         # Noise scale (zero when eta == 0, which makes the step deterministic)
+         sigma = eta * torch.sqrt((1 - alpha_prev) / (1 - alpha_t)) * torch.sqrt(1 - alpha_t / alpha_prev)
+
+         # DDIM update: direction pointing to x_t
+         dir_xt = torch.sqrt(1 - alpha_prev - sigma ** 2) * predicted_noise
+
+         if t_prev >= 0:
+             noise = torch.randn_like(x_t)
+             x_prev = torch.sqrt(alpha_prev) * x_0_pred + dir_xt + sigma * noise
+         else:
+             x_prev = x_0_pred
+
+         return x_prev
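The timestep selection and the deterministic (`eta = 0`) update in `DDIMSampler` can be checked with plain floats. This is a sketch under stated assumptions: `ddim_timesteps` and `ddim_step_scalar` are hypothetical stand-ins for the class methods, and the `alpha_bar` values in the check are made up for illustration:

```python
import math
from typing import List


def ddim_timesteps(timesteps: int, ddim_steps: int) -> List[int]:
    """Evenly spaced timesteps, ordered from high (noisy) to low (clean)."""
    stride = timesteps // ddim_steps
    return [i * stride for i in reversed(range(ddim_steps))]


def ddim_step_scalar(x_t: float, eps: float, alpha_t: float, alpha_prev: float) -> float:
    """Deterministic (eta = 0) DDIM update for a single scalar."""
    x_0_pred = (x_t - math.sqrt(1.0 - alpha_t) * eps) / math.sqrt(alpha_t)
    return math.sqrt(alpha_prev) * x_0_pred + math.sqrt(1.0 - alpha_prev) * eps


steps = ddim_timesteps(1000, 50)
print(steps[:3], steps[-3:])  # [980, 960, 940] [40, 20, 0]

# If eps is the true noise, one step lands exactly on the less-noisy sample
x_0, eps = 0.7, -0.3
alpha_t, alpha_prev = 0.5, 0.8  # made-up cumulative alphas
x_t = math.sqrt(alpha_t) * x_0 + math.sqrt(1.0 - alpha_t) * eps
expected = math.sqrt(alpha_prev) * x_0 + math.sqrt(1.0 - alpha_prev) * eps
print(math.isclose(ddim_step_scalar(x_t, eps, alpha_t, alpha_prev), expected))  # True
```

With 1000 training timesteps and 50 DDIM steps the first sampled timestep is 980 rather than 999, because the grid starts at 0 and uses an integer stride.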
+
+
+ # ==================== Translator ====================
+ class Translator:
+     def __init__(self, model_dir: str = "."):
+         self.device = torch.device("cpu")
+
+         # Configuration
+         self.model_config = ModelConfig()
+         self.diffusion_config = DiffusionConfig()
+
+         # Load the tokenizers
+         self.zh_tokenizer = Tokenizer.load(os.path.join(model_dir, "tokenizer_zh.json"))
+         self.en_tokenizer = Tokenizer.load(os.path.join(model_dir, "tokenizer_en.json"))
+
+         # Initialize the models
+         self.embedding = DualLanguageEmbedding(
+             vocab_size_zh=self.zh_tokenizer.vocab_size_actual,
+             vocab_size_en=self.en_tokenizer.vocab_size_actual,
+             d_model=self.model_config.d_model,
+             max_len=self.model_config.max_len,
+             dropout=0.0,
+         )
+
+         self.output_proj = DualOutputProjection(
+             d_model=self.model_config.d_model,
+             vocab_size_zh=self.zh_tokenizer.vocab_size_actual,
+             vocab_size_en=self.en_tokenizer.vocab_size_actual,
+         )
+
+         self.model = DualNoisePredictor(
+             d_model=self.model_config.d_model,
+             n_heads=self.model_config.n_heads,
+             n_layers=self.model_config.n_layers,
+             d_ff=self.model_config.d_ff,
+             max_len=self.model_config.max_len,
+             dropout=0.0,
+         )
+         self.switcher = LanguageSwitcher(
+             d_model=self.model_config.d_model,
+             hidden_dim=self.model_config.d_model // 2,
+             dropout=0.0,
+         )
+
+         self.diffusion = Diffusion(self.diffusion_config)
+
+         # Load the weights
+         self._load_checkpoint(os.path.join(model_dir, "best.pt"))
+
+     def _load_checkpoint(self, path: str):
+         # weights_only=False unpickles arbitrary objects, so only load trusted checkpoints
+         state = torch.load(path, map_location=self.device, weights_only=False)
+         self.embedding.load_state_dict(state['embedding'])
+         self.output_proj.load_state_dict(state['output_proj'])
+         self.model.load_state_dict(state['model'])
+         self.switcher.load_state_dict(state['switcher'])
+         print(f"已加载模型: {path}")
+
+     def _encode(self, text: str, lang: str) -> torch.Tensor:
+         if lang == "zh":
+             ids = self.zh_tokenizer.encode(text, add_sos=True, add_eos=True)
+         else:
+             ids = self.en_tokenizer.encode(text, add_sos=True, add_eos=True)
+         # Add a batch dimension: (1, seq_len)
+         return torch.tensor(ids, dtype=torch.long).unsqueeze(0)
+
+     def _decode(self, ids: torch.Tensor, lang: str) -> str:
+         ids = ids[0].tolist()
+         if lang == "zh":
+             return self.zh_tokenizer.decode(ids, skip_special=True)
+         else:
+             return self.en_tokenizer.decode(ids, skip_special=True)
+
+     def _embed_to_tokens(self, x: torch.Tensor, lang: str) -> torch.Tensor:
+         # Greedy decoding: pick the highest-scoring token at each position
+         logits = self.output_proj(x, lang)
+         return logits.argmax(dim=-1)
+
+     @torch.no_grad()
+     def translate(
+         self,
+         text: str,
+         source_lang: str,
+         ddim_steps: int = 50,
+         show_process: bool = False,
+     ) -> Tuple[str, List[str]]:
+         """Translate text; return the result and the intermediate steps."""
+         self.model.eval()
+         self.embedding.eval()
+         self.output_proj.eval()
+         self.switcher.eval()
+
+         target_lang = "en" if source_lang == "zh" else "zh"
+
+         # Update the DDIM step count
+         self.diffusion_config.ddim_steps = ddim_steps
+         ddim_sampler = DDIMSampler(self.diffusion, ddim_steps)
+
+         # Encode the source text
+         source_ids = self._encode(text, source_lang)
+         source_len = torch.tensor([source_ids.size(1)])
+
+         # Embed the source text
+         source_emb = self.embedding(source_ids, source_lang, source_len)
+
+         # Forward-diffuse to pure noise
+         batch_size = source_emb.size(0)
+         t_full = torch.full((batch_size,), self.diffusion_config.timesteps - 1, dtype=torch.long)
+         noise = torch.randn_like(source_emb)
+         x_t, _ = self.diffusion.q_sample(source_emb, t_full, noise)
+
+         # DDIM reverse diffusion
+         timesteps = ddim_sampler.ddim_timesteps
+         total_steps = len(timesteps)
+         switch_point = total_steps // 2
+
+         process_steps = []
+
+         for i, t in enumerate(timesteps[:-1]):
+             t_prev = timesteps[i + 1]
+
+             # Language switch: source language for the first half, target language afterwards
+             if i < switch_point:
+                 current_lang = source_lang
+             else:
+                 current_lang = target_lang
+
+             # Predict the noise
+             t_tensor = torch.full((x_t.size(0),), t.item(), dtype=torch.long)
+             predicted_noise = self.model(x_t, t_tensor, lang=current_lang)
+
+             # Record progress (roughly ten snapshots per run)
+             if show_process and i % max(1, total_steps // 10) == 0:
+                 current_ids = self._embed_to_tokens(x_t, current_lang)
+                 current_text = self._decode(current_ids, current_lang)
+                 process_steps.append(f"Step {t.item()}: {current_text[:50]}")
+
+             # DDIM step
+             x_t = ddim_sampler.ddim_step(x_t, t.item(), t_prev.item(), predicted_noise, eta=0.0)
+
+         # Final decode
+         final_ids = self._embed_to_tokens(x_t, target_lang)
+         result = self._decode(final_ids, target_lang)
+
+         return result, process_steps
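`translate` conditions the noise predictor on the source language for the first half of the reverse steps and on the target language afterwards. A pure-Python sketch of that schedule, mirroring the loop over `timesteps[:-1]`; `language_schedule` is a hypothetical helper, and `timesteps=1000` is an assumed value:

```python
from typing import List, Tuple


def language_schedule(timesteps: int, ddim_steps: int, source_lang: str) -> List[Tuple[int, int, str]]:
    """Return (t, t_prev, lang) for each reverse step, in the order the loop visits them."""
    target_lang = "en" if source_lang == "zh" else "zh"
    stride = timesteps // ddim_steps
    ts = [i * stride for i in reversed(range(ddim_steps))]
    switch_point = len(ts) // 2
    plan = []
    for i, t in enumerate(ts[:-1]):
        lang = source_lang if i < switch_point else target_lang
        plan.append((t, ts[i + 1], lang))
    return plan


plan = language_schedule(1000, 10, "zh")
for t, t_prev, lang in plan:
    print(t, t_prev, lang)
```

With `ddim_steps` timesteps the loop performs `ddim_steps - 1` updates, and the last update ends at `t_prev = 0`.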
+
+
+ # ==================== Gradio app ====================
+ def create_app():
+     # Load the model
+     print("正在加载模型...")
+     # Use the script's directory as the model directory
+     script_dir = os.path.dirname(os.path.abspath(__file__))
+     translator = Translator(model_dir=script_dir)
+     print("模型加载完成!")
+
+     def translate_text(text: str, language: str, ddim_steps: int, show_process: bool):
+         if not text.strip():
+             # Both outputs are textboxes, so return two strings
+             return "", ""
+
+         # Auto-detect the source language or use the manual selection
+         if language == "自动检测":
+             if any('\u4e00' <= c <= '\u9fff' for c in text):
+                 source_lang = "zh"
+             else:
+                 source_lang = "en"
+         else:
+             source_lang = "zh" if language == "中文 → 英文" else "en"
+
+         try:
+             # The slider may deliver a float, so cast before it is used with range()
+             result, process = translator.translate(
+                 text, source_lang, int(ddim_steps), show_process
+             )
+             process_text = "\n".join(process) if process else "(过程未显示)"
+             return result, process_text
+         except Exception as e:
+             return f"翻译出错: {str(e)}", ""
+
+     # Build the UI
+     with gr.Blocks(
+         title="Diffutslator",
+         theme=gr.themes.Soft(),
+         css="""
+         .output-box { min-height: 100px; }
+         .process-box { font-family: monospace; font-size: 12px; }
+         """
+     ) as app:
+         gr.Markdown(
+             """
+             # Diffutslator 扩散翻译器
+
+             基于扩散模型的机器翻译系统,可视化翻译过程中的语言渐变。
+             """
+         )
+
+         with gr.Row():
+             with gr.Column(scale=2):
+                 input_text = gr.Textbox(
+                     label="输入文本",
+                     placeholder="输入要翻译的中文或英文...",
+                     lines=5,
+                 )
+
+                 with gr.Row():
+                     language = gr.Dropdown(
+                         choices=["自动检测", "中文 → 英文", "英文 → 中文"],
+                         value="自动检测",
+                         label="翻译方向",
+                     )
+
+                     ddim_steps = gr.Slider(
+                         minimum=10,
+                         maximum=100,
+                         value=50,
+                         step=5,
+                         label="DDIM步数",
+                         info="步数越多质量越高,速度越慢",
+                     )
+
+                 show_process = gr.Checkbox(
+                     value=False,
+                     label="显示扩散过程",
+                     info="显示翻译中间步骤(会增加推理时间)",
+                 )
+
+                 translate_btn = gr.Button("翻译", variant="primary", size="lg")
+
+             with gr.Column(scale=2):
+                 output_text = gr.Textbox(
+                     label="翻译结果",
+                     lines=5,
+                     interactive=False,
+                     elem_classes=["output-box"],
+                 )
+
+                 process_text = gr.Textbox(
+                     label="扩散过程",
+                     lines=5,
+                     interactive=False,
+                     visible=False,
+                     elem_classes=["process-box"],
+                 )
+
+         # Examples
+         gr.Examples(
+             examples=[
+                 ["你好,世界!", "自动检测"],
+                 ["Hello, how are you today?", "自动检测"],
+                 ["机器学习正在改变世界。", "中文 → 英文"],
+                 ["The quick brown fox jumps over the lazy dog.", "英文 → 中文"],
+             ],
+             inputs=[input_text, language],
+         )
+
+         # Event handlers
+         def toggle_process(show):
+             return gr.Textbox(visible=show)
+
+         show_process.change(
+             fn=toggle_process,
+             inputs=[show_process],
+             outputs=[process_text],
+         )
+
+         translate_btn.click(
+             fn=translate_text,
+             inputs=[input_text, language, ddim_steps, show_process],
+             outputs=[output_text, process_text],
+         )
+
+         # Submit on Enter
+         input_text.submit(
+             fn=translate_text,
+             inputs=[input_text, language, ddim_steps, show_process],
+             outputs=[output_text, process_text],
+         )
+
+     return app
+
+
+ if __name__ == "__main__":
+     app = create_app()
+     app.launch()
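The auto-detect branch in `translate_text` treats any character in the range `\u4e00` to `\u9fff` (the CJK Unified Ideographs block) as a sign that the input is Chinese. A standalone sketch of that heuristic; the function name `detect_source_lang` is hypothetical, since the app inlines this check:

```python
def detect_source_lang(text: str) -> str:
    """Return "zh" if any character is a CJK Unified Ideograph, else "en"."""
    if any('\u4e00' <= c <= '\u9fff' for c in text):
        return "zh"
    return "en"


print(detect_source_lang("你好,世界!"))                # zh
print(detect_source_lang("Hello, how are you today?"))  # en
print(detect_source_lang("I like 咖啡"))                # zh
```

A single ideograph is enough to classify mixed input as Chinese, and text in any other script (for example kana-only Japanese) falls through to "en", so the manual direction setting remains useful for edge cases.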
hfspace/best.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:283c995651aa11ebde09858ad5002cd932c1dd6dd5ede16be733c16cbb5c4c55
+ size 47986610
hfspace/requirements.txt ADDED
@@ -0,0 +1,2 @@
+ gradio>=4.0.0
+ torch>=2.0.0
hfspace/tokenizer_en.json ADDED
The diff for this file is too large to render. See raw diff
 
hfspace/tokenizer_zh.json ADDED
@@ -0,0 +1,5631 @@
+ {
+   "vocab_size": 8000,
+   "lang": "zh",
+   "token_to_id": {
+     "<pad>": 0,
+     "<sos>": 1,
+     "<eos>": 2,
+     "<unk>": 3,
+     "<mask>": 4,
+     "!</w>": 5,
+     "\"</w>": 6,
+     ",</w>": 7,
+     ".</w>": 8,
+     "0</w>": 9,
+     "10": 10,
+     "100</w>": 11,
+     "10</w>": 12,
+     "18": 13,
+     "18</w>": 14,
+     "1</w>": 15,
+     "20</w>": 16,
+     "21</w>": 17,
+     "2</w>": 18,
+     "3</w>": 19,
+     "40</w>": 20,
+     "56</w>": 21,
+     "5</w>": 22,
+     "6</w>": 23,
+     "7</w>": 24,
+     "?</w>": 25,
+     "ali": 26,
+     "alice</w>": 27,
+     "ancy</w>": 28,
+     "ce</w>": 29,
+     "cy</w>": 30,
+     "e</w>": 31,
+     "el": 32,
+     "iel": 33,
+     "ir": 34,
+     "ja": 35,
+     "jac": 36,
+     "jack</w>": 37,
+     "jake</w>": 38,
+     "k</w>": 39,
+     "ka": 40,
+     "kate</w>": 41,
+     "ke": 42,
+     "ke</w>": 43,
+     "ken</w>": 44,
+     "li": 45,
+     "m</w>": 46,
+     "muir": 47,
+     "muiriel": 48,
+     "muiriel</w>": 49,
+     "n</w>": 50,
+     "nancy</w>": 51,
+     "ncy</w>": 52,
+     "om</w>": 53,
+     "te</w>": 54,
+     "tom</w>": 55,
+     "uir": 56,
+     "y</w>": 57,
+     "“</w>": 58,
+     "”</w>": 59,
+     "…</w>": 60,
+     "、</w>": 61,
+     "。</w>": 62,
+     "一</w>": 63,
+     "七</w>": 64,
+     "万</w>": 65,
+     "三</w>": 66,
+     "上</w>": 67,
+     "下</w>": 68,
+     "不</w>": 69,
+     "丑</w>": 70,
+     "世</w>": 71,
+     "业</w>": 72,
+     "两</w>": 73,
+     "严</w>": 74,
+     "个</w>": 75,
+     "中</w>": 76,
+     "丰</w>": 77,
+     "为</w>": 78,
+     "举</w>": 79,
+     "久</w>": 80,
+     "么</w>": 81,
+     "义</w>": 82,
+     "之</w>": 83,
+     "乎</w>": 84,
+     "乐</w>": 85,
+     "乘</w>": 86,
+     "九</w>": 87,
+     "也</w>": 88,
+     "习</w>": 89,
+     "书</w>": 90,
+     "买</w>": 91,
+     "了</w>": 92,
+     "予</w>": 93,
+     "争</w>": 94,
+     "事</w>": 95,
+     "于</w>": 96,
+     "互</w>": 97,
+     "些</w>": 98,
+     "交</w>": 99,
+     "亲</w>": 100,
+     "人</w>": 101,
+     "什</w>": 102,
+     "仅</w>": 103,
+     "今</w>": 104,
+     "从</w>": 105,
+     "他</w>": 106,
+     "付</w>": 107,
+     "代</w>": 108,
+     "以</w>": 109,
+     "仪</w>": 110,
+     "们</w>": 111,
+     "件</w>": 112,
+     "价</w>": 113,
+     "任</w>": 114,
+     "份</w>": 115,
+     "休</w>": 116,
+     "众</w>": 117,
+     "会</w>": 118,
+     "伟</w>": 119,
+     "传</w>": 120,
+     "伦</w>": 121,
+     "似</w>": 122,
+     "但</w>": 123,
+     "位</w>": 124,
+     "低</w>": 125,
+     "住</w>": 126,
+     "体</w>": 127,
+     "何</w>": 128,
+     "作</w>": 129,
+     "你</w>": 130,
+     "使</w>": 131,
+     "來</w>": 132,
+     "例</w>": 133,
+     "保</w>": 134,
+     "信</w>": 135,
+     "俱</w>": 136,
+     "個</w>": 137,
+     "們</w>": 138,
+     "候</w>": 139,
+     "借</w>": 140,
+     "倦</w>": 141,
+     "债</w>": 142,
+     "值</w>": 143,
+     "假</w>": 144,
+     "偏</w>": 145,
+     "做</w>": 146,
+     "停</w>": 147,
+     "偶</w>": 148,
+     "偷</w>": 149,
+     "像</w>": 150,
+     "僵</w>": 151,
+     "儿</w>": 152,
+     "元</w>": 153,
+     "先</w>": 154,
+     "光</w>": 155,
+     "克</w>": 156,
+     "免</w>": 157,
+     "兔</w>": 158,
+     "入</w>": 159,
+     "全</w>": 160,
+     "公</w>": 161,
+     "六</w>": 162,
+     "兰</w>": 163,
+     "关</w>": 164,
+     "兴</w>": 165,
+     "其</w>": 166,
+     "兼</w>": 167,
+     "内</w>": 168,
+     "再</w>": 169,
+     "冒</w>": 170,
+     "写</w>": 171,
+     "冰</w>": 172,
+     "冲</w>": 173,
+     "决</w>": 174,
+     "况</w>": 175,
+     "冷</w>": 176,
+     "准</w>": 177,
+     "几</w>": 178,
+     "出</w>": 179,
+     "分</w>": 180,
+     "切</w>": 181,
+     "划</w>": 182,
+     "则</w>": 183,
+     "创</w>": 184,
+     "利</w>": 185,
+     "到</w>": 186,
+     "制</w>": 187,
+     "前</w>": 188,
+     "劃</w>": 189,
+     "力</w>": 190,
+     "办</w>": 191,
+     "功</w>": 192,
+     "加</w>": 193,
+     "务</w>": 194,
+     "动</w>": 195,
+     "助</w>": 196,
+     "努</w>": 197,
+     "劳</w>": 198,
+     "勃</w>": 199,
+     "包</w>": 200,
+     "化</w>": 201,
+     "医</w>": 202,
+     "十</w>": 203,
+     "千</w>": 204,
+     "升</w>": 205,
+     "午</w>": 206,
+     "半</w>": 207,
+     "华</w>": 208,
+     "单</w>": 209,
+     "卖</w>": 210,
+     "卫</w>": 211,
+     "危</w>": 212,
+     "即</w>": 213,
+     "却</w>": 214,
+     "历</w>": 215,
+     "厌</w>": 216,
+     "厕</w>": 217,
+     "去</w>": 218,
+     "参</w>": 219,
+     "又</w>": 220,
+     "友</w>": 221,
+     "反</w>": 222,
+     "发</w>": 223,
+     "叔</w>": 224,
+     "取</w>": 225,
+     "受</w>": 226,
+     "变</w>": 227,
+     "口</w>": 228,
+     "古</w>": 229,
+     "另</w>": 230,
+     "只</w>": 231,
+     "叫</w>": 232,
+     "可</w>": 233,
+     "史</w>": 234,
+     "右</w>": 235,
+     "号</w>": 236,
+     "吃</w>": 237,
+     "��</w>": 238,
+     "同</w>": 239,
+     "名</w>": 240,
+     "后</w>": 241,
+     "向</w>": 242,
+     "吗</w>": 243,
+     "吧</w>": 244,
+     "听</w>": 245,
+     "告</w>": 246,
+     "员</w>": 247,
+     "呢</w>": 248,
+     "周</w>": 249,
+     "味</w>": 250,
+     "命</w>": 251,
+     "和</w>": 252,
+     "咖</w>": 253,
+     "品</w>": 254,
+     "响</w>": 255,
+     "哥</w>": 256,
+     "哦</w>": 257,
+     "哪</w>": 258,
+     "售</w>": 259,
+     "唯</w>": 260,
+     "唱</w>": 261,
+     "啊</w>": 262,
+     "問</w>": 263,
+     "啡</w>": 264,
+     "喜</w>": 265,
+     "喝</w>": 266,
+     "嗨</w>": 267,
+     "囚</w>": 268,
+     "回</w>": 269,
+     "因</w>": 270,
+     "团</w>": 271,
+     "园</w>": 272,
+     "困</w>": 273,
+     "国</w>": 274,
+     "图</w>": 275,
+     "圈</w>": 276,
+     "國</w>": 277,
+     "圣</w>": 278,
+     "在</w>": 279,
+     "地</w>": 280,
+     "场</w>": 281,
+     "坐</w>": 282,
+     "块</w>": 283,
+     "坚</w>": 284,
+     "城</w>": 285,
+     "堡</w>": 286,
+     "增</w>": 287,
+     "士</w>": 288,
+     "声</w>": 289,
+     "处</w>": 290,
+     "备</w>": 291,
+     "复</w>": 292,
+     "夏</w>": 293,
+     "外</w>": 294,
+     "多</w>": 295,
+     "夜</w>": 296,
+     "够</w>": 297,
+     "大</w>": 298,
+     "天</w>": 299,
+     "太</w>": 300,
+     "失</w>": 301,
+     "头</w>": 302,
+     "奇</w>": 303,
+     "奶</w>": 304,
+     "她</w>": 305,
+     "好</w>": 306,
+     "如</w>": 307,
+     "妈</w>": 308,
+     "妹</w>": 309,
+     "妻</w>": 310,
+     "始</w>": 311,
+     "姐</w>": 312,
+     "威</w>": 313,
+     "婚</w>": 314,
+     "子</w>": 315,
+     "字</w>": 316,
+     "季</w>": 317,
+     "学</w>": 318,
+     "孩</w>": 319,
+     "學</w>": 320,
+     "它</w>": 321,
+     "宇</w>": 322,
+     "守</w>": 323,
+     "安</w>": 324,
+     "完</w>": 325,
+     "宙</w>": 326,
+     "定</w>": 327,
+     "宝</w>": 328,
+     "实</w>": 329,
+     "客</w>": 330,
+     "宣</w>": 331,
+     "室</w>": 332,
+     "宵</w>": 333,
+     "家</w>": 334,
+     "寄</w>": 335,
+     "密</w>": 336,
+     "富</w>": 337,
+     "对</w>": 338,
+     "寻</w>": 339,
+     "将</w>": 340,
+     "尊</w>": 341,
+     "小</w>": 342,
+     "少</w>": 343,
+     "就</w>": 344,
+     "尼</w>": 345,
+     "局</w>": 346,
+     "屈</w>": 347,
+     "属</w>": 348,
+     "山</w>": 349,
+     "岁</w>": 350,
+     "岩</w>": 351,
+     "工</w>": 352,
+     "己</w>": 353,
+     "已</w>": 354,
+     "市</w>": 355,
+     "布</w>": 356,
+     "师</w>": 357,
363
+ "帖</w>": 358,
364
+ "带</w>": 359,
365
+ "席</w>": 360,
366
+ "帮</w>": 361,
367
+ "常</w>": 362,
368
+ "帽</w>": 363,
369
+ "干</w>": 364,
370
+ "平</w>": 365,
371
+ "年</w>": 366,
372
+ "幸</w>": 367,
373
+ "幹</w>": 368,
374
+ "广</w>": 369,
375
+ "庄</w>": 370,
376
+ "庆</w>": 371,
377
+ "床</w>": 372,
378
+ "应</w>": 373,
379
+ "底</w>": 374,
380
+ "庙</w>": 375,
381
+ "庞</w>": 376,
382
+ "度</w>": 377,
383
+ "座</w>": 378,
384
+ "庭</w>": 379,
385
+ "延</w>": 380,
386
+ "建</w>": 381,
387
+ "开</w>": 382,
388
+ "弃</w>": 383,
389
+ "式</w>": 384,
390
+ "弟</w>": 385,
391
+ "张</w>": 386,
392
+ "張</w>": 387,
393
+ "强</w>": 388,
394
+ "当</w>": 389,
395
+ "影</w>": 390,
396
+ "彻</w>": 391,
397
+ "往</w>": 392,
398
+ "径</w>": 393,
399
+ "待</w>": 394,
400
+ "很</w>": 395,
401
+ "後</w>": 396,
402
+ "徒</w>": 397,
403
+ "得</w>": 398,
404
+ "從</w>": 399,
405
+ "微</w>": 400,
406
+ "德</w>": 401,
407
+ "心</w>": 402,
408
+ "必</w>": 403,
409
+ "志</w>": 404,
410
+ "忙</w>": 405,
411
+ "快</w>": 406,
412
+ "念</w>": 407,
413
+ "怀</w>": 408,
414
+ "怎</w>": 409,
415
+ "急</w>": 410,
416
+ "总</w>": 411,
417
+ "息</w>": 412,
418
+ "悔</w>": 413,
419
+ "情</w>": 414,
420
+ "惊</w>": 415,
421
+ "惜</w>": 416,
422
+ "惡</w>": 417,
423
+ "想</w>": 418,
424
+ "愉</w>": 419,
425
+ "意</w>": 420,
426
+ "感</w>": 421,
427
+ "慢</w>": 422,
428
+ "應</w>": 423,
429
+ "戏</w>": 424,
430
+ "成</w>": 425,
431
+ "我</w>": 426,
432
+ "戒</w>": 427,
433
+ "或</w>": 428,
434
+ "戴</w>": 429,
435
+ "户</w>": 430,
436
+ "房</w>": 431,
437
+ "所</w>": 432,
438
+ "扇</w>": 433,
439
+ "手</w>": 434,
440
+ "才</w>": 435,
441
+ "打</w>": 436,
442
+ "托</w>": 437,
443
+ "扰</w>": 438,
444
+ "批</w>": 439,
445
+ "找</w>": 440,
446
+ "把</w>": 441,
447
+ "抓</w>": 442,
448
+ "护</w>": 443,
449
+ "报</w>": 444,
450
+ "抱</w>": 445,
451
+ "拆</w>": 446,
452
+ "拉</w>": 447,
453
+ "拜</w>": 448,
454
+ "拥</w>": 449,
455
+ "择</w>": 450,
456
+ "持</w>": 451,
457
+ "指</w>": 452,
458
+ "按</w>": 453,
459
+ "挑</w>": 454,
460
+ "挤</w>": 455,
461
+ "挥</w>": 456,
462
+ "据</w>": 457,
463
+ "接</w>": 458,
464
+ "推</w>": 459,
465
+ "措</w>": 460,
466
+ "揉</w>": 461,
467
+ "插</w>": 462,
468
+ "揭</w>": 463,
469
+ "携</w>": 464,
470
+ "摄</w>": 465,
471
+ "摇</w>": 466,
472
+ "摩</w>": 467,
473
+ "撒</w>": 468,
474
+ "播</w>": 469,
475
+ "擔</w>": 470,
476
+ "支</w>": 471,
477
+ "收</w>": 472,
478
+ "改</w>": 473,
479
+ "放</w>": 474,
480
+ "故</w>": 475,
481
+ "救</w>": 476,
482
+ "教</w>": 477,
483
+ "散</w>": 478,
484
+ "敦</w>": 479,
485
+ "敬</w>": 480,
486
+ "数</w>": 481,
487
+ "整</w>": 482,
488
+ "斯</w>": 483,
489
+ "新</w>": 484,
490
+ "方</w>": 485,
491
+ "施</w>": 486,
492
+ "旅</w>": 487,
493
+ "无</w>": 488,
494
+ "日</w>": 489,
495
+ "旦</w>": 490,
496
+ "早</w>": 491,
497
+ "时</w>": 492,
498
+ "明</w>": 493,
499
+ "星</w>": 494,
500
+ "昨</w>": 495,
501
+ "是</w>": 496,
502
+ "時</w>": 497,
503
+ "晃</w>": 498,
504
+ "晚</w>": 499,
505
+ "景</w>": 500,
506
+ "更</w>": 501,
507
+ "曾</w>": 502,
508
+ "最</w>": 503,
509
+ "會</w>": 504,
510
+ "月</w>": 505,
511
+ "有</w>": 506,
512
+ "朋</w>": 507,
513
+ "服</w>": 508,
514
+ "望</w>": 509,
515
+ "朝</w>": 510,
516
+ "期</w>": 511,
517
+ "本</w>": 512,
518
+ "术</w>": 513,
519
+ "机</w>": 514,
520
+ "杀</w>": 515,
521
+ "杂</w>": 516,
522
+ "权</w>": 517,
523
+ "村</w>": 518,
524
+ "条</w>": 519,
525
+ "来</w>": 520,
526
+ "杯</w>": 521,
527
+ "杰</w>": 522,
528
+ "松</w>": 523,
529
+ "果</w>": 524,
530
+ "架</w>": 525,
531
+ "某</w>": 526,
532
+ "标</w>": 527,
533
+ "栋</w>": 528,
534
+ "校</w>": 529,
535
+ "样</w>": 530,
536
+ "格</w>": 531,
537
+ "桌</w>": 532,
538
+ "桥</w>": 533,
539
+ "楼</w>": 534,
540
+ "概</w>": 535,
541
+ "樣</w>": 536,
542
+ "欠</w>": 537,
543
+ "次</w>": 538,
544
+ "欢</w>": 539,
545
+ "欲</w>": 540,
546
+ "款</w>": 541,
547
+ "歉</w>": 542,
548
+ "歌</w>": 543,
549
+ "歐</w>": 544,
550
+ "歡</w>": 545,
551
+ "止</w>": 546,
552
+ "正</w>": 547,
553
+ "步</w>": 548,
554
+ "死</w>": 549,
555
+ "段</w>": 550,
556
+ "母</w>": 551,
557
+ "每</w>": 552,
558
+ "比</w>": 553,
559
+ "毕</w>": 554,
560
+ "毛</w>": 555,
561
+ "毫</w>": 556,
562
+ "气</w>": 557,
563
+ "水</w>": 558,
564
+ "永</w>": 559,
565
+ "池</w>": 560,
566
+ "汽</w>": 561,
567
+ "沒</w>": 562,
568
+ "没</w>": 563,
569
+ "河</w>": 564,
570
+ "沸</w>": 565,
571
+ "油</w>": 566,
572
+ "沿</w>": 567,
573
+ "法</w>": 568,
574
+ "泪</w>": 569,
575
+ "泳</w>": 570,
576
+ "洗</w>": 571,
577
+ "津</w>": 572,
578
+ "活</w>": 573,
579
+ "派</w>": 574,
580
+ "流</w>": 575,
581
+ "济</w>": 576,
582
+ "消</w>": 577,
583
+ "涌</w>": 578,
584
+ "涨</w>": 579,
585
+ "清</w>": 580,
586
+ "温</w>": 581,
587
+ "港</w>": 582,
588
+ "游</w>": 583,
589
+ "湖</w>": 584,
590
+ "溜</w>": 585,
591
+ "滑</w>": 586,
592
+ "满</w>": 587,
593
+ "演</w>": 588,
594
+ "澄</w>": 589,
595
+ "澡</w>": 590,
596
+ "火</w>": 591,
597
+ "灯</w>": 592,
598
+ "灰</w>": 593,
599
+ "点</w>": 594,
600
+ "烟</w>": 595,
601
+ "烦</w>": 596,
602
+ "热</w>": 597,
603
+ "然</w>": 598,
604
+ "照</w>": 599,
605
+ "爱</w>": 600,
606
+ "父</w>": 601,
607
+ "爸</w>": 602,
608
+ "片</w>": 603,
609
+ "牛</w>": 604,
610
+ "物</w>": 605,
611
+ "狗</w>": 606,
612
+ "独</w>": 607,
613
+ "猫</w>": 608,
614
+ "王</w>": 609,
615
+ "玩</w>": 610,
616
+ "环</w>": 611,
617
+ "现</w>": 612,
618
+ "班</w>": 613,
619
+ "球</w>": 614,
620
+ "理</w>": 615,
621
+ "生</w>": 616,
622
+ "用</w>": 617,
623
+ "由</w>": 618,
624
+ "电</w>": 619,
625
+ "男</w>": 620,
626
+ "界</w>": 621,
627
+ "留</w>": 622,
628
+ "當</w>": 623,
629
+ "疑</w>": 624,
630
+ "疯</w>": 625,
631
+ "病</w>": 626,
632
+ "痛</w>": 627,
633
+ "瘋</w>": 628,
634
+ "發</w>": 629,
635
+ "白</w>": 630,
636
+ "百</w>": 631,
637
+ "的</w>": 632,
638
+ "盐</w>": 633,
639
+ "盖</w>": 634,
640
+ "盛</w>": 635,
641
+ "目</w>": 636,
642
+ "直</w>": 637,
643
+ "相</w>": 638,
644
+ "盹</w>": 639,
645
+ "看</w>": 640,
646
+ "真</w>": 641,
647
+ "眠</w>": 642,
648
+ "眼</w>": 643,
649
+ "着</w>": 644,
650
+ "睛</w>": 645,
651
+ "睡</w>": 646,
652
+ "知</w>": 647,
653
+ "短</w>": 648,
654
+ "石</w>": 649,
655
+ "码</w>": 650,
656
+ "破</w>": 651,
657
+ "确</w>": 652,
658
+ "碎</w>": 653,
659
+ "示</w>": 654,
660
+ "社</w>": 655,
661
+ "祝</w>": 656,
662
+ "神</w>": 657,
663
+ "票</w>": 658,
664
+ "福</w>": 659,
665
+ "离</w>": 660,
666
+ "私</w>": 661,
667
+ "种</w>": 662,
668
+ "秘</w>": 663,
669
+ "移</w>": 664,
670
+ "程</w>": 665,
671
+ "空</w>": 666,
672
+ "窗</w>": 667,
673
+ "窜</w>": 668,
674
+ "站</w>": 669,
675
+ "童</w>": 670,
676
+ "笑</w>": 671,
677
+ "笔</w>": 672,
678
+ "笛</w>": 673,
679
+ "第</w>": 674,
680
+ "笼</w>": 675,
681
+ "等</w>": 676,
682
+ "筑</w>": 677,
683
+ "答</w>": 678,
684
+ "简</w>": 679,
685
+ "籍</w>": 680,
686
+ "粗</w>": 681,
687
+ "精</w>": 682,
688
+ "糕</w>": 683,
689
+ "糟</w>": 684,
690
+ "素</w>": 685,
691
+ "索</w>": 686,
692
+ "給</w>": 687,
693
+ "經</w>": 688,
694
+ "總</w>": 689,
695
+ "红</w>": 690,
696
+ "纪</w>": 691,
697
+ "纯</w>": 692,
698
+ "纸</w>": 693,
699
+ "线</w>": 694,
700
+ "绅</w>": 695,
701
+ "终</w>": 696,
702
+ "经</w>": 697,
703
+ "结</w>": 698,
704
+ "给</w>": 699,
705
+ "统</w>": 700,
706
+ "绿</w>": 701,
707
+ "缺</w>": 702,
708
+ "网</w>": 703,
709
+ "罗</w>": 704,
710
+ "罚</w>": 705,
711
+ "置</w>": 706,
712
+ "美</w>": 707,
713
+ "群</w>": 708,
714
+ "習</w>": 709,
715
+ "老</w>": 710,
716
+ "考</w>": 711,
717
+ "者</w>": 712,
718
+ "而</w>": 713,
719
+ "耍</w>": 714,
720
+ "耗</w>": 715,
721
+ "职</w>": 716,
722
+ "肯</w>": 717,
723
+ "胖</w>": 718,
724
+ "能</w>": 719,
725
+ "脑</w>": 720,
726
+ "脚</w>": 721,
727
+ "脸</w>": 722,
728
+ "腾</w>": 723,
729
+ "腿</w>": 724,
730
+ "自</w>": 725,
731
+ "至</w>": 726,
732
+ "船</w>": 727,
733
+ "艰</w>": 728,
734
+ "色</w>": 729,
735
+ "艺</w>": 730,
736
+ "花</w>": 731,
737
+ "苏</w>": 732,
738
+ "英</w>": 733,
739
+ "茶</w>": 734,
740
+ "药</w>": 735,
741
+ "落</w>": 736,
742
+ "著</w>": 737,
743
+ "虑</w>": 738,
744
+ "虾</w>": 739,
745
+ "蜂</w>": 740,
746
+ "蝴</w>": 741,
747
+ "蝶</w>": 742,
748
+ "蠢</w>": 743,
749
+ "血</w>": 744,
750
+ "行</w>": 745,
751
+ "衣</w>": 746,
752
+ "表</w>": 747,
753
+ "被</w>": 748,
754
+ "裡</w>": 749,
755
+ "要</w>": 750,
756
+ "覆</w>": 751,
757
+ "覺</w>": 752,
758
+ "见</w>": 753,
759
+ "观</w>": 754,
760
+ "规</w>": 755,
761
+ "视</w>": 756,
762
+ "觉</w>": 757,
763
+ "解</w>": 758,
764
+ "言</w>": 759,
765
+ "計</w>": 760,
766
+ "試</w>": 761,
767
+ "話</w>": 762,
768
+ "該</w>": 763,
769
+ "誓</w>": 764,
770
+ "說</w>": 765,
771
+ "請</w>": 766,
772
+ "讀</w>": 767,
773
+ "變</w>": 768,
774
+ "计</w>": 769,
775
+ "订</w>": 770,
776
+ "认</w>": 771,
777
+ "让</w>": 772,
778
+ "训</w>": 773,
779
+ "议</w>": 774,
780
+ "记</w>": 775,
781
+ "讲</w>": 776,
782
+ "讶</w>": 777,
783
+ "许</w>": 778,
784
+ "论</w>": 779,
785
+ "设</w>": 780,
786
+ "访</w>": 781,
787
+ "证</w>": 782,
788
+ "评</w>": 783,
789
+ "识</w>": 784,
790
+ "诉</w>": 785,
791
+ "试</w>": 786,
792
+ "诗</w>": 787,
793
+ "诚</w>": 788,
794
+ "话</w>": 789,
795
+ "该</w>": 790,
796
+ "语</w>": 791,
797
+ "误</w>": 792,
798
+ "说</w>": 793,
799
+ "请</w>": 794,
800
+ "诺</w>": 795,
801
+ "读</w>": 796,
802
+ "课</w>": 797,
803
+ "谁</w>": 798,
804
+ "谈</w>": 799,
805
+ "谎</w>": 800,
806
+ "谢</w>": 801,
807
+ "象</w>": 802,
808
+ "賺</w>": 803,
809
+ "负</w>": 804,
810
+ "货</w>": 805,
811
+ "购</w>": 806,
812
+ "贷</w>": 807,
813
+ "费</w>": 808,
814
+ "赛</w>": 809,
815
+ "赢</w>": 810,
816
+ "走</w>": 811,
817
+ "赶</w>": 812,
818
+ "起</w>": 813,
819
+ "趕</w>": 814,
820
+ "趣</w>": 815,
821
+ "足</w>": 816,
822
+ "跑</w>": 817,
823
+ "跟</w>": 818,
824
+ "路</w>": 819,
825
+ "踢</w>": 820,
826
+ "躲</w>": 821,
827
+ "較</w>": 822,
828
+ "车</w>": 823,
829
+ "轨</w>": 824,
830
+ "转</w>": 825,
831
+ "轻</w>": 826,
832
+ "较</w>": 827,
833
+ "辆</w>": 828,
834
+ "辈</w>": 829,
835
+ "辜</w>": 830,
836
+ "辩</w>": 831,
837
+ "达</w>": 832,
838
+ "迅</w>": 833,
839
+ "过</w>": 834,
840
+ "近</w>": 835,
841
+ "还</w>": 836,
842
+ "这</w>": 837,
843
+ "进</w>": 838,
844
+ "远</w>": 839,
845
+ "迟</w>": 840,
846
+ "述</w>": 841,
847
+ "迷</w>": 842,
848
+ "迹</w>": 843,
849
+ "送</w>": 844,
850
+ "适</w>": 845,
851
+ "逃</w>": 846,
852
+ "选</w>": 847,
853
+ "透</w>": 848,
854
+ "递</w>": 849,
855
+ "途</w>": 850,
856
+ "這</w>": 851,
857
+ "通</w>": 852,
858
+ "速</w>": 853,
859
+ "造</w>": 854,
860
+ "進</w>": 855,
861
+ "過</w>": 856,
862
+ "道</w>": 857,
863
+ "遛</w>": 858,
864
+ "遠</w>": 859,
865
+ "邀</w>": 860,
866
+ "那</w>": 861,
867
+ "邻</w>": 862,
868
+ "部</w>": 863,
869
+ "都</w>": 864,
870
+ "酒</w>": 865,
871
+ "采</w>": 866,
872
+ "里</w>": 867,
873
+ "重</w>": 868,
874
+ "金</w>": 869,
875
+ "钟</w>": 870,
876
+ "钱</w>": 871,
877
+ "铁</w>": 872,
878
+ "铃</w>": 873,
879
+ "铭</w>": 874,
880
+ "银</w>": 875,
881
+ "销</w>": 876,
882
+ "错</w>": 877,
883
+ "镜</w>": 878,
884
+ "長</w>": 879,
885
+ "长</w>": 880,
886
+ "間</w>": 881,
887
+ "问</w>": 882,
888
+ "间</w>": 883,
889
+ "闻</w>": 884,
890
+ "阅</w>": 885,
891
+ "阐</w>": 886,
892
+ "防</w>": 887,
893
+ "阳</w>": 888,
894
+ "附</w>": 889,
895
+ "限</w>": 890,
896
+ "除</w>": 891,
897
+ "险</w>": 892,
898
+ "随</w>": 893,
899
+ "隻</w>": 894,
900
+ "难</w>": 895,
901
+ "雨</w>": 896,
902
+ "雪</w>": 897,
903
+ "零</w>": 898,
904
+ "雹</w>": 899,
905
+ "需</w>": 900,
906
+ "震</w>": 901,
907
+ "露</w>": 902,
908
+ "非</w>": 903,
909
+ "靠</w>": 904,
910
+ "面</w>": 905,
911
+ "音</w>": 906,
912
+ "題</w>": 907,
913
+ "项</w>": 908,
914
+ "须</w>": 909,
915
+ "顾</w>": 910,
916
+ "预</w>": 911,
917
+ "题</w>": 912,
918
+ "风</w>": 913,
919
+ "飞</w>": 914,
920
+ "食</w>": 915,
921
+ "餐</w>": 916,
922
+ "饭</w>": 917,
923
+ "饿</w>": 918,
924
+ "首</w>": 919,
925
+ "马</w>": 920,
926
+ "驶</w>": 921,
927
+ "验</w>": 922,
928
+ "骑</w>": 923,
929
+ "骗</w>": 924,
930
+ "高</w>": 925,
931
+ "鬼</w>": 926,
932
+ "鱼</w>": 927,
933
+ "鲍</w>": 928,
934
+ "鲜</w>": 929,
935
+ "麻</w>": 930,
936
+ "麼</w>": 931,
937
+ "點</w>": 932,
938
+ "鼠</w>": 933,
939
+ "龙</w>": 934,
940
+ "﹐</w>": 935,
941
+ "！</w>": 936,
942
+ "，</w>": 937,
943
+ "？</w>": 938
944
+ },
945
+ "id_to_token": {
946
+ "0": "<pad>",
947
+ "1": "<sos>",
948
+ "2": "<eos>",
949
+ "3": "<unk>",
950
+ "4": "<mask>",
951
+ "5": "!</w>",
952
+ "6": "\"</w>",
953
+ "7": ",</w>",
954
+ "8": ".</w>",
955
+ "9": "0</w>",
956
+ "10": "10",
957
+ "11": "100</w>",
958
+ "12": "10</w>",
959
+ "13": "18",
960
+ "14": "18</w>",
961
+ "15": "1</w>",
962
+ "16": "20</w>",
963
+ "17": "21</w>",
964
+ "18": "2</w>",
965
+ "19": "3</w>",
966
+ "20": "40</w>",
967
+ "21": "56</w>",
968
+ "22": "5</w>",
969
+ "23": "6</w>",
970
+ "24": "7</w>",
971
+ "25": "?</w>",
972
+ "26": "ali",
973
+ "27": "alice</w>",
974
+ "28": "ancy</w>",
975
+ "29": "ce</w>",
976
+ "30": "cy</w>",
977
+ "31": "e</w>",
978
+ "32": "el",
979
+ "33": "iel",
980
+ "34": "ir",
981
+ "35": "ja",
982
+ "36": "jac",
983
+ "37": "jack</w>",
984
+ "38": "jake</w>",
985
+ "39": "k</w>",
986
+ "40": "ka",
987
+ "41": "kate</w>",
988
+ "42": "ke",
989
+ "43": "ke</w>",
990
+ "44": "ken</w>",
991
+ "45": "li",
992
+ "46": "m</w>",
993
+ "47": "muir",
994
+ "48": "muiriel",
995
+ "49": "muiriel</w>",
996
+ "50": "n</w>",
997
+ "51": "nancy</w>",
998
+ "52": "ncy</w>",
999
+ "53": "om</w>",
1000
+ "54": "te</w>",
1001
+ "55": "tom</w>",
1002
+ "56": "uir",
1003
+ "57": "y</w>",
1004
+ "58": "“</w>",
1005
+ "59": "”</w>",
1006
+ "60": "…</w>",
1007
+ "61": "、</w>",
1008
+ "62": "。</w>",
1009
+ "63": "一</w>",
1010
+ "64": "七</w>",
1011
+ "65": "万</w>",
1012
+ "66": "三</w>",
1013
+ "67": "上</w>",
1014
+ "68": "下</w>",
1015
+ "69": "不</w>",
1016
+ "70": "丑</w>",
1017
+ "71": "世</w>",
1018
+ "72": "业</w>",
1019
+ "73": "两</w>",
1020
+ "74": "严</w>",
1021
+ "75": "个</w>",
1022
+ "76": "中</w>",
1023
+ "77": "丰</w>",
1024
+ "78": "为</w>",
1025
+ "79": "举</w>",
1026
+ "80": "久</w>",
1027
+ "81": "么</w>",
1028
+ "82": "义</w>",
1029
+ "83": "之</w>",
1030
+ "84": "乎</w>",
1031
+ "85": "乐</w>",
1032
+ "86": "乘</w>",
1033
+ "87": "九</w>",
1034
+ "88": "也</w>",
1035
+ "89": "习</w>",
1036
+ "90": "书</w>",
1037
+ "91": "买</w>",
1038
+ "92": "了</w>",
1039
+ "93": "予</w>",
1040
+ "94": "争</w>",
1041
+ "95": "事</w>",
1042
+ "96": "于</w>",
1043
+ "97": "互</w>",
1044
+ "98": "些</w>",
1045
+ "99": "交</w>",
1046
+ "100": "亲</w>",
1047
+ "101": "人</w>",
1048
+ "102": "什</w>",
1049
+ "103": "仅</w>",
1050
+ "104": "今</w>",
1051
+ "105": "从</w>",
1052
+ "106": "他</w>",
1053
+ "107": "付</w>",
1054
+ "108": "代</w>",
1055
+ "109": "以</w>",
1056
+ "110": "仪</w>",
1057
+ "111": "们</w>",
1058
+ "112": "件</w>",
1059
+ "113": "价</w>",
1060
+ "114": "任</w>",
1061
+ "115": "份</w>",
1062
+ "116": "休</w>",
1063
+ "117": "众</w>",
1064
+ "118": "会</w>",
1065
+ "119": "伟</w>",
1066
+ "120": "传</w>",
1067
+ "121": "伦</w>",
1068
+ "122": "似</w>",
1069
+ "123": "但</w>",
1070
+ "124": "位</w>",
1071
+ "125": "低</w>",
1072
+ "126": "住</w>",
1073
+ "127": "体</w>",
1074
+ "128": "何</w>",
1075
+ "129": "作</w>",
1076
+ "130": "你</w>",
1077
+ "131": "使</w>",
1078
+ "132": "來</w>",
1079
+ "133": "例</w>",
1080
+ "134": "保</w>",
1081
+ "135": "信</w>",
1082
+ "136": "俱</w>",
1083
+ "137": "個</w>",
1084
+ "138": "們</w>",
1085
+ "139": "候</w>",
1086
+ "140": "借</w>",
1087
+ "141": "倦</w>",
1088
+ "142": "债</w>",
1089
+ "143": "值</w>",
1090
+ "144": "假</w>",
1091
+ "145": "偏</w>",
1092
+ "146": "做</w>",
1093
+ "147": "停</w>",
1094
+ "148": "偶</w>",
1095
+ "149": "偷</w>",
1096
+ "150": "像</w>",
1097
+ "151": "僵</w>",
1098
+ "152": "儿</w>",
1099
+ "153": "元</w>",
1100
+ "154": "先</w>",
1101
+ "155": "光</w>",
1102
+ "156": "克</w>",
1103
+ "157": "免</w>",
1104
+ "158": "兔</w>",
1105
+ "159": "入</w>",
1106
+ "160": "全</w>",
1107
+ "161": "公</w>",
1108
+ "162": "六</w>",
1109
+ "163": "兰</w>",
1110
+ "164": "关</w>",
1111
+ "165": "兴</w>",
1112
+ "166": "其</w>",
1113
+ "167": "兼</w>",
1114
+ "168": "内</w>",
1115
+ "169": "再</w>",
1116
+ "170": "冒</w>",
1117
+ "171": "写</w>",
1118
+ "172": "冰</w>",
1119
+ "173": "冲</w>",
1120
+ "174": "决</w>",
1121
+ "175": "况</w>",
1122
+ "176": "冷</w>",
1123
+ "177": "准</w>",
1124
+ "178": "几</w>",
1125
+ "179": "出</w>",
1126
+ "180": "分</w>",
1127
+ "181": "切</w>",
1128
+ "182": "划</w>",
1129
+ "183": "则</w>",
1130
+ "184": "创</w>",
1131
+ "185": "利</w>",
1132
+ "186": "到</w>",
1133
+ "187": "制</w>",
1134
+ "188": "前</w>",
1135
+ "189": "劃</w>",
1136
+ "190": "力</w>",
1137
+ "191": "办</w>",
1138
+ "192": "功</w>",
1139
+ "193": "加</w>",
1140
+ "194": "务</w>",
1141
+ "195": "动</w>",
1142
+ "196": "助</w>",
1143
+ "197": "努</w>",
1144
+ "198": "劳</w>",
1145
+ "199": "勃</w>",
1146
+ "200": "包</w>",
1147
+ "201": "化</w>",
1148
+ "202": "医</w>",
1149
+ "203": "十</w>",
1150
+ "204": "千</w>",
1151
+ "205": "升</w>",
1152
+ "206": "午</w>",
1153
+ "207": "半</w>",
1154
+ "208": "华</w>",
1155
+ "209": "单</w>",
1156
+ "210": "卖</w>",
1157
+ "211": "卫</w>",
1158
+ "212": "危</w>",
1159
+ "213": "即</w>",
1160
+ "214": "却</w>",
1161
+ "215": "历</w>",
1162
+ "216": "厌</w>",
1163
+ "217": "厕</w>",
1164
+ "218": "去</w>",
1165
+ "219": "参</w>",
1166
+ "220": "又</w>",
1167
+ "221": "友</w>",
1168
+ "222": "反</w>",
1169
+ "223": "发</w>",
1170
+ "224": "叔</w>",
1171
+ "225": "取</w>",
1172
+ "226": "受</w>",
1173
+ "227": "变</w>",
1174
+ "228": "口</w>",
1175
+ "229": "古</w>",
1176
+ "230": "另</w>",
1177
+ "231": "只</w>",
1178
+ "232": "叫</w>",
1179
+ "233": "可</w>",
1180
+ "234": "史</w>",
1181
+ "235": "右</w>",
1182
+ "236": "号</w>",
1183
+ "237": "吃</w>",
1184
+ "238": "合</w>",
1185
+ "239": "同</w>",
1186
+ "240": "名</w>",
1187
+ "241": "后</w>",
1188
+ "242": "向</w>",
1189
+ "243": "吗</w>",
1190
+ "244": "吧</w>",
1191
+ "245": "听</w>",
1192
+ "246": "告</w>",
1193
+ "247": "员</w>",
1194
+ "248": "呢</w>",
1195
+ "249": "周</w>",
1196
+ "250": "味</w>",
1197
+ "251": "命</w>",
1198
+ "252": "和</w>",
1199
+ "253": "咖</w>",
1200
+ "254": "品</w>",
1201
+ "255": "响</w>",
1202
+ "256": "哥</w>",
1203
+ "257": "哦</w>",
1204
+ "258": "哪</w>",
1205
+ "259": "售</w>",
1206
+ "260": "唯</w>",
1207
+ "261": "唱</w>",
1208
+ "262": "啊</w>",
1209
+ "263": "問</w>",
1210
+ "264": "啡</w>",
1211
+ "265": "喜</w>",
1212
+ "266": "喝</w>",
1213
+ "267": "嗨</w>",
1214
+ "268": "囚</w>",
1215
+ "269": "回</w>",
1216
+ "270": "因</w>",
1217
+ "271": "团</w>",
1218
+ "272": "园</w>",
1219
+ "273": "困</w>",
1220
+ "274": "国</w>",
1221
+ "275": "图</w>",
1222
+ "276": "圈</w>",
1223
+ "277": "國</w>",
1224
+ "278": "圣</w>",
1225
+ "279": "在</w>",
1226
+ "280": "地</w>",
1227
+ "281": "场</w>",
1228
+ "282": "坐</w>",
1229
+ "283": "块</w>",
1230
+ "284": "坚</w>",
1231
+ "285": "城</w>",
1232
+ "286": "堡</w>",
1233
+ "287": "增</w>",
1234
+ "288": "士</w>",
1235
+ "289": "声</w>",
1236
+ "290": "处</w>",
1237
+ "291": "备</w>",
1238
+ "292": "复</w>",
1239
+ "293": "夏</w>",
1240
+ "294": "外</w>",
1241
+ "295": "多</w>",
1242
+ "296": "夜</w>",
1243
+ "297": "够</w>",
1244
+ "298": "大</w>",
1245
+ "299": "天</w>",
1246
+ "300": "太</w>",
1247
+ "301": "失</w>",
1248
+ "302": "头</w>",
1249
+ "303": "奇</w>",
1250
+ "304": "奶</w>",
1251
+ "305": "她</w>",
1252
+ "306": "好</w>",
1253
+ "307": "如</w>",
1254
+ "308": "妈</w>",
1255
+ "309": "妹</w>",
1256
+ "310": "妻</w>",
1257
+ "311": "始</w>",
1258
+ "312": "姐</w>",
1259
+ "313": "威</w>",
1260
+ "314": "婚</w>",
1261
+ "315": "子</w>",
1262
+ "316": "字</w>",
1263
+ "317": "季</w>",
1264
+ "318": "学</w>",
1265
+ "319": "孩</w>",
1266
+ "320": "學</w>",
1267
+ "321": "它</w>",
1268
+ "322": "宇</w>",
1269
+ "323": "守</w>",
1270
+ "324": "安</w>",
1271
+ "325": "完</w>",
1272
+ "326": "宙</w>",
1273
+ "327": "定</w>",
1274
+ "328": "宝</w>",
1275
+ "329": "实</w>",
1276
+ "330": "客</w>",
1277
+ "331": "宣</w>",
1278
+ "332": "室</w>",
1279
+ "333": "宵</w>",
1280
+ "334": "家</w>",
1281
+ "335": "寄</w>",
1282
+ "336": "密</w>",
1283
+ "337": "富</w>",
1284
+ "338": "对</w>",
1285
+ "339": "寻</w>",
1286
+ "340": "将</w>",
1287
+ "341": "尊</w>",
1288
+ "342": "小</w>",
1289
+ "343": "少</w>",
1290
+ "344": "就</w>",
1291
+ "345": "尼</w>",
1292
+ "346": "局</w>",
1293
+ "347": "屈</w>",
1294
+ "348": "属</w>",
1295
+ "349": "山</w>",
1296
+ "350": "岁</w>",
1297
+ "351": "岩</w>",
1298
+ "352": "工</w>",
1299
+ "353": "己</w>",
1300
+ "354": "已</w>",
1301
+ "355": "市</w>",
1302
+ "356": "布</w>",
1303
+ "357": "师</w>",
1304
+ "358": "帖</w>",
1305
+ "359": "带</w>",
1306
+ "360": "席</w>",
1307
+ "361": "帮</w>",
1308
+ "362": "常</w>",
1309
+ "363": "帽</w>",
1310
+ "364": "干</w>",
1311
+ "365": "平</w>",
1312
+ "366": "年</w>",
1313
+ "367": "幸</w>",
1314
+ "368": "幹</w>",
1315
+ "369": "广</w>",
1316
+ "370": "庄</w>",
1317
+ "371": "庆</w>",
1318
+ "372": "床</w>",
1319
+ "373": "应</w>",
1320
+ "374": "底</w>",
1321
+ "375": "庙</w>",
1322
+ "376": "庞</w>",
1323
+ "377": "度</w>",
1324
+ "378": "座</w>",
1325
+ "379": "庭</w>",
1326
+ "380": "延</w>",
1327
+ "381": "建</w>",
1328
+ "382": "开</w>",
1329
+ "383": "弃</w>",
1330
+ "384": "式</w>",
1331
+ "385": "弟</w>",
1332
+ "386": "张</w>",
1333
+ "387": "張</w>",
1334
+ "388": "强</w>",
1335
+ "389": "当</w>",
1336
+ "390": "影</w>",
1337
+ "391": "彻</w>",
1338
+ "392": "往</w>",
1339
+ "393": "径</w>",
1340
+ "394": "待</w>",
1341
+ "395": "很</w>",
1342
+ "396": "後</w>",
1343
+ "397": "徒</w>",
1344
+ "398": "得</w>",
1345
+ "399": "從</w>",
1346
+ "400": "微</w>",
1347
+ "401": "德</w>",
1348
+ "402": "心</w>",
1349
+ "403": "必</w>",
1350
+ "404": "志</w>",
1351
+ "405": "忙</w>",
1352
+ "406": "快</w>",
1353
+ "407": "念</w>",
1354
+ "408": "怀</w>",
1355
+ "409": "怎</w>",
1356
+ "410": "急</w>",
1357
+ "411": "总</w>",
1358
+ "412": "息</w>",
1359
+ "413": "悔</w>",
1360
+ "414": "情</w>",
1361
+ "415": "惊</w>",
1362
+ "416": "惜</w>",
1363
+ "417": "惡</w>",
1364
+ "418": "想</w>",
1365
+ "419": "愉</w>",
1366
+ "420": "意</w>",
1367
+ "421": "感</w>",
1368
+ "422": "慢</w>",
1369
+ "423": "應</w>",
1370
+ "424": "戏</w>",
1371
+ "425": "成</w>",
1372
+ "426": "我</w>",
1373
+ "427": "戒</w>",
1374
+ "428": "或</w>",
1375
+ "429": "戴</w>",
1376
+ "430": "户</w>",
1377
+ "431": "房</w>",
1378
+ "432": "所</w>",
1379
+ "433": "扇</w>",
1380
+ "434": "手</w>",
1381
+ "435": "才</w>",
1382
+ "436": "打</w>",
1383
+ "437": "托</w>",
1384
+ "438": "扰</w>",
1385
+ "439": "批</w>",
1386
+ "440": "找</w>",
1387
+ "441": "把</w>",
1388
+ "442": "抓</w>",
1389
+ "443": "护</w>",
1390
+ "444": "报</w>",
1391
+ "445": "抱</w>",
1392
+ "446": "拆</w>",
1393
+ "447": "拉</w>",
1394
+ "448": "拜</w>",
1395
+ "449": "拥</w>",
1396
+ "450": "择</w>",
1397
+ "451": "持</w>",
1398
+ "452": "指</w>",
1399
+ "453": "按</w>",
1400
+ "454": "挑</w>",
1401
+ "455": "挤</w>",
1402
+ "456": "挥</w>",
1403
+ "457": "据</w>",
1404
+ "458": "接</w>",
1405
+ "459": "推</w>",
1406
+ "460": "措</w>",
1407
+ "461": "揉</w>",
1408
+ "462": "插</w>",
1409
+ "463": "揭</w>",
1410
+ "464": "携</w>",
1411
+ "465": "摄</w>",
1412
+ "466": "摇</w>",
1413
+ "467": "摩</w>",
1414
+ "468": "撒</w>",
1415
+ "469": "播</w>",
1416
+ "470": "擔</w>",
1417
+ "471": "支</w>",
1418
+ "472": "收</w>",
1419
+ "473": "改</w>",
1420
+ "474": "放</w>",
1421
+ "475": "故</w>",
1422
+ "476": "救</w>",
1423
+ "477": "教</w>",
1424
+ "478": "散</w>",
1425
+ "479": "敦</w>",
1426
+ "480": "敬</w>",
1427
+ "481": "数</w>",
1428
+ "482": "整</w>",
1429
+ "483": "斯</w>",
1430
+ "484": "新</w>",
1431
+ "485": "方</w>",
1432
+ "486": "施</w>",
1433
+ "487": "旅</w>",
1434
+ "488": "无</w>",
1435
+ "489": "日</w>",
1436
+ "490": "旦</w>",
1437
+ "491": "早</w>",
1438
+ "492": "时</w>",
1439
+ "493": "明</w>",
1440
+ "494": "星</w>",
1441
+ "495": "昨</w>",
1442
+ "496": "是</w>",
1443
+ "497": "時</w>",
1444
+ "498": "晃</w>",
1445
+ "499": "晚</w>",
1446
+ "500": "景</w>",
1447
+ "501": "更</w>",
1448
+ "502": "曾</w>",
1449
+ "503": "最</w>",
1450
+ "504": "會</w>",
1451
+ "505": "月</w>",
1452
+ "506": "有</w>",
1453
+ "507": "朋</w>",
1454
+ "508": "服</w>",
1455
+ "509": "望</w>",
1456
+ "510": "朝</w>",
1457
+ "511": "期</w>",
1458
+ "512": "本</w>",
1459
+ "513": "术</w>",
1460
+ "514": "机</w>",
1461
+ "515": "杀</w>",
1462
+ "516": "杂</w>",
1463
+ "517": "权</w>",
1464
+ "518": "村</w>",
1465
+ "519": "条</w>",
1466
+ "520": "来</w>",
1467
+ "521": "杯</w>",
1468
+ "522": "杰</w>",
1469
+ "523": "松</w>",
1470
+ "524": "果</w>",
1471
+ "525": "架</w>",
1472
+ "526": "某</w>",
1473
+ "527": "标</w>",
1474
+ "528": "栋</w>",
1475
+ "529": "校</w>",
1476
+ "530": "样</w>",
1477
+ "531": "格</w>",
1478
+ "532": "桌</w>",
1479
+ "533": "桥</w>",
1480
+ "534": "楼</w>",
1481
+ "535": "概</w>",
1482
+ "536": "樣</w>",
1483
+ "537": "欠</w>",
1484
+ "538": "次</w>",
1485
+ "539": "欢</w>",
1486
+ "540": "欲</w>",
1487
+ "541": "款</w>",
1488
+ "542": "歉</w>",
1489
+ "543": "歌</w>",
1490
+ "544": "歐</w>",
1491
+ "545": "歡</w>",
1492
+ "546": "止</w>",
1493
+ "547": "正</w>",
1494
+ "548": "步</w>",
1495
+ "549": "死</w>",
1496
+ "550": "段</w>",
1497
+ "551": "母</w>",
1498
+ "552": "每</w>",
1499
+ "553": "比</w>",
1500
+ "554": "毕</w>",
1501
+ "555": "毛</w>",
1502
+ "556": "毫</w>",
1503
+ "557": "气</w>",
1504
+ "558": "水</w>",
1505
+ "559": "永</w>",
1506
+ "560": "池</w>",
1507
+ "561": "汽</w>",
1508
+ "562": "沒</w>",
1509
+ "563": "没</w>",
1510
+ "564": "河</w>",
1511
+ "565": "沸</w>",
1512
+ "566": "油</w>",
1513
+ "567": "沿</w>",
1514
+ "568": "法</w>",
1515
+ "569": "泪</w>",
1516
+ "570": "泳</w>",
1517
+ "571": "洗</w>",
1518
+ "572": "津</w>",
1519
+ "573": "活</w>",
1520
+ "574": "派</w>",
1521
+ "575": "流</w>",
1522
+ "576": "济</w>",
1523
+ "577": "消</w>",
1524
+ "578": "涌</w>",
1525
+ "579": "涨</w>",
1526
+ "580": "清</w>",
1527
+ "581": "温</w>",
1528
+ "582": "港</w>",
1529
+ "583": "游</w>",
1530
+ "584": "湖</w>",
1531
+ "585": "溜</w>",
1532
+ "586": "滑</w>",
1533
+ "587": "满</w>",
1534
+ "588": "演</w>",
1535
+ "589": "澄</w>",
1536
+ "590": "澡</w>",
1537
+ "591": "火</w>",
1538
+ "592": "灯</w>",
1539
+ "593": "灰</w>",
1540
+ "594": "点</w>",
1541
+ "595": "烟</w>",
1542
+ "596": "烦</w>",
1543
+ "597": "热</w>",
1544
+ "598": "然</w>",
1545
+ "599": "照</w>",
1546
+ "600": "爱</w>",
1547
+ "601": "父</w>",
1548
+ "602": "爸</w>",
1549
+ "603": "片</w>",
1550
+ "604": "牛</w>",
1551
+ "605": "物</w>",
1552
+ "606": "狗</w>",
1553
+ "607": "独</w>",
1554
+ "608": "猫</w>",
1555
+ "609": "王</w>",
1556
+ "610": "玩</w>",
1557
+ "611": "环</w>",
1558
+ "612": "现</w>",
1559
+ "613": "班</w>",
1560
+ "614": "球</w>",
1561
+ "615": "理</w>",
1562
+ "616": "生</w>",
1563
+ "617": "用</w>",
1564
+ "618": "由</w>",
1565
+ "619": "电</w>",
1566
+ "620": "男</w>",
1567
+ "621": "界</w>",
1568
+ "622": "留</w>",
1569
+ "623": "當</w>",
1570
+ "624": "疑</w>",
1571
+ "625": "疯</w>",
1572
+ "626": "病</w>",
1573
+ "627": "痛</w>",
1574
+ "628": "瘋</w>",
1575
+ "629": "發</w>",
1576
+ "630": "白</w>",
1577
+ "631": "百</w>",
1578
+ "632": "的</w>",
1579
+ "633": "盐</w>",
1580
+ "634": "盖</w>",
1581
+ "635": "盛</w>",
1582
+ "636": "目</w>",
1583
+ "637": "直</w>",
1584
+ "638": "相</w>",
1585
+ "639": "盹</w>",
1586
+ "640": "看</w>",
1587
+ "641": "真</w>",
1588
+ "642": "眠</w>",
1589
+ "643": "眼</w>",
1590
+ "644": "着</w>",
1591
+ "645": "睛</w>",
1592
+ "646": "睡</w>",
1593
+ "647": "知</w>",
1594
+ "648": "短</w>",
1595
+ "649": "石</w>",
1596
+ "650": "码</w>",
1597
+ "651": "破</w>",
1598
+ "652": "确</w>",
1599
+ "653": "碎</w>",
1600
+ "654": "示</w>",
1601
+ "655": "社</w>",
1602
+ "656": "祝</w>",
1603
+ "657": "神</w>",
1604
+ "658": "票</w>",
1605
+ "659": "福</w>",
1606
+ "660": "离</w>",
1607
+ "661": "私</w>",
1608
+ "662": "种</w>",
1609
+ "663": "秘</w>",
1610
+ "664": "移</w>",
1611
+ "665": "程</w>",
1612
+ "666": "空</w>",
1613
+ "667": "窗</w>",
1614
+ "668": "窜</w>",
1615
+ "669": "站</w>",
1616
+ "670": "童</w>",
1617
+ "671": "笑</w>",
1618
+ "672": "笔</w>",
1619
+ "673": "笛</w>",
1620
+ "674": "第</w>",
1621
+ "675": "笼</w>",
1622
+ "676": "等</w>",
1623
+ "677": "筑</w>",
1624
+ "678": "答</w>",
1625
+ "679": "简</w>",
1626
+ "680": "籍</w>",
1627
+ "681": "粗</w>",
1628
+ "682": "精</w>",
1629
+ "683": "糕</w>",
1630
+ "684": "糟</w>",
1631
+ "685": "素</w>",
1632
+ "686": "索</w>",
1633
+ "687": "給</w>",
1634
+ "688": "經</w>",
1635
+ "689": "總</w>",
1636
+ "690": "红</w>",
1637
+ "691": "纪</w>",
1638
+ "692": "纯</w>",
1639
+ "693": "纸</w>",
1640
+ "694": "线</w>",
1641
+ "695": "绅</w>",
1642
+ "696": "终</w>",
1643
+ "697": "经</w>",
1644
+ "698": "结</w>",
1645
+ "699": "给</w>",
1646
+ "700": "统</w>",
1647
+ "701": "绿</w>",
1648
+ "702": "缺</w>",
1649
+ "703": "网</w>",
1650
+ "704": "罗</w>",
1651
+ "705": "罚</w>",
1652
+ "706": "置</w>",
1653
+ "707": "美</w>",
1654
+ "708": "群</w>",
1655
+ "709": "習</w>",
1656
+ "710": "老</w>",
1657
+ "711": "考</w>",
1658
+ "712": "者</w>",
1659
+ "713": "而</w>",
1660
+ "714": "耍</w>",
1661
+ "715": "耗</w>",
1662
+ "716": "职</w>",
1663
+ "717": "肯</w>",
1664
+ "718": "胖</w>",
1665
+ "719": "能</w>",
1666
+ "720": "脑</w>",
1667
+ "721": "脚</w>",
1668
+ "722": "脸</w>",
1669
+ "723": "腾</w>",
1670
+ "724": "腿</w>",
1671
+ "725": "自</w>",
1672
+ "726": "至</w>",
1673
+ "727": "船</w>",
1674
+ "728": "艰</w>",
1675
+ "729": "色</w>",
1676
+ "730": "艺</w>",
1677
+ "731": "花</w>",
1678
+ "732": "苏</w>",
1679
+ "733": "英</w>",
1680
+ "734": "茶</w>",
1681
+ "735": "药</w>",
1682
+ "736": "落</w>",
1683
+ "737": "著</w>",
1684
+ "738": "虑</w>",
1685
+ "739": "虾</w>",
1686
+ "740": "蜂</w>",
1687
+ "741": "蝴</w>",
1688
+ "742": "蝶</w>",
1689
+ "743": "蠢</w>",
1690
+ "744": "血</w>",
1691
+ "745": "行</w>",
1692
+ "746": "衣</w>",
1693
+ "747": "表</w>",
1694
+ "748": "被</w>",
1695
+ "749": "裡</w>",
1696
+ "750": "要</w>",
1697
+ "751": "覆</w>",
1698
+ "752": "覺</w>",
1699
+ "753": "见</w>",
1700
+ "754": "观</w>",
1701
+ "755": "规</w>",
1702
+ "756": "视</w>",
1703
+ "757": "觉</w>",
1704
+ "758": "解</w>",
1705
+ "759": "言</w>",
1706
+ "760": "計</w>",
1707
+ "761": "試</w>",
1708
+ "762": "話</w>",
1709
+ "763": "該</w>",
1710
+ "764": "誓</w>",
1711
+ "765": "說</w>",
1712
+ "766": "請</w>",
1713
+ "767": "讀</w>",
1714
+ "768": "變</w>",
1715
+ "769": "计</w>",
1716
+ "770": "订</w>",
1717
+ "771": "认</w>",
1718
+ "772": "让</w>",
1719
+ "773": "训</w>",
1720
+ "774": "议</w>",
1721
+ "775": "记</w>",
1722
+ "776": "讲</w>",
1723
+ "777": "讶</w>",
1724
+ "778": "许</w>",
1725
+ "779": "论</w>",
1726
+ "780": "设</w>",
1727
+ "781": "访</w>",
1728
+ "782": "证</w>",
1729
+ "783": "评</w>",
1730
+ "784": "识</w>",
1731
+ "785": "诉</w>",
1732
+ "786": "试</w>",
1733
+ "787": "诗</w>",
1734
+ "788": "诚</w>",
1735
+ "789": "话</w>",
1736
+ "790": "该</w>",
1737
+ "791": "语</w>",
1738
+ "792": "误</w>",
1739
+ "793": "说</w>",
1740
+ "794": "请</w>",
1741
+ "795": "诺</w>",
1742
+ "796": "读</w>",
1743
+ "797": "课</w>",
1744
+ "798": "谁</w>",
1745
+ "799": "谈</w>",
1746
+ "800": "谎</w>",
1747
+ "801": "谢</w>",
1748
+ "802": "象</w>",
1749
+ "803": "賺</w>",
1750
+ "804": "负</w>",
1751
+ "805": "货</w>",
1752
+ "806": "购</w>",
1753
+ "807": "贷</w>",
1754
+ "808": "费</w>",
1755
+ "809": "赛</w>",
1756
+ "810": "赢</w>",
1757
+ "811": "走</w>",
1758
+ "812": "赶</w>",
1759
+ "813": "起</w>",
1760
+ "814": "趕</w>",
1761
+ "815": "趣</w>",
1762
+ "816": "足</w>",
1763
+ "817": "跑</w>",
1764
+ "818": "跟</w>",
1765
+ "819": "路</w>",
1766
+ "820": "踢</w>",
1767
+ "821": "躲</w>",
1768
+ "822": "較</w>",
1769
+ "823": "车</w>",
1770
+ "824": "轨</w>",
1771
+ "825": "转</w>",
1772
+ "826": "轻</w>",
1773
+ "827": "较</w>",
1774
+ "828": "辆</w>",
1775
+ "829": "辈</w>",
1776
+ "830": "辜</w>",
1777
+ "831": "辩</w>",
1778
+ "832": "达</w>",
1779
+ "833": "迅</w>",
1780
+ "834": "过</w>",
1781
+ "835": "近</w>",
1782
+ "836": "还</w>",
1783
+ "837": "这</w>",
1784
+ "838": "进</w>",
1785
+ "839": "远</w>",
1786
+ "840": "迟</w>",
1787
+ "841": "述</w>",
1788
+ "842": "迷</w>",
1789
+ "843": "迹</w>",
1790
+ "844": "送</w>",
1791
+ "845": "适</w>",
1792
+ "846": "逃</w>",
1793
+ "847": "选</w>",
1794
+ "848": "透</w>",
1795
+ "849": "递</w>",
1796
+ "850": "途</w>",
1797
+ "851": "這</w>",
1798
+ "852": "通</w>",
1799
+ "853": "速</w>",
1800
+ "854": "造</w>",
1801
+ "855": "進</w>",
1802
+ "856": "過</w>",
1803
+ "857": "道</w>",
1804
+ "858": "遛</w>",
1805
+ "859": "遠</w>",
1806
+ "860": "邀</w>",
1807
+ "861": "那</w>",
1808
+ "862": "邻</w>",
1809
+ "863": "部</w>",
1810
+ "864": "都</w>",
1811
+ "865": "酒</w>",
1812
+ "866": "采</w>",
1813
+ "867": "里</w>",
1814
+ "868": "重</w>",
1815
+ "869": "金</w>",
1816
+ "870": "钟</w>",
1817
+ "871": "钱</w>",
1818
+ "872": "铁</w>",
1819
+ "873": "铃</w>",
1820
+ "874": "铭</w>",
1821
+ "875": "银</w>",
1822
+ "876": "销</w>",
1823
+ "877": "错</w>",
1824
+ "878": "镜</w>",
1825
+ "879": "長</w>",
1826
+ "880": "长</w>",
1827
+ "881": "間</w>",
1828
+ "882": "问</w>",
1829
+ "883": "间</w>",
1830
+ "884": "闻</w>",
1831
+ "885": "阅</w>",
1832
+ "886": "阐</w>",
1833
+ "887": "防</w>",
1834
+ "888": "阳</w>",
1835
+ "889": "附</w>",
1836
+ "890": "限</w>",
1837
+ "891": "除</w>",
1838
+ "892": "险</w>",
1839
+ "893": "随</w>",
1840
+ "894": "隻</w>",
1841
+ "895": "难</w>",
1842
+ "896": "雨</w>",
1843
+ "897": "雪</w>",
1844
+ "898": "零</w>",
1845
+ "899": "雹</w>",
1846
+ "900": "需</w>",
1847
+ "901": "震</w>",
1848
+ "902": "露</w>",
1849
+ "903": "非</w>",
1850
+ "904": "靠</w>",
1851
+ "905": "面</w>",
1852
+ "906": "音</w>",
1853
+ "907": "題</w>",
1854
+ "908": "项</w>",
1855
+ "909": "须</w>",
1856
+ "910": "顾</w>",
1857
+ "911": "预</w>",
1858
+ "912": "题</w>",
1859
+ "913": "风</w>",
1860
+ "914": "飞</w>",
1861
+ "915": "食</w>",
1862
+ "916": "餐</w>",
1863
+ "917": "饭</w>",
1864
+ "918": "饿</w>",
1865
+ "919": "首</w>",
1866
+ "920": "马</w>",
1867
+ "921": "驶</w>",
1868
+ "922": "验</w>",
1869
+ "923": "骑</w>",
1870
+ "924": "骗</w>",
1871
+ "925": "高</w>",
1872
+ "926": "鬼</w>",
1873
+ "927": "鱼</w>",
1874
+ "928": "鲍</w>",
1875
+ "929": "鲜</w>",
1876
+ "930": "麻</w>",
1877
+ "931": "麼</w>",
1878
+ "932": "點</w>",
1879
+ "933": "鼠</w>",
1880
+ "934": "龙</w>",
1881
+ "935": "﹐</w>",
1882
+ "936": "!</w>",
1883
+ "937": ",</w>",
1884
+ "938": "?</w>"
1885
+ },
1886
+ "merges": [
1887
+ [
1888
+ "。",
1889
+ "</w>"
1890
+ ],
1891
+ [
1892
+ "我",
1893
+ "</w>"
1894
+ ],
1895
+ [
1896
+ "的",
1897
+ "</w>"
1898
+ ],
1899
+ [
1900
+ "了",
1901
+ "</w>"
1902
+ ],
1903
+ [
1904
+ "他",
1905
+ "</w>"
1906
+ ],
1907
+ [
1908
+ "是",
1909
+ "</w>"
1910
+ ],
1911
+ [
1912
+ "你",
1913
+ "</w>"
1914
+ ],
1915
+ [
1916
+ "这",
1917
+ "</w>"
1918
+ ],
1919
+ [
1920
+ "一",
1921
+ "</w>"
1922
+ ],
1923
+ [
1924
+ ",",
1925
+ "</w>"
1926
+ ],
1927
+ [
1928
+ "不",
1929
+ "</w>"
1930
+ ],
1931
+ [
1932
+ "在",
1933
+ "</w>"
1934
+ ],
1935
+ [
1936
+ "们",
1937
+ "</w>"
1938
+ ],
1939
+ [
1940
+ "有",
1941
+ "</w>"
1942
+ ],
1943
+ [
1944
+ "个",
1945
+ "</w>"
1946
+ ],
1947
+ [
1948
+ "?",
1949
+ "</w>"
1950
+ ],
1951
+ [
1952
+ "她",
1953
+ "</w>"
1954
+ ],
1955
+ [
1956
+ "很",
1957
+ "</w>"
1958
+ ],
1959
+ [
1960
+ "会",
1961
+ "</w>"
1962
+ ],
1963
+ [
1964
+ "去",
1965
+ "</w>"
1966
+ ],
1967
+ [
1968
+ "人",
1969
+ "</w>"
1970
+ ],
1971
+ [
1972
+ "要",
1973
+ "</w>"
1974
+ ],
1975
+ [
1976
+ "来",
1977
+ "</w>"
1978
+ ],
1979
+ [
1980
+ "生",
1981
+ "</w>"
1982
+ ],
1983
+ [
1984
+ "得",
1985
+ "</w>"
1986
+ ],
1987
+ [
1988
+ "上",
1989
+ "</w>"
1990
+ ],
1991
+ [
1992
+ "天",
1993
+ "</w>"
1994
+ ],
1995
+ [
1996
+ "就",
1997
+ "</w>"
1998
+ ],
1999
+ [
2000
+ "子",
2001
+ "</w>"
2002
+ ],
2003
+ [
2004
+ "到",
2005
+ "</w>"
2006
+ ],
2007
+ [
2008
+ "车",
2009
+ "</w>"
2010
+ ],
2011
+ [
2012
+ "么",
2013
+ "</w>"
2014
+ ],
2015
+ [
2016
+ "吗",
2017
+ "</w>"
2018
+ ],
2019
+ [
2020
+ "没",
2021
+ "</w>"
2022
+ ],
2023
+ [
2024
+ "里",
2025
+ "</w>"
2026
+ ],
2027
+ [
2028
+ "能",
2029
+ "</w>"
2030
+ ],
2031
+ [
2032
+ "想",
2033
+ "</w>"
2034
+ ],
2035
+ [
2036
+ "大",
2037
+ "</w>"
2038
+ ],
2039
+ [
2040
+ "可",
2041
+ "</w>"
2042
+ ],
2043
+ [
2044
+ "说",
2045
+ "</w>"
2046
+ ],
2047
+ [
2048
+ "那",
2049
+ "</w>"
2050
+ ],
2051
+ [
2052
+ "什",
2053
+ "</w>"
2054
+ ],
2055
+ [
2056
+ "下",
2057
+ "</w>"
2058
+ ],
2059
+ [
2060
+ "对",
2061
+ "</w>"
2062
+ ],
2063
+ [
2064
+ "看",
2065
+ "</w>"
2066
+ ],
2067
+ [
2068
+ "多",
2069
+ "</w>"
2070
+ ],
2071
+ [
2072
+ "!",
2073
+ "</w>"
2074
+ ],
2075
+ [
2076
+ "喜",
2077
+ "</w>"
2078
+ ],
2079
+ [
2080
+ "以",
2081
+ "</w>"
2082
+ ],
2083
+ [
2084
+ "学",
2085
+ "</w>"
2086
+ ],
2087
+ [
2088
+ "过",
2089
+ "</w>"
2090
+ ],
2091
+ [
2092
+ "知",
2093
+ "</w>"
2094
+ ],
2095
+ [
2096
+ "给",
2097
+ "</w>"
2098
+ ],
2099
+ [
2100
+ "都",
2101
+ "</w>"
2102
+ ],
2103
+ [
2104
+ "日",
2105
+ "</w>"
2106
+ ],
2107
+ [
2108
+ "家",
2109
+ "</w>"
2110
+ ],
2111
+ [
2112
+ "事",
2113
+ "</w>"
2114
+ ],
2115
+ [
2116
+ "好",
2117
+ "</w>"
2118
+ ],
2119
+ [
2120
+ "为",
2121
+ "</w>"
2122
+ ],
2123
+ [
2124
+ "行",
2125
+ "</w>"
2126
+ ],
2127
+ [
2128
+ "成",
2129
+ "</w>"
2130
+ ],
2131
+ [
2132
+ "欢",
2133
+ "</w>"
2134
+ ],
2135
+ [
2136
+ "时",
2137
+ "</w>"
2138
+ ],
2139
+ [
2140
+ "也",
2141
+ "</w>"
2142
+ ],
2143
+ [
2144
+ "道",
2145
+ "</w>"
2146
+ ],
2147
+ [
2148
+ "问",
2149
+ "</w>"
2150
+ ],
2151
+ [
2152
+ "开",
2153
+ "</w>"
2154
+ ],
2155
+ [
2156
+ "和",
2157
+ "</w>"
2158
+ ],
2159
+ [
2160
+ "孩",
2161
+ "</w>"
2162
+ ],
2163
+ [
2164
+ "出",
2165
+ "</w>"
2166
+ ],
2167
+ [
2168
+ "快",
2169
+ "</w>"
2170
+ ],
2171
+ [
2172
+ "常",
2173
+ "</w>"
2174
+ ],
2175
+ [
2176
+ "现",
2177
+ "</w>"
2178
+ ],
2179
+ [
2180
+ "间",
2181
+ "</w>"
2182
+ ],
2183
+ [
2184
+ "如",
2185
+ "</w>"
2186
+ ],
2187
+ [
2188
+ "无",
2189
+ "</w>"
2190
+ ],
2191
+ [
2192
+ "法",
2193
+ "</w>"
2194
+ ],
2195
+ [
2196
+ "地",
2197
+ "</w>"
2198
+ ],
2199
+ [
2200
+ "比",
2201
+ "</w>"
2202
+ ],
2203
+ [
2204
+ "回",
2205
+ "</w>"
2206
+ ],
2207
+ [
2208
+ "果",
2209
+ "</w>"
2210
+ ],
2211
+ [
2212
+ "“",
2213
+ "</w>"
2214
+ ],
2215
+ [
2216
+ "样",
2217
+ "</w>"
2218
+ ],
2219
+ [
2220
+ "”",
2221
+ "</w>"
2222
+ ],
2223
+ [
2224
+ "試",
2225
+ "</w>"
2226
+ ],
2227
+ [
2228
+ "从",
2229
+ "</w>"
2230
+ ],
2231
+ [
2232
+ "把",
2233
+ "</w>"
2234
+ ],
2235
+ [
2236
+ "做",
2237
+ "</w>"
2238
+ ],
2239
+ [
2240
+ "老",
2241
+ "</w>"
2242
+ ],
2243
+ [
2244
+ "?",
2245
+ "</w>"
2246
+ ],
2247
+ [
2248
+ "听",
2249
+ "</w>"
2250
+ ],
2251
+ [
2252
+ "本",
2253
+ "</w>"
2254
+ ],
2255
+ [
2256
+ "爸",
2257
+ "</w>"
2258
+ ],
2259
+ [
2260
+ "妈",
2261
+ "</w>"
2262
+ ],
2263
+ [
2264
+ "还",
2265
+ "</w>"
2266
+ ],
2267
+ [
2268
+ "這",
2269
+ "</w>"
2270
+ ],
2271
+ [
2272
+ "年",
2273
+ "</w>"
2274
+ ],
2275
+ [
2276
+ "用",
2277
+ "</w>"
2278
+ ],
2279
+ [
2280
+ "话",
2281
+ "</w>"
2282
+ ],
2283
+ [
2284
+ "旅",
2285
+ "</w>"
2286
+ ],
2287
+ [
2288
+ "明",
2289
+ "</w>"
2290
+ ],
2291
+ [
2292
+ "点",
2293
+ "</w>"
2294
+ ],
2295
+ [
2296
+ "完",
2297
+ "</w>"
2298
+ ],
2299
+ [
2300
+ "月",
2301
+ "</w>"
2302
+ ],
2303
+ [
2304
+ "着",
2305
+ "</w>"
2306
+ ],
2307
+ [
2308
+ "之",
2309
+ "</w>"
2310
+ ],
2311
+ [
2312
+ "周",
2313
+ "</w>"
2314
+ ],
2315
+ [
2316
+ "怎",
2317
+ "</w>"
2318
+ ],
2319
+ [
2320
+ "意",
2321
+ "</w>"
2322
+ ],
2323
+ [
2324
+ "重",
2325
+ "</w>"
2326
+ ],
2327
+ [
2328
+ "工",
2329
+ "</w>"
2330
+ ],
2331
+ [
2332
+ "哪",
2333
+ "</w>"
2334
+ ],
2335
+ [
2336
+ "国",
2337
+ "</w>"
2338
+ ],
2339
+ [
2340
+ "正",
2341
+ "</w>"
2342
+ ],
2343
+ [
2344
+ "游",
2345
+ "</w>"
2346
+ ],
2347
+ [
2348
+ "发",
2349
+ "</w>"
2350
+ ],
2351
+ [
2352
+ "起",
2353
+ "</w>"
2354
+ ],
2355
+ [
2356
+ "作",
2357
+ "</w>"
2358
+ ],
2359
+ [
2360
+ "些",
2361
+ "</w>"
2362
+ ],
2363
+ [
2364
+ "麼",
2365
+ "</w>"
2366
+ ],
2367
+ [
2368
+ "走",
2369
+ "</w>"
2370
+ ],
2371
+ [
2372
+ "后",
2373
+ "</w>"
2374
+ ],
2375
+ [
2376
+ "认",
2377
+ "</w>"
2378
+ ],
2379
+ [
2380
+ "前",
2381
+ "</w>"
2382
+ ],
2383
+ [
2384
+ ".",
2385
+ "</w>"
2386
+ ],
2387
+ [
2388
+ "物",
2389
+ "</w>"
2390
+ ],
2391
+ [
2392
+ "0",
2393
+ "</w>"
2394
+ ],
2395
+ [
2396
+ "美",
2397
+ "</w>"
2398
+ ],
2399
+ [
2400
+ "元",
2401
+ "</w>"
2402
+ ],
2403
+ [
2404
+ "它",
2405
+ "</w>"
2406
+ ],
2407
+ [
2408
+ "房",
2409
+ "</w>"
2410
+ ],
2411
+ [
2412
+ "员",
2413
+ "</w>"
2414
+ ],
2415
+ [
2416
+ "太",
2417
+ "</w>"
2418
+ ],
2419
+ [
2420
+ "几",
2421
+ "</w>"
2422
+ ],
2423
+ [
2424
+ "期",
2425
+ "</w>"
2426
+ ],
2427
+ [
2428
+ "球",
2429
+ "</w>"
2430
+ ],
2431
+ [
2432
+ "乐",
2433
+ "</w>"
2434
+ ],
2435
+ [
2436
+ "部",
2437
+ "</w>"
2438
+ ],
2439
+ [
2440
+ "书",
2441
+ "</w>"
2442
+ ],
2443
+ [
2444
+ "候",
2445
+ "</w>"
2446
+ ],
2447
+ [
2448
+ "但",
2449
+ "</w>"
2450
+ ],
2451
+ [
2452
+ "小",
2453
+ "</w>"
2454
+ ],
2455
+ [
2456
+ "自",
2457
+ "</w>"
2458
+ ],
2459
+ [
2460
+ "情",
2461
+ "</w>"
2462
+ ],
2463
+ [
2464
+ "讲",
2465
+ "</w>"
2466
+ ],
2467
+ [
2468
+ "经",
2469
+ "</w>"
2470
+ ],
2471
+ [
2472
+ "电",
2473
+ "</w>"
2474
+ ],
2475
+ [
2476
+ "高",
2477
+ "</w>"
2478
+ ],
2479
+ [
2480
+ "觉",
2481
+ "</w>"
2482
+ ],
2483
+ [
2484
+ "感",
2485
+ "</w>"
2486
+ ],
2487
+ [
2488
+ "直",
2489
+ "</w>"
2490
+ ],
2491
+ [
2492
+ "请",
2493
+ "</w>"
2494
+ ],
2495
+ [
2496
+ "告",
2497
+ "</w>"
2498
+ ],
2499
+ [
2500
+ "妹",
2501
+ "</w>"
2502
+ ],
2503
+ [
2504
+ "住",
2505
+ "</w>"
2506
+ ],
2507
+ [
2508
+ "让",
2509
+ "</w>"
2510
+ ],
2511
+ [
2512
+ "活",
2513
+ "</w>"
2514
+ ],
2515
+ [
2516
+ "真",
2517
+ "</w>"
2518
+ ],
2519
+ [
2520
+ "個",
2521
+ "</w>"
2522
+ ],
2523
+ [
2524
+ "始",
2525
+ "</w>"
2526
+ ],
2527
+ [
2528
+ "信",
2529
+ "</w>"
2530
+ ],
2531
+ [
2532
+ "更",
2533
+ "</w>"
2534
+ ],
2535
+ [
2536
+ "号",
2537
+ "</w>"
2538
+ ],
2539
+ [
2540
+ "們",
2541
+ "</w>"
2542
+ ],
2543
+ [
2544
+ "件",
2545
+ "</w>"
2546
+ ],
2547
+ [
2548
+ "外",
2549
+ "</w>"
2550
+ ],
2551
+ [
2552
+ "见",
2553
+ "</w>"
2554
+ ],
2555
+ [
2556
+ "于",
2557
+ "</w>"
2558
+ ],
2559
+ [
2560
+ "喝",
2561
+ "</w>"
2562
+ ],
2563
+ [
2564
+ "爱",
2565
+ "</w>"
2566
+ ],
2567
+ [
2568
+ "班",
2569
+ "</w>"
2570
+ ],
2571
+ [
2572
+ "少",
2573
+ "</w>"
2574
+ ],
2575
+ [
2576
+ "单",
2577
+ "</w>"
2578
+ ],
2579
+ [
2580
+ "世",
2581
+ "</w>"
2582
+ ],
2583
+ [
2584
+ "校",
2585
+ "</w>"
2586
+ ],
2587
+ [
2588
+ "最",
2589
+ "</w>"
2590
+ ],
2591
+ [
2592
+ "定",
2593
+ "</w>"
2594
+ ],
2595
+ [
2596
+ "力",
2597
+ "</w>"
2598
+ ],
2599
+ [
2600
+ "何",
2601
+ "</w>"
2602
+ ],
2603
+ [
2604
+ "吧",
2605
+ "</w>"
2606
+ ],
2607
+ [
2608
+ "该",
2609
+ "</w>"
2610
+ ],
2611
+ [
2612
+ "接",
2613
+ "</w>"
2614
+ ],
2615
+ [
2616
+ "将",
2617
+ "</w>"
2618
+ ],
2619
+ [
2620
+ "难",
2621
+ "</w>"
2622
+ ],
2623
+ [
2624
+ "识",
2625
+ "</w>"
2626
+ ],
2627
+ [
2628
+ "密",
2629
+ "</w>"
2630
+ ],
2631
+ [
2632
+ "打",
2633
+ "</w>"
2634
+ ],
2635
+ [
2636
+ "非",
2637
+ "</w>"
2638
+ ],
2639
+ [
2640
+ "中",
2641
+ "</w>"
2642
+ ],
2643
+ [
2644
+ "诉",
2645
+ "</w>"
2646
+ ],
2647
+ [
2648
+ "许",
2649
+ "</w>"
2650
+ ],
2651
+ [
2652
+ "i",
2653
+ "r"
2654
+ ],
2655
+ [
2656
+ "u",
2657
+ "ir"
2658
+ ],
2659
+ [
2660
+ "e",
2661
+ "l"
2662
+ ],
2663
+ [
2664
+ "m",
2665
+ "uir"
2666
+ ],
2667
+ [
2668
+ "i",
2669
+ "el"
2670
+ ],
2671
+ [
2672
+ "muir",
2673
+ "iel"
2674
+ ],
2675
+ [
2676
+ "muiriel",
2677
+ "</w>"
2678
+ ],
2679
+ [
2680
+ "再",
2681
+ "</w>"
2682
+ ],
2683
+ [
2684
+ "相",
2685
+ "</w>"
2686
+ ],
2687
+ [
2688
+ "其",
2689
+ "</w>"
2690
+ ],
2691
+ [
2692
+ "心",
2693
+ "</w>"
2694
+ ],
2695
+ [
2696
+ "长",
2697
+ "</w>"
2698
+ ],
2699
+ [
2700
+ "取",
2701
+ "</w>"
2702
+ ],
2703
+ [
2704
+ "语",
2705
+ "</w>"
2706
+ ],
2707
+ [
2708
+ "网",
2709
+ "</w>"
2710
+ ],
2711
+ [
2712
+ "消",
2713
+ "</w>"
2714
+ ],
2715
+ [
2716
+ "息",
2717
+ "</w>"
2718
+ ],
2719
+ [
2720
+ "惊",
2721
+ "</w>"
2722
+ ],
2723
+ [
2724
+ "等",
2725
+ "</w>"
2726
+ ],
2727
+ [
2728
+ "公",
2729
+ "</w>"
2730
+ ],
2731
+ [
2732
+ "简",
2733
+ "</w>"
2734
+ ],
2735
+ [
2736
+ "被",
2737
+ "</w>"
2738
+ ],
2739
+ [
2740
+ "种",
2741
+ "</w>"
2742
+ ],
2743
+ [
2744
+ "趣",
2745
+ "</w>"
2746
+ ],
2747
+ [
2748
+ "已",
2749
+ "</w>"
2750
+ ],
2751
+ [
2752
+ "影",
2753
+ "</w>"
2754
+ ],
2755
+ [
2756
+ "疑",
2757
+ "</w>"
2758
+ ],
2759
+ [
2760
+ "史",
2761
+ "</w>"
2762
+ ],
2763
+ [
2764
+ "题",
2765
+ "</w>"
2766
+ ],
2767
+ [
2768
+ "啊",
2769
+ "</w>"
2770
+ ],
2771
+ [
2772
+ "同",
2773
+ "</w>"
2774
+ ],
2775
+ [
2776
+ "睡",
2777
+ "</w>"
2778
+ ],
2779
+ [
2780
+ "离",
2781
+ "</w>"
2782
+ ],
2783
+ [
2784
+ "三",
2785
+ "</w>"
2786
+ ],
2787
+ [
2788
+ "方",
2789
+ "</w>"
2790
+ ],
2791
+ [
2792
+ "响",
2793
+ "</w>"
2794
+ ],
2795
+ [
2796
+ "兴",
2797
+ "</w>"
2798
+ ],
2799
+ [
2800
+ "医",
2801
+ "</w>"
2802
+ ],
2803
+ [
2804
+ "建",
2805
+ "</w>"
2806
+ ],
2807
+ [
2808
+ "议",
2809
+ "</w>"
2810
+ ],
2811
+ [
2812
+ "戒",
2813
+ "</w>"
2814
+ ],
2815
+ [
2816
+ "坐",
2817
+ "</w>"
2818
+ ],
2819
+ [
2820
+ "向",
2821
+ "</w>"
2822
+ ],
2823
+ [
2824
+ "切",
2825
+ "</w>"
2826
+ ],
2827
+ [
2828
+ "读",
2829
+ "</w>"
2830
+ ],
2831
+ [
2832
+ "火",
2833
+ "</w>"
2834
+ ],
2835
+ [
2836
+ "斯",
2837
+ "</w>"
2838
+ ],
2839
+ [
2840
+ "计",
2841
+ "</w>"
2842
+ ],
2843
+ [
2844
+ "往",
2845
+ "</w>"
2846
+ ],
2847
+ [
2848
+ "問",
2849
+ "</w>"
2850
+ ],
2851
+ [
2852
+ "除",
2853
+ "</w>"
2854
+ ],
2855
+ [
2856
+ "罗",
2857
+ "</w>"
2858
+ ],
2859
+ [
2860
+ "马",
2861
+ "</w>"
2862
+ ],
2863
+ [
2864
+ "任",
2865
+ "</w>"
2866
+ ],
2867
+ [
2868
+ "必",
2869
+ "</w>"
2870
+ ],
2871
+ [
2872
+ "须",
2873
+ "</w>"
2874
+ ],
2875
+ [
2876
+ "新",
2877
+ "</w>"
2878
+ ],
2879
+ [
2880
+ "客",
2881
+ "</w>"
2882
+ ],
2883
+ [
2884
+ "今",
2885
+ "</w>"
2886
+ ],
2887
+ [
2888
+ "而",
2889
+ "</w>"
2890
+ ],
2891
+ [
2892
+ "水",
2893
+ "</w>"
2894
+ ],
2895
+ [
2896
+ "名",
2897
+ "</w>"
2898
+ ],
2899
+ [
2900
+ "变",
2901
+ "</w>"
2902
+ ],
2903
+ [
2904
+ "界",
2905
+ "</w>"
2906
+ ],
2907
+ [
2908
+ "加",
2909
+ "</w>"
2910
+ ],
2911
+ [
2912
+ "使",
2913
+ "</w>"
2914
+ ],
2915
+ [
2916
+ "毫",
2917
+ "</w>"
2918
+ ],
2919
+ [
2920
+ "习",
2921
+ "</w>"
2922
+ ],
2923
+ [
2924
+ "玩",
2925
+ "</w>"
2926
+ ],
2927
+ [
2928
+ "耍",
2929
+ "</w>"
2930
+ ],
2931
+ [
2932
+ "记",
2933
+ "</w>"
2934
+ ],
2935
+ [
2936
+ "分",
2937
+ "</w>"
2938
+ ],
2939
+ [
2940
+ "待",
2941
+ "</w>"
2942
+ ],
2943
+ [
2944
+ "男",
2945
+ "</w>"
2946
+ ],
2947
+ [
2948
+ "俱",
2949
+ "</w>"
2950
+ ],
2951
+ [
2952
+ "图",
2953
+ "</w>"
2954
+ ],
2955
+ [
2956
+ "笑",
2957
+ "</w>"
2958
+ ],
2959
+ [
2960
+ "述",
2961
+ "</w>"
2962
+ ],
2963
+ [
2964
+ "理",
2965
+ "</w>"
2966
+ ],
2967
+ [
2968
+ "由",
2969
+ "</w>"
2970
+ ],
2971
+ [
2972
+ "山",
2973
+ "</w>"
2974
+ ],
2975
+ [
2976
+ "式",
2977
+ "</w>"
2978
+ ],
2979
+ [
2980
+ "己",
2981
+ "</w>"
2982
+ ],
2983
+ [
2984
+ "學",
2985
+ "</w>"
2986
+ ],
2987
+ [
2988
+ "目",
2989
+ "</w>"
2990
+ ],
2991
+ [
2992
+ "面",
2993
+ "</w>"
2994
+ ],
2995
+ [
2996
+ "骑",
2997
+ "</w>"
2998
+ ],
2999
+ [
3000
+ "实",
3001
+ "</w>"
3002
+ ],
3003
+ [
3004
+ "時",
3005
+ "</w>"
3006
+ ],
3007
+ [
3008
+ "服",
3009
+ "</w>"
3010
+ ],
3011
+ [
3012
+ "合",
3013
+ "</w>"
3014
+ ],
3015
+ [
3016
+ "手",
3017
+ "</w>"
3018
+ ],
3019
+ [
3020
+ "第",
3021
+ "</w>"
3022
+ ],
3023
+ [
3024
+ "母",
3025
+ "</w>"
3026
+ ],
3027
+ [
3028
+ "留",
3029
+ "</w>"
3030
+ ],
3031
+ [
3032
+ "买",
3033
+ "</w>"
3034
+ ],
3035
+ [
3036
+ "准",
3037
+ "</w>"
3038
+ ],
3039
+ [
3040
+ "权",
3041
+ "</w>"
3042
+ ],
3043
+ [
3044
+ "烟",
3045
+ "</w>"
3046
+ ],
3047
+ [
3048
+ "忙",
3049
+ "</w>"
3050
+ ],
3051
+ [
3052
+ "找",
3053
+ "</w>"
3054
+ ],
3055
+ [
3056
+ "應",
3057
+ "</w>"
3058
+ ],
3059
+ [
3060
+ "該",
3061
+ "</w>"
3062
+ ],
3063
+ [
3064
+ "乎",
3065
+ "</w>"
3066
+ ],
3067
+ [
3068
+ "放",
3069
+ "</w>"
3070
+ ],
3071
+ [
3072
+ "站",
3073
+ "</w>"
3074
+ ],
3075
+ [
3076
+ "早",
3077
+ "</w>"
3078
+ ],
3079
+ [
3080
+ "度",
3081
+ "</w>"
3082
+ ],
3083
+ [
3084
+ "交",
3085
+ "</w>"
3086
+ ],
3087
+ [
3088
+ "樣",
3089
+ "</w>"
3090
+ ],
3091
+ [
3092
+ "十",
3093
+ "</w>"
3094
+ ],
3095
+ [
3096
+ "足",
3097
+ "</w>"
3098
+ ],
3099
+ [
3100
+ "解",
3101
+ "</w>"
3102
+ ],
3103
+ [
3104
+ "底",
3105
+ "</w>"
3106
+ ],
3107
+ [
3108
+ "題",
3109
+ "</w>"
3110
+ ],
3111
+ [
3112
+ "死",
3113
+ "</w>"
3114
+ ],
3115
+ [
3116
+ "宇",
3117
+ "</w>"
3118
+ ],
3119
+ [
3120
+ "限",
3121
+ "</w>"
3122
+ ],
3123
+ [
3124
+ "通",
3125
+ "</w>"
3126
+ ],
3127
+ [
3128
+ "庭",
3129
+ "</w>"
3130
+ ],
3131
+ [
3132
+ "秘",
3133
+ "</w>"
3134
+ ],
3135
+ [
3136
+ "光",
3137
+ "</w>"
3138
+ ],
3139
+ [
3140
+ "错",
3141
+ "</w>"
3142
+ ],
3143
+ [
3144
+ "务",
3145
+ "</w>"
3146
+ ],
3147
+ [
3148
+ "當",
3149
+ "</w>"
3150
+ ],
3151
+ [
3152
+ "广",
3153
+ "</w>"
3154
+ ],
3155
+ [
3156
+ "场",
3157
+ "</w>"
3158
+ ],
3159
+ [
3160
+ "险",
3161
+ "</w>"
3162
+ ],
3163
+ [
3164
+ "昨",
3165
+ "</w>"
3166
+ ],
3167
+ [
3168
+ "e",
3169
+ "</w>"
3170
+ ],
3171
+ [
3172
+ "望",
3173
+ "</w>"
3174
+ ],
3175
+ [
3176
+ "轻",
3177
+ "</w>"
3178
+ ],
3179
+ [
3180
+ "所",
3181
+ "</w>"
3182
+ ],
3183
+ [
3184
+ "需",
3185
+ "</w>"
3186
+ ],
3187
+ [
3188
+ "帮",
3189
+ "</w>"
3190
+ ],
3191
+ [
3192
+ "偷",
3193
+ "</w>"
3194
+ ],
3195
+ [
3196
+ "岁",
3197
+ "</w>"
3198
+ ],
3199
+ [
3200
+ "酒",
3201
+ "</w>"
3202
+ ],
3203
+ [
3204
+ "园",
3205
+ "</w>"
3206
+ ],
3207
+ [
3208
+ "雨",
3209
+ "</w>"
3210
+ ],
3211
+ [
3212
+ "然",
3213
+ "</w>"
3214
+ ],
3215
+ [
3216
+ "每",
3217
+ "</w>"
3218
+ ],
3219
+ [
3220
+ "像",
3221
+ "</w>"
3222
+ ],
3223
+ [
3224
+ "功",
3225
+ "</w>"
3226
+ ],
3227
+ [
3228
+ "6",
3229
+ "</w>"
3230
+ ],
3231
+ [
3232
+ "写",
3233
+ "</w>"
3234
+ ],
3235
+ [
3236
+ "照",
3237
+ "</w>"
3238
+ ],
3239
+ [
3240
+ "猫",
3241
+ "</w>"
3242
+ ],
3243
+ [
3244
+ "划",
3245
+ "</w>"
3246
+ ],
3247
+ [
3248
+ "赛",
3249
+ "</w>"
3250
+ ],
3251
+ [
3252
+ "增",
3253
+ "</w>"
3254
+ ],
3255
+ [
3256
+ "则",
3257
+ "</w>"
3258
+ ],
3259
+ [
3260
+ "全",
3261
+ "</w>"
3262
+ ],
3263
+ [
3264
+ "洗",
3265
+ "</w>"
3266
+ ],
3267
+ [
3268
+ "1",
3269
+ "0</w>"
3270
+ ],
3271
+ [
3272
+ "义",
3273
+ "</w>"
3274
+ ],
3275
+ [
3276
+ "儿",
3277
+ "</w>"
3278
+ ],
3279
+ [
3280
+ "籍",
3281
+ "</w>"
3282
+ ],
3283
+ [
3284
+ "哦",
3285
+ "</w>"
3286
+ ],
3287
+ [
3288
+ "尊",
3289
+ "</w>"
3290
+ ],
3291
+ [
3292
+ "敬",
3293
+ "</w>"
3294
+ ],
3295
+ [
3296
+ "辈",
3297
+ "</w>"
3298
+ ],
3299
+ [
3300
+ "另",
3301
+ "</w>"
3302
+ ],
3303
+ [
3304
+ "程",
3305
+ "</w>"
3306
+ ],
3307
+ [
3308
+ "英",
3309
+ "</w>"
3310
+ ],
3311
+ [
3312
+ "师",
3313
+ "</w>"
3314
+ ],
3315
+ [
3316
+ "例",
3317
+ "</w>"
3318
+ ],
3319
+ [
3320
+ "腾",
3321
+ "</w>"
3322
+ ],
3323
+ [
3324
+ "钟",
3325
+ "</w>"
3326
+ ],
3327
+ [
3328
+ "吃",
3329
+ "</w>"
3330
+ ],
3331
+ [
3332
+ "脸",
3333
+ "</w>"
3334
+ ],
3335
+ [
3336
+ "据",
3337
+ "</w>"
3338
+ ],
3339
+ [
3340
+ "座",
3341
+ "</w>"
3342
+ ],
3343
+ [
3344
+ "雪",
3345
+ "</w>"
3346
+ ],
3347
+ [
3348
+ "款",
3349
+ "</w>"
3350
+ ],
3351
+ [
3352
+ "帽",
3353
+ "</w>"
3354
+ ],
3355
+ [
3356
+ "当",
3357
+ "</w>"
3358
+ ],
3359
+ [
3360
+ "办",
3361
+ "</w>"
3362
+ ],
3363
+ [
3364
+ "後",
3365
+ "</w>"
3366
+ ],
3367
+ [
3368
+ "厌",
3369
+ "</w>"
3370
+ ],
3371
+ [
3372
+ "倦",
3373
+ "</w>"
3374
+ ],
3375
+ [
3376
+ "观",
3377
+ "</w>"
3378
+ ],
3379
+ [
3380
+ "众",
3381
+ "</w>"
3382
+ ],
3383
+ [
3384
+ "制",
3385
+ "</w>"
3386
+ ],
3387
+ [
3388
+ "造",
3389
+ "</w>"
3390
+ ],
3391
+ [
3392
+ "借",
3393
+ "</w>"
3394
+ ],
3395
+ [
3396
+ "口",
3397
+ "</w>"
3398
+ ],
3399
+ [
3400
+ "石",
3401
+ "</w>"
3402
+ ],
3403
+ [
3404
+ "故",
3405
+ "</w>"
3406
+ ],
3407
+ [
3408
+ "艺",
3409
+ "</w>"
3410
+ ],
3411
+ [
3412
+ "术",
3413
+ "</w>"
3414
+ ],
3415
+ [
3416
+ "采",
3417
+ "</w>"
3418
+ ],
3419
+ [
3420
+ "预",
3421
+ "</w>"
3422
+ ],
3423
+ [
3424
+ "沒",
3425
+ "</w>"
3426
+ ],
3427
+ [
3428
+ "历",
3429
+ "</w>"
3430
+ ],
3431
+ [
3432
+ "肯",
3433
+ "</w>"
3434
+ ],
3435
+ [
3436
+ "毛",
3437
+ "</w>"
3438
+ ],
3439
+ [
3440
+ "条",
3441
+ "</w>"
3442
+ ],
3443
+ [
3444
+ "路",
3445
+ "</w>"
3446
+ ],
3447
+ [
3448
+ "父",
3449
+ "</w>"
3450
+ ],
3451
+ [
3452
+ "两",
3453
+ "</w>"
3454
+ ],
3455
+ [
3456
+ "受",
3457
+ "</w>"
3458
+ ],
3459
+ [
3460
+ "船",
3461
+ "</w>"
3462
+ ],
3463
+ [
3464
+ "朝",
3465
+ "</w>"
3466
+ ],
3467
+ [
3468
+ "确",
3469
+ "</w>"
3470
+ ],
3471
+ [
3472
+ "保",
3473
+ "</w>"
3474
+ ],
3475
+ [
3476
+ "覺",
3477
+ "</w>"
3478
+ ],
3479
+ [
3480
+ "先",
3481
+ "</w>"
3482
+ ],
3483
+ [
3484
+ "示",
3485
+ "</w>"
3486
+ ],
3487
+ [
3488
+ "温",
3489
+ "</w>"
3490
+ ],
3491
+ [
3492
+ "零",
3493
+ "</w>"
3494
+ ],
3495
+ [
3496
+ "报",
3497
+ "</w>"
3498
+ ],
3499
+ [
3500
+ "失",
3501
+ "</w>"
3502
+ ],
3503
+ [
3504
+ "视",
3505
+ "</w>"
3506
+ ],
3507
+ [
3508
+ "线",
3509
+ "</w>"
3510
+ ],
3511
+ [
3512
+ "士",
3513
+ "</w>"
3514
+ ],
3515
+ [
3516
+ "只",
3517
+ "</w>"
3518
+ ],
3519
+ [
3520
+ "宙",
3521
+ "</w>"
3522
+ ],
3523
+ [
3524
+ "晚",
3525
+ "</w>"
3526
+ ],
3527
+ [
3528
+ "声",
3529
+ "</w>"
3530
+ ],
3531
+ [
3532
+ "星",
3533
+ "</w>"
3534
+ ],
3535
+ [
3536
+ "歐",
3537
+ "</w>"
3538
+ ],
3539
+ [
3540
+ "歡",
3541
+ "</w>"
3542
+ ],
3543
+ [
3544
+ "神",
3545
+ "</w>"
3546
+ ],
3547
+ [
3548
+ "點",
3549
+ "</w>"
3550
+ ],
3551
+ [
3552
+ "热",
3553
+ "</w>"
3554
+ ],
3555
+ [
3556
+ "收",
3557
+ "</w>"
3558
+ ],
3559
+ [
3560
+ "短",
3561
+ "</w>"
3562
+ ],
3563
+ [
3564
+ "食",
3565
+ "</w>"
3566
+ ],
3567
+ [
3568
+ "欲",
3569
+ "</w>"
3570
+ ],
3571
+ [
3572
+ "钱",
3573
+ "</w>"
3574
+ ],
3575
+ [
3576
+ "圣",
3577
+ "</w>"
3578
+ ],
3579
+ [
3580
+ "夏",
3581
+ "</w>"
3582
+ ],
3583
+ [
3584
+ "总",
3585
+ "</w>"
3586
+ ],
3587
+ [
3588
+ "满",
3589
+ "</w>"
3590
+ ],
3591
+ [
3592
+ "室",
3593
+ "</w>"
3594
+ ],
3595
+ [
3596
+ "河",
3597
+ "</w>"
3598
+ ],
3599
+ [
3600
+ "危",
3601
+ "</w>"
3602
+ ],
3603
+ [
3604
+ "破",
3605
+ "</w>"
3606
+ ],
3607
+ [
3608
+ "惜",
3609
+ "</w>"
3610
+ ],
3611
+ [
3612
+ "蠢",
3613
+ "</w>"
3614
+ ],
3615
+ [
3616
+ "來",
3617
+ "</w>"
3618
+ ],
3619
+ [
3620
+ "過",
3621
+ "</w>"
3622
+ ],
3623
+ [
3624
+ "拥",
3625
+ "</w>"
3626
+ ],
3627
+ [
3628
+ "位",
3629
+ "</w>"
3630
+ ],
3631
+ [
3632
+ "冰",
3633
+ "</w>"
3634
+ ],
3635
+ [
3636
+ "乘",
3637
+ "</w>"
3638
+ ],
3639
+ [
3640
+ "备",
3641
+ "</w>"
3642
+ ],
3643
+ [
3644
+ "杯",
3645
+ "</w>"
3646
+ ],
3647
+ [
3648
+ "床",
3649
+ "</w>"
3650
+ ],
3651
+ [
3652
+ "說",
3653
+ "</w>"
3654
+ ],
3655
+ [
3656
+ "才",
3657
+ "</w>"
3658
+ ],
3659
+ [
3660
+ "支",
3661
+ "</w>"
3662
+ ],
3663
+ [
3664
+ "布",
3665
+ "</w>"
3666
+ ],
3667
+ [
3668
+ "订",
3669
+ "</w>"
3670
+ ],
3671
+ [
3672
+ "慢",
3673
+ "</w>"
3674
+ ],
3675
+ [
3676
+ "半",
3677
+ "</w>"
3678
+ ],
3679
+ [
3680
+ "會",
3681
+ "</w>"
3682
+ ],
3683
+ [
3684
+ "决",
3685
+ "</w>"
3686
+ ],
3687
+ [
3688
+ "某",
3689
+ "</w>"
3690
+ ],
3691
+ [
3692
+ "业",
3693
+ "</w>"
3694
+ ],
3695
+ [
3696
+ "城",
3697
+ "</w>"
3698
+ ],
3699
+ [
3700
+ "市",
3701
+ "</w>"
3702
+ ],
3703
+ [
3704
+ "应",
3705
+ "</w>"
3706
+ ],
3707
+ [
3708
+ "付",
3709
+ "</w>"
3710
+ ],
3711
+ [
3712
+ "2",
3713
+ "0</w>"
3714
+ ],
3715
+ [
3716
+ "隻",
3717
+ "</w>"
3718
+ ],
3719
+ [
3720
+ "严",
3721
+ "</w>"
3722
+ ],
3723
+ [
3724
+ "庙",
3725
+ "</w>"
3726
+ ],
3727
+ [
3728
+ "考",
3729
+ "</w>"
3730
+ ],
3731
+ [
3732
+ "虑",
3733
+ "</w>"
3734
+ ],
3735
+ [
3736
+ "停",
3737
+ "</w>"
3738
+ ],
3739
+ [
3740
+ "码",
3741
+ "</w>"
3742
+ ],
3743
+ [
3744
+ "眼",
3745
+ "</w>"
3746
+ ],
3747
+ [
3748
+ "色",
3749
+ "</w>"
3750
+ ],
3751
+ [
3752
+ "弟",
3753
+ "</w>"
3754
+ ],
3755
+ [
3756
+ "夜",
3757
+ "</w>"
3758
+ ],
3759
+ [
3760
+ "話",
3761
+ "</w>"
3762
+ ],
3763
+ [
3764
+ "缺",
3765
+ "</w>"
3766
+ ],
3767
+ [
3768
+ "验",
3769
+ "</w>"
3770
+ ],
3771
+ [
3772
+ "费",
3773
+ "</w>"
3774
+ ],
3775
+ [
3776
+ "票",
3777
+ "</w>"
3778
+ ],
3779
+ [
3780
+ "格",
3781
+ "</w>"
3782
+ ],
3783
+ [
3784
+ "批",
3785
+ "</w>"
3786
+ ],
3787
+ [
3788
+ "评",
3789
+ "</w>"
3790
+ ],
3791
+ [
3792
+ "达",
3793
+ "</w>"
3794
+ ],
3795
+ [
3796
+ "干",
3797
+ "</w>"
3798
+ ],
3799
+ [
3800
+ "…",
3801
+ "</w>"
3802
+ ],
3803
+ [
3804
+ "架",
3805
+ "</w>"
3806
+ ],
3807
+ [
3808
+ "次",
3809
+ "</w>"
3810
+ ],
3811
+ [
3812
+ "跑",
3813
+ "</w>"
3814
+ ],
3815
+ [
3816
+ "金",
3817
+ "</w>"
3818
+ ],
3819
+ [
3820
+ "屈",
3821
+ "</w>"
3822
+ ],
3823
+ [
3824
+ "止",
3825
+ "</w>"
3826
+ ],
3827
+ [
3828
+ "松",
3829
+ "</w>"
3830
+ ],
3831
+ [
3832
+ "牛",
3833
+ "</w>"
3834
+ ],
3835
+ [
3836
+ "j",
3837
+ "a"
3838
+ ],
3839
+ [
3840
+ "教",
3841
+ "</w>"
3842
+ ],
3843
+ [
3844
+ "言",
3845
+ "</w>"
3846
+ ],
3847
+ [
3848
+ "终",
3849
+ "</w>"
3850
+ ],
3851
+ [
3852
+ "讶",
3853
+ "</w>"
3854
+ ],
3855
+ [
3856
+ "、",
3857
+ "</w>"
3858
+ ],
3859
+ [
3860
+ "奇",
3861
+ "</w>"
3862
+ ],
3863
+ [
3864
+ "白",
3865
+ "</w>"
3866
+ ],
3867
+ [
3868
+ "谢",
3869
+ "</w>"
3870
+ ],
3871
+ [
3872
+ "况",
3873
+ "</w>"
3874
+ ],
3875
+ [
3876
+ "念",
3877
+ "</w>"
3878
+ ],
3879
+ [
3880
+ "裡",
3881
+ "</w>"
3882
+ ],
3883
+ [
3884
+ "\"",
3885
+ "</w>"
3886
+ ],
3887
+ [
3888
+ "参",
3889
+ "</w>"
3890
+ ],
3891
+ [
3892
+ "动",
3893
+ "</w>"
3894
+ ],
3895
+ [
3896
+ "茶",
3897
+ "</w>"
3898
+ ],
3899
+ [
3900
+ "午",
3901
+ "</w>"
3902
+ ],
3903
+ [
3904
+ "疯",
3905
+ "</w>"
3906
+ ],
3907
+ [
3908
+ "囚",
3909
+ "</w>"
3910
+ ],
3911
+ [
3912
+ "笼",
3913
+ "</w>"
3914
+ ],
3915
+ [
3916
+ "叔",
3917
+ "</w>"
3918
+ ],
3919
+ [
3920
+ "幸",
3921
+ "</w>"
3922
+ ],
3923
+ [
3924
+ "!",
3925
+ "</w>"
3926
+ ],
3927
+ [
3928
+ "狗",
3929
+ "</w>"
3930
+ ],
3931
+ [
3932
+ "字",
3933
+ "</w>"
3934
+ ],
3935
+ [
3936
+ "迟",
3937
+ "</w>"
3938
+ ],
3939
+ [
3940
+ "改",
3941
+ "</w>"
3942
+ ],
3943
+ [
3944
+ "宝",
3945
+ "</w>"
3946
+ ],
3947
+ [
3948
+ "随",
3949
+ "</w>"
3950
+ ],
3951
+ [
3952
+ "推",
3953
+ "</w>"
3954
+ ],
3955
+ [
3956
+ "移",
3957
+ "</w>"
3958
+ ],
3959
+ [
3960
+ "规",
3961
+ "</w>"
3962
+ ],
3963
+ [
3964
+ "安",
3965
+ "</w>"
3966
+ ],
3967
+ [
3968
+ "脚",
3969
+ "</w>"
3970
+ ],
3971
+ [
3972
+ "欠",
3973
+ "</w>"
3974
+ ],
3975
+ [
3976
+ "嗨",
3977
+ "</w>"
3978
+ ],
3979
+ [
3980
+ "至",
3981
+ "</w>"
3982
+ ],
3983
+ [
3984
+ "关",
3985
+ "</w>"
3986
+ ],
3987
+ [
3988
+ "偏",
3989
+ "</w>"
3990
+ ],
3991
+ [
3992
+ "胖",
3993
+ "</w>"
3994
+ ],
3995
+ [
3996
+ "铭",
3997
+ "</w>"
3998
+ ],
3999
+ [
4000
+ "咖",
4001
+ "</w>"
4002
+ ],
4003
+ [
4004
+ "啡",
4005
+ "</w>"
4006
+ ],
4007
+ [
4008
+ "揉",
4009
+ "</w>"
4010
+ ],
4011
+ [
4012
+ "碎",
4013
+ "</w>"
4014
+ ],
4015
+ [
4016
+ "代",
4017
+ "</w>"
4018
+ ],
4019
+ [
4020
+ "雹",
4021
+ "</w>"
4022
+ ],
4023
+ [
4024
+ "按",
4025
+ "</w>"
4026
+ ],
4027
+ [
4028
+ "处",
4029
+ "</w>"
4030
+ ],
4031
+ [
4032
+ "罚",
4033
+ "</w>"
4034
+ ],
4035
+ [
4036
+ "送",
4037
+ "</w>"
4038
+ ],
4039
+ [
4040
+ "货",
4041
+ "</w>"
4042
+ ],
4043
+ [
4044
+ "精",
4045
+ "</w>"
4046
+ ],
4047
+ [
4048
+ "插",
4049
+ "</w>"
4050
+ ],
4051
+ [
4052
+ "微",
4053
+ "</w>"
4054
+ ],
4055
+ [
4056
+ "试",
4057
+ "</w>"
4058
+ ],
4059
+ [
4060
+ "5",
4061
+ "</w>"
4062
+ ],
4063
+ [
4064
+ "丑",
4065
+ "</w>"
4066
+ ],
4067
+ [
4068
+ "鬼",
4069
+ "</w>"
4070
+ ],
4071
+ [
4072
+ "拉",
4073
+ "</w>"
4074
+ ],
4075
+ [
4076
+ "腿",
4077
+ "</w>"
4078
+ ],
4079
+ [
4080
+ "阐",
4081
+ "</w>"
4082
+ ],
4083
+ [
4084
+ "撒",
4085
+ "</w>"
4086
+ ],
4087
+ [
4088
+ "谎",
4089
+ "</w>"
4090
+ ],
4091
+ [
4092
+ "覆",
4093
+ "</w>"
4094
+ ],
4095
+ [
4096
+ "盖",
4097
+ "</w>"
4098
+ ],
4099
+ [
4100
+ "流",
4101
+ "</w>"
4102
+ ],
4103
+ [
4104
+ "靠",
4105
+ "</w>"
4106
+ ],
4107
+ [
4108
+ "習",
4109
+ "</w>"
4110
+ ],
4111
+ [
4112
+ "坚",
4113
+ "</w>"
4114
+ ],
4115
+ [
4116
+ "标",
4117
+ "</w>"
4118
+ ],
4119
+ [
4120
+ "数",
4121
+ "</w>"
4122
+ ],
4123
+ [
4124
+ "庞",
4125
+ "</w>"
4126
+ ],
4127
+ [
4128
+ "块",
4129
+ "</w>"
4130
+ ],
4131
+ [
4132
+ "岩",
4133
+ "</w>"
4134
+ ],
4135
+ [
4136
+ "落",
4137
+ "</w>"
4138
+ ],
4139
+ [
4140
+ "徒",
4141
+ "</w>"
4142
+ ],
4143
+ [
4144
+ "劳",
4145
+ "</w>"
4146
+ ],
4147
+ [
4148
+ "努",
4149
+ "</w>"
4150
+ ],
4151
+ [
4152
+ "伟",
4153
+ "</w>"
4154
+ ],
4155
+ [
4156
+ "强",
4157
+ "</w>"
4158
+ ],
4159
+ [
4160
+ "防",
4161
+ "</w>"
4162
+ ],
4163
+ [
4164
+ "措",
4165
+ "</w>"
4166
+ ],
4167
+ [
4168
+ "施",
4169
+ "</w>"
4170
+ ],
4171
+ [
4172
+ "摩",
4173
+ "</w>"
4174
+ ],
4175
+ [
4176
+ "托",
4177
+ "</w>"
4178
+ ],
4179
+ [
4180
+ "遛",
4181
+ "</w>"
4182
+ ],
4183
+ [
4184
+ "圈",
4185
+ "</w>"
4186
+ ],
4187
+ [
4188
+ "证",
4189
+ "</w>"
4190
+ ],
4191
+ [
4192
+ "怀",
4193
+ "</w>"
4194
+ ],
4195
+ [
4196
+ "間",
4197
+ "</w>"
4198
+ ],
4199
+ [
4200
+ "克",
4201
+ "</w>"
4202
+ ],
4203
+ [
4204
+ "升",
4205
+ "</w>"
4206
+ ],
4207
+ [
4208
+ "庆",
4209
+ "</w>"
4210
+ ],
4211
+ [
4212
+ "祝",
4213
+ "</w>"
4214
+ ],
4215
+ [
4216
+ "衣",
4217
+ "</w>"
4218
+ ],
4219
+ [
4220
+ "拜",
4221
+ "</w>"
4222
+ ],
4223
+ [
4224
+ "访",
4225
+ "</w>"
4226
+ ],
4227
+ [
4228
+ "因",
4229
+ "</w>"
4230
+ ],
4231
+ [
4232
+ "冒",
4233
+ "</w>"
4234
+ ],
4235
+ [
4236
+ "沿",
4237
+ "</w>"
4238
+ ],
4239
+ [
4240
+ "红",
4241
+ "</w>"
4242
+ ],
4243
+ [
4244
+ "绿",
4245
+ "</w>"
4246
+ ],
4247
+ [
4248
+ "灯",
4249
+ "</w>"
4250
+ ],
4251
+ [
4252
+ "右",
4253
+ "</w>"
4254
+ ],
4255
+ [
4256
+ "转",
4257
+ "</w>"
4258
+ ],
4259
+ [
4260
+ "跟",
4261
+ "</w>"
4262
+ ],
4263
+ [
4264
+ "千",
4265
+ "</w>"
4266
+ ],
4267
+ [
4268
+ "杀",
4269
+ "</w>"
4270
+ ],
4271
+ [
4272
+ "予",
4273
+ "</w>"
4274
+ ],
4275
+ [
4276
+ "寻",
4277
+ "</w>"
4278
+ ],
4279
+ [
4280
+ "逃",
4281
+ "</w>"
4282
+ ],
4283
+ [
4284
+ "途",
4285
+ "</w>"
4286
+ ],
4287
+ [
4288
+ "径",
4289
+ "</w>"
4290
+ ],
4291
+ [
4292
+ "伦",
4293
+ "</w>"
4294
+ ],
4295
+ [
4296
+ "敦",
4297
+ "</w>"
4298
+ ],
4299
+ [
4300
+ "似",
4301
+ "</w>"
4302
+ ],
4303
+ [
4304
+ "派",
4305
+ "</w>"
4306
+ ],
4307
+ [
4308
+ "头",
4309
+ "</w>"
4310
+ ],
4311
+ [
4312
+ "痛",
4313
+ "</w>"
4314
+ ],
4315
+ [
4316
+ "盐",
4317
+ "</w>"
4318
+ ],
4319
+ [
4320
+ "递",
4321
+ "</w>"
4322
+ ],
4323
+ [
4324
+ "指",
4325
+ "</w>"
4326
+ ],
4327
+ [
4328
+ "九",
4329
+ "</w>"
4330
+ ],
4331
+ [
4332
+ "低",
4333
+ "</w>"
4334
+ ],
4335
+ [
4336
+ "挥",
4337
+ "</w>"
4338
+ ],
4339
+ [
4340
+ "段",
4341
+ "</w>"
4342
+ ],
4343
+ [
4344
+ "y",
4345
+ "</w>"
4346
+ ],
4347
+ [
4348
+ "c",
4349
+ "y</w>"
4350
+ ],
4351
+ [
4352
+ "n",
4353
+ "cy</w>"
4354
+ ],
4355
+ [
4356
+ "a",
4357
+ "ncy</w>"
4358
+ ],
4359
+ [
4360
+ "n",
4361
+ "ancy</w>"
4362
+ ],
4363
+ [
4364
+ "私",
4365
+ "</w>"
4366
+ ],
4367
+ [
4368
+ "谈",
4369
+ "</w>"
4370
+ ],
4371
+ [
4372
+ "又",
4373
+ "</w>"
4374
+ ],
4375
+ [
4376
+ "绅",
4377
+ "</w>"
4378
+ ],
4379
+ [
4380
+ "味",
4381
+ "</w>"
4382
+ ],
4383
+ [
4384
+ "哥",
4385
+ "</w>"
4386
+ ],
4387
+ [
4388
+ "华",
4389
+ "</w>"
4390
+ ],
4391
+ [
4392
+ "m",
4393
+ "</w>"
4394
+ ],
4395
+ [
4396
+ "o",
4397
+ "m</w>"
4398
+ ],
4399
+ [
4400
+ "t",
4401
+ "om</w>"
4402
+ ],
4403
+ [
4404
+ "躲",
4405
+ "</w>"
4406
+ ],
4407
+ [
4408
+ "桌",
4409
+ "</w>"
4410
+ ],
4411
+ [
4412
+ "表",
4413
+ "</w>"
4414
+ ],
4415
+ [
4416
+ "澡",
4417
+ "</w>"
4418
+ ],
4419
+ [
4420
+ "筑",
4421
+ "</w>"
4422
+ ],
4423
+ [
4424
+ "震",
4425
+ "</w>"
4426
+ ],
4427
+ [
4428
+ "摇",
4429
+ "</w>"
4430
+ ],
4431
+ [
4432
+ "晃",
4433
+ "</w>"
4434
+ ],
4435
+ [
4436
+ "戴",
4437
+ "</w>"
4438
+ ],
4439
+ [
4440
+ "麻",
4441
+ "</w>"
4442
+ ],
4443
+ [
4444
+ "烦",
4445
+ "</w>"
4446
+ ],
4447
+ [
4448
+ "邻",
4449
+ "</w>"
4450
+ ],
4451
+ [
4452
+ "村",
4453
+ "</w>"
4454
+ ],
4455
+ [
4456
+ "象",
4457
+ "</w>"
4458
+ ],
4459
+ [
4460
+ "賺",
4461
+ "</w>"
4462
+ ],
4463
+ [
4464
+ "百",
4465
+ "</w>"
4466
+ ],
4467
+ [
4468
+ "較",
4469
+ "</w>"
4470
+ ],
4471
+ [
4472
+ "仅",
4473
+ "</w>"
4474
+ ],
4475
+ [
4476
+ "席",
4477
+ "</w>"
4478
+ ],
4479
+ [
4480
+ "血",
4481
+ "</w>"
4482
+ ],
4483
+ [
4484
+ "沸",
4485
+ "</w>"
4486
+ ],
4487
+ [
4488
+ "帖",
4489
+ "</w>"
4490
+ ],
4491
+ [
4492
+ "2",
4493
+ "</w>"
4494
+ ],
4495
+ [
4496
+ "休",
4497
+ "</w>"
4498
+ ],
4499
+ [
4500
+ "假",
4501
+ "</w>"
4502
+ ],
4503
+ [
4504
+ "阳",
4505
+ "</w>"
4506
+ ],
4507
+ [
4508
+ "选",
4509
+ "</w>"
4510
+ ],
4511
+ [
4512
+ "择",
4513
+ "</w>"
4514
+ ],
4515
+ [
4516
+ "或",
4517
+ "</w>"
4518
+ ],
4519
+ [
4520
+ "项",
4521
+ "</w>"
4522
+ ],
4523
+ [
4524
+ "艰",
4525
+ "</w>"
4526
+ ],
4527
+ [
4528
+ "却",
4529
+ "</w>"
4530
+ ],
4531
+ [
4532
+ "鲜",
4533
+ "</w>"
4534
+ ],
4535
+ [
4536
+ "龙",
4537
+ "</w>"
4538
+ ],
4539
+ [
4540
+ "虾",
4541
+ "</w>"
4542
+ ],
4543
+ [
4544
+ "著",
4545
+ "</w>"
4546
+ ],
4547
+ [
4548
+ "進",
4549
+ "</w>"
4550
+ ],
4551
+ [
4552
+ "計",
4553
+ "</w>"
4554
+ ],
4555
+ [
4556
+ "劃",
4557
+ "</w>"
4558
+ ],
4559
+ [
4560
+ "總",
4561
+ "</w>"
4562
+ ],
4563
+ [
4564
+ "發",
4565
+ "</w>"
4566
+ ],
4567
+ [
4568
+ "够",
4569
+ "</w>"
4570
+ ],
4571
+ [
4572
+ "威",
4573
+ "</w>"
4574
+ ],
4575
+ [
4576
+ "尼",
4577
+ "</w>"
4578
+ ],
4579
+ [
4580
+ "季",
4581
+ "</w>"
4582
+ ],
4583
+ [
4584
+ "挤",
4585
+ "</w>"
4586
+ ],
4587
+ [
4588
+ "诗",
4589
+ "</w>"
4590
+ ],
4591
+ [
4592
+ "兼",
4593
+ "</w>"
4594
+ ],
4595
+ [
4596
+ "者",
4597
+ "</w>"
4598
+ ],
4599
+ [
4600
+ "泳",
4601
+ "</w>"
4602
+ ],
4603
+ [
4604
+ "持",
4605
+ "</w>"
4606
+ ],
4607
+ [
4608
+ "传",
4609
+ "</w>"
4610
+ ],
4611
+ [
4612
+ "统",
4613
+ "</w>"
4614
+ ],
4615
+ [
4616
+ "设",
4617
+ "</w>"
4618
+ ],
4619
+ [
4620
+ "僵",
4621
+ "</w>"
4622
+ ],
4623
+ [
4624
+ "局",
4625
+ "</w>"
4626
+ ],
4627
+ [
4628
+ "從",
4629
+ "</w>"
4630
+ ],
4631
+ [
4632
+ "c",
4633
+ "e</w>"
4634
+ ],
4635
+ [
4636
+ "l",
4637
+ "i"
4638
+ ],
4639
+ [
4640
+ "a",
4641
+ "li"
4642
+ ],
4643
+ [
4644
+ "ali",
4645
+ "ce</w>"
4646
+ ],
4647
+ [
4648
+ "演",
4649
+ "</w>"
4650
+ ],
4651
+ [
4652
+ "唱",
4653
+ "</w>"
4654
+ ],
4655
+ [
4656
+ "骗",
4657
+ "</w>"
4658
+ ],
4659
+ [
4660
+ "争",
4661
+ "</w>"
4662
+ ],
4663
+ [
4664
+ "辩",
4665
+ "</w>"
4666
+ ],
4667
+ [
4668
+ "适",
4669
+ "</w>"
4670
+ ],
4671
+ [
4672
+ "职",
4673
+ "</w>"
4674
+ ],
4675
+ [
4676
+ "溜",
4677
+ "</w>"
4678
+ ],
4679
+ [
4680
+ "7",
4681
+ "</w>"
4682
+ ],
4683
+ [
4684
+ "铁",
4685
+ "</w>"
4686
+ ],
4687
+ [
4688
+ "摄",
4689
+ "</w>"
4690
+ ],
4691
+ [
4692
+ "糟",
4693
+ "</w>"
4694
+ ],
4695
+ [
4696
+ "糕",
4697
+ "</w>"
4698
+ ],
4699
+ [
4700
+ "透",
4701
+ "</w>"
4702
+ ],
4703
+ [
4704
+ "t",
4705
+ "e</w>"
4706
+ ],
4707
+ [
4708
+ "k",
4709
+ "a"
4710
+ ],
4711
+ [
4712
+ "ka",
4713
+ "te</w>"
4714
+ ],
4715
+ [
4716
+ ",",
4717
+ "</w>"
4718
+ ],
4719
+ [
4720
+ "急",
4721
+ "</w>"
4722
+ ],
4723
+ [
4724
+ "救",
4725
+ "</w>"
4726
+ ],
4727
+ [
4728
+ "池",
4729
+ "</w>"
4730
+ ],
4731
+ [
4732
+ "鱼",
4733
+ "</w>"
4734
+ ],
4735
+ [
4736
+ "挑",
4737
+ "</w>"
4738
+ ],
4739
+ [
4740
+ "病",
4741
+ "</w>"
4742
+ ],
4743
+ [
4744
+ "笔",
4745
+ "</w>"
4746
+ ],
4747
+ [
4748
+ "曾",
4749
+ "</w>"
4750
+ ],
4751
+ [
4752
+ "經",
4753
+ "</w>"
4754
+ ],
4755
+ [
4756
+ "空",
4757
+ "</w>"
4758
+ ],
4759
+ [
4760
+ "整",
4761
+ "</w>"
4762
+ ],
4763
+ [
4764
+ "愉",
4765
+ "</w>"
4766
+ ],
4767
+ [
4768
+ "杰",
4769
+ "</w>"
4770
+ ],
4771
+ [
4772
+ "姐",
4773
+ "</w>"
4774
+ ],
4775
+ [
4776
+ "��",
4777
+ "</w>"
4778
+ ],
4779
+ [
4780
+ "婚",
4781
+ "</w>"
4782
+ ],
4783
+ [
4784
+ "汽",
4785
+ "</w>"
4786
+ ],
4787
+ [
4788
+ "笛",
4789
+ "</w>"
4790
+ ],
4791
+ [
4792
+ "驶",
4793
+ "</w>"
4794
+ ],
4795
+ [
4796
+ "港",
4797
+ "</w>"
4798
+ ],
4799
+ [
4800
+ "包",
4801
+ "</w>"
4802
+ ],
4803
+ [
4804
+ "眠",
4805
+ "</w>"
4806
+ ],
4807
+ [
4808
+ "命",
4809
+ "</w>"
4810
+ ],
4811
+ [
4812
+ "困",
4813
+ "</w>"
4814
+ ],
4815
+ [
4816
+ "蝴",
4817
+ "</w>"
4818
+ ],
4819
+ [
4820
+ "蝶",
4821
+ "</w>"
4822
+ ],
4823
+ [
4824
+ "滑",
4825
+ "</w>"
4826
+ ],
4827
+ [
4828
+ "诚",
4829
+ "</w>"
4830
+ ],
4831
+ [
4832
+ "德",
4833
+ "</w>"
4834
+ ],
4835
+ [
4836
+ "仪",
4837
+ "</w>"
4838
+ ],
4839
+ [
4840
+ "庄",
4841
+ "</w>"
4842
+ ],
4843
+ [
4844
+ "举",
4845
+ "</w>"
4846
+ ],
4847
+ [
4848
+ "内",
4849
+ "</w>"
4850
+ ],
4851
+ [
4852
+ "反",
4853
+ "</w>"
4854
+ ],
4855
+ [
4856
+ "论",
4857
+ "</w>"
4858
+ ],
4859
+ [
4860
+ "擔",
4861
+ "</w>"
4862
+ ],
4863
+ [
4864
+ "揭",
4865
+ "</w>"
4866
+ ],
4867
+ [
4868
+ "露",
4869
+ "</w>"
4870
+ ],
4871
+ [
4872
+ "平",
4873
+ "</w>"
4874
+ ],
4875
+ [
4876
+ "涌",
4877
+ "</w>"
4878
+ ],
4879
+ [
4880
+ "泪",
4881
+ "</w>"
4882
+ ],
4883
+ [
4884
+ "景",
4885
+ "</w>"
4886
+ ],
4887
+ [
4888
+ "誓",
4889
+ "</w>"
4890
+ ],
4891
+ [
4892
+ "赢",
4893
+ "</w>"
4894
+ ],
4895
+ [
4896
+ "彻",
4897
+ "</w>"
4898
+ ],
4899
+ [
4900
+ "进",
4901
+ "</w>"
4902
+ ],
4903
+ [
4904
+ "铃",
4905
+ "</w>"
4906
+ ],
4907
+ [
4908
+ "亲",
4909
+ "</w>"
4910
+ ],
4911
+ [
4912
+ "独",
4913
+ "</w>"
4914
+ ],
4915
+ [
4916
+ "赶",
4917
+ "</w>"
4918
+ ],
4919
+ [
4920
+ "份",
4921
+ "</w>"
4922
+ ],
4923
+ [
4924
+ "瘋",
4925
+ "</w>"
4926
+ ],
4927
+ [
4928
+ "永",
4929
+ "</w>"
4930
+ ],
4931
+ [
4932
+ "遠",
4933
+ "</w>"
4934
+ ],
4935
+ [
4936
+ "踢",
4937
+ "</w>"
4938
+ ],
4939
+ [
4940
+ "長",
4941
+ "</w>"
4942
+ ],
4943
+ [
4944
+ "國",
4945
+ "</w>"
4946
+ ],
4947
+ [
4948
+ "王",
4949
+ "</w>"
4950
+ ],
4951
+ [
4952
+ "1",
4953
+ "</w>"
4954
+ ],
4955
+ [
4956
+ "2",
4957
+ "1</w>"
4958
+ ],
4959
+ [
4960
+ "惡",
4961
+ "</w>"
4962
+ ],
4963
+ [
4964
+ "兔",
4965
+ "</w>"
4966
+ ],
4967
+ [
4968
+ "免",
4969
+ "</w>"
4970
+ ],
4971
+ [
4972
+ "辜",
4973
+ "</w>"
4974
+ ],
4975
+ [
4976
+ "负",
4977
+ "</w>"
4978
+ ],
4979
+ [
4980
+ "饿",
4981
+ "</w>"
4982
+ ],
4983
+ [
4984
+ "請",
4985
+ "</w>"
4986
+ ],
4987
+ [
4988
+ "寄",
4989
+ "</w>"
4990
+ ],
4991
+ [
4992
+ "給",
4993
+ "</w>"
4994
+ ],
4995
+ [
4996
+ "張",
4997
+ "</w>"
4998
+ ],
4999
+ [
5000
+ "远",
5001
+ "</w>"
5002
+ ],
5003
+ [
5004
+ "银",
5005
+ "</w>"
5006
+ ],
5007
+ [
5008
+ "风",
5009
+ "</w>"
5010
+ ],
5011
+ [
5012
+ "户",
5013
+ "</w>"
5014
+ ],
5015
+ [
5016
+ "较",
5017
+ "</w>"
5018
+ ],
5019
+ [
5020
+ "贷",
5021
+ "</w>"
5022
+ ],
5023
+ [
5024
+ "利",
5025
+ "</w>"
5026
+ ],
5027
+ [
5028
+ "课",
5029
+ "</w>"
5030
+ ],
5031
+ [
5032
+ "济",
5033
+ "</w>"
5034
+ ],
5035
+ [
5036
+ "蜂",
5037
+ "</w>"
5038
+ ],
5039
+ [
5040
+ "即",
5041
+ "</w>"
5042
+ ],
5043
+ [
5044
+ "餐",
5045
+ "</w>"
5046
+ ],
5047
+ [
5048
+ "体",
5049
+ "</w>"
5050
+ ],
5051
+ [
5052
+ "销",
5053
+ "</w>"
5054
+ ],
5055
+ [
5056
+ "售",
5057
+ "</w>"
5058
+ ],
5059
+ [
5060
+ "宵",
5061
+ "</w>"
5062
+ ],
5063
+ [
5064
+ "旦",
5065
+ "</w>"
5066
+ ],
5067
+ [
5068
+ "花",
5069
+ "</w>"
5070
+ ],
5071
+ [
5072
+ "k",
5073
+ "e"
5074
+ ],
5075
+ [
5076
+ "n",
5077
+ "</w>"
5078
+ ],
5079
+ [
5080
+ "ke",
5081
+ "n</w>"
5082
+ ],
5083
+ [
5084
+ "七",
5085
+ "</w>"
5086
+ ],
5087
+ [
5088
+ "拆",
5089
+ "</w>"
5090
+ ],
5091
+ [
5092
+ "桥",
5093
+ "</w>"
5094
+ ],
5095
+ [
5096
+ "朋",
5097
+ "</w>"
5098
+ ],
5099
+ [
5100
+ "友",
5101
+ "</w>"
5102
+ ],
5103
+ [
5104
+ "讀",
5105
+ "</w>"
5106
+ ],
5107
+ [
5108
+ "﹐",
5109
+ "</w>"
5110
+ ],
5111
+ [
5112
+ "六",
5113
+ "</w>"
5114
+ ],
5115
+ [
5116
+ "弃",
5117
+ "</w>"
5118
+ ],
5119
+ [
5120
+ "盹",
5121
+ "</w>"
5122
+ ],
5123
+ [
5124
+ "飞",
5125
+ "</w>"
5126
+ ],
5127
+ [
5128
+ "机",
5129
+ "</w>"
5130
+ ],
5131
+ [
5132
+ "携",
5133
+ "</w>"
5134
+ ],
5135
+ [
5136
+ "带",
5137
+ "</w>"
5138
+ ],
5139
+ [
5140
+ "4",
5141
+ "0</w>"
5142
+ ],
5143
+ [
5144
+ "护",
5145
+ "</w>"
5146
+ ],
5147
+ [
5148
+ "扰",
5149
+ "</w>"
5150
+ ],
5151
+ [
5152
+ "唯",
5153
+ "</w>"
5154
+ ],
5155
+ [
5156
+ "卫",
5157
+ "</w>"
5158
+ ],
5159
+ [
5160
+ "3",
5161
+ "</w>"
5162
+ ],
5163
+ [
5164
+ "纯",
5165
+ "</w>"
5166
+ ],
5167
+ [
5168
+ "属",
5169
+ "</w>"
5170
+ ],
5171
+ [
5172
+ "偶",
5173
+ "</w>"
5174
+ ],
5175
+ [
5176
+ "津",
5177
+ "</w>"
5178
+ ],
5179
+ [
5180
+ "音",
5181
+ "</w>"
5182
+ ],
5183
+ [
5184
+ "值",
5185
+ "</w>"
5186
+ ],
5187
+ [
5188
+ "睛",
5189
+ "</w>"
5190
+ ],
5191
+ [
5192
+ "k",
5193
+ "e</w>"
5194
+ ],
5195
+ [
5196
+ "ja",
5197
+ "ke</w>"
5198
+ ],
5199
+ [
5200
+ "扇",
5201
+ "</w>"
5202
+ ],
5203
+ [
5204
+ "窗",
5205
+ "</w>"
5206
+ ],
5207
+ [
5208
+ "叫",
5209
+ "</w>"
5210
+ ],
5211
+ [
5212
+ "ja",
5213
+ "c"
5214
+ ],
5215
+ [
5216
+ "k",
5217
+ "</w>"
5218
+ ],
5219
+ [
5220
+ "jac",
5221
+ "k</w>"
5222
+ ],
5223
+ [
5224
+ "幹",
5225
+ "</w>"
5226
+ ],
5227
+ [
5228
+ "鲍",
5229
+ "</w>"
5230
+ ],
5231
+ [
5232
+ "勃",
5233
+ "</w>"
5234
+ ],
5235
+ [
5236
+ "丰",
5237
+ "</w>"
5238
+ ],
5239
+ [
5240
+ "富",
5241
+ "</w>"
5242
+ ],
5243
+ [
5244
+ "答",
5245
+ "</w>"
5246
+ ],
5247
+ [
5248
+ "复",
5249
+ "</w>"
5250
+ ],
5251
+ [
5252
+ "悔",
5253
+ "</w>"
5254
+ ],
5255
+ [
5256
+ "概",
5257
+ "</w>"
5258
+ ],
5259
+ [
5260
+ "澄",
5261
+ "</w>"
5262
+ ],
5263
+ [
5264
+ "清",
5265
+ "</w>"
5266
+ ],
5267
+ [
5268
+ "价",
5269
+ "</w>"
5270
+ ],
5271
+ [
5272
+ "涨",
5273
+ "</w>"
5274
+ ],
5275
+ [
5276
+ "守",
5277
+ "</w>"
5278
+ ],
5279
+ [
5280
+ "诺",
5281
+ "</w>"
5282
+ ],
5283
+ [
5284
+ "顾",
5285
+ "</w>"
5286
+ ],
5287
+ [
5288
+ "迷",
5289
+ "</w>"
5290
+ ],
5291
+ [
5292
+ "社",
5293
+ "</w>"
5294
+ ],
5295
+ [
5296
+ "团",
5297
+ "</w>"
5298
+ ],
5299
+ [
5300
+ "抓",
5301
+ "</w>"
5302
+ ],
5303
+ [
5304
+ "鼠",
5305
+ "</w>"
5306
+ ],
5307
+ [
5308
+ "纪",
5309
+ "</w>"
5310
+ ],
5311
+ [
5312
+ "品",
5313
+ "</w>"
5314
+ ],
5315
+ [
5316
+ "阅",
5317
+ "</w>"
5318
+ ],
5319
+ [
5320
+ "饭",
5321
+ "</w>"
5322
+ ],
5323
+ [
5324
+ "购",
5325
+ "</w>"
5326
+ ],
5327
+ [
5328
+ "镜",
5329
+ "</w>"
5330
+ ],
5331
+ [
5332
+ "迅",
5333
+ "</w>"
5334
+ ],
5335
+ [
5336
+ "速",
5337
+ "</w>"
5338
+ ],
5339
+ [
5340
+ "窜",
5341
+ "</w>"
5342
+ ],
5343
+ [
5344
+ "入",
5345
+ "</w>"
5346
+ ],
5347
+ [
5348
+ "群",
5349
+ "</w>"
5350
+ ],
5351
+ [
5352
+ "耗",
5353
+ "</w>"
5354
+ ],
5355
+ [
5356
+ "气",
5357
+ "</w>"
5358
+ ],
5359
+ [
5360
+ "化",
5361
+ "</w>"
5362
+ ],
5363
+ [
5364
+ "附",
5365
+ "</w>"
5366
+ ],
5367
+ [
5368
+ "近",
5369
+ "</w>"
5370
+ ],
5371
+ [
5372
+ "张",
5373
+ "</w>"
5374
+ ],
5375
+ [
5376
+ "片",
5377
+ "</w>"
5378
+ ],
5379
+ [
5380
+ "童",
5381
+ "</w>"
5382
+ ],
5383
+ [
5384
+ "福",
5385
+ "</w>"
5386
+ ],
5387
+ [
5388
+ "药",
5389
+ "</w>"
5390
+ ],
5391
+ [
5392
+ "创",
5393
+ "</w>"
5394
+ ],
5395
+ [
5396
+ "迹",
5397
+ "</w>"
5398
+ ],
5399
+ [
5400
+ "厕",
5401
+ "</w>"
5402
+ ],
5403
+ [
5404
+ "冲",
5405
+ "</w>"
5406
+ ],
5407
+ [
5408
+ "轨",
5409
+ "</w>"
5410
+ ],
5411
+ [
5412
+ "1",
5413
+ "8"
5414
+ ],
5415
+ [
5416
+ "18",
5417
+ "</w>"
5418
+ ],
5419
+ [
5420
+ "环",
5421
+ "</w>"
5422
+ ],
5423
+ [
5424
+ "素",
5425
+ "</w>"
5426
+ ],
5427
+ [
5428
+ "5",
5429
+ "6</w>"
5430
+ ],
5431
+ [
5432
+ "粗",
5433
+ "</w>"
5434
+ ],
5435
+ [
5436
+ "趕",
5437
+ "</w>"
5438
+ ],
5439
+ [
5440
+ "久",
5441
+ "</w>"
5442
+ ],
5443
+ [
5444
+ "妻",
5445
+ "</w>"
5446
+ ],
5447
+ [
5448
+ "互",
5449
+ "</w>"
5450
+ ],
5451
+ [
5452
+ "助",
5453
+ "</w>"
5454
+ ],
5455
+ [
5456
+ "训",
5457
+ "</w>"
5458
+ ],
5459
+ [
5460
+ "脑",
5461
+ "</w>"
5462
+ ],
5463
+ [
5464
+ "戏",
5465
+ "</w>"
5466
+ ],
5467
+ [
5468
+ "散",
5469
+ "</w>"
5470
+ ],
5471
+ [
5472
+ "步",
5473
+ "</w>"
5474
+ ],
5475
+ [
5476
+ "油",
5477
+ "</w>"
5478
+ ],
5479
+ [
5480
+ "置",
5481
+ "</w>"
5482
+ ],
5483
+ [
5484
+ "债",
5485
+ "</w>"
5486
+ ],
5487
+ [
5488
+ "冷",
5489
+ "</w>"
5490
+ ],
5491
+ [
5492
+ "湖",
5493
+ "</w>"
5494
+ ],
5495
+ [
5496
+ "结",
5497
+ "</w>"
5498
+ ],
5499
+ [
5500
+ "首",
5501
+ "</w>"
5502
+ ],
5503
+ [
5504
+ "歌",
5505
+ "</w>"
5506
+ ],
5507
+ [
5508
+ "1",
5509
+ "0"
5510
+ ],
5511
+ [
5512
+ "10",
5513
+ "0</w>"
5514
+ ],
5515
+ [
5516
+ "万",
5517
+ "</w>"
5518
+ ],
5519
+ [
5520
+ "辆",
5521
+ "</w>"
5522
+ ],
5523
+ [
5524
+ "呢",
5525
+ "</w>"
5526
+ ],
5527
+ [
5528
+ "變",
5529
+ "</w>"
5530
+ ],
5531
+ [
5532
+ "卖",
5533
+ "</w>"
5534
+ ],
5535
+ [
5536
+ "栋",
5537
+ "</w>"
5538
+ ],
5539
+ [
5540
+ "灰",
5541
+ "</w>"
5542
+ ],
5543
+ [
5544
+ "楼",
5545
+ "</w>"
5546
+ ],
5547
+ [
5548
+ "毕",
5549
+ "</w>"
5550
+ ],
5551
+ [
5552
+ "索",
5553
+ "</w>"
5554
+ ],
5555
+ [
5556
+ "抱",
5557
+ "</w>"
5558
+ ],
5559
+ [
5560
+ "歉",
5561
+ "</w>"
5562
+ ],
5563
+ [
5564
+ "盛",
5565
+ "</w>"
5566
+ ],
5567
+ [
5568
+ "邀",
5569
+ "</w>"
5570
+ ],
5571
+ [
5572
+ "延",
5573
+ "</w>"
5574
+ ],
5575
+ [
5576
+ "误",
5577
+ "</w>"
5578
+ ],
5579
+ [
5580
+ "苏",
5581
+ "</w>"
5582
+ ],
5583
+ [
5584
+ "兰",
5585
+ "</w>"
5586
+ ],
5587
+ [
5588
+ "古",
5589
+ "</w>"
5590
+ ],
5591
+ [
5592
+ "堡",
5593
+ "</w>"
5594
+ ],
5595
+ [
5596
+ "谁",
5597
+ "</w>"
5598
+ ],
5599
+ [
5600
+ "纸",
5601
+ "</w>"
5602
+ ],
5603
+ [
5604
+ "杂",
5605
+ "</w>"
5606
+ ],
5607
+ [
5608
+ "志",
5609
+ "</w>"
5610
+ ],
5611
+ [
5612
+ "闻",
5613
+ "</w>"
5614
+ ],
5615
+ [
5616
+ "播",
5617
+ "</w>"
5618
+ ],
5619
+ [
5620
+ "奶",
5621
+ "</w>"
5622
+ ]
5623
+ ],
5624
+ "special_tokens": [
5625
+ "<pad>",
5626
+ "<sos>",
5627
+ "<eos>",
5628
+ "<unk>",
5629
+ "<mask>"
5630
+ ]
5631
+ }
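The pair list that ends above reads like a byte-pair-encoding merge table: each entry joins two adjacent symbols, and `</w>` marks the end of a word. The sketch below shows how such a table is commonly applied, in order, to a single word. It is an illustration under assumptions, not the repository's `Tokenizer`: the function name `apply_merges` is hypothetical, and the five merges are copied from the `nancy` entries in the diff.

```python
from typing import List, Tuple


def apply_merges(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Split `word` into characters, append the end-of-word marker,
    then apply each merge rule once, in the order the rules are listed."""
    symbols = list(word) + ["</w>"]
    for left, right in merges:
        merged = []
        i = 0
        while i < len(symbols):
            # Join the pair wherever it occurs as two adjacent symbols
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Merge rules copied from the tokenizer_zh.json entries shown in the diff
merges = [
    ("y", "</w>"),
    ("c", "y</w>"),
    ("n", "cy</w>"),
    ("a", "ncy</w>"),
    ("n", "ancy</w>"),
]

print(apply_merges("nancy", merges))
print(apply_merges("fancy", merges))
```

Under these rules `nancy` collapses into one token, while `fancy` keeps a separate leading `f` because no rule joins `f` with `ancy</w>`.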
inference.py ADDED
@@ -0,0 +1,347 @@
+ """
+ Inference script.
+ Visualizes the diffusion translation process.
+ """
+
+ import os
+ import argparse
+ import torch
+ import torch.nn.functional as F
+ from typing import Optional, Tuple, List
+
+ from config import Config
+ from tokenizer import Tokenizer
+ from embedding import DualLanguageEmbedding, DualOutputProjection
+ from model import create_model
+ from diffusion import get_diffusion
+ from switcher import create_switcher
+
+
+ class Translator:
+     """Translator."""
+
+     def __init__(self, config: Config, checkpoint_path: Optional[str] = None):
+         self.config = config
+         self.device = torch.device("cpu")
+
+         # Load the tokenizers
+         cache_dir = os.path.join(config.project_dir, config.data.cache_dir)
+         self.zh_tokenizer = Tokenizer.load(os.path.join(cache_dir, "tokenizer_zh.json"))
+         self.en_tokenizer = Tokenizer.load(os.path.join(cache_dir, "tokenizer_en.json"))
+
+         # Initialize the model components
+         self.embedding = DualLanguageEmbedding(
+             vocab_size_zh=self.zh_tokenizer.vocab_size_actual,
+             vocab_size_en=self.en_tokenizer.vocab_size_actual,
+             d_model=config.model.d_model,
+             max_len=config.model.max_len,
+             dropout=0.0,  # no dropout at inference time
+         )
+
+         self.output_proj = DualOutputProjection(
+             d_model=config.model.d_model,
+             vocab_size_zh=self.zh_tokenizer.vocab_size_actual,
+             vocab_size_en=self.en_tokenizer.vocab_size_actual,
+         )
+
+         self.model = create_model(config)
+         self.switcher = create_switcher(config)
+
+         self.diffusion, self.ddim_sampler = get_diffusion(config)
+
+         # Load the weights
+         if checkpoint_path:
+             self._load_checkpoint(checkpoint_path)
+
+     def _load_checkpoint(self, path: str):
+         """Load a checkpoint."""
+         state = torch.load(path, map_location=self.device, weights_only=False)
+
+         self.embedding.load_state_dict(state['embedding'])
+         self.output_proj.load_state_dict(state['output_proj'])
+         self.model.load_state_dict(state['model'])
+         self.switcher.load_state_dict(state['switcher'])
+
+         print(f"Loaded checkpoint: {path}")
+
+     def _encode(self, text: str, lang: str) -> torch.Tensor:
+         """Encode text into a [1, seq_len] tensor of token ids."""
+         if lang == "zh":
+             ids = self.zh_tokenizer.encode(text, add_sos=True, add_eos=True)
+             return torch.tensor(ids, dtype=torch.long).unsqueeze(0)
+         else:
+             ids = self.en_tokenizer.encode(text, add_sos=True, add_eos=True)
+             return torch.tensor(ids, dtype=torch.long).unsqueeze(0)
+
+     def _decode(self, ids: torch.Tensor, lang: str) -> str:
+         """Decode token ids back into text."""
+         ids = ids[0].tolist()
+         if lang == "zh":
+             return self.zh_tokenizer.decode(ids, skip_special=True)
+         else:
+             return self.en_tokenizer.decode(ids, skip_special=True)
+
+     def _embed_to_tokens(self, x: torch.Tensor, lang: str) -> torch.Tensor:
+         """Decode from embedding space to token ids."""
+         logits = self.output_proj(x, lang)
+         ids = logits.argmax(dim=-1)
+         return ids
+
+     @torch.no_grad()
+     def translate(
+         self,
+         text: str,
+         source_lang: str,
+         verbose: bool = True,
+         ddim: bool = True,
+     ) -> str:
+         """Translate text.
+
+         Args:
+             text: input text
+             source_lang: source language, "zh" or "en"
+             verbose: whether to print the diffusion process
+             ddim: whether to use DDIM acceleration
+
+         Returns:
+             The translation.
+         """
+         self.model.eval()
+         self.embedding.eval()
+         self.output_proj.eval()
+         self.switcher.eval()
+
+         target_lang = "en" if source_lang == "zh" else "zh"
+
+         if verbose:
+             print(f"\nTranslation mode: {source_lang.upper()} → {target_lang.upper()}")
+             print(f"Input: {text}")
+             print("\nDiffusion process:")
+
+         # Encode the source language
+         source_ids = self._encode(text, source_lang)
+         source_len = torch.tensor([source_ids.size(1)])
+
+         # Embed the source language
+         source_emb = self.embedding(source_ids, source_lang, source_len)
+
+         # Run the full forward diffusion down to pure noise
+         if verbose:
+             print(f" Forward diffusion: {source_lang} → noise space")
+
+         batch_size = source_emb.size(0)
+         t_full = torch.full((batch_size,), self.config.diffusion.timesteps - 1, dtype=torch.long)
+         noise = torch.randn_like(source_emb)
+         x_t, _ = self.diffusion.q_sample(source_emb, t_full, noise)
+
+         # Reverse diffusion: DDIM by default, plain DDPM otherwise
+         if ddim:
+             result = self._ddim_reverse(
+                 x_t, source_lang, target_lang, verbose
+             )
+         else:
+             result = self._ddpm_reverse(
+                 x_t, source_lang, target_lang, verbose
+             )
+
+         return result
+
+     def _ddim_reverse(
+         self,
+         x_t: torch.Tensor,
+         source_lang: str,
+         target_lang: str,
+         verbose: bool,
+     ) -> str:
+         """DDIM reverse diffusion."""
+         timesteps = self.ddim_sampler.ddim_timesteps
+         total_steps = len(timesteps)
+         switch_point = total_steps // 2  # switch languages at the midpoint
+
+         for i, t in enumerate(timesteps[:-1]):
+             t_prev = timesteps[i + 1]
+
+             # Pick the language used for denoising and display from the progress:
+             # first half uses the source language, second half the target language
+             if i < switch_point:
+                 current_lang = source_lang
+             else:
+                 current_lang = target_lang
+
+             # Predict the noise
+             t_tensor = torch.full((x_t.size(0),), t, dtype=torch.long)
+             predicted_noise = self.model(x_t, t_tensor, lang=current_lang)
+
+             if verbose:
+                 # Show the decoding of the current state in the current language
+                 current_ids = self._embed_to_tokens(x_t, current_lang)
+                 current_text = self._decode(current_ids, current_lang)
+                 if len(current_text) > 50:
+                     current_text = current_text[:50] + "..."
+
+                 print(f" Step {t:4d} → {current_text}")
+
+             # DDIM step
+             x_t = self.ddim_sampler.ddim_step(x_t, t, t_prev, predicted_noise, eta=0.0)
+
+         # Final decoding
+         final_ids = self._embed_to_tokens(x_t, target_lang)
+         result = self._decode(final_ids, target_lang)
+
+         if verbose:
+             print(f"\nOutput: {result}")
+
+         return result
+
+     def _ddpm_reverse(
+         self,
+         x_t: torch.Tensor,
+         source_lang: str,
+         target_lang: str,
+         verbose: bool,
+     ) -> str:
+         """DDPM reverse diffusion (the standard method, slower)."""
+         total_steps = self.config.diffusion.timesteps
+         switch_point = total_steps // 2  # switch languages at the midpoint
+
+         for t in range(total_steps - 1, -1, -1):
+             # Pick the language from the timestep
+             if t > switch_point:
+                 current_lang = source_lang
+             else:
+                 current_lang = target_lang
+
+             t_tensor = torch.full((x_t.size(0),), t, dtype=torch.long)
+
+             # Predict the noise
+             predicted_noise = self.model(x_t, t_tensor, lang=current_lang)
+
+             if verbose:
+                 current_ids = self._embed_to_tokens(x_t, current_lang)
+                 current_text = self._decode(current_ids, current_lang)
+                 if len(current_text) > 50:
+                     current_text = current_text[:50] + "..."
+                 print(f" Step {t:4d} → {current_text}")
+
+             # DDPM step
+             x_t = self.diffusion.p_sample(x_t, t_tensor, predicted_noise)
+
+         # Decode
+         final_ids = self._embed_to_tokens(x_t, target_lang)
+         result = self._decode(final_ids, target_lang)
+
+         if verbose:
+             print(f"\nOutput: {result}")
+
+         return result
+
+     def interactive(self):
+         """Interactive mode."""
+         print("\n" + "=" * 50)
+         print("Diffutslator interactive translation mode")
+         print("=" * 50)
+         print("Enter 'zh: text' to translate Chinese to English")
+         print("Enter 'en: text' to translate English to Chinese")
+         print("Enter 'quit' or 'exit' to leave")
+         print("=" * 50 + "\n")
+
+         while True:
+             try:
+                 user_input = input(">>> ").strip()
+
+                 if user_input.lower() in ['quit', 'exit', 'q']:
+                     print("Goodbye!")
+                     break
+
+                 if not user_input:
+                     continue
+
+                 # Parse the input
+                 if user_input.lower().startswith('zh:'):
+                     text = user_input[3:].strip()
+                     source_lang = "zh"
+                 elif user_input.lower().startswith('en:'):
+                     text = user_input[3:].strip()
+                     source_lang = "en"
+                 else:
+                     # Auto-detect (simple check for CJK characters)
+                     if any('\u4e00' <= c <= '\u9fff' for c in user_input):
+                         text = user_input
+                         source_lang = "zh"
+                     else:
+                         text = user_input
+                         source_lang = "en"
+
+                 # Translate (verbose mode prints the result itself)
+                 self.translate(text, source_lang, verbose=True)
+
+             except KeyboardInterrupt:
+                 print("\nGoodbye!")
+                 break
+             except Exception as e:
+                 print(f"Error: {e}")
+
+
+ def main():
+     parser = argparse.ArgumentParser(description="Diffutslator inference script")
+
+     parser.add_argument("--checkpoint", type=str, default=None, help="checkpoint path")
+     parser.add_argument("--text", type=str, default=None, help="text to translate")
+     parser.add_argument("--zh", action="store_true", help="input is Chinese")
+     parser.add_argument("--en", action="store_true", help="input is English")
+     parser.add_argument("--interactive", "-i", action="store_true", help="interactive mode")
+     parser.add_argument("--quiet", "-q", action="store_true", help="quiet mode, do not print the process")
+     parser.add_argument("--ddim-steps", type=int, default=50, help="number of DDIM steps")
+
+     args = parser.parse_args()
+
+     # Configuration
+     config = Config()
+     config.diffusion.ddim_steps = args.ddim_steps
+
+     # Locate a checkpoint
+     checkpoint_path = args.checkpoint
+     if checkpoint_path is None:
+         checkpoint_dir = os.path.join(config.project_dir, config.training.checkpoint_dir)
+         best_path = os.path.join(checkpoint_dir, "best.pt")
+         if os.path.exists(best_path):
+             checkpoint_path = best_path
+         elif os.path.isdir(checkpoint_dir):
+             # Fall back to the most recently modified checkpoint;
+             # os.listdir returns entries in arbitrary order, so pick by mtime
+             checkpoints = [
+                 os.path.join(checkpoint_dir, f)
+                 for f in os.listdir(checkpoint_dir)
+                 if f.endswith('.pt')
+             ]
+             if checkpoints:
+                 checkpoint_path = max(checkpoints, key=os.path.getmtime)
+
+     if checkpoint_path is None:
+         print("Error: no checkpoint found, please train the model first")
+         return
+
+     # Create the translator
+     translator = Translator(config, checkpoint_path)
+
+     # Mode
+     if args.interactive:
+         translator.interactive()
+     elif args.text:
+         if args.zh:
+             source_lang = "zh"
+         elif args.en:
+             source_lang = "en"
+         else:
+             # Auto-detect
+             if any('\u4e00' <= c <= '\u9fff' for c in args.text):
+                 source_lang = "zh"
+             else:
+                 source_lang = "en"
+
+         result = translator.translate(args.text, source_lang, verbose=not args.quiet)
+         if args.quiet:
+             print(result)
+     else:
+         # Default to interactive mode
+         translator.interactive()
+
+
+ if __name__ == "__main__":
+     main()
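The `ddim_step(..., eta=0.0)` call in `_ddim_reverse` relies on `diffusion.py`, which this part of the diff does not show. As a rough guide to what a deterministic DDIM update usually computes, here is a scalar sketch in plain Python. The function signature and the `alpha_bar_*` arguments (cumulative products of the noise schedule) are assumptions for illustration and may not match the repository's sampler.

```python
import math


def ddim_step(x_t: float, eps_pred: float, alpha_bar_t: float, alpha_bar_prev: float) -> float:
    """Deterministic DDIM update (eta = 0) for one scalar coordinate.

    alpha_bar_t and alpha_bar_prev are the cumulative signal fractions at the
    current and the next (less noisy) timestep; both are assumed inputs here.
    """
    # Estimate of the clean sample implied by the predicted noise
    x0_pred = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_bar_t)
    # Move that estimate to the less noisy timestep along the predicted noise direction
    return math.sqrt(alpha_bar_prev) * x0_pred + math.sqrt(1.0 - alpha_bar_prev) * eps_pred


# Forward-noise a known value, then step back using the true noise as the "prediction"
x0, eps = 0.8, -0.3
alpha_bar_t, alpha_bar_prev = 0.2, 0.9
x_t = math.sqrt(alpha_bar_t) * x0 + math.sqrt(1.0 - alpha_bar_t) * eps
x_prev = ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev)
print(x_prev)
```

With a perfect noise prediction the step lands exactly on the forward-noised value at the earlier timestep, and stepping to `alpha_bar_prev = 1.0` recovers `x0`. With `eta = 0` no fresh noise is injected, so a fixed starting noise always yields the same trajectory.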
main.py ADDED
@@ -0,0 +1,107 @@
+ """
+ Diffutslator main entry point.
+ A diffusion-model-based Chinese-English translation system.
+ """
+
+ import os
+ import sys
+ import argparse
+
+
+ def main():
+     parser = argparse.ArgumentParser(
+         description="Diffutslator - a diffusion-model-based translation system",
+         formatter_class=argparse.RawDescriptionHelpFormatter,
+         epilog="""
+ Examples:
+   # Quick training sanity check
+   python main.py train --quick
+
+   # Full training
+   python main.py train --full
+
+   # Resume training from a checkpoint
+   python main.py train --resume checkpoints/epoch_5.pt
+
+   # Interactive translation
+   python main.py translate
+
+   # Translate a single sentence
+   python main.py translate --text "你好世界" --zh
+
+   # Use more DDIM steps
+   python main.py translate --text "Hello world" --en --ddim-steps 100
+         """
+     )
+
+     subparsers = parser.add_subparsers(dest="command", help="command")
+
+     # Train command
+     train_parser = subparsers.add_parser("train", help="train the model")
+     train_parser.add_argument("--quick", action="store_true", help="quick validation mode")
+     train_parser.add_argument("--full", action="store_true", help="full training mode")
+     train_parser.add_argument("--samples", type=int, default=None, help="number of samples to use")
+     train_parser.add_argument("--epochs", type=int, default=None, help="number of training epochs")
+     train_parser.add_argument("--batch-size", type=int, default=None, help="batch size")
+     train_parser.add_argument("--resume", type=str, default=None, help="checkpoint to resume training from")
+
+     # Translate command
+     translate_parser = subparsers.add_parser("translate", help="translate text")
+     translate_parser.add_argument("--checkpoint", type=str, default=None, help="checkpoint path")
+     translate_parser.add_argument("--text", type=str, default=None, help="text to translate")
+     translate_parser.add_argument("--zh", action="store_true", help="input is Chinese")
+     translate_parser.add_argument("--en", action="store_true", help="input is English")
+     translate_parser.add_argument("--interactive", "-i", action="store_true", help="interactive mode")
+     translate_parser.add_argument("--quiet", "-q", action="store_true", help="quiet mode")
+     translate_parser.add_argument("--ddim-steps", type=int, default=50, help="number of DDIM steps")
+
+     args = parser.parse_args()
+
+     if args.command == "train":
+         # Import and run training, forwarding the options through sys.argv
+         from train import main as train_main
+         sys.argv = ["train.py"]
+
+         if args.quick:
+             sys.argv.append("--quick")
+         if args.full:
+             sys.argv.append("--full")
+         if args.samples is not None:
+             sys.argv.extend(["--samples", str(args.samples)])
+         if args.epochs is not None:
+             sys.argv.extend(["--epochs", str(args.epochs)])
+         if args.batch_size is not None:
+             sys.argv.extend(["--batch-size", str(args.batch_size)])
+         if args.resume:
+             sys.argv.extend(["--resume", args.resume])
+
+         train_main()
+
+     elif args.command == "translate":
+         # Import and run inference, forwarding the options through sys.argv
+         from inference import main as inference_main
+         sys.argv = ["inference.py"]
+
+         if args.checkpoint:
+             sys.argv.extend(["--checkpoint", args.checkpoint])
+         if args.text:
+             sys.argv.extend(["--text", args.text])
+         if args.zh:
+             sys.argv.append("--zh")
+         if args.en:
+             sys.argv.append("--en")
+         if args.interactive:
+             sys.argv.append("--interactive")
+         if args.quiet:
+             sys.argv.append("--quiet")
+         if args.ddim_steps is not None:
+             sys.argv.extend(["--ddim-steps", str(args.ddim_steps)])
+
+         inference_main()
+
+     else:
+         parser.print_help()
+
+
+ if __name__ == "__main__":
+     main()
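`main.py` forwards options to `train.py` and `inference.py` by overwriting `sys.argv` before calling their `main()` functions. One possible alternative, sketched below, is to let the callee accept an explicit argument list, since `ArgumentParser.parse_args` takes one and falls back to `sys.argv[1:]` when given `None`. The names `build_parser` and the `argv` parameter are hypothetical and are not part of the committed code.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """A cut-down stand-in for the inference parser (only three options)."""
    parser = argparse.ArgumentParser(description="translate (sketch)")
    parser.add_argument("--text", type=str, default=None)
    parser.add_argument("--zh", action="store_true")
    parser.add_argument("--ddim-steps", type=int, default=50)
    return parser


def inference_main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # With argv=None argparse reads sys.argv[1:], so running the script directly still works
    return build_parser().parse_args(argv)


# The dispatcher can now pass options directly instead of mutating global state
args = inference_main(["--text", "你好世界", "--zh", "--ddim-steps", "100"])
print(args.text, args.zh, args.ddim_steps)
```

This keeps the global `sys.argv` untouched, which can make the entry points easier to call from tests or other scripts.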
model.py ADDED
@@ -0,0 +1,287 @@
+ """
+ Diffusion model.
+ A lightweight Transformer used for noise prediction.
+ """
+
+ import math
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ from typing import Optional, Tuple
+
+ from embedding import SinusoidalTimeEmbedding
+
+
+ class FeedForward(nn.Module):
+     """Feed-forward network."""
+
+     def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1):
+         super().__init__()
+         self.w1 = nn.Linear(d_model, d_ff)
+         self.w2 = nn.Linear(d_ff, d_model)
+         self.dropout = nn.Dropout(dropout)
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         return self.dropout(self.w2(F.gelu(self.w1(x))))
+
+
+ class MultiHeadAttention(nn.Module):
+     """Multi-head self-attention."""
+
+     def __init__(self, d_model: int, n_heads: int, dropout: float = 0.1):
+         super().__init__()
+         assert d_model % n_heads == 0
+
+         self.d_model = d_model
+         self.n_heads = n_heads
+         self.d_k = d_model // n_heads
+
+         self.w_q = nn.Linear(d_model, d_model)
+         self.w_k = nn.Linear(d_model, d_model)
+         self.w_v = nn.Linear(d_model, d_model)
+         self.w_o = nn.Linear(d_model, d_model)
+
+         self.dropout = nn.Dropout(dropout)
+
+     def forward(
+         self,
+         q: torch.Tensor,
+         k: torch.Tensor,
+         v: torch.Tensor,
+         mask: Optional[torch.Tensor] = None,
+     ) -> torch.Tensor:
+         batch_size = q.size(0)
+
+         # Linear projections, then split into heads
+         q = self.w_q(q).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
+         k = self.w_k(k).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
+         v = self.w_v(v).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
+
+         # Attention scores: [batch, n_heads, seq_len, seq_len]
+         scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_k)
+
+         if mask is not None:
+             # A [batch, seq_len] padding mask is expanded to [batch, 1, 1, seq_len]
+             # so that it broadcasts over heads and query positions
+             if mask.dim() == 2:
+                 mask = mask[:, None, None, :]
+             scores = scores.masked_fill(mask == 0, float('-inf'))
+
+         attn = F.softmax(scores, dim=-1)
+         attn = self.dropout(attn)
+
+         # Merge the heads
+         out = torch.matmul(attn, v)
+         out = out.transpose(1, 2).contiguous().view(batch_size, -1, self.d_model)
+
+         return self.w_o(out)
+
+
+ class TransformerBlock(nn.Module):
+     """Transformer block (pre-norm)."""
+
+     def __init__(self, d_model: int, n_heads: int, d_ff: int, dropout: float = 0.1):
+         super().__init__()
+
+         self.attn = MultiHeadAttention(d_model, n_heads, dropout)
+         self.ff = FeedForward(d_model, d_ff, dropout)
+
+         self.norm1 = nn.LayerNorm(d_model)
+         self.norm2 = nn.LayerNorm(d_model)
+
+         self.dropout = nn.Dropout(dropout)
+
+     def forward(self, x: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
+         # Self-attention + residual (normalize once and reuse for q, k, v)
+         h = self.norm1(x)
+         x = x + self.dropout(self.attn(h, h, h, mask))
+         # Feed-forward + residual
+         x = x + self.dropout(self.ff(self.norm2(x)))
+         return x
+
+
+ class NoisePredictor(nn.Module):
+     """Noise prediction network.
+
+     Input: the noised embedding x_t and the timestep t
+     Output: the predicted noise
+     """
+
+     def __init__(
+         self,
+         d_model: int = 256,
+         n_heads: int = 4,
+         n_layers: int = 4,
+         d_ff: int = 512,
+         max_len: int = 128,
+         dropout: float = 0.1,
+     ):
+         super().__init__()
+
+         self.d_model = d_model
+
+         # Timestep embedding
+         self.time_embedding = SinusoidalTimeEmbedding(d_model)
+         self.time_mlp = nn.Sequential(
+             nn.Linear(d_model, d_model * 4),
+             nn.GELU(),
+             nn.Linear(d_model * 4, d_model),
+         )
+
+         # Transformer layers
+         self.layers = nn.ModuleList([
+             TransformerBlock(d_model, n_heads, d_ff, dropout)
+             for _ in range(n_layers)
+         ])
+
+         # Output layers
+         self.output_norm = nn.LayerNorm(d_model)
+         self.output_proj = nn.Linear(d_model, d_model)
+
+         # Initialization
+         self.apply(self._init_weights)
+
+     def _init_weights(self, module):
+         if isinstance(module, nn.Linear):
+             nn.init.normal_(module.weight, mean=0.0, std=0.02)
+             if module.bias is not None:
+                 nn.init.zeros_(module.bias)
+         elif isinstance(module, nn.LayerNorm):
+             nn.init.ones_(module.weight)
+             nn.init.zeros_(module.bias)
+
+     def forward(
+         self,
+         x_t: torch.Tensor,
+         t: torch.Tensor,
+         mask: Optional[torch.Tensor] = None,
+     ) -> torch.Tensor:
+         """
+         x_t: [batch, seq_len, d_model] noised embedding
+         t: [batch] timestep
+         mask: [batch, seq_len] optional attention mask
+
+         Returns: [batch, seq_len, d_model] predicted noise
+         """
+         batch_size, seq_len, _ = x_t.shape
+
+         # Timestep embedding
+         t_emb = self.time_embedding(t)  # [batch, d_model]
+         t_emb = self.time_mlp(t_emb)  # [batch, d_model]
+
+         # Add the time information to every position
+         x = x_t + t_emb.unsqueeze(1)
+
+         # Transformer layers
+         for layer in self.layers:
+             x = layer(x, mask)
+
+         # Output
+         x = self.output_norm(x)
+         noise_pred = self.output_proj(x)
+
+         return noise_pred
+
+
+ class DualNoisePredictor(nn.Module):
+     """Dual-language noise predictor.
+
+     Shared core network with language-specific input/output projections.
+     """
+
+     def __init__(
+         self,
+         d_model: int = 256,
+         n_heads: int = 4,
+         n_layers: int = 4,
+         d_ff: int = 512,
+         max_len: int = 128,
+         dropout: float = 0.1,
+     ):
+         super().__init__()
+
+         self.d_model = d_model
+
+         # Timestep embedding (shared)
+         self.time_embedding = SinusoidalTimeEmbedding(d_model)
+         self.time_mlp = nn.Sequential(
+             nn.Linear(d_model, d_model * 4),
+             nn.GELU(),
+             nn.Linear(d_model * 4, d_model),
+         )
+
+         # Language-specific input projections
+         self.zh_input_proj = nn.Linear(d_model, d_model)
+         self.en_input_proj = nn.Linear(d_model, d_model)
+
+         # Shared Transformer layers
+         self.layers = nn.ModuleList([
+             TransformerBlock(d_model, n_heads, d_ff, dropout)
+             for _ in range(n_layers)
+         ])
+
+         # Language-specific output projections
+         self.zh_output_proj = nn.Linear(d_model, d_model)
+         self.en_output_proj = nn.Linear(d_model, d_model)
+
+         self.output_norm = nn.LayerNorm(d_model)
+
+         # Initialization
+         self.apply(self._init_weights)
+
+     def _init_weights(self, module):
+         if isinstance(module, nn.Linear):
+             nn.init.normal_(module.weight, mean=0.0, std=0.02)
+             if module.bias is not None:
+                 nn.init.zeros_(module.bias)
+         elif isinstance(module, nn.LayerNorm):
+             nn.init.ones_(module.weight)
+             nn.init.zeros_(module.bias)
+
+     def forward(
+         self,
+         x_t: torch.Tensor,
+         t: torch.Tensor,
+         lang: str = "zh",
+         mask: Optional[torch.Tensor] = None,
+     ) -> torch.Tensor:
+         """
+         x_t: [batch, seq_len, d_model]
+         t: [batch]
+         lang: "zh" or "en" (any value other than "zh" takes the English path)
+         """
+         # Timestep embedding
+         t_emb = self.time_embedding(t)
+         t_emb = self.time_mlp(t_emb)
+
+         # Language-specific input projection
+         if lang == "zh":
+             x = self.zh_input_proj(x_t)
+         else:
+             x = self.en_input_proj(x_t)
+
+         # Add the time information
+         x = x + t_emb.unsqueeze(1)
+
+         # Shared Transformer
+         for layer in self.layers:
+             x = layer(x, mask)
+
+         # Output normalization
+         x = self.output_norm(x)
+
+         # Language-specific output projection
+         if lang == "zh":
+             noise_pred = self.zh_output_proj(x)
+         else:
+             noise_pred = self.en_output_proj(x)
+
+         return noise_pred
+
+
+ def create_model(config) -> DualNoisePredictor:
+     """Create the model."""
+     model = DualNoisePredictor(
+         d_model=config.model.d_model,
+         n_heads=config.model.n_heads,
+         n_layers=config.model.n_layers,
+         d_ff=config.model.d_ff,
+         max_len=config.model.max_len,
+         dropout=config.model.dropout,
+     )
+     return model
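The attention code in this file fills masked positions with `-inf` before the softmax. The sketch below is a minimal pure-Python illustration of that step (no PyTorch; `masked_softmax` is a hypothetical helper, not part of this commit). It shows that masked positions receive exactly zero weight, and that a row in which every position is masked turns into NaN, which is worth guarding against for fully padded sequences. PyTorch's softmax is expected to behave the same way on an all `-inf` row.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over `scores`, treating positions where mask == 0 as -inf."""
    filled = [s if m else float("-inf") for s, m in zip(scores, mask)]
    top = max(filled)
    # exp(-inf - top) is 0.0 for masked entries. If every entry is masked,
    # top is -inf as well, and -inf - (-inf) is NaN for the whole row.
    exps = [math.exp(f - top) for f in filled]
    total = sum(exps)
    return [e / total for e in exps]


partly_masked = masked_softmax([1.0, 2.0, 3.0], [1, 1, 0])
fully_masked = masked_softmax([1.0, 2.0], [0, 0])
```

Here `partly_masked` sums to 1 with a zero in the last slot, while every entry of `fully_masked` is NaN.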
switcher.py ADDED
@@ -0,0 +1,191 @@
+ """
+ Language switcher.
+ Decides which language the current noisy state is closer to.
+ """
+
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ from typing import Tuple, Optional
+
+
+ class LanguageSwitcher(nn.Module):
+     """Language-switch classifier.
+
+     Input: noisy state x_t [batch, seq_len, d_model]
+     Output: language logits [batch, 2] -> [Chinese, English]
+     """
+
+     def __init__(self, d_model: int = 256, hidden_dim: int = 128, dropout: float = 0.1):
+         super().__init__()
+
+         # Global feature extraction
+         self.global_pool = nn.AdaptiveAvgPool1d(1)
+
+         # Classification head
+         self.classifier = nn.Sequential(
+             nn.Linear(d_model, hidden_dim),
+             nn.GELU(),
+             nn.Dropout(dropout),
+             nn.Linear(hidden_dim, hidden_dim),
+             nn.GELU(),
+             nn.Dropout(dropout),
+             nn.Linear(hidden_dim, 2),  # 2 classes: Chinese / English
+         )
+
+         # Initialization
+         self.apply(self._init_weights)
+
+     def _init_weights(self, module):
+         if isinstance(module, nn.Linear):
+             nn.init.normal_(module.weight, mean=0.0, std=0.02)
+             if module.bias is not None:
+                 nn.init.zeros_(module.bias)
+
+     def forward(self, x_t: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
+         """
+         x_t: [batch, seq_len, d_model]
+         mask: [batch, seq_len] optional mask (1 = valid, 0 = padding)
+
+         Returns: [batch, 2] logits (Chinese, English)
+         """
+         if mask is not None:
+             # Masked mean: average over valid positions only, so that padding
+             # does not shrink the pooled vector for short sequences.
+             m = mask.unsqueeze(-1).to(x_t.dtype)  # [batch, seq_len, 1]
+             x = (x_t * m).sum(dim=1) / m.sum(dim=1).clamp(min=1.0)
+         else:
+             # Global pooling: [batch, seq_len, d_model] -> [batch, d_model, seq_len] -> [batch, d_model]
+             x = self.global_pool(x_t.transpose(1, 2)).squeeze(-1)
+
+         # Classification
+         logits = self.classifier(x)
+
+         return logits
+
+     def predict(self, x_t: torch.Tensor, mask: Optional[torch.Tensor] = None) -> Tuple[str, float]:
+         """Predict the language.
+
+         Returns:
+             lang: "zh" or "en"
+             confidence: confidence in [0, 1]
+         """
+         self.eval()
+         with torch.no_grad():
+             logits = self.forward(x_t, mask)
+             probs = F.softmax(logits, dim=-1)
+
+         # Use the first sample only (assumes batch size 1)
+         zh_prob = probs[0, 0].item()
+         en_prob = probs[0, 1].item()
+
+         if zh_prob > en_prob:
+             return "zh", zh_prob
+         else:
+             return "en", en_prob
+
+     def get_probabilities(self, x_t: torch.Tensor, mask: Optional[torch.Tensor] = None) -> Tuple[torch.Tensor, torch.Tensor]:
+         """Return the Chinese and English probabilities.
+
+         Returns:
+             zh_probs: [batch] Chinese probability
+             en_probs: [batch] English probability
+         """
+         logits = self.forward(x_t, mask)
+         probs = F.softmax(logits, dim=-1)
+         return probs[:, 0], probs[:, 1]
+
+
+ class AdaptiveSwitcher(nn.Module):
+     """Adaptive language switcher.
+
+     Adjusts the switching policy according to the diffusion timestep:
+     - early (high noise): switch more aggressively
+     - late (low noise): switch more conservatively
+     """
+
+     def __init__(
+         self,
+         d_model: int = 256,
+         hidden_dim: int = 128,
+         dropout: float = 0.1,
+         switch_threshold: float = 0.6,  # minimum confidence required to switch
+     ):
+         super().__init__()
+
+         self.switch_threshold = switch_threshold
+
+         # Base switcher
+         self.base_switcher = LanguageSwitcher(d_model, hidden_dim, dropout)
+
+         # Time modulation
+         self.time_modulation = nn.Sequential(
+             nn.Linear(1, hidden_dim),
+             nn.GELU(),
+             nn.Linear(hidden_dim, 2),
+             nn.Sigmoid(),
+         )
+
+     def forward(
+         self,
+         x_t: torch.Tensor,
+         t: Optional[torch.Tensor] = None,
+         mask: Optional[torch.Tensor] = None,
+     ) -> torch.Tensor:
+         """
+         x_t: [batch, seq_len, d_model]
+         t: [batch] timesteps, used for modulation
+         """
+         # Base prediction
+         logits = self.base_switcher(x_t, mask)
+
+         # Time modulation (optional)
+         if t is not None:
+             # Normalize the timestep (the divisor is hard-coded to 1000 steps)
+             t_norm = t.float().unsqueeze(-1) / 1000.0  # [batch, 1]
+             modulation = self.time_modulation(t_norm)  # [batch, 2]
+             logits = logits * modulation
+
+         return logits
+
+     def should_switch(
+         self,
+         x_t: torch.Tensor,
+         current_lang: str,
+         t: Optional[torch.Tensor] = None,
+         mask: Optional[torch.Tensor] = None,
+     ) -> Tuple[bool, str, float]:
+         """Decide whether to switch languages.
+
+         Returns:
+             should_switch: whether to switch
+             new_lang: the predicted language
+             confidence: confidence of the prediction
+         """
+         self.eval()
+         with torch.no_grad():
+             logits = self.forward(x_t, t, mask)
+             probs = F.softmax(logits, dim=-1)
+
+         # Use the first sample only (assumes batch size 1)
+         zh_prob = probs[0, 0].item()
+         en_prob = probs[0, 1].item()
+
+         # Prediction
+         predicted_lang = "zh" if zh_prob > en_prob else "en"
+         confidence = max(zh_prob, en_prob)
+
+         # Switch only when the prediction differs and is confident enough
+         should_switch = (
+             predicted_lang != current_lang and
+             confidence > self.switch_threshold
+         )
+
+         return should_switch, predicted_lang, confidence
+
+
+ def create_switcher(config) -> LanguageSwitcher:
+     """Create the language switcher."""
+     return LanguageSwitcher(
+         d_model=config.model.d_model,
+         hidden_dim=config.model.d_model // 2,
+         dropout=config.model.dropout,
+     )
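When padded positions are zeroed out before pooling, dividing by the full sequence length and dividing by the number of valid positions give different pooled vectors. The sketch below uses plain Python lists (both helper names are hypothetical, not part of this commit) to show the gap for a sequence that is half padding.

```python
def zeroed_then_averaged(rows, mask):
    """Zero out masked rows, then divide by the full sequence length."""
    dim = len(rows[0])
    return [
        sum(row[d] * m for row, m in zip(rows, mask)) / len(rows)
        for d in range(dim)
    ]


def masked_mean(rows, mask):
    """Average over valid rows only (at least 1 to avoid dividing by zero)."""
    dim = len(rows[0])
    valid = max(sum(mask), 1)
    return [
        sum(row[d] * m for row, m in zip(rows, mask)) / valid
        for d in range(dim)
    ]


# Two real positions followed by two padding positions.
rows = [[2.0, 4.0], [4.0, 8.0], [0.0, 0.0], [0.0, 0.0]]
mask = [1, 1, 0, 0]

diluted = zeroed_then_averaged(rows, mask)  # [1.5, 3.0]
exact = masked_mean(rows, mask)             # [3.0, 6.0]
```

With half of the sequence padded, the first variant halves the magnitude of the pooled vector, so a classifier fed with it would also see sequence length encoded in its input.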
tokenizer.py ADDED
@@ -0,0 +1,326 @@
+ """
+ Tokenizer.
+ Supports Chinese character-level tokenization and BPE.
+ """
+
+ import os
+ import re
+ import json
+ import pickle
+ from typing import List, Dict, Optional, Tuple
+ from collections import Counter
+ from functools import lru_cache
+
+
+ class Tokenizer:
+     """Basic tokenizer."""
+
+     def __init__(self, vocab_size: int = 8000, lang: str = "zh"):
+         self.vocab_size = vocab_size
+         self.lang = lang
+
+         # Special tokens
+         self.pad_token = "<pad>"
+         self.sos_token = "<sos>"
+         self.eos_token = "<eos>"
+         self.unk_token = "<unk>"
+         self.mask_token = "<mask>"
+
+         self.special_tokens = [self.pad_token, self.sos_token, self.eos_token, self.unk_token, self.mask_token]
+
+         # Vocabulary
+         self.token_to_id: Dict[str, int] = {}
+         self.id_to_token: Dict[int, str] = {}
+
+         # BPE merge rules
+         self.merges: List[Tuple[str, str]] = []
+         self.bpe_ranks: Dict[Tuple[str, str], int] = {}
+
+     def _is_chinese(self, char: str) -> bool:
+         """Return True if the character is in the CJK Unified Ideographs block."""
+         return '\u4e00' <= char <= '\u9fff'
+
+     def _pre_tokenize(self, text: str) -> List[str]:
+         """Pre-tokenize the text."""
+         if self.lang == "zh":
+             # Chinese: character level, keeping English words and numbers whole
+             tokens = []
+             current = ""
+             for char in text:
+                 if self._is_chinese(char):
+                     if current:
+                         tokens.append(current)
+                         current = ""
+                     tokens.append(char)
+                 elif char.isalnum():
+                     current += char.lower()
+                 else:
+                     if current:
+                         tokens.append(current)
+                         current = ""
+                     if char.strip():
+                         tokens.append(char)
+             if current:
+                 tokens.append(current)
+             return tokens
+         else:
+             # English: word level
+             text = text.lower()
+             tokens = re.findall(r"\w+|[^\w\s]", text)
+             return tokens
+
+     def _get_pairs(self, word: Tuple[str, ...]) -> set:
+         """Return the set of adjacent symbol pairs in a word."""
+         pairs = set()
+         prev = word[0]
+         for char in word[1:]:
+             pairs.add((prev, char))
+             prev = char
+         return pairs
+
+     def train_bpe(self, texts: List[str], num_merges: Optional[int] = None):
+         """Train BPE."""
+         if num_merges is None:
+             num_merges = self.vocab_size - len(self.special_tokens) - 100
+
+         # Count word frequencies
+         print(f"  Counting word frequencies ({len(texts)} texts)...", end="", flush=True)
+         word_freqs: Counter = Counter()
+         for text in texts:
+             for token in self._pre_tokenize(text):
+                 # Split the token into a character sequence plus an end-of-word marker
+                 chars = tuple(token) + ('</w>',)
+                 word_freqs[chars] += 1
+         print(f" {len(word_freqs)} words")
+
+         # BPE merges
+         print(f"  BPE merges ({num_merges} rounds)...", end="", flush=True)
+         self.merges = []
+         last_print = 0
+         for i in range(num_merges):
+             # Count adjacent pair frequencies
+             pairs: Counter = Counter()
+             for word, freq in word_freqs.items():
+                 pairs_in_word = self._get_pairs(word)
+                 for pair in pairs_in_word:
+                     pairs[pair] += freq
+
+             if not pairs:
+                 break
+
+             # Pick the most frequent pair
+             best_pair = max(pairs, key=pairs.get)
+             self.merges.append(best_pair)
+
+             # Merge that pair in every word
+             new_word_freqs: Counter = Counter()
+             bigram = re.escape(' '.join(best_pair))
+             pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
+
+             for word, freq in word_freqs.items():
+                 new_word = ' '.join(word)
+                 new_word = pattern.sub(''.join(best_pair), new_word)
+                 new_word = tuple(new_word.split())
+                 new_word_freqs[new_word] += freq
+
+             word_freqs = new_word_freqs
+
+             # Print progress every 1000 rounds
+             if i - last_print >= 1000:
+                 print(f".{(i+1)//1000}k", end="", flush=True)
+                 last_print = i
+
+         print(" done")
+
+         # Rank merges by the order in which they were learned, so the
+         # tokenizer can encode immediately after training
+         self.bpe_ranks = {pair: i for i, pair in enumerate(self.merges)}
+
+         # Build the vocabulary
+         self._build_vocab(word_freqs)
+
+     def _build_vocab(self, word_freqs: Counter):
+         """Build the vocabulary."""
+         # Special tokens
+         for i, token in enumerate(self.special_tokens):
+             self.token_to_id[token] = i
+             self.id_to_token[i] = token
+
+         # Collect all tokens. The '</w>' marker is stripped because
+         # _apply_bpe() strips it before the vocabulary lookup in encode();
+         # entries stored with the marker could never be matched.
+         vocab = set()
+         for word in word_freqs.keys():
+             for token in word:
+                 token = token.replace('</w>', '')
+                 if token:
+                     vocab.add(token)
+
+         # Add the merged tokens
+         for pair in self.merges:
+             token = ''.join(pair).replace('</w>', '')
+             if token:
+                 vocab.add(token)
+
+         # Sort (lexicographically, for a deterministic order) and truncate
+         sorted_vocab = sorted(vocab)
+         for i, token in enumerate(sorted_vocab[:self.vocab_size - len(self.special_tokens)]):
+             idx = i + len(self.special_tokens)
+             self.token_to_id[token] = idx
+             self.id_to_token[idx] = token
+
+     def _apply_bpe(self, token: str) -> List[str]:
+         """Apply BPE to a single token."""
+         if not token:
+             return []
+
+         word = tuple(token) + ('</w>',)
+
+         while True:
+             pairs = self._get_pairs(word)
+             if not pairs:
+                 break
+
+             # Find the pair with the lowest rank (the earliest-learned merge)
+             min_pair = None
+             min_rank = float('inf')
+             for pair in pairs:
+                 rank = self.bpe_ranks.get(pair, float('inf'))
+                 if rank < min_rank:
+                     min_rank = rank
+                     min_pair = pair
+
+             if min_pair is None or min_rank == float('inf'):
+                 break
+
+             # Merge
+             new_word = []
+             i = 0
+             while i < len(word):
+                 if i < len(word) - 1 and word[i] == min_pair[0] and word[i + 1] == min_pair[1]:
+                     new_word.append(min_pair[0] + min_pair[1])
+                     i += 2
+                 else:
+                     new_word.append(word[i])
+                     i += 1
+             word = tuple(new_word)
+
+         # Remove the '</w>' marker
+         return [t.replace('</w>', '') for t in word if t.replace('</w>', '')]
+
+     def encode(self, text: str, add_sos: bool = False, add_eos: bool = False) -> List[int]:
+         """Encode text into a sequence of token ids."""
+         # Cache lookup (return a copy so callers cannot mutate the cached list)
+         cache_key = (text, add_sos, add_eos)
+         if hasattr(self, '_encode_cache') and cache_key in self._encode_cache:
+             return list(self._encode_cache[cache_key])
+
+         tokens = self._pre_tokenize(text)
+
+         ids = []
+         if add_sos:
+             ids.append(self.token_to_id[self.sos_token])
+
+         for token in tokens:
+             bpe_tokens = self._apply_bpe(token)
+             for t in bpe_tokens:
+                 ids.append(self.token_to_id.get(t, self.token_to_id[self.unk_token]))
+
+         if add_eos:
+             ids.append(self.token_to_id[self.eos_token])
+
+         # Cache the result (with a size limit)
+         if not hasattr(self, '_encode_cache'):
+             self._encode_cache = {}
+         if len(self._encode_cache) < 100000:  # cache at most 100,000 entries
+             self._encode_cache[cache_key] = list(ids)
+
+         return ids
+
+     def decode(self, ids: List[int], skip_special: bool = True) -> str:
+         """Decode a sequence of token ids into text."""
+         tokens = []
+         for token_id in ids:
+             token = self.id_to_token.get(token_id, self.unk_token)
+             if skip_special and token in self.special_tokens:
+                 continue
+             # Remove the BPE '</w>' marker
+             token = token.replace('</w>', '')
+             if token:  # skip empty tokens
+                 tokens.append(token)
+
+         if self.lang == "en":
+             # English: join BPE subwords with spaces, then clean up the spacing
+             text = ' '.join(tokens)
+             # Remove whitespace before punctuation
+             text = re.sub(r'\s+([.,!?;:\'\"])', r'\1', text)
+             # Add a space after punctuation when a letter follows
+             text = re.sub(r'([.,!?;:])([a-zA-Z])', r'\1 \2', text)
+             # Collapse repeated whitespace
+             text = re.sub(r'\s+', ' ', text).strip()
+         else:
+             # Chinese: concatenate directly
+             text = ''.join(tokens)
+
+         return text
+
+     @property
+     def vocab_size_actual(self) -> int:
+         """Actual vocabulary size."""
+         return len(self.token_to_id)
+
+     @property
+     def pad_id(self) -> int:
+         return self.token_to_id[self.pad_token]
+
+     @property
+     def sos_id(self) -> int:
+         return self.token_to_id[self.sos_token]
+
+     @property
+     def eos_id(self) -> int:
+         return self.token_to_id[self.eos_token]
+
+     @property
+     def unk_id(self) -> int:
+         return self.token_to_id[self.unk_token]
+
+     def save(self, path: str):
+         """Save the tokenizer."""
+         data = {
+             'vocab_size': self.vocab_size,
+             'lang': self.lang,
+             'token_to_id': self.token_to_id,
+             'id_to_token': {int(k): v for k, v in self.id_to_token.items()},
+             'merges': self.merges,
+             'special_tokens': self.special_tokens,
+         }
+         with open(path, 'w', encoding='utf-8') as f:
+             json.dump(data, f, ensure_ascii=False, indent=2)
+
+     @classmethod
+     def load(cls, path: str) -> "Tokenizer":
+         """Load a tokenizer."""
+         with open(path, 'r', encoding='utf-8') as f:
+             data = json.load(f)
+
+         tokenizer = cls(vocab_size=data['vocab_size'], lang=data['lang'])
+         tokenizer.token_to_id = data['token_to_id']
+         # JSON object keys are strings, so convert ids back to int
+         tokenizer.id_to_token = {int(k): v for k, v in data['id_to_token'].items()}
+         tokenizer.merges = [tuple(m) for m in data['merges']]
+         tokenizer.bpe_ranks = {pair: i for i, pair in enumerate(tokenizer.merges)}
+         tokenizer.special_tokens = data['special_tokens']
+
+         return tokenizer
+
+     def __len__(self) -> int:
+         return self.vocab_size_actual
+
+
+ def train_tokenizers(config, zh_texts: List[str], en_texts: List[str]) -> Tuple[Tokenizer, Tokenizer]:
+     """Train the Chinese and English tokenizers."""
+     print("Training Chinese tokenizer...")
+     zh_tokenizer = Tokenizer(vocab_size=config.model.vocab_size_zh, lang="zh")
+     zh_tokenizer.train_bpe(zh_texts)
+     zh_tokenizer.bpe_ranks = {pair: i for i, pair in enumerate(zh_tokenizer.merges)}
+
+     print("Training English tokenizer...")
+     en_tokenizer = Tokenizer(vocab_size=config.model.vocab_size_en, lang="en")
+     en_tokenizer.train_bpe(en_texts)
+     en_tokenizer.bpe_ranks = {pair: i for i, pair in enumerate(en_tokenizer.merges)}
+
+     print(f"Chinese vocabulary size: {zh_tokenizer.vocab_size_actual}")
+     print(f"English vocabulary size: {en_tokenizer.vocab_size_actual}")
+
+     return zh_tokenizer, en_tokenizer
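One round of the BPE training loop in `train_bpe` can be traced by hand on a toy corpus. The sketch below is a simplified stand-in, not the tokenizer's own code: `count_pairs` and `merge_pair` are hypothetical helpers, the corpus is made up, and pairs are counted per occurrence rather than once per word (the two agree on this corpus because no word repeats a pair).

```python
from collections import Counter


def count_pairs(word_freqs):
    """Frequency of each adjacent symbol pair, weighted by word frequency."""
    pairs = Counter()
    for word, freq in word_freqs.items():
        for left, right in zip(word, word[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(word_freqs, pair):
    """Replace every occurrence of `pair` with the concatenated symbol."""
    merged = Counter()
    for word, freq in word_freqs.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] += freq
    return merged


# Words are symbol tuples ending in the '</w>' end-of-word marker.
corpus = Counter({
    ("l", "o", "w", "</w>"): 5,
    ("l", "o", "w", "e", "r", "</w>"): 2,
    ("n", "e", "w", "</w>"): 3,
})

pairs = count_pairs(corpus)
best = max(pairs, key=pairs.get)   # ("w", "</w>"), seen 5 + 3 = 8 times
merged = merge_pair(corpus, best)  # "low" becomes ("l", "o", "w</w>")
```

Repeating these two steps `num_merges` times yields the ordered merge list from which `bpe_ranks` is derived; at encoding time, earlier merges take priority.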
train.py ADDED
@@ -0,0 +1,447 @@
+ """
+ Training script.
+ Supports quick validation and full training, with pause and resume.
+ """
+
+ import os
+ import sys
+ import signal
+ import argparse
+ import time
+ from typing import Optional
+ from datetime import datetime
+
+ # The OpenMP/MKL thread-count variables are typically read when those
+ # runtimes initialize, so set them before importing torch.
+ _NUM_THREADS = os.cpu_count() or 1  # os.cpu_count() can return None
+ os.environ['OMP_NUM_THREADS'] = str(_NUM_THREADS)
+ os.environ['MKL_NUM_THREADS'] = str(_NUM_THREADS)
+
+ import torch
+
+ # Let PyTorch use all CPU cores
+ torch.set_num_threads(_NUM_THREADS)
+
+ import torch.nn as nn
+ import torch.optim as optim
+ from torch.optim.lr_scheduler import OneCycleLR
+
+ from config import Config
+ from tokenizer import Tokenizer, train_tokenizers
+ from dataset import load_all_data, create_dataloaders
+ from embedding import DualLanguageEmbedding, DualOutputProjection
+ from model import create_model
+ from diffusion import get_diffusion, NoiseScheduler
+ from switcher import create_switcher
+ from utils import ProgressTracker, count_parameters, format_number, save_checkpoint, load_checkpoint
+
+
+ class Trainer:
+     """Trainer."""
+
+     def __init__(self, config: Config):
+         self.config = config
+         self.device = torch.device("cpu")  # CPU training
+
+         # Initialize components
+         self._init_components()
+
+         # Training state
+         self.current_epoch = 0
+         self.global_step = 0
+         self.best_loss = float('inf')
+         self.should_stop = False
+
+         # Register signal handlers
+         signal.signal(signal.SIGINT, self._signal_handler)
+         signal.signal(signal.SIGTERM, self._signal_handler)
+
+     def _init_components(self):
+         """Initialize all components."""
+         print("Initializing training components...")
+
+         # Load or train the tokenizers
+         tokenizer_path = os.path.join(self.config.project_dir, self.config.data.cache_dir)
+         zh_tokenizer_path = os.path.join(tokenizer_path, "tokenizer_zh.json")
+         en_tokenizer_path = os.path.join(tokenizer_path, "tokenizer_en.json")
+
+         if os.path.exists(zh_tokenizer_path) and os.path.exists(en_tokenizer_path):
+             print("  Loading existing tokenizers...")
+             self.zh_tokenizer = Tokenizer.load(zh_tokenizer_path)
+             self.en_tokenizer = Tokenizer.load(en_tokenizer_path)
+         else:
+             print("  Training tokenizers...")
+             # Load the data first so the tokenizers can be trained on it
+             train_pairs, _, _ = load_all_data(self.config)
+             zh_texts = [p.zh for p in train_pairs]
+             en_texts = [p.en for p in train_pairs]
+             self.zh_tokenizer, self.en_tokenizer = train_tokenizers(
+                 self.config, zh_texts, en_texts
+             )
+             # Make sure the cache directory exists before writing to it
+             os.makedirs(tokenizer_path, exist_ok=True)
+             self.zh_tokenizer.save(zh_tokenizer_path)
+             self.en_tokenizer.save(en_tokenizer_path)
+
+         # Datasets
+         print("  Loading datasets...")
+         train_pairs, val_pairs, test_pairs = load_all_data(self.config)
+         self.train_loader, self.val_loader = create_dataloaders(
+             train_pairs, val_pairs,
+             self.zh_tokenizer, self.en_tokenizer,
+             self.config
+         )
+
+         # Embedding layer
+         print("  Initializing embedding layer...")
+         self.embedding = DualLanguageEmbedding(
+             vocab_size_zh=self.zh_tokenizer.vocab_size_actual,
+             vocab_size_en=self.en_tokenizer.vocab_size_actual,
+             d_model=self.config.model.d_model,
+             max_len=self.config.model.max_len,
+             dropout=self.config.model.dropout,
+         )
+
+         # Output projection
+         self.output_proj = DualOutputProjection(
+             d_model=self.config.model.d_model,
+             vocab_size_zh=self.zh_tokenizer.vocab_size_actual,
+             vocab_size_en=self.en_tokenizer.vocab_size_actual,
+         )
+
+         # Noise prediction model
+         print("  Initializing model...")
+         self.model = create_model(self.config)
+
+         # Language switcher
+         self.switcher = create_switcher(self.config)
+
+         # Diffusion process
+         self.diffusion, self.ddim_sampler = get_diffusion(self.config)
+         self.scheduler = self.diffusion.scheduler.to(self.device)
+
+         # Optimizer
+         all_params = (
+             list(self.embedding.parameters()) +
+             list(self.output_proj.parameters()) +
+             list(self.model.parameters()) +
+             list(self.switcher.parameters())
+         )
+         self.optimizer = optim.AdamW(
+             all_params,
+             lr=self.config.training.learning_rate,
+             weight_decay=self.config.training.weight_decay,
+         )
+
+         # Learning-rate scheduler. It is stepped once per optimizer update
+         # (every `gradient_accumulation` batches), so the cycle length is
+         # counted in optimizer steps rather than in batches.
+         steps_per_epoch = len(self.train_loader) // self.config.training.gradient_accumulation
+         total_steps = max(1, steps_per_epoch * self.config.training.epochs)
+         self.lr_scheduler = OneCycleLR(
+             self.optimizer,
+             max_lr=self.config.training.learning_rate,
+             total_steps=total_steps,
+             pct_start=0.1,
+             anneal_strategy='cos',
+         )
+
+         # Loss functions
+         self.mse_loss = nn.MSELoss()
+         self.ce_loss = nn.CrossEntropyLoss()
+
+         # Print model information
+         total_params = sum(count_parameters(m) for m in [self.embedding, self.output_proj, self.model, self.switcher])
+         print(f"  Total parameters: {format_number(total_params)}")
+
+     def _signal_handler(self, signum, frame):
+         """Signal handler: save a checkpoint and request a stop."""
+         print("\n\nInterrupt received, saving checkpoint...")
+         self._save_checkpoint("interrupted")
+         self.should_stop = True
+
+     def _save_checkpoint(self, name: str):
+         """Save a checkpoint."""
+         checkpoint_dir = os.path.join(self.config.project_dir, self.config.training.checkpoint_dir)
+         os.makedirs(checkpoint_dir, exist_ok=True)
+
+         path = os.path.join(checkpoint_dir, f"{name}.pt")
+
+         state = {
+             'epoch': self.current_epoch,
+             'global_step': self.global_step,
+             'best_loss': self.best_loss,
+             'embedding': self.embedding.state_dict(),
+             'output_proj': self.output_proj.state_dict(),
+             'model': self.model.state_dict(),
+             'switcher': self.switcher.state_dict(),
+             'optimizer': self.optimizer.state_dict(),
+             'lr_scheduler': self.lr_scheduler.state_dict(),
+             'config': self.config,
+         }
+
+         torch.save(state, path)
+         print(f"  Checkpoint saved: {path}")
+
+     def _load_checkpoint(self, path: str):
+         """Load a checkpoint."""
+         # weights_only=False because the checkpoint also stores the Config
+         # object; only load checkpoints from a trusted source.
+         state = torch.load(path, map_location=self.device, weights_only=False)
+
+         self.current_epoch = state['epoch']
+         self.global_step = state['global_step']
+         self.best_loss = state['best_loss']
+
+         self.embedding.load_state_dict(state['embedding'])
+         self.output_proj.load_state_dict(state['output_proj'])
+         self.model.load_state_dict(state['model'])
+         self.switcher.load_state_dict(state['switcher'])
+         self.optimizer.load_state_dict(state['optimizer'])
+         self.lr_scheduler.load_state_dict(state['lr_scheduler'])
+
+         print(f"  Resumed from checkpoint: epoch={self.current_epoch}, step={self.global_step}")
+
+     def train_step(self, batch: dict) -> dict:
+         """Run a single training step (forward and backward pass)."""
+         # Fetch the data
+         zh_ids = batch['zh_ids'].to(self.device)
+         en_ids = batch['en_ids'].to(self.device)
+         zh_lens = batch['zh_lens'].to(self.device)
+         en_lens = batch['en_lens'].to(self.device)
+
+         batch_size = zh_ids.size(0)
+
+         # Embedding
+         zh_emb = self.embedding(zh_ids, 'zh', zh_lens)
+         en_emb = self.embedding(en_ids, 'en', en_lens)
+
+         # Random timesteps
+         t_zh = torch.randint(0, self.config.diffusion.timesteps, (batch_size,), device=self.device)
+         t_en = torch.randint(0, self.config.diffusion.timesteps, (batch_size,), device=self.device)
+
+         # Forward diffusion
+         zh_noisy, zh_noise = self.diffusion.q_sample(zh_emb, t_zh)
+         en_noisy, en_noise = self.diffusion.q_sample(en_emb, t_en)
+
+         # Predict the noise
+         zh_noise_pred = self.model(zh_noisy, t_zh, lang='zh')
+         en_noise_pred = self.model(en_noisy, t_en, lang='en')
+
+         # Noise prediction loss
+         loss_noise_zh = self.mse_loss(zh_noise_pred, zh_noise)
+         loss_noise_en = self.mse_loss(en_noise_pred, en_noise)
+
+         # Language-switch loss
+         # Labels: 0 = Chinese, 1 = English
+         zh_labels = torch.zeros(batch_size, dtype=torch.long, device=self.device)
+         en_labels = torch.ones(batch_size, dtype=torch.long, device=self.device)
+
+         zh_switch_logits = self.switcher(zh_noisy)
+         en_switch_logits = self.switcher(en_noisy)
+
+         loss_switch = (
+             self.ce_loss(zh_switch_logits, zh_labels) +
+             self.ce_loss(en_switch_logits, en_labels)
+         ) / 2
+
+         # Total loss
+         loss = loss_noise_zh + loss_noise_en + 0.1 * loss_switch
+
+         # Backward pass (scaled for gradient accumulation)
+         loss = loss / self.config.training.gradient_accumulation
+         loss.backward()
+
+         return {
+             'loss': loss.item() * self.config.training.gradient_accumulation,
+             'loss_noise_zh': loss_noise_zh.item(),
+             'loss_noise_en': loss_noise_en.item(),
+             'loss_switch': loss_switch.item(),
+         }
+
+     def train_epoch(self, epoch: int) -> float:
+         """Train for one epoch."""
+         self.model.train()
+         self.embedding.train()
+         self.output_proj.train()
+         self.switcher.train()
+
+         total_loss = 0.0
+         steps_done = 0
+         num_batches = len(self.train_loader)
+
+         tracker = ProgressTracker(
+             total_steps=num_batches,
+             desc=f"Epoch {epoch}/{self.config.training.epochs}"
+         )
+
+         batch_size = self.config.training.batch_size
+
+         for batch_idx, batch in enumerate(self.train_loader):
+             if self.should_stop:
+                 break
+
+             # Training step
+             metrics = self.train_step(batch)
+             total_loss += metrics['loss']
+             steps_done += 1
+
+             # Gradient accumulation: update once every N batches
+             if (batch_idx + 1) % self.config.training.gradient_accumulation == 0:
+                 # Gradient clipping
+                 torch.nn.utils.clip_grad_norm_(
+                     list(self.embedding.parameters()) +
+                     list(self.output_proj.parameters()) +
+                     list(self.model.parameters()) +
+                     list(self.switcher.parameters()),
+                     1.0
+                 )
+
+                 # Update the parameters
+                 self.optimizer.step()
+                 self.lr_scheduler.step()
+                 self.optimizer.zero_grad()
+
+                 self.global_step += 1
+
+             # Update progress
+             tracker.update(batch_idx + 1, metrics['loss'])
+
+             # Print progress after every batch (live feedback)
+             samples_speed = tracker.count * batch_size / tracker.elapsed if tracker.elapsed > 0 else 0
+             progress_str = tracker.format_progress(metrics['loss'])
+             progress_str = progress_str.replace("it/s", "samples/s")
+             print(f"\r{progress_str} ({samples_speed:.0f} samples/s)", end="", flush=True)
+
+         print()  # newline
+         # Average over the batches actually processed (fewer if interrupted)
+         return total_loss / max(steps_done, 1)
+
+     @torch.no_grad()
+     def validate(self) -> float:
+         """Validate."""
+         self.model.eval()
+         self.embedding.eval()
+         self.output_proj.eval()
+         self.switcher.eval()
+
+         total_loss = 0.0
+         num_batches = min(len(self.val_loader), 50)  # cap the number of validation steps
+
+         for batch_idx, batch in enumerate(self.val_loader):
+             if batch_idx >= num_batches:
+                 break
+
+             zh_ids = batch['zh_ids'].to(self.device)
+             en_ids = batch['en_ids'].to(self.device)
+             zh_lens = batch['zh_lens'].to(self.device)
+             en_lens = batch['en_lens'].to(self.device)
+
+             batch_size = zh_ids.size(0)
+
+             # Embedding
+             zh_emb = self.embedding(zh_ids, 'zh', zh_lens)
+             en_emb = self.embedding(en_ids, 'en', en_lens)
+
+             # Random timesteps
+             t = torch.randint(0, self.config.diffusion.timesteps, (batch_size,), device=self.device)
+
+             # Forward diffusion
+             zh_noisy, zh_noise = self.diffusion.q_sample(zh_emb, t)
+             en_noisy, en_noise = self.diffusion.q_sample(en_emb, t)
+
+             # Predict the noise
+             zh_noise_pred = self.model(zh_noisy, t, lang='zh')
+             en_noise_pred = self.model(en_noisy, t, lang='en')
+
+             # Loss
+             loss = self.mse_loss(zh_noise_pred, zh_noise) + self.mse_loss(en_noise_pred, en_noise)
+             total_loss += loss.item()
+
+         # Guard against an empty validation loader
+         return total_loss / max(num_batches, 1)
+
+     def train(self):
+         """Run the full training loop."""
+         print("\n" + "=" * 60)
+         print("Starting training")
+         print("=" * 60)
+
+         start_time = time.time()
+
+         for epoch in range(self.current_epoch + 1, self.config.training.epochs + 1):
+             if self.should_stop:
+                 break
+
+             self.current_epoch = epoch
+
+             # Train
+             train_loss = self.train_epoch(epoch)
+
+             # Validate
+             val_loss = self.validate()
+
+             # Print the results
+             print(f"\nEpoch {epoch} finished:")
+             print(f"  Training loss: {train_loss:.4f}")
+             print(f"  Validation loss: {val_loss:.4f}")
+
+             # Save a periodic checkpoint
+             if epoch % self.config.training.save_every == 0:
+                 self._save_checkpoint(f"epoch_{epoch}")
+
+             # Save the best model
+             if val_loss < self.best_loss:
+                 self.best_loss = val_loss
+                 self._save_checkpoint("best")
+                 print("  New best model!")
+
+         # Training finished
+         elapsed = time.time() - start_time
+         print("\n" + "=" * 60)
+         print(f"Training finished! Total time: {elapsed/60:.1f} minutes")
+         print(f"Best validation loss: {self.best_loss:.4f}")
+         print("=" * 60)
+
+
393
+ def main():
394
+ parser = argparse.ArgumentParser(description="Diffutslator training script")
395
+
396
+ # Mode
397
+ parser.add_argument("--quick", action="store_true", help="Quick validation mode")
398
+ parser.add_argument("--full", action="store_true", help="Full training mode")
399
+
400
+ # Parameter overrides
401
+ parser.add_argument("--samples", type=int, default=None, help="Number of samples to use")
402
+ parser.add_argument("--epochs", type=int, default=None, help="Number of training epochs")
403
+ parser.add_argument("--batch-size", type=int, default=None, help="Batch size")
404
+ parser.add_argument("--resume", type=str, default=None, help="Path of the checkpoint to resume training from")
405
+
406
+ args = parser.parse_args()
407
+
408
+ # Create the config
409
+ if args.quick:
410
+ config = Config.quick()
411
+ print("Mode: quick validation")
412
+ else:
413
+ config = Config()
414
+ print("Mode: full training")
415
+
416
+ # Apply overrides
417
+ if args.samples:
418
+ config.data.max_samples = args.samples
419
+ if args.epochs:
420
+ config.training.epochs = args.epochs
421
+ if args.batch_size:
422
+ config.training.batch_size = args.batch_size
423
+ if args.resume:
424
+ config.training.resume = args.resume
425
+
426
+ # Print the configuration
427
+ print("\nConfiguration:")
428
+ print(f" Samples: {config.data.max_samples or 'all'}")
429
+ print(f" Batch size: {config.training.batch_size}")
430
+ print(f" Gradient accumulation: {config.training.gradient_accumulation}")
431
+ print(f" Effective batch size: {config.training.batch_size * config.training.gradient_accumulation}")
432
+ print(f" Epochs: {config.training.epochs}")
433
+ print(f" Learning rate: {config.training.learning_rate}")
434
+
435
+ # Create the trainer
436
+ trainer = Trainer(config)
437
+
438
+ # Resume training
439
+ if config.training.resume:
440
+ trainer._load_checkpoint(config.training.resume)
441
+
442
+ # Start training
443
+ trainer.train()
444
+
445
+
446
+ if __name__ == "__main__":
447
+ main()
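
The validation step above noises the embeddings with `self.diffusion.q_sample` and scores the model with an MSE loss on the predicted noise. The implementation of `q_sample` is not part of this diff, so the following is only a minimal pure-Python sketch that assumes the standard DDPM forward process with a linear beta schedule. The helper names `make_alpha_bar` and `mse`, and the schedule endpoints, are illustrative assumptions, not taken from the repository.

```python
import math
import random


def make_alpha_bar(timesteps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for i in range(timesteps):
        beta = beta_start + (beta_end - beta_start) * i / max(timesteps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def q_sample(x0, t, alpha_bar, rng):
    """Return (x_t, noise) with x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    a = math.sqrt(alpha_bar[t])
    b = math.sqrt(1.0 - alpha_bar[t])
    x_t = [a * x + b * n for x, n in zip(x0, noise)]
    return x_t, noise


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


rng = random.Random(0)
alpha_bar = make_alpha_bar(1000)
x0 = [0.5, -1.0, 2.0]          # stand-in for one embedding vector
t = 10                         # stand-in for one sampled timestep
x_t, noise = q_sample(x0, t, alpha_bar, rng)

# A predictor that returned the true noise would reach zero loss.
perfect_loss = mse(noise, noise)
```

In the trainer the same loss is computed once per language and the two terms are summed, so the reported validation loss is the sum of the Chinese and English noise-prediction errors.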
utils.py ADDED
@@ -0,0 +1,176 @@
1
+ """
2
+ Utility functions
3
+ """
4
+
5
+ import time
6
+ import math
7
+ from typing import Optional
8
+ from datetime import datetime
9
+
10
+
11
+ class Timer:
12
+ """Timer for measuring training speed."""
13
+
14
+ def __init__(self):
15
+ self.start_time = None
16
+ self.elapsed = 0
17
+ self.count = 0
18
+
19
+ def start(self):
20
+ self.start_time = time.time()
21
+
22
+ def stop(self):
23
+ if self.start_time is not None:
24
+ self.elapsed += time.time() - self.start_time
25
+ self.count += 1
26
+ self.start_time = None
27
+
28
+ def reset(self):
29
+ self.elapsed = 0
30
+ self.count = 0
31
+ self.start_time = None
32
+
33
+ @property
34
+ def avg_time(self) -> float:
35
+ if self.count == 0:
36
+ return 0
37
+ return self.elapsed / self.count
38
+
39
+ @property
40
+ def speed(self) -> float:
41
+ if self.elapsed == 0:
42
+ return 0
43
+ return self.count / self.elapsed
44
+
45
+
46
+ class ProgressTracker:
47
+ """Training progress tracker."""
48
+
49
+ def __init__(self, total_steps: int, desc: str = "Training"):
50
+ self.total_steps = total_steps
51
+ self.desc = desc
52
+ self.current_step = 0
53
+ self.start_time = time.time()
54
+ self.loss_history = []
55
+
56
+ @property
57
+ def elapsed(self) -> float:
58
+ """Elapsed time in seconds."""
59
+ return time.time() - self.start_time
60
+
61
+ @property
62
+ def count(self) -> int:
63
+ """Number of steps processed so far."""
64
+ return self.current_step
65
+
66
+ def update(self, step: int, loss: Optional[float] = None):
67
+ self.current_step = step
68
+ if loss is not None:
69
+ self.loss_history.append(loss)
70
+
71
+ def format_progress(self, current_loss: Optional[float] = None) -> str:
72
+ """Format the progress display."""
73
+ elapsed = time.time() - self.start_time
74
+ progress = self.current_step / self.total_steps if self.total_steps > 0 else 0.0
75
+
76
+ # Estimated time remaining
77
+ if progress > 0:
78
+ eta = elapsed / progress - elapsed
79
+ eta_str = self._format_time(eta)
80
+ else:
81
+ eta_str = "--:--:--"
82
+
83
+ # Speed
84
+ speed = self.current_step / elapsed if elapsed > 0 else 0
85
+
86
+ # Progress bar
87
+ bar_len = 30
88
+ filled = int(bar_len * progress)
89
+ bar = "█" * filled + "░" * (bar_len - filled)
90
+
91
+ # Loss
92
+ loss_str = f"loss={current_loss:.4f}" if current_loss is not None else ""
93
+
94
+ return f"{self.desc}: |{bar}| {self.current_step}/{self.total_steps} [{self._format_time(elapsed)}<{eta_str}, {speed:.2f}it/s] {loss_str}"
95
+
96
+ @staticmethod
97
+ def _format_time(seconds: float) -> str:
98
+ if seconds < 0:
99
+ return "--:--:--"
100
+ hours = int(seconds // 3600)
101
+ minutes = int((seconds % 3600) // 60)
102
+ secs = int(seconds % 60)
103
+ return f"{hours:02d}:{minutes:02d}:{secs:02d}"
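
The remaining-time estimate in `format_progress` extrapolates linearly: if a fraction `progress` of the steps took `elapsed` seconds, the whole run is expected to take `elapsed / progress`, and the ETA is that total minus the time already spent. A small worked sketch of the arithmetic follows; `estimate_eta` is a hypothetical standalone helper, and `format_time` repeats the logic of `_format_time` so the example runs on its own.

```python
def format_time(seconds):
    """Format seconds as HH:MM:SS (same logic as ProgressTracker._format_time)."""
    if seconds < 0:
        return "--:--:--"
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    secs = int(seconds % 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def estimate_eta(elapsed, current_step, total_steps):
    """Linear extrapolation of the remaining time; None if no progress yet."""
    if current_step <= 0 or total_steps <= 0:
        return None
    progress = current_step / total_steps
    return elapsed / progress - elapsed


# 25 of 100 steps done after 120 s: projected total is 480 s, so 360 s remain.
eta = estimate_eta(elapsed=120.0, current_step=25, total_steps=100)
eta_text = format_time(eta)
```

Because the estimate assumes a constant step rate, it tends to be unreliable during the first few steps, when warm-up costs dominate the elapsed time.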
104
+
105
+
106
+ def cosine_similarity(a, b):
107
+ """Compute cosine similarity along the last dimension."""
108
+ import torch
109
+ return torch.nn.functional.cosine_similarity(a, b, dim=-1)
110
+
111
+
112
+ def count_parameters(model) -> int:
113
+ """Count the model's trainable parameters."""
114
+ return sum(p.numel() for p in model.parameters() if p.requires_grad)
115
+
116
+
117
+ def format_number(n: int) -> str:
118
+ """Format a number with a K or M suffix (e.g. 1.5K, 2.3M)."""
119
+ if n >= 1_000_000:
120
+ return f"{n/1_000_000:.1f}M"
121
+ elif n >= 1_000:
122
+ return f"{n/1_000:.1f}K"
123
+ return str(n)
124
+
125
+
126
+ def get_timestamp() -> str:
127
+ """Return a timestamp string (YYYYMMDD_HHMMSS)."""
128
+ return datetime.now().strftime("%Y%m%d_%H%M%S")
129
+
130
+
131
+ def ensure_dir(path: str):
132
+ """Ensure the directory exists."""
133
+ import os
134
+ os.makedirs(path, exist_ok=True)
135
+
136
+
137
+ def save_checkpoint(model, optimizer, epoch: int, step: int, loss: float, path: str):
138
+ """Save a checkpoint."""
139
+ import torch
140
+ torch.save({
141
+ 'epoch': epoch,
142
+ 'step': step,
143
+ 'loss': loss,
144
+ 'model_state_dict': model.state_dict(),
145
+ 'optimizer_state_dict': optimizer.state_dict(),
146
+ }, path)
147
+
148
+
149
+ def load_checkpoint(model, optimizer, path: str):
150
+ """Load a checkpoint."""
151
+ import torch
152
+ checkpoint = torch.load(path, map_location='cpu', weights_only=False)
153
+ model.load_state_dict(checkpoint['model_state_dict'])
154
+ optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
155
+ return checkpoint['epoch'], checkpoint['step'], checkpoint['loss']
156
+
157
+
158
+ class EarlyStopping:
159
+ """Early stopping."""
160
+
161
+ def __init__(self, patience: int = 5, min_delta: float = 0.001):
162
+ self.patience = patience
163
+ self.min_delta = min_delta
164
+ self.counter = 0
165
+ self.best_loss = float('inf')
166
+ self.should_stop = False
167
+
168
+ def __call__(self, loss: float) -> bool:
169
+ if loss < self.best_loss - self.min_delta:
170
+ self.best_loss = loss
171
+ self.counter = 0
172
+ else:
173
+ self.counter += 1
174
+ if self.counter >= self.patience:
175
+ self.should_stop = True
176
+ return self.should_stop
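
`EarlyStopping` counts consecutive validation results that fail to beat the best loss by at least `min_delta`, and signals a stop once that count reaches `patience`. The diff does not show where (or whether) the trainer calls it, so the loop below is only a sketch of one plausible way to use it. `run_with_early_stopping` and the loss values are illustrative assumptions, and the class is repeated here so the example is self-contained.

```python
class EarlyStopping:
    """Copy of the class from utils.py, repeated so the sketch runs on its own."""

    def __init__(self, patience=5, min_delta=0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.counter = 0
        self.best_loss = float("inf")
        self.should_stop = False

    def __call__(self, loss):
        if loss < self.best_loss - self.min_delta:
            # Meaningful improvement: remember it and reset the counter.
            self.best_loss = loss
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.should_stop = True
        return self.should_stop


def run_with_early_stopping(val_losses, patience=2, min_delta=0.01):
    """Return the 1-based epoch at which training would stop, or None."""
    stopper = EarlyStopping(patience=patience, min_delta=min_delta)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper(loss):
            return epoch
    return None


# Epochs 3 and 4 improve on 0.80 by less than min_delta, so with patience=2
# the stop signal fires at epoch 4 and the 0.60 at epoch 5 is never reached.
stop_epoch = run_with_early_stopping([1.00, 0.80, 0.798, 0.796, 0.60])
```

Note that `should_stop` is never reset once set, so a fresh `EarlyStopping` instance is needed for each training run.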