tokenizer-de / tokenizer.json

Commit History

Final Rescue: Full vocab restoration with Unigram scores and Metaspace fix
4efabc8
verified

suchirsalhan commited on

Final Fix: Correct Metaspace mapping and Unigram scores
01c823f
verified

suchirsalhan commited on

Fix: Final Metaspace decoding using exact vocab marker
e899c37
verified

suchirsalhan commited on

Fix: Robust SPM-to-HF conversion with merges and byte-fallback
75dbc72
verified

suchirsalhan commited on

Fix: Final Metaspace decoding using exact vocab marker
5665244
verified

suchirsalhan commited on

Fix: Used Llama-style identity wrapping to preserve SPM IDs and fix spacing
ec89e42
verified

suchirsalhan commited on

Fix: Final Metaspace decoding using exact vocab marker
f500518
verified

suchirsalhan commited on

Fix: Reverted to native SentencePiece handling (removed ByteLevel mismatch)
78bf40b
verified

suchirsalhan commited on

Fix: Applied ByteLevel pre-tokenization and decoding for proper spacing/hex handling
ce9873a
verified

suchirsalhan commited on

Fix: Universal Metaspace decoding without keyword conflicts
70dec91
verified

suchirsalhan commited on

Fix: Manual vocab extraction from spm.model pieces
233ea46
verified

suchirsalhan commited on

Upload folder using huggingface_hub
631d145
verified

suchirsalhan commited on

Upload folder using huggingface_hub
457e447
verified

suchirsalhan commited on

Upload folder using huggingface_hub
5e768c6
verified

suchirsalhan commited on

Upload folder using huggingface_hub
e002c4d
verified

suchirsalhan commited on

Upload folder using huggingface_hub
da8ffb7
verified

suchirsalhan commited on

Upload folder using huggingface_hub
bf6fc1d
verified

suchirsalhan commited on