tokenizer-tr / tokenizer_config.json

Commit History

Final Fix: Correct Metaspace mapping and Unigram scores
fbc7e19
verified

suchirsalhan commited on

Fix: Final Metaspace decoding using exact vocab marker
bdeefea
verified

suchirsalhan commited on

Fix: Robust SPM-to-HF conversion with merges and byte-fallback
e262e6c
verified

suchirsalhan commited on

Fix: Final Metaspace decoding using exact vocab marker
7db73b0
verified

suchirsalhan commited on

Fix: Used Llama-style identity wrapping to preserve SPM IDs and fix spacing
e2ceb06
verified

suchirsalhan commited on

Fix: Final Metaspace decoding using exact vocab marker
dc7648a
verified

suchirsalhan commited on

Fix: Reverted to native SentencePiece handling (removed ByteLevel mismatch)
09ce278
verified

suchirsalhan commited on

Fix: Applied ByteLevel pre-tokenization and decoding for proper spacing/hex handling
9b3d785
verified

suchirsalhan commited on

Fix: Clean base tokenizer using universal metaspace decoding
ab1e08c
verified

suchirsalhan commited on

Fix: Universal Metaspace decoding without keyword conflicts
4a81f0a
verified

suchirsalhan commited on

Fix: Manual vocab extraction from spm.model pieces
1a420bc
verified

suchirsalhan commited on

Fix: Environment issues resolved, full Fast Tokenizer uploaded.
a59fbbf
verified

suchirsalhan commited on

Upload folder using huggingface_hub
3a62a5c
verified

suchirsalhan commited on

Upload folder using huggingface_hub
3127597
verified

suchirsalhan commited on

Upload folder using huggingface_hub
34e2259
verified

suchirsalhan commited on

Upload folder using huggingface_hub
7190372
verified

suchirsalhan commited on

Upload folder using huggingface_hub
1315cb9
verified

suchirsalhan commited on

Upload folder using huggingface_hub
94cdd10
verified

suchirsalhan commited on