tokenizer-fr / tokenizer.json

Commit History

Final Rescue: Full vocab restoration with Unigram scores and Metaspace fix
1076a6f
verified

suchirsalhan commited on

Final Fix: Correct Metaspace mapping and Unigram scores
f0177df
verified

suchirsalhan commited on

Fix: Final Metaspace decoding using exact vocab marker
0deacac
verified

suchirsalhan commited on

Fix: Robust SPM-to-HF conversion with merges and byte-fallback
a592f91
verified

suchirsalhan commited on

Fix: Final Metaspace decoding using exact vocab marker
b625d9b
verified

suchirsalhan commited on

Fix: Used Llama-style identity wrapping to preserve SPM IDs and fix spacing
5071bc9
verified

suchirsalhan commited on

Fix: Final Metaspace decoding using exact vocab marker
b964afc
verified

suchirsalhan commited on

Fix: Reverted to native SentencePiece handling (removed ByteLevel mismatch)
bb1906f
verified

suchirsalhan commited on

Fix: Applied ByteLevel pre-tokenization and decoding for proper spacing/hex handling
d6cebd8
verified

suchirsalhan commited on

Fix: Universal Metaspace decoding without keyword conflicts
fc7fbfd
verified

suchirsalhan commited on

Fix: Manual vocab extraction from spm.model pieces
3b98b43
verified

suchirsalhan commited on

Upload folder using huggingface_hub
56d2add
verified

suchirsalhan commited on

Upload folder using huggingface_hub
4427689
verified

suchirsalhan commited on

Upload folder using huggingface_hub
ada141f
verified

suchirsalhan commited on

Upload folder using huggingface_hub
699205a
verified

suchirsalhan commited on

Upload folder using huggingface_hub
1289fc4
verified

suchirsalhan commited on

Upload folder using huggingface_hub
31579c9
verified

suchirsalhan commited on