ModernBERT-large-tokenizer-fix / tokenizer_config.json

Commit History

fix: also use `add_prefix_space = True` in tokenizer config
d36a72c
verified

stefan-it committed on

feat: use RoBERTa tokenizer to (hopefully) fix some tokenization problems for token classification tasks
8962aeb
verified

stefan-it committed on

Set tokenizer "model_max_length" property to 8192 (#9)
45bb465
verified

bwarner, NohTow committed on

undo last commit
4bbcbf4
verified

bclavie committed on

Add `"add_prefix_space": true,`; this allows for much stronger token-level performance (e.g. NER, ColBERT) (#10)
d1d612e
verified

bclavie, tomaarsen (HF Staff) committed on

Update tokenizer: Set lstrip=True for [MASK]
b1cadbc

Tom Aarsen committed on

Upload ModernBERT-large model
dca61cc

Tom Aarsen committed on