Fix tokenization: add simple_tok parameter (default=True) to match original errant script
a2eb45d, marksverdhei committed on Dec 4, 2025