Pre-v5 update for the tokeniser (training date pushed to the 25th) 794cf97 crossroderick committed on Apr 24, 2025
Including changes for the upcoming inclusion of validation metrics 18cf0a2 crossroderick committed on Apr 24, 2025
Removed NFD and StripAccents from the tokeniser training process f93a822 crossroderick committed on Apr 23, 2025
Updated the readme and get_data.sh, and added a requirements file 6cbc4c0 crossroderick committed on Apr 19, 2025
Delete src/data/kkwiki-latest-pages-articles.xml.bz2 bc6b470 verified crossroderick committed on Apr 18, 2025