Pre-v5 update for the tokeniser (training date pushed to the 25th) 794cf97 crossroderick committed on Apr 24, 2025
Including changes for the upcoming inclusion of validation metrics 18cf0a2 crossroderick committed on Apr 24, 2025
Removed NFD and StripAccents from the tokeniser training process f93a822 crossroderick committed on Apr 23, 2025
Added more info to the "fine-tuning instructions" section d727d23 crossroderick committed on Apr 23, 2025
Updated the readme and get_data.sh, and added a requirements file 6cbc4c0 crossroderick committed on Apr 19, 2025