dalat5 / src

Commit History

Fix with the correct model files
96f0b49

crossroderick committed on

Major (v5) training update
bdd4daa

crossroderick committed on

Pre-v5 update for the tokeniser (training date pushed to the 25th)
794cf97

crossroderick committed on

Including changes for the upcoming inclusion of validation metrics
18cf0a2

crossroderick committed on

Readme and tokeniser update
8a2143a

crossroderick committed on

Removed unnecessary imports
8dc2b55

crossroderick committed on

Removed NFD and StripAccents from the tokeniser training process
f93a822

crossroderick committed on

Addition of a new tokeniser (pre-v5)
178501c

crossroderick committed on

Fourth iteration with 1.9 million training records
9fea118

crossroderick committed on

Pre-v4 readme and support files update
252a85f

crossroderick committed on

Major update with 1.6 million training records
a48965a

crossroderick committed on

Updated the readme and get_data.sh, and added a requirements file
6cbc4c0

crossroderick committed on

Delete src/data/clean_corpus.jsonl
c37d421
verified

crossroderick committed on

Delete src/data/kkwiki-latest-pages-articles.xml.bz2
bc6b470
verified

crossroderick committed on

Delete src/data/kazakh_latin_corpus.jsonl
d145d71
verified

crossroderick committed on

Minor update to the "get_data.sh" file
e1a03df

crossroderick committed on

Minor update to the get_data.sh file
42fea0f

crossroderick committed on

Training update with more data and 2 epochs
03c9e83

crossroderick committed on

Fixed character mapping, training with 8 epochs
508f442

crossroderick committed on

Model training update with 13 epochs
70fdfe0

crossroderick committed on

Model training update with 10 epochs
d17d151

crossroderick committed on

Model training update with 5 epochs
5c07823

crossroderick committed on

Upload folder using huggingface_hub
cb301d1
verified

crossroderick committed on