754 MB

8 contributors

History: 38 commits

versae

Changed and added vocab and tokenizer

29e26bb almost 5 years ago

configs
Changed and added vocab and tokenizer almost 5 years ago
mc4
Fixes to mc4 fork almost 5 years ago
.gitattributes

736 Bytes
Update .gitattributes almost 5 years ago
.gitignore

1.84 kB
Initial test with BETO's corpus almost 5 years ago
README.md

1.84 kB
Adding checkpointing, wandb, and new mlm script almost 5 years ago
config.json

619 Bytes
Changed and added vocab and tokenizer almost 5 years ago
config.py

247 Bytes
Scripts for perplexity sampling and fixes almost 5 years ago
convert.py

469 Bytes
Adding missing import almost 5 years ago
flax_model.msgpack

250 MB
New Flax model almost 5 years ago
get_embeddings_and_perplexity.py

1.53 kB
Add script to generate dataset of embeddings and perplexities. Add script to generate t-SNE plot for embedding and perplexity visualization. almost 5 years ago
merges.txt

505 kB
Changed and added vocab and tokenizer almost 5 years ago
perplexity.py

751 Bytes
Adding checkpointing, wandb, and new mlm script almost 5 years ago
pytorch_model.bin

499 MB
Base model at 105k steps almost 5 years ago
run.sh

883 Bytes
Adding base config and organizing configs almost 5 years ago
run_mlm_flax.py

30 kB
Adding sampling to mc4 almost 5 years ago
run_mlm_flax_stream.py

28.4 kB
Fixes to mc4 fork almost 5 years ago
run_stream.sh

924 Bytes
Adding base config and organizing configs almost 5 years ago
special_tokens_map.json

239 Bytes
Changed and added vocab and tokenizer almost 5 years ago
tokenizer.json

1.45 MB
Changed and added vocab and tokenizer almost 5 years ago
tokenizer_config.json

292 Bytes
Changed and added vocab and tokenizer almost 5 years ago
tokens.py

649 Bytes
Scripts for perplexity sampling and fixes almost 5 years ago
tokens.py.orig

899 Bytes
Adjust batch size for extracting tokens almost 5 years ago
tsne_plot.py

3.02 kB
Remove unused imports almost 5 years ago
vocab.json

846 kB
Changed and added vocab and tokenizer almost 5 years ago