nanoT5-base-65kBPE-v2
This is a "raw" pretrained model intended to be fine-tuned on downstream tasks
- SiLU/gated-SiLU activation
- 25% mask rate during pretrain
- 65k vocab size, adapted claude3 tokenizer
training code: https://github.com/pszemraj/nanoT5/tree/any-tokenizer
plots
more details are under checkpoints/
loss
gradients
weights
- Downloads last month
- 7
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support


