# Increasing Speed 

* Integrate Flash Attention 2 CUDA kernels for a significant speed-up
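The core trick behind Flash Attention (tiled attention with an online softmax, so the full n×n score matrix is never materialized) can be sketched in NumPy. This is only an illustration of the algorithm, not the actual CUDA integration; all names here are illustrative:

```python
import numpy as np

def naive_attention(q, k, v):
    """Standard attention: materializes the full (n x n) score matrix."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def blockwise_attention(q, k, v, block=4):
    """Flash Attention-style tiling: process K/V in blocks with a running
    (online) softmax, so only O(block) scores exist at any one time."""
    n, d = q.shape
    out = np.zeros_like(v, dtype=np.float64)
    m = np.full(n, -np.inf)   # running row-wise max of the scores
    l = np.zeros(n)           # running softmax denominator
    for j in range(0, k.shape[0], block):
        kj, vj = k[j:j + block], v[j:j + block]
        s = q @ kj.T / np.sqrt(d)              # (n, block) partial scores
        m_new = np.maximum(m, s.max(axis=-1))
        p = np.exp(s - m_new[:, None])
        scale = np.exp(m - m_new)              # rescale earlier partial sums
        l = l * scale + p.sum(axis=-1)
        out = out * scale[:, None] + p @ vj
        m = m_new
    return out / l[:, None]
```

Both functions return the same result; the blockwise version is what makes the memory footprint independent of sequence length, which the CUDA kernels then exploit for speed.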

* Utilize the 8-bit optimizer from bitsandbytes (bnb) for a big speed-up; weakness: bnb isn't compatible with all GPUs
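The memory-saving idea behind bnb's 8-bit optimizers (keep optimizer state in int8, dequantize only to apply the update) can be sketched with a toy SGD-with-momentum step in NumPy. This is a simplified illustration, not bnb's actual implementation, which additionally uses block-wise quantization and a dynamic exponent format:

```python
import numpy as np

def quantize_int8(x):
    """Store a float tensor as int8 plus a per-tensor absmax scale."""
    absmax = np.abs(x).max()
    scale = absmax / 127.0 if absmax > 0 else 1.0
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

def sgd_momentum_8bit_step(param, grad, state_q, state_scale, lr=0.01, beta=0.9):
    """One SGD-with-momentum step where the momentum buffer lives in int8,
    quartering its memory footprint versus fp32 state."""
    m = dequantize_int8(state_q, state_scale)  # restore momentum to fp32
    m = beta * m + grad                        # momentum update in fp32
    param = param - lr * m                     # parameter update
    return (param, *quantize_int8(m))          # re-quantize state to int8
```

In practice the repo would just swap `torch.optim.AdamW` for `bnb.optim.AdamW8bit`; the sketch above only shows why that swap saves memory.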

* Use a better tokenizer, e.g. TokenMonster?

* Parallelize the transformer blocks, similar to [PaLM](https://github.com/conceptofmind/PaLM)

* Look into MPT's config for the LION optimizer for pretraining; did they use a high batch size?