Improvement (I hope) is on its way!

#1
by coolpoodle - opened

Retraining with 1024 context and unfreezing layer norms (in hope that features are more stable).
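For anyone curious what "unfreezing layer norms" can look like in practice, here is a minimal sketch. It is not the actual training script: it assumes a PyTorch model whose norms are plain `nn.LayerNorm` modules (models using a custom RMSNorm class would need a different check), and `unfreeze_layer_norms` and the toy model are hypothetical names made up for illustration.

```python
import torch.nn as nn


def unfreeze_layer_norms(model: nn.Module) -> int:
    """Mark every nn.LayerNorm parameter as trainable.

    Returns the number of scalar parameters that were unfrozen.
    Hypothetical helper, not taken from the posted training script.
    """
    unfrozen = 0
    for module in model.modules():
        if isinstance(module, nn.LayerNorm):
            for param in module.parameters():
                param.requires_grad = True
                unfrozen += param.numel()
    return unfrozen


# Toy stand-in for a real network: freeze everything, then unfreeze the norms.
model = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8), nn.Linear(8, 4))
for param in model.parameters():
    param.requires_grad = False

unfrozen = unfreeze_layer_norms(model)
print(f"unfroze {unfrozen} layer norm parameters")
```

Only the parameters with `requires_grad=True` should then be handed to the optimizer, so the rest of the base model stays untouched.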

Side note, training script is on its way after this run finishes. <3

Training script posted!

The second training run didn't go as well as I hoped, but I know I could tune the learning parameters to do better. I mainly wanted a quick idea of how much context length and layer norms contribute to learning.

More or less the same improvements as the initial .pt upload, but with 10x the gate projection size...

Went from 243 KB to 343 MB...
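A quick back-of-the-envelope check on those two file sizes, as a rough sketch. It assumes "Kb" means kilobytes and that both figures use decimal units (1 KB = 1000 bytes, 1 MB = 1000 KB); neither assumption is confirmed above.

```python
# Assumed decimal units; the post does not say which convention was used.
KB = 1000
MB = 1000 * KB

before_bytes = 243 * KB  # initial .pt upload
after_bytes = 343 * MB   # second run

ratio = after_bytes / before_bytes
print(f"checkpoint grew by roughly {ratio:.0f}x")
```

Under those assumptions the file grew by well over a thousand times, so a 10x larger gate projection on its own would not obviously account for the jump.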

There's gotta be something deeper there.

Anyway, I'm exhausted and I need to go to bed.

There is so much to do with this and I'm super excited. I just hope I can find someone who wants to experiment with me, so if you are out there, leave a comment!
