Sunny111
posted an update 12 days ago
Are you familiar with reverse residual connections or looping in language models?

Excited to share my Looped-GPT blog post and codebase 🚀
https://github.com/sanyalsunny111/Looped-GPT

TL;DR: looping during pre-training improves generalization.

The plot shows GPT-2 LMs pre-trained on 15.73B OpenWebText (OWT) tokens.
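For readers unfamiliar with the idea: "looping" typically means reapplying the same transformer block several times per forward pass, adding effective depth without adding parameters. Below is a minimal, hypothetical NumPy sketch of that pattern, not the repo's actual implementation; `layer`, `W`, and `n_loops` are illustrative names, and the real Looped-GPT code should be consulted for details.

```python
import numpy as np

def layer(x, W):
    # One shared "block": a simple nonlinear map with a residual connection.
    return x + np.tanh(x @ W)

def looped_forward(x, W, n_loops=4):
    # Looping: reapply the SAME weights n_loops times, so depth grows
    # while the parameter count stays fixed (illustrative sketch only).
    for _ in range(n_loops):
        x = layer(x, W)
    return x

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 8))   # shared block weights
x = rng.normal(size=(2, 8))              # a tiny batch of hidden states
y = looped_forward(x, W, n_loops=4)
print(y.shape)  # (2, 8)
```

In a real looped LM the shared block would be a full attention + MLP layer, but the control flow is the same: one set of weights, applied repeatedly.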

P.S. This is my first post here, I have ~4 followers and zero expectations for reach 😄

What am I looking at here?

Hi, nice work and interesting result :).
Did you compare these against a baseline model trained for 2x and 4x the epochs, to benchmark the gap against a "standard" approach?


Not yet, but I will.