Awesome

#1
by Datdanboi25 - opened

Cool implementation, you could probably get away with a higher learning rate and more depth over width (would also help with tokenizer size).
What's your activation?

SupraLabs org

Hey thanks!
I use standard HF Transformers with Llama architecture.
See the full code in this repo's files list as train.py 😃

Ahh sweet, it would probably be worth dropping your intermediate dim multiplier to 2.67-3x and investing further into depth; SwiGLU at 4x is pretty overweighted.
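
To put rough numbers on that, here is a minimal sketch of the weight count per layer. It assumes bias-free projections, plain multi-head attention (no GQA), and ignores norms and embeddings. `hidden = 96` is a made-up size for illustration, not the value in train.py. A SwiGLU MLP has three matrices (gate, up, down), so at 4x it costs 12·h² weights, while 8/3x brings it back to the 8·h² of a classic non-gated 4x MLP.

```python
def swiglu_mlp_params(hidden: int, multiplier: float) -> int:
    """Weights in a bias-free SwiGLU MLP: gate, up, and down projections."""
    intermediate = round(hidden * multiplier)
    return 3 * hidden * intermediate


def attention_params(hidden: int) -> int:
    """Weights in bias-free multi-head attention: q, k, v, and o projections."""
    return 4 * hidden * hidden


def layer_params(hidden: int, multiplier: float) -> int:
    """Weights in one decoder layer, ignoring norms and embeddings."""
    return attention_params(hidden) + swiglu_mlp_params(hidden, multiplier)


hidden = 96  # hypothetical width, chosen only so the arithmetic is exact

wide = layer_params(hidden, 4.0)      # 4x SwiGLU multiplier
narrow = layer_params(hidden, 8 / 3)  # ~2.67x SwiGLU multiplier

print(f"per layer at 4x:    {wide:,}")
print(f"per layer at 2.67x: {narrow:,}")
print(f"layers at 2.67x for the cost of 3 layers at 4x: {3 * wide / narrow:.2f}")
```

Under these assumptions, three layers at 4x cost the same as four layers at 2.67x, so the smaller multiplier buys roughly one extra layer in three at a fixed budget.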

Also big fan of the little competition going on, cool to see some action in the small model community, almost tempted to train a competitor haha. What parameter limit are you guys using?

SupraLabs org

Thanks! Good tips! :-)
Of course you can train a competitor. I think ~3M should be the top limit. But 0.5M - 2M is more likely :-)
Have fun :D

SupraLabs org

or join us, lol...if you want. wanna?

AxionLab-official changed discussion status to closed

I appreciate the invite but currently got my hands full with axiomic labs and getting the next gen GPT-X model out.

Keep up the great work tho!

Competitor in the works!

lol this is amazing

Oh Harley you joined, Sweet!

SupraLabs org

Do you also want to join? Come on, PLEASE! ❤️

LH-Tech-AI changed discussion status to open

yo he already said no.

Harley-ml changed discussion status to closed
Harley-ml changed discussion status to open

yo they already said no. they are also a new competitor, if im not mistaken

what?! nah hf broken or some shi

LH-Tech-AI changed discussion status to closed

Haha nah, I'll think about it. I'd like to get the next few versions of gpt-x out, so I can't exactly promise I'll contribute very much in the short term.

ok!

Oh also, have you guys got Discord or something? I feel like there should be a better way to DM than an HF discussion haha lol

Yes. My discord is "harleyml."

SupraLabs org

yeah, my discord name is "lh_tech_ai".

LH-Tech-AI changed discussion status to open
AxionLab-official changed discussion status to closed
