Scale up?
#1
by Datdanboi25 - opened
Very cool idea, would love to see what a 100M-or-so parameter version could come up with.
I think it would memorize entire words instead of making up new ones.
The only reason I haven't scaled up further is that I don't have enough unique English and Spanish words to train a model of that size without overfitting.
Also, if I'm not mistaken, you're the one who created GPT-X and GPT-X2, right? Well, they're really cool! Keep up the good work!
Haha true, could mess around with adding an auxiliary loss for when it ends up predicting a real word.
Also appreciate it, same to you, looking forward to what's next!
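For anyone curious what the auxiliary-loss idea above might look like, here is a rough sketch, not the model author's method. It assumes you sample a batch of words from the model, keep each sample's log-probability, and have a set of known real words to check against. The function name `real_word_penalty` and all of its parameters are made up for illustration. Since sampling isn't differentiable, this uses a score-function (REINFORCE-style) surrogate: samples that turn out to be real words contribute their log-probability, so minimizing the term pushes that probability down.

```python
from typing import Sequence, Set


def real_word_penalty(
    samples: Sequence[str],
    sample_log_probs: Sequence[float],
    real_words: Set[str],
    weight: float = 0.1,
) -> float:
    """Hypothetical surrogate term: weight * mean(is_real * log_prob).

    In an autograd framework the log-probs would be tensors, so minimizing
    this value lowers the probability of samples that are real words.
    """
    if len(samples) != len(sample_log_probs):
        raise ValueError("samples and sample_log_probs must have the same length")
    if not samples:
        return 0.0

    total = 0.0
    for word, log_prob in zip(samples, sample_log_probs):
        # Indicator: 1.0 if the sampled string is a known real word, else 0.0.
        is_real = 1.0 if word.lower() in real_words else 0.0
        total += is_real * log_prob

    # Average over the batch and scale by the (assumed) penalty weight.
    return weight * total / len(samples)


# Toy usage: "casa" is a real word, "blorp" is not.
penalty = real_word_penalty(
    samples=["casa", "blorp"],
    sample_log_probs=[-1.0, -2.0],
    real_words={"casa", "house"},
    weight=0.5,
)
```

The term would be added to the usual training loss. The weight probably needs tuning, since the main objective is still trained on real words and the two pull in opposite directions.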