Let's scale this further

#3
by youngbrett48 - opened

I want to scale this concept to larger contexts with more tasks. I have started with a basic training run using 160 latent tokens on 5k research papers for the task of summarization. I believe it needs more data and more compute (or possibly just bigger models and more latent tokens), as it hallucinates a lot. Training takes around 4 hours for 1 epoch on a single H100. Here are the code and the model:

https://github.com/bdytx5/context_cascade
https://huggingface.co/youngbrett48/C3-Context-Cascade-Summarization

Let me know if you would like to collaborate further to push this concept to its potential.

Brett Young
