Let's scale this further
#3 opened by youngbrett48
I want to scale this concept to a larger context with more tasks. I have started with a basic training run at 160 latent tokens on 5k research papers for the task of summarization. I believe it needs more data and more compute (or possibly just bigger models and more latent tokens), as it hallucinates a lot. Training takes around 4 hours for 1 epoch on a single H100. Here's the code and the model:
https://github.com/bdytx5/context_cascade
https://huggingface.co/youngbrett48/C3-Context-Cascade-Summarization
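For rough planning of a scaled-up run, here is a back-of-the-envelope estimate, assuming wall-clock time scales roughly linearly with dataset size and near-ideal data parallelism (the paper counts and GPU counts beyond the 5k-paper / single-H100 baseline are hypothetical):

```python
# Baseline from the run above: 5k papers, ~4 hours per epoch on one H100.
BASE_PAPERS = 5_000
BASE_HOURS = 4.0

def estimated_epoch_hours(num_papers: int, num_gpus: int = 1) -> float:
    """Estimate hours per epoch, assuming linear scaling with dataset
    size and near-ideal data parallelism across GPUs. Real scaling also
    depends on sequence lengths, model size, and batch efficiency."""
    return BASE_HOURS * (num_papers / BASE_PAPERS) / num_gpus

# e.g. a hypothetical 50k-paper run on 8 H100s:
print(round(estimated_epoch_hours(50_000, num_gpus=8), 1))  # 5.0
```

This obviously ignores communication overhead and any change in per-sample cost from longer contexts or more latent tokens, so treat it as a lower bound.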
Let me know if you would like to collaborate further to push this concept to its potential.
Brett Young