This isn’t actually self-supervised curriculum learning.
What the model is doing is estimating how difficult a sequence is from its own perplexity, then using that signal to decide how many recursion steps to run.
So it’s basically adjusting the amount of compute based on difficulty. I’d call that adaptive compute, not self-supervised curriculum learning. In a true self-supervised curriculum, the training progression itself changes: the model gradually moves from easier samples to harder ones over time. That isn’t happening here. 😉
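To make the distinction concrete, here’s a minimal sketch of what perplexity-gated adaptive compute looks like. All names and thresholds here are hypothetical, just for illustration — this isn’t the actual model’s code:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood over tokens."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

def recursion_steps(token_logprobs, min_steps=1, max_steps=8, scale=2.0):
    """Map perplexity to a recursion-step budget: harder inputs get more compute.

    min_steps, max_steps, and scale are made-up knobs for this sketch.
    """
    ppl = perplexity(token_logprobs)
    steps = min_steps + int(math.log(ppl) * scale)
    return max(min_steps, min(max_steps, steps))

# Easy sequence: high-probability tokens -> low perplexity -> few steps.
easy = [math.log(0.9)] * 10
# Hard sequence: low-probability tokens -> high perplexity -> more steps.
hard = [math.log(0.05)] * 10
print(recursion_steps(easy), recursion_steps(hard))  # hard gets a larger budget
```

Note that nothing about the *order* of training examples changes here — the model just spends more forward passes on inputs it finds hard, which is exactly why it’s adaptive compute rather than a curriculum.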
you're completely right, my bad on the terminology.
it's adaptive compute — using the model's own perplexity to allocate recursion depth per input, not curriculum learning in the training-progression sense. Thanks for catching that 🙏