Curriculum learning by length
#38, opened by jbakerx
Start training with shorter sequences (512 or 1024), then move up to 2048.
This often stabilizes training and improves coherence, especially on CPU-only setups.
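A rough sketch of what I mean, in case it helps. Only the lengths 512/1024/2048 come from the suggestion above; the step boundaries and the names `STAGES`, `max_len_at`, and `truncate_batch` are made up for illustration and are not part of this project's training code.

```python
from bisect import bisect_right

# Hypothetical schedule: (first training step of the stage, max sequence length).
# The step boundaries are illustrative assumptions, not tuned values.
STAGES = [(0, 512), (2_000, 1024), (6_000, 2048)]


def max_len_at(step, stages=STAGES):
    """Return the sequence-length cap that applies at a given training step."""
    starts = [start for start, _ in stages]
    # Index of the last stage whose start step is <= the current step.
    idx = bisect_right(starts, step) - 1
    return stages[max(idx, 0)][1]


def truncate_batch(batch, step, stages=STAGES):
    """Clip every token sequence in the batch to the current length cap."""
    cap = max_len_at(step, stages)
    return [seq[:cap] for seq in batch]
```

Truncation is just the simplest way to show the idea. A real pipeline would more likely re-chunk the corpus at each stage so that no text is thrown away.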
jbakerx changed discussion title from "Better segmentation: train on scenes, not entire books" to "Curriculum learning by length"
We will consider this enhancement for inclusion in version 2.0.0.