It might be better to try knowledge distillation
#1 opened by Alignment-Lab-AI
Specifically: distill from the MPT model into Pythia over a corpus of very long text, then initialize the storywriter architecture with the Pythia weights.
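For reference, here is a minimal sketch of the soft-target loss a distillation step like this would typically minimize, written in plain Python for one sequence. It assumes the teacher (the larger model being distilled) and the student (Pythia) share a tokenizer, so their logits line up position by position. The function names and the default temperature of 2.0 are illustrative assumptions, not something specified in this thread. In a real run, the logits would come from the two models' forward passes over the long-text corpus.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on temperature-softened distributions for one token,
    scaled by T^2 as in the Hinton et al. soft-target loss.
    Assumes both logit vectors index the same (shared) vocabulary."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl


def sequence_distillation_loss(
    teacher_seq: Sequence[Sequence[float]],
    student_seq: Sequence[Sequence[float]],
    temperature: float = 2.0,
) -> float:
    """Average the per-token distillation loss over every position in a sequence."""
    if len(teacher_seq) != len(student_seq) or not teacher_seq:
        raise ValueError("teacher and student must cover the same non-empty sequence")
    losses = [
        distillation_loss(t, s, temperature) for t, s in zip(teacher_seq, student_seq)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy logits over a 3-token vocabulary for a 2-token sequence (made-up numbers).
    teacher = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.2]]
    student = [[1.5, 1.2, 0.3], [0.4, 2.0, 0.6]]
    print(sequence_distillation_loss(teacher, student))
```

The T^2 factor keeps gradient magnitudes roughly comparable across temperatures; in practice this term is often mixed with the ordinary next-token cross-entropy on the same text.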
Alignment-Lab-AI changed discussion status to closed