9B CPT with Llama-Factory - Poor Results.
#3 by AwkwardUnicorn - opened
Hey team,
I've experimented with this model and found it to be quite flexible for performing SFT.
What I'd like to do is perform CPT using some of my own data and then set up SFT instruction training afterwards.
However, I'm currently facing an issue where any form of CPT on this model results in the model becoming significantly worse than the base version. This doesn't happen with the 2B model.
I've used Llama-Factory for CPT with a dataset of around 4.5 million examples in a single language pair, following the format shown in the GitHub repository: https://github.com/xiaomi-research/gemmax/blob/main/examples/cpt.json
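For context, a minimal sanity check over the dataset file might help rule out formatting issues before blaming the model. This sketch assumes the standard Llama-Factory pretraining layout, where the file is a JSON array and each record carries the raw text under a single key (the key name `text` here is my assumption; adjust it to match your `dataset_info.json` registration):

```python
import json

def check_cpt_dataset(path: str, text_key: str = "text") -> int:
    """Sanity-check a Llama-Factory-style CPT file.

    Every record must be a dict containing a non-empty string
    under `text_key`. Returns the number of valid records,
    raising AssertionError on the first malformed one.
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for i, rec in enumerate(records):
        assert isinstance(rec, dict), f"record {i} is not an object: {rec!r}"
        value = rec.get(text_key)
        assert isinstance(value, str) and value.strip(), (
            f"record {i} has a missing or empty {text_key!r} field: {rec!r}"
        )
    return len(records)
```

This won't catch tokenizer- or hyperparameter-level problems, but it cheaply confirms that all 4.5 million examples are well-formed before a long CPT run.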
Are there any known issues or quirks with CPT on this model that I might be missing?