9B CPT with Llama-Factory - Poor Results.
#3 by AwkwardUnicorn - opened
Hey team,
I've experimented with this model and found it to be quite flexible for performing SFT.
What I'd like to do is perform CPT using some of my own data and then set up SFT instruction training afterwards.
However, I'm currently facing an issue where any form of CPT on this model results in the model becoming significantly worse than the base version. This doesn't happen with the 2B model.
I've used Llama-Factory for CPT with a dataset of around 4.5 million examples in a single language pair, following the format shown in the GitHub repository: https://github.com/xiaomi-research/gemmax/blob/main/examples/cpt.json
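For context, a minimal sanity check over the dataset file might help rule out formatting issues before blaming the model. This sketch assumes the standard Llama-Factory pretraining layout, where the file is a JSON array and each record carries the raw text under a single key (the key name `text` here is my assumption; adjust it to match your `dataset_info.json` registration):

```python
import json

def check_cpt_dataset(path: str, text_key: str = "text") -> int:
    """Sanity-check a Llama-Factory-style CPT file.

    Every record must be a dict containing a non-empty string
    under `text_key`. Returns the number of valid records,
    raising AssertionError on the first malformed one.
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for i, rec in enumerate(records):
        assert isinstance(rec, dict), f"record {i} is not an object: {rec!r}"
        value = rec.get(text_key)
        assert isinstance(value, str) and value.strip(), (
            f"record {i} has a missing or empty {text_key!r} field: {rec!r}"
        )
    return len(records)
```

This won't catch tokenizer- or hyperparameter-level problems, but it cheaply confirms that all 4.5 million examples are well-formed before a long CPT run.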
Are there any known issues or quirks with CPT on this model that I might be missing?