9B CPT with Llama-Factory - Poor Results.
Hey team,
I've experimented with this model and found it to be quite flexible for performing SFT.
What I'd like to do is perform CPT using some of my own data and then set up SFT instruction training afterwards.
However, I'm currently facing an issue where any form of CPT on this model results in the model becoming significantly worse than the base version. This doesn't happen with the 2B model.
I used Llama-Factory for CPT with a dataset of around 4.5 million examples covering a single language pair, in the format shown in the GitHub repository: https://github.com/xiaomi-research/gemmax/blob/main/examples/cpt.json
Are there any known issues or quirks with CPT on this model that I might be missing?
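One thing that can be ruled out early is a malformed CPT file. The following is a minimal sketch of such a sanity check, not the GemmaX or Llama-Factory tooling itself. It assumes the dataset is a JSON array of objects whose training text lives under a `text` key; that key name, and the helper names `check_cpt_records` and `check_cpt_file`, are hypothetical and should be adjusted to match the actual layout of `cpt.json`.

```python
import hashlib
import json
from collections import Counter


def check_cpt_records(records, text_key="text"):
    """Count empty/missing and duplicated entries in a list of CPT records.

    `text_key` is an assumption about the dataset layout, not a confirmed
    field name from the GemmaX examples.
    """
    stats = Counter()
    seen_hashes = set()
    for rec in records:
        stats["total"] += 1
        text = rec.get(text_key) if isinstance(rec, dict) else None
        if not isinstance(text, str) or not text.strip():
            stats["missing_or_empty"] += 1
            continue
        # Store a digest rather than the full string to keep memory bounded
        # on multi-million-example datasets.
        digest = hashlib.md5(text.encode("utf-8")).digest()
        if digest in seen_hashes:
            stats["duplicates"] += 1
        seen_hashes.add(digest)
    return dict(stats)


def check_cpt_file(path, text_key="text"):
    """Load a JSON array from `path` and run the record check on it."""
    with open(path, encoding="utf-8") as f:
        return check_cpt_records(json.load(f), text_key)
```

Calling `check_cpt_file("cpt.json")` on the training file returns a small dictionary of counts. A high share of empty or duplicated entries would point at the data rather than at the 9B model.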