Mistral 12B — CPT (Continual Pretraining with LoRA)

Model type: Causal Language Model
Base model: mistralai/Mistral-Nemo-Instruct-2407
License: Apache 2.0
Framework: Axolotl


Overview

ubitech-edg/mistral-12b-cpt is a continually pretrained (CPT) version of the Mistral-Nemo 12B Instruct model.
This CPT phase extends the model's factual knowledge and energy-domain understanding using scientific, governmental, news, and encyclopedic text.

Training was executed on the Leonardo EuroHPC system using Axolotl with DeepSpeed ZeRO-1 for efficient large-scale distributed fine-tuning.
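DeepSpeed is configured through a small JSON file passed to the launcher. A minimal sketch of what a ZeRO stage-1, bfloat16 config consistent with this card could look like (the batch values mirror the hyperparameters reported below; the gradient-clipping value is an assumption, and this is not the actual file used in training):

```python
import json

# Hypothetical DeepSpeed config mirroring the setup described in this card:
# ZeRO stage 1 (optimizer-state sharding only) with bfloat16 training.
deepspeed_config = {
    "zero_optimization": {"stage": 1},    # shard optimizer states across GPUs
    "bf16": {"enabled": True},            # matches the bfloat16 precision used here
    "train_micro_batch_size_per_gpu": 2,  # per-GPU micro batch (see hyperparameters)
    "gradient_accumulation_steps": 2,
    "gradient_clipping": 1.0,             # assumption; not stated in the card
}

print(json.dumps(deepspeed_config, indent=2))
```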


Training Setup

Objective: Unsupervised continual pretraining (language modeling)
Adapter type: LoRA
Precision: bfloat16
Hardware: 8 nodes × 2 NVIDIA A100 64 GB GPUs (16 GPUs total)
Framework: Axolotl + DeepSpeed + PyTorch 2.5.1 + CUDA 12.1
Runtime: 24 h
Checkpoints: 5 per epoch
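Given the 16-GPU setup, the per-step workload implied by the hyperparameters below can be worked out directly (assuming the micro batch size is counted per GPU, as Axolotl and DeepSpeed do):

```python
# Effective batch size implied by the card's settings (assumes the
# micro batch size is per GPU, which is how Axolotl/DeepSpeed count it).
num_gpus = 8 * 2          # 8 nodes x 2 A100s
micro_batch_size = 2      # sequences per GPU per forward pass
grad_accum_steps = 2
seq_len = 2048

effective_batch = micro_batch_size * grad_accum_steps * num_gpus
tokens_per_step = effective_batch * seq_len

print(f"{effective_batch} sequences, {tokens_per_step:,} tokens per optimizer step")
# 64 sequences, 131,072 tokens per optimizer step
```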


Dataset

| Dataset | Description |
|---|---|
| arxiv.jsonl | Scientific and technical papers |
| gov.jsonl | Government and policy documents |
| news.jsonl | News articles |
| wiki.jsonl | Wikipedia text |

Hyperparameters

| Parameter | Value |
|---|---|
| Sequence length | 2048 |
| Micro batch size | 2 |
| Gradient accumulation steps | 2 |
| Epochs | 10 |
| Max steps | 10000 |
| Learning rate | 0.0002 |
| LR scheduler | cosine |
| Optimizer | AdamW (8-bit) |
| Warmup steps | 10 |
| Weight decay | 0.0 |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj |
| Gradient checkpointing | ✅ |
| Flash attention | ✅ |
| Loss watchdog (threshold / patience) | 5.0 / 3 |
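With rank 16 adapters on the four attention projections, the trainable parameter count can be estimated. The sketch below assumes the published Mistral-Nemo dimensions (hidden size 5120, 40 layers, 32 query heads and 8 KV heads of head dimension 128); verify against the actual model config before relying on these numbers:

```python
# Rough count of trainable LoRA parameters for the settings above.
# Model dimensions are assumed from the published Mistral-Nemo config:
# hidden 5120, 40 layers, 32 query heads / 8 KV heads, head_dim 128.
r = 16
hidden = 5120
n_layers = 40
head_dim = 128
q_out = 32 * head_dim   # 4096
kv_out = 8 * head_dim   # 1024

def lora_params(d_in, d_out, rank):
    # LoRA adds A (d_in x rank) and B (rank x d_out) per target matrix.
    return rank * (d_in + d_out)

per_layer = (
    lora_params(hidden, q_out, r)     # q_proj
    + lora_params(hidden, kv_out, r)  # k_proj
    + lora_params(hidden, kv_out, r)  # v_proj
    + lora_params(q_out, hidden, r)   # o_proj
)
total = per_layer * n_layers
print(f"~{total / 1e6:.1f}M trainable parameters")
# ~19.7M trainable parameters
```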

Tokenizer

Tokenizer type: AutoTokenizer
Pad token: <|end_of_text|>
