See axolotl config

axolotl version: 0.11.0.dev0

# === Model Configuration ===
base_model: arcee-ai/GLM-4-32B-Base-32K
load_in_8bit: false
load_in_4bit: true

# === HF Configuration === 
hub_model_id: ToastyPigeon/glm-books-qlora-2-2ep
hub_strategy: "checkpoint"

# === Training Setup ===
num_epochs: 2
micro_batch_size: 1
gradient_accumulation_steps: 8
sequence_len: 8192
#sequence_parallel_degree: 2
#heads_k_stride: 1
sample_packing: true
pad_to_sequence_len: true
#max_steps: 10
# === Evaluation ===
val_set_size: 0.01
evals_per_epoch: 10
#eval_steps: 20
#max_steps: 60
#eval_table_size:
eval_max_new_tokens: 128
eval_sample_packing: false
#eval_strategy: "no"

# === LoRA Configuration ===
adapter: qlora
lora_model_dir:
lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
peft_use_rslora: false
lora_modules_to_save:
#  - embed_tokens
#  - lm_head
#fix_untrained_tokens: true
#lora_mlp_kernel: true
#lora_qkv_kernel: true
#lora_o_kernel: true

# === Hyperparameter Configuration ===
#optimizer: apollo_adamw_layerwise
warmup_steps: 0
optimizer: adamw_torch_fused
#optimizer: paged_adamw_8bit
#optim_args:
#  enable_stochastic_rounding: true
#  enable_cautious: true
#  enable_8bit: true
# Apollo-mini configuration:
#optim_args: "proj=random,rank=128,scale=128.0,scale_type=tensor,update_proj_gap=100"
# Regular Apollo configuration:
# optim_args: 
#optim_target_modules: all_linear
learning_rate: 5e-5
lr_scheduler: rex
#cosine_min_lr_ratio: 0.2
#lr_scheduler: cosine_with_min_lr
#lr_scheduler_kwargs:
#  cosine_min_lr: 1e-6
weight_decay: 0.0001
max_grad_norm: 2.0
#warmup_steps: 0
#warmup_ratio: 0.025


# === Data Configuration ===
#chat_template: jinja
#chat_template_jinja: "{%- set default_system_message = \"You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris. You obediently fulfill the user's requests.\" %}\n\n{{- bos_token }}\n\n{%- if messages[0]['role'] == 'system' %}\n    {%- if messages[0]['content'] is string %}\n        {%- set system_message = messages[0]['content'] %}\n    {%- else %}\n        {%- set system_message = messages[0]['content'][0]['text'] %}\n    {%- endif %}\n    {%- set loop_messages = messages[1:] %}\n{%- else %}\n    {%- set system_message = default_system_message %}\n    {%- set loop_messages = messages %}\n{%- endif %}\n{{- '[SYSTEM_PROMPT]' + system_message + '[/SYSTEM_PROMPT]' }}\n\n{%- for message in loop_messages %}\n    {%- if message['role'] == 'user' %}\n        {%- if message['content'] is string %}\n            {{- '[INST]' + message['content'] + '[/INST]' }}\n        {%- else %}\n            {{- '[INST]' }}\n            {%- for bl (line truncated to 1000 characters)
#chat_template: chatml
#special_tokens:
#  pad_token: "<pad>"
#  eos_token: "<|im_end|>"
#tokenizer_use_mistral_common: true
shuffle_merged_datasets: true
datasets:
#  - path: ToastyPigeon/cowriter-instruct
#    type: chat_template
#    chat_template: chatml
#    data_files: cowriter-4k.json
  - path: ToastyPigeon/steve-and-marvin
    type: completion
    data_files: marvin.json
#  - path: allura-org/EU01-S2
#    type: chat_template
#    chat_template: chatml
#    field_messages: conversations
#    message_property_mappings:
#      role: from
#      content: value
#  - path: ToastyPigeon/gutenberg-sft
#    type: chat_template
#    chat_template: chatml
#    field_messages: conversations
#    message_property_mappings:
#      role: from
#      content: value

dataset_prepared_path: last_run_prepared


# === Plugins ===
plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

# === Hardware Optimization ===
#gradient_checkpointing: offload
#gradient_checkpointing_kwargs:
#  use_reentrant: false
liger_rope: false
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
#liger_fused_linear_cross_entropy: true
cut_cross_entropy: true

#deepspeed: /workspace/axolotl/deepspeed_configs/zero3.json

# === FSDP Config === 
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_activation_checkpointing: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: Glm4DecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD
# === Wandb Tracking ===
wandb_project: GLM
# wandb_entity: [WANDB_ENTITY]
# wandb_name: [WANDB_RUN_NAME]

# === Checkpointing ===
saves_per_epoch: 10
save_total_limit: 1

# === Advanced Settings ===
output_dir: /workspace/aibox-standalone-pool/axolotl/glm-tulu-ckpts
bf16: auto
flash_attention: true
train_on_inputs: false
group_by_length: false
save_safetensors: true
logging_steps: 1
gc_steps: 10
seed: 69
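
The LoRA settings in the config above (`lora_r: 32`, `lora_alpha: 64`, `peft_use_rslora: false`) determine how strongly the low-rank update is scaled before it is added to the frozen weights. The sketch below works through that arithmetic, assuming the standard LoRA convention of `alpha / r` and the rank-stabilized variant `alpha / sqrt(r)`. The helper name `lora_scaling` is illustrative and not part of Axolotl or PEFT.

```python
import math

# Values copied from the LoRA section of the config above.
lora_r = 32
lora_alpha = 64
use_rslora = False  # peft_use_rslora: false


def lora_scaling(alpha: float, r: int, rslora: bool) -> float:
    """Scale applied to the low-rank update B @ A.

    Standard LoRA uses alpha / r; rank-stabilized LoRA uses alpha / sqrt(r).
    """
    return alpha / math.sqrt(r) if rslora else alpha / r


scaling = lora_scaling(lora_alpha, lora_r, use_rslora)
rslora_scaling = lora_scaling(lora_alpha, lora_r, True)

print(f"LoRA scaling with this config: {scaling}")
print(f"Scaling if rsLoRA were enabled: {rslora_scaling:.3f}")
```

With these values the adapter update is scaled by 2, so enabling rsLoRA without lowering `lora_alpha` would make the effective update considerably larger.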

glm-books-qlora-2-2ep

This model is a fine-tuned version of arcee-ai/GLM-4-32B-Base-32K on the ToastyPigeon/steve-and-marvin dataset. It achieves the following results on the evaluation set:

Loss: 2.5593
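
A minimal loading sketch, assuming this repository holds a PEFT adapter that is applied on top of the 4-bit quantized base model (the config uses `adapter: qlora` with `load_in_4bit: true`). The bfloat16 compute dtype, `device_map="auto"`, and loading the tokenizer from the base repository are assumptions, not settings confirmed by this card. The function names `load_model` and `continue_text` are illustrative.

```python
BASE_MODEL_ID = "arcee-ai/GLM-4-32B-Base-32K"
ADAPTER_ID = "ToastyPigeon/glm-books-qlora-2-2ep"


def load_model():
    """Load the 4-bit base model and attach the adapter (downloads weights)."""
    # Imported lazily so the module can be inspected without the heavy deps.
    import torch
    from peft import PeftModel
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumption, see lead-in
    )
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID,
        quantization_config=quant_config,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.eval()
    return tokenizer, model


def continue_text(tokenizer, model, prompt: str, max_new_tokens: int = 128) -> str:
    """Continue a plain-text prompt; the adapter was trained on completion data."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example usage (requires enough GPU memory for a 32B model in 4-bit):
# tokenizer, model = load_model()
# print(continue_text(tokenizer, model, "Chapter One\n\n"))
```

Because the training data is of type `completion` and the base model is not instruction-tuned, the sketch passes raw text rather than applying a chat template.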

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 1
eval_batch_size: 1
seed: 69
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 8
total_train_batch_size: 16
total_eval_batch_size: 2
optimizer: adamw_torch_fused (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
training_steps: 362
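
The batch-size figures in the list above follow from the per-device settings. The short sketch below reproduces that arithmetic and adds an upper bound on tokens per optimizer step, assuming every packed sequence is filled to `sequence_len: 8192` (the config sets `sample_packing: true` and `pad_to_sequence_len: true`). The variable `max_tokens_per_step` is an illustrative name, not a reported metric.

```python
# Per-device settings reported in the hyperparameter list and config.
train_batch_size = 1            # micro_batch_size
eval_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 2
sequence_len = 8192

# Sequences contributing to one optimizer step across all devices.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

# Evaluation does not accumulate gradients, so only devices multiply.
total_eval_batch_size = eval_batch_size * num_devices

# Upper bound on tokens per optimizer step (includes any padding tokens).
max_tokens_per_step = total_train_batch_size * sequence_len

print(f"total_train_batch_size = {total_train_batch_size}")
print(f"total_eval_batch_size  = {total_eval_batch_size}")
print(f"max tokens per step    = {max_tokens_per_step}")
```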

Training results

Training Loss	Epoch	Step	Validation Loss
No log	0	0	2.6252
2.7208	0.1045	19	2.6070
2.6175	0.2089	38	2.6019
2.6084	0.3134	57	2.5969
2.4412	0.4179	76	2.5929
2.6179	0.5223	95	2.5894
2.5022	0.6268	114	2.5850
2.4762	0.7313	133	2.5823
2.5463	0.8357	152	2.5785
2.607	0.9402	171	2.5762
2.5752	1.0440	190	2.5740
2.6927	1.1485	209	2.5722
2.4706	1.2529	228	2.5697
2.4016	1.3574	247	2.5685
2.5024	1.4619	266	2.5671
2.5391	1.5663	285	2.5655
2.5241	1.6708	304	2.5637
2.5926	1.7753	323	2.5629
2.5049	1.8797	342	2.5605
2.5426	1.9842	361	2.5593
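
To put the validation losses in the table above in perspective, the sketch below computes the overall drop, converts the first and last values to perplexity, and checks that the loss decreases at every evaluation. It assumes the reported loss is mean cross-entropy in nats per token, so perplexity is `exp(loss)`. The `(step, validation loss)` pairs are copied from the table.

```python
import math

# (step, validation loss) pairs copied from the training results table.
validation = [
    (0, 2.6252), (19, 2.6070), (38, 2.6019), (57, 2.5969), (76, 2.5929),
    (95, 2.5894), (114, 2.5850), (133, 2.5823), (152, 2.5785), (171, 2.5762),
    (190, 2.5740), (209, 2.5722), (228, 2.5697), (247, 2.5685), (266, 2.5671),
    (285, 2.5655), (304, 2.5637), (323, 2.5629), (342, 2.5605), (361, 2.5593),
]

losses = [loss for _, loss in validation]
start_loss, end_loss = losses[0], losses[-1]

# Absolute improvement in validation loss over the run.
drop = start_loss - end_loss

# Perplexity, assuming the loss is mean cross-entropy in nats per token.
ppl_start = math.exp(start_loss)
ppl_end = math.exp(end_loss)

# True if every evaluation improved on the previous one.
is_monotonic = all(later < earlier for earlier, later in zip(losses, losses[1:]))

print(f"validation loss drop: {drop:.4f}")
print(f"perplexity: {ppl_start:.2f} -> {ppl_end:.2f}")
print(f"strictly decreasing: {is_monotonic}")
```

The drop is small (0.0659) but steady across all 19 evaluations after the initial one.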

Framework versions

PEFT 0.15.2
Transformers 4.52.4
Pytorch 2.7.0+cu126
Datasets 3.6.0
Tokenizers 0.21.1

Model tree for ToastyPigeon/glm-books-qlora-2-2ep

Base model: zai-org/GLM-4-32B-Base-0414
Finetuned: arcee-ai/GLM-4-32B-Base-32K
Adapter: this model