---
library_name: peft
license: mit
base_model: ArliAI/GLM-4.5-Air-Derestricted
tags:
  - axolotl
  - base_model:adapter:ArliAI/GLM-4.5-Air-Derestricted
  - lora
  - transformers
datasets:
  - data/data.jsonl
pipeline_tag: text-generation
model-index:
  - name: output
    results: []
---

Built with Axolotl

The following axolotl config was used for training (axolotl version: 0.13.0.dev0):

# Weights and Biases logging config
wandb_project: Test Run _Q3
wandb_name: "0.1"

# Model architecture config
base_model: ArliAI/GLM-4.5-Air-Derestricted
model_type: AutoModelForCausalLM


# Model checkpointing config
output_dir: ./output
saves_per_epoch: 1
save_safetensors: true
save_total_limit: 1

# Mixed precision training config
bf16: true
fp16: false
tf32: false

# Model loading config
load_in_8bit: true
load_in_4bit: false
strict: false

# Sequence config
sequence_len: 8192
s2_attention: false
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# QLoRA adapter config
adapter: lora
lora_r: 64
lora_alpha: 32
lora_dropout: 0.05
peft_use_dora: false
lora_target_modules:
    - gate_proj
    - down_proj
    - up_proj
    - q_proj
    - v_proj
    - k_proj
    - o_proj

# Dataset config
datasets:
  - path: data/data.jsonl
    type: chat_template
    field_messages: conversations
    message_field_role: role
    message_field_content: content

# Training hyperparameters
num_epochs: 2
gradient_accumulation_steps: 2
micro_batch_size: 2
eval_batch_size: 1
warmup_steps: 500
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5
loraplus_lr_ratio: 8
cosine_min_lr_ratio: 0.1
weight_decay: 0.1
max_grad_norm: 1
logging_steps: 10

# === Plugins ===
plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
cut_cross_entropy: true

# Model optimization
gradient_checkpointing: offload
xformers_attention: false
flash_attention: true
sdp_attention: false

# Loss monitoring config
early_stopping_patience: false
loss_watchdog_threshold: 100.0
loss_watchdog_patience: 3

# Debug config
debug: false
seed: 42

deepspeed: deepspeed_configs/zero2.json
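
For orientation, the adapter block in the config above corresponds roughly to the following PEFT LoraConfig. This is a sketch only; axolotl constructs the actual adapter configuration internally, and defaults not shown in the config may differ.

```python
# Rough PEFT equivalent of the LoRA settings above (r=64, alpha=32,
# dropout=0.05, seven projection modules targeted). Sketch for orientation
# only; axolotl builds the real adapter config internally.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "gate_proj", "down_proj", "up_proj",
        "q_proj", "v_proj", "k_proj", "o_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```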

output

This model is a LoRA adapter for ArliAI/GLM-4.5-Air-Derestricted, fine-tuned on the data/data.jsonl dataset.
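
A minimal inference sketch using transformers and PEFT. The adapter repo id below is an assumed placeholder (substitute the actual adapter path or Hub id), bf16 matches the training precision, and device_map="auto" is only a convenience; loading the GLM-4.5-Air base model requires substantial GPU memory.

```python
# Minimal sketch: load the base model, attach this LoRA adapter, and generate.
# "MAWNIPULATOR/Gaslit-LoRA" is a placeholder adapter id; replace it with the
# actual adapter repo or local path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "ArliAI/GLM-4.5-Air-Derestricted"
adapter_id = "MAWNIPULATOR/Gaslit-LoRA"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # training ran in bf16
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)

messages = [{"role": "user", "content": "Hello, who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If a standalone checkpoint is preferred, model.merge_and_unload() folds the adapter weights back into the base model.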

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
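
The dataset entry in the config above reads data/data.jsonl with type: chat_template, taking each record's conversations list and its role/content fields. A minimal sketch of writing one record in that shape (the conversation text is purely illustrative):

```python
# Sketch of the expected data/data.jsonl layout, following the dataset config
# (field_messages: conversations, message_field_role: role,
# message_field_content: content). Example content is illustrative only.
import json
import os

record = {
    "conversations": [
        {"role": "user", "content": "Explain what a LoRA adapter is."},
        {"role": "assistant", "content": "A LoRA adapter adds small low-rank update matrices to selected weight matrices of a frozen base model..."},
    ]
}

os.makedirs("data", exist_ok=True)
with open("data/data.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```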

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 16 (see note below)
- total_eval_batch_size: 4
- optimizer: paged_adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 500
- training_steps: 48
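
The effective batch sizes above follow directly from the per-device settings: total_train_batch_size = micro_batch_size × gradient_accumulation_steps × num_devices = 2 × 2 × 4 = 16, and total_eval_batch_size = eval_batch_size × num_devices = 1 × 4 = 4.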

Training results

Framework versions

- PEFT 0.18.0
- Transformers 4.57.1
- PyTorch 2.8.0+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1