# Uploaded model

- **Developed by:** qingy2024
- **License:** apache-2.0
- **Finetuned from model:** unsloth/qwen2.5-14b-bnb-4bit

Trained on qingy2024/metamathqa-30k for 500 steps.

| Parameter | Value | Description |
|---|---|---|
| `per_device_train_batch_size` | 4 | Number of samples per batch on each device during training. |
| `gradient_accumulation_steps` | 3 | Number of steps to accumulate gradients before updating. |
| `warmup_steps` | 5 | Number of steps for learning rate warmup. |
| `max_steps` | 500 | Total number of training steps. |
| `learning_rate` | 2e-4 | Initial learning rate for training. |
| `logging_steps` | 1 | Frequency of logging updates (in steps). |
| `optim` | adamw_8bit | Optimizer used for training. |
| `weight_decay` | 0.01 | Weight decay regularization coefficient. |
| `lr_scheduler_type` | cosine | Type of learning rate scheduler. |
| `seed` | 3407 | Random seed for reproducibility. |
| `packing` | True | Enables sequence packing for faster training. |
| `max_seq_length` | 2048 | Maximum token length per sequence. |
| `dataset_num_proc` | 2 | Number of processes for dataset preparation. |
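For intuition, the settings above imply an effective batch size of `per_device_train_batch_size × gradient_accumulation_steps = 12` per optimizer step (per device), and a learning rate that warms up linearly for 5 steps and then decays along a cosine curve. A minimal sketch of that schedule, assuming the common HF-style cosine decay to zero (the exact trainer implementation may differ slightly):

```python
import math

def lr_at_step(step, max_steps=500, warmup_steps=5, peak_lr=2e-4):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size per optimizer update (per device)
effective_batch = 4 * 3  # per_device_train_batch_size * gradient_accumulation_steps

print(effective_batch)   # 12
print(lr_at_step(5))     # peak: 2e-4
print(lr_at_step(500))   # ~0.0 at the end of training
```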
