```python
training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=6,
    per_device_train_batch_size=64,
    gradient_accumulation_steps=16,
    optim="paged_adamw_8bit",
    save_steps=3,
    logging_steps=1,
    learning_rate=4e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.3,
    group_by_length=True,
    lr_scheduler_type="linear",
    report_to="wandb",
)
```
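A few of these arguments interact: gradient accumulation multiplies the per-device batch size into a much larger effective batch, and `warmup_ratio` is applied to the total step count that `max_steps=-1` leaves to be derived from the dataset. A minimal sketch of that arithmetic, assuming a single GPU (with N devices, the effective batch is N times larger):

```python
# Values taken from the TrainingArguments above.
per_device_train_batch_size = 64
gradient_accumulation_steps = 16
warmup_ratio = 0.3

# Gradients are accumulated over 16 micro-batches of 64 examples before
# each optimizer step, so a single update sees 1024 examples.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 1024

# max_steps=-1 means the total step count is derived from the dataset size;
# the run summary below ends at global_step 30, so with warmup_ratio=0.3 the
# linear scheduler warms up over the first int(0.3 * 30) = 9 updates.
total_steps = 30
warmup_steps = int(warmup_ratio * total_steps)
print(warmup_steps)  # 9
```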
Run history (W&B sparklines, summarized):

- `train/loss`: decreased steadily over the run.
- `train/learning_rate`: linear warmup to the peak over roughly the first 30% of steps, then linear decay to 0, matching `warmup_ratio=0.3` and the linear scheduler.
- `train/epoch`, `train/global_step`: increased monotonically.
- `train/total_flos`, `train/train_loss`, `train/train_runtime`, `train/train_samples_per_second`, `train/train_steps_per_second`: single end-of-run values (see summary below).
Run summary:

| Metric | Value |
| --- | --- |
| train/epoch | 5.16 |
| train/global_step | 30 |
| train/learning_rate | 0.0 |
| train/loss | 1.1752 |
| train/total_flos | 7297433452388352.0 |
| train/train_loss | 1.49426 |
| train/train_runtime | 2373.0384 s |
| train/train_samples_per_second | 14.971 |
| train/train_steps_per_second | 0.013 |
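As a rough sanity check, the throughput figures in the summary are mutually consistent: multiplying the reported rates by the runtime recovers the reported step count, up to rounding of the per-second rates. A quick sketch:

```python
# Values copied from the run summary above.
train_runtime = 2373.0384          # seconds
train_steps_per_second = 0.013     # rounded in the summary
train_samples_per_second = 14.971

# ~30.8, close to the reported global_step of 30; the gap comes from the
# steps-per-second value being rounded to three decimals.
approx_steps = train_runtime * train_steps_per_second

# ~35,500 examples processed across the whole run.
approx_samples = train_runtime * train_samples_per_second
print(approx_steps, approx_samples)
```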
Format: Safetensors · Model size: 1B params · Tensor type: F16