---
library_name: peft
language:
- pt
---
## Training procedure

The following `bitsandbytes` quantization config was used during training:

- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
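
For reference, these settings correspond to a `transformers.BitsAndBytesConfig` along the following lines. This is a minimal sketch, not the original training script; fields left at their defaults match the remaining values in the list above.

```python
import torch
from transformers import BitsAndBytesConfig

# Sketch reproducing the quantization settings listed above.
# llm_int8_* values in the list are the library defaults, so they
# are not set explicitly here.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```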
### Framework versions

- PEFT 0.4.0
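
A minimal loading sketch for this adapter, assuming a Mistral-7B-Instruct base checkpoint (the card does not name the base model, so that identifier is an assumption):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Base checkpoint is an assumption; substitute the actual base model.
BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.1"

# Same quantization settings as listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "Weni/WeniGPT-Mistral-7B-instructBase-4bit")
```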
This model was trained with the following parameters:
```python
max_seq_length = 2048

training_arguments_mistral = {
    'num_train_epochs': 10,
    'per_device_train_batch_size': 2,
    'gradient_accumulation_steps': 2,
    'gradient_checkpointing': True,
    'optim': 'adamw_torch',
    'lr_scheduler_type': 'constant_with_warmup',
    'logging_steps': 10,
    'evaluation_strategy': 'epoch',
    'save_strategy': 'epoch',
    'load_best_model_at_end': True,
    'learning_rate': 4e-4,
    'save_total_limit': 3,
    'fp16': True,
    'tf32': True,
    'max_steps': 8000,
    'max_grad_norm': 0.3,
    'warmup_ratio': 0.03,
    'disable_tqdm': False,
    'weight_decay': 0.001,
    # Hub upload settings; `token` is a Hugging Face auth token defined elsewhere.
    'hub_model_id': 'Weni/WeniGPT-Mistral-7B-instructBase-4bit',
    'push_to_hub': True,
    'hub_strategy': 'every_save',
    'hub_token': token,
    'hub_private_repo': True,
}
```
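
These keys are standard `transformers.TrainingArguments` fields, so the dict can be unpacked directly. A minimal sketch (`output_dir` below is an assumption, since the card does not list one):

```python
from transformers import TrainingArguments

# Sketch: unpack the dict above into TrainingArguments.
# output_dir is an assumption; the card does not specify it.
args = TrainingArguments(
    output_dir="./wenigpt-mistral-7b-4bit",
    **training_arguments_mistral,
)
```

Note that `max_seq_length` is not a `TrainingArguments` field; it would typically be passed to the trainer itself, e.g. the `max_seq_length` argument of `trl.SFTTrainer` (an assumption, as the training script is not shown on this card).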