---
library_name: peft
license: gemma
base_model: google/gemma-2-9b-it
tags:
- generated_from_trainer
datasets:
- frjonah/training_data5
model-index:
- name: outputs/test3
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)

See axolotl config

axolotl version: `0.10.0.dev0`
```yaml
# axolotl preprocess config.yaml
adapter: lora
base_model: google/gemma-2-9b-it
bf16: auto
dataset_processes: 32
datasets:
- path: frjonah/training_data5
  type:
    system_prompt: ""
    field_system: system
    field_instruction: prompt
    field_output: completion
    format: "[INST] {instruction} [/INST]"
    no_input_format: "[INST] {instruction} [/INST]"
resize_token_embeddings_to_32x: false
add_special_tokens: false
special_tokens:
  pad_token: null
  eos_token: null
  bos_token: null
  unk_token: null
gradient_accumulation_steps: 1
gradient_checkpointing: true
learning_rate: 0.00002
lisa_layers_attribute: model.layers
load_best_model_at_end: false
load_in_4bit: false
load_in_8bit: true
lora_alpha: 256
lora_dropout: 0.1
lora_r: 128
lora_target_modules:
- q_proj
- v_proj
- k_proj
- o_proj
- gate_proj
- down_proj
- up_proj
loraplus_lr_embedding: 1.0e-06
lr_scheduler: cosine
max_prompt_len: 512
mean_resizing_embeddings: false
micro_batch_size: 16
num_epochs: 30.0
optimizer: adamw_bnb_8bit
output_dir: ./outputs/test3
pretrain_multipack_attn: true
pretrain_multipack_buffer_size: 10000
qlora_sharded_model_loading: false
ray_num_workers: 1
resources_per_worker:
  GPU: 1
sample_packing_bin_size: 200
sample_packing_group_size: 100000
save_only_model: false
save_safetensors: true
sequence_len: 2048
shuffle_merged_datasets: true
skip_prepare_dataset: false
strict: false
train_on_inputs: false
trl:
  log_completions: false
  ref_model_mixup_alpha: 0.9
  ref_model_sync_steps: 64
  sync_ref_model: false
  use_vllm: false
  vllm_device: auto
  vllm_dtype: auto
  vllm_gpu_memory_utilization: 0.9
use_ray: false
val_set_size: 0.0
weight_decay: 0.01
```
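
The `format` template in the config above wraps each instruction in `[INST] ... [/INST]`, reading the instruction from the `prompt` field and the target from the `completion` field. A minimal sketch of how one row might render, assuming plain Python string formatting; the `render_example` helper and the sample record are hypothetical, and Axolotl's actual tokenization and label masking are not reproduced here:

```python
# Template string copied from the `format` key of the dataset config.
PROMPT_FORMAT = "[INST] {instruction} [/INST]"


def render_example(record: dict) -> tuple[str, str]:
    """Return (prompt_text, target_text) for one dataset row.

    Hypothetical helper: field names follow `field_instruction: prompt`
    and `field_output: completion` from the config.
    """
    prompt_text = PROMPT_FORMAT.format(instruction=record["prompt"])
    target_text = record["completion"]
    return prompt_text, target_text


# Illustrative record, not taken from frjonah/training_data5.
record = {"system": "", "prompt": "Name a prime number.", "completion": "7"}
prompt_text, target_text = render_example(record)
print(prompt_text)
print(target_text)
```

Since the config sets `train_on_inputs: false`, the rendered prompt would normally be excluded from the loss and only the completion tokens would be trained on.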

# outputs/test3

This model is a fine-tuned version of [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) on the frjonah/training_data5 dataset.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 12
- training_steps: 402

### Training results

### Framework versions

- PEFT 0.15.2
- Transformers 4.52.3
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1
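
The learning-rate settings listed under Training hyperparameters (peak 2e-05, cosine schedule, 12 warmup steps, 402 total steps) can be sketched as a function of the step index. This is a rough illustration that assumes linear warmup from zero followed by a single half-cosine decay to zero; the `lr_at` helper is hypothetical and was not part of the training run:

```python
import math

# Values taken from the hyperparameter list in this card.
PEAK_LR = 2e-5
WARMUP_STEPS = 12
TOTAL_STEPS = 402


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the peak down to 0.
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 6, 12, 207, 402):
    print(step, lr_at(step))
```

Under these assumptions the rate reaches its peak at step 12, falls to half the peak at step 207 (the midpoint of the decay phase), and reaches zero at step 402.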