---
tags:
- generated_from_trainer
- google/gemma
- PyTorch
- transformers
- trl
- peft
- tensorboard
model-index:
- name: pygemma
  results: []
datasets:
- iamtarun/python_code_instructions_18k_alpaca
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
language:
- en
base_model: google/gemma-2b
widget:
- example_title: Compute Sum
  messages:
  - role: system
    content: >-
      Welcome to PyGemma, your AI-powered Python assistant. I'm here to help you
      answer common questions about the Python programming language. Let's dive
      into Python!
  - role: user
    content: Create a function to calculate the sum of a sequence of integers.
pipeline_tag: text-generation
license: other
---

# Model Card for pygemma

**pygemma** is a language model trained to act as a Python assistant. It is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b), trained with `SFTTrainer` on the publicly available dataset
[iamtarun/python_code_instructions_18k_alpaca](https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca).
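
The card does not show how dataset rows were turned into training text. As a rough illustration only, the sketch below renders one Alpaca-style record as a single string, the kind of text that supervised fine-tuning usually consumes. The column names (`instruction`, `input`, `output`), the template wording, the helper name `format_alpaca_example`, and the sample record are all assumptions made for this example; they are not the author's confirmed preprocessing and the record is hand-written, not taken from the dataset.

```python
def format_alpaca_example(record: dict) -> str:
    """Render one Alpaca-style record as a single training string.

    Assumption: the record has 'instruction', 'input', and 'output' keys.
    The template is a common Alpaca layout, not necessarily the one used
    to train pygemma.
    """
    instruction = record["instruction"].strip()
    context = (record.get("input") or "").strip()
    response = record["output"].strip()

    parts = [f"### Instruction:\n{instruction}"]
    if context:
        # Emit the input section only when the record actually has one.
        parts.append(f"### Input:\n{context}")
    parts.append(f"### Response:\n{response}")
    return "\n\n".join(parts)


# Hand-written sample record (hypothetical, for illustration only).
example = {
    "instruction": "Create a function to calculate the sum of a sequence of integers.",
    "input": "[1, 2, 3, 4, 5]",
    "output": "def sum_sequence(sequence):\n    return sum(sequence)",
}
text = format_alpaca_example(example)
print(text)
```

A function of this shape could be applied to every row before training, provided the real column names and template are checked against the dataset first.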

## Training hyperparameters

The following hyperparameters were used during training:

- output_dir: peft-lora-pygemma
- overwrite_output_dir: True
- do_train: False
- do_eval: False
- do_predict: False
- evaluation_strategy: no
- prediction_loss_only: False
- per_device_train_batch_size: 2
- per_device_eval_batch_size: None
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 4
- eval_accumulation_steps: None
- eval_delay: 0
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 0.3
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_dir: peft-lora-pygemma/runs/Mar13_16-30-02_e65672b6422a
- logging_strategy: steps
- logging_first_step: False
- logging_steps: 10
- logging_nan_inf_filter: True
- save_strategy: epoch
- save_steps: 500
- save_total_limit: None
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- eval_steps: None
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- run_name: peft-lora-pygemma
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- metric_for_best_model: None
- greater_is_better: None
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: AcceleratorConfig(split_batches=False, dispatch_batches=None, even_batches=True, use_seedable_sampler=True)
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- report_to: ['tensorboard']
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: None
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_token: None
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: {'use_reentrant': False}
- include_inputs_for_metrics: False
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- push_to_hub_token: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- distributed_state: Distributed environment: NO
  Num processes: 1
  Process index: 0
  Local process index: 0
  Device: cuda
- _n_gpu: 1
- __cached__setup_devices: cuda:0
- deepspeed_plugin: None
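
Most entries in this list look like library defaults dumped verbatim from the training arguments. The sketch below collects the values that appear to differ from the defaults and derives two quantities the dump implies but does not state: the effective batch size and the shape of the cosine schedule with warmup. The selection of "non-default" values, the helper names, the rounding of warmup steps with `ceil`, and the 1,000-step total used in the example are assumptions for illustration; the actual number of training steps for this run is not given in the card.

```python
import math

# Values from the dump that appear to differ from library defaults.
# Assumption: defaults vary between transformers releases, so this
# selection is a best guess rather than an authoritative diff.
training_kwargs = {
    "output_dir": "peft-lora-pygemma",
    "overwrite_output_dir": True,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-5,
    "max_grad_norm": 0.3,
    "num_train_epochs": 3,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "logging_steps": 10,
    "save_strategy": "epoch",
    "bf16": True,
    "optim": "adamw_torch_fused",
    "report_to": ["tensorboard"],
    "gradient_checkpointing": True,
    "gradient_checkpointing_kwargs": {"use_reentrant": False},
}


def effective_batch_size(kwargs: dict, n_gpu: int = 1) -> int:
    """Number of samples contributing to each optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * n_gpu
    )


def warmup_steps(kwargs: dict, total_steps: int) -> int:
    """Warmup length implied by warmup_ratio (rounded up, an assumption)."""
    return math.ceil(total_steps * kwargs["warmup_ratio"])


def cosine_lr(kwargs: dict, step: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to 0."""
    warm = warmup_steps(kwargs, total_steps)
    peak = kwargs["learning_rate"]
    if step < warm:
        return peak * step / max(1, warm)
    progress = (step - warm) / max(1, total_steps - warm)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


# _n_gpu is 1 in the dump, so each optimizer step sees 2 * 4 * 1 samples.
batch = effective_batch_size(training_kwargs, n_gpu=1)

# Hypothetical run length, chosen only to make the schedule concrete.
total = 1000
print(batch, warmup_steps(training_kwargs, total))
print(cosine_lr(training_kwargs, 0, total), cosine_lr(training_kwargs, total, total))
```

With these settings the learning rate climbs linearly to 2e-05 over the first tenth of training and then decays along a half cosine. The same dictionary could in principle be unpacked into `TrainingArguments(**training_kwargs)`, assuming a `transformers` release that still accepts every key shown here.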