README.md · Menouar/pygemma at main

pygemma / README.md

Menouar

Update README.md

afd5a03 verified almost 2 years ago

preview code

raw

history blame contribute delete

4.88 kB

	---
	tags:
	- generated_from_trainer
	- google/gemma
	- PyTorch
	- transformers
	- trl
	- peft
	- tensorboard
	model-index:
	- name: pygemma
	results: []
	datasets:
	- iamtarun/python_code_instructions_18k_alpaca
	license_name: gemma-terms-of-use
	license_link: https://ai.google.dev/gemma/terms
	language:
	- en
	base_model: google/gemma-2b
	widget:
	- example_title: Compute Sum
	messages:
	- role: system
	content: >-
	Welcome to PyGemma, your AI-powered Python assistant. I'm here to help you
	answer common questions about the Python programming language. Let's dive
	into Python!
	- role: user
	content: Create a function to calculate the sum of a sequence of integers.
	pipeline_tag: text-generation
	license: other
	---

	# Model Card for pygemma:

	pygemma is a language model that is trained to act as Python assistant. It is a finetuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) that was trained using `SFTTrainer` on publicly available dataset
	[iamtarun/python_code_instructions_18k_alpaca](https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca).


	## Training hyperparameters

	The following hyperparameters were used during the training:


	- output_dir: peft-lora-pygemma

	- overwrite_output_dir: True

	- do_train: False

	- do_eval: False

	- do_predict: False

	- evaluation_strategy: no

	- prediction_loss_only: False

	- per_device_train_batch_size: 2

	- per_device_eval_batch_size: None

	- per_gpu_train_batch_size: None

	- per_gpu_eval_batch_size: None

	- gradient_accumulation_steps: 4

	- eval_accumulation_steps: None

	- eval_delay: 0

	- learning_rate: 2e-05

	- weight_decay: 0.0

	- adam_beta1: 0.9

	- adam_beta2: 0.999

	- adam_epsilon: 1e-08

	- max_grad_norm: 0.3

	- num_train_epochs: 3

	- max_steps: -1

	- lr_scheduler_type: cosine

	- lr_scheduler_kwargs: {}

	- warmup_ratio: 0.1

	- warmup_steps: 0

	- log_level: passive

	- log_level_replica: warning

	- log_on_each_node: True

	- logging_dir: peft-lora-pygemma/runs/Mar13_16-30-02_e65672b6422a

	- logging_strategy: steps

	- logging_first_step: False

	- logging_steps: 10

	- logging_nan_inf_filter: True

	- save_strategy: epoch

	- save_steps: 500

	- save_total_limit: None

	- save_safetensors: True

	- save_on_each_node: False

	- save_only_model: False

	- no_cuda: False

	- use_cpu: False

	- use_mps_device: False

	- seed: 42

	- data_seed: None

	- jit_mode_eval: False

	- use_ipex: False

	- bf16: True

	- fp16: False

	- fp16_opt_level: O1

	- half_precision_backend: auto

	- bf16_full_eval: False

	- fp16_full_eval: False

	- tf32: None

	- local_rank: 0

	- ddp_backend: None

	- tpu_num_cores: None

	- tpu_metrics_debug: False

	- debug: []

	- dataloader_drop_last: False

	- eval_steps: None

	- dataloader_num_workers: 0

	- dataloader_prefetch_factor: None

	- past_index: -1

	- run_name: peft-lora-pygemma

	- disable_tqdm: False

	- remove_unused_columns: True

	- label_names: None

	- load_best_model_at_end: False

	- metric_for_best_model: None

	- greater_is_better: None

	- ignore_data_skip: False

	- fsdp: []

	- fsdp_min_num_params: 0

	- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}

	- fsdp_transformer_layer_cls_to_wrap: None

	- accelerator_config: AcceleratorConfig(split_batches=False, dispatch_batches=None, even_batches=True, use_seedable_sampler=True)

	- deepspeed: None

	- label_smoothing_factor: 0.0

	- optim: adamw_torch_fused

	- optim_args: None

	- adafactor: False

	- group_by_length: False

	- length_column_name: length

	- report_to: ['tensorboard']

	- ddp_find_unused_parameters: None

	- ddp_bucket_cap_mb: None

	- ddp_broadcast_buffers: None

	- dataloader_pin_memory: True

	- dataloader_persistent_workers: False

	- skip_memory_metrics: True

	- use_legacy_prediction_loop: False

	- push_to_hub: False

	- resume_from_checkpoint: None

	- hub_model_id: None

	- hub_strategy: every_save

	- hub_token: None

	- hub_private_repo: False

	- hub_always_push: False

	- gradient_checkpointing: True

	- gradient_checkpointing_kwargs: {'use_reentrant': False}

	- include_inputs_for_metrics: False

	- fp16_backend: auto

	- push_to_hub_model_id: None

	- push_to_hub_organization: None

	- push_to_hub_token: None

	- mp_parameters:

	- auto_find_batch_size: False

	- full_determinism: False

	- torchdynamo: None

	- ray_scope: last

	- ddp_timeout: 1800

	- torch_compile: False

	- torch_compile_backend: None

	- torch_compile_mode: None

	- dispatch_batches: None

	- split_batches: None

	- include_tokens_per_second: False

	- include_num_input_tokens_seen: False

	- neftune_noise_alpha: None

	- distributed_state: Distributed environment: NO
	Num processes: 1
	Process index: 0
	Local process index: 0
	Device: cuda


	- _n_gpu: 1

	- __cached__setup_devices: cuda:0

	- deepspeed_plugin: None