---
library_name: peft
license: apache-2.0
base_model: google/gemma-4-31B-it
tags:
- axolotl
- base_model:adapter:google/gemma-4-31B-it
- lora
- transformers
datasets:
- ConicCat/Mura_Books
pipeline_tag: text-generation
model-index:
- name: Writer-Stage-2
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
See axolotl config

axolotl version: `0.16.0.dev0`

```yaml
base_model: google/gemma-4-31B-it
load_in_8bit: false
load_in_4bit: false

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
  - axolotl.integrations.liger.LigerPlugin

torch_compile: false
liger_layer_norm: true
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_rms_norm_gated: true

strict: false

sequence_len: 2048
max_sample_length: 2048
flash_attention: false
sdp_attention: true
sample_packing: true

gradient_checkpointing: true
activation_offloading: true

bf16: true
tf32: true

lora_mlp_kernel: false
lora_qkv_kernel: false
lora_o_kernel: false

datasets:
  - path: ConicCat/Mura_Books
    type: chat_template
    chat_template_jinja: >
      {%- macro strip_thinking(text) -%}
      {%- set ns = namespace(result='') -%}
      {%- for part in text.split('') -%}
      {%- if '<|channel>' in part -%}
      {%- set ns.result = ns.result + part.split('<|channel>')[0] -%}
      {%- else -%}
      {%- set ns.result = ns.result + part -%}
      {%- endif -%}
      {%- endfor -%}
      {{- ns.result | trim -}}
      {%- endmacro -%}
      {%- set loop_messages = messages -%}
      {{ bos_token }}
      {#- Handle System Definitions Block -#}
      {%- if (enable_thinking is defined and enable_thinking) or messages[0]['role'] in ['system', 'developer'] -%}
      {{- '<|turn>system\n' -}}
      {#- Inject Thinking token at the very top of the FIRST system turn -#}
      {%- if enable_thinking is defined and enable_thinking -%}
      {{- '<|think|>' -}}
      {%- endif -%}
      {%- if messages[0]['role'] in ['system', 'developer'] -%}
      {{- messages[0]['content'] | trim -}}
      {%- set loop_messages = messages[1:] -%}
      {%- endif -%}
      {{- '\n' -}}
      {%- endif %}
      {#- Loop through messages -#}
      {%- for message in loop_messages -%}
      {%- set role = 'model' if message['role'] == 'assistant' else message['role'] -%}
      {{- '<|turn>' + role + '\n' -}}
      {#- Flag to identify the final SFT turn -#}
      {%- set is_final_sft_turn = loop.last and not add_generation_prompt -%}
      {%- if message['content'] is string -%}
      {%- if role == 'model' -%}
      {%- if is_final_sft_turn and '<|channel>thought' not in message['content'] -%}
      {{- '<|channel>thought\n' -}}
      {%- endif -%}
      {{- strip_thinking(message['content']) -}}
      {%- else -%}
      {{- message['content'] | trim -}}
      {%- endif -%}
      {%- elif message['content'] is sequence -%}
      {%- set ns = namespace(has_thinking=false) -%}
      {%- for item in message['content'] -%}
      {%- if item['type'] == 'text' and '<|channel>thought' in item['text'] -%}
      {%- set ns.has_thinking = true -%}
      {%- endif -%}
      {%- endfor -%}
      {%- if role == 'model' and is_final_sft_turn and not ns.has_thinking -%}
      {{- '<|channel>thought\n' -}}
      {%- endif -%}
      {%- for item in message['content'] -%}
      {%- if item['type'] == 'text' -%}
      {%- if role == 'model' -%}
      {{- strip_thinking(item['text']) -}}
      {%- else -%}
      {{- item['text'] | trim -}}
      {%- endif -%}
      {%- endif -%}
      {%- endfor -%}
      {%- endif -%}
      {{- '\n' -}}
      {%- endfor -%}
      {#- Generation Prompt handled as normal (serves as the final turn when true) -#}
      {%- if add_generation_prompt -%}
      {{- '<|turn>model\n' -}}
      {%- if not enable_thinking | default(false) -%}
      {{- '<|channel>thought\n' -}}
      {%- endif -%}
      {%- endif -%}

adapter: lora
lora_r: 32
lora_alpha: 64
lora_dropout: 0.0
lora_bias: None
lora_target_modules: 'model.language_model.layers.[\d]+.(_checkpoint_wrapped_module.)?(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'

use_tensorboard: true

optimizer: paged_adamw_8bit
learning_rate: 2.5e-5 # 1e-4 / 4
loraplus_lr_ratio: 16

# Training arguments
output_dir: ./Writer-Stage-2
num_epochs: 11
micro_batch_size: 2
gradient_accumulation_steps: 4
save_strategy: 'no'
warmup_ratio: 0.05
lr_scheduler: 'cosine'
max_grad_norm: 1
logging_steps: 1
seed: 42

eot_tokens:
  - ""

push_dataset_to_hub: ConicCat/Gemma4-Mura
hf_use_auth_token: true
```
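
For reference, the chat template above can be exercised directly through `tokenizer.apply_chat_template`. The snippet below is an illustrative sketch only: it assumes the `chat_template_jinja` string from the config is pasted into the `CHAT_TEMPLATE` placeholder (the adapter's tokenizer may or may not already ship it), and the example messages are made up.

```python
# Sketch: rendering a prompt with the custom chat template from the config above.
# CHAT_TEMPLATE is a placeholder for the chat_template_jinja string; paste it in.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-31B-it")

CHAT_TEMPLATE = "..."  # paste the chat_template_jinja value here

messages = [
    {"role": "system", "content": "You are a long-form fiction writer."},
    {"role": "user", "content": "Write the opening paragraph of a mystery novel."},
]

# With enable_thinking left false, the generation prompt ends with '<|turn>model'
# followed by an opened '<|channel>thought' block; with enable_thinking=True the
# template instead injects '<|think|>' at the top of the system turn and leaves
# the thought channel for the model to open itself.
prompt = tokenizer.apply_chat_template(
    messages,
    chat_template=CHAT_TEMPLATE,
    add_generation_prompt=True,
    tokenize=False,
    enable_thinking=False,
)
print(prompt)
```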

# Writer-Stage-2

This model is a fine-tuned version of [google/gemma-4-31B-it](https://huggingface.co/google/gemma-4-31B-it) on the ConicCat/Mura_Books dataset.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2.5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: paged_adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 12
- training_steps: 242

### Training results

### Framework versions

- PEFT 0.19.1
- Transformers 5.5.0
- PyTorch 2.8.0+cu128
- Datasets 4.5.0
- Tokenizers 0.22.2
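
The LoRA adapter can be attached to the base model with PEFT for inference. The following is a minimal, untested sketch: the adapter repo id is assumed from the output name `Writer-Stage-2` and may differ, and prompts should follow the `<|turn>` chat format defined in the config above.

```python
# Sketch: loading the base model and attaching the LoRA adapter with PEFT.
# "ConicCat/Writer-Stage-2" is an assumed adapter repo id; substitute the real one.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "google/gemma-4-31B-it"
adapter_id = "ConicCat/Writer-Stage-2"  # assumed, not confirmed by this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# Prompt in the <|turn> format used during training (see the chat template above).
prompt = "<|turn>user\nWrite a short scene set in a lighthouse.\n<|turn>model\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```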