---
base_model: zai-org/GLM-4.7-Flash
library_name: peft
model_name: output-2
tags:
- base_model:adapter:zai-org/GLM-4.7-Flash
- lora
- sft
- transformers
- trl
licence: license
pipeline_tag: text-generation
---

# output-2

This model is a LoRA fine-tuned version of [zai-org/GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash), trained with supervised fine-tuning (SFT).

**W&B run:** [https://wandb.ai/cooawoo-personal/huggingface/runs/ay5ml51v](https://wandb.ai/cooawoo-personal/huggingface/runs/ay5ml51v)
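
As a usage sketch (not part of the training pipeline), the adapter can be attached to the base model with `peft`. The `adapter_id` below is an assumption — point it at this repo id or the local `output-2` directory — and the exact `from_pretrained` keyword arguments can vary across Transformers versions:

```python
def load_finetuned(adapter_id: str = "output-2",
                   base_id: str = "zai-org/GLM-4.7-Flash"):
    """Load the base model in bf16 and attach the LoRA adapter.

    Imports are kept inside the function so this sketch can be defined
    even where `torch`/`transformers`/`peft` are not installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,  # matches the bf16 training precision
        trust_remote_code=True,
    )
    # Attach the adapter weights on top of the frozen base model.
    model = PeftModel.from_pretrained(base, adapter_id)
    return model, tokenizer
```

To merge the adapter into the base weights for standalone serving, `model.merge_and_unload()` can be called on the returned `PeftModel`.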
## Training procedure

### Hyperparameters
| Parameter | Value |
|-----------|-------|
| Learning rate | `2e-05` |
| LR scheduler | `cosine` |
| Per-device batch size | 1 |
| Gradient accumulation steps | 4 |
| Effective batch size | 4 |
| Epochs | 2 |
| Max sequence length | 8192 |
| Optimizer | `adamw_torch_fused` |
| Weight decay | 0.01 |
| Warmup ratio | 0.03 |
| Max gradient norm | 1.0 |
| Precision | bf16 |
| Gradient checkpointing | yes |
| Loss type | nll |
| Chunked cross-entropy | yes |
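The schedule arithmetic behind the table can be sketched in a few lines (assuming a single GPU, which the card does not state; with `truncation_strategy: split`, the raw sample count only approximates the number of optimizer steps):

```python
import math

per_device_batch = 1
grad_accum = 4
num_gpus = 1          # assumption: the card does not state the GPU count

# Effective batch size = per-device batch * accumulation steps * GPUs.
effective_batch = per_device_batch * grad_accum * num_gpus

samples = 2473        # from the dataset statistics table
epochs = 2
steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = steps_per_epoch * epochs

# warmup_ratio = 0.03 of total optimizer steps.
warmup_steps = int(0.03 * total_steps)

print(effective_batch, total_steps, warmup_steps)
```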
### LoRA configuration

| Parameter | Value |
|-----------|-------|
| Rank (r) | 32 |
| Alpha | 16 |
| Target modules | `kv_a_proj_with_mqa`, `kv_b_proj`, `mlp.down_proj`, `mlp.gate_proj`, `mlp.up_proj`, `o_proj`, `q_a_proj`, `q_b_proj`, `shared_expert.down_proj`, `shared_expert.gate_proj`, `shared_expert.up_proj` |
| rsLoRA | yes |
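With rsLoRA enabled, the adapter output is scaled by `alpha / sqrt(r)` instead of the classic `alpha / r`, which keeps the update magnitude stable at higher ranks. For this run's values:

```python
import math

r, alpha = 32, 16

# Classic LoRA scaling: alpha / r.
standard_scale = alpha / r            # 16 / 32 = 0.5

# rsLoRA (rank-stabilized) scaling: alpha / sqrt(r).
rslora_scale = alpha / math.sqrt(r)   # 16 / sqrt(32) ≈ 2.83

print(standard_scale, rslora_scale)
```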
### Dataset statistics

| Dataset | Samples | Total tokens | Trainable tokens |
|---------|--------:|-------------:|-----------------:|
| rpDungeon/some-revised-datasets/springdragon_processed.jsonl | 2,473 | 5,421,492 | 5,421,492 |
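A quick sanity check on the table: every token is trainable, which is consistent with `assistant_only_loss: false` in the data config below, and the average sample length fits comfortably under the 8192 max sequence length:

```python
samples = 2473
total_tokens = 5_421_492
trainable_tokens = 5_421_492

# All tokens contribute to the loss (no assistant-only masking).
all_trainable = trainable_tokens == total_tokens

# Average tokens per sample (integer part).
avg_tokens = total_tokens // samples

print(all_trainable, avg_tokens)
```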
<details>
<summary>Training config</summary>

```yaml
model_name_or_path: zai-org/GLM-4.7-Flash
data_config: data.yaml
prepared_dataset: prepared
output_dir: output-2
attn_implementation: flash_attention_2
bf16: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
use_cce: true
padding_free: false
dataloader_num_workers: 4
dataloader_pin_memory: true
aux_loss_top_prob_weight: 0.05
neftune_noise_alpha: 5
max_length: 8192
per_device_train_batch_size: 1
gradient_accumulation_steps: 4
truncation_strategy: split
use_peft: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.0
use_rslora: true
lora_target_modules:
- q_a_proj
- q_b_proj
- kv_a_proj_with_mqa
- kv_b_proj
- o_proj
- shared_expert.gate_proj
- shared_expert.up_proj
- shared_expert.down_proj
- mlp.gate_proj
- mlp.up_proj
- mlp.down_proj
model_init_kwargs:
  trust_remote_code: true
  torch_dtype: bfloat16
trust_remote_code: true
optim: adamw_torch_fused
learning_rate: 2.0e-05
lr_scheduler_type: cosine
warmup_ratio: 0.03
weight_decay: 0.01
max_grad_norm: 1.0
num_train_epochs: 2
logging_steps: 1
disable_tqdm: false
saves_per_epoch: 4
eval_strategy: 'no'
save_total_limit: 3
report_to: wandb
run_name: glm47-sonic-springdragon
```

</details>
<details>
<summary>Data config</summary>

```yaml
datasets:
- path: rpDungeon/some-revised-datasets
  data_files: springdragon_processed.jsonl
  type: text
  columns:
  - text
truncation_strategy: split
shuffle_datasets: true
shuffle_combined: true
shuffle_seed: 42
eval_split: 0
split_seed: 42
assistant_only_loss: false
```

</details>
### Framework versions

- PEFT: 0.18.1
- Loft: 0.1.0
- Transformers: 5.3.0
- PyTorch: 2.9.1
- Datasets: 4.6.1
- Tokenizers: 0.22.2