---
base_model: zai-org/GLM-4.7-Flash
library_name: peft
model_name: output-2
tags:
- base_model:adapter:zai-org/GLM-4.7-Flash
- lora
- sft
- transformers
- trl
licence: license
pipeline_tag: text-generation
---

# output-2

This model is a fine-tuned version of [zai-org/GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash).

**W&B run:** [https://wandb.ai/cooawoo-personal/huggingface/runs/ay5ml51v](https://wandb.ai/cooawoo-personal/huggingface/runs/ay5ml51v)

## Training procedure

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Learning rate | `2e-05` |
| LR scheduler | `cosine` |
| Per-device batch size | 1 |
| Gradient accumulation | 4 |
| Effective batch size | 4 |
| Epochs | 2 |
| Max sequence length | 8192 |
| Optimizer | `adamw_torch_fused` |
| Weight decay | 0.01 |
| Warmup ratio | 0.03 |
| Max gradient norm | 1.0 |
| Precision | bf16 |
| Gradient checkpointing | yes |
| Loss type | nll |
| Chunked cross-entropy | yes |

### LoRA configuration

| Parameter | Value |
|-----------|-------|
| Rank (r) | 32 |
| Alpha | 16 |
| Target modules | kv_a_proj_with_mqa, kv_b_proj, mlp.down_proj, mlp.gate_proj, mlp.up_proj, o_proj, q_a_proj, q_b_proj, shared_expert.down_proj, shared_expert.gate_proj, shared_expert.up_proj |
| rsLoRA | yes |

### Dataset statistics

| Dataset | Samples | Total tokens | Trainable tokens |
|---------|--------:|-------------:|-----------------:|
| rpDungeon/some-revised-datasets/springdragon_processed.jsonl | 2,473 | 5,421,492 | 5,421,492 |
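A few quantities implied by the tables above can be derived with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and `num_devices = 1` is an assumption (the card does not state the device count, but it is the value that reproduces the reported effective batch size of 4). The rsLoRA scaling rule `alpha / sqrt(r)` is the rank-stabilized variant; plain LoRA uses `alpha / r`.

```python
import math


def lora_scaling(r: int, alpha: float, use_rslora: bool) -> float:
    """Scale applied to the LoRA update: alpha/sqrt(r) for rsLoRA, alpha/r otherwise."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int) -> int:
    """Number of sequences contributing to one optimizer step."""
    return per_device * grad_accum * num_devices


# Values taken from the tables above.
rslora_scale = lora_scaling(r=32, alpha=16, use_rslora=True)
plain_scale = lora_scaling(r=32, alpha=16, use_rslora=False)

# num_devices=1 is an assumption, not stated in the card.
batch = effective_batch_size(per_device=1, grad_accum=4, num_devices=1)

# Mean sample length from the dataset statistics row.
mean_tokens = 5_421_492 / 2_473

print(f"rsLoRA scaling: {rslora_scale:.3f}")   # 2.828
print(f"plain LoRA scaling: {plain_scale}")    # 0.5
print(f"effective batch size: {batch}")        # 4
print(f"mean tokens per sample: {mean_tokens:.0f}")
```

With rank 32 and alpha 16, rsLoRA therefore scales the adapter update by roughly 2.83 rather than the 0.5 that the same alpha would give without it, which is worth keeping in mind when comparing this run's alpha to non-rsLoRA recipes.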
### Training config

```yaml
model_name_or_path: zai-org/GLM-4.7-Flash
data_config: data.yaml
prepared_dataset: prepared
output_dir: output-2
attn_implementation: flash_attention_2
bf16: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
use_cce: true
padding_free: false
dataloader_num_workers: 4
dataloader_pin_memory: true
aux_loss_top_prob_weight: 0.05
neftune_noise_alpha: 5
max_length: 8192
per_device_train_batch_size: 1
gradient_accumulation_steps: 4
truncation_strategy: split
use_peft: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.0
use_rslora: true
lora_target_modules:
- q_a_proj
- q_b_proj
- kv_a_proj_with_mqa
- kv_b_proj
- o_proj
- shared_expert.gate_proj
- shared_expert.up_proj
- shared_expert.down_proj
- mlp.gate_proj
- mlp.up_proj
- mlp.down_proj
model_init_kwargs:
  trust_remote_code: true
  torch_dtype: bfloat16
trust_remote_code: true
optim: adamw_torch_fused
learning_rate: 2.0e-05
lr_scheduler_type: cosine
warmup_ratio: 0.03
weight_decay: 0.01
max_grad_norm: 1.0
num_train_epochs: 2
logging_steps: 1
disable_tqdm: false
saves_per_epoch: 4
eval_strategy: 'no'
save_total_limit: 3
report_to: wandb
run_name: glm47-sonic-springdragon
```
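The scheduler settings in the training config (`lr_scheduler_type: cosine`, `warmup_ratio: 0.03`, `learning_rate: 2.0e-05`) describe a short linear warmup followed by a half-cosine decay to zero. The pure-Python sketch below shows that shape using the common linear-warmup-plus-cosine formula; it is not taken from the training code, the function name is hypothetical, and `total_steps = 1000` is a placeholder because the card does not state the number of optimizer steps.

```python
import math


def cosine_lr(step: int, total_steps: int,
              base_lr: float = 2e-5, warmup_ratio: float = 0.03) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # placeholder; the real step count is not given in the card
schedule = [cosine_lr(s, total_steps) for s in range(total_steps + 1)]

print(f"start: {schedule[0]:.2e}")
print(f"peak:  {max(schedule):.2e}")
print(f"end:   {schedule[-1]:.2e}")
```

The peak equals the configured learning rate and is reached at the end of warmup, after which the rate decays smoothly for the remainder of the run.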
### Data config

```yaml
datasets:
- path: rpDungeon/some-revised-datasets
  data_files: springdragon_processed.jsonl
  type: text
  columns:
  - text
  truncation_strategy: split
shuffle_datasets: true
shuffle_combined: true
shuffle_seed: 42
eval_split: 0
split_seed: 42
assistant_only_loss: false
```
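The data config sets `truncation_strategy: split` alongside a maximum length of 8192 tokens. One plausible reading, which is an assumption here and may differ from what the training framework actually does, is that an over-length sample is cut into consecutive chunks of at most `max_length` tokens instead of being truncated or dropped. The helper name `split_into_chunks` is hypothetical.

```python
from typing import List


def split_into_chunks(token_ids: List[int], max_length: int = 8192) -> List[List[int]]:
    """Cut a token sequence into consecutive chunks of at most `max_length` tokens.

    Assumed interpretation of `truncation_strategy: split`; no tokens are discarded.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    return [token_ids[i:i + max_length] for i in range(0, len(token_ids), max_length)]


# A 20,000-token sample becomes two full chunks and one shorter remainder.
chunks = split_into_chunks(list(range(20_000)))
print([len(c) for c in chunks])  # [8192, 8192, 3616]
```

Under this reading every token is kept, which would be consistent with `assistant_only_loss: false` and with the dataset statistics reporting the same count for total and trainable tokens.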
### Framework versions

- PEFT: 0.18.1
- Loft: 0.1.0
- Transformers: 5.3.0
- PyTorch: 2.9.1
- Datasets: 4.6.1
- Tokenizers: 0.22.2
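With versions close to those listed above, the adapter could be attached to the base model roughly as follows. This is a sketch under assumptions rather than a tested recipe: `adapter_path="output-2"` is the local `output_dir` from the training config (the card does not give a Hub repository id for the adapter), and `load_adapter` is a hypothetical helper name. `trust_remote_code=True` and bfloat16 mirror the training config.

```python
def load_adapter(adapter_path: str = "output-2",
                 base_model: str = "zai-org/GLM-4.7-Flash"):
    """Load the base model in bf16 and attach the LoRA adapter from `adapter_path`."""
    # Imported lazily so the heavy dependencies are only needed when called.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        dtype=torch.bfloat16,  # older Transformers releases name this `torch_dtype`
        trust_remote_code=True,
    )
    # Wrap the base model with the trained LoRA weights.
    model = PeftModel.from_pretrained(model, adapter_path)
    model.eval()
    return model, tokenizer
```

Calling `model, tokenizer = load_adapter()` downloads the base weights on first use, so it needs enough disk space and memory for the full base model in bf16.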