---
base_model: zai-org/GLM-4.7-Flash
library_name: peft
model_name: output-2
tags:
- base_model:adapter:zai-org/GLM-4.7-Flash
- lora
- sft
- transformers
- trl
licence: license
pipeline_tag: text-generation
---
# output-2
This model is a fine-tuned version of [zai-org/GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash).
**W&B run:** [https://wandb.ai/cooawoo-personal/huggingface/runs/ay5ml51v](https://wandb.ai/cooawoo-personal/huggingface/runs/ay5ml51v)
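## Quick start

A minimal inference sketch for loading the adapter on top of the base model with `transformers` and `peft`. The adapter path `output-2` is the local training output directory from this run; substitute the Hub repo id if the adapter is published there. `trust_remote_code` and `bfloat16` mirror the training config below.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "zai-org/GLM-4.7-Flash"
adapter_path = "output-2"  # local training output dir, or the published Hub id

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,   # matches model_init_kwargs in the training config
    trust_remote_code=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_path)

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

To merge the adapter into the base weights for deployment, `model.merge_and_unload()` can be called after loading.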
## Training procedure
### Hyperparameters
| Parameter | Value |
|-----------|-------|
| Learning rate | `2e-05` |
| LR scheduler | cosine |
| Per-device batch size | 1 |
| Gradient accumulation | 4 |
| Effective batch size | 4 |
| Epochs | 2 |
| Max sequence length | 8192 |
| Optimizer | adamw_torch_fused |
| Weight decay | 0.01 |
| Warmup ratio | 0.03 |
| Max gradient norm | 1.0 |
| Precision | bf16 |
| Gradient checkpointing | yes |
| Loss type | nll |
| Chunked cross-entropy | yes |
### LoRA configuration
| Parameter | Value |
|-----------|-------|
| Rank (r) | 32 |
| Alpha | 16 |
| Target modules | kv_a_proj_with_mqa, kv_b_proj, mlp.down_proj, mlp.gate_proj, mlp.up_proj, o_proj, q_a_proj, q_b_proj, shared_expert.down_proj, shared_expert.gate_proj, shared_expert.up_proj |
| rsLoRA | yes |
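Note that with rsLoRA enabled, the adapter update is scaled by `alpha / sqrt(r)` rather than the standard `alpha / r`, so r = 32 with alpha = 16 yields an effective scaling of about 2.83 instead of 0.5. A quick check of the arithmetic:

```python
import math

r, alpha = 32, 16

standard_scaling = alpha / r           # classic LoRA: alpha / r
rslora_scaling = alpha / math.sqrt(r)  # rsLoRA: alpha / sqrt(r)

print(standard_scaling)              # 0.5
print(round(rslora_scaling, 3))      # 2.828
```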
### Dataset statistics
| Dataset | Samples | Total tokens | Trainable tokens |
|---------|--------:|-------------:|-----------------:|
| rpDungeon/some-revised-datasets/springdragon_processed.jsonl | 2,473 | 5,421,492 | 5,421,492 |
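From these numbers the step schedule can be estimated. This sketch assumes a single training process and that the `split` truncation strategy leaves the sample count roughly unchanged (the average sample is ~2.2k tokens, well under the 8192 max length):

```python
import math

samples = 2_473
per_device_bs = 1
grad_accum = 4
epochs = 2
warmup_ratio = 0.03

effective_bs = per_device_bs * grad_accum            # 4, assuming one process
steps_per_epoch = math.ceil(samples / effective_bs)  # 619
total_steps = steps_per_epoch * epochs               # 1238
warmup_steps = int(total_steps * warmup_ratio)       # 37

print(steps_per_epoch, total_steps, warmup_steps)
```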
<details>
<summary>Training config</summary>

```yaml
model_name_or_path: zai-org/GLM-4.7-Flash
data_config: data.yaml
prepared_dataset: prepared
output_dir: output-2
attn_implementation: flash_attention_2
bf16: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
use_cce: true
padding_free: false
dataloader_num_workers: 4
dataloader_pin_memory: true
aux_loss_top_prob_weight: 0.05
neftune_noise_alpha: 5
max_length: 8192
per_device_train_batch_size: 1
gradient_accumulation_steps: 4
truncation_strategy: split
use_peft: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.0
use_rslora: true
lora_target_modules:
- q_a_proj
- q_b_proj
- kv_a_proj_with_mqa
- kv_b_proj
- o_proj
- shared_expert.gate_proj
- shared_expert.up_proj
- shared_expert.down_proj
- mlp.gate_proj
- mlp.up_proj
- mlp.down_proj
model_init_kwargs:
  trust_remote_code: true
  torch_dtype: bfloat16
trust_remote_code: true
optim: adamw_torch_fused
learning_rate: 2.0e-05
lr_scheduler_type: cosine
warmup_ratio: 0.03
weight_decay: 0.01
max_grad_norm: 1.0
num_train_epochs: 2
logging_steps: 1
disable_tqdm: false
saves_per_epoch: 4
eval_strategy: 'no'
save_total_limit: 3
report_to: wandb
run_name: glm47-sonic-springdragon
```
</details>
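The config above also enables NEFTune (`neftune_noise_alpha: 5`), which adds uniform noise to the input embeddings during training with magnitude bounded by `alpha / sqrt(L * d)` for sequence length `L` and embedding dimension `d`, following the NEFTune paper. A rough sketch of the scale this implies; the hidden size of 4096 below is a hypothetical value for illustration, not taken from the model config:

```python
import math
import random

def neftune_noise(seq_len, hidden_dim, alpha=5.0, rng=random):
    """Uniform noise in [-eps, eps] with eps = alpha / sqrt(seq_len * hidden_dim),
    per the NEFTune formulation; one value per embedding entry."""
    eps = alpha / math.sqrt(seq_len * hidden_dim)
    return [[rng.uniform(-eps, eps) for _ in range(hidden_dim)]
            for _ in range(seq_len)]

# Noise bound at this run's max_length, with a hypothetical hidden size of 4096:
eps = 5.0 / math.sqrt(8192 * 4096)
print(f"noise bound: {eps:.2e}")  # ≈ 8.63e-04

# Tiny demonstration that every entry respects the bound:
tiny = neftune_noise(4, 8)
assert all(abs(x) <= 5.0 / math.sqrt(4 * 8) for row in tiny for x in row)
```

Longer sequences and wider embeddings therefore receive proportionally smaller per-entry noise, keeping the total perturbation roughly constant.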
<details>
<summary>Data config</summary>

```yaml
datasets:
  - path: rpDungeon/some-revised-datasets
    data_files: springdragon_processed.jsonl
    type: text
    columns:
      - text
truncation_strategy: split
shuffle_datasets: true
shuffle_combined: true
shuffle_seed: 42
eval_split: 0
split_seed: 42
assistant_only_loss: false
```
</details>
### Framework versions
- PEFT 0.18.1
- Loft: 0.1.0
- Transformers: 5.3.0
- Pytorch: 2.9.1
- Datasets: 4.6.1
- Tokenizers: 0.22.2