---
base_model: zai-org/GLM-4.7-Flash
library_name: peft
model_name: output-2
tags:
- base_model:adapter:zai-org/GLM-4.7-Flash
- lora
- sft
- transformers
- trl
licence: license
pipeline_tag: text-generation
---

# output-2

This model is a LoRA fine-tuned version of [zai-org/GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash), trained with supervised fine-tuning (SFT).

**W&B run:** [https://wandb.ai/cooawoo-personal/huggingface/runs/ay5ml51v](https://wandb.ai/cooawoo-personal/huggingface/runs/ay5ml51v)
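
As a usage sketch (not part of the training pipeline), the adapter can be attached to the base model with `peft`. The `adapter_id` below is an assumption — point it at this repo id or the local `output-2` directory — and the exact `from_pretrained` keyword arguments can vary across Transformers versions:

```python
def load_finetuned(adapter_id: str = "output-2",
                   base_id: str = "zai-org/GLM-4.7-Flash"):
    """Load the base model in bf16 and attach the LoRA adapter.

    Imports are kept inside the function so this sketch can be defined
    even where `torch`/`transformers`/`peft` are not installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,  # matches the bf16 training precision
        trust_remote_code=True,
    )
    # Attach the adapter weights on top of the frozen base model.
    model = PeftModel.from_pretrained(base, adapter_id)
    return model, tokenizer
```

To merge the adapter into the base weights for standalone serving, `model.merge_and_unload()` can be called on the returned `PeftModel`.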
## Training procedure

### Hyperparameters
| Parameter | Value |
|-----------|-------|
| Learning rate | `2e-05` |
| LR scheduler | `cosine` |
| Per-device batch size | 1 |
| Gradient accumulation steps | 4 |
| Effective batch size | 4 |
| Epochs | 2 |
| Max sequence length | 8192 |
| Optimizer | `adamw_torch_fused` |
| Weight decay | 0.01 |
| Warmup ratio | 0.03 |
| Max gradient norm | 1.0 |
| Precision | bf16 |
| Gradient checkpointing | yes |
| Loss type | nll |
| Chunked cross-entropy | yes |
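The schedule arithmetic behind the table can be sketched in a few lines (assuming a single GPU, which the card does not state; with `truncation_strategy: split`, the raw sample count only approximates the number of optimizer steps):

```python
import math

per_device_batch = 1
grad_accum = 4
num_gpus = 1          # assumption: the card does not state the GPU count

# Effective batch size = per-device batch * accumulation steps * GPUs.
effective_batch = per_device_batch * grad_accum * num_gpus

samples = 2473        # from the dataset statistics table
epochs = 2
steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = steps_per_epoch * epochs

# warmup_ratio = 0.03 of total optimizer steps.
warmup_steps = int(0.03 * total_steps)

print(effective_batch, total_steps, warmup_steps)
```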
### LoRA configuration

| Parameter | Value |
|-----------|-------|
| Rank (r) | 32 |
| Alpha | 16 |
| Target modules | `kv_a_proj_with_mqa`, `kv_b_proj`, `mlp.down_proj`, `mlp.gate_proj`, `mlp.up_proj`, `o_proj`, `q_a_proj`, `q_b_proj`, `shared_expert.down_proj`, `shared_expert.gate_proj`, `shared_expert.up_proj` |
| rsLoRA | yes |
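With rsLoRA enabled, the adapter output is scaled by `alpha / sqrt(r)` instead of the classic `alpha / r`, which keeps the update magnitude stable at higher ranks. For this run's values:

```python
import math

r, alpha = 32, 16

# Classic LoRA scaling: alpha / r.
standard_scale = alpha / r            # 16 / 32 = 0.5

# rsLoRA (rank-stabilized) scaling: alpha / sqrt(r).
rslora_scale = alpha / math.sqrt(r)   # 16 / sqrt(32) ≈ 2.83

print(standard_scale, rslora_scale)
```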
### Dataset statistics

| Dataset | Samples | Total tokens | Trainable tokens |
|---------|--------:|-------------:|-----------------:|
| rpDungeon/some-revised-datasets/springdragon_processed.jsonl | 2,473 | 5,421,492 | 5,421,492 |
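A quick sanity check on the table: every token is trainable, which is consistent with `assistant_only_loss: false` in the data config below, and the average sample length fits comfortably under the 8192 max sequence length:

```python
samples = 2473
total_tokens = 5_421_492
trainable_tokens = 5_421_492

# All tokens contribute to the loss (no assistant-only masking).
all_trainable = trainable_tokens == total_tokens

# Average tokens per sample (integer part).
avg_tokens = total_tokens // samples

print(all_trainable, avg_tokens)
```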
<details>
<summary>Training config</summary>

```yaml
model_name_or_path: zai-org/GLM-4.7-Flash
data_config: data.yaml
prepared_dataset: prepared
output_dir: output-2
attn_implementation: flash_attention_2
bf16: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
use_cce: true
padding_free: false
dataloader_num_workers: 4
dataloader_pin_memory: true
aux_loss_top_prob_weight: 0.05
neftune_noise_alpha: 5
max_length: 8192
per_device_train_batch_size: 1
gradient_accumulation_steps: 4
truncation_strategy: split
use_peft: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.0
use_rslora: true
lora_target_modules:
- q_a_proj
- q_b_proj
- kv_a_proj_with_mqa
- kv_b_proj
- o_proj
- shared_expert.gate_proj
- shared_expert.up_proj
- shared_expert.down_proj
- mlp.gate_proj
- mlp.up_proj
- mlp.down_proj
model_init_kwargs:
  trust_remote_code: true
  torch_dtype: bfloat16
trust_remote_code: true
optim: adamw_torch_fused
learning_rate: 2.0e-05
lr_scheduler_type: cosine
warmup_ratio: 0.03
weight_decay: 0.01
max_grad_norm: 1.0
num_train_epochs: 2
logging_steps: 1
disable_tqdm: false
saves_per_epoch: 4
eval_strategy: 'no'
save_total_limit: 3
report_to: wandb
run_name: glm47-sonic-springdragon
```

</details>
<details>
<summary>Data config</summary>

```yaml
datasets:
- path: rpDungeon/some-revised-datasets
  data_files: springdragon_processed.jsonl
  type: text
  columns:
  - text
truncation_strategy: split
shuffle_datasets: true
shuffle_combined: true
shuffle_seed: 42
eval_split: 0
split_seed: 42
assistant_only_loss: false
```

</details>
### Framework versions

- PEFT: 0.18.1
- Loft: 0.1.0
- Transformers: 5.3.0
- PyTorch: 2.9.1
- Datasets: 4.6.1
- Tokenizers: 0.22.2