	---
	library_name: peft
	license: apache-2.0
	base_model: PocketDoc/Dans-PersonalityEngine-V1.1.0-12b
	tags:
	- generated_from_trainer
	datasets:
	- Personamaxx-VN.json
	- NewEden/LIMARP-Complexity
	- NewEden/PIPPA-Mega-Filtered
	- NewEden/OpenCAI-ShareGPT
	- NewEden/Creative_Writing-Complexity
	- NewEden/Light-Novels-Roleplay-Logs-Books-Oh-My-duplicate-turns-removed
	- prosemaxx-adventure-failuremaxx.json
	- NewEden/Books-V2-ShareGPT
	- NewEden/Deepseek-V3-RP-Filtered
	- NewEden/BlueSky-10K-Complexity
	- NewEden/Final-Alpindale-LNs-ShareGPT
	- NewEden/DeepseekRP-Filtered
	- NewEden/RP-logs-V2-Experimental
	- anthracite-org/kalo_opus_misc_240827
	- anthracite-org/kalo_misc_part2
	- NewEden/vanilla-backrooms-claude-sharegpt
	- NewEden/Storium-Prefixed-Clean
	model-index:
	- name: output/Francois-V2
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
	<details><summary>See axolotl config</summary>

	axolotl version: `0.8.0`
	```yaml
	base_model: PocketDoc/Dans-PersonalityEngine-V1.1.0-12b
	## Liger+CCE
	plugins:
	- axolotl.integrations.liger.LigerPlugin
	# - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
	liger_rope: true
	liger_rms_norm: true
	liger_layer_norm: true
	liger_glu_activation: true
	liger_fused_linear_cross_entropy: false
	#cut_cross_entropy: false

	load_in_8bit: false
	load_in_4bit: false
	strict: false

	datasets:
	- path: Personamaxx-VN.json
	  type: dan-chat-advanced
	- path: NewEden/LIMARP-Complexity
	  type: dan-chat-advanced
	- path: NewEden/PIPPA-Mega-Filtered
	  type: dan-chat-advanced
	- path: NewEden/OpenCAI-ShareGPT
	  type: dan-chat-advanced
	- path: NewEden/Creative_Writing-Complexity
	  type: dan-chat-advanced
	- path: NewEden/Light-Novels-Roleplay-Logs-Books-Oh-My-duplicate-turns-removed
	  type: dan-chat-advanced
	- path: prosemaxx-adventure-failuremaxx.json
	  type: dan-chat-advanced
	- path: NewEden/Books-V2-ShareGPT
	  type: dan-chat-advanced
	- path: NewEden/Deepseek-V3-RP-Filtered
	  type: dan-chat-advanced
	- path: NewEden/BlueSky-10K-Complexity
	  type: dan-chat-advanced
	- path: NewEden/Final-Alpindale-LNs-ShareGPT
	  type: dan-chat-advanced
	- path: NewEden/DeepseekRP-Filtered
	  type: dan-chat-advanced
	- path: NewEden/RP-logs-V2-Experimental
	  type: dan-chat-advanced
	- path: anthracite-org/kalo_opus_misc_240827
	  type: dan-chat-advanced
	- path: anthracite-org/kalo_misc_part2
	  type: dan-chat-advanced
	- path: NewEden/vanilla-backrooms-claude-sharegpt
	  type: dan-chat-advanced
	- path: NewEden/Storium-Prefixed-Clean
	  type: dan-chat-advanced


	## LoRA, so the base model's capabilities are not degraded
	adapter: lora
	lora_model_dir:
	lora_r: 128
	lora_alpha: 16
	lora_dropout: 0.05
	peft_use_rslora: true
	lora_target_modules:
	- gate_proj
	- down_proj
	- up_proj
	- q_proj
	- v_proj
	- k_proj
	- o_proj

	#lora_modules_to_save:
	# - embed_tokens
	# - lm_head
	shuffle_merged_datasets: true
	dataset_prepared_path: prepared_data
	output_dir: ./output/Francois-V2


	## Ctx Length
	sequence_len: 16384
	sample_packing: true
	eval_sample_packing: false
	pad_to_sequence_len: false
	#batch_flattening: true

	#torch_compile: auto # Optional[Union[Literal["auto"], bool]]
	#torch_compile_backend: # Optional[str]
	## Wandb
	wandb_project: Francois
	wandb_entity:
	wandb_watch:
	wandb_name: v3
	wandb_log_model:

	## Hparams
	gradient_accumulation_steps: 2
	micro_batch_size: 2
	num_epochs: 4
	optimizer: paged_ademamix_8bit
	lr_scheduler: cosine
	learning_rate: 3e-5
	max_grad_norm: 0.0001
	weight_decay: 0.02
	warmup_steps: 40

	train_on_inputs: false
	group_by_length: false
	bf16: auto
	fp16:
	tf32: false

	## Unsloth is broken, Use grad-ckpting.
	gradient_checkpointing: true
	early_stopping_patience:
	#resume_from_checkpoint: /home/ubuntu/Mango/axolotl/outputs/checkpoint-1088
	local_rank:
	logging_steps: 1
	xformers_attention: False
	flash_attention: True
	s2_attention:


	## Evals
	val_set_size: 0.0025
	evals_per_epoch: 4
	eval_table_size:
	eval_max_new_tokens: 128
	saves_per_epoch: 2
	debug:
	# Multi-GPU
	deepspeed: ./deepspeed_configs/zero2.json
	fsdp:
	fsdp_config:
	special_tokens:
	  pad_token: <pad>

	```

	</details><br>
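The config above combines `lora_r: 128`, `lora_alpha: 16`, and `peft_use_rslora: true`. As a rough illustration of what that combination implies, the sketch below compares the conventional LoRA scaling factor (`alpha / r`) with the rank-stabilized one (`alpha / sqrt(r)`) that PEFT documents for its `use_rslora` option. The helper name `lora_scaling` is hypothetical and is not part of PEFT or Axolotl.

```python
import math


def lora_scaling(alpha: float, r: int, use_rslora: bool) -> float:
    """Return the multiplier applied to the low-rank update B @ A.

    Hypothetical helper for illustration only.
    """
    if use_rslora:
        # Rank-stabilized LoRA: scale by alpha / sqrt(r).
        return alpha / math.sqrt(r)
    # Conventional LoRA: scale by alpha / r.
    return alpha / r


R, ALPHA = 128, 16  # values taken from the config above

standard = lora_scaling(ALPHA, R, use_rslora=False)
rank_stabilized = lora_scaling(ALPHA, R, use_rslora=True)

print(f"standard LoRA scaling: {standard:.4f}")
print(f"rsLoRA scaling:        {rank_stabilized:.4f}")
```

With a rank this high, the rank-stabilized factor is roughly eleven times larger than the conventional one, which is worth keeping in mind when comparing `lora_alpha` values across configs.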

	# output/Francois-V2

	This model is a fine-tuned version of [PocketDoc/Dans-PersonalityEngine-V1.1.0-12b](https://huggingface.co/PocketDoc/Dans-PersonalityEngine-V1.1.0-12b) on the following datasets:

	- Personamaxx-VN.json
	- NewEden/LIMARP-Complexity
	- NewEden/PIPPA-Mega-Filtered
	- NewEden/OpenCAI-ShareGPT
	- NewEden/Creative_Writing-Complexity
	- NewEden/Light-Novels-Roleplay-Logs-Books-Oh-My-duplicate-turns-removed
	- prosemaxx-adventure-failuremaxx.json
	- NewEden/Books-V2-ShareGPT
	- NewEden/Deepseek-V3-RP-Filtered
	- NewEden/BlueSky-10K-Complexity
	- NewEden/Final-Alpindale-LNs-ShareGPT
	- NewEden/DeepseekRP-Filtered
	- NewEden/RP-logs-V2-Experimental
	- anthracite-org/kalo_opus_misc_240827
	- anthracite-org/kalo_misc_part2
	- NewEden/vanilla-backrooms-claude-sharegpt
	- NewEden/Storium-Prefixed-Clean

	It achieves the following results on the evaluation set:
	- Loss: 2.1779
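
The evaluation loss can be turned into a perplexity for easier comparison. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated explicitly in this card.

```python
import math

# Final evaluation loss reported above.
eval_loss = 2.1779

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"eval perplexity ≈ {perplexity:.2f}")
```

Note that the figure is only meaningful relative to other runs evaluated on the same held-out split with the same tokenizer.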

	## Model description

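	A LoRA adapter (rank 128, alpha 16, rank-stabilized scaling, dropout 0.05) for [PocketDoc/Dans-PersonalityEngine-V1.1.0-12b](https://huggingface.co/PocketDoc/Dans-PersonalityEngine-V1.1.0-12b), trained with Axolotl 0.8.0 and PEFT. The adapter targets the attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) and the MLP projections (`gate_proj`, `up_proj`, `down_proj`).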

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

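	Training used the 17 datasets listed in the Axolotl config, each loaded with the `dan-chat-advanced` prompt type, merged and shuffled (`shuffle_merged_datasets: true`), and packed into sequences of up to 16384 tokens (`sample_packing: true`). A 0.25% split (`val_set_size: 0.0025`) was held out for evaluation.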

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 3e-05
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 8
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 32
	- total_eval_batch_size: 16
	- optimizer: paged_ademamix_8bit (no additional optimizer arguments)
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 40
	- num_epochs: 4.0
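
The total batch sizes in the list follow from the per-device values. The short sketch below reproduces that arithmetic; the tokens-per-step figure is only an upper bound, assuming every packed sequence is filled to the configured `sequence_len` of 16384.

```python
# Values taken from the hyperparameter list and the Axolotl config.
micro_batch_size = 2             # train_batch_size per device
eval_batch_size = 2              # per device
gradient_accumulation_steps = 2
num_devices = 8
sequence_len = 16384

# Sequences contributing to a single optimizer step.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices

# Evaluation does not accumulate gradients, so only the devices multiply.
total_eval_batch_size = eval_batch_size * num_devices

# Upper bound on tokens per optimizer step, assuming fully packed sequences.
max_tokens_per_step = total_train_batch_size * sequence_len

print(total_train_batch_size, total_eval_batch_size, max_tokens_per_step)
```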

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss |
	|:-------------:|:------:|:----:|:---------------:|
	| 1.731         | 0.0023 | 1    | 2.4143          |
	| 1.451         | 0.2506 | 109  | 2.3014          |
	| 1.4026        | 0.5011 | 218  | 2.2824          |
	| 1.6573        | 0.7517 | 327  | 2.2581          |
	| 1.587         | 1.0023 | 436  | 2.2424          |
	| 1.2928        | 1.2529 | 545  | 2.2229          |
	| 1.4023        | 1.5034 | 654  | 2.2034          |
	| 1.6312        | 1.7540 | 763  | 2.1959          |
	| 1.3044        | 2.0046 | 872  | 2.1909          |
	| 1.4984        | 2.2552 | 981  | 2.1876          |
	| 1.3767        | 2.5057 | 1090 | 2.1840          |
	| 1.3972        | 2.7563 | 1199 | 2.1812          |
	| 1.3663        | 3.0069 | 1308 | 2.1792          |
	| 1.4958        | 3.2575 | 1417 | 2.1785          |
	| 1.4214        | 3.5080 | 1526 | 2.1784          |
	| 1.4001        | 3.7586 | 1635 | 2.1779          |
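
The step and epoch columns can be used to estimate where each evaluation fell on the learning-rate schedule. The sketch below is an approximation under stated assumptions: the steps-per-epoch value is inferred from the last table row, the total step count (4 epochs times that value) is an assumption rather than a logged number, and the function `lr_at` is a hypothetical helper that mirrors the usual linear-warmup-plus-cosine-decay shape with a peak of 3e-5 and 40 warmup steps.

```python
import math

PEAK_LR = 3e-5
WARMUP_STEPS = 40
NUM_EPOCHS = 4

# Inferred from the last table row (step 1635 at epoch 3.7586).
STEPS_PER_EPOCH = round(1635 / 3.7586)
# Assumption: the run ends after NUM_EPOCHS full epochs.
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH


def lr_at(step: int) -> float:
    """Approximate learning rate at an optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (1, 40, 436, 872, 1308, 1635):
    print(f"step {step:>4}: lr ≈ {lr_at(step):.3e}")
```

Under these assumptions the learning rate is already close to zero by the last few evaluations, which is consistent with the validation loss flattening out after epoch 3.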


	### Framework versions

	- PEFT 0.15.1
	- Transformers 4.51.3
	- Pytorch 2.6.0+cu124
	- Datasets 3.5.0
	- Tokenizers 0.21.1
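
Since the card lists `library_name: peft`, the adapter would typically be applied on top of the base model at load time. The following is a minimal sketch, assuming the framework versions above are installed. The function name `load_adapter` and its `adapter_path` argument are hypothetical placeholders, because this card does not state the adapter's repository id. Loading the tokenizer from the base model is likewise an assumption.

```python
BASE_MODEL = "PocketDoc/Dans-PersonalityEngine-V1.1.0-12b"


def load_adapter(adapter_path: str):
    """Load the base model and attach the LoRA adapter stored at `adapter_path`.

    `adapter_path` is a placeholder for a local directory or Hub repo id
    containing the adapter weights; it is not given in this card.
    """
    # Local imports so this module can be imported without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the base model's tokenizer is appropriate for the adapter.
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)

    # Wrap the base model with the trained LoRA weights.
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()
    return model, tokenizer
```

A call such as `model, tokenizer = load_adapter("path/to/adapter")` would then return a model ready for generation, where the path is again a placeholder. A 12B model in bfloat16 needs roughly 24 GB of memory for the weights alone.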