PocketDoc committed on
Commit 2748234 · verified · 1 Parent(s): 32b0e65

Model save

Files changed (1): README.md ADDED (+212 -0)
---
library_name: transformers
base_model: Dans-DiscountModels/mistral-7b-v0.3-DanChat
tags:
- axolotl
- generated_from_trainer
datasets:
- Dans-DiscountModels/dpe-130l-m-7b-32k
model-index:
- name: 7b-m-dans-personalityengine-v1.3.0L-TestArticle-1
  results: []
---

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.10.0.dev0`
```yaml
base_model: Dans-DiscountModels/mistral-7b-v0.3-DanChat
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

trust_remote_code:

# wandb configuration
wandb_project: 7b-m-dans-personalityengine
wandb_watch:

wandb_run_id: V1.3.0L-1-8 # V{Version}-{Run Number}-{Attempt Number}
wandb_log_model:

# push checkpoints to hub
hub_model_id: Dans-DiscountModels/7b-m-dans-personalityengine-v1.3.0L-TestArticle-1
# how to push checkpoints to hub
# https://huggingface.co/docs/transformers/v4.31.0/en/main_classes/trainer#transformers.TrainingArguments.hub_strategy
hub_strategy: "every_save"
# Whether to use hf `use_auth_token` for loading datasets. Useful for fetching private datasets
# Required to be true when used in combination with `push_dataset_to_hub`
hf_use_auth_token: true

# where to save the finished model to
output_dir: ./7b-m-dans-personalityengine

# where to save the dataset to
dataset_prepared_path: ./7b-m-dans-personalityengine-data

save_safetensors: true

# dataset settings (local or huggingface repo)
datasets:
  - path: Dans-DiscountModels/dpe-130l-m-7b-32k
    split: train
    ds_type: parquet
    type:

test_datasets:
  - path: Dans-DiscountModels/dpe-130l-m-7b-32k
    split: validation
    ds_type: parquet
    type:

plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: false
cut_cross_entropy: true

load_in_8bit: false
load_in_4bit: false
strict: false

sequence_len: 32768

sample_packing: true
eval_sample_packing: true

pad_to_sequence_len: true

gradient_checkpointing: true
# gradient_checkpointing_kwargs:
#   use_reentrant: false

gradient_accumulation_steps: 1
micro_batch_size: 4

num_epochs: 2

optimizer: ademamix_8bit
optim_args: "beta1=0.9,beta2=0.999,beta3=0.999,alpha=5"

lr_scheduler: rex
learning_rate: 0.000000012
cosine_min_lr_ratio: 0.1

# weight_decay: 0.03
max_grad_norm: 0.001

train_on_inputs: false
group_by_length: false

bf16: true
fp16: false
tf32: false

early_stopping_patience:

resume_from_checkpoint:
auto_resume_from_checkpoints: false

local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.05

evals_per_epoch: 10
eval_table_size:
eval_max_new_tokens:

saves_per_epoch: 2
save_total_limit: 1

debug: false

deepspeed: deepspeed_configs/zero3_bf16.json

fsdp:
fsdp_config:

special_tokens:
```

</details><br>
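
With axolotl `0.10.0.dev0` installed, a run like this one should be reproducible from the YAML above, assuming it is saved locally (e.g. as `config.yaml`, a hypothetical filename) and a matching 8-GPU DeepSpeed ZeRO-3 environment is available, via `axolotl train config.yaml` (or the older entry point `accelerate launch -m axolotl.cli.train config.yaml`).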

# 7b-m-dans-personalityengine-v1.3.0L-TestArticle-1

This model is a fine-tuned version of [Dans-DiscountModels/mistral-7b-v0.3-DanChat](https://huggingface.co/Dans-DiscountModels/mistral-7b-v0.3-DanChat) on the Dans-DiscountModels/dpe-130l-m-7b-32k dataset.
It achieves the following results on the evaluation set:
- Loss: 1.5911
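
A minimal inference sketch using `transformers` is shown below. It assumes the repository ships a chat template inherited from the DanChat-tuned base model; the message content and generation settings are illustrative, not prescribed by this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Dans-DiscountModels/7b-m-dans-personalityengine-v1.3.0L-TestArticle-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)  # bf16 matches the training precision listed below

messages = [{"role": "user", "content": "Introduce yourself."}]
# apply_chat_template assumes a chat template is present in the tokenizer config
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```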

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

The model was trained on the `train` split and evaluated on the `validation` split of Dans-DiscountModels/dpe-130l-m-7b-32k, with samples packed to a sequence length of 32,768 tokens (per the config above).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (the effective batch size is derived in the sketch below):
- learning_rate: 1.2e-08
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: ademamix_8bit with args `beta1=0.9,beta2=0.999,beta3=0.999,alpha=5`
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 47
- num_epochs: 2.0
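
The effective batch size follows directly from the parallelism settings: 4 (micro batch) × 8 (devices) × 1 (gradient accumulation) = 32 sequences per optimizer step. A minimal sketch of that bookkeeping, where the steps-per-epoch figure is inferred from the results table below rather than reported directly:

```python
# Effective batch size from the settings above.
micro_batch_size = 4       # per-device train batch
num_devices = 8            # multi-GPU
grad_accum_steps = 1
total_train_batch_size = micro_batch_size * num_devices * grad_accum_steps
assert total_train_batch_size == 32

# Warmup steps from warmup_ratio = 0.05 in the config. Steps per epoch is
# inferred from the results table (step 912 at epoch 1.9281) - an assumption,
# since the card does not report it directly.
steps_per_epoch = round(912 / 1.9281)     # ~473
total_steps = steps_per_epoch * 2         # num_epochs = 2 -> ~946
warmup_steps = round(0.05 * total_steps)  # ~47, matching the value above
print(total_train_batch_size, total_steps, warmup_steps)
```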

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.4427        | 0.0021 | 1    | 1.5639          |
| 1.5781        | 0.1015 | 48   | 1.5631          |
| 1.462         | 0.2030 | 96   | 1.5590          |
| 1.6565        | 0.3044 | 144  | 1.5540          |
| 1.454         | 0.4059 | 192  | 1.5498          |
| 1.5414        | 0.5074 | 240  | 1.5471          |
| 1.6084        | 0.6089 | 288  | 1.5459          |
| 1.5315        | 0.7104 | 336  | 1.5457          |
| 1.4646        | 0.8118 | 384  | 1.5465          |
| 1.5506        | 0.9133 | 432  | 1.5482          |
| 1.5083        | 1.0148 | 480  | 1.5506          |
| 1.4986        | 1.1163 | 528  | 1.5538          |
| 1.4976        | 1.2178 | 576  | 1.5576          |
| 1.6139        | 1.3192 | 624  | 1.5618          |
| 1.6305        | 1.4207 | 672  | 1.5666          |
| 1.5522        | 1.5222 | 720  | 1.5717          |
| 1.5846        | 1.6237 | 768  | 1.5771          |
| 1.6093        | 1.7252 | 816  | 1.5824          |
| 1.6282        | 1.8266 | 864  | 1.5873          |
| 1.5984        | 1.9281 | 912  | 1.5911          |
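
Note that validation loss bottoms out at 1.5457 around step 336 (epoch 0.71) and climbs steadily through the second epoch, so the final checkpoint (loss 1.5911) is not the best one by this metric. A small sketch that recovers the best step from the table:

```python
# Validation losses copied from the table above (step -> loss).
val_loss = {
    1: 1.5639, 48: 1.5631, 96: 1.5590, 144: 1.5540, 192: 1.5498,
    240: 1.5471, 288: 1.5459, 336: 1.5457, 384: 1.5465, 432: 1.5482,
    480: 1.5506, 528: 1.5538, 576: 1.5576, 624: 1.5618, 672: 1.5666,
    720: 1.5717, 768: 1.5771, 816: 1.5824, 864: 1.5873, 912: 1.5911,
}
best_step = min(val_loss, key=val_loss.get)
print(best_step, val_loss[best_step])  # -> 336 1.5457
```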

### Framework versions

- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.1
- Tokenizers 0.21.1
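
To match the training environment, these pins can be checked at runtime; a minimal sketch:

```python
# Print installed versions next to the ones this card was generated with.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.51.3",
    "torch": "2.7.0+cu126",
    "datasets": "3.5.1",
    "tokenizers": "0.21.1",
}
for name, module in [("transformers", transformers), ("torch", torch),
                     ("datasets", datasets), ("tokenizers", tokenizers)]:
    print(f"{name}: installed {module.__version__}, card lists {expected[name]}")
```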