AiForgeMaster committed on
Commit
3f3fa97
·
verified ·
1 Parent(s): d625300

Model save

Browse files
README.md ADDED
@@ -0,0 +1,169 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ library_name: transformers
3
+ license: apache-2.0
4
+ base_model: AiForgeMaster/Qwen3-4B-P3-TC-RSSFT-1
5
+ tags:
6
+ - axolotl
7
+ - generated_from_trainer
8
+ model-index:
9
+ - name: Qwen3-4B-P3-RSSFT-KE-1
10
+ results: []
11
+ ---
12
+
13
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
14
+ should probably proofread and complete it, then remove this comment. -->
15
+
16
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
17
+ <details><summary>See axolotl config</summary>
18
+
19
+ axolotl version: `0.13.0.dev0`
20
+ ```yaml
21
+ # axolotl train config.yaml
22
+
23
+ # Prevent NCCL timeout
24
+ ddp_timeout: 7200 # 2 hours timeout instead of 10 minutes
25
+
26
+ # Load model from local models directory first, fallback to HuggingFace if not found
27
+ base_model: AiForgeMaster/Qwen3-4B-P3-TC-RSSFT-1 # Local path - will fallback to Qwen/Qwen3-4B if not found locally
28
+ # Automatically upload checkpoint and final model to HF
29
+ hub_model_id: AiForgeMaster/Qwen3-4B-P3-RSSFT-KE-1
30
+
31
+ load_in_8bit: false
32
+ load_in_4bit: false
33
+ strict: false
34
+
35
+ # SFT dataset configuration - using HuggingFace datasets
36
+ datasets:
37
+ - path: AiForgeMaster/KE-2017-2025 # Private HF dataset - requires API key
38
+ type: chat_template
39
+ split: train
40
+ field_messages: messages
41
+ trust_remote_code: false
42
+ # skip: 0 # number of rows of data to skip over from the beginning
43
+
44
+ # Local paths relative to working directory
45
+ dataset_prepared_path: ./data/prepared
46
+ val_set_size: 0.0 # Set to 0 for SFT (no validation split)
47
+ output_dir: ./outputs
48
+
49
+ # Cache directories for HuggingFace downloads (relative to working dir)
50
+ # This ensures models and datasets are downloaded to local directories
51
+ hf_use_auth_token: true # Use HF token for private repos if needed
52
+
53
+ sequence_len: 8192
54
+ sample_packing: false # Standard for SFT
55
+ eval_sample_packing: false # Disable for SFT
56
+
57
+ # WandB configuration - fill in your details
58
+ wandb_project: ngpt-cpt
59
+ wandb_entity: null
60
+ wandb_watch: gradients
61
+ wandb_name: qwen3_4b_p3_rssft_ke_1
62
+ wandb_log_model: end
63
+
64
+ # Batch size configuration (total effective batch size = micro_batch_size * gradient_accumulation_steps * num_gpus)
65
+ # For batch size 8-16: micro_batch_size=2, gradient_accumulation_steps=4 gives effective batch size of 8 per GPU
66
+ gradient_accumulation_steps: 2
67
+ micro_batch_size: 2 # Adjust based on your GPU memory
68
+ optimizer: adamw_torch_fused
69
+ lr_scheduler: cosine
70
+ learning_rate: 2e-5 # Good learning rate for SFT
71
+
72
+ bf16: auto
73
+ tf32: true
74
+
75
+ max_grad_norm: 1.0
76
+
77
+ gradient_checkpointing: true
78
+ gradient_checkpointing_kwargs:
79
+ use_reentrant: false
80
+ logging_steps: 10 # Log every 10 steps
81
+ flash_attention: true
82
+
83
+ warmup_steps: 50 # Good warmup for SFT
84
+ # Checkpoint saving configuration - save every 50 steps
85
+ save_steps: 50
86
+ save_strategy: steps
87
+ save_total_limit: 5 # Keep only 5 most recent checkpoints
88
+ save_only_model: false # Save full checkpoint including optimizer state
89
+
90
+ # Evaluation configuration removed for pure SFT (val_set_size: 0.0)
91
+ # eval_steps: 2000 # Not supported when val_set_size == 0
92
+ # eval_strategy: steps # Not supported when val_set_size == 0
93
+ weight_decay: 0.01 # Good weight decay for SFT
94
+
95
+ # Liger optimizations for memory efficiency and speed
96
+ plugins:
97
+ - axolotl.integrations.liger.LigerPlugin
98
+
99
+ liger_rope: true
100
+ liger_rms_norm: true
101
+ liger_glu_activation: true
102
+ liger_layer_norm: true
103
+ liger_fused_linear_cross_entropy: true
104
+
105
+ # Additional SFT optimizations
106
+ # Enable for first run to validate checkpoint saving works
107
+ save_first_step: true
108
+
109
+ # Memory optimizations
110
+ dataloader_pin_memory: true
111
+ dataloader_num_workers: 4
112
+ remove_unused_columns: true
113
+
114
+ # Advanced training settings for SFT
115
+ # Calculate max_steps for full epoch: dataset_size / (micro_batch_size * gradient_accumulation_steps * num_gpus)
116
+ # max_steps: 175 # Set for one full epoch with your dataset size
117
+ num_epochs: 1
118
+ group_by_length: true # Good for SFT efficiency
119
+ train_on_inputs: true # train on user inputs in SFT
120
+
121
+ # Loss monitoring
122
+ loss_watchdog_threshold: 10.0 # Stop if loss exceeds this value
123
+ loss_watchdog_patience: 3
124
+
125
+ # Garbage collection to manage memory
126
+ gc_steps: 100 # Run garbage collection every 100 steps
127
+ ```
128
+
129
+ </details><br>
130
+
131
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/uskfoundation/ngpt-cpt/runs/oy7n1t61)
132
+ # Qwen3-4B-P3-RSSFT-KE-1
133
+
134
+ This model is a fine-tuned version of [AiForgeMaster/Qwen3-4B-P3-TC-RSSFT-1](https://huggingface.co/AiForgeMaster/Qwen3-4B-P3-TC-RSSFT-1) on the AiForgeMaster/KE-2017-2025 dataset.
135
+
136
+ ## Model description
137
+
138
+ More information needed
139
+
140
+ ## Intended uses & limitations
141
+
142
+ More information needed
143
+
144
+ ## Training and evaluation data
145
+
146
+ More information needed
147
+
148
+ ## Training procedure
149
+
150
+ ### Training hyperparameters
151
+
152
+ The following hyperparameters were used during training:
153
+ - learning_rate: 2e-05
154
+ - train_batch_size: 2
155
+ - eval_batch_size: 2
156
+ - seed: 42
157
+ - gradient_accumulation_steps: 2
158
+ - total_train_batch_size: 4
159
+ - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
160
+ - lr_scheduler_type: cosine
161
+ - lr_scheduler_warmup_steps: 50
162
+ - training_steps: 416
163
+
164
+ ### Framework versions
165
+
166
+ - Transformers 4.56.1
167
+ - Pytorch 2.7.1+cu126
168
+ - Datasets 4.0.0
169
+ - Tokenizers 0.22.0
generation_config.json ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_from_model_config": true,
3
+ "do_sample": true,
4
+ "eos_token_id": [
5
+ 151645
6
+ ],
7
+ "max_length": 40960,
8
+ "pad_token_id": 151643,
9
+ "transformers_version": "4.56.1",
10
+ "use_cache": false
11
+ }
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e557a73d5abf8c91c0c7b7de3f4fec9cae62d69c6b8af012193cf2fa40345cca
3
  size 4967215360
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:32bc206666f45d987d4ab262fffec62797b79db877078b09eb4cc40e408210a5
3
  size 4967215360
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:2451a2580c19125d8fb1a5b809d097f621143f40e12d080d741cd93b25162241
3
  size 3077766632
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3b81685543c894f0548252f24f171b519d5785ac7d453269aeb0ea08128d4d81
3
  size 3077766632