---
library_name: peft
base_model: hardlyworking/Broth-12B
tags:
- axolotl
- generated_from_trainer
datasets:
- PocketDoc/Dans-MemoryCore-CoreCurriculum-Small
- NewEden/Kalo-Opus-Instruct-22k-Refusal-Murdered
- Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
- Nitral-AI/Reasoning-1shot_ShareGPT
- Nitral-AI/GU_Instruct-ShareGPT
- Nitral-AI/Medical_Instruct-ShareGPT
- AquaV/Resistance-Sharegpt
- AquaV/US-Army-Survival-Sharegpt
- Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
model-index:
- name: Noodles-12B
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.8.0`
```yaml
## model
base_model: hardlyworking/Broth-12B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

## upload
hub_model_id: hardlyworking/Noodles-12B
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

## qlora COPE
load_in_8bit: false
load_in_4bit: false
strict: false

## data
datasets:
  - path: PocketDoc/Dans-MemoryCore-CoreCurriculum-Small
    type: dan-chat-advanced
  - path: NewEden/Kalo-Opus-Instruct-22k-Refusal-Murdered
    type: dan-chat-advanced
  - path: Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
    type: dan-chat-advanced
  - path: Nitral-AI/Reasoning-1shot_ShareGPT
    type: dan-chat-advanced
  - path: Nitral-AI/GU_Instruct-ShareGPT
    type: dan-chat-advanced
  - path: Nitral-AI/Medical_Instruct-ShareGPT
    type: dan-chat-advanced
  - path: AquaV/Resistance-Sharegpt
    type: dan-chat-advanced
  - path: AquaV/US-Army-Survival-Sharegpt
    type: dan-chat-advanced
  - path: Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
    type: dan-chat-advanced
shuffle_merged_datasets: true
dataset_prepared_path: dataset_prepared
val_set_size: 0.001
output_dir: outputs/out

## LIGER & CCE
plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: false
cut_cross_entropy: false

## CTX settings
sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

## Lora
adapter: lora
lora_model_dir:
lora_r: 128
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
peft_use_rslora: true
lora_modules_to_save:
  - embed_tokens
  - lm_head

## WandB
wandb_project: JoeyBoy
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

## evals
evals_per_epoch: 8
eval_table_size:
eval_max_new_tokens: 128

## hoe params
gradient_accumulation_steps: 2
micro_batch_size: 4
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 40
saves_per_epoch: 2
debug:

## for ademiamix
deepspeed: ./deepspeed_configs/zero3_bf16.json

## for adamw
#deepspeed: ./deepspeed_configs/zero3_bf16.json

weight_decay: 0.01
fsdp:
fsdp_config:
special_tokens:
  pad_token:
```

</details><br>

# Noodles-12B

This model is a fine-tuned version of [hardlyworking/Broth-12B](https://huggingface.co/hardlyworking/Broth-12B) on the PocketDoc/Dans-MemoryCore-CoreCurriculum-Small, NewEden/Kalo-Opus-Instruct-22k-Refusal-Murdered, Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned, Nitral-AI/Reasoning-1shot_ShareGPT, Nitral-AI/GU_Instruct-ShareGPT, Nitral-AI/Medical_Instruct-ShareGPT, AquaV/Resistance-Sharegpt, AquaV/US-Army-Survival-Sharegpt, and Gryphe/Sonnet3.5-SlimOrcaDedupCleaned datasets.
It achieves the following results on the evaluation set:
- Loss: 0.7255

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 32 (see the derivation sketch below)
- total_eval_batch_size: 16
- optimizer: PAGED_ADAMW_8BIT with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 40
- num_epochs: 2.0

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.0128        | 0.0013 | 1    | 1.2742          |
| 0.8925        | 0.1256 | 94   | 0.8576          |
| 0.8065        | 0.2512 | 188  | 0.8102          |
| 0.782         | 0.3768 | 282  | 0.7881          |
| 0.8135        | 0.5023 | 376  | 0.7755          |
| 0.7877        | 0.6279 | 470  | 0.7601          |
| 0.7955        | 0.7535 | 564  | 0.7516          |
| 0.758         | 0.8791 | 658  | 0.7444          |
| 0.7362        | 1.0040 | 752  | 0.7402          |
| 0.7053        | 1.1296 | 846  | 0.7354          |
| 0.6439        | 1.2552 | 940  | 0.7326          |
| 0.7445        | 1.3808 | 1034 | 0.7298          |
| 0.5843        | 1.5063 | 1128 | 0.7288          |
| 0.6571        | 1.6319 | 1222 | 0.7268          |
| 0.652         | 1.7575 | 1316 | 0.7257          |
| 0.6872        | 1.8831 | 1410 | 0.7255          |

### Framework versions

- PEFT 0.15.1
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
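
### Effective batch size and LoRA scaling

For reference, the figures in the hyperparameter list above combine as follows. This is a small illustrative Python sketch, not part of the training code: the values are copied from the axolotl config and the training summary, and the rsLoRA scaling formula is the one PEFT applies when `peft_use_rslora` is enabled.

```python
import math

# Values copied from the axolotl config / training summary above.
micro_batch_size = 4             # per-device train batch size
gradient_accumulation_steps = 2
num_devices = 4                  # multi-GPU run reported in the hyperparameters
sequence_len = 8192

# Effective (total) train batch size per optimizer step.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)                  # 32, matching total_train_batch_size above

# Upper bound on tokens seen per optimizer step (sample packing fills sequences).
print(total_train_batch_size * sequence_len)   # 262144

# LoRA scaling: with peft_use_rslora enabled, updates are scaled by
# alpha / sqrt(r) instead of the classic alpha / r.
lora_r, lora_alpha = 128, 16
print(lora_alpha / math.sqrt(lora_r))          # ~1.414 (rsLoRA)
print(lora_alpha / lora_r)                     # 0.125 (standard LoRA, for comparison)
```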
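
## Usage example

Since the card declares `library_name: peft`, this repository holds a LoRA adapter rather than a merged checkpoint, so it is normally loaded on top of the base model. Below is a minimal, illustrative sketch: the repo IDs come from this card, while the dtype, device placement, and example prompt are assumptions rather than part of the training setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "hardlyworking/Broth-12B"       # base model named in this card
adapter_id = "hardlyworking/Noodles-12B"  # hub_model_id from the axolotl config

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Illustrative prompt only; for chat-formatted use you would normally apply
# the tokenizer's chat template instead of passing a raw string.
prompt = "List three ways to purify water in a survival situation."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```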