Built with Axolotl

See axolotl config (axolotl version: 0.4.1):

```yaml
adapter: lora
base_model: Maykeye/TinyLLama-v0
bf16: true
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - 40f84f4610a855b2_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/40f84f4610a855b2_train_data.json
  type:
    field_instruction: question
    field_output: answer
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
device_map:
  '': 0,1,2,3,4,5,6,7
early_stopping_patience: 2
eval_max_new_tokens: 128
eval_steps: 400
eval_table_size: null
flash_attention: true
gradient_accumulation_steps: 4
gradient_checkpointing: true
group_by_length: false
hub_model_id: Alphatao/bfd2d379-a1a8-4f73-a0b4-63117bb0c96f
hub_repo: null
hub_strategy: null
hub_token: null
learning_rate: 0.0002
load_best_model_at_end: true
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 32
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 16
lora_target_linear: true
lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- down_proj
- up_proj
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: 121364
micro_batch_size: 2
mlflow_experiment_name: /tmp/40f84f4610a855b2_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 10
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 400
sequence_len: 2048
special_tokens:
  pad_token: </s>
strict: false
tf32: true
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.026089497411921857
wandb_entity: null
wandb_mode: online
wandb_name: 56a29ccc-0b06-416d-a18e-0e2c0cd86f0e
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 56a29ccc-0b06-416d-a18e-0e2c0cd86f0e
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null
```
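The `datasets.type` block above maps each JSON record onto a plain prompt/completion pair: the `question` field becomes the prompt via the `'{instruction}'` format string (with an empty system prompt), and the `answer` field becomes the completion. A minimal sketch of that mapping follows; the sample record is invented, and this is not Axolotl's actual implementation:

```python
# Sketch of the prompt construction implied by the config's `datasets.type`
# section. Only the field names (`question`, `answer`) come from the config;
# the record content is an invented example.
record = {"question": "What is the capital of France?", "answer": "Paris."}

system = "{system}".format(system="")  # system_format with empty system_prompt
prompt = system + "{instruction}".format(instruction=record["question"])  # format: '{instruction}'
completion = record["answer"]  # field_output: answer

# With train_on_inputs: false, the loss is computed only on `completion`
# tokens, not on the prompt.
print(repr(prompt), "->", repr(completion))
```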

bfd2d379-a1a8-4f73-a0b4-63117bb0c96f

This model is a LoRA fine-tuned version of Maykeye/TinyLLama-v0 on the 40f84f4610a855b2_train_data.json dataset. It achieves the following results on the evaluation set:

  • Loss: 6.9194

Model description

This repository holds a LoRA adapter (r=16, alpha=32, dropout 0.05) for Maykeye/TinyLLama-v0, trained with Axolotl 0.4.1 using the configuration shown above. The adapter targets the q_proj, k_proj, v_proj, o_proj, down_proj, and up_proj linear layers; a usage sketch follows.
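A minimal inference sketch, assuming the adapter is published at the `hub_model_id` from the config; the prompt is an invented example, and this is illustrative rather than an official usage snippet:

```python
# Load the base model, then attach the LoRA adapter with peft.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Maykeye/TinyLLama-v0")
tokenizer = AutoTokenizer.from_pretrained("Maykeye/TinyLLama-v0")
model = PeftModel.from_pretrained(base, "Alphatao/bfd2d379-a1a8-4f73-a0b4-63117bb0c96f")

inputs = tokenizer("What is the capital of France?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)  # eval_max_new_tokens: 128
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```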

Intended uses & limitations

More information needed

Training and evaluation data

Training used 40f84f4610a855b2_train_data.json, a custom-format JSON dataset in which the question field supplies the instruction and the answer field supplies the target output. About 2.6% of the records (val_set_size: 0.0261) were held out as the evaluation set; a sketch of the split follows.
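A rough reconstruction of that split using the `datasets` library; the file path and `val_set_size` come from the config, `seed=42` matches the reported training seed, but the exact split mechanics inside Axolotl may differ:

```python
# Load the local JSON file and hold out ~2.6% for evaluation.
from datasets import load_dataset

ds = load_dataset(
    "json",
    data_files="/workspace/input_data/40f84f4610a855b2_train_data.json",
)["train"]
splits = ds.train_test_split(test_size=0.026089497411921857, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
print(f"train: {len(train_ds)}  eval: {len(eval_ds)}")
```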

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for an equivalent peft/transformers setup):

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: 8-bit AdamW (adamw_bnb_8bit via bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • training_steps: 121364
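For readers who want to express these settings outside Axolotl, the sketch below reconstructs them with `peft` and `transformers` directly. Axolotl builds something equivalent internally, so treat this as an illustration under the assumption of a single-process run, not the generated training code:

```python
# Illustrative reconstruction of the LoRA and trainer settings above.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "down_proj", "up_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="miner_id_24",
    learning_rate=2e-4,
    per_device_train_batch_size=2,   # micro_batch_size
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,   # total_train_batch_size: 2 * 4 = 8
    max_steps=121364,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    optim="adamw_bnb_8bit",
    bf16=True,
    tf32=True,
    max_grad_norm=1.0,
    weight_decay=0.0,
    gradient_checkpointing=True,
    eval_strategy="steps",           # evaluate every eval_steps
    eval_steps=400,
    save_strategy="steps",
    save_steps=400,
    logging_steps=1,
    load_best_model_at_end=True,
    seed=42,
)
```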

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 12.0619 | 0.0000 | 1 | 11.9615 |
| 7.9221 | 0.0171 | 400 | 7.8516 |
| 7.5425 | 0.0343 | 800 | 7.6125 |
| 7.5271 | 0.0514 | 1200 | 7.5215 |
| 7.6511 | 0.0686 | 1600 | 7.4503 |
| 7.4575 | 0.0857 | 2000 | 7.4023 |
| 7.4126 | 0.1029 | 2400 | 7.3567 |
| 7.2177 | 0.1200 | 2800 | 7.3185 |
| 7.3211 | 0.1372 | 3200 | 7.2852 |
| 7.2918 | 0.1543 | 3600 | 7.2609 |
| 7.2682 | 0.1714 | 4000 | 7.2424 |
| 6.9925 | 0.1886 | 4400 | 7.2183 |
| 7.1942 | 0.2057 | 4800 | 7.2031 |
| 7.1606 | 0.2229 | 5200 | 7.1858 |
| 7.3641 | 0.2400 | 5600 | 7.1721 |
| 7.4306 | 0.2572 | 6000 | 7.1570 |
| 7.1406 | 0.2743 | 6400 | 7.1427 |
| 7.3856 | 0.2915 | 6800 | 7.1338 |
| 7.3319 | 0.3086 | 7200 | 7.1230 |
| 7.2052 | 0.3257 | 7600 | 7.1124 |
| 7.1784 | 0.3429 | 8000 | 7.1049 |
| 6.9061 | 0.3600 | 8400 | 7.0983 |
| 6.8446 | 0.3772 | 8800 | 7.0885 |
| 7.2836 | 0.3943 | 9200 | 7.0825 |
| 7.2025 | 0.4115 | 9600 | 7.0736 |
| 6.9665 | 0.4286 | 10000 | 7.0693 |
| 7.0319 | 0.4458 | 10400 | 7.0638 |
| 7.1117 | 0.4629 | 10800 | 7.0562 |
| 7.1637 | 0.4800 | 11200 | 7.0510 |
| 7.0831 | 0.4972 | 11600 | 7.0450 |
| 7.1105 | 0.5143 | 12000 | 7.0402 |
| 7.0615 | 0.5315 | 12400 | 7.0354 |
| 6.9541 | 0.5486 | 12800 | 7.0310 |
| 6.9338 | 0.5658 | 13200 | 7.0293 |
| 6.8745 | 0.5829 | 13600 | 7.0230 |
| 6.9395 | 0.6001 | 14000 | 7.0177 |
| 6.991 | 0.6172 | 14400 | 7.0163 |
| 6.0832 | 0.6343 | 14800 | 7.0112 |
| 7.0355 | 0.6515 | 15200 | 7.0085 |
| 7.0765 | 0.6686 | 15600 | 7.0036 |
| 7.0429 | 0.6858 | 16000 | 7.0015 |
| 7.0843 | 0.7029 | 16400 | 6.9986 |
| 7.0766 | 0.7201 | 16800 | 6.9955 |
| 7.1227 | 0.7372 | 17200 | 6.9926 |
| 6.8547 | 0.7544 | 17600 | 6.9899 |
| 6.7269 | 0.7715 | 18000 | 6.9884 |
| 7.0857 | 0.7887 | 18400 | 6.9865 |
| 6.9734 | 0.8058 | 18800 | 6.9847 |
| 7.0499 | 0.8229 | 19200 | 6.9813 |
| 7.1258 | 0.8401 | 19600 | 6.9774 |
| 7.1636 | 0.8572 | 20000 | 6.9753 |
| 7.2572 | 0.8744 | 20400 | 6.9742 |
| 6.9789 | 0.8915 | 20800 | 6.9716 |
| 7.2132 | 0.9087 | 21200 | 6.9704 |
| 7.1362 | 0.9258 | 21600 | 6.9687 |
| 6.9767 | 0.9430 | 22000 | 6.9657 |
| 6.9853 | 0.9601 | 22400 | 6.9644 |
| 7.0186 | 0.9772 | 22800 | 6.9627 |
| 7.0654 | 0.9944 | 23200 | 6.9604 |
| 7.1397 | 1.0115 | 23600 | 6.9602 |
| 6.995 | 1.0287 | 24000 | 6.9587 |
| 7.2728 | 1.0458 | 24400 | 6.9546 |
| 6.915 | 1.0630 | 24800 | 6.9572 |
| 6.9481 | 1.0801 | 25200 | 6.9535 |
| 6.9489 | 1.0973 | 25600 | 6.9499 |
| 7.0888 | 1.1144 | 26000 | 6.9492 |
| 7.1006 | 1.1315 | 26400 | 6.9482 |
| 7.0525 | 1.1487 | 26800 | 6.9465 |
| 7.0576 | 1.1658 | 27200 | 6.9432 |
| 6.9836 | 1.1830 | 27600 | 6.9440 |
| 6.9761 | 1.2001 | 28000 | 6.9411 |
| 6.8321 | 1.2173 | 28400 | 6.9403 |
| 6.9887 | 1.2344 | 28800 | 6.9388 |
| 6.9359 | 1.2516 | 29200 | 6.9389 |
| 7.0867 | 1.2687 | 29600 | 6.9355 |
| 7.0808 | 1.2858 | 30000 | 6.9345 |
| 7.1399 | 1.3030 | 30400 | 6.9346 |
| 6.9547 | 1.3201 | 30800 | 6.9321 |
| 6.911 | 1.3373 | 31200 | 6.9298 |
| 7.1562 | 1.3544 | 31600 | 6.9300 |
| 7.0566 | 1.3716 | 32000 | 6.9283 |
| 7.1362 | 1.3887 | 32400 | 6.9291 |
| 7.026 | 1.4059 | 32800 | 6.9247 |
| 7.0724 | 1.4230 | 33200 | 6.9254 |
| 6.5286 | 1.4401 | 33600 | 6.9241 |
| 6.7828 | 1.4573 | 34000 | 6.9224 |
| 6.9472 | 1.4744 | 34400 | 6.9230 |
| 5.4581 | 1.4916 | 34800 | 6.9196 |
| 7.0774 | 1.5087 | 35200 | 6.9188 |
| 6.8963 | 1.5259 | 35600 | 6.9183 |
| 6.9506 | 1.5430 | 36000 | 6.9198 |
| 6.8666 | 1.5602 | 36400 | 6.9194 |

Training halted at step 36,400, well short of the configured 121,364 steps, consistent with early_stopping_patience: 2: validation loss did not improve for two consecutive evaluations after the best value of 6.9183 at step 35,600.

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
