Built with Axolotl

Axolotl config (axolotl version 0.4.1):

adapter: lora
auto_resume_from_checkpoints: true
base_model: fxmarty/really-tiny-falcon-testing
bf16: auto
chat_template: llama3
dataset_prepared_path: null
dataset_processes: 12
datasets:
- data_files:
  - 415dedee96180e11_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/415dedee96180e11_train_data.json
  type:
    field_instruction: instruction
    field_output: response
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 3
eval_max_new_tokens: 128
eval_steps: 2000
eval_table_size: null
evals_per_epoch: null
flash_attention: false
fp16: false
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 1
gradient_checkpointing: false
group_by_length: false
hub_model_id: error577/4c67a4f2-ac21-4b53-8a01-59d3c0c31f4f
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0002
load_in_4bit: false
load_in_8bit: true
local_rank: null
logging_steps: 1
lora_alpha: 16
lora_dropout: 0.0
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 8
lora_target_linear: true
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: null
micro_batch_size: 8
mlflow_experiment_name: /tmp/415dedee96180e11_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 10
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 2000
sequence_len: 512
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.01
wandb_entity: null
wandb_mode: online
wandb_name: 5941f67d-1b56-4ae0-b76d-52a8681c66f9
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 5941f67d-1b56-4ae0-b76d-52a8681c66f9
warmup_steps: 30
weight_decay: 0.0
xformers_attention: null
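
The LoRA settings above map roughly onto the following PEFT configuration. This is a hedged sketch of an equivalent setup, not the code Axolotl runs internally; in particular, target_modules="all-linear" is an assumption standing in for lora_target_linear: true.

from peft import LoraConfig

# Approximate PEFT equivalent of the adapter block in the YAML above.
lora_config = LoraConfig(
    r=8,               # lora_r
    lora_alpha=16,     # lora_alpha
    lora_dropout=0.0,  # lora_dropout
    target_modules="all-linear",  # stands in for lora_target_linear: true (assumption)
    task_type="CAUSAL_LM",
)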

4c67a4f2-ac21-4b53-8a01-59d3c0c31f4f

This model is a fine-tuned version of fxmarty/really-tiny-falcon-testing on the 415dedee96180e11_train_data.json dataset listed in the configuration above. It achieves the following results on the evaluation set:

  • Loss: 10.9373

Model description

More information needed

Intended uses & limitations

More information needed
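
A minimal inference sketch, assuming the adapter can be attached to the base model with PEFT's standard PeftModel API and that the hub_model_id from the configuration is the published adapter repository:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "fxmarty/really-tiny-falcon-testing"
adapter_id = "error577/4c67a4f2-ac21-4b53-8a01-59d3c0c31f4f"  # hub_model_id from the config

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA adapter

inputs = tokenizer("Write a short greeting.", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Note that the base model is a tiny testing checkpoint, so generations are not expected to be meaningful.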

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (8-bit, bitsandbytes) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 30
  • num_epochs: 10
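
These values correspond roughly to the following Hugging Face TrainingArguments. This is a hedged sketch of equivalent settings, not the exact arguments Axolotl builds internally:

from transformers import TrainingArguments

# Approximate mapping of the hyperparameters listed above and the YAML config (assumed equivalence).
args = TrainingArguments(
    output_dir="miner_id_24",
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_bnb_8bit",       # 8-bit AdamW via bitsandbytes
    lr_scheduler_type="cosine",
    warmup_steps=30,
    num_train_epochs=10,
    weight_decay=0.0,
    max_grad_norm=1.0,
    logging_steps=1,
    eval_strategy="steps",
    eval_steps=2000,
    save_steps=2000,
)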

Training results

Training Loss Epoch Step Validation Loss
11.0892 0.0001 1 11.0858
10.9584 0.1083 2000 10.9866
10.9595 0.2166 4000 10.9773
10.9648 0.3250 6000 10.9714
10.9636 0.4333 8000 10.9668
10.9655 0.5416 10000 10.9627
10.9675 0.6499 12000 10.9596
10.9637 0.7582 14000 10.9575
10.954 0.8666 16000 10.9556
10.9561 0.9749 18000 10.9539
10.9531 1.0832 20000 10.9530
10.949 1.1915 22000 10.9520
10.9415 1.2998 24000 10.9507
10.9581 1.4081 26000 10.9497
10.9587 1.5165 28000 10.9491
10.9509 1.6248 30000 10.9481
10.9518 1.7331 32000 10.9474
10.9661 1.8414 34000 10.9475
10.935 1.9497 36000 10.9469
10.9494 2.0581 38000 10.9462
10.9367 2.1664 40000 10.9458
10.9474 2.2747 42000 10.9457
10.9661 2.3830 44000 10.9456
10.9579 2.4913 46000 10.9449
10.9417 2.5997 48000 10.9443
10.9537 2.7080 50000 10.9440
10.9467 2.8163 52000 10.9440
10.9549 2.9246 54000 10.9432
10.953 3.0329 56000 10.9430
10.9254 3.1412 58000 10.9429
10.9436 3.2496 60000 10.9424
10.9463 3.3579 62000 10.9423
10.9296 3.4662 64000 10.9420
10.9296 3.5745 66000 10.9416
10.934 3.6828 68000 10.9417
10.9502 3.7912 70000 10.9417
10.9546 3.8995 72000 10.9412
10.924 4.0078 74000 10.9407
10.9337 4.1161 76000 10.9408
10.9296 4.2244 78000 10.9407
10.9361 4.3328 80000 10.9404
10.9444 4.4411 82000 10.9404
10.9434 4.5494 84000 10.9399
10.9261 4.6577 86000 10.9398
10.9328 4.7660 88000 10.9399
10.9527 4.8744 90000 10.9397
10.931 4.9827 92000 10.9394
10.9588 5.0910 94000 10.9393
10.9565 5.1993 96000 10.9393
10.9629 5.3076 98000 10.9391
10.9279 5.4159 100000 10.9391
10.9422 5.5243 102000 10.9391
10.9475 5.6326 104000 10.9387
10.9235 5.7409 106000 10.9387
10.946 5.8492 108000 10.9385
10.9304 5.9575 110000 10.9385
10.9217 6.0659 112000 10.9384
10.9307 6.1742 114000 10.9383
10.95 6.2825 116000 10.9384
10.9492 6.3908 118000 10.9381
10.9282 6.4991 120000 10.9381
10.946 6.6075 122000 10.9381
10.9419 6.7158 124000 10.9379
10.9331 6.8241 126000 10.9379
10.9557 6.9324 128000 10.9379
10.9447 7.0407 130000 10.9379
10.9327 7.1490 132000 10.9377
10.9411 7.2574 134000 10.9377
10.9434 7.3657 136000 10.9377
10.9354 7.4740 138000 10.9375
10.9305 7.5823 140000 10.9376
10.9364 7.6906 142000 10.9374
10.9203 7.7990 144000 10.9375
10.9314 7.9073 146000 10.9375
10.9374 8.0156 148000 10.9373
10.9251 8.1239 150000 10.9374
10.933 8.2322 152000 10.9374
10.935 8.3406 154000 10.9373

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
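
A quick way to check that a local environment matches the versions above (a simple sanity check, not part of the original card):

import datasets, peft, tokenizers, torch, transformers

# Print installed versions to compare against the list above.
for name, mod in [("PEFT", peft), ("Transformers", transformers),
                  ("PyTorch", torch), ("Datasets", datasets), ("Tokenizers", tokenizers)]:
    print(f"{name}: {mod.__version__}")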