Built with Axolotl

See axolotl config

axolotl version: 0.4.1

```yaml
adapter: lora
base_model: unsloth/Qwen2-0.5B
bf16: true
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - 40527198fd7180da_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/40527198fd7180da_train_data.json
  type:
    field_input: bodies
    field_instruction: decl
    field_output: desc
    format: '{instruction} {input}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 2
early_stopping_threshold: 0.0001
eval_max_new_tokens: 128
eval_steps: 100
eval_table_size: null
flash_attention: true
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 8
gradient_checkpointing: true
group_by_length: false
hub_model_id: romainnn/5245836f-40d3-4e05-a231-8462c8c2baff
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0002
load_best_model_at_end: true
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 32
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 16
lora_target_linear: true
lora_target_modules:
- q_proj
- k_proj
- v_proj
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: 9983
micro_batch_size: 4
mlflow_experiment_name: /tmp/40527198fd7180da_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 2
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 100
sequence_len: 1024
strict: false
tf32: true
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.00917724190842581
wandb_entity: null
wandb_mode: online
wandb_name: a88f056a-2840-48ca-ada7-c0aa6accc543
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: a88f056a-2840-48ca-ada7-c0aa6accc543
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null
```
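For reference, the trained adapter can be loaded on top of the base model with PEFT. The following is a minimal, untested sketch: the repo ids come from the config above, the `decl`/`bodies` values are made-up placeholders, and it applies the config's raw `'{instruction} {input}'` format string directly rather than the full Axolotl chat-template pipeline.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "unsloth/Qwen2-0.5B"                                # base_model
adapter_id = "romainnn/5245836f-40d3-4e05-a231-8462c8c2baff"  # hub_model_id

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)
model = PeftModel.from_pretrained(model, adapter_id)

# The dataset config maps field_instruction=decl and field_input=bodies
# into format '{instruction} {input}'; these example values are placeholders.
decl = "def add(a, b):"
bodies = "return a + b"
prompt = f"{decl} {bodies}"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)  # eval_max_new_tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```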

# 5245836f-40d3-4e05-a231-8462c8c2baff

This model is a fine-tuned version of [unsloth/Qwen2-0.5B](https://huggingface.co/unsloth/Qwen2-0.5B) on the `40527198fd7180da_train_data.json` dataset described in the config above. It achieves the following results on the evaluation set:

- Loss: 1.8156
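For scale: if this loss is the mean token-level cross-entropy in nats (the usual convention for causal-LM evaluation loss), it corresponds to a perplexity of roughly

$$\mathrm{ppl} = e^{\mathcal{L}} = e^{1.8156} \approx 6.14.$$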

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32 (= micro_batch_size 4 × gradient_accumulation_steps 8)
- optimizer: adamw_bnb_8bit (8-bit AdamW from bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments (a sketch of this setup follows the list)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- training_steps: 9983
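Outside Axolotl, this optimizer/scheduler pairing can be approximated with bitsandbytes and transformers. A minimal sketch, assuming the model object is already built (the LoRA wrapping is omitted for brevity):

```python
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

# Placeholder model; the actual run trains a LoRA adapter on this base.
model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen2-0.5B", trust_remote_code=True
)

# adamw_bnb_8bit with the hyperparameters listed above.
optimizer = bnb.optim.AdamW8bit(
    model.parameters(),
    lr=2e-4,             # learning_rate
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.0,
)

# Cosine decay with 10 warmup steps over the 9983 configured steps.
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,
    num_training_steps=9983,
)
```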

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.5279 | 0.0001 | 1    | 3.4427 |
| 2.0594 | 0.0059 | 100  | 2.2386 |
| 2.2409 | 0.0119 | 200  | 2.2071 |
| 1.9584 | 0.0178 | 300  | 2.1853 |
| 1.9313 | 0.0237 | 400  | 2.1659 |
| 1.8711 | 0.0296 | 500  | 2.1537 |
| 1.8007 | 0.0356 | 600  | 2.1435 |
| 2.1542 | 0.0415 | 700  | 2.1296 |
| 2.0665 | 0.0474 | 800  | 2.1189 |
| 2.3934 | 0.0534 | 900  | 2.1166 |
| 2.3511 | 0.0593 | 1000 | 2.1009 |
| 2.0545 | 0.0652 | 1100 | 2.0989 |
| 2.2765 | 0.0711 | 1200 | 2.0876 |
| 1.9124 | 0.0771 | 1300 | 2.0830 |
| 1.9408 | 0.0830 | 1400 | 2.0767 |
| 2.1107 | 0.0889 | 1500 | 2.0676 |
| 2.055  | 0.0948 | 1600 | 2.0619 |
| 1.7226 | 0.1008 | 1700 | 2.0545 |
| 1.8976 | 0.1067 | 1800 | 2.0473 |
| 1.9798 | 0.1126 | 1900 | 2.0380 |
| 1.9494 | 0.1186 | 2000 | 2.0348 |
| 2.0733 | 0.1245 | 2100 | 2.0308 |
| 1.9957 | 0.1304 | 2200 | 2.0251 |
| 2.0361 | 0.1363 | 2300 | 2.0206 |
| 2.2313 | 0.1423 | 2400 | 2.0176 |
| 1.8886 | 0.1482 | 2500 | 2.0082 |
| 2.1336 | 0.1541 | 2600 | 2.0078 |
| 1.8516 | 0.1601 | 2700 | 2.0024 |
| 1.9572 | 0.1660 | 2800 | 1.9952 |
| 1.6813 | 0.1719 | 2900 | 1.9922 |
| 1.7759 | 0.1778 | 3000 | 1.9856 |
| 1.8491 | 0.1838 | 3100 | 1.9809 |
| 2.1544 | 0.1897 | 3200 | 1.9768 |
| 2.0521 | 0.1956 | 3300 | 1.9671 |
| 1.6342 | 0.2015 | 3400 | 1.9630 |
| 2.0625 | 0.2075 | 3500 | 1.9614 |
| 1.9992 | 0.2134 | 3600 | 1.9548 |
| 1.7514 | 0.2193 | 3700 | 1.9516 |
| 1.8403 | 0.2253 | 3800 | 1.9478 |
| 1.9456 | 0.2312 | 3900 | 1.9440 |
| 1.4284 | 0.2371 | 4000 | 1.9384 |
| 2.1788 | 0.2430 | 4100 | 1.9353 |
| 1.7432 | 0.2490 | 4200 | 1.9302 |
| 1.7911 | 0.2549 | 4300 | 1.9272 |
| 1.94   | 0.2608 | 4400 | 1.9212 |
| 1.9083 | 0.2668 | 4500 | 1.9176 |
| 1.5717 | 0.2727 | 4600 | 1.9148 |
| 1.9185 | 0.2786 | 4700 | 1.9123 |
| 1.8033 | 0.2845 | 4800 | 1.9077 |
| 1.9307 | 0.2905 | 4900 | 1.9032 |
| 2.3501 | 0.2964 | 5000 | 1.8999 |
| 1.9069 | 0.3023 | 5100 | 1.8955 |
| 2.0346 | 0.3082 | 5200 | 1.8940 |
| 1.7066 | 0.3142 | 5300 | 1.8910 |
| 1.888  | 0.3201 | 5400 | 1.8858 |
| 1.9475 | 0.3260 | 5500 | 1.8833 |
| 1.6037 | 0.3320 | 5600 | 1.8814 |
| 1.7273 | 0.3379 | 5700 | 1.8777 |
| 1.8923 | 0.3438 | 5800 | 1.8740 |
| 2.3106 | 0.3497 | 5900 | 1.8707 |
| 1.3758 | 0.3557 | 6000 | 1.8684 |
| 1.9701 | 0.3616 | 6100 | 1.8651 |
| 1.854  | 0.3675 | 6200 | 1.8617 |
| 1.5884 | 0.3735 | 6300 | 1.8588 |
| 1.7065 | 0.3794 | 6400 | 1.8564 |
| 1.5681 | 0.3853 | 6500 | 1.8529 |
| 1.9251 | 0.3912 | 6600 | 1.8504 |
| 1.6854 | 0.3972 | 6700 | 1.8487 |
| 1.9832 | 0.4031 | 6800 | 1.8456 |
| 1.2331 | 0.4090 | 6900 | 1.8452 |
| 1.8641 | 0.4149 | 7000 | 1.8421 |
| 1.8737 | 0.4209 | 7100 | 1.8402 |
| 1.6779 | 0.4268 | 7200 | 1.8377 |
| 1.3901 | 0.4327 | 7300 | 1.8359 |
| 1.4336 | 0.4387 | 7400 | 1.8336 |
| 1.7731 | 0.4446 | 7500 | 1.8318 |
| 1.7203 | 0.4505 | 7600 | 1.8304 |
| 1.8094 | 0.4564 | 7700 | 1.8292 |
| 1.8295 | 0.4624 | 7800 | 1.8277 |
| 1.7489 | 0.4683 | 7900 | 1.8271 |
| 1.4063 | 0.4742 | 8000 | 1.8257 |
| 1.6528 | 0.4802 | 8100 | 1.8239 |
| 1.9509 | 0.4861 | 8200 | 1.8234 |
| 1.7355 | 0.4920 | 8300 | 1.8218 |
| 1.6242 | 0.4979 | 8400 | 1.8211 |
| 1.5363 | 0.5039 | 8500 | 1.8196 |
| 1.7172 | 0.5098 | 8600 | 1.8189 |
| 1.5967 | 0.5157 | 8700 | 1.8182 |
| 1.8118 | 0.5216 | 8800 | 1.8180 |
| 1.8206 | 0.5276 | 8900 | 1.8171 |
| 1.7382 | 0.5335 | 9000 | 1.8167 |
| 1.5683 | 0.5394 | 9100 | 1.8165 |
| 1.8624 | 0.5454 | 9200 | 1.8163 |
| 1.9627 | 0.5513 | 9300 | 1.8160 |
| 1.6127 | 0.5572 | 9400 | 1.8158 |
| 1.9056 | 0.5631 | 9500 | 1.8156 |
| 2.1929 | 0.5691 | 9600 | 1.8156 |
| 1.6813 | 0.5750 | 9700 | 1.8157 |
| 1.9613 | 0.5809 | 9800 | 1.8156 |
| 1.5377 | 0.5869 | 9900 | 1.8156 |

The log ends at step 9900 of the 9983 configured training steps; given `eval_steps: 100`, `early_stopping_patience: 2`, and the 1e-4 threshold, training appears to have been stopped early once the validation loss plateaued at 1.8156.

### Framework versions

- PEFT 0.13.2
- Transformers 4.46.0
- Pytorch 2.5.0+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1