Built with Axolotl

See axolotl config

axolotl version: 0.4.1

```yaml
adapter: lora
base_model: unsloth/Qwen2-0.5B
bf16: true
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - 40527198fd7180da_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/40527198fd7180da_train_data.json
  type:
    field_input: bodies
    field_instruction: decl
    field_output: desc
    format: '{instruction} {input}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 2
early_stopping_threshold: 0.0001
eval_max_new_tokens: 128
eval_steps: 100
eval_table_size: null
flash_attention: true
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 8
gradient_checkpointing: true
group_by_length: false
hub_model_id: romainnn/5245836f-40d3-4e05-a231-8462c8c2baff
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0002
load_best_model_at_end: true
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 32
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 16
lora_target_linear: true
lora_target_modules:
- q_proj
- k_proj
- v_proj
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: 9983
micro_batch_size: 4
mlflow_experiment_name: /tmp/40527198fd7180da_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 2
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 100
sequence_len: 1024
strict: false
tf32: true
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.00917724190842581
wandb_entity: null
wandb_mode: online
wandb_name: a88f056a-2840-48ca-ada7-c0aa6accc543
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: a88f056a-2840-48ca-ada7-c0aa6accc543
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null
```
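For reference, the trained adapter can be loaded on top of the base model with PEFT. The following is a minimal, untested sketch: the repo ids come from the config above, the `decl`/`bodies` values are made-up placeholders, and it applies the config's raw `'{instruction} {input}'` format string directly rather than the full Axolotl chat-template pipeline.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "unsloth/Qwen2-0.5B"                                # base_model
adapter_id = "romainnn/5245836f-40d3-4e05-a231-8462c8c2baff"  # hub_model_id

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)
model = PeftModel.from_pretrained(model, adapter_id)

# The dataset config maps field_instruction=decl and field_input=bodies
# into format '{instruction} {input}'; these example values are placeholders.
decl = "def add(a, b):"
bodies = "return a + b"
prompt = f"{decl} {bodies}"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)  # eval_max_new_tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```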

# 5245836f-40d3-4e05-a231-8462c8c2baff

This model is a fine-tuned version of [unsloth/Qwen2-0.5B](https://huggingface.co/unsloth/Qwen2-0.5B) on the `40527198fd7180da_train_data.json` dataset described in the config above. It achieves the following results on the evaluation set:

- Loss: 1.8156
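For scale: if this loss is the mean token-level cross-entropy in nats (the usual convention for causal-LM evaluation loss), it corresponds to a perplexity of roughly

$$\mathrm{ppl} = e^{\mathcal{L}} = e^{1.8156} \approx 6.14.$$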

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32 (= micro_batch_size 4 × gradient_accumulation_steps 8)
- optimizer: adamw_bnb_8bit (8-bit AdamW from bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments (a sketch of this setup follows the list)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- training_steps: 9983
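Outside Axolotl, this optimizer/scheduler pairing can be approximated with bitsandbytes and transformers. A minimal sketch, assuming the model object is already built (the LoRA wrapping is omitted for brevity):

```python
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

# Placeholder model; the actual run trains a LoRA adapter on this base.
model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen2-0.5B", trust_remote_code=True
)

# adamw_bnb_8bit with the hyperparameters listed above.
optimizer = bnb.optim.AdamW8bit(
    model.parameters(),
    lr=2e-4,             # learning_rate
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.0,
)

# Cosine decay with 10 warmup steps over the 9983 configured steps.
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,
    num_training_steps=9983,
)
```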

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.5279 | 0.0001 | 1    | 3.4427 |
| 2.0594 | 0.0059 | 100  | 2.2386 |
| 2.2409 | 0.0119 | 200  | 2.2071 |
| 1.9584 | 0.0178 | 300  | 2.1853 |
| 1.9313 | 0.0237 | 400  | 2.1659 |
| 1.8711 | 0.0296 | 500  | 2.1537 |
| 1.8007 | 0.0356 | 600  | 2.1435 |
| 2.1542 | 0.0415 | 700  | 2.1296 |
| 2.0665 | 0.0474 | 800  | 2.1189 |
| 2.3934 | 0.0534 | 900  | 2.1166 |
| 2.3511 | 0.0593 | 1000 | 2.1009 |
| 2.0545 | 0.0652 | 1100 | 2.0989 |
| 2.2765 | 0.0711 | 1200 | 2.0876 |
| 1.9124 | 0.0771 | 1300 | 2.0830 |
| 1.9408 | 0.0830 | 1400 | 2.0767 |
| 2.1107 | 0.0889 | 1500 | 2.0676 |
| 2.055  | 0.0948 | 1600 | 2.0619 |
| 1.7226 | 0.1008 | 1700 | 2.0545 |
| 1.8976 | 0.1067 | 1800 | 2.0473 |
| 1.9798 | 0.1126 | 1900 | 2.0380 |
| 1.9494 | 0.1186 | 2000 | 2.0348 |
| 2.0733 | 0.1245 | 2100 | 2.0308 |
| 1.9957 | 0.1304 | 2200 | 2.0251 |
| 2.0361 | 0.1363 | 2300 | 2.0206 |
| 2.2313 | 0.1423 | 2400 | 2.0176 |
| 1.8886 | 0.1482 | 2500 | 2.0082 |
| 2.1336 | 0.1541 | 2600 | 2.0078 |
| 1.8516 | 0.1601 | 2700 | 2.0024 |
| 1.9572 | 0.1660 | 2800 | 1.9952 |
| 1.6813 | 0.1719 | 2900 | 1.9922 |
| 1.7759 | 0.1778 | 3000 | 1.9856 |
| 1.8491 | 0.1838 | 3100 | 1.9809 |
| 2.1544 | 0.1897 | 3200 | 1.9768 |
| 2.0521 | 0.1956 | 3300 | 1.9671 |
| 1.6342 | 0.2015 | 3400 | 1.9630 |
| 2.0625 | 0.2075 | 3500 | 1.9614 |
| 1.9992 | 0.2134 | 3600 | 1.9548 |
| 1.7514 | 0.2193 | 3700 | 1.9516 |
| 1.8403 | 0.2253 | 3800 | 1.9478 |
| 1.9456 | 0.2312 | 3900 | 1.9440 |
| 1.4284 | 0.2371 | 4000 | 1.9384 |
| 2.1788 | 0.2430 | 4100 | 1.9353 |
| 1.7432 | 0.2490 | 4200 | 1.9302 |
| 1.7911 | 0.2549 | 4300 | 1.9272 |
| 1.94   | 0.2608 | 4400 | 1.9212 |
| 1.9083 | 0.2668 | 4500 | 1.9176 |
| 1.5717 | 0.2727 | 4600 | 1.9148 |
| 1.9185 | 0.2786 | 4700 | 1.9123 |
| 1.8033 | 0.2845 | 4800 | 1.9077 |
| 1.9307 | 0.2905 | 4900 | 1.9032 |
| 2.3501 | 0.2964 | 5000 | 1.8999 |
| 1.9069 | 0.3023 | 5100 | 1.8955 |
| 2.0346 | 0.3082 | 5200 | 1.8940 |
| 1.7066 | 0.3142 | 5300 | 1.8910 |
| 1.888  | 0.3201 | 5400 | 1.8858 |
| 1.9475 | 0.3260 | 5500 | 1.8833 |
| 1.6037 | 0.3320 | 5600 | 1.8814 |
| 1.7273 | 0.3379 | 5700 | 1.8777 |
| 1.8923 | 0.3438 | 5800 | 1.8740 |
| 2.3106 | 0.3497 | 5900 | 1.8707 |
| 1.3758 | 0.3557 | 6000 | 1.8684 |
| 1.9701 | 0.3616 | 6100 | 1.8651 |
| 1.854  | 0.3675 | 6200 | 1.8617 |
| 1.5884 | 0.3735 | 6300 | 1.8588 |
| 1.7065 | 0.3794 | 6400 | 1.8564 |
| 1.5681 | 0.3853 | 6500 | 1.8529 |
| 1.9251 | 0.3912 | 6600 | 1.8504 |
| 1.6854 | 0.3972 | 6700 | 1.8487 |
| 1.9832 | 0.4031 | 6800 | 1.8456 |
| 1.2331 | 0.4090 | 6900 | 1.8452 |
| 1.8641 | 0.4149 | 7000 | 1.8421 |
| 1.8737 | 0.4209 | 7100 | 1.8402 |
| 1.6779 | 0.4268 | 7200 | 1.8377 |
| 1.3901 | 0.4327 | 7300 | 1.8359 |
| 1.4336 | 0.4387 | 7400 | 1.8336 |
| 1.7731 | 0.4446 | 7500 | 1.8318 |
| 1.7203 | 0.4505 | 7600 | 1.8304 |
| 1.8094 | 0.4564 | 7700 | 1.8292 |
| 1.8295 | 0.4624 | 7800 | 1.8277 |
| 1.7489 | 0.4683 | 7900 | 1.8271 |
| 1.4063 | 0.4742 | 8000 | 1.8257 |
| 1.6528 | 0.4802 | 8100 | 1.8239 |
| 1.9509 | 0.4861 | 8200 | 1.8234 |
| 1.7355 | 0.4920 | 8300 | 1.8218 |
| 1.6242 | 0.4979 | 8400 | 1.8211 |
| 1.5363 | 0.5039 | 8500 | 1.8196 |
| 1.7172 | 0.5098 | 8600 | 1.8189 |
| 1.5967 | 0.5157 | 8700 | 1.8182 |
| 1.8118 | 0.5216 | 8800 | 1.8180 |
| 1.8206 | 0.5276 | 8900 | 1.8171 |
| 1.7382 | 0.5335 | 9000 | 1.8167 |
| 1.5683 | 0.5394 | 9100 | 1.8165 |
| 1.8624 | 0.5454 | 9200 | 1.8163 |
| 1.9627 | 0.5513 | 9300 | 1.8160 |
| 1.6127 | 0.5572 | 9400 | 1.8158 |
| 1.9056 | 0.5631 | 9500 | 1.8156 |
| 2.1929 | 0.5691 | 9600 | 1.8156 |
| 1.6813 | 0.5750 | 9700 | 1.8157 |
| 1.9613 | 0.5809 | 9800 | 1.8156 |
| 1.5377 | 0.5869 | 9900 | 1.8156 |

The log ends at step 9900 of the 9983 configured training steps; given `eval_steps: 100`, `early_stopping_patience: 2`, and the 1e-4 threshold, training appears to have been stopped early once the validation loss plateaued at 1.8156.

### Framework versions

- PEFT 0.13.2
- Transformers 4.46.0
- Pytorch 2.5.0+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1