See axolotl config
axolotl version: 0.4.1
adapter: lora
auto_resume_from_checkpoints: true
base_model: fxmarty/really-tiny-falcon-testing
bf16: auto
chat_template: llama3
dataset_prepared_path: null
dataset_processes: 12
datasets:
- data_files:
  - 415dedee96180e11_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/415dedee96180e11_train_data.json
  type:
    field_instruction: instruction
    field_output: response
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 3
eval_max_new_tokens: 128
eval_steps: 2000
eval_table_size: null
evals_per_epoch: null
flash_attention: false
fp16: false
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 1
gradient_checkpointing: false
group_by_length: false
hub_model_id: error577/4c67a4f2-ac21-4b53-8a01-59d3c0c31f4f
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0002
load_in_4bit: false
load_in_8bit: true
local_rank: null
logging_steps: 1
lora_alpha: 16
lora_dropout: 0.0
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 8
lora_target_linear: true
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: null
micro_batch_size: 8
mlflow_experiment_name: /tmp/415dedee96180e11_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 10
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 2000
sequence_len: 512
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.01
wandb_entity: null
wandb_mode: online
wandb_name: 5941f67d-1b56-4ae0-b76d-52a8681c66f9
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 5941f67d-1b56-4ae0-b76d-52a8681c66f9
warmup_steps: 30
weight_decay: 0.0
xformers_attention: null
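
The adapter settings above map onto a standard peft configuration. A minimal sketch of the equivalent LoraConfig, assuming peft's "all-linear" shorthand corresponds to axolotl's lora_target_linear: true:

```python
from peft import LoraConfig

# Sketch of the adapter block above as a peft LoraConfig.
# "all-linear" is assumed to mirror lora_target_linear: true.
lora_config = LoraConfig(
    r=8,                          # lora_r
    lora_alpha=16,                # lora_alpha
    lora_dropout=0.0,             # lora_dropout
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```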
4c67a4f2-ac21-4b53-8a01-59d3c0c31f4f
This model is a LoRA fine-tuned version of fxmarty/really-tiny-falcon-testing on the 415dedee96180e11_train_data.json dataset described in the config above. It achieves the following results on the evaluation set:
- Loss: 10.9373
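
For scale: if the reported loss is the usual mean token cross-entropy in nats, it corresponds to a perplexity of roughly exp(10.9373) ≈ 5.6 × 10⁴, and the step-1 loss of 11.0858 (see the table below) sits near the loss of a uniform guess over the Falcon tokenizer's vocabulary; the 65,024-token vocabulary size is an assumption here:

```python
import math

# Perplexity from mean token cross-entropy (assumed to be in nats).
print(math.exp(10.9373))   # ≈ 56,000: final evaluation perplexity
print(math.log(65024))     # ≈ 11.08: uniform-guess loss over an assumed
                           # 65,024-token vocabulary, near the step-1 loss
```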
Model description
More information needed
Intended uses & limitations
More information needed
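
Pending fuller documentation, here is a minimal sketch for loading the adapter with transformers and peft (repo ids taken from this card; trust_remote_code mirrors the training config):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "fxmarty/really-tiny-falcon-testing"
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)

# Attach the LoRA adapter trained in this run.
model = PeftModel.from_pretrained(base, "error577/4c67a4f2-ac21-4b53-8a01-59d3c0c31f4f")

inputs = tokenizer("Write a haiku about training loss.", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```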
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (reconstructed as a TrainingArguments sketch after the list):
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: 8-bit AdamW (adamw_bnb_8bit) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 30
- num_epochs: 10
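
Axolotl assembles a transformers Trainer from these values internally; a rough, illustrative reconstruction of the implied TrainingArguments (field values taken from the list and config above, not axolotl output):

```python
from transformers import TrainingArguments

# Illustrative only: axolotl builds these internally from the YAML config.
args = TrainingArguments(
    output_dir="miner_id_24",
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_bnb_8bit",      # 8-bit AdamW from bitsandbytes
    lr_scheduler_type="cosine",
    warmup_steps=30,
    num_train_epochs=10,
    max_grad_norm=1.0,
    weight_decay=0.0,
    eval_strategy="steps",
    eval_steps=2000,
    save_steps=2000,
    logging_steps=1,
)
```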
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 11.0892 | 0.0001 | 1 | 11.0858 |
| 10.9584 | 0.1083 | 2000 | 10.9866 |
| 10.9595 | 0.2166 | 4000 | 10.9773 |
| 10.9648 | 0.3250 | 6000 | 10.9714 |
| 10.9636 | 0.4333 | 8000 | 10.9668 |
| 10.9655 | 0.5416 | 10000 | 10.9627 |
| 10.9675 | 0.6499 | 12000 | 10.9596 |
| 10.9637 | 0.7582 | 14000 | 10.9575 |
| 10.954 | 0.8666 | 16000 | 10.9556 |
| 10.9561 | 0.9749 | 18000 | 10.9539 |
| 10.9531 | 1.0832 | 20000 | 10.9530 |
| 10.949 | 1.1915 | 22000 | 10.9520 |
| 10.9415 | 1.2998 | 24000 | 10.9507 |
| 10.9581 | 1.4081 | 26000 | 10.9497 |
| 10.9587 | 1.5165 | 28000 | 10.9491 |
| 10.9509 | 1.6248 | 30000 | 10.9481 |
| 10.9518 | 1.7331 | 32000 | 10.9474 |
| 10.9661 | 1.8414 | 34000 | 10.9475 |
| 10.935 | 1.9497 | 36000 | 10.9469 |
| 10.9494 | 2.0581 | 38000 | 10.9462 |
| 10.9367 | 2.1664 | 40000 | 10.9458 |
| 10.9474 | 2.2747 | 42000 | 10.9457 |
| 10.9661 | 2.3830 | 44000 | 10.9456 |
| 10.9579 | 2.4913 | 46000 | 10.9449 |
| 10.9417 | 2.5997 | 48000 | 10.9443 |
| 10.9537 | 2.7080 | 50000 | 10.9440 |
| 10.9467 | 2.8163 | 52000 | 10.9440 |
| 10.9549 | 2.9246 | 54000 | 10.9432 |
| 10.953 | 3.0329 | 56000 | 10.9430 |
| 10.9254 | 3.1412 | 58000 | 10.9429 |
| 10.9436 | 3.2496 | 60000 | 10.9424 |
| 10.9463 | 3.3579 | 62000 | 10.9423 |
| 10.9296 | 3.4662 | 64000 | 10.9420 |
| 10.9296 | 3.5745 | 66000 | 10.9416 |
| 10.934 | 3.6828 | 68000 | 10.9417 |
| 10.9502 | 3.7912 | 70000 | 10.9417 |
| 10.9546 | 3.8995 | 72000 | 10.9412 |
| 10.924 | 4.0078 | 74000 | 10.9407 |
| 10.9337 | 4.1161 | 76000 | 10.9408 |
| 10.9296 | 4.2244 | 78000 | 10.9407 |
| 10.9361 | 4.3328 | 80000 | 10.9404 |
| 10.9444 | 4.4411 | 82000 | 10.9404 |
| 10.9434 | 4.5494 | 84000 | 10.9399 |
| 10.9261 | 4.6577 | 86000 | 10.9398 |
| 10.9328 | 4.7660 | 88000 | 10.9399 |
| 10.9527 | 4.8744 | 90000 | 10.9397 |
| 10.931 | 4.9827 | 92000 | 10.9394 |
| 10.9588 | 5.0910 | 94000 | 10.9393 |
| 10.9565 | 5.1993 | 96000 | 10.9393 |
| 10.9629 | 5.3076 | 98000 | 10.9391 |
| 10.9279 | 5.4159 | 100000 | 10.9391 |
| 10.9422 | 5.5243 | 102000 | 10.9391 |
| 10.9475 | 5.6326 | 104000 | 10.9387 |
| 10.9235 | 5.7409 | 106000 | 10.9387 |
| 10.946 | 5.8492 | 108000 | 10.9385 |
| 10.9304 | 5.9575 | 110000 | 10.9385 |
| 10.9217 | 6.0659 | 112000 | 10.9384 |
| 10.9307 | 6.1742 | 114000 | 10.9383 |
| 10.95 | 6.2825 | 116000 | 10.9384 |
| 10.9492 | 6.3908 | 118000 | 10.9381 |
| 10.9282 | 6.4991 | 120000 | 10.9381 |
| 10.946 | 6.6075 | 122000 | 10.9381 |
| 10.9419 | 6.7158 | 124000 | 10.9379 |
| 10.9331 | 6.8241 | 126000 | 10.9379 |
| 10.9557 | 6.9324 | 128000 | 10.9379 |
| 10.9447 | 7.0407 | 130000 | 10.9379 |
| 10.9327 | 7.1490 | 132000 | 10.9377 |
| 10.9411 | 7.2574 | 134000 | 10.9377 |
| 10.9434 | 7.3657 | 136000 | 10.9377 |
| 10.9354 | 7.4740 | 138000 | 10.9375 |
| 10.9305 | 7.5823 | 140000 | 10.9376 |
| 10.9364 | 7.6906 | 142000 | 10.9374 |
| 10.9203 | 7.7990 | 144000 | 10.9375 |
| 10.9314 | 7.9073 | 146000 | 10.9375 |
| 10.9374 | 8.0156 | 148000 | 10.9373 |
| 10.9251 | 8.1239 | 150000 | 10.9374 |
| 10.933 | 8.2322 | 152000 | 10.9374 |
| 10.935 | 8.3406 | 154000 | 10.9373 |
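
Training stopped at epoch 8.34 rather than the configured 10 epochs, consistent with early_stopping_patience: 3 in the config: the validation loss reached its best value of 10.9373 around step 148000 and did not improve over the following three evaluations.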
Framework versions
- PEFT 0.13.2
- Transformers 4.46.0
- Pytorch 2.5.0+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1