Built with Axolotl

See axolotl config (axolotl version: 0.4.1):

```yaml
adapter: lora
base_model: Maykeye/TinyLLama-v0
bf16: true
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - 40f84f4610a855b2_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/40f84f4610a855b2_train_data.json
  type:
    field_instruction: question
    field_output: answer
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
device_map:
  '': 0,1,2,3,4,5,6,7
early_stopping_patience: 2
eval_max_new_tokens: 128
eval_steps: 400
eval_table_size: null
flash_attention: true
gradient_accumulation_steps: 4
gradient_checkpointing: true
group_by_length: false
hub_model_id: Alphatao/bfd2d379-a1a8-4f73-a0b4-63117bb0c96f
hub_repo: null
hub_strategy: null
hub_token: null
learning_rate: 0.0002
load_best_model_at_end: true
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 32
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 16
lora_target_linear: true
lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- down_proj
- up_proj
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: 121364
micro_batch_size: 2
mlflow_experiment_name: /tmp/40f84f4610a855b2_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 10
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 400
sequence_len: 2048
special_tokens:
  pad_token: </s>
strict: false
tf32: true
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.026089497411921857
wandb_entity: null
wandb_mode: online
wandb_name: 56a29ccc-0b06-416d-a18e-0e2c0cd86f0e
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 56a29ccc-0b06-416d-a18e-0e2c0cd86f0e
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null
```
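The `datasets.type` block above maps each JSON record onto a plain prompt/completion pair: the `question` field becomes the prompt via the `'{instruction}'` format string (with an empty system prompt), and the `answer` field becomes the completion. A minimal sketch of that mapping follows; the sample record is invented, and this is not Axolotl's actual implementation:

```python
# Sketch of the prompt construction implied by the config's `datasets.type`
# section. Only the field names (`question`, `answer`) come from the config;
# the record content is an invented example.
record = {"question": "What is the capital of France?", "answer": "Paris."}

system = "{system}".format(system="")  # system_format with empty system_prompt
prompt = system + "{instruction}".format(instruction=record["question"])  # format: '{instruction}'
completion = record["answer"]  # field_output: answer

# With train_on_inputs: false, the loss is computed only on `completion`
# tokens, not on the prompt.
print(repr(prompt), "->", repr(completion))
```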

bfd2d379-a1a8-4f73-a0b4-63117bb0c96f

This model is a LoRA fine-tuned version of Maykeye/TinyLLama-v0 on the 40f84f4610a855b2_train_data.json dataset. It achieves the following results on the evaluation set:

  • Loss: 6.9194

Model description

This repository holds a LoRA adapter (r=16, alpha=32, dropout 0.05) for Maykeye/TinyLLama-v0, trained with Axolotl 0.4.1 using the configuration shown above. The adapter targets the q_proj, k_proj, v_proj, o_proj, down_proj, and up_proj linear layers; a usage sketch follows.
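A minimal inference sketch, assuming the adapter is published at the `hub_model_id` from the config; the prompt is an invented example, and this is illustrative rather than an official usage snippet:

```python
# Load the base model, then attach the LoRA adapter with peft.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Maykeye/TinyLLama-v0")
tokenizer = AutoTokenizer.from_pretrained("Maykeye/TinyLLama-v0")
model = PeftModel.from_pretrained(base, "Alphatao/bfd2d379-a1a8-4f73-a0b4-63117bb0c96f")

inputs = tokenizer("What is the capital of France?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)  # eval_max_new_tokens: 128
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```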

Intended uses & limitations

More information needed

Training and evaluation data

Training used 40f84f4610a855b2_train_data.json, a custom-format JSON dataset in which the question field supplies the instruction and the answer field supplies the target output. About 2.6% of the records (val_set_size: 0.0261) were held out as the evaluation set; a sketch of the split follows.
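A rough reconstruction of that split using the `datasets` library; the file path and `val_set_size` come from the config, `seed=42` matches the reported training seed, but the exact split mechanics inside Axolotl may differ:

```python
# Load the local JSON file and hold out ~2.6% for evaluation.
from datasets import load_dataset

ds = load_dataset(
    "json",
    data_files="/workspace/input_data/40f84f4610a855b2_train_data.json",
)["train"]
splits = ds.train_test_split(test_size=0.026089497411921857, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
print(f"train: {len(train_ds)}  eval: {len(eval_ds)}")
```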

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for an equivalent peft/transformers setup):

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: 8-bit AdamW (adamw_bnb_8bit via bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • training_steps: 121364
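For readers who want to express these settings outside Axolotl, the sketch below reconstructs them with `peft` and `transformers` directly. Axolotl builds something equivalent internally, so treat this as an illustration under the assumption of a single-process run, not the generated training code:

```python
# Illustrative reconstruction of the LoRA and trainer settings above.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "down_proj", "up_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="miner_id_24",
    learning_rate=2e-4,
    per_device_train_batch_size=2,   # micro_batch_size
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,   # total_train_batch_size: 2 * 4 = 8
    max_steps=121364,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    optim="adamw_bnb_8bit",
    bf16=True,
    tf32=True,
    max_grad_norm=1.0,
    weight_decay=0.0,
    gradient_checkpointing=True,
    eval_strategy="steps",           # evaluate every eval_steps
    eval_steps=400,
    save_strategy="steps",
    save_steps=400,
    logging_steps=1,
    load_best_model_at_end=True,
    seed=42,
)
```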

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 12.0619 | 0.0000 | 1 | 11.9615 |
| 7.9221 | 0.0171 | 400 | 7.8516 |
| 7.5425 | 0.0343 | 800 | 7.6125 |
| 7.5271 | 0.0514 | 1200 | 7.5215 |
| 7.6511 | 0.0686 | 1600 | 7.4503 |
| 7.4575 | 0.0857 | 2000 | 7.4023 |
| 7.4126 | 0.1029 | 2400 | 7.3567 |
| 7.2177 | 0.1200 | 2800 | 7.3185 |
| 7.3211 | 0.1372 | 3200 | 7.2852 |
| 7.2918 | 0.1543 | 3600 | 7.2609 |
| 7.2682 | 0.1714 | 4000 | 7.2424 |
| 6.9925 | 0.1886 | 4400 | 7.2183 |
| 7.1942 | 0.2057 | 4800 | 7.2031 |
| 7.1606 | 0.2229 | 5200 | 7.1858 |
| 7.3641 | 0.2400 | 5600 | 7.1721 |
| 7.4306 | 0.2572 | 6000 | 7.1570 |
| 7.1406 | 0.2743 | 6400 | 7.1427 |
| 7.3856 | 0.2915 | 6800 | 7.1338 |
| 7.3319 | 0.3086 | 7200 | 7.1230 |
| 7.2052 | 0.3257 | 7600 | 7.1124 |
| 7.1784 | 0.3429 | 8000 | 7.1049 |
| 6.9061 | 0.3600 | 8400 | 7.0983 |
| 6.8446 | 0.3772 | 8800 | 7.0885 |
| 7.2836 | 0.3943 | 9200 | 7.0825 |
| 7.2025 | 0.4115 | 9600 | 7.0736 |
| 6.9665 | 0.4286 | 10000 | 7.0693 |
| 7.0319 | 0.4458 | 10400 | 7.0638 |
| 7.1117 | 0.4629 | 10800 | 7.0562 |
| 7.1637 | 0.4800 | 11200 | 7.0510 |
| 7.0831 | 0.4972 | 11600 | 7.0450 |
| 7.1105 | 0.5143 | 12000 | 7.0402 |
| 7.0615 | 0.5315 | 12400 | 7.0354 |
| 6.9541 | 0.5486 | 12800 | 7.0310 |
| 6.9338 | 0.5658 | 13200 | 7.0293 |
| 6.8745 | 0.5829 | 13600 | 7.0230 |
| 6.9395 | 0.6001 | 14000 | 7.0177 |
| 6.991 | 0.6172 | 14400 | 7.0163 |
| 6.0832 | 0.6343 | 14800 | 7.0112 |
| 7.0355 | 0.6515 | 15200 | 7.0085 |
| 7.0765 | 0.6686 | 15600 | 7.0036 |
| 7.0429 | 0.6858 | 16000 | 7.0015 |
| 7.0843 | 0.7029 | 16400 | 6.9986 |
| 7.0766 | 0.7201 | 16800 | 6.9955 |
| 7.1227 | 0.7372 | 17200 | 6.9926 |
| 6.8547 | 0.7544 | 17600 | 6.9899 |
| 6.7269 | 0.7715 | 18000 | 6.9884 |
| 7.0857 | 0.7887 | 18400 | 6.9865 |
| 6.9734 | 0.8058 | 18800 | 6.9847 |
| 7.0499 | 0.8229 | 19200 | 6.9813 |
| 7.1258 | 0.8401 | 19600 | 6.9774 |
| 7.1636 | 0.8572 | 20000 | 6.9753 |
| 7.2572 | 0.8744 | 20400 | 6.9742 |
| 6.9789 | 0.8915 | 20800 | 6.9716 |
| 7.2132 | 0.9087 | 21200 | 6.9704 |
| 7.1362 | 0.9258 | 21600 | 6.9687 |
| 6.9767 | 0.9430 | 22000 | 6.9657 |
| 6.9853 | 0.9601 | 22400 | 6.9644 |
| 7.0186 | 0.9772 | 22800 | 6.9627 |
| 7.0654 | 0.9944 | 23200 | 6.9604 |
| 7.1397 | 1.0115 | 23600 | 6.9602 |
| 6.995 | 1.0287 | 24000 | 6.9587 |
| 7.2728 | 1.0458 | 24400 | 6.9546 |
| 6.915 | 1.0630 | 24800 | 6.9572 |
| 6.9481 | 1.0801 | 25200 | 6.9535 |
| 6.9489 | 1.0973 | 25600 | 6.9499 |
| 7.0888 | 1.1144 | 26000 | 6.9492 |
| 7.1006 | 1.1315 | 26400 | 6.9482 |
| 7.0525 | 1.1487 | 26800 | 6.9465 |
| 7.0576 | 1.1658 | 27200 | 6.9432 |
| 6.9836 | 1.1830 | 27600 | 6.9440 |
| 6.9761 | 1.2001 | 28000 | 6.9411 |
| 6.8321 | 1.2173 | 28400 | 6.9403 |
| 6.9887 | 1.2344 | 28800 | 6.9388 |
| 6.9359 | 1.2516 | 29200 | 6.9389 |
| 7.0867 | 1.2687 | 29600 | 6.9355 |
| 7.0808 | 1.2858 | 30000 | 6.9345 |
| 7.1399 | 1.3030 | 30400 | 6.9346 |
| 6.9547 | 1.3201 | 30800 | 6.9321 |
| 6.911 | 1.3373 | 31200 | 6.9298 |
| 7.1562 | 1.3544 | 31600 | 6.9300 |
| 7.0566 | 1.3716 | 32000 | 6.9283 |
| 7.1362 | 1.3887 | 32400 | 6.9291 |
| 7.026 | 1.4059 | 32800 | 6.9247 |
| 7.0724 | 1.4230 | 33200 | 6.9254 |
| 6.5286 | 1.4401 | 33600 | 6.9241 |
| 6.7828 | 1.4573 | 34000 | 6.9224 |
| 6.9472 | 1.4744 | 34400 | 6.9230 |
| 5.4581 | 1.4916 | 34800 | 6.9196 |
| 7.0774 | 1.5087 | 35200 | 6.9188 |
| 6.8963 | 1.5259 | 35600 | 6.9183 |
| 6.9506 | 1.5430 | 36000 | 6.9198 |
| 6.8666 | 1.5602 | 36400 | 6.9194 |

Training halted at step 36,400, well short of the configured 121,364 steps, consistent with early_stopping_patience: 2: validation loss did not improve for two consecutive evaluations after the best value of 6.9183 at step 35,600.

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
