See axolotl config
axolotl version: 0.4.1
adapter: lora
base_model: echarlaix/tiny-random-mistral
bf16: true
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - 4203a208385b2e15_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/4203a208385b2e15_train_data.json
  type:
    field_input: phonemes
    field_instruction: text
    field_output: text_description
    format: '{instruction} {input}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
device_map:
  ? ''
  : 0,1,2,3,4,5,6,7
early_stopping_patience: 2
eval_max_new_tokens: 128
eval_steps: 100
eval_table_size: null
flash_attention: true
gradient_accumulation_steps: 8
gradient_checkpointing: true
group_by_length: false
hub_model_id: Alphatao/ac150255-24dd-4163-bb80-563586a4ded3
hub_repo: null
hub_strategy: null
hub_token: null
learning_rate: 0.0002
load_best_model_at_end: true
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 32
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 16
lora_target_linear: true
lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- down_proj
- up_proj
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: 5220
micro_batch_size: 4
mlflow_experiment_name: /tmp/4203a208385b2e15_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 2
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 100
sequence_len: 2048
special_tokens:
  pad_token: </s>
strict: false
tf32: true
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.03246352722716028
wandb_entity: null
wandb_mode: online
wandb_name: 08f50b0e-6976-4b33-92a6-13a122249e33
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 08f50b0e-6976-4b33-92a6-13a122249e33
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null
ac150255-24dd-4163-bb80-563586a4ded3
This model is a LoRA adapter fine-tuned from echarlaix/tiny-random-mistral on the 4203a208385b2e15_train_data.json dataset described in the config above. It achieves the following results on the evaluation set:
- Loss: 9.9806
Model description
More information needed
Intended uses & limitations
More information needed
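Pending a fuller description, here is a minimal inference sketch using Transformers and PEFT. The base model and adapter repo ids come from the config above; the prompt and generation settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "echarlaix/tiny-random-mistral"
adapter_id = "Alphatao/ac150255-24dd-4163-bb80-563586a4ded3"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the LoRA adapter
model.eval()

# Illustrative prompt following the '{instruction} {input}' format from the training config.
prompt = "Describe how this sentence should be read aloud. dɪˈskraɪb haʊ ðɪs ˈsɛntəns ..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)  # matches eval_max_new_tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the base model is a tiny randomly initialized Mistral, so generations are not expected to be meaningful.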
Training and evaluation data
More information needed
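The training data is the custom JSON file referenced in the config. Its `type` block maps each record onto a prompt/completion pair; the sketch below illustrates that mapping (the record contents are hypothetical, and axolotl performs the equivalent formatting internally):

```python
# Hypothetical record using the field names from the config's `type` block.
record = {
    "text": "Describe how this sentence should be read aloud.",      # field_instruction
    "phonemes": "dɪˈskraɪb haʊ ðɪs ˈsɛntəns ʃʊd bi rɛd əˈlaʊd",       # field_input
    "text_description": "A calm, even reading with a slight rise.",  # field_output
}

def build_prompt(rec: dict) -> str:
    """Apply format '{instruction} {input}' when the input field is present,
    and no_input_format '{instruction}' otherwise."""
    instruction = rec["text"]
    model_input = rec.get("phonemes")
    return f"{instruction} {model_input}" if model_input else instruction

prompt = build_prompt(record)
target = record["text_description"]  # the completion the model is trained to produce
```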
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: 8-bit AdamW (adamw_bnb_8bit) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- training_steps: 5220
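As a sanity check, the total train batch size reported above follows directly from the per-device batch size and gradient accumulation (a minimal sketch; the single-process count is an assumption consistent with the reported total of 32):

```python
micro_batch_size = 4           # micro_batch_size / train_batch_size above
gradient_accumulation = 8      # gradient_accumulation_steps above
num_processes = 1              # assumption: matches the reported total_train_batch_size

total_train_batch_size = micro_batch_size * gradient_accumulation * num_processes
assert total_train_batch_size == 32
```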
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 82.9965 | 0.0002 | 1 | 10.3729 |
| 82.0649 | 0.0215 | 100 | 10.2506 |
| 80.9734 | 0.0429 | 200 | 10.1243 |
| 80.7036 | 0.0644 | 300 | 10.0859 |
| 80.5457 | 0.0859 | 400 | 10.0630 |
| 80.5119 | 0.1074 | 500 | 10.0450 |
| 80.3969 | 0.1288 | 600 | 10.0329 |
| 80.4267 | 0.1503 | 700 | 10.0241 |
| 80.3522 | 0.1718 | 800 | 10.0173 |
| 80.2041 | 0.1933 | 900 | 10.0127 |
| 80.1714 | 0.2147 | 1000 | 10.0090 |
| 79.9849 | 0.2362 | 1100 | 10.0059 |
| 80.0626 | 0.2577 | 1200 | 10.0033 |
| 79.9732 | 0.2792 | 1300 | 10.0013 |
| 79.9765 | 0.3006 | 1400 | 9.9994 |
| 80.2953 | 0.3221 | 1500 | 9.9978 |
| 80.1077 | 0.3436 | 1600 | 9.9964 |
| 80.1056 | 0.3651 | 1700 | 9.9952 |
| 80.1374 | 0.3865 | 1800 | 9.9940 |
| 80.1592 | 0.4080 | 1900 | 9.9929 |
| 79.9731 | 0.4295 | 2000 | 9.9919 |
| 80.0841 | 0.4509 | 2100 | 9.9907 |
| 80.0825 | 0.4724 | 2200 | 9.9892 |
| 80.0441 | 0.4939 | 2300 | 9.9880 |
| 80.0279 | 0.5154 | 2400 | 9.9872 |
| 79.8874 | 0.5368 | 2500 | 9.9864 |
| 79.8938 | 0.5583 | 2600 | 9.9858 |
| 80.2128 | 0.5798 | 2700 | 9.9853 |
| 79.9174 | 0.6013 | 2800 | 9.9847 |
| 80.1205 | 0.6227 | 2900 | 9.9842 |
| 79.8894 | 0.6442 | 3000 | 9.9838 |
| 79.9382 | 0.6657 | 3100 | 9.9833 |
| 79.9075 | 0.6872 | 3200 | 9.9829 |
| 79.9308 | 0.7086 | 3300 | 9.9826 |
| 80.0173 | 0.7301 | 3400 | 9.9823 |
| 79.9442 | 0.7516 | 3500 | 9.9821 |
| 80.2148 | 0.7731 | 3600 | 9.9818 |
| 79.9001 | 0.7945 | 3700 | 9.9817 |
| 79.9304 | 0.8160 | 3800 | 9.9814 |
| 80.1208 | 0.8375 | 3900 | 9.9813 |
| 79.9569 | 0.8589 | 4000 | 9.9811 |
| 80.0727 | 0.8804 | 4100 | 9.9810 |
| 79.9112 | 0.9019 | 4200 | 9.9809 |
| 79.8832 | 0.9234 | 4300 | 9.9808 |
| 79.9899 | 0.9448 | 4400 | 9.9808 |
| 79.8904 | 0.9663 | 4500 | 9.9807 |
| 79.9213 | 0.9878 | 4600 | 9.9807 |
| 79.7445 | 1.0094 | 4700 | 9.9807 |
| 80.103 | 1.0308 | 4800 | 9.9806 |
| 79.9172 | 1.0523 | 4900 | 9.9806 |
| 79.9391 | 1.0738 | 5000 | 9.9806 |
| 79.8985 | 1.0953 | 5100 | 9.9806 |
| 79.9923 | 1.1167 | 5200 | 9.9806 |
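The run is capped by max_steps rather than num_epochs. A quick estimate from the last row of the table (steps per epoch is inferred from the logged step/epoch pair, not reported directly, and assumes training ran to max_steps):

```python
step, epoch = 5200, 1.1167            # last row of the table above
steps_per_epoch = step / epoch        # ≈ 4657 optimizer steps per epoch
max_steps = 5220                      # from the config
print(max_steps / steps_per_epoch)    # ≈ 1.12 epochs completed, short of num_epochs: 2
```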
Framework versions
- PEFT 0.13.2
- Transformers 4.46.0
- Pytorch 2.5.0+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1
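For reproducibility, a quick check that the local environment matches the versions listed above (a minimal sketch; the package names are the standard PyPI distributions):

```python
import datasets, peft, tokenizers, torch, transformers

# Expected versions from the list above.
expected = {peft: "0.13.2", transformers: "4.46.0", torch: "2.5.0",
            datasets: "3.0.1", tokenizers: "0.20.1"}
for module, version in expected.items():
    print(module.__name__, module.__version__, "expected", version)
```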