Built with Axolotl

Axolotl config (axolotl version 0.4.1):

adapter: lora
base_model: echarlaix/tiny-random-mistral
bf16: true
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - 4203a208385b2e15_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/4203a208385b2e15_train_data.json
  type:
    field_input: phonemes
    field_instruction: text
    field_output: text_description
    format: '{instruction} {input}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
device_map:
  '': 0,1,2,3,4,5,6,7
early_stopping_patience: 2
eval_max_new_tokens: 128
eval_steps: 100
eval_table_size: null
flash_attention: true
gradient_accumulation_steps: 8
gradient_checkpointing: true
group_by_length: false
hub_model_id: Alphatao/ac150255-24dd-4163-bb80-563586a4ded3
hub_repo: null
hub_strategy: null
hub_token: null
learning_rate: 0.0002
load_best_model_at_end: true
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 32
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 16
lora_target_linear: true
lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- down_proj
- up_proj
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: 5220
micro_batch_size: 4
mlflow_experiment_name: /tmp/4203a208385b2e15_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 2
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 100
sequence_len: 2048
special_tokens:
  pad_token: </s>
strict: false
tf32: true
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.03246352722716028
wandb_entity: null
wandb_mode: online
wandb_name: 08f50b0e-6976-4b33-92a6-13a122249e33
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 08f50b0e-6976-4b33-92a6-13a122249e33
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null
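
For reference, the `type` stanza above maps each JSON record's `text`, `phonemes`, and `text_description` fields into an instruction-style prompt through the `format` / `no_input_format` templates. A minimal sketch of that mapping (`build_prompt` is a hypothetical helper for illustration, not part of Axolotl's API; the sample record is made up):

```python
# Sketch of how the `type` stanza turns one JSON record into a
# prompt/target pair. `build_prompt` is a hypothetical illustration.
def build_prompt(record: dict) -> tuple[str, str]:
    instruction = record["text"]           # field_instruction: text
    input_text = record.get("phonemes")    # field_input: phonemes
    target = record["text_description"]    # field_output: text_description
    if input_text:
        prompt = f"{instruction} {input_text}"  # format: '{instruction} {input}'
    else:
        prompt = instruction                    # no_input_format: '{instruction}'
    return prompt, target

prompt, target = build_prompt({
    "text": "Describe how this line should be read aloud.",
    "phonemes": "h ə l oʊ",
    "text_description": "A slow, breathy delivery.",
})
```

Similarly, the LoRA stanza corresponds roughly to the following PEFT configuration (a sketch; Axolotl builds its own config internally, and `lora_target_linear: true` may expand the target-module list beyond what is listed):

```python
from peft import LoraConfig

# Approximate PEFT equivalent of the LoRA settings above.
lora_config = LoraConfig(
    r=16,               # lora_r
    lora_alpha=32,      # lora_alpha
    lora_dropout=0.05,  # lora_dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "down_proj", "up_proj"],
    task_type="CAUSAL_LM",
)
```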

ac150255-24dd-4163-bb80-563586a4ded3

This model is a LoRA fine-tuned version of echarlaix/tiny-random-mistral, trained on the custom JSON dataset listed in the config above (4203a208385b2e15_train_data.json). It achieves the following results on the evaluation set:

  • Loss: 9.9806

Model description

More information needed

Intended uses & limitations

More information needed
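
While the card itself provides no usage guidance, this repository is a LoRA adapter for echarlaix/tiny-random-mistral, so a plausible loading sketch with the standard transformers/peft APIs looks like the following (untested against this repo; note the base model is a tiny random-weight test model, so outputs will not be meaningful):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "echarlaix/tiny-random-mistral"
adapter_id = "Alphatao/ac150255-24dd-4163-bb80-563586a4ded3"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA adapter

inputs = tokenizer("Describe how this line should be read aloud.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)  # matches eval_max_new_tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```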

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: 8-bit AdamW (adamw_bnb_8bit via bitsandbytes) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • training_steps: 5220
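
Note that total_train_batch_size follows from micro_batch_size × gradient_accumulation_steps (4 × 8 = 32). As a rough reconstruction, the list above corresponds to a transformers TrainingArguments along these lines (a sketch only; Axolotl assembles the actual arguments internally and they may differ in detail):

```python
from transformers import TrainingArguments

# Approximate reconstruction of the hyperparameters listed above.
args = TrainingArguments(
    output_dir="miner_id_24",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,   # 4 * 8 = 32 effective train batch size
    max_steps=5220,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    optim="adamw_bnb_8bit",          # 8-bit AdamW via bitsandbytes
    weight_decay=0.0,
    max_grad_norm=1.0,
    seed=42,
    bf16=True,
    eval_strategy="steps",
    eval_steps=100,
    save_steps=100,
    logging_steps=1,
    load_best_model_at_end=True,
)
```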

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 82.9965       | 0.0002 | 1    | 10.3729         |
| 82.0649       | 0.0215 | 100  | 10.2506         |
| 80.9734       | 0.0429 | 200  | 10.1243         |
| 80.7036       | 0.0644 | 300  | 10.0859         |
| 80.5457       | 0.0859 | 400  | 10.0630         |
| 80.5119       | 0.1074 | 500  | 10.0450         |
| 80.3969       | 0.1288 | 600  | 10.0329         |
| 80.4267       | 0.1503 | 700  | 10.0241         |
| 80.3522       | 0.1718 | 800  | 10.0173         |
| 80.2041       | 0.1933 | 900  | 10.0127         |
| 80.1714       | 0.2147 | 1000 | 10.0090         |
| 79.9849       | 0.2362 | 1100 | 10.0059         |
| 80.0626       | 0.2577 | 1200 | 10.0033         |
| 79.9732       | 0.2792 | 1300 | 10.0013         |
| 79.9765       | 0.3006 | 1400 | 9.9994          |
| 80.2953       | 0.3221 | 1500 | 9.9978          |
| 80.1077       | 0.3436 | 1600 | 9.9964          |
| 80.1056       | 0.3651 | 1700 | 9.9952          |
| 80.1374       | 0.3865 | 1800 | 9.9940          |
| 80.1592       | 0.4080 | 1900 | 9.9929          |
| 79.9731       | 0.4295 | 2000 | 9.9919          |
| 80.0841       | 0.4509 | 2100 | 9.9907          |
| 80.0825       | 0.4724 | 2200 | 9.9892          |
| 80.0441       | 0.4939 | 2300 | 9.9880          |
| 80.0279       | 0.5154 | 2400 | 9.9872          |
| 79.8874       | 0.5368 | 2500 | 9.9864          |
| 79.8938       | 0.5583 | 2600 | 9.9858          |
| 80.2128       | 0.5798 | 2700 | 9.9853          |
| 79.9174       | 0.6013 | 2800 | 9.9847          |
| 80.1205       | 0.6227 | 2900 | 9.9842          |
| 79.8894       | 0.6442 | 3000 | 9.9838          |
| 79.9382       | 0.6657 | 3100 | 9.9833          |
| 79.9075       | 0.6872 | 3200 | 9.9829          |
| 79.9308       | 0.7086 | 3300 | 9.9826          |
| 80.0173       | 0.7301 | 3400 | 9.9823          |
| 79.9442       | 0.7516 | 3500 | 9.9821          |
| 80.2148       | 0.7731 | 3600 | 9.9818          |
| 79.9001       | 0.7945 | 3700 | 9.9817          |
| 79.9304       | 0.8160 | 3800 | 9.9814          |
| 80.1208       | 0.8375 | 3900 | 9.9813          |
| 79.9569       | 0.8589 | 4000 | 9.9811          |
| 80.0727       | 0.8804 | 4100 | 9.9810          |
| 79.9112       | 0.9019 | 4200 | 9.9809          |
| 79.8832       | 0.9234 | 4300 | 9.9808          |
| 79.9899       | 0.9448 | 4400 | 9.9808          |
| 79.8904       | 0.9663 | 4500 | 9.9807          |
| 79.9213       | 0.9878 | 4600 | 9.9807          |
| 79.7445       | 1.0094 | 4700 | 9.9807          |
| 80.103        | 1.0308 | 4800 | 9.9806          |
| 79.9172       | 1.0523 | 4900 | 9.9806          |
| 79.9391       | 1.0738 | 5000 | 9.9806          |
| 79.8985       | 1.0953 | 5100 | 9.9806          |
| 79.9923       | 1.1167 | 5200 | 9.9806          |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1