Built with Axolotl

Axolotl config (version 0.4.1):

adapter: lora
base_model: echarlaix/tiny-random-mistral
bf16: true
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - f3d0d4415de730db_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/f3d0d4415de730db_train_data.json
  type:
    field_input: Moreinfo
    field_instruction: Position
    field_output: CV
    format: '{instruction} {input}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
device_map:
  '': 0,1,2,3,4,5,6,7
early_stopping_patience: 2
eval_max_new_tokens: 128
eval_steps: 100
eval_table_size: null
flash_attention: true
gradient_accumulation_steps: 8
gradient_checkpointing: true
group_by_length: false
hub_model_id: Alphatao/942c4135-e225-4072-a929-7998548563ef
hub_repo: null
hub_strategy: null
hub_token: null
learning_rate: 0.0002
load_best_model_at_end: true
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 32
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 16
lora_target_linear: true
lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- down_proj
- up_proj
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: 4679
micro_batch_size: 4
mlflow_experiment_name: /tmp/f3d0d4415de730db_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 2
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 100
sequence_len: 2048
special_tokens:
  pad_token: </s>
strict: false
tf32: true
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.02390628735357399
wandb_entity: null
wandb_mode: online
wandb_name: 213b2f40-7d78-40ca-8cc0-85380681cac5
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 213b2f40-7d78-40ca-8cc0-85380681cac5
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null
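
The datasets `type` block above implies a simple prompt template. Below is a minimal sketch of how one record would be rendered; the field names (`Position`, `Moreinfo`, `CV`) and format strings come from the config, but the record values are invented for illustration:

```python
# Hedged sketch of the prompt assembly implied by the `type` block in the
# config above; the record itself is a made-up example, not real data.
record = {
    "Position": "Data Analyst",           # field_instruction
    "Moreinfo": "5 years of experience",  # field_input
    "CV": "Experienced analyst ...",      # field_output (training target)
}

if record.get("Moreinfo"):
    # format: '{instruction} {input}'
    prompt = "{instruction} {input}".format(
        instruction=record["Position"], input=record["Moreinfo"]
    )
else:
    # no_input_format: '{instruction}'
    prompt = record["Position"]

print(prompt, "->", record["CV"])
```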

942c4135-e225-4072-a929-7998548563ef

This model is a LoRA adapter fine-tuned from echarlaix/tiny-random-mistral on the f3d0d4415de730db_train_data.json dataset. It achieves the following results on the evaluation set:

  • Loss: 10.2514

Model description

More information needed

Intended uses & limitations

More information needed
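
No usage guidance was provided. The sketch below shows one plausible way to load the adapter with PEFT, assuming the adapter weights are published on the Hub under the hub_model_id from the config; the prompt string is an invented example in the `'{instruction} {input}'` shape used for training:

```python
# Minimal loading sketch, not an official example from this card.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "echarlaix/tiny-random-mistral"
adapter_id = "Alphatao/942c4135-e225-4072-a929-7998548563ef"  # hub_model_id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id)

# Note: the base is a tiny randomly initialized test model, so generations
# will not be meaningful; this only demonstrates the loading path.
inputs = tokenizer("Data Analyst 5 years of experience", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```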

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32 (train_batch_size 4 × gradient_accumulation_steps 8)
  • optimizer: adamw_bnb_8bit with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine (sketched after this list)
  • lr_scheduler_warmup_steps: 10
  • training_steps: 4679
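
A minimal sketch of the learning-rate schedule these settings describe, using Transformers' get_cosine_schedule_with_warmup; plain torch.optim.AdamW stands in here for the bitsandbytes 8-bit variant actually used:

```python
# Hedged sketch: AdamW is a stand-in for bitsandbytes' 8-bit AdamW; the LR,
# betas, epsilon, warmup, and step counts come from the hyperparameters above.
import torch
from transformers import get_cosine_schedule_with_warmup

params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(
    params, lr=2e-4, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=10, num_training_steps=4679
)

# LR ramps linearly to 2e-4 over the first 10 steps, then decays on a cosine.
for step in range(20):
    optimizer.step()
    scheduler.step()
print(scheduler.get_last_lr())  # just past warmup, still close to the 2e-4 peak
```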

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 83.0184       | 0.0002 | 1    | 10.3771         |
| 82.4725       | 0.0157 | 100  | 10.3081         |
| 82.3512       | 0.0313 | 200  | 10.2932         |
| 82.3996       | 0.0470 | 300  | 10.2884         |
| 82.2037       | 0.0627 | 400  | 10.2842         |
| 82.2038       | 0.0784 | 500  | 10.2788         |
| 82.1844       | 0.0940 | 600  | 10.2746         |
| 82.1807       | 0.1097 | 700  | 10.2714         |
| 82.2035       | 0.1254 | 800  | 10.2683         |
| 82.1132       | 0.1411 | 900  | 10.2660         |
| 82.1501       | 0.1567 | 1000 | 10.2642         |
| 82.142        | 0.1724 | 1100 | 10.2630         |
| 82.1508       | 0.1881 | 1200 | 10.2616         |
| 82.1372       | 0.2038 | 1300 | 10.2606         |
| 82.1296       | 0.2194 | 1400 | 10.2596         |
| 82.1202       | 0.2351 | 1500 | 10.2590         |
| 82.0928       | 0.2508 | 1600 | 10.2583         |
| 82.0691       | 0.2665 | 1700 | 10.2577         |
| 82.1119       | 0.2821 | 1800 | 10.2569         |
| 82.0947       | 0.2978 | 1900 | 10.2563         |
| 82.1192       | 0.3135 | 2000 | 10.2560         |
| 82.0347       | 0.3292 | 2100 | 10.2555         |
| 82.0395       | 0.3448 | 2200 | 10.2551         |
| 82.0739       | 0.3605 | 2300 | 10.2547         |
| 82.0571       | 0.3762 | 2400 | 10.2543         |
| 82.021        | 0.3919 | 2500 | 10.2540         |
| 82.0816       | 0.4075 | 2600 | 10.2537         |
| 82.0561       | 0.4232 | 2700 | 10.2535         |
| 82.049        | 0.4389 | 2800 | 10.2531         |
| 82.0867       | 0.4546 | 2900 | 10.2529         |
| 82.0198       | 0.4702 | 3000 | 10.2526         |
| 82.1186       | 0.4859 | 3100 | 10.2525         |
| 82.0431       | 0.5016 | 3200 | 10.2523         |
| 82.0169       | 0.5173 | 3300 | 10.2522         |
| 82.0835       | 0.5329 | 3400 | 10.2520         |
| 82.0196       | 0.5486 | 3500 | 10.2519         |
| 82.1073       | 0.5643 | 3600 | 10.2518         |
| 82.0386       | 0.5800 | 3700 | 10.2517         |
| 82.0942       | 0.5956 | 3800 | 10.2516         |
| 82.025        | 0.6113 | 3900 | 10.2516         |
| 82.0014       | 0.6270 | 4000 | 10.2515         |
| 82.0336       | 0.6427 | 4100 | 10.2515         |
| 81.994        | 0.6583 | 4200 | 10.2515         |
| 82.1177       | 0.6740 | 4300 | 10.2515         |
| 82.0593       | 0.6897 | 4400 | 10.2514         |
| 82.0582       | 0.7054 | 4500 | 10.2514         |
| 82.0283       | 0.7210 | 4600 | 10.2514         |
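
The run ended at step 4600 of the 4679 configured training steps; with eval_steps: 100 and early_stopping_patience: 2, this is consistent with early stopping once the validation loss plateaued at 10.2514, with load_best_model_at_end restoring the best checkpoint.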

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1