Built with Axolotl

Axolotl config (version 0.4.1):

adapter: lora
base_model: echarlaix/tiny-random-mistral
bf16: true
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - f3d0d4415de730db_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/f3d0d4415de730db_train_data.json
  type:
    field_input: Moreinfo
    field_instruction: Position
    field_output: CV
    format: '{instruction} {input}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
device_map:
  '': 0,1,2,3,4,5,6,7
early_stopping_patience: 2
eval_max_new_tokens: 128
eval_steps: 100
eval_table_size: null
flash_attention: true
gradient_accumulation_steps: 8
gradient_checkpointing: true
group_by_length: false
hub_model_id: Alphatao/942c4135-e225-4072-a929-7998548563ef
hub_repo: null
hub_strategy: null
hub_token: null
learning_rate: 0.0002
load_best_model_at_end: true
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 32
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 16
lora_target_linear: true
lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- down_proj
- up_proj
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: 4679
micro_batch_size: 4
mlflow_experiment_name: /tmp/f3d0d4415de730db_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 2
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 100
sequence_len: 2048
special_tokens:
  pad_token: </s>
strict: false
tf32: true
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.02390628735357399
wandb_entity: null
wandb_mode: online
wandb_name: 213b2f40-7d78-40ca-8cc0-85380681cac5
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 213b2f40-7d78-40ca-8cc0-85380681cac5
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null
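
The datasets `type` block above implies a simple prompt template. Below is a minimal sketch of how one record would be rendered; the field names (`Position`, `Moreinfo`, `CV`) and format strings come from the config, but the record values are invented for illustration:

```python
# Hedged sketch of the prompt assembly implied by the `type` block in the
# config above; the record itself is a made-up example, not real data.
record = {
    "Position": "Data Analyst",           # field_instruction
    "Moreinfo": "5 years of experience",  # field_input
    "CV": "Experienced analyst ...",      # field_output (training target)
}

if record.get("Moreinfo"):
    # format: '{instruction} {input}'
    prompt = "{instruction} {input}".format(
        instruction=record["Position"], input=record["Moreinfo"]
    )
else:
    # no_input_format: '{instruction}'
    prompt = record["Position"]

print(prompt, "->", record["CV"])
```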

942c4135-e225-4072-a929-7998548563ef

This model is a LoRA adapter fine-tuned from echarlaix/tiny-random-mistral on the f3d0d4415de730db_train_data.json dataset. It achieves the following results on the evaluation set:

  • Loss: 10.2514

Model description

More information needed

Intended uses & limitations

More information needed
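
No usage guidance was provided. The sketch below shows one plausible way to load the adapter with PEFT, assuming the adapter weights are published on the Hub under the hub_model_id from the config; the prompt string is an invented example in the `'{instruction} {input}'` shape used for training:

```python
# Minimal loading sketch, not an official example from this card.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "echarlaix/tiny-random-mistral"
adapter_id = "Alphatao/942c4135-e225-4072-a929-7998548563ef"  # hub_model_id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id)

# Note: the base is a tiny randomly initialized test model, so generations
# will not be meaningful; this only demonstrates the loading path.
inputs = tokenizer("Data Analyst 5 years of experience", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```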

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32 (train_batch_size 4 × gradient_accumulation_steps 8)
  • optimizer: adamw_bnb_8bit with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine (sketched after this list)
  • lr_scheduler_warmup_steps: 10
  • training_steps: 4679
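
A minimal sketch of the learning-rate schedule these settings describe, using Transformers' get_cosine_schedule_with_warmup; plain torch.optim.AdamW stands in here for the bitsandbytes 8-bit variant actually used:

```python
# Hedged sketch: AdamW is a stand-in for bitsandbytes' 8-bit AdamW; the LR,
# betas, epsilon, warmup, and step counts come from the hyperparameters above.
import torch
from transformers import get_cosine_schedule_with_warmup

params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(
    params, lr=2e-4, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=10, num_training_steps=4679
)

# LR ramps linearly to 2e-4 over the first 10 steps, then decays on a cosine.
for step in range(20):
    optimizer.step()
    scheduler.step()
print(scheduler.get_last_lr())  # just past warmup, still close to the 2e-4 peak
```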

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 83.0184       | 0.0002 | 1    | 10.3771         |
| 82.4725       | 0.0157 | 100  | 10.3081         |
| 82.3512       | 0.0313 | 200  | 10.2932         |
| 82.3996       | 0.0470 | 300  | 10.2884         |
| 82.2037       | 0.0627 | 400  | 10.2842         |
| 82.2038       | 0.0784 | 500  | 10.2788         |
| 82.1844       | 0.0940 | 600  | 10.2746         |
| 82.1807       | 0.1097 | 700  | 10.2714         |
| 82.2035       | 0.1254 | 800  | 10.2683         |
| 82.1132       | 0.1411 | 900  | 10.2660         |
| 82.1501       | 0.1567 | 1000 | 10.2642         |
| 82.142        | 0.1724 | 1100 | 10.2630         |
| 82.1508       | 0.1881 | 1200 | 10.2616         |
| 82.1372       | 0.2038 | 1300 | 10.2606         |
| 82.1296       | 0.2194 | 1400 | 10.2596         |
| 82.1202       | 0.2351 | 1500 | 10.2590         |
| 82.0928       | 0.2508 | 1600 | 10.2583         |
| 82.0691       | 0.2665 | 1700 | 10.2577         |
| 82.1119       | 0.2821 | 1800 | 10.2569         |
| 82.0947       | 0.2978 | 1900 | 10.2563         |
| 82.1192       | 0.3135 | 2000 | 10.2560         |
| 82.0347       | 0.3292 | 2100 | 10.2555         |
| 82.0395       | 0.3448 | 2200 | 10.2551         |
| 82.0739       | 0.3605 | 2300 | 10.2547         |
| 82.0571       | 0.3762 | 2400 | 10.2543         |
| 82.021        | 0.3919 | 2500 | 10.2540         |
| 82.0816       | 0.4075 | 2600 | 10.2537         |
| 82.0561       | 0.4232 | 2700 | 10.2535         |
| 82.049        | 0.4389 | 2800 | 10.2531         |
| 82.0867       | 0.4546 | 2900 | 10.2529         |
| 82.0198       | 0.4702 | 3000 | 10.2526         |
| 82.1186       | 0.4859 | 3100 | 10.2525         |
| 82.0431       | 0.5016 | 3200 | 10.2523         |
| 82.0169       | 0.5173 | 3300 | 10.2522         |
| 82.0835       | 0.5329 | 3400 | 10.2520         |
| 82.0196       | 0.5486 | 3500 | 10.2519         |
| 82.1073       | 0.5643 | 3600 | 10.2518         |
| 82.0386       | 0.5800 | 3700 | 10.2517         |
| 82.0942       | 0.5956 | 3800 | 10.2516         |
| 82.025        | 0.6113 | 3900 | 10.2516         |
| 82.0014       | 0.6270 | 4000 | 10.2515         |
| 82.0336       | 0.6427 | 4100 | 10.2515         |
| 81.994        | 0.6583 | 4200 | 10.2515         |
| 82.1177       | 0.6740 | 4300 | 10.2515         |
| 82.0593       | 0.6897 | 4400 | 10.2514         |
| 82.0582       | 0.7054 | 4500 | 10.2514         |
| 82.0283       | 0.7210 | 4600 | 10.2514         |
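
The run ended at step 4600 of the 4679 configured training steps; with eval_steps: 100 and early_stopping_patience: 2, this is consistent with early stopping once the validation loss plateaued at 10.2514, with load_best_model_at_end restoring the best checkpoint.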

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1