See axolotl config
axolotl version: 0.6.0
```yaml
base_model: meta-llama/Meta-Llama-3.1-8B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
  - path: tatsu-lab/alpaca
    type: alpaca
    format: csv
    prompt_template: '### Instruction: {instruction}
      ### Input: {input}
      ### Response: {output}'
dataset_prepared_path: null
val_set_size: 0.1
output_dir: /root/outputs/fine_tuned_model
adapter: qlora
lora_model_dir: null
sequence_len: 2048
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
lora_r: 16
lora_alpha: 8
lora_dropout: 0.05
lora_target_modules: null
lora_target_linear: true
lora_fan_in_fan_out: null
wandb_project: null
wandb_entity: null
wandb_watch: null
wandb_name: null
wandb_log_model: null
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 10
max_steps: 10000000
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16: null
tf32: false
gradient_checkpointing: true
early_stopping_patience: 3
save_strategy: steps
save_steps: 20
evaluation_strategy: steps
eval_steps: 20
load_best_model_at_end: true
save_total_limit: 3
metric_for_best_model: loss
greater_is_better: false
resume_from_checkpoint: null
local_rank: null
logging_steps: 1
xformers_attention: null
flash_attention: true
warmup_steps: 10
debug: null
deepspeed: null
weight_decay: 0.0
fsdp: null
fsdp_config: null
special_tokens:
  pad_token: <|end_of_text|>
mlflow_tracking_uri: https://mlflow-dev.qpiai-pro.tech
mlflow_experiment_name: llama-8B-chemistry
hf_mlflow_log_artifacts: 'true'
local_files_only: true
```
# root/outputs/fine_tuned_model
This model is a QLoRA fine-tune of meta-llama/Meta-Llama-3.1-8B on the tatsu-lab/alpaca dataset. It achieves the following results on the evaluation set:
- Loss: 1.9859
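
For intuition, and assuming the reported loss is the mean token-level cross-entropy, this corresponds to a perplexity of roughly exp(1.9859) ≈ 7.3. A back-of-envelope check (my own, not a figure from the original card):

```python
import math

eval_loss = 1.9859                # reported validation loss (mean cross-entropy)
perplexity = math.exp(eval_loss)  # ≈ 7.29
print(f"perplexity ≈ {perplexity:.2f}")
```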
## Model description
More information needed
## Intended uses & limitations
More information needed
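
The original card leaves this section blank. As a starting point, here is a minimal inference sketch derived from the config above (4-bit base model, QLoRA adapter, alpaca-style prompt template). The adapter repo id `pavan01729/llama-8B-chemistry`, the example instruction, and the generation settings are assumptions, not part of the original card; substitute your local `output_dir` (`/root/outputs/fine_tuned_model`) if you trained the adapter yourself.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B"
adapter_id = "pavan01729/llama-8B-chemistry"  # assumed Hub repo; or a local adapter path

# Mirror the training-time quantization (load_in_4bit: true in the config).
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Prompt layout follows the prompt_template in the axolotl config above.
prompt = (
    "### Instruction: Name the functional group present in ethanol.\n"
    "### Input: \n"
    "### Response: "
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```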
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: paged_adamw_32bit with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- training_steps: 9890
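
The total train batch size of 4 is simply the per-device micro batch size times the gradient-accumulation steps; a single-GPU run is assumed here, which is what the reported total implies:

```python
micro_batch_size = 1             # from the config
gradient_accumulation_steps = 4  # from the config
num_devices = 1                  # assumption: single GPU, consistent with total_train_batch_size = 4

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 4
```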
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 2.2045 | 0.0010 | 1 | 2.3167 |
| 2.1303 | 0.0202 | 20 | 2.0805 |
| 1.9063 | 0.0404 | 40 | 2.0458 |
| 2.0275 | 0.0606 | 60 | 2.0337 |
| 2.1621 | 0.0807 | 80 | 2.0254 |
| 1.8073 | 0.1009 | 100 | 2.0203 |
| 2.1245 | 0.1211 | 120 | 2.0177 |
| 1.9644 | 0.1413 | 140 | 2.0137 |
| 1.9735 | 0.1615 | 160 | 2.0123 |
| 2.2691 | 0.1817 | 180 | 2.0095 |
| 1.9491 | 0.2019 | 200 | 2.0075 |
| 2.0258 | 0.2221 | 220 | 2.0057 |
| 1.7861 | 0.2422 | 240 | 2.0050 |
| 1.9007 | 0.2624 | 260 | 2.0006 |
| 1.9219 | 0.2826 | 280 | 2.0009 |
| 2.0698 | 0.3028 | 300 | 1.9978 |
| 1.6277 | 0.3230 | 320 | 1.9976 |
| 1.7718 | 0.3432 | 340 | 1.9964 |
| 1.8223 | 0.3634 | 360 | 1.9958 |
| 2.1197 | 0.3835 | 380 | 1.9953 |
| 2.1519 | 0.4037 | 400 | 1.9969 |
| 2.0659 | 0.4239 | 420 | 1.9952 |
| 1.7126 | 0.4441 | 440 | 1.9947 |
| 2.1095 | 0.4643 | 460 | 1.9924 |
| 1.6791 | 0.4845 | 480 | 1.9918 |
| 1.9868 | 0.5047 | 500 | 1.9908 |
| 1.9909 | 0.5249 | 520 | 1.9899 |
| 2.2069 | 0.5450 | 540 | 1.9917 |
| 2.0763 | 0.5652 | 560 | 1.9895 |
| 1.9251 | 0.5854 | 580 | 1.9891 |
| 1.982 | 0.6056 | 600 | 1.9879 |
| 2.054 | 0.6258 | 620 | 1.9875 |
| 1.7292 | 0.6460 | 640 | 1.9875 |
| 1.7901 | 0.6662 | 660 | 1.9891 |
| 1.9179 | 0.6863 | 680 | 1.9868 |
| 1.6178 | 0.7065 | 700 | 1.9874 |
| 1.7637 | 0.7267 | 720 | 1.9859 |
| 1.6946 | 0.7469 | 740 | 1.9868 |
| 1.8821 | 0.7671 | 760 | 1.9862 |
| 2.1346 | 0.7873 | 780 | 1.9859 |
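
The log stops at step 780 even though training_steps is 9890. This is consistent with `early_stopping_patience: 3` and an evaluation every 20 steps: the best validation loss (1.9859, first reached at step 720) is not improved upon in the following three evaluations. A small replay of the tail of the table to illustrate that reading (an interpretation on my part, not a statement from the original card):

```python
# Replay the last few validation losses with patience-3 early stopping
# (assuming "improvement" means a strictly lower loss, i.e. a 0.0 threshold).
eval_losses = {700: 1.9874, 720: 1.9859, 740: 1.9868, 760: 1.9862, 780: 1.9859}

best = float("inf")
patience, patience_counter = 3, 0
for step in sorted(eval_losses):
    loss = eval_losses[step]
    if loss < best:
        best, patience_counter = loss, 0
    else:
        patience_counter += 1
    if patience_counter >= patience:
        print(f"early stopping after step {step}; best validation loss {best}")
        break
```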
### Framework versions
- PEFT 0.14.0
- Transformers 4.47.0
- Pytorch 2.3.1+cu121
- Datasets 3.1.0
- Tokenizers 0.21.0