ADPrLlama / running_log.txt
[INFO|2025-04-06 00:01:34] tokenization_utils_base.py:2060 >> loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/tokenizer.model
[INFO|2025-04-06 00:01:34] tokenization_utils_base.py:2060 >> loading file tokenizer.json from cache at None
[INFO|2025-04-06 00:01:34] tokenization_utils_base.py:2060 >> loading file added_tokens.json from cache at None
[INFO|2025-04-06 00:01:34] tokenization_utils_base.py:2060 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/special_tokens_map.json
[INFO|2025-04-06 00:01:34] tokenization_utils_base.py:2060 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/tokenizer_config.json
[INFO|2025-04-06 00:01:34] tokenization_utils_base.py:2060 >> loading file chat_template.jinja from cache at None
[INFO|2025-04-06 00:01:36] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/config.json
[INFO|2025-04-06 00:01:36] configuration_utils.py:771 >> Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 4096,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 32,
"pad_token_id": 0,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.50.0",
"use_cache": true,
"vocab_size": 32000
}
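
The config dump above can be reproduced directly from the hub. A minimal sketch, assuming only the GreatCaptainNemo/ProLLaMA model ID visible in the cache paths:

    from transformers import AutoConfig, AutoTokenizer

    # Model ID taken from the cache paths in the log above.
    model_id = "GreatCaptainNemo/ProLLaMA"

    config = AutoConfig.from_pretrained(model_id)        # the LlamaConfig printed above
    tokenizer = AutoTokenizer.from_pretrained(model_id)  # loads tokenizer.model (SentencePiece)

    print(config.hidden_size, config.num_hidden_layers, len(tokenizer))  # 4096 32 32000

Note that the SentencePiece file (tokenizer.model) is present while tokenizer.json is not, which is why the fast-tokenizer lookups above resolve to None.
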
[INFO|2025-04-06 00:01:36] tokenization_utils_base.py:2060 >> loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/tokenizer.model
[INFO|2025-04-06 00:01:36] tokenization_utils_base.py:2060 >> loading file tokenizer.json from cache at None
[INFO|2025-04-06 00:01:36] tokenization_utils_base.py:2060 >> loading file added_tokens.json from cache at None
[INFO|2025-04-06 00:01:36] tokenization_utils_base.py:2060 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/special_tokens_map.json
[INFO|2025-04-06 00:01:36] tokenization_utils_base.py:2060 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/tokenizer_config.json
[INFO|2025-04-06 00:01:36] tokenization_utils_base.py:2060 >> loading file chat_template.jinja from cache at None
[INFO|2025-04-06 00:01:37] logging.py:143 >> Loading dataset ADPr/train.json...
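
The training file is a local JSON dataset. Its exact schema is not shown in this log (LLaMA-Factory typically expects alpaca-style instruction/input/output records), but a hedged sketch of inspecting it:

    from datasets import load_dataset

    # Path taken from the log line above; the record schema is an assumption.
    ds = load_dataset("json", data_files={"train": "ADPr/train.json"})["train"]
    print(len(ds), ds[0])

The 28,058 training and 3,118 evaluation examples reported later are consistent with a roughly 90/10 split of a 31,176-record file.
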
[INFO|2025-04-06 00:01:41] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/config.json
[INFO|2025-04-06 00:01:41] logging.py:143 >> KV cache is disabled during training.
[INFO|2025-04-06 00:01:43] modeling_utils.py:1154 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/model.safetensors.index.json
[INFO|2025-04-06 00:09:05] modeling_utils.py:2170 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|2025-04-06 00:09:05] configuration_utils.py:1139 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0,
"use_cache": false
}
[INFO|2025-04-06 00:09:09] modeling_utils.py:4987 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|2025-04-06 00:09:09] modeling_utils.py:4995 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at GreatCaptainNemo/ProLLaMA.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|2025-04-06 00:09:10] configuration_utils.py:1094 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/generation_config.json
[INFO|2025-04-06 00:09:10] configuration_utils.py:1139 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0
}
[INFO|2025-04-06 00:09:10] logging.py:143 >> Gradient checkpointing enabled.
[INFO|2025-04-06 00:09:10] logging.py:143 >> Using torch SDPA for faster training and inference.
[INFO|2025-04-06 00:09:10] logging.py:143 >> Upcasting trainable params to float32.
[INFO|2025-04-06 00:09:10] logging.py:143 >> Fine-tuning method: LoRA
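
The lines above correspond to standard Transformers toggles. A minimal sketch of the equivalent model preparation, assuming nothing beyond what the log states:

    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "GreatCaptainNemo/ProLLaMA",
        torch_dtype=torch.bfloat16,     # "Instantiating ... under default dtype torch.bfloat16"
        attn_implementation="sdpa",     # "Using torch SDPA for faster training and inference"
    )
    model.config.use_cache = False      # "KV cache is disabled during training"
    model.gradient_checkpointing_enable()  # "Gradient checkpointing enabled"

Upcasting trainable parameters to float32 is handled by the adapter wrapper; the frozen base weights stay in bfloat16.
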
[WARNING|2025-04-06 00:09:10] logging.py:148 >> Vocab has been resized, add lm_head,embed_tokens to trainable params.
[INFO|2025-04-06 00:09:12] logging.py:143 >> trainable params: 422,051,840 || all params: 7,160,467,456 || trainable%: 5.8942
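
The trainable-parameter count is consistent with rank-64 LoRA on every linear projection plus full copies of lm_head and embed_tokens (added because of the vocab-resize warning). The rank and target modules below are inferred from the arithmetic, not stated in the log:

    from peft import LoraConfig

    # Hypothetical config matching the counts above; r=64 is inferred.
    lora_config = LoraConfig(
        r=64,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        modules_to_save=["lm_head", "embed_tokens"],  # per the vocab-resize warning
        task_type="CAUSAL_LM",
    )

    # Back-of-envelope check against "trainable params: 422,051,840":
    r = 64
    attn = 4 * r * (4096 + 4096)   # q/k/v/o projections, per layer
    mlp = 3 * r * (4096 + 11008)   # gate/up/down projections, per layer
    lora = 32 * (attn + mlp)       # 159,907,840 across 32 layers
    embed = 2 * 32000 * 4096       # full embed_tokens + lm_head copies
    print(lora + embed)            # 422,051,840

The all-params figure (7,160,467,456) likewise equals the ~6.74B base model plus the adapter weights and the trainable embed/lm_head copies, giving the reported 5.8942% trainable fraction.
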
[INFO|2025-04-06 00:09:12] trainer.py:748 >> Using auto half precision backend
[WARNING|2025-04-06 00:09:12] trainer.py:783 >> No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
[INFO|2025-04-06 00:09:12] trainer.py:2409 >> ***** Running training *****
[INFO|2025-04-06 00:09:12] trainer.py:2410 >> Num examples = 28,058
[INFO|2025-04-06 00:09:12] trainer.py:2411 >> Num Epochs = 3
[INFO|2025-04-06 00:09:12] trainer.py:2412 >> Instantaneous batch size per device = 16
[INFO|2025-04-06 00:09:12] trainer.py:2415 >> Total train batch size (w. parallel, distributed & accumulation) = 128
[INFO|2025-04-06 00:09:12] trainer.py:2416 >> Gradient Accumulation steps = 8
[INFO|2025-04-06 00:09:12] trainer.py:2417 >> Total optimization steps = 657
[INFO|2025-04-06 00:09:12] trainer.py:2418 >> Number of trainable parameters = 422,051,840
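
The step count follows from the numbers above. A quick arithmetic check (exact rounding of partial accumulation windows varies across Transformers versions, but this matches the 657 reported here):

    examples, per_device, grad_accum, epochs = 28_058, 16, 8, 3

    total_batch = per_device * grad_accum              # 128, implying a single device
    batches_per_epoch = -(-examples // per_device)     # 1754 dataloader batches (ceil)
    steps_per_epoch = batches_per_epoch // grad_accum  # 219 optimizer steps
    print(total_batch, steps_per_epoch * epochs)       # 128 657
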
[INFO|2025-04-06 00:09:38] logging.py:143 >> {'loss': 8.7284, 'learning_rate': 1.2500e-05, 'epoch': 0.02, 'throughput': 2445.23}
[INFO|2025-04-06 00:10:03] logging.py:143 >> {'loss': 4.9749, 'learning_rate': 2.5000e-05, 'epoch': 0.05, 'throughput': 2468.54}
[INFO|2025-04-06 00:10:27] logging.py:143 >> {'loss': 1.5517, 'learning_rate': 3.7500e-05, 'epoch': 0.07, 'throughput': 2469.07}
[INFO|2025-04-06 00:10:52] logging.py:143 >> {'loss': 0.6744, 'learning_rate': 5.0000e-05, 'epoch': 0.09, 'throughput': 2472.96}
[INFO|2025-04-06 00:11:17] logging.py:143 >> {'loss': 0.5648, 'learning_rate': 4.9992e-05, 'epoch': 0.11, 'throughput': 2474.77}
[INFO|2025-04-06 00:11:42] logging.py:143 >> {'loss': 0.5605, 'learning_rate': 4.9970e-05, 'epoch': 0.14, 'throughput': 2475.65}
[INFO|2025-04-06 00:12:06] logging.py:143 >> {'loss': 0.5297, 'learning_rate': 4.9932e-05, 'epoch': 0.16, 'throughput': 2476.78}
[INFO|2025-04-06 00:12:31] logging.py:143 >> {'loss': 0.5314, 'learning_rate': 4.9878e-05, 'epoch': 0.18, 'throughput': 2478.52}
[INFO|2025-04-06 00:12:56] logging.py:143 >> {'loss': 0.5013, 'learning_rate': 4.9810e-05, 'epoch': 0.21, 'throughput': 2480.10}
[INFO|2025-04-06 00:13:21] logging.py:143 >> {'loss': 0.4944, 'learning_rate': 4.9727e-05, 'epoch': 0.23, 'throughput': 2479.51}
[INFO|2025-04-06 00:13:46] logging.py:143 >> {'loss': 0.5071, 'learning_rate': 4.9628e-05, 'epoch': 0.25, 'throughput': 2479.71}
[INFO|2025-04-06 00:14:11] logging.py:143 >> {'loss': 0.5025, 'learning_rate': 4.9515e-05, 'epoch': 0.27, 'throughput': 2479.25}
[INFO|2025-04-06 00:14:36] logging.py:143 >> {'loss': 0.5038, 'learning_rate': 4.9387e-05, 'epoch': 0.30, 'throughput': 2479.76}
[INFO|2025-04-06 00:15:01] logging.py:143 >> {'loss': 0.4907, 'learning_rate': 4.9244e-05, 'epoch': 0.32, 'throughput': 2479.97}
[INFO|2025-04-06 00:15:25] logging.py:143 >> {'loss': 0.4774, 'learning_rate': 4.9086e-05, 'epoch': 0.34, 'throughput': 2479.35}
[INFO|2025-04-06 00:15:50] logging.py:143 >> {'loss': 0.4709, 'learning_rate': 4.8913e-05, 'epoch': 0.36, 'throughput': 2479.85}
[INFO|2025-04-06 00:16:15] logging.py:143 >> {'loss': 0.4793, 'learning_rate': 4.8726e-05, 'epoch': 0.39, 'throughput': 2480.04}
[INFO|2025-04-06 00:16:39] logging.py:143 >> {'loss': 0.4835, 'learning_rate': 4.8525e-05, 'epoch': 0.41, 'throughput': 2479.96}
[INFO|2025-04-06 00:17:04] logging.py:143 >> {'loss': 0.4668, 'learning_rate': 4.8309e-05, 'epoch': 0.43, 'throughput': 2480.30}
[INFO|2025-04-06 00:17:28] logging.py:143 >> {'loss': 0.4600, 'learning_rate': 4.8079e-05, 'epoch': 0.46, 'throughput': 2479.57}
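
The learning-rate column traces a linear warmup followed by cosine decay. A sketch that reproduces it, assuming 20 warmup steps (inferred from the peak of 5.0000e-05 at epoch 0.09, i.e. around step 20, with logging every 5 steps):

    import math

    peak, warmup, total = 5e-5, 20, 657  # warmup is inferred, not stated in the log

    def lr_at(step):
        if step < warmup:
            return peak * step / warmup  # linear warmup
        progress = (step - warmup) / (total - warmup)
        return peak * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

    print(f"{lr_at(25):.4e}")  # 4.9992e-05, matching the entry at epoch 0.11
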
[INFO|2025-04-06 00:17:28] trainer.py:4289 >>
***** Running Evaluation *****
[INFO|2025-04-06 00:17:28] trainer.py:4291 >> Num examples = 3118
[INFO|2025-04-06 00:17:28] trainer.py:4294 >> Batch size = 16
[INFO|2025-04-06 00:18:03] trainer.py:3966 >> Saving model checkpoint to saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-100
[INFO|2025-04-06 00:18:03] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/config.json
[INFO|2025-04-06 00:18:07] tokenization_utils_base.py:2510 >> tokenizer config file saved in saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-100/tokenizer_config.json
[INFO|2025-04-06 00:18:07] tokenization_utils_base.py:2519 >> Special tokens file saved in saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-100/special_tokens_map.json
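
Each checkpoint-N directory saved above contains the adapter weights alongside the tokenizer files. A minimal sketch for loading an intermediate checkpoint with PEFT (the base-model ID and the path come from the log; the rest is standard usage):

    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("GreatCaptainNemo/ProLLaMA")
    model = PeftModel.from_pretrained(
        base, "saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-100"
    )

Resuming training from the same directory would instead go through the trainer's resume_from_checkpoint argument.
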
[INFO|2025-04-06 00:18:42] logging.py:143 >> {'loss': 0.4823, 'learning_rate': 4.7835e-05, 'epoch': 0.48, 'throughput': 2266.26}
[INFO|2025-04-06 00:19:07] logging.py:143 >> {'loss': 0.4905, 'learning_rate': 4.7577e-05, 'epoch': 0.50, 'throughput': 2275.37}
[INFO|2025-04-06 00:19:32] logging.py:143 >> {'loss': 0.4759, 'learning_rate': 4.7306e-05, 'epoch': 0.52, 'throughput': 2283.37}
[INFO|2025-04-06 00:19:57] logging.py:143 >> {'loss': 0.4735, 'learning_rate': 4.7021e-05, 'epoch': 0.55, 'throughput': 2290.88}
[INFO|2025-04-06 00:20:21] logging.py:143 >> {'loss': 0.4662, 'learning_rate': 4.6722e-05, 'epoch': 0.57, 'throughput': 2297.32}
[INFO|2025-04-06 00:20:46] logging.py:143 >> {'loss': 0.4757, 'learning_rate': 4.6410e-05, 'epoch': 0.59, 'throughput': 2303.86}
[INFO|2025-04-06 00:21:10] logging.py:143 >> {'loss': 0.4614, 'learning_rate': 4.6086e-05, 'epoch': 0.62, 'throughput': 2310.19}
[INFO|2025-04-06 00:21:35] logging.py:143 >> {'loss': 0.4553, 'learning_rate': 4.5748e-05, 'epoch': 0.64, 'throughput': 2315.47}
[INFO|2025-04-06 00:22:00] logging.py:143 >> {'loss': 0.4540, 'learning_rate': 4.5398e-05, 'epoch': 0.66, 'throughput': 2321.33}
[INFO|2025-04-06 00:22:25] logging.py:143 >> {'loss': 0.4740, 'learning_rate': 4.5035e-05, 'epoch': 0.68, 'throughput': 2326.35}
[INFO|2025-04-06 00:22:49] logging.py:143 >> {'loss': 0.4635, 'learning_rate': 4.4661e-05, 'epoch': 0.71, 'throughput': 2331.01}
[INFO|2025-04-06 00:23:14] logging.py:143 >> {'loss': 0.4529, 'learning_rate': 4.4274e-05, 'epoch': 0.73, 'throughput': 2335.07}
[INFO|2025-04-06 00:23:39] logging.py:143 >> {'loss': 0.4581, 'learning_rate': 4.3875e-05, 'epoch': 0.75, 'throughput': 2339.33}
[INFO|2025-04-06 00:24:03] logging.py:143 >> {'loss': 0.4422, 'learning_rate': 4.3465e-05, 'epoch': 0.78, 'throughput': 2343.15}
[INFO|2025-04-06 00:24:28] logging.py:143 >> {'loss': 0.4522, 'learning_rate': 4.3044e-05, 'epoch': 0.80, 'throughput': 2346.80}
[INFO|2025-04-06 00:24:53] logging.py:143 >> {'loss': 0.4457, 'learning_rate': 4.2612e-05, 'epoch': 0.82, 'throughput': 2350.45}
[INFO|2025-04-06 00:25:18] logging.py:143 >> {'loss': 0.4448, 'learning_rate': 4.2169e-05, 'epoch': 0.84, 'throughput': 2353.93}
[INFO|2025-04-06 00:25:42] logging.py:143 >> {'loss': 0.4442, 'learning_rate': 4.1716e-05, 'epoch': 0.87, 'throughput': 2357.13}
[INFO|2025-04-06 00:26:07] logging.py:143 >> {'loss': 0.4327, 'learning_rate': 4.1253e-05, 'epoch': 0.89, 'throughput': 2360.39}
[INFO|2025-04-06 00:26:32] logging.py:143 >> {'loss': 0.4222, 'learning_rate': 4.0779e-05, 'epoch': 0.91, 'throughput': 2363.27}
[INFO|2025-04-06 00:26:32] trainer.py:4289 >>
***** Running Evaluation *****
[INFO|2025-04-06 00:26:32] trainer.py:4291 >> Num examples = 3118
[INFO|2025-04-06 00:26:32] trainer.py:4294 >> Batch size = 16
[INFO|2025-04-06 00:27:06] trainer.py:3966 >> Saving model checkpoint to saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-200
[INFO|2025-04-06 00:27:07] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/config.json
[INFO|2025-04-06 00:27:12] tokenization_utils_base.py:2510 >> tokenizer config file saved in saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-200/tokenizer_config.json
[INFO|2025-04-06 00:27:12] tokenization_utils_base.py:2519 >> Special tokens file saved in saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-200/special_tokens_map.json
[INFO|2025-04-06 00:27:45] logging.py:143 >> {'loss': 0.4190, 'learning_rate': 4.0296e-05, 'epoch': 0.94, 'throughput': 2263.02}
[INFO|2025-04-06 00:28:10] logging.py:143 >> {'loss': 0.4266, 'learning_rate': 3.9804e-05, 'epoch': 0.96, 'throughput': 2267.79}
[INFO|2025-04-06 00:28:35] logging.py:143 >> {'loss': 0.4234, 'learning_rate': 3.9303e-05, 'epoch': 0.98, 'throughput': 2272.62}
[INFO|2025-04-06 00:28:56] logging.py:143 >> {'loss': 0.4299, 'learning_rate': 3.8793e-05, 'epoch': 1.00, 'throughput': 2276.33}
[INFO|2025-04-06 00:29:21] logging.py:143 >> {'loss': 0.4057, 'learning_rate': 3.8275e-05, 'epoch': 1.02, 'throughput': 2280.53}
[INFO|2025-04-06 00:29:46] logging.py:143 >> {'loss': 0.4060, 'learning_rate': 3.7748e-05, 'epoch': 1.05, 'throughput': 2284.56}
[INFO|2025-04-06 00:30:11] logging.py:143 >> {'loss': 0.4095, 'learning_rate': 3.7214e-05, 'epoch': 1.07, 'throughput': 2288.40}
[INFO|2025-04-06 00:30:35] logging.py:143 >> {'loss': 0.3905, 'learning_rate': 3.6673e-05, 'epoch': 1.09, 'throughput': 2292.14}
[INFO|2025-04-06 00:31:00] logging.py:143 >> {'loss': 0.3970, 'learning_rate': 3.6124e-05, 'epoch': 1.11, 'throughput': 2295.76}
[INFO|2025-04-06 00:31:25] logging.py:143 >> {'loss': 0.3880, 'learning_rate': 3.5569e-05, 'epoch': 1.14, 'throughput': 2299.20}
[INFO|2025-04-06 00:31:49] logging.py:143 >> {'loss': 0.3891, 'learning_rate': 3.5007e-05, 'epoch': 1.16, 'throughput': 2302.55}
[INFO|2025-04-06 00:32:14] logging.py:143 >> {'loss': 0.3850, 'learning_rate': 3.4439e-05, 'epoch': 1.18, 'throughput': 2305.91}
[INFO|2025-04-06 00:32:39] logging.py:143 >> {'loss': 0.3857, 'learning_rate': 3.3865e-05, 'epoch': 1.21, 'throughput': 2309.02}
[INFO|2025-04-06 00:33:04] logging.py:143 >> {'loss': 0.3923, 'learning_rate': 3.3286e-05, 'epoch': 1.23, 'throughput': 2312.21}
[INFO|2025-04-06 00:33:29] logging.py:143 >> {'loss': 0.3751, 'learning_rate': 3.2702e-05, 'epoch': 1.25, 'throughput': 2314.98}
[INFO|2025-04-06 00:33:54] logging.py:143 >> {'loss': 0.3860, 'learning_rate': 3.2113e-05, 'epoch': 1.27, 'throughput': 2318.02}
[INFO|2025-04-06 00:34:19] logging.py:143 >> {'loss': 0.3757, 'learning_rate': 3.1520e-05, 'epoch': 1.30, 'throughput': 2320.61}
[INFO|2025-04-06 00:34:44] logging.py:143 >> {'loss': 0.3831, 'learning_rate': 3.0923e-05, 'epoch': 1.32, 'throughput': 2323.45}
[INFO|2025-04-06 00:35:08] logging.py:143 >> {'loss': 0.3702, 'learning_rate': 3.0322e-05, 'epoch': 1.34, 'throughput': 2325.63}
[INFO|2025-04-06 00:35:33] logging.py:143 >> {'loss': 0.3820, 'learning_rate': 2.9718e-05, 'epoch': 1.36, 'throughput': 2328.17}
[INFO|2025-04-06 00:35:33] trainer.py:4289 >>
***** Running Evaluation *****
[INFO|2025-04-06 00:35:33] trainer.py:4291 >> Num examples = 3118
[INFO|2025-04-06 00:35:33] trainer.py:4294 >> Batch size = 16
[INFO|2025-04-06 00:36:07] trainer.py:3966 >> Saving model checkpoint to saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-300
[INFO|2025-04-06 00:36:08] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/config.json
[INFO|2025-04-06 00:36:11] tokenization_utils_base.py:2510 >> tokenizer config file saved in saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-300/tokenizer_config.json
[INFO|2025-04-06 00:36:11] tokenization_utils_base.py:2519 >> Special tokens file saved in saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-300/special_tokens_map.json
[INFO|2025-04-06 00:36:45] logging.py:143 >> {'loss': 0.3920, 'learning_rate': 2.9112e-05, 'epoch': 1.39, 'throughput': 2263.76}
[INFO|2025-04-06 00:37:10] logging.py:143 >> {'loss': 0.3707, 'learning_rate': 2.8502e-05, 'epoch': 1.41, 'throughput': 2266.96}
[INFO|2025-04-06 00:37:34] logging.py:143 >> {'loss': 0.3645, 'learning_rate': 2.7891e-05, 'epoch': 1.43, 'throughput': 2270.01}
[INFO|2025-04-06 00:38:00] logging.py:143 >> {'loss': 0.3599, 'learning_rate': 2.7278e-05, 'epoch': 1.46, 'throughput': 2272.87}
[INFO|2025-04-06 00:38:24] logging.py:143 >> {'loss': 0.3630, 'learning_rate': 2.6663e-05, 'epoch': 1.48, 'throughput': 2275.73}
[INFO|2025-04-06 00:38:49] logging.py:143 >> {'loss': 0.3542, 'learning_rate': 2.6048e-05, 'epoch': 1.50, 'throughput': 2278.35}
[INFO|2025-04-06 00:39:13] logging.py:143 >> {'loss': 0.3479, 'learning_rate': 2.5432e-05, 'epoch': 1.52, 'throughput': 2281.03}
[INFO|2025-04-06 00:39:38] logging.py:143 >> {'loss': 0.3532, 'learning_rate': 2.4815e-05, 'epoch': 1.55, 'throughput': 2283.71}
[INFO|2025-04-06 00:40:03] logging.py:143 >> {'loss': 0.3564, 'learning_rate': 2.4199e-05, 'epoch': 1.57, 'throughput': 2286.49}
[INFO|2025-04-06 00:40:27] logging.py:143 >> {'loss': 0.3424, 'learning_rate': 2.3583e-05, 'epoch': 1.59, 'throughput': 2289.17}
[INFO|2025-04-06 00:40:52] logging.py:143 >> {'loss': 0.3559, 'learning_rate': 2.2968e-05, 'epoch': 1.62, 'throughput': 2291.71}
[INFO|2025-04-06 00:41:17] logging.py:143 >> {'loss': 0.3548, 'learning_rate': 2.2354e-05, 'epoch': 1.64, 'throughput': 2294.11}
[INFO|2025-04-06 00:41:42] logging.py:143 >> {'loss': 0.3467, 'learning_rate': 2.1742e-05, 'epoch': 1.66, 'throughput': 2296.45}
[INFO|2025-04-06 00:42:07] logging.py:143 >> {'loss': 0.3515, 'learning_rate': 2.1132e-05, 'epoch': 1.68, 'throughput': 2298.95}
[INFO|2025-04-06 00:42:31] logging.py:143 >> {'loss': 0.3308, 'learning_rate': 2.0524e-05, 'epoch': 1.71, 'throughput': 2301.22}
[INFO|2025-04-06 00:42:56] logging.py:143 >> {'loss': 0.3354, 'learning_rate': 1.9919e-05, 'epoch': 1.73, 'throughput': 2303.46}
[INFO|2025-04-06 00:43:21] logging.py:143 >> {'loss': 0.3430, 'learning_rate': 1.9317e-05, 'epoch': 1.75, 'throughput': 2305.69}
[INFO|2025-04-06 00:43:46] logging.py:143 >> {'loss': 0.3456, 'learning_rate': 1.8718e-05, 'epoch': 1.78, 'throughput': 2307.85}
[INFO|2025-04-06 00:44:10] logging.py:143 >> {'loss': 0.3286, 'learning_rate': 1.8124e-05, 'epoch': 1.80, 'throughput': 2309.86}
[INFO|2025-04-06 00:44:35] logging.py:143 >> {'loss': 0.3574, 'learning_rate': 1.7533e-05, 'epoch': 1.82, 'throughput': 2311.96}
[INFO|2025-04-06 00:44:35] trainer.py:4289 >>
***** Running Evaluation *****
[INFO|2025-04-06 00:44:35] trainer.py:4291 >> Num examples = 3118
[INFO|2025-04-06 00:44:35] trainer.py:4294 >> Batch size = 16
[INFO|2025-04-06 00:45:10] trainer.py:3966 >> Saving model checkpoint to saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-400
[INFO|2025-04-06 00:45:10] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/config.json
[INFO|2025-04-06 00:45:15] tokenization_utils_base.py:2510 >> tokenizer config file saved in saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-400/tokenizer_config.json
[INFO|2025-04-06 00:45:15] tokenization_utils_base.py:2519 >> Special tokens file saved in saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-400/special_tokens_map.json
[INFO|2025-04-06 00:45:47] logging.py:143 >> {'loss': 0.3280, 'learning_rate': 1.6947e-05, 'epoch': 1.84, 'throughput': 2263.76}
[INFO|2025-04-06 00:46:12] logging.py:143 >> {'loss': 0.3358, 'learning_rate': 1.6366e-05, 'epoch': 1.87, 'throughput': 2266.24}
[INFO|2025-04-06 00:46:37] logging.py:143 >> {'loss': 0.3365, 'learning_rate': 1.5790e-05, 'epoch': 1.89, 'throughput': 2268.54}
[INFO|2025-04-06 00:47:02] logging.py:143 >> {'loss': 0.3257, 'learning_rate': 1.5220e-05, 'epoch': 1.91, 'throughput': 2270.84}
[INFO|2025-04-06 00:47:27] logging.py:143 >> {'loss': 0.3369, 'learning_rate': 1.4655e-05, 'epoch': 1.94, 'throughput': 2273.16}
[INFO|2025-04-06 00:47:52] logging.py:143 >> {'loss': 0.3410, 'learning_rate': 1.4097e-05, 'epoch': 1.96, 'throughput': 2275.49}
[INFO|2025-04-06 00:48:17] logging.py:143 >> {'loss': 0.3240, 'learning_rate': 1.3546e-05, 'epoch': 1.98, 'throughput': 2277.63}
[INFO|2025-04-06 00:48:38] logging.py:143 >> {'loss': 0.3200, 'learning_rate': 1.3002e-05, 'epoch': 2.00, 'throughput': 2279.38}
[INFO|2025-04-06 00:49:03] logging.py:143 >> {'loss': 0.3055, 'learning_rate': 1.2464e-05, 'epoch': 2.02, 'throughput': 2281.53}
[INFO|2025-04-06 00:49:27] logging.py:143 >> {'loss': 0.3040, 'learning_rate': 1.1935e-05, 'epoch': 2.05, 'throughput': 2283.42}
[INFO|2025-04-06 00:49:52] logging.py:143 >> {'loss': 0.3105, 'learning_rate': 1.1413e-05, 'epoch': 2.07, 'throughput': 2285.51}
[INFO|2025-04-06 00:50:17] logging.py:143 >> {'loss': 0.2971, 'learning_rate': 1.0900e-05, 'epoch': 2.09, 'throughput': 2287.38}
[INFO|2025-04-06 00:50:42] logging.py:143 >> {'loss': 0.2986, 'learning_rate': 1.0395e-05, 'epoch': 2.11, 'throughput': 2289.16}
[INFO|2025-04-06 00:51:07] logging.py:143 >> {'loss': 0.3109, 'learning_rate': 9.8994e-06, 'epoch': 2.14, 'throughput': 2291.02}
[INFO|2025-04-06 00:51:32] logging.py:143 >> {'loss': 0.3053, 'learning_rate': 9.4128e-06, 'epoch': 2.16, 'throughput': 2292.87}
[INFO|2025-04-06 00:51:57] logging.py:143 >> {'loss': 0.3116, 'learning_rate': 8.9356e-06, 'epoch': 2.18, 'throughput': 2294.73}
[INFO|2025-04-06 00:52:22] logging.py:143 >> {'loss': 0.2883, 'learning_rate': 8.4681e-06, 'epoch': 2.21, 'throughput': 2296.38}
[INFO|2025-04-06 00:52:47] logging.py:143 >> {'loss': 0.2859, 'learning_rate': 8.0108e-06, 'epoch': 2.23, 'throughput': 2298.08}
[INFO|2025-04-06 00:53:11] logging.py:143 >> {'loss': 0.3055, 'learning_rate': 7.5637e-06, 'epoch': 2.25, 'throughput': 2299.77}
[INFO|2025-04-06 00:53:36] logging.py:143 >> {'loss': 0.3110, 'learning_rate': 7.1273e-06, 'epoch': 2.27, 'throughput': 2301.47}
[INFO|2025-04-06 00:53:36] trainer.py:4289 >>
***** Running Evaluation *****
[INFO|2025-04-06 00:53:36] trainer.py:4291 >> Num examples = 3118
[INFO|2025-04-06 00:53:36] trainer.py:4294 >> Batch size = 16
[INFO|2025-04-06 00:54:11] trainer.py:3966 >> Saving model checkpoint to saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-500
[INFO|2025-04-06 00:54:11] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/config.json
[INFO|2025-04-06 00:54:16] tokenization_utils_base.py:2510 >> tokenizer config file saved in saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-500/tokenizer_config.json
[INFO|2025-04-06 00:54:16] tokenization_utils_base.py:2519 >> Special tokens file saved in saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-500/special_tokens_map.json
[INFO|2025-04-06 00:54:50] logging.py:143 >> {'loss': 0.2869, 'learning_rate': 6.7017e-06, 'epoch': 2.30, 'throughput': 2262.25}
[INFO|2025-04-06 00:55:15] logging.py:143 >> {'loss': 0.2944, 'learning_rate': 6.2872e-06, 'epoch': 2.32, 'throughput': 2264.23}
[INFO|2025-04-06 00:55:40] logging.py:143 >> {'loss': 0.2967, 'learning_rate': 5.8842e-06, 'epoch': 2.34, 'throughput': 2266.11}
[INFO|2025-04-06 00:56:04] logging.py:143 >> {'loss': 0.2747, 'learning_rate': 5.4927e-06, 'epoch': 2.36, 'throughput': 2267.95}
[INFO|2025-04-06 00:56:29] logging.py:143 >> {'loss': 0.2771, 'learning_rate': 5.1131e-06, 'epoch': 2.39, 'throughput': 2269.76}
[INFO|2025-04-06 00:56:54] logging.py:143 >> {'loss': 0.2999, 'learning_rate': 4.7456e-06, 'epoch': 2.41, 'throughput': 2271.55}
[INFO|2025-04-06 00:57:20] logging.py:143 >> {'loss': 0.2982, 'learning_rate': 4.3904e-06, 'epoch': 2.43, 'throughput': 2273.34}
[INFO|2025-04-06 00:57:44] logging.py:143 >> {'loss': 0.2822, 'learning_rate': 4.0478e-06, 'epoch': 2.46, 'throughput': 2275.04}
[INFO|2025-04-06 00:58:09] logging.py:143 >> {'loss': 0.2867, 'learning_rate': 3.7179e-06, 'epoch': 2.48, 'throughput': 2276.60}
[INFO|2025-04-06 00:58:33] logging.py:143 >> {'loss': 0.2763, 'learning_rate': 3.4009e-06, 'epoch': 2.50, 'throughput': 2278.12}
[INFO|2025-04-06 00:58:58] logging.py:143 >> {'loss': 0.2975, 'learning_rate': 3.0971e-06, 'epoch': 2.52, 'throughput': 2279.74}
[INFO|2025-04-06 00:59:22] logging.py:143 >> {'loss': 0.2826, 'learning_rate': 2.8066e-06, 'epoch': 2.55, 'throughput': 2281.23}
[INFO|2025-04-06 00:59:47] logging.py:143 >> {'loss': 0.2807, 'learning_rate': 2.5295e-06, 'epoch': 2.57, 'throughput': 2282.81}
[INFO|2025-04-06 01:00:12] logging.py:143 >> {'loss': 0.2882, 'learning_rate': 2.2662e-06, 'epoch': 2.59, 'throughput': 2284.37}
[INFO|2025-04-06 01:00:37] logging.py:143 >> {'loss': 0.2878, 'learning_rate': 2.0167e-06, 'epoch': 2.62, 'throughput': 2285.93}
[INFO|2025-04-06 01:01:01] logging.py:143 >> {'loss': 0.2913, 'learning_rate': 1.7811e-06, 'epoch': 2.64, 'throughput': 2287.44}
[INFO|2025-04-06 01:01:26] logging.py:143 >> {'loss': 0.2893, 'learning_rate': 1.5597e-06, 'epoch': 2.66, 'throughput': 2289.00}
[INFO|2025-04-06 01:01:51] logging.py:143 >> {'loss': 0.2847, 'learning_rate': 1.3525e-06, 'epoch': 2.68, 'throughput': 2290.49}
[INFO|2025-04-06 01:02:16] logging.py:143 >> {'loss': 0.2902, 'learning_rate': 1.1597e-06, 'epoch': 2.71, 'throughput': 2292.06}
[INFO|2025-04-06 01:02:41] logging.py:143 >> {'loss': 0.2808, 'learning_rate': 9.8134e-07, 'epoch': 2.73, 'throughput': 2293.36}
[INFO|2025-04-06 01:02:41] trainer.py:4289 >>
***** Running Evaluation *****
[INFO|2025-04-06 01:02:41] trainer.py:4291 >> Num examples = 3118
[INFO|2025-04-06 01:02:41] trainer.py:4294 >> Batch size = 16
[INFO|2025-04-06 01:03:15] trainer.py:3966 >> Saving model checkpoint to saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-600
[INFO|2025-04-06 01:03:16] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/config.json
[INFO|2025-04-06 01:03:22] tokenization_utils_base.py:2510 >> tokenizer config file saved in saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-600/tokenizer_config.json
[INFO|2025-04-06 01:03:22] tokenization_utils_base.py:2519 >> Special tokens file saved in saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-600/special_tokens_map.json
[INFO|2025-04-06 01:03:55] logging.py:143 >> {'loss': 0.2775, 'learning_rate': 8.1763e-07, 'epoch': 2.75, 'throughput': 2259.90}
[INFO|2025-04-06 01:04:20] logging.py:143 >> {'loss': 0.2874, 'learning_rate': 6.6862e-07, 'epoch': 2.78, 'throughput': 2261.55}
[INFO|2025-04-06 01:04:45] logging.py:143 >> {'loss': 0.2933, 'learning_rate': 5.3441e-07, 'epoch': 2.80, 'throughput': 2263.21}
[INFO|2025-04-06 01:05:10] logging.py:143 >> {'loss': 0.2807, 'learning_rate': 4.1508e-07, 'epoch': 2.82, 'throughput': 2264.75}
[INFO|2025-04-06 01:05:34] logging.py:143 >> {'loss': 0.2843, 'learning_rate': 3.1069e-07, 'epoch': 2.84, 'throughput': 2266.34}
[INFO|2025-04-06 01:05:59] logging.py:143 >> {'loss': 0.2717, 'learning_rate': 2.2132e-07, 'epoch': 2.87, 'throughput': 2267.87}
[INFO|2025-04-06 01:06:24] logging.py:143 >> {'loss': 0.2789, 'learning_rate': 1.4701e-07, 'epoch': 2.89, 'throughput': 2269.33}
[INFO|2025-04-06 01:06:48] logging.py:143 >> {'loss': 0.2806, 'learning_rate': 8.7816e-08, 'epoch': 2.91, 'throughput': 2270.81}
[INFO|2025-04-06 01:07:13] logging.py:143 >> {'loss': 0.2729, 'learning_rate': 4.3769e-08, 'epoch': 2.94, 'throughput': 2272.29}
[INFO|2025-04-06 01:07:38] logging.py:143 >> {'loss': 0.2868, 'learning_rate': 1.4896e-08, 'epoch': 2.96, 'throughput': 2273.81}
[INFO|2025-04-06 01:08:03] logging.py:143 >> {'loss': 0.2849, 'learning_rate': 1.2162e-09, 'epoch': 2.98, 'throughput': 2275.21}
[INFO|2025-04-06 01:08:13] trainer.py:3966 >> Saving model checkpoint to saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-657
[INFO|2025-04-06 01:08:13] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/config.json
[INFO|2025-04-06 01:08:18] tokenization_utils_base.py:2510 >> tokenizer config file saved in saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-657/tokenizer_config.json
[INFO|2025-04-06 01:08:18] tokenization_utils_base.py:2519 >> Special tokens file saved in saves/Custom/lora/train_2025-04-05-23-57-03/checkpoint-657/special_tokens_map.json
[INFO|2025-04-06 01:08:26] trainer.py:2665 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|2025-04-06 01:08:26] trainer.py:3966 >> Saving model checkpoint to saves/Custom/lora/train_2025-04-05-23-57-03
[INFO|2025-04-06 01:08:26] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--GreatCaptainNemo--ProLLaMA/snapshots/7c873bf1e1c53e5b9cbdf58e6b3420a6821569a7/config.json
[INFO|2025-04-06 01:08:36] tokenization_utils_base.py:2510 >> tokenizer config file saved in saves/Custom/lora/train_2025-04-05-23-57-03/tokenizer_config.json
[INFO|2025-04-06 01:08:36] tokenization_utils_base.py:2519 >> Special tokens file saved in saves/Custom/lora/train_2025-04-05-23-57-03/special_tokens_map.json
[WARNING|2025-04-06 01:08:36] logging.py:148 >> No metric eval_accuracy to plot.
[INFO|2025-04-06 01:08:36] trainer.py:4289 >>
***** Running Evaluation *****
[INFO|2025-04-06 01:08:36] trainer.py:4291 >> Num examples = 3118
[INFO|2025-04-06 01:08:36] trainer.py:4294 >> Batch size = 16
[INFO|2025-04-06 01:09:10] modelcard.py:449 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
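
The final adapter lands in the run directory saved at 01:08:26. A hedged sketch of loading it for inference; the resize call may be a no-op, since the log warns the vocab was resized but the config still reports 32,000 tokens:

    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    adapter_dir = "saves/Custom/lora/train_2025-04-05-23-57-03"  # final save dir from the log

    tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
    base = AutoModelForCausalLM.from_pretrained(
        "GreatCaptainNemo/ProLLaMA", torch_dtype=torch.float16
    )
    base.resize_token_embeddings(len(tokenizer))       # guard against a resized vocab
    model = PeftModel.from_pretrained(base, adapter_dir)
    model = model.merge_and_unload()                   # optional: bake LoRA into the base weights
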