Mistral-RDE-Finetuned / running_log.txt
[WARNING|2024-11-25 18:35:33] logging.py:162 >> We recommend enabling `upcast_layernorm` in quantized training.
[INFO|2024-11-25 18:35:33] parser.py:355 >> Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
[INFO|2024-11-25 18:35:33] configuration_utils.py:679 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/config.json
[INFO|2024-11-25 18:35:33] configuration_utils.py:746 >> Model config MistralConfig {
"_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"use_cache": true,
"vocab_size": 32768
}
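The config above fixes the attention geometry of the model. As a quick sanity check (pure arithmetic over the values printed above, not part of the original run), the head dimensions, the grouped-query-attention ratio, and the per-token KV-cache cost can be verified like this:

```python
# Sanity-check the attention geometry from the MistralConfig printed above.
hidden_size = 4096
num_attention_heads = 32
num_key_value_heads = 8
head_dim = 128
num_hidden_layers = 32

# Query heads times head_dim must reconstruct the hidden size.
assert num_attention_heads * head_dim == hidden_size

# Grouped-query attention: 4 query heads share each key/value head.
gqa_ratio = num_attention_heads // num_key_value_heads  # -> 4

# KV-cache cost per token in fp16: K and V, per layer, per kv head, 2 bytes each.
kv_cache_bytes_per_token = 2 * num_hidden_layers * num_key_value_heads * head_dim * 2
print(gqa_ratio, kv_cache_bytes_per_token // 1024)  # 4, 128 (KiB per token)
```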
[INFO|2024-11-25 18:35:35] tokenization_utils_base.py:2211 >> loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/tokenizer.model
[INFO|2024-11-25 18:35:35] tokenization_utils_base.py:2211 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/tokenizer.json
[INFO|2024-11-25 18:35:35] tokenization_utils_base.py:2211 >> loading file added_tokens.json from cache at None
[INFO|2024-11-25 18:35:35] tokenization_utils_base.py:2211 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/special_tokens_map.json
[INFO|2024-11-25 18:35:35] tokenization_utils_base.py:2211 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/tokenizer_config.json
[INFO|2024-11-25 18:35:36] configuration_utils.py:679 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/config.json
[INFO|2024-11-25 18:35:36] tokenization_utils_base.py:2211 >> loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/tokenizer.model
[INFO|2024-11-25 18:35:36] tokenization_utils_base.py:2211 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/tokenizer.json
[INFO|2024-11-25 18:35:36] tokenization_utils_base.py:2211 >> loading file added_tokens.json from cache at None
[INFO|2024-11-25 18:35:36] tokenization_utils_base.py:2211 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/special_tokens_map.json
[INFO|2024-11-25 18:35:36] tokenization_utils_base.py:2211 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/tokenizer_config.json
[INFO|2024-11-25 18:35:36] logging.py:157 >> Add pad token: </s>
[INFO|2024-11-25 18:35:36] logging.py:157 >> Loading dataset treino_pt_rde.json...
[INFO|2024-11-25 18:35:50] configuration_utils.py:679 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/config.json
[INFO|2024-11-25 18:35:50] logging.py:157 >> Quantizing model to 4 bit with bitsandbytes.
[INFO|2024-11-25 18:35:55] configuration_utils.py:679 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--mistral-7b-instruct-v0.3-bnb-4bit/snapshots/d5f623888f1415cf89b5c208d09cb620694618ee/config.json
[INFO|2024-11-25 18:35:55] configuration_utils.py:746 >> Model config MistralConfig {
"_name_or_path": "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pad_token_id": 770,
"quantization_config": {
"_load_in_4bit": true,
"_load_in_8bit": false,
"bnb_4bit_compute_dtype": "bfloat16",
"bnb_4bit_quant_storage": "uint8",
"bnb_4bit_quant_type": "nf4",
"bnb_4bit_use_double_quant": true,
"llm_int8_enable_fp32_cpu_offload": false,
"llm_int8_has_fp16_weight": false,
"llm_int8_skip_modules": null,
"llm_int8_threshold": 6.0,
"load_in_4bit": true,
"load_in_8bit": false,
"quant_method": "bitsandbytes"
},
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"unsloth_version": "2024.9",
"use_cache": true,
"vocab_size": 32768
}
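The `quantization_config` above stores the weights as nf4 with 4 bits per parameter. A rough weight-memory estimate follows directly (a back-of-envelope sketch: the base parameter count is taken from the trainable-params line later in this log, 7,268,995,072 total minus 20,971,520 LoRA params, and nf4 quantization constants / double-quant overhead are ignored):

```python
# Rough weight-memory estimate for the nf4 4-bit quantization configured above.
# Base parameter count = total minus LoRA params, both reported later in this log;
# quantization constants and double-quant overhead are ignored.
base_params = 7_268_995_072 - 20_971_520   # 7,248,023,552
bytes_4bit = base_params * 4 // 8          # 4 bits per weight
gib = bytes_4bit / 2**30
print(f"{gib:.2f} GiB")                    # ~3.38 GiB of weights, hence the 16 GiB GPU fits
```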
[INFO|2024-11-25 18:35:56] configuration_utils.py:679 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unslothai--colab/snapshots/20f9daee9da18936efa03ad4e1361884c60cca0c/config.json
[INFO|2024-11-25 18:35:57] configuration_utils.py:679 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unslothai--repeat/snapshots/7c48478c02f84ed89f149b0815cc0216ee831fb0/config.json
[INFO|2024-11-25 18:35:57] configuration_utils.py:679 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unslothai--vram-16/snapshots/9703344699da71a2bb9f17e575eb918c8f6cb349/config.json
[INFO|2024-11-25 18:35:58] configuration_utils.py:679 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unslothai--1/snapshots/7ec782b7604cd9ea0781c23a4270f031650f5617/config.json
[INFO|2024-11-25 18:35:58] configuration_utils.py:679 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--mistral-7b-instruct-v0.3-bnb-4bit/snapshots/d5f623888f1415cf89b5c208d09cb620694618ee/config.json
[INFO|2024-11-25 18:35:58] configuration_utils.py:679 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--mistral-7b-instruct-v0.3-bnb-4bit/snapshots/d5f623888f1415cf89b5c208d09cb620694618ee/config.json
[INFO|2024-11-25 18:35:58] configuration_utils.py:746 >> Model config MistralConfig {
"_name_or_path": "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pad_token_id": 770,
"quantization_config": {
"_load_in_4bit": true,
"_load_in_8bit": false,
"bnb_4bit_compute_dtype": "bfloat16",
"bnb_4bit_quant_storage": "uint8",
"bnb_4bit_quant_type": "nf4",
"bnb_4bit_use_double_quant": true,
"llm_int8_enable_fp32_cpu_offload": false,
"llm_int8_has_fp16_weight": false,
"llm_int8_skip_modules": null,
"llm_int8_threshold": 6.0,
"load_in_4bit": true,
"load_in_8bit": false,
"quant_method": "bitsandbytes"
},
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.46.1",
"unsloth_version": "2024.9",
"use_cache": true,
"vocab_size": 32768
}
[INFO|2024-11-25 18:37:37] modeling_utils.py:3937 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--unsloth--mistral-7b-instruct-v0.3-bnb-4bit/snapshots/d5f623888f1415cf89b5c208d09cb620694618ee/model.safetensors
[INFO|2024-11-25 18:37:37] modeling_utils.py:1670 >> Instantiating MistralForCausalLM model under default dtype torch.float16.
[INFO|2024-11-25 18:37:37] configuration_utils.py:1096 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 770
}
[INFO|2024-11-25 18:38:02] modeling_utils.py:4800 >> All model checkpoint weights were used when initializing MistralForCausalLM.
[INFO|2024-11-25 18:38:02] modeling_utils.py:4808 >> All the weights of MistralForCausalLM were initialized from the model checkpoint at unsloth/mistral-7b-instruct-v0.3-bnb-4bit.
If your task is similar to the task the model of the checkpoint was trained on, you can already use MistralForCausalLM for predictions without further training.
[INFO|2024-11-25 18:38:02] configuration_utils.py:1051 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--mistral-7b-instruct-v0.3-bnb-4bit/snapshots/d5f623888f1415cf89b5c208d09cb620694618ee/generation_config.json
[INFO|2024-11-25 18:38:02] configuration_utils.py:1096 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 32768,
"pad_token_id": 770
}
[INFO|2024-11-25 18:38:06] logging.py:157 >> Gradient checkpointing enabled.
[INFO|2024-11-25 18:38:06] logging.py:157 >> Upcasting trainable params to float32.
[INFO|2024-11-25 18:38:06] logging.py:157 >> Fine-tuning method: LoRA
[INFO|2024-11-25 18:38:06] logging.py:157 >> Found linear modules: gate_proj,up_proj,v_proj,down_proj,o_proj,q_proj,k_proj
[WARNING|2024-11-25 18:38:10] logging.py:168 >> Unsloth 2024.11.9 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
[INFO|2024-11-25 18:38:12] logging.py:157 >> trainable params: 20,971,520 || all params: 7,268,995,072 || trainable%: 0.2885
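The reported 20,971,520 trainable parameters are consistent with LoRA rank r = 8 applied to the seven target modules listed above. The rank itself is not printed in this log, so r = 8 is an inference; the reconstruction below reproduces the count exactly from the layer shapes in the model config:

```python
# Reconstruct the trainable-parameter count, assuming LoRA rank r = 8
# (the rank is not printed in this log; r = 8 is what matches the total).
r = 8
hidden, inter, kv = 4096, 14336, 1024   # kv = num_key_value_heads * head_dim = 8 * 128
shapes = {                              # (in_features, out_features) per target module
    "q_proj": (hidden, hidden), "k_proj": (hidden, kv), "v_proj": (hidden, kv),
    "o_proj": (hidden, hidden), "gate_proj": (hidden, inter),
    "up_proj": (hidden, inter), "down_proj": (inter, hidden),
}
# Each LoRA adapter adds A (r x in) plus B (out x r) parameters.
per_layer = sum(r * (i + o) for i, o in shapes.values())
total = per_layer * 32                  # 32 patched layers
print(total)                                   # 20971520
print(round(100 * total / 7_268_995_072, 4))   # 0.2885, matching the log line above
```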
[INFO|2024-11-25 18:38:12] trainer.py:698 >> Using auto half precision backend
[WARNING|2024-11-25 18:38:13] <string>:208 >> ==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1
\\ /| Num examples = 3,716 | Num Epochs = 3
O^O/ \_/ \ Batch size per device = 8 | Gradient Accumulation steps = 4
\ / Total batch size = 32 | Total steps = 348
"-____-" Number of trainable parameters = 20,971,520
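The step arithmetic in the Unsloth banner above checks out: 348 total steps follow from 3,716 examples, an effective batch of 32, and 3 epochs, provided the trailing partial batch of each epoch is not counted as a full optimizer step (which matches the reported number):

```python
# Reproduce the step arithmetic from the Unsloth banner above.
num_examples, epochs = 3716, 3
per_device_batch, grad_accum, num_gpus = 8, 4, 1
total_batch = per_device_batch * grad_accum * num_gpus   # 32
steps_per_epoch = num_examples // total_batch            # 116 (trailing partial batch dropped)
total_steps = steps_per_epoch * epochs
print(total_batch, total_steps)                          # 32 348
```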
[INFO|2024-11-25 18:44:44] logging.py:157 >> {'loss': 0.1805, 'learning_rate': 2.9939e-05, 'epoch': 0.09}
[INFO|2024-11-25 18:51:07] logging.py:157 >> {'loss': 0.0827, 'learning_rate': 2.9756e-05, 'epoch': 0.17}
[INFO|2024-11-25 18:57:27] logging.py:157 >> {'loss': 0.0722, 'learning_rate': 2.9453e-05, 'epoch': 0.26}
[INFO|2024-11-25 19:03:51] logging.py:157 >> {'loss': 0.0637, 'learning_rate': 2.9033e-05, 'epoch': 0.34}
[INFO|2024-11-25 19:10:09] logging.py:157 >> {'loss': 0.0646, 'learning_rate': 2.8498e-05, 'epoch': 0.43}
[INFO|2024-11-25 19:16:32] logging.py:157 >> {'loss': 0.0619, 'learning_rate': 2.7853e-05, 'epoch': 0.52}
[INFO|2024-11-25 19:22:55] logging.py:157 >> {'loss': 0.0588, 'learning_rate': 2.7103e-05, 'epoch': 0.60}
[INFO|2024-11-25 19:29:16] logging.py:157 >> {'loss': 0.0594, 'learning_rate': 2.6255e-05, 'epoch': 0.69}
[INFO|2024-11-25 19:35:47] logging.py:157 >> {'loss': 0.0590, 'learning_rate': 2.5315e-05, 'epoch': 0.77}
[INFO|2024-11-25 19:42:12] logging.py:157 >> {'loss': 0.0557, 'learning_rate': 2.4292e-05, 'epoch': 0.86}
[INFO|2024-11-25 19:48:30] logging.py:157 >> {'loss': 0.0493, 'learning_rate': 2.3192e-05, 'epoch': 0.95}
[INFO|2024-11-25 19:54:47] logging.py:157 >> {'loss': 0.0509, 'learning_rate': 2.2026e-05, 'epoch': 1.03}
[INFO|2024-11-25 20:01:14] logging.py:157 >> {'loss': 0.0419, 'learning_rate': 2.0803e-05, 'epoch': 1.12}
[INFO|2024-11-25 20:07:39] logging.py:157 >> {'loss': 0.0420, 'learning_rate': 1.9532e-05, 'epoch': 1.20}
[INFO|2024-11-25 20:13:57] logging.py:157 >> {'loss': 0.0432, 'learning_rate': 1.8225e-05, 'epoch': 1.29}
[INFO|2024-11-25 20:20:19] logging.py:157 >> {'loss': 0.0475, 'learning_rate': 1.6891e-05, 'epoch': 1.38}
[INFO|2024-11-25 20:26:41] logging.py:157 >> {'loss': 0.0448, 'learning_rate': 1.5542e-05, 'epoch': 1.46}
[INFO|2024-11-25 20:33:05] logging.py:157 >> {'loss': 0.0407, 'learning_rate': 1.4188e-05, 'epoch': 1.55}
[INFO|2024-11-25 20:39:22] logging.py:157 >> {'loss': 0.0421, 'learning_rate': 1.2841e-05, 'epoch': 1.63}
[INFO|2024-11-25 20:45:43] logging.py:157 >> {'loss': 0.0401, 'learning_rate': 1.1511e-05, 'epoch': 1.72}
[INFO|2024-11-25 20:52:07] logging.py:157 >> {'loss': 0.0438, 'learning_rate': 1.0210e-05, 'epoch': 1.81}
[INFO|2024-11-25 20:58:30] logging.py:157 >> {'loss': 0.0444, 'learning_rate': 8.9485e-06, 'epoch': 1.89}
[INFO|2024-11-25 21:04:51] logging.py:157 >> {'loss': 0.0410, 'learning_rate': 7.7358e-06, 'epoch': 1.98}
[INFO|2024-11-25 21:11:09] logging.py:157 >> {'loss': 0.0317, 'learning_rate': 6.5822e-06, 'epoch': 2.06}
[INFO|2024-11-25 21:17:32] logging.py:157 >> {'loss': 0.0357, 'learning_rate': 5.4972e-06, 'epoch': 2.15}
[INFO|2024-11-25 21:23:59] logging.py:157 >> {'loss': 0.0386, 'learning_rate': 4.4896e-06, 'epoch': 2.24}
[INFO|2024-11-25 21:30:17] logging.py:157 >> {'loss': 0.0289, 'learning_rate': 3.5676e-06, 'epoch': 2.32}
[INFO|2024-11-25 21:36:49] logging.py:157 >> {'loss': 0.0307, 'learning_rate': 2.7387e-06, 'epoch': 2.41}
[INFO|2024-11-25 21:43:08] logging.py:157 >> {'loss': 0.0338, 'learning_rate': 2.0096e-06, 'epoch': 2.49}
[INFO|2024-11-25 21:49:30] logging.py:157 >> {'loss': 0.0279, 'learning_rate': 1.3864e-06, 'epoch': 2.58}
[INFO|2024-11-25 21:55:50] logging.py:157 >> {'loss': 0.0321, 'learning_rate': 8.7399e-07, 'epoch': 2.67}
[INFO|2024-11-25 22:02:10] logging.py:157 >> {'loss': 0.0352, 'learning_rate': 4.7666e-07, 'epoch': 2.75}
[INFO|2024-11-25 22:08:34] logging.py:157 >> {'loss': 0.0341, 'learning_rate': 1.9760e-07, 'epoch': 2.84}
[INFO|2024-11-25 22:14:53] logging.py:157 >> {'loss': 0.0315, 'learning_rate': 3.9102e-08, 'epoch': 2.92}
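The loss entries above can be extracted programmatically, and the learning-rate column is consistent with cosine decay from a 3e-5 peak over the 348 total steps, logged every 10 steps. This is a reconstruction (the scheduler settings are not printed in this log), but it reproduces the first logged learning rate:

```python
import math
import re

# Extract (loss, lr, epoch) from log lines shaped like the entries above.
line = ("[INFO|2024-11-25 18:44:44] logging.py:157 >> "
        "{'loss': 0.1805, 'learning_rate': 2.9939e-05, 'epoch': 0.09}")
pattern = re.compile(r"'loss': ([\d.]+), 'learning_rate': ([\d.e-]+), 'epoch': ([\d.]+)")
loss, lr, epoch = map(float, pattern.search(line).groups())

# The lr column matches cosine decay from a 3e-5 peak over 348 steps
# (an inference; the scheduler type is not recorded in this log).
def cosine_lr(step, peak=3e-5, total=348):
    return peak * 0.5 * (1 + math.cos(math.pi * step / total))

print(f"{cosine_lr(10):.4e}")  # 2.9939e-05, matching the first logged entry
```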
[INFO|2024-11-25 22:19:59] trainer.py:3801 >> Saving model checkpoint to saves/Mistral-7B-Instruct-v0.3/lora/mistral-finetuned/checkpoint-348
[INFO|2024-11-25 22:19:59] configuration_utils.py:679 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--mistral-7b-instruct-v0.3-bnb-4bit/snapshots/d5f623888f1415cf89b5c208d09cb620694618ee/config.json
[INFO|2024-11-25 22:19:59] configuration_utils.py:746 >> Model config MistralConfig {
"_name_or_path": "unsloth/Mistral-7B-Instruct-v0.3",
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pad_token_id": 770,
"quantization_config": {
"_load_in_4bit": true,
"_load_in_8bit": false,
"bnb_4bit_compute_dtype": "bfloat16",
"bnb_4bit_quant_storage": "uint8",
"bnb_4bit_quant_type": "nf4",
"bnb_4bit_use_double_quant": true,
"llm_int8_enable_fp32_cpu_offload": false,
"llm_int8_has_fp16_weight": false,
"llm_int8_skip_modules": null,
"llm_int8_threshold": 6.0,
"load_in_4bit": true,
"load_in_8bit": false,
"quant_method": "bitsandbytes"
},
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"unsloth_version": "2024.9",
"use_cache": true,
"vocab_size": 32768
}
[INFO|2024-11-25 22:20:00] <string>:484 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|2024-11-25 22:20:00] trainer.py:3801 >> Saving model checkpoint to saves/Mistral-7B-Instruct-v0.3/lora/mistral-finetuned
[INFO|2024-11-25 22:20:00] configuration_utils.py:679 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--mistral-7b-instruct-v0.3-bnb-4bit/snapshots/d5f623888f1415cf89b5c208d09cb620694618ee/config.json
[WARNING|2024-11-25 22:20:05] logging.py:162 >> No metric eval_loss to plot.
[WARNING|2024-11-25 22:20:05] logging.py:162 >> No metric eval_accuracy to plot.
[INFO|2024-11-25 22:20:05] modelcard.py:449 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}