/home/hangyu5/anaconda3/envs/llama_factory/lib/python3.11/site-packages/trl/trainer/ppo_config.py:141: UserWarning: The `optimize_cuda_cache` arguement will be deprecated soon, please use `optimize_device_cache` instead.
  warnings.warn(
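This warning comes from TRL itself, not from the export script: the `optimize_cuda_cache` field of `PPOConfig` is being renamed to `optimize_device_cache`. A minimal sketch of how to avoid the warning, assuming a TRL release that already accepts the new name (the `model_name` value below is just the checkpoint path from this log):

```python
# Minimal sketch, assuming a TRL version that exposes `optimize_device_cache`.
from trl import PPOConfig

ppo_config = PPOConfig(
    model_name="./models/phi-2-sft-alpaca_gpt4_en-ep1",  # path taken from the log above
    optimize_device_cache=True,  # new name; passing `optimize_cuda_cache` triggers the warning
)
```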
[INFO|tokenization_utils_base.py:2024] 2023-12-22 17:17:06,207 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2024] 2023-12-22 17:17:06,207 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2024] 2023-12-22 17:17:06,207 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2024] 2023-12-22 17:17:06,207 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2024] 2023-12-22 17:17:06,207 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2024] 2023-12-22 17:17:06,207 >> loading file tokenizer.json
[WARNING|logging.py:314] 2023-12-22 17:17:06,301 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:737] 2023-12-22 17:17:06,302 >> loading configuration file ./models/phi-2-sft-alpaca_gpt4_en-ep1/config.json
[INFO|configuration_utils.py:737] 2023-12-22 17:17:06,314 >> loading configuration file ./models/phi-2-sft-alpaca_gpt4_en-ep1/config.json
[INFO|configuration_utils.py:802] 2023-12-22 17:17:06,315 >> Model config PhiConfig {
  "_name_or_path": "./models/phi-2-sft-alpaca_gpt4_en-ep1",
  "activation_function": "gelu_new",
  "architectures": [
    "PhiForCausalLM"
  ],
  "attn_pdrop": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_phi.PhiConfig",
    "AutoModel": "modeling_phi.PhiForCausalLM",
    "AutoModelForCausalLM": "modeling_phi.PhiForCausalLM"
  },
  "embd_pdrop": 0.0,
  "flash_attn": false,
  "flash_rotary": false,
  "fused_dense": false,
  "img_processor": null,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "phi-msft",
  "n_embd": 2560,
  "n_head": 32,
  "n_head_kv": null,
  "n_inner": null,
  "n_layer": 32,
  "n_positions": 2048,
  "resid_pdrop": 0.1,
  "rotary_dim": 32,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.36.2",
  "use_cache": true,
  "vocab_size": 51200
}
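The config above declares the custom model type `phi-msft` and an `auto_map` pointing at `configuration_phi.py` / `modeling_phi.py` bundled with the checkpoint, so loading it goes through remote code. A minimal sketch of loading this SFT checkpoint directly, assuming only what the config itself states (custom code, fp16 weights):

```python
# Minimal sketch, assuming the checkpoint directory shown in the log above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./models/phi-2-sft-alpaca_gpt4_en-ep1"

# trust_remote_code=True is needed because auto_map points at the custom phi-msft code.
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,  # matches "torch_dtype": "float16" in the config
    trust_remote_code=True,
)
```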
[INFO|modeling_utils.py:3341] 2023-12-22 17:17:06,553 >> loading weights file ./models/phi-2-sft-alpaca_gpt4_en-ep1/model.safetensors.index.json
[INFO|modeling_utils.py:1341] 2023-12-22 17:17:06,560 >> Instantiating PhiForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:826] 2023-12-22 17:17:06,561 >> Generate config GenerationConfig {}
[INFO|configuration_utils.py:826] 2023-12-22 17:17:06,562 >> Generate config GenerationConfig {}
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 5.49it/s]
[INFO|modeling_utils.py:4185] 2023-12-22 17:17:07,056 >> All model checkpoint weights were used when initializing PhiForCausalLM.
[INFO|modeling_utils.py:4193] 2023-12-22 17:17:07,056 >> All the weights of PhiForCausalLM were initialized from the model checkpoint at ./models/phi-2-sft-alpaca_gpt4_en-ep1.
If your task is similar to the task the model of the checkpoint was trained on, you can already use PhiForCausalLM for predictions without further training.
[INFO|configuration_utils.py:779] 2023-12-22 17:17:07,059 >> loading configuration file ./models/phi-2-sft-alpaca_gpt4_en-ep1/generation_config.json
[INFO|configuration_utils.py:826] 2023-12-22 17:17:07,059 >> Generate config GenerationConfig {}
12/22/2023 17:17:07 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
12/22/2023 17:17:08 - INFO - llmtuner.model.adapter - Merged 1 adapter(s).
12/22/2023 17:17:08 - INFO - llmtuner.model.adapter - Loaded adapter(s): ./models/dpo/phi-2-sft-alpaca_gpt4_en-ep1-dpo-comparison_gpt4_en-ep1-lora
12/22/2023 17:17:08 - INFO - llmtuner.model.loader - trainable params: 0 || all params: 2779683840 || trainable%: 0.0000
12/22/2023 17:17:08 - INFO - llmtuner.model.loader - This IS expected that the trainable params is 0 if you are using model for inference only.
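At this point llmtuner has merged the DPO LoRA adapter into the SFT base model, which is why no trainable parameters remain. A minimal sketch of what that merge amounts to, assuming PEFT is doing the work under the hood (llmtuner wraps it); the paths are the ones from this log:

```python
# Minimal sketch of a LoRA merge with PEFT; llmtuner performs the equivalent step internally.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "./models/phi-2-sft-alpaca_gpt4_en-ep1", trust_remote_code=True
)
adapter_path = "./models/dpo/phi-2-sft-alpaca_gpt4_en-ep1-dpo-comparison_gpt4_en-ep1-lora"

# Attach the DPO LoRA adapter, then fold its weights into the base model.
merged = PeftModel.from_pretrained(base, adapter_path).merge_and_unload()
# After merging, no LoRA weights remain trainable, which matches the
# "trainable params: 0" line reported above.
```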
[INFO|configuration_utils.py:483] 2023-12-22 17:17:08,317 >> Configuration saved in ./models/export/phi-2-sft-alpaca_gpt4_en-ep1-dpo-comparison_gpt4_en-ep1/config.json
[INFO|configuration_utils.py:594] 2023-12-22 17:17:08,317 >> Configuration saved in ./models/export/phi-2-sft-alpaca_gpt4_en-ep1-dpo-comparison_gpt4_en-ep1/generation_config.json
[INFO|modeling_utils.py:2390] 2023-12-22 17:17:15,004 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./models/export/phi-2-sft-alpaca_gpt4_en-ep1-dpo-comparison_gpt4_en-ep1/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2432] 2023-12-22 17:17:15,005 >> tokenizer config file saved in ./models/export/phi-2-sft-alpaca_gpt4_en-ep1-dpo-comparison_gpt4_en-ep1/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2023-12-22 17:17:15,006 >> Special tokens file saved in ./models/export/phi-2-sft-alpaca_gpt4_en-ep1-dpo-comparison_gpt4_en-ep1/special_tokens_map.json
[INFO|tokenization_utils_base.py:2492] 2023-12-22 17:17:15,006 >> added tokens file saved in ./models/export/phi-2-sft-alpaca_gpt4_en-ep1-dpo-comparison_gpt4_en-ep1/added_tokens.json
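The export directory now contains the merged fp16 weights (split into two safetensors shards) plus the tokenizer files, so it can be used as a standalone checkpoint. A minimal sketch of loading it for inference, assuming only the export path shown in the log; the prompt text is a hypothetical example and `device_map="auto"` assumes `accelerate` is installed:

```python
# Minimal sketch: load the exported, merged checkpoint and run a quick generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

export_dir = "./models/export/phi-2-sft-alpaca_gpt4_en-ep1-dpo-comparison_gpt4_en-ep1"

tokenizer = AutoTokenizer.from_pretrained(export_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    export_dir,
    torch_dtype=torch.float16,
    device_map="auto",        # assumption: accelerate is available for device placement
    trust_remote_code=True,   # the exported config still uses the custom phi-msft code
)

prompt = "Instruction: Explain what DPO fine-tuning does.\nResponse:"  # hypothetical prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```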