Upload run.log with huggingface_hub

	WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
	Repo card metadata block was not found. Setting CardData to empty.
	WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.

	=== Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=A -> /app/out/X/X_A
	{'loss': 4.3202, 'grad_norm': 12.326040267944336, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
	{'loss': 3.1414, 'grad_norm': 5.1463093757629395, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
	{'loss': 1.0993, 'grad_norm': 1.9524067640304565, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
	{'loss': 0.7319, 'grad_norm': 1.0913549661636353, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
	{'loss': 0.6953, 'grad_norm': 0.8020966053009033, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
	{'loss': 0.7629, 'grad_norm': 0.8037011027336121, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
	{'loss': 0.6505, 'grad_norm': 0.7035626173019409, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
	{'loss': 0.7464, 'grad_norm': 1.008595585823059, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
	{'train_runtime': 18.6371, 'train_samples_per_second': 80.485, 'train_steps_per_second': 10.087, 'train_loss': 1.0916077913121973, 'epoch': 1.0}
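
The learning rates logged above are consistent with a linear warmup followed by a cosine decay to zero. The sketch below is an inference from the logged numbers, not the training script itself. It assumes a peak rate of 2e-4, 10 warmup steps, and 188 optimizer steps per epoch (the first logged epoch value, 0.005319..., equals 1/188). The function name and the three constants are illustrative.

```python
import math

PEAK_LR = 2e-4      # assumption: inferred from the logged values
WARMUP_STEPS = 10   # assumption: step 1 logs 2e-05, i.e. PEAK_LR / 10
TOTAL_STEPS = 188   # assumption: first logged epoch value is 1 / 188


def cosine_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rates copied from the log, keyed by step = round(epoch * 188).
logged = {
    1: 2e-05,
    25: 0.00019651603150596495,
    50: 0.0001760978787760968,
    75: 0.00014110317293976218,
    100: 9.823515193568715e-05,
    125: 5.570518768099918e-05,
    150: 2.1659897302814747e-05,
    175: 2.6206581536199594e-06,
}

for step, lr in logged.items():
    print(f"step {step:3d}  reconstructed={cosine_lr(step):.6e}  logged={lr:.6e}")
```

Every run in this log reports the same learning-rate sequence, so the same reconstruction applies to all eight adapters.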

	=== Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=B -> /app/out/X/X_B
	{'loss': 4.1486, 'grad_norm': 7.985095024108887, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
	{'loss': 3.2678, 'grad_norm': 3.0237557888031006, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
	{'loss': 1.843, 'grad_norm': 1.2927166223526, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
	{'loss': 1.4819, 'grad_norm': 1.0656538009643555, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
	{'loss': 1.4765, 'grad_norm': 1.1160749197006226, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
	{'loss': 1.4363, 'grad_norm': 0.8927919268608093, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
	{'loss': 1.4441, 'grad_norm': 1.162216305732727, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
	{'loss': 1.4125, 'grad_norm': 0.9546242356300354, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
	{'train_runtime': 24.5221, 'train_samples_per_second': 61.169, 'train_steps_per_second': 7.667, 'train_loss': 1.7482892832857497, 'epoch': 1.0}

	=== Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=C -> /app/out/X/X_C
	{'loss': 4.6726, 'grad_norm': 9.03194522857666, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
	{'loss': 3.5296, 'grad_norm': 4.634464263916016, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
	{'loss': 1.672, 'grad_norm': 1.788339376449585, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
	{'loss': 1.3986, 'grad_norm': 1.5529241561889648, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
	{'loss': 1.3295, 'grad_norm': 1.0464192628860474, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
	{'loss': 1.2901, 'grad_norm': 0.9648394584655762, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
	{'loss': 1.3069, 'grad_norm': 1.234605073928833, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
	{'loss': 1.3079, 'grad_norm': 0.9420123100280762, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
	{'train_runtime': 21.1671, 'train_samples_per_second': 70.865, 'train_steps_per_second': 8.882, 'train_loss': 1.667192169960509, 'epoch': 1.0}

	=== Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=D -> /app/out/X/X_D
	{'loss': 3.7401, 'grad_norm': 7.278934478759766, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
	{'loss': 2.8468, 'grad_norm': 3.4680252075195312, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
	{'loss': 1.1733, 'grad_norm': 1.6588348150253296, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
	{'loss': 0.865, 'grad_norm': 0.7071417570114136, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
	{'loss': 0.8709, 'grad_norm': 0.5555262565612793, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
	{'loss': 0.8206, 'grad_norm': 0.6325730681419373, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
	{'loss': 0.8126, 'grad_norm': 0.7635664343833923, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
	{'loss': 0.8417, 'grad_norm': 0.5870062708854675, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
	{'train_runtime': 21.0825, 'train_samples_per_second': 71.149, 'train_steps_per_second': 8.917, 'train_loss': 1.1592141364483124, 'epoch': 1.0}

	=== Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=A -> /app/out/Y/Y_A
	{'loss': 4.7555, 'grad_norm': 5.822748184204102, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
	{'loss': 3.2489, 'grad_norm': 3.5014379024505615, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
	{'loss': 1.0954, 'grad_norm': 2.0311391353607178, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
	{'loss': 0.8942, 'grad_norm': 0.8921640515327454, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
	{'loss': 0.8581, 'grad_norm': 0.8614839911460876, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
	{'loss': 0.9191, 'grad_norm': 0.7092980742454529, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
	{'loss': 0.818, 'grad_norm': 0.8512246608734131, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
	{'loss': 0.9047, 'grad_norm': 0.9553236961364746, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
	{'train_runtime': 22.3723, 'train_samples_per_second': 67.047, 'train_steps_per_second': 8.403, 'train_loss': 1.2261660606303113, 'epoch': 1.0}

	=== Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=B -> /app/out/Y/Y_B
	{'loss': 4.425, 'grad_norm': 3.522216320037842, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
	{'loss': 3.3123, 'grad_norm': 3.414928436279297, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
	{'loss': 1.6821, 'grad_norm': 1.4625403881072998, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
	{'loss': 1.5242, 'grad_norm': 1.119425892829895, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
	{'loss': 1.5277, 'grad_norm': 1.1711347103118896, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
	{'loss': 1.4649, 'grad_norm': 1.1075676679611206, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
	{'loss': 1.4895, 'grad_norm': 1.4818918704986572, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
	{'loss': 1.4579, 'grad_norm': 1.0621212720870972, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
	{'train_runtime': 34.5565, 'train_samples_per_second': 43.407, 'train_steps_per_second': 5.44, 'train_loss': 1.7648479532688222, 'epoch': 1.0}

	=== Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=C -> /app/out/Y/Y_C
	{'loss': 4.9627, 'grad_norm': 4.488640308380127, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
	{'loss': 3.5387, 'grad_norm': 4.163662433624268, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
	{'loss': 1.5488, 'grad_norm': 1.7466603517532349, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
	{'loss': 1.4444, 'grad_norm': 1.3302953243255615, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
	{'loss': 1.3853, 'grad_norm': 1.098808765411377, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
	{'loss': 1.3533, 'grad_norm': 1.0179518461227417, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
	{'loss': 1.3764, 'grad_norm': 1.439223051071167, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
	{'loss': 1.3666, 'grad_norm': 1.1373263597488403, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
	{'train_runtime': 27.2509, 'train_samples_per_second': 55.044, 'train_steps_per_second': 6.899, 'train_loss': 1.6965796490933032, 'epoch': 1.0}

	WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
	/usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
	warnings.warn(
	/usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:595: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
	warnings.warn(
	/usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.7` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
	warnings.warn(
	/usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:595: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.8` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
	warnings.warn(
	/usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:612: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `20` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
	warnings.warn(

	=== Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=D -> /app/out/Y/Y_D
	{'loss': 4.1206, 'grad_norm': 4.062298774719238, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
	{'loss': 2.9718, 'grad_norm': 2.6493866443634033, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
	{'loss': 1.0817, 'grad_norm': 0.8573890328407288, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
	{'loss': 0.9537, 'grad_norm': 0.6526174545288086, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
	{'loss': 0.9645, 'grad_norm': 0.6483463048934937, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
	{'loss': 0.9246, 'grad_norm': 0.6785702109336853, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
	{'loss': 0.916, 'grad_norm': 0.6483059525489807, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
	{'loss': 0.9405, 'grad_norm': 0.6321303844451904, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
	{'train_runtime': 28.8229, 'train_samples_per_second': 52.042, 'train_steps_per_second': 6.523, 'train_loss': 1.236437439918518, 'epoch': 1.0}
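
The summary lines for the eight runs can be cross-checked with simple arithmetic: runtime times samples per second gives the number of training samples, and runtime times steps per second gives the number of optimizer steps. The sketch below copies the three throughput fields from each summary line in this log. The dictionary layout and variable names are illustrative, and the "samples per step" figure is only an estimate of the effective batch size, which the log does not state.

```python
# (train_runtime, train_samples_per_second, train_steps_per_second) per adapter,
# copied from the summary lines in this log.
runs = {
    "X_A": (18.6371, 80.485, 10.087),
    "X_B": (24.5221, 61.169, 7.667),
    "X_C": (21.1671, 70.865, 8.882),
    "X_D": (21.0825, 71.149, 8.917),
    "Y_A": (22.3723, 67.047, 8.403),
    "Y_B": (34.5565, 43.407, 5.44),
    "Y_C": (27.2509, 55.044, 6.899),
    "Y_D": (28.8229, 52.042, 6.523),
}

derived = {}
for name, (runtime, samples_per_s, steps_per_s) in runs.items():
    samples = round(runtime * samples_per_s)  # training samples seen in one epoch
    steps = round(runtime * steps_per_s)      # optimizer steps in one epoch
    derived[name] = (samples, steps)
    print(f"{name}: ~{samples} samples, ~{steps} steps, "
          f"~{samples / steps:.2f} samples per step")
```

The step count recovered this way agrees with the first logged epoch value in each run, 0.005319..., which is 1/188.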

	=== Building cross-model mapping ===
	X adapter dim: 540672 Y adapter dim: 851968
	Anchor weights alpha (A,B,C): [-0.4286724925041199, -0.009492842480540276, 0.11948390305042267]
	cos(Y_hat_D, Y_D) = 0.9513461589813232
	cos(Y_mean_ABC, Y_D) = 0.9567482471466064
	cos(Y_A, Y_D) = 0.9471098780632019
	cos(Y_B, Y_D) = 0.9269149303436279
	cos(Y_C, Y_D) = 0.941544771194458
	Saved /app/out/Y/Y_pred_D
	Saved /app/out/Y/Y_mean_ABC
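
The cos(...) lines above compare adapters as vectors. A minimal sketch of that comparison follows, assuming each adapter is flattened into one vector (the "adapter dim" line suggests this, but the log does not show the code). The helper names and the toy three-dimensional vectors are placeholders, not values from this run. How Y_hat_D is built from the alpha weights is not shown in the log, so only the cosine and the mean-of-anchors baseline are sketched.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of several equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Toy stand-ins for flattened adapters (hypothetical values).
y_a = [1.0, 0.0, 1.0]
y_b = [0.0, 1.0, 1.0]
y_c = [1.0, 1.0, 0.0]
y_d = [1.0, 1.0, 1.0]

y_mean_abc = mean_vector([y_a, y_b, y_c])
print("cos(Y_mean_ABC, Y_D) =", cosine(y_mean_abc, y_d))
print("cos(Y_A, Y_D) =", cosine(y_a, y_d))
```

In this log the mean-of-anchors baseline (0.9567) is slightly closer to Y_D than the predicted adapter (0.9513), while the evaluation on task D ranks them the other way (0.505 versus 0.52).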

	=== Evaluating on task D (Emotion) ===
	base_Y 0.3075
	Y_A_on_D 0.51
	Y_B_on_D 0.5375
	Y_C_on_D 0.47
	Y_mean_ABC_on_D 0.505
	Y_pred_D 0.52
	Y_oracle_D 0.665

	=== Results ===
	base_Y 0.3075
	Y_A_on_D 0.5100
	Y_B_on_D 0.5375
	Y_C_on_D 0.4700
	Y_mean_ABC_on_D 0.5050
	Y_pred_D 0.5200
	Y_oracle_D 0.6650
	base_X 0.2850
	X_oracle_D 0.6075
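
One way to read the Y-model results is as the fraction of the base-to-oracle gap that each adapter closes on task D. This metric is not part of the log. It is an illustrative calculation on the scores reported above, and the function name is hypothetical.

```python
# Task D scores for the Y model, copied from the Results block.
scores = {
    "base_Y": 0.3075,
    "Y_A_on_D": 0.5100,
    "Y_B_on_D": 0.5375,
    "Y_C_on_D": 0.4700,
    "Y_mean_ABC_on_D": 0.5050,
    "Y_pred_D": 0.5200,
    "Y_oracle_D": 0.6650,
}


def gap_recovered(score: float, base: float, oracle: float) -> float:
    """Fraction of the base-to-oracle improvement achieved by `score`."""
    return (score - base) / (oracle - base)


base, oracle = scores["base_Y"], scores["Y_oracle_D"]
recovered = {
    name: gap_recovered(value, base, oracle)
    for name, value in scores.items()
    if name not in ("base_Y", "Y_oracle_D")
}

for name, frac in sorted(recovered.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16s} {frac:.3f}")
```

On these numbers the predicted adapter closes a little under 60% of the gap, ahead of the mean-of-anchors baseline but behind the task B adapter applied directly.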