WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.

=== Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=A -> /app/out/X/X_A
{'loss': 4.3202, 'grad_norm': 12.326040267944336, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
{'loss': 3.1414, 'grad_norm': 5.1463093757629395, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
{'loss': 1.0993, 'grad_norm': 1.9524067640304565, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
{'loss': 0.7319, 'grad_norm': 1.0913549661636353, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
{'loss': 0.6953, 'grad_norm': 0.8020966053009033, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
{'loss': 0.7629, 'grad_norm': 0.8037011027336121, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
{'loss': 0.6505, 'grad_norm': 0.7035626173019409, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
{'loss': 0.7464, 'grad_norm': 1.008595585823059, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
{'train_runtime': 18.6371, 'train_samples_per_second': 80.485, 'train_steps_per_second': 10.087, 'train_loss': 1.0916077913121973, 'epoch': 1.0}

=== Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=B -> /app/out/X/X_B
{'loss': 4.1486, 'grad_norm': 7.985095024108887, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
{'loss': 3.2678, 'grad_norm': 3.0237557888031006, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
{'loss': 1.843, 'grad_norm': 1.2927166223526, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
{'loss': 1.4819, 'grad_norm': 1.0656538009643555, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
{'loss': 1.4765, 'grad_norm': 1.1160749197006226, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
{'loss': 1.4363, 'grad_norm': 0.8927919268608093, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
{'loss': 1.4441, 'grad_norm': 1.162216305732727, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
{'loss': 1.4125, 'grad_norm': 0.9546242356300354, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
{'train_runtime': 24.5221, 'train_samples_per_second': 61.169, 'train_steps_per_second': 7.667, 'train_loss': 1.7482892832857497, 'epoch': 1.0}

=== Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=C -> /app/out/X/X_C
{'loss': 4.6726, 'grad_norm': 9.03194522857666, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
{'loss': 3.5296, 'grad_norm': 4.634464263916016, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
{'loss': 1.672, 'grad_norm': 1.788339376449585, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
{'loss': 1.3986, 'grad_norm': 1.5529241561889648, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
{'loss': 1.3295, 'grad_norm': 1.0464192628860474, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
{'loss': 1.2901, 'grad_norm': 0.9648394584655762, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
{'loss': 1.3069, 'grad_norm': 1.234605073928833, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
{'loss': 1.3079, 'grad_norm': 0.9420123100280762, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
{'train_runtime': 21.1671, 'train_samples_per_second': 70.865, 'train_steps_per_second': 8.882, 'train_loss': 1.667192169960509, 'epoch': 1.0}

=== Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=D -> /app/out/X/X_D
{'loss': 3.7401, 'grad_norm': 7.278934478759766, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
{'loss': 2.8468, 'grad_norm': 3.4680252075195312, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
{'loss': 1.1733, 'grad_norm': 1.6588348150253296, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
{'loss': 0.865, 'grad_norm': 0.7071417570114136, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
{'loss': 0.8709, 'grad_norm': 0.5555262565612793, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
{'loss': 0.8206, 'grad_norm': 0.6325730681419373, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
{'loss': 0.8126, 'grad_norm': 0.7635664343833923, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
{'loss': 0.8417, 'grad_norm': 0.5870062708854675, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
{'train_runtime': 21.0825, 'train_samples_per_second': 71.149, 'train_steps_per_second': 8.917, 'train_loss': 1.1592141364483124, 'epoch': 1.0}

=== Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=A -> /app/out/Y/Y_A
{'loss': 4.7555, 'grad_norm': 5.822748184204102, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
{'loss': 3.2489, 'grad_norm': 3.5014379024505615, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
{'loss': 1.0954, 'grad_norm': 2.0311391353607178, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
{'loss': 0.8942, 'grad_norm': 0.8921640515327454, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
{'loss': 0.8581, 'grad_norm': 0.8614839911460876, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
{'loss': 0.9191, 'grad_norm': 0.7092980742454529, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
{'loss': 0.818, 'grad_norm': 0.8512246608734131, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
{'loss': 0.9047, 'grad_norm': 0.9553236961364746, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
{'train_runtime': 22.3723, 'train_samples_per_second': 67.047, 'train_steps_per_second': 8.403, 'train_loss': 1.2261660606303113, 'epoch': 1.0}

=== Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=B -> /app/out/Y/Y_B
{'loss': 4.425, 'grad_norm': 3.522216320037842, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
{'loss': 3.3123, 'grad_norm': 3.414928436279297, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
{'loss': 1.6821, 'grad_norm': 1.4625403881072998, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
{'loss': 1.5242, 'grad_norm': 1.119425892829895, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
{'loss': 1.5277, 'grad_norm': 1.1711347103118896, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
{'loss': 1.4649, 'grad_norm': 1.1075676679611206, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
{'loss': 1.4895, 'grad_norm': 1.4818918704986572, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
{'loss': 1.4579, 'grad_norm': 1.0621212720870972, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
{'train_runtime': 34.5565, 'train_samples_per_second': 43.407, 'train_steps_per_second': 5.44, 'train_loss': 1.7648479532688222, 'epoch': 1.0}

=== Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=C -> /app/out/Y/Y_C
{'loss': 4.9627, 'grad_norm': 4.488640308380127, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
{'loss': 3.5387, 'grad_norm': 4.163662433624268, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
{'loss': 1.5488, 'grad_norm': 1.7466603517532349, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
{'loss': 1.4444, 'grad_norm': 1.3302953243255615, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
{'loss': 1.3853, 'grad_norm': 1.098808765411377, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
{'loss': 1.3533, 'grad_norm': 1.0179518461227417, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
{'loss': 1.3764, 'grad_norm': 1.439223051071167, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
{'loss': 1.3666, 'grad_norm': 1.1373263597488403, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
{'train_runtime': 27.2509, 'train_samples_per_second': 55.044, 'train_steps_per_second': 6.899, 'train_loss': 1.6965796490933032, 'epoch': 1.0}

=== Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=D -> /app/out/Y/Y_D
/usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:595: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
/usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.7` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:595: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.8` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
/usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:612: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `20` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
  warnings.warn(

{'loss': 4.1206, 'grad_norm': 4.062298774719238, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
{'loss': 2.9718, 'grad_norm': 2.6493866443634033, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
{'loss': 1.0817, 'grad_norm': 0.8573890328407288, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
{'loss': 0.9537, 'grad_norm': 0.6526174545288086, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
{'loss': 0.9645, 'grad_norm': 0.6483463048934937, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
{'loss': 0.9246, 'grad_norm': 0.6785702109336853, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
{'loss': 0.916, 'grad_norm': 0.6483059525489807, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
{'loss': 0.9405, 'grad_norm': 0.6321303844451904, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
{'train_runtime': 28.8229, 'train_samples_per_second': 52.042, 'train_steps_per_second': 6.523, 'train_loss': 1.236437439918518, 'epoch': 1.0}

=== Building cross-model mapping ===
X adapter dim: 540672 Y adapter dim: 851968
Anchor weights alpha (A,B,C): [-0.4286724925041199, -0.009492842480540276, 0.11948390305042267]
cos(Y_hat_D, Y_D) = 0.9513461589813232
cos(Y_mean_ABC, Y_D) = 0.9567482471466064
cos(Y_A, Y_D) = 0.9471098780632019
cos(Y_B, Y_D) = 0.9269149303436279
cos(Y_C, Y_D) = 0.941544771194458
Saved /app/out/Y/Y_pred_D
Saved /app/out/Y/Y_mean_ABC

=== Evaluating on task D (Emotion) ===
base_Y 0.3075
Y_A_on_D 0.51
Y_B_on_D 0.5375
Y_C_on_D 0.47
Y_mean_ABC_on_D 0.505
Y_pred_D 0.52
Y_oracle_D 0.665

=== Results ===
  base_Y                   0.3075
  Y_A_on_D                 0.5100
  Y_B_on_D                 0.5375
  Y_C_on_D                 0.4700
  Y_mean_ABC_on_D          0.5050
  Y_pred_D                 0.5200
  Y_oracle_D               0.6650
  base_X                   0.2850
  X_oracle_D               0.6075