| WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0 |
| WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0 |
| Repo card metadata block was not found. Setting CardData to empty. |
| WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty. |
| WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0 |
| WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0 |
| WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0 |
| WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0 |
| Repo card metadata block was not found. Setting CardData to empty. |
| WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty. |
| WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0 |
|
|
| === Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=A -> /app/out/X/X_A |
| {'loss': 4.3202, 'grad_norm': 12.326040267944336, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213} |
| {'loss': 3.1414, 'grad_norm': 5.1463093757629395, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532} |
| {'loss': 1.0993, 'grad_norm': 1.9524067640304565, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063} |
| {'loss': 0.7319, 'grad_norm': 1.0913549661636353, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595} |
| {'loss': 0.6953, 'grad_norm': 0.8020966053009033, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213} |
| {'loss': 0.7629, 'grad_norm': 0.8037011027336121, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766} |
| {'loss': 0.6505, 'grad_norm': 0.7035626173019409, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319} |
| {'loss': 0.7464, 'grad_norm': 1.008595585823059, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872} |
| {'train_runtime': 18.6371, 'train_samples_per_second': 80.485, 'train_steps_per_second': 10.087, 'train_loss': 1.0916077913121973, 'epoch': 1.0} |
|
|
| === Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=B -> /app/out/X/X_B |
| {'loss': 4.1486, 'grad_norm': 7.985095024108887, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213} |
| {'loss': 3.2678, 'grad_norm': 3.0237557888031006, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532} |
| {'loss': 1.843, 'grad_norm': 1.2927166223526, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063} |
| {'loss': 1.4819, 'grad_norm': 1.0656538009643555, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595} |
| {'loss': 1.4765, 'grad_norm': 1.1160749197006226, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213} |
| {'loss': 1.4363, 'grad_norm': 0.8927919268608093, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766} |
| {'loss': 1.4441, 'grad_norm': 1.162216305732727, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319} |
| {'loss': 1.4125, 'grad_norm': 0.9546242356300354, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872} |
| {'train_runtime': 24.5221, 'train_samples_per_second': 61.169, 'train_steps_per_second': 7.667, 'train_loss': 1.7482892832857497, 'epoch': 1.0} |
|
|
| === Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=C -> /app/out/X/X_C |
| {'loss': 4.6726, 'grad_norm': 9.03194522857666, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213} |
| {'loss': 3.5296, 'grad_norm': 4.634464263916016, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532} |
| {'loss': 1.672, 'grad_norm': 1.788339376449585, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063} |
| {'loss': 1.3986, 'grad_norm': 1.5529241561889648, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595} |
| {'loss': 1.3295, 'grad_norm': 1.0464192628860474, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213} |
| {'loss': 1.2901, 'grad_norm': 0.9648394584655762, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766} |
| {'loss': 1.3069, 'grad_norm': 1.234605073928833, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319} |
| {'loss': 1.3079, 'grad_norm': 0.9420123100280762, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872} |
| {'train_runtime': 21.1671, 'train_samples_per_second': 70.865, 'train_steps_per_second': 8.882, 'train_loss': 1.667192169960509, 'epoch': 1.0} |
|
|
| === Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=D -> /app/out/X/X_D |
| {'loss': 3.7401, 'grad_norm': 7.278934478759766, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213} |
| {'loss': 2.8468, 'grad_norm': 3.4680252075195312, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532} |
| {'loss': 1.1733, 'grad_norm': 1.6588348150253296, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063} |
| {'loss': 0.865, 'grad_norm': 0.7071417570114136, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595} |
| {'loss': 0.8709, 'grad_norm': 0.5555262565612793, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213} |
| {'loss': 0.8206, 'grad_norm': 0.6325730681419373, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766} |
| {'loss': 0.8126, 'grad_norm': 0.7635664343833923, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319} |
| {'loss': 0.8417, 'grad_norm': 0.5870062708854675, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872} |
| {'train_runtime': 21.0825, 'train_samples_per_second': 71.149, 'train_steps_per_second': 8.917, 'train_loss': 1.1592141364483124, 'epoch': 1.0} |
|
|
| === Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=A -> /app/out/Y/Y_A |
| {'loss': 4.7555, 'grad_norm': 5.822748184204102, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213} |
| {'loss': 3.2489, 'grad_norm': 3.5014379024505615, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532} |
| {'loss': 1.0954, 'grad_norm': 2.0311391353607178, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063} |
| {'loss': 0.8942, 'grad_norm': 0.8921640515327454, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595} |
| {'loss': 0.8581, 'grad_norm': 0.8614839911460876, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213} |
| {'loss': 0.9191, 'grad_norm': 0.7092980742454529, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766} |
| {'loss': 0.818, 'grad_norm': 0.8512246608734131, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319} |
| {'loss': 0.9047, 'grad_norm': 0.9553236961364746, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872} |
| {'train_runtime': 22.3723, 'train_samples_per_second': 67.047, 'train_steps_per_second': 8.403, 'train_loss': 1.2261660606303113, 'epoch': 1.0} |
|
|
| === Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=B -> /app/out/Y/Y_B |
| {'loss': 4.425, 'grad_norm': 3.522216320037842, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213} |
| {'loss': 3.3123, 'grad_norm': 3.414928436279297, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532} |
| {'loss': 1.6821, 'grad_norm': 1.4625403881072998, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063} |
| {'loss': 1.5242, 'grad_norm': 1.119425892829895, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595} |
| {'loss': 1.5277, 'grad_norm': 1.1711347103118896, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213} |
| {'loss': 1.4649, 'grad_norm': 1.1075676679611206, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766} |
| {'loss': 1.4895, 'grad_norm': 1.4818918704986572, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319} |
| {'loss': 1.4579, 'grad_norm': 1.0621212720870972, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872} |
| {'train_runtime': 34.5565, 'train_samples_per_second': 43.407, 'train_steps_per_second': 5.44, 'train_loss': 1.7648479532688222, 'epoch': 1.0} |
|
|
| === Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=C -> /app/out/Y/Y_C |
| {'loss': 4.9627, 'grad_norm': 4.488640308380127, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213} |
| {'loss': 3.5387, 'grad_norm': 4.163662433624268, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532} |
| {'loss': 1.5488, 'grad_norm': 1.7466603517532349, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063} |
| {'loss': 1.4444, 'grad_norm': 1.3302953243255615, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595} |
| {'loss': 1.3853, 'grad_norm': 1.098808765411377, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213} |
| {'loss': 1.3533, 'grad_norm': 1.0179518461227417, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766} |
| {'loss': 1.3764, 'grad_norm': 1.439223051071167, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319} |
| {'loss': 1.3666, 'grad_norm': 1.1373263597488403, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872} |
| {'train_runtime': 27.2509, 'train_samples_per_second': 55.044, 'train_steps_per_second': 6.899, 'train_loss': 1.6965796490933032, 'epoch': 1.0} |
|
|
| === Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=D -> /app/out/Y/Y_DWARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0 |
| /usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. |
| warnings.warn( |
| /usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:595: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. |
| warnings.warn( |
| /usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.7` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. |
| warnings.warn( |
| /usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:595: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.8` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. |
| warnings.warn( |
| /usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:612: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `20` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`. |
| warnings.warn( |
|
|
| {'loss': 4.1206, 'grad_norm': 4.062298774719238, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213} |
| {'loss': 2.9718, 'grad_norm': 2.6493866443634033, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532} |
| {'loss': 1.0817, 'grad_norm': 0.8573890328407288, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063} |
| {'loss': 0.9537, 'grad_norm': 0.6526174545288086, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595} |
| {'loss': 0.9645, 'grad_norm': 0.6483463048934937, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213} |
| {'loss': 0.9246, 'grad_norm': 0.6785702109336853, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766} |
| {'loss': 0.916, 'grad_norm': 0.6483059525489807, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319} |
| {'loss': 0.9405, 'grad_norm': 0.6321303844451904, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872} |
| {'train_runtime': 28.8229, 'train_samples_per_second': 52.042, 'train_steps_per_second': 6.523, 'train_loss': 1.236437439918518, 'epoch': 1.0} |
|
|
| === Building cross-model mapping === |
| X adapter dim: 540672 Y adapter dim: 851968 |
| Anchor weights alpha (A,B,C): [-0.4286724925041199, -0.009492842480540276, 0.11948390305042267] |
| cos(Y_hat_D, Y_D) = 0.9513461589813232 |
| cos(Y_mean_ABC, Y_D) = 0.9567482471466064 |
| cos(Y_A, Y_D) = 0.9471098780632019 |
| cos(Y_B, Y_D) = 0.9269149303436279 |
| cos(Y_C, Y_D) = 0.941544771194458 |
| Saved /app/out/Y/Y_pred_D |
| Saved /app/out/Y/Y_mean_ABC |
|
|
| === Evaluating on task D (Emotion) === |
| base_Y 0.3075 |
| Y_A_on_D 0.51 |
| Y_B_on_D 0.5375 |
| Y_C_on_D 0.47 |
| Y_mean_ABC_on_D 0.505 |
| Y_pred_D 0.52 |
| Y_oracle_D 0.665 |
|
|
| === Results === |
| base_Y 0.3075 |
| Y_A_on_D 0.5100 |
| Y_B_on_D 0.5375 |
| Y_C_on_D 0.4700 |
| Y_mean_ABC_on_D 0.5050 |
| Y_pred_D 0.5200 |
| Y_oracle_D 0.6650 |
| base_X 0.2850 |
| X_oracle_D 0.6075 |
|
|