Upload run.log with huggingface_hub
run.log
ADDED
@@ -0,0 +1,141 @@
| 1 |
+
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
|
| 2 |
+
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
|
| 3 |
+
Repo card metadata block was not found. Setting CardData to empty.
|
| 4 |
+
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
|
| 5 |
+
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
|
| 6 |
+
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
|
| 7 |
+
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
|
| 8 |
+
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
|
| 9 |
+
Repo card metadata block was not found. Setting CardData to empty.
|
| 10 |
+
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
|
| 11 |
+
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
|
| 12 |
+
|
| 13 |
+
=== Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=A -> /app/out/X/X_A
|
| 14 |
+
{'loss': 4.3202, 'grad_norm': 12.326040267944336, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
|
| 15 |
+
{'loss': 3.1414, 'grad_norm': 5.1463093757629395, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
|
| 16 |
+
{'loss': 1.0993, 'grad_norm': 1.9524067640304565, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
|
| 17 |
+
{'loss': 0.7319, 'grad_norm': 1.0913549661636353, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
|
| 18 |
+
{'loss': 0.6953, 'grad_norm': 0.8020966053009033, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
|
| 19 |
+
{'loss': 0.7629, 'grad_norm': 0.8037011027336121, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
|
| 20 |
+
{'loss': 0.6505, 'grad_norm': 0.7035626173019409, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
|
| 21 |
+
{'loss': 0.7464, 'grad_norm': 1.008595585823059, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
|
| 22 |
+
{'train_runtime': 18.6371, 'train_samples_per_second': 80.485, 'train_steps_per_second': 10.087, 'train_loss': 1.0916077913121973, 'epoch': 1.0}
|
| 23 |
+
|
| 24 |
+
=== Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=B -> /app/out/X/X_B
|
| 25 |
+
{'loss': 4.1486, 'grad_norm': 7.985095024108887, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
|
| 26 |
+
{'loss': 3.2678, 'grad_norm': 3.0237557888031006, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
|
| 27 |
+
{'loss': 1.843, 'grad_norm': 1.2927166223526, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
|
| 28 |
+
{'loss': 1.4819, 'grad_norm': 1.0656538009643555, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
|
| 29 |
+
{'loss': 1.4765, 'grad_norm': 1.1160749197006226, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
|
| 30 |
+
{'loss': 1.4363, 'grad_norm': 0.8927919268608093, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
|
| 31 |
+
{'loss': 1.4441, 'grad_norm': 1.162216305732727, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
|
| 32 |
+
{'loss': 1.4125, 'grad_norm': 0.9546242356300354, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
|
| 33 |
+
{'train_runtime': 24.5221, 'train_samples_per_second': 61.169, 'train_steps_per_second': 7.667, 'train_loss': 1.7482892832857497, 'epoch': 1.0}
|
| 34 |
+
|
| 35 |
+
=== Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=C -> /app/out/X/X_C
|
| 36 |
+
{'loss': 4.6726, 'grad_norm': 9.03194522857666, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
|
| 37 |
+
{'loss': 3.5296, 'grad_norm': 4.634464263916016, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
|
| 38 |
+
{'loss': 1.672, 'grad_norm': 1.788339376449585, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
|
| 39 |
+
{'loss': 1.3986, 'grad_norm': 1.5529241561889648, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
|
| 40 |
+
{'loss': 1.3295, 'grad_norm': 1.0464192628860474, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
|
| 41 |
+
{'loss': 1.2901, 'grad_norm': 0.9648394584655762, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
|
| 42 |
+
{'loss': 1.3069, 'grad_norm': 1.234605073928833, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
|
| 43 |
+
{'loss': 1.3079, 'grad_norm': 0.9420123100280762, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
|
| 44 |
+
{'train_runtime': 21.1671, 'train_samples_per_second': 70.865, 'train_steps_per_second': 8.882, 'train_loss': 1.667192169960509, 'epoch': 1.0}
|
| 45 |
+
|
| 46 |
+
=== Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=D -> /app/out/X/X_D
|
| 47 |
+
{'loss': 3.7401, 'grad_norm': 7.278934478759766, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
|
| 48 |
+
{'loss': 2.8468, 'grad_norm': 3.4680252075195312, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
|
| 49 |
+
{'loss': 1.1733, 'grad_norm': 1.6588348150253296, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
|
| 50 |
+
{'loss': 0.865, 'grad_norm': 0.7071417570114136, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
|
| 51 |
+
{'loss': 0.8709, 'grad_norm': 0.5555262565612793, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
|
| 52 |
+
{'loss': 0.8206, 'grad_norm': 0.6325730681419373, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
|
| 53 |
+
{'loss': 0.8126, 'grad_norm': 0.7635664343833923, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
|
| 54 |
+
{'loss': 0.8417, 'grad_norm': 0.5870062708854675, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
|
| 55 |
+
{'train_runtime': 21.0825, 'train_samples_per_second': 71.149, 'train_steps_per_second': 8.917, 'train_loss': 1.1592141364483124, 'epoch': 1.0}
|
| 56 |
+
|
| 57 |
+
=== Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=A -> /app/out/Y/Y_A
|
| 58 |
+
{'loss': 4.7555, 'grad_norm': 5.822748184204102, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
|
| 59 |
+
{'loss': 3.2489, 'grad_norm': 3.5014379024505615, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
|
| 60 |
+
{'loss': 1.0954, 'grad_norm': 2.0311391353607178, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
|
| 61 |
+
{'loss': 0.8942, 'grad_norm': 0.8921640515327454, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
|
| 62 |
+
{'loss': 0.8581, 'grad_norm': 0.8614839911460876, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
|
| 63 |
+
{'loss': 0.9191, 'grad_norm': 0.7092980742454529, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
|
| 64 |
+
{'loss': 0.818, 'grad_norm': 0.8512246608734131, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
|
| 65 |
+
{'loss': 0.9047, 'grad_norm': 0.9553236961364746, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
|
| 66 |
+
{'train_runtime': 22.3723, 'train_samples_per_second': 67.047, 'train_steps_per_second': 8.403, 'train_loss': 1.2261660606303113, 'epoch': 1.0}
|
| 67 |
+
|
| 68 |
+
=== Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=B -> /app/out/Y/Y_B
|
| 69 |
+
{'loss': 4.425, 'grad_norm': 3.522216320037842, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
|
| 70 |
+
{'loss': 3.3123, 'grad_norm': 3.414928436279297, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
|
| 71 |
+
{'loss': 1.6821, 'grad_norm': 1.4625403881072998, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
|
| 72 |
+
{'loss': 1.5242, 'grad_norm': 1.119425892829895, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
|
| 73 |
+
{'loss': 1.5277, 'grad_norm': 1.1711347103118896, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
|
| 74 |
+
{'loss': 1.4649, 'grad_norm': 1.1075676679611206, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
|
| 75 |
+
{'loss': 1.4895, 'grad_norm': 1.4818918704986572, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
|
| 76 |
+
{'loss': 1.4579, 'grad_norm': 1.0621212720870972, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
|
| 77 |
+
{'train_runtime': 34.5565, 'train_samples_per_second': 43.407, 'train_steps_per_second': 5.44, 'train_loss': 1.7648479532688222, 'epoch': 1.0}
|
| 78 |
+
|
| 79 |
+
=== Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=C -> /app/out/Y/Y_C
|
| 80 |
+
{'loss': 4.9627, 'grad_norm': 4.488640308380127, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
|
| 81 |
+
{'loss': 3.5387, 'grad_norm': 4.163662433624268, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
|
| 82 |
+
{'loss': 1.5488, 'grad_norm': 1.7466603517532349, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
|
| 83 |
+
{'loss': 1.4444, 'grad_norm': 1.3302953243255615, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
|
| 84 |
+
{'loss': 1.3853, 'grad_norm': 1.098808765411377, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
|
| 85 |
+
{'loss': 1.3533, 'grad_norm': 1.0179518461227417, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
|
| 86 |
+
{'loss': 1.3764, 'grad_norm': 1.439223051071167, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
|
| 87 |
+
{'loss': 1.3666, 'grad_norm': 1.1373263597488403, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
|
| 88 |
+
{'train_runtime': 27.2509, 'train_samples_per_second': 55.044, 'train_steps_per_second': 6.899, 'train_loss': 1.6965796490933032, 'epoch': 1.0}
|
| 89 |
+
|
| 90 |
+
=== Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=D -> /app/out/Y/Y_D
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
|
| 91 |
+
/usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
|
| 92 |
+
warnings.warn(
|
| 93 |
+
/usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:595: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
|
| 94 |
+
warnings.warn(
|
| 95 |
+
/usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.7` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
|
| 96 |
+
warnings.warn(
|
| 97 |
+
/usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:595: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.8` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
|
| 98 |
+
warnings.warn(
|
| 99 |
+
/usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:612: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `20` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
|
| 100 |
+
warnings.warn(
|
| 101 |
+
|
| 102 |
+
{'loss': 4.1206, 'grad_norm': 4.062298774719238, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
|
| 103 |
+
{'loss': 2.9718, 'grad_norm': 2.6493866443634033, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
|
| 104 |
+
{'loss': 1.0817, 'grad_norm': 0.8573890328407288, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
|
| 105 |
+
{'loss': 0.9537, 'grad_norm': 0.6526174545288086, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
|
| 106 |
+
{'loss': 0.9645, 'grad_norm': 0.6483463048934937, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
|
| 107 |
+
{'loss': 0.9246, 'grad_norm': 0.6785702109336853, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
|
| 108 |
+
{'loss': 0.916, 'grad_norm': 0.6483059525489807, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
|
| 109 |
+
{'loss': 0.9405, 'grad_norm': 0.6321303844451904, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
|
| 110 |
+
{'train_runtime': 28.8229, 'train_samples_per_second': 52.042, 'train_steps_per_second': 6.523, 'train_loss': 1.236437439918518, 'epoch': 1.0}
|
| 111 |
+
|
| 112 |
+
=== Building cross-model mapping ===
|
| 113 |
+
X adapter dim: 540672 Y adapter dim: 851968
|
| 114 |
+
Anchor weights alpha (A,B,C): [-0.4286724925041199, -0.009492842480540276, 0.11948390305042267]
|
| 115 |
+
cos(Y_hat_D, Y_D) = 0.9513461589813232
|
| 116 |
+
cos(Y_mean_ABC, Y_D) = 0.9567482471466064
|
| 117 |
+
cos(Y_A, Y_D) = 0.9471098780632019
|
| 118 |
+
cos(Y_B, Y_D) = 0.9269149303436279
|
| 119 |
+
cos(Y_C, Y_D) = 0.941544771194458
|
| 120 |
+
Saved /app/out/Y/Y_pred_D
|
| 121 |
+
Saved /app/out/Y/Y_mean_ABC
|
| 122 |
+
|
| 123 |
+
=== Evaluating on task D (Emotion) ===
|
| 124 |
+
base_Y 0.3075
|
| 125 |
+
Y_A_on_D 0.51
|
| 126 |
+
Y_B_on_D 0.5375
|
| 127 |
+
Y_C_on_D 0.47
|
| 128 |
+
Y_mean_ABC_on_D 0.505
|
| 129 |
+
Y_pred_D 0.52
|
| 130 |
+
Y_oracle_D 0.665
|
| 131 |
+
|
| 132 |
+
=== Results ===
|
| 133 |
+
base_Y 0.3075
|
| 134 |
+
Y_A_on_D 0.5100
|
| 135 |
+
Y_B_on_D 0.5375
|
| 136 |
+
Y_C_on_D 0.4700
|
| 137 |
+
Y_mean_ABC_on_D 0.5050
|
| 138 |
+
Y_pred_D 0.5200
|
| 139 |
+
Y_oracle_D 0.6650
|
| 140 |
+
base_X 0.2850
|
| 141 |
+
X_oracle_D 0.6075
|