Commit 076b7d1 (verified) by Samarth0710 · 1 Parent(s): 741671a

Upload run.log with huggingface_hub

Files changed (1)
  1. run.log +141 -0
run.log ADDED
@@ -0,0 +1,141 @@
+ WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
+ WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
+ Repo card metadata block was not found. Setting CardData to empty.
+ WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
+ WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
+ WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
+ WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
+ WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
+ Repo card metadata block was not found. Setting CardData to empty.
+ WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
+ WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
+
+ === Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=A -> /app/out/X/X_A
+ {'loss': 4.3202, 'grad_norm': 12.326040267944336, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
+ {'loss': 3.1414, 'grad_norm': 5.1463093757629395, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
+ {'loss': 1.0993, 'grad_norm': 1.9524067640304565, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
+ {'loss': 0.7319, 'grad_norm': 1.0913549661636353, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
+ {'loss': 0.6953, 'grad_norm': 0.8020966053009033, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
+ {'loss': 0.7629, 'grad_norm': 0.8037011027336121, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
+ {'loss': 0.6505, 'grad_norm': 0.7035626173019409, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
+ {'loss': 0.7464, 'grad_norm': 1.008595585823059, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
+ {'train_runtime': 18.6371, 'train_samples_per_second': 80.485, 'train_steps_per_second': 10.087, 'train_loss': 1.0916077913121973, 'epoch': 1.0}
+
+ === Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=B -> /app/out/X/X_B
+ {'loss': 4.1486, 'grad_norm': 7.985095024108887, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
+ {'loss': 3.2678, 'grad_norm': 3.0237557888031006, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
+ {'loss': 1.843, 'grad_norm': 1.2927166223526, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
+ {'loss': 1.4819, 'grad_norm': 1.0656538009643555, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
+ {'loss': 1.4765, 'grad_norm': 1.1160749197006226, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
+ {'loss': 1.4363, 'grad_norm': 0.8927919268608093, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
+ {'loss': 1.4441, 'grad_norm': 1.162216305732727, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
+ {'loss': 1.4125, 'grad_norm': 0.9546242356300354, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
+ {'train_runtime': 24.5221, 'train_samples_per_second': 61.169, 'train_steps_per_second': 7.667, 'train_loss': 1.7482892832857497, 'epoch': 1.0}
+
+ === Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=C -> /app/out/X/X_C
+ {'loss': 4.6726, 'grad_norm': 9.03194522857666, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
+ {'loss': 3.5296, 'grad_norm': 4.634464263916016, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
+ {'loss': 1.672, 'grad_norm': 1.788339376449585, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
+ {'loss': 1.3986, 'grad_norm': 1.5529241561889648, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
+ {'loss': 1.3295, 'grad_norm': 1.0464192628860474, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
+ {'loss': 1.2901, 'grad_norm': 0.9648394584655762, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
+ {'loss': 1.3069, 'grad_norm': 1.234605073928833, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
+ {'loss': 1.3079, 'grad_norm': 0.9420123100280762, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
+ {'train_runtime': 21.1671, 'train_samples_per_second': 70.865, 'train_steps_per_second': 8.882, 'train_loss': 1.667192169960509, 'epoch': 1.0}
+
+ === Training LoRA: model=Qwen/Qwen2.5-0.5B-Instruct task=D -> /app/out/X/X_D
+ {'loss': 3.7401, 'grad_norm': 7.278934478759766, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
+ {'loss': 2.8468, 'grad_norm': 3.4680252075195312, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
+ {'loss': 1.1733, 'grad_norm': 1.6588348150253296, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
+ {'loss': 0.865, 'grad_norm': 0.7071417570114136, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
+ {'loss': 0.8709, 'grad_norm': 0.5555262565612793, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
+ {'loss': 0.8206, 'grad_norm': 0.6325730681419373, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
+ {'loss': 0.8126, 'grad_norm': 0.7635664343833923, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
+ {'loss': 0.8417, 'grad_norm': 0.5870062708854675, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
+ {'train_runtime': 21.0825, 'train_samples_per_second': 71.149, 'train_steps_per_second': 8.917, 'train_loss': 1.1592141364483124, 'epoch': 1.0}
+
+ === Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=A -> /app/out/Y/Y_A
+ {'loss': 4.7555, 'grad_norm': 5.822748184204102, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
+ {'loss': 3.2489, 'grad_norm': 3.5014379024505615, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
+ {'loss': 1.0954, 'grad_norm': 2.0311391353607178, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
+ {'loss': 0.8942, 'grad_norm': 0.8921640515327454, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
+ {'loss': 0.8581, 'grad_norm': 0.8614839911460876, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
+ {'loss': 0.9191, 'grad_norm': 0.7092980742454529, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
+ {'loss': 0.818, 'grad_norm': 0.8512246608734131, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
+ {'loss': 0.9047, 'grad_norm': 0.9553236961364746, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
+ {'train_runtime': 22.3723, 'train_samples_per_second': 67.047, 'train_steps_per_second': 8.403, 'train_loss': 1.2261660606303113, 'epoch': 1.0}
+
+ === Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=B -> /app/out/Y/Y_B
+ {'loss': 4.425, 'grad_norm': 3.522216320037842, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
+ {'loss': 3.3123, 'grad_norm': 3.414928436279297, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
+ {'loss': 1.6821, 'grad_norm': 1.4625403881072998, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
+ {'loss': 1.5242, 'grad_norm': 1.119425892829895, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
+ {'loss': 1.5277, 'grad_norm': 1.1711347103118896, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
+ {'loss': 1.4649, 'grad_norm': 1.1075676679611206, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
+ {'loss': 1.4895, 'grad_norm': 1.4818918704986572, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
+ {'loss': 1.4579, 'grad_norm': 1.0621212720870972, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
+ {'train_runtime': 34.5565, 'train_samples_per_second': 43.407, 'train_steps_per_second': 5.44, 'train_loss': 1.7648479532688222, 'epoch': 1.0}
+
+ === Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=C -> /app/out/Y/Y_C
+ {'loss': 4.9627, 'grad_norm': 4.488640308380127, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
+ {'loss': 3.5387, 'grad_norm': 4.163662433624268, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
+ {'loss': 1.5488, 'grad_norm': 1.7466603517532349, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
+ {'loss': 1.4444, 'grad_norm': 1.3302953243255615, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
+ {'loss': 1.3853, 'grad_norm': 1.098808765411377, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
+ {'loss': 1.3533, 'grad_norm': 1.0179518461227417, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
+ {'loss': 1.3764, 'grad_norm': 1.439223051071167, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
+ {'loss': 1.3666, 'grad_norm': 1.1373263597488403, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
+ {'train_runtime': 27.2509, 'train_samples_per_second': 55.044, 'train_steps_per_second': 6.899, 'train_loss': 1.6965796490933032, 'epoch': 1.0}
+
+ === Training LoRA: model=meta-llama/Llama-3.2-1B-Instruct task=D -> /app/out/Y/Y_D
+ WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
+ /usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
+ warnings.warn(
+ /usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:595: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
+ warnings.warn(
+ /usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.7` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
+ warnings.warn(
+ /usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:595: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.8` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
+ warnings.warn(
+ /usr/local/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:612: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `20` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
+ warnings.warn(
+
+ {'loss': 4.1206, 'grad_norm': 4.062298774719238, 'learning_rate': 2e-05, 'epoch': 0.005319148936170213}
+ {'loss': 2.9718, 'grad_norm': 2.6493866443634033, 'learning_rate': 0.00019651603150596495, 'epoch': 0.13297872340425532}
+ {'loss': 1.0817, 'grad_norm': 0.8573890328407288, 'learning_rate': 0.0001760978787760968, 'epoch': 0.26595744680851063}
+ {'loss': 0.9537, 'grad_norm': 0.6526174545288086, 'learning_rate': 0.00014110317293976218, 'epoch': 0.39893617021276595}
+ {'loss': 0.9645, 'grad_norm': 0.6483463048934937, 'learning_rate': 9.823515193568715e-05, 'epoch': 0.5319148936170213}
+ {'loss': 0.9246, 'grad_norm': 0.6785702109336853, 'learning_rate': 5.570518768099918e-05, 'epoch': 0.6648936170212766}
+ {'loss': 0.916, 'grad_norm': 0.6483059525489807, 'learning_rate': 2.1659897302814747e-05, 'epoch': 0.7978723404255319}
+ {'loss': 0.9405, 'grad_norm': 0.6321303844451904, 'learning_rate': 2.6206581536199594e-06, 'epoch': 0.9308510638297872}
+ {'train_runtime': 28.8229, 'train_samples_per_second': 52.042, 'train_steps_per_second': 6.523, 'train_loss': 1.236437439918518, 'epoch': 1.0}
+
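The `learning_rate` values logged in the training blocks above are consistent with a linear warmup followed by cosine decay. The sketch below reproduces them under three assumptions that the log itself does not state: a peak rate of 2e-4, 10 warmup steps, and 188 optimizer steps per run (inferred from the first logged epoch value, 0.005319..., which equals 1/188). It is a check on the logged numbers, not the training script that produced them.

```python
import math

PEAK_LR = 2e-4       # assumed peak learning rate; not stated in the log
WARMUP_STEPS = 10    # assumed; gives 2e-05 at step 1, as logged
TOTAL_STEPS = 188    # assumed; the first logged epoch value is 1/188


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Steps at which the log reports a loss (step 1, then every 25 steps).
for step in (1, 25, 50, 75, 100, 125, 150, 175):
    print(f"step {step:3d}  epoch {step / TOTAL_STEPS:.4f}  lr {lr_at(step):.6e}")
```

Because every run logs the same eight rates, the same assumed schedule would apply to all eight adapters.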
+ === Building cross-model mapping ===
+ X adapter dim: 540672 Y adapter dim: 851968
+ Anchor weights alpha (A,B,C): [-0.4286724925041199, -0.009492842480540276, 0.11948390305042267]
+ cos(Y_hat_D, Y_D) = 0.9513461589813232
+ cos(Y_mean_ABC, Y_D) = 0.9567482471466064
+ cos(Y_A, Y_D) = 0.9471098780632019
+ cos(Y_B, Y_D) = 0.9269149303436279
+ cos(Y_C, Y_D) = 0.941544771194458
+ Saved /app/out/Y/Y_pred_D
+ Saved /app/out/Y/Y_mean_ABC
+
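The `cos(...)` lines in the mapping block above report cosine similarities between adapters, presumably treated as flat vectors of the logged dimension (851968 for Y). The sketch below shows that computation on toy vectors. The function name `cosine` and the toy values are hypothetical, and the log does not show how the author flattens the adapters or builds `Y_hat_D` from the anchor weights, so neither step is reproduced here.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy stand-ins for flattened adapter weights (hypothetical values).
y_a = [0.9, 0.1, 0.3]
y_b = [0.7, 0.2, 0.5]
y_c = [0.8, 0.0, 0.4]
y_d = [0.8, 0.1, 0.4]

# Element-wise mean of the three anchors, analogous in name to Y_mean_ABC.
y_mean_abc = [(a + b + c) / 3.0 for a, b, c in zip(y_a, y_b, y_c)]

for name, vec in (("Y_A", y_a), ("Y_B", y_b), ("Y_C", y_c), ("Y_mean_ABC", y_mean_abc)):
    print(f"cos({name}, Y_D) = {cosine(vec, y_d):.4f}")
```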
+ === Evaluating on task D (Emotion) ===
+ base_Y 0.3075
+ Y_A_on_D 0.51
+ Y_B_on_D 0.5375
+ Y_C_on_D 0.47
+ Y_mean_ABC_on_D 0.505
+ Y_pred_D 0.52
+ Y_oracle_D 0.665
+
+ === Results ===
+ base_Y 0.3075
+ Y_A_on_D 0.5100
+ Y_B_on_D 0.5375
+ Y_C_on_D 0.4700
+ Y_mean_ABC_on_D 0.5050
+ Y_pred_D 0.5200
+ Y_oracle_D 0.6650
+ base_X 0.2850
+ X_oracle_D 0.6075