AbstractPhil committed on
Commit 1f38969 · verified · 1 Parent(s): 20d0565

Upload Math_Collective_V2.ipynb

Files changed (1)
  1. Math_Collective_V2.ipynb +2422 -0
Math_Collective_V2.ipynb ADDED
@@ -0,0 +1,2422 @@
1
+ {
2
+ "nbformat": 4,
3
+ "nbformat_minor": 0,
4
+ "metadata": {
5
+ "colab": {
6
+ "provenance": [],
7
+ "machine_shape": "hm",
8
+ "gpuType": "L4"
9
+ },
10
+ "kernelspec": {
11
+ "name": "python3",
12
+ "display_name": "Python 3"
13
+ },
14
+ "language_info": {
15
+ "name": "python"
16
+ },
17
+ "accelerator": "GPU",
18
+ "widgets": {
19
+ "application/vnd.jupyter.widget-state+json": {
20
+ "17f8d0b0ef4347fd81984c46fbb9e684": {
21
+ "model_module": "@jupyter-widgets/controls",
22
+ "model_name": "HBoxModel",
23
+ "model_module_version": "1.5.0",
24
+ "state": {
25
+ "_dom_classes": [],
26
+ "_model_module": "@jupyter-widgets/controls",
27
+ "_model_module_version": "1.5.0",
28
+ "_model_name": "HBoxModel",
29
+ "_view_count": null,
30
+ "_view_module": "@jupyter-widgets/controls",
31
+ "_view_module_version": "1.5.0",
32
+ "_view_name": "HBoxView",
33
+ "box_style": "",
34
+ "children": [
35
+ "IPY_MODEL_7d78e730875146efa1dd35b906344678",
36
+ "IPY_MODEL_dc1906aa05194549a4d6dcc85d92dc09",
37
+ "IPY_MODEL_1f6c2b5fc10a4a98a5380a13fa70bc75"
38
+ ],
39
+ "layout": "IPY_MODEL_dba61edae6884c92bdf9a3d76dec59c9"
40
+ }
41
+ },
42
+ "7d78e730875146efa1dd35b906344678": {
43
+ "model_module": "@jupyter-widgets/controls",
44
+ "model_name": "HTMLModel",
45
+ "model_module_version": "1.5.0",
46
+ "state": {
47
+ "_dom_classes": [],
48
+ "_model_module": "@jupyter-widgets/controls",
49
+ "_model_module_version": "1.5.0",
50
+ "_model_name": "HTMLModel",
51
+ "_view_count": null,
52
+ "_view_module": "@jupyter-widgets/controls",
53
+ "_view_module_version": "1.5.0",
54
+ "_view_name": "HTMLView",
55
+ "description": "",
56
+ "description_tooltip": null,
57
+ "layout": "IPY_MODEL_99a31b328dbf401c9db2b340cb2105d5",
58
+ "placeholder": "​",
59
+ "style": "IPY_MODEL_76efebd872c044b9a7f8c38d57eac49f",
60
+ "value": "Epoch 1: 100%"
61
+ }
62
+ },
63
+ "dc1906aa05194549a4d6dcc85d92dc09": {
64
+ "model_module": "@jupyter-widgets/controls",
65
+ "model_name": "FloatProgressModel",
66
+ "model_module_version": "1.5.0",
67
+ "state": {
68
+ "_dom_classes": [],
69
+ "_model_module": "@jupyter-widgets/controls",
70
+ "_model_module_version": "1.5.0",
71
+ "_model_name": "FloatProgressModel",
72
+ "_view_count": null,
73
+ "_view_module": "@jupyter-widgets/controls",
74
+ "_view_module_version": "1.5.0",
75
+ "_view_name": "ProgressView",
76
+ "bar_style": "success",
77
+ "description": "",
78
+ "description_tooltip": null,
79
+ "layout": "IPY_MODEL_5af975c1370f43d59b97ed336c866e0c",
80
+ "max": 935,
81
+ "min": 0,
82
+ "orientation": "horizontal",
83
+ "style": "IPY_MODEL_4a035406ac194a67afca19b313e05b9a",
84
+ "value": 935
85
+ }
86
+ },
87
+ "1f6c2b5fc10a4a98a5380a13fa70bc75": {
88
+ "model_module": "@jupyter-widgets/controls",
89
+ "model_name": "HTMLModel",
90
+ "model_module_version": "1.5.0",
91
+ "state": {
92
+ "_dom_classes": [],
93
+ "_model_module": "@jupyter-widgets/controls",
94
+ "_model_module_version": "1.5.0",
95
+ "_model_name": "HTMLModel",
96
+ "_view_count": null,
97
+ "_view_module": "@jupyter-widgets/controls",
98
+ "_view_module_version": "1.5.0",
99
+ "_view_name": "HTMLView",
100
+ "description": "",
101
+ "description_tooltip": null,
102
+ "layout": "IPY_MODEL_c429a347002a43fe8a0a57f5797767d7",
103
+ "placeholder": "​",
104
+ "style": "IPY_MODEL_cce3e0efd2bb4b978c6fa150b6cac602",
105
+ "value": " 935/935 [05:06<00:00,  3.30it/s, acc=21.8%]"
106
+ }
107
+ },
108
+ "dba61edae6884c92bdf9a3d76dec59c9": {
109
+ "model_module": "@jupyter-widgets/base",
110
+ "model_name": "LayoutModel",
111
+ "model_module_version": "1.2.0",
112
+ "state": {
113
+ "_model_module": "@jupyter-widgets/base",
114
+ "_model_module_version": "1.2.0",
115
+ "_model_name": "LayoutModel",
116
+ "_view_count": null,
117
+ "_view_module": "@jupyter-widgets/base",
118
+ "_view_module_version": "1.2.0",
119
+ "_view_name": "LayoutView",
120
+ "align_content": null,
121
+ "align_items": null,
122
+ "align_self": null,
123
+ "border": null,
124
+ "bottom": null,
125
+ "display": null,
126
+ "flex": null,
127
+ "flex_flow": null,
128
+ "grid_area": null,
129
+ "grid_auto_columns": null,
130
+ "grid_auto_flow": null,
131
+ "grid_auto_rows": null,
132
+ "grid_column": null,
133
+ "grid_gap": null,
134
+ "grid_row": null,
135
+ "grid_template_areas": null,
136
+ "grid_template_columns": null,
137
+ "grid_template_rows": null,
138
+ "height": null,
139
+ "justify_content": null,
140
+ "justify_items": null,
141
+ "left": null,
142
+ "margin": null,
143
+ "max_height": null,
144
+ "max_width": null,
145
+ "min_height": null,
146
+ "min_width": null,
147
+ "object_fit": null,
148
+ "object_position": null,
149
+ "order": null,
150
+ "overflow": null,
151
+ "overflow_x": null,
152
+ "overflow_y": null,
153
+ "padding": null,
154
+ "right": null,
155
+ "top": null,
156
+ "visibility": null,
157
+ "width": null
158
+ }
159
+ },
160
+ "99a31b328dbf401c9db2b340cb2105d5": {
161
+ "model_module": "@jupyter-widgets/base",
162
+ "model_name": "LayoutModel",
163
+ "model_module_version": "1.2.0",
164
+ "state": {
165
+ "_model_module": "@jupyter-widgets/base",
166
+ "_model_module_version": "1.2.0",
167
+ "_model_name": "LayoutModel",
168
+ "_view_count": null,
169
+ "_view_module": "@jupyter-widgets/base",
170
+ "_view_module_version": "1.2.0",
171
+ "_view_name": "LayoutView",
172
+ "align_content": null,
173
+ "align_items": null,
174
+ "align_self": null,
175
+ "border": null,
176
+ "bottom": null,
177
+ "display": null,
178
+ "flex": null,
179
+ "flex_flow": null,
180
+ "grid_area": null,
181
+ "grid_auto_columns": null,
182
+ "grid_auto_flow": null,
183
+ "grid_auto_rows": null,
184
+ "grid_column": null,
185
+ "grid_gap": null,
186
+ "grid_row": null,
187
+ "grid_template_areas": null,
188
+ "grid_template_columns": null,
189
+ "grid_template_rows": null,
190
+ "height": null,
191
+ "justify_content": null,
192
+ "justify_items": null,
193
+ "left": null,
194
+ "margin": null,
195
+ "max_height": null,
196
+ "max_width": null,
197
+ "min_height": null,
198
+ "min_width": null,
199
+ "object_fit": null,
200
+ "object_position": null,
201
+ "order": null,
202
+ "overflow": null,
203
+ "overflow_x": null,
204
+ "overflow_y": null,
205
+ "padding": null,
206
+ "right": null,
207
+ "top": null,
208
+ "visibility": null,
209
+ "width": null
210
+ }
211
+ },
212
+ "76efebd872c044b9a7f8c38d57eac49f": {
213
+ "model_module": "@jupyter-widgets/controls",
214
+ "model_name": "DescriptionStyleModel",
215
+ "model_module_version": "1.5.0",
216
+ "state": {
217
+ "_model_module": "@jupyter-widgets/controls",
218
+ "_model_module_version": "1.5.0",
219
+ "_model_name": "DescriptionStyleModel",
220
+ "_view_count": null,
221
+ "_view_module": "@jupyter-widgets/base",
222
+ "_view_module_version": "1.2.0",
223
+ "_view_name": "StyleView",
224
+ "description_width": ""
225
+ }
226
+ },
227
+ "5af975c1370f43d59b97ed336c866e0c": {
228
+ "model_module": "@jupyter-widgets/base",
229
+ "model_name": "LayoutModel",
230
+ "model_module_version": "1.2.0",
231
+ "state": {
232
+ "_model_module": "@jupyter-widgets/base",
233
+ "_model_module_version": "1.2.0",
234
+ "_model_name": "LayoutModel",
235
+ "_view_count": null,
236
+ "_view_module": "@jupyter-widgets/base",
237
+ "_view_module_version": "1.2.0",
238
+ "_view_name": "LayoutView",
239
+ "align_content": null,
240
+ "align_items": null,
241
+ "align_self": null,
242
+ "border": null,
243
+ "bottom": null,
244
+ "display": null,
245
+ "flex": null,
246
+ "flex_flow": null,
247
+ "grid_area": null,
248
+ "grid_auto_columns": null,
249
+ "grid_auto_flow": null,
250
+ "grid_auto_rows": null,
251
+ "grid_column": null,
252
+ "grid_gap": null,
253
+ "grid_row": null,
254
+ "grid_template_areas": null,
255
+ "grid_template_columns": null,
256
+ "grid_template_rows": null,
257
+ "height": null,
258
+ "justify_content": null,
259
+ "justify_items": null,
260
+ "left": null,
261
+ "margin": null,
262
+ "max_height": null,
263
+ "max_width": null,
264
+ "min_height": null,
265
+ "min_width": null,
266
+ "object_fit": null,
267
+ "object_position": null,
268
+ "order": null,
269
+ "overflow": null,
270
+ "overflow_x": null,
271
+ "overflow_y": null,
272
+ "padding": null,
273
+ "right": null,
274
+ "top": null,
275
+ "visibility": null,
276
+ "width": null
277
+ }
278
+ },
279
+ "4a035406ac194a67afca19b313e05b9a": {
280
+ "model_module": "@jupyter-widgets/controls",
281
+ "model_name": "ProgressStyleModel",
282
+ "model_module_version": "1.5.0",
283
+ "state": {
284
+ "_model_module": "@jupyter-widgets/controls",
285
+ "_model_module_version": "1.5.0",
286
+ "_model_name": "ProgressStyleModel",
287
+ "_view_count": null,
288
+ "_view_module": "@jupyter-widgets/base",
289
+ "_view_module_version": "1.2.0",
290
+ "_view_name": "StyleView",
291
+ "bar_color": null,
292
+ "description_width": ""
293
+ }
294
+ },
295
+ "c429a347002a43fe8a0a57f5797767d7": {
296
+ "model_module": "@jupyter-widgets/base",
297
+ "model_name": "LayoutModel",
298
+ "model_module_version": "1.2.0",
299
+ "state": {
300
+ "_model_module": "@jupyter-widgets/base",
301
+ "_model_module_version": "1.2.0",
302
+ "_model_name": "LayoutModel",
303
+ "_view_count": null,
304
+ "_view_module": "@jupyter-widgets/base",
305
+ "_view_module_version": "1.2.0",
306
+ "_view_name": "LayoutView",
307
+ "align_content": null,
308
+ "align_items": null,
309
+ "align_self": null,
310
+ "border": null,
311
+ "bottom": null,
312
+ "display": null,
313
+ "flex": null,
314
+ "flex_flow": null,
315
+ "grid_area": null,
316
+ "grid_auto_columns": null,
317
+ "grid_auto_flow": null,
318
+ "grid_auto_rows": null,
319
+ "grid_column": null,
320
+ "grid_gap": null,
321
+ "grid_row": null,
322
+ "grid_template_areas": null,
323
+ "grid_template_columns": null,
324
+ "grid_template_rows": null,
325
+ "height": null,
326
+ "justify_content": null,
327
+ "justify_items": null,
328
+ "left": null,
329
+ "margin": null,
330
+ "max_height": null,
331
+ "max_width": null,
332
+ "min_height": null,
333
+ "min_width": null,
334
+ "object_fit": null,
335
+ "object_position": null,
336
+ "order": null,
337
+ "overflow": null,
338
+ "overflow_x": null,
339
+ "overflow_y": null,
340
+ "padding": null,
341
+ "right": null,
342
+ "top": null,
343
+ "visibility": null,
344
+ "width": null
345
+ }
346
+ },
347
+ "cce3e0efd2bb4b978c6fa150b6cac602": {
348
+ "model_module": "@jupyter-widgets/controls",
349
+ "model_name": "DescriptionStyleModel",
350
+ "model_module_version": "1.5.0",
351
+ "state": {
352
+ "_model_module": "@jupyter-widgets/controls",
353
+ "_model_module_version": "1.5.0",
354
+ "_model_name": "DescriptionStyleModel",
355
+ "_view_count": null,
356
+ "_view_module": "@jupyter-widgets/base",
357
+ "_view_module_version": "1.2.0",
358
+ "_view_name": "StyleView",
359
+ "description_width": ""
360
+ }
361
+ },
362
+ "da18291cbf374325ac676de844a353e7": {
363
+ "model_module": "@jupyter-widgets/controls",
364
+ "model_name": "HBoxModel",
365
+ "model_module_version": "1.5.0",
366
+ "state": {
367
+ "_dom_classes": [],
368
+ "_model_module": "@jupyter-widgets/controls",
369
+ "_model_module_version": "1.5.0",
370
+ "_model_name": "HBoxModel",
371
+ "_view_count": null,
372
+ "_view_module": "@jupyter-widgets/controls",
373
+ "_view_module_version": "1.5.0",
374
+ "_view_name": "HBoxView",
375
+ "box_style": "",
376
+ "children": [
377
+ "IPY_MODEL_4e910a07266a40caabcc33eeb941bdc0",
378
+ "IPY_MODEL_9c602bb5011d48c8b04404f6b3a37fab",
379
+ "IPY_MODEL_a87e1520f5254be18114a5f782e4e234"
380
+ ],
381
+ "layout": "IPY_MODEL_fd027301b7a6496eb8ae039c25bc4d4b"
382
+ }
383
+ },
384
+ "4e910a07266a40caabcc33eeb941bdc0": {
385
+ "model_module": "@jupyter-widgets/controls",
386
+ "model_name": "HTMLModel",
387
+ "model_module_version": "1.5.0",
388
+ "state": {
389
+ "_dom_classes": [],
390
+ "_model_module": "@jupyter-widgets/controls",
391
+ "_model_module_version": "1.5.0",
392
+ "_model_name": "HTMLModel",
393
+ "_view_count": null,
394
+ "_view_module": "@jupyter-widgets/controls",
395
+ "_view_module_version": "1.5.0",
396
+ "_view_name": "HTMLView",
397
+ "description": "",
398
+ "description_tooltip": null,
399
+ "layout": "IPY_MODEL_9b3468290ae84339ac7c8ce2a9976741",
400
+ "placeholder": "​",
401
+ "style": "IPY_MODEL_8beb040b66a1413abd534d5d455d71e2",
402
+ "value": "Validating: 100%"
403
+ }
404
+ },
405
+ "9c602bb5011d48c8b04404f6b3a37fab": {
406
+ "model_module": "@jupyter-widgets/controls",
407
+ "model_name": "FloatProgressModel",
408
+ "model_module_version": "1.5.0",
409
+ "state": {
410
+ "_dom_classes": [],
411
+ "_model_module": "@jupyter-widgets/controls",
412
+ "_model_module_version": "1.5.0",
413
+ "_model_name": "FloatProgressModel",
414
+ "_view_count": null,
415
+ "_view_module": "@jupyter-widgets/controls",
416
+ "_view_module_version": "1.5.0",
417
+ "_view_name": "ProgressView",
418
+ "bar_style": "success",
419
+ "description": "",
420
+ "description_tooltip": null,
421
+ "layout": "IPY_MODEL_b0fc7aa19e194e91a3138867b2377db9",
422
+ "max": 165,
423
+ "min": 0,
424
+ "orientation": "horizontal",
425
+ "style": "IPY_MODEL_e9a139d721a147d58952f4a1efc0401b",
426
+ "value": 165
427
+ }
428
+ },
429
+ "a87e1520f5254be18114a5f782e4e234": {
430
+ "model_module": "@jupyter-widgets/controls",
431
+ "model_name": "HTMLModel",
432
+ "model_module_version": "1.5.0",
433
+ "state": {
434
+ "_dom_classes": [],
435
+ "_model_module": "@jupyter-widgets/controls",
436
+ "_model_module_version": "1.5.0",
437
+ "_model_name": "HTMLModel",
438
+ "_view_count": null,
439
+ "_view_module": "@jupyter-widgets/controls",
440
+ "_view_module_version": "1.5.0",
441
+ "_view_name": "HTMLView",
442
+ "description": "",
443
+ "description_tooltip": null,
444
+ "layout": "IPY_MODEL_b2c27f1f8a5f437e8f359f1e0ea05fbd",
445
+ "placeholder": "​",
446
+ "style": "IPY_MODEL_1f697938d7e845bb9adcbb3015606f70",
447
+ "value": " 165/165 [00:52<00:00,  3.17it/s]"
448
+ }
449
+ },
450
+ "fd027301b7a6496eb8ae039c25bc4d4b": {
451
+ "model_module": "@jupyter-widgets/base",
452
+ "model_name": "LayoutModel",
453
+ "model_module_version": "1.2.0",
454
+ "state": {
455
+ "_model_module": "@jupyter-widgets/base",
456
+ "_model_module_version": "1.2.0",
457
+ "_model_name": "LayoutModel",
458
+ "_view_count": null,
459
+ "_view_module": "@jupyter-widgets/base",
460
+ "_view_module_version": "1.2.0",
461
+ "_view_name": "LayoutView",
462
+ "align_content": null,
463
+ "align_items": null,
464
+ "align_self": null,
465
+ "border": null,
466
+ "bottom": null,
467
+ "display": null,
468
+ "flex": null,
469
+ "flex_flow": null,
470
+ "grid_area": null,
471
+ "grid_auto_columns": null,
472
+ "grid_auto_flow": null,
473
+ "grid_auto_rows": null,
474
+ "grid_column": null,
475
+ "grid_gap": null,
476
+ "grid_row": null,
477
+ "grid_template_areas": null,
478
+ "grid_template_columns": null,
479
+ "grid_template_rows": null,
480
+ "height": null,
481
+ "justify_content": null,
482
+ "justify_items": null,
483
+ "left": null,
484
+ "margin": null,
485
+ "max_height": null,
486
+ "max_width": null,
487
+ "min_height": null,
488
+ "min_width": null,
489
+ "object_fit": null,
490
+ "object_position": null,
491
+ "order": null,
492
+ "overflow": null,
493
+ "overflow_x": null,
494
+ "overflow_y": null,
495
+ "padding": null,
496
+ "right": null,
497
+ "top": null,
498
+ "visibility": null,
499
+ "width": null
500
+ }
501
+ },
502
+ "9b3468290ae84339ac7c8ce2a9976741": {
503
+ "model_module": "@jupyter-widgets/base",
504
+ "model_name": "LayoutModel",
505
+ "model_module_version": "1.2.0",
506
+ "state": {
507
+ "_model_module": "@jupyter-widgets/base",
508
+ "_model_module_version": "1.2.0",
509
+ "_model_name": "LayoutModel",
510
+ "_view_count": null,
511
+ "_view_module": "@jupyter-widgets/base",
512
+ "_view_module_version": "1.2.0",
513
+ "_view_name": "LayoutView",
514
+ "align_content": null,
515
+ "align_items": null,
516
+ "align_self": null,
517
+ "border": null,
518
+ "bottom": null,
519
+ "display": null,
520
+ "flex": null,
521
+ "flex_flow": null,
522
+ "grid_area": null,
523
+ "grid_auto_columns": null,
524
+ "grid_auto_flow": null,
525
+ "grid_auto_rows": null,
526
+ "grid_column": null,
527
+ "grid_gap": null,
528
+ "grid_row": null,
529
+ "grid_template_areas": null,
530
+ "grid_template_columns": null,
531
+ "grid_template_rows": null,
532
+ "height": null,
533
+ "justify_content": null,
534
+ "justify_items": null,
535
+ "left": null,
536
+ "margin": null,
537
+ "max_height": null,
538
+ "max_width": null,
539
+ "min_height": null,
540
+ "min_width": null,
541
+ "object_fit": null,
542
+ "object_position": null,
543
+ "order": null,
544
+ "overflow": null,
545
+ "overflow_x": null,
546
+ "overflow_y": null,
547
+ "padding": null,
548
+ "right": null,
549
+ "top": null,
550
+ "visibility": null,
551
+ "width": null
552
+ }
553
+ },
554
+ "8beb040b66a1413abd534d5d455d71e2": {
555
+ "model_module": "@jupyter-widgets/controls",
556
+ "model_name": "DescriptionStyleModel",
557
+ "model_module_version": "1.5.0",
558
+ "state": {
559
+ "_model_module": "@jupyter-widgets/controls",
560
+ "_model_module_version": "1.5.0",
561
+ "_model_name": "DescriptionStyleModel",
562
+ "_view_count": null,
563
+ "_view_module": "@jupyter-widgets/base",
564
+ "_view_module_version": "1.2.0",
565
+ "_view_name": "StyleView",
566
+ "description_width": ""
567
+ }
568
+ },
569
+ "b0fc7aa19e194e91a3138867b2377db9": {
570
+ "model_module": "@jupyter-widgets/base",
571
+ "model_name": "LayoutModel",
572
+ "model_module_version": "1.2.0",
573
+ "state": {
574
+ "_model_module": "@jupyter-widgets/base",
575
+ "_model_module_version": "1.2.0",
576
+ "_model_name": "LayoutModel",
577
+ "_view_count": null,
578
+ "_view_module": "@jupyter-widgets/base",
579
+ "_view_module_version": "1.2.0",
580
+ "_view_name": "LayoutView",
581
+ "align_content": null,
582
+ "align_items": null,
583
+ "align_self": null,
584
+ "border": null,
585
+ "bottom": null,
586
+ "display": null,
587
+ "flex": null,
588
+ "flex_flow": null,
589
+ "grid_area": null,
590
+ "grid_auto_columns": null,
591
+ "grid_auto_flow": null,
592
+ "grid_auto_rows": null,
593
+ "grid_column": null,
594
+ "grid_gap": null,
595
+ "grid_row": null,
596
+ "grid_template_areas": null,
597
+ "grid_template_columns": null,
598
+ "grid_template_rows": null,
599
+ "height": null,
600
+ "justify_content": null,
601
+ "justify_items": null,
602
+ "left": null,
603
+ "margin": null,
604
+ "max_height": null,
605
+ "max_width": null,
606
+ "min_height": null,
607
+ "min_width": null,
608
+ "object_fit": null,
609
+ "object_position": null,
610
+ "order": null,
611
+ "overflow": null,
612
+ "overflow_x": null,
613
+ "overflow_y": null,
614
+ "padding": null,
615
+ "right": null,
616
+ "top": null,
617
+ "visibility": null,
618
+ "width": null
619
+ }
620
+ },
621
+ "e9a139d721a147d58952f4a1efc0401b": {
622
+ "model_module": "@jupyter-widgets/controls",
623
+ "model_name": "ProgressStyleModel",
624
+ "model_module_version": "1.5.0",
625
+ "state": {
626
+ "_model_module": "@jupyter-widgets/controls",
627
+ "_model_module_version": "1.5.0",
628
+ "_model_name": "ProgressStyleModel",
629
+ "_view_count": null,
630
+ "_view_module": "@jupyter-widgets/base",
631
+ "_view_module_version": "1.2.0",
632
+ "_view_name": "StyleView",
633
+ "bar_color": null,
634
+ "description_width": ""
635
+ }
636
+ },
637
+ "b2c27f1f8a5f437e8f359f1e0ea05fbd": {
638
+ "model_module": "@jupyter-widgets/base",
639
+ "model_name": "LayoutModel",
640
+ "model_module_version": "1.2.0",
641
+ "state": {
642
+ "_model_module": "@jupyter-widgets/base",
643
+ "_model_module_version": "1.2.0",
644
+ "_model_name": "LayoutModel",
645
+ "_view_count": null,
646
+ "_view_module": "@jupyter-widgets/base",
647
+ "_view_module_version": "1.2.0",
648
+ "_view_name": "LayoutView",
649
+ "align_content": null,
650
+ "align_items": null,
651
+ "align_self": null,
652
+ "border": null,
653
+ "bottom": null,
654
+ "display": null,
655
+ "flex": null,
656
+ "flex_flow": null,
657
+ "grid_area": null,
658
+ "grid_auto_columns": null,
659
+ "grid_auto_flow": null,
660
+ "grid_auto_rows": null,
661
+ "grid_column": null,
662
+ "grid_gap": null,
663
+ "grid_row": null,
664
+ "grid_template_areas": null,
665
+ "grid_template_columns": null,
666
+ "grid_template_rows": null,
667
+ "height": null,
668
+ "justify_content": null,
669
+ "justify_items": null,
670
+ "left": null,
671
+ "margin": null,
672
+ "max_height": null,
673
+ "max_width": null,
674
+ "min_height": null,
675
+ "min_width": null,
676
+ "object_fit": null,
677
+ "object_position": null,
678
+ "order": null,
679
+ "overflow": null,
680
+ "overflow_x": null,
681
+ "overflow_y": null,
682
+ "padding": null,
683
+ "right": null,
684
+ "top": null,
685
+ "visibility": null,
686
+ "width": null
687
+ }
688
+ },
689
+ "1f697938d7e845bb9adcbb3015606f70": {
690
+ "model_module": "@jupyter-widgets/controls",
691
+ "model_name": "DescriptionStyleModel",
692
+ "model_module_version": "1.5.0",
693
+ "state": {
694
+ "_model_module": "@jupyter-widgets/controls",
695
+ "_model_module_version": "1.5.0",
696
+ "_model_name": "DescriptionStyleModel",
697
+ "_view_count": null,
698
+ "_view_module": "@jupyter-widgets/base",
699
+ "_view_module_version": "1.2.0",
700
+ "_view_name": "StyleView",
701
+ "description_width": ""
702
+ }
703
+ },
704
+ "7909e231d23b447c9a0975ed72e00f60": {
705
+ "model_module": "@jupyter-widgets/controls",
706
+ "model_name": "HBoxModel",
707
+ "model_module_version": "1.5.0",
708
+ "state": {
709
+ "_dom_classes": [],
710
+ "_model_module": "@jupyter-widgets/controls",
711
+ "_model_module_version": "1.5.0",
712
+ "_model_name": "HBoxModel",
713
+ "_view_count": null,
714
+ "_view_module": "@jupyter-widgets/controls",
715
+ "_view_module_version": "1.5.0",
716
+ "_view_name": "HBoxView",
717
+ "box_style": "",
718
+ "children": [
719
+ "IPY_MODEL_654bf91d7d694141aa9d3382e72439cb",
720
+ "IPY_MODEL_ddbba23be5cd42d8a12669ae9ae7f0fc",
721
+ "IPY_MODEL_94201956b5774585b99637bf2157f1b0"
722
+ ],
723
+ "layout": "IPY_MODEL_0d3995008ff149828c81d2b91f18be0b"
724
+ }
725
+ },
726
+ "654bf91d7d694141aa9d3382e72439cb": {
727
+ "model_module": "@jupyter-widgets/controls",
728
+ "model_name": "HTMLModel",
729
+ "model_module_version": "1.5.0",
730
+ "state": {
731
+ "_dom_classes": [],
732
+ "_model_module": "@jupyter-widgets/controls",
733
+ "_model_module_version": "1.5.0",
734
+ "_model_name": "HTMLModel",
735
+ "_view_count": null,
736
+ "_view_module": "@jupyter-widgets/controls",
737
+ "_view_module_version": "1.5.0",
738
+ "_view_name": "HTMLView",
739
+ "description": "",
740
+ "description_tooltip": null,
741
+ "layout": "IPY_MODEL_a3f9669d01d241e99b7a7bdd98792353",
742
+ "placeholder": "​",
743
+ "style": "IPY_MODEL_e361562b232c49e6b4771a10106d0d4d",
744
+ "value": "Epoch 2: 100%"
745
+ }
746
+ },
747
+ "ddbba23be5cd42d8a12669ae9ae7f0fc": {
748
+ "model_module": "@jupyter-widgets/controls",
749
+ "model_name": "FloatProgressModel",
750
+ "model_module_version": "1.5.0",
751
+ "state": {
752
+ "_dom_classes": [],
753
+ "_model_module": "@jupyter-widgets/controls",
754
+ "_model_module_version": "1.5.0",
755
+ "_model_name": "FloatProgressModel",
756
+ "_view_count": null,
757
+ "_view_module": "@jupyter-widgets/controls",
758
+ "_view_module_version": "1.5.0",
759
+ "_view_name": "ProgressView",
760
+ "bar_style": "success",
761
+ "description": "",
762
+ "description_tooltip": null,
763
+ "layout": "IPY_MODEL_b0b5f7a2436d4d51b0760b1162f219db",
764
+ "max": 935,
765
+ "min": 0,
766
+ "orientation": "horizontal",
767
+ "style": "IPY_MODEL_4ceb35fd88a242218cb4992f631fe68c",
768
+ "value": 935
769
+ }
770
+ },
771
+ "94201956b5774585b99637bf2157f1b0": {
772
+ "model_module": "@jupyter-widgets/controls",
773
+ "model_name": "HTMLModel",
774
+ "model_module_version": "1.5.0",
775
+ "state": {
776
+ "_dom_classes": [],
777
+ "_model_module": "@jupyter-widgets/controls",
778
+ "_model_module_version": "1.5.0",
779
+ "_model_name": "HTMLModel",
780
+ "_view_count": null,
781
+ "_view_module": "@jupyter-widgets/controls",
782
+ "_view_module_version": "1.5.0",
783
+ "_view_name": "HTMLView",
784
+ "description": "",
785
+ "description_tooltip": null,
786
+ "layout": "IPY_MODEL_faca4620999e4480823607b9a673e570",
787
+ "placeholder": "​",
788
+ "style": "IPY_MODEL_b2548dde84694f589b108a28ca34c4f2",
789
+ "value": " 935/935 [05:08<00:00,  2.73it/s, acc=28.8%]"
790
+ }
791
+ },
792
+ "0d3995008ff149828c81d2b91f18be0b": {
793
+ "model_module": "@jupyter-widgets/base",
794
+ "model_name": "LayoutModel",
795
+ "model_module_version": "1.2.0",
796
+ "state": {
797
+ "_model_module": "@jupyter-widgets/base",
798
+ "_model_module_version": "1.2.0",
799
+ "_model_name": "LayoutModel",
800
+ "_view_count": null,
801
+ "_view_module": "@jupyter-widgets/base",
802
+ "_view_module_version": "1.2.0",
803
+ "_view_name": "LayoutView",
804
+ "align_content": null,
805
+ "align_items": null,
806
+ "align_self": null,
807
+ "border": null,
808
+ "bottom": null,
809
+ "display": null,
810
+ "flex": null,
811
+ "flex_flow": null,
812
+ "grid_area": null,
813
+ "grid_auto_columns": null,
814
+ "grid_auto_flow": null,
815
+ "grid_auto_rows": null,
816
+ "grid_column": null,
817
+ "grid_gap": null,
818
+ "grid_row": null,
819
+ "grid_template_areas": null,
820
+ "grid_template_columns": null,
821
+ "grid_template_rows": null,
822
+ "height": null,
823
+ "justify_content": null,
824
+ "justify_items": null,
825
+ "left": null,
826
+ "margin": null,
827
+ "max_height": null,
828
+ "max_width": null,
829
+ "min_height": null,
830
+ "min_width": null,
831
+ "object_fit": null,
832
+ "object_position": null,
833
+ "order": null,
834
+ "overflow": null,
835
+ "overflow_x": null,
836
+ "overflow_y": null,
837
+ "padding": null,
838
+ "right": null,
839
+ "top": null,
840
+ "visibility": null,
841
+ "width": null
842
+ }
843
+ },
844
+ "a3f9669d01d241e99b7a7bdd98792353": {
845
+ "model_module": "@jupyter-widgets/base",
846
+ "model_name": "LayoutModel",
847
+ "model_module_version": "1.2.0",
848
+ "state": {
849
+ "_model_module": "@jupyter-widgets/base",
850
+ "_model_module_version": "1.2.0",
851
+ "_model_name": "LayoutModel",
852
+ "_view_count": null,
853
+ "_view_module": "@jupyter-widgets/base",
854
+ "_view_module_version": "1.2.0",
855
+ "_view_name": "LayoutView",
856
+ "align_content": null,
857
+ "align_items": null,
858
+ "align_self": null,
859
+ "border": null,
860
+ "bottom": null,
861
+ "display": null,
862
+ "flex": null,
863
+ "flex_flow": null,
864
+ "grid_area": null,
865
+ "grid_auto_columns": null,
866
+ "grid_auto_flow": null,
867
+ "grid_auto_rows": null,
868
+ "grid_column": null,
869
+ "grid_gap": null,
870
+ "grid_row": null,
871
+ "grid_template_areas": null,
872
+ "grid_template_columns": null,
873
+ "grid_template_rows": null,
874
+ "height": null,
875
+ "justify_content": null,
876
+ "justify_items": null,
877
+ "left": null,
878
+ "margin": null,
879
+ "max_height": null,
880
+ "max_width": null,
881
+ "min_height": null,
882
+ "min_width": null,
883
+ "object_fit": null,
884
+ "object_position": null,
885
+ "order": null,
886
+ "overflow": null,
887
+ "overflow_x": null,
888
+ "overflow_y": null,
889
+ "padding": null,
890
+ "right": null,
891
+ "top": null,
892
+ "visibility": null,
893
+ "width": null
894
+ }
895
+ },
896
+ "e361562b232c49e6b4771a10106d0d4d": {
897
+ "model_module": "@jupyter-widgets/controls",
898
+ "model_name": "DescriptionStyleModel",
899
+ "model_module_version": "1.5.0",
900
+ "state": {
901
+ "_model_module": "@jupyter-widgets/controls",
902
+ "_model_module_version": "1.5.0",
903
+ "_model_name": "DescriptionStyleModel",
904
+ "_view_count": null,
905
+ "_view_module": "@jupyter-widgets/base",
906
+ "_view_module_version": "1.2.0",
907
+ "_view_name": "StyleView",
908
+ "description_width": ""
909
+ }
910
+ },
911
+ "b0b5f7a2436d4d51b0760b1162f219db": {
912
+ "model_module": "@jupyter-widgets/base",
913
+ "model_name": "LayoutModel",
914
+ "model_module_version": "1.2.0",
915
+ "state": {
916
+ "_model_module": "@jupyter-widgets/base",
917
+ "_model_module_version": "1.2.0",
918
+ "_model_name": "LayoutModel",
919
+ "_view_count": null,
920
+ "_view_module": "@jupyter-widgets/base",
921
+ "_view_module_version": "1.2.0",
922
+ "_view_name": "LayoutView",
923
+ "align_content": null,
924
+ "align_items": null,
925
+ "align_self": null,
926
+ "border": null,
927
+ "bottom": null,
928
+ "display": null,
929
+ "flex": null,
930
+ "flex_flow": null,
931
+ "grid_area": null,
932
+ "grid_auto_columns": null,
933
+ "grid_auto_flow": null,
934
+ "grid_auto_rows": null,
935
+ "grid_column": null,
936
+ "grid_gap": null,
937
+ "grid_row": null,
938
+ "grid_template_areas": null,
939
+ "grid_template_columns": null,
940
+ "grid_template_rows": null,
941
+ "height": null,
942
+ "justify_content": null,
943
+ "justify_items": null,
944
+ "left": null,
945
+ "margin": null,
946
+ "max_height": null,
947
+ "max_width": null,
948
+ "min_height": null,
949
+ "min_width": null,
950
+ "object_fit": null,
951
+ "object_position": null,
952
+ "order": null,
953
+ "overflow": null,
954
+ "overflow_x": null,
955
+ "overflow_y": null,
956
+ "padding": null,
957
+ "right": null,
958
+ "top": null,
959
+ "visibility": null,
960
+ "width": null
961
+ }
962
+ },
963
+ "4ceb35fd88a242218cb4992f631fe68c": {
964
+ "model_module": "@jupyter-widgets/controls",
965
+ "model_name": "ProgressStyleModel",
966
+ "model_module_version": "1.5.0",
967
+ "state": {
968
+ "_model_module": "@jupyter-widgets/controls",
969
+ "_model_module_version": "1.5.0",
970
+ "_model_name": "ProgressStyleModel",
971
+ "_view_count": null,
972
+ "_view_module": "@jupyter-widgets/base",
973
+ "_view_module_version": "1.2.0",
974
+ "_view_name": "StyleView",
975
+ "bar_color": null,
976
+ "description_width": ""
977
+ }
978
+ },
979
+ "faca4620999e4480823607b9a673e570": {
980
+ "model_module": "@jupyter-widgets/base",
981
+ "model_name": "LayoutModel",
982
+ "model_module_version": "1.2.0",
983
+ "state": {
984
+ "_model_module": "@jupyter-widgets/base",
985
+ "_model_module_version": "1.2.0",
986
+ "_model_name": "LayoutModel",
987
+ "_view_count": null,
988
+ "_view_module": "@jupyter-widgets/base",
989
+ "_view_module_version": "1.2.0",
990
+ "_view_name": "LayoutView",
991
+ "align_content": null,
992
+ "align_items": null,
993
+ "align_self": null,
994
+ "border": null,
995
+ "bottom": null,
996
+ "display": null,
997
+ "flex": null,
998
+ "flex_flow": null,
999
+ "grid_area": null,
1000
+ "grid_auto_columns": null,
1001
+ "grid_auto_flow": null,
1002
+ "grid_auto_rows": null,
1003
+ "grid_column": null,
1004
+ "grid_gap": null,
1005
+ "grid_row": null,
1006
+ "grid_template_areas": null,
1007
+ "grid_template_columns": null,
1008
+ "grid_template_rows": null,
1009
+ "height": null,
1010
+ "justify_content": null,
1011
+ "justify_items": null,
1012
+ "left": null,
1013
+ "margin": null,
1014
+ "max_height": null,
1015
+ "max_width": null,
1016
+ "min_height": null,
1017
+ "min_width": null,
1018
+ "object_fit": null,
1019
+ "object_position": null,
1020
+ "order": null,
1021
+ "overflow": null,
1022
+ "overflow_x": null,
1023
+ "overflow_y": null,
1024
+ "padding": null,
1025
+ "right": null,
1026
+ "top": null,
1027
+ "visibility": null,
1028
+ "width": null
1029
+ }
1030
+ },
1031
+ "b2548dde84694f589b108a28ca34c4f2": {
1032
+ "model_module": "@jupyter-widgets/controls",
1033
+ "model_name": "DescriptionStyleModel",
1034
+ "model_module_version": "1.5.0",
1035
+ "state": {
1036
+ "_model_module": "@jupyter-widgets/controls",
1037
+ "_model_module_version": "1.5.0",
1038
+ "_model_name": "DescriptionStyleModel",
1039
+ "_view_count": null,
1040
+ "_view_module": "@jupyter-widgets/base",
1041
+ "_view_module_version": "1.2.0",
1042
+ "_view_name": "StyleView",
1043
+ "description_width": ""
1044
+ }
1045
+ },
1046
+ "990d32fd8ad943898db8f524175171ba": {
1047
+ "model_module": "@jupyter-widgets/controls",
1048
+ "model_name": "HBoxModel",
1049
+ "model_module_version": "1.5.0",
1050
+ "state": {
1051
+ "_dom_classes": [],
1052
+ "_model_module": "@jupyter-widgets/controls",
1053
+ "_model_module_version": "1.5.0",
1054
+ "_model_name": "HBoxModel",
1055
+ "_view_count": null,
1056
+ "_view_module": "@jupyter-widgets/controls",
1057
+ "_view_module_version": "1.5.0",
1058
+ "_view_name": "HBoxView",
1059
+ "box_style": "",
1060
+ "children": [
1061
+ "IPY_MODEL_56543f9f03674385a106f46b3736a270",
1062
+ "IPY_MODEL_0856e831cbb7419fbbe66e7e7fb99906",
1063
+ "IPY_MODEL_f4f091fd560242a28cf09b6c33d7c9a3"
1064
+ ],
1065
+ "layout": "IPY_MODEL_7421731362ea4c25abceefb9cf2c5d1b"
1066
+ }
1067
+ },
1068
+ "56543f9f03674385a106f46b3736a270": {
1069
+ "model_module": "@jupyter-widgets/controls",
1070
+ "model_name": "HTMLModel",
1071
+ "model_module_version": "1.5.0",
1072
+ "state": {
1073
+ "_dom_classes": [],
1074
+ "_model_module": "@jupyter-widgets/controls",
1075
+ "_model_module_version": "1.5.0",
1076
+ "_model_name": "HTMLModel",
1077
+ "_view_count": null,
1078
+ "_view_module": "@jupyter-widgets/controls",
1079
+ "_view_module_version": "1.5.0",
1080
+ "_view_name": "HTMLView",
1081
+ "description": "",
1082
+ "description_tooltip": null,
1083
+ "layout": "IPY_MODEL_26e92b3e139a464a848b68a55a8095a8",
1084
+ "placeholder": "​",
1085
+ "style": "IPY_MODEL_75cf349e98bd4a6bb45a66a112acfa30",
1086
+ "value": "Validating: 100%"
1087
+ }
1088
+ },
1089
+ "0856e831cbb7419fbbe66e7e7fb99906": {
1090
+ "model_module": "@jupyter-widgets/controls",
1091
+ "model_name": "FloatProgressModel",
1092
+ "model_module_version": "1.5.0",
1093
+ "state": {
1094
+ "_dom_classes": [],
1095
+ "_model_module": "@jupyter-widgets/controls",
1096
+ "_model_module_version": "1.5.0",
1097
+ "_model_name": "FloatProgressModel",
1098
+ "_view_count": null,
1099
+ "_view_module": "@jupyter-widgets/controls",
1100
+ "_view_module_version": "1.5.0",
1101
+ "_view_name": "ProgressView",
1102
+ "bar_style": "success",
1103
+ "description": "",
1104
+ "description_tooltip": null,
1105
+ "layout": "IPY_MODEL_abbb218edf3a4f54bcd72acecfdd17b3",
1106
+ "max": 165,
1107
+ "min": 0,
1108
+ "orientation": "horizontal",
1109
+ "style": "IPY_MODEL_72bc8f82e228452fb5faa36ff84d03a4",
1110
+ "value": 165
1111
+ }
1112
+ },
1113
+ "f4f091fd560242a28cf09b6c33d7c9a3": {
1114
+ "model_module": "@jupyter-widgets/controls",
1115
+ "model_name": "HTMLModel",
1116
+ "model_module_version": "1.5.0",
1117
+ "state": {
1118
+ "_dom_classes": [],
1119
+ "_model_module": "@jupyter-widgets/controls",
1120
+ "_model_module_version": "1.5.0",
1121
+ "_model_name": "HTMLModel",
1122
+ "_view_count": null,
1123
+ "_view_module": "@jupyter-widgets/controls",
1124
+ "_view_module_version": "1.5.0",
1125
+ "_view_name": "HTMLView",
1126
+ "description": "",
1127
+ "description_tooltip": null,
1128
+ "layout": "IPY_MODEL_bbee56a1661c4347bc538ad204e4efe2",
1129
+ "placeholder": "​",
1130
+ "style": "IPY_MODEL_05669dee14ff4356a8185285ee9c2951",
1131
+ "value": " 165/165 [00:52<00:00,  3.18it/s]"
1132
+ }
1133
+ },
1134
+ "7421731362ea4c25abceefb9cf2c5d1b": {
1135
+ "model_module": "@jupyter-widgets/base",
1136
+ "model_name": "LayoutModel",
1137
+ "model_module_version": "1.2.0",
1138
+ "state": {
1139
+ "_model_module": "@jupyter-widgets/base",
1140
+ "_model_module_version": "1.2.0",
1141
+ "_model_name": "LayoutModel",
1142
+ "_view_count": null,
1143
+ "_view_module": "@jupyter-widgets/base",
1144
+ "_view_module_version": "1.2.0",
1145
+ "_view_name": "LayoutView",
1146
+ "align_content": null,
1147
+ "align_items": null,
1148
+ "align_self": null,
1149
+ "border": null,
1150
+ "bottom": null,
1151
+ "display": null,
1152
+ "flex": null,
1153
+ "flex_flow": null,
1154
+ "grid_area": null,
1155
+ "grid_auto_columns": null,
1156
+ "grid_auto_flow": null,
1157
+ "grid_auto_rows": null,
1158
+ "grid_column": null,
1159
+ "grid_gap": null,
1160
+ "grid_row": null,
1161
+ "grid_template_areas": null,
1162
+ "grid_template_columns": null,
1163
+ "grid_template_rows": null,
1164
+ "height": null,
1165
+ "justify_content": null,
1166
+ "justify_items": null,
1167
+ "left": null,
1168
+ "margin": null,
1169
+ "max_height": null,
1170
+ "max_width": null,
1171
+ "min_height": null,
1172
+ "min_width": null,
1173
+ "object_fit": null,
1174
+ "object_position": null,
1175
+ "order": null,
1176
+ "overflow": null,
1177
+ "overflow_x": null,
1178
+ "overflow_y": null,
1179
+ "padding": null,
1180
+ "right": null,
1181
+ "top": null,
1182
+ "visibility": null,
1183
+ "width": null
1184
+ }
1185
+ },
1186
+ "26e92b3e139a464a848b68a55a8095a8": {
1187
+ "model_module": "@jupyter-widgets/base",
1188
+ "model_name": "LayoutModel",
1189
+ "model_module_version": "1.2.0",
1190
+ "state": {
1191
+ "_model_module": "@jupyter-widgets/base",
1192
+ "_model_module_version": "1.2.0",
1193
+ "_model_name": "LayoutModel",
1194
+ "_view_count": null,
1195
+ "_view_module": "@jupyter-widgets/base",
1196
+ "_view_module_version": "1.2.0",
1197
+ "_view_name": "LayoutView",
1198
+ "align_content": null,
1199
+ "align_items": null,
1200
+ "align_self": null,
1201
+ "border": null,
1202
+ "bottom": null,
1203
+ "display": null,
1204
+ "flex": null,
1205
+ "flex_flow": null,
1206
+ "grid_area": null,
1207
+ "grid_auto_columns": null,
1208
+ "grid_auto_flow": null,
1209
+ "grid_auto_rows": null,
1210
+ "grid_column": null,
1211
+ "grid_gap": null,
1212
+ "grid_row": null,
1213
+ "grid_template_areas": null,
1214
+ "grid_template_columns": null,
1215
+ "grid_template_rows": null,
1216
+ "height": null,
1217
+ "justify_content": null,
1218
+ "justify_items": null,
1219
+ "left": null,
1220
+ "margin": null,
1221
+ "max_height": null,
1222
+ "max_width": null,
1223
+ "min_height": null,
1224
+ "min_width": null,
1225
+ "object_fit": null,
1226
+ "object_position": null,
1227
+ "order": null,
1228
+ "overflow": null,
1229
+ "overflow_x": null,
1230
+ "overflow_y": null,
1231
+ "padding": null,
1232
+ "right": null,
1233
+ "top": null,
1234
+ "visibility": null,
1235
+ "width": null
1236
+ }
1237
+ },
1238
+ "75cf349e98bd4a6bb45a66a112acfa30": {
1239
+ "model_module": "@jupyter-widgets/controls",
1240
+ "model_name": "DescriptionStyleModel",
1241
+ "model_module_version": "1.5.0",
1242
+ "state": {
1243
+ "_model_module": "@jupyter-widgets/controls",
1244
+ "_model_module_version": "1.5.0",
1245
+ "_model_name": "DescriptionStyleModel",
1246
+ "_view_count": null,
1247
+ "_view_module": "@jupyter-widgets/base",
1248
+ "_view_module_version": "1.2.0",
1249
+ "_view_name": "StyleView",
1250
+ "description_width": ""
1251
+ }
1252
+ },
1253
+ "abbb218edf3a4f54bcd72acecfdd17b3": {
1254
+ "model_module": "@jupyter-widgets/base",
1255
+ "model_name": "LayoutModel",
1256
+ "model_module_version": "1.2.0",
1257
+ "state": {
1258
+ "_model_module": "@jupyter-widgets/base",
1259
+ "_model_module_version": "1.2.0",
1260
+ "_model_name": "LayoutModel",
1261
+ "_view_count": null,
1262
+ "_view_module": "@jupyter-widgets/base",
1263
+ "_view_module_version": "1.2.0",
1264
+ "_view_name": "LayoutView",
1265
+ "align_content": null,
1266
+ "align_items": null,
1267
+ "align_self": null,
1268
+ "border": null,
1269
+ "bottom": null,
1270
+ "display": null,
1271
+ "flex": null,
1272
+ "flex_flow": null,
1273
+ "grid_area": null,
1274
+ "grid_auto_columns": null,
1275
+ "grid_auto_flow": null,
1276
+ "grid_auto_rows": null,
1277
+ "grid_column": null,
1278
+ "grid_gap": null,
1279
+ "grid_row": null,
1280
+ "grid_template_areas": null,
1281
+ "grid_template_columns": null,
1282
+ "grid_template_rows": null,
1283
+ "height": null,
1284
+ "justify_content": null,
1285
+ "justify_items": null,
1286
+ "left": null,
1287
+ "margin": null,
1288
+ "max_height": null,
1289
+ "max_width": null,
1290
+ "min_height": null,
1291
+ "min_width": null,
1292
+ "object_fit": null,
1293
+ "object_position": null,
1294
+ "order": null,
1295
+ "overflow": null,
1296
+ "overflow_x": null,
1297
+ "overflow_y": null,
1298
+ "padding": null,
1299
+ "right": null,
1300
+ "top": null,
1301
+ "visibility": null,
1302
+ "width": null
1303
+ }
1304
+ },
1305
+ "72bc8f82e228452fb5faa36ff84d03a4": {
1306
+ "model_module": "@jupyter-widgets/controls",
1307
+ "model_name": "ProgressStyleModel",
1308
+ "model_module_version": "1.5.0",
1309
+ "state": {
1310
+ "_model_module": "@jupyter-widgets/controls",
1311
+ "_model_module_version": "1.5.0",
1312
+ "_model_name": "ProgressStyleModel",
1313
+ "_view_count": null,
1314
+ "_view_module": "@jupyter-widgets/base",
1315
+ "_view_module_version": "1.2.0",
1316
+ "_view_name": "StyleView",
1317
+ "bar_color": null,
1318
+ "description_width": ""
1319
+ }
1320
+ },
1321
+ "bbee56a1661c4347bc538ad204e4efe2": {
1322
+ "model_module": "@jupyter-widgets/base",
1323
+ "model_name": "LayoutModel",
1324
+ "model_module_version": "1.2.0",
1325
+ "state": {
1326
+ "_model_module": "@jupyter-widgets/base",
1327
+ "_model_module_version": "1.2.0",
1328
+ "_model_name": "LayoutModel",
1329
+ "_view_count": null,
1330
+ "_view_module": "@jupyter-widgets/base",
1331
+ "_view_module_version": "1.2.0",
1332
+ "_view_name": "LayoutView",
1333
+ "align_content": null,
1334
+ "align_items": null,
1335
+ "align_self": null,
1336
+ "border": null,
1337
+ "bottom": null,
1338
+ "display": null,
1339
+ "flex": null,
1340
+ "flex_flow": null,
1341
+ "grid_area": null,
1342
+ "grid_auto_columns": null,
1343
+ "grid_auto_flow": null,
1344
+ "grid_auto_rows": null,
1345
+ "grid_column": null,
1346
+ "grid_gap": null,
1347
+ "grid_row": null,
1348
+ "grid_template_areas": null,
1349
+ "grid_template_columns": null,
1350
+ "grid_template_rows": null,
1351
+ "height": null,
1352
+ "justify_content": null,
1353
+ "justify_items": null,
1354
+ "left": null,
1355
+ "margin": null,
1356
+ "max_height": null,
1357
+ "max_width": null,
1358
+ "min_height": null,
1359
+ "min_width": null,
1360
+ "object_fit": null,
1361
+ "object_position": null,
1362
+ "order": null,
1363
+ "overflow": null,
1364
+ "overflow_x": null,
1365
+ "overflow_y": null,
1366
+ "padding": null,
1367
+ "right": null,
1368
+ "top": null,
1369
+ "visibility": null,
1370
+ "width": null
1371
+ }
1372
+ },
1373
+ "05669dee14ff4356a8185285ee9c2951": {
1374
+ "model_module": "@jupyter-widgets/controls",
1375
+ "model_name": "DescriptionStyleModel",
1376
+ "model_module_version": "1.5.0",
1377
+ "state": {
1378
+ "_model_module": "@jupyter-widgets/controls",
1379
+ "_model_module_version": "1.5.0",
1380
+ "_model_name": "DescriptionStyleModel",
1381
+ "_view_count": null,
1382
+ "_view_module": "@jupyter-widgets/base",
1383
+ "_view_module_version": "1.2.0",
1384
+ "_view_name": "StyleView",
1385
+ "description_width": ""
1386
+ }
1387
+ },
1388
+ "a1dbd51ede52442e8b6b2cef26e8bbd9": {
1389
+ "model_module": "@jupyter-widgets/controls",
1390
+ "model_name": "HBoxModel",
1391
+ "model_module_version": "1.5.0",
1392
+ "state": {
1393
+ "_dom_classes": [],
1394
+ "_model_module": "@jupyter-widgets/controls",
1395
+ "_model_module_version": "1.5.0",
1396
+ "_model_name": "HBoxModel",
1397
+ "_view_count": null,
1398
+ "_view_module": "@jupyter-widgets/controls",
1399
+ "_view_module_version": "1.5.0",
1400
+ "_view_name": "HBoxView",
1401
+ "box_style": "",
1402
+ "children": [
1403
+ "IPY_MODEL_b62503ae035d447ab609dff304523785",
1404
+ "IPY_MODEL_d98feea8dca847a1a087c19e0ffb40db",
1405
+ "IPY_MODEL_903bdef3de794bd0a0978a9fa847dcc8"
1406
+ ],
1407
+ "layout": "IPY_MODEL_81acd630739a4a99a33bc5c060b56c0b"
1408
+ }
1409
+ },
1410
+ "b62503ae035d447ab609dff304523785": {
1411
+ "model_module": "@jupyter-widgets/controls",
1412
+ "model_name": "HTMLModel",
1413
+ "model_module_version": "1.5.0",
1414
+ "state": {
1415
+ "_dom_classes": [],
1416
+ "_model_module": "@jupyter-widgets/controls",
1417
+ "_model_module_version": "1.5.0",
1418
+ "_model_name": "HTMLModel",
1419
+ "_view_count": null,
1420
+ "_view_module": "@jupyter-widgets/controls",
1421
+ "_view_module_version": "1.5.0",
1422
+ "_view_name": "HTMLView",
1423
+ "description": "",
1424
+ "description_tooltip": null,
1425
+ "layout": "IPY_MODEL_3f1848c901e14b6d86f9e10a52ca607e",
1426
+ "placeholder": "​",
1427
+ "style": "IPY_MODEL_b47f59b606b94a68a3361a185e0040aa",
1428
+ "value": "Epoch 3:  93%"
1429
+ }
1430
+ },
1431
+ "d98feea8dca847a1a087c19e0ffb40db": {
1432
+ "model_module": "@jupyter-widgets/controls",
1433
+ "model_name": "FloatProgressModel",
1434
+ "model_module_version": "1.5.0",
1435
+ "state": {
1436
+ "_dom_classes": [],
1437
+ "_model_module": "@jupyter-widgets/controls",
1438
+ "_model_module_version": "1.5.0",
1439
+ "_model_name": "FloatProgressModel",
1440
+ "_view_count": null,
1441
+ "_view_module": "@jupyter-widgets/controls",
1442
+ "_view_module_version": "1.5.0",
1443
+ "_view_name": "ProgressView",
1444
+ "bar_style": "",
1445
+ "description": "",
1446
+ "description_tooltip": null,
1447
+ "layout": "IPY_MODEL_5f47532448504e4c8398688d85787201",
1448
+ "max": 935,
1449
+ "min": 0,
1450
+ "orientation": "horizontal",
1451
+ "style": "IPY_MODEL_0e8901c365954716932d4a09bb5e615f",
1452
+ "value": 867
1453
+ }
1454
+ },
1455
+ "903bdef3de794bd0a0978a9fa847dcc8": {
1456
+ "model_module": "@jupyter-widgets/controls",
1457
+ "model_name": "HTMLModel",
1458
+ "model_module_version": "1.5.0",
1459
+ "state": {
1460
+ "_dom_classes": [],
1461
+ "_model_module": "@jupyter-widgets/controls",
1462
+ "_model_module_version": "1.5.0",
1463
+ "_model_name": "HTMLModel",
1464
+ "_view_count": null,
1465
+ "_view_module": "@jupyter-widgets/controls",
1466
+ "_view_module_version": "1.5.0",
1467
+ "_view_name": "HTMLView",
1468
+ "description": "",
1469
+ "description_tooltip": null,
1470
+ "layout": "IPY_MODEL_7527f720a46a4ed384f09866b6550e6e",
1471
+ "placeholder": "​",
1472
+ "style": "IPY_MODEL_d5df03c93765465fb8bc49dc19c88786",
1473
+ "value": " 867/935 [04:44<00:23,  2.91it/s, acc=33.5%]"
1474
+ }
1475
+ },
1476
+ "81acd630739a4a99a33bc5c060b56c0b": {
1477
+ "model_module": "@jupyter-widgets/base",
1478
+ "model_name": "LayoutModel",
1479
+ "model_module_version": "1.2.0",
1480
+ "state": {
1481
+ "_model_module": "@jupyter-widgets/base",
1482
+ "_model_module_version": "1.2.0",
1483
+ "_model_name": "LayoutModel",
1484
+ "_view_count": null,
1485
+ "_view_module": "@jupyter-widgets/base",
1486
+ "_view_module_version": "1.2.0",
1487
+ "_view_name": "LayoutView",
1488
+ "align_content": null,
1489
+ "align_items": null,
1490
+ "align_self": null,
1491
+ "border": null,
1492
+ "bottom": null,
1493
+ "display": null,
1494
+ "flex": null,
1495
+ "flex_flow": null,
1496
+ "grid_area": null,
1497
+ "grid_auto_columns": null,
1498
+ "grid_auto_flow": null,
1499
+ "grid_auto_rows": null,
1500
+ "grid_column": null,
1501
+ "grid_gap": null,
1502
+ "grid_row": null,
1503
+ "grid_template_areas": null,
1504
+ "grid_template_columns": null,
1505
+ "grid_template_rows": null,
1506
+ "height": null,
1507
+ "justify_content": null,
1508
+ "justify_items": null,
1509
+ "left": null,
1510
+ "margin": null,
1511
+ "max_height": null,
1512
+ "max_width": null,
1513
+ "min_height": null,
1514
+ "min_width": null,
1515
+ "object_fit": null,
1516
+ "object_position": null,
1517
+ "order": null,
1518
+ "overflow": null,
1519
+ "overflow_x": null,
1520
+ "overflow_y": null,
1521
+ "padding": null,
1522
+ "right": null,
1523
+ "top": null,
1524
+ "visibility": null,
1525
+ "width": null
1526
+ }
1527
+ },
1528
+ "3f1848c901e14b6d86f9e10a52ca607e": {
1529
+ "model_module": "@jupyter-widgets/base",
1530
+ "model_name": "LayoutModel",
1531
+ "model_module_version": "1.2.0",
1532
+ "state": {
1533
+ "_model_module": "@jupyter-widgets/base",
1534
+ "_model_module_version": "1.2.0",
1535
+ "_model_name": "LayoutModel",
1536
+ "_view_count": null,
1537
+ "_view_module": "@jupyter-widgets/base",
1538
+ "_view_module_version": "1.2.0",
1539
+ "_view_name": "LayoutView",
1540
+ "align_content": null,
1541
+ "align_items": null,
1542
+ "align_self": null,
1543
+ "border": null,
1544
+ "bottom": null,
1545
+ "display": null,
1546
+ "flex": null,
1547
+ "flex_flow": null,
1548
+ "grid_area": null,
1549
+ "grid_auto_columns": null,
1550
+ "grid_auto_flow": null,
1551
+ "grid_auto_rows": null,
1552
+ "grid_column": null,
1553
+ "grid_gap": null,
1554
+ "grid_row": null,
1555
+ "grid_template_areas": null,
1556
+ "grid_template_columns": null,
1557
+ "grid_template_rows": null,
1558
+ "height": null,
1559
+ "justify_content": null,
1560
+ "justify_items": null,
1561
+ "left": null,
1562
+ "margin": null,
1563
+ "max_height": null,
1564
+ "max_width": null,
1565
+ "min_height": null,
1566
+ "min_width": null,
1567
+ "object_fit": null,
1568
+ "object_position": null,
1569
+ "order": null,
1570
+ "overflow": null,
1571
+ "overflow_x": null,
1572
+ "overflow_y": null,
1573
+ "padding": null,
1574
+ "right": null,
1575
+ "top": null,
1576
+ "visibility": null,
1577
+ "width": null
1578
+ }
1579
+ },
1580
+ "b47f59b606b94a68a3361a185e0040aa": {
1581
+ "model_module": "@jupyter-widgets/controls",
1582
+ "model_name": "DescriptionStyleModel",
1583
+ "model_module_version": "1.5.0",
1584
+ "state": {
1585
+ "_model_module": "@jupyter-widgets/controls",
1586
+ "_model_module_version": "1.5.0",
1587
+ "_model_name": "DescriptionStyleModel",
1588
+ "_view_count": null,
1589
+ "_view_module": "@jupyter-widgets/base",
1590
+ "_view_module_version": "1.2.0",
1591
+ "_view_name": "StyleView",
1592
+ "description_width": ""
1593
+ }
1594
+ },
1595
+ "5f47532448504e4c8398688d85787201": {
1596
+ "model_module": "@jupyter-widgets/base",
1597
+ "model_name": "LayoutModel",
1598
+ "model_module_version": "1.2.0",
1599
+ "state": {
1600
+ "_model_module": "@jupyter-widgets/base",
1601
+ "_model_module_version": "1.2.0",
1602
+ "_model_name": "LayoutModel",
1603
+ "_view_count": null,
1604
+ "_view_module": "@jupyter-widgets/base",
1605
+ "_view_module_version": "1.2.0",
1606
+ "_view_name": "LayoutView",
1607
+ "align_content": null,
1608
+ "align_items": null,
1609
+ "align_self": null,
1610
+ "border": null,
1611
+ "bottom": null,
1612
+ "display": null,
1613
+ "flex": null,
1614
+ "flex_flow": null,
1615
+ "grid_area": null,
1616
+ "grid_auto_columns": null,
1617
+ "grid_auto_flow": null,
1618
+ "grid_auto_rows": null,
1619
+ "grid_column": null,
1620
+ "grid_gap": null,
1621
+ "grid_row": null,
1622
+ "grid_template_areas": null,
1623
+ "grid_template_columns": null,
1624
+ "grid_template_rows": null,
1625
+ "height": null,
1626
+ "justify_content": null,
1627
+ "justify_items": null,
1628
+ "left": null,
1629
+ "margin": null,
1630
+ "max_height": null,
1631
+ "max_width": null,
1632
+ "min_height": null,
1633
+ "min_width": null,
1634
+ "object_fit": null,
1635
+ "object_position": null,
1636
+ "order": null,
1637
+ "overflow": null,
1638
+ "overflow_x": null,
1639
+ "overflow_y": null,
1640
+ "padding": null,
1641
+ "right": null,
1642
+ "top": null,
1643
+ "visibility": null,
1644
+ "width": null
1645
+ }
1646
+ },
1647
+ "0e8901c365954716932d4a09bb5e615f": {
1648
+ "model_module": "@jupyter-widgets/controls",
1649
+ "model_name": "ProgressStyleModel",
1650
+ "model_module_version": "1.5.0",
1651
+ "state": {
1652
+ "_model_module": "@jupyter-widgets/controls",
1653
+ "_model_module_version": "1.5.0",
1654
+ "_model_name": "ProgressStyleModel",
1655
+ "_view_count": null,
1656
+ "_view_module": "@jupyter-widgets/base",
1657
+ "_view_module_version": "1.2.0",
1658
+ "_view_name": "StyleView",
1659
+ "bar_color": null,
1660
+ "description_width": ""
1661
+ }
1662
+ },
1663
+ "7527f720a46a4ed384f09866b6550e6e": {
1664
+ "model_module": "@jupyter-widgets/base",
1665
+ "model_name": "LayoutModel",
1666
+ "model_module_version": "1.2.0",
1667
+ "state": {
1668
+ "_model_module": "@jupyter-widgets/base",
1669
+ "_model_module_version": "1.2.0",
1670
+ "_model_name": "LayoutModel",
1671
+ "_view_count": null,
1672
+ "_view_module": "@jupyter-widgets/base",
1673
+ "_view_module_version": "1.2.0",
1674
+ "_view_name": "LayoutView",
1675
+ "align_content": null,
1676
+ "align_items": null,
1677
+ "align_self": null,
1678
+ "border": null,
1679
+ "bottom": null,
1680
+ "display": null,
1681
+ "flex": null,
1682
+ "flex_flow": null,
1683
+ "grid_area": null,
1684
+ "grid_auto_columns": null,
1685
+ "grid_auto_flow": null,
1686
+ "grid_auto_rows": null,
1687
+ "grid_column": null,
1688
+ "grid_gap": null,
1689
+ "grid_row": null,
1690
+ "grid_template_areas": null,
1691
+ "grid_template_columns": null,
1692
+ "grid_template_rows": null,
1693
+ "height": null,
1694
+ "justify_content": null,
1695
+ "justify_items": null,
1696
+ "left": null,
1697
+ "margin": null,
1698
+ "max_height": null,
1699
+ "max_width": null,
1700
+ "min_height": null,
1701
+ "min_width": null,
1702
+ "object_fit": null,
1703
+ "object_position": null,
1704
+ "order": null,
1705
+ "overflow": null,
1706
+ "overflow_x": null,
1707
+ "overflow_y": null,
1708
+ "padding": null,
1709
+ "right": null,
1710
+ "top": null,
1711
+ "visibility": null,
1712
+ "width": null
1713
+ }
1714
+ },
1715
+ "d5df03c93765465fb8bc49dc19c88786": {
1716
+ "model_module": "@jupyter-widgets/controls",
1717
+ "model_name": "DescriptionStyleModel",
1718
+ "model_module_version": "1.5.0",
1719
+ "state": {
1720
+ "_model_module": "@jupyter-widgets/controls",
1721
+ "_model_module_version": "1.5.0",
1722
+ "_model_name": "DescriptionStyleModel",
1723
+ "_view_count": null,
1724
+ "_view_module": "@jupyter-widgets/base",
1725
+ "_view_module_version": "1.2.0",
1726
+ "_view_name": "StyleView",
1727
+ "description_width": ""
1728
+ }
1729
+ }
1730
+ }
1731
+ }
1732
+ },
1733
+ "cells": [
1734
+ {
1735
+ "cell_type": "code",
1736
+ "source": [
1737
+ "try:\n",
1738
+ "    !pip uninstall -qy geometricvocab geofractal\n",
1739
+ "except Exception:\n",
1740
+ "    pass\n",
1741
+ "\n",
1742
+ "!pip install -q git+https://github.com/AbstractEyes/geofractal.git"
1743
+ ],
1744
+ "metadata": {
1745
+ "colab": {
1746
+ "base_uri": "https://localhost:8080/"
1747
+ },
1748
+ "id": "LhqMG1Ayd6W6",
1749
+ "outputId": "26f76fc9-8243-4029-db43-aba0a125e027"
1750
+ },
1751
+ "execution_count": 21,
1752
+ "outputs": [
1753
+ {
1754
+ "output_type": "stream",
1755
+ "name": "stdout",
1756
+ "text": [
1757
+ " Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
1758
+ " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
1759
+ " Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
1760
+ " Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
1761
+ " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
1762
+ " Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
1763
+ " Building wheel for geofractal (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
1764
+ " Building wheel for geometricvocab (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n"
1765
+ ]
1766
+ }
1767
+ ]
1768
+ },
1769
+ {
1770
+ "cell_type": "code",
1771
+ "source": [
1772
+ "# Cell: Setup Qwen2.5-Math-1.5B + T5 Hierarchical Collective\n",
1773
+ "import torch\n",
1774
+ "import torch.nn as nn\n",
1775
+ "from transformers import AutoTokenizer, AutoModel, T5Tokenizer, T5EncoderModel\n",
1776
+ "from datasets import load_dataset\n",
1777
+ "from torch.utils.data import DataLoader\n",
1778
+ "from geofractal.router.head import build_standard_head, HeadConfig\n",
1779
+ "from tqdm.auto import tqdm\n",
1780
+ "import re\n",
1781
+ "\n",
1782
+ "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
1783
+ "print(f\"Device: {device}\")\n",
1784
+ "\n",
1785
+ "# ============================================================================\n",
1786
+ "# FROZEN BACKBONES\n",
1787
+ "# ============================================================================\n",
1788
+ "\n",
1789
+ "# Qwen2.5-Math-1.5B - actual math reasoning model\n",
1790
+ "qwen_tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-Math-1.5B\", trust_remote_code=True)\n",
1791
+ "qwen = AutoModel.from_pretrained(\"Qwen/Qwen2.5-Math-1.5B\", trust_remote_code=True).to(device)\n",
1792
+ "qwen.eval()\n",
1793
+ "for p in qwen.parameters():\n",
1794
+ "    p.requires_grad = False\n",
1795
+ "\n",
1796
+ "# T5-base - general language\n",
1797
+ "t5_tokenizer = T5Tokenizer.from_pretrained(\"t5-base\")\n",
1798
+ "t5 = T5EncoderModel.from_pretrained(\"t5-base\").to(device)\n",
1799
+ "t5.eval()\n",
1800
+ "for p in t5.parameters():\n",
1801
+ "    p.requires_grad = False\n",
1802
+ "\n",
1803
+ "print(f\"Qwen2.5-Math hidden: {qwen.config.hidden_size}\")\n",
1804
+ "print(f\"T5 hidden: {t5.config.d_model}\")\n",
1805
+ "print(f\"Qwen params: {sum(p.numel() for p in qwen.parameters()):,} (frozen)\")\n",
1806
+ "print(f\"T5 params: {sum(p.numel() for p in t5.parameters()):,} (frozen)\")"
1807
+ ],
1808
+ "metadata": {
1809
+ "colab": {
1810
+ "base_uri": "https://localhost:8080/"
1811
+ },
1812
+ "id": "_zXY4Iktog6n",
1813
+ "outputId": "8d2e1538-f154-4699-851e-f2c7ced877f1"
1814
+ },
1815
+ "execution_count": 1,
1816
+ "outputs": [
1817
+ {
1818
+ "output_type": "stream",
1819
+ "name": "stdout",
1820
+ "text": [
1821
+ "Device: cuda\n"
1822
+ ]
1823
+ },
1824
+ {
1825
+ "output_type": "stream",
1826
+ "name": "stderr",
1827
+ "text": [
1828
+ "You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565\n"
1829
+ ]
1830
+ },
1831
+ {
1832
+ "output_type": "stream",
1833
+ "name": "stdout",
1834
+ "text": [
1835
+ "Qwen2.5-Math hidden: 1536\n",
1836
+ "T5 hidden: 768\n",
1837
+ "Qwen params: 1,543,714,304 (frozen)\n",
1838
+ "T5 params: 109,628,544 (frozen)\n"
1839
+ ]
1840
+ }
1841
+ ]
1842
+ },
1843
+ {
1844
+ "cell_type": "code",
1845
+ "source": [
1846
+ "# Cell: Hierarchical routing structure\n",
1847
+ "#\n",
1848
+ "# Level 1: Base streams (T5 linguistic + Deterministic)\n",
1849
+ "# Level 2: Qwen-Math as supervisor/verifier\n",
1850
+ "#\n",
1851
+ "# The hierarchy: Qwen sees the fused lower-level representation\n",
1852
+ "# and can override/refine it\n",
1853
+ "\n",
1854
+ "ROUTE_DIM = 512\n",
1855
+ "QWEN_DIM = qwen.config.hidden_size # 1536 for 1.5B\n",
1856
+ "\n",
1857
+ "# === LEVEL 1: Base Streams ===\n",
1858
+ "\n",
1859
+ "# Projection: T5 encoder hidden size (768 for t5-base) -> ROUTE_DIM\n",
1859
+ "proj_t5 = nn.Linear(t5.config.d_model, ROUTE_DIM).to(device)\n",
1861
+ "\n",
1862
+ "# Deterministic streams (simplified - keep best performers)\n",
1863
+ "class SymbolicCalculatorStream(nn.Module):\n",
1864
+ "    def __init__(self, output_dim):\n",
1864
+ "        super().__init__()\n",
1865
+ "        self.num_features = 12\n",
1866
+ "        self.projection = nn.Linear(self.num_features, output_dim)\n",
1867
+ "\n",
1868
+ "    def extract_numbers(self, text):\n",
1869
+ "        pattern = r'-?\\d+\\.?\\d*'\n",
1870
+ "        matches = re.findall(pattern, text)\n",
1871
+ "        return [float(m) for m in matches if m not in ['-', '.']][:10]\n",
1872
+ "\n",
1873
+ "    def compute_features(self, numbers):\n",
1874
+ "        if len(numbers) == 0:\n",
1875
+ "            return torch.zeros(self.num_features)\n",
1876
+ "        t = torch.tensor(numbers, dtype=torch.float32)\n",
1877
+ "        return torch.tensor([\n",
1878
+ "            len(numbers), t.sum().item(),\n",
1879
+ "            t.prod().item() if len(t) < 10 else 0,\n",
1880
+ "            t.mean().item(), t.std().item() if len(t) > 1 else 0,\n",
1881
+ "            t.min().item(), t.max().item(), (t.max() - t.min()).item(),\n",
1882
+ "            (t > 0).sum().item(), (t < 0).sum().item(),\n",
1883
+ "            (t == 0).sum().item(), t.abs().mean().item(),\n",
1884
+ "        ], dtype=torch.float32)\n",
1885
+ "\n",
1886
+ "    def forward(self, texts, seq_len):\n",
1887
+ "        features = torch.stack([self.compute_features(self.extract_numbers(t)) for t in texts])\n",
1888
+ "        features = features.to(next(self.parameters()).device)\n",
1889
+ "        return self.projection(features.unsqueeze(1).expand(-1, seq_len, -1))\n",
1891
+ "\n",
1892
+ "symbolic_stream = SymbolicCalculatorStream(output_dim=ROUTE_DIM).to(device)\n",
1893
+ "\n",
1894
+ "# Level 1 heads\n",
1895
+ "head_config = HeadConfig(feature_dim=ROUTE_DIM, fingerprint_dim=64, num_anchors=16, num_routes=4)\n",
1896
+ "head_t5 = build_standard_head(head_config).to(device)\n",
1897
+ "head_symbolic = build_standard_head(head_config).to(device)\n",
1898
+ "\n",
1899
+ "# Level 1 fusion\n",
1900
+ "fusion_l1 = nn.Sequential(\n",
1901
+ " nn.Linear(ROUTE_DIM * 2, ROUTE_DIM),\n",
1902
+ " nn.LayerNorm(ROUTE_DIM),\n",
1903
+ " nn.GELU(),\n",
1904
+ ").to(device)\n",
1905
+ "\n",
1906
+ "# === LEVEL 2: Qwen-Math Supervisor ===\n",
1907
+ "\n",
1908
+ "proj_qwen = nn.Linear(QWEN_DIM, ROUTE_DIM).to(device)\n",
1909
+ "head_qwen = build_standard_head(head_config).to(device)\n",
1910
+ "\n",
1911
+ "# Level 2 receives: Qwen features + Level 1 fused features\n",
1912
+ "fusion_l2 = nn.Sequential(\n",
1913
+ " nn.Linear(ROUTE_DIM * 2, ROUTE_DIM),\n",
1914
+ " nn.LayerNorm(ROUTE_DIM),\n",
1915
+ " nn.GELU(),\n",
1916
+ " nn.Dropout(0.1),\n",
1917
+ ").to(device)\n",
1918
+ "\n",
1919
+ "# === OUTPUT ===\n",
1920
+ "NUM_BUCKETS = 10 # Reduced from 20 for easier learning\n",
1921
+ "\n",
1922
+ "classifier = nn.Sequential(\n",
1923
+ " nn.Linear(ROUTE_DIM, 256),\n",
1924
+ " nn.GELU(),\n",
1925
+ " nn.Dropout(0.1),\n",
1926
+ " nn.Linear(256, NUM_BUCKETS),\n",
1927
+ ").to(device)\n",
1928
+ "\n",
1929
+ "# Individual classifiers for emergence tracking\n",
1930
+ "classifier_qwen = nn.Linear(ROUTE_DIM, NUM_BUCKETS).to(device)\n",
1931
+ "classifier_t5 = nn.Linear(ROUTE_DIM, NUM_BUCKETS).to(device)\n",
1932
+ "classifier_symbolic = nn.Linear(ROUTE_DIM, NUM_BUCKETS).to(device)\n",
1933
+ "classifier_l1 = nn.Linear(ROUTE_DIM, NUM_BUCKETS).to(device) # Level 1 combined\n",
1934
+ "\n",
1935
+ "print(\"✓ Hierarchical architecture defined\")\n",
1936
+ "print(f\" Level 1: T5 + Symbolic → fusion_l1\")\n",
1937
+ "print(f\" Level 2: Qwen-Math + L1 → fusion_l2 → classifier\")"
1938
+ ],
1939
+ "metadata": {
1940
+ "colab": {
1941
+ "base_uri": "https://localhost:8080/"
1942
+ },
1943
+ "id": "evj4Dmk8onVR",
1944
+ "outputId": "ca07d4fc-5493-440a-b455-19cfecfffe81"
1945
+ },
1946
+ "execution_count": 2,
1947
+ "outputs": [
1948
+ {
1949
+ "output_type": "stream",
1950
+ "name": "stdout",
1951
+ "text": [
1952
+ "✓ Hierarchical architecture defined\n",
1953
+ " Level 1: T5 + Symbolic → fusion_l1\n",
1954
+ " Level 2: Qwen-Math + L1 → fusion_l2 → classifier\n"
1955
+ ]
1956
+ }
1957
+ ]
1958
+ },
1959
+ {
1960
+ "cell_type": "code",
1961
+ "source": [
1962
+ "# Cell: Collect trainable params\n",
+ "trainable = (\n",
+ "    list(proj_t5.parameters()) + list(proj_qwen.parameters()) +\n",
+ "    list(symbolic_stream.parameters()) +\n",
+ "    list(head_t5.parameters()) + list(head_symbolic.parameters()) + list(head_qwen.parameters()) +\n",
+ "    list(fusion_l1.parameters()) + list(fusion_l2.parameters()) +\n",
+ "    list(classifier.parameters()) +\n",
+ "    list(classifier_qwen.parameters()) + list(classifier_t5.parameters()) +\n",
+ "    list(classifier_symbolic.parameters()) + list(classifier_l1.parameters())\n",
+ ")\n",
+ "\n",
+ "optimizer = torch.optim.AdamW(trainable, lr=2e-4, weight_decay=0.01)\n",
+ "criterion = nn.CrossEntropyLoss()\n",
+ "\n",
+ "print(f\"\\n=== HIERARCHICAL MATH COLLECTIVE ===\")\n",
+ "print(f\"Frozen: Qwen2.5-Math-1.5B ({sum(p.numel() for p in qwen.parameters()):,}) + T5 ({sum(p.numel() for p in t5.parameters()):,})\")\n",
+ "print(f\"Trainable: {sum(p.numel() for p in trainable):,}\")\n",
+ "print(f\"Output: {NUM_BUCKETS} answer buckets\")"
1980
+ ],
1981
+ "metadata": {
1982
+ "colab": {
1983
+ "base_uri": "https://localhost:8080/"
1984
+ },
1985
+ "id": "k9ET-pYfokWU",
1986
+ "outputId": "fe494679-c66e-45ad-a2f6-f639ab3cb9e9"
1987
+ },
1988
+ "execution_count": 3,
1989
+ "outputs": [
1990
+ {
1991
+ "output_type": "stream",
1992
+ "name": "stdout",
1993
+ "text": [
1994
+ "\n",
1995
+ "=== HIERARCHICAL MATH COLLECTIVE ===\n",
1996
+ "Frozen: Qwen2.5-Math-1.5B (1,543,714,304) + T5 (109,628,544)\n",
1997
+ "Trainable: 13,686,822\n",
1998
+ "Output: 10 answer buckets\n"
1999
+ ]
2000
+ }
2001
+ ]
2002
+ },
2003
+ {
2004
+ "cell_type": "code",
2005
+ "source": [
2006
+ "# Cell: Forward pass with safe last-token pooling\n",
+ "def forward_hierarchical(questions):\n",
+ "    B = len(questions)\n",
+ "\n",
+ "    # === ENCODE ===\n",
+ "    enc_qwen = qwen_tokenizer(questions, return_tensors=\"pt\",\n",
+ "                              padding=True, truncation=True, max_length=256)\n",
+ "    with torch.no_grad():\n",
+ "        hidden_qwen = qwen(\n",
+ "            enc_qwen.input_ids.to(device),\n",
+ "            attention_mask=enc_qwen.attention_mask.to(device)\n",
+ "        ).last_hidden_state\n",
+ "\n",
+ "    enc_t5 = t5_tokenizer(questions, return_tensors=\"pt\",\n",
+ "                          padding=True, truncation=True, max_length=256)\n",
+ "    with torch.no_grad():\n",
+ "        hidden_t5 = t5(\n",
+ "            enc_t5.input_ids.to(device),\n",
+ "            attention_mask=enc_t5.attention_mask.to(device)\n",
+ "        ).last_hidden_state\n",
+ "\n",
+ "    S = min(hidden_qwen.shape[1], hidden_t5.shape[1])\n",
+ "    hidden_qwen = hidden_qwen[:, :S, :]\n",
+ "    hidden_t5 = hidden_t5[:, :S, :]\n",
+ "\n",
+ "    # === LEVEL 1 ===\n",
+ "    proj_t = proj_t5(hidden_t5)\n",
+ "    symbolic_feat = symbolic_stream(questions, S)\n",
+ "\n",
+ "    routed_t5 = head_t5(proj_t)\n",
+ "    routed_symbolic = head_symbolic(symbolic_feat)\n",
+ "\n",
+ "    pooled_t5 = routed_t5[:, 0]\n",
+ "    pooled_symbolic = routed_symbolic[:, 0]\n",
+ "\n",
+ "    fused_l1 = fusion_l1(torch.cat([pooled_t5, pooled_symbolic], dim=-1))\n",
+ "\n",
+ "    # === LEVEL 2 ===\n",
+ "    proj_q = proj_qwen(hidden_qwen)\n",
+ "    routed_qwen = head_qwen(proj_q)\n",
+ "\n",
+ "    # FIX: Safe last-token pooling - clamp to actual sequence length after truncation\n",
+ "    seq_lens = enc_qwen.attention_mask[:, :S].sum(dim=1) - 1  # Truncated mask\n",
+ "    seq_lens = seq_lens.clamp(min=0, max=S-1).long()  # Safety clamp\n",
+ "    pooled_qwen = routed_qwen[torch.arange(B, device=device), seq_lens]\n",
+ "\n",
+ "    fused_l2 = fusion_l2(torch.cat([pooled_qwen, fused_l1], dim=-1))\n",
+ "\n",
+ "    # === CLASSIFY ===\n",
+ "    logits = classifier(fused_l2)\n",
+ "\n",
+ "    ind_logits = {\n",
+ "        'qwen': classifier_qwen(pooled_qwen),\n",
+ "        't5': classifier_t5(pooled_t5),\n",
+ "        'symbolic': classifier_symbolic(pooled_symbolic),\n",
+ "        'level1': classifier_l1(fused_l1),\n",
+ "    }\n",
+ "\n",
+ "    return logits, ind_logits\n",
+ "\n",
+ "print(\"✓ Fixed with safe index clamping\")\n",
+ "\n",
+ "print(\"✓ Fixed Qwen pooling to use last token\")\n",
+ "\n",
+ "# Test\n",
+ "test_q = [\"John has 5 apples and buys 3 more. How many apples does John have?\"]\n",
+ "logits, ind = forward_hierarchical(test_q)\n",
+ "print(f\"✓ Forward pass works: {logits.shape}\")"
2075
+ ],
2076
+ "metadata": {
2077
+ "colab": {
2078
+ "base_uri": "https://localhost:8080/"
2079
+ },
2080
+ "id": "w9Shin3FosSP",
2081
+ "outputId": "138c8176-c81a-4f07-bdd2-ee6c9e295ea4"
2082
+ },
2083
+ "execution_count": 4,
2084
+ "outputs": [
2085
+ {
2086
+ "output_type": "stream",
2087
+ "name": "stdout",
2088
+ "text": [
2089
+ "✓ Fixed with safe index clamping\n",
2090
+ "✓ Fixed Qwen pooling to use last token\n",
2091
+ "✓ Forward pass works: torch.Size([1, 10])\n"
2092
+ ]
2093
+ }
2094
+ ]
2095
+ },
2096
+ {
2097
+ "cell_type": "code",
2098
+ "source": [
2099
+ "# Cell: Load data and setup buckets\n",
+ "dataset = load_dataset(\"openai/gsm8k\", \"main\")\n",
+ "\n",
+ "def extract_final_answer(answer_text):\n",
+ "    match = re.search(r'####\\s*(-?\\d+\\.?\\d*)', answer_text)\n",
+ "    return float(match.group(1)) if match else None\n",
+ "\n",
+ "answers = [extract_final_answer(ex['answer']) for ex in dataset['train']]\n",
+ "answers = [a for a in answers if a is not None]\n",
+ "\n",
+ "import numpy as np\n",
+ "percentiles = np.percentile(answers, np.linspace(0, 100, NUM_BUCKETS + 1))\n",
+ "print(f\"Answer range: {min(answers)} to {max(answers)}\")\n",
+ "print(f\"{NUM_BUCKETS} buckets\")\n",
+ "\n",
+ "def answer_to_bucket(answer):\n",
+ "    for i, (low, high) in enumerate(zip(percentiles[:-1], percentiles[1:])):\n",
+ "        if answer <= high:\n",
+ "            return i\n",
+ "    return NUM_BUCKETS - 1\n",
+ "\n",
+ "def collate_fn(examples):\n",
+ "    return {\n",
+ "        'question': [ex['question'] for ex in examples],\n",
+ "        'answer': [ex['answer'] for ex in examples],\n",
+ "    }\n",
+ "\n",
+ "train_loader = DataLoader(dataset['train'], batch_size=8, shuffle=True, collate_fn=collate_fn)  # Smaller batch for 1.5B model\n",
+ "test_loader = DataLoader(dataset['test'], batch_size=8, shuffle=False, collate_fn=collate_fn)\n",
+ "\n",
+ "print(f\"Train batches: {len(train_loader)}, Test batches: {len(test_loader)}\")"
2130
+ ],
2131
+ "metadata": {
2132
+ "colab": {
2133
+ "base_uri": "https://localhost:8080/"
2134
+ },
2135
+ "id": "JCH22yCxovZS",
2136
+ "outputId": "4e19eea8-a54d-4f02-ed0d-24155cb47e5f"
2137
+ },
2138
+ "execution_count": 6,
2139
+ "outputs": [
2140
+ {
2141
+ "output_type": "stream",
2142
+ "name": "stdout",
2143
+ "text": [
2144
+ "Answer range: -47.0 to 192000000.0\n",
2145
+ "10 buckets\n",
2146
+ "Train batches: 935, Test batches: 165\n"
2147
+ ]
2148
+ }
2149
+ ]
2150
+ },
2151
+ {
2152
+ "cell_type": "code",
2153
+ "source": [
2154
+ "# Cell: Training loop\n",
+ "EPOCHS = 5\n",
+ "history = []\n",
+ "\n",
+ "for epoch in range(EPOCHS):\n",
+ "    head_t5.train(); head_symbolic.train(); head_qwen.train()\n",
+ "    fusion_l1.train(); fusion_l2.train(); classifier.train()\n",
+ "\n",
+ "    correct, total = 0, 0\n",
+ "    pbar = tqdm(train_loader, desc=f\"Epoch {epoch+1}\")\n",
+ "\n",
+ "    for batch in pbar:\n",
+ "        questions = batch['question']\n",
+ "        answers_text = batch['answer']\n",
+ "\n",
+ "        labels = []\n",
+ "        for ans in answers_text:\n",
+ "            num = extract_final_answer(ans)\n",
+ "            # Compare against None so a legitimate answer of 0 is not treated as missing\n",
+ "            labels.append(answer_to_bucket(num) if num is not None else 0)\n",
+ "        labels = torch.tensor(labels).to(device)\n",
+ "\n",
+ "        optimizer.zero_grad()\n",
+ "        logits, ind_logits = forward_hierarchical(questions)\n",
+ "\n",
+ "        # Hierarchical loss\n",
+ "        loss = criterion(logits, labels)  # Final output\n",
+ "        loss += 0.1 * criterion(ind_logits['qwen'], labels)\n",
+ "        loss += 0.1 * criterion(ind_logits['t5'], labels)\n",
+ "        loss += 0.05 * criterion(ind_logits['symbolic'], labels)\n",
+ "        loss += 0.1 * criterion(ind_logits['level1'], labels)  # Level 1 combined\n",
+ "\n",
+ "        loss.backward()\n",
+ "        optimizer.step()\n",
+ "\n",
+ "        correct += (logits.argmax(-1) == labels).sum().item()\n",
+ "        total += labels.size(0)\n",
+ "        pbar.set_postfix({'acc': f'{correct/total:.1%}'})\n",
+ "\n",
+ "    # Eval\n",
+ "    head_t5.eval(); head_symbolic.eval(); head_qwen.eval()\n",
+ "    fusion_l1.eval(); fusion_l2.eval(); classifier.eval()\n",
+ "\n",
+ "    metrics = {k: 0 for k in ['collective', 'qwen', 't5', 'symbolic', 'level1']}\n",
+ "    val_total = 0\n",
+ "\n",
+ "    with torch.no_grad():\n",
+ "        for batch in tqdm(test_loader, desc=\"Validating\"):\n",
+ "            questions = batch['question']\n",
+ "            answers_text = batch['answer']\n",
+ "\n",
+ "            labels = []\n",
+ "            for ans in answers_text:\n",
+ "                num = extract_final_answer(ans)\n",
+ "                labels.append(answer_to_bucket(num) if num is not None else 0)\n",
+ "            labels = torch.tensor(labels).to(device)\n",
+ "\n",
+ "            logits, ind_logits = forward_hierarchical(questions)\n",
+ "\n",
+ "            metrics['collective'] += (logits.argmax(-1) == labels).sum().item()\n",
+ "            for name, ind_log in ind_logits.items():\n",
+ "                metrics[name] += (ind_log.argmax(-1) == labels).sum().item()\n",
+ "            val_total += labels.size(0)\n",
+ "\n",
+ "    accs = {k: v / val_total for k, v in metrics.items()}\n",
+ "    max_ind = max(accs['qwen'], accs['t5'], accs['symbolic'])\n",
+ "    rho = accs['collective'] / max_ind if max_ind > 0 else 0\n",
+ "\n",
+ "    history.append({**accs, 'rho': rho, 'epoch': epoch + 1})\n",
+ "\n",
+ "    print(f\"\\nEpoch {epoch+1}:\")\n",
+ "    print(f\" Collective: {accs['collective']:.1%} (hierarchical output)\")\n",
+ "    print(f\" Qwen-Math: {accs['qwen']:.1%}, T5: {accs['t5']:.1%}, Symbolic: {accs['symbolic']:.1%}\")\n",
+ "    print(f\" Level 1 (T5+Sym): {accs['level1']:.1%}\")\n",
+ "    print(f\" ρ = {rho:.3f}\")"
2228
+ ],
2229
+ "metadata": {
2230
+ "colab": {
2231
+ "base_uri": "https://localhost:8080/",
2232
+ "height": 385,
2233
+ "referenced_widgets": [
2234
+ "17f8d0b0ef4347fd81984c46fbb9e684",
2235
+ "7d78e730875146efa1dd35b906344678",
2236
+ "dc1906aa05194549a4d6dcc85d92dc09",
2237
+ "1f6c2b5fc10a4a98a5380a13fa70bc75",
2238
+ "dba61edae6884c92bdf9a3d76dec59c9",
2239
+ "99a31b328dbf401c9db2b340cb2105d5",
2240
+ "76efebd872c044b9a7f8c38d57eac49f",
2241
+ "5af975c1370f43d59b97ed336c866e0c",
2242
+ "4a035406ac194a67afca19b313e05b9a",
2243
+ "c429a347002a43fe8a0a57f5797767d7",
2244
+ "cce3e0efd2bb4b978c6fa150b6cac602",
2245
+ "da18291cbf374325ac676de844a353e7",
2246
+ "4e910a07266a40caabcc33eeb941bdc0",
2247
+ "9c602bb5011d48c8b04404f6b3a37fab",
2248
+ "a87e1520f5254be18114a5f782e4e234",
2249
+ "fd027301b7a6496eb8ae039c25bc4d4b",
2250
+ "9b3468290ae84339ac7c8ce2a9976741",
2251
+ "8beb040b66a1413abd534d5d455d71e2",
2252
+ "b0fc7aa19e194e91a3138867b2377db9",
2253
+ "e9a139d721a147d58952f4a1efc0401b",
2254
+ "b2c27f1f8a5f437e8f359f1e0ea05fbd",
2255
+ "1f697938d7e845bb9adcbb3015606f70",
2256
+ "7909e231d23b447c9a0975ed72e00f60",
2257
+ "654bf91d7d694141aa9d3382e72439cb",
2258
+ "ddbba23be5cd42d8a12669ae9ae7f0fc",
2259
+ "94201956b5774585b99637bf2157f1b0",
2260
+ "0d3995008ff149828c81d2b91f18be0b",
2261
+ "a3f9669d01d241e99b7a7bdd98792353",
2262
+ "e361562b232c49e6b4771a10106d0d4d",
2263
+ "b0b5f7a2436d4d51b0760b1162f219db",
2264
+ "4ceb35fd88a242218cb4992f631fe68c",
2265
+ "faca4620999e4480823607b9a673e570",
2266
+ "b2548dde84694f589b108a28ca34c4f2",
2267
+ "990d32fd8ad943898db8f524175171ba",
2268
+ "56543f9f03674385a106f46b3736a270",
2269
+ "0856e831cbb7419fbbe66e7e7fb99906",
2270
+ "f4f091fd560242a28cf09b6c33d7c9a3",
2271
+ "7421731362ea4c25abceefb9cf2c5d1b",
2272
+ "26e92b3e139a464a848b68a55a8095a8",
2273
+ "75cf349e98bd4a6bb45a66a112acfa30",
2274
+ "abbb218edf3a4f54bcd72acecfdd17b3",
2275
+ "72bc8f82e228452fb5faa36ff84d03a4",
2276
+ "bbee56a1661c4347bc538ad204e4efe2",
2277
+ "05669dee14ff4356a8185285ee9c2951",
2278
+ "a1dbd51ede52442e8b6b2cef26e8bbd9",
2279
+ "b62503ae035d447ab609dff304523785",
2280
+ "d98feea8dca847a1a087c19e0ffb40db",
2281
+ "903bdef3de794bd0a0978a9fa847dcc8",
2282
+ "81acd630739a4a99a33bc5c060b56c0b",
2283
+ "3f1848c901e14b6d86f9e10a52ca607e",
2284
+ "b47f59b606b94a68a3361a185e0040aa",
2285
+ "5f47532448504e4c8398688d85787201",
2286
+ "0e8901c365954716932d4a09bb5e615f",
2287
+ "7527f720a46a4ed384f09866b6550e6e",
2288
+ "d5df03c93765465fb8bc49dc19c88786"
2289
+ ]
2290
+ },
2291
+ "id": "ZJdVWd7poxN4",
2292
+ "outputId": "9ff5bcae-647b-4a82-8d0d-17b116d29fa6"
2293
+ },
2294
+ "execution_count": null,
2295
+ "outputs": [
2296
+ {
2297
+ "output_type": "display_data",
2298
+ "data": {
2299
+ "text/plain": [
2300
+ "Epoch 1: 0%| | 0/935 [00:00<?, ?it/s]"
2301
+ ],
2302
+ "application/vnd.jupyter.widget-view+json": {
2303
+ "version_major": 2,
2304
+ "version_minor": 0,
2305
+ "model_id": "17f8d0b0ef4347fd81984c46fbb9e684"
2306
+ }
2307
+ },
2308
+ "metadata": {}
2309
+ },
2310
+ {
2311
+ "output_type": "display_data",
2312
+ "data": {
2313
+ "text/plain": [
2314
+ "Validating: 0%| | 0/165 [00:00<?, ?it/s]"
2315
+ ],
2316
+ "application/vnd.jupyter.widget-view+json": {
2317
+ "version_major": 2,
2318
+ "version_minor": 0,
2319
+ "model_id": "da18291cbf374325ac676de844a353e7"
2320
+ }
2321
+ },
2322
+ "metadata": {}
2323
+ },
2324
+ {
2325
+ "output_type": "stream",
2326
+ "name": "stdout",
2327
+ "text": [
2328
+ "\n",
2329
+ "Epoch 1:\n",
2330
+ " Collective: 22.2% (hierarchical output)\n",
2331
+ " Qwen-Math: 19.9%, T5: 21.1%, Symbolic: 13.1%\n",
2332
+ " Level 1 (T5+Sym): 15.5%\n",
2333
+ " ρ = 1.054\n"
2334
+ ]
2335
+ },
2336
+ {
2337
+ "output_type": "display_data",
2338
+ "data": {
2339
+ "text/plain": [
2340
+ "Epoch 2: 0%| | 0/935 [00:00<?, ?it/s]"
2341
+ ],
2342
+ "application/vnd.jupyter.widget-view+json": {
2343
+ "version_major": 2,
2344
+ "version_minor": 0,
2345
+ "model_id": "7909e231d23b447c9a0975ed72e00f60"
2346
+ }
2347
+ },
2348
+ "metadata": {}
2349
+ },
2350
+ {
2351
+ "output_type": "display_data",
2352
+ "data": {
2353
+ "text/plain": [
2354
+ "Validating: 0%| | 0/165 [00:00<?, ?it/s]"
2355
+ ],
2356
+ "application/vnd.jupyter.widget-view+json": {
2357
+ "version_major": 2,
2358
+ "version_minor": 0,
2359
+ "model_id": "990d32fd8ad943898db8f524175171ba"
2360
+ }
2361
+ },
2362
+ "metadata": {}
2363
+ },
2364
+ {
2365
+ "output_type": "stream",
2366
+ "name": "stdout",
2367
+ "text": [
2368
+ "\n",
2369
+ "Epoch 2:\n",
2370
+ " Collective: 24.8% (hierarchical output)\n",
2371
+ " Qwen-Math: 20.5%, T5: 22.7%, Symbolic: 11.6%\n",
2372
+ " Level 1 (T5+Sym): 14.3%\n",
2373
+ " ρ = 1.094\n"
2374
+ ]
2375
+ },
2376
+ {
2377
+ "output_type": "display_data",
2378
+ "data": {
2379
+ "text/plain": [
2380
+ "Epoch 3: 0%| | 0/935 [00:00<?, ?it/s]"
2381
+ ],
2382
+ "application/vnd.jupyter.widget-view+json": {
2383
+ "version_major": 2,
2384
+ "version_minor": 0,
2385
+ "model_id": "a1dbd51ede52442e8b6b2cef26e8bbd9"
2386
+ }
2387
+ },
2388
+ "metadata": {}
2389
+ }
2390
+ ]
2391
+ },
2392
+ {
2393
+ "cell_type": "code",
2394
+ "source": [
2395
+ "# Cell: Summary\n",
+ "print(\"\\n\" + \"=\"*70)\n",
+ "print(\"HIERARCHICAL MATH COLLECTIVE - Qwen2.5-Math + T5\")\n",
+ "print(\"=\"*70)\n",
+ "print(\"\\nArchitecture:\")\n",
+ "print(\" Level 1: T5 (linguistic) + Symbolic (arithmetic) → fused\")\n",
+ "print(\" Level 2: Qwen-Math (reasoning) + Level1 → final output\")\n",
+ "print(\"\\n| Epoch | Collective | Qwen | T5 | Symbolic | Level1 | ρ |\")\n",
+ "print(\"|-------|------------|------|-----|----------|--------|-------|\")\n",
+ "for h in history:\n",
+ "    print(f\"| {h['epoch']} | {h['collective']:.1%} | {h['qwen']:.1%} | {h['t5']:.1%} | \"\n",
+ "          f\"{h['symbolic']:.1%} | {h['level1']:.1%} | {h['rho']:.3f} |\")"
2407
+ ],
2408
+ "metadata": {
2409
+ "id": "wdlmgzsfoy_C"
2410
+ },
2411
+ "execution_count": null,
2412
+ "outputs": []
2413
+ },
2414
+ {
2415
+ "cell_type": "markdown",
2416
+ "source": [],
2417
+ "metadata": {
2418
+ "id": "HziRthapehOa"
2419
+ }
2420
+ }
2421
+ ]
2422
+ }