lollenape committed on
Commit
2209d7c
·
1 Parent(s): 73e94a9

Upload folder using huggingface_hub

.ipynb_checkpoints/quick_start_pytorch-checkpoint.ipynb ADDED
@@ -0,0 +1,991 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "gradient": {
7
+ "editing": false,
8
+ "id": "a4090294-3349-4815-96f4-98010b657359",
9
+ "kernelId": ""
10
+ }
11
+ },
12
+ "source": [
13
+ "# Paperspace Gradient: PyTorch Quick Start\n",
14
+ "Last modified: Sep 27th 2022"
15
+ ]
16
+ },
17
+ {
18
+ "cell_type": "markdown",
19
+ "metadata": {
20
+ "gradient": {
21
+ "editing": false,
22
+ "id": "4936c59a-8535-43cf-a527-e9323b2b658e",
23
+ "kernelId": ""
24
+ }
25
+ },
26
+ "source": [
27
+ "## Purpose and intended audience\n",
28
+ "\n",
29
+ "This Quick Start tutorial demonstrates PyTorch usage in a Gradient Notebook. It is aimed at users who are relatively new to PyTorch, although you will need to be familiar with Python to understand PyTorch code.\n",
30
+ "\n",
31
+ "We use PyTorch to\n",
32
+ "\n",
33
+ "- Build a neural network that classifies FashionMNIST images\n",
34
+ "- Train and evaluate the network\n",
35
+ "- Save the model\n",
36
+ "- Perform predictions\n",
37
+ "\n",
38
+ "followed by some next steps that you can take to proceed with using Gradient.\n",
39
+ "\n",
40
+ "The material is based on the original [PyTorch Quick Start](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) at the time of writing this notebook.\n",
41
+ "\n",
42
+ "See the end of the notebook for the original copyright notice."
43
+ ]
44
+ },
45
+ {
46
+ "cell_type": "markdown",
47
+ "metadata": {
48
+ "gradient": {
49
+ "editing": false,
50
+ "id": "a55c3131-9437-483d-9c19-a165fbf8b6d4",
51
+ "kernelId": ""
52
+ }
53
+ },
54
+ "source": [
55
+ "## Check that you are on a GPU machine\n",
56
+ "\n",
57
+ "The notebook is designed to run on a Gradient GPU machine (as opposed to a CPU-only machine). The machine type, e.g., A4000, can be seen by clicking on the Machine icon on the left-hand navigation bar in the Gradient Notebook interface, which also shows whether the machine is CPU or GPU.\n",
58
+ "\n",
59
+ "![quick_start_pytorch_images/example_instance_type.png](quick_start_pytorch_images/example_instance_type.png)\n",
60
+ "\n",
61
+ "The *Creating models* section below also determines whether or not a GPU is available for us to use.\n",
62
+ "\n",
63
+ "If the machine type is CPU, you can change it by clicking *Stop Machine* and then clicking the displayed machine type to open a drop-down list. Select a GPU machine and start the Notebook again.\n",
64
+ "\n",
65
+ "For help with machines, see the Gradient documentation on [machine types](https://docs.paperspace.com/gradient/machines/) or [starting a Gradient Notebook](https://docs.paperspace.com/gradient/explore-train-deploy/notebooks)."
66
+ ]
67
+ },
68
+ {
69
+ "cell_type": "markdown",
70
+ "metadata": {
71
+ "gradient": {
72
+ "editing": false,
73
+ "id": "28402a66-a8c4-4672-9592-cc530b58d439",
74
+ "kernelId": ""
75
+ }
76
+ },
77
+ "source": [
78
+ "## Working with data\n",
79
+ "\n",
80
+ "PyTorch has two [primitives to work with data](https://pytorch.org/docs/stable/data.html):\n",
81
+ "``torch.utils.data.DataLoader`` and ``torch.utils.data.Dataset``.\n",
82
+ "``Dataset`` stores the samples and their corresponding labels, and ``DataLoader`` wraps an iterable around\n",
83
+ "the ``Dataset``."
84
+ ]
85
+ },
86
+ {
87
+ "cell_type": "code",
88
+ "execution_count": 2,
89
+ "metadata": {
90
+ "collapsed": false,
91
+ "execution": {
92
+ "iopub.execute_input": "2022-09-27T20:36:04.965047Z",
93
+ "iopub.status.busy": "2022-09-27T20:36:04.964421Z",
94
+ "iopub.status.idle": "2022-09-27T20:36:06.330541Z",
95
+ "shell.execute_reply": "2022-09-27T20:36:06.329333Z",
96
+ "shell.execute_reply.started": "2022-09-27T20:36:04.965047Z"
97
+ },
98
+ "gradient": {
99
+ "editing": false,
100
+ "execution_count": 2,
101
+ "id": "2bab3caa-e156-4635-bc21-53031ebea60d",
102
+ "kernelId": ""
103
+ },
104
+ "jupyter": {
105
+ "outputs_hidden": false
106
+ }
107
+ },
108
+ "outputs": [],
109
+ "source": [
110
+ "import torch\n",
111
+ "from torch import nn\n",
112
+ "from torch.utils.data import DataLoader\n",
113
+ "from torchvision import datasets\n",
114
+ "from torchvision.transforms import ToTensor, Lambda, Compose"
115
+ ]
116
+ },
117
+ {
118
+ "cell_type": "markdown",
119
+ "metadata": {
120
+ "gradient": {
121
+ "editing": false,
122
+ "id": "0dfb0116-56cd-4795-bc5e-79baad627726",
123
+ "kernelId": ""
124
+ }
125
+ },
126
+ "source": [
127
+ "PyTorch offers domain-specific libraries such as [TorchText](https://pytorch.org/text/stable/index.html),\n",
128
+ "[TorchVision](https://pytorch.org/vision/stable/index.html), and [TorchAudio](https://pytorch.org/audio/stable/index.html),\n",
129
+ "all of which include datasets. For this tutorial, we will be using a TorchVision dataset.\n",
130
+ "\n",
131
+ "The ``torchvision.datasets`` module contains ``Dataset`` objects for many real-world vision datasets such as\n",
132
+ "CIFAR and COCO ([full list here](https://pytorch.org/vision/stable/datasets.html)). In this tutorial, we\n",
133
+ "use the FashionMNIST dataset. Every TorchVision ``Dataset`` includes two arguments: ``transform`` and\n",
134
+ "``target_transform`` to modify the samples and labels respectively."
135
+ ]
136
+ },
137
+ {
138
+ "cell_type": "code",
139
+ "execution_count": 3,
140
+ "metadata": {
141
+ "collapsed": false,
142
+ "execution": {
143
+ "iopub.execute_input": "2022-09-27T20:36:06.332087Z",
144
+ "iopub.status.busy": "2022-09-27T20:36:06.331786Z",
145
+ "iopub.status.idle": "2022-09-27T20:36:33.429172Z",
146
+ "shell.execute_reply": "2022-09-27T20:36:33.428023Z",
147
+ "shell.execute_reply.started": "2022-09-27T20:36:06.332087Z"
148
+ },
149
+ "gradient": {
150
+ "editing": false,
151
+ "execution_count": 3,
152
+ "id": "631deddf-30f0-45f1-84ab-e5f4c510c500",
153
+ "kernelId": ""
154
+ },
155
+ "jupyter": {
156
+ "outputs_hidden": false
157
+ }
158
+ },
159
+ "outputs": [
160
+ {
161
+ "name": "stdout",
162
+ "output_type": "stream",
163
+ "text": [
164
+ "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz\n",
165
+ "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz\n"
166
+ ]
167
+ },
168
+ {
169
+ "data": {
170
+ "application/vnd.jupyter.widget-view+json": {
171
+ "model_id": "30b312514a6a4edcb608e92ecda0a385",
172
+ "version_major": 2,
173
+ "version_minor": 0
174
+ },
175
+ "text/plain": [
176
+ " 0%| | 0/26421880 [00:00<?, ?it/s]"
177
+ ]
178
+ },
179
+ "metadata": {},
180
+ "output_type": "display_data"
181
+ },
182
+ {
183
+ "name": "stdout",
184
+ "output_type": "stream",
185
+ "text": [
186
+ "Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw\n",
187
+ "\n",
188
+ "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz\n",
189
+ "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz\n"
190
+ ]
191
+ },
192
+ {
193
+ "data": {
194
+ "application/vnd.jupyter.widget-view+json": {
195
+ "model_id": "4e8ad8c0dd9d4eae8d6d3a59677e7a99",
196
+ "version_major": 2,
197
+ "version_minor": 0
198
+ },
199
+ "text/plain": [
200
+ " 0%| | 0/29515 [00:00<?, ?it/s]"
201
+ ]
202
+ },
203
+ "metadata": {},
204
+ "output_type": "display_data"
205
+ },
206
+ {
207
+ "name": "stdout",
208
+ "output_type": "stream",
209
+ "text": [
210
+ "Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw\n",
211
+ "\n",
212
+ "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz\n",
213
+ "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz\n"
214
+ ]
215
+ },
216
+ {
217
+ "data": {
218
+ "application/vnd.jupyter.widget-view+json": {
219
+ "model_id": "2465426f4de748bf955849e5bfeb5384",
220
+ "version_major": 2,
221
+ "version_minor": 0
222
+ },
223
+ "text/plain": [
224
+ " 0%| | 0/4422102 [00:00<?, ?it/s]"
225
+ ]
226
+ },
227
+ "metadata": {},
228
+ "output_type": "display_data"
229
+ },
230
+ {
231
+ "name": "stdout",
232
+ "output_type": "stream",
233
+ "text": [
234
+ "Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw\n",
235
+ "\n",
236
+ "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz\n",
237
+ "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz\n"
238
+ ]
239
+ },
240
+ {
241
+ "data": {
242
+ "application/vnd.jupyter.widget-view+json": {
243
+ "model_id": "4f090bef77ea43f49f503faf722b1e67",
244
+ "version_major": 2,
245
+ "version_minor": 0
246
+ },
247
+ "text/plain": [
248
+ " 0%| | 0/5148 [00:00<?, ?it/s]"
249
+ ]
250
+ },
251
+ "metadata": {},
252
+ "output_type": "display_data"
253
+ },
254
+ {
255
+ "name": "stdout",
256
+ "output_type": "stream",
257
+ "text": [
258
+ "Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw\n",
259
+ "\n"
260
+ ]
261
+ }
262
+ ],
263
+ "source": [
264
+ "# Download training data from open datasets\n",
265
+ "training_data = datasets.FashionMNIST(\n",
266
+ " root=\"data\",\n",
267
+ " train=True,\n",
268
+ " download=True,\n",
269
+ " transform=ToTensor(),\n",
270
+ ")\n",
271
+ "\n",
272
+ "# Download test data from open datasets\n",
273
+ "test_data = datasets.FashionMNIST(\n",
274
+ " root=\"data\",\n",
275
+ " train=False,\n",
276
+ " download=True,\n",
277
+ " transform=ToTensor(),\n",
278
+ ")"
279
+ ]
280
+ },
281
+ {
282
+ "cell_type": "markdown",
283
+ "metadata": {
284
+ "gradient": {
285
+ "editing": false,
286
+ "id": "0ace6ebf-b493-4b75-9bfa-dc48bc676b21",
287
+ "kernelId": ""
288
+ }
289
+ },
290
+ "source": [
291
+ "We pass the ``Dataset`` as an argument to ``DataLoader``. This wraps an iterable over our dataset, and supports\n",
292
+ "automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e., each element\n",
293
+ "in the dataloader iterable will return a batch of 64 features and labels."
294
+ ]
295
+ },
296
+ {
297
+ "cell_type": "code",
298
+ "execution_count": 4,
299
+ "metadata": {
300
+ "collapsed": false,
301
+ "execution": {
302
+ "iopub.execute_input": "2022-09-27T20:36:33.430736Z",
303
+ "iopub.status.busy": "2022-09-27T20:36:33.430441Z",
304
+ "iopub.status.idle": "2022-09-27T20:36:33.449430Z",
305
+ "shell.execute_reply": "2022-09-27T20:36:33.448119Z",
306
+ "shell.execute_reply.started": "2022-09-27T20:36:33.430708Z"
307
+ },
308
+ "gradient": {
309
+ "editing": false,
310
+ "execution_count": 4,
311
+ "id": "8e65f970-dce8-460c-b5f2-9cbee0c14900",
312
+ "kernelId": ""
313
+ },
314
+ "jupyter": {
315
+ "outputs_hidden": false
316
+ }
317
+ },
318
+ "outputs": [
319
+ {
320
+ "name": "stdout",
321
+ "output_type": "stream",
322
+ "text": [
323
+ "Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])\n",
324
+ "Shape of y: torch.Size([64]) torch.int64\n"
325
+ ]
326
+ }
327
+ ],
328
+ "source": [
329
+ "batch_size = 64\n",
330
+ "\n",
331
+ "# Create data loaders\n",
332
+ "train_dataloader = DataLoader(training_data, batch_size=batch_size)\n",
333
+ "test_dataloader = DataLoader(test_data, batch_size=batch_size)\n",
334
+ "\n",
335
+ "for X, y in test_dataloader:\n",
336
+ " print(\"Shape of X [N, C, H, W]: \", X.shape)\n",
337
+ " print(\"Shape of y: \", y.shape, y.dtype)\n",
338
+ " break"
339
+ ]
340
+ },
341
+ {
342
+ "cell_type": "markdown",
343
+ "metadata": {
344
+ "gradient": {
345
+ "editing": false,
346
+ "id": "f9d1b1f7-0850-4676-93b6-902f78be237d",
347
+ "kernelId": ""
348
+ }
349
+ },
350
+ "source": [
351
+ "Read more about [loading data in PyTorch](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html)."
352
+ ]
353
+ },
354
+ {
355
+ "cell_type": "markdown",
356
+ "metadata": {
357
+ "gradient": {
358
+ "editing": false,
359
+ "id": "d9cc95fe-194b-4a6f-b01d-91510dfcfb00",
360
+ "kernelId": ""
361
+ }
362
+ },
363
+ "source": [
364
+ "## Creating models, including GPU\n",
365
+ "\n",
366
+ "To define a neural network in PyTorch, we create a class that inherits\n",
367
+ "from [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). We define the layers of the network\n",
368
+ "in the ``__init__`` function and specify how data will pass through the network in the ``forward`` function. To accelerate\n",
369
+ "operations in the neural network, we move it to the GPU if available."
370
+ ]
371
+ },
372
+ {
373
+ "cell_type": "code",
374
+ "execution_count": 5,
375
+ "metadata": {
376
+ "collapsed": false,
377
+ "execution": {
378
+ "iopub.execute_input": "2022-09-27T20:36:33.453700Z",
379
+ "iopub.status.busy": "2022-09-27T20:36:33.453070Z",
380
+ "iopub.status.idle": "2022-09-27T20:36:35.334541Z",
381
+ "shell.execute_reply": "2022-09-27T20:36:35.329047Z",
382
+ "shell.execute_reply.started": "2022-09-27T20:36:33.453700Z"
383
+ },
384
+ "gradient": {
385
+ "editing": false,
386
+ "execution_count": 5,
387
+ "id": "d58d5484-8ca0-4400-91c5-d0e71cf89c12",
388
+ "kernelId": ""
389
+ },
390
+ "jupyter": {
391
+ "outputs_hidden": false
392
+ }
393
+ },
394
+ "outputs": [
395
+ {
396
+ "name": "stdout",
397
+ "output_type": "stream",
398
+ "text": [
399
+ "Using cuda device\n",
400
+ "NeuralNetwork(\n",
401
+ " (flatten): Flatten(start_dim=1, end_dim=-1)\n",
402
+ " (linear_relu_stack): Sequential(\n",
403
+ " (0): Linear(in_features=784, out_features=512, bias=True)\n",
404
+ " (1): ReLU()\n",
405
+ " (2): Linear(in_features=512, out_features=512, bias=True)\n",
406
+ " (3): ReLU()\n",
407
+ " (4): Linear(in_features=512, out_features=10, bias=True)\n",
408
+ " )\n",
409
+ ")\n"
410
+ ]
411
+ }
412
+ ],
413
+ "source": [
414
+ "# Get cpu or gpu device for training\n",
415
+ "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
416
+ "print(\"Using {} device\".format(device))\n",
417
+ "\n",
418
+ "# Define model\n",
419
+ "class NeuralNetwork(nn.Module):\n",
420
+ " def __init__(self):\n",
421
+ " super(NeuralNetwork, self).__init__()\n",
422
+ " self.flatten = nn.Flatten()\n",
423
+ " self.linear_relu_stack = nn.Sequential(\n",
424
+ " nn.Linear(28*28, 512),\n",
425
+ " nn.ReLU(),\n",
426
+ " nn.Linear(512, 512),\n",
427
+ " nn.ReLU(),\n",
428
+ " nn.Linear(512, 10)\n",
429
+ " )\n",
430
+ "\n",
431
+ " def forward(self, x):\n",
432
+ " x = self.flatten(x)\n",
433
+ " logits = self.linear_relu_stack(x)\n",
434
+ " return logits\n",
435
+ "\n",
436
+ "model = NeuralNetwork().to(device)\n",
437
+ "print(model)"
438
+ ]
439
+ },
440
+ {
441
+ "cell_type": "markdown",
442
+ "metadata": {
443
+ "gradient": {
444
+ "editing": false,
445
+ "id": "7ee591d8-e529-481b-8107-e84454893bd2",
446
+ "kernelId": ""
447
+ }
448
+ },
449
+ "source": [
450
+ "Read more about [building neural networks in PyTorch](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)."
451
+ ]
452
+ },
453
+ {
454
+ "cell_type": "markdown",
455
+ "metadata": {
456
+ "gradient": {
457
+ "editing": false,
458
+ "id": "b6db5b4f-80b9-4f9e-8feb-76d0ef1e346f",
459
+ "kernelId": ""
460
+ }
461
+ },
462
+ "source": [
463
+ "## Optimizing the model parameters\n",
464
+ "\n",
465
+ "To train a model, we need a [loss function](https://pytorch.org/docs/stable/nn.html#loss-functions)\n",
466
+ "and an [optimizer](https://pytorch.org/docs/stable/optim.html)."
467
+ ]
468
+ },
469
+ {
470
+ "cell_type": "code",
471
+ "execution_count": 6,
472
+ "metadata": {
473
+ "collapsed": false,
474
+ "execution": {
475
+ "iopub.execute_input": "2022-09-27T20:36:35.340252Z",
476
+ "iopub.status.busy": "2022-09-27T20:36:35.339874Z",
477
+ "iopub.status.idle": "2022-09-27T20:36:35.345985Z",
478
+ "shell.execute_reply": "2022-09-27T20:36:35.344793Z",
479
+ "shell.execute_reply.started": "2022-09-27T20:36:35.340209Z"
480
+ },
481
+ "gradient": {
482
+ "editing": false,
483
+ "execution_count": 6,
484
+ "id": "8c22a532-16e0-440d-888e-d879e5f53c7c",
485
+ "kernelId": ""
486
+ },
487
+ "jupyter": {
488
+ "outputs_hidden": false
489
+ }
490
+ },
491
+ "outputs": [],
492
+ "source": [
493
+ "loss_fn = nn.CrossEntropyLoss()\n",
494
+ "optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)"
495
+ ]
496
+ },
497
+ {
498
+ "cell_type": "markdown",
499
+ "metadata": {
500
+ "gradient": {
501
+ "editing": false,
502
+ "id": "5efe3473-ecf7-411c-a13b-ba54f5c257a6",
503
+ "kernelId": ""
504
+ }
505
+ },
506
+ "source": [
507
+ "In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and\n",
508
+ "backpropagates the prediction error to adjust the model's parameters."
509
+ ]
510
+ },
511
+ {
512
+ "cell_type": "code",
513
+ "execution_count": 7,
514
+ "metadata": {
515
+ "collapsed": false,
516
+ "execution": {
517
+ "iopub.execute_input": "2022-09-27T20:36:35.350028Z",
518
+ "iopub.status.busy": "2022-09-27T20:36:35.349717Z",
519
+ "iopub.status.idle": "2022-09-27T20:36:35.357590Z",
520
+ "shell.execute_reply": "2022-09-27T20:36:35.356224Z",
521
+ "shell.execute_reply.started": "2022-09-27T20:36:35.350001Z"
522
+ },
523
+ "gradient": {
524
+ "editing": false,
525
+ "execution_count": 7,
526
+ "id": "3d1af6c1-299b-4572-902a-c5e52ce0a7d2",
527
+ "kernelId": ""
528
+ },
529
+ "jupyter": {
530
+ "outputs_hidden": false
531
+ }
532
+ },
533
+ "outputs": [],
534
+ "source": [
535
+ "def train(dataloader, model, loss_fn, optimizer):\n",
536
+ " size = len(dataloader.dataset)\n",
537
+ " model.train()\n",
538
+ " for batch, (X, y) in enumerate(dataloader):\n",
539
+ " X, y = X.to(device), y.to(device)\n",
540
+ "\n",
541
+ " # Compute prediction error\n",
542
+ " pred = model(X)\n",
543
+ " loss = loss_fn(pred, y)\n",
544
+ "\n",
545
+ " # Backpropagation\n",
546
+ " optimizer.zero_grad()\n",
547
+ " loss.backward()\n",
548
+ " optimizer.step()\n",
549
+ "\n",
550
+ " if batch % 100 == 0:\n",
551
+ " loss, current = loss.item(), batch * len(X)\n",
552
+ " print(f\"loss: {loss:>7f} [{current:>5d}/{size:>5d}]\")"
553
+ ]
554
+ },
555
+ {
556
+ "cell_type": "markdown",
557
+ "metadata": {
558
+ "gradient": {
559
+ "editing": false,
560
+ "id": "f86e28f0-bb94-4443-a673-f6d3461d4e94",
561
+ "kernelId": ""
562
+ }
563
+ },
564
+ "source": [
565
+ "We also check the model's performance against the test dataset to ensure it is learning."
566
+ ]
567
+ },
568
+ {
569
+ "cell_type": "code",
570
+ "execution_count": 8,
571
+ "metadata": {
572
+ "collapsed": false,
573
+ "execution": {
574
+ "iopub.execute_input": "2022-09-27T20:36:35.362383Z",
575
+ "iopub.status.busy": "2022-09-27T20:36:35.362293Z",
576
+ "iopub.status.idle": "2022-09-27T20:36:35.370320Z",
577
+ "shell.execute_reply": "2022-09-27T20:36:35.369013Z",
578
+ "shell.execute_reply.started": "2022-09-27T20:36:35.362345Z"
579
+ },
580
+ "gradient": {
581
+ "editing": false,
582
+ "execution_count": 8,
583
+ "id": "112d81e3-cdf8-4b1e-afca-6344be54f5e5",
584
+ "kernelId": ""
585
+ },
586
+ "jupyter": {
587
+ "outputs_hidden": false
588
+ }
589
+ },
590
+ "outputs": [],
591
+ "source": [
592
+ "def test(dataloader, model, loss_fn):\n",
593
+ " size = len(dataloader.dataset)\n",
594
+ " num_batches = len(dataloader)\n",
595
+ " model.eval()\n",
596
+ " test_loss, correct = 0, 0\n",
597
+ " with torch.no_grad():\n",
598
+ " for X, y in dataloader:\n",
599
+ " X, y = X.to(device), y.to(device)\n",
600
+ " pred = model(X)\n",
601
+ " test_loss += loss_fn(pred, y).item()\n",
602
+ " correct += (pred.argmax(1) == y).type(torch.float).sum().item()\n",
603
+ " test_loss /= num_batches\n",
604
+ " correct /= size\n",
605
+ " print(f\"Test Error: \\n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \\n\")"
606
+ ]
607
+ },
608
+ {
609
+ "cell_type": "markdown",
610
+ "metadata": {
611
+ "gradient": {
612
+ "editing": false,
613
+ "id": "4e366ecc-735f-42dd-b04e-a94816b94fd8",
614
+ "kernelId": ""
615
+ }
616
+ },
617
+ "source": [
618
+ "The training process is conducted over several iterations (*epochs*). During each epoch, the model learns\n",
619
+ "parameters to make better predictions. We print the model's accuracy and loss at each epoch; we'd like to see the\n",
620
+ "accuracy increase and the loss decrease with every epoch."
621
+ ]
622
+ },
623
+ {
624
+ "cell_type": "code",
625
+ "execution_count": 9,
626
+ "metadata": {
627
+ "collapsed": false,
628
+ "execution": {
629
+ "iopub.execute_input": "2022-09-27T20:36:35.374528Z",
630
+ "iopub.status.busy": "2022-09-27T20:36:35.374285Z",
631
+ "iopub.status.idle": "2022-09-27T20:37:29.296376Z",
632
+ "shell.execute_reply": "2022-09-27T20:37:29.295164Z",
633
+ "shell.execute_reply.started": "2022-09-27T20:36:35.374502Z"
634
+ },
635
+ "gradient": {
636
+ "editing": false,
637
+ "execution_count": 9,
638
+ "id": "50bf09d9-1318-43ef-92aa-6ee308fcafa1",
639
+ "kernelId": ""
640
+ },
641
+ "jupyter": {
642
+ "outputs_hidden": false
643
+ }
644
+ },
645
+ "outputs": [
646
+ {
647
+ "name": "stdout",
648
+ "output_type": "stream",
649
+ "text": [
650
+ "Epoch 1\n",
651
+ "-------------------------------\n",
652
+ "loss: 2.304299 [ 0/60000]\n",
653
+ "loss: 2.290307 [ 6400/60000]\n",
654
+ "loss: 2.268486 [12800/60000]\n",
655
+ "loss: 2.256835 [19200/60000]\n",
656
+ "loss: 2.248106 [25600/60000]\n",
657
+ "loss: 2.217304 [32000/60000]\n",
658
+ "loss: 2.215746 [38400/60000]\n",
659
+ "loss: 2.182278 [44800/60000]\n",
660
+ "loss: 2.179303 [51200/60000]\n",
661
+ "loss: 2.150798 [57600/60000]\n",
662
+ "Test Error: \n",
663
+ " Accuracy: 55.6%, Avg loss: 2.143109 \n",
664
+ "\n",
665
+ "Epoch 2\n",
666
+ "-------------------------------\n",
667
+ "loss: 2.155640 [ 0/60000]\n",
668
+ "loss: 2.144754 [ 6400/60000]\n",
669
+ "loss: 2.083586 [12800/60000]\n",
670
+ "loss: 2.091499 [19200/60000]\n",
671
+ "loss: 2.045041 [25600/60000]\n",
672
+ "loss: 1.986636 [32000/60000]\n",
673
+ "loss: 2.002200 [38400/60000]\n",
674
+ "loss: 1.927214 [44800/60000]\n",
675
+ "loss: 1.931510 [51200/60000]\n",
676
+ "loss: 1.847673 [57600/60000]\n",
677
+ "Test Error: \n",
678
+ " Accuracy: 59.5%, Avg loss: 1.857198 \n",
679
+ "\n",
680
+ "Epoch 3\n",
681
+ "-------------------------------\n",
682
+ "loss: 1.893984 [ 0/60000]\n",
683
+ "loss: 1.863075 [ 6400/60000]\n",
684
+ "loss: 1.748540 [12800/60000]\n",
685
+ "loss: 1.779858 [19200/60000]\n",
686
+ "loss: 1.666921 [25600/60000]\n",
687
+ "loss: 1.633243 [32000/60000]\n",
688
+ "loss: 1.639619 [38400/60000]\n",
689
+ "loss: 1.551572 [44800/60000]\n",
690
+ "loss: 1.578183 [51200/60000]\n",
691
+ "loss: 1.462901 [57600/60000]\n",
692
+ "Test Error: \n",
693
+ " Accuracy: 61.7%, Avg loss: 1.489910 \n",
694
+ "\n",
695
+ "Epoch 4\n",
696
+ "-------------------------------\n",
697
+ "loss: 1.560461 [ 0/60000]\n",
698
+ "loss: 1.525511 [ 6400/60000]\n",
699
+ "loss: 1.381848 [12800/60000]\n",
700
+ "loss: 1.445225 [19200/60000]\n",
701
+ "loss: 1.320462 [25600/60000]\n",
702
+ "loss: 1.335552 [32000/60000]\n",
703
+ "loss: 1.336702 [38400/60000]\n",
704
+ "loss: 1.266305 [44800/60000]\n",
705
+ "loss: 1.303894 [51200/60000]\n",
706
+ "loss: 1.202768 [57600/60000]\n",
707
+ "Test Error: \n",
708
+ " Accuracy: 63.3%, Avg loss: 1.229126 \n",
709
+ "\n",
710
+ "Epoch 5\n",
711
+ "-------------------------------\n",
712
+ "loss: 1.309631 [ 0/60000]\n",
713
+ "loss: 1.289756 [ 6400/60000]\n",
714
+ "loss: 1.129725 [12800/60000]\n",
715
+ "loss: 1.231920 [19200/60000]\n",
716
+ "loss: 1.100483 [25600/60000]\n",
717
+ "loss: 1.141074 [32000/60000]\n",
718
+ "loss: 1.153783 [38400/60000]\n",
719
+ "loss: 1.090403 [44800/60000]\n",
720
+ "loss: 1.133582 [51200/60000]\n",
721
+ "loss: 1.050682 [57600/60000]\n",
722
+ "Test Error: \n",
723
+ " Accuracy: 64.3%, Avg loss: 1.069880 \n",
724
+ "\n",
725
+ "Done!\n"
726
+ ]
727
+ }
728
+ ],
729
+ "source": [
730
+ "epochs = 5\n",
731
+ "for t in range(epochs):\n",
732
+ " print(f\"Epoch {t+1}\\n-------------------------------\")\n",
733
+ " train(train_dataloader, model, loss_fn, optimizer)\n",
734
+ " test(test_dataloader, model, loss_fn)\n",
735
+ "print(\"Done!\")"
736
+ ]
737
+ },
738
+ {
739
+ "cell_type": "markdown",
740
+ "metadata": {},
741
+ "source": [
742
+ "Read more about [Training your model](https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html)."
743
+ ]
744
+ },
745
+ {
746
+ "cell_type": "markdown",
747
+ "metadata": {
748
+ "gradient": {
749
+ "editing": false,
750
+ "id": "88e2d48b-f1c2-43b0-956d-673d31e777cc",
751
+ "kernelId": ""
752
+ }
753
+ },
754
+ "source": [
755
+ "## Saving models\n",
756
+ "\n",
757
+ "A common way to save a model is to serialize the internal state dictionary (containing the model parameters)."
758
+ ]
759
+ },
760
+ {
761
+ "cell_type": "code",
762
+ "execution_count": 10,
763
+ "metadata": {
764
+ "collapsed": false,
765
+ "execution": {
766
+ "iopub.execute_input": "2022-09-27T20:37:29.304919Z",
767
+ "iopub.status.busy": "2022-09-27T20:37:29.304520Z",
768
+ "iopub.status.idle": "2022-09-27T20:37:51.042987Z",
769
+ "shell.execute_reply": "2022-09-27T20:37:51.041902Z",
770
+ "shell.execute_reply.started": "2022-09-27T20:37:29.304889Z"
771
+ },
772
+ "gradient": {
773
+ "editing": false,
774
+ "execution_count": 10,
775
+ "id": "5674fda2-6f1d-447c-ac05-d21934c7fe6f",
776
+ "kernelId": ""
777
+ },
778
+ "jupyter": {
779
+ "outputs_hidden": false
780
+ }
781
+ },
782
+ "outputs": [
783
+ {
784
+ "name": "stdout",
785
+ "output_type": "stream",
786
+ "text": [
787
+ "Saved PyTorch Model State to model.pth\n"
788
+ ]
789
+ }
790
+ ],
791
+ "source": [
792
+ "torch.save(model.state_dict(), \"model.pth\")\n",
793
+ "print(\"Saved PyTorch Model State to model.pth\")"
794
+ ]
795
+ },
796
+ {
797
+ "cell_type": "markdown",
798
+ "metadata": {
799
+ "gradient": {
800
+ "editing": false,
801
+ "id": "b1e15431-85cf-4788-aa7f-5c12d77f4ac3",
802
+ "kernelId": ""
803
+ }
804
+ },
805
+ "source": [
806
+ "## Loading models\n",
807
+ "\n",
808
+ "The process for loading a model includes re-creating the model structure and loading\n",
809
+ "the state dictionary into it."
810
+ ]
811
+ },
812
+ {
813
+ "cell_type": "code",
814
+ "execution_count": 11,
815
+ "metadata": {
816
+ "collapsed": false,
817
+ "execution": {
818
+ "iopub.execute_input": "2022-09-27T20:37:51.047242Z",
819
+ "iopub.status.busy": "2022-09-27T20:37:51.046988Z",
820
+ "iopub.status.idle": "2022-09-27T20:37:51.073115Z",
821
+ "shell.execute_reply": "2022-09-27T20:37:51.072175Z",
822
+ "shell.execute_reply.started": "2022-09-27T20:37:51.047216Z"
823
+ },
824
+ "gradient": {
825
+ "editing": false,
826
+ "execution_count": 11,
827
+ "id": "ee2271cf-5092-43ad-afed-b64d2e6aea2c",
828
+ "kernelId": ""
829
+ },
830
+ "jupyter": {
831
+ "outputs_hidden": false
832
+ }
833
+ },
834
+ "outputs": [
835
+ {
836
+ "data": {
837
+ "text/plain": [
838
+ "<All keys matched successfully>"
839
+ ]
840
+ },
841
+ "execution_count": 11,
842
+ "metadata": {},
843
+ "output_type": "execute_result"
844
+ }
845
+ ],
846
+ "source": [
847
+ "model = NeuralNetwork()\n",
848
+ "model.load_state_dict(torch.load(\"model.pth\"))"
849
+ ]
850
+ },
851
+ {
852
+ "cell_type": "markdown",
853
+ "metadata": {
854
+ "gradient": {
855
+ "editing": false,
856
+ "id": "83cc12b8-fca2-4ea0-91f6-cdd8065d6164",
857
+ "kernelId": ""
858
+ }
859
+ },
860
+ "source": [
861
+ "This model can now be used to make predictions.\n",
862
+ "\n"
863
+ ]
864
+ },
865
+ {
866
+ "cell_type": "code",
867
+ "execution_count": 12,
868
+ "metadata": {
869
+ "collapsed": false,
870
+ "execution": {
871
+ "iopub.execute_input": "2022-09-27T20:37:51.076687Z",
872
+ "iopub.status.busy": "2022-09-27T20:37:51.076449Z",
873
+ "iopub.status.idle": "2022-09-27T20:37:51.108217Z",
874
+ "shell.execute_reply": "2022-09-27T20:37:51.107255Z",
875
+ "shell.execute_reply.started": "2022-09-27T20:37:51.076661Z"
876
+ },
877
+ "gradient": {
878
+ "editing": true,
879
+ "execution_count": 12,
880
+ "id": "efed4977-824f-4816-91c0-05f4e10d8b54",
881
+ "kernelId": ""
882
+ },
883
+ "jupyter": {
884
+ "outputs_hidden": false
885
+ }
886
+ },
887
+ "outputs": [
888
+ {
889
+ "name": "stdout",
890
+ "output_type": "stream",
891
+ "text": [
892
+ "Predicted: \"Ankle boot\", Actual: \"Ankle boot\"\n"
893
+ ]
894
+ }
895
+ ],
896
+ "source": [
897
+ "classes = [\n",
898
+ " \"T-shirt/top\",\n",
899
+ " \"Trouser\",\n",
900
+ " \"Pullover\",\n",
901
+ " \"Dress\",\n",
902
+ " \"Coat\",\n",
903
+ " \"Sandal\",\n",
904
+ " \"Shirt\",\n",
905
+ " \"Sneaker\",\n",
906
+ " \"Bag\",\n",
907
+ " \"Ankle boot\",\n",
908
+ "]\n",
909
+ "\n",
910
+ "model.eval()\n",
911
+ "x, y = test_data[0][0], test_data[0][1]\n",
912
+ "with torch.no_grad():\n",
913
+ " pred = model(x)\n",
914
+ " predicted, actual = classes[pred[0].argmax(0)], classes[y]\n",
915
+ " print(f'Predicted: \"{predicted}\", Actual: \"{actual}\"')"
916
+ ]
917
+ },
918
+ {
919
+ "cell_type": "markdown",
920
+ "metadata": {
921
+ "gradient": {
922
+ "editing": false,
923
+ "id": "0b064ce8-bacb-45c2-8ef3-3a45ff7ecd5a",
924
+ "kernelId": ""
925
+ }
926
+ },
927
+ "source": [
928
+ "Read more about [Saving & Loading your model](https://pytorch.org/tutorials/beginner/basics/saveloadrun_tutorial.html)."
929
+ ]
930
+ },
931
+ {
932
+ "cell_type": "markdown",
933
+ "metadata": {
934
+ "gradient": {
935
+ "editing": false,
936
+ "id": "379b3389-034a-4c17-a742-dd7c6a8281ce",
937
+ "kernelId": ""
938
+ }
939
+ },
940
+ "source": [
941
+ "## Next steps\n",
942
+ "\n",
943
+ "To proceed with PyTorch in Gradient, you can:\n",
944
+ " \n",
945
+ " - Look at other Gradient material, such as our [tutorials](https://docs.paperspace.com/gradient/tutorials/) and [blog](https://blog.paperspace.com)\n",
946
+ " - Try out further [PyTorch tutorials](https://pytorch.org/tutorials/beginner/basics/intro.html)\n",
947
+ " - Start writing your own projects, using our [documentation](https://docs.paperspace.com/gradient) when needed\n",
948
+ " \n",
949
+ "If you get stuck or need help, [contact support](https://support.paperspace.com), and we will be happy to assist.\n",
950
+ "\n",
951
+ "Good luck!"
952
+ ]
953
+ },
954
+ {
955
+ "cell_type": "markdown",
956
+ "metadata": {
957
+ "gradient": {
958
+ "editing": false,
959
+ "id": "a4d2e55f-6c65-48fe-a9e7-165931791ff2",
960
+ "kernelId": ""
961
+ }
962
+ },
963
+ "source": [
964
+ "## Original PyTorch copyright notice\n",
965
+ "\n",
966
+ "© Copyright 2021, PyTorch."
967
+ ]
968
+ }
969
+ ],
970
+ "metadata": {
971
+ "kernelspec": {
972
+ "display_name": "Python 3 (ipykernel)",
973
+ "language": "python",
974
+ "name": "python3"
975
+ },
976
+ "language_info": {
977
+ "codemirror_mode": {
978
+ "name": "ipython",
979
+ "version": 3
980
+ },
981
+ "file_extension": ".py",
982
+ "mimetype": "text/x-python",
983
+ "name": "python",
984
+ "nbconvert_exporter": "python",
985
+ "pygments_lexer": "ipython3",
986
+ "version": "3.9.13"
987
+ }
988
+ },
989
+ "nbformat": 4,
990
+ "nbformat_minor": 4
991
+ }
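A side note on the training log recorded in the notebook above: the progress counters (0, 6400, ..., 57600 out of 60000) and the first reported loss of about 2.30 both follow from simple arithmetic. The sketch below reproduces them in plain Python, without PyTorch. The helper names (`batch_sizes`, `logged_positions`, `uniform_cross_entropy`) are made up for illustration and are not part of PyTorch or the notebook. It assumes the `DataLoader` default of keeping the final partial batch (`drop_last=False`) and that an untrained 10-class classifier predicts roughly uniform probabilities.

```python
import math


def batch_sizes(num_samples: int, batch_size: int) -> list[int]:
    """Sizes of the batches a sequential loader yields when the last partial batch is kept."""
    full, rest = divmod(num_samples, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


def logged_positions(num_samples: int, batch_size: int, every: int = 100) -> list[int]:
    """Values of `batch * len(X)` for the batches where `batch % every == 0`."""
    sizes = batch_sizes(num_samples, batch_size)
    return [i * n for i, n in enumerate(sizes) if i % every == 0]


def uniform_cross_entropy(num_classes: int) -> float:
    """Cross-entropy (natural log) of a uniform prediction over `num_classes` classes."""
    return math.log(num_classes)


sizes = batch_sizes(60000, 64)
print(len(sizes), sizes[-1])         # number of training batches, size of the last one
print(logged_positions(60000, 64))   # sample counters shown in the training log
print(round(uniform_cross_entropy(10), 4))
```

With 60000 training images and a batch size of 64 this gives 938 batches, the last holding 32 images, and ln(10) is about 2.3026, which is close to the `loss: 2.304299` printed at the start of epoch 1.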
quick_start_pytorch.ipynb ADDED
@@ -0,0 +1,991 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "gradient": {
7
+ "editing": false,
8
+ "id": "a4090294-3349-4815-96f4-98010b657359",
9
+ "kernelId": ""
10
+ }
11
+ },
12
+ "source": [
13
+ "# Paperspace Gradient: PyTorch Quick Start\n",
14
+ "Last modified: Sep 27th 2022"
15
+ ]
16
+ },
17
+ {
18
+ "cell_type": "markdown",
19
+ "metadata": {
20
+ "gradient": {
21
+ "editing": false,
22
+ "id": "4936c59a-8535-43cf-a527-e9323b2b658e",
23
+ "kernelId": ""
24
+ }
25
+ },
26
+ "source": [
27
+ "## Purpose and intended audience\n",
28
+ "\n",
29
+ "This Quick Start tutorial demonstrates PyTorch usage in a Gradient Notebook. It is aimed at users who are relatively new to PyTorch, although you will need to be familiar with Python to understand PyTorch code.\n",
30
+ "\n",
31
+ "We use PyTorch to\n",
32
+ "\n",
33
+ "- Build a neural network that classifies FashionMNIST images\n",
34
+ "- Train and evaluate the network\n",
35
+ "- Save the model\n",
36
+ "- Perform predictions\n",
37
+ "\n",
38
+ "followed by some next steps that you can take to proceed with using Gradient.\n",
39
+ "\n",
40
+ "The material is based on the original [PyTorch Quick Start](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) at the time of writing this notebook.\n",
41
+ "\n",
42
+ "See the end of the notebook for the original copyright notice."
43
+ ]
44
+ },
45
+ {
46
+ "cell_type": "markdown",
47
+ "metadata": {
48
+ "gradient": {
49
+ "editing": false,
50
+ "id": "a55c3131-9437-483d-9c19-a165fbf8b6d4",
51
+ "kernelId": ""
52
+ }
53
+ },
54
+ "source": [
55
+ "## Check that you are on a GPU machine\n",
56
+ "\n",
57
+ "The notebook is designed to run on a Gradient GPU machine (as opposed to a CPU-only machine). The machine type, e.g., A4000, can be seen by clicking on the Machine icon on the left-hand navigation bar in the Gradient Notebook interface. It will say if it is CPU or GPU.\n",
58
+ "\n",
59
+ "![quick_start_pytorch_images/example_instance_type.png](quick_start_pytorch_images/example_instance_type.png)\n",
60
+ "\n",
61
+ "The *Creating models* section below also determines whether or not a GPU is available for us to use.\n",
62
+ "\n",
63
+ "If the machine type is CPU, you can change it by clicking *Stop Machine*, then clicking the displayed machine type to open a drop-down list. Select a GPU machine and start up the Notebook again.\n",
64
+ "\n",
65
+ "For help with machines, see the Gradient documentation on [machine types](https://docs.paperspace.com/gradient/machines/) or [starting a Gradient Notebook](https://docs.paperspace.com/gradient/explore-train-deploy/notebooks)."
66
+ ]
67
+ },
68
+ {
69
+ "cell_type": "markdown",
70
+ "metadata": {
71
+ "gradient": {
72
+ "editing": false,
73
+ "id": "28402a66-a8c4-4672-9592-cc530b58d439",
74
+ "kernelId": ""
75
+ }
76
+ },
77
+ "source": [
78
+ "## Working with data\n",
79
+ "\n",
80
+ "PyTorch has two [primitives to work with data](https://pytorch.org/docs/stable/data.html):\n",
81
+ "``torch.utils.data.DataLoader`` and ``torch.utils.data.Dataset``.\n",
82
+ "``Dataset`` stores the samples and their corresponding labels, and ``DataLoader`` wraps an iterable around\n",
83
+ "the ``Dataset``."
84
+ ]
85
+ },
86
+ {
87
+ "cell_type": "code",
88
+ "execution_count": 2,
89
+ "metadata": {
90
+ "collapsed": false,
91
+ "execution": {
92
+ "iopub.execute_input": "2022-09-27T20:36:04.965047Z",
93
+ "iopub.status.busy": "2022-09-27T20:36:04.964421Z",
94
+ "iopub.status.idle": "2022-09-27T20:36:06.330541Z",
95
+ "shell.execute_reply": "2022-09-27T20:36:06.329333Z",
96
+ "shell.execute_reply.started": "2022-09-27T20:36:04.965047Z"
97
+ },
98
+ "gradient": {
99
+ "editing": false,
100
+ "execution_count": 2,
101
+ "id": "2bab3caa-e156-4635-bc21-53031ebea60d",
102
+ "kernelId": ""
103
+ },
104
+ "jupyter": {
105
+ "outputs_hidden": false
106
+ }
107
+ },
108
+ "outputs": [],
109
+ "source": [
110
+ "import torch\n",
111
+ "from torch import nn\n",
112
+ "from torch.utils.data import DataLoader\n",
113
+ "from torchvision import datasets\n",
114
+ "from torchvision.transforms import ToTensor, Lambda, Compose"
115
+ ]
116
+ },
117
+ {
118
+ "cell_type": "markdown",
119
+ "metadata": {
120
+ "gradient": {
121
+ "editing": false,
122
+ "id": "0dfb0116-56cd-4795-bc5e-79baad627726",
123
+ "kernelId": ""
124
+ }
125
+ },
126
+ "source": [
127
+ "PyTorch offers domain-specific libraries such as [TorchText](https://pytorch.org/text/stable/index.html),\n",
128
+ "[TorchVision](https://pytorch.org/vision/stable/index.html), and [TorchAudio](https://pytorch.org/audio/stable/index.html),\n",
129
+ "all of which include datasets. For this tutorial, we will be using a TorchVision dataset.\n",
130
+ "\n",
131
+ "The ``torchvision.datasets`` module contains ``Dataset`` objects for many real-world vision datasets such as\n",
132
+ "CIFAR and COCO ([full list here](https://pytorch.org/vision/stable/datasets.html)). In this tutorial, we\n",
133
+ "use the FashionMNIST dataset. Every TorchVision ``Dataset`` accepts two arguments, ``transform`` and\n",
134
+ "``target_transform``, to modify the samples and labels respectively."
135
+ ]
136
+ },
137
+ {
138
+ "cell_type": "code",
139
+ "execution_count": 3,
140
+ "metadata": {
141
+ "collapsed": false,
142
+ "execution": {
143
+ "iopub.execute_input": "2022-09-27T20:36:06.332087Z",
144
+ "iopub.status.busy": "2022-09-27T20:36:06.331786Z",
145
+ "iopub.status.idle": "2022-09-27T20:36:33.429172Z",
146
+ "shell.execute_reply": "2022-09-27T20:36:33.428023Z",
147
+ "shell.execute_reply.started": "2022-09-27T20:36:06.332087Z"
148
+ },
149
+ "gradient": {
150
+ "editing": false,
151
+ "execution_count": 3,
152
+ "id": "631deddf-30f0-45f1-84ab-e5f4c510c500",
153
+ "kernelId": ""
154
+ },
155
+ "jupyter": {
156
+ "outputs_hidden": false
157
+ }
158
+ },
159
+ "outputs": [
160
+ {
161
+ "name": "stdout",
162
+ "output_type": "stream",
163
+ "text": [
164
+ "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz\n",
165
+ "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz\n"
166
+ ]
167
+ },
168
+ {
169
+ "data": {
170
+ "application/vnd.jupyter.widget-view+json": {
171
+ "model_id": "30b312514a6a4edcb608e92ecda0a385",
172
+ "version_major": 2,
173
+ "version_minor": 0
174
+ },
175
+ "text/plain": [
176
+ " 0%| | 0/26421880 [00:00<?, ?it/s]"
177
+ ]
178
+ },
179
+ "metadata": {},
180
+ "output_type": "display_data"
181
+ },
182
+ {
183
+ "name": "stdout",
184
+ "output_type": "stream",
185
+ "text": [
186
+ "Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw\n",
187
+ "\n",
188
+ "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz\n",
189
+ "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz\n"
190
+ ]
191
+ },
192
+ {
193
+ "data": {
194
+ "application/vnd.jupyter.widget-view+json": {
195
+ "model_id": "4e8ad8c0dd9d4eae8d6d3a59677e7a99",
196
+ "version_major": 2,
197
+ "version_minor": 0
198
+ },
199
+ "text/plain": [
200
+ " 0%| | 0/29515 [00:00<?, ?it/s]"
201
+ ]
202
+ },
203
+ "metadata": {},
204
+ "output_type": "display_data"
205
+ },
206
+ {
207
+ "name": "stdout",
208
+ "output_type": "stream",
209
+ "text": [
210
+ "Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw\n",
211
+ "\n",
212
+ "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz\n",
213
+ "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz\n"
214
+ ]
215
+ },
216
+ {
217
+ "data": {
218
+ "application/vnd.jupyter.widget-view+json": {
219
+ "model_id": "2465426f4de748bf955849e5bfeb5384",
220
+ "version_major": 2,
221
+ "version_minor": 0
222
+ },
223
+ "text/plain": [
224
+ " 0%| | 0/4422102 [00:00<?, ?it/s]"
225
+ ]
226
+ },
227
+ "metadata": {},
228
+ "output_type": "display_data"
229
+ },
230
+ {
231
+ "name": "stdout",
232
+ "output_type": "stream",
233
+ "text": [
234
+ "Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw\n",
235
+ "\n",
236
+ "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz\n",
237
+ "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz\n"
238
+ ]
239
+ },
240
+ {
241
+ "data": {
242
+ "application/vnd.jupyter.widget-view+json": {
243
+ "model_id": "4f090bef77ea43f49f503faf722b1e67",
244
+ "version_major": 2,
245
+ "version_minor": 0
246
+ },
247
+ "text/plain": [
248
+ " 0%| | 0/5148 [00:00<?, ?it/s]"
249
+ ]
250
+ },
251
+ "metadata": {},
252
+ "output_type": "display_data"
253
+ },
254
+ {
255
+ "name": "stdout",
256
+ "output_type": "stream",
257
+ "text": [
258
+ "Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw\n",
259
+ "\n"
260
+ ]
261
+ }
262
+ ],
263
+ "source": [
264
+ "# Download training data from open datasets\n",
265
+ "training_data = datasets.FashionMNIST(\n",
266
+ "    root=\"data\",\n",
267
+ "    train=True,\n",
268
+ "    download=True,\n",
269
+ "    transform=ToTensor(),\n",
270
+ ")\n",
271
+ "\n",
272
+ "# Download test data from open datasets\n",
273
+ "test_data = datasets.FashionMNIST(\n",
274
+ "    root=\"data\",\n",
275
+ "    train=False,\n",
276
+ "    download=True,\n",
277
+ "    transform=ToTensor(),\n",
278
+ ")"
279
+ ]
280
+ },
281
+ {
282
+ "cell_type": "markdown",
283
+ "metadata": {
284
+ "gradient": {
285
+ "editing": false,
286
+ "id": "0ace6ebf-b493-4b75-9bfa-dc48bc676b21",
287
+ "kernelId": ""
288
+ }
289
+ },
290
+ "source": [
291
+ "We pass the ``Dataset`` as an argument to ``DataLoader``. This wraps an iterable over our dataset, and supports\n",
292
+ "automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e., each element\n",
293
+ "in the dataloader iterable will return a batch of 64 features and labels."
294
+ ]
295
+ },
296
+ {
297
+ "cell_type": "code",
298
+ "execution_count": 4,
299
+ "metadata": {
300
+ "collapsed": false,
301
+ "execution": {
302
+ "iopub.execute_input": "2022-09-27T20:36:33.430736Z",
303
+ "iopub.status.busy": "2022-09-27T20:36:33.430441Z",
304
+ "iopub.status.idle": "2022-09-27T20:36:33.449430Z",
305
+ "shell.execute_reply": "2022-09-27T20:36:33.448119Z",
306
+ "shell.execute_reply.started": "2022-09-27T20:36:33.430708Z"
307
+ },
308
+ "gradient": {
309
+ "editing": false,
310
+ "execution_count": 4,
311
+ "id": "8e65f970-dce8-460c-b5f2-9cbee0c14900",
312
+ "kernelId": ""
313
+ },
314
+ "jupyter": {
315
+ "outputs_hidden": false
316
+ }
317
+ },
318
+ "outputs": [
319
+ {
320
+ "name": "stdout",
321
+ "output_type": "stream",
322
+ "text": [
323
+ "Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])\n",
324
+ "Shape of y: torch.Size([64]) torch.int64\n"
325
+ ]
326
+ }
327
+ ],
328
+ "source": [
329
+ "batch_size = 64\n",
330
+ "\n",
331
+ "# Create data loaders\n",
332
+ "train_dataloader = DataLoader(training_data, batch_size=batch_size)\n",
333
+ "test_dataloader = DataLoader(test_data, batch_size=batch_size)\n",
334
+ "\n",
335
+ "for X, y in test_dataloader:\n",
336
+ "    print(\"Shape of X [N, C, H, W]: \", X.shape)\n",
337
+ "    print(\"Shape of y: \", y.shape, y.dtype)\n",
338
+ "    break"
339
+ ]
340
+ },
341
+ {
342
+ "cell_type": "markdown",
343
+ "metadata": {
344
+ "gradient": {
345
+ "editing": false,
346
+ "id": "f9d1b1f7-0850-4676-93b6-902f78be237d",
347
+ "kernelId": ""
348
+ }
349
+ },
350
+ "source": [
351
+ "Read more about [loading data in PyTorch](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html)."
352
+ ]
353
+ },
354
+ {
355
+ "cell_type": "markdown",
356
+ "metadata": {
357
+ "gradient": {
358
+ "editing": false,
359
+ "id": "d9cc95fe-194b-4a6f-b01d-91510dfcfb00",
360
+ "kernelId": ""
361
+ }
362
+ },
363
+ "source": [
364
+ "## Creating models, including GPU\n",
365
+ "\n",
366
+ "To define a neural network in PyTorch, we create a class that inherits\n",
367
+ "from [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). We define the layers of the network\n",
368
+ "in the ``__init__`` function and specify how data will pass through the network in the ``forward`` function. To accelerate\n",
369
+ "operations in the neural network, we move it to the GPU if available."
370
+ ]
371
+ },
372
+ {
373
+ "cell_type": "code",
374
+ "execution_count": 5,
375
+ "metadata": {
376
+ "collapsed": false,
377
+ "execution": {
378
+ "iopub.execute_input": "2022-09-27T20:36:33.453700Z",
379
+ "iopub.status.busy": "2022-09-27T20:36:33.453070Z",
380
+ "iopub.status.idle": "2022-09-27T20:36:35.334541Z",
381
+ "shell.execute_reply": "2022-09-27T20:36:35.329047Z",
382
+ "shell.execute_reply.started": "2022-09-27T20:36:33.453700Z"
383
+ },
384
+ "gradient": {
385
+ "editing": false,
386
+ "execution_count": 5,
387
+ "id": "d58d5484-8ca0-4400-91c5-d0e71cf89c12",
388
+ "kernelId": ""
389
+ },
390
+ "jupyter": {
391
+ "outputs_hidden": false
392
+ }
393
+ },
394
+ "outputs": [
395
+ {
396
+ "name": "stdout",
397
+ "output_type": "stream",
398
+ "text": [
399
+ "Using cuda device\n",
400
+ "NeuralNetwork(\n",
401
+ " (flatten): Flatten(start_dim=1, end_dim=-1)\n",
402
+ " (linear_relu_stack): Sequential(\n",
403
+ " (0): Linear(in_features=784, out_features=512, bias=True)\n",
404
+ " (1): ReLU()\n",
405
+ " (2): Linear(in_features=512, out_features=512, bias=True)\n",
406
+ " (3): ReLU()\n",
407
+ " (4): Linear(in_features=512, out_features=10, bias=True)\n",
408
+ " )\n",
409
+ ")\n"
410
+ ]
411
+ }
412
+ ],
413
+ "source": [
414
+ "# Get cpu or gpu device for training\n",
415
+ "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
416
+ "print(\"Using {} device\".format(device))\n",
417
+ "\n",
418
+ "# Define model\n",
419
+ "class NeuralNetwork(nn.Module):\n",
420
+ "    def __init__(self):\n",
421
+ "        super(NeuralNetwork, self).__init__()\n",
422
+ "        self.flatten = nn.Flatten()\n",
423
+ "        self.linear_relu_stack = nn.Sequential(\n",
424
+ "            nn.Linear(28*28, 512),\n",
425
+ "            nn.ReLU(),\n",
426
+ "            nn.Linear(512, 512),\n",
427
+ "            nn.ReLU(),\n",
428
+ "            nn.Linear(512, 10)\n",
429
+ "        )\n",
430
+ "\n",
431
+ "    def forward(self, x):\n",
432
+ "        x = self.flatten(x)\n",
433
+ "        logits = self.linear_relu_stack(x)\n",
434
+ "        return logits\n",
435
+ "\n",
436
+ "model = NeuralNetwork().to(device)\n",
437
+ "print(model)"
438
+ ]
439
+ },
440
+ {
441
+ "cell_type": "markdown",
442
+ "metadata": {
443
+ "gradient": {
444
+ "editing": false,
445
+ "id": "7ee591d8-e529-481b-8107-e84454893bd2",
446
+ "kernelId": ""
447
+ }
448
+ },
449
+ "source": [
450
+ "Read more about [building neural networks in PyTorch](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)."
451
+ ]
452
+ },
453
+ {
454
+ "cell_type": "markdown",
455
+ "metadata": {
456
+ "gradient": {
457
+ "editing": false,
458
+ "id": "b6db5b4f-80b9-4f9e-8feb-76d0ef1e346f",
459
+ "kernelId": ""
460
+ }
461
+ },
462
+ "source": [
463
+ "## Optimizing the model parameters\n",
464
+ "\n",
465
+ "To train a model, we need a [loss function](https://pytorch.org/docs/stable/nn.html#loss-functions)\n",
466
+ "and an [optimizer](https://pytorch.org/docs/stable/optim.html)."
467
+ ]
468
+ },
469
+ {
470
+ "cell_type": "code",
471
+ "execution_count": 6,
472
+ "metadata": {
473
+ "collapsed": false,
474
+ "execution": {
475
+ "iopub.execute_input": "2022-09-27T20:36:35.340252Z",
476
+ "iopub.status.busy": "2022-09-27T20:36:35.339874Z",
477
+ "iopub.status.idle": "2022-09-27T20:36:35.345985Z",
478
+ "shell.execute_reply": "2022-09-27T20:36:35.344793Z",
479
+ "shell.execute_reply.started": "2022-09-27T20:36:35.340209Z"
480
+ },
481
+ "gradient": {
482
+ "editing": false,
483
+ "execution_count": 6,
484
+ "id": "8c22a532-16e0-440d-888e-d879e5f53c7c",
485
+ "kernelId": ""
486
+ },
487
+ "jupyter": {
488
+ "outputs_hidden": false
489
+ }
490
+ },
491
+ "outputs": [],
492
+ "source": [
493
+ "loss_fn = nn.CrossEntropyLoss()\n",
494
+ "optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)"
495
+ ]
496
+ },
497
+ {
498
+ "cell_type": "markdown",
499
+ "metadata": {
500
+ "gradient": {
501
+ "editing": false,
502
+ "id": "5efe3473-ecf7-411c-a13b-ba54f5c257a6",
503
+ "kernelId": ""
504
+ }
505
+ },
506
+ "source": [
507
+ "In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and\n",
508
+ "backpropagates the prediction error to adjust the model's parameters."
509
+ ]
510
+ },
511
+ {
512
+ "cell_type": "code",
513
+ "execution_count": 7,
514
+ "metadata": {
515
+ "collapsed": false,
516
+ "execution": {
517
+ "iopub.execute_input": "2022-09-27T20:36:35.350028Z",
518
+ "iopub.status.busy": "2022-09-27T20:36:35.349717Z",
519
+ "iopub.status.idle": "2022-09-27T20:36:35.357590Z",
520
+ "shell.execute_reply": "2022-09-27T20:36:35.356224Z",
521
+ "shell.execute_reply.started": "2022-09-27T20:36:35.350001Z"
522
+ },
523
+ "gradient": {
524
+ "editing": false,
525
+ "execution_count": 7,
526
+ "id": "3d1af6c1-299b-4572-902a-c5e52ce0a7d2",
527
+ "kernelId": ""
528
+ },
529
+ "jupyter": {
530
+ "outputs_hidden": false
531
+ }
532
+ },
533
+ "outputs": [],
534
+ "source": [
535
+ "def train(dataloader, model, loss_fn, optimizer):\n",
536
+ "    size = len(dataloader.dataset)\n",
537
+ "    model.train()\n",
538
+ "    for batch, (X, y) in enumerate(dataloader):\n",
539
+ "        X, y = X.to(device), y.to(device)\n",
540
+ "\n",
541
+ "        # Compute prediction error\n",
542
+ "        pred = model(X)\n",
543
+ "        loss = loss_fn(pred, y)\n",
544
+ "\n",
545
+ "        # Backpropagation\n",
546
+ "        optimizer.zero_grad()\n",
547
+ "        loss.backward()\n",
548
+ "        optimizer.step()\n",
549
+ "\n",
550
+ "        if batch % 100 == 0:\n",
551
+ "            loss, current = loss.item(), batch * len(X)\n",
552
+ "            print(f\"loss: {loss:>7f} [{current:>5d}/{size:>5d}]\")"
553
+ ]
554
+ },
555
+ {
556
+ "cell_type": "markdown",
557
+ "metadata": {
558
+ "gradient": {
559
+ "editing": false,
560
+ "id": "f86e28f0-bb94-4443-a673-f6d3461d4e94",
561
+ "kernelId": ""
562
+ }
563
+ },
564
+ "source": [
565
+ "We also check the model's performance against the test dataset to ensure it is learning."
566
+ ]
567
+ },
568
+ {
569
+ "cell_type": "code",
570
+ "execution_count": 8,
571
+ "metadata": {
572
+ "collapsed": false,
573
+ "execution": {
574
+ "iopub.execute_input": "2022-09-27T20:36:35.362383Z",
575
+ "iopub.status.busy": "2022-09-27T20:36:35.362293Z",
576
+ "iopub.status.idle": "2022-09-27T20:36:35.370320Z",
577
+ "shell.execute_reply": "2022-09-27T20:36:35.369013Z",
578
+ "shell.execute_reply.started": "2022-09-27T20:36:35.362345Z"
579
+ },
580
+ "gradient": {
581
+ "editing": false,
582
+ "execution_count": 8,
583
+ "id": "112d81e3-cdf8-4b1e-afca-6344be54f5e5",
584
+ "kernelId": ""
585
+ },
586
+ "jupyter": {
587
+ "outputs_hidden": false
588
+ }
589
+ },
590
+ "outputs": [],
591
+ "source": [
592
+ "def test(dataloader, model, loss_fn):\n",
593
+ "    size = len(dataloader.dataset)\n",
594
+ "    num_batches = len(dataloader)\n",
595
+ "    model.eval()\n",
596
+ "    test_loss, correct = 0, 0\n",
597
+ "    with torch.no_grad():\n",
598
+ "        for X, y in dataloader:\n",
599
+ "            X, y = X.to(device), y.to(device)\n",
600
+ "            pred = model(X)\n",
601
+ "            test_loss += loss_fn(pred, y).item()\n",
602
+ "            correct += (pred.argmax(1) == y).type(torch.float).sum().item()\n",
603
+ "    test_loss /= num_batches\n",
604
+ "    correct /= size\n",
605
+ "    print(f\"Test Error: \\n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \\n\")"
606
+ ]
607
+ },
608
+ {
609
+ "cell_type": "markdown",
610
+ "metadata": {
611
+ "gradient": {
612
+ "editing": false,
613
+ "id": "4e366ecc-735f-42dd-b04e-a94816b94fd8",
614
+ "kernelId": ""
615
+ }
616
+ },
617
+ "source": [
618
+ "The training process is conducted over several iterations (*epochs*). During each epoch, the model learns\n",
619
+ "parameters to make better predictions. We print the model's accuracy and loss at each epoch; we'd like to see the\n",
620
+ "accuracy increase and the loss decrease with every epoch."
621
+ ]
622
+ },
623
+ {
624
+ "cell_type": "code",
625
+ "execution_count": 9,
626
+ "metadata": {
627
+ "collapsed": false,
628
+ "execution": {
629
+ "iopub.execute_input": "2022-09-27T20:36:35.374528Z",
630
+ "iopub.status.busy": "2022-09-27T20:36:35.374285Z",
631
+ "iopub.status.idle": "2022-09-27T20:37:29.296376Z",
632
+ "shell.execute_reply": "2022-09-27T20:37:29.295164Z",
633
+ "shell.execute_reply.started": "2022-09-27T20:36:35.374502Z"
634
+ },
635
+ "gradient": {
636
+ "editing": false,
637
+ "execution_count": 9,
638
+ "id": "50bf09d9-1318-43ef-92aa-6ee308fcafa1",
639
+ "kernelId": ""
640
+ },
641
+ "jupyter": {
642
+ "outputs_hidden": false
643
+ }
644
+ },
645
+ "outputs": [
646
+ {
647
+ "name": "stdout",
648
+ "output_type": "stream",
649
+ "text": [
650
+ "Epoch 1\n",
651
+ "-------------------------------\n",
652
+ "loss: 2.304299 [ 0/60000]\n",
653
+ "loss: 2.290307 [ 6400/60000]\n",
654
+ "loss: 2.268486 [12800/60000]\n",
655
+ "loss: 2.256835 [19200/60000]\n",
656
+ "loss: 2.248106 [25600/60000]\n",
657
+ "loss: 2.217304 [32000/60000]\n",
658
+ "loss: 2.215746 [38400/60000]\n",
659
+ "loss: 2.182278 [44800/60000]\n",
660
+ "loss: 2.179303 [51200/60000]\n",
661
+ "loss: 2.150798 [57600/60000]\n",
662
+ "Test Error: \n",
663
+ " Accuracy: 55.6%, Avg loss: 2.143109 \n",
664
+ "\n",
665
+ "Epoch 2\n",
666
+ "-------------------------------\n",
667
+ "loss: 2.155640 [ 0/60000]\n",
668
+ "loss: 2.144754 [ 6400/60000]\n",
669
+ "loss: 2.083586 [12800/60000]\n",
670
+ "loss: 2.091499 [19200/60000]\n",
671
+ "loss: 2.045041 [25600/60000]\n",
672
+ "loss: 1.986636 [32000/60000]\n",
673
+ "loss: 2.002200 [38400/60000]\n",
674
+ "loss: 1.927214 [44800/60000]\n",
675
+ "loss: 1.931510 [51200/60000]\n",
676
+ "loss: 1.847673 [57600/60000]\n",
677
+ "Test Error: \n",
678
+ " Accuracy: 59.5%, Avg loss: 1.857198 \n",
679
+ "\n",
680
+ "Epoch 3\n",
681
+ "-------------------------------\n",
682
+ "loss: 1.893984 [ 0/60000]\n",
683
+ "loss: 1.863075 [ 6400/60000]\n",
684
+ "loss: 1.748540 [12800/60000]\n",
685
+ "loss: 1.779858 [19200/60000]\n",
686
+ "loss: 1.666921 [25600/60000]\n",
687
+ "loss: 1.633243 [32000/60000]\n",
688
+ "loss: 1.639619 [38400/60000]\n",
689
+ "loss: 1.551572 [44800/60000]\n",
690
+ "loss: 1.578183 [51200/60000]\n",
691
+ "loss: 1.462901 [57600/60000]\n",
692
+ "Test Error: \n",
693
+ " Accuracy: 61.7%, Avg loss: 1.489910 \n",
694
+ "\n",
695
+ "Epoch 4\n",
696
+ "-------------------------------\n",
697
+ "loss: 1.560461 [ 0/60000]\n",
698
+ "loss: 1.525511 [ 6400/60000]\n",
699
+ "loss: 1.381848 [12800/60000]\n",
700
+ "loss: 1.445225 [19200/60000]\n",
701
+ "loss: 1.320462 [25600/60000]\n",
702
+ "loss: 1.335552 [32000/60000]\n",
703
+ "loss: 1.336702 [38400/60000]\n",
704
+ "loss: 1.266305 [44800/60000]\n",
705
+ "loss: 1.303894 [51200/60000]\n",
706
+ "loss: 1.202768 [57600/60000]\n",
707
+ "Test Error: \n",
708
+ " Accuracy: 63.3%, Avg loss: 1.229126 \n",
709
+ "\n",
710
+ "Epoch 5\n",
711
+ "-------------------------------\n",
712
+ "loss: 1.309631 [ 0/60000]\n",
713
+ "loss: 1.289756 [ 6400/60000]\n",
714
+ "loss: 1.129725 [12800/60000]\n",
715
+ "loss: 1.231920 [19200/60000]\n",
716
+ "loss: 1.100483 [25600/60000]\n",
717
+ "loss: 1.141074 [32000/60000]\n",
718
+ "loss: 1.153783 [38400/60000]\n",
719
+ "loss: 1.090403 [44800/60000]\n",
720
+ "loss: 1.133582 [51200/60000]\n",
721
+ "loss: 1.050682 [57600/60000]\n",
722
+ "Test Error: \n",
723
+ " Accuracy: 64.3%, Avg loss: 1.069880 \n",
724
+ "\n",
725
+ "Done!\n"
726
+ ]
727
+ }
728
+ ],
729
+ "source": [
730
+ "epochs = 5\n",
731
+ "for t in range(epochs):\n",
732
+ "    print(f\"Epoch {t+1}\\n-------------------------------\")\n",
733
+ "    train(train_dataloader, model, loss_fn, optimizer)\n",
734
+ "    test(test_dataloader, model, loss_fn)\n",
735
+ "print(\"Done!\")"
736
+ ]
737
+ },
738
+ {
739
+ "cell_type": "markdown",
740
+ "metadata": {},
741
+ "source": [
742
+ "Read more about [Training your model](https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html)."
743
+ ]
744
+ },
745
+ {
746
+ "cell_type": "markdown",
747
+ "metadata": {
748
+ "gradient": {
749
+ "editing": false,
750
+ "id": "88e2d48b-f1c2-43b0-956d-673d31e777cc",
751
+ "kernelId": ""
752
+ }
753
+ },
754
+ "source": [
755
+ "## Saving models\n",
756
+ "\n",
757
+ "A common way to save a model is to serialize the internal state dictionary (containing the model parameters)."
758
+ ]
759
+ },
760
+ {
761
+ "cell_type": "code",
762
+ "execution_count": 10,
763
+ "metadata": {
764
+ "collapsed": false,
765
+ "execution": {
766
+ "iopub.execute_input": "2022-09-27T20:37:29.304919Z",
767
+ "iopub.status.busy": "2022-09-27T20:37:29.304520Z",
768
+ "iopub.status.idle": "2022-09-27T20:37:51.042987Z",
769
+ "shell.execute_reply": "2022-09-27T20:37:51.041902Z",
770
+ "shell.execute_reply.started": "2022-09-27T20:37:29.304889Z"
771
+ },
772
+ "gradient": {
773
+ "editing": false,
774
+ "execution_count": 10,
775
+ "id": "5674fda2-6f1d-447c-ac05-d21934c7fe6f",
776
+ "kernelId": ""
777
+ },
778
+ "jupyter": {
779
+ "outputs_hidden": false
780
+ }
781
+ },
782
+ "outputs": [
783
+ {
784
+ "name": "stdout",
785
+ "output_type": "stream",
786
+ "text": [
787
+ "Saved PyTorch Model State to model.pth\n"
788
+ ]
789
+ }
790
+ ],
791
+ "source": [
792
+ "torch.save(model.state_dict(), \"model.pth\")\n",
793
+ "print(\"Saved PyTorch Model State to model.pth\")"
794
+ ]
795
+ },
796
+ {
797
+ "cell_type": "markdown",
798
+ "metadata": {
799
+ "gradient": {
800
+ "editing": false,
801
+ "id": "b1e15431-85cf-4788-aa7f-5c12d77f4ac3",
802
+ "kernelId": ""
803
+ }
804
+ },
805
+ "source": [
806
+ "## Loading models\n",
807
+ "\n",
808
+ "The process for loading a model includes re-creating the model structure and loading\n",
809
+ "the state dictionary into it."
810
+ ]
811
+ },
812
+ {
813
+ "cell_type": "code",
814
+ "execution_count": 11,
815
+ "metadata": {
816
+ "collapsed": false,
817
+ "execution": {
818
+ "iopub.execute_input": "2022-09-27T20:37:51.047242Z",
819
+ "iopub.status.busy": "2022-09-27T20:37:51.046988Z",
820
+ "iopub.status.idle": "2022-09-27T20:37:51.073115Z",
821
+ "shell.execute_reply": "2022-09-27T20:37:51.072175Z",
822
+ "shell.execute_reply.started": "2022-09-27T20:37:51.047216Z"
823
+ },
824
+ "gradient": {
825
+ "editing": false,
826
+ "execution_count": 11,
827
+ "id": "ee2271cf-5092-43ad-afed-b64d2e6aea2c",
828
+ "kernelId": ""
829
+ },
830
+ "jupyter": {
831
+ "outputs_hidden": false
832
+ }
833
+ },
834
+ "outputs": [
835
+ {
836
+ "data": {
837
+ "text/plain": [
838
+ "<All keys matched successfully>"
839
+ ]
840
+ },
841
+ "execution_count": 11,
842
+ "metadata": {},
843
+ "output_type": "execute_result"
844
+ }
845
+ ],
846
+ "source": [
847
+ "model = NeuralNetwork()\n",
848
+ "model.load_state_dict(torch.load(\"model.pth\"))"
849
+ ]
850
+ },
851
+ {
852
+ "cell_type": "markdown",
853
+ "metadata": {
854
+ "gradient": {
855
+ "editing": false,
856
+ "id": "83cc12b8-fca2-4ea0-91f6-cdd8065d6164",
857
+ "kernelId": ""
858
+ }
859
+ },
860
+ "source": [
861
+ "This model can now be used to make predictions.\n",
862
+ "\n"
863
+ ]
864
+ },
865
+ {
866
+ "cell_type": "code",
867
+ "execution_count": 12,
868
+ "metadata": {
869
+ "collapsed": false,
870
+ "execution": {
871
+ "iopub.execute_input": "2022-09-27T20:37:51.076687Z",
872
+ "iopub.status.busy": "2022-09-27T20:37:51.076449Z",
873
+ "iopub.status.idle": "2022-09-27T20:37:51.108217Z",
874
+ "shell.execute_reply": "2022-09-27T20:37:51.107255Z",
875
+ "shell.execute_reply.started": "2022-09-27T20:37:51.076661Z"
876
+ },
877
+ "gradient": {
878
+ "editing": true,
879
+ "execution_count": 12,
880
+ "id": "efed4977-824f-4816-91c0-05f4e10d8b54",
881
+ "kernelId": ""
882
+ },
883
+ "jupyter": {
884
+ "outputs_hidden": false
885
+ }
886
+ },
887
+ "outputs": [
888
+ {
889
+ "name": "stdout",
890
+ "output_type": "stream",
891
+ "text": [
892
+ "Predicted: \"Ankle boot\", Actual: \"Ankle boot\"\n"
893
+ ]
894
+ }
895
+ ],
896
+ "source": [
897
+ "classes = [\n",
898
+ "    \"T-shirt/top\",\n",
899
+ "    \"Trouser\",\n",
900
+ "    \"Pullover\",\n",
901
+ "    \"Dress\",\n",
902
+ "    \"Coat\",\n",
903
+ "    \"Sandal\",\n",
904
+ "    \"Shirt\",\n",
905
+ "    \"Sneaker\",\n",
906
+ "    \"Bag\",\n",
907
+ "    \"Ankle boot\",\n",
908
+ "]\n",
909
+ "\n",
910
+ "model.eval()\n",
911
+ "x, y = test_data[0][0], test_data[0][1]\n",
912
+ "with torch.no_grad():\n",
913
+ "    pred = model(x)\n",
914
+ "    predicted, actual = classes[pred[0].argmax(0)], classes[y]\n",
915
+ "    print(f'Predicted: \"{predicted}\", Actual: \"{actual}\"')"
916
+ ]
917
+ },
918
+ {
919
+ "cell_type": "markdown",
920
+ "metadata": {
921
+ "gradient": {
922
+ "editing": false,
923
+ "id": "0b064ce8-bacb-45c2-8ef3-3a45ff7ecd5a",
924
+ "kernelId": ""
925
+ }
926
+ },
927
+ "source": [
928
+ "Read more about [Saving & Loading your model](https://pytorch.org/tutorials/beginner/basics/saveloadrun_tutorial.html)."
929
+ ]
930
+ },
931
+ {
932
+ "cell_type": "markdown",
933
+ "metadata": {
934
+ "gradient": {
935
+ "editing": false,
936
+ "id": "379b3389-034a-4c17-a742-dd7c6a8281ce",
937
+ "kernelId": ""
938
+ }
939
+ },
940
+ "source": [
941
+ "## Next steps\n",
942
+ "\n",
943
+ "To proceed with PyTorch in Gradient, you can:\n",
944
+ " \n",
945
+ " - Look at other Gradient material, such as our [tutorials](https://docs.paperspace.com/gradient/tutorials/) and [blog](https://blog.paperspace.com)\n",
946
+ " - Try out further [PyTorch tutorials](https://pytorch.org/tutorials/beginner/basics/intro.html)\n",
947
+ " - Start writing your own projects, using our [documentation](https://docs.paperspace.com/gradient) when needed\n",
948
+ " \n",
949
+ "If you get stuck or need help, [contact support](https://support.paperspace.com), and we will be happy to assist.\n",
950
+ "\n",
951
+ "Good luck!"
952
+ ]
953
+ },
954
+ {
955
+ "cell_type": "markdown",
956
+ "metadata": {
957
+ "gradient": {
958
+ "editing": false,
959
+ "id": "a4d2e55f-6c65-48fe-a9e7-165931791ff2",
960
+ "kernelId": ""
961
+ }
962
+ },
963
+ "source": [
964
+ "## Original PyTorch copyright notice\n",
965
+ "\n",
966
+ "© Copyright 2021, PyTorch."
967
+ ]
968
+ }
969
+ ],
970
+ "metadata": {
971
+ "kernelspec": {
972
+ "display_name": "Python 3 (ipykernel)",
973
+ "language": "python",
974
+ "name": "python3"
975
+ },
976
+ "language_info": {
977
+ "codemirror_mode": {
978
+ "name": "ipython",
979
+ "version": 3
980
+ },
981
+ "file_extension": ".py",
982
+ "mimetype": "text/x-python",
983
+ "name": "python",
984
+ "nbconvert_exporter": "python",
985
+ "pygments_lexer": "ipython3",
986
+ "version": "3.9.16"
987
+ }
988
+ },
989
+ "nbformat": 4,
990
+ "nbformat_minor": 4
991
+ }