1
00:00:01,080 --> 00:00:08,280
And welcome back to 7.5, which is about pooling. This is the next sequence of layers in our CNN. So far
2
00:00:08,490 --> 00:00:15,240
we've dealt with the convolution part and the ReLU layer. Now let's look at pooling, also known
3
00:00:15,420 --> 00:00:22,660
as subsampling. Pooling, as I just said, also called subsampling or downsampling, is a
4
00:00:22,660 --> 00:00:27,170
simple process where we reduce the size or dimensionality of the feature map.
5
00:00:27,280 --> 00:00:31,690
The purpose of this reduction is to reduce the number of parameters that we need to train whilst retaining
6
00:00:31,690 --> 00:00:36,670
most of the important features and information in the image.
7
00:00:36,870 --> 00:00:39,100
There are basically three types of pooling we can apply.
8
00:00:39,100 --> 00:00:43,800
There are actually some more, but let's take a look at these three main types that are used.
9
00:00:43,870 --> 00:00:46,250
So here's an example of Max pooling.
10
00:00:46,300 --> 00:00:52,900
Imagine these are the ReLU outputs; this output here was produced from the ReLU
11
00:00:52,940 --> 00:00:53,510
layer.
12
00:00:53,800 --> 00:00:57,430
So you can imagine the zeros here were actually negative values.
13
00:00:57,820 --> 00:01:03,790
So max pool basically uses a two by two kernel here; we can define the kernel size to be anything we want, just
14
00:01:03,790 --> 00:01:09,520
like we did with the stride and the kernels we used in the convolutional layer. Basically, using
15
00:01:09,520 --> 00:01:10,530
a two by two,
16
00:01:10,600 --> 00:01:15,250
it splits the input up into a grid of two by two blocks.
17
00:01:15,580 --> 00:01:24,190
So what max pooling does is take the maximum value out of each two by two area, like 167, 241 and 235, and puts them
18
00:01:24,190 --> 00:01:25,380
into this block here.
19
00:01:25,750 --> 00:01:29,270
So this is what we call downsampling or subsampling.
20
00:01:29,320 --> 00:01:35,440
Basically we have sort of compressed the image here and retained the most important features,
21
00:01:36,680 --> 00:01:37,470
actually.
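[Note: the max-pool step just described can be sketched in NumPy. This is a minimal illustration; the 4x4 input values below are made up, not the exact grid on the slide.]

```python
import numpy as np

# A made-up 4x4 ReLU output (negatives already clipped to zero).
relu_out = np.array([
    [0,  167,  0,  12],
    [45,   0, 33,   0],
    [0,  241,  0, 235],
    [80,   0, 19,   0],
])

def max_pool_2x2(x):
    """Split x into non-overlapping 2x2 blocks and keep the max of each."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

pooled = max_pool_2x2(relu_out)
print(pooled)  # [[167  33] [241 235]] -- one max per 2x2 block
```

[The reshape trick assumes the height and width are even; real frameworks handle odd sizes by cropping or padding.]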
22
00:01:37,470 --> 00:01:40,160
Let's go back to the previous slide. Previously,
23
00:01:40,210 --> 00:01:42,810
we mentioned average and sum pooling.
24
00:01:42,850 --> 00:01:48,850
Now as you can imagine, average pooling would simply be the average of these values here, here,
25
00:01:49,120 --> 00:01:53,130
here and here, and sum pooling would just be the sum of these values.
26
00:01:53,460 --> 00:01:55,090
So those are also ways we can use pooling.
27
00:01:55,090 --> 00:02:01,900
However, in the majority of convolutional neural nets we almost always use max pooling.
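[Note: average and sum pooling only change the reduction applied to each 2x2 block, which a small NumPy sketch makes concrete. The input values here are made up for illustration.]

```python
import numpy as np

x = np.array([
    [1., 3., 2., 4.],
    [5., 7., 6., 8.],
    [4., 2., 3., 1.],
    [8., 6., 7., 5.],
])

def pool_2x2(x, reduce):
    """Apply `reduce` (np.max, np.mean, or np.sum) over each 2x2 block."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return reduce(blocks, axis=(1, 3))

print(pool_2x2(x, np.max))   # max pooling
print(pool_2x2(x, np.mean))  # average pooling: mean of each block
print(pool_2x2(x, np.sum))   # sum pooling: sum of each block
```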
28
00:02:01,940 --> 00:02:04,740
So just to do a recap of what we have so far:
29
00:02:04,880 --> 00:02:10,370
we have an input image with our kernel that is being slid across this image, producing multiple
30
00:02:10,370 --> 00:02:11,380
different filters here.
31
00:02:11,450 --> 00:02:15,920
All of them are the same size as the input image, and that's because of zero padding.
32
00:02:16,250 --> 00:02:22,430
Then we have the ReLU output, which is basically the same size of matrix as this, except all the negative
33
00:02:22,430 --> 00:02:23,850
values are turned into zeros.
34
00:02:24,230 --> 00:02:30,470
And then we have the subsampling or pooling layer, also called downsampling, which basically reduces this image.
35
00:02:30,530 --> 00:02:37,220
Sorry, this matrix, by half to 14 by 14, because as you can see, using a two by two on a four by four
36
00:02:37,360 --> 00:02:41,570
we get a two by two, and there are still 12 filters.
37
00:02:41,750 --> 00:02:44,540
However, they have now been downsampled.
38
00:02:44,540 --> 00:02:45,880
So let's move on now.
39
00:02:46,310 --> 00:02:52,100
So let's talk a bit more about pooling. Typically, pooling is done using two by two windows with a stride
40
00:02:52,100 --> 00:02:54,540
of two and no padding applied.
41
00:02:54,560 --> 00:02:58,280
That's how we actually get this four by four here.
42
00:02:58,280 --> 00:03:01,920
It takes a two by two block, makes a jump of two, and so on.
43
00:03:04,060 --> 00:03:08,170
So for smaller input images or larger images we can use larger pools
44
00:03:09,020 --> 00:03:14,530
or smaller pools, whichever you want to do. Using the above settings, pooling has the effect of reducing
45
00:03:14,530 --> 00:03:16,890
dimensionality, width and height.
46
00:03:16,930 --> 00:03:18,330
Those are the only two dimensions we have.
47
00:03:18,340 --> 00:03:22,150
We reduce the size of the previous layer by half,
48
00:03:22,330 --> 00:03:26,950
thus removing three quarters, or 75 percent, of the activations seen in the previous layer.
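[Note: the arithmetic for a 2x2 window with stride 2 and no padding can be checked directly. A sketch; the 28x28 input size is assumed as an example to match the 14 by 14 mentioned earlier.]

```python
def pooled_size(n, window=2, stride=2):
    """Output size along one dimension for pooling with no padding."""
    return (n - window) // stride + 1

h = w = 28
ph, pw = pooled_size(h), pooled_size(w)
print(ph, pw)              # 14 14: width and height are halved
removed = 1 - (ph * pw) / (h * w)
print(removed)             # 0.75: three quarters of the activations are gone
```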
49
00:03:31,290 --> 00:03:32,940
So, moving on.
50
00:03:32,940 --> 00:03:39,470
This makes our model more invariant to small or minor transformations or distortions in our input image.
51
00:03:39,570 --> 00:03:45,000
Since we're now averaging, or taking the max output, from a small area of an image, what this actually means
52
00:03:45,000 --> 00:03:51,020
is that instead of looking at specific pixels in an image, because we're actually
53
00:03:51,050 --> 00:03:57,480
downsampling and looking at a max in an area, we sort of add some invariance, or spatial invariance,
54
00:03:57,480 --> 00:03:58,150
to our filters, so to speak.
55
00:03:58,170 --> 00:04:04,800
So the filters aren't super specific to certain areas, and remember, they're being slid across the image.
56
00:04:04,800 --> 00:04:10,410
So imagine this filter being slid across the image looking for a specific edge or whatever; we can actually
57
00:04:11,310 --> 00:04:13,160
add some invariance now to it.
58
00:04:13,170 --> 00:04:20,490
So this basically increases the ability of our convolutional model to generalize to information
59
00:04:20,490 --> 00:04:21,790
it has never seen before.
60
00:04:23,540 --> 00:04:29,450
So now let's move on to what is, kind of, the final layer; there are some layers in between, and we'll discuss them
61
00:04:29,450 --> 00:04:30,250
later on.
62
00:04:30,410 --> 00:04:35,990
But for now, this is of course the CNN, and this is the last layer, the fully connected
63
00:04:36,030 --> 00:04:36,730
or FC layer.