| 1 | |
| 00:00:01,080 --> 00:00:08,280 | |
| Welcome back to 7.5, which is about pooling. This is the next sequence of layers in our CNN. So far | |
| 2 | |
| 00:00:08,490 --> 00:00:15,240 | |
| we've dealt with the convolution part and the ReLU. Now let's look at pooling, also | |
| 3 | |
| 00:00:15,420 --> 00:00:22,660 | |
| known as subsampling. Pooling, as I just said, also called subsampling or downsampling, is a | |
| 4 | |
| 00:00:22,660 --> 00:00:27,170 | |
| simple process where we reduce the size, or dimensionality, of the feature map. | |
| 5 | |
| 00:00:27,280 --> 00:00:31,690 | |
| The purpose of this reduction is to reduce the number of parameters that we need to train, whilst retaining | |
| 6 | |
| 00:00:31,690 --> 00:00:36,670 | |
| most of the important features and information in the image. | |
| 7 | |
| 00:00:36,870 --> 00:00:39,100 | |
| There are basically three types of pooling we can apply. | |
| 8 | |
| 00:00:39,100 --> 00:00:43,800 | |
| There are actually some more, but let's take a look at the three main types that are used. | |
| 9 | |
| 00:00:43,870 --> 00:00:46,250 | |
| So here's an example of Max pooling. | |
| 10 | |
| 00:00:46,300 --> 00:00:52,900 | |
| Imagine this is the ReLU output. This output here was produced from the ReLU | |
| 11 | |
| 00:00:52,940 --> 00:00:53,510 | |
| layer. | |
| 12 | |
| 00:00:53,800 --> 00:00:57,430 | |
| So you can imagine the zeros here were actually negative values. | |
| 13 | |
| 00:00:57,820 --> 00:01:03,790 | |
| So max pool basically uses a two by two kernel here. We can define the kernel size as anything we want, just | |
| 14 | |
| 00:01:03,790 --> 00:01:09,520 | |
| like we did with the stride and size of the kernels we used in the convolutional layer. Basically, using | |
| 15 | |
| 00:01:09,520 --> 00:01:10,530 | |
| a two by two. | |
| 16 | |
| 00:01:10,600 --> 00:01:15,250 | |
| It splits the input up into two by two blocks. | |
| 17 | |
| 00:01:15,580 --> 00:01:24,190 | |
| So what max pooling does is take the max value out of each two by two block, for example 167, 241, and 235, and puts them | |
| 18 | |
| 00:01:24,190 --> 00:01:25,380 | |
| into this block here. | |
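The two by two max pooling walkthrough above can be sketched in NumPy. The input matrix below is invented for illustration, with 167, 241, and 235 placed so they win their two by two blocks as in the example:

```python
import numpy as np

def max_pool_2x2(x):
    # Split the matrix into non-overlapping 2x2 blocks (stride 2, no padding)
    # and keep only the maximum of each block.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Hypothetical 4x4 ReLU output (values made up; 167, 241 and 235 placed
# to mirror the lecture's example).
relu_out = np.array([[ 12, 167,   0,  41],
                     [ 80,  20, 241,   9],
                     [  0,  13,  30,   5],
                     [235,  66,   7,   0]])

print(max_pool_2x2(relu_out))
# [[167 241]
#  [235  30]]
```

The reshape trick simply regroups the matrix into its two by two blocks, so the max is taken over each block at once.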
| 19 | |
| 00:01:25,750 --> 00:01:29,270 | |
| So this is what we call downsampling or subsampling. | |
| 20 | |
| 00:01:29,320 --> 00:01:35,440 | |
| Basically, we have sort of compressed the image here and retained the most important features, | |
| 21 | |
| 00:01:36,680 --> 00:01:37,470 | |
| actually. | |
| 22 | |
| 00:01:37,470 --> 00:01:40,160 | |
| Let's go back to the previous slide. Previously, | |
| 23 | |
| 00:01:40,210 --> 00:01:42,810 | |
| we mentioned average and sum pooling. | |
| 24 | |
| 00:01:42,850 --> 00:01:48,850 | |
| Now, as you can imagine, average pooling would simply be the average of these values here, | |
| 25 | |
| 00:01:49,120 --> 00:01:53,130 | |
| and sum pooling would just be the sum of these values. | |
| 26 | |
| 00:01:53,460 --> 00:01:55,090 | |
| So those are also ways we can use pooling. | |
| 27 | |
| 00:01:55,090 --> 00:02:01,900 | |
| However, in the majority of convolutional neural nets, we always use max pooling. | |
| 28 | |
| 00:02:01,940 --> 00:02:04,740 | |
| So this is our CNN so far, just to do a recap. | |
| 29 | |
| 00:02:04,880 --> 00:02:10,370 | |
| We have an input image with our kernel that is being slid across this image, producing multiple | |
| 30 | |
| 00:02:10,370 --> 00:02:11,380 | |
| different filters here. | |
| 31 | |
| 00:02:11,450 --> 00:02:15,920 | |
| All of these are the same size as the input image, and that's because of zero padding. | |
| 32 | |
| 00:02:16,250 --> 00:02:22,430 | |
| Then we have a ReLU output, which is basically the same size of matrix as this, except all the negative | |
| 33 | |
| 00:02:22,430 --> 00:02:23,850 | |
| values are turned into zeros. | |
| 34 | |
| 00:02:24,230 --> 00:02:30,470 | |
| And then we have the subsampling or pooling layer, also called downsampling, which basically reduces this image, | |
| 35 | |
| 00:02:30,530 --> 00:02:37,220 | |
| sorry, this matrix, by half, to 14 by 14, because as you can see, using a two by two, we have four by four | |
| 36 | |
| 00:02:37,360 --> 00:02:41,570 | |
| and we get a two by two, and there are still 12 filters. | |
| 37 | |
| 00:02:41,750 --> 00:02:44,540 | |
| However, they have now been downsampled. | |
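The recap's pipeline, convolution with zero padding, then ReLU, then two by two pooling, can be traced shape by shape. The 12 filters come from the running example; a 28x28 input is assumed (so that halving gives the 14 by 14 mentioned), and the feature-map values are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# 12 hypothetical feature maps from the convolutional layer; zero padding
# keeps each one the same 28x28 size as the input (28 is an assumption).
feature_maps = rng.normal(size=(12, 28, 28))

# ReLU output: same-sized matrices, with every negative value turned to zero.
relu_out = np.maximum(feature_maps, 0)

# 2x2 max pooling halves width and height; all 12 maps are kept.
pooled = relu_out.reshape(12, 14, 2, 14, 2).max(axis=(2, 4))

print(relu_out.shape)  # (12, 28, 28)
print(pooled.shape)    # (12, 14, 14)
```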
| 38 | |
| 00:02:44,540 --> 00:02:45,880 | |
| So let's move on now. | |
| 39 | |
| 00:02:46,310 --> 00:02:52,100 | |
| So let's talk a bit more about pooling. Typically, pooling is done using two by two windows with a stride | |
| 40 | |
| 00:02:52,100 --> 00:02:54,540 | |
| of two and no padding applied. | |
| 41 | |
| 00:02:54,560 --> 00:02:58,280 | |
| That's how we actually get this four by four here. | |
| 42 | |
| 00:02:58,280 --> 00:03:01,920 | |
| It takes a two by two, jumps two, takes another two by two, jumps two, and so on. | |
| 43 | |
| 00:03:04,060 --> 00:03:08,170 | |
| So for smaller input images or larger images we can use larger pools, | |
| 44 | |
| 00:03:09,020 --> 00:03:14,530 | |
| or smaller pools, whichever you want to do. Using the above settings, pooling has the effect of reducing | |
| 45 | |
| 00:03:14,530 --> 00:03:16,890 | |
| dimensionality, width and height. | |
| 46 | |
| 00:03:16,930 --> 00:03:18,330 | |
| Those are the only two dimensions we have. | |
| 47 | |
| 00:03:18,340 --> 00:03:22,150 | |
| We reduce the width and height of the previous layer by half, | |
| 48 | |
| 00:03:22,330 --> 00:03:26,950 | |
| and thus remove three quarters, or 75 percent, of the activations seen in the previous layer. | |
| 49 | |
| 00:03:31,290 --> 00:03:32,940 | |
| So, let's keep moving on. | |
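The arithmetic behind "half the width and height, 75 percent of the activations removed" can be checked directly, using the standard no-padding output-size formula:

```python
def pooled_size(n, window=2, stride=2):
    # Output size of pooling with no padding: floor((n - window) / stride) + 1.
    return (n - window) // stride + 1

n = 28                            # e.g. a 28x28 ReLU output (illustrative)
out = pooled_size(n)
print(out)                        # 14: width and height are halved

removed = 1 - (out * out) / (n * n)
print(removed)                    # 0.75: three quarters of the activations removed
```

With a two by two window and stride two, this halving holds for any even input size.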
| 50 | |
| 00:03:32,940 --> 00:03:39,470 | |
| This makes our model more invariant to small or minor transformations or distortions in the input image. | |
| 51 | |
| 00:03:39,570 --> 00:03:45,000 | |
| Since we're now averaging or taking the max output from a small area of an image, what this actually means | |
| 52 | |
| 00:03:45,000 --> 00:03:51,020 | |
| is that instead of looking at specific pixels in an image, because we're actually | |
| 53 | |
| 00:03:51,050 --> 00:03:57,480 | |
| downsampling and looking at the max in an area, we sort of add some sort of invariance, or spatial invariance, | |
| 54 | |
| 00:03:57,480 --> 00:03:58,150 | |
| so to speak. | |
| 55 | |
| 00:03:58,170 --> 00:04:04,800 | |
| So the filters aren't super specific to certain areas, and remember, they are being slid across the image. | |
| 56 | |
| 00:04:04,800 --> 00:04:10,410 | |
| So imagine this filter being slid across the image, looking for a specific edge or whatever; this can actually | |
| 57 | |
| 00:04:11,310 --> 00:04:13,160 | |
| add some invariance now to it. | |
| 58 | |
| 00:04:13,170 --> 00:04:20,490 | |
| So this actually increases, basically, the ability of our convolutional model to generalize to information | |
| 59 | |
| 00:04:20,490 --> 00:04:21,790 | |
| it has never seen before. | |
| 60 | |
| 00:04:23,540 --> 00:04:29,450 | |
| So now let's move on to what is kind of the final layer. There are some layers in between; we'll discuss them | |
| 61 | |
| 00:04:29,450 --> 00:04:30,250 | |
| later on. | |
| 62 | |
| 00:04:30,410 --> 00:04:35,990 | |
| But for now, this is of course our CNN, and this is the last layer, the fully connected | |
| 63 | |
| 00:04:36,030 --> 00:04:36,730 | |
| FC layer. | |