| 1 | |
| 00:00:00,900 --> 00:00:04,640 | |
| So let's do a quick recap of the last section. | |
| 2 | |
| 00:00:05,070 --> 00:00:05,520 | |
| All right. | |
| 3 | |
| 00:00:05,580 --> 00:00:11,550 | |
| We saw basically how the bias trick makes this matrix calculation much | |
| 4 | |
| 00:00:11,550 --> 00:00:12,850 | |
| simpler and faster. | |
| 5 | |
| 00:00:12,890 --> 00:00:16,630 | |
| And we talked about how we do the weight calculations here. | |
| 6 | |
| 00:00:17,010 --> 00:00:23,190 | |
| And we talked about it basically being a series of linear equations; note that Wx + b is linear. | |
| 7 | |
| 00:00:23,240 --> 00:00:30,390 | |
| Now if you come from a machine learning background, you're wondering what makes this different from regressions | |
| 8 | |
| 00:00:30,450 --> 00:00:35,430 | |
| or logistic regressions or SVMs, which are also a series of linear equations. | |
| 9 | |
| 00:00:35,430 --> 00:00:38,450 | |
| Well let's get into that in this chapter right now. | |
| 10 | |
| 00:00:40,230 --> 00:00:42,940 | |
| So now let's talk about activation functions. | |
| 11 | |
| 00:00:43,230 --> 00:00:51,350 | |
| So you may have noticed that it was just the input value multiplied by the weights plus | |
| 12 | |
| 00:00:51,360 --> 00:00:52,480 | |
| the bias. | |
| 13 | |
| 00:00:52,500 --> 00:00:56,370 | |
| And the sum of those going into the node is what produces the output. | |
| 14 | |
| 00:00:56,370 --> 00:01:06,180 | |
| However, in reality, yes, that is true, but this Wx plus b is actually passed into an | |
| 15 | |
| 00:01:06,180 --> 00:01:07,670 | |
| activation function. | |
| 16 | |
| 00:01:07,980 --> 00:01:14,190 | |
| And basically the simplest type of activation function is | |
| 17 | |
| 00:01:14,190 --> 00:01:16,740 | |
| this one here: the max of zero | |
| 18 | |
| 00:01:16,830 --> 00:01:17,560 | |
| and x, that is, max(0, x). | |
| 19 | |
| 00:01:17,940 --> 00:01:23,460 | |
| What this means is that any value over zero is going to be passed through. | |
| 20 | |
| 00:01:23,550 --> 00:01:24,470 | |
| Right. | |
| 21 | |
| 00:01:24,870 --> 00:01:26,660 | |
| Now I'll tell you why this is important later. | |
| 22 | |
| 00:01:26,700 --> 00:01:29,000 | |
| But just imagine we have a max function here. | |
| 23 | |
| 00:01:29,410 --> 00:01:31,590 | |
| And we have these weights going into it. | |
| 24 | |
| 00:01:31,620 --> 00:01:34,480 | |
| So let's see what we get with these weights. | |
| 25 | |
| 00:01:34,490 --> 00:01:35,880 | |
| What does it do? | |
| 26 | |
| 00:01:36,320 --> 00:01:41,520 | |
| Let's say it gives us 0.75. This max function clamps at, again, | |
| 27 | |
| 00:01:41,520 --> 00:01:46,090 | |
| zero which means that anything below zero will be negated. | |
| 28 | |
| 00:01:46,470 --> 00:01:47,570 | |
| And you see that here. | |
| 29 | |
| 00:01:47,670 --> 00:01:49,610 | |
| However anything above zero will be allowed. | |
| 30 | |
| 00:01:49,710 --> 00:01:53,250 | |
| So in this example here 0.75 is allowed. | |
| 31 | |
| 00:01:53,250 --> 00:02:00,420 | |
| However if the weights give us a negative value, which can happen easily, then the x is clamped at zero. | |
| 32 | |
| 00:02:00,750 --> 00:02:01,760 | |
| It isn't -6.5. | |
| 33 | |
| 00:02:01,770 --> 00:02:03,310 | |
| It doesn't take any positive value. | |
| 34 | |
| 00:02:03,440 --> 00:02:04,790 | |
| It just becomes zero | |
| 35 | |
| 00:02:05,010 --> 00:02:08,430 | |
| with this activation function. | |
| 36 | |
| 00:02:08,530 --> 00:02:11,320 | |
| So let's talk about the ReLU activation function. | |
| 37 | |
| 00:02:11,580 --> 00:02:17,780 | |
| ReLU stands for rectified linear unit, and it is probably the most common activation function used in | |
| 38 | |
| 00:02:17,800 --> 00:02:21,370 | |
| CNNs and neural nets. Now its appearance is quite simple. | |
| 39 | |
| 00:02:21,410 --> 00:02:23,360 | |
| You just take a look at it. | |
| 40 | |
| 00:02:23,420 --> 00:02:24,840 | |
| Basically it's this. | |
| 41 | |
| 00:02:24,970 --> 00:02:27,180 | |
| It clamps a negative value to zero. | |
| 42 | |
| 00:02:27,560 --> 00:02:32,950 | |
| So no negative value is passed going forward, and a positive value is basically left alone. | |
| 43 | |
| 00:02:33,410 --> 00:02:37,200 | |
| So the value passed on is x, which is always positive. | |
| 44 | |
| 00:02:37,200 --> 00:02:42,660 | |
| So it's basically this: ReLU. | |
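To make the max(0, x) behaviour described above concrete, here is a minimal sketch in Python; the input, weight, and bias values are made up for illustration:

```python
def relu(x):
    # ReLU clamps negative values to zero and leaves positive values alone.
    return max(0.0, x)

# A node computes a weighted sum of its inputs plus a bias,
# then passes the result into the activation function.
inputs = [1.0, 2.0]
weights = [0.5, -0.1]   # hypothetical weights
bias = -0.05            # hypothetical bias

z = sum(w * x for w, x in zip(weights, inputs)) + bias  # 0.5 - 0.2 - 0.05 = 0.25
print(relu(z))     # positive value passes through unchanged
print(relu(-0.5))  # negative value is clamped to zero
```

So a positive weighted sum comes out unchanged, while any negative weighted sum becomes zero, exactly the clamping behaviour the lecture describes.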
| 45 | |
| 00:02:42,730 --> 00:02:45,520 | |
| So why do we need these activation functions? | |
| 46 | |
| 00:02:45,580 --> 00:02:47,550 | |
| Remember I said, if you're | |
| 47 | |
| 00:02:47,650 --> 00:02:54,040 | |
| coming from a machine learning background, you're thinking that so far, without activation functions, this is basically | |
| 48 | |
| 00:02:54,190 --> 00:02:58,880 | |
| just another form of a linear model. | |
| 49 | |
| 00:02:59,000 --> 00:03:06,820 | |
| However having activation functions allows us to basically express the non-linear relationships in | |
| 50 | |
| 00:03:06,820 --> 00:03:07,700 | |
| this data. | |
| 51 | |
| 00:03:08,200 --> 00:03:11,580 | |
| Most linear models have a tough time dealing with this. | |
| 52 | |
| 00:03:11,650 --> 00:03:12,070 | |
| All right. | |
| 53 | |
| 00:03:12,070 --> 00:03:18,870 | |
| So the huge advantage of deep learning is its ability to understand non-linear models. | |
| 54 | |
| 00:03:19,480 --> 00:03:21,360 | |
| And what do we mean by non-linear? | |
| 55 | |
| 00:03:21,360 --> 00:03:24,100 | |
| Now imagine we have two categories of data. | |
| 56 | |
| 00:03:24,140 --> 00:03:33,980 | |
| Imagine, just for illustration, this is cats and dogs, and linearly separable data. | |
| 57 | |
| 00:03:34,110 --> 00:03:40,970 | |
| This is called the decision boundary; it will easily separate the data here. However the non-linear, I mean linearly. | |
| 58 | |
| 00:03:40,990 --> 00:03:42,020 | |
| Sorry about that. | |
| 59 | |
| 00:03:42,550 --> 00:03:48,100 | |
| Non-linearly separable data is going to require a much more complicated function. | |
| 60 | |
| 00:03:48,490 --> 00:03:49,360 | |
| Now we can use. | |
| 61 | |
| 00:03:49,360 --> 00:03:55,420 | |
| You may have been thinking we can use complicated polynomial decision-boundary-type functions to get | |
| 62 | |
| 00:03:55,420 --> 00:03:58,440 | |
| this, but what if it was more than two dimensions? | |
| 63 | |
| 00:03:58,450 --> 00:04:01,520 | |
| Imagine how complicated that function is going to be. | |
| 64 | |
| 00:04:01,840 --> 00:04:09,100 | |
| That's going to separate this data. And introducing activation functions, which allow us to basically express | |
| 65 | |
| 00:04:09,100 --> 00:04:13,810 | |
| a non-linear relationship in the data, is what makes this so powerful. | |
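The claim that a network without activation functions is just another linear model can be checked numerically. This sketch, using arbitrary random weight matrices, shows that two stacked linear layers collapse into a single linear map, while a ReLU in between is what introduces the non-linearity:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))  # hypothetical first-layer weights
W2 = rng.normal(size=(1, 3))  # hypothetical second-layer weights
x = rng.normal(size=(2,))     # a random input vector

# Two linear layers stacked with no activation in between...
two_layers = W2 @ (W1 @ x)
# ...are exactly equivalent to one linear layer with weights W2 @ W1.
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layers, one_layer))  # True: no extra expressive power

# Putting a ReLU between the layers breaks this collapse,
# so the network can now represent non-linear functions of x.
relu_between = W2 @ np.maximum(0.0, W1 @ x)
```

However many purely linear layers you stack, the whole network stays one matrix multiplication; the activation function in between is what prevents the collapse.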
| 66 | |
| 00:04:14,440 --> 00:04:15,590 | |
| So there are a few other types of | |
| 67 | |
| 00:04:15,600 --> 00:04:21,390 | |
| activation functions, some of which we may use in this course. We definitely use rectified linear units | |
| 68 | |
| 00:04:21,400 --> 00:04:22,530 | |
| quite a bit. | |
| 69 | |
| 00:04:22,610 --> 00:04:29,230 | |
| Sometimes we use sigmoid, sometimes hyperbolic tangent, but we rarely ever use those, and we use them | |
| 70 | |
| 00:04:29,560 --> 00:04:32,040 | |
| in specialized cases which you'll see later on. | |
| 71 | |
| 00:04:35,180 --> 00:04:42,930 | |
| So as mentioned in the previous chapter, we of course also have the bias in these functions. | |
| 72 | |
| 00:04:42,930 --> 00:04:45,640 | |
| So why do we need bias terms? | |
| 73 | |
| 00:04:45,770 --> 00:04:52,670 | |
| We haven't covered training yet, so you may be thinking we can pretty much make the weights anything. | |
| 74 | |
| 00:04:53,040 --> 00:04:54,940 | |
| However that doesn't change much. | |
| 75 | |
| 00:04:55,110 --> 00:04:57,600 | |
| It simply changes the gradient. | |
| 76 | |
| 00:04:57,600 --> 00:05:04,260 | |
| If you're from a mathematical background, you would know any value multiplied by x increases the steepness | |
| 77 | |
| 00:05:04,380 --> 00:05:05,940 | |
| of the curve for the values going in. | |
| 78 | |
| 00:05:06,000 --> 00:05:07,170 | |
| That's what this is here. | |
| 79 | |
| 00:05:07,230 --> 00:05:13,280 | |
| So you can see these weights, 0.5, 1, and 2, on the sigmoid function here. | |
| 80 | |
| 00:05:13,770 --> 00:05:16,100 | |
| So you can actually see 0.5. | |
| 81 | |
| 00:05:16,100 --> 00:05:17,510 | |
| It's not that steep. | |
| 82 | |
| 00:05:17,750 --> 00:05:21,620 | |
| 1 gets steeper, and definitely at 2 it's super steep here. | |
| 83 | |
| 00:05:22,230 --> 00:05:30,280 | |
| However, what if we wanted to shift it left and right? We can't do that by changing the weight | |
| 84 | |
| 00:05:30,420 --> 00:05:31,530 | |
| in front of the x. | |
| 85 | |
| 00:05:31,800 --> 00:05:37,320 | |
| But we can do it by adding a value to it. Adding a value actually shifts it left and right here, as | |
| 86 | |
| 00:05:37,320 --> 00:05:38,550 | |
| you can see. | |
| 87 | |
| 00:05:39,210 --> 00:05:46,830 | |
| So now we have the weights here, constant weights, going into x, plus a constant value here: 5, minus 1, 0, | |
| 88 | |
| 00:05:47,190 --> 00:05:48,070 | |
| 5. | |
| 89 | |
| 00:05:48,180 --> 00:05:54,280 | |
| So we can actually use weights as well as bias values to actually shift it left and right. | |
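The effect of weights and biases on the sigmoid described here can be sketched as follows; the specific weight and bias values are illustrative, chosen to echo the ones discussed above:

```python
import math

def sigmoid(z):
    # Standard logistic sigmoid: squashes any real value into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def activation(x, w, b):
    # The weight w controls steepness; the bias b shifts the curve left/right.
    return sigmoid(w * x + b)

# Larger weights make the curve steeper around its midpoint.
for w in (0.5, 1.0, 2.0):
    print(w, activation(1.0, w, 0.0))

# A bias shifts where the curve crosses 0.5:
# with w = 1 and b = 5, the midpoint moves from x = 0 to x = -5.
print(activation(-5.0, 1.0, 5.0))  # 0.5
```

With no bias the sigmoid is always centred at x = 0 no matter the weight, which is exactly why the bias term is needed to shift the curve.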
| 90 | |
| 00:05:55,740 --> 00:06:01,990 | |
| So let's talk a bit about the neuron inspiration behind neural nets. By the way, | |
| 91 | |
| 00:06:02,000 --> 00:06:03,680 | |
| why is it called a neural net in the first place? | |
| 92 | |
| 00:06:03,690 --> 00:06:05,960 | |
| Is it a replication of the brain? | |
| 93 | |
| 00:06:05,970 --> 00:06:06,840 | |
| Not quite. | |
| 94 | |
| 00:06:06,840 --> 00:06:15,360 | |
| However activation units basically make these hidden units, these nodes, act like neurons, because | |
| 95 | |
| 00:06:16,020 --> 00:06:22,950 | |
| if you know how neurons work, neurons receive inputs along these lines here, called dendrites I | |
| 96 | |
| 00:06:22,950 --> 00:06:23,500 | |
| think. | |
| 97 | |
| 00:06:23,940 --> 00:06:29,550 | |
| And basically when these inputs hit a certain threshold value, the neuron fires, and a neuron is connected to many | |
| 98 | |
| 00:06:29,700 --> 00:06:35,250 | |
| different neurons, which then receive these inputs from this neuron. That is a very similar analogy to | |
| 99 | |
| 00:06:35,250 --> 00:06:40,980 | |
| how neural nets work: when an input value crosses that threshold, it fires. | |
| 100 | |
| 00:06:43,510 --> 00:06:48,300 | |
| Now let's talk about the deep in deep learning: what do we mean by deep? | |
| 101 | |
| 00:06:48,640 --> 00:06:53,770 | |
| Everyone says deep learning, but no one actually tells you what deep is. Deep is actually the number | |
| 102 | |
| 00:06:53,770 --> 00:06:55,290 | |
| of hidden layers here. | |
| 103 | |
| 00:06:55,750 --> 00:07:00,710 | |
| So this is a nice example illustration of a neural net here. | |
| 104 | |
| 00:07:01,210 --> 00:07:03,970 | |
| These are the connections with our weights and biases. | |
| 105 | |
| 00:07:03,970 --> 00:07:10,540 | |
| These are the nodes here, and this one has one, two, three, four, five, six, seven, eight hidden layers here. | |
| 106 | |
| 00:07:10,940 --> 00:07:12,800 | |
| As you can probably count, this one does too. | |
| 107 | |
| 00:07:12,850 --> 00:07:14,320 | |
| Sometimes they do. | |
| 108 | |
| 00:07:14,410 --> 00:07:17,530 | |
| So these are all the hidden layers here that we don't see. | |
| 109 | |
| 00:07:17,530 --> 00:07:23,650 | |
| And imagine now we have millions, or hundreds of thousands, of these parameters and all the stuff | |
| 110 | |
| 00:07:23,650 --> 00:07:29,850 | |
| here. That gives you an example of how complicated neural nets can actually get, especially deep neural | |
| 111 | |
| 00:07:29,870 --> 00:07:31,660 | |
| nets. | |
| 112 | |
| 00:07:31,990 --> 00:07:40,210 | |
| Now, when do we need to be deep? Depth actually helps to represent non-linear relationships in the data | |
| 113 | |
| 00:07:40,270 --> 00:07:41,390 | |
| quite well. | |
| 114 | |
| 00:07:41,800 --> 00:07:49,070 | |
| So if you want to actually train a complicated network it's always better to go deep rather than shallow. | |
| 115 | |
| 00:07:49,600 --> 00:07:55,780 | |
| However deeper is not always easier and better to work with. Deeper networks require a lot more training | |
| 116 | |
| 00:07:55,780 --> 00:07:56,430 | |
| time. | |
| 117 | |
| 00:07:56,680 --> 00:07:59,700 | |
| And sometimes they tend to overfit the input data. | |
| 118 | |
| 00:07:59,710 --> 00:08:02,720 | |
| Now, I haven't introduced the concept of overfitting yet. | |
| 119 | |
| 00:08:02,950 --> 00:08:05,320 | |
| However we will deal with it very shortly. | |
| 120 | |
| 00:08:07,690 --> 00:08:13,300 | |
| So like I said, the real magic of a neural net happens during training, because we saw that it's very | |
| 121 | |
| 00:08:13,300 --> 00:08:18,280 | |
| simple to execute a trained neural net once you have it. When I say execute, I mean | |
| 122 | |
| 00:08:18,670 --> 00:08:19,810 | |
| pass input through it and get the output. | |
| 123 | |
| 00:08:19,840 --> 00:08:23,440 | |
| But how exactly do we get these weights and biases? | |
| 124 | |
| 00:08:23,440 --> 00:08:25,630 | |
| This is exactly what we want to do. | |
| 125 | |
| 00:08:26,020 --> 00:08:28,820 | |
| And I haven't told you about it yet so let's find out. | |