AI_DL_Assignment / 6. Neural Networks Explained /5. Activation Functions.srt
1
00:00:00,900 --> 00:00:04,640
So let's just do a quick recap of the last section.
2
00:00:05,070 --> 00:00:05,520
All right.
3
00:00:05,580 --> 00:00:11,550
We basically saw that we learned the bias trick, and how it makes the matrix calculation much
4
00:00:11,550 --> 00:00:12,850
simpler and faster.
5
00:00:12,890 --> 00:00:16,630
And we talked about how we arrived at the weights calculation here.
6
00:00:17,010 --> 00:00:23,190
And we talked about it basically being a series of linear equations, so we know that the output is linear.
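A minimal NumPy sketch of the bias trick recapped above: appending a 1 to the input and folding the bias into the weight matrix turns W x + b into a single matrix multiplication. The array values here are made up for illustration.

```python
import numpy as np

# Bias trick: append a constant 1 to the input and fold b into W,
# so W @ x + b becomes a single matrix multiplication W_aug @ x_aug.
x = np.array([0.2, -1.3, 0.7])          # example input (made-up values)
W = np.array([[0.5, -0.1, 0.8],
              [1.2,  0.3, -0.6]])        # 2 output nodes, 3 inputs
b = np.array([0.1, -0.4])

out_plain = W @ x + b                    # standard form

x_aug = np.append(x, 1.0)                # input with a 1 appended
W_aug = np.hstack([W, b[:, None]])       # bias folded in as an extra column
out_trick = W_aug @ x_aug                # same result, one operation

assert np.allclose(out_plain, out_trick)
```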
7
00:00:23,240 --> 00:00:30,390
Now if you come from a machine learning background, you're wondering what makes this different from regressions
8
00:00:30,450 --> 00:00:35,430
or logistic regressions or SVMs, which are also a series of linear equations.
9
00:00:35,430 --> 00:00:38,450
Well let's get into that in this chapter right now.
10
00:00:40,230 --> 00:00:42,940
So now let's talk about activation functions.
11
00:00:43,230 --> 00:00:51,350
So if you were paying attention earlier, you would notice that it was just the input value multiplied by the weights, plus
12
00:00:51,360 --> 00:00:52,480
the bias.
13
00:00:52,500 --> 00:00:56,370
And the sum of those going into the node is what produces the output.
14
00:00:56,370 --> 00:01:06,180
However, in reality, yes that is true, but this w times x plus b is actually passed into an
15
00:01:06,180 --> 00:01:07,670
activation function.
16
00:01:07,980 --> 00:01:14,190
And basically the simplest type of activation function is
17
00:01:14,190 --> 00:01:16,740
this one here, max of zero
18
00:01:16,830 --> 00:01:17,560
and x: max(0, x).
19
00:01:17,940 --> 00:01:23,460
What this means is that any value over zero is going to be passed through.
20
00:01:23,550 --> 00:01:24,470
Right.
21
00:01:24,870 --> 00:01:26,660
Now I'll tell you why this is important later.
22
00:01:26,700 --> 00:01:29,000
But just imagine we have a max function here.
23
00:01:29,410 --> 00:01:31,590
And we have these weights going into it.
24
00:01:31,620 --> 00:01:34,480
So let's see what we get with these weights going in.
25
00:01:34,490 --> 00:01:35,880
What is it going to be?
26
00:01:36,320 --> 00:01:41,520
Let's assume we got, say, 0.75. This max function, again, clamps at
27
00:01:41,520 --> 00:01:46,090
zero, which means that anything below zero will be negated.
28
00:01:46,470 --> 00:01:47,570
And you see that here.
29
00:01:47,670 --> 00:01:49,610
However, anything above zero will be allowed through.
30
00:01:49,710 --> 00:01:53,250
So in this example here, 0.75 is allowed through.
31
00:01:53,250 --> 00:02:00,420
However, if the weights give us a negative value, which can happen easily, then it is clamped at zero.
32
00:02:00,750 --> 00:02:01,760
It doesn't stay at that negative value,
33
00:02:01,770 --> 00:02:03,310
and it doesn't take on any positive value either.
34
00:02:03,440 --> 00:02:04,790
It just becomes zero,
35
00:02:05,010 --> 00:02:08,430
with this activation function.
36
00:02:08,530 --> 00:02:11,320
So let's talk about the ReLU activation function.
37
00:02:11,580 --> 00:02:17,780
ReLU stands for rectified linear unit, and it is probably the most common activation function used in
38
00:02:17,800 --> 00:02:21,370
CNNs and neural nets. Now, the appearance is quite simple.
39
00:02:21,410 --> 00:02:23,360
You just take a look at it.
40
00:02:23,420 --> 00:02:24,840
Basically it's this.
41
00:02:24,970 --> 00:02:27,180
It clamps a negative value to zero.
42
00:02:27,560 --> 00:02:32,950
So no negative value is passed going forward, and any positive value is basically left alone.
43
00:02:33,410 --> 00:02:37,200
So for positive inputs, the output is just x itself.
44
00:02:37,200 --> 00:02:42,660
So it's basically this: ReLU.
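A minimal NumPy sketch of the max(0, x) idea described above; the 0.75 and negative inputs mirror the example on the slide.

```python
import numpy as np

def relu(x):
    # max(0, x): negative values are clamped to zero, positives pass through
    return np.maximum(0.0, x)

print(relu(0.75))    # 0.75 -> passed through unchanged
print(relu(-0.75))   # negative -> clamped to 0.0
print(relu(np.array([-2.0, 0.0, 3.1])))  # works elementwise on arrays
```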
45
00:02:42,730 --> 00:02:45,520
So why do we need these activation functions?
46
00:02:45,580 --> 00:02:47,550
Remember I said, if you're
47
00:02:47,650 --> 00:02:54,040
coming from a machine learning background, you might be thinking that so far, without activation functions, this is basically
48
00:02:54,190 --> 00:02:58,880
just another form of linear model.
49
00:02:59,000 --> 00:03:06,820
However, having activation functions allows us to basically express non-linear relationships in
50
00:03:06,820 --> 00:03:07,700
this data.
51
00:03:08,200 --> 00:03:11,580
Most linear models have a tough time dealing with this.
52
00:03:11,650 --> 00:03:12,070
All right.
53
00:03:12,070 --> 00:03:18,870
So the whole huge advantage of deep learning is its ability to understand non-linear relationships.
54
00:03:19,480 --> 00:03:21,360
And what do we mean by non-linear?
55
00:03:21,360 --> 00:03:24,100
Now imagine we have two categories of data.
56
00:03:24,140 --> 00:03:33,980
Imagine, just for illustration, these are cats and dogs. Now, linearly separable data will be easily
57
00:03:34,110 --> 00:03:40,970
separated here; this is called the decision boundary. However, the non-linearly,
58
00:03:40,990 --> 00:03:42,020
sorry about that,
59
00:03:42,550 --> 00:03:48,100
non-linearly separable data is going to require a much more complicated function to separate it.
60
00:03:48,490 --> 00:03:49,360
Now, we could use,
61
00:03:49,360 --> 00:03:55,420
you may have been thinking, we could use complicated polynomial decision-boundary type functions to do
62
00:03:55,420 --> 00:03:58,440
this, but what about if it was more than two dimensions?
63
00:03:58,450 --> 00:04:01,520
Imagine how complicated that function is going to be.
64
00:04:01,840 --> 00:04:09,100
That function has to separate this data. Introducing activation functions, which allow us to basically express
65
00:04:09,100 --> 00:04:13,810
a non-linear relationship in the data, is what makes neural nets so powerful.
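To make this concrete, a small NumPy sketch (with made-up weights) showing that stacking linear layers without an activation collapses into a single linear map, while putting a ReLU in between does not.

```python
import numpy as np

W1 = np.array([[1.0, -2.0], [0.5, 0.3]])
W2 = np.array([[0.7, 1.1], [-0.4, 0.2]])
x = np.array([1.5, -0.5])

# Two linear layers with no activation are equivalent to a single linear layer
two_linear = W2 @ (W1 @ x)
collapsed  = (W2 @ W1) @ x
assert np.allclose(two_linear, collapsed)

# With a ReLU in between, the composition is no longer a single linear map
relu = lambda z: np.maximum(0.0, z)
nonlinear = W2 @ relu(W1 @ x)
print(two_linear, nonlinear)   # different in general
```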
66
00:04:14,440 --> 00:04:15,590
So there are a few other types of
67
00:04:15,600 --> 00:04:21,390
activation functions, some of which we may use in this course. We definitely use rectified linear units
68
00:04:21,400 --> 00:04:22,530
quite a bit.
69
00:04:22,610 --> 00:04:29,230
Sometimes we use sigmoid, sometimes hyperbolic tangent, but we rarely ever use those, and we use them
70
00:04:29,560 --> 00:04:32,040
in specialized cases which you'll see later on.
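For reference, a small NumPy sketch of the sigmoid, hyperbolic tangent, and ReLU activations mentioned here.

```python
import numpy as np

def sigmoid(x):
    # squashes any input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes any input into the range (-1, 1)
    return np.tanh(x)

def relu(x):
    # clamps negatives to zero, leaves positives unchanged
    return np.maximum(0.0, x)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```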
71
00:04:35,180 --> 00:04:42,930
So as mentioned before in the previous chapter, we have the b, which of course is the bias.
72
00:04:42,930 --> 00:04:45,640
So why do we need bias values?
73
00:04:45,770 --> 00:04:52,670
We're not doing training right now, but you may be thinking we can pretty much make the weight anything.
74
00:04:53,040 --> 00:04:54,940
However that doesn't change much.
75
00:04:55,110 --> 00:04:57,600
It simply changes the gradient.
76
00:04:57,600 --> 00:05:04,260
If you're from a mathematical background, you would know that a value multiplying x increases the steepness
77
00:05:04,380 --> 00:05:05,940
of the curve for the values going in.
78
00:05:06,000 --> 00:05:07,170
That's what this is here.
79
00:05:07,230 --> 00:05:13,280
So you can see that here the weights are 0.5, 1 and 2 on the sigmoid function.
80
00:05:13,770 --> 00:05:16,100
So you can actually see 0.5.
81
00:05:16,100 --> 00:05:17,510
It's not that steep.
82
00:05:17,750 --> 00:05:21,620
One gets steeper, and definitely at two it's super steep here.
83
00:05:22,230 --> 00:05:30,280
However, what if we wanted to shift it left and right? We can't do that by changing the weight
84
00:05:30,420 --> 00:05:31,530
in front of the x.
85
00:05:31,800 --> 00:05:37,320
But we can do it by adding a value to it. Adding a value actually shifts it left and right here, as
86
00:05:37,320 --> 00:05:38,550
you can see.
87
00:05:39,210 --> 00:05:46,830
So now we have constant weights going into x, plus a constant value added here, for example 5 or minus
88
00:05:47,190 --> 00:05:48,070
5.
89
00:05:48,180 --> 00:05:54,280
So we can actually use the weights along with bias values to shift it left and right.
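A small NumPy sketch of this effect: the weight in front of x changes the steepness of the sigmoid, while an added bias shifts it left and right. The specific weight and bias values are only illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10, 10, 5)

# Larger weights make the sigmoid steeper around zero
for w in (0.5, 1.0, 2.0):
    print(f"w={w}:", np.round(sigmoid(w * x), 3))

# Adding a bias shifts the curve left or right (without changing steepness)
for b in (-5.0, 0.0, 5.0):
    print(f"b={b}:", np.round(sigmoid(x + b), 3))
```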
90
00:05:55,740 --> 00:06:01,990
So let's talk a bit about the neuron inspiration behind neural nets. By the way,
91
00:06:02,000 --> 00:06:03,680
why is it called a neural net in the first place?
92
00:06:03,690 --> 00:06:05,960
Is it a replication of the brain?
93
00:06:05,970 --> 00:06:06,840
Not quite.
94
00:06:06,840 --> 00:06:15,360
However, activation functions basically make these hidden units, these nodes, act like neurons, because
95
00:06:16,020 --> 00:06:22,950
if you know how neurons work, neurons receive inputs along these lines here, I believe they're called dendrites, I
96
00:06:22,950 --> 00:06:23,500
think.
97
00:06:23,940 --> 00:06:29,550
And basically when these inputs exceed a certain threshold value, the neuron fires, and neurons are connected to many
98
00:06:29,700 --> 00:06:35,250
different neurons, which then receive these inputs from this firing. That is a very similar analogy to
99
00:06:35,250 --> 00:06:40,980
how neural nets work: when an input value crosses that threshold, the node fires.
100
00:06:43,510 --> 00:06:48,300
Now let's talk about the deep in deep learning. What do we mean by deep?
101
00:06:48,640 --> 00:06:53,770
Everyone says deep learning, but no one actually tells you what deep is. Deep is actually the number
102
00:06:53,770 --> 00:06:55,290
of hidden layers here.
103
00:06:55,750 --> 00:07:00,710
So this is a nice example illustration of a neural net here.
104
00:07:01,210 --> 00:07:03,970
These are the connections with our weights and biases.
105
00:07:03,970 --> 00:07:10,540
These are the nodes here and this one has one two three four five six seven eight hidden layers here.
106
00:07:10,940 --> 00:07:12,800
As you can probably count, this one too as well.
107
00:07:12,850 --> 00:07:14,320
Sometimes they don't show them all,
108
00:07:14,410 --> 00:07:17,530
so there may be hidden layers here that we don't see.
109
00:07:17,530 --> 00:07:23,650
And imagine now we have millions, or hundreds of thousands, of these parameters and all this stuff
110
00:07:23,650 --> 00:07:29,850
here. That gives you an example of how complicated neural nets can actually get, especially deep neural
111
00:07:29,870 --> 00:07:31,660
nets.
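As a rough sketch of what deep means, here is a minimal NumPy forward pass with several hidden layers; the layer sizes and random weights are made up for illustration, and in a real network the weights would be learned during training.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# A "deep" net is simply one with many hidden layers. Here: 4 hidden layers
# (sizes made up for illustration), each doing W x + b followed by ReLU.
layer_sizes = [64, 128, 128, 64, 64, 10]   # input, 4 hidden layers, output
rng = np.random.default_rng(0)
weights = [rng.normal(size=(n_out, n_in)) * 0.1
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    # pass the input through every hidden layer, then the output layer
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(W @ x + b)
    return weights[-1] @ x + biases[-1]

n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print("parameter count:", n_params)         # grows quickly with depth and width
print(forward(rng.normal(size=64)).shape)   # (10,)
```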
112
00:07:31,990 --> 00:07:40,210
Now, when do we need to be deep? Depth actually helps to represent non-linear relationships in the data
113
00:07:40,270 --> 00:07:41,390
quite well.
114
00:07:41,800 --> 00:07:49,070
So if you want to actually train a complicated network it's always better to go deep rather than shallow.
115
00:07:49,600 --> 00:07:55,780
However, deeper is not always easier and better to work with. Deeper networks require a lot more training
116
00:07:55,780 --> 00:07:56,430
time.
117
00:07:56,680 --> 00:07:59,700
And sometimes they tend to overfit the input data.
118
00:07:59,710 --> 00:08:02,720
Now, I haven't introduced the concept of overfitting yet.
119
00:08:02,950 --> 00:08:05,320
However we will deal with it very shortly.
120
00:08:07,690 --> 00:08:13,300
So like I said, the real magic of a neural net happens during training, because we've seen that it's very
121
00:08:13,300 --> 00:08:18,280
simple to execute a trained neural net once you have it, and when I say execute, I mean
122
00:08:18,670 --> 00:08:19,810
pass an input through it and get an output.
123
00:08:19,840 --> 00:08:23,440
But how exactly do we get these weights and biases?
124
00:08:23,440 --> 00:08:25,630
This is exactly what we want to do.
125
00:08:26,020 --> 00:08:28,820
And I haven't told you about it yet so let's find out.