| 1 | |
| 00:00:00,900 --> 00:00:04,640 | |
| So let's do a quick recap of the last section. | |
| 2 | |
| 00:00:05,070 --> 00:00:05,520 | |
| All right. | |
| 3 | |
| 00:00:05,580 --> 00:00:11,550 | |
| We saw basically how the bias trick makes this matrix calculation much | |
| 4 | |
| 00:00:11,550 --> 00:00:12,850 | |
| simpler and faster. | |
| 5 | |
| 00:00:12,890 --> 00:00:16,630 | |
| And we talked about how we do the weight calculations here. | |
| 6 | |
| 00:00:17,010 --> 00:00:23,190 | |
| And we talked about it basically being a series of linear equations; note that Wx + b is linear. | |
| 7 | |
| 00:00:23,240 --> 00:00:30,390 | |
| Now if you come from a machine learning background, you're wondering what makes this different from regressions | |
| 8 | |
| 00:00:30,450 --> 00:00:35,430 | |
| or logistic regressions or SVMs, which are also a series of linear equations. | |
| 9 | |
| 00:00:35,430 --> 00:00:38,450 | |
| Well let's get into that in this chapter right now. | |
| 10 | |
| 00:00:40,230 --> 00:00:42,940 | |
| So now let's talk about activation functions. | |
| 11 | |
| 00:00:43,230 --> 00:00:51,350 | |
| So you may have noticed that it was just the input value multiplied by the weights plus | |
| 12 | |
| 00:00:51,360 --> 00:00:52,480 | |
| the bias. | |
| 13 | |
| 00:00:52,500 --> 00:00:56,370 | |
| And the sum of those going into the node is what produces the output. | |
| 14 | |
| 00:00:56,370 --> 00:01:06,180 | |
| However, in reality, yes, that is true, but this Wx plus b is actually passed into an | |
| 15 | |
| 00:01:06,180 --> 00:01:07,670 | |
| activation function. | |
| 16 | |
| 00:01:07,980 --> 00:01:14,190 | |
| And basically the simplest type of activation function is | |
| 17 | |
| 00:01:14,190 --> 00:01:16,740 | |
| this one here: the max of zero | |
| 18 | |
| 00:01:16,830 --> 00:01:17,560 | |
| and x, that is, max(0, x). | |
| 19 | |
| 00:01:17,940 --> 00:01:23,460 | |
| What this means is that any value over zero is going to be passed through. | |
| 20 | |
| 00:01:23,550 --> 00:01:24,470 | |
| Right. | |
| 21 | |
| 00:01:24,870 --> 00:01:26,660 | |
| Now I'll tell you why this is important later. | |
| 22 | |
| 00:01:26,700 --> 00:01:29,000 | |
| But just imagine we have a max function here. | |
| 23 | |
| 00:01:29,410 --> 00:01:31,590 | |
| And we have these weights going into it. | |
| 24 | |
| 00:01:31,620 --> 00:01:34,480 | |
| So let's see what we get with these weights. | |
| 25 | |
| 00:01:34,490 --> 00:01:35,880 | |
| What does it do? | |
| 26 | |
| 00:01:36,320 --> 00:01:41,520 | |
| Let's say it gives us 0.75. This max function clamps at, again, | |
| 27 | |
| 00:01:41,520 --> 00:01:46,090 | |
| zero which means that anything below zero will be negated. | |
| 28 | |
| 00:01:46,470 --> 00:01:47,570 | |
| And you see that here. | |
| 29 | |
| 00:01:47,670 --> 00:01:49,610 | |
| However anything above zero will be allowed. | |
| 30 | |
| 00:01:49,710 --> 00:01:53,250 | |
| So in this example here 0.75 is allowed. | |
| 31 | |
| 00:01:53,250 --> 00:02:00,420 | |
| However if the weights give us a negative value, which can happen easily, then the x is clamped at zero. | |
| 32 | |
| 00:02:00,750 --> 00:02:01,760 | |
| It isn't -6.5. | |
| 33 | |
| 00:02:01,770 --> 00:02:03,310 | |
| It doesn't take any positive value. | |
| 34 | |
| 00:02:03,440 --> 00:02:04,790 | |
| It just becomes zero | |
| 35 | |
| 00:02:05,010 --> 00:02:08,430 | |
| with this activation function. | |
| 36 | |
| 00:02:08,530 --> 00:02:11,320 | |
| So let's talk about the ReLU activation function. | |
| 37 | |
| 00:02:11,580 --> 00:02:17,780 | |
| ReLU stands for rectified linear unit, and it is probably the most common activation function used in | |
| 38 | |
| 00:02:17,800 --> 00:02:21,370 | |
| CNNs and neural nets. Now its appearance is quite simple. | |
| 39 | |
| 00:02:21,410 --> 00:02:23,360 | |
| You just take a look at it. | |
| 40 | |
| 00:02:23,420 --> 00:02:24,840 | |
| Basically it's this. | |
| 41 | |
| 00:02:24,970 --> 00:02:27,180 | |
| It clamps a negative value to zero. | |
| 42 | |
| 00:02:27,560 --> 00:02:32,950 | |
| So no negative value is passed going forward, and a positive value is basically left alone. | |
| 43 | |
| 00:02:33,410 --> 00:02:37,200 | |
| So the value passed on is x, which is always positive. | |
| 44 | |
| 00:02:37,200 --> 00:02:42,660 | |
| So it's basically this: ReLU. | |
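To make the max(0, x) behaviour described above concrete, here is a minimal sketch in Python; the input, weight, and bias values are made up for illustration:

```python
def relu(x):
    # ReLU clamps negative values to zero and leaves positive values alone.
    return max(0.0, x)

# A node computes a weighted sum of its inputs plus a bias,
# then passes the result into the activation function.
inputs = [1.0, 2.0]
weights = [0.5, -0.1]   # hypothetical weights
bias = -0.05            # hypothetical bias

z = sum(w * x for w, x in zip(weights, inputs)) + bias  # 0.5 - 0.2 - 0.05 = 0.25
print(relu(z))     # positive value passes through unchanged
print(relu(-0.5))  # negative value is clamped to zero
```

So a positive weighted sum comes out unchanged, while any negative weighted sum becomes zero, exactly the clamping behaviour the lecture describes.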
| 45 | |
| 00:02:42,730 --> 00:02:45,520 | |
| So why do we need these activation functions? | |
| 46 | |
| 00:02:45,580 --> 00:02:47,550 | |
| Remember I said, if you're | |
| 47 | |
| 00:02:47,650 --> 00:02:54,040 | |
| coming from a machine learning background, you're thinking that so far, without activation functions, this is basically | |
| 48 | |
| 00:02:54,190 --> 00:02:58,880 | |
| just another form of a linear model. | |
| 49 | |
| 00:02:59,000 --> 00:03:06,820 | |
| However having activation functions allows us to basically express the non-linear relationships in | |
| 50 | |
| 00:03:06,820 --> 00:03:07,700 | |
| this data. | |
| 51 | |
| 00:03:08,200 --> 00:03:11,580 | |
| Most linear models have a tough time dealing with this. | |
| 52 | |
| 00:03:11,650 --> 00:03:12,070 | |
| All right. | |
| 53 | |
| 00:03:12,070 --> 00:03:18,870 | |
| So the huge advantage of deep learning is its ability to understand non-linear models. | |
| 54 | |
| 00:03:19,480 --> 00:03:21,360 | |
| And what do we mean by non-linear? | |
| 55 | |
| 00:03:21,360 --> 00:03:24,100 | |
| Now imagine we have two categories of data. | |
| 56 | |
| 00:03:24,140 --> 00:03:33,980 | |
| Imagine, just for illustration, this is cats and dogs, and linearly separable data. | |
| 57 | |
| 00:03:34,110 --> 00:03:40,970 | |
| This is called the decision boundary; it will easily separate the data here. However the non-linear, I mean linearly. | |
| 58 | |
| 00:03:40,990 --> 00:03:42,020 | |
| Sorry about that. | |
| 59 | |
| 00:03:42,550 --> 00:03:48,100 | |
| Non-linearly separable data is going to require a much more complicated function. | |
| 60 | |
| 00:03:48,490 --> 00:03:49,360 | |
| Now we can use. | |
| 61 | |
| 00:03:49,360 --> 00:03:55,420 | |
| You may have been thinking we can use complicated polynomial decision-boundary-type functions to get | |
| 62 | |
| 00:03:55,420 --> 00:03:58,440 | |
| this, but what if it was more than two dimensions? | |
| 63 | |
| 00:03:58,450 --> 00:04:01,520 | |
| Imagine how complicated that function is going to be. | |
| 64 | |
| 00:04:01,840 --> 00:04:09,100 | |
| That's going to separate this data. And introducing activation functions, which allow us to basically express | |
| 65 | |
| 00:04:09,100 --> 00:04:13,810 | |
| a non-linear relationship in the data, is what makes this so powerful. | |
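The claim that a network without activation functions is just another linear model can be checked numerically. This sketch, using arbitrary random weight matrices, shows that two stacked linear layers collapse into a single linear map, while a ReLU in between is what introduces the non-linearity:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))  # hypothetical first-layer weights
W2 = rng.normal(size=(1, 3))  # hypothetical second-layer weights
x = rng.normal(size=(2,))     # a random input vector

# Two linear layers stacked with no activation in between...
two_layers = W2 @ (W1 @ x)
# ...are exactly equivalent to one linear layer with weights W2 @ W1.
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layers, one_layer))  # True: no extra expressive power

# Putting a ReLU between the layers breaks this collapse,
# so the network can now represent non-linear functions of x.
relu_between = W2 @ np.maximum(0.0, W1 @ x)
```

However many purely linear layers you stack, the whole network stays one matrix multiplication; the activation function in between is what prevents the collapse.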
| 66 | |
| 00:04:14,440 --> 00:04:15,590 | |
| So there are a few other types of | |
| 67 | |
| 00:04:15,600 --> 00:04:21,390 | |
| activation functions, some of which we may use in this course. We definitely use rectified linear units | |
| 68 | |
| 00:04:21,400 --> 00:04:22,530 | |
| quite a bit. | |
| 69 | |
| 00:04:22,610 --> 00:04:29,230 | |
| Sometimes we use sigmoid, sometimes hyperbolic tangent, but we rarely ever use those, and we use them | |
| 70 | |
| 00:04:29,560 --> 00:04:32,040 | |
| in specialized cases which you'll see later on. | |
| 71 | |
| 00:04:35,180 --> 00:04:42,930 | |
| So as mentioned in the previous chapter, we of course also have the bias in these functions. | |
| 72 | |
| 00:04:42,930 --> 00:04:45,640 | |
| So why do we need bias terms? | |
| 73 | |
| 00:04:45,770 --> 00:04:52,670 | |
| We haven't covered training yet, so you may be thinking we can pretty much make the weights anything. | |
| 74 | |
| 00:04:53,040 --> 00:04:54,940 | |
| However that doesn't change much. | |
| 75 | |
| 00:04:55,110 --> 00:04:57,600 | |
| It simply changes the gradient. | |
| 76 | |
| 00:04:57,600 --> 00:05:04,260 | |
| If you're from a mathematical background, you would know any value multiplied by x increases the steepness | |
| 77 | |
| 00:05:04,380 --> 00:05:05,940 | |
| of the curve for the values going in. | |
| 78 | |
| 00:05:06,000 --> 00:05:07,170 | |
| That's what this is here. | |
| 79 | |
| 00:05:07,230 --> 00:05:13,280 | |
| So you can see these weights, 0.5, 1, and 2, on the sigmoid function here. | |
| 80 | |
| 00:05:13,770 --> 00:05:16,100 | |
| So you can actually see 0.5. | |
| 81 | |
| 00:05:16,100 --> 00:05:17,510 | |
| It's not that steep. | |
| 82 | |
| 00:05:17,750 --> 00:05:21,620 | |
| 1 gets steeper, and definitely at 2 it's super steep here. | |
| 83 | |
| 00:05:22,230 --> 00:05:30,280 | |
| However, what if we wanted to shift it left and right? We can't do that by changing the weight | |
| 84 | |
| 00:05:30,420 --> 00:05:31,530 | |
| in front of the x. | |
| 85 | |
| 00:05:31,800 --> 00:05:37,320 | |
| But we can do it by adding a value to it. Adding a value actually shifts it left and right here, as | |
| 86 | |
| 00:05:37,320 --> 00:05:38,550 | |
| you can see. | |
| 87 | |
| 00:05:39,210 --> 00:05:46,830 | |
| So now we have the weights here, constant weights, going into x, plus a constant value here: 5, minus 1, 0, | |
| 88 | |
| 00:05:47,190 --> 00:05:48,070 | |
| 5. | |
| 89 | |
| 00:05:48,180 --> 00:05:54,280 | |
| So we can actually use weights as well as bias values to actually shift it left and right. | |
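The effect of weights and biases on the sigmoid described here can be sketched as follows; the specific weight and bias values are illustrative, chosen to echo the ones discussed above:

```python
import math

def sigmoid(z):
    # Standard logistic sigmoid: squashes any real value into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def activation(x, w, b):
    # The weight w controls steepness; the bias b shifts the curve left/right.
    return sigmoid(w * x + b)

# Larger weights make the curve steeper around its midpoint.
for w in (0.5, 1.0, 2.0):
    print(w, activation(1.0, w, 0.0))

# A bias shifts where the curve crosses 0.5:
# with w = 1 and b = 5, the midpoint moves from x = 0 to x = -5.
print(activation(-5.0, 1.0, 5.0))  # 0.5
```

With no bias the sigmoid is always centred at x = 0 no matter the weight, which is exactly why the bias term is needed to shift the curve.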
| 90 | |
| 00:05:55,740 --> 00:06:01,990 | |
| So let's talk a bit about the neuron inspiration behind neural nets. By the way, | |
| 91 | |
| 00:06:02,000 --> 00:06:03,680 | |
| why is it called a neural net in the first place? | |
| 92 | |
| 00:06:03,690 --> 00:06:05,960 | |
| Is it a replication of the brain? | |
| 93 | |
| 00:06:05,970 --> 00:06:06,840 | |
| Not quite. | |
| 94 | |
| 00:06:06,840 --> 00:06:15,360 | |
| However activation units basically make these hidden units, these nodes, act like neurons, because | |
| 95 | |
| 00:06:16,020 --> 00:06:22,950 | |
| if you know how neurons work, neurons receive inputs along these lines here, called dendrites I | |
| 96 | |
| 00:06:22,950 --> 00:06:23,500 | |
| think. | |
| 97 | |
| 00:06:23,940 --> 00:06:29,550 | |
| And basically when these inputs hit a certain threshold value, the neuron fires, and a neuron is connected to many | |
| 98 | |
| 00:06:29,700 --> 00:06:35,250 | |
| different neurons, which then receive these inputs from this neuron. That is a very similar analogy to | |
| 99 | |
| 00:06:35,250 --> 00:06:40,980 | |
| how neural nets work: when an input value crosses that threshold, it fires. | |
| 100 | |
| 00:06:43,510 --> 00:06:48,300 | |
| Now let's talk about the deep in deep learning: what do we mean by deep? | |
| 101 | |
| 00:06:48,640 --> 00:06:53,770 | |
| Everyone says deep learning, but no one actually tells you what deep is. Deep is actually the number | |
| 102 | |
| 00:06:53,770 --> 00:06:55,290 | |
| of hidden layers here. | |
| 103 | |
| 00:06:55,750 --> 00:07:00,710 | |
| So this is a nice example illustration of a neural net here. | |
| 104 | |
| 00:07:01,210 --> 00:07:03,970 | |
| These are the connections with our weights and biases. | |
| 105 | |
| 00:07:03,970 --> 00:07:10,540 | |
| These are the nodes here, and this one has one, two, three, four, five, six, seven, eight hidden layers here. | |
| 106 | |
| 00:07:10,940 --> 00:07:12,800 | |
| As you can probably count, this one does too. | |
| 107 | |
| 00:07:12,850 --> 00:07:14,320 | |
| Sometimes they do. | |
| 108 | |
| 00:07:14,410 --> 00:07:17,530 | |
| So these are all the hidden layers here that we don't see. | |
| 109 | |
| 00:07:17,530 --> 00:07:23,650 | |
| And imagine now we have millions, or hundreds of thousands, of these parameters and all the stuff | |
| 110 | |
| 00:07:23,650 --> 00:07:29,850 | |
| here. That gives you an example of how complicated neural nets can actually get, especially deep neural | |
| 111 | |
| 00:07:29,870 --> 00:07:31,660 | |
| nets. | |
| 112 | |
| 00:07:31,990 --> 00:07:40,210 | |
| Now, when do we need to be deep? Depth actually helps to represent non-linear relationships in the data | |
| 113 | |
| 00:07:40,270 --> 00:07:41,390 | |
| quite well. | |
| 114 | |
| 00:07:41,800 --> 00:07:49,070 | |
| So if you want to actually train a complicated network it's always better to go deep rather than shallow. | |
| 115 | |
| 00:07:49,600 --> 00:07:55,780 | |
| However deeper is not always easier and better to work with. Deeper networks require a lot more training | |
| 116 | |
| 00:07:55,780 --> 00:07:56,430 | |
| time. | |
| 117 | |
| 00:07:56,680 --> 00:07:59,700 | |
| And sometimes they tend to overfit the input data. | |
| 118 | |
| 00:07:59,710 --> 00:08:02,720 | |
| Now, I haven't introduced the concept of overfitting yet. | |
| 119 | |
| 00:08:02,950 --> 00:08:05,320 | |
| However we will deal with it very shortly. | |
| 120 | |
| 00:08:07,690 --> 00:08:13,300 | |
| So like I said, the real magic of a neural net happens during training, because we saw that it's very | |
| 121 | |
| 00:08:13,300 --> 00:08:18,280 | |
| simple to execute a trained neural net once you have it. When I say execute, I mean | |
| 122 | |
| 00:08:18,670 --> 00:08:19,810 | |
| pass input through it and get the output. | |
| 123 | |
| 00:08:19,840 --> 00:08:23,440 | |
| But how exactly do we get these weights and biases? | |
| 124 | |
| 00:08:23,440 --> 00:08:25,630 | |
| This is exactly what we want to do. | |
| 125 | |
| 00:08:26,020 --> 00:08:28,820 | |
| And I haven't told you about it yet so let's find out. | |