1 00:00:00,530 --> 00:00:01,090 Hi guys.
2 00:00:01,110 --> 00:00:05,220 And welcome to chapter 8.5, where we talk about one-hot encoding.
3 00:00:05,520 --> 00:00:11,900 So as we saw before, we did some transformations on x_train and x_test, that's our image data.
4 00:00:11,970 --> 00:00:15,520 Now what about our label data, y_train and y_test?
5 00:00:15,780 --> 00:00:16,530 Well, let's find out.
6 00:00:16,530 --> 00:00:19,790 So let's recap what we did on the image data.
7 00:00:19,860 --> 00:00:23,880 We added a fourth dimension to go from this to this.
8 00:00:23,880 --> 00:00:32,070 We changed it to the float32 data type and we normalized it between 0 and 1 by dividing by 255.
9 00:00:32,450 --> 00:00:33,960 But what do we do to the label data?
10 00:00:33,960 --> 00:00:35,450 Now that's true.
11 00:00:35,940 --> 00:00:39,730 So the label data is basically in this form for y_train.
12 00:00:39,750 --> 00:00:47,850 It's going to be a matrix that has 60000 elements, and each element indicates a class label.
13 00:00:47,890 --> 00:00:57,360 So if this element in x_train is going to be a 28 by 28 image,
14 00:00:57,840 --> 00:01:00,900 then this zero in y_train corresponds to its label here.
15 00:01:01,150 --> 00:01:04,710 However, Keras does not use label data like this.
16 00:01:04,890 --> 00:01:06,820 It needs it to be one-hot encoded.
17 00:01:06,900 --> 00:01:09,850 And what does that look like? That looks like this.
18 00:01:10,080 --> 00:01:11,640 So we have labels here.
19 00:01:12,030 --> 00:01:20,840 And instead of having a four being represented here, it's basically a matrix that has 10 columns now.
20 00:01:21,270 --> 00:01:24,900 So instead of having no columns, so effectively one column,
21 00:01:24,930 --> 00:01:26,890 sorry, 60000 columns,
22 00:01:27,000 --> 00:01:34,580 it has 10 columns here and 60000 rows, and each column, each row,
23 00:01:34,590 --> 00:01:39,390 sorry, has basically a 1 or a 0 indicating which label it is.
24 00:01:39,690 --> 00:01:42,900 So imagine we have this being transformed.
25 00:01:42,900 --> 00:01:44,090 Sorry, let's look at this.
26 00:01:44,090 --> 00:01:45,520 This is the third row here,
27 00:01:45,810 --> 00:01:48,340 being transformed into this.
28 00:01:48,370 --> 00:01:55,520 So instead of having this row as before, one-hot encoding makes it into this. I hope you understand clearly,
29 00:01:55,530 --> 00:01:56,600 so we're going to do this now
30 00:01:56,640 --> 00:01:58,770 in our IPython notebook.
31 00:01:59,650 --> 00:01:59,960 OK.
32 00:01:59,970 --> 00:02:06,310 So step three is one-hot encoding our y labels, and to do this we basically use np_utils that
33 00:02:06,310 --> 00:02:10,280 we imported from keras.utils, with all this stuff here.
34 00:02:10,290 --> 00:02:12,600 It just sends it to to_categorical.
35 00:02:12,600 --> 00:02:17,210 That is what Keras calls this function for one-hot encoding: to_categorical.
36 00:02:17,460 --> 00:02:18,550 So we have y_train.
37 00:02:18,620 --> 00:02:23,160 It's equal to np_utils.to_categorical, and we just put y_train in here.
38 00:02:23,310 --> 00:02:25,140 And that transforms it.
39 00:02:25,140 --> 00:02:27,220 So let's take a look at how this actually looks.
40 00:02:27,220 --> 00:02:28,510 So let's run this here.
41 00:02:28,820 --> 00:02:30,120 So now y_train.
42 00:02:30,270 --> 00:02:32,420 Let's look at the first row in y_train.
43 00:02:32,830 --> 00:02:40,700 It's this, and basically you can see one, two, three, four, five, six, seven, eight, nine, ten elements.
44 00:02:40,920 --> 00:02:45,270 And this one looks like it's in the fifth element here.
45 00:02:45,300 --> 00:02:46,920 So this is number five.
46 00:02:46,920 --> 00:02:51,720 So the first element in our training data is number five.
47 00:02:52,110 --> 00:02:54,710 So now let's move on to creating a model.
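
The transformation described in this chapter, turning each integer class label into a row of ten 0/1 values, can be sketched in plain NumPy. This is a minimal illustration and not the code from the notebook: the `one_hot` helper and the `sample_labels` values are hypothetical, chosen only so that the first label is 5 as in the video. Keras's `to_categorical` should produce the same kind of matrix when given integer labels.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # For row i, set the column whose index equals labels[i] to 1.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Hypothetical labels standing in for the first few entries of y_train.
sample_labels = np.array([5, 0, 4, 1, 9])
encoded = one_hot(sample_labels, num_classes=10)

print(encoded.shape)  # (5, 10)
print(encoded[0])     # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

The single 1 in the first row sits at index 5 when counting from zero, which is why that row decodes to the digit five. Taking `np.argmax(encoded, axis=1)` recovers the original integer labels.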