Prince-1 committed on
Commit
e62bc71
·
verified ·
1 Parent(s): 0182da2

Add files using upload-large-folder tool

Browse files
This view is limited to 50 files because it contains too many changes.   See raw diff
Files changed (50)
  1. 15. Transfer Learning Build a Flower & Monkey Breed Classifier/3. Build a Monkey Breed Classifier with MobileNet using Transfer Learning.srt +769 -0
  2. 15. Transfer Learning Build a Flower & Monkey Breed Classifier/3.1 Download the Monkey Breed Dataset.html +1 -0
  3. 15. Transfer Learning Build a Flower & Monkey Breed Classifier/4. Build a Flower Classifier with VGG16 using Transfer Learning.srt +475 -0
  4. 15. Transfer Learning Build a Flower & Monkey Breed Classifier/4.1 Download the 17-Flowers Dataset.html +1 -0
  5. 16. Design Your Own CNN - LittleVGG A Simpsons Classifier/1. Chapter Introduction.srt +27 -0
  6. 16. Design Your Own CNN - LittleVGG A Simpsons Classifier/2. Introducing LittleVGG.srt +87 -0
  7. 16. Design Your Own CNN - LittleVGG A Simpsons Classifier/3. Simpsons Character Recognition using LittleVGG.srt +583 -0
  8. 16. Design Your Own CNN - LittleVGG A Simpsons Classifier/3.1 Download Simpsons Dataset.html +1 -0
  9. 16. Design Your Own CNN - LittleVGG/16.2 LittleVGG - Simpsons.ipynb +0 -0
  10. 17. Advanced Activation Functions & Initializations/1. Chapter Introduction.srt +27 -0
  11. 17. Advanced Activation Functions & Initializations/2. Dying ReLU Problem and Introduction to Leaky ReLU, ELU and PReLUs.srt +279 -0
  12. 17. Advanced Activation Functions & Initializations/3. Advanced Initializations.srt +151 -0
  13. 18 . Deep Survaliance - Build a Face Detector with Emotion, Age and Gender Recognition/18.2 Building an Emotion Detector with LittleVGG.ipynb +723 -0
  14. 18 . Deep Survaliance - Build a Face Detector with Emotion, Age and Gender Recognition/18.3A - Age, Gender Detection.ipynb +174 -0
  15. 18 . Deep Survaliance - Build a Face Detector with Emotion, Age and Gender Recognition/18.3B Age, Gender with Emotion.ipynb +526 -0
  16. 18 . Deep Survaliance - Build a Face Detector with Emotion, Age and Gender Recognition/Face Detection - Friends Characters.ipynb +526 -0
  17. 18 . Deep Survaliance - Build a Face Detector with Emotion, Age and Gender Recognition/Face Extraction from Video.ipynb +93 -0
  18. Gender Recognition/rajeev.jpg +0 -0
  19. 18 . Deep Survaliance - Build a Face Detector with Emotion, Age and Gender Recognition/wide_resnet.py +152 -0
  20. 18. Facial Applications - Emotion, Age & Gender Recognition/1. Chapter Introduction.srt +47 -0
  21. 18. Facial Applications - Emotion, Age & Gender Recognition/2. Build an Emotion, Facial Expression Detector.srt +1239 -0
  22. 18. Facial Applications - Emotion, Age & Gender Recognition/2.1 Download Dataset.html +1 -0
  23. 18. Facial Applications - Emotion, Age & Gender Recognition/3. Build EmotionAgeGender Recognition in our Deep Surveillance Monitor.srt +1547 -0
  24. 18. Facial Applications - Emotion, Age & Gender Recognition/3.1 Download weights file.html +1 -0
  25. 18. Facial Applications - Emotion, Age & Gender Recognition/3.2 Code and files required for project.html +1 -0
  26. 19. Medical Imaging - Image Segmentation with U-Net/1. Chapter Overview on Image Segmentation & Medical Imaging in U-Net.srt +31 -0
  27. 19. Medical Imaging - Image Segmentation with U-Net/2. What is Segmentation And Applications in Medical Imaging.srt +215 -0
  28. 19. Medical Imaging - Image Segmentation with U-Net/3. U-Net Image Segmentation with CNNs.srt +203 -0
  29. 19. Medical Imaging - Image Segmentation with U-Net/4. The Intersection over Union (IoU) Metric.srt +267 -0
  30. 19. Medical Imaging - Image Segmentation with U-Net/5. Finding the Nuclei in Divergent Images.srt +875 -0
  31. 19. Medical Imaging - Image Segmentation with U-Net/5.1 Download U-Net.html +1 -0
  32. 19. Medical Imaging Segmentation using U-Net/U-Net (not compatible with TensorFlow 2.0, required to downgrade).ipynb +0 -0
  33. 20. Principles of Object Detection/1. Chapter Introduction.srt +43 -0
  34. 20. Principles of Object Detection/2. Object Detection Introduction - Sliding Windows with HOGs.srt +303 -0
  35. 20. Principles of Object Detection/3. R-CNN, Fast R-CNN, Faster R-CNN and Mask R-CNN.srt +847 -0
  36. 20. Principles of Object Detection/4. Single Shot Detectors (SSDs).srt +115 -0
  37. 20. Principles of Object Detection/5. YOLO to YOLOv3.srt +203 -0
  38. 21. TensforFlow Object Detection/Go to the folder speciefid in this file +12 -0
  39. 21. TensforFlow Object Detection/object_detection_tutorial.ipynb +0 -0
  40. 21. TensorFlow Object Detection API/1. Chapter Introduction.srt +27 -0
  41. 21. TensorFlow Object Detection API/2. TFOD API Install and Setup.srt +255 -0
  42. 21. TensorFlow Object Detection API/2.1 Download the code (for those not using the Virtual Machine).html +1 -0
  43. 21. TensorFlow Object Detection API/3. Experiment with a ResNet SSD on images, webcam and videos.srt +471 -0
  44. 21. TensorFlow Object Detection API/4. How to Train a TFOD Model.srt +503 -0
  45. 22. Object Detection with YOLO & Darkflow Build a London Underground Sign Detector/1. Chapter Introduction.srt +23 -0
  46. 22. Object Detection with YOLO & Darkflow Build a London Underground Sign Detector/2. Setting up and install Yolo DarkNet and DarkFlow.srt +363 -0
  47. 22. Object Detection with YOLO & Darkflow Build a London Underground Sign Detector/2.1 Guide to the MacOS Install.html +1 -0
  48. 22. Object Detection with YOLO & Darkflow Build a London Underground Sign Detector/2.2 Download the YOLO files (if not using the VM).html +1 -0
  49. 22. Object Detection with YOLO & Darkflow Build a London Underground Sign Detector/3. Experiment with YOLO on still images, webcam and videos.srt +547 -0
  50. 22. Object Detection with YOLO & Darkflow Build a London Underground Sign Detector/4. Build your own YOLO Object Detector - Detecting London Underground Signs.srt +1011 -0
15. Transfer Learning Build a Flower & Monkey Breed Classifier/3. Build a Monkey Breed Classifier with MobileNet using Transfer Learning.srt ADDED
@@ -0,0 +1,769 @@
1
+ 1
2
+ 00:00:00,490 --> 00:00:06,120
3
+ Hi and welcome to Chapter 15.2, where we're going to build a monkey breed classifier and basically
4
+
5
+ 2
6
+ 00:00:06,120 --> 00:00:11,780
7
+ use the concept of transfer learning to get very high accuracy very quickly.
8
+
9
+ 3
10
+ 00:00:11,780 --> 00:00:13,550
11
+ So let's take a look at this dataset.
12
+
13
+ 4
14
+ 00:00:13,560 --> 00:00:19,380
15
+ This dataset was taken from a Kaggle project, and basically it has about 80 images, I
16
+
17
+ 5
18
+ 00:00:19,380 --> 00:00:22,240
19
+ think of about 10 different types of monkeys each.
20
+
21
+ 6
22
+ 00:00:22,270 --> 00:00:24,290
23
+ Each is a species of monkey here.
24
+
25
+ 7
26
+ 00:00:25,140 --> 00:00:30,120
27
+ And actually not 80; it's roughly 110 to 152 images in each class.
28
+
29
+ 8
30
+ 00:00:30,120 --> 00:00:35,420
31
+ And these are some sample images here, and you'll notice that some are quite small, with
32
+
33
+ 9
34
+ 00:00:35,670 --> 00:00:40,740
35
+ different aspect ratios, images of various sizes and quality as well.
36
+
37
+ 10
38
+ 00:00:40,770 --> 00:00:45,900
39
+ So it's pretty much like what you might build as your own data sets effectively.
40
+
41
+ 11
42
+ 00:00:46,050 --> 00:00:53,310
43
+ It's not well standardized not super neat not super high quality images just random images taken from
44
+
45
+ 12
46
+ 00:00:53,310 --> 00:00:54,080
47
+ the Internet.
48
+
49
+ 13
50
+ 00:00:54,360 --> 00:01:01,280
51
+ So now let's move on to our Jupyter notebook and begin creating this classifier. OK.
52
+
53
+ 14
54
+ 00:01:01,290 --> 00:01:07,830
55
+ So before we begin, I hope you downloaded your resource file, the monkey breed dataset, and have placed it
56
+
57
+ 15
58
+ 00:01:07,950 --> 00:01:09,560
59
+ inside of the directory here.
60
+
61
+ 16
62
+ 00:01:09,810 --> 00:01:15,660
63
+ This is the transfer learning directory, so you have the monkey breed directory here with our training
64
+
65
+ 17
66
+ 00:01:15,690 --> 00:01:16,300
67
+ images.
68
+
69
+ 18
70
+ 00:01:16,320 --> 00:01:17,960
71
+ And each one is in a folder here.
72
+
73
+ 19
74
+ 00:01:18,300 --> 00:01:22,980
75
+ And let's go back and now hopefully that's set up correctly for you.
76
+
77
+ 20
78
+ 00:01:23,250 --> 00:01:28,070
79
+ So now we can go back; we went back a folder and opened it up here.
80
+
81
+ 21
82
+ 00:01:28,200 --> 00:01:33,630
83
+ I already have it open right now so I'm going to go through this step by step so you understand exactly
84
+
85
+ 22
86
+ 00:01:33,630 --> 00:01:36,210
87
+ how we can apply transfer learning.
88
+
89
+ 23
90
+ 00:01:36,210 --> 00:01:37,140
91
+ All right.
92
+
93
+ 24
94
+ 00:01:37,140 --> 00:01:39,180
95
+ So we're doing this with MobileNet.
96
+
97
+ 25
98
+ 00:01:39,210 --> 00:01:44,090
99
+ And the reason I chose MobileNet is because it actually trains quite quickly on CPUs.
100
+
101
+ 26
102
+ 00:01:44,430 --> 00:01:46,430
103
+ So let's import our libraries here.
104
+
105
+ 27
106
+ 00:01:47,010 --> 00:01:53,880
107
+ And then let's define image rows and columns, so we're going to use uniform square images of 224 by
108
+
109
+ 28
110
+ 00:01:53,880 --> 00:01:55,650
111
+
112
+ 29
113
+ 224 in size.
114
+
115
+ 30
116
+ 00:01:55,740 --> 00:01:58,620
117
+ And this is how we basically define that.
118
+
119
+ 31
120
+ 00:01:58,650 --> 00:02:01,280
121
+ When we loaded it in, we wanted the weights to be ImageNet.
122
+
123
+ 32
124
+ 00:02:01,290 --> 00:02:03,970
125
+ We've seen this before in our pretrained models.
126
+
127
+ 33
128
+ 00:02:04,020 --> 00:02:06,240
129
+ However we haven't seen these parameters here.
130
+
131
+ 34
132
+ 00:02:06,240 --> 00:02:11,610
133
+ I will quickly discuss this with you. What we're going to do is take include top and
134
+
135
+ 35
136
+ 00:02:11,610 --> 00:02:13,120
137
+ set this to False.
138
+
139
+ 36
140
+ 00:02:13,380 --> 00:02:18,150
141
+ What this means is that the fully connected layers, the last layers on the top of the model, are basically
142
+
143
+ 37
144
+ 00:02:18,150 --> 00:02:19,820
145
+ not included in the model.
146
+
147
+ 38
148
+ 00:02:20,130 --> 00:02:24,650
149
+ So I'm going to show you what it looks like pretty soon, and the input shape is the second thing:
150
+
151
+ 39
152
+ 00:02:24,690 --> 00:02:30,420
153
+ we just define the input shape of this model to be this; that's why we defined these parameters up here, and
154
+
155
+ 40
156
+ 00:02:30,420 --> 00:02:33,050
157
+ three means a color depth of three, RGB.
158
+
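The loading step described above can be sketched as follows. This is a minimal sketch, assuming the `tensorflow.keras` API; the lesson uses `weights="imagenet"` (which triggers a download), so `weights=None` is substituted here only to keep the example self-contained offline.

```python
# Load MobileNet without its fully connected head, on 224x224 RGB inputs.
from tensorflow.keras.applications import MobileNet

img_rows, img_cols = 224, 224  # uniform square images, as in the lesson

base = MobileNet(
    weights=None,                          # lesson uses "imagenet"
    include_top=False,                     # drop the classification head
    input_shape=(img_rows, img_cols, 3),   # 3 = color depth (RGB)
)
print(len(base.layers), "layers; output shape:", base.output_shape)
```

With `include_top=False` the network ends at the last convolutional block, so its output is a feature map rather than class probabilities.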
159
+ 41
160
+ 00:02:33,450 --> 00:02:36,990
161
+ So this is a cool thing we can do with Keras models we load.
162
+
163
+ 42
164
+ 00:02:37,320 --> 00:02:39,790
165
+ So we have a model here called MobileNet.
166
+
167
+ 43
168
+ 00:02:39,890 --> 00:02:44,310
169
+ So we can address the layers within that model, treating its layers attribute like an array.
170
+
171
+ 44
172
+ 00:02:44,490 --> 00:02:50,450
173
+ And we can basically loop through these layers here and actually turn them off manually.
174
+
175
+ 45
176
+ 00:02:50,690 --> 00:02:57,360
177
+ The trainable parameter is a flag that controls whether a layer should be trainable or not.
178
+
179
+ 46
180
+ 00:02:57,360 --> 00:03:02,980
181
+ So what we do in these two lines of code here is that we're basically setting all the layers in MobileNet
182
+
183
+ 47
184
+ 00:03:02,990 --> 00:03:06,450
185
+ to be non-trainable, basically fixed.
186
+
187
+ 48
188
+ 00:03:06,450 --> 00:03:09,160
189
+ This is how we freeze the weights right here.
190
+
191
+ 49
192
+ 00:03:09,690 --> 00:03:11,830
193
+ So now we could actually print these layers here.
194
+
195
+ 50
196
+ 00:03:12,120 --> 00:03:16,250
197
+ And basically what we are printing is the layer name and number.
198
+
199
+ 51
200
+ 00:03:16,320 --> 00:03:22,740
201
+ As we go through the loop, we're going to print the layer's trainable flag, whether it's trainable:
202
+
203
+ 52
204
+ 00:03:22,770 --> 00:03:23,940
205
+ True or false.
206
+
207
+ 53
208
+ 00:03:23,970 --> 00:03:29,970
209
+ So you get to see that all the layers now, which is quite a few in MobileNet, are set to False.
210
+
211
+ 54
212
+ 00:03:29,970 --> 00:03:31,840
213
+ So this is pretty awesome already.
214
+
215
+ 55
216
+ 00:03:32,100 --> 00:03:35,290
217
+ So I hope you're following simple code so far.
218
+
219
+ 56
220
+ 00:03:36,000 --> 00:03:42,090
221
+ So now what we're going to do is create a simple function here that basically adds the fully
222
+
223
+ 57
224
+ 00:03:42,090 --> 00:03:47,730
225
+ connected head back onto the model we loaded here because remember we loaded it.
226
+
227
+ 58
228
+ 00:03:47,880 --> 00:03:49,180
229
+ But we didn't include the top.
230
+
231
+ 59
232
+ 00:03:49,200 --> 00:03:52,950
233
+ So now we have a model without any top.
234
+
235
+ 60
236
+ 00:03:52,950 --> 00:03:55,070
237
+ So now, actually, I want to show you something quickly.
238
+
239
+ 61
240
+ 00:03:55,290 --> 00:03:57,590
241
+ What if we set this to True?
242
+
243
+ 62
244
+ 00:03:57,620 --> 00:03:58,090
245
+ All right.
246
+
247
+ 63
248
+ 00:03:58,110 --> 00:03:59,220
249
+ How would this model look.
250
+
251
+ 64
252
+ 00:03:59,250 --> 00:04:04,330
253
+ So we saw we had 86 different layers, with the last ones being removed.
254
+
255
+ 65
256
+ 00:04:04,650 --> 00:04:09,000
257
+ So let's now print this and see what it looks like.
258
+
259
+ 66
260
+ 00:04:14,460 --> 00:04:16,170
261
+ takes about five to 10 seconds to run.
262
+
263
+ 67
264
+ 00:04:16,170 --> 00:04:17,500
265
+ There we go.
266
+
267
+ 68
268
+ 00:04:18,330 --> 00:04:18,990
269
+ Oh good.
270
+
271
+ 69
272
+ 00:04:18,990 --> 00:04:24,770
273
+ So before we had up to 86; now we see that basically this is the top fully connected head.
274
+
275
+ 70
276
+ 00:04:25,200 --> 00:04:28,240
277
+ This is what we left out before previously.
278
+
279
+ 71
280
+ 00:04:28,320 --> 00:04:29,510
281
+ So now let's put it back in.
282
+
283
+ 72
284
+ 00:04:29,670 --> 00:04:30,690
285
+ OK.
286
+
287
+ 73
288
+ 00:04:32,350 --> 00:04:35,090
289
+ Because what we're going to do is add a head here.
290
+
291
+ 74
292
+ 00:04:35,340 --> 00:04:38,660
293
+ These are the layers we are going to add onto the model now.
294
+
295
+ 75
296
+ 00:04:38,710 --> 00:04:40,660
297
+ So how do we use this function.
298
+
299
+ 76
300
+ 00:04:40,660 --> 00:04:43,720
301
+ This function takes a number of classes.
302
+
303
+ 77
304
+ 00:04:43,790 --> 00:04:46,120
305
+ in our dataset.
306
+
307
+ 78
308
+ 00:04:46,420 --> 00:04:48,220
309
+ We specify how many classes we want.
310
+
311
+ 79
312
+ 00:04:48,220 --> 00:04:54,370
313
+ So for the monkey breed dataset it's going to be 10, and the bottom model is basically this
314
+
315
+ 80
316
+ 00:04:54,420 --> 00:04:55,040
317
+ model here.
318
+
319
+ 81
320
+ 00:04:55,080 --> 00:04:57,250
321
+ Well, the MobileNet model, that is, for us.
322
+
323
+ 82
324
+ 00:04:57,580 --> 00:05:00,040
325
+ So let's quickly see what this function does.
326
+
327
+ 83
328
+ 00:05:00,100 --> 00:05:02,200
329
+ It takes the bottom model here.
330
+
331
+ 84
332
+ 00:05:02,420 --> 00:05:07,310
333
+ It gets the output part of it here, and we create basically the top model now.
334
+
335
+ 85
336
+ 00:05:07,660 --> 00:05:13,990
337
+ So what we do now is define a top model like this here, and onto the top model we just simply basically
338
+
339
+ 86
340
+ 00:05:14,080 --> 00:05:15,450
341
+ add these layers here.
342
+
343
+ 87
344
+ 00:05:15,670 --> 00:05:18,240
345
+ It's a different way of adding layers in Keras.
346
+
347
+ 88
348
+ 00:05:18,580 --> 00:05:21,010
349
+ So we add and attach them to the top model here.
350
+
351
+ 89
352
+ 00:05:21,280 --> 00:05:28,600
353
+ So for us, we do a GlobalAveragePooling2D, we do a Dense layer with 1,024 nodes, then again another
354
+
355
+ 90
356
+ 00:05:28,600 --> 00:05:33,590
357
+ Dense layer here, and then we do a final Dense layer with softmax for the ten classes we want.
358
+
359
+ 91
360
+ 00:05:33,790 --> 00:05:38,490
361
+ And then what this does is return the top model back.
362
+
363
+ 92
364
+ 00:05:38,600 --> 00:05:45,640
365
+ OK, so now what we do below is obviously we just load all the layers we need and define the number of classes,
366
+
367
+ 93
368
+ 00:05:45,670 --> 00:05:51,640
369
+ but now we can actually use our function here where we actually enter a number of classes.
370
+
371
+ 94
372
+ 00:05:51,730 --> 00:05:57,480
373
+ We enter the MobileNet model that we loaded before, and we add a top,
374
+
375
+ 95
376
+ 00:05:57,580 --> 00:06:02,000
377
+ the head we defined here, to this model, and that's what we call it here, you see.
378
+
379
+ 96
380
+ 00:06:02,360 --> 00:06:08,840
381
+ And what we do now is that we use this Keras Model function, so we use it now with the inputs here, which
382
+
383
+ 97
384
+ 00:06:08,840 --> 00:06:13,680
385
+ are defined as the MobileNet model's input, and the outputs being the head we're going to train.
386
+
387
+ 98
388
+ 00:06:13,840 --> 00:06:18,970
389
+ And basically this combines it into one model now, one model that looks like this when printed
390
+
391
+ 99
392
+ 00:06:18,970 --> 00:06:19,920
393
+ out.
394
+
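The head-building function and the `Model(inputs, outputs)` combination described above can be sketched as below. This is a minimal sketch, not the lesson's exact file: the function name `add_top`, the second Dense size, and `weights=None` (in place of the lesson's `"imagenet"`) are assumptions.

```python
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

num_classes = 10  # ten monkey species in the dataset

base = MobileNet(weights=None, include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # freeze the pretrained bottom model

def add_top(bottom_model, num_classes):
    """Build a replacement fully connected head on top of bottom_model."""
    x = bottom_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(1024, activation="relu")(x)   # 1,024-node Dense layer
    x = Dense(1024, activation="relu")(x)   # another Dense layer (size assumed)
    return Dense(num_classes, activation="softmax")(x)

# Combine the frozen base and the new head into one model.
model = Model(inputs=base.input, outputs=add_top(base, num_classes))
model.summary()  # non-trainable params are the frozen MobileNet weights
```

Because the base is frozen, `model.summary()` reports only the head's parameters as trainable, which is what makes this far cheaper to train than the full network.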
395
+ 100
396
+ 00:06:20,520 --> 00:06:26,300
397
+ So a lot of layers: we just saw 86 layers before, but now we have a few more, and these are the ones
398
+
399
+ 101
400
+ 00:06:26,320 --> 00:06:27,360
401
+ defined here.
402
+
403
+ 102
404
+ 00:06:27,790 --> 00:06:29,250
405
+ And that's going to show up right here.
406
+
407
+ 103
408
+ 00:06:29,320 --> 00:06:30,590
409
+ So this is pretty cool.
410
+
411
+ 104
412
+ 00:06:30,820 --> 00:06:31,950
413
+ And look at this here.
414
+
415
+ 105
416
+ 00:06:31,960 --> 00:06:37,710
417
+ So we have five million parameters, just over five million actually, and trainable parameters:
418
+
419
+ 106
420
+ 00:06:37,750 --> 00:06:38,870
421
+ only 2.6 million.
422
+
423
+ 107
424
+ 00:06:38,890 --> 00:06:43,430
425
+ And the non-trainable parameters, which are the weights we froze, are the rest.
426
+
427
+ 108
428
+ 00:06:43,720 --> 00:06:48,850
429
+ So effectively we've taken a model that was pretty complex, not super complex like a VGG and a
430
+
431
+ 109
432
+ 00:06:48,880 --> 00:06:54,850
433
+ couple of others but complex enough and we've made it into a much simpler model to train.
434
+
435
+ 110
436
+ 00:06:55,030 --> 00:06:57,390
437
+ So let's get to training our monkey breed
438
+
439
+ 111
440
+ 00:06:57,400 --> 00:07:00,600
441
+ dataset and train our monkey breed classifier.
442
+
443
+ 112
444
+ 00:07:00,910 --> 00:07:05,720
445
+ So we load our datasets using the ImageDataGenerator that you've seen before.
446
+
447
+ 113
448
+ 00:07:06,460 --> 00:07:12,410
449
+ We do a standard thing here, which you should be pretty familiar with by now, and then we
450
+
451
+ 114
452
+ 00:07:12,400 --> 00:07:14,530
453
+ define some checkpoints and callbacks, sorry.
454
+
455
+ 115
456
+ 00:07:14,650 --> 00:07:20,470
457
+ So we use early stopping and checkpointing here, and then we train for only five epochs for now.
458
+
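The training setup just described (generators, checkpointing, early stopping, five epochs) can be sketched like this. The file name, monitored metric, and rescaling factor are assumptions; the actual `fit` call is left as a comment because it needs the dataset directories on disk.

```python
# Callbacks and data-generator setup for the short five-epoch training run.
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Standard pixel rescaling; augmentation options could be added here too.
train_datagen = ImageDataGenerator(rescale=1.0 / 255)
val_datagen = ImageDataGenerator(rescale=1.0 / 255)

callbacks = [
    # Save only the best weights, judged by validation loss (name assumed).
    ModelCheckpoint("monkey_breed_mobilenet.h5", monitor="val_loss",
                    save_best_only=True, verbose=1),
    # Stop if validation loss fails to improve for 3 epochs in a row.
    EarlyStopping(monitor="val_loss", patience=3, verbose=1),
]

# With generators built via flow_from_directory on the train/validation
# folders, training would then be:
# model.fit(train_gen, validation_data=val_gen, epochs=5, callbacks=callbacks)
```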
459
+ 116
460
+ 00:07:20,680 --> 00:07:23,440
461
+ That's because we don't want it to take too long.
462
+
463
+ 117
464
+ 00:07:23,830 --> 00:07:26,380
465
+ And I'm actually training it separately in this window here.
466
+
467
+ 118
468
+ 00:07:26,830 --> 00:07:30,980
469
+ So I've actually already trained almost five epochs, and it didn't really take much time.
470
+
471
+ 119
472
+ 00:07:31,390 --> 00:07:38,590
473
+ So look at this here you can see after this epoch which took just under five minutes our validation
474
+
475
+ 120
476
+ 00:07:38,590 --> 00:07:41,180
477
+ accuracy was 88 percent already.
478
+
479
+ 121
480
+ 00:07:41,470 --> 00:07:44,750
481
+ That is actually pretty damn good for such a short space of time.
482
+
483
+ 122
484
+ 00:07:45,070 --> 00:07:49,580
485
+ Now in the second iteration, because it's such an early stage of the training,
486
+
487
+ 123
488
+ 00:07:49,630 --> 00:07:55,690
489
+ even though the training loss is much lower, the accuracy is a little bit less, 84 percent.
490
+
491
+ 124
492
+ 00:07:55,780 --> 00:07:56,320
493
+ That's OK.
494
+
495
+ 125
496
+ 00:07:56,350 --> 00:07:58,060
497
+ We can sort of leave it alone.
498
+
499
+ 126
500
+ 00:07:58,120 --> 00:08:03,280
501
+ We'll let it train for more epochs and see how it evolves, because training these pretrained models, when
502
+
503
+ 127
504
+ 00:08:03,280 --> 00:08:09,120
505
+ something is frozen, is a little bit different from how we train CNNs; they basically do
506
+
507
+ 128
508
+ 00:08:09,180 --> 00:08:12,890
509
+ they do effectively converge and get a very high value.
510
+
511
+ 129
512
+ 00:08:12,940 --> 00:08:15,920
513
+ However you do sometimes see some odd fluctuations like this.
514
+
515
+ 130
516
+ 00:08:16,270 --> 00:08:20,320
517
+ And look we have it back up to 91 percent 90 percent.
518
+
519
+ 131
520
+ 00:08:20,320 --> 00:08:26,110
521
+ If we wait a few minutes, sorry, about 20 seconds at least, here we can actually see what the validation accuracy
522
+
523
+ 132
524
+ 00:08:26,110 --> 00:08:28,540
525
+ is at the end of the fifth epoch.
526
+
527
+ 133
528
+ 00:08:28,540 --> 00:08:39,640
529
+ So let's wait and see what it looks like.
530
+
531
+ 134
532
+ 00:08:39,650 --> 00:08:46,360
533
+ One thing to note is that you can actually see our early stopping callback actually telling us that the
534
+
535
+ 135
536
+ 00:08:46,370 --> 00:08:52,340
537
+ validation loss did not improve. If we left this for 20 epochs, and we had actually, it
538
+
539
+ 136
540
+ 00:08:52,340 --> 00:08:57,060
541
+ was here as well so basically no matter what this is going to be the last epoch because I'm pretty sure
542
+
543
+ 137
544
+ 00:08:57,060 --> 00:09:00,070
545
+ I set my patience to three here.
546
+
547
+ 138
548
+ 00:09:00,320 --> 00:09:00,770
549
+ Yep.
550
+
551
+ 139
552
+ 00:09:00,770 --> 00:09:02,360
553
+ I usually always do.
554
+
555
+ 140
556
+ 00:09:02,840 --> 00:09:07,250
557
+ So right now, the reason why it's stuck a few seconds, even though those seconds would have
558
+
559
+ 141
560
+ 00:09:07,250 --> 00:09:13,220
561
+ passed by the time I finish this sentence, is that it's predicting on the entire validation dataset.
562
+
563
+ 142
564
+ 00:09:13,220 --> 00:09:16,540
565
+ Now that's something that a lot of beginners don't know.
566
+
567
+ 143
568
+ 00:09:16,720 --> 00:09:19,620
569
+ They see the pause at the end of an epoch and think it's stuck.
570
+
571
+ 144
572
+ 00:09:19,760 --> 00:09:20,770
573
+ It isn't actually stuck.
574
+
575
+ 145
576
+ 00:09:20,780 --> 00:09:24,270
577
+ It's just waiting to run on the validation data set now.
578
+
579
+ 146
580
+ 00:09:24,380 --> 00:09:29,480
581
+ So it takes a little while, honestly, because sometimes validation datasets are quite big.
582
+
583
+ 147
584
+ 00:09:29,800 --> 00:09:31,110
585
+ Ah there we go.
586
+
587
+ 148
588
+ 00:09:31,140 --> 00:09:32,240
589
+ So look at this.
590
+
591
+ 149
592
+ 00:09:32,410 --> 00:09:37,120
593
+ We got 93 percent accuracy in such a short space of time.
594
+
595
+ 150
596
+ 00:09:37,190 --> 00:09:38,300
597
+ So this is quite good.
598
+
599
+ 151
600
+ 00:09:38,360 --> 00:09:41,050
601
+ So now let's actually go back to this main page here.
602
+
603
+ 152
604
+ 00:09:41,450 --> 00:09:44,290
605
+ Let's load our model, which takes me about 10 seconds.
606
+
607
+ 153
608
+ 00:09:47,460 --> 00:09:52,570
609
+ And what are we going to do once this model is loaded? We're going to basically use OpenCV, with a
610
+
611
+ 154
612
+ 00:09:52,810 --> 00:09:53,310
613
+ function,
614
+
615
+ 155
616
+ 00:09:53,340 --> 00:09:59,970
617
+ that of course I wrote quickly, that loads the images here and runs them through the predictor we just loaded
618
+
619
+ 156
620
+ 00:10:00,180 --> 00:10:07,980
621
+ here, and we're actually going to see the monkey class, to see how accurate our classifier
622
+
623
+ 157
624
+ 00:10:07,980 --> 00:10:10,410
625
+ really is. Is it really 90 percent accurate?
626
+
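The inference step described here (load test images, run them through the classifier, report the class) can be sketched as follows. The lesson loads images with OpenCV and overlays the label on screen; this sketch substitutes a random NumPy array for `cv2.imread` output, uses an untrained model (`weights=None`), and the class names are placeholders.

```python
import numpy as np
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Rebuild the frozen-base-plus-head model (untrained, for illustration).
base = MobileNet(weights=None, include_top=False, input_shape=(224, 224, 3))
x = GlobalAveragePooling2D()(base.output)
model = Model(base.input, Dense(10, activation="softmax")(x))

monkey_classes = [f"species_{i}" for i in range(10)]  # placeholder names

# Stand-in for cv2.imread(...) resized to 224x224 and scaled to [0, 1].
img = np.random.rand(224, 224, 3).astype("float32")
probs = model.predict(img[np.newaxis], verbose=0)[0]  # add batch dimension
print("predicted:", monkey_classes[int(np.argmax(probs))])
```

In the lesson's version the predicted and true species names are drawn onto the image window so each test picture can be judged by eye.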
627
+ 158
628
+ 00:10:10,410 --> 00:10:12,280
629
+ So let's find out.
630
+
631
+ 159
632
+ 00:10:12,480 --> 00:10:13,390
633
+ There we go.
634
+
635
+ 160
636
+ 00:10:13,800 --> 00:10:14,560
637
+ So this is the truth.
638
+
639
+ 161
640
+ 00:10:14,560 --> 00:10:16,720
641
+ as labelled.
642
+
643
+ 162
644
+ 00:10:17,080 --> 00:10:18,710
645
+ Yes, that's like a Japanese macaque.
646
+
647
+ 163
648
+ 00:10:20,080 --> 00:10:22,850
649
+ OK, so it looks like it got this one wrong.
650
+
651
+ 164
652
+ 00:10:23,120 --> 00:10:28,310
653
+ This is what our model predicted, a white-headed capuchin, and no, it was not a white-headed capuchin.
654
+
655
+ 165
656
+ 00:10:28,810 --> 00:10:30,520
657
+ Let's see if it gets it right.
658
+
659
+ 166
660
+ 00:10:30,520 --> 00:10:31,300
661
+ Yeah it did.
662
+
663
+ 167
664
+ 00:10:31,330 --> 00:10:32,630
665
+ Got this one right.
666
+
667
+ 168
668
+ 00:10:32,710 --> 00:10:33,110
669
+ A pygmy
670
+
671
+ 169
672
+ 00:10:33,110 --> 00:10:34,020
673
+ marmoset.
674
+
675
+ 170
676
+ 00:10:34,270 --> 00:10:37,010
677
+ Let's see what the other is.
678
+
679
+ 171
680
+ 00:10:37,070 --> 00:10:39,000
681
+ A Nilgiri langur, definitely.
682
+
683
+ 172
684
+ 00:10:39,020 --> 00:10:40,020
685
+ Right.
686
+
687
+ 173
688
+ 00:10:40,280 --> 00:10:41,590
689
+ Pygmy marmoset again.
690
+
691
+ 174
692
+ 00:10:41,660 --> 00:10:42,710
693
+ Got it right.
694
+
695
+ 175
696
+ 00:10:42,740 --> 00:10:44,090
697
+ Got it right.
698
+
699
+ 176
700
+ 00:10:44,090 --> 00:10:44,990
701
+ Got that right.
702
+
703
+ 177
704
+ 00:10:44,990 --> 00:10:46,210
705
+ Got it right.
706
+
707
+ 178
708
+ 00:10:46,550 --> 00:10:48,010
709
+ Got it right again.
710
+
711
+ 179
712
+ 00:10:48,560 --> 00:10:49,930
713
+ Got it right.
714
+
715
+ 180
716
+ 00:10:50,000 --> 00:10:51,350
717
+ So seems pretty good.
718
+
719
+ 181
720
+ 00:10:51,530 --> 00:10:55,230
721
+ So aside from the first one, the model got basically nine out of 10 right,
722
+
723
+ 182
724
+ 00:10:55,250 --> 00:10:58,550
725
+ Which kind of corresponds to 90 percent accuracy.
726
+
727
+ 183
728
+ 00:10:58,550 --> 00:10:59,560
729
+ We got here.
730
+
731
+ 184
732
+ 00:10:59,930 --> 00:11:07,100
733
+ So you've just learned to create a model, basically to train a model using transfer learning, and you see
734
+
735
+ 185
736
+ 00:11:07,100 --> 00:11:07,850
737
+ how simple it is.
738
+
739
+ 186
740
+ 00:11:07,850 --> 00:11:15,770
741
+ You just basically load MobileNet with the weights being frozen and the top not included.
742
+
743
+ 187
744
+ 00:11:15,770 --> 00:11:18,800
745
+ Then you build the function to add the top whatever top you want to add.
746
+
747
+ 188
748
+ 00:11:18,860 --> 00:11:24,770
749
+ Adding all these layers here, make sure the last layer is the number of classes you have in your dataset.
750
+
751
+ 189
752
+ 00:11:24,860 --> 00:11:28,540
753
+ Then you basically concatenate and compile the models here.
754
+
755
+ 190
756
+ 00:11:29,690 --> 00:11:32,890
757
+ Well, combine them, I should say. You do your ImageDataGenerator,
758
+
759
+ 191
760
+ 00:11:32,900 --> 00:11:38,980
761
+ then define your checkpoints and callbacks, compile, and we go and train.
762
+
763
+ 192
764
+ 00:11:39,400 --> 00:11:42,880
765
+ So it's really very simple and I hope you found this tutorial quite useful.
766
+
767
+ 193
768
+ 00:11:43,060 --> 00:11:43,340
769
+ Thank you.
15. Transfer Learning Build a Flower & Monkey Breed Classifier/3.1 Download the Monkey Breed Dataset.html ADDED
@@ -0,0 +1 @@
 
 
1
+ <script type="text/javascript">window.location = "https://drive.google.com/file/d/1l-7wsAaDi89TpaFPjFW-oS8pdgSn_ekw/view?usp=sharing";</script>
15. Transfer Learning Build a Flower & Monkey Breed Classifier/4. Build a Flower Classifier with VGG16 using Transfer Learning.srt ADDED
@@ -0,0 +1,475 @@
1
+ 1
2
+ 00:00:00,750 --> 00:00:06,480
3
+ Hi and welcome to Chapter 15.3, where we're about to build a flower classifier, and we're going
4
+
5
+ 2
6
+ 00:00:06,480 --> 00:00:08,580
7
+ to use transfer learning to do this.
8
+
9
+ 3
10
+ 00:00:08,580 --> 00:00:10,280
11
+ So let's take a look at how we actually.
12
+
13
+ 4
14
+ 00:00:10,370 --> 00:00:13,230
15
+ What is our flower classifier's flower dataset,
16
+
17
+ 5
18
+ 00:00:13,230 --> 00:00:14,920
19
+ I should say so.
20
+
21
+ 6
22
+ 00:00:15,060 --> 00:00:20,150
23
+ It comes from Oxford University's Visual Geometry Group and is called Flowers 17.
24
+
25
+ 7
26
+ 00:00:20,310 --> 00:00:26,270
27
+ And that's because there are 17 categories of flowers, and the images in each class, so the set, are
28
+
29
+ 8
30
+ 00:00:26,270 --> 00:00:27,200
31
+ not that much.
32
+
33
+ 9
34
+ 00:00:28,110 --> 00:00:33,900
35
+ So these are some sample images from the Flowers 17 dataset, and this is the web page
36
+
37
+ 10
38
+ 00:00:33,900 --> 00:00:34,940
39
+ from Oxford University.
40
+
41
+ 11
42
+ 00:00:34,950 --> 00:00:40,170
43
+ And this is the link you can go to if you want to download it from there itself, or you can use the link
44
+
45
+ 12
46
+ 00:00:40,230 --> 00:00:44,580
47
+ I have on the left here on the Udemy side panel.
48
+
49
+ 13
50
+ 00:00:44,640 --> 00:00:49,440
51
+ Please use that link to actually download it, because I've already preprocessed the data into a format
52
+
53
+ 14
54
+ 00:00:49,470 --> 00:00:54,330
55
+ that is easily imported into Keras. If you download it from the Oxford University site, you're going to have
56
+
57
+ 15
58
+ 00:00:54,330 --> 00:00:55,760
59
+ to do the preprocessing yourself.
60
+
61
+ 16
62
+ 00:00:55,770 --> 00:01:00,630
63
+ And if you're a beginner, I don't think you're going to find that fun at all, although it's a good
64
+
65
+ 17
66
+ 00:01:00,630 --> 00:01:02,740
67
+ exercise to do sometimes.
68
+
69
+ 18
70
+ 00:01:03,600 --> 00:01:08,790
71
+ So anyway, our approach to this problem is that we're going to actually use a pre-trained VGG16 model
72
+
73
+ 19
74
+ 00:01:09,540 --> 00:01:14,490
75
+ with all of its weights frozen except the top layer, and we're only going to train the top head of
76
+
77
+ 20
78
+ 00:01:14,490 --> 00:01:17,490
79
+ the model with a final output of 17 classes.
80
+
81
+ 21
82
+ 00:01:17,490 --> 00:01:21,370
83
+ So let's go back to our IPython notebook and get this done.
84
+
85
+ 22
86
+ 00:01:21,710 --> 00:01:22,100
87
+ OK.
88
+
89
+ 23
90
+ 00:01:22,140 --> 00:01:24,750
91
+ So welcome back to our virtual machine.
92
+
93
+ 24
94
+ 00:01:24,780 --> 00:01:28,820
95
+ I hope you downloaded the flowers dataset and extracted it to this folder here.
96
+
97
+ 25
98
+ 00:01:29,040 --> 00:01:34,170
99
+ That's this folder called transfer learning; I've extracted and placed it right here, so we can quickly just
100
+
101
+ 26
102
+ 00:01:34,170 --> 00:01:37,910
103
+ inspect it by taking a look at some of those pictures.
104
+
105
+ 27
106
+ 00:01:38,330 --> 00:01:42,120
107
+ Let's put it on thumbnail view, and it looks quite nice.
108
+
109
+ 28
110
+ 00:01:42,120 --> 00:01:46,280
111
+ So as you can see we don't have that many images in this data set.
112
+
113
+ 29
114
+ 00:01:46,380 --> 00:01:51,380
115
+ So let's see what kind of accuracy we can get with transfer learning on the VGG model.
116
+
117
+ 30
118
+ 00:01:51,390 --> 00:01:53,380
119
+ So let's go to it here.
120
+
121
+ 31
122
+ 00:01:53,790 --> 00:02:02,170
123
+ So now let me just close some of these open windows, and let's quickly go back to this one here so you
124
+
125
+ 32
126
+ 00:02:02,170 --> 00:02:03,350
127
+ can actually see how I do it.
128
+
129
+ 33
130
+ 00:02:03,360 --> 00:02:05,080
131
+ It's Chapter 15.
132
+
133
+ 34
134
+ 00:02:05,080 --> 00:02:07,090
135
+ And we go to making a flower classifier.
136
+
137
+ 35
138
+ 00:02:07,210 --> 00:02:08,440
139
+ That's this file here.
140
+
141
+ 36
142
+ 00:02:08,830 --> 00:02:10,260
143
+ So now that we're in the file.
144
+
145
+ 37
146
+ 00:02:10,300 --> 00:02:11,800
147
+ Let's take a look at what's going on.
148
+
149
+ 38
150
+ 00:02:11,800 --> 00:02:15,770
151
+ So we import the VGG model; that's easily done here.
152
+
153
+ 39
154
+ 00:02:16,120 --> 00:02:23,470
155
+ VGG was designed to work on 224 by 224 pixel image input sizes.
156
+
157
+ 40
158
+ 00:02:23,500 --> 00:02:26,450
159
+ So let's keep the standard size and go forward.
160
+
161
+ 41
162
+ 00:02:26,530 --> 00:02:32,200
163
+ So let's load the model with the ImageNet weights but without the top layer,
164
+
165
+ 42
166
+ 00:02:32,410 --> 00:02:34,360
167
+ I should say. So we do that.
168
+
169
+ 43
170
+ 00:02:34,420 --> 00:02:36,960
171
+ And let's just print out the layers in this model.
172
+
173
+ 44
174
+ 00:02:37,060 --> 00:02:37,560
175
+ OK.
176
+
177
+ 45
178
+ 00:02:37,930 --> 00:02:44,740
179
+ So as you can see, the model is actually loaded here, and by default all the layers are trainable.
180
+
181
+ 46
182
+ 00:02:44,740 --> 00:02:52,370
183
+ True; that means by default, when you load VGG, all the weights are trainable.
184
+
185
+ 47
186
+ 00:02:52,630 --> 00:02:55,090
187
+ So we now have to set this True to False.
188
+
189
+ 48
190
+ 00:02:55,090 --> 00:02:56,490
191
+ So that's what we do here.
192
+
193
+ 49
194
+ 00:02:56,860 --> 00:03:03,010
195
+ So we load it without our top head, with ImageNet weights, and we make everything non-trainable; we set this flag
196
+
197
+ 50
198
+ 00:03:03,090 --> 00:03:04,210
199
+ to false.
200
+
201
+ 51
202
+ 00:03:04,270 --> 00:03:08,030
203
+ So let's do this quickly and that's done there.
204
+
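The freezing step just performed can be pictured without Keras at all; this is only a toy sketch of the trainable-flag pattern (the `Layer` class here is hypothetical, standing in for Keras layers, which carry the same flag):

```python
# Toy sketch of the freezing pattern described above. In Keras, each layer
# object has a `trainable` attribute that defaults to True.
class Layer:
    def __init__(self, name):
        self.name = name
        self.trainable = True  # like Keras, layers start out trainable

base_layers = [Layer("block%d_conv" % i) for i in range(1, 6)]

# Transfer learning: freeze every pre-trained layer so that only the
# newly added head's weights get updated during training.
for layer in base_layers:
    layer.trainable = False

print(all(not layer.trainable for layer in base_layers))  # True
```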
205
+ 52
206
+ 00:03:08,520 --> 00:03:13,450
207
+ And now let's create the function where we add a fully connected head.
208
+
209
+ 53
210
+ 00:03:13,510 --> 00:03:17,960
211
+ This is where the layers we now add go back on top of our VGGNet network.
212
+
213
+ 54
214
+ 00:03:18,190 --> 00:03:24,340
215
+ Notice this is different to the layers we added in the MobileNet network, and that's because VGG has a different
216
+
217
+ 55
218
+ 00:03:24,340 --> 00:03:26,000
219
+ design to MobileNet.
220
+
221
+ 56
222
+ 00:03:26,020 --> 00:03:30,190
223
+ So you're going to have to look at the final design of VGG and replace the layers as here.
224
+
225
+ 57
226
+ 00:03:30,340 --> 00:03:35,700
227
+ And this here, this dense layer; the number of dense units is here.
228
+
229
+ 58
230
+ 00:03:36,190 --> 00:03:38,440
231
+ By default we are going to use 256.
232
+
233
+ 59
234
+ 00:03:38,440 --> 00:03:47,550
235
+ However, this function allows us to specify it; in here we can put 128, and it would be 128 units here.
236
+
237
+ 60
238
+ 00:03:47,890 --> 00:03:50,480
239
+ So let's leave the default, right.
240
+
241
+ 61
242
+ 00:03:50,500 --> 00:03:57,220
243
+ And then we set the dropout, we set these things, and we input the number of classes, which is 17 from the flowers
244
+
245
+ 62
246
+ 00:03:57,220 --> 00:04:01,450
247
+ dataset; 17, so that should make sense, you know.
248
+
249
+ 63
250
+ 00:04:01,780 --> 00:04:04,730
251
+ And we just concatenate the models here;
252
+
253
+ 64
254
+ 00:04:05,110 --> 00:04:08,800
255
+ well, the parts of the model, to get the full model, and then print it out.
256
+
257
+ 65
258
+ 00:04:08,800 --> 00:04:13,690
259
+ So let's take a look at it, and we see there are 14 million parameters.
260
+
261
+ 66
262
+ 00:04:13,880 --> 00:04:18,150
263
+ It's less than VGG19 and 16; sorry, VGG19.
264
+
265
+ 67
266
+ 00:04:18,440 --> 00:04:23,180
267
+ And with trainable parameters at only 135 thousand, that's quite good.
268
+
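As a sanity check on the roughly 135 thousand trainable parameters quoted above, here is the arithmetic for one plausible head layout (global average pooling over VGG16's 512 feature maps, then Dense(256), then Dense(17); the exact head layout is an assumption):

```python
# Dense layer parameters = inputs * units + units (the bias terms).
gap_features = 512   # channels coming out of VGG16's last conv block
dense_units = 256    # the default head size used in this section
num_classes = 17     # Flowers 17

dense1 = gap_features * dense_units + dense_units  # 512*256 + 256
dense2 = dense_units * num_classes + num_classes   # 256*17 + 17
trainable = dense1 + dense2
print(trainable)  # 135697, i.e. roughly 135 thousand
```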
269
+ 68
270
+ 00:04:23,720 --> 00:04:25,060
271
+ So let me just run this.
272
+
273
+ 69
274
+ 00:04:25,130 --> 00:04:33,150
275
+ So that's refreshed, and now we just do the data generators here, for the flowers validation and flowers training folders,
276
+
277
+ 70
278
+ 00:04:33,250 --> 00:04:35,290
279
+ and we set our batch size.
280
+
281
+ 71
282
+ 00:04:35,320 --> 00:04:38,210
283
+ We can actually just keep it at 16.
284
+
285
+ 72
286
+ 00:04:38,490 --> 00:04:38,910
287
+ All right.
288
+
289
+ 73
290
+ 00:04:38,950 --> 00:04:43,140
291
+ And keep going here.
292
+
293
+ 74
294
+ 00:04:43,260 --> 00:04:49,500
295
+ So now we declare our callbacks right here, and we just create a callbacks array, which we pass
296
+
297
+ 75
298
+ 00:04:49,500 --> 00:04:51,740
299
+ in here, and let's run this now.
300
+
301
+ 76
302
+ 00:04:51,850 --> 00:04:55,430
303
+ So I'll leave you to run this; I've run this already.
304
+
305
+ 77
306
+ 00:04:55,450 --> 00:04:56,800
307
+ And it takes quite some time.
308
+
309
+ 78
310
+ 00:04:57,040 --> 00:05:01,540
311
+ But what I want you to observe is the validation accuracy over 25 epochs.
312
+
313
+ 79
314
+ 00:05:01,540 --> 00:05:06,230
315
+ The highest we got was actually 95 percent, which is quite good.
316
+
317
+ 80
318
+ 00:05:06,820 --> 00:05:11,500
319
+ So if you keep going, see, it did pass 95.3 at one point.
320
+
321
+ 81
322
+ 00:05:11,560 --> 00:05:12,990
323
+ So this is quite good.
324
+
325
+ 82
326
+ 00:05:13,240 --> 00:05:19,370
327
+ So we've got 95 percent accuracy using transfer learning with VGG16.
328
+
329
+ 83
330
+ 00:05:19,630 --> 00:05:22,710
331
+ So let's keep going let's see what else we can do.
332
+
333
+ 84
334
+ 00:05:22,750 --> 00:05:24,080
335
+ OK.
336
+
337
+ 85
338
+ 00:05:24,430 --> 00:05:26,020
339
+ So this section here.
340
+
341
+ 86
342
+ 00:05:26,020 --> 00:05:27,620
343
+ Can we speed this up.
344
+
345
+ 87
346
+ 00:05:27,730 --> 00:05:31,060
347
+ So let's try resizing the images to 64 by 64.
348
+
349
+ 88
350
+ 00:05:31,200 --> 00:05:34,820
351
+ You remember it was designed to take 224 by 224.
352
+
353
+ 89
354
+ 00:05:34,910 --> 00:05:37,660
355
+ Now let's drop this to 64.
356
+
357
+ 90
358
+ 00:05:37,930 --> 00:05:44,100
359
+ So let's use this command, setting the input size
360
+
361
+ 91
362
+ 00:05:44,100 --> 00:05:49,660
363
+ now to 64.
364
+
365
+ 92
366
+ 00:05:49,780 --> 00:05:55,670
367
+ All right, and do the standard thing where we load with the ImageNet weights, we don't include the top,
368
+
369
+ 93
370
+ 00:05:55,780 --> 00:06:01,810
371
+ specify the input shape, and we make the layers non-trainable, with this flag set to false.
372
+
373
+ 94
374
+ 00:06:02,190 --> 00:06:04,040
375
+ So that's good.
376
+
377
+ 95
378
+ 00:06:04,050 --> 00:06:07,050
379
+ And now let's move on to this.
380
+
381
+ 96
382
+ 00:06:07,460 --> 00:06:13,330
383
+ Let's actually start training this model; as we can see, this model has a different input size.
384
+
385
+ 97
386
+ 00:06:14,180 --> 00:06:16,010
387
+ And let's see what we get.
388
+
389
+ 98
390
+ 00:06:16,010 --> 00:06:18,940
391
+ So I've trained this before so you don't have to do it.
392
+
393
+ 99
394
+ 00:06:18,950 --> 00:06:26,180
395
+ So what I want you to see, though, is what's happened here. Previously, I actually did not
396
+
397
+ 100
398
+ 00:06:26,180 --> 00:06:30,130
399
+ use the callbacks, as you can see in the view above, but I should have.
400
+
401
+ 101
402
+ 00:06:30,410 --> 00:06:32,490
403
+ But what I've done now is the more disciplined way to do it.
404
+
405
+ 102
406
+ 00:06:32,540 --> 00:06:41,660
407
+ So we see some callback feedback from early stopping; we see it's not increasing, and the monitoring patience is
408
+
409
+ 103
410
+ 00:06:41,660 --> 00:06:42,310
411
+ good.
412
+
413
+ 104
414
+ 00:06:42,320 --> 00:06:45,740
415
+ So in the end, epoch 12 is what we use.
416
+
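The early-stopping behaviour described above (halt once the monitored loss has not improved for `patience` epochs) can be sketched by hand; this is an illustration of the logic with made-up losses, not the Keras callback itself:

```python
# Return the epoch index at which early stopping would halt training.
def early_stop_epoch(val_losses, patience=3):
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch   # improvement: reset the clock
        elif epoch - best_epoch >= patience:
            return epoch                     # patience exhausted: stop here
    return len(val_losses) - 1               # ran to the end

losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.75]
print(early_stop_epoch(losses))  # 5: three epochs after the best loss at epoch 2
```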
417
+ 105
418
+ 00:06:45,770 --> 00:06:49,530
419
+ So let's go back to epoch 12, a few rows ago.
420
+
421
+ 106
422
+ 00:06:49,920 --> 00:06:53,210
423
+ That's this one: 82 percent.
424
+
425
+ 107
426
+ 00:06:53,230 --> 00:06:58,340
427
+ So 82 percent was where we had our lowest validation loss and our best accuracy.
428
+
429
+ 108
430
+ 00:06:58,340 --> 00:07:06,500
431
+ So you can see, by resizing the images to 64 by 64, which is a substantial decrease in size from 224 by
432
+
433
+ 109
434
+ 00:07:06,500 --> 00:07:09,580
435
+ 224, we still got decent accuracy.
436
+
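The speedup from shrinking the input follows from VGG16 halving the spatial size at each of its five max-pool stages; a quick calculation (assuming the standard five-pool VGG16 layout):

```python
# Spatial size of the final VGG16 feature map for a given square input.
def final_feature_size(input_size, num_pools=5):
    size = input_size
    for _ in range(num_pools):
        size //= 2  # each 2x2 max pool halves height and width
    return size

print(final_feature_size(224))  # 7: the usual 7x7x512 feature maps
print(final_feature_size(64))   # 2: far smaller maps, hence faster training
```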
437
+ 110
438
+ 00:07:09,860 --> 00:07:10,860
439
+ How much was it again.
440
+
441
+ 111
442
+ 00:07:11,520 --> 00:07:11,850
443
+ Sorry.
444
+
445
+ 112
446
+ 00:07:11,950 --> 00:07:13,930
447
+ 82 percent accuracy.
448
+
449
+ 113
450
+ 00:07:14,060 --> 00:07:20,570
451
+ So that's not too bad, to be fair. Actually, sorry, 86 percent accuracy is what we got; that was fifteen point five
452
+
453
+ 114
454
+ 00:07:20,570 --> 00:07:22,150
455
+ six five two.
456
+
457
+ 115
458
+ 00:07:22,370 --> 00:07:22,730
459
+ Right.
460
+
461
+ 116
462
+ 00:07:22,730 --> 00:07:24,540
463
+ So that is actually this one.
464
+
465
+ 117
466
+ 00:07:25,010 --> 00:07:26,140
467
+ So yep.
468
+
469
+ 118
470
+ 00:07:26,150 --> 00:07:27,620
471
+ So this is good.
472
+
473
+ 119
474
+ 00:07:27,710 --> 00:07:29,960
475
+ It's not great, but it's pretty good.
15. Transfer Learning Build a Flower & Monkey Breed Classifier/4.1 Download the 17-Flowers Dataset.html ADDED
@@ -0,0 +1 @@
 
 
1
+ <script type="text/javascript">window.location = "https://drive.google.com/file/d/16KBCSvjMSCJSdcrvcws3g-bk9Ov9JrFS/view?usp=sharing";</script>
16. Design Your Own CNN - LittleVGG A Simpsons Classifier/1. Chapter Introduction.srt ADDED
@@ -0,0 +1,27 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 1
2
+ 00:00:00,780 --> 00:00:08,490
3
+ Hi and welcome to Chapter 16, where we get to design our own customized CNN, one we're going to call Little
4
+
5
+ 2
6
+ 00:00:08,490 --> 00:00:12,360
7
+ VGG.
8
+
9
+ 3
10
+ 00:00:12,420 --> 00:00:17,270
11
+ So in this section, we introduce the concepts of how we developed LittleVGG.
12
+
13
+ 4
14
+ 00:00:17,520 --> 00:00:21,630
15
+ And then in 16.2, we're actually going to use LittleVGG to do some Simpsons character
16
+
17
+ 5
18
+ 00:00:21,630 --> 00:00:22,720
19
+ recognition.
20
+
21
+ 6
22
+ 00:00:22,740 --> 00:00:24,630
23
+ So I hope you're looking forward to getting started.
24
+
25
+ 7
26
+ 00:00:24,750 --> 00:00:25,850
27
+ Let's get into it.
16. Design Your Own CNN - LittleVGG A Simpsons Classifier/2. Introducing LittleVGG.srt ADDED
@@ -0,0 +1,87 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 1
2
+ 00:00:00,830 --> 00:00:08,220
3
+ And welcome to Chapter 16.1, where I introduce LittleVGG, a customized VGG network.
4
+
5
+ 2
6
+ 00:00:08,460 --> 00:00:15,220
7
+ So LittleVGG is basically a downsized version of VGG16 or 19.
8
+
9
+ 3
10
+ 00:00:15,510 --> 00:00:21,320
11
+ And remember, VGG-inspired networks all use a series of 3 by 3 convolutional layers, where the number of
12
+
13
+ 4
14
+ 00:00:21,320 --> 00:00:25,160
15
+ filters just increases as you go further and deeper into the network.
16
+
17
+ 5
18
+ 00:00:25,160 --> 00:00:29,770
19
+ So let's take a look at the LittleVGG network diagram, or network architecture.
20
+
21
+ 6
22
+ 00:00:30,080 --> 00:00:33,550
23
+ So this was VGG19 and 16 here.
24
+
25
+ 7
26
+ 00:00:33,860 --> 00:00:36,490
27
+ So here's what I've done with LittleVGG.
28
+
29
+ 8
30
+ 00:00:36,620 --> 00:00:38,640
31
+ You could say it has 9 weight layers.
32
+
33
+ 9
34
+ 00:00:38,690 --> 00:00:41,900
35
+ And basically this is how it lines up compared to this one here.
36
+
37
+ 10
38
+ 00:00:41,900 --> 00:00:42,480
39
+ All right.
40
+
41
+ 11
42
+ 00:00:42,650 --> 00:00:50,460
43
+ So we have our first convolutional layers here with 64 filters, then max pooling, then our other layers here are convolutional
44
+
45
+ 12
46
+ 00:00:50,460 --> 00:00:52,700
47
+ layers with 128 filters.
48
+
49
+ 13
50
+ 00:00:53,000 --> 00:00:54,070
51
+ And then this one here.
52
+
53
+ 14
54
+ 00:00:54,080 --> 00:00:56,400
55
+ However, we don't have as many; we just have two.
56
+
57
+ 15
58
+ 00:00:56,840 --> 00:01:00,070
59
+ And then we have max pooling again, and then our fully connected layers.
60
+
61
+ 16
62
+ 00:01:00,080 --> 00:01:01,180
63
+ So we stop here.
64
+
65
+ 17
66
+ 00:01:01,430 --> 00:01:06,760
67
+ We don't go on as deep as here, like even VGG11 does.
68
+
69
+ 18
70
+ 00:01:06,830 --> 00:01:14,090
71
+ So we just stop at 256, at this last convolutional filter count, and then we get straight
72
+
73
+ 19
74
+ 00:01:14,090 --> 00:01:16,200
75
+ into the FC layers.
76
+
77
+ 20
78
+ 00:01:16,490 --> 00:01:18,570
79
+ So this is the number of parameters here.
80
+
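Those parameter counts follow from the standard formula for a convolutional layer, kernel_h * kernel_w * in_channels * filters + filters; a quick sketch for the 3 by 3 layers described above:

```python
# Parameters in one conv layer: one kernel per filter, plus a bias each.
def conv_params(in_channels, filters, kernel=3):
    return kernel * kernel * in_channels * filters + filters

print(conv_params(3, 64))    # 1792: RGB input into the first 64-filter layer
print(conv_params(64, 128))  # 73856: a 128-filter layer fed by 64 channels
```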
81
+ 21
82
+ 00:01:18,740 --> 00:01:25,810
83
+ So let's build this in Keras, and then we get to use this on the Simpsons, training on the Simpsons character
84
+
85
+ 22
86
+ 00:01:26,070 --> 00:01:26,400
87
+ set.
16. Design Your Own CNN - LittleVGG A Simpsons Classifier/3. Simpsons Character Recognition using LittleVGG.srt ADDED
@@ -0,0 +1,583 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 1
2
+ 00:00:00,630 --> 00:00:06,260
3
+ So in Section 16.2, we're going to use LittleVGG on our Simpsons character
4
+
5
+ 2
6
+ 00:00:06,260 --> 00:00:09,180
7
+ dataset to do some Simpsons character recognition.
8
+
9
+ 3
10
+ 00:00:09,180 --> 00:00:10,560
11
+ This is going to be pretty cool.
12
+
13
+ 4
14
+ 00:00:10,560 --> 00:00:15,720
15
+ So this dataset was sourced from Kaggle; here's a link to it, and it actually came with
16
+
17
+ 5
18
+ 00:00:15,720 --> 00:00:21,060
19
+ more classes; however, I've limited it to 20 classes of the most popular characters, and there are
20
+
21
+ 6
22
+ 00:00:21,060 --> 00:00:24,990
23
+ about 200 to 400 RGB pictures in each.
24
+
25
+ 7
26
+ 00:00:24,990 --> 00:00:31,530
27
+ All of which are different sizes and orientations and aspect ratios, as I just said
28
+
29
+ 8
30
+ 00:00:31,530 --> 00:00:31,770
31
+ that.
32
+
33
+ 9
34
+ 00:00:31,780 --> 00:00:38,250
35
+ So that's fine, but it's good to know that even though some images have multiple characters, most of the
36
+
37
+ 10
38
+ 00:00:38,250 --> 00:00:41,420
39
+ images basically have the main character as the focus.
40
+
41
+ 11
42
+ 00:00:41,610 --> 00:00:45,030
43
+ So let's take a look at some of the images here.
44
+
45
+ 12
46
+ 00:00:45,360 --> 00:00:47,100
47
+ So this is what I mean by the way.
48
+
49
+ 13
50
+ 00:00:47,100 --> 00:00:49,440
51
+ Like Lisa is definitely the main character here.
52
+
53
+ 14
54
+ 00:00:49,440 --> 00:00:51,150
55
+ However, there are other characters here.
56
+
57
+ 15
58
+ 00:00:51,360 --> 00:00:53,600
59
+ So that may confuse our classifier.
60
+
61
+ 16
62
+ 00:00:53,730 --> 00:00:55,140
63
+ But let's see how it performs.
64
+
65
+ 17
66
+ 00:00:57,450 --> 00:00:57,860
67
+ OK.
68
+
69
+ 18
70
+ 00:00:57,880 --> 00:01:03,090
71
+ So we're in our virtual machine with the Simpsons Python notebook open.
72
+
73
+ 19
74
+ 00:01:03,100 --> 00:01:08,420
75
+ Let me just show you how we got in here, in case you're confused: it says Chapter 16 here,
76
+
77
+ 20
78
+ 00:01:08,530 --> 00:01:15,280
79
+ Design Your Own CNN, called LittleVGG. And I hope you downloaded the dataset in the resources section of
80
+
81
+ 21
82
+ 00:01:15,280 --> 00:01:18,650
83
+ this section of this chapter and placed it right here.
84
+
85
+ 22
86
+ 00:01:18,700 --> 00:01:21,770
87
+ And again just always check to make sure the images are here.
88
+
89
+ 23
90
+ 00:01:21,880 --> 00:01:27,610
91
+ So we have our characters, or classes; and basically, let's go back to this file.
92
+
93
+ 24
94
+ 00:01:27,610 --> 00:01:28,400
95
+ All right.
96
+
97
+ 25
98
+ 00:01:28,540 --> 00:01:30,370
99
+ So let's have it open here.
100
+
101
+ 26
102
+ 00:01:30,530 --> 00:01:34,520
103
+ So now, basically, you've seen the standard way we do things, right?
104
+
105
+ 27
106
+ 00:01:34,600 --> 00:01:39,580
107
+ What I'm going to tell you about here, though, is that we're resizing the images to 32 by 32
108
+
109
+ 28
110
+ 00:01:39,580 --> 00:01:41,240
111
+ pixels, which is quite small.
112
+
113
+ 29
114
+ 00:01:41,320 --> 00:01:42,410
115
+ So let's see how it performs.
116
+
117
+ 30
118
+ 00:01:42,430 --> 00:01:43,270
119
+ OK.
120
+
121
+ 31
122
+ 00:01:43,270 --> 00:01:50,140
123
+ So we're doing some data augmentation, the same usual things, as we do rotations, width shifting, horizontal
124
+
125
+ 32
126
+ 00:01:50,140 --> 00:01:52,410
127
+ flipping, so we can augment the characters.
128
+
129
+ 33
130
+ 00:01:52,420 --> 00:01:54,110
131
+ It adds variety to the characters.
132
+
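Horizontal flipping, one of the augmentations just listed, simply mirrors each pixel row; a tiny stand-in example (not the Keras generator itself):

```python
# Flip a 2D "image" (a list of pixel rows) left to right.
def hflip(image):
    return [row[::-1] for row in image]

img = [[1, 2, 3],
       [4, 5, 6]]
print(hflip(img))  # [[3, 2, 1], [6, 5, 4]]
```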
133
+ 34
134
+ 00:01:54,520 --> 00:01:59,490
135
+ And then we declare our generators here and then this is the part I want to show you.
136
+
137
+ 35
138
+ 00:01:59,740 --> 00:02:02,350
139
+ This is where we build our LittleVGG model.
140
+
141
+ 36
142
+ 00:02:02,350 --> 00:02:03,040
143
+ All right.
144
+
145
+ 37
146
+ 00:02:03,040 --> 00:02:09,020
147
+ So I believe you remember from the diagram shown in the slides that this is how it's defined.
148
+
149
+ 38
150
+ 00:02:09,040 --> 00:02:11,240
151
+ So we have two convolutional layers here.
152
+
153
+ 39
154
+ 00:02:11,550 --> 00:02:18,090
155
+ Sixty-four filters each, and all the kernel sizes here are three by three.
156
+
157
+ 40
158
+ 00:02:18,130 --> 00:02:20,640
159
+ This is typical of a VGG-family model.
160
+
161
+ 41
162
+ 00:02:20,790 --> 00:02:21,730
163
+ All right.
164
+
165
+ 42
166
+ 00:02:21,970 --> 00:02:25,720
167
+ So we have these convolutional layers here, then a second set.
168
+
169
+ 43
170
+ 00:02:25,720 --> 00:02:28,940
171
+ The second set here, and then another set here.
172
+
173
+ 44
174
+ 00:02:28,960 --> 00:02:37,380
175
+ These are the ones with 256 filters, and then we have the final FC dense layers here with some dropout.
176
+
177
+ 45
178
+ 00:02:37,420 --> 00:02:41,270
179
+ We have two sets of dense connections here.
180
+
181
+ 46
182
+ 00:02:41,890 --> 00:02:48,530
183
+ And then we finally go to our softmax classifier with the number of classes, which is 20, which we defined
184
+
185
+ 47
186
+ 00:02:48,650 --> 00:02:49,350
187
+ up here.
188
+
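The softmax classifier just described turns the 20 raw class scores into a probability distribution; a small pure-Python illustration (the scores here are made up):

```python
import math

# Softmax: exponentiate, then normalise so the outputs sum to one.
def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(round(sum(probs), 6))      # 1.0: a valid probability distribution
print(probs.index(max(probs)))   # 0: the class with the highest score
```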
189
+ 48
190
+ 00:02:50,890 --> 00:02:52,330
191
+ So let's run this actually.
192
+
193
+ 49
194
+ 00:02:52,330 --> 00:02:55,790
195
+ Let me run the first one.
196
+
197
+ 50
198
+ 00:02:55,950 --> 00:02:56,680
199
+ There we go.
200
+
201
+ 51
202
+ 00:02:57,450 --> 00:02:59,410
203
+ And then let's run this one here.
204
+
205
+ 52
206
+ 00:03:00,870 --> 00:03:01,250
207
+ So good.
208
+
209
+ 53
210
+ 00:03:01,260 --> 00:03:04,940
211
+ So we just display our model summary here.
212
+
213
+ 54
214
+ 00:03:05,100 --> 00:03:09,620
215
+ But luckily, this is a substantially smaller model compared to VGG16.
216
+
217
+ 55
218
+ 00:03:09,640 --> 00:03:12,090
219
+ It's only 2.2 million parameters.
220
+
221
+ 56
222
+ 00:03:12,390 --> 00:03:15,510
223
+ And if you want to take a look at our model we can plot it.
224
+
225
+ 57
226
+ 00:03:15,570 --> 00:03:16,320
227
+ Remember how we did it.
228
+
229
+ 58
230
+ 00:03:16,320 --> 00:03:22,830
231
+ in earlier chapters. So this is a visualization of our model; it shows you the inputs and outputs of every
232
+
233
+ 59
234
+ 00:03:22,850 --> 00:03:25,090
235
+ layer.
236
+
237
+ 60
238
+ 00:03:25,110 --> 00:03:31,590
239
+ It's quite long, but imagine how long VGG19 would be. Okay.
240
+
241
+ 61
242
+ 00:03:31,720 --> 00:03:35,620
243
+ So this is where I should put some notes in for you guys.
244
+
245
+ 62
246
+ 00:03:36,910 --> 00:03:44,160
247
+ Training our LittleVGG model, right.
248
+
249
+ 63
250
+ 00:03:44,200 --> 00:03:48,300
251
+ So we have our callbacks here, the same typical callbacks that we've used before:
252
+
253
+ 64
254
+ 00:03:48,430 --> 00:03:55,330
255
+ checkpointing, early stopping, and learning rate adjustments on plateau; and the number of samples that we've
256
+
257
+ 65
258
+ 00:03:55,330 --> 00:03:58,510
259
+ gotten from the generators, as we did before.
260
+
261
+ 66
262
+ 00:03:58,810 --> 00:04:01,610
263
+ And we're going to train this for at least ten epochs; you can train for more.
264
+
265
+ 67
266
+ 00:04:01,990 --> 00:04:07,840
267
+ I always recommend training for more, but time is of the essence, and this is more of a practical educational
268
+
269
+ 68
270
+ 00:04:07,840 --> 00:04:11,840
271
+ exercise as opposed to us trying to get the best performance out of these models.
272
+
273
+ 69
274
+ 00:04:12,070 --> 00:04:15,650
275
+ So what we're doing here is just testing our LittleVGG model,
276
+
277
+ 70
278
+ 00:04:15,930 --> 00:04:20,080
279
+ just to see how it performs on our dataset.
280
+
281
+ 71
282
+ 00:04:20,080 --> 00:04:23,200
283
+ So I've run this for 10 epochs, as I said.
284
+
285
+ 72
286
+ 00:04:23,470 --> 00:04:25,260
287
+ And let's see how it performs.
288
+
289
+ 73
290
+ 00:04:25,330 --> 00:04:28,190
291
+ Fifteen percent on validation accuracy.
292
+
293
+ 74
294
+ 00:04:28,240 --> 00:04:29,100
295
+ Not great.
296
+
297
+ 75
298
+ 00:04:29,260 --> 00:04:29,740
299
+ OK.
300
+
301
+ 76
302
+ 00:04:29,950 --> 00:04:31,040
303
+ And again not great.
304
+
305
+ 77
306
+ 00:04:31,040 --> 00:04:37,720
307
+ Our training accuracy, though, steadily improves as we train; look at this column here, the training
308
+
309
+ 78
310
+ 00:04:37,750 --> 00:04:40,920
311
+ accuracy keeps going up and up.
312
+
313
+ 79
314
+ 00:04:41,200 --> 00:04:43,680
315
+ And also look at the validation accuracy.
316
+
317
+ 80
318
+ 00:04:43,780 --> 00:04:45,300
319
+ It also keeps going up and up.
320
+
321
+ 81
322
+ 00:04:45,340 --> 00:04:52,310
323
+ So I'm pretty sure if I'd left it to train maybe 50 epochs, it could have gotten into the 90 percent accuracy range.
324
+
325
+ 82
326
+ 00:04:52,330 --> 00:04:53,620
327
+ So this is good to know.
328
+
329
+ 83
330
+ 00:04:53,830 --> 00:05:00,460
331
+ So we do see the flexibility and power of a simple VGG model that honestly doesn't take that long to
332
+
333
+ 84
334
+ 00:05:00,460 --> 00:05:06,460
335
+ train: 500-something seconds, as you can see, per epoch.
336
+
337
+ 85
338
+ 00:05:06,630 --> 00:05:13,610
339
+ All right, so let's look at the performance of this model; let me just give this section a title
340
+
341
+ 86
342
+ 00:05:13,740 --> 00:05:15,660
343
+ to make this a bit cleaner.
344
+
345
+ 87
346
+ 00:05:21,130 --> 00:05:21,970
347
+ OK.
348
+
349
+ 88
350
+ 00:05:22,280 --> 00:05:25,680
351
+ Scikit is giving some of these user warnings; don't worry about them.
352
+
353
+ 89
354
+ 00:05:25,770 --> 00:05:28,380
355
+ But look at the confusion matrix here.
356
+
357
+ 90
358
+ 00:05:28,550 --> 00:05:33,890
359
+ So we can see high numbers along the diagonal here, which is also good.
360
+
361
+ 91
362
+ 00:05:33,900 --> 00:05:40,440
363
+ Always good, but we do see some issues here, with some characters being misclassified.
364
+
365
+ 92
366
+ 00:05:40,760 --> 00:05:43,740
367
+ So we know this is a 77 percent accurate model.
368
+
369
+ 93
370
+ 00:05:43,850 --> 00:05:47,380
371
+ So we know it is going to have some issues but it's generally going to be quite good.
372
+
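Accuracy can be read straight off a confusion matrix: the diagonal holds the correct predictions. A tiny three-class example with made-up counts:

```python
# Accuracy = sum of the diagonal / sum of all entries.
matrix = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
correct = sum(matrix[i][i] for i in range(len(matrix)))
total = sum(sum(row) for row in matrix)
accuracy = correct / total
print(round(accuracy, 2))  # 0.77, coincidentally much like the model above
```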
373
+ 94
374
+ 00:05:47,900 --> 00:05:56,500
375
+ So we can see which characters it's performing poorly on, and the F1 scores; you may find that interesting.
376
+
377
+ 95
378
+ 00:05:56,900 --> 00:05:57,860
379
+ All right.
380
+
381
+ 96
382
+ 00:05:58,040 --> 00:06:00,140
383
+ And a poor F1 score for him.
384
+
385
+ 97
386
+ 00:06:00,410 --> 00:06:04,390
387
+ But a super low precision rate means a lot of false positives.
388
+
389
+ 98
390
+ 00:06:04,730 --> 00:06:05,080
391
+ Right.
392
+
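Precision, recall and F1, as reported in the classification report above, relate to false positives like this (the counts below are hypothetical, chosen to show a many-false-positives case):

```python
# Precision drops when false positives pile up; F1 balances it with recall.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(p, r):
    return 2 * p * r / (p + r)

p = precision(tp=20, fp=60)  # 0.25: lots of false positives
r = recall(tp=20, fn=5)      # 0.8: most true instances were found
print(round(f1(p, r), 3))    # 0.381: the F1 score suffers
```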
393
+ 99
394
+ 00:06:05,150 --> 00:06:08,510
395
+ And no one else seems to be nearly that bad.
396
+
397
+ 100
398
+ 00:06:10,490 --> 00:06:15,130
399
+ The one to see, maybe, is here: being misclassified as Moe.
400
+
401
+ 101
402
+ 00:06:16,030 --> 00:06:17,290
403
+ That is interesting.
404
+
405
+ 102
406
+ 00:06:17,290 --> 00:06:17,990
407
+ All right.
408
+
409
+ 103
410
+ 00:06:18,220 --> 00:06:21,290
411
+ But generally, we see a nice smooth diagonal trend here.
412
+
413
+ 104
414
+ 00:06:21,310 --> 00:06:25,640
415
+ This is a much easier way to visualize this data.
416
+
417
+ 105
418
+ 00:06:25,690 --> 00:06:26,300
419
+ OK.
420
+
421
+ 106
422
+ 00:06:26,900 --> 00:06:30,670
423
+ So I'm going to load our model now and see how it performs.
424
+
425
+ 107
426
+ 00:06:33,500 --> 00:06:35,110
427
+ Takes about 10 seconds.
428
+
429
+ 108
430
+ 00:06:35,180 --> 00:06:37,790
431
+ I hate model loading for this reason.
432
+
433
+ 109
434
+ 00:06:37,790 --> 00:06:38,430
435
+ There we go.
436
+
437
+ 110
438
+ 00:06:38,690 --> 00:06:42,340
439
+ So this is some messy OpenCV code, I'll admit.
440
+
441
+ 111
442
+ 00:06:42,530 --> 00:06:44,740
443
+ However you can spend some time going through it.
444
+
445
+ 112
446
+ 00:06:44,900 --> 00:06:48,440
447
+ It's not super difficult to understand, and we've used similar code before.
448
+
449
+ 113
450
+ 00:06:48,620 --> 00:06:53,930
451
+ But basically, we're just going to display the predicted class over the
452
+
453
+ 114
454
+ 00:06:53,990 --> 00:06:54,550
455
+ true class.
456
+
457
+ 115
458
+ 00:06:54,560 --> 00:06:56,730
459
+ So let's see how it performs.
460
+
461
+ 116
462
+ 00:06:56,840 --> 00:06:57,330
463
+ Good.
464
+
465
+ 117
466
+ 00:06:57,350 --> 00:07:01,850
467
+ This is Milhouse; it is not Homer.
468
+
469
+ 118
470
+ 00:07:01,960 --> 00:07:04,760
471
+ This text is too big; we can resize it after, actually.
472
+
473
+ 119
474
+ 00:07:04,760 --> 00:07:07,850
475
+ Let me show you how to resize it while we're here.
476
+
477
+ 120
478
+ 00:07:07,890 --> 00:07:10,390
479
+ So as you see this is this function here.
480
+
481
+ 121
482
+ 00:07:10,380 --> 00:07:13,680
483
+ Draw test is where we're actually drawing the text.
484
+
485
+ 122
486
+ 00:07:13,710 --> 00:07:21,750
487
+ So this is the font size, so we can reduce the font size here, and we see how it becomes. Good.
488
+
489
+ 123
490
+ 00:07:21,750 --> 00:07:27,270
491
+ If you're wondering, though, what this did here: this is the thickness, sort of the boldness, of
492
+
493
+ 124
494
+ 00:07:27,270 --> 00:07:28,030
495
+ the font.
496
+
497
+ 125
498
+ 00:07:28,530 --> 00:07:31,050
499
+ And there are a number of fonts we can use in OpenCV.
500
+
501
+ 126
502
+ 00:07:31,110 --> 00:07:33,540
503
+ This is one of the nice looking ones in my opinion.
504
+
505
+ 127
506
+ 00:07:33,660 --> 00:07:36,430
507
+ So let's take a look at the classifier results.
508
+
509
+ 128
510
+ 00:07:36,450 --> 00:07:43,530
511
+ This is Charles Montgomery Burns, and it is in fact Charles; I mean, he's a very good dresser. The clown
512
+
513
+ 129
514
+ 00:07:43,620 --> 00:07:46,710
515
+ should be easy to spot given his green hair.
516
+
517
+ 130
518
+ 00:07:46,710 --> 00:07:49,140
519
+ Very good. Sideshow Bob: pretty good.
520
+
521
+ 131
522
+ 00:07:49,200 --> 00:07:51,500
523
+ Krusty again. Moe.
524
+
525
+ 132
526
+ 00:07:52,530 --> 00:07:54,140
527
+ This is clearly not Principal Skinner.
528
+
529
+ 133
530
+ 00:07:54,210 --> 00:07:59,440
531
+ This is Lisa; however, because she's wearing a cap, that's probably why it got confused.
532
+
533
+ 134
534
+ 00:07:59,490 --> 00:08:03,900
535
+ Although the cap is not similar to what Principal Skinner would wear,
536
+
537
+ 135
538
+ 00:08:03,900 --> 00:08:10,290
539
+ so I'm not sure why our classifier... well, this would be a good time to actually visualize how the classifier
540
+
541
+ 136
542
+ 00:08:10,290 --> 00:08:13,360
543
+ perceives the characters.
544
+
545
+ 137
546
+ 00:08:13,390 --> 00:08:18,400
547
+ Again: Abraham, or Grandpa.
548
+
549
+ 138
550
+ 00:08:18,950 --> 00:08:19,600
551
+ That's pretty good.
552
+
553
+ 139
554
+ 00:08:19,690 --> 00:08:20,840
555
+ So our classifier,
556
+
557
+ 140
558
+ 00:08:20,860 --> 00:08:25,680
559
+ seventy-seven percent accurate, actually performed fairly well on our test data.
560
+
561
+ 141
562
+ 00:08:26,170 --> 00:08:26,950
563
+ OK.
564
+
565
+ 142
566
+ 00:08:27,050 --> 00:08:30,100
567
+ So I hope you had some fun playing with LittleVGG.
568
+
569
+ 143
570
+ 00:08:30,140 --> 00:08:35,990
571
+ It's a very good model in my opinion, and you can adapt it to a number of your own applications if you
572
+
573
+ 144
574
+ 00:08:35,990 --> 00:08:36,750
575
+ want.
576
+
577
+ 145
578
+ 00:08:37,180 --> 00:08:37,550
579
+ OK.
580
+
581
+ 146
582
+ 00:08:37,760 --> 00:08:38,060
583
+ Thank you.
16. Design Your Own CNN - LittleVGG A Simpsons Classifier/3.1 Download Simpsons Dataset.html ADDED
@@ -0,0 +1 @@
1
+ <script type="text/javascript">window.location = "https://drive.google.com/file/d/1GmS93M5h5CHmQWKzMtXdAZmNW2jmxo01/view?usp=sharing";</script>
16. Design Your Own CNN - LittleVGG/16.2 LittleVGG - Simpsons.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
17. Advanced Activation Functions & Initializations/1. Chapter Introduction.srt ADDED
@@ -0,0 +1,27 @@
1
+ 1
2
+ 00:00:00,750 --> 00:00:07,020
3
+ Hi and welcome to Chapter 17, where we talk about advanced activation functions and initializations. Those
4
+
5
+ 2
6
+ 00:00:07,020 --> 00:00:12,830
7
+ are some features that Keras allows us to configure when tuning or creating our CNNs.
8
+
9
+ 3
10
+ 00:00:12,840 --> 00:00:18,720
11
+ So before we begin, I'm going to talk about the dying ReLU problem and introduce to you why we need
12
+
13
+ 4
14
+ 00:00:18,730 --> 00:00:21,510
15
+ Leaky ReLU, ELU and PReLUs as well.
16
+
17
+ 5
18
+ 00:00:21,840 --> 00:00:24,570
19
+ And then I'll talk about advanced initializations.
20
+
21
+ 6
22
+ 00:00:24,850 --> 00:00:25,150
23
+ OK.
24
+
25
+ 7
26
+ 00:00:25,170 --> 00:00:26,400
27
+ So let's get started.
17. Advanced Activation Functions & Initializations/2. Dying ReLU Problem and Introduction to Leaky ReLU, ELU and PReLUs.srt ADDED
@@ -0,0 +1,279 @@
1
+ 1
2
+ 00:00:00,560 --> 00:00:07,040
3
+ Hi and welcome to chapter seventeen point one, where I'm going to introduce to you some advanced activation
4
+
5
+ 2
6
+ 00:00:07,040 --> 00:00:07,780
7
+ functions.
8
+
9
+ 3
10
+ 00:00:09,870 --> 00:00:17,100
11
+ So if you remember from earlier slides, activation functions are what introduce the non-linearity
12
+
13
+ 4
14
+ 00:00:17,490 --> 00:00:20,850
15
+ that provide neural nets with incredible performance.
16
+
17
+ 5
18
+ 00:00:20,850 --> 00:00:26,940
19
+ However, as we've said before, ReLU is basically the activation unit of choice for all of our CNNs.
20
+
21
+ 6
22
+ 00:00:27,090 --> 00:00:33,860
23
+ And we basically left it at that; but, as you probably suspect, ReLU really isn't perfect.
24
+
25
+ 7
26
+ 00:00:34,020 --> 00:00:40,000
27
+ I'll tell you why: there's a problem, and it's called the dying ReLU problem.
28
+
29
+ 8
30
+ 00:00:40,090 --> 00:00:45,830
31
+ So a number of ReLU units can often die during training.
32
+
33
+ 9
34
+ 00:00:45,900 --> 00:00:47,250
35
+ And what does that mean.
36
+
37
+ 10
38
+ 00:00:47,260 --> 00:00:49,240
39
+ This happens when a large gradient flows
40
+
41
+ 11
42
+ 00:00:49,240 --> 00:00:55,210
43
+ through that neuron, which causes the weights to update in such a way that the unit never activates
44
+
45
+ 12
46
+ 00:00:55,230 --> 00:00:55,790
47
+ anymore.
48
+
49
+ 13
50
+ 00:00:55,990 --> 00:01:01,220
51
+ So regardless of future input fed to it, that ReLU is basically always going to be switched off.
52
+
53
+ 14
54
+ 00:01:01,240 --> 00:01:04,360
55
+ So we're going to have a part of the network that's effectively dead.
56
+
57
+ 15
58
+ 00:01:04,750 --> 00:01:10,620
59
+ So what happens is that exactly as it says here the output of this unit is always going to be zero.
60
+
61
+ 16
62
+ 00:01:10,810 --> 00:01:17,000
63
+ So now we have basically wasted units and a wasted number of connections in the network.
64
+
65
+ 17
66
+ 00:01:17,020 --> 00:01:22,760
67
+ And apparently, sometimes as much as 40 percent of the network can be dead because of these dying ReLU
68
+
69
+ 18
70
+ 00:01:23,260 --> 00:01:24,020
71
+ units.
72
+
73
+ 19
74
+ 00:01:25,850 --> 00:01:28,250
75
+ So how do we fix the dying ReLU problem?
76
+
77
+ 20
78
+ 00:01:28,640 --> 00:01:35,240
79
+ OK, so firstly, let's take a look at the different types of ReLU functions we're about to discuss. You know
80
+
81
+ 21
82
+ 00:01:35,240 --> 00:01:40,130
83
+ ReLU: it is a standard activation function, basically clamping at zero.
84
+
85
+ 22
86
+ 00:01:40,190 --> 00:01:41,010
87
+ Right.
88
+
89
+ 23
90
+ 00:01:41,060 --> 00:01:43,380
91
+ Everything over zero is allowed to pass.
92
+
93
+ 24
94
+ 00:01:43,400 --> 00:01:46,250
95
+ Everything negative is basically clamped at zero.
96
+
97
+ 25
98
+ 00:01:46,610 --> 00:01:53,120
99
+ However, what does Leaky ReLU do? It has a small negative slope, the green one here, and basically it
100
+
101
+ 26
102
+ 00:01:53,120 --> 00:01:54,430
103
+ is a linear function.
104
+
105
+ 27
106
+ 00:01:54,590 --> 00:02:00,830
107
+ However, there is a parameter in front of it that basically allows it to not grow that much.
108
+
109
+ 28
110
+ 00:02:00,830 --> 00:02:06,950
111
+ So it's going to have basically a factor that limits how big, how negative
112
+
113
+ 29
114
+ 00:02:06,950 --> 00:02:08,910
115
+ it can go.
116
+
117
+ 30
118
+ 00:02:08,960 --> 00:02:12,350
119
+ What about, you know, PReLUs? That stands for parametric ReLU.
120
+
121
+ 31
122
+ 00:02:12,950 --> 00:02:16,880
123
+ And basically it is very similar to Leaky ReLU.
124
+
125
+ 32
126
+ 00:02:17,000 --> 00:02:23,810
127
+ However, parametric ReLU basically has a learnable parameter that
128
+
129
+ 33
130
+ 00:02:23,810 --> 00:02:31,820
131
+ can control the steepness of the slope. And ELU, which is the exponential linear unit: basically the negative portion
132
+
133
+ 34
134
+ 00:02:31,820 --> 00:02:35,830
135
+ of it follows an exponential curve with these parameters here.
136
+
137
+ 35
138
+ 00:02:36,250 --> 00:02:37,050
139
+ OK.
140
+
141
+ 36
142
+ 00:02:38,270 --> 00:02:42,660
143
+ So it tends to be a good mix of the good parts of ReLU and Leaky ReLU.
144
+
145
+ 37
146
+ 00:02:42,860 --> 00:02:46,660
147
+ However it can saturate on large negative values as you can see here.
148
+
149
+ 38
150
+ 00:02:47,050 --> 00:02:47,460
151
+ OK.
152
+
153
+ 39
154
+ 00:02:51,000 --> 00:02:56,120
155
+ So there are some other exotic, and I say exotic because they're not commonly used,
156
+ 
157
+ 40
158
+ 00:02:56,160 --> 00:03:03,600
159
+ activation functions. There's CReLU, which concatenates the outputs of two ReLU functions, one
160
+
161
+ 41
162
+ 00:03:03,600 --> 00:03:05,240
163
+ positive and one negative.
164
+
165
+ 42
166
+ 00:03:05,640 --> 00:03:07,050
167
+ Doubling the output depth.
168
+
169
+ 43
170
+ 00:03:07,050 --> 00:03:12,920
171
+ Not entirely sure when you would use this, but there's a decent paper here you can read about it. There is also ReLU
172
+
173
+ 44
174
+ 00:03:12,920 --> 00:03:13,600
175
+ 6.
176
+
177
+ 45
178
+ 00:03:13,680 --> 00:03:17,310
179
+ So basically, ReLU6 is just ReLU capped at six.
180
+
181
+ 46
182
+ 00:03:17,360 --> 00:03:22,350
183
+ There is no special reason for selecting six, other than that it worked best for the CIFAR-10 dataset according
184
+
185
+ 47
186
+ 00:03:22,350 --> 00:03:23,490
187
+ to this paper here.
188
+
189
+ 48
190
+ 00:03:23,880 --> 00:03:26,730
191
+ And there are many others as well, such as Maxout, et cetera.
192
+
193
+ 49
194
+ 00:03:26,760 --> 00:03:27,820
195
+ Do you get the idea?
196
+
197
+ 50
198
+ 00:03:28,050 --> 00:03:34,700
199
+ However, in practice I would suggest you use Leaky ReLU or ReLU.
200
+
201
+ 51
202
+ 00:03:34,880 --> 00:03:37,130
203
+ So when do you use something other than a ReLU?
204
+
205
+ 52
206
+ 00:03:37,370 --> 00:03:42,370
207
+ So we've just seen some of the variations here, but there is no hard and fast rule in deep learning.
208
+
209
+ 53
210
+ 00:03:42,530 --> 00:03:46,340
211
+ That's why some people call deep learning more of an art than a science.
212
+
213
+ 54
214
+ 00:03:46,430 --> 00:03:49,770
215
+ That's kind of true because of the number of factors.
216
+
217
+ 55
218
+ 00:03:49,800 --> 00:03:54,210
219
+ And basically there are so many interchangeable parts and dependencies.
220
+
221
+ 56
222
+ 00:03:54,380 --> 00:03:55,670
223
+ How does your data look.
224
+
225
+ 57
226
+ 00:03:55,700 --> 00:03:57,670
227
+ What is it going to be used on.
228
+
229
+ 58
230
+ 00:03:57,720 --> 00:03:59,840
231
+ There are a lot of things to configure in a network.
232
+
233
+ 59
234
+ 00:04:00,050 --> 00:04:01,880
235
+ That's why a lot of people say it's an art.
236
+
237
+ 60
238
+ 00:04:01,880 --> 00:04:07,270
239
+ However there are some general rules you can use that would generally get you good results.
240
+
241
+ 61
242
+ 00:04:07,390 --> 00:04:08,130
243
+ OK.
244
+
245
+ 62
246
+ 00:04:08,270 --> 00:04:15,800
247
+ So generally, a good rule of thumb is to always use ReLU first, and then adjust your learning rates
248
+
249
+ 63
250
+ 00:04:15,800 --> 00:04:19,300
251
+ to get the best accuracy you can with your CNN, at least.
252
+
253
+ 64
254
+ 00:04:19,400 --> 00:04:19,880
255
+ OK.
256
+
257
+ 65
258
+ 00:04:20,300 --> 00:04:24,790
259
+ Once that's done, then you can start experimenting with different ReLU functions.
260
+
261
+ 66
262
+ 00:04:24,920 --> 00:04:30,930
263
+ You can go from Leaky ReLU to ELU as well; it's a nice progressive step you can take.
264
+
265
+ 67
266
+ 00:04:31,000 --> 00:04:35,200
267
+ However, in most cases you can skip Leaky ReLU and go straight to ELU.
268
+
269
+ 68
270
+ 00:04:35,670 --> 00:04:40,340
271
+ I, as well as many others, find I get better results with ELU compared to ReLU.
272
+
273
+ 69
274
+ 00:04:40,550 --> 00:04:46,800
275
+ So if you just want to get the best network and best accuracy possible, use ELU.
276
+
277
+ 70
278
+ 00:04:46,910 --> 00:04:47,340
279
+ OK.
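The variants covered in this lecture differ only in how they treat negative inputs. As a plain-Python sketch of those formulas (for illustration only; in Keras you would use the built-in `LeakyReLU`, `PReLU`, and `ELU` layers rather than hand-rolling them):

```python
import math

def relu(x):
    # standard ReLU: clamp everything below zero to zero
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small fixed negative slope instead of a hard zero
    return x if x > 0 else alpha * x

def prelu(x, a):
    # same shape as Leaky ReLU, but 'a' is a parameter learned during training
    return x if x > 0 else a * x

def elu(x, alpha=1.0):
    # exponential curve on the negative side; saturates toward -alpha
    return x if x > 0 else alpha * (math.exp(x) - 1.0)
```

A dead ReLU outputs zero for every negative input, while Leaky ReLU, PReLU, and ELU all keep a small non-zero response there, which is what lets gradients keep flowing.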
17. Advanced Activation Functions & Initializations/3. Advanced Initializations.srt ADDED
@@ -0,0 +1,151 @@
1
+ 1
2
+ 00:00:00,510 --> 00:00:00,830
3
+ OK.
4
+
5
+ 2
6
+ 00:00:00,840 --> 00:00:07,580
7
+ So this brings us to Section seventeen point two, where we talk about advanced initializations.
8
+
9
+ 3
10
+ 00:00:07,700 --> 00:00:13,250
11
+ So something we sort of glossed over in this course is that we always said starting weights are random,
12
+
13
+ 4
14
+ 00:00:13,400 --> 00:00:16,440
15
+ or we initialize all the weights randomly. But are they truly random?
16
+
17
+ 5
18
+ 00:00:16,490 --> 00:00:19,170
19
+ And by that I mean, are they pulled from a uniform distribution?
20
+
21
+ 6
22
+ 00:00:19,460 --> 00:00:21,240
23
+ Well kind of yes.
24
+
25
+ 7
26
+ 00:00:21,260 --> 00:00:23,040
27
+ That is the default.
28
+
29
+ 8
30
+ 00:00:23,060 --> 00:00:26,970
31
+ However, there are a number of other initializations that Keras offers.
32
+
33
+ 9
34
+ 00:00:27,160 --> 00:00:28,420
35
+ So let's take a look.
36
+
37
+ 10
38
+ 00:00:28,550 --> 00:00:33,000
39
+ These are the many initialization functions Keras can provide us with.
40
+
41
+ 11
42
+ 00:00:33,230 --> 00:00:34,780
43
+ And let's take a look at how they look.
44
+
45
+ 12
46
+ 00:00:34,790 --> 00:00:35,490
47
+ All right.
48
+
49
+ 13
50
+ 00:00:35,600 --> 00:00:43,720
51
+ So this is what a uniform distribution would look like, then random normal, random uniform, LeCun,
52
+
53
+ 14
54
+ 00:00:43,940 --> 00:00:48,270
55
+ Glorot normal, orthogonal, identity, Glorot uniform.
56
+
57
+ 15
58
+ 00:00:48,410 --> 00:00:54,710
59
+ So you can see that there are some definite differences in these functions. You can do a normal distribution
60
+
61
+ 16
62
+ 00:00:54,710 --> 00:00:56,380
63
+ style, where it's more centered.
64
+ 
65
+ 17
66
+ 00:00:56,420 --> 00:01:01,160
67
+ The bulk of the initializations would be centered around this area, 0. You can do some which just randomly
68
+
69
+ 18
70
+ 00:01:01,160 --> 00:01:05,720
71
+ distribute between minus point zero one to point zero one, that sort of thing.
72
+
73
+ 19
74
+ 00:01:05,860 --> 00:01:06,650
75
+ OK.
76
+
77
+ 20
78
+ 00:01:07,970 --> 00:01:10,820
79
+ So we have tons of initializers.
80
+
81
+ 21
82
+ 00:01:10,820 --> 00:01:12,260
83
+ Which one do we use.
84
+
85
+ 22
86
+ 00:01:12,260 --> 00:01:17,690
87
+ So generally we always want to use a zero-centered initialization within a small range, for example minus
88
+
89
+ 23
90
+ 00:01:17,690 --> 00:01:18,340
91
+ 1 to 1.
92
+
93
+ 24
94
+ 00:01:18,350 --> 00:01:19,490
95
+ Typically best.
96
+
97
+ 25
98
+ 00:01:19,730 --> 00:01:26,330
99
+ And this was recommended in Stanford's CS231n computer vision course, which is a course
100
+
101
+ 26
102
+ 00:01:26,330 --> 00:01:31,460
103
+ I highly recommend you take. It's pretty theoretical, but it goes into a lot of detail, especially on the
104
+
105
+ 27
106
+ 00:01:31,460 --> 00:01:35,170
107
+ training part of CNNs and neural networks as well.
108
+
109
+ 28
110
+ 00:01:35,480 --> 00:01:41,810
111
+ So some other good choices for initializers: he_normal works pretty well when you're using ReLU activations,
112
+
113
+ 29
114
+ 00:01:42,470 --> 00:01:44,630
115
+ glorot_normal works pretty well too.
116
+
117
+ 30
118
+ 00:01:44,840 --> 00:01:51,360
119
+ And glorot_uniform, which is Keras's default random initializer. If you go back,
120
+
121
+ 31
122
+ 00:01:51,380 --> 00:01:52,560
123
+ That is what it is here.
124
+
125
+ 32
126
+ 00:01:54,610 --> 00:02:01,290
127
+ So most times, just so you know, the initializer you choose doesn't really impact the accuracy
128
+
129
+ 33
130
+ 00:02:01,290 --> 00:02:02,530
131
+ you get in the end.
132
+
133
+ 34
134
+ 00:02:02,670 --> 00:02:06,620
135
+ However, it can definitely impact the number of epochs we take to get there.
136
+
137
+ 35
138
+ 00:02:06,900 --> 00:02:14,070
139
+ So it is something you can experiment with, maybe after the fact, if you think for some reason,
140
+
141
+ 36
142
+ 00:02:14,070 --> 00:02:18,650
143
+ or you read in a research paper, that if you're using maybe an Inception-style model or a ResNet-
144
+
145
+ 37
146
+ 00:02:18,660 --> 00:02:23,940
147
+ 50 model, it works better with this type of initialization, then you change it.
148
+
149
+ 38
150
+ 00:02:23,940 --> 00:02:26,360
151
+ Otherwise, I would just stick with the Keras default.
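As a sketch of what two of the recommended initializers draw their weights from (formulas per the He and Glorot schemes; note that Keras's he_normal actually uses a truncated normal, which this simplified version skips — in practice you would just pass `kernel_initializer='he_normal'` or `'glorot_uniform'` to a layer):

```python
import math
import random

def he_normal_sample(fan_in):
    # He et al.: zero-centered normal with stddev sqrt(2 / fan_in),
    # a good default when using ReLU-family activations
    return random.gauss(0.0, math.sqrt(2.0 / fan_in))

def glorot_uniform_sample(fan_in, fan_out):
    # Glorot & Bengio: uniform in [-limit, limit] with
    # limit = sqrt(6 / (fan_in + fan_out)); the Keras default
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return random.uniform(-limit, limit)
```

Both are zero-centered with a small range, which is exactly the rule of thumb given above; the difference is only in how the spread scales with the layer's fan-in and fan-out.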
18 . Deep Survaliance - Build a Face Detector with Emotion, Age and Gender Recognition/18.2 Building an Emotion Detector with LittleVGG.ipynb ADDED
@@ -0,0 +1,723 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# Using LittleVGG for Emotion Detection"
8
+ ]
9
+ },
10
+ {
11
+ "cell_type": "markdown",
12
+ "metadata": {},
13
+ "source": [
14
+ "### Training Emotion Detector"
15
+ ]
16
+ },
17
+ {
18
+ "cell_type": "code",
19
+ "execution_count": 8,
20
+ "metadata": {},
21
+ "outputs": [
22
+ {
23
+ "name": "stdout",
24
+ "output_type": "stream",
25
+ "text": [
26
+ "Found 28709 images belonging to 7 classes.\n",
27
+ "Found 3589 images belonging to 7 classes.\n"
28
+ ]
29
+ }
30
+ ],
31
+ "source": [
32
+ "from __future__ import print_function\n",
33
+ "import tensorflow as tf\n",
34
+ "from tensorflow.keras.preprocessing.image import ImageDataGenerator\n",
35
+ "from tensorflow.keras.models import Sequential\n",
36
+ "from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization\n",
37
+ "from tensorflow.keras.layers import Conv2D, MaxPooling2D\n",
38
+ "from tensorflow.keras.preprocessing.image import ImageDataGenerator\n",
39
+ "import os\n",
40
+ "\n",
41
+ "num_classes = 7\n",
42
+ "img_rows, img_cols = 48, 48\n",
43
+ "batch_size = 16\n",
44
+ "\n",
45
+ "train_data_dir = './fer2013/train'\n",
46
+ "validation_data_dir = './fer2013/validation'\n",
47
+ "\n",
48
+ "# Let's use some data augmentaiton \n",
49
+ "train_datagen = ImageDataGenerator(\n",
50
+ " rescale=1./255,\n",
51
+ " rotation_range=30,\n",
52
+ " shear_range=0.3,\n",
53
+ " zoom_range=0.3,\n",
54
+ " width_shift_range=0.4,\n",
55
+ " height_shift_range=0.4,\n",
56
+ " horizontal_flip=True,\n",
57
+ " fill_mode='nearest')\n",
58
+ " \n",
59
+ "validation_datagen = ImageDataGenerator(rescale=1./255)\n",
60
+ " \n",
61
+ "train_generator = train_datagen.flow_from_directory(\n",
62
+ " train_data_dir,\n",
63
+ " color_mode = 'grayscale',\n",
64
+ " target_size=(img_rows, img_cols),\n",
65
+ " batch_size=batch_size,\n",
66
+ " class_mode='categorical',\n",
67
+ " shuffle=True)\n",
68
+ " \n",
69
+ "validation_generator = validation_datagen.flow_from_directory(\n",
70
+ " validation_data_dir,\n",
71
+ " color_mode = 'grayscale',\n",
72
+ " target_size=(img_rows, img_cols),\n",
73
+ " batch_size=batch_size,\n",
74
+ " class_mode='categorical',\n",
75
+ " shuffle=True)"
76
+ ]
77
+ },
78
+ {
79
+ "cell_type": "markdown",
80
+ "metadata": {},
81
+ "source": [
82
+ "## Our Keras Imports"
83
+ ]
84
+ },
85
+ {
86
+ "cell_type": "code",
87
+ "execution_count": 9,
88
+ "metadata": {},
89
+ "outputs": [],
90
+ "source": [
91
+ "from tensorflow.keras.models import Sequential\n",
92
+ "from tensorflow.keras.layers import BatchNormalization\n",
93
+ "from tensorflow.keras.layers import Conv2D, MaxPooling2D\n",
94
+ "from tensorflow.keras.layers import ELU\n",
95
+ "from tensorflow.keras.layers import Activation, Flatten, Dropout, Dense"
96
+ ]
97
+ },
98
+ {
99
+ "cell_type": "markdown",
100
+ "metadata": {},
101
+ "source": [
102
+ "## Keras LittleVGG Model"
103
+ ]
104
+ },
105
+ {
106
+ "cell_type": "code",
107
+ "execution_count": 10,
108
+ "metadata": {},
109
+ "outputs": [
110
+ {
111
+ "name": "stdout",
112
+ "output_type": "stream",
113
+ "text": [
114
+ "Model: \"sequential_1\"\n",
115
+ "_________________________________________________________________\n",
116
+ "Layer (type) Output Shape Param # \n",
117
+ "=================================================================\n",
118
+ "conv2d_8 (Conv2D) (None, 48, 48, 32) 320 \n",
119
+ "_________________________________________________________________\n",
120
+ "activation_11 (Activation) (None, 48, 48, 32) 0 \n",
121
+ "_________________________________________________________________\n",
122
+ "batch_normalization_10 (Batc (None, 48, 48, 32) 128 \n",
123
+ "_________________________________________________________________\n",
124
+ "conv2d_9 (Conv2D) (None, 48, 48, 32) 9248 \n",
125
+ "_________________________________________________________________\n",
126
+ "activation_12 (Activation) (None, 48, 48, 32) 0 \n",
127
+ "_________________________________________________________________\n",
128
+ "batch_normalization_11 (Batc (None, 48, 48, 32) 128 \n",
129
+ "_________________________________________________________________\n",
130
+ "max_pooling2d_4 (MaxPooling2 (None, 24, 24, 32) 0 \n",
131
+ "_________________________________________________________________\n",
132
+ "dropout_6 (Dropout) (None, 24, 24, 32) 0 \n",
133
+ "_________________________________________________________________\n",
134
+ "conv2d_10 (Conv2D) (None, 24, 24, 64) 18496 \n",
135
+ "_________________________________________________________________\n",
136
+ "activation_13 (Activation) (None, 24, 24, 64) 0 \n",
137
+ "_________________________________________________________________\n",
138
+ "batch_normalization_12 (Batc (None, 24, 24, 64) 256 \n",
139
+ "_________________________________________________________________\n",
140
+ "conv2d_11 (Conv2D) (None, 24, 24, 64) 36928 \n",
141
+ "_________________________________________________________________\n",
142
+ "activation_14 (Activation) (None, 24, 24, 64) 0 \n",
143
+ "_________________________________________________________________\n",
144
+ "batch_normalization_13 (Batc (None, 24, 24, 64) 256 \n",
145
+ "_________________________________________________________________\n",
146
+ "max_pooling2d_5 (MaxPooling2 (None, 12, 12, 64) 0 \n",
147
+ "_________________________________________________________________\n",
148
+ "dropout_7 (Dropout) (None, 12, 12, 64) 0 \n",
149
+ "_________________________________________________________________\n",
150
+ "conv2d_12 (Conv2D) (None, 12, 12, 128) 73856 \n",
151
+ "_________________________________________________________________\n",
152
+ "activation_15 (Activation) (None, 12, 12, 128) 0 \n",
153
+ "_________________________________________________________________\n",
154
+ "batch_normalization_14 (Batc (None, 12, 12, 128) 512 \n",
155
+ "_________________________________________________________________\n",
156
+ "conv2d_13 (Conv2D) (None, 12, 12, 128) 147584 \n",
157
+ "_________________________________________________________________\n",
158
+ "activation_16 (Activation) (None, 12, 12, 128) 0 \n",
159
+ "_________________________________________________________________\n",
160
+ "batch_normalization_15 (Batc (None, 12, 12, 128) 512 \n",
161
+ "_________________________________________________________________\n",
162
+ "max_pooling2d_6 (MaxPooling2 (None, 6, 6, 128) 0 \n",
163
+ "_________________________________________________________________\n",
164
+ "dropout_8 (Dropout) (None, 6, 6, 128) 0 \n",
165
+ "_________________________________________________________________\n",
166
+ "conv2d_14 (Conv2D) (None, 6, 6, 256) 295168 \n",
167
+ "_________________________________________________________________\n",
168
+ "activation_17 (Activation) (None, 6, 6, 256) 0 \n",
169
+ "_________________________________________________________________\n",
170
+ "batch_normalization_16 (Batc (None, 6, 6, 256) 1024 \n",
171
+ "_________________________________________________________________\n",
172
+ "conv2d_15 (Conv2D) (None, 6, 6, 256) 590080 \n",
173
+ "_________________________________________________________________\n",
174
+ "activation_18 (Activation) (None, 6, 6, 256) 0 \n",
175
+ "_________________________________________________________________\n",
176
+ "batch_normalization_17 (Batc (None, 6, 6, 256) 1024 \n",
177
+ "_________________________________________________________________\n",
178
+ "max_pooling2d_7 (MaxPooling2 (None, 3, 3, 256) 0 \n",
179
+ "_________________________________________________________________\n",
180
+ "dropout_9 (Dropout) (None, 3, 3, 256) 0 \n",
181
+ "_________________________________________________________________\n",
182
+ "flatten_1 (Flatten) (None, 2304) 0 \n",
183
+ "_________________________________________________________________\n",
184
+ "dense_3 (Dense) (None, 64) 147520 \n",
185
+ "_________________________________________________________________\n",
186
+ "activation_19 (Activation) (None, 64) 0 \n",
187
+ "_________________________________________________________________\n",
188
+ "batch_normalization_18 (Batc (None, 64) 256 \n",
189
+ "_________________________________________________________________\n",
190
+ "dropout_10 (Dropout) (None, 64) 0 \n",
191
+ "_________________________________________________________________\n",
192
+ "dense_4 (Dense) (None, 64) 4160 \n",
193
+ "_________________________________________________________________\n",
194
+ "activation_20 (Activation) (None, 64) 0 \n",
195
+ "_________________________________________________________________\n",
196
+ "batch_normalization_19 (Batc (None, 64) 256 \n",
197
+ "_________________________________________________________________\n",
198
+ "dropout_11 (Dropout) (None, 64) 0 \n",
199
+ "_________________________________________________________________\n",
200
+ "dense_5 (Dense) (None, 7) 455 \n",
201
+ "_________________________________________________________________\n",
202
+ "activation_21 (Activation) (None, 7) 0 \n",
203
+ "=================================================================\n",
204
+ "Total params: 1,328,167\n",
205
+ "Trainable params: 1,325,991\n",
206
+ "Non-trainable params: 2,176\n",
207
+ "_________________________________________________________________\n",
208
+ "None\n"
209
+ ]
210
+ }
211
+ ],
212
+ "source": [
213
+ "model = Sequential()\n",
214
+ "\n",
215
+ "model.add(Conv2D(32, (3, 3), padding = 'same', kernel_initializer=\"he_normal\",\n",
216
+ " input_shape = (img_rows, img_cols, 1)))\n",
217
+ "model.add(Activation('elu'))\n",
218
+ "model.add(BatchNormalization())\n",
219
+ "model.add(Conv2D(32, (3, 3), padding = \"same\", kernel_initializer=\"he_normal\", \n",
220
+ " input_shape = (img_rows, img_cols, 1)))\n",
221
+ "model.add(Activation('elu'))\n",
222
+ "model.add(BatchNormalization())\n",
223
+ "model.add(MaxPooling2D(pool_size=(2, 2)))\n",
224
+ "model.add(Dropout(0.2))\n",
225
+ "\n",
226
+ "# Block #2: second CONV => RELU => CONV => RELU => POOL\n",
227
+ "# layer set\n",
228
+ "model.add(Conv2D(64, (3, 3), padding=\"same\", kernel_initializer=\"he_normal\"))\n",
229
+ "model.add(Activation('elu'))\n",
230
+ "model.add(BatchNormalization())\n",
231
+ "model.add(Conv2D(64, (3, 3), padding=\"same\", kernel_initializer=\"he_normal\"))\n",
232
+ "model.add(Activation('elu'))\n",
233
+ "model.add(BatchNormalization())\n",
234
+ "model.add(MaxPooling2D(pool_size=(2, 2)))\n",
235
+ "model.add(Dropout(0.2))\n",
236
+ "\n",
237
+ "# Block #3: third CONV => RELU => CONV => RELU => POOL\n",
238
+ "# layer set\n",
239
+ "model.add(Conv2D(128, (3, 3), padding=\"same\", kernel_initializer=\"he_normal\"))\n",
240
+ "model.add(Activation('elu'))\n",
241
+ "model.add(BatchNormalization())\n",
242
+ "model.add(Conv2D(128, (3, 3), padding=\"same\", kernel_initializer=\"he_normal\"))\n",
243
+ "model.add(Activation('elu'))\n",
244
+ "model.add(BatchNormalization())\n",
245
+ "model.add(MaxPooling2D(pool_size=(2, 2)))\n",
246
+ "model.add(Dropout(0.2))\n",
247
+ "\n",
248
+ "# Block #4: third CONV => RELU => CONV => RELU => POOL\n",
249
+ "# layer set\n",
250
+ "model.add(Conv2D(256, (3, 3), padding=\"same\", kernel_initializer=\"he_normal\"))\n",
251
+ "model.add(Activation('elu'))\n",
252
+ "model.add(BatchNormalization())\n",
253
+ "model.add(Conv2D(256, (3, 3), padding=\"same\", kernel_initializer=\"he_normal\"))\n",
254
+ "model.add(Activation('elu'))\n",
255
+ "model.add(BatchNormalization())\n",
256
+ "model.add(MaxPooling2D(pool_size=(2, 2)))\n",
257
+ "model.add(Dropout(0.2))\n",
258
+ "\n",
259
+ "# Block #5: first set of FC => RELU layers\n",
260
+ "model.add(Flatten())\n",
261
+ "model.add(Dense(64, kernel_initializer=\"he_normal\"))\n",
262
+ "model.add(Activation('elu'))\n",
263
+ "model.add(BatchNormalization())\n",
264
+ "model.add(Dropout(0.5))\n",
265
+ "\n",
266
+ "# Block #6: second set of FC => RELU layers\n",
267
+ "model.add(Dense(64, kernel_initializer=\"he_normal\"))\n",
268
+ "model.add(Activation('elu'))\n",
269
+ "model.add(BatchNormalization())\n",
270
+ "model.add(Dropout(0.5))\n",
271
+ "\n",
272
+ "# Block #7: softmax classifier\n",
273
+ "model.add(Dense(num_classes, kernel_initializer=\"he_normal\"))\n",
274
+ "model.add(Activation(\"softmax\"))\n",
275
+ "\n",
276
+ "print(model.summary())"
277
+ ]
278
+ },
279
+ {
280
+ "cell_type": "markdown",
281
+ "metadata": {},
282
+ "source": [
283
+ "## Training our model"
284
+ ]
285
+ },
286
+ {
287
+ "cell_type": "code",
288
+ "execution_count": 12,
289
+ "metadata": {},
290
+ "outputs": [
291
+ {
292
+ "name": "stdout",
293
+ "output_type": "stream",
294
+ "text": [
295
+ "WARNING:tensorflow:sample_weight modes were coerced from\n",
296
+ " ...\n",
297
+ " to \n",
298
+ " ['...']\n",
299
+ "Train for 1795 steps\n",
300
+ "1795/1795 [==============================] - 607s 338ms/step - loss: 2.0255 - accuracy: 0.2012\n"
301
+ ]
302
+ }
303
+ ],
304
+ "source": [
305
+ "from tensorflow.keras.optimizers import RMSprop, SGD, Adam\n",
306
+ "from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau\n",
307
+ "\n",
308
+ " \n",
309
+ "checkpoint = ModelCheckpoint(\"emotion_little_vgg.h5\",\n",
310
+ " monitor=\"val_loss\",\n",
311
+ " mode=\"min\",\n",
312
+ " save_best_only = True,\n",
313
+ " verbose=1)\n",
314
+ "\n",
315
+ "earlystop = EarlyStopping(monitor = 'val_loss', \n",
316
+ " min_delta = 0, \n",
317
+ " patience = 3,\n",
318
+ " verbose = 1,\n",
319
+ " restore_best_weights = True)\n",
320
+ "\n",
321
+ "reduce_lr = ReduceLROnPlateau(monitor = 'val_loss', factor = 0.2, patience = 3, verbose = 1, min_delta = 0.0001)\n",
322
+ "\n",
323
+ "# we put our call backs into a callback list\n",
324
+ "callbacks = [earlystop, checkpoint] #reduce_lr]\n",
325
+ "\n",
326
+ "# We use a very small learning rate \n",
327
+ "model.compile(loss = 'categorical_crossentropy',\n",
328
+ " optimizer = Adam(lr=0.001),\n",
329
+ " metrics = ['accuracy'])\n",
330
+ "\n",
331
+ "nb_train_samples = 28273\n",
332
+ "nb_validation_samples = 3534\n",
333
+ "epochs = 5\n",
334
+ "\n",
335
+ "history = model.fit(\n",
336
+ " train_generator,\n",
337
+ " epochs = epochs)\n",
338
+ " callbacks = callbacks)"
339
+ ]
340
+ },
341
+ {
342
+ "cell_type": "code",
343
+ "execution_count": 15,
344
+ "metadata": {},
345
+ "outputs": [
346
+ {
347
+ "name": "stdout",
348
+ "output_type": "stream",
349
+ "text": [
350
+ "Found 3589 images belonging to 7 classes.\n",
351
+ "Confusion Matrix\n",
352
+ "[[ 0 0 0 439 0 52 0]\n",
353
+ " [ 0 0 0 52 0 3 0]\n",
354
+ " [ 0 0 0 486 0 42 0]\n",
355
+ " [ 0 0 0 790 0 89 0]\n",
356
+ " [ 0 0 0 565 0 61 0]\n",
357
+ " [ 0 0 0 496 0 98 0]\n",
358
+ " [ 0 0 0 401 0 15 0]]\n",
359
+ "Classification Report\n",
360
+ " precision recall f1-score support\n",
361
+ "\n",
362
+ " Angry 0.00 0.00 0.00 491\n",
363
+ " Disgust 0.00 0.00 0.00 55\n",
364
+ " Fear 0.00 0.00 0.00 528\n",
365
+ " Happy 0.24 0.90 0.38 879\n",
366
+ " Neutral 0.00 0.00 0.00 626\n",
367
+ " Sad 0.27 0.16 0.21 594\n",
368
+ " Surprise 0.00 0.00 0.00 416\n",
369
+ "\n",
370
+ " accuracy 0.25 3589\n",
371
+ " macro avg 0.07 0.15 0.08 3589\n",
372
+ "weighted avg 0.10 0.25 0.13 3589\n",
373
+ "\n"
374
+ ]
375
+ },
376
+ {
377
+ "name": "stderr",
378
+ "output_type": "stream",
379
+ "text": [
380
+ "C:\\ProgramData\\Anaconda3\\envs\\cv\\lib\\site-packages\\sklearn\\metrics\\classification.py:1437: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.\n",
381
+ " 'precision', 'predicted', average, warn_for)\n"
382
+ ]
383
+ },
384
+ {
385
+ "data": {
386
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAekAAAHHCAYAAACbaKDRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3debglVXnv8e8PZFAREBCCgOKAUxwQUCESgxoTp4hjnHJFQtLG2Rhzg8NVork3Js6axKQjKniNI0FwuAoiOJAANtiCiCIiSgcCNggOINrd7/2j6tDb9gzd5+yza1fx/TxPPadqVe3aa/dw3v2uWkOqCkmSNH226roCkiRpdgZpSZKmlEFakqQpZZCWJGlKGaQlSZpSBmlJkqbUrbqugCRJS/X7D79tXXPt+rHf99zzb/pcVT167DfeTAZpSVLvXXPtes753J3Gft+t9/zObmO/6RYwSEuSeq+ADWzouhpj5zNpSZKmlJm0JGkAivVlJi1JkibETFqS1HvNM+nhLRhlkJYkDYIdxyRJ0sSYSUuSeq8o1tfwmrvNpCVJmlJm0pKkQbDjmCRJU6iA9QMM0jZ3S5I0pcykJUmDMMTmbjNpSZKmlJm0JKn3CgY5BMsgLUkahOHNN2ZztyRJU8tMWpLUe0U5BEuSJE2OmbQkqf8K1g8vkTaTliRpWplJS5J6rxhm726DtCRpAMJ60nUlxs7mbkmSppSZtCSp9wrYYMcxSZI0KWbSkqRBGOIzaYO0JKn3imEGaZu7JUmaUmbSkqRB2FBm0pIkaULMpCVJvTfUZ9IGaUlS7xVh/QAbh4f3iSRJGggzaUnSINhxTJIkTYyZtCSp9+w4JknS1Arra3iNw8P7RJIkDYSZtCSp9wrYMMC8c3ifSJKkgTCTliQNwhA7jplJS5I0pcykJUm9V2XvbkmSptYGMvZtIUnumWT1yPbjJC9LskuSU5N8p/15+/b6JHlnkkuSnJ/kgPnub5CWJGmRqurbVbV/Ve0PHAjcAJwIHA2cVlX7Aae1xwCPAfZrtxXAu+e7v0FaktR7zYxjW41920KPBL5bVd8HDgeOa8uPA57Y7h8OHF+Ns4Cdk+w51w0N0pIkzW23JKtGthXzXPsM4EPt/h5VdSVA+3P3tnwv4PKR16xpy2ZlxzFJ0gAsW8extVV10ILvnmwLPAF45UKXzlJWc11skJYk9d4UzDj2GOC8qrqqPb4qyZ5VdWXbnH11W74G2GfkdXsDV8x1U5u7JUlaumeysakb4GTgiHb/COCkkfLntL28Dwaun2kWn42ZtCRpENZXNzOOJbkN8CjgeSPFbwQ+muQo4AfA09ryzwCPBS6h6Ql+5Hz3NkhLkrQEVXUDsOsmZdfQ9Pbe9NoCXri59zZIS5J6r8hihkxNPYO0JGkQNjgtqCRJmhQzaUlS783MODY0w/tEkiQNhJm0JKn3inQ2BGs5LUsmneRJSSrJvZbj/pIk3RIsV3P3M4Gv0Ew2vmRJzPglSfPawFZj37o29uCXZAfgocDDaaY/OybJYcAxwFrgvsC5wB9VVSV5LPDW9tx5wF2r6vFJjgHuCOwLrE2yD/Diqlrdvs+ZwPOr6vxxfwZJUr9UsVwLbHRqOTLUJwKfraqLk1yb5IC2/IHAb9JMJH4m8NAkq4B/AR5WVd9L8qFN7nUgcGhV3ZjkCOC5wMuS3APYbq4A3S4ltgJga7Y+8DbsOOaPKG10051v03UVxma779/QdRXGKrcaTiNcrVvXdRXG4uf8jF/UTcN7eLxMluNf8DOBt7f7H26PPw2cU1VrAJKspsmQfwpcWlXfa6//EG1wbZ1cVTe2+x8D/leSvwT+GHj/XBWoqpXASoAds0s9JL82M5s0Nhe/dsFV7HrjHn+yqusqjNXWu+2+8EU9sf6qqxe+qAfOrtOW6c5hw6yrQPbbWIN0kl2BRwD3TVLA1jTD1z4D3DRy6fr2vRf6E/3Z
zE5V3ZDkVOBw4A+B4fxmlCRpFuPOpJ8KHF9VN68EkuSLwKFzXP8t4K5J9q2qy4CnL3D/9wCfBL5cVdeOob6SpAEofCa9OZ5JszzXqBOA5wPf3fTi9lnzC4DPJlkLnDPfzavq3CQ/Bt43pvpKkgZiiDOOjTVIV9Vhs5S9E3jnJmUvGjk8varulSTAPwKr2muO2fReSe5IM2zslPHVWpKk6TQNXzv+tO1IdiGwE01v71+T5DnA2cCrq2rDBOsnSZpyRdhQ49+61vn4hKp6G/C2zbjueOD45a+RJEnTofMgLUnSOPhMWpKkKVTAhgH27h7eJ5IkaSDMpCVJAxDWD3DGMTNpSZKmlJm0JKn3fCYtSZImykxakjQIQ3wmbZCWJPVeVWzuliRJk2MmLUkahCEuVTm8TyRJ0kCYSUuSeq+ADXYckyRpGsXmbkmSNDlm0pKk3mtmHLO5W9Im7v3qH3RdhbFZ33UFxmz9VVd3XQVpSQzSkqRBWD/AJ7gGaUlS7xUZZHP38L52SJI0EGbSkqRB2DDAvHN4n0iSpIEwk5Yk9V4VrPeZtCRJmhQzaUnSIAyxd7dBWpLUe80QrOE1Dg/vE0mSNBBm0pKkQVg/wKUqzaQlSZpSBmlJUu/NrII17m1zJNk5yceTfCvJRUkOSbJLklOTfKf9efv22iR5Z5JLkpyf5ID57m2QliQNQNNxbNzbZnoH8NmquhfwAOAi4GjgtKraDzitPQZ4DLBfu60A3j3fjQ3SkiQtUpIdgYcBxwJU1S+q6jrgcOC49rLjgCe2+4cDx1fjLGDnJHvOdX+DtCRpEDaQsW+b4a7AD4H3JflakvckuS2wR1VdCdD+3L29fi/g8pHXr2nLZmWQliRpbrslWTWyrdjk/K2AA4B3V9UDgZ+xsWl7NrNF/prrYodgSZJ6bxnn7l5bVQfNc34NsKaqzm6PP04TpK9KsmdVXdk2Z189cv0+I6/fG7hirpubSUuSBqGLjmNV9d/A5Unu2RY9EvgmcDJwRFt2BHBSu38y8Jy2l/fBwPUzzeKzMZOWJGlpXgx8MMm2wKXAkTRJ8EeTHAX8AHhae+1ngMcClwA3tNfOySAtSeq9Zu7ubmYcq6rVwGxN4o+c5doCXri597a5W5KkKWUmLUkahM0cMtUrW5xJJ1mfZHWSC5N8PcnLk2zVnjsoyTvHX81fq8O+SZ613O8jSVKXFpNJ31hV+wMk2R34N2An4HVVtQpYNcb6zWVf4Fnte0uSbuFm5u4emiU9k66qq2nmHn1R2538sCSfAkjyO23GvbqdheV2SbZK8k9tFv6pJJ9J8tT2+suS7NbuH5TkjLnuA7wR+O227M+X8hkkScPQ4dzdy2bJz6Sr6tK2uXv3TU69AnhhVZ2ZZAfg58CTabLg+7XXXwS8d4G3mO0+RwOvqKrHz/aCdkaYFQDbc5tFfS5Jkro2rq8Js7UxnAm8NclLgJ2rah1wKPCxqtrQDgA/fTPuPdt95lVVK6vqoKo6aBu224KPIUnqpWVYpnIams+XHKST3BVYz8YpzwCoqjcCfwLcGjgryb2YPZjPWDdSn+0XuI8kSYO3pCCd5A7APwP/0A7QHj13t6q6oKr+jqYz2b2ArwBPaZ9N7wEcNvKSy4AD2/2nLHCfnwC3W0rdJUnDUXS2CtayWswz6VsnWQ1sQ5P9fgB46yzXvSzJw2my7G8C/w/4Jc0MLN8ALgbOBq5vr/9r4Ngkr2rL57vPBmBdkq8D76+qty3ic0iSBmQamqfHbYuDdFVtPc+5M4Az2v0Xz3ZNkldU1U+T7AqcA1zQXv9l4B6z3HPW+zDLdGuSJA1JFzOOfSrJzsC2wBvaDmSSJC3aUMdJTzxIV9Vhk35PSZL6yLm7JUmDYCYtSdIU6nKpyuXU/ZxnkiRpVmbSkqRBmIZxzeNmJi1J0pQyk5Yk9V8Ns+OYmbQkSVPKTFqS1HtOZiJJ0hQbYpC2uVuSpCllJi1J6j0nM5EkSRNlJi1JGoQaYCZt
kJYkDYIzjkmSpIkxk5Yk9V4545gkSZokM2lpiS591x5dV2Fs7vyHV3ddhbHaetdduq7C2Ky/5tquqzD17DgmSdJUcpy0JEmaIDNpSdIgDLG520xakqQpZSYtSeq9oS5VaSYtSdKUMpOWJPVfNROaDI1BWpI0CM7dLUmSJsZMWpLUe4VDsCRJ0gSZSUuSBmCY04IapCVJgzDE3t02d0uSNKXMpCVJg2DHMUmS9CuSXJbkgiSrk6xqy3ZJcmqS77Q/b9+WJ8k7k1yS5PwkB8x3b4O0JKn3qppMetzbFnh4Ve1fVQe1x0cDp1XVfsBp7THAY4D92m0F8O75bmqQliQNwobK2LclOBw4rt0/DnjiSPnx1TgL2DnJnnPdxCAtSdLSFHBKknOTrGjL9qiqKwHan7u35XsBl4+8dk1bNis7jkmSBmGZhmDtNvOcubWyqlZucs1Dq+qKJLsDpyb51jz3my09n7PmBmlJkua2duQ586yq6or259VJTgQeDFyVZM+qurJtzr66vXwNsM/Iy/cGrpjr3jZ3S5IGoYuOY0lum+R2M/vA7wHfAE4GjmgvOwI4qd0/GXhO28v7YOD6mWbx2Uw0k06yHrhgpOiJVXXZJOsgSRqeYot7Y4/LHsCJSaCJqf9WVZ9N8lXgo0mOAn4APK29/jPAY4FLgBuAI+e7+aSbu2+sqv3HdbM0fyqpqg3juqckSZurqi4FHjBL+TXAI2cpL+CFm3v/zpu7k2yd5E1JvtoO7H5eW75DktOSnNcOEj+8Ld83yUVJ/gk4j19t25ck3ULVMmxdm3Qmfeskq9v971XVk4CjaNrkH5RkO+DMJKfQdFF/UlX9OMluwFlJTm5fe0/gyKp6wWxv0naBXwGwPbdZzs8jSdKymYbm7t8D7p/kqe3xTjQzsawB/k+ShwEbaMaR7dFe8/12EPis2u7xKwF2zC7T8GVIkrScaphzd0/DEKwAL66qz/1KYfJc4A7AgVX1yySXAdu3p3820RpKktSBzp9JA58Dnp9kG4Ak92i7se8EXN0G6IcDd+6ykpKkKTfAh9LTkEm/B9gXOK/trf1DmjlOPwh8sp3pZTUw3wwukqRbOJu7l6iqdpilbAPwqnbb1CFz3Oq+46yXJEnTaBoyaUmSlmyZ5u7u1DQ8k5YkSbMwk5Yk9V7hM2lJkqZTAQMM0jZ3S5I0pcykJUmDYMcxSZI0MWbSkqRhGGAmbZCWJA1ABtm72+ZuSZKmlJm0JGkYBtjcbSYtSdKUMpOWJPVfDXPGMTNpSZKmlJm0JGkYBvhM2iAtSRoIm7slSdKEmElLkoZhgM3dZtKSJE0pM2lpib516Ae6rsLY/D77d12F8dr19l3XYHyuubbrGky/AWbSBmlJUv8V4DhpSZI0KWbSkqRBqAE2d5tJS5I0pcykJUnDMMBM2iAtSRoGO45JkqRJMZOWJA1CBtjcbSYtSdKUMpOWJPVfMciOY2bSkiRNKTNpSdIAZJC9uw3SkqRhsLlbkiRNipm0JGkYzKQlSdKkmElLkoZhgJm0QVqS1H/FIHt329wtSdKUMkhLkgYhNf5ts9432TrJ15J8qj2+S5Kzk3wnyUeSbNuWb9ceX9Ke33ehexukJUlampcCF40c/x3wtqraD/gRcFRbfhTwo6q6O/C29rp5GaQlScNQy7AtIMnewOOA97THAR4BfLy95Djgie3+4e0x7flHttfPaWxBOslPNzl+bpJ/GNf9JUmaQm8H/iewoT3eFbiuqta1x2uAvdr9vYDLAdrz17fXz8lMWpKkue2WZNXItmLmRJLHA1dX1bkj18+WGddmnJvVRIZgJfkD4DXAtsA1wLOr6qokxwB3o/l2sQ/w91X1r0kOA17fXntP4EvAC4AjgftW1Z+39/1T4N5V9fJJfA5J0vTa3I5eW2htVR00x7mHAk9I8lhge2BHmsx65yS3arPlvYEr2uvX0MS6NUluBewEXDvfm48zk751ktUzG02QnfEV4OCqeiDwYZqm
gRn3p2nPPwR4bZI7tuUPBv4CuB9NIH9y+9onJNmmveZI4H2bViTJiplvPb/kpvF9QkmSWlX1yqrau6r2BZ4BfKGqng2cDjy1vewI4KR2/+T2mPb8F6pqYpn0jVW1/8xBkucCM98+9gY+kmRPmmz6eyOvO6mqbgRuTHI6TXC+Djinqi5t7/Uh4NCq+niSLwCPT3IRsE1VXbBpRapqJbASYMfsMsA5aCRJv2Z6JjP5K+DDSf4G+BpwbFt+LPCBJJfQZNDPWOhGk5px7F3AW6vq5LYp+5iRc5sG0Vqg/D3Aq4BvMUsWLUnSpFXVGcAZ7f6lNAnnptf8HHjaltx3Uh3HdgL+q90/YpNzhyfZPsmuwGHAV9vyB7cDwrcCnk7TZE5VnU3Tpv8s4EPLXXFJUg8sx/CrKWiHnVSQPgb4WJIvA2s3OXcO8GngLOANVTXzgP0/gTcC36BpHj9x5DUfBc6sqh8tZ6UlST0ywCA9tubuqtphk+P3A+9v909i44PzTV1cVStmKb+hqp4+x2sOpZmtRZKkwerVOOkkOye5mKaT2mld10eSND26mrt7OXW6VGVVHTNH+Rm0D+A3Kb8OuMeyVkqSpCnhetKSpGGYgsx33AzSkqRhGGCQ7tUzaUmSbknMpCVJvTctHb3GzUxakqQpZSYtSRqG6Zm7e2wM0pKkYbC5W5IkTYqZtCRpEOw4JkmSJsZMWpI0DGbSkiRpUsykJUn9N9DJTAzSkqRhGGCQtrlbkqQpZSYtSRoGM2lJkjQpZtLSEj3hO4/uugpj9N9dV2Cs8rMbu66CJmiIHcfMpCVJmlIGaUmSppTN3ZKkYbC5W5IkTYqZtCSp/5xxTJKkKTbAIG1ztyRJU8pMWpI0DGbSkiRpUsykJUm9F4bZccxMWpKkKWUmLUkahgFm0gZpSVL/DXSctM3dkiRNKTNpSdIwmElLkqRJMZOWJA3DADNpg7QkaRDsOCZJkibGTFqSNAxm0pIkaVLMpCVJ/VeYSc9IUkneMnL8iiTHLPJeOyd5wSJfe1mS3RbzWknSsKTGv3Vtsc3dNwFPHlOA3BmYNUgn2XoM95ckaeySbJ/knCRfT3Jhkr9uy++S5Owk30nykSTbtuXbtceXtOf3Xeg9Fhuk1wErgT+fpdJ3SHJCkq+220Pb8mOSvGLkum+0FXwjcLckq5O8KclhSU5P8m/ABe21n0hybvuHsGKRdZYkDVktwza/m4BHVNUDgP2BRyc5GPg74G1VtR/wI+Co9vqjgB9V1d2Bt7XXzWspHcf+EXh2kp02KX9HW7kHAU8B3rPAfY4GvltV+1fVX7ZlDwZeXVX3aY//uKoOBA4CXpJk1/lumGRFklVJVv2Sm7bkM0mStFmq8dP2cJt2K+ARwMfb8uOAJ7b7h7fHtOcfmSTzvceiO45V1Y+THA+8BLhx5NTvAvcZed8dk9xuC29/TlV9b+T4JUme1O7vA+wHXDNP3VbSZPrsmF2m4KmCJGm5dfEMuX0sey5wd5rk9bvAdVW1rr1kDbBXu78XcDlAVa1Lcj2wK7B2rvsvtXf324HzgPeNlG0FHFJVo4GbJOv41cx9+3nu+7OR1x1GE/gPqaobkpyxwGslSRqX3ZKsGjle2SaCAFTVemD/JDsDJwL3nuUeM18fZsua5/1qsaRx0lV1LfBRNra3A5wCvGjmIMn+7e5lwAFt2QHAXdrynwDzZdo70bTh35DkXsDBS6mzJGmglueZ9NqqOmhkW8ksquo64AyaGLVzkpkkeG/ginZ/DU1rMO35nYBr5/tI45jM5C3AaC/vlwAHJTk/yTeBP2vLTwB2SbIaeD5wMUBVXQOc2XYke9Ms9/8scKsk5wNvAM4aQ50lSUOyHAF6gebztqP0zu3+rWlafS8CTgee2l52BHBSu39ye0x7/gtVNe+7LKq5u6p2GNm/CrjNyPFa4OmzvOZG4PfmuN+zNik6Y+TcTcBj5njdvltQbUmSxmlP4Lj2ufRWwEer6lNtgvrhJH8D
fA04tr3+WOADSS6hyaCfsdAbOOOYJKn3wuwPfJdTVZ0PPHCW8ktpRiltWv5z4Glb8h7O3S1J0pQyk5YkDcMAB9wapCVJgzANc22Pm83dkiRNKTNpSdIwmElLkqRJMZOWJA3DADNpg7Qkqf/KjmOSJGmCzKQlScNgJi1JkibFTFqSNAg+k5YkSRNjJi1JGoYBZtIGaUnSINjcLUmSJsZMWpLUf8Ugm7vNpCVJmlJm0tISfXftrl1XYWz25r+7rsJY1U47dF2F8fmvrivQAwPMpA3SkqTeC3YckyRJE2QmLUkaBjNpSZI0KWbSkqRBSA0vlTZIS5L6z3HSkiRpksykJUmD4BAsSZI0MWbSkqRhGGAmbZCWJA2Czd2SJGlizKQlScNgJi1JkibFTFqS1H/lM2lJkjRBZtKSpGEYYCZtkJYk9V6wuVuSJE2QmbQkaRgGuFSlmbQkSVPKTFqSNAg+kx6zJK9OcmGS85OsTvKQzXzdvkm+sdz1kyT1RC3T1rHOMukkhwCPBw6oqpuS7AZs21V9JEmaNl02d+8JrK2qmwCqai1AktcCfwDcGvgP4HlVVUkOBN4L3AB8pZsqS5KmVTZ0XYPx67K5+xRgnyQXJ/mnJL/Tlv9DVT2oqu5LE6gf35a/D3hJVR2y0I2TrEiyKsmqX3LT8tRekqRl1lmQrqqfAgcCK4AfAh9J8lzg4UnOTnIB8AjgN5PsBOxcVV9sX/6BBe69sqoOqqqDtmG75fsQkqTp4TPp8aqq9cAZwBltUH4ecH/goKq6PMkxwPY0k8lMwR+XJGladdG7O8k+wPHAbwAbgJVV9Y4kuwAfAfYFLgP+sKp+lCTAO4DH0jy+fW5VnTfX/TvLpJPcM8l+I0X7A99u99cm2QF4KkBVXQdcn+TQ9vyzJ1dTSZLmtA74i6q6N3Aw8MIk9wGOBk6rqv2A09pjgMcA+7XbCuDd8928y0x6B+BdSXam+ZCX0FT4OuACmm8eXx25/kjgvUluAD432apKkqZa0cmMY1V1JXBlu/+TJBcBewGHA4e1lx1H02r8V2358VVVwFlJdk6yZ3ufX9NZkK6qc4HfmuXUa9pttusfMFJ0zPLUTJKkLZdkX+CBwNnAHjOBt6quTLJ7e9lewOUjL1vTlk1XkJYkaZyW6Zn0bklWjRyvrKqVv/bezSPaE4CXVdWPm0fPs5rtxJw1N0hLkjS3tVV10HwXJNmGJkB/sKr+vS2+aqYZO8mewNVt+Rpgn5GX7w1cMde9XWBDkjQMHQzBantrHwtcVFVvHTl1MnBEu38EcNJI+XPSOBi4fq7n0WAmLUkagNDZAhsPBf4HcEGS1W3Zq4A3Ah9NchTwA+Bp7bnP0Ay/uoRmCNaR893cIC1J0iJV1VeY/TkzwCNnub6AF27u/Q3SkqT+q+pkCNZy85m0JElTykxakjQIHT2TXlYGaUnSMAwwSNvcLUnSlDKTliQNwhCbu82kJUmaUmbSkqT+K2DD8FJpg7QkaRiGF6Nt7pYkaVqZSUuSBsGOY5IkaWLMpCVJw+Dc3ZIkaVLMpKUl+vnlt+u6CprDhu9c1nUVNEFDfCZtkJYk9V/hECxJkjQ5ZtKSpN4LEDuOSZKkSTGTliQNw4auKzB+BmlJ0iDY3C1JkibGTFqS1H8OwZIkSZNkJi1JGoAa5NzdBmlJ0iAMcVpQm7slSZpSZtKSpGEYYHO3mbQkSVPKTFqS1H8FGeCMY2bSkiRNKTNpSdIwDPCZtEFakjQMw4vRNndLkjStzKQlSYPgKliSJGlizKQlScNwS82kk7w6yYVJzk+yOslDlqMyST6TZOfluLckacAK2LAMW8cWzKSTHAI8Hjigqm5Kshuw7ebcPMmtqmrdZlwXIFX12M25ryRJtwSbk0nvCaytqpsAqmptVV2R5LI2YJPkoCRntPvHJFmZ5BTg+CTPTXJSks8m+XaS17XX7ZvkoiT/BJwH7DNzzyS3TfLpJF9P8o0kT29fc2CSLyY5N8nnkuw5
/j8SSVLfhCI1/q1rm/NM+hTgtUkuBj4PfKSqvrjAaw4EDq2qG5M8F3gwcF/gBuCrST4NrAXuCRxZVS8AaBJqAB4NXFFVj2vLd0qyDfAu4PCq+mEbuP838MebvnmSFcCK9vCnn6+Pf3szPudS7EbzeYbAz7KlXvrxZX8LJvRZLlvuN2hM7t/YL5b9Hfz/suXuPIH3GIwFg3RV/TTJgcBvAw8HPpLk6AVednJV3ThyfGpVXQOQ5N+BQ4FPAN+vqrNmef0FwJuT/B3wqar6cpL70gT6U9tgvjVw5Rx1XgmsXOizjUuSVVV10KTebzn5WaaTn2U6+VmmzBRkvuO2Wb27q2o9cAZwRpILgCOAdWxsLt9+k5f8bNNbzHG86XUz73dx+8XgscDftk3nJwIXVtUhm1NnSdItzACD9ILPpJPcM8l+I0X7A9+naRk7sC17ygK3eVSSXZLcGngicOYC73lH4Iaq+r/Am4EDgG8Dd2g7spFkmyS/uVD9JUnqq83JpHcA3tUOjVoHXELzvPfewLFJXgWcvcA9vgJ8ALg78G9VtSrJvvNcfz/gTUk2AL8Enl9Vv0jyVOCdSXZq6/524MLN+AzLbWJN6xPgZ5lOfpbp5GeZFjNDsAYmtczNA23HsYOq6kXL+kaSpFusnW5zxzr4nn869vuesvr153b5rN5pQSVJg9DFEKwk701ydZJvjJTtkuTUJN9pf96+LU+Sdya5pJ0c7ICF7r/sQbqq3m8WLUkaqPfTDBsedTRwWlXtB5zWHgM8Btiv3VYA717o5mbSkqRhqBr/tuBb1peAazcpPhw4rt0/jqbD9Ez58dU4C9h5oUm5DNKL0I7ZHowkL92csmnXNiXt03U9JHVhGQL04vts7VFVVwK0P3dvy/cCLh+5bk1bNieD9OL8c5JzkrxgIAuCHDFL2XMnXYmlqqYX5Ce6rse4JHlz34cZts/m5ty6rt+WSHJB+xxx1q3r+i1Gkj2SHJvk/7XH90lyVNf1mjK7JVk1sq1Y+CVzyixl834TcKnKRaiqQ9ux438MrEpyDvC+qjq146ptkSTPBJ4F3CXJySOndgSu6aZWS3ZWkgdV1Ve7rsgYfAtYmeRWwPuAD1XV9R3XaUudS/NLaK5fTnedbJJWAskAAA4rSURBVHWW5PHtzxe2Pz/Q/nw2zZTHffR+mn9br26PLwY+AhzbVYUWrViuyUzWLqJ391VJ9qyqK9vm7Kvb8jXAaGvf3sAV893IIL1IVfWdJK8BVgHvBB6YZr7SV1XVv3dbu832HzRTq+4GvGWk/CdALzMDmqlrn5fk+zQz2oUmyb5/t9XaclX1HuA9Se4JHAmcn+RM4F+r6vRua7d5quouXddhXKrq+wBJHlpVDx05dXT79/L6bmq2JLtV1UeTvBKgqtYlWd91pQbgZJoWyje2P08aKX9Rkg8DDwGun2kWn4tBehGS3J/ml+bjgFOBP6iq89qZ0v4T6EWQbn/pfD/J7wI3VtWGJPcA7kUzf3ofPabrCoxTkq1p/j7uRbP4wdeBlyd5XlU9o9PKbaF2GMp+jEwj3Ha66ZvbJjm0qr4CkOS3gNt2XKfF+lmSXWmbXJMcDPSttWajDiYzSfIh4DCaZvE1wOtogvNH20cHPwCe1l7+GZrpri+haX05cqH7G6QX5x+Af6XJmm9eSKRdwvM13VVr0b4E/Hb7S/Q0mtaBp9M04/XKSLazO78+p3yvJHkr8ASav5P/U1XntKf+Lslyr+w2Vkn+BHgpTfPeauBgmi+0j+iyXot0FPDeduZDgOuYZTW+nng5TXZ3t7Y14A7AU7ut0uJ1sbRkVT1zjlOPnOXaYuPjks1ikN5CbWZzeVV9YLbzc5VPuVTVDe23vndV1d8n+VrXlVqMJE+gabq/I81zoDsDFwF97ID1DeA1VTXb884HT7oyS/RS4EHAWVX18CT3Av664zotSlWdCzwgyY40/3d6m3m2LYC/Q7NscIBvV9UvO66WRhikt1BVrU+ya5Jt
q2r5V6udjLQLlzybJkuA/v7beANNlvb5qnpgkocDc33TnXbvA56U5FCa5sivVNWJAD0MDD+vqp8nIcl2VfWt9ll7LyV5HM0Xv+2brihQVb17Jp3kacBnq+rCthXwgCR/U1XndV23RRngKlh9/UXcte8DZ7Y9om9ebrOq3tpdlZbkZcArgRPb/6x3BXrRMWkWv6yqa5JslWSrqjo9zbrkffSPNIvSfKg9fl6S362qLWoumxJr2uGKn6BZE/5HLNCrdVol+WfgNjSdFN9D0zx8zrwvml7/q6o+1n4R/H2aVQffTdOpSVPAIL04V7TbVsDtOq7LklXVF4EvjhxfCrykuxotyXVJdgC+DHwwydU0q7f10e8A922fY5HkOHraoa+qntTuHpPkdGAn4LMdVmkpfquq7p/k/Kr66yRvoSedRWcx05P7ccC7q+qkJMd0WJ/FK2CDmbSAqurls7S5tL80f+1fd1X1sVPP4cCNNK0Dz6YJBr1rhmx9G7gTTcsNNOMrezc0LslWwPlVdV+4+Uthn810Fr2hHdFxLdDXoWb/leRfgN+l6ZC4Hb2d5GpJM4RNLYP0IiT5JL8e1K6n6RX9L1X188nXakleMbK/PfAUepp9VtXPktwZ2K+qjktyG2Drruu1SLsCF7WT5UDT8eo/ZyaeqaondFazLdAO7ft6kjtV1Q+6rs8YfKptuv97mslaoGn27qM/pFkc4s1VdV078cZfdlwnjTBIL86lNEMVZp4VPh24CrgHzdCs/9FRvRal7a066swkvcx2kvwpzeoyuwB3o5kX95+ZZThED7y26wqM0Z7Ahe0XjtF+HL34ogGQ5EE0Izve0B7vQPP44VvA27qs25ZKsmNV/ZjmS/kZbdkuwE00yUY/mUmr9cCqetjI8SeTfKmqHpbkws5qtUibzKG8FXAg8BsdVWepXkgzPOlsuHlmuN3nf8l0qqovJvkNms9TwFer6r87rtZiDeER0UyzMEkeRjNhxYuB/YGV9Gt88b/RTHM627StfZuuddAM0otzh9GmuyR3oplaE6CPw7JG/6OuA77HxqFYfXNTVf1iZlhMO+91L79etxOAvBb4As3fzbuSvL6q3tttzRblsVX1V6MFba/7PrXYbF1VM0sSPh1YWVUnACckWd1hvbZYVT2+ncb4dwbyCKJhJq3WXwBfSfJdml+edwFekOS2bFxDtDeGNL8y8MUkrwJuneRRwAuAT3Zcp8X6S5pWm2sA2ukb/wPoY5B+FPBXm5Q9ZpayabZ1kltV1TqaxyejqyH17ndpVVWSE2lazjSlevcPaxpU1WfSrIJ1L5og/a2RzmJv765mi5PkybMUXw9cUFVXz3Jumh1N0wpwAfA8mrly+9qpZw3NYiczfsKvrkU79ZI8n+aL0t3yq8s53o7mC0effIjmS+Bamh7eXwZIcnf6O9/1cFaNG+gQrNQAmwcmoZ1Uf19GvuhU1fGdVWgJknwaOISNE5gcBpxF0xHu9X2Y6nRAPYdvluR44H40K+gUzfCyc2iWE+zF5Dnt/Na3B/6W5gvUjJ+MNB33RrsAxZ7AKVX1s7bsHsAOfZylK8k3af6f937VuJ2226N+647jX27gs5e97dxFLFU5NmbSi5DkAzQ9h1ezcTKAAnoZpGnWjrl3VV0FzULwbJx16EtsXDd3mn0COAAgyQlV9ZSO6zMO3223GTPL3fVmAp12+tLrk2zarL1Dkh369sWqqs6apeziLuoyJoNaNW6IDNKLcxBwnxpOM8S+MwG6dTVwj6q6NklfJtsf7Z06iJ6pA5s059Ns7Jy4PU0/jm/Tz4VPBqOqvp/kAGBmfvgz+9gicLPB/EreyCC9ON+gGaI072LdPfLlJJ8CPtYePxX4UtsR7rruqrVFao793kpyB+B/0i7kMFPex5ngqup+o8dtYHheR9VRK8lradY6npnW9H1JPlZVf9NhtTTCIL04uwHfbCdmuKktq6o6vMM6LcULgSfTfJsOTQ/1E9qWgod3WbEt
8IAkP6ap/63bfdj4jG3H7qq2aB8EPkIznvXPgCOAH3ZaozFpl0h8UNf1EM+kGUHwc4AkbwTOA/oXpAfaccwgvTjHjOyHJrj1dTlEaFb0+URVndAuH3hPmn8bfWnqpqr6OvXnfHatqmOTvHRmEZQezwT38pHDrWj6DwziC0fPXUbTSjMzOmU7frUfRL/Y3C24eSao/YFn0cx9+z2aqSf76kvAbye5PfB5mmkBn06zQIW6M/Ml6cp2/eIrgL07rM9SjHZ2W0fzjPqEjuqijW6ima71VJpc9FE0c0C8E6Cq+roa3mAYpLdAO9TiGTRZ8zU0TZGpqr40Cc8lVXVDkqOAd1XV3yf5WteVEn/TDmH6C+BdwI7An3dbpcWZ6QSX5LYzQ5c0FU5stxlndFSP8TCTvsX7Fs0EBn9QVZcAJOnlL81NJMkhNJnzzHSg/tvoWFV9qt29nv70DZhV++/rWGAH4E5JHgA8r6pe0G3NbrmSbA08qqr+qOu6aG7+It4yT6HJpE9P8lngw/zq0J++ehnwSuDEqrowyV3ZOLGJJizJu5inh3pPmyDfDvw+MLPM5tfbRSrUkapan+QOSbatqj6uObAJ15O+xauqE4ET26FJT6RpetwjybtpAtwpnVZwkWY6JY0cXwr0MRAMxehSgX8NvK6rioxTVV0+s/BJa/1c12piLqNZmvZkfnUJ0amfze7XFLBhQ9e1GDuD9CK0z9Q+CHywXebxaTRTHvYqSCd5e1W9LMknmSVz69Nav0NSVTcv0pLkZaPHPXZ5O5VuJdmW5kvgRR3XSU1nxCtoetz3Zia7WxKD9BK18w//S7v1zcx0n2/utBaaz1Da7/4MeAewF83CIafQjM9XhwY2q53N3RqWqjq3/fnFdnYrqsqxqxq7qlqLQ/qmTpLTmb0VrXez2g2VQfoWrF30/XXAi2g6wG2VZB3NMKzXd1q5W7AkP2HjL87b9Hn2tHbayblUVb1hYpXRbF4xsr89TefYdR3VZenMpDUwLwMeCjyoqr4H0PbsfneSP6+qt3Vau1uoqhrSs8HZxkTflmao366AQbpDM61pI87s66x2Q2WQvmV7Ds04ybUzBVV1aZI/onlmaJDWklTVW2b2k9wOeClwJM3wxbfM9TpNRtvxdcZWNCv8/UZH1Vmicu5uDc42owF6RlX9MMk2XVRIw9MGgpfTPJM+Djigqn7Uba3UOpeNj1bW0QzJOmrOq6dZQZVDsDQs801gMIDJDdS1JG+iWWFtJXC/qvppx1US0K5AdnlV3aU9PoLmefRlwDc7rJo2sVXXFVCnHpDkx7NsPwHut+CrpYX9BXBH4DXAFaP/xkY6xGny/oX2i3g789vf0rRyXE/zhaqfNtT4t46ZSd+CDXR5R02RqjIRmE5bt3M8QLPi3cqqOgE4IcnqDuulTfgfSJJuebZOMpOkPRL4wsi5/iZvVePfOtbfvwxJ0mJ9CPhikrXAjTSr+5Hk7jRN3v1T5dzdkqT+q6r/neQ0YE/glKqbU8atgBd3VzNtyiAtSbdAVXXWLGUXd1GXsZmC5ulx85m0JElTykxakjQI5TNpSZKm0XT0xh43m7slSZpSZtKSpP4rpmKGsHEzk5YkaUqZSUuShmGAq2CZSUuSNKXMpCVJvVdADfCZtEFaktR/VTZ3S5KkjZI8Osm3k1yS5Ohx399MWpI0CJNu7k6yNfCPwKOANcBXk5xcVd8c13uYSUuStDgPBi6pqkur6hfAh4HDx/kGZtKSpGGY/DPpvYDLR47XAA8Z5xsYpCVJvfcTfvS5z9fHd1uGW2+fZNXI8cqqWtnuZ5brx9rmbpCWJPVeVT26g7ddA+wzcrw3cMU438Bn0pIkLc5Xgf2S3CXJtsAzgJPH+QZm0pIkLUJVrUvyIuBzwNbAe6vqwnG+R2qA629KkjQENndLkjSlDNKSJE0pg7QkSVPKIC1J0pQySEuSNKUM0pIkTSmDtCRJU8ogLUnSlPr/Z6bR2NdNyw4AAAAASUVO
RK5CYII=\n",
387
+ "text/plain": [
388
+ "<Figure size 576x576 with 2 Axes>"
389
+ ]
390
+ },
391
+ "metadata": {
392
+ "needs_background": "light"
393
+ },
394
+ "output_type": "display_data"
395
+ }
396
+ ],
397
+ "source": [
398
+ "import matplotlib.pyplot as plt\n",
399
+ "import sklearn\n",
400
+ "from sklearn.metrics import classification_report, confusion_matrix\n",
401
+ "import numpy as np\n",
402
+ "\n",
403
+ "nb_train_samples = 28273\n",
404
+ "nb_validation_samples = 3534\n",
405
+ "\n",
406
+        "# We need to recreate our validation generator with shuffle = False\n",
407
+ "validation_generator = validation_datagen.flow_from_directory(\n",
408
+ " validation_data_dir,\n",
409
+ " color_mode = 'grayscale',\n",
410
+ " target_size=(img_rows, img_cols),\n",
411
+ " batch_size=batch_size,\n",
412
+ " class_mode='categorical',\n",
413
+ " shuffle=False)\n",
414
+ "\n",
415
+ "class_labels = validation_generator.class_indices\n",
416
+ "class_labels = {v: k for k, v in class_labels.items()}\n",
417
+ "classes = list(class_labels.values())\n",
418
+ "\n",
419
+        "# Confusion Matrix and Classification Report\n",
420
+ "Y_pred = model.predict(validation_generator)\n",
421
+ "y_pred = np.argmax(Y_pred, axis=1)\n",
422
+ "\n",
423
+ "print('Confusion Matrix')\n",
424
+ "print(confusion_matrix(validation_generator.classes, y_pred))\n",
425
+ "print('Classification Report')\n",
426
+ "target_names = list(class_labels.values())\n",
427
+ "print(classification_report(validation_generator.classes, y_pred, target_names=target_names))\n",
428
+ "\n",
429
+ "plt.figure(figsize=(8,8))\n",
430
+ "cnf_matrix = confusion_matrix(validation_generator.classes, y_pred)\n",
431
+ "\n",
432
+ "plt.imshow(cnf_matrix, interpolation='nearest')\n",
433
+ "plt.colorbar()\n",
434
+ "tick_marks = np.arange(len(classes))\n",
435
+ "_ = plt.xticks(tick_marks, classes, rotation=90)\n",
436
+ "_ = plt.yticks(tick_marks, classes)"
437
+ ]
438
+ },
439
+ {
440
+ "cell_type": "markdown",
441
+ "metadata": {},
442
+ "source": [
443
+ "### Loading our saved model"
444
+ ]
445
+ },
446
+ {
447
+ "cell_type": "code",
448
+ "execution_count": 20,
449
+ "metadata": {},
450
+ "outputs": [],
451
+ "source": [
452
+ "from tensorflow.keras.models import load_model\n",
453
+ "\n",
454
+ "classifier = load_model('emotion_little_vgg.h5')"
455
+ ]
456
+ },
457
+ {
458
+ "cell_type": "markdown",
459
+ "metadata": {},
460
+ "source": [
461
+ "### Get our class labels"
462
+ ]
463
+ },
464
+ {
465
+ "cell_type": "code",
466
+ "execution_count": 21,
467
+ "metadata": {},
468
+ "outputs": [
469
+ {
470
+ "name": "stdout",
471
+ "output_type": "stream",
472
+ "text": [
473
+ "Found 3589 images belonging to 7 classes.\n",
474
+ "{0: 'Angry', 1: 'Disgust', 2: 'Fear', 3: 'Happy', 4: 'Neutral', 5: 'Sad', 6: 'Surprise'}\n"
475
+ ]
476
+ }
477
+ ],
478
+ "source": [
479
+ "validation_generator = validation_datagen.flow_from_directory(\n",
480
+ " validation_data_dir,\n",
481
+ " color_mode = 'grayscale',\n",
482
+ " target_size=(img_rows, img_cols),\n",
483
+ " batch_size=batch_size,\n",
484
+ " class_mode='categorical',\n",
485
+ " shuffle=False)\n",
486
+ "\n",
487
+ "class_labels = validation_generator.class_indices\n",
488
+ "class_labels = {v: k for k, v in class_labels.items()}\n",
489
+ "classes = list(class_labels.values())\n",
490
+ "print(class_labels)"
491
+ ]
492
+ },
493
+ {
494
+ "cell_type": "markdown",
495
+ "metadata": {},
496
+ "source": [
497
+ "### Let's test on some of validation images"
498
+ ]
499
+ },
500
+ {
501
+ "cell_type": "code",
502
+ "execution_count": 25,
503
+ "metadata": {},
504
+ "outputs": [],
505
+ "source": [
506
+ "from tensorflow.keras.models import load_model\n",
507
+ "from tensorflow.keras.optimizers import RMSprop, SGD, Adam\n",
508
+ "from tensorflow.keras.preprocessing import image\n",
509
+ "import numpy as np\n",
510
+ "import os\n",
511
+ "import cv2\n",
512
+ "import numpy as np\n",
513
+ "from os import listdir\n",
514
+ "from os.path import isfile, join\n",
515
+ "import re\n",
516
+ "\n",
517
+ "def draw_test(name, pred, im, true_label):\n",
518
+ " BLACK = [0,0,0]\n",
519
+ " expanded_image = cv2.copyMakeBorder(im, 160, 0, 0, 300 ,cv2.BORDER_CONSTANT,value=BLACK)\n",
520
+        "    cv2.putText(expanded_image, \"predicted - \"+ pred, (20, 60) , cv2.FONT_HERSHEY_SIMPLEX,1, (0,0,255), 2)\n",
521
+ " cv2.putText(expanded_image, \"true - \"+ true_label, (20, 120) , cv2.FONT_HERSHEY_SIMPLEX,1, (0,255,0), 2)\n",
522
+ " cv2.imshow(name, expanded_image)\n",
523
+ "\n",
524
+ "\n",
525
+ "def getRandomImage(path, img_width, img_height):\n",
526
+ " \"\"\"function loads a random images from a random folder in our test path \"\"\"\n",
527
+ " folders = list(filter(lambda x: os.path.isdir(os.path.join(path, x)), os.listdir(path)))\n",
528
+ " random_directory = np.random.randint(0,len(folders))\n",
529
+ " path_class = folders[random_directory]\n",
530
+ " file_path = path + path_class\n",
531
+ " file_names = [f for f in listdir(file_path) if isfile(join(file_path, f))]\n",
532
+ " random_file_index = np.random.randint(0,len(file_names))\n",
533
+ " image_name = file_names[random_file_index]\n",
534
+ " final_path = file_path + \"/\" + image_name\n",
535
+        "    return image.load_img(final_path, target_size = (img_width, img_height), color_mode = \"grayscale\"), final_path, path_class\n",
536
+ "\n",
537
+ "# dimensions of our images\n",
538
+ "img_width, img_height = 48, 48\n",
539
+ "\n",
540
+        "# Recompile the model with the RMSprop optimizer\n",
541
+ "model.compile(loss = 'categorical_crossentropy',\n",
542
+ " optimizer = RMSprop(lr = 0.001),\n",
543
+ " metrics = ['accuracy'])\n",
544
+ "\n",
545
+ "files = []\n",
546
+ "predictions = []\n",
547
+ "true_labels = []\n",
548
+ "\n",
549
+ "# predicting images\n",
550
+ "for i in range(0, 10):\n",
551
+ " path = './fer2013/validation/' \n",
552
+ " img, final_path, true_label = getRandomImage(path, img_width, img_height)\n",
553
+ " files.append(final_path)\n",
554
+ " true_labels.append(true_label)\n",
555
+ " x = image.img_to_array(img)\n",
556
+ " x = x * 1./255\n",
557
+ " x = np.expand_dims(x, axis=0)\n",
558
+ " images = np.vstack([x])\n",
559
+        "    classes = np.argmax(model.predict(images, batch_size = 10), axis=1)\n",
560
+ " predictions.append(classes)\n",
561
+ " \n",
562
+ "for i in range(0, len(files)):\n",
563
+        "    image = cv2.imread(files[i])\n",
564
+ " image = cv2.resize(image, None, fx=3, fy=3, interpolation = cv2.INTER_CUBIC)\n",
565
+ " draw_test(\"Prediction\", class_labels[predictions[i][0]], image, true_labels[i])\n",
566
+ " cv2.waitKey(0)\n",
567
+ "\n",
568
+ "cv2.destroyAllWindows()"
569
+ ]
570
+ },
571
+ {
572
+ "cell_type": "markdown",
573
+ "metadata": {},
574
+ "source": [
575
+ "### Test on a single image"
576
+ ]
577
+ },
578
+ {
579
+ "cell_type": "code",
580
+ "execution_count": 27,
581
+ "metadata": {},
582
+ "outputs": [],
583
+ "source": [
584
+ "from tensorflow.keras.models import load_model\n",
585
+ "from tensorflow.keras.preprocessing import image\n",
586
+ "import numpy as np\n",
587
+ "import os\n",
588
+ "import cv2\n",
589
+ "import numpy as np\n",
590
+ "from os import listdir\n",
591
+ "from os.path import isfile, join\n",
592
+ "from tensorflow.keras.preprocessing.image import img_to_array\n",
593
+ "\n",
594
+ "face_classifier = cv2.CascadeClassifier('./Haarcascades/haarcascade_frontalface_default.xml')\n",
595
+ "\n",
596
+ "def face_detector(img):\n",
597
+ " # Convert image to grayscale\n",
598
+ " gray = cv2.cvtColor(img.copy(),cv2.COLOR_BGR2GRAY)\n",
599
+ " faces = face_classifier.detectMultiScale(gray, 1.3, 5)\n",
600
+        "    if len(faces) == 0:\n",
601
+ " return (0,0,0,0), np.zeros((48,48), np.uint8), img\n",
602
+ " \n",
603
+ " allfaces = [] \n",
604
+ " rects = []\n",
605
+ " for (x,y,w,h) in faces:\n",
606
+ " cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)\n",
607
+ " roi_gray = gray[y:y+h, x:x+w]\n",
608
+ " roi_gray = cv2.resize(roi_gray, (48, 48), interpolation = cv2.INTER_AREA)\n",
609
+ " allfaces.append(roi_gray)\n",
610
+ " rects.append((x,w,y,h))\n",
611
+ " return rects, allfaces, img\n",
612
+ "\n",
613
+ "img = cv2.imread(\"rajeev.jpg\")\n",
614
+ "rects, faces, image = face_detector(img)\n",
615
+ "\n",
616
+ "i = 0\n",
617
+ "for face in faces:\n",
618
+ " roi = face.astype(\"float\") / 255.0\n",
619
+ " roi = img_to_array(roi)\n",
620
+ " roi = np.expand_dims(roi, axis=0)\n",
621
+ "\n",
622
+ " # make a prediction on the ROI, then lookup the class\n",
623
+ " preds = classifier.predict(roi)[0]\n",
624
+ " label = class_labels[preds.argmax()] \n",
625
+ "\n",
626
+ " #Overlay our detected emotion on our pic\n",
627
+ " label_position = (rects[i][0] + int((rects[i][1]/2)), abs(rects[i][2] - 10))\n",
628
+        "    i += 1\n",
629
+ " cv2.putText(image, label, label_position , cv2.FONT_HERSHEY_SIMPLEX,1, (0,255,0), 2)\n",
630
+ " \n",
631
+ "cv2.imshow(\"Emotion Detector\", image)\n",
632
+ "cv2.waitKey(0)\n",
633
+ "\n",
634
+ "cv2.destroyAllWindows()"
635
+ ]
636
+ },
637
+ {
638
+ "cell_type": "markdown",
639
+ "metadata": {},
640
+ "source": [
641
+ "### Let's try this on our webcam\n"
642
+ ]
643
+ },
644
+ {
645
+ "cell_type": "code",
646
+ "execution_count": 29,
647
+ "metadata": {},
648
+ "outputs": [],
649
+ "source": [
650
+ "import cv2\n",
651
+ "import numpy as np\n",
652
+ "from time import sleep\n",
653
+ "from tensorflow.keras.preprocessing.image import img_to_array\n",
654
+ "\n",
655
+ "face_classifier = cv2.CascadeClassifier('./Haarcascades/haarcascade_frontalface_default.xml')\n",
656
+ "\n",
657
+ "def face_detector(img):\n",
658
+ " # Convert image to grayscale\n",
659
+ " gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)\n",
660
+ " faces = face_classifier.detectMultiScale(gray, 1.3, 5)\n",
661
+        "    if len(faces) == 0:\n",
662
+ " return (0,0,0,0), np.zeros((48,48), np.uint8), img\n",
663
+ " \n",
664
+ " for (x,y,w,h) in faces:\n",
665
+ " cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)\n",
666
+ " roi_gray = gray[y:y+h, x:x+w]\n",
667
+ "\n",
668
+ " try:\n",
669
+ " roi_gray = cv2.resize(roi_gray, (48, 48), interpolation = cv2.INTER_AREA)\n",
670
+ " except:\n",
671
+ " return (x,w,y,h), np.zeros((48,48), np.uint8), img\n",
672
+ " return (x,w,y,h), roi_gray, img\n",
673
+ "\n",
674
+ "cap = cv2.VideoCapture(0)\n",
675
+ "\n",
676
+ "while True:\n",
677
+ "\n",
678
+ " ret, frame = cap.read()\n",
679
+ " rect, face, image = face_detector(frame)\n",
680
+ " if np.sum([face]) != 0.0:\n",
681
+ " roi = face.astype(\"float\") / 255.0\n",
682
+ " roi = img_to_array(roi)\n",
683
+ " roi = np.expand_dims(roi, axis=0)\n",
684
+ "\n",
685
+ " # make a prediction on the ROI, then lookup the class\n",
686
+ " preds = classifier.predict(roi)[0]\n",
687
+ " label = class_labels[preds.argmax()] \n",
688
+ " label_position = (rect[0] + int((rect[1]/2)), rect[2] + 25)\n",
689
+ " cv2.putText(image, label, label_position , cv2.FONT_HERSHEY_SIMPLEX,2, (0,255,0), 3)\n",
690
+ " else:\n",
691
+ " cv2.putText(image, \"No Face Found\", (20, 60) , cv2.FONT_HERSHEY_SIMPLEX,2, (0,255,0), 3)\n",
692
+ " \n",
693
+ " cv2.imshow('All', image)\n",
694
+ " if cv2.waitKey(1) == 13: #13 is the Enter Key\n",
695
+ " break\n",
696
+ " \n",
697
+ "cap.release()\n",
698
+ "cv2.destroyAllWindows() "
699
+ ]
700
+ }
701
+ ],
702
+ "metadata": {
703
+ "kernelspec": {
704
+ "display_name": "Python 3",
705
+ "language": "python",
706
+ "name": "python3"
707
+ },
708
+ "language_info": {
709
+ "codemirror_mode": {
710
+ "name": "ipython",
711
+ "version": 3
712
+ },
713
+ "file_extension": ".py",
714
+ "mimetype": "text/x-python",
715
+ "name": "python",
716
+ "nbconvert_exporter": "python",
717
+ "pygments_lexer": "ipython3",
718
+ "version": "3.7.4"
719
+ }
720
+ },
721
+ "nbformat": 4,
722
+ "nbformat_minor": 2
723
+ }
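The evaluation cells in the notebook above invert Keras's `class_indices` mapping (name to index becomes index to name) and decode each softmax output with an argmax before printing the confusion matrix. The same decoding can be sketched in plain Python; the class names and probabilities below are illustrative, not taken from the FER2013 run:

```python
# Invert a Keras-style class_indices mapping so numeric predictions can be named.
class_indices = {"Angry": 0, "Happy": 1, "Sad": 2}      # name -> index, as Keras builds it
class_labels = {v: k for k, v in class_indices.items()}  # index -> name

def decode_prediction(probs):
    """Return the class name with the highest softmax probability (argmax)."""
    best_index = max(range(len(probs)), key=lambda i: probs[i])
    return class_labels[best_index]

print(decode_prediction([0.1, 0.7, 0.2]))  # -> Happy
```

This is the same `{v: k for k, v in ...}` inversion the notebook applies to `validation_generator.class_indices` before labeling predictions.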
18 . Deep Survaliance - Build a Face Detector with Emotion, Age and Gender Recognition/18.3A - Age, Gender Detection.ipynb ADDED
@@ -0,0 +1,174 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "### Let's run our Age and Gender Detector\n",
8
+ "\n",
9
+        "- Please see https://github.com/yu4u/age-gender-estimation for the original project's source code.\n",
10
+        "- In this notebook we reuse the model trained by yu4u"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "code",
15
+ "execution_count": null,
16
+ "metadata": {},
17
+ "outputs": [],
18
+ "source": []
19
+ },
20
+ {
21
+ "cell_type": "code",
22
+ "execution_count": 5,
23
+ "metadata": {},
24
+ "outputs": [
25
+ {
26
+ "name": "stdout",
27
+ "output_type": "stream",
28
+ "text": [
29
+ "Downloading data from https://github.com/yu4u/age-gender-estimation/releases/download/v0.5/weights.28-3.73.hdf5\n",
30
+ "195854336/195848088 [==============================] - 173s 1us/step\n"
31
+ ]
32
+ }
33
+ ],
34
+ "source": [
35
+ "from pathlib import Path\n",
36
+ "import cv2\n",
37
+ "import dlib\n",
38
+ "import sys\n",
39
+ "import numpy as np\n",
40
+ "import argparse\n",
41
+ "from contextlib import contextmanager\n",
42
+ "from wide_resnet import WideResNet\n",
43
+ "from tensorflow.keras.utils import get_file\n",
44
+ "\n",
45
+        "# Load our cascade classifier for faces\n",
46
+ "face_classifier = cv2.CascadeClassifier('./Haarcascades/haarcascade_frontalface_default.xml')\n",
47
+ "\n",
48
+ "# Load our pretrained model for Gender and Age Detection\n",
49
+ "pretrained_model = \"https://github.com/yu4u/age-gender-estimation/releases/download/v0.5/weights.28-3.73.hdf5\"\n",
50
+ "modhash = 'fbe63257a054c1c5466cfd7bf14646d6'\n",
51
+ "\n",
52
+ "# Face Detection function\n",
53
+ "def face_detector(img):\n",
54
+ " # Convert image to grayscale for faster detection\n",
55
+ " gray = cv2.cvtColor(img.copy(),cv2.COLOR_BGR2GRAY)\n",
56
+ " faces = face_classifier.detectMultiScale(gray, 1.3, 5)\n",
57
+        "    if len(faces) == 0:\n",
58
+ " return False ,(0,0,0,0), np.zeros((1,48,48,3), np.uint8), img\n",
59
+ " \n",
60
+ " allfaces = [] \n",
61
+ " rects = []\n",
62
+ " for (x,y,w,h) in faces:\n",
63
+ " cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)\n",
64
+ " roi = img[y:y+h, x:x+w]\n",
65
+        "        roi_resized = cv2.resize(roi, (64, 64), interpolation = cv2.INTER_AREA)\n",
66
+        "        allfaces.append(roi_resized)\n",
67
+ " rects.append((x,w,y,h))\n",
68
+ " return True, rects, allfaces, img\n",
69
+ "\n",
70
+ "# Define our model parameters\n",
71
+ "depth = 16\n",
72
+ "k = 8\n",
73
+ "weight_file = None\n",
74
+ "margin = 0.4\n",
75
+ "image_dir = None\n",
76
+ "\n",
77
+ "# Get our weight file \n",
78
+ "if not weight_file:\n",
79
+ " weight_file = get_file(\"weights.28-3.73.hdf5\", pretrained_model, cache_subdir=\"pretrained_models\",\n",
80
+ " file_hash=modhash, cache_dir=Path(sys.argv[0]).resolve().parent)\n",
81
+ "\n",
82
+ "# load model and weights\n",
83
+ "img_size = 64\n",
84
+ "model = WideResNet(img_size, depth=depth, k=k)()\n",
85
+ "model.load_weights(weight_file)\n",
86
+ "\n",
87
+ "# Initialize Webcam\n",
88
+ "cap = cv2.VideoCapture(0)\n",
89
+ "\n",
90
+ "while True:\n",
91
+ " ret, frame = cap.read()\n",
92
+ " ret, rects, faces, image = face_detector(frame)\n",
93
+ " preprocessed_faces = []\n",
94
+ " i = 0\n",
95
+ " if ret:\n",
96
+ " for (i,face) in enumerate(faces):\n",
97
+ " face = cv2.resize(face, (64, 64), interpolation = cv2.INTER_AREA)\n",
98
+ " preprocessed_faces.append(face)\n",
99
+ "\n",
100
+ " # make a prediction on the faces detected\n",
101
+ " results = model.predict(np.array(preprocessed_faces))\n",
102
+ " predicted_genders = results[0]\n",
103
+ " ages = np.arange(0, 101).reshape(101, 1)\n",
104
+ " predicted_ages = results[1].dot(ages).flatten()\n",
105
+ "\n",
106
+ " # draw results\n",
107
+ " for (i, f) in enumerate(faces):\n",
108
+ " label = \"{}, {}\".format(int(predicted_ages[i]),\n",
109
+ " \"F\" if predicted_genders[i][0] > 0.5 else \"M\")\n",
110
+ "\n",
111
+ "            #Overlay our detected age and gender on our pic\n",
112
+ " label_position = (rects[i][0] + int((rects[i][1]/2)), abs(rects[i][2] - 10))\n",
113
114
+ " cv2.putText(image, label, label_position , cv2.FONT_HERSHEY_SIMPLEX,1, (0,255,0), 2)\n",
115
+ "\n",
116
+ "    cv2.imshow(\"Age and Gender Detector\", image)\n",
117
+ " if cv2.waitKey(1) == 13: #13 is the Enter Key\n",
118
+ " break\n",
119
+ "\n",
120
+ "cap.release()\n",
121
+ "cv2.destroyAllWindows() "
122
+ ]
123
+ },
124
+ {
125
+ "cell_type": "markdown",
126
+ "metadata": {},
127
+ "source": [
128
+ "## Note if you get the following error, you need to enable your webcam \n",
129
+ "<img src=\"error.jpg\">\n",
130
+ "### Enable your webcam by doing the following:\n",
131
+ "<img src=\"webcam.jpg\">"
132
+ ]
133
+ },
134
+ {
135
+ "cell_type": "code",
136
+ "execution_count": 2,
137
+ "metadata": {},
138
+ "outputs": [],
139
+ "source": [
140
+ "# Run these lines if your webcam fails to be released due to an error in the code\n",
141
+ "cap.release()\n",
142
+ "cv2.destroyAllWindows()"
143
+ ]
144
+ },
145
+ {
146
+ "cell_type": "code",
147
+ "execution_count": null,
148
+ "metadata": {},
149
+ "outputs": [],
150
+ "source": []
151
+ }
152
+ ],
153
+ "metadata": {
154
+ "kernelspec": {
155
+ "display_name": "Python 3",
156
+ "language": "python",
157
+ "name": "python3"
158
+ },
159
+ "language_info": {
160
+ "codemirror_mode": {
161
+ "name": "ipython",
162
+ "version": 3
163
+ },
164
+ "file_extension": ".py",
165
+ "mimetype": "text/x-python",
166
+ "name": "python",
167
+ "nbconvert_exporter": "python",
168
+ "pygments_lexer": "ipython3",
169
+ "version": "3.7.4"
170
+ }
171
+ },
172
+ "nbformat": 4,
173
+ "nbformat_minor": 2
174
+ }
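A note on the age computation in the notebook above: the WideResNet age head (`results[1]`) outputs a probability distribution over 101 one-year bins, and `results[1].dot(ages)` takes its expected value to get a point estimate. A minimal standalone sketch of that step (the `expected_age` helper and the example distribution are hypothetical, for illustration only):

```python
import numpy as np

# The age output is a probability distribution over 101 bins (ages 0-100);
# the point estimate is its expected value, exactly as in the notebook:
# results[1].dot(ages).flatten()
def expected_age(age_probs):
    ages = np.arange(0, 101).reshape(101, 1)
    return age_probs.dot(ages).flatten()

# Hypothetical distribution concentrated around age 30
probs = np.zeros((1, 101))
probs[0, 29:32] = [0.25, 0.5, 0.25]
print(expected_age(probs))  # -> [30.]
```

Using the expectation rather than the argmax bin gives a smoother estimate when the distribution spreads over neighboring ages.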
18 . Deep Survaliance - Build a Face Detector with Emotion, Age and Gender Recognition/18.3B Age, Gender with Emotion.ipynb ADDED
@@ -0,0 +1,526 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# Age, Gender and Emotion Detection\n",
8
+ "\n",
9
+ "### Let's load our classifiers"
10
+ ]
11
+ },
12
+ {
13
+ "cell_type": "code",
14
+ "execution_count": 3,
15
+ "metadata": {},
16
+ "outputs": [],
17
+ "source": [
18
+ "from pathlib import Path\n",
19
+ "import cv2\n",
20
+ "import dlib\n",
21
+ "import sys\n",
22
+ "import numpy as np\n",
23
+ "import argparse\n",
24
+ "from contextlib import contextmanager\n",
25
+ "from wide_resnet import WideResNet\n",
26
+ "from tensorflow.keras.utils import get_file\n",
27
+ "from tensorflow.keras.models import load_model\n",
28
+ "from tensorflow.keras.preprocessing.image import img_to_array\n",
29
+ "\n",
30
+ "classifier = load_model('emotion_little_vgg.h5')\n",
31
+ "face_classifier = cv2.CascadeClassifier('./Haarcascades/haarcascade_frontalface_default.xml')\n",
32
+ "pretrained_model = \"https://github.com/yu4u/age-gender-estimation/releases/download/v0.5/weights.28-3.73.hdf5\""
33
+ ]
34
+ },
35
+ {
36
+ "cell_type": "markdown",
37
+ "metadata": {},
38
+ "source": [
39
+ "### Testing our Emotion, Age and Gender Detector - Using Webcam"
40
+ ]
41
+ },
42
+ {
43
+ "cell_type": "code",
44
+ "execution_count": 4,
45
+ "metadata": {},
46
+ "outputs": [],
47
+ "source": [
48
+ "modhash = 'fbe63257a054c1c5466cfd7bf14646d6'\n",
49
+ "emotion_classes = {0: 'Angry', 1: 'Fear', 2: 'Happy', 3: 'Neutral', 4: 'Sad', 5: 'Surprise'}\n",
50
+ "\n",
51
+ "def face_detector(img):\n",
52
+ " # Convert image to grayscale for faster detection\n",
53
+ " gray = cv2.cvtColor(img.copy(),cv2.COLOR_BGR2GRAY)\n",
54
+ " faces = face_classifier.detectMultiScale(gray, 1.3, 5)\n",
55
+ "    if len(faces) == 0:\n",
56
+ " return False ,(0,0,0,0), np.zeros((1,48,48,3), np.uint8), img\n",
57
+ " \n",
58
+ " allfaces = [] \n",
59
+ " rects = []\n",
60
+ " for (x,y,w,h) in faces:\n",
61
+ " cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)\n",
62
+ " roi = img[y:y+h, x:x+w]\n",
63
+ " allfaces.append(roi)\n",
64
+ " rects.append((x,w,y,h))\n",
65
+ " return True, rects, allfaces, img\n",
66
+ "\n",
67
+ "# Define our model parameters\n",
68
+ "depth = 16\n",
69
+ "k = 8\n",
70
+ "weight_file = None\n",
71
+ "margin = 0.4\n",
72
+ "image_dir = None\n",
73
+ "\n",
74
+ "# Get our weight file \n",
75
+ "if not weight_file:\n",
76
+ " weight_file = get_file(\"weights.28-3.73.hdf5\", pretrained_model, cache_subdir=\"pretrained_models\",\n",
77
+ " file_hash=modhash, cache_dir=Path(sys.argv[0]).resolve().parent)\n",
78
+ "\n",
79
+ "# load model and weights\n",
80
+ "img_size = 64\n",
81
+ "model = WideResNet(img_size, depth=depth, k=k)()\n",
82
+ "model.load_weights(weight_file)\n",
83
+ "\n",
84
+ "# Initialize Webcam\n",
85
+ "cap = cv2.VideoCapture(0)\n",
86
+ "\n",
87
+ "while True:\n",
88
+ " ret, frame = cap.read()\n",
89
+ " ret, rects, faces, image = face_detector(frame)\n",
90
+ " preprocessed_faces_ag = []\n",
91
+ " preprocessed_faces_emo = []\n",
92
+ " \n",
93
+ " if ret:\n",
94
+ " for (i,face) in enumerate(faces):\n",
95
+ " face_ag = cv2.resize(face, (64, 64), interpolation = cv2.INTER_AREA)\n",
96
+ " preprocessed_faces_ag.append(face_ag)\n",
97
+ "\n",
98
+ " face_gray_emo = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)\n",
99
+ " face_gray_emo = cv2.resize(face_gray_emo, (48, 48), interpolation = cv2.INTER_AREA)\n",
100
+ " face_gray_emo = face_gray_emo.astype(\"float\") / 255.0\n",
101
+ " face_gray_emo = img_to_array(face_gray_emo)\n",
102
+ " face_gray_emo = np.expand_dims(face_gray_emo, axis=0)\n",
103
+ " preprocessed_faces_emo.append(face_gray_emo)\n",
104
+ " \n",
105
+ " # make a prediction for Age and Gender\n",
106
+ " results = model.predict(np.array(preprocessed_faces_ag))\n",
107
+ " predicted_genders = results[0]\n",
108
+ " ages = np.arange(0, 101).reshape(101, 1)\n",
109
+ " predicted_ages = results[1].dot(ages).flatten()\n",
110
+ "\n",
111
+ " # make a prediction for Emotion \n",
112
+ " emo_labels = []\n",
113
+ " for (i, face) in enumerate(faces):\n",
114
+ " preds = classifier.predict(preprocessed_faces_emo[i])[0]\n",
115
+ " emo_labels.append(emotion_classes[preds.argmax()])\n",
116
+ " \n",
117
+ " # draw results, for Age and Gender\n",
118
+ " for (i, face) in enumerate(faces):\n",
119
+ " label = \"{}, {}, {}\".format(int(predicted_ages[i]),\n",
120
+ " \"F\" if predicted_genders[i][0] > 0.6 else \"M\",\n",
121
+ " emo_labels[i])\n",
122
+ " \n",
123
+ " #Overlay our detected emotion on our pic\n",
124
+ " for (i, face) in enumerate(faces):\n",
125
+ " label_position = (rects[i][0] + int((rects[i][1]/2)), abs(rects[i][2] - 10))\n",
126
+ " cv2.putText(image, label, label_position , cv2.FONT_HERSHEY_PLAIN,1, (0,255,0), 2)\n",
127
+ "\n",
128
+ " cv2.imshow(\"Emotion Detector\", image)\n",
129
+ " if cv2.waitKey(1) == 13: #13 is the Enter Key\n",
130
+ " break\n",
131
+ "\n",
132
+ "cap.release()\n",
133
+ "cv2.destroyAllWindows() "
134
+ ]
135
+ },
136
+ {
137
+ "cell_type": "code",
138
+ "execution_count": 4,
139
+ "metadata": {},
140
+ "outputs": [],
141
+ "source": [
142
+ "cap.release()\n",
143
+ "cv2.destroyAllWindows() "
144
+ ]
145
+ },
146
+ {
147
+ "cell_type": "markdown",
148
+ "metadata": {},
149
+ "source": [
150
+ "### Testing our Emotion, Age and Gender Detector - On Images"
151
+ ]
152
+ },
153
+ {
154
+ "cell_type": "code",
155
+ "execution_count": 5,
156
+ "metadata": {},
157
+ "outputs": [
158
+ {
159
+ "ename": "FileNotFoundError",
160
+ "evalue": "[WinError 3] The system cannot find the path specified: './images/'",
161
+ "output_type": "error",
162
+ "traceback": [
163
+ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
164
+ "\u001b[1;31mFileNotFoundError\u001b[0m Traceback (most recent call last)",
165
+ "\u001b[1;32m<ipython-input-5-d6f5a6d6ebc5>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m 42\u001b[0m \u001b[0mmodel\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mload_weights\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mweight_file\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 43\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 44\u001b[1;33m \u001b[0mimage_names\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m[\u001b[0m\u001b[0mf\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mf\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mlistdir\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mimage_path\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0misfile\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mjoin\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mimage_path\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mf\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 45\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 46\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mimage_name\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mimage_names\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
166
+ "\u001b[1;31mFileNotFoundError\u001b[0m: [WinError 3] The system cannot find the path specified: './images/'"
167
+ ]
168
+ }
169
+ ],
170
+ "source": [
171
+ "from os import listdir\n",
172
+ "from os.path import isfile, join\n",
173
+ "import os\n",
174
+ "import cv2\n",
175
+ "\n",
176
+ "# Define Image Path Here\n",
177
+ "image_path = \"./images/\"\n",
178
+ "\n",
179
+ "modhash = 'fbe63257a054c1c5466cfd7bf14646d6'\n",
180
+ "emotion_classes = {0: 'Angry', 1: 'Fear', 2: 'Happy', 3: 'Neutral', 4: 'Sad', 5: 'Surprise'}\n",
181
+ "\n",
182
+ "def face_detector(img):\n",
183
+ " # Convert image to grayscale for faster detection\n",
184
+ " gray = cv2.cvtColor(img.copy(),cv2.COLOR_BGR2GRAY)\n",
185
+ " faces = face_classifier.detectMultiScale(gray, 1.3, 5)\n",
186
+ "    if len(faces) == 0:\n",
187
+ " return False ,(0,0,0,0), np.zeros((1,48,48,3), np.uint8), img\n",
188
+ " \n",
189
+ " allfaces = [] \n",
190
+ " rects = []\n",
191
+ " for (x,y,w,h) in faces:\n",
192
+ " cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)\n",
193
+ " roi = img[y:y+h, x:x+w]\n",
194
+ " allfaces.append(roi)\n",
195
+ " rects.append((x,w,y,h))\n",
196
+ " return True, rects, allfaces, img\n",
197
+ "\n",
198
+ "# Define our model parameters\n",
199
+ "depth = 16\n",
200
+ "k = 8\n",
201
+ "weight_file = None\n",
202
+ "margin = 0.4\n",
203
+ "image_dir = None\n",
204
+ "\n",
205
+ "# Get our weight file \n",
206
+ "if not weight_file:\n",
207
+ " weight_file = get_file(\"weights.28-3.73.hdf5\", pretrained_model, cache_subdir=\"pretrained_models\",\n",
208
+ " file_hash=modhash, cache_dir=Path(sys.argv[0]).resolve().parent)\n",
209
+ "# load model and weights\n",
210
+ "img_size = 64\n",
211
+ "model = WideResNet(img_size, depth=depth, k=k)()\n",
212
+ "model.load_weights(weight_file)\n",
213
+ "\n",
214
+ "image_names = [f for f in listdir(image_path) if isfile(join(image_path, f))]\n",
215
+ "\n",
216
+ "for image_name in image_names:\n",
217
+ " frame = cv2.imread(\"./images/\" + image_name)\n",
218
+ " ret, rects, faces, image = face_detector(frame)\n",
219
+ " preprocessed_faces_ag = []\n",
220
+ " preprocessed_faces_emo = []\n",
221
+ " \n",
222
+ " if ret:\n",
223
+ " for (i,face) in enumerate(faces):\n",
224
+ " face_ag = cv2.resize(face, (64, 64), interpolation = cv2.INTER_AREA)\n",
225
+ " preprocessed_faces_ag.append(face_ag)\n",
226
+ "\n",
227
+ " face_gray_emo = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)\n",
228
+ " face_gray_emo = cv2.resize(face_gray_emo, (48, 48), interpolation = cv2.INTER_AREA)\n",
229
+ " face_gray_emo = face_gray_emo.astype(\"float\") / 255.0\n",
230
+ " face_gray_emo = img_to_array(face_gray_emo)\n",
231
+ " face_gray_emo = np.expand_dims(face_gray_emo, axis=0)\n",
232
+ " preprocessed_faces_emo.append(face_gray_emo)\n",
233
+ " \n",
234
+ " # make a prediction for Age and Gender\n",
235
+ " results = model.predict(np.array(preprocessed_faces_ag))\n",
236
+ " predicted_genders = results[0]\n",
237
+ " ages = np.arange(0, 101).reshape(101, 1)\n",
238
+ " predicted_ages = results[1].dot(ages).flatten()\n",
239
+ "\n",
240
+ " # make a prediction for Emotion \n",
241
+ " emo_labels = []\n",
242
+ " for (i, face) in enumerate(faces):\n",
243
+ " preds = classifier.predict(preprocessed_faces_emo[i])[0]\n",
244
+ " emo_labels.append(emotion_classes[preds.argmax()])\n",
245
+ " \n",
246
+ "        # draw results: age, gender and emotion\n",
247
+ "        for (i, face) in enumerate(faces):\n",
248
+ "            label = \"{}, {}, {}\".format(int(predicted_ages[i]),\n",
249
+ "                                        \"F\" if predicted_genders[i][0] > 0.4 else \"M\",\n",
250
+ "                                        emo_labels[i])\n",
251
+ "            # Overlay the label on the image\n",
252
+ "            label_position = (rects[i][0] + int((rects[i][1]/2)), abs(rects[i][2] - 10))\n",
253
+ "            cv2.putText(image, label, label_position , cv2.FONT_HERSHEY_PLAIN,1, (0,255,0), 2)\n",
256
+ "\n",
257
+ " cv2.imshow(\"Emotion Detector\", image)\n",
258
+ " cv2.waitKey(0)\n",
259
+ "\n",
260
+ "cv2.destroyAllWindows() "
261
+ ]
262
+ },
263
+ {
264
+ "cell_type": "markdown",
265
+ "metadata": {},
266
+ "source": [
267
+ "### Using Dlib's Face Detection"
268
+ ]
269
+ },
270
+ {
271
+ "cell_type": "code",
272
+ "execution_count": 7,
273
+ "metadata": {},
274
+ "outputs": [],
275
+ "source": [
276
+ "from os import listdir\n",
277
+ "from os.path import isfile, join\n",
278
+ "import os\n",
279
+ "import cv2\n",
280
+ "\n",
281
+ "# Define Image Path Here\n",
282
+ "image_path = \"./images/\"\n",
283
+ "\n",
284
+ "modhash = 'fbe63257a054c1c5466cfd7bf14646d6'\n",
285
+ "emotion_classes = {0: 'Angry', 1: 'Fear', 2: 'Happy', 3: 'Neutral', 4: 'Sad', 5: 'Surprise'}\n",
286
+ "\n",
287
+ "def draw_label(image, point, label, font=cv2.FONT_HERSHEY_SIMPLEX,\n",
288
+ " font_scale=0.8, thickness=1):\n",
289
+ " size = cv2.getTextSize(label, font, font_scale, thickness)[0]\n",
290
+ " x, y = point\n",
291
+ " cv2.rectangle(image, (x, y - size[1]), (x + size[0], y), (255, 0, 0), cv2.FILLED)\n",
292
+ " cv2.putText(image, label, point, font, font_scale, (255, 255, 255), thickness, lineType=cv2.LINE_AA)\n",
293
+ " \n",
294
+ "\n",
295
+ "# Define our model parameters\n",
296
+ "depth = 16\n",
297
+ "k = 8\n",
298
+ "weight_file = None\n",
299
+ "margin = 0.4\n",
300
+ "image_dir = None\n",
301
+ "\n",
302
+ "# Get our weight file \n",
303
+ "if not weight_file:\n",
304
+ " weight_file = get_file(\"weights.28-3.73.hdf5\", pretrained_model, cache_subdir=\"pretrained_models\",\n",
305
+ " file_hash=modhash, cache_dir=Path(sys.argv[0]).resolve().parent)\n",
306
+ "# load model and weights\n",
307
+ "img_size = 64\n",
308
+ "model = WideResNet(img_size, depth=depth, k=k)()\n",
309
+ "model.load_weights(weight_file)\n",
310
+ "\n",
311
+ "detector = dlib.get_frontal_face_detector()\n",
312
+ "\n",
313
+ "image_names = [f for f in listdir(image_path) if isfile(join(image_path, f))]\n",
314
+ "\n",
315
+ "for image_name in image_names:\n",
316
+ " frame = cv2.imread(\"./images/\" + image_name)\n",
317
+ " preprocessed_faces_emo = [] \n",
318
+ " \n",
319
+ " input_img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n",
320
+ " img_h, img_w, _ = np.shape(input_img)\n",
321
+ " detected = detector(frame, 1)\n",
322
+ " faces = np.empty((len(detected), img_size, img_size, 3))\n",
323
+ " \n",
324
+ " preprocessed_faces_emo = []\n",
325
+ " if len(detected) > 0:\n",
326
+ " for i, d in enumerate(detected):\n",
327
+ " x1, y1, x2, y2, w, h = d.left(), d.top(), d.right() + 1, d.bottom() + 1, d.width(), d.height()\n",
328
+ " xw1 = max(int(x1 - margin * w), 0)\n",
329
+ " yw1 = max(int(y1 - margin * h), 0)\n",
330
+ " xw2 = min(int(x2 + margin * w), img_w - 1)\n",
331
+ " yw2 = min(int(y2 + margin * h), img_h - 1)\n",
332
+ " cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)\n",
333
+ " # cv2.rectangle(img, (xw1, yw1), (xw2, yw2), (255, 0, 0), 2)\n",
334
+ " faces[i, :, :, :] = cv2.resize(frame[yw1:yw2 + 1, xw1:xw2 + 1, :], (img_size, img_size))\n",
335
+ " face = frame[yw1:yw2 + 1, xw1:xw2 + 1, :]\n",
336
+ " face_gray_emo = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)\n",
337
+ " face_gray_emo = cv2.resize(face_gray_emo, (48, 48), interpolation = cv2.INTER_AREA)\n",
338
+ " face_gray_emo = face_gray_emo.astype(\"float\") / 255.0\n",
339
+ " face_gray_emo = img_to_array(face_gray_emo)\n",
340
+ " face_gray_emo = np.expand_dims(face_gray_emo, axis=0)\n",
341
+ " preprocessed_faces_emo.append(face_gray_emo)\n",
342
+ "\n",
343
+ " # make a prediction for Age and Gender\n",
344
+ " results = model.predict(np.array(faces))\n",
345
+ " predicted_genders = results[0]\n",
346
+ " ages = np.arange(0, 101).reshape(101, 1)\n",
347
+ " predicted_ages = results[1].dot(ages).flatten()\n",
348
+ "\n",
349
+ " # make a prediction for Emotion \n",
350
+ " emo_labels = []\n",
351
+ " for i, d in enumerate(detected):\n",
352
+ " preds = classifier.predict(preprocessed_faces_emo[i])[0]\n",
353
+ " emo_labels.append(emotion_classes[preds.argmax()])\n",
354
+ " \n",
355
+ " # draw results\n",
356
+ " for i, d in enumerate(detected):\n",
357
+ " label = \"{}, {}, {}\".format(int(predicted_ages[i]),\n",
358
+ " \"F\" if predicted_genders[i][0] > 0.4 else \"M\", emo_labels[i])\n",
359
+ " draw_label(frame, (d.left(), d.top()), label)\n",
360
+ "\n",
361
+ " cv2.imshow(\"Emotion Detector\", frame)\n",
362
+ " cv2.waitKey(0)\n",
363
+ "\n",
364
+ "cv2.destroyAllWindows() "
365
+ ]
366
+ },
367
+ {
368
+ "cell_type": "markdown",
369
+ "metadata": {},
370
+ "source": [
371
+ "### And now using dlib's detector with our webcam"
372
+ ]
373
+ },
374
+ {
375
+ "cell_type": "code",
376
+ "execution_count": 8,
377
+ "metadata": {},
378
+ "outputs": [],
379
+ "source": [
380
+ "from os import listdir\n",
381
+ "from os.path import isfile, join\n",
382
+ "import os\n",
383
+ "import cv2\n",
384
+ "\n",
385
+ "# Define Image Path Here\n",
386
+ "image_path = \"./images/\"\n",
387
+ "\n",
388
+ "modhash = 'fbe63257a054c1c5466cfd7bf14646d6'\n",
389
+ "emotion_classes = {0: 'Angry', 1: 'Fear', 2: 'Happy', 3: 'Neutral', 4: 'Sad', 5: 'Surprise'}\n",
390
+ "\n",
391
+ "def draw_label(image, point, label, font=cv2.FONT_HERSHEY_SIMPLEX,\n",
392
+ " font_scale=0.8, thickness=1):\n",
393
+ " size = cv2.getTextSize(label, font, font_scale, thickness)[0]\n",
394
+ " x, y = point\n",
395
+ " cv2.rectangle(image, (x, y - size[1]), (x + size[0], y), (255, 0, 0), cv2.FILLED)\n",
396
+ " cv2.putText(image, label, point, font, font_scale, (255, 255, 255), thickness, lineType=cv2.LINE_AA)\n",
397
+ " \n",
398
+ "\n",
399
+ "# Define our model parameters\n",
400
+ "depth = 16\n",
401
+ "k = 8\n",
402
+ "weight_file = None\n",
403
+ "margin = 0.4\n",
404
+ "image_dir = None\n",
405
+ "\n",
406
+ "# Get our weight file \n",
407
+ "if not weight_file:\n",
408
+ " weight_file = get_file(\"weights.28-3.73.hdf5\", pretrained_model, cache_subdir=\"pretrained_models\",\n",
409
+ " file_hash=modhash, cache_dir=Path(sys.argv[0]).resolve().parent)\n",
410
+ "# load model and weights\n",
411
+ "img_size = 64\n",
412
+ "model = WideResNet(img_size, depth=depth, k=k)()\n",
413
+ "model.load_weights(weight_file)\n",
414
+ "\n",
415
+ "detector = dlib.get_frontal_face_detector()\n",
416
+ "\n",
417
+ "# Initialize Webcam\n",
418
+ "cap = cv2.VideoCapture(0)\n",
419
+ "\n",
420
+ "while True:\n",
421
+ " ret, frame = cap.read()\n",
422
+ " preprocessed_faces_emo = [] \n",
423
+ " \n",
424
+ " input_img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n",
425
+ " img_h, img_w, _ = np.shape(input_img)\n",
426
+ " detected = detector(frame, 1)\n",
427
+ " faces = np.empty((len(detected), img_size, img_size, 3))\n",
428
+ " \n",
429
+ " preprocessed_faces_emo = []\n",
430
+ " if len(detected) > 0:\n",
431
+ " for i, d in enumerate(detected):\n",
432
+ " x1, y1, x2, y2, w, h = d.left(), d.top(), d.right() + 1, d.bottom() + 1, d.width(), d.height()\n",
433
+ " xw1 = max(int(x1 - margin * w), 0)\n",
434
+ " yw1 = max(int(y1 - margin * h), 0)\n",
435
+ " xw2 = min(int(x2 + margin * w), img_w - 1)\n",
436
+ " yw2 = min(int(y2 + margin * h), img_h - 1)\n",
437
+ " cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)\n",
438
+ " # cv2.rectangle(img, (xw1, yw1), (xw2, yw2), (255, 0, 0), 2)\n",
439
+ " faces[i, :, :, :] = cv2.resize(frame[yw1:yw2 + 1, xw1:xw2 + 1, :], (img_size, img_size))\n",
440
+ " face = frame[yw1:yw2 + 1, xw1:xw2 + 1, :]\n",
441
+ " face_gray_emo = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)\n",
442
+ " face_gray_emo = cv2.resize(face_gray_emo, (48, 48), interpolation = cv2.INTER_AREA)\n",
443
+ " face_gray_emo = face_gray_emo.astype(\"float\") / 255.0\n",
444
+ " face_gray_emo = img_to_array(face_gray_emo)\n",
445
+ " face_gray_emo = np.expand_dims(face_gray_emo, axis=0)\n",
446
+ " preprocessed_faces_emo.append(face_gray_emo)\n",
447
+ "\n",
448
+ " # make a prediction for Age and Gender\n",
449
+ " results = model.predict(np.array(faces))\n",
450
+ " predicted_genders = results[0]\n",
451
+ " ages = np.arange(0, 101).reshape(101, 1)\n",
452
+ " predicted_ages = results[1].dot(ages).flatten()\n",
453
+ "\n",
454
+ " # make a prediction for Emotion \n",
455
+ " emo_labels = []\n",
456
+ " for i, d in enumerate(detected):\n",
457
+ " preds = classifier.predict(preprocessed_faces_emo[i])[0]\n",
458
+ " emo_labels.append(emotion_classes[preds.argmax()])\n",
459
+ " \n",
460
+ " # draw results\n",
461
+ " for i, d in enumerate(detected):\n",
462
+ " label = \"{}, {}, {}\".format(int(predicted_ages[i]),\n",
463
+ " \"F\" if predicted_genders[i][0] > 0.4 else \"M\", emo_labels[i])\n",
464
+ " draw_label(frame, (d.left(), d.top()), label)\n",
465
+ "\n",
466
+ " cv2.imshow(\"Emotion Detector\", frame)\n",
467
+ " if cv2.waitKey(1) == 13: #13 is the Enter Key\n",
468
+ " break\n",
469
+ "\n",
470
+ "cap.release()\n",
471
+ "cv2.destroyAllWindows() "
472
+ ]
473
+ },
474
+ {
475
+ "cell_type": "code",
476
+ "execution_count": null,
477
+ "metadata": {},
478
+ "outputs": [],
479
+ "source": [
480
+ "\n"
481
+ ]
482
+ },
483
+ {
484
+ "cell_type": "code",
485
+ "execution_count": null,
486
+ "metadata": {},
487
+ "outputs": [],
488
+ "source": []
489
+ },
490
+ {
491
+ "cell_type": "code",
492
+ "execution_count": null,
493
+ "metadata": {},
494
+ "outputs": [],
495
+ "source": []
496
+ },
497
+ {
498
+ "cell_type": "code",
499
+ "execution_count": null,
500
+ "metadata": {},
501
+ "outputs": [],
502
+ "source": []
503
+ }
504
+ ],
505
+ "metadata": {
506
+ "kernelspec": {
507
+ "display_name": "Python 3",
508
+ "language": "python",
509
+ "name": "python3"
510
+ },
511
+ "language_info": {
512
+ "codemirror_mode": {
513
+ "name": "ipython",
514
+ "version": 3
515
+ },
516
+ "file_extension": ".py",
517
+ "mimetype": "text/x-python",
518
+ "name": "python",
519
+ "nbconvert_exporter": "python",
520
+ "pygments_lexer": "ipython3",
521
+ "version": "3.7.4"
522
+ }
523
+ },
524
+ "nbformat": 4,
525
+ "nbformat_minor": 2
526
+ }
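The dlib cells in the notebook above grow each detected box by `margin` on every side before cropping, then clamp the result to the image bounds so the crop never leaves the frame. A standalone sketch of that expansion (the `expand_box` helper is a hypothetical extraction of the in-loop arithmetic, not part of the notebook):

```python
# Grow a face box (x1, y1, x2, y2) by `margin` of its width/height on each
# side, clamped to an img_w x img_h image -- mirrors the xw1/yw1/xw2/yw2
# computation in the dlib detection loop.
def expand_box(x1, y1, x2, y2, img_w, img_h, margin=0.4):
    w, h = x2 - x1, y2 - y1
    xw1 = max(int(x1 - margin * w), 0)
    yw1 = max(int(y1 - margin * h), 0)
    xw2 = min(int(x2 + margin * w), img_w - 1)
    yw2 = min(int(y2 + margin * h), img_h - 1)
    return xw1, yw1, xw2, yw2

# A 100x100 box well inside a 640x480 frame grows by 40px per side
print(expand_box(100, 100, 200, 200, 640, 480))  # -> (60, 60, 240, 240)
# A box touching the top-left corner is clamped at 0
print(expand_box(0, 0, 100, 100, 640, 480))      # -> (0, 0, 140, 140)
```

The extra context around the face matters here because the age/gender model was trained on crops that include hair and chin, not tight face boxes.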
18 . Deep Survaliance - Build a Face Detector with Emotion, Age and Gender Recognition/Face Detection - Friends Characters.ipynb ADDED
@@ -0,0 +1,526 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# Basic Deep Learning Face Recognition\n",
8
+ "## Building a Friends TV Show Character Identifier"
9
+ ]
10
+ },
11
+ {
12
+ "cell_type": "markdown",
13
+ "metadata": {},
14
+ "source": [
15
+ "### Let's train our model\n",
16
+ "I've created a dataset with the faces of 4 Friends characters taken from a handful of different scenes."
17
+ ]
18
+ },
19
+ {
20
+ "cell_type": "code",
21
+ "execution_count": 33,
22
+ "metadata": {},
23
+ "outputs": [
24
+ {
25
+ "name": "stdout",
26
+ "output_type": "stream",
27
+ "text": [
28
+ "Found 2663 images belonging to 4 classes.\n",
29
+ "Found 955 images belonging to 4 classes.\n"
30
+ ]
31
+ }
32
+ ],
33
+ "source": [
34
+ "from __future__ import print_function\n",
35
+ "import keras\n",
36
+ "from keras.preprocessing.image import ImageDataGenerator\n",
37
+ "from keras.models import Sequential\n",
38
+ "from keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization\n",
39
+ "from keras.layers import Conv2D, MaxPooling2D\n",
40
41
+ "import os\n",
42
+ "\n",
43
+ "num_classes = 4\n",
44
+ "img_rows, img_cols = 48, 48\n",
45
+ "batch_size = 16\n",
46
+ "\n",
47
+ "train_data_dir = './faces/train'\n",
48
+ "validation_data_dir = './faces/validation'\n",
49
+ "\n",
50
+ "# Let's use some data augmentation\n",
51
+ "train_datagen = ImageDataGenerator(\n",
52
+ " rescale=1./255,\n",
53
+ " rotation_range=30,\n",
54
+ " shear_range=0.3,\n",
55
+ " zoom_range=0.3,\n",
56
+ " width_shift_range=0.4,\n",
57
+ " height_shift_range=0.4,\n",
58
+ " horizontal_flip=True,\n",
59
+ " fill_mode='nearest')\n",
60
+ " \n",
61
+ "validation_datagen = ImageDataGenerator(rescale=1./255)\n",
62
+ " \n",
63
+ "train_generator = train_datagen.flow_from_directory(\n",
64
+ " train_data_dir,\n",
65
+ " target_size=(img_rows, img_cols),\n",
66
+ " batch_size=batch_size,\n",
67
+ " class_mode='categorical',\n",
68
+ " shuffle=True)\n",
69
+ " \n",
70
+ "validation_generator = validation_datagen.flow_from_directory(\n",
71
+ " validation_data_dir,\n",
72
+ " target_size=(img_rows, img_cols),\n",
73
+ " batch_size=batch_size,\n",
74
+ " class_mode='categorical',\n",
75
+ " shuffle=True)"
76
+ ]
77
+ },
78
+ {
79
+ "cell_type": "code",
80
+ "execution_count": 37,
81
+ "metadata": {},
82
+ "outputs": [],
83
+ "source": [
84
+ "#Our Keras imports\n",
85
+ "from keras.models import Sequential\n",
86
+ "from keras.layers.normalization import BatchNormalization\n",
87
+ "from keras.layers.convolutional import Conv2D, MaxPooling2D\n",
88
+ "from keras.layers.advanced_activations import ELU\n",
89
+ "from keras.layers.core import Activation, Flatten, Dropout, Dense"
90
+ ]
91
+ },
92
+ {
93
+ "cell_type": "markdown",
94
+ "metadata": {},
95
+ "source": [
96
+ "### Creating a simple VGG based model for Face Recognition"
97
+ ]
98
+ },
99
+ {
100
+ "cell_type": "code",
101
+ "execution_count": 35,
102
+ "metadata": {},
103
+ "outputs": [
104
+ {
105
+ "name": "stdout",
106
+ "output_type": "stream",
107
+ "text": [
108
+ "_________________________________________________________________\n",
109
+ "Layer (type) Output Shape Param # \n",
110
+ "=================================================================\n",
111
+ "conv2d_25 (Conv2D) (None, 48, 48, 32) 896 \n",
112
+ "_________________________________________________________________\n",
113
+ "activation_34 (Activation) (None, 48, 48, 32) 0 \n",
114
+ "_________________________________________________________________\n",
115
+ "batch_normalization_31 (Batc (None, 48, 48, 32) 128 \n",
116
+ "_________________________________________________________________\n",
117
+ "conv2d_26 (Conv2D) (None, 48, 48, 32) 9248 \n",
118
+ "_________________________________________________________________\n",
119
+ "activation_35 (Activation) (None, 48, 48, 32) 0 \n",
120
+ "_________________________________________________________________\n",
121
+ "batch_normalization_32 (Batc (None, 48, 48, 32) 128 \n",
122
+ "_________________________________________________________________\n",
123
+ "max_pooling2d_13 (MaxPooling (None, 24, 24, 32) 0 \n",
124
+ "_________________________________________________________________\n",
125
+ "dropout_19 (Dropout) (None, 24, 24, 32) 0 \n",
126
+ "_________________________________________________________________\n",
127
+ "conv2d_27 (Conv2D) (None, 24, 24, 64) 18496 \n",
128
+ "_________________________________________________________________\n",
129
+ "activation_36 (Activation) (None, 24, 24, 64) 0 \n",
130
+ "_________________________________________________________________\n",
131
+ "batch_normalization_33 (Batc (None, 24, 24, 64) 256 \n",
132
+ "_________________________________________________________________\n",
133
+ "conv2d_28 (Conv2D) (None, 24, 24, 64) 36928 \n",
134
+ "_________________________________________________________________\n",
135
+ "activation_37 (Activation) (None, 24, 24, 64) 0 \n",
136
+ "_________________________________________________________________\n",
137
+ "batch_normalization_34 (Batc (None, 24, 24, 64) 256 \n",
138
+ "_________________________________________________________________\n",
139
+ "max_pooling2d_14 (MaxPooling (None, 12, 12, 64) 0 \n",
140
+ "_________________________________________________________________\n",
141
+ "dropout_20 (Dropout) (None, 12, 12, 64) 0 \n",
142
+ "_________________________________________________________________\n",
143
+ "conv2d_29 (Conv2D) (None, 12, 12, 128) 73856 \n",
144
+ "_________________________________________________________________\n",
145
+ "activation_38 (Activation) (None, 12, 12, 128) 0 \n",
146
+ "_________________________________________________________________\n",
147
+ "batch_normalization_35 (Batc (None, 12, 12, 128) 512 \n",
148
+ "_________________________________________________________________\n",
149
+ "conv2d_30 (Conv2D) (None, 12, 12, 128) 147584 \n",
150
+ "_________________________________________________________________\n",
151
+ "activation_39 (Activation) (None, 12, 12, 128) 0 \n",
152
+ "_________________________________________________________________\n",
153
+ "batch_normalization_36 (Batc (None, 12, 12, 128) 512 \n",
154
+ "_________________________________________________________________\n",
155
+ "max_pooling2d_15 (MaxPooling (None, 6, 6, 128) 0 \n",
156
+ "_________________________________________________________________\n",
157
+ "dropout_21 (Dropout) (None, 6, 6, 128) 0 \n",
158
+ "_________________________________________________________________\n",
159
+ "conv2d_31 (Conv2D) (None, 6, 6, 256) 295168 \n",
160
+ "_________________________________________________________________\n",
161
+ "activation_40 (Activation) (None, 6, 6, 256) 0 \n",
162
+ "_________________________________________________________________\n",
163
+ "batch_normalization_37 (Batc (None, 6, 6, 256) 1024 \n",
164
+ "_________________________________________________________________\n",
165
+ "conv2d_32 (Conv2D) (None, 6, 6, 256) 590080 \n",
166
+ "_________________________________________________________________\n",
167
+ "activation_41 (Activation) (None, 6, 6, 256) 0 \n",
168
+ "_________________________________________________________________\n",
169
+ "batch_normalization_38 (Batc (None, 6, 6, 256) 1024 \n",
170
+ "_________________________________________________________________\n",
171
+ "max_pooling2d_16 (MaxPooling (None, 3, 3, 256) 0 \n",
172
+ "_________________________________________________________________\n",
173
+ "dropout_22 (Dropout) (None, 3, 3, 256) 0 \n",
174
+ "_________________________________________________________________\n",
175
+ "flatten_4 (Flatten) (None, 2304) 0 \n",
176
+ "_________________________________________________________________\n",
177
+ "dense_10 (Dense) (None, 64) 147520 \n",
178
+ "_________________________________________________________________\n",
179
+ "activation_42 (Activation) (None, 64) 0 \n",
180
+ "_________________________________________________________________\n",
181
+ "batch_normalization_39 (Batc (None, 64) 256 \n",
182
+ "_________________________________________________________________\n",
183
+ "dropout_23 (Dropout) (None, 64) 0 \n",
184
+ "_________________________________________________________________\n",
185
+ "dense_11 (Dense) (None, 64) 4160 \n",
186
+ "_________________________________________________________________\n",
187
+ "activation_43 (Activation) (None, 64) 0 \n",
188
+ "_________________________________________________________________\n",
189
+ "batch_normalization_40 (Batc (None, 64) 256 \n",
190
+ "_________________________________________________________________\n",
191
+ "dropout_24 (Dropout) (None, 64) 0 \n",
192
+ "_________________________________________________________________\n",
193
+ "dense_12 (Dense) (None, 4) 260 \n",
194
+ "_________________________________________________________________\n",
195
+ "activation_44 (Activation) (None, 4) 0 \n",
196
+ "=================================================================\n",
197
+ "Total params: 1,328,548\n",
198
+ "Trainable params: 1,326,372\n",
199
+ "Non-trainable params: 2,176\n",
200
+ "_________________________________________________________________\n",
201
+ "None\n"
202
+ ]
203
+ }
204
+ ],
205
+ "source": [
206
+ "model = Sequential()\n",
207
+ "\n",
208
+ "model.add(Conv2D(32, (3, 3), padding = 'same', kernel_initializer=\"he_normal\",\n",
209
+ " input_shape = (img_rows, img_cols, 3)))\n",
210
+ "model.add(Activation('elu'))\n",
211
+ "model.add(BatchNormalization())\n",
212
+ "model.add(Conv2D(32, (3, 3), padding = \"same\", kernel_initializer=\"he_normal\", \n",
213
+ " input_shape = (img_rows, img_cols, 3)))\n",
214
+ "model.add(Activation('elu'))\n",
215
+ "model.add(BatchNormalization())\n",
216
+ "model.add(MaxPooling2D(pool_size=(2, 2)))\n",
217
+ "model.add(Dropout(0.2))\n",
218
+ "\n",
219
+ "# Block #2: second CONV => RELU => CONV => RELU => POOL\n",
220
+ "# layer set\n",
221
+ "model.add(Conv2D(64, (3, 3), padding=\"same\", kernel_initializer=\"he_normal\"))\n",
222
+ "model.add(Activation('elu'))\n",
223
+ "model.add(BatchNormalization())\n",
224
+ "model.add(Conv2D(64, (3, 3), padding=\"same\", kernel_initializer=\"he_normal\"))\n",
225
+ "model.add(Activation('elu'))\n",
226
+ "model.add(BatchNormalization())\n",
227
+ "model.add(MaxPooling2D(pool_size=(2, 2)))\n",
228
+ "model.add(Dropout(0.2))\n",
229
+ "\n",
230
+ "# Block #3: third CONV => RELU => CONV => RELU => POOL\n",
231
+ "# layer set\n",
232
+ "model.add(Conv2D(128, (3, 3), padding=\"same\", kernel_initializer=\"he_normal\"))\n",
233
+ "model.add(Activation('elu'))\n",
234
+ "model.add(BatchNormalization())\n",
235
+ "model.add(Conv2D(128, (3, 3), padding=\"same\", kernel_initializer=\"he_normal\"))\n",
236
+ "model.add(Activation('elu'))\n",
237
+ "model.add(BatchNormalization())\n",
238
+ "model.add(MaxPooling2D(pool_size=(2, 2)))\n",
239
+ "model.add(Dropout(0.2))\n",
240
+ "\n",
241
+ "# Block #4: third CONV => RELU => CONV => RELU => POOL\n",
242
+ "# layer set\n",
243
+ "model.add(Conv2D(256, (3, 3), padding=\"same\", kernel_initializer=\"he_normal\"))\n",
244
+ "model.add(Activation('elu'))\n",
245
+ "model.add(BatchNormalization())\n",
246
+ "model.add(Conv2D(256, (3, 3), padding=\"same\", kernel_initializer=\"he_normal\"))\n",
247
+ "model.add(Activation('elu'))\n",
248
+ "model.add(BatchNormalization())\n",
249
+ "model.add(MaxPooling2D(pool_size=(2, 2)))\n",
250
+ "model.add(Dropout(0.2))\n",
251
+ "\n",
252
+ "# Block #5: first set of FC => RELU layers\n",
253
+ "model.add(Flatten())\n",
254
+ "model.add(Dense(64, kernel_initializer=\"he_normal\"))\n",
255
+ "model.add(Activation('elu'))\n",
256
+ "model.add(BatchNormalization())\n",
257
+ "model.add(Dropout(0.5))\n",
258
+ "\n",
259
+ "# Block #6: second set of FC => RELU layers\n",
260
+ "model.add(Dense(64, kernel_initializer=\"he_normal\"))\n",
261
+ "model.add(Activation('elu'))\n",
262
+ "model.add(BatchNormalization())\n",
263
+ "model.add(Dropout(0.5))\n",
264
+ "\n",
265
+ "# Block #7: softmax classifier\n",
266
+ "model.add(Dense(num_classes, kernel_initializer=\"he_normal\"))\n",
267
+ "model.add(Activation(\"softmax\"))\n",
268
+ "\n",
269
+ "print(model.summary())"
270
+ ]
271
+ },
272
+ {
273
+ "cell_type": "markdown",
274
+ "metadata": {},
275
+ "source": [
276
+ "### Training our Model"
277
+ ]
278
+ },
279
+ {
280
+ "cell_type": "code",
281
+ "execution_count": 36,
282
+ "metadata": {},
283
+ "outputs": [
284
+ {
285
+ "name": "stdout",
286
+ "output_type": "stream",
287
+ "text": [
288
+ "Epoch 1/10\n",
289
+ "166/166 [==============================] - 76s 457ms/step - loss: 1.1153 - acc: 0.5700 - val_loss: 1.4428 - val_acc: 0.4841\n",
290
+ "\n",
291
+ "Epoch 00001: val_loss improved from inf to 1.44279, saving model to /home/deeplearningcv/DeepLearningCV/Trained Models/face_recognition_friends_vgg.h5\n",
292
+ "Epoch 2/10\n",
293
+ "166/166 [==============================] - 67s 403ms/step - loss: 0.7034 - acc: 0.7343 - val_loss: 3.7705 - val_acc: 0.2705\n",
294
+ "\n",
295
+ "Epoch 00002: val_loss did not improve from 1.44279\n",
296
+ "Epoch 3/10\n",
297
+ "166/166 [==============================] - 62s 373ms/step - loss: 0.6037 - acc: 0.7690 - val_loss: 0.9403 - val_acc: 0.6912\n",
298
+ "\n",
299
+ "Epoch 00003: val_loss improved from 1.44279 to 0.94025, saving model to /home/deeplearningcv/DeepLearningCV/Trained Models/face_recognition_friends_vgg.h5\n",
300
+ "Epoch 4/10\n",
301
+ "166/166 [==============================] - 62s 373ms/step - loss: 0.5432 - acc: 0.7988 - val_loss: 1.3018 - val_acc: 0.5548\n",
302
+ "\n",
303
+ "Epoch 00004: val_loss did not improve from 0.94025\n",
304
+ "Epoch 5/10\n",
305
+ "166/166 [==============================] - 69s 414ms/step - loss: 0.4715 - acc: 0.8301 - val_loss: 3.8879 - val_acc: 0.1534\n",
306
+ "\n",
307
+ "Epoch 00005: val_loss did not improve from 0.94025\n",
308
+ "Epoch 6/10\n",
309
+ "166/166 [==============================] - 77s 467ms/step - loss: 0.4233 - acc: 0.8524 - val_loss: 0.6878 - val_acc: 0.7093\n",
310
+ "\n",
311
+ "Epoch 00006: val_loss improved from 0.94025 to 0.68784, saving model to /home/deeplearningcv/DeepLearningCV/Trained Models/face_recognition_friends_vgg.h5\n",
312
+ "Epoch 7/10\n",
313
+ "166/166 [==============================] - 71s 429ms/step - loss: 0.4130 - acc: 0.8636 - val_loss: 3.3402 - val_acc: 0.2971\n",
314
+ "\n",
315
+ "Epoch 00007: val_loss did not improve from 0.68784\n",
316
+ "Epoch 8/10\n",
317
+ "166/166 [==============================] - 79s 477ms/step - loss: 0.3821 - acc: 0.8748 - val_loss: 2.6729 - val_acc: 0.6283\n",
318
+ "\n",
319
+ "Epoch 00008: val_loss did not improve from 0.68784\n",
320
+ "Epoch 9/10\n",
321
+ "166/166 [==============================] - 86s 519ms/step - loss: 0.3622 - acc: 0.8709 - val_loss: 1.5067 - val_acc: 0.5197\n",
322
+ "Restoring model weights from the end of the best epoch\n",
323
+ "\n",
324
+ "Epoch 00009: val_loss did not improve from 0.68784\n",
325
+ "\n",
326
+ "Epoch 00009: ReduceLROnPlateau reducing learning rate to 0.0019999999552965165.\n",
327
+ "Epoch 00009: early stopping\n"
328
+ ]
329
+ }
330
+ ],
331
+ "source": [
332
+ "from keras.optimizers import RMSprop, SGD, Adam\n",
333
+ "from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau\n",
334
+ "\n",
335
+ " \n",
336
+ "checkpoint = ModelCheckpoint(\"/home/deeplearningcv/DeepLearningCV/Trained Models/face_recognition_friends_vgg.h5\",\n",
337
+ " monitor=\"val_loss\",\n",
338
+ " mode=\"min\",\n",
339
+ " save_best_only = True,\n",
340
+ " verbose=1)\n",
341
+ "\n",
342
+ "earlystop = EarlyStopping(monitor = 'val_loss', \n",
343
+ " min_delta = 0, \n",
344
+ " patience = 3,\n",
345
+ " verbose = 1,\n",
346
+ " restore_best_weights = True)\n",
347
+ "\n",
348
+ "reduce_lr = ReduceLROnPlateau(monitor = 'val_loss', factor = 0.2, patience = 3, verbose = 1, min_delta = 0.0001)\n",
349
+ "\n",
350
+ "# We put our callbacks into a callback list\n",
351
+ "callbacks = [earlystop, checkpoint, reduce_lr]\n",
352
+ "\n",
353
+ "# Compile the model (note: lr=0.01 is a fairly high learning rate for Adam)\n",
354
+ "model.compile(loss = 'categorical_crossentropy',\n",
355
+ " optimizer = Adam(lr=0.01),\n",
356
+ " metrics = ['accuracy'])\n",
357
+ "\n",
358
+ "nb_train_samples = 2663\n",
359
+ "nb_validation_samples = 955\n",
360
+ "epochs = 10\n",
361
+ "\n",
362
+ "history = model.fit_generator(\n",
363
+ " train_generator,\n",
364
+ " steps_per_epoch = nb_train_samples // batch_size,\n",
365
+ " epochs = epochs,\n",
366
+ " callbacks = callbacks,\n",
367
+ " validation_data = validation_generator,\n",
368
+ " validation_steps = nb_validation_samples // batch_size)"
369
+ ]
370
+ },
371
+ {
372
+ "cell_type": "markdown",
373
+ "metadata": {},
374
+ "source": [
375
+ "#### Getting our Class Labels"
376
+ ]
377
+ },
378
+ {
379
+ "cell_type": "code",
380
+ "execution_count": 39,
381
+ "metadata": {},
382
+ "outputs": [
383
+ {
384
+ "data": {
385
+ "text/plain": [
386
+ "{0: 'Chandler', 1: 'Joey', 2: 'Pheobe', 3: 'Rachel'}"
387
+ ]
388
+ },
389
+ "execution_count": 39,
390
+ "metadata": {},
391
+ "output_type": "execute_result"
392
+ }
393
+ ],
394
+ "source": [
395
+ "class_labels = validation_generator.class_indices\n",
396
+ "class_labels = {v: k for k, v in class_labels.items()}\n",
397
+ "classes = list(class_labels.values())\n",
398
+ "class_labels"
399
+ ]
400
+ },
401
+ {
402
+ "cell_type": "code",
403
+ "execution_count": null,
404
+ "metadata": {},
405
+ "outputs": [],
406
+ "source": [
407
+ "# Load our model\n",
408
+ "from keras.models import load_model\n",
409
+ "\n",
410
+ "classifier = load_model('/home/deeplearningcv/DeepLearningCV/Trained Models/face_recognition_friends_vgg.h5')"
411
+ ]
412
+ },
413
+ {
414
+ "cell_type": "markdown",
415
+ "metadata": {},
416
+ "source": [
417
+ "### Testing our model on some real video"
418
+ ]
419
+ },
420
+ {
421
+ "cell_type": "code",
422
+ "execution_count": 43,
423
+ "metadata": {},
424
+ "outputs": [],
425
+ "source": [
426
+ "from os import listdir\n",
427
+ "from os.path import isfile, join\n",
428
+ "import os\n",
429
+ "import cv2\n",
+ "import dlib\n",
430
+ "import numpy as np\n",
431
+ "\n",
432
+ "\n",
433
+ "face_classes = {0: 'Chandler', 1: 'Joey', 2: 'Pheobe', 3: 'Rachel'}\n",
434
+ "\n",
435
+ "def draw_label(image, point, label, font=cv2.FONT_HERSHEY_SIMPLEX,\n",
436
+ " font_scale=0.8, thickness=1):\n",
437
+ " size = cv2.getTextSize(label, font, font_scale, thickness)[0]\n",
438
+ " x, y = point\n",
439
+ " cv2.rectangle(image, (x, y - size[1]), (x + size[0], y), (255, 0, 0), cv2.FILLED)\n",
440
+ " cv2.putText(image, label, point, font, font_scale, (255, 255, 255), thickness, lineType=cv2.LINE_AA)\n",
441
+ " \n",
442
+ "margin = 0.2\n",
443
+ "# load model and weights\n",
444
+ "img_size = 64\n",
445
+ "\n",
446
+ "detector = dlib.get_frontal_face_detector()\n",
447
+ "\n",
448
+ "cap = cv2.VideoCapture('testfriends.mp4')\n",
449
+ "\n",
450
+ "while True:\n",
451
+ " ret, frame = cap.read()\n",
452
+ " frame = cv2.resize(frame, None, fx=0.5, fy=0.5, interpolation = cv2.INTER_LINEAR)\n",
453
+ " preprocessed_faces = [] \n",
454
+ " \n",
455
+ " input_img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n",
456
+ " img_h, img_w, _ = np.shape(input_img)\n",
457
+ " detected = detector(frame, 1)\n",
458
+ " faces = np.empty((len(detected), img_size, img_size, 3))\n",
459
+ " \n",
460
+ " preprocessed_faces_emo = []\n",
461
+ " if len(detected) > 0:\n",
462
+ " for i, d in enumerate(detected):\n",
463
+ " x1, y1, x2, y2, w, h = d.left(), d.top(), d.right() + 1, d.bottom() + 1, d.width(), d.height()\n",
464
+ " xw1 = max(int(x1 - margin * w), 0)\n",
465
+ " yw1 = max(int(y1 - margin * h), 0)\n",
466
+ " xw2 = min(int(x2 + margin * w), img_w - 1)\n",
467
+ " yw2 = min(int(y2 + margin * h), img_h - 1)\n",
468
+ " cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)\n",
469
+ " # cv2.rectangle(img, (xw1, yw1), (xw2, yw2), (255, 0, 0), 2)\n",
470
+ " #faces[i, :, :, :] = cv2.resize(frame[yw1:yw2 + 1, xw1:xw2 + 1, :], (img_size, img_size))\n",
471
+ " face = frame[yw1:yw2 + 1, xw1:xw2 + 1, :]\n",
472
+ " face = cv2.resize(face, (48, 48), interpolation = cv2.INTER_AREA)\n",
473
+ " face = face.astype(\"float\") / 255.0\n",
474
+ " face = img_to_array(face)\n",
475
+ " face = np.expand_dims(face, axis=0)\n",
476
+ " preprocessed_faces.append(face)\n",
477
+ "\n",
478
+ " # make a prediction for Emotion \n",
479
+ " face_labels = []\n",
480
+ " for i, d in enumerate(detected):\n",
481
+ " preds = classifier.predict(preprocessed_faces[i])[0]\n",
482
+ " face_labels.append(face_classes[preds.argmax()])\n",
483
+ " \n",
484
+ " # draw results\n",
485
+ " for i, d in enumerate(detected):\n",
486
+ " label = \"{}\".format(face_labels[i])\n",
487
+ " draw_label(frame, (d.left(), d.top()), label)\n",
488
+ "\n",
489
+ " cv2.imshow(\"Friend Character Identifier\", frame)\n",
490
+ " if cv2.waitKey(1) == 13: #13 is the Enter Key\n",
491
+ " break\n",
492
+ "\n",
493
+ "cap.release()\n",
494
+ "cv2.destroyAllWindows() "
495
+ ]
496
+ },
497
+ {
498
+ "cell_type": "code",
499
+ "execution_count": null,
500
+ "metadata": {},
501
+ "outputs": [],
502
+ "source": []
503
+ }
504
+ ],
505
+ "metadata": {
506
+ "kernelspec": {
507
+ "display_name": "Python 3",
508
+ "language": "python",
509
+ "name": "python3"
510
+ },
511
+ "language_info": {
512
+ "codemirror_mode": {
513
+ "name": "ipython",
514
+ "version": 3
515
+ },
516
+ "file_extension": ".py",
517
+ "mimetype": "text/x-python",
518
+ "name": "python",
519
+ "nbconvert_exporter": "python",
520
+ "pygments_lexer": "ipython3",
521
+ "version": "3.6.6"
522
+ }
523
+ },
524
+ "nbformat": 4,
525
+ "nbformat_minor": 2
526
+ }
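The class-label lookup in the notebook above inverts `validation_generator.class_indices` so an argmax over the softmax output maps back to a character name. A minimal sketch of that step, with a hypothetical `class_indices` dict standing in for the Keras generator attribute:

```python
# Hypothetical mapping, in the shape Keras generators produce (name -> index)
class_indices = {'Chandler': 0, 'Joey': 1, 'Pheobe': 2, 'Rachel': 3}

# Invert to index -> name so a predicted class index maps to a label
class_labels = {v: k for k, v in class_indices.items()}
classes = list(class_labels.values())
```

With this in place, `class_labels[preds.argmax()]` in the video loop resolves a prediction vector to a name.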
18 . Deep Survaliance - Build a Face Detector with Emotion, Age and Gender Recognition/Face Extraction from Video.ipynb ADDED
@@ -0,0 +1,93 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "### Extracting the faces from a video"
8
+ ]
9
+ },
10
+ {
11
+ "cell_type": "code",
12
+ "execution_count": null,
13
+ "metadata": {},
14
+ "outputs": [],
15
+ "source": [
16
+ "from os import listdir\n",
17
+ "from os.path import isfile, join\n",
18
+ "import os\n",
19
+ "import cv2\n",
20
+ "import dlib\n",
21
+ "import numpy as np\n",
22
+ "\n",
23
+ "# Define Image Path Here\n",
24
+ "image_path = \"./images/\"\n",
25
+ "\n",
26
+ "def draw_label(image, point, label, font=cv2.FONT_HERSHEY_SIMPLEX,\n",
27
+ " font_scale=0.8, thickness=1):\n",
28
+ " size = cv2.getTextSize(label, font, font_scale, thickness)[0]\n",
29
+ " x, y = point\n",
30
+ " cv2.rectangle(image, (x, y - size[1]), (x + size[0], y), (255, 0, 0), cv2.FILLED)\n",
31
+ " cv2.putText(image, label, point, font, font_scale, (255, 255, 255), thickness, lineType=cv2.LINE_AA)\n",
32
+ " \n",
33
+ "detector = dlib.get_frontal_face_detector()\n",
34
+ "\n",
35
+ "# Initialize Webcam\n",
36
+ "cap = cv2.VideoCapture('testfriends.mp4')\n",
37
+ "img_size = 64\n",
38
+ "margin = 0.2\n",
39
+ "frame_count = 0\n",
40
+ "\n",
41
+ "while True:\n",
42
+ " ret, frame = cap.read()\n",
43
+ " frame_count += 1\n",
44
+ " print(frame_count) \n",
45
+ " \n",
46
+ " input_img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n",
47
+ " img_h, img_w, _ = np.shape(input_img)\n",
48
+ " detected = detector(frame, 1)\n",
49
+ " faces = []\n",
50
+ " \n",
51
+ " if len(detected) > 0:\n",
52
+ " for i, d in enumerate(detected):\n",
53
+ " x1, y1, x2, y2, w, h = d.left(), d.top(), d.right() + 1, d.bottom() + 1, d.width(), d.height()\n",
54
+ " xw1 = max(int(x1 - margin * w), 0)\n",
55
+ " yw1 = max(int(y1 - margin * h), 0)\n",
56
+ " xw2 = min(int(x2 + margin * w), img_w - 1)\n",
57
+ " yw2 = min(int(y2 + margin * h), img_h - 1)\n",
58
+ " face = frame[yw1:yw2 + 1, xw1:xw2 + 1, :]\n",
59
+ " file_name = \"./faces/\"+str(frame_count)+\"_\"+str(i)+\".jpg\"\n",
60
+ " cv2.imwrite(file_name, face)\n",
61
+ " cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)\n",
62
+ "\n",
63
+ " cv2.imshow(\"Face Detector\", frame)\n",
64
+ " if cv2.waitKey(1) == 13: #13 is the Enter Key\n",
65
+ " break\n",
66
+ "\n",
67
+ "cap.release()\n",
68
+ "cv2.destroyAllWindows() "
69
+ ]
70
+ }
71
+ ],
72
+ "metadata": {
73
+ "kernelspec": {
74
+ "display_name": "Python 3",
75
+ "language": "python",
76
+ "name": "python3"
77
+ },
78
+ "language_info": {
79
+ "codemirror_mode": {
80
+ "name": "ipython",
81
+ "version": 3
82
+ },
83
+ "file_extension": ".py",
84
+ "mimetype": "text/x-python",
85
+ "name": "python",
86
+ "nbconvert_exporter": "python",
87
+ "pygments_lexer": "ipython3",
88
+ "version": "3.6.6"
89
+ }
90
+ },
91
+ "nbformat": 4,
92
+ "nbformat_minor": 2
93
+ }
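The face-extraction loop above expands each dlib detection by a 20% margin before cropping, clipping to the image bounds. That arithmetic can be isolated as a small helper (the coordinates in the example are illustrative):

```python
def expand_box(x1, y1, x2, y2, img_w, img_h, margin=0.2):
    """Expand a face box by `margin` of its width/height,
    clipped to the image bounds (same arithmetic as the loop above)."""
    w, h = x2 - x1, y2 - y1
    xw1 = max(int(x1 - margin * w), 0)
    yw1 = max(int(y1 - margin * h), 0)
    xw2 = min(int(x2 + margin * w), img_w - 1)
    yw2 = min(int(y2 + margin * h), img_h - 1)
    return xw1, yw1, xw2, yw2

# A 100x100 box well inside a 640x480 frame grows by 20 px on each side
print(expand_box(100, 100, 200, 200, 640, 480))  # (80, 80, 220, 220)
```

Cropping with the expanded box gives the classifier some context around the face, which tends to help when detections are tight.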
Gender Recognition/rajeev.jpg RENAMED
File without changes
18 . Deep Survaliance - Build a Face Detector with Emotion, Age and Gender Recognition/wide_resnet.py ADDED
@@ -0,0 +1,152 @@
1
+ # This code is imported from the following project: https://github.com/asmith26/wide_resnets_keras
2
+
3
+ import logging
4
+ import sys
5
+ import numpy as np
6
+ from tensorflow.keras.models import Model
7
+ from tensorflow.keras.layers import Input, Activation, add, Dense, Flatten, Dropout
8
+ from tensorflow.keras.layers import Conv2D, AveragePooling2D
9
+ from tensorflow.keras.layers import BatchNormalization
10
+ from tensorflow.keras.regularizers import l2
11
+ from tensorflow.keras import backend as K
12
+
13
+ sys.setrecursionlimit(2 ** 20)
14
+ np.random.seed(2 ** 10)
15
+
16
+
17
+ class WideResNet:
18
+ def __init__(self, image_size, depth=16, k=8):
19
+ self._depth = depth
20
+ self._k = k
21
+ self._dropout_probability = 0
22
+ self._weight_decay = 0.0005
23
+ self._use_bias = False
24
+ self._weight_init = "he_normal"
25
+
26
+ if K.image_data_format() == "channels_first":
27
+ logging.debug("image_dim_ordering = 'th'")
28
+ self._channel_axis = 1
29
+ self._input_shape = (3, image_size, image_size)
30
+ else:
31
+ logging.debug("image_dim_ordering = 'tf'")
32
+ self._channel_axis = -1
33
+ self._input_shape = (image_size, image_size, 3)
34
+
35
+ # Wide residual network http://arxiv.org/abs/1605.07146
36
+ def _wide_basic(self, n_input_plane, n_output_plane, stride):
37
+ def f(net):
38
+ # format of conv_params:
39
+ # [ [kernel_size=("kernel width", "kernel height"),
40
+ # strides="(stride_vertical,stride_horizontal)",
41
+ # padding="same" or "valid"] ]
42
+ # B(3,3): original <<basic>> block
43
+ conv_params = [[3, 3, stride, "same"],
44
+ [3, 3, (1, 1), "same"]]
45
+
46
+ n_bottleneck_plane = n_output_plane
47
+
48
+ # Residual block
49
+ for i, v in enumerate(conv_params):
50
+ if i == 0:
51
+ if n_input_plane != n_output_plane:
52
+ net = BatchNormalization(axis=self._channel_axis)(net)
53
+ net = Activation("relu")(net)
54
+ convs = net
55
+ else:
56
+ convs = BatchNormalization(axis=self._channel_axis)(net)
57
+ convs = Activation("relu")(convs)
58
+
59
+ convs = Conv2D(n_bottleneck_plane, kernel_size=(v[0], v[1]),
60
+ strides=v[2],
61
+ padding=v[3],
62
+ kernel_initializer=self._weight_init,
63
+ kernel_regularizer=l2(self._weight_decay),
64
+ use_bias=self._use_bias)(convs)
65
+ else:
66
+ convs = BatchNormalization(axis=self._channel_axis)(convs)
67
+ convs = Activation("relu")(convs)
68
+ if self._dropout_probability > 0:
69
+ convs = Dropout(self._dropout_probability)(convs)
70
+ convs = Conv2D(n_bottleneck_plane, kernel_size=(v[0], v[1]),
71
+ strides=v[2],
72
+ padding=v[3],
73
+ kernel_initializer=self._weight_init,
74
+ kernel_regularizer=l2(self._weight_decay),
75
+ use_bias=self._use_bias)(convs)
76
+
77
+ # Shortcut Connection: identity function or 1x1 convolutional
78
+ # (depends on difference between input & output shape - this
79
+ # corresponds to whether we are using the first block in each
80
+ # group; see _layer() ).
81
+ if n_input_plane != n_output_plane:
82
+ shortcut = Conv2D(n_output_plane, kernel_size=(1, 1),
83
+ strides=stride,
84
+ padding="same",
85
+ kernel_initializer=self._weight_init,
86
+ kernel_regularizer=l2(self._weight_decay),
87
+ use_bias=self._use_bias)(net)
88
+ else:
89
+ shortcut = net
90
+
91
+ return add([convs, shortcut])
92
+
93
+ return f
94
+
95
+
96
+ # "Stacking Residual Units on the same stage"
97
+ def _layer(self, block, n_input_plane, n_output_plane, count, stride):
98
+ def f(net):
99
+ net = block(n_input_plane, n_output_plane, stride)(net)
100
+ for i in range(2, int(count + 1)):
101
+ net = block(n_output_plane, n_output_plane, stride=(1, 1))(net)
102
+ return net
103
+
104
+ return f
105
+
106
+ # def create_model(self):
107
+ def __call__(self):
108
+ logging.debug("Creating model...")
109
+
110
+ assert ((self._depth - 4) % 6 == 0)
111
+ n = (self._depth - 4) / 6
112
+
113
+ inputs = Input(shape=self._input_shape)
114
+
115
+ n_stages = [16, 16 * self._k, 32 * self._k, 64 * self._k]
116
+
117
+ conv1 = Conv2D(filters=n_stages[0], kernel_size=(3, 3),
118
+ strides=(1, 1),
119
+ padding="same",
120
+ kernel_initializer=self._weight_init,
121
+ kernel_regularizer=l2(self._weight_decay),
122
+ use_bias=self._use_bias)(inputs) # "One conv at the beginning (spatial size: 32x32)"
123
+
124
+ # Add wide residual blocks
125
+ block_fn = self._wide_basic
126
+ conv2 = self._layer(block_fn, n_input_plane=n_stages[0], n_output_plane=n_stages[1], count=n, stride=(1, 1))(conv1)
127
+ conv3 = self._layer(block_fn, n_input_plane=n_stages[1], n_output_plane=n_stages[2], count=n, stride=(2, 2))(conv2)
128
+ conv4 = self._layer(block_fn, n_input_plane=n_stages[2], n_output_plane=n_stages[3], count=n, stride=(2, 2))(conv3)
129
+ batch_norm = BatchNormalization(axis=self._channel_axis)(conv4)
130
+ relu = Activation("relu")(batch_norm)
131
+
132
+ # Classifier block
133
+ pool = AveragePooling2D(pool_size=(8, 8), strides=(1, 1), padding="same")(relu)
134
+ flatten = Flatten()(pool)
135
+ predictions_g = Dense(units=2, kernel_initializer=self._weight_init, use_bias=self._use_bias,
136
+ kernel_regularizer=l2(self._weight_decay), activation="softmax",
137
+ name="pred_gender")(flatten)
138
+ predictions_a = Dense(units=101, kernel_initializer=self._weight_init, use_bias=self._use_bias,
139
+ kernel_regularizer=l2(self._weight_decay), activation="softmax",
140
+ name="pred_age")(flatten)
141
+ model = Model(inputs=inputs, outputs=[predictions_g, predictions_a])
142
+
143
+ return model
144
+
145
+
146
+ def main():
147
+ model = WideResNet(64)()
148
+ model.summary()
149
+
150
+
151
+ if __name__ == '__main__':
152
+ main()
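The size arithmetic in `WideResNet.__call__` (the `(depth - 4) % 6` assertion, the blocks-per-stage count `n`, and the stage widths) can be mirrored in a small standalone sketch, useful for sanity-checking a depth/width choice without building the model:

```python
def wrn_config(depth=16, k=8):
    """Mirror of WideResNet's size arithmetic: depth must be 6n+4,
    giving n residual blocks per stage and these stage widths."""
    assert (depth - 4) % 6 == 0, "depth must be of the form 6n+4"
    n = (depth - 4) // 6
    n_stages = [16, 16 * k, 32 * k, 64 * k]
    return n, n_stages

# The defaults used by wide_resnet.py: WRN-16-8
n, stages = wrn_config(depth=16, k=8)
```

So the default WRN-16-8 has 2 residual blocks per stage and stage widths 16, 128, 256, 512.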
18. Facial Applications - Emotion, Age & Gender Recognition/1. Chapter Introduction.srt ADDED
@@ -0,0 +1,47 @@
1
+ 1
2
+ 00:00:00,390 --> 00:00:07,520
3
+ OK so welcome to Chapter 18, this episode of Deep Surveillance, because I'm no longer teaching you new theory
4
+
5
+ 2
6
+ 00:00:07,530 --> 00:00:13,710
7
+ here per se. What I'm going to do now is going to be like a hands-on practice project where we build basically
8
+
9
+ 3
10
+ 00:00:13,710 --> 00:00:20,280
11
+ an emotion, age and gender recognition system, using face detection first to find faces in an image or video,
12
+
13
+ 4
14
+ 00:00:20,760 --> 00:00:26,850
15
+ and then we pick out what emotion is being shown by the face, the estimated or predicted
16
+
17
+ 5
18
+ 00:00:26,880 --> 00:00:29,240
19
+ age of the person, and predict the gender.
20
+
21
+ 6
22
+ 00:00:29,370 --> 00:00:33,890
23
+ So let's take a look at the section and see how it's split up. So firstly,
24
+
25
+ 7
26
+ 00:00:34,020 --> 00:00:38,040
27
+ We're going to build a simple emotion or facial expression detector.
28
+
29
+ 8
30
+ 00:00:38,130 --> 00:00:42,780
31
+ And secondly, we're going to build an age and gender detector.
32
+
33
+ 9
34
+ 00:00:43,140 --> 00:00:45,600
35
+ And then in the last chapter we're going to combine them both.
36
+
37
+ 10
38
+ 00:00:45,840 --> 00:00:48,780
39
+ And it's going to be called Deep Surveillance.
40
+
41
+ 11
42
+ 00:00:49,080 --> 00:00:50,130
43
+ So stay tuned.
44
+
45
+ 12
46
+ 00:00:50,130 --> 00:00:51,510
47
+ It's going to be a very cool project.
18. Facial Applications - Emotion, Age & Gender Recognition/2. Build an Emotion, Facial Expression Detector.srt ADDED
@@ -0,0 +1,1239 @@
1
+ 1
2
+ 00:00:00,570 --> 00:00:06,450
3
+ Hi and welcome to Chapter 18.2, where we get to build our super-cool emotion detector.
4
+
5
+ 2
6
+ 00:00:06,480 --> 00:00:11,600
7
+ So let's move on into our virtual machine and go to our Python notebook, and let's see how it's done.
8
+
9
+ 3
10
+ 00:00:12,050 --> 00:00:19,800
11
+ OK so now we're here in our Jupyter browser, and let's go to Chapter 18, Deep Surveillance, and eighteen point
12
+
13
+ 4
14
+ 00:00:19,800 --> 00:00:23,340
15
+ two is building an emotion detector with LittleVGG.
16
+
17
+ 5
18
+ 00:00:23,400 --> 00:00:24,920
19
+ So let's open it up.
20
+
21
+ 6
22
+ 00:00:26,910 --> 00:00:27,960
23
+ And there we go.
24
+
25
+ 7
26
+ 00:00:27,960 --> 00:00:28,590
27
+ Should be loaded.
28
+
29
+ 8
30
+ 00:00:28,590 --> 00:00:29,340
31
+ Here we go.
32
+
33
+ 9
34
+ 00:00:29,340 --> 00:00:30,050
35
+ All right.
36
+
37
+ 10
38
+ 00:00:30,330 --> 00:00:36,040
39
+ So now the first thing we're going to do, before I even go into this notebook, is let's go back to our
40
+
41
+ 11
42
+ 00:00:36,060 --> 00:00:39,100
43
+ file explorer here, and you see these two directories.
44
+
45
+ 12
46
+ 00:00:39,140 --> 00:00:42,510
47
+ Let me explain to you, and there's a picture of me for testing.
48
+
49
+ 13
50
+ 00:00:42,510 --> 00:00:45,810
51
+ Let me explain to you what these three directories are. First, for this age and gender one.
52
+
53
+ 14
54
+ 00:00:45,870 --> 00:00:48,540
55
+ This is our age and gender one, which is in the next chapter.
56
+
57
+ 15
58
+ 00:00:48,660 --> 00:00:50,820
59
+ We're not going to touch this just yet.
60
+
61
+ 16
62
+ 00:00:50,820 --> 00:00:52,670
63
+ So let's leave this alone here.
64
+
65
+ 17
66
+ 00:00:52,910 --> 00:00:54,500
67
+ Now this here is fer
68
+
69
+ 18
70
+ 00:00:54,510 --> 00:00:55,330
71
+ 2013.
72
+
73
+ 19
74
+ 00:00:55,340 --> 00:00:57,760
75
+ That's the dataset that we're going to train on.
76
+
77
+ 20
78
+ 00:00:58,080 --> 00:00:59,580
79
+ So let's take a look at this dataset.
80
+
81
+ 21
82
+ 00:00:59,580 --> 00:01:01,500
83
+ So we have two directories.
84
+
85
+ 22
86
+ 00:01:01,770 --> 00:01:04,190
87
+ There are some CSV files that we aren't going to use.
88
+
89
+ 23
90
+ 00:01:04,200 --> 00:01:05,630
91
+ We're going to look at these here.
92
+
93
+ 24
94
+ 00:01:05,760 --> 00:01:08,990
95
+ It's set up so there's a train and validation directory as usual.
96
+
97
+ 25
98
+ 00:01:09,000 --> 00:01:12,480
99
+ And now we see these are the emotions here.
100
+
101
+ 26
102
+ 00:01:13,310 --> 00:01:19,380
103
+ There is something I'm going to do which I haven't told you guys yet is that we're going to basically
104
+
105
+ 27
106
+ 00:01:19,650 --> 00:01:26,400
107
+ delete or discard the disgust directory, because I'm going to show you the plots afterward, but disgust
108
+
109
+ 28
110
+ 00:01:26,460 --> 00:01:29,320
111
+ only has four hundred and fifty five images.
112
+
113
+ 29
114
+ 00:01:29,400 --> 00:01:36,390
115
+ All right, now let's take a look at fear, which has a lot more images in comparison.
116
+
117
+ 30
118
+ 00:01:36,660 --> 00:01:44,290
119
+ So you can already see angry, happy, neutral. Look at happy, how many faces are in this directory.
120
+
121
+ 31
122
+ 00:01:44,330 --> 00:01:45,600
123
+ All right.
124
+
125
+ 32
126
+ 00:01:45,610 --> 00:01:47,330
127
+ It is 7000.
128
+
129
+ 33
130
+ 00:01:47,330 --> 00:01:48,590
131
+ So it's quite a lot.
132
+
133
+ 34
134
+ 00:01:48,590 --> 00:01:53,200
135
+ So let's go back to this.
136
+
137
+ 35
138
+ 00:01:53,380 --> 00:01:56,590
139
+ This is a very imbalanced data set here.
140
+
141
+ 36
142
+ 00:01:56,590 --> 00:02:05,020
143
+ So there are two things we can do. One, we can move the disgust images here into fear, because honestly
144
+
145
+ 37
146
+ 00:02:05,020 --> 00:02:06,140
147
+ they are a bit similar.
148
+
149
+ 38
150
+ 00:02:06,160 --> 00:02:10,090
151
+ It's a bit hard, even for me, to pick out what is fear and what is disgust.
152
+
153
+ 39
154
+ 00:02:10,180 --> 00:02:12,170
155
+ But for now let's just delete it.
156
+
157
+ 40
158
+ 00:02:12,220 --> 00:02:17,070
159
+ So we press Delete here and let's go back to validation and we press Delete here.
160
+
161
+ 41
162
+ 00:02:17,170 --> 00:02:17,690
163
+ OK.
164
+
165
+ 42
166
+ 00:02:17,830 --> 00:02:24,520
167
+ So now we're just left with six classes instead of seven, and six much more evenly balanced classes. Even
168
+
169
+ 43
170
+ 00:02:24,580 --> 00:02:25,150
171
+ surprise.
172
+
173
+ 44
174
+ 00:02:25,180 --> 00:02:31,690
175
+ The second one here has 400-something images, but that's fine, because surprise does look a lot different
176
+
177
+ 45
178
+ 00:02:32,050 --> 00:02:33,620
179
+ to disgust.
180
+
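The rebalancing decision above (dropping the tiny disgust class) can be sketched in plain Python. The per-class counts below are illustrative stand-ins, not the notebook's exact numbers (the lesson only mentions disgust at 455 and happy around 7000):

```python
# Illustrative per-class training counts, mimicking the FER2013-style
# imbalance described above (disgust is far smaller than the other classes).
counts = {
    "angry": 3995, "disgust": 455, "fear": 4097,
    "happy": 7215, "neutral": 4965, "sad": 4830, "surprise": 3171,
}

def drop_minority_classes(counts, ratio=0.25):
    """Drop classes whose count is below `ratio` times the median class size."""
    sizes = sorted(counts.values())
    median = sizes[len(sizes) // 2]
    return {k: v for k, v in counts.items() if v >= ratio * median}

balanced = drop_minority_classes(counts)
print(sorted(balanced))  # disgust is gone, six classes remain
```

In the video the directory is simply deleted by hand; a threshold like this just makes the same judgment call reproducible.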
181
+ 46
182
+ 00:02:33,700 --> 00:02:39,130
183
+ So let's go back here, and Haar cascades, which I mentioned before.
184
+
185
+ 47
186
+ 00:02:39,140 --> 00:02:40,640
187
+ No I haven't mentioned that yet.
188
+
189
+ 48
190
+ 00:02:40,660 --> 00:02:43,710
191
+ That's coming up in our object detection chapter.
192
+
193
+ 49
194
+ 00:02:43,990 --> 00:02:46,500
195
+ But Haar cascades are basically face detection.
196
+
197
+ 50
198
+ 00:02:46,510 --> 00:02:48,620
199
+ Well it's a type of object detector.
200
+
201
+ 51
202
+ 00:02:48,970 --> 00:02:51,690
203
+ And we're going to use... we're actually not going to use the eye detector.
204
+
205
+ 52
206
+ 00:02:51,700 --> 00:02:52,690
207
+ We can do that.
208
+
209
+ 53
210
+ 00:02:52,750 --> 00:02:54,170
211
+ I should just leave it in there.
212
+
213
+ 54
214
+ 00:02:54,250 --> 00:02:58,170
215
+ We're going to use these here for OpenCV's face detection.
216
+
217
+ 55
218
+ 00:02:58,330 --> 00:03:00,260
219
+ So let's go back to this notebook here.
220
+
221
+ 56
222
+ 00:03:01,230 --> 00:03:01,770
223
+ All right.
224
+
225
+ 57
226
+ 00:03:01,770 --> 00:03:05,210
227
+ So this is how we load the data set.
228
+
229
+ 58
230
+ 00:03:05,220 --> 00:03:06,650
231
+ This is basically the number of classes.
232
+
233
+ 59
234
+ 00:03:06,660 --> 00:03:07,380
235
+ It was seven.
236
+
237
+ 60
238
+ 00:03:07,410 --> 00:03:08,690
239
+ Now it's six.
240
+
241
+ 61
242
+ 00:03:08,730 --> 00:03:14,260
243
+ Now we actually have the number of rows and columns as 48, the image size, and they're all grayscale.
244
+
245
+ 62
246
+ 00:03:14,380 --> 00:03:20,730
247
+ I may have mentioned it in the previous section, I hope, but these are all grayscale images.
248
+
249
+ 63
250
+ 00:03:20,740 --> 00:03:24,880
251
+ Just take a look and see to verify.
252
+
253
+ 64
254
+ 00:03:25,110 --> 00:03:30,810
255
+ So now we're also going to do some data augmentation, as has become routine.
256
+
257
+ 65
258
+ 00:03:30,810 --> 00:03:33,840
259
+ Right now, these are the parameters I've used here.
260
+
261
+ 66
262
+ 00:03:33,930 --> 00:03:36,800
263
+ Feel free to play with these as well.
264
+
265
+ 67
266
+ 00:03:36,840 --> 00:03:41,720
267
+ It doesn't make a huge difference sometimes, but you never know.
268
+
269
+ 68
270
+ 00:03:41,940 --> 00:03:44,180
271
+ Now, the normal rescaling normalization.
272
+
273
+ 69
274
+ 00:03:44,340 --> 00:03:46,420
275
+ Now we just point it to our data here.
276
+
277
+ 70
278
+ 00:03:46,420 --> 00:03:51,270
279
+ Notice we have a new parameter called color_mode in both, that I don't believe we used before, because it
280
+
281
+ 71
282
+ 00:03:51,270 --> 00:03:52,910
283
+ defaults to color.
284
+
285
+ 72
286
+ 00:03:53,400 --> 00:03:55,000
287
+ Now we're specifying it's grayscale.
288
+
289
+ 73
290
+ 00:03:55,020 --> 00:03:56,170
291
+ So just take note of that.
292
+
293
+ 74
294
+ 00:03:56,170 --> 00:04:03,030
295
+ So when you want to do grayscale training, you have to specify it here in your
296
+
297
+ 75
298
+ 00:04:03,280 --> 00:04:06,500
299
+ data generator's flow_from_directory function.
300
+
301
+ 76
302
+ 00:04:06,540 --> 00:04:11,800
303
+ So now let's load this and it's going to tell us how many images and how many classes.
304
+
305
+ 77
306
+ 00:04:11,800 --> 00:04:13,330
307
+ Six and six.
308
+
309
+ 78
310
+ 00:04:13,330 --> 00:04:14,550
311
+ Excellent.
312
+
313
+ 79
314
+ 00:04:14,560 --> 00:04:14,850
315
+ All right.
316
+
317
+ 80
318
+ 00:04:14,870 --> 00:04:20,750
319
+ So now we move on to... let me just make this a little clearer.
320
+
321
+ 81
322
+ 00:04:20,750 --> 00:04:32,110
323
+ You can put it here: this is basically the Keras imports. And put another one here, so
324
+
325
+ 82
326
+ 00:04:32,110 --> 00:04:36,060
327
+ we can say this is Keras Little
328
+
329
+ 83
330
+ 00:04:36,550 --> 00:04:38,180
331
+ VGG model
332
+
333
+ 84
334
+ 00:04:41,390 --> 00:04:46,950
335
+ So now, as you've seen in the previous chapter with the Simpsons, this is the model we're going to use.
336
+
337
+ 85
338
+ 00:04:47,240 --> 00:04:52,470
339
+ So let's just run this and here we go.
340
+
341
+ 86
342
+ 00:04:52,650 --> 00:04:54,980
343
+ Number of parameters.
344
+
345
+ 87
346
+ 00:04:55,020 --> 00:04:56,810
347
+ This does look a little different, though.
348
+
349
+ 88
350
+ 00:04:56,820 --> 00:04:57,580
351
+ Oh no.
352
+
353
+ 89
354
+ 00:04:57,600 --> 00:04:58,720
355
+ Because it's black and white.
356
+
357
+ 90
358
+ 00:04:58,760 --> 00:05:03,600
359
+ Grayscale images, that is why the number of parameters is less.
360
+
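The drop in parameter count from grayscale input follows directly from the conv-layer formula: each filter has kernel_height × kernel_width × input_channels weights plus one bias. A quick check for a hypothetical first 3×3, 32-filter layer (illustrative values, not read from the notebook's summary):

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias.
    return (kernel_h * kernel_w * in_channels + 1) * filters

# First conv layer of a hypothetical LittleVGG-style net, 3x3 kernels, 32 filters:
print(conv2d_params(3, 3, 1, 32))  # grayscale input -> 320
print(conv2d_params(3, 3, 3, 32))  # RGB input       -> 896
```

Only the first conv layer sees the input channels, so the total saving across the whole network is small, but the summary printout does change.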
361
+ 91
362
+ 00:05:03,960 --> 00:05:04,360
363
+ OK.
364
+
365
+ 92
366
+ 00:05:04,530 --> 00:05:05,560
367
+ Happens a lot.
368
+
369
+ 93
370
+ 00:05:05,660 --> 00:05:06,840
371
+ It's fine.
372
+
373
+ 94
374
+ 00:05:06,900 --> 00:05:12,940
375
+ Now let us just label this as well: training our model.
376
+
377
+ 95
378
+ 00:05:12,960 --> 00:05:13,700
379
+ There we go.
380
+
381
+ 96
382
+ 00:05:14,010 --> 00:05:18,540
383
+ So now we have basically some callbacks set up here.
384
+
385
+ 97
386
+ 00:05:18,690 --> 00:05:22,000
387
+ I'm going to pause here because I realized I made some changes to this code.
388
+
389
+ 98
390
+ 00:05:22,020 --> 00:05:23,570
391
+ Actually no it's fine.
392
+
393
+ 99
394
+ 00:05:23,570 --> 00:05:24,130
395
+ I'm stupid.
396
+
397
+ 100
398
+ 00:05:24,180 --> 00:05:25,390
399
+ It is fine.
400
+
401
+ 101
402
+ 00:05:26,110 --> 00:05:27,270
403
+ So let's continue.
404
+
405
+ 102
406
+ 00:05:27,270 --> 00:05:28,380
407
+ Number of training samples.
408
+
409
+ 103
410
+ 00:05:28,380 --> 00:05:33,720
411
+ Let's just double check this because I'm not entirely sure it just reflects the classes we deleted.
412
+
413
+ 104
414
+ 00:05:33,750 --> 00:05:40,060
415
+ Let's scroll up and see if the number matches. It does. OK.
416
+
417
+ 105
418
+ 00:05:40,080 --> 00:05:40,580
419
+ Good.
420
+
421
+ 106
422
+ 00:05:41,630 --> 00:05:43,920
423
+ I'm doing things right for a change.
424
+
425
+ 107
426
+ 00:05:43,940 --> 00:05:44,410
427
+ OK.
428
+
429
+ 108
430
+ 00:05:44,600 --> 00:05:44,940
431
+ Good.
432
+
433
+ 109
434
+ 00:05:45,020 --> 00:05:52,940
435
+ And now, standard procedure: we fit using our data generators and all the callbacks defined here, and model
436
+
437
+ 110
438
+ 00:05:52,940 --> 00:05:53,750
439
+ compile.
440
+
441
+ 111
442
+ 00:05:53,780 --> 00:05:59,210
443
+ So I'm not actually going to run this. I'm going to show you what I ran prior to the chapter, and this
444
+
445
+ 112
446
+ 00:05:59,240 --> 00:06:01,160
447
+ was with six classes.
448
+
449
+ 113
450
+ 00:06:01,160 --> 00:06:01,520
451
+ All right.
452
+
453
+ 114
454
+ 00:06:01,580 --> 00:06:04,250
455
+ And now all the way down.
456
+
457
+ 115
458
+ 00:06:04,290 --> 00:06:06,370
459
+ You see we have 47 percent accuracy.
460
+
461
+ 116
462
+ 00:06:06,560 --> 00:06:08,110
463
+ Now that is not that good.
464
+
465
+ 117
466
+ 00:06:08,220 --> 00:06:08,860
467
+ All right.
468
+
469
+ 118
470
+ 00:06:09,080 --> 00:06:11,790
471
+ So for that many epochs, it's actually not bad.
472
+
473
+ 119
474
+ 00:06:11,960 --> 00:06:17,240
475
+ I've seen people use similar models, and actually sometimes much more complicated models. Someone used
476
+
477
+ 120
478
+ 00:06:17,240 --> 00:06:19,380
479
+ a full VGG on this.
480
+
481
+ 121
482
+ 00:06:19,580 --> 00:06:23,940
483
+ And it is very difficult to get past 70 percent accuracy.
484
+
485
+ 122
486
+ 00:06:24,110 --> 00:06:30,680
487
+ I am 100 percent sure if I train this for maybe 280 epochs, I'll get probably about 60 percent accuracy.
488
+
489
+ 123
490
+ 00:06:30,770 --> 00:06:32,030
491
+ So you can give it a try.
492
+
493
+ 124
494
+ 00:06:32,030 --> 00:06:36,260
495
+ All right I'll probably do it and update it at a later date.
496
+
497
+ 125
498
+ 00:06:36,530 --> 00:06:39,020
499
+ With this new model for you guys.
500
+
501
+ 126
502
+ 00:06:39,020 --> 00:06:40,870
503
+ So that's fine.
504
+
505
+ 127
506
+ 00:06:40,880 --> 00:06:44,300
507
+ So now let's look at the confusion matrix from these results.
508
+
509
+ 128
510
+ 00:06:44,310 --> 00:06:46,100
511
+ It is 47 percent.
512
+
513
+ 129
514
+ 00:06:46,640 --> 00:06:47,020
515
+ OK.
516
+
517
+ 130
518
+ 00:06:47,150 --> 00:06:50,720
519
+ So normally, what do you think of this?
520
+
521
+ 131
522
+ 00:06:50,900 --> 00:06:52,370
523
+ How would you analyze this?
524
+
525
+ 132
526
+ 00:06:52,610 --> 00:06:56,800
527
+ Now, I would say it's not that good, especially not good at picking up fear.
528
+
529
+ 133
530
+ 00:06:57,080 --> 00:06:58,940
531
+ It really didn't get much fear correct.
532
+
533
+ 134
534
+ 00:06:58,940 --> 00:06:59,620
535
+ All right.
536
+
537
+ 135
538
+ 00:06:59,780 --> 00:07:01,610
539
+ Now, what does it say about happy?
540
+
541
+ 136
542
+ 00:07:01,970 --> 00:07:03,200
543
+ Now, these are counts, by the way.
544
+
545
+ 137
546
+ 00:07:03,200 --> 00:07:06,920
547
+ But even still it's definitely good at getting happy.
548
+
549
+ 138
550
+ 00:07:06,920 --> 00:07:07,840
551
+ All right.
552
+
553
+ 139
554
+ 00:07:08,210 --> 00:07:13,080
555
+ But the others aren't that great, as you do see some mismatches here.
556
+
557
+ 140
558
+ 00:07:13,250 --> 00:07:15,640
559
+ You do see fear is not being picked up that well.
560
+
561
+ 141
562
+ 00:07:15,740 --> 00:07:16,860
563
+ I mean it has been picked up.
564
+
565
+ 142
566
+ 00:07:16,950 --> 00:07:19,790
567
+ It is a different color, definitely, from here to here.
568
+
569
+ 143
570
+ 00:07:20,170 --> 00:07:23,580
571
+ You'd need a good monitor to probably tell, but there's a difference.
572
+
573
+ 144
574
+ 00:07:23,660 --> 00:07:30,020
575
+ But generally you can tell here, fear is actually being confused with neutral and angry a lot as well.
576
+
577
+ 145
578
+ 00:07:30,020 --> 00:07:38,330
579
+ So this analysis says our model is decent but not great, which is obvious given its 47 percent accuracy
580
+
581
+ 146
582
+ 00:07:38,330 --> 00:07:38,850
583
+ here.
584
+
585
+ 147
586
+ 00:07:39,320 --> 00:07:46,250
587
+ And you can look at the F1 scores here. You can definitely see, exactly as I said, happy, it's good at
588
+
589
+ 148
590
+ 00:07:46,250 --> 00:07:49,710
591
+ finding happy, not great at finding fear, decent at
592
+
593
+ 149
594
+ 00:07:49,730 --> 00:07:56,360
595
+ everything else except neutral, although in my experience, at least for my face, it picked up neutral fairly
596
+
597
+ 150
598
+ 00:07:56,360 --> 00:07:57,380
599
+ well.
600
+
601
+ 151
602
+ 00:07:57,380 --> 00:07:57,710
603
+ All right.
604
+
605
+ 152
606
+ 00:07:57,710 --> 00:07:58,770
607
+ So let's just look.
608
+
609
+ 153
610
+ 00:07:58,790 --> 00:08:00,910
611
+ I believe this was saved.
612
+
613
+ 154
614
+ 00:08:01,040 --> 00:08:02,190
615
+ Let me just make sure.
616
+
617
+ 155
618
+ 00:08:03,350 --> 00:08:05,320
619
+ Yes, it was saved. Great.
620
+
621
+ 156
622
+ 00:08:05,360 --> 00:08:05,940
623
+ OK.
624
+
625
+ 157
626
+ 00:08:06,290 --> 00:08:15,270
627
+ So now let's load our model. As usual, this takes an annoyingly long few seconds.
628
+
629
+ 158
630
+ 00:08:15,790 --> 00:08:16,210
631
+ OK.
632
+
633
+ 159
634
+ 00:08:16,330 --> 00:08:18,550
635
+ Quick, quicker this time actually.
636
+
637
+ 160
638
+ 00:08:18,550 --> 00:08:23,950
639
+ All right, so let's get our class labels again, because I believe I ran it before when it was 7 classes
640
+
641
+ 161
642
+ 00:08:23,950 --> 00:08:24,560
643
+ here.
644
+
645
+ 162
646
+ 00:08:25,150 --> 00:08:29,930
647
+ And now let's look at some images. Not sure what's in this directory.
648
+
649
+ 163
650
+ 00:08:29,950 --> 00:08:30,330
651
+ OK.
652
+
653
+ 164
654
+ 00:08:30,460 --> 00:08:35,500
655
+ So let's see how it went. We predicted angry, and the true class was fear.
656
+
657
+ 165
658
+ 00:08:35,850 --> 00:08:37,240
659
+ So OK.
660
+
661
+ 166
662
+ 00:08:37,270 --> 00:08:39,380
663
+ Reasonable. Then fear.
664
+
665
+ 167
666
+ 00:08:39,430 --> 00:08:39,910
667
+ Bingo.
668
+
669
+ 168
670
+ 00:08:39,920 --> 00:08:41,400
671
+ Got it spot on.
672
+
673
+ 169
674
+ 00:08:41,410 --> 00:08:41,890
675
+ Fear.
676
+
677
+ 170
678
+ 00:08:41,900 --> 00:08:42,580
679
+ And it was neutral.
680
+
681
+ 171
682
+ 00:08:42,580 --> 00:08:45,190
683
+ Now, I wouldn't say this is a neutral expression.
684
+
685
+ 172
686
+ 00:08:45,240 --> 00:08:51,580
687
+ I don't know who labeled this dataset, but I mean, it's probably not fear, it's probably some weird expression
688
+
689
+ 173
690
+ 00:08:51,730 --> 00:08:53,330
691
+ she's making.
692
+
693
+ 174
694
+ 00:08:53,980 --> 00:08:55,630
695
+ We predicted angry and it's neutral.
696
+
697
+ 175
698
+ 00:08:55,650 --> 00:09:00,280
699
+ Now, I would say he's probably neutral, but he does look a bit angry, doesn't he?
700
+
701
+ 176
702
+ 00:09:01,310 --> 00:09:01,570
703
+ OK.
704
+
705
+ 177
706
+ 00:09:01,590 --> 00:09:05,870
707
+ So this one, we detected fear, but it was actually angry. Close.
708
+
709
+ 178
710
+ 00:09:06,360 --> 00:09:08,990
711
+ This one we predicted angry, but it actually was fear.
712
+
713
+ 179
714
+ 00:09:09,120 --> 00:09:12,670
715
+ We predicted angry again, predicted fear, and neutral.
716
+
717
+ 180
718
+ 00:09:12,670 --> 00:09:13,570
719
+ It's true.
720
+
721
+ 181
722
+ 00:09:13,890 --> 00:09:14,270
723
+ OK.
724
+
725
+ 182
726
+ 00:09:14,310 --> 00:09:19,710
727
+ So now we saw this and now let's actually try it on a picture of me.
728
+
729
+ 183
730
+ 00:09:19,800 --> 00:09:20,510
731
+ OK.
732
+
733
+ 184
734
+ 00:09:20,850 --> 00:09:24,810
735
+ So one thing I should have mentioned before, actually, which I haven't discussed yet.
736
+
737
+ 185
738
+ 00:09:25,050 --> 00:09:27,930
739
+ This is how we use our Haar cascade classifiers.
740
+
741
+ 186
742
+ 00:09:27,960 --> 00:09:32,490
743
+ Now we're using something different, an OpenCV function.
744
+
745
+ 187
746
+ 00:09:32,820 --> 00:09:39,390
747
+ So we point it at the classifier file we want to use. Let's go back to this one.
748
+
749
+ 188
750
+ 00:09:39,490 --> 00:09:40,050
751
+ All right.
752
+
753
+ 189
754
+ 00:09:40,210 --> 00:09:48,370
755
+ We have eye and full body Haar cascade files, which are tiny in terms of space. I was going
756
+
757
+ 190
758
+ 00:09:48,370 --> 00:09:51,340
759
+ to delete them if they were taking up a lot of space, but they don't.
760
+
761
+ 191
762
+ 00:09:51,670 --> 00:09:52,250
763
+ OK.
764
+
765
+ 192
766
+ 00:09:52,720 --> 00:09:54,700
767
+ So this is a face detector module.
768
+
769
+ 193
770
+ 00:09:54,760 --> 00:09:56,860
771
+ Now what this module does.
772
+
773
+ 194
774
+ 00:09:56,860 --> 00:09:59,990
775
+ You can see we've created a CascadeClassifier face classifier.
776
+
777
+ 195
778
+ 00:10:00,150 --> 00:10:05,110
779
+ Now when we get an image here this is an image from my webcam.
780
+
781
+ 196
782
+ 00:10:05,110 --> 00:10:06,940
783
+ This is a function here that we're looking at by the way.
784
+
785
+ 197
786
+ 00:10:07,060 --> 00:10:07,550
787
+ OK.
788
+
789
+ 198
790
+ 00:10:07,870 --> 00:10:16,030
791
+ So in this function, we convert it into a grayscale image, and then we pass this gray
792
+
793
+ 199
794
+ 00:10:16,420 --> 00:10:19,700
795
+ scale image of the webcam input into this.
796
+
797
+ 200
798
+ 00:10:19,700 --> 00:10:24,280
799
+ This is what does the face detection: the detectMultiScale function.
800
+
801
+ 201
802
+ 00:10:24,280 --> 00:10:27,010
803
+ These are some parameters to tweak the sensitivity as well.
804
+
805
+ 202
806
+ 00:10:27,040 --> 00:10:31,870
807
+ And if you want to find small faces, in a lot of cases you tweak some of the scaling
808
+
809
+ 203
810
+ 00:10:31,870 --> 00:10:32,930
811
+ parameters.
812
+
813
+ 204
814
+ 00:10:33,010 --> 00:10:38,130
815
+ So what it returns is basically an array of faces.
816
+
817
+ 205
818
+ 00:10:38,140 --> 00:10:42,310
819
+ So basically, if you have no faces detected, it returns some blank data, because I'm using it
820
+
821
+ 206
822
+ 00:10:42,340 --> 00:10:44,770
823
+ in this function for some other stuff.
824
+
825
+ 207
826
+ 00:10:45,040 --> 00:10:48,790
827
+ But if it finds faces basically it returns this.
828
+
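The return pattern being described here can be sketched with plain numpy (the function name and the blank-data placeholders are my own; the notebook's exact helper may differ). The idea: when detectMultiScale finds nothing, hand back blank stand-in data so the caller never crashes; otherwise hand back the rectangle and the cropped face region:

```python
import numpy as np

def face_detector_result(faces, gray_img, size=48):
    """If no faces were found, return blank placeholder data; otherwise
    return the first face's rectangle and its crop out of the frame.
    `faces` is the (N, 4) array of (x, y, w, h) boxes from detectMultiScale."""
    if len(faces) == 0:
        return (0, 0, 0, 0), np.zeros((size, size), dtype=np.uint8), gray_img
    x, y, w, h = (int(v) for v in faces[0])
    roi = gray_img[y:y + h, x:x + w]  # crop the face out of the frame
    return (x, y, w, h), roi, gray_img

frame = np.arange(100 * 100, dtype=np.uint8).reshape(100, 100)
rect, roi, _ = face_detector_result(np.array([[10, 20, 30, 40]]), frame)
print(rect, roi.shape)  # (10, 20, 30, 40) (40, 30)
```

Note the numpy indexing order: rows are y, columns are x, which is why the crop is `[y:y+h, x:x+w]` and not the other way around.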
829
+ 208
830
+ 00:10:48,790 --> 00:10:55,820
831
+ These are basically the location of the face: the x, y, which is the top left.
832
+
833
+ 209
834
+ 00:10:55,900 --> 00:10:58,920
835
+ Let's assume this is a box.
836
+
837
+ 210
838
+ 00:10:58,930 --> 00:11:03,690
839
+ All right, so x, y is going to be, let's say, this.
840
+
841
+ 211
842
+ 00:11:03,750 --> 00:11:05,540
843
+ It's in this box I'm drawing right here.
844
+
845
+ 212
846
+ 00:11:05,560 --> 00:11:07,820
847
+ Hope you can make it out; this is the face.
848
+
849
+ 213
850
+ 00:11:07,820 --> 00:11:13,610
851
+ So the x, y starts here on the top left corner of the face, and the width and the height, which is this
852
+
853
+ 214
854
+ 00:11:13,610 --> 00:11:18,850
855
+ way, and height is the downward measurement, the rest of the face.
856
+
857
+ 215
858
+ 00:11:18,850 --> 00:11:24,830
859
+ So that's how we use an OpenCV function here to draw a rectangle around this.
860
+
861
+ 216
862
+ 00:11:25,030 --> 00:11:31,120
863
+ And then we take this region of the image here, and basically we just crop it to get this face out of
864
+
865
+ 217
866
+ 00:11:31,120 --> 00:11:31,690
867
+ it.
868
+
869
+ 218
870
+ 00:11:31,860 --> 00:11:37,980
871
+ And what I do is just run this for all the faces in the frame, resize it correctly to what the classifier
872
+
873
+ 219
874
+ 00:11:38,380 --> 00:11:47,110
875
+ has been trained to detect, 48 by 48, and return an array of all the faces, the rectangle dimensions,
876
+
877
+ 220
878
+ 00:11:47,470 --> 00:11:53,110
879
+ and the original image unchanged, because I was maybe doing something with it afterward.
880
+
881
+ 221
882
+ 00:11:53,410 --> 00:11:56,560
883
+ And yes I think I was just putting the label on it afterwards.
884
+
885
+ 222
886
+ 00:11:57,010 --> 00:11:58,150
887
+ So there we go.
888
+
889
+ 223
890
+ 00:11:58,150 --> 00:12:01,590
891
+ So we load my image, run this whole function, and hope it works.
892
+
893
+ 224
894
+ 00:12:01,600 --> 00:12:03,400
895
+ After that explanation.
896
+
897
+ 225
898
+ 00:12:03,880 --> 00:12:04,270
899
+ Yes.
900
+
901
+ 226
902
+ 00:12:04,270 --> 00:12:04,580
903
+ OK.
904
+
905
+ 227
906
+ 00:12:04,600 --> 00:12:07,460
907
+ So it says I'm happy. I was happy, actually.
908
+
909
+ 228
910
+ 00:12:07,600 --> 00:12:09,580
911
+ This was on my birthday two months ago.
912
+
913
+ 229
914
+ 00:12:09,880 --> 00:12:16,480
915
+ I was in Madeira, Portugal, which is a wonderful, wonderful island that you really should visit. And I do not
916
+
917
+ 230
918
+ 00:12:16,840 --> 00:12:19,520
919
+ get paid to say that; it's just highly recommended.
920
+
921
+ 231
922
+ 00:12:19,760 --> 00:12:22,710
923
+ My wife and I went whale watching that day.
924
+
925
+ 232
926
+ 00:12:23,050 --> 00:12:24,740
927
+ So yes I was happy.
928
+
929
+ 233
930
+ 00:12:24,790 --> 00:12:29,070
931
+ So pretty, pretty decent. So you can load your images here.
932
+
933
+ 234
934
+ 00:12:29,080 --> 00:12:33,420
935
+ One thing to note: you can probably adjust the position of this text so it starts maybe at the left corner here.
936
+
937
+ 235
938
+ 00:12:33,940 --> 00:12:37,240
939
+ In case the face is more on the right-hand side of the image, the
940
+
941
+ 236
942
+ 00:12:37,270 --> 00:12:40,390
943
+ text is not going to go outside of the image.
944
+
945
+ 237
946
+ 00:12:40,390 --> 00:12:43,550
947
+ So that's one thing you can do for your homework lesson.
948
+
949
+ 238
950
+ 00:12:43,870 --> 00:12:46,610
951
+ So now let's try this on our web cam.
952
+
953
+ 239
954
+ 00:12:46,690 --> 00:12:47,250
955
+ OK.
956
+
957
+ 240
958
+ 00:12:47,560 --> 00:12:49,480
959
+ So let's try this now.
960
+
961
+ 241
962
+ 00:12:54,200 --> 00:12:54,440
963
+ OK.
964
+
965
+ 242
966
+ 00:12:54,450 --> 00:12:58,830
967
+ So you may have noticed a slight break in the code and that was because when I ran this I realized my
968
+
969
+ 243
970
+ 00:12:58,830 --> 00:13:02,310
971
+ T-shirt had a stain which did not look good on camera.
972
+
973
+ 244
974
+ 00:13:02,310 --> 00:13:04,600
975
+ So again I'm not dressed up.
976
+
977
+ 245
978
+ 00:13:04,650 --> 00:13:08,260
979
+ I'm just here at home alone recording this.
980
+
981
+ 246
982
+ 00:13:08,550 --> 00:13:11,780
983
+ So let me just run this.
984
+
985
+ 247
986
+ 00:13:11,950 --> 00:13:12,790
987
+ And here we go.
988
+
989
+ 248
990
+ 00:13:12,790 --> 00:13:20,270
991
+ So we see my face being detected, and my facial emotion expression. My microphone is right here.
992
+
993
+ 249
994
+ 00:13:20,270 --> 00:13:23,590
995
+ I hope it doesn't make any noise.
996
+
997
+ 250
998
+ 00:13:23,590 --> 00:13:28,760
999
+ So right now it was alternating between happy and neutral quite a bit.
1000
+
1001
+ 251
1002
+ 00:13:30,480 --> 00:13:33,110
1003
+ Surprise. There we go, it worked.
1004
+
1005
+ 252
1006
+ 00:13:33,120 --> 00:13:33,400
1007
+ All right.
1008
+
1009
+ 253
1010
+ 00:13:33,420 --> 00:13:33,910
1011
+ Nice.
1012
+
1013
+ 254
1014
+ 00:13:33,920 --> 00:13:36,230
1015
+ So it is working fairly well.
1016
+
1017
+ 255
1018
+ 00:13:36,660 --> 00:13:38,880
1019
+ So you can experiment with this training.
1020
+
1021
+ 256
1022
+ 00:13:39,030 --> 00:13:42,450
1023
+ One thing you should note: you see this bounding box here.
1024
+
1025
+ 257
1026
+ 00:13:42,600 --> 00:13:48,600
1027
+ This is actually bad because just take a quick look at a dataset here.
1028
+
1029
+ 258
1030
+ 00:13:48,900 --> 00:13:56,550
1031
+ Something you can do, which I neglected to do for you guys: these faces are tightly cropped,
1032
+
1033
+ 259
1034
+ 00:13:56,700 --> 00:13:57,600
1035
+ if you take a look at them.
1036
+
1037
+ 260
1038
+ 00:13:57,690 --> 00:14:01,820
1039
+ Let's just open this quickly. They are much tighter cropped, aren't they?
1040
+
1041
+ 261
1042
+ 00:14:01,880 --> 00:14:04,670
1043
+ They don't have that much space inside.
1044
+
1045
+ 262
1046
+ 00:14:04,830 --> 00:14:13,180
1047
+ So generally, I've noticed this is not that tightly cropped here on the webcam.
1048
+
1049
+ 263
1050
+ 00:14:13,430 --> 00:14:16,250
1051
+ So let me just go back to this code.
1052
+
1053
+ 264
1054
+ 00:14:16,370 --> 00:14:21,520
1055
+ So if you wanted to change that, you actually can. This is actually the opposite of cropping.
1056
+
1057
+ 265
1058
+ 00:14:21,530 --> 00:14:29,860
1059
+ So I can actually change this to 20, 20, because the default settings for the Haar cascades are
1060
+
1061
+ 266
1062
+ 00:14:29,960 --> 00:14:31,030
1063
+ quite tight.
1064
+
1065
+ 267
1066
+ 00:14:31,280 --> 00:14:37,690
1067
+ Previously, I actually did some spacing, some left, right, up, down spacing. So let's run this now.
1068
+
1069
+ 268
1070
+ 00:14:37,730 --> 00:14:38,880
1071
+ See what happens.
1072
+
1073
+ 269
1074
+ 00:14:40,320 --> 00:14:40,650
1075
+ OK.
1076
+
1077
+ 270
1078
+ 00:14:40,670 --> 00:14:45,200
1079
+ So as you can see, it is a bit better.
1080
+
1081
+ 271
1082
+ 00:14:45,380 --> 00:14:47,170
1083
+ Maybe we can reduce it a bit even more.
1084
+
1085
+ 272
1086
+ 00:14:47,240 --> 00:14:52,310
1087
+ And we can eliminate it altogether.
1088
+
1089
+ 273
1090
+ 00:14:52,390 --> 00:14:54,450
1091
+ This is stupid.
1092
+
1093
+ 274
1094
+ 00:14:54,470 --> 00:14:58,850
1095
+ So for the width... let's just not do the height spacing, at least.
1096
+
1097
+ 275
1098
+ 00:15:02,620 --> 00:15:07,170
1099
+ So I would say this is probably a more tightly cropped box on my face.
1100
+
1101
+ 276
1102
+ 00:15:07,270 --> 00:15:12,220
1103
+ And if you wanted to go even further, let's just not do any height adjustment here.
1104
+
1105
+ 277
1106
+ 00:15:13,400 --> 00:15:14,840
1107
+ And see what that gives us.
1108
+
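The "spacing" being tweaked here can be sketched as padding (or shrinking) the detected (x, y, w, h) box before cropping, clamped so it stays inside the frame. The function name and the padding values are illustrative, not the notebook's exact code:

```python
def adjust_box(x, y, w, h, img_w, img_h, pad_x=20, pad_y=20):
    """Expand (positive pad) or shrink (negative pad) a detection box,
    clamping it to the image bounds. Padding values are illustrative."""
    x1 = max(0, x - pad_x)
    y1 = max(0, y - pad_y)
    x2 = min(img_w, x + w + pad_x)
    y2 = min(img_h, y + h + pad_y)
    return x1, y1, x2 - x1, y2 - y1

# A looser box around a 100x100 face in a 640x480 frame:
print(adjust_box(300, 200, 100, 100, 640, 480))          # (280, 180, 140, 140)
# Tighter crop: negative width padding, and no height adjustment at all:
print(adjust_box(300, 200, 100, 100, 640, 480, -10, 0))  # (310, 200, 80, 100)
```

The clamping matters near the frame edges: without it, a padded box can run past the image and the crop comes back empty or misshapen.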
1109
+ 278
1110
+ 00:15:14,850 --> 00:15:18,390
1111
+ No.
1112
+
1113
+ 279
1114
+ 00:15:19,170 --> 00:15:19,530
1115
+ It is.
1116
+
1117
+ 280
1118
+ 00:15:19,530 --> 00:15:22,050
1119
+ It is definitely more stable and better.
1120
+
1121
+ 281
1122
+ 00:15:22,050 --> 00:15:25,830
1123
+ I would say OK cool.
1124
+
1125
+ 282
1126
+ 00:15:25,900 --> 00:15:27,140
1127
+ So this works fairly well.
1128
+
1129
+ 283
1130
+ 00:15:27,190 --> 00:15:27,670
1131
+ OK.
1132
+
1133
+ 284
1134
+ 00:15:27,840 --> 00:15:28,500
1135
+ I'm happy with this.
1136
+
1137
+ 285
1138
+ 00:15:28,510 --> 00:15:29,880
1139
+ I hope you're happy with this.
1140
+
1141
+ 286
1142
+ 00:15:29,890 --> 00:15:36,100
1143
+ This is a 47 percent accurate model, and it's doing quite well, or decently well at least.
1144
+
1145
+ 287
1146
+ 00:15:36,100 --> 00:15:41,630
1147
+ So what you can do, as a lesson for you guys, is train this for more epochs.
1148
+
1149
+ 288
1150
+ 00:15:41,650 --> 00:15:49,690
1151
+ Also try different augmentations, try different optimizers if you want, and use them with a fairly large
1152
+
1153
+ 289
1154
+ 00:15:49,690 --> 00:15:50,360
1155
+ learning rate.
1156
+
1157
+ 290
1158
+ 00:15:50,380 --> 00:15:53,040
1159
+ Just realize you can reduce that even more.
1160
+
1161
+ 291
1162
+ 00:15:53,620 --> 00:15:58,510
1163
+ And, you know, adjust your other options here. And what you can do as well:
1164
+
1165
+ 292
1166
+ 00:15:58,630 --> 00:16:06,150
1167
+ you can add more filters. Initially we started at 32, 64, 128, 256, but what if you start at 64,
1168
+
1169
+ 293
1170
+ 00:16:06,370 --> 00:16:14,200
1171
+ eliminating the 32, to give us 64, 128, 256 and 512. And you can even add more dense layers here. It may not
1172
+
1173
+ 294
1174
+ 00:16:14,950 --> 00:16:16,850
1175
+ be necessary but try it.
1176
+
1177
+ 295
1178
+ 00:16:17,350 --> 00:16:19,750
1179
+ Play with your dropout values as well.
1180
+
1181
+ 296
1182
+ 00:16:19,810 --> 00:16:24,950
1183
+ You could even add a whole new convolutional layer here, instead of stopping at 256.
1184
+
1185
+ 297
1186
+ 00:16:25,060 --> 00:16:34,620
1187
+ You can just go here and add one with 512. I won't do this for you, but you can do it on your own.
1188
+
1189
+ 298
1190
+ 00:16:34,920 --> 00:16:38,830
1191
+ Maybe change activation functions maybe change initializers as I say.
1192
+
1193
+ 299
1194
+ 00:16:38,880 --> 00:16:44,390
1195
+ Although, to be fair, I wouldn't change the batch normalization, ELU activation and He normal initialization.
1196
+
1197
+ 300
1198
+ 00:16:44,610 --> 00:16:49,740
1199
+ You can if you want, but I wouldn't. I think those are the best for a VGG-type model like we're using
1200
+
1201
+ 301
1202
+ 00:16:49,740 --> 00:16:50,700
1203
+ here.
1204
+
1205
+ 302
1206
+ 00:16:50,700 --> 00:16:53,980
1207
+ And lastly, try for more epochs.
1208
+
1209
+ 303
1210
+ 00:16:54,000 --> 00:16:55,010
1211
+ All right.
1212
+
1213
+ 304
1214
+ 00:16:55,050 --> 00:16:59,200
1215
+ A good value for this, a satisfactory value, would be 70 percent.
1216
+
1217
+ 305
1218
+ 00:16:59,520 --> 00:17:04,960
1219
+ So give it a go again and try different data augmentations as well.
1220
+
1221
+ 306
1222
+ 00:17:05,220 --> 00:17:10,770
1223
+ Give it a go and see how good and accurate a model you can get.
1224
+
1225
+ 307
1226
+ 00:17:10,770 --> 00:17:12,660
1227
+ OK so that's it for this lesson.
1228
+
1229
+ 308
1230
+ 00:17:12,990 --> 00:17:20,510
1231
+ What we're going to do next is run an age and gender detector, I guess you could call it a classifier,
1232
+
1233
+ 309
1234
+ 00:17:21,270 --> 00:17:29,200
1235
+ and combine them afterward into one super facial deep surveillance classifier.
1236
+
1237
+ 310
1238
+ 00:17:29,710 --> 00:17:30,450
1239
+ OK thank you.
18. Facial Applications - Emotion, Age & Gender Recognition/2.1 Download Dataset.html ADDED
@@ -0,0 +1 @@
 
 
1
+ <script type="text/javascript">window.location = "https://drive.google.com/file/d/1317edb3koW63Zzjxt5b-9xOK1ElQ07xk/view?usp=sharing";</script>
18. Facial Applications - Emotion, Age & Gender Recognition/3. Build EmotionAgeGender Recognition in our Deep Surveillance Monitor.srt ADDED
@@ -0,0 +1,1547 @@
1
+ 1
2
+ 00:00:00,690 --> 00:00:06,150
3
+ Hi and welcome to chapter eighteen point three where we're going to build a deep surveillance facial
4
+
5
+ 2
6
+ 00:00:06,150 --> 00:00:10,980
7
+ monitoring system that combines emotion age and gender recognition.
8
+
9
+ 3
10
+ 00:00:11,010 --> 00:00:15,910
11
+ So now let's go into our Jupyter notebook in our virtual machine and start building this.
12
+
13
+ 4
14
+ 00:00:16,200 --> 00:00:16,530
15
+ OK.
16
+
17
+ 5
18
+ 00:00:16,560 --> 00:00:18,170
19
+ So we're in chapter 18.
20
+
21
+ 6
22
+ 00:00:18,240 --> 00:00:20,380
23
+ Now open the notebook for this.
24
+
25
+ 7
26
+ 00:00:20,640 --> 00:00:28,650
27
+ There is no 18.0 notebook file here; instead it takes us quickly into
28
+
29
+ 8
30
+ 00:00:28,650 --> 00:00:31,110
31
+ the age and gender estimation folder here.
32
+
33
+ 9
34
+ 00:00:31,110 --> 00:00:31,460
35
+ All right.
36
+
37
+ 10
38
+ 00:00:31,470 --> 00:00:32,820
39
+ So click it.
40
+
41
+ 11
42
+ 00:00:32,940 --> 00:00:39,870
43
+ And what this is, is basically a GitHub project I pulled. It's one of the better
44
+
45
+ 12
46
+ 00:00:39,870 --> 00:00:42,740
47
+ ones I should say for age and gender detection.
48
+
49
+ 13
50
+ 00:00:43,020 --> 00:00:47,820
51
+ And the reason why I downloaded it is for this code, and what we're going to do is use his
52
+
53
+ 14
54
+ 00:00:47,820 --> 00:00:52,500
55
+ pre-trained model, mainly because of this dataset.
56
+
57
+ 15
58
+ 00:00:52,760 --> 00:00:53,750
59
+ It's not loaded here.
60
+
61
+ 16
62
+ 00:00:53,890 --> 00:00:58,320
63
+ It would probably have taken up way too much space.
64
+
65
+ 17
66
+ 00:00:58,320 --> 00:01:03,390
67
+ But still, the dataset he created and trained on is quite large, and I actually did have
68
+
69
+ 18
70
+ 00:01:03,390 --> 00:01:11,670
71
+ a project prior to this where we were training on that dataset for age, and it's OK for
72
+
73
+ 19
74
+ 00:01:11,670 --> 00:01:14,280
75
+ age and that's why.
76
+
77
+ 20
78
+ 00:01:14,620 --> 00:01:14,920
79
+ OK.
80
+
81
+ 21
82
+ 00:01:14,970 --> 00:01:18,240
83
+ In fact, the dataset here is not good at all.
84
+
85
+ 22
86
+ 00:01:18,240 --> 00:01:18,490
87
+ All right.
88
+
89
+ 23
90
+ 00:01:18,510 --> 00:01:21,380
91
+ So I'm going to leave that before I give it to you guys.
92
+
93
+ 24
94
+ 00:01:21,390 --> 00:01:23,070
95
+ All right back to this.
96
+
97
+ 25
98
+ 00:01:23,130 --> 00:01:23,690
99
+ OK.
100
+
101
+ 26
102
+ 00:01:24,030 --> 00:01:28,680
103
+ So either way, if you want, you can actually train his model here yourself.
104
+
105
+ 27
106
+ 00:01:28,820 --> 00:01:29,170
107
+ All right.
108
+
109
+ 28
110
+ 00:01:29,190 --> 00:01:30,560
111
+ He has the code to train.
112
+
113
+ 29
114
+ 00:01:30,630 --> 00:01:35,180
115
+ However it is not quick and it is basically a standard training procedure.
116
+
117
+ 30
118
+ 00:01:35,190 --> 00:01:39,490
119
+ like what we have done before, but it is not worth the effort on a CPU-based system.
120
+
121
+ 31
122
+ 00:01:39,510 --> 00:01:45,780
123
+ So I said, you know what, we will just use his pre-trained
124
+
125
+ 32
126
+ 00:01:45,780 --> 00:01:52,860
127
+ model. And it's a good exercise to execute someone else's model — it may seem like
128
+
129
+ 33
130
+ 00:01:52,860 --> 00:01:55,620
131
+ cheating sometimes but you can learn a lot.
132
+
133
+ 34
134
+ 00:01:55,620 --> 00:01:56,180
135
+ All right.
136
+
137
+ 35
138
+ 00:01:56,220 --> 00:01:58,120
139
+ So I'll quickly.
140
+
141
+ 36
142
+ 00:01:58,150 --> 00:01:59,970
143
+ This is his project here.
144
+
145
+ 37
146
+ 00:02:00,180 --> 00:02:01,270
147
+ Let's bring this up.
148
+
149
+ 38
150
+ 00:02:01,290 --> 00:02:02,780
151
+ All right.
152
+
153
+ 39
154
+ 00:02:03,250 --> 00:02:07,890
155
+ First, what we're going to do, since there are a number of notebooks, is label
156
+
157
+ 40
158
+ 00:02:08,100 --> 00:02:09,290
159
+ which ones are ours.
160
+
161
+ 41
162
+ 00:02:09,540 --> 00:02:11,350
163
+ So let's do it quickly.
164
+
165
+ 42
166
+ 00:02:11,400 --> 00:02:11,660
167
+ All right.
168
+
169
+ 43
170
+ 00:02:11,670 --> 00:02:18,030
171
+ So this is going to be 18.3 — let's call it that.
172
+
173
+ 44
174
+ 00:02:18,770 --> 00:02:20,660
175
+ And second one we're going to run right after.
176
+
177
+ 45
178
+ 00:02:20,690 --> 00:02:26,360
179
+ So we combine them both, because first we're going to run his age and gender detector, and that's part
180
+
181
+ 46
182
+ 00:02:26,400 --> 00:02:28,600
183
+ A of 18.3.
184
+
185
+ 47
186
+ 00:02:28,710 --> 00:02:30,700
187
+ Part B we'll look at after.
188
+
189
+ 48
190
+ 00:02:30,750 --> 00:02:32,080
191
+ So now let's bring this up.
192
+
193
+ 49
194
+ 00:02:32,180 --> 00:02:33,420
195
+ And take a look at this.
196
+
197
+ 50
198
+ 00:02:35,020 --> 00:02:35,370
199
+ Right.
200
+
201
+ 51
202
+ 00:02:35,380 --> 00:02:36,960
203
+ So this is what I think I wanted.
204
+
205
+ 52
206
+ 00:02:36,990 --> 00:02:39,440
207
+ This was his GitHub page for his project.
208
+
209
+ 53
210
+ 00:02:39,460 --> 00:02:40,240
211
+ OK.
212
+
213
+ 54
214
+ 00:02:40,690 --> 00:02:43,020
215
+ Now I believe this was him.
216
+
217
+ 55
218
+ 00:02:43,060 --> 00:02:47,860
219
+ If I'm not mistaken there was a slight issue with his project that didn't run off the bat and I had
220
+
221
+ 56
222
+ 00:02:47,860 --> 00:02:49,400
223
+ to make some changes in the file.
224
+
225
+ 57
226
+ 00:02:49,720 --> 00:02:51,540
227
+ Luckily I did make the changes in the file.
228
+
229
+ 58
230
+ 00:02:51,540 --> 00:02:56,620
231
+ So if you you don't have to put this code from scratch his code is already in this field that we saw
232
+
233
+ 59
234
+ 00:02:56,620 --> 00:02:57,680
235
+ here.
236
+
237
+ 60
238
+ 00:02:57,700 --> 00:03:03,070
239
+ So basically he what's good about this still is that he does give some instructions some if you wanted
240
+
241
+ 61
242
+ 00:03:03,070 --> 00:03:08,690
243
+ to actually train the model on you and he can trance on different data sets as well.
244
+
245
+ 62
246
+ 00:03:08,710 --> 00:03:11,460
247
+ This is that is that he included in his file.
248
+
249
+ 63
250
+ 00:03:11,860 --> 00:03:17,000
251
+ You can train it yourself, or you can use his demo file — that's the file we are actually going to use.
252
+
253
+ 64
254
+ 00:03:17,280 --> 00:03:19,040
255
+ That's the one we'll open.
256
+
257
+ 65
258
+ 00:03:19,060 --> 00:03:21,690
259
+ And you can see it here.
260
+
261
+ 66
262
+ 00:03:21,810 --> 00:03:27,160
263
+ We don't need to download it because I already have it, and it was basically trained on
264
+
265
+ 67
266
+ 00:03:27,200 --> 00:03:28,980
267
+ the IMDB dataset — basically this.
268
+
269
+ 68
270
+ 00:03:29,080 --> 00:03:31,910
271
+ The IMDB data he talks about here.
272
+
273
+ 69
274
+ 00:03:31,910 --> 00:03:37,520
275
+ Basically, someone scraped the IMDB website — you know, the Internet Movie Database website — extracted
276
+
277
+ 70
278
+ 00:03:37,520 --> 00:03:43,520
279
+ the faces, and actually labeled the ages and genders, which might have been a quite tedious task, but they
280
+
281
+ 71
282
+ 00:03:43,520 --> 00:03:44,150
283
+ did it.
284
+
285
+ 72
286
+ 00:03:44,180 --> 00:03:44,730
287
+ OK.
288
+
289
+ 73
290
+ 00:03:45,080 --> 00:03:47,350
291
+ So these are his results here.
292
+
293
+ 74
294
+ 00:03:47,420 --> 00:03:52,730
295
+ His pre-trained model's loss is quite low for age and gender.
296
+
297
+ 75
298
+ 00:03:52,730 --> 00:03:53,860
299
+ Not that low in the end.
300
+
301
+ 76
302
+ 00:03:53,870 --> 00:03:55,490
303
+ But it's fine.
304
+
305
+ 77
306
+ 00:03:55,670 --> 00:04:03,700
307
+ And so he probably stopped training at some point and saved the model out before accuracy
308
+
309
+ 78
310
+ 00:04:04,100 --> 00:04:05,720
311
+ and the stuff went to hell.
312
+
313
+ 79
314
+ 00:04:05,720 --> 00:04:07,370
315
+ Actually I'm looking at around accuracy.
316
+
317
+ 80
318
+ 00:04:07,490 --> 00:04:10,600
319
+ This is a one blue green and blue.
320
+
321
+ 81
322
+ 00:04:10,690 --> 00:04:13,040
323
+ Like I said fine either way.
324
+
325
+ 82
326
+ 00:04:13,090 --> 00:04:15,080
327
+ it's definitely overfitting here —
328
+
329
+ 83
330
+ 00:04:18,070 --> 00:04:22,800
331
+ actually, it's not overfitting; that is actually on his training dataset.
332
+
333
+ 84
334
+ 00:04:22,840 --> 00:04:25,250
335
+ So these are decent results, to be fair.
336
+
337
+ 85
338
+ 00:04:25,260 --> 00:04:25,890
339
+ All right.
340
+
341
+ 86
342
+ 00:04:26,110 --> 00:04:28,110
343
+ Either way it worked fine.
344
+
345
+ 87
346
+ 00:04:28,120 --> 00:04:34,690
347
+ And as an exercise for you guys if you're very if you're interested in seeing how this works his code
348
+
349
+ 88
350
+ 00:04:34,690 --> 00:04:35,510
351
+ is here.
352
+
353
+ 89
354
+ 00:04:35,630 --> 00:04:37,360
355
+ It's fairly well documented as well.
356
+
357
+ 90
358
+ 00:04:37,390 --> 00:04:42,460
359
+ So you can take a look and try doing some cool stuff with this. What we're doing now is using his pre-trained
360
+
361
+ 91
362
+ 00:04:42,460 --> 00:04:42,860
363
+ model.
364
+
365
+ 92
366
+ 00:04:42,890 --> 00:04:51,010
367
+ So basically I extracted the code from his project and put it here in the notebook.
368
+
369
+ 93
370
+ 00:04:51,010 --> 00:04:56,110
371
+ So let's step through this code quickly it's not that different than the previous code just some little
372
+
373
+ 94
374
+ 00:04:56,110 --> 00:04:57,810
375
+ things you should note.
376
+
377
+ 95
378
+ 00:04:57,880 --> 00:05:04,220
379
+ Firstly, this is how we load his model — we have to specify a hash when we load his model.
380
+
381
+ 96
382
+ 00:05:04,360 --> 00:05:08,530
383
+ So go back to our face detector results.
384
+
385
+ 97
386
+ 00:05:08,530 --> 00:05:09,950
387
+ This is what gives us our faces.
388
+
389
+ 98
390
+ 00:05:09,970 --> 00:05:14,220
391
+ And this is what displays our results actually.
392
+
393
+ 99
394
+ 00:05:14,470 --> 00:05:15,340
395
+ Let me just check something.
396
+
397
+ 100
398
+ 00:05:15,340 --> 00:05:17,610
399
+ I don't believe I'm using that function any more.
400
+
401
+ 101
402
+ 00:05:17,980 --> 00:05:20,420
403
+ And I am right I am not using it anymore.
404
+
405
+ 102
406
+ 00:05:20,430 --> 00:05:23,700
407
+ So let's move on.
408
+
409
+ 103
410
+ 00:05:23,760 --> 00:05:25,050
411
+ Totally unnecessary.
412
+
413
+ 104
414
+ 00:05:25,050 --> 00:05:26,400
415
+ All right.
416
+
417
+ 105
418
+ 00:05:26,400 --> 00:05:29,020
419
+ So now these are the model parameters here.
420
+
421
+ 106
422
+ 00:05:29,220 --> 00:05:29,900
423
+ OK.
424
+
425
+ 107
426
+ 00:05:30,270 --> 00:05:31,360
427
+ Just say no.
428
+
429
+ 108
430
+ 00:05:31,380 --> 00:05:34,040
431
+ I'll leave this alone — OK, depth, OK.
432
+
433
+ 109
434
+ 00:05:34,100 --> 00:05:39,230
435
+ Within are all the specific settings for loading his model.
436
+
437
+ 110
438
+ 00:05:39,270 --> 00:05:39,820
439
+ OK.
440
+
441
+ 111
442
+ 00:05:39,990 --> 00:05:41,180
443
+ His pre-trained weights.
444
+
445
+ 112
446
+ 00:05:41,190 --> 00:05:46,850
447
+ So we do that. We have his model, we loaded it, we got the weights and everything from his model, and
448
+
449
+ 113
450
+ 00:05:46,870 --> 00:05:50,770
451
+ it's in the HDF5 format.
452
+
453
+ 114
454
+ 00:05:51,080 --> 00:05:54,450
455
+ So now the setup stuff.
456
+
457
+ 115
458
+ 00:05:54,630 --> 00:05:55,820
459
+ We get his model.
460
+
461
+ 116
462
+ 00:05:55,920 --> 00:06:00,250
463
+ We get the weights that go with his model here and load them with model.load.
464
+
465
+ 117
466
+ 00:06:00,270 --> 00:06:02,600
467
+ So we get his model now officially.
468
+
469
+ 118
470
+ 00:06:02,600 --> 00:06:03,420
471
+ All right.
472
+
473
+ 119
474
+ 00:06:03,810 --> 00:06:06,500
475
+ So we initialize a webcam here.
476
+
477
+ 120
478
+ 00:06:06,510 --> 00:06:10,940
479
+ Standard stuff, as you've seen before — we get this.
480
+
481
+ 121
482
+ 00:06:11,070 --> 00:06:15,840
483
+ Now what I've done I've made some changes to this code because the previous code didn't tell you.
484
+
485
+ 122
486
+ 00:06:15,900 --> 00:06:21,780
487
+ But if a second person appeared in the frame, it would only show one — if you try it, it doesn't
488
+
489
+ 123
490
+ 00:06:21,780 --> 00:06:23,210
491
+ show more than one face.
492
+
493
+ 124
494
+ 00:06:23,360 --> 00:06:25,380
495
+ Or it may have crashed.
496
+
497
+ 125
498
+ 00:06:25,410 --> 00:06:27,380
499
+ I'm not even sure you can do it.
500
+
501
+ 126
502
+ 00:06:27,390 --> 00:06:28,360
503
+ Bring a friend.
504
+
505
+ 127
506
+ 00:06:28,730 --> 00:06:29,040
507
+ OK.
508
+
509
+ 128
510
+ 00:06:29,130 --> 00:06:31,250
511
+ I have no friends with me right now; the wife is out.
512
+
513
+ 129
514
+ 00:06:31,320 --> 00:06:32,370
515
+ So I can't test that.
516
+
517
+ 130
518
+ 00:06:32,640 --> 00:06:33,970
519
+ But that's OK.
520
+
521
+ 131
522
+ 00:06:33,990 --> 00:06:37,910
523
+ So this is how I actually got it to work with multiple faces.
524
+
525
+ 132
526
+ 00:06:37,920 --> 00:06:40,340
527
+ So we do this here.
528
+
529
+ 133
530
+ 00:06:41,130 --> 00:06:47,430
531
+ Basically we append the faces we extract here and then what we do we use his model that we loaded here
532
+
533
+ 134
534
+ 00:06:48,050 --> 00:06:51,200
535
+ to make a prediction and this prediction gives us two things.
536
+
537
+ 135
538
+ 00:06:51,220 --> 00:06:54,450
539
+ Gender first very male female.
540
+
541
+ 136
542
+ 00:06:54,540 --> 00:06:59,640
543
+ I know there are many agendas right now but for now is history in the male and female because the genders
544
+
545
+ 137
546
+ 00:07:00,070 --> 00:07:03,270
547
+ will still look like male or female either way.
548
+
549
+ 138
550
+ 00:07:03,270 --> 00:07:04,070
551
+ So that's fine.
552
+
553
+ 139
554
+ 00:07:04,340 --> 00:07:04,930
555
+ All right.
556
+
557
+ 140
558
+ 00:07:05,920 --> 00:07:08,600
559
+ And so we have predicted ages here as well.
560
+
561
+ 141
562
+ 00:07:08,890 --> 00:07:12,580
563
+ And this is all in a bit of a funny shape here.
564
+
565
+ 142
566
+ 00:07:12,820 --> 00:07:14,380
567
+ So he does some reshipping.
568
+
569
+ 143
570
+ 00:07:14,400 --> 00:07:14,730
571
+ All right.
572
+
573
+ 144
574
+ 00:07:14,770 --> 00:07:15,640
575
+ And flattens it.
576
+
577
+ 145
578
+ 00:07:15,680 --> 00:07:17,380
579
+ And then we get the results here.
580
+
581
+ 146
582
+ 00:07:17,380 --> 00:07:20,430
583
+ So we get the predicted ages and this is good.
584
+
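The two outputs mentioned here — gender first, then the predicted ages — can be decoded roughly as below. This follows a common convention in age/gender models like the one used in this video (a 2-way gender softmax and a 101-way age distribution whose expected value is the predicted age); the exact shapes and names are assumptions, not the author's code.

```python
import numpy as np

def decode_predictions(gender_probs, age_probs):
    """gender_probs: (n, 2) softmax rows; age_probs: (n, 101) softmax over ages 0-100."""
    genders = ["Female" if p[0] > 0.5 else "Male" for p in gender_probs]
    # Predicted age is the expected value of the age distribution.
    ages = age_probs.dot(np.arange(0, 101))
    return genders, ages
```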
585
+ 147
586
+ 00:07:20,520 --> 00:07:20,900
587
+ All right.
588
+
589
+ 148
590
+ 00:07:20,920 --> 00:07:21,470
591
+ So we get it.
592
+
593
+ 149
594
+ 00:07:21,470 --> 00:07:28,270
595
+ Now, what we're going to do, since we possibly have multiple faces in the frame, is
596
+
597
+ 150
598
+ 00:07:28,270 --> 00:07:34,850
599
+ that we go through the faces, go through the results, and display the results later on here.
600
+
601
+ 151
602
+ 00:07:35,380 --> 00:07:36,170
603
+ OK.
604
+
605
+ 152
606
+ 00:07:36,700 --> 00:07:38,250
607
+ And that is it.
608
+
609
+ 153
610
+ 00:07:38,260 --> 00:07:40,110
611
+ This is what it runs for:
612
+
613
+ 154
614
+ 00:07:40,120 --> 00:07:41,290
615
+ Age and gender.
616
+
617
+ 155
618
+ 00:07:41,620 --> 00:07:42,930
619
+ So let's give it a try.
620
+
621
+ 156
622
+ 00:07:47,660 --> 00:07:51,400
623
+ It's loading — probably the deep learning model.
624
+
625
+ 157
626
+ 00:07:51,410 --> 00:07:52,280
627
+ There we go.
628
+
629
+ 158
630
+ 00:07:52,310 --> 00:07:53,950
631
+ So that is pretty awesome.
632
+
633
+ 159
634
+ 00:07:54,260 --> 00:08:00,950
635
+ All right, what I'm going to do is bring up a picture of a friend's face and try to simulate this
636
+
637
+ 160
638
+ 00:08:01,010 --> 00:08:05,960
639
+ with another person or a pretend person we're going to use
640
+
641
+ 161
642
+ 00:08:10,180 --> 00:08:12,540
643
+ I guess this is actually harder than I thought.
644
+
645
+ 162
646
+ 00:08:17,740 --> 00:08:18,030
647
+ it.
648
+
649
+ 163
650
+ 00:08:18,120 --> 00:08:24,160
651
+ So my friend here she has some good face pictures.
652
+
653
+ 164
654
+ 00:08:24,270 --> 00:08:27,200
655
+ I hope she's OK with me using this right.
656
+
657
+ 165
658
+ 00:08:27,600 --> 00:08:30,700
659
+ OK so I'm going to bring this up here.
660
+
661
+ 166
662
+ 00:08:36,270 --> 00:08:36,560
663
+ OK.
664
+
665
+ 167
666
+ 00:08:36,600 --> 00:08:41,530
667
+ So this is me this is my friend is not detecting.
668
+
669
+ 168
670
+ 00:08:41,550 --> 00:08:42,200
671
+ Oh there we go.
672
+
673
+ 169
674
+ 00:08:43,220 --> 00:08:44,630
675
+ It thinks she's a guy.
676
+
677
+ 170
678
+ 00:08:44,680 --> 00:08:45,790
679
+ I have no idea.
680
+
681
+ 171
682
+ 00:08:45,820 --> 00:08:47,470
683
+ She doesn't need any of this.
684
+
685
+ 172
686
+ 00:08:47,660 --> 00:08:49,210
687
+ And so I think she's a guy.
688
+
689
+ 173
690
+ 00:08:49,350 --> 00:08:52,360
691
+ Now there's a problem here and I hope you can see it.
692
+
693
+ 174
694
+ 00:08:52,480 --> 00:08:55,630
695
+ And I did actually see it in the code and didn't mention it to you guys because I was like wondering
696
+
697
+ 175
698
+ 00:08:56,260 --> 00:08:59,690
699
+ why it doesn't work the way it's supposed to.
700
+
701
+ 176
702
+ 00:08:59,920 --> 00:09:06,790
703
+ And that's because exactly as I said it will detect the faces here but it only draws results for one
704
+
705
+ 177
706
+ 00:09:06,790 --> 00:09:09,760
707
+ prediction here, because it's not appending the results.
708
+
709
+ 178
710
+ 00:09:09,760 --> 00:09:12,700
711
+ In theory this is something I fix later on.
712
+
713
+ 179
714
+ 00:09:12,700 --> 00:09:15,240
715
+ By the way so for now though just be aware.
716
+
717
+ 180
718
+ 00:09:15,250 --> 00:09:21,800
719
+ that this age and gender detector technically only displays one face at a time, not two.
720
+
721
+ 181
722
+ 00:09:22,240 --> 00:09:26,020
723
+ And basically this is something I also noted here.
724
+
725
+ 182
726
+ 00:09:26,320 --> 00:09:27,850
727
+ If you get this error —
728
+
729
+ 183
730
+ 00:09:27,970 --> 00:09:30,900
731
+ this is the error you'll see here: faces not being detected.
732
+
733
+ 184
734
+ 00:09:31,120 --> 00:09:33,430
735
+ It's because your webcam has not been turned on.
736
+
737
+ 185
738
+ 00:09:33,430 --> 00:09:41,260
739
+ So basically, if this happens, just go to Devices, then Webcams, and tick it on.
740
+
741
+ 186
742
+ 00:09:41,320 --> 00:09:41,890
743
+ OK.
744
+
745
+ 187
746
+ 00:09:42,520 --> 00:09:43,250
747
+ So that is that.
748
+
749
+ 188
750
+ 00:09:43,270 --> 00:09:48,790
751
+ And if whatever reason this program crashes when you're messing with the code and the webcam isn't released
752
+
753
+ 189
754
+ 00:09:49,060 --> 00:09:55,120
755
+ that will happen quite often I'm sure just run this line separately to reclaim your webcam so that you
756
+
757
+ 190
758
+ 00:09:55,120 --> 00:09:58,220
759
+ can re-initialize it again later on.
760
+
761
+ 191
762
+ 00:09:58,240 --> 00:09:58,570
763
+ OK.
764
+
765
+ 192
766
+ 00:09:58,660 --> 00:10:03,410
767
+ So now, that was age and gender, in 18.3A.
768
+
769
+ 193
770
+ 00:10:03,640 --> 00:10:04,330
771
+ Now let's do.
772
+
773
+ 194
774
+ 00:10:04,330 --> 00:10:06,630
775
+ Age and gender with emotions.
776
+
777
+ 195
778
+ 00:10:06,640 --> 00:10:08,430
779
+ That's the cool part.
780
+
781
+ 196
782
+ 00:10:08,680 --> 00:10:10,800
783
+ That's the deep surveillance part.
784
+
785
+ 197
786
+ 00:10:10,810 --> 00:10:18,100
787
+ So again let's load all of this 10 seconds wasted of my life again.
788
+
789
+ 198
790
+ 00:10:19,810 --> 00:10:24,610
791
+ OK, that seemed faster this time — maybe because I complained.
792
+
793
+ 199
794
+ 00:10:25,130 --> 00:10:25,460
795
+ OK.
796
+
797
+ 200
798
+ 00:10:25,550 --> 00:10:30,290
799
+ So now, this is testing emotion, age, and gender using a webcam.
800
+
801
+ 201
802
+ 00:10:30,430 --> 00:10:37,350
803
+ Now I'm not going to go through this code in super detail because it is a bit messy.
804
+
805
+ 202
806
+ 00:10:37,880 --> 00:10:43,640
807
+ However what I'm going to tell you is that I have manipulated this code to support two faces.
808
+
809
+ 203
810
+ 00:10:43,640 --> 00:10:44,590
811
+ All right.
812
+
813
+ 204
814
+ 00:10:44,720 --> 00:10:46,800
815
+ So let's give this a go.
816
+
817
+ 205
818
+ 00:10:47,270 --> 00:10:49,880
819
+ Bring up my friend's Instagram picture again
820
+
821
+ 206
822
+ 00:10:54,990 --> 00:10:55,330
823
+ OK.
824
+
825
+ 207
826
+ 00:10:55,360 --> 00:10:57,490
827
+ So we got an error.
828
+
829
+ 208
830
+ 00:10:57,610 --> 00:10:59,050
831
+ Color is not defined.
832
+
833
+ 209
834
+ 00:10:59,210 --> 00:11:02,290
835
+ Like I'm going to pause this and see what is happening here.
836
+
837
+ 210
838
+ 00:11:02,770 --> 00:11:04,430
839
+ I know exactly what's happening.
840
+
841
+ 211
842
+ 00:11:04,480 --> 00:11:07,040
843
+ I was trying to do some color manipulation before.
844
+
845
+ 212
846
+ 00:11:07,260 --> 00:11:07,540
847
+ OK.
848
+
849
+ 213
850
+ 00:11:07,610 --> 00:11:12,140
851
+ BGR — let's put this back to green, OK.
852
+
853
+ 214
854
+ 00:11:12,240 --> 00:11:13,430
855
+ It is going to.
856
+
857
+ 215
858
+ 00:11:13,700 --> 00:11:18,780
859
+ Oh dammit — what happened is that the crashed run claimed my webcam.
860
+
861
+ 216
862
+ 00:11:18,940 --> 00:11:21,900
863
+ It's not going to get an image.
864
+
865
+ 217
866
+ 00:11:21,910 --> 00:11:27,850
867
+ So what we can do, as I mentioned before — and I did not put it in this file — is just put a
868
+
869
+ 218
870
+ 00:11:27,850 --> 00:11:34,690
871
+ cell below and reclaim my webcam so I can run this file again.
872
+
873
+ 219
874
+ 00:11:37,860 --> 00:11:38,720
875
+ I attempted it.
876
+
877
+ 220
878
+ 00:11:38,750 --> 00:11:39,300
879
+ There we go.
880
+
881
+ 221
882
+ 00:11:40,460 --> 00:11:42,150
883
+ So yeah I'm not actually sad.
884
+
885
+ 222
886
+ 00:11:42,260 --> 00:11:44,710
887
+ I don't know why it thinks I'm sad — now neutral.
888
+
889
+ 223
890
+ 00:11:45,060 --> 00:11:50,230
891
+ And bring up the multiple-face test again.
892
+
893
+ 224
894
+ 00:11:52,290 --> 00:11:55,380
895
+ At still in.
896
+
897
+ 225
898
+ 00:11:55,400 --> 00:11:58,780
899
+ Why? It's right in front of me.
900
+
901
+ 226
902
+ 00:12:00,900 --> 00:12:01,850
903
+ All right.
904
+
905
+ 227
906
+ 00:12:01,910 --> 00:12:03,040
907
+ Come on detect
908
+
909
+ 228
910
+ 00:12:12,320 --> 00:12:13,630
911
+ It got it for a brief second.
912
+
913
+ 229
914
+ 00:12:13,700 --> 00:12:15,680
915
+ But let me just bring this.
916
+
917
+ 230
918
+ 00:12:15,760 --> 00:12:18,800
919
+ This is much easier when you actually have a real person next to you.
920
+
921
+ 231
922
+ 00:12:21,690 --> 00:12:22,220
923
+ OK.
924
+
925
+ 232
926
+ 00:12:22,420 --> 00:12:24,430
927
+ I'm going to pause this and get a better picture.
928
+
929
+ 233
930
+ 00:12:24,430 --> 00:12:25,270
931
+ No offense.
932
+
933
+ 234
934
+ 00:12:25,430 --> 00:12:27,920
935
+ This picture has not been affected.
936
+
937
+ 235
938
+ 00:12:28,480 --> 00:12:28,850
939
+ OK.
940
+
941
+ 236
942
+ 00:12:28,990 --> 00:12:37,920
943
+ So hold on one second — I'll put the recording on fast forward for this bit while I just
944
+
945
+ 237
946
+ 00:12:37,920 --> 00:12:44,240
947
+ waste some time here.
948
+
949
+ 238
950
+ 00:12:44,410 --> 00:12:44,700
951
+ All right.
952
+
953
+ 239
954
+ 00:12:44,710 --> 00:12:52,290
955
+ So I found a picture of my wife to use, from a Facebook profile — maybe I should use stock images.
956
+
957
+ 240
958
+ 00:12:52,290 --> 00:12:55,610
959
+ It seems like these images don't have the best luck but we'll never know.
960
+
961
+ 241
962
+ 00:12:55,610 --> 00:12:55,820
963
+ All right.
964
+
965
+ 242
966
+ 00:12:55,820 --> 00:12:59,060
967
+ So let's put it in front here.
968
+
969
+ 243
970
+ 00:12:59,240 --> 00:13:08,560
971
+ I just saw it detect her face for a second — I can actually tell, because now I can see
972
+
973
+ 244
974
+ 00:13:08,560 --> 00:13:09,320
975
+ my screen.
976
+
977
+ 245
978
+ 00:13:10,210 --> 00:13:14,590
979
+ And I can see my face and it's not detecting my face.
980
+
981
+ 246
982
+ 00:13:14,590 --> 00:13:16,370
983
+ Maybe I should just bring this back slightly.
984
+
985
+ 247
986
+ 00:13:16,380 --> 00:13:17,320
987
+ There we go.
988
+
989
+ 248
990
+ 00:13:17,830 --> 00:13:18,230
991
+ OK.
992
+
993
+ 249
994
+ 00:13:18,370 --> 00:13:24,670
995
+ So I know my wife is not going to be happy with this, but it thinks she's a guy who is 24 years old.
996
+
997
+ 250
998
+ 00:13:24,940 --> 00:13:31,160
999
+ Probably because it's from a cell phone, and the scale is definitely going to be off because of it.
1000
+
1001
+ 251
1002
+ 00:13:31,450 --> 00:13:33,930
1003
+ It says I'm 28 when I'm 24.
1004
+
1005
+ 252
1006
+ 00:13:34,330 --> 00:13:35,400
1007
+ Damn you.
1008
+
1009
+ 253
1010
+ 00:13:35,500 --> 00:13:37,230
1011
+ Sorry — it stopped detecting it.
1012
+
1013
+ 254
1014
+ 00:13:37,570 --> 00:13:37,830
1015
+ OK.
1016
+
1017
+ 255
1018
+ 00:13:37,870 --> 00:13:40,620
1019
+ When it starts kind of to kind of going to hold.
1020
+
1021
+ 256
1022
+ 00:13:40,660 --> 00:13:45,940
1023
+ All right so what this means here is that this code works.
1024
+
1025
+ 257
1026
+ 00:13:45,940 --> 00:13:52,420
1027
+ It's not exactly 100 percent accurate with gender detection as we can see but that probably has to do
1028
+
1029
+ 258
1030
+ 00:13:52,420 --> 00:13:53,590
1031
+ with my little fake friends.
1032
+
1033
+ 259
1034
+ 00:13:53,590 --> 00:13:55,210
1035
+ I'm using my cell phone.
1036
+
1037
+ 260
1038
+ 00:13:55,240 --> 00:13:57,010
1039
+ These people are REAL by the way.
1040
+
1041
+ 261
1042
+ 00:13:57,380 --> 00:14:02,790
1043
+ But either way this should not happen that way but it works it works.
1044
+
1045
+ 262
1046
+ 00:14:02,800 --> 00:14:06,810
1047
+ It's good that is directing multiple pictures multiple people.
1048
+
1049
+ 263
1050
+ 00:14:06,820 --> 00:14:07,120
1051
+ All right.
1052
+
1053
+ 264
1054
+ 00:14:07,120 --> 00:14:09,500
1055
+ So this is good.
1056
+
1057
+ 265
1058
+ 00:14:09,550 --> 00:14:13,680
1059
+ Let's move on to the next one — images. So let's run this.
1060
+
1061
+ 266
1062
+ 00:14:19,570 --> 00:14:23,060
1063
+ OK so now let's run this test on images.
1064
+
1065
+ 267
1066
+ 00:14:23,070 --> 00:14:23,560
1067
+ OK.
1068
+
1069
+ 268
1070
+ 00:14:23,560 --> 00:14:27,300
1071
+ It did not load anything — my bad.
1072
+
1073
+ 269
1074
+ 00:14:30,260 --> 00:14:31,460
1075
+ not that or
1076
+
1077
+ 270
1078
+ 00:14:38,140 --> 00:14:38,620
1079
+ OK.
1080
+
1081
+ 271
1082
+ 00:14:38,630 --> 00:14:44,480
1083
+ So let's test this on some test images and you can place those images in the images folder.
1084
+
1085
+ 272
1086
+ 00:14:44,930 --> 00:14:45,550
1087
+ I'll show you here.
1088
+
1089
+ 273
1090
+ 00:14:45,590 --> 00:14:45,910
1091
+ OK.
1092
+
1093
+ 274
1094
+ 00:14:45,980 --> 00:14:49,840
1095
+ So let's look at Donald Trump oh this is quite funny.
1096
+
1097
+ 275
1098
+ 00:14:49,950 --> 00:14:57,440
1099
+ It thinks he's a female, 59 — he looks much older in my opinion, but fifty-nine. And this is me again.
1100
+
1101
+ 276
1102
+ 00:14:57,460 --> 00:15:01,050
1103
+ It added some extra years to me, because I'm actually 24.
1104
+
1105
+ 277
1106
+ 00:15:01,140 --> 00:15:02,310
1107
+ But fair enough.
1108
+
1109
+ 278
1110
+ 00:15:03,940 --> 00:15:04,330
1111
+ All right.
1112
+
1113
+ 279
1114
+ 00:15:04,330 --> 00:15:06,560
1115
+ Queen Elizabeth and something funny is happening.
1116
+
1117
+ 280
1118
+ 00:15:06,640 --> 00:15:10,930
1119
+ It actually thinks this bouquet of roses is a guy — that threw it off.
1120
+
1121
+ 281
1122
+ 00:15:11,170 --> 00:15:16,360
1123
+ Clearly this is an example of it not working too well.
1124
+
1125
+ 282
1126
+ 00:15:16,840 --> 00:15:17,710
1127
+ But there's a reason for that.
1128
+
1129
+ 283
1130
+ 00:15:17,720 --> 00:15:18,870
1131
+ I'll tell you afterwards.
1132
+
1133
+ 284
1134
+ 00:15:18,910 --> 00:15:19,800
1135
+ OK.
1136
+
1137
+ 285
1138
+ 00:15:20,470 --> 00:15:21,270
1139
+ Barack Obama.
1140
+
1141
+ 286
1142
+ 00:15:21,460 --> 00:15:24,260
1143
+ Male, angry — not the best.
1144
+
1145
+ 287
1146
+ 00:15:24,400 --> 00:15:25,480
1147
+ Is my wife.
1148
+
1149
+ 288
1150
+ 00:15:25,480 --> 00:15:30,870
1151
+ Female — the age prediction is actually quite good, and she was definitely happy here.
1152
+
1153
+ 289
1154
+ 00:15:31,100 --> 00:15:31,590
1155
+ All right.
1156
+
1157
+ 290
1158
+ 00:15:31,660 --> 00:15:35,350
1159
+ Actually, the two pictures were taken two days apart.
1160
+
1161
+ 291
1162
+ 00:15:35,380 --> 00:15:37,860
1163
+ So she aged a year in those two days.
1164
+
1165
+ 292
1166
+ 00:15:38,260 --> 00:15:38,560
1167
+ OK.
1168
+
1169
+ 293
1170
+ 00:15:38,590 --> 00:15:40,040
1171
+ So that's cool.
1172
+
1173
+ 294
1174
+ 00:15:40,240 --> 00:15:41,750
1175
+ Now I said I'll tell you why.
1176
+
1177
+ 295
1178
+ 00:15:42,010 --> 00:15:49,350
1179
+ That's because when we were doing this here, the Haar cascade classifiers don't crop enough
1180
+
1181
+ 296
1182
+ 00:15:49,350 --> 00:15:55,090
1183
+ of the face out, even with the default settings here, which I removed — you may have seen it in
1184
+
1185
+ 297
1186
+ 00:15:55,100 --> 00:16:01,710
1187
+ the code, because I just cut and pasted some of this from the videos before, where we actually
1188
+
1189
+ 298
1190
+ 00:16:01,710 --> 00:16:04,070
1191
+ had some cropping being done.
1192
+
1193
+ 299
1194
+ 00:16:04,140 --> 00:16:07,080
1195
+ And I took it out because it was less accurate.
1196
+
1197
+ 300
1198
+ 00:16:07,410 --> 00:16:10,350
1199
+ But anyway, now dlib's face detection.
1200
+
1201
+ 301
1202
+ 00:16:10,350 --> 00:16:12,380
1203
+ Let's give this a try.
1204
+
1205
+ 302
1206
+ 00:16:12,390 --> 00:16:13,470
1207
+ Let's quickly run this
1208
+
1209
+ 303
1210
+ 00:16:17,750 --> 00:16:19,830
1211
+ and I'll tell you about dlib while it's starting up in the background here.
1212
+
1213
+ 304
1214
+ 00:16:19,850 --> 00:16:21,180
1215
+ I'll tell you about dlib.
1216
+
1217
+ 305
1218
+ 00:16:21,220 --> 00:16:27,460
1219
+ It's basically a machine learning package that was built in C++ and you can use it in Python
1220
+
1221
+ 306
1222
+ 00:16:27,800 --> 00:16:29,210
1223
+ and it does a bunch of cool stuff.
1224
+
1225
+ 307
1226
+ 00:16:29,210 --> 00:16:34,430
1227
+ And what I'm using it here for is face detection, but it actually can do facial recognition.
1228
+
1229
+ 308
1230
+ 00:16:34,460 --> 00:16:39,480
1231
+ In my OpenCV course in the past. I haven't included it in this course.
1232
+
1233
+ 309
1234
+ 00:16:39,590 --> 00:16:47,170
1235
+ You actually do use landmark recognition on faces for some cool projects like yawn detection and face
1236
+
1237
+ 310
1238
+ 00:16:47,540 --> 00:16:50,400
1239
+ swaps as well, more advanced face swaps.
1240
+
1241
+ 311
1242
+ 00:16:50,390 --> 00:16:50,680
1243
+ All right.
1244
+
1245
+ 312
1246
+ 00:16:50,740 --> 00:16:53,660
1247
+ So I'm sorry, I restarted.
1248
+
1249
+ 313
1250
+ 00:16:53,780 --> 00:16:55,690
1251
+ We are using dlib on images here.
1252
+
1253
+ 314
1254
+ 00:16:55,700 --> 00:17:00,110
1255
+ I really should label this. I think in the next section we use dlib on the webcam.
1256
+
1257
+ 315
1258
+ 00:17:00,140 --> 00:17:01,120
1259
+ Yes.
1260
+
1261
+ 316
1262
+ 00:17:01,250 --> 00:17:01,970
1263
+ OK.
1264
+
1265
+ 317
1266
+ 00:17:02,120 --> 00:17:02,980
1267
+ So again.
1268
+
1269
+ 318
1270
+ 00:17:03,100 --> 00:17:04,280
1271
+ So it actually is.
1272
+
1273
+ 319
1274
+ 00:17:04,340 --> 00:17:08,820
1275
+ Now Trump, he actually aged two years less. Fair enough to me.
1276
+
1277
+ 320
1278
+ 00:17:08,980 --> 00:17:09,420
1279
+ Oh.
1280
+
1281
+ 321
1282
+ 00:17:09,430 --> 00:17:10,790
1283
+ You know when you're younger.
1284
+
1285
+ 322
1286
+ 00:17:10,790 --> 00:17:11,900
1287
+ That's good.
1288
+
1289
+ 323
1290
+ 00:17:12,830 --> 00:17:14,130
1291
+ Fifty-four, female. OK.
1292
+
1293
+ 324
1294
+ 00:17:14,200 --> 00:17:14,980
1295
+ Fair enough.
1296
+
1297
+ 325
1298
+ 00:17:14,980 --> 00:17:16,420
1299
+ She has a lot.
1300
+
1301
+ 326
1302
+ 00:17:16,610 --> 00:17:18,030
1303
+ In that picture.
1304
+
1305
+ 327
1306
+ 00:17:18,580 --> 00:17:20,200
1307
+ That's fine.
1308
+
1309
+ 328
1310
+ 00:17:20,290 --> 00:17:21,500
1311
+ Totally correct.
1312
+
1313
+ 329
1314
+ 00:17:21,970 --> 00:17:27,220
1315
+ Barack Obama, 45. I think he was in his late 40s in this picture, so it may be very accurate; actually it's
1316
+
1317
+ 330
1318
+ 00:17:27,220 --> 00:17:28,760
1319
+ quite close.
1320
+
1321
+ 331
1322
+ 00:17:29,180 --> 00:17:35,590
1323
+ And my wife, she would not be happy with that age, but it's fair enough again because she was thirty in
1324
+
1325
+ 332
1326
+ 00:17:35,590 --> 00:17:36,670
1327
+ those pictures.
1328
+
1329
+ 333
1330
+ 00:17:37,000 --> 00:17:37,950
1331
+ OK but fair enough.
1332
+
1333
+ 334
1334
+ 00:17:37,950 --> 00:17:39,860
1335
+ So you can experiment with some different things.
1336
+
1337
+ 335
1338
+ 00:17:39,860 --> 00:17:41,500
1339
+ You know it's not going to be perfect.
1340
+
1341
+ 336
1342
+ 00:17:41,500 --> 00:17:45,440
1343
+ Age is actually a very hard thing to guess.
1344
+
1345
+ 337
1346
+ 00:17:45,460 --> 00:17:52,870
1347
+ I myself sometimes have thought someone was maybe 25 or 26 and I learned later they were
1348
+
1349
+ 338
1350
+ 00:17:52,870 --> 00:17:59,130
1351
+ like 50, which was embarrassing, but they were very happy about my mistake.
1352
+
1353
+ 339
1354
+ 00:17:59,140 --> 00:18:04,170
1355
+ So anyway, we can run this with dlib using the webcam, so let's run this.
1356
+
1357
+ 340
1358
+ 00:18:04,300 --> 00:18:09,660
1359
+ And I've changed my T-shirt, by the way, back to the one with the stain, unfortunately.
1360
+
1361
+ 341
1362
+ 00:18:09,910 --> 00:18:12,340
1363
+ Hope it isn't picked up in this camera.
1364
+
1365
+ 342
1366
+ 00:18:12,340 --> 00:18:14,070
1367
+ No it does not.
1368
+
1369
+ 343
1370
+ 00:18:14,380 --> 00:18:15,620
1371
+ And the bedroom doesn't open.
1372
+
1373
+ 344
1374
+ 00:18:15,640 --> 00:18:16,660
1375
+ Yeah it's fine.
1376
+
1377
+ 345
1378
+ 00:18:16,960 --> 00:18:18,280
1379
+ So again this is quite cool.
1380
+
1381
+ 346
1382
+ 00:18:18,280 --> 00:18:22,570
1383
+ You can definitely see dlib is a bit slower here, OK.
1384
+
1385
+ 347
1386
+ 00:18:22,940 --> 00:18:27,290
1387
+ And I literally did my hair like five or ten seconds before.
1388
+
1389
+ 348
1390
+ 00:18:27,610 --> 00:18:27,970
1391
+ OK.
1392
+
1393
+ 349
1394
+ 00:18:28,120 --> 00:18:29,820
1395
+ So close.
1396
+
1397
+ 350
1398
+ 00:18:30,340 --> 00:18:35,020
1399
+ So the advantage of dlib though is that it is better at picking up faces.
1400
+
1401
+ 351
1402
+ 00:18:35,050 --> 00:18:40,600
1403
+ One thing you should have noted here is that it didn't mistake Queen Elizabeth's bouquet for a
1404
+
1405
+ 352
1406
+ 00:18:40,600 --> 00:18:43,610
1407
+ face like the Haar cascade classifier did.
1408
+
1409
+ 353
1410
+ 00:18:43,870 --> 00:18:47,170
1411
+ So you can generally see that it's better and more robust.
1412
+
1413
+ 354
1414
+ 00:18:47,180 --> 00:18:53,650
1415
+ Remember, the Haar cascade classifier thought this was a face and that it was a guy, whereas I would think it
1416
+
1417
+ 355
1418
+ 00:18:53,640 --> 00:18:54,240
1419
+ was a female.
1420
+
1421
+ 356
1422
+ 00:18:54,310 --> 00:18:57,660
1423
+ Either way, if it was a face, that is.
1424
+
1425
+ 357
1426
+ 00:18:58,270 --> 00:19:00,340
1427
+ But that's fine.
1428
+
1429
+ 358
1430
+ 00:19:00,340 --> 00:19:04,650
1431
+ So what I'm saying is Haar cascade classifiers are definitely faster.
1432
+
1433
+ 359
1434
+ 00:19:04,920 --> 00:19:09,850
1435
+ So if speed is your concern, or your hardware is an embedded system, you can use Haar cascade classifiers
1436
+
1437
+ 360
1438
+ 00:19:10,300 --> 00:19:16,920
1439
+ and maybe tweak some of the parameters, like the scaling parameter especially; this definitely helps.
1440
+
1441
+ 361
1442
+ 00:19:16,960 --> 00:19:23,770
1443
+ But these settings are generally well understood as the gold standard with Haar cascades, so it's not like this is
1444
+
1445
+ 362
1446
+ 00:19:23,780 --> 00:19:27,330
1447
+ the best depending on your application.
1448
+
1449
+ 363
1450
+ 00:19:27,350 --> 00:19:29,020
1451
+ But generally this is the best.
1452
+
1453
+ 364
1454
+ 00:19:29,500 --> 00:19:35,590
1455
+ So dlib may be more suited for some applications, especially if accuracy is of importance.
1456
+
1457
+ 365
1458
+ 00:19:35,590 --> 00:19:37,540
1459
+ OK so that's it for this chapter.
1460
+
1461
+ 366
1462
+ 00:19:37,540 --> 00:19:42,160
1463
+ I hope you enjoyed it and I hope you build something cool out of this and I hope you expand upon it
1464
+
1465
+ 367
1466
+ 00:19:42,160 --> 00:19:42,490
1467
+ too.
1468
+
1469
+ 368
1470
+ 00:19:42,510 --> 00:19:47,980
1471
+ You can actually take this code now and retrain these models or add in some new stuff as
1472
+
1473
+ 369
1474
+ 00:19:47,980 --> 00:19:48,370
1475
+ well.
1476
+
1477
+ 370
1478
+ 00:19:48,490 --> 00:19:50,010
1479
+ And ethnicity as well.
1480
+
1481
+ 371
1482
+ 00:19:50,020 --> 00:19:57,380
1483
+ You can actually now take some of these datasets here, like this one for one, and maybe, if you
1484
+
1485
+ 372
1486
+ 00:19:57,380 --> 00:20:02,410
1487
+ can find a labeler, you can manually label the gender from this as well to get even more accurate
1488
+
1489
+ 373
1490
+ 00:20:02,470 --> 00:20:04,060
1491
+ gender recognition.
1492
+
1493
+ 374
1494
+ 00:20:04,060 --> 00:20:05,310
1495
+ So this is pretty cool.
1496
+
1497
+ 375
1498
+ 00:20:05,350 --> 00:20:12,850
1499
+ And the challenge in bringing both of these together, which I probably failed to mention, is that
1500
+
1501
+ 376
1502
+ 00:20:13,530 --> 00:20:16,260
1503
+ they're both taking in different sized faces, to boot.
1504
+
1505
+ 377
1506
+ 00:20:16,270 --> 00:20:18,940
1507
+ One is taking in a color image.
1508
+
1509
+ 378
1510
+ 00:20:19,120 --> 00:20:20,790
1511
+ That's the age and gender one.
1512
+
1513
+ 379
1514
+ 00:20:20,820 --> 00:20:26,730
1515
+ It's taking in a color image that's 64 by 64, whereas the other one, which is our emotion detector, is taking
1516
+
1517
+ 380
1518
+ 00:20:26,800 --> 00:20:30,410
1519
+ in a smaller, grayscale image.
1520
+
1521
+ 381
1522
+ 00:20:30,460 --> 00:20:34,780
1523
+ So you have to do a little more processing here in our pipeline.
1524
+
1525
+ 382
1526
+ 00:20:35,020 --> 00:20:41,180
1527
+ And basically then you have to get the results and make sure they're lined up like the same face is
1528
+
1529
+ 383
1530
+ 00:20:41,200 --> 00:20:47,590
1531
+ tied together with its age, gender and emotion, which is fairly easy anyway, and then placing the text correctly
1532
+
1533
+ 384
1534
+ 00:20:47,590 --> 00:20:51,480
1535
+ and then making sure the labels follow the face around the image.
1536
+
1537
+ 385
1538
+ 00:20:51,520 --> 00:20:54,390
1539
+ So it was a fun piece of code to build.
1540
+
1541
+ 386
1542
+ 00:20:54,400 --> 00:20:55,270
1543
+ To be fair.
1544
+
1545
+ 387
1546
+ 00:20:55,470 --> 00:20:55,760
1547
+ But.
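The pipeline step the transcript describes — one detected face crop feeding two models with different input formats — can be sketched in plain NumPy. This is a minimal illustration, not the course's code: the 64x64x3 color input is stated in the lecture, while the 48x48 grayscale size and all helper names are assumptions.

```python
import numpy as np

def resize_nn(img, size):
    """Nearest-neighbour resize of an HxWxC image to size x size (illustrative)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def to_gray(img):
    """Luma-weighted grayscale conversion, keeping a single channel axis."""
    return (img[..., :3] @ np.array([0.299, 0.587, 0.114]))[..., None]

# stand-in for one face crop returned by the detector (size is arbitrary)
face = np.random.randint(0, 256, (120, 96, 3), dtype=np.uint8)

age_gender_input = resize_nn(face, 64) / 255.0        # 64x64x3 colour, normalised
emotion_input = to_gray(resize_nn(face, 48)) / 255.0  # assumed 48x48x1 grayscale
print(age_gender_input.shape, emotion_input.shape)    # (64, 64, 3) (48, 48, 1)
```

Both inputs come from the same crop, which is what keeps each face's age, gender and emotion predictions tied together before the labels are drawn back onto the frame.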
18. Facial Applications - Emotion, Age & Gender Recognition/3.1 Download weights file.html ADDED
@@ -0,0 +1 @@
 
 
1
+ <script type="text/javascript">window.location = "https://drive.google.com/file/d/17kyPQfUyk2un-d-XYFT54R0q9vvSxbuJ/view?usp=sharing";</script>
18. Facial Applications - Emotion, Age & Gender Recognition/3.2 Code and files required for project.html ADDED
@@ -0,0 +1 @@
 
 
1
+ <script type="text/javascript">window.location = "https://drive.google.com/file/d/1lI_gZM9QuxjyRIKGvOm63d1td5lgRz4i/view?usp=sharing";</script>
19. Medical Imaging - Image Segmentation with U-Net/1. Chapter Overview on Image Segmentation & Medical Imaging in U-Net.srt ADDED
@@ -0,0 +1,31 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 1
2
+ 00:00:00,400 --> 00:00:00,750
3
+ OK.
4
+
5
+ 2
6
+ 00:00:00,810 --> 00:00:06,330
7
+ So let's move on to image segmentation and I'll talk a bit about medical imaging and U-Net, which
8
+
9
+ 3
10
+ 00:00:06,330 --> 00:00:10,000
11
+ is a very cool CNN that does image segmentation.
12
+
13
+ 4
14
+ 00:00:10,060 --> 00:00:15,560
15
+ So this section is built up into four parts. Firstly, we discuss what exactly segmentation is.
16
+
17
+ 5
18
+ 00:00:15,820 --> 00:00:19,300
19
+ And I provide some examples of applications in medical imaging.
20
+
21
+ 6
22
+ 00:00:19,480 --> 00:00:25,720
23
+ Then we start talking about U-Net and how it applies to image segmentation and CNNs. Then I define some
24
+
25
+ 7
26
+ 00:00:25,720 --> 00:00:30,310
27
+ metrics we are going to need to know about, which is the intersection over union metric, and then we do
28
+
29
+ 8
30
+ 00:00:30,310 --> 00:00:35,580
31
+ a final project in this chapter where we find the nuclei in divergent images.
19. Medical Imaging - Image Segmentation with U-Net/2. What is Segmentation And Applications in Medical Imaging.srt ADDED
@@ -0,0 +1,215 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 1
2
+ 00:00:00,960 --> 00:00:07,350
3
+ Hi and welcome to chapter or section nineteen point one where we talk about segmentation and its applications
4
+
5
+ 2
6
+ 00:00:07,350 --> 00:00:09,450
7
+ in medical imaging.
8
+
9
+ 3
10
+ 00:00:09,520 --> 00:00:12,280
11
+ So what exactly is image segmentation.
12
+
13
+ 4
14
+ 00:00:12,280 --> 00:00:18,040
15
+ So the goal of segmentation is to separate different parts of an image into sensible coherent parts.
16
+
17
+ 5
18
+ 00:00:18,040 --> 00:00:19,740
19
+ And what do I mean by that.
20
+
21
+ 6
22
+ 00:00:19,810 --> 00:00:22,500
23
+ Basically you see this cat and this dog.
24
+
25
+ 7
26
+ 00:00:22,540 --> 00:00:26,490
27
+ And then there's a background; definitely in the back there's a bed or couch.
28
+
29
+ 8
30
+ 00:00:26,680 --> 00:00:28,600
31
+ And then there are some kittens in the back.
32
+
33
+ 9
34
+ 00:00:28,600 --> 00:00:34,400
35
+ What we want now, actually what we want, is basically to do pixel level predictions.
36
+
37
+ 10
38
+ 00:00:34,480 --> 00:00:39,640
39
+ So we want to actually know exactly what pixels here belong to the dog what belongs to the cat and what
40
+
41
+ 11
42
+ 00:00:39,640 --> 00:00:41,510
43
+ belong to the objects in the background.
44
+
45
+ 12
46
+ 00:00:41,860 --> 00:00:45,620
47
+ So as you can see it's a pretty challenging task because before.
48
+
49
+ 13
50
+ 00:00:45,850 --> 00:00:50,140
51
+ Well what we were doing was basically a prediction of the entire image.
52
+
53
+ 14
54
+ 00:00:50,140 --> 00:00:55,360
55
+ And basically if we fed this to a classifier, it would probably give a high probability of both the cat and the
56
+
57
+ 15
58
+ 00:00:55,360 --> 00:00:57,080
59
+ dog being in the picture.
60
+
61
+ 16
62
+ 00:00:57,430 --> 00:01:00,540
63
+ But now what we want to do is actually segment the image now.
64
+
65
+ 17
66
+ 00:01:00,610 --> 00:01:02,400
67
+ So how do we go about doing this.
68
+
69
+ 18
70
+ 00:01:02,530 --> 00:01:06,800
71
+ And before I even explain that, let's talk about two types of segmentation.
72
+
73
+ 19
74
+ 00:01:08,510 --> 00:01:14,510
75
+ So the first type is called semantic segmentation and basically this is just pixel level predictions
76
+
77
+ 20
78
+ 00:01:15,260 --> 00:01:16,870
79
+ based on defined classes.
80
+
81
+ 21
82
+ 00:01:16,880 --> 00:01:19,740
83
+ So we have, for example, roads, persons, cars and trees.
84
+
85
+ 22
86
+ 00:01:19,910 --> 00:01:26,720
87
+ So it's simple enough to understand not simple to do but we can pretty much see how it's done here and
88
+
89
+ 23
90
+ 00:01:26,720 --> 00:01:32,900
91
+ you can imagine this has a lot of application in self-driving cars because now you need to know what
92
+
93
+ 24
94
+ 00:01:32,900 --> 00:01:37,470
95
+ is a road, what is a building, what are lamp poles, people, those sorts of things.
96
+
97
+ 25
98
+ 00:01:37,730 --> 00:01:40,200
99
+ So it's going to be quite useful for that application.
100
+
101
+ 26
102
+ 00:01:41,730 --> 00:01:42,420
103
+ Type 2.
104
+
105
+ 27
106
+ 00:01:42,460 --> 00:01:44,610
107
+ So now we're doing two different things.
108
+
109
+ 28
110
+ 00:01:44,640 --> 00:01:49,890
111
+ We're doing pixel level predictions which is exactly what we did before but now we're actually doing
112
+
113
+ 29
114
+ 00:01:50,010 --> 00:01:54,340
115
+ object detection and actually object identification as well.
116
+
117
+ 30
118
+ 00:01:54,480 --> 00:01:57,290
119
+ So we know person 1 and 2, car 1 and 2.
120
+
121
+ 31
122
+ 00:01:57,330 --> 00:02:03,050
123
+ So this is a more advanced level of segmentation.
124
+
125
+ 32
126
+ 00:02:03,120 --> 00:02:05,870
127
+ So let's talk a bit about applications and medical imaging.
128
+
129
+ 33
130
+ 00:02:06,090 --> 00:02:12,630
131
+ So as you know a lot of medical imaging necessitates finding and accurately labeling basically things
132
+
133
+ 34
134
+ 00:02:12,630 --> 00:02:19,080
135
+ we find in these scans because if you've seen a lot of these medical imaging scans are very very hard
136
+
137
+ 35
138
+ 00:02:19,080 --> 00:02:19,720
139
+ to interpret.
140
+
141
+ 36
142
+ 00:02:19,740 --> 00:02:24,870
143
+ And you need some very skilled professionals analyzing and accurately assessing what they see
144
+
145
+ 37
146
+ 00:02:24,990 --> 00:02:31,340
147
+ in those pictures generated from the different scans, and basically often in this task
148
+
149
+ 38
150
+ 00:02:31,370 --> 00:02:34,810
151
+ there is a lot of advanced software being used in these tasks.
152
+
153
+ 39
154
+ 00:02:35,010 --> 00:02:40,920
155
+ However they still require a human to actually go through it and maybe label things properly so that
156
+
157
+ 40
158
+ 00:02:41,010 --> 00:02:42,770
159
+ you know the machine isn't alone.
160
+
161
+ 41
162
+ 00:02:42,780 --> 00:02:43,360
163
+ All right.
164
+
165
+ 42
166
+ 00:02:43,410 --> 00:02:48,420
167
+ However this is definitely a task where you can improve things, because humans are definitely
168
+
169
+ 43
170
+ 00:02:48,420 --> 00:02:51,750
171
+ prone to error, and there are a number of applications in medical imaging
172
+
173
+ 44
174
+ 00:02:51,820 --> 00:02:58,380
175
+ that are being done by convolutional neural nets and other advanced neural nets.
176
+
177
+ 45
178
+ 00:02:58,620 --> 00:03:04,910
179
+ So as I said, there is a huge initiative to use computer vision and deep learning to automate many of these tasks.
180
+
181
+ 46
182
+ 00:03:04,980 --> 00:03:11,040
183
+ So there are a lot of tasks that can be improved with computer vision, not just here but surgery, which is actually
184
+
185
+ 47
186
+ 00:03:11,040 --> 00:03:15,230
187
+ going to be a main application in the future for computer vision.
188
+
189
+ 48
190
+ 00:03:15,240 --> 00:03:21,450
191
+ However right now the trend seems to be in a lot of these medical scans things like CAT scans X-rays
192
+
193
+ 49
194
+ 00:03:21,600 --> 00:03:25,060
195
+ ultrasounds, PET scans and MRIs.
196
+
197
+ 50
198
+ 00:03:25,410 --> 00:03:29,590
199
+ And there are so many different types of diseases to look for.
200
+
201
+ 51
202
+ 00:03:29,610 --> 00:03:37,290
203
+ This is an ideal area for startups to take advantage of, and the use cases for this are endless, from cancer
204
+
205
+ 52
206
+ 00:03:37,290 --> 00:03:42,510
207
+ detection disease monitoring Alzheimer's and many many other ailments.
208
+
209
+ 53
210
+ 00:03:42,540 --> 00:03:48,270
211
+ So computer vision can definitely revolutionize the medical industry and improve patient care and get
212
+
213
+ 54
214
+ 00:03:48,270 --> 00:03:52,460
215
+ much faster diagnostics and even be used to find cures much faster.
19. Medical Imaging - Image Segmentation with U-Net/3. U-Net Image Segmentation with CNNs.srt ADDED
@@ -0,0 +1,203 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 1
2
+ 00:00:00,370 --> 00:00:00,930
3
+ OK.
4
+
5
+ 2
6
+ 00:00:00,960 --> 00:00:02,840
7
+ So let's talk about U-Net.
8
+
9
+ 3
10
+ 00:00:02,940 --> 00:00:06,060
11
+ U-Net is a CNN that can actually do image segmentation.
12
+
13
+ 4
14
+ 00:00:06,060 --> 00:00:08,500
15
+ So let's see how it works.
16
+
17
+ 5
18
+ 00:00:08,550 --> 00:00:15,570
19
+ U-Net was created in 2015 and it basically was a CNN specifically developed for the biomedical image
20
+
21
+ 6
22
+ 00:00:15,630 --> 00:00:20,290
23
+ segmentation task which is what we're going to use it for in our project at the end of this chapter.
24
+
25
+ 7
26
+ 00:00:20,580 --> 00:00:26,140
27
+ It has now become very popular for end-to-end encoder-decoder type networks for semantic segmentation.
28
+
29
+ 8
30
+ 00:00:26,340 --> 00:00:28,150
31
+ And it has a very unique architecture.
32
+
33
+ 9
34
+ 00:00:28,220 --> 00:00:33,570
35
+ called an up-down architecture, which has a contracting part and an expansive part, and you'll see it in the
36
+
37
+ 10
38
+ 00:00:33,570 --> 00:00:39,120
39
+ diagram here where you can see this is how it starts the input image comes in and basically it goes
40
+
41
+ 11
42
+ 00:00:39,120 --> 00:00:41,030
43
+ down to the contracting part here.
44
+
45
+ 12
46
+ 00:00:41,460 --> 00:00:47,490
47
+ And then there's this middle area here that's called the bottleneck, leading to the expansive part.
48
+
49
+ 13
50
+ 00:00:47,490 --> 00:00:47,990
51
+ OK.
52
+
53
+ 14
54
+ 00:00:48,420 --> 00:00:50,110
55
+ And basically the opposite here.
56
+
57
+ 15
58
+ 00:00:50,430 --> 00:00:53,140
59
+ So first things first, take a look at this.
60
+
61
+ 16
62
+ 00:00:53,280 --> 00:00:59,760
63
+ We have an input image here and it's outputting an image or basically a segmentation map that's effectively
64
+
65
+ 17
66
+ 00:00:59,760 --> 00:01:03,270
67
+ an image because we're going to use it to segment the image.
68
+
69
+ 18
70
+ 00:01:03,300 --> 00:01:09,430
71
+ So this is the model labeled here: the contracting part and the expansive part.
72
+
73
+ 19
74
+ 00:01:09,520 --> 00:01:14,880
75
+ So what I wanted to talk to you about was this bottleneck-looking area here.
76
+
77
+ 20
78
+ 00:01:15,080 --> 00:01:15,710
79
+ OK.
80
+
81
+ 21
82
+ 00:01:16,040 --> 00:01:22,090
83
+ So this is how the U-Net structure works: a downsample, a bottleneck and then an upsample. The downsampling
84
+
85
+ 22
86
+ 00:01:22,130 --> 00:01:24,660
87
+ part of U-Net consists of four blocks.
88
+
89
+ 23
90
+ 00:01:24,690 --> 00:01:30,120
91
+ There are these three by three convolutional layers with ReLU activation, batch normalization and dropout.
92
+
93
+ 24
94
+ 00:01:30,170 --> 00:01:32,600
95
+ And this is basically four layers here.
96
+
97
+ 25
98
+ 00:01:32,840 --> 00:01:39,440
99
+ So there are these two convolutional layers here, two by two max pooling, and then the feature maps double as we go down
100
+
101
+ 26
102
+ 00:01:40,100 --> 00:01:41,620
103
+ which is not unusual.
104
+
105
+ 27
106
+ 00:01:41,630 --> 00:01:48,370
107
+ We've seen it happen in VGG as well, starting at 64 and then going to 128, 256 and 512.
108
+
109
+ 28
110
+ 00:01:48,470 --> 00:01:55,450
111
+ If you go back to the diagram you can see it here: 64, 128, 256 and 512, then it's bottlenecked here.
112
+
113
+ 29
114
+ 00:01:55,580 --> 00:01:57,740
115
+ And then we basically do some upsampling,
116
+
117
+ 30
118
+ 00:01:57,740 --> 00:02:03,540
119
+ again going back up here to get the output segmentation map. So the bottleneck.
120
+
121
+ 31
122
+ 00:02:03,540 --> 00:02:05,020
123
+ Let's talk a bit about this.
124
+
125
+ 32
126
+ 00:02:05,030 --> 00:02:09,680
127
+ This consists of two convolutional layers, again with batch normalization and dropout.
128
+
129
+ 33
130
+ 00:02:09,740 --> 00:02:10,290
131
+ OK.
132
+
133
+ 34
134
+ 00:02:10,580 --> 00:02:13,740
135
+ Nothing majorly special here just how it works.
136
+
137
+ 35
138
+ 00:02:13,750 --> 00:02:16,120
139
+ It doesn't follow a traditional CNN architecture.
140
+
141
+ 36
142
+ 00:02:16,240 --> 00:02:17,970
143
+ Where there's a continuous growing.
144
+
145
+ 37
146
+ 00:02:18,110 --> 00:02:24,230
147
+ There's this continuous growing here, and then there's this bottleneck, and the upsampling here.
148
+
149
+ 38
150
+ 00:02:24,490 --> 00:02:29,930
151
+ So the upsampling part now the upsampling part basically consists of a similar pattern.
152
+
153
+ 39
154
+ 00:02:30,020 --> 00:02:34,840
155
+ However instead of a convolutional layer there's something called a deconvolution layer.
156
+
157
+ 40
158
+ 00:02:35,150 --> 00:02:39,530
159
+ And then it's concatenated with the feature map of the corresponding contracting part.
160
+
161
+ 41
162
+ 00:02:39,530 --> 00:02:44,210
163
+ This is basically the major feature here that allows U-Net to work.
164
+
165
+ 42
166
+ 00:02:44,210 --> 00:02:49,690
167
+ And basically then it has two convolutional layers here to create the output up here.
168
+
169
+ 43
170
+ 00:02:49,880 --> 00:02:55,490
171
+ And if you're wondering what a deconvolution layer is, because I mentioned it in the preceding slide:
172
+
173
+ 44
174
+ 00:02:56,660 --> 00:03:00,910
175
+ basically, it reverses the effects of a convolution.
176
+
177
+ 45
178
+ 00:03:01,190 --> 00:03:06,560
179
+ So just imagine what a convolution layer does: it basically applies some transform to the image,
180
+
181
+ 46
182
+ 00:03:06,560 --> 00:03:11,340
183
+ while a deconvolution layer does the same, however basically in the opposite direction.
184
+
185
+ 47
186
+ 00:03:11,340 --> 00:03:15,840
187
+ So it basically produces the output of a reversed convolution.
188
+
189
+ 48
190
+ 00:03:16,370 --> 00:03:18,020
191
+ Hopefully that makes sense to you.
192
+
193
+ 49
194
+ 00:03:18,230 --> 00:03:22,670
195
+ What we're going to do now is define some more metrics you need to consider when making an
196
+
197
+ 50
198
+ 00:03:22,670 --> 00:03:23,760
199
+ image segmentation.
200
+
201
+ 51
202
+ 00:03:23,900 --> 00:03:24,350
203
+ CNN.
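The shape bookkeeping of the contracting and expansive paths the lecture describes can be sketched in plain NumPy. This is only an illustration of the down/up-sampling and skip-connection idea: a real U-Net uses learned 3x3 convolutions and transposed convolutions, and the helper names here are made up for the sketch.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling: halves height and width (H and W assumed even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x2(x):
    """Nearest-neighbour upsampling: doubles height and width."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.arange(16, dtype=float).reshape(4, 4)  # stand-in for one feature map
down = max_pool_2x2(x)                        # contracting path: 4x4 -> 2x2
up = upsample_2x2(down)                       # expansive path: 2x2 -> 4x4
skip = np.stack([x, up], axis=-1)             # concatenate with the skip connection
print(down.shape, up.shape, skip.shape)       # (2, 2) (4, 4) (4, 4, 2)
```

The last line is the part that makes the architecture U-shaped: the upsampled map is concatenated with the same-resolution map from the contracting side, so the decoder sees both coarse context and fine detail.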
19. Medical Imaging - Image Segmentation with U-Net/4. The Intersection over Union (IoU) Metric.srt ADDED
@@ -0,0 +1,267 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 1
2
+ 00:00:00,610 --> 00:00:01,180
3
+ OK.
4
+
5
+ 2
6
+ 00:00:01,260 --> 00:00:03,490
7
+ So that brings us to the next chapter.
8
+
9
+ 3
10
+ 00:00:05,900 --> 00:00:06,240
11
+ OK.
12
+
13
+ 4
14
+ 00:00:06,240 --> 00:00:08,580
15
+ So welcome to chapter nineteen point three.
16
+
17
+ 5
18
+ 00:00:08,700 --> 00:00:13,970
19
+ where we talk about the intersection over union metric.
20
+
21
+ 6
22
+ 00:00:15,480 --> 00:00:20,970
23
+ Hi and welcome to Chapter nineteen point three, where we now talk about the intersection over union metric which
24
+
25
+ 7
26
+ 00:00:20,970 --> 00:00:24,910
27
+ is an important metric you need to know when we're actually training.
28
+
29
+ 8
30
+ 00:00:25,370 --> 00:00:26,550
31
+ So image segmentation.
32
+
33
+ 9
34
+ 00:00:26,550 --> 00:00:29,160
35
+ CNNs. So let's talk about this first.
36
+
37
+ 10
38
+ 00:00:29,190 --> 00:00:29,600
39
+ OK.
40
+
41
+ 11
42
+ 00:00:29,970 --> 00:00:32,980
43
+ So let's assume we're doing some object detection here.
44
+
45
+ 12
46
+ 00:00:33,240 --> 00:00:37,960
47
+ And basically the green box is our true bounding box over this nice car.
48
+
49
+ 13
50
+ 00:00:38,340 --> 00:00:41,250
51
+ So this is like a human labeled image of what it is.
52
+
53
+ 14
54
+ 00:00:41,460 --> 00:00:43,060
55
+ And now our classifier.
56
+
57
+ 15
58
+ 00:00:43,140 --> 00:00:46,950
59
+ or predictor gave us this bounding box, the red one.
60
+
61
+ 16
62
+ 00:00:46,990 --> 00:00:48,930
63
+ So generally it's correct.
64
+
65
+ 17
66
+ 00:00:48,930 --> 00:00:50,510
67
+ I wouldn't say it's wrong at all.
68
+
69
+ 18
70
+ 00:00:50,520 --> 00:00:55,560
71
+ However you do see that it could have been brought in a lot closer here and maybe brought in here and
72
+
73
+ 19
74
+ 00:00:55,560 --> 00:00:56,470
75
+ a bit lower here.
76
+
77
+ 20
78
+ 00:00:56,850 --> 00:00:59,590
79
+ So it's a good box but not the best box.
80
+
81
+ 21
82
+ 00:01:00,090 --> 00:01:06,020
83
+ So how much of the correct area is covered by our predicted bounding box? That is this area here.
84
+
85
+ 22
86
+ 00:01:06,200 --> 00:01:13,040
87
+ So honestly it seems like about 90 percent of our true box is covered by our predicted box.
88
+
89
+ 23
90
+ 00:01:13,050 --> 00:01:15,290
91
+ So how do we measure this in a metric.
92
+
93
+ 24
94
+ 00:01:15,360 --> 00:01:16,070
95
+ OK.
96
+
97
+ 25
98
+ 00:01:16,380 --> 00:01:19,570
99
+ What is a good metric for this? What if this was our bounding box?
100
+
101
+ 26
102
+ 00:01:19,570 --> 00:01:23,040
103
+ Here it still covers 90 percent of our true box.
104
+
105
+ 27
106
+ 00:01:23,070 --> 00:01:27,210
107
+ However this is a much poorer bounding box than this one.
108
+
109
+ 28
110
+ 00:01:28,720 --> 00:01:33,070
111
+ This is where the intersection over union comes in.
112
+
113
+ 29
114
+ 00:01:33,070 --> 00:01:37,590
115
+ So IoU, to call it that for short, is basically the size of the intersection over the size of the union.
116
+
117
+ 30
118
+ 00:01:37,750 --> 00:01:41,760
119
+ The intersection is the shaded area here.
120
+
121
+ 31
122
+ 00:01:41,830 --> 00:01:48,390
123
+ So you can see the size of our predicted box here is much bigger than the size of this predicted box.
124
+
125
+ 32
126
+ 00:01:48,400 --> 00:01:50,010
127
+ So what does that mean?
128
+
129
+ 33
130
+ 00:01:50,050 --> 00:01:56,020
131
+ It means that the IoU with this box is going to be maybe like 0.5, compared to the IoU of this
132
+
133
+ 34
134
+ 00:01:56,020 --> 00:02:04,060
135
+ box, which is going to be probably 0.9. So generally, typically an IoU of 0.5
136
+
137
+ 35
138
+ 00:02:04,060 --> 00:02:08,750
139
+ is considered acceptable, mainly because object detection is very hard to get right.
140
+
141
+ 36
142
+ 00:02:08,920 --> 00:02:15,180
143
+ So we do have a fairly lenient threshold, and obviously the higher the IoU the better the prediction.
144
+
145
+ 37
146
+ 00:02:15,490 --> 00:02:21,690
147
+ And essentially IoU is a measure of overlap, basically how good the overlap is.
148
+
149
+ 38
150
+ 00:02:22,090 --> 00:02:27,640
151
+ So before I show you how we actually implement IoU in Keras and use it as one of the
152
+
153
+ 39
154
+ 00:02:27,670 --> 00:02:33,190
155
+ metrics we monitor during training, I'll tell you why we need this in image segmentation.
156
+
157
+ 40
158
+ 00:02:33,190 --> 00:02:40,630
159
+ Remember in image segmentation we're basically measuring overlap of like a masked image over the original
160
+
161
+ 41
162
+ 00:02:40,630 --> 00:02:41,400
163
+ image.
164
+
165
+ 42
166
+ 00:02:41,410 --> 00:02:45,680
167
+ So suppose we're developing a mask that covers just the car in this image.
168
+
169
+ 43
170
+ 00:02:45,760 --> 00:02:47,240
171
+ Forget about the bounding boxes here.
172
+
173
+ 44
174
+ 00:02:47,290 --> 00:02:48,890
175
+ We just want to measure this mask.
176
+
177
+ 45
178
+ 00:02:49,300 --> 00:02:50,590
179
+ And what if our.
180
+
181
+ 46
182
+ 00:02:50,650 --> 00:02:54,480
183
+ So imagine we have a pure yellow mask covering the car.
184
+
185
+ 47
186
+ 00:02:54,850 --> 00:03:00,370
187
+ And imagine we have a predicted mask, that is, a segmentation algorithm produced something that covers
188
+
189
+ 48
190
+ 00:03:00,430 --> 00:03:02,090
191
+ a blob like this here.
192
+
193
+ 49
194
+ 00:03:02,440 --> 00:03:08,290
195
+ How do we measure the effectiveness or basically the accuracy of this mask, given that this was the
196
+
197
+ 50
198
+ 00:03:08,470 --> 00:03:15,370
199
+ correct mask here. And that is why we use IoU: it is not just useful for object detection, it is used
200
+
201
+ 51
202
+ 00:03:15,370 --> 00:03:18,420
203
+ for image masking, which is segmentation.
204
+
205
+ 52
206
+ 00:03:18,430 --> 00:03:25,010
207
+ So now let's see how we do this in Keras. It's pretty easy to define these custom metric functions.
208
+
209
+ 53
210
+ 00:03:25,010 --> 00:03:28,810
211
+ In Keras you just have to write a simple function.
212
+
213
+ 54
214
+ 00:03:28,830 --> 00:03:37,430
215
+ It takes in our true and predicted labels and we compute an IoU score, and these labels, I'm
216
+
217
+ 55
218
+ 00:03:37,440 --> 00:03:39,720
219
+ just going to say, are actually going to be masks.
220
+
221
+ 56
222
+ 00:03:39,930 --> 00:03:44,550
223
+ And what we do here, when we compile the model, is define our own metrics.
224
+
225
+ 57
226
+ 00:03:44,550 --> 00:03:46,800
227
+ Previously we used to use accuracy.
228
+
229
+ 58
230
+ 00:03:46,800 --> 00:03:50,440
231
+ Now we just use my own metric, which is going to be this function here.
232
+
233
+ 59
234
+ 00:03:50,720 --> 00:03:51,270
235
+ All right.
236
+
237
+ 60
238
+ 00:03:51,330 --> 00:03:55,100
239
+ And then we train and when we're training we actually see the report here.
240
+
241
+ 61
242
+ 00:03:55,290 --> 00:04:00,210
243
+ So we see my own metric, which is this function here; technically that should be the same name here,
244
+
245
+ 62
246
+ 00:04:00,220 --> 00:04:02,420
247
+ just remember that; I should have fixed that for you guys,
248
+
249
+ 63
250
+ 00:04:02,430 --> 00:04:03,670
251
+ so you don't get confused.
252
+
253
+ 64
254
+ 00:04:04,080 --> 00:04:05,870
255
+ But it's actually outputting an IoU score.
256
+
257
+ 65
258
+ 00:04:05,940 --> 00:04:11,380
259
+ So instead of monitoring loss and accuracy, we're now monitoring loss and a custom metric.
260
+
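In code, a custom metric is just a function taking `(y_true, y_pred)` and returning a score, passed to `compile` where `'accuracy'` used to go. Here is a minimal NumPy stand-in for the idea (the real version in a Keras metric would use backend tensor ops, and the name `my_iou_metric` is an assumed, typical name, not necessarily the lecture's exact code):

```python
import numpy as np

def my_iou_metric(y_true, y_pred, threshold=0.5):
    """IoU of a predicted probability mask against a ground-truth mask."""
    truth = np.asarray(y_true) > threshold
    pred = np.asarray(y_pred) > threshold
    union = np.logical_or(truth, pred).sum()
    if union == 0:
        return 1.0
    return np.logical_and(truth, pred).sum() / union

# In Keras this function would go where 'accuracy' used to, e.g.:
# model.compile(optimizer='adam', loss='binary_crossentropy',
#               metrics=[my_iou_metric])
# Keras then reports the function's name next to the loss every epoch.

y_true = np.array([1.0, 1.0, 0.0, 0.0])
y_pred = np.array([0.9, 0.3, 0.8, 0.1])  # one hit, one miss, one false alarm
print(my_iou_metric(y_true, y_pred))  # 1 intersection / 3 union
```

Note that when a model trained with a custom metric is later reloaded, Keras needs to be told about the function again (via `custom_objects`), which comes up again later in this chapter.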
261
+ 66
262
+ 00:04:12,000 --> 00:04:12,960
263
+ So that's pretty cool.
264
+
265
+ 67
266
+ 00:04:12,960 --> 00:04:15,330
267
+ So let's move on to the next chapter.
19. Medical Imaging - Image Segmentation with U-Net/5. Finding the Nuclei in Divergent Images.srt ADDED
@@ -0,0 +1,875 @@
1
+ 1
2
+ 00:00:00,660 --> 00:00:05,540
3
+ Hi, and welcome back to Chapter 19.4, where we're going to implement this project, and the project
4
+
5
+ 2
6
+ 00:00:05,540 --> 00:00:08,880
7
+ is called Finding the nuclei in divergent images.
8
+
9
+ 3
10
+ 00:00:09,050 --> 00:00:10,980
11
+ So let's go on to see a bit about it.
12
+
13
+ 4
14
+ 00:00:11,000 --> 00:00:14,820
15
+ So this was part of Kaggle's Data Science Bowl of 2018.
16
+
17
+ 5
18
+ 00:00:15,050 --> 00:00:20,390
19
+ Basically the challenge was 'Spot Nuclei, Speed Cures'; that was the tagline Kaggle used, and
20
+
21
+ 6
22
+ 00:00:20,390 --> 00:00:25,670
23
+ what we wanted to do, or what they wanted us to do, was automate nuclei detection.
24
+
25
+ 7
26
+ 00:00:25,670 --> 00:00:28,560
27
+ So, you'll see the images soon.
28
+
29
+ 8
30
+ 00:00:28,840 --> 00:00:33,480
31
+ We're basically performing nuclei detection in these images.
32
+
33
+ 9
34
+ 00:00:33,800 --> 00:00:36,360
35
+ And this is basically the writeup they used here.
36
+
37
+ 10
38
+ 00:00:36,410 --> 00:00:42,140
39
+ Identifying the cells' nuclei is the starting point for most analyses, because most of the human body's
40
+
41
+ 11
42
+ 00:00:42,540 --> 00:00:48,920
43
+ 30 trillion cells contain a nucleus full of DNA, the genetic code that programs each
44
+
45
+ 12
46
+ 00:00:48,920 --> 00:00:49,780
47
+ cell.
48
+
49
+ 13
50
+ 00:00:49,820 --> 00:00:55,880
51
+ So identifying the nuclei allows researchers to identify each individual cell in a sample, and by measuring
52
+
53
+ 14
54
+ 00:00:55,910 --> 00:01:01,640
55
+ how cells react to various treatments, researchers can understand the underlying
56
+
57
+ 15
58
+ 00:01:01,670 --> 00:01:03,690
59
+ biological processes at work.
60
+
61
+ 16
62
+ 00:01:03,770 --> 00:01:09,200
63
+ So you can see this project actually has tremendous application in the medical field.
64
+
65
+ 17
66
+ 00:01:09,560 --> 00:01:14,320
67
+ And this was the flyer, as well as the tagline I mentioned to you before, that Kaggle used to advertise
68
+
69
+ 18
70
+ 00:01:14,340 --> 00:01:15,290
71
+ the contest.
72
+
73
+ 19
74
+ 00:01:15,470 --> 00:01:19,890
75
+ So you can see there is definitely a need, a practical need, to get this done right.
76
+
77
+ 20
78
+ 00:01:21,870 --> 00:01:24,320
79
+ So these were the images in our data set here.
80
+
81
+ 21
82
+ 00:01:24,660 --> 00:01:25,910
83
+ Look at the first row here.
84
+
85
+ 22
86
+ 00:01:25,980 --> 00:01:29,830
87
+ These four images: we were given images like this.
88
+
89
+ 23
90
+ 00:01:29,850 --> 00:01:34,700
91
+ These dots represent nuclei, in basically different shades of gray.
92
+
93
+ 24
94
+ 00:01:34,730 --> 00:01:36,950
95
+ Images like these were all in the set.
96
+
97
+ 25
98
+ 00:01:36,960 --> 00:01:41,900
99
+ We actually have full color images here as well as some basic grayscale images here.
100
+
101
+ 26
102
+ 00:01:42,210 --> 00:01:47,760
103
+ So for these, you could maybe use some OpenCV thresholding functions and get these
104
+
105
+ 27
106
+ 00:01:47,760 --> 00:01:50,780
107
+ masks matching the human-labeled masks here.
108
+
109
+ 28
110
+ 00:01:51,090 --> 00:01:56,550
111
+ However when it comes to these here you definitely need some sort of intelligence to actually extract
112
+
113
+ 29
114
+ 00:01:56,550 --> 00:01:59,020
115
+ these and label the nuclei here.
116
+
117
+ 30
118
+ 00:01:59,250 --> 00:02:04,320
119
+ As you can see doing this manually as a human is going to take some time definitely.
120
+
121
+ 31
122
+ 00:02:04,350 --> 00:02:08,780
123
+ And then getting the counts over complex images is going to be an exhausting task.
124
+
125
+ 32
126
+ 00:02:08,970 --> 00:02:14,250
127
+ So this is the row of the true original images.
128
+
129
+ 33
130
+ 00:02:14,260 --> 00:02:15,570
131
+ The true masks.
132
+
133
+ 34
134
+ 00:02:15,570 --> 00:02:18,480
135
+ And again, more true images and more true masks.
136
+
137
+ 35
138
+ 00:02:18,480 --> 00:02:24,960
139
+ So we're going to try to take this image, input any image here, and produce
140
+
141
+ 36
142
+ 00:02:24,960 --> 00:02:26,940
143
+ a mask that looks like this.
144
+
145
+ 37
146
+ 00:02:26,940 --> 00:02:33,910
147
+ So our approach is basically to use U-Net, which is a special CNN designed exactly for image segmentation
148
+
149
+ 38
150
+ 00:02:33,910 --> 00:02:35,280
151
+ tasks like this.
152
+
153
+ 39
154
+ 00:02:35,280 --> 00:02:36,340
155
+ So let's get started.
156
+
157
+ 40
158
+ 00:02:41,170 --> 00:02:41,560
159
+ OK.
160
+
161
+ 41
162
+ 00:02:41,630 --> 00:02:47,630
163
+ So we're back in our virtual machine, and we're going to use U-Net now in our medical imaging segmentation
164
+
165
+ 42
166
+ 00:02:47,630 --> 00:02:51,800
167
+ project which is finding the nuclei in divergent images.
168
+
169
+ 43
170
+ 00:02:51,800 --> 00:02:52,210
171
+ OK.
172
+
173
+ 44
174
+ 00:02:52,290 --> 00:02:56,710
175
+ So you want to make sure you downloaded the dataset correctly.
176
+
177
+ 45
178
+ 00:02:56,840 --> 00:02:59,180
179
+ And I want to actually show you something in this data set.
180
+
181
+ 46
182
+ 00:02:59,210 --> 00:03:03,980
183
+ It is different to the type of datasets we used before, because now it has masks.
184
+
185
+ 47
186
+ 00:03:04,010 --> 00:03:07,580
187
+ So let's open this dataset here.
188
+
189
+ 48
190
+ 00:03:07,580 --> 00:03:09,590
191
+ So hopefully you've extracted it here.
192
+
193
+ 49
194
+ 00:03:10,280 --> 00:03:13,900
195
+ And as you can see, yes, it has the same train and validation folders.
196
+
197
+ 50
198
+ 00:03:14,210 --> 00:03:17,780
199
+ But look at this: it's no longer images, now it's actually folders.
200
+
201
+ 51
202
+ 00:03:17,960 --> 00:03:21,010
203
+ So we have an image here.
204
+
205
+ 52
206
+ 00:03:21,050 --> 00:03:25,290
207
+ This is a test image which we're supposed to produce a mask from.
208
+
209
+ 53
210
+ 00:03:25,370 --> 00:03:28,070
211
+ And now, what is this folder full of masks?
212
+
213
+ 54
214
+ 00:03:28,370 --> 00:03:32,320
215
+ And these are multiple files and if you look at this there are multiple images.
216
+
217
+ 55
218
+ 00:03:32,450 --> 00:03:34,780
219
+ Each one is a nucleus label.
220
+
221
+ 56
222
+ 00:03:35,000 --> 00:03:37,190
223
+ So what we're looking at.
224
+
225
+ 57
226
+ 00:03:37,340 --> 00:03:44,550
227
+ If you go back to our presentation here, explaining it is actually easier like this. What we're looking at right
228
+
229
+ 58
230
+ 00:03:44,550 --> 00:03:49,470
231
+ now is basically this mask, basically all the images we just saw.
232
+
233
+ 59
234
+ 00:03:49,500 --> 00:03:52,520
235
+ It's all of them stacked upon each other.
236
+
237
+ 60
238
+ 00:03:52,680 --> 00:04:00,690
239
+ So the data is not as easy to, not interpret, but actually use, as it was for all our
240
+
241
+ 61
242
+ 00:04:00,690 --> 00:04:02,010
243
+ previous tasks.
244
+
245
+ 62
246
+ 00:04:02,010 --> 00:04:04,150
247
+ We do have to do some processing on this data.
248
+
249
+ 63
250
+ 00:04:05,700 --> 00:04:07,380
251
+ So that's the dataset here.
252
+
253
+ 64
254
+ 00:04:07,560 --> 00:04:12,180
255
+ And now let's go to our notebook; I already have it loaded up here.
256
+
257
+ 65
258
+ 00:04:12,840 --> 00:04:17,520
259
+ So this code here is basically code that was provided by this guy here.
260
+
261
+ 66
262
+ 00:04:17,520 --> 00:04:22,800
263
+ He actually made the most popular kernel on Kaggle, the top one for this project, and there were a number
264
+
265
+ 67
266
+ 00:04:22,800 --> 00:04:28,110
267
+ of contestants and this guy had probably the best example of how it works.
268
+
269
+ 68
270
+ 00:04:28,110 --> 00:04:30,340
271
+ So this is based on his code.
272
+
273
+ 69
274
+ 00:04:30,840 --> 00:04:34,500
275
+ So we have the image size here, 128, that he defined.
276
+
277
+ 70
278
+ 00:04:34,530 --> 00:04:42,150
279
+ We have our folders; running that, it imports everything successfully, and we set our training
280
+
281
+ 71
282
+ 00:04:42,150 --> 00:04:47,360
283
+ paths and test paths, and this is the part that is very important.
284
+
285
+ 72
286
+ 00:04:47,370 --> 00:04:52,620
287
+ Remember we showed you how the nuclei are basically labeled one image at a time.
288
+
289
+ 73
290
+ 00:04:52,620 --> 00:05:00,690
291
+ So basically one input image has a bunch of sub-images, each one having one nucleus labeled on it.
292
+
293
+ 74
294
+ 00:05:00,690 --> 00:05:06,410
295
+ So what he does here what we do here is we basically stack those images together.
296
+
297
+ 75
298
+ 00:05:06,810 --> 00:05:11,290
299
+ So effectively you can read the comments and stuff I've left in here.
300
+
301
+ 76
302
+ 00:05:11,430 --> 00:05:19,540
303
+ What we do is we basically take these images and combine them into one single image with all these nuclei
304
+
305
+ 77
306
+ 00:05:19,570 --> 00:05:23,040
307
+ in that image, because that's basically the final mask we want to produce.
308
+
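The stacking step can be sketched like this: each mask file is a binary image containing one nucleus, and taking the element-wise maximum accumulates them into a single combined mask. Toy 3x3 arrays stand in here for the real mask files:

```python
import numpy as np

# Two toy per-nucleus masks (in the real dataset these would be read
# from the files in each image's masks/ folder, e.g. with imread)
sub_masks = [np.eye(3, dtype=np.uint8),             # one "nucleus"
             np.fliplr(np.eye(3, dtype=np.uint8))]  # another "nucleus"

combined = np.zeros((3, 3), dtype=np.uint8)
for sub in sub_masks:
    # a pixel is on in the combined mask if any sub-mask marks it
    combined = np.maximum(combined, sub)

print(combined)
```

The same loop, run over every file in a masks folder (with a resize to the working image size), yields the single training mask per input image.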
309
+ 78
310
+ 00:05:23,110 --> 00:05:24,770
311
+ So we have to process it.
312
+
313
+ 79
314
+ 00:05:24,780 --> 00:05:26,370
315
+ It will take a bit.
316
+
317
+ 80
318
+ 00:05:26,770 --> 00:05:30,760
319
+ So I'm not going to run it because it takes some time to really run it successfully here and you can
320
+
321
+ 81
322
+ 00:05:30,760 --> 00:05:34,070
323
+ do so yourself.
324
+
325
+ 82
326
+ 00:05:34,170 --> 00:05:38,560
327
+ In fact you will have to run these things yourself, because this was run on my machine.
328
+
329
+ 83
330
+ 00:05:38,700 --> 00:05:44,130
331
+ These are basically saved statements, but they don't actually store any of the data in this notebook,
332
+
333
+ 84
334
+ 00:05:45,120 --> 00:05:46,470
335
+ just the outputs.
336
+
337
+ 85
338
+ 00:05:46,470 --> 00:05:48,780
339
+ So let's do some illustrations here.
340
+
341
+ 86
342
+ 00:05:48,900 --> 00:05:52,290
343
+ He has some nice code here that generates this plot.
344
+
345
+ 87
346
+ 00:05:52,380 --> 00:05:58,950
347
+ We can see this is image zero here, and this is the concatenated or stacked mask produced from the
348
+
349
+ 88
350
+ 00:05:58,970 --> 00:06:04,320
351
+ input data, and he does it for quite a few images here, so you can actually see there's a lot of variety
352
+
353
+ 89
354
+ 00:06:04,410 --> 00:06:06,590
355
+ in the input images here.
356
+
357
+ 90
358
+ 00:06:06,660 --> 00:06:11,030
359
+ There's these grayscale ones like this; they seem to be the most popular.
360
+
361
+ 91
362
+ 00:06:11,040 --> 00:06:12,850
363
+ Then there's these color images here.
364
+
365
+ 92
366
+ 00:06:12,930 --> 00:06:17,400
367
+ Then there's these here, which look like something from a microscope slide, and then these here
368
+
369
+ 93
370
+ 00:06:17,400 --> 00:06:22,850
371
+ are different again; there's this type here, you can see them plotted here.
372
+
373
+ 94
374
+ 00:06:23,220 --> 00:06:25,860
375
+ So there's a lot of different types of images here.
376
+
377
+ 95
378
+ 00:06:25,860 --> 00:06:27,800
379
+ It's not one simple type of dataset.
380
+
381
+ 96
382
+ 00:06:27,890 --> 00:06:33,750
383
+ It's a bunch of different types of data, all looking at nuclei and all having masks that are
384
+
385
+ 97
386
+ 00:06:33,750 --> 00:06:39,940
387
+ shaped like this, or labels shaped like this, I should say.
388
+
389
+ 98
390
+ 00:06:40,030 --> 00:06:42,190
391
+ So this is a function we used before.
392
+
393
+ 99
394
+ 00:06:42,350 --> 00:06:46,440
395
+ It's actually what we're going to use; it's the example one I used in my slides.
396
+
397
+ 100
398
+ 00:06:46,600 --> 00:06:50,090
399
+ So I'm taking it off and reading this MTSO.
400
+
401
+ 101
402
+ 00:06:50,400 --> 00:06:54,480
403
+ So this is the actual metric here that he's going to use.
404
+
405
+ 102
406
+ 00:06:54,560 --> 00:06:59,040
407
+ We're going to use it in our project; I'm not going to go through the detail of how it's calculated,
408
+
409
+ 103
410
+ 00:06:59,430 --> 00:07:03,370
411
+ but it is very similar to the calculation we saw in our slides.
412
+
413
+ 104
414
+ 00:07:03,400 --> 00:07:05,620
415
+ It's a bit different, of course.
416
+
417
+ 105
418
+ 00:07:06,240 --> 00:07:11,460
419
+ Additionally, there were a lot of discussions on the Kaggle message board about whether this function
420
+
421
+ 106
422
+ 00:07:11,460 --> 00:07:13,200
423
+ was the best metric to use.
424
+
425
+ 107
426
+ 00:07:13,200 --> 00:07:16,680
427
+ So here's an alternative one you can use; feel free to use it.
428
+
429
+ 108
430
+ 00:07:16,700 --> 00:07:19,340
431
+ And this one uses this one side of it here.
432
+
433
+ 109
434
+ 00:07:19,550 --> 00:07:20,030
435
+ OK.
436
+
437
+ 110
438
+ 00:07:21,640 --> 00:07:24,730
439
+ So I actually left another one here too.
440
+
441
+ 111
442
+ 00:07:25,070 --> 00:07:25,490
443
+ This.
444
+
445
+ 112
446
+ 00:07:25,690 --> 00:07:31,870
447
+ For this one, there was basically a consensus that this was the best function, and you can see it's
448
+
449
+ 113
450
+ 00:07:31,870 --> 00:07:33,340
451
+ quite exhaustive.
452
+
453
+ 114
454
+ 00:07:33,340 --> 00:07:34,430
455
+ Pretty technical.
456
+
457
+ 115
458
+ 00:07:34,480 --> 00:07:38,380
459
+ Someone did spend a lot of time making this function.
460
+
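The video doesn't spell out what the alternative metric computes, but a very common alternative overlap score in segmentation is the Dice coefficient; this sketch of that idea is an assumption, including the smoothing term, which is a typical convention to avoid division by zero:

```python
import numpy as np

def dice_coef(y_true, y_pred, smooth=1.0):
    """Dice coefficient: 2*|A intersect B| / (|A| + |B|), smoothed."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    intersection = (y_true * y_pred).sum()
    return (2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth)

a = np.array([1, 1, 0, 0])
b = np.array([1, 0, 1, 0])
print(dice_coef(a, b))  # (2*1 + 1) / (2 + 2 + 1) = 0.6
```

Dice weights the intersection more heavily than IoU does, which is one reason it is often preferred as a training loss (as `1 - dice_coef`) for segmentation.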
461
+ 116
462
+ 00:07:38,460 --> 00:07:40,630
463
+ So this is the important part here.
464
+
465
+ 117
466
+ 00:07:40,680 --> 00:07:45,750
467
+ This function was equally important, but this is of course what I wanted to show you:
468
+
469
+ 118
470
+ 00:07:46,260 --> 00:07:48,650
471
+ this is how we build our U-Net model.
472
+
473
+ 119
474
+ 00:07:48,900 --> 00:07:50,370
475
+ So two things to note.
476
+
477
+ 120
478
+ 00:07:50,490 --> 00:07:51,610
479
+ OK.
480
+
481
+ 121
482
+ 00:07:51,630 --> 00:07:58,170
483
+ You see we're actually assigning, sort of, pieces of the model; that is, we're assigning
484
+
485
+ 122
486
+ 00:07:58,260 --> 00:08:02,560
487
+ them to variables here, and then we have this (s) in brackets here.
488
+
489
+ 123
490
+ 00:08:02,880 --> 00:08:05,040
491
+ So what exactly are we doing here now?
492
+
493
+ 124
494
+ 00:08:05,280 --> 00:08:10,420
495
+ Well, this is simply another way we can build models in Keras.
496
+
497
+ 125
498
+ 00:08:10,510 --> 00:08:12,060
499
+ Keras is quite flexible.
500
+
501
+ 126
502
+ 00:08:12,060 --> 00:08:16,430
503
+ So what we're doing here is we're connecting layers by chaining them like this.
504
+
505
+ 127
506
+ 00:08:16,460 --> 00:08:21,130
507
+ So instead of using model.add we're connecting them here, and there's a reason we can't use model.add: that
508
+
509
+ 128
510
+ 00:08:21,840 --> 00:08:29,040
511
+ is because of the U-Net structure, which is basically like a bottleneck at the bottom and deconvolutions
512
+
513
+ 129
514
+ 00:08:29,040 --> 00:08:30,300
515
+ going up.
516
+
517
+ 130
518
+ 00:08:30,300 --> 00:08:32,960
519
+ It's not easy, or it doesn't facilitate, using the model's
520
+
521
+ 131
522
+ 00:08:33,060 --> 00:08:35,460
523
+ add method in building this model.
524
+
525
+ 132
526
+ 00:08:35,460 --> 00:08:38,840
527
+ We sort of have to do it like this now, so we can see.
528
+
529
+ 133
530
+ 00:08:38,850 --> 00:08:43,640
531
+ So we have our inputs defined here, connected to this Lambda function, which basically normalizes the inputs.
532
+
533
+ 134
534
+ 00:08:43,980 --> 00:08:46,810
535
+ Then we have c1 here and c1 here.
536
+
537
+ 135
538
+ 00:08:46,920 --> 00:08:53,690
539
+ So what this does, it basically says: this is our convolutional layer here, with dropout, and that's
540
+
541
+ 136
542
+ 00:08:53,740 --> 00:08:54,730
543
+ called c1.
544
+
545
+ 137
546
+ 00:08:55,110 --> 00:08:56,870
547
+ And this is another convolution here.
548
+
549
+ 138
550
+ 00:08:56,940 --> 00:08:58,120
551
+ Called c1 again.
552
+
553
+ 139
554
+ 00:08:58,140 --> 00:09:00,860
555
+ So this is linking all of these layers together.
556
+
557
+ 140
558
+ 00:09:01,200 --> 00:09:03,170
559
+ And then we have MaxPooling here.
560
+
561
+ 141
562
+ 00:09:03,390 --> 00:09:06,470
563
+ Basically it's the p1 connection.
564
+
565
+ 142
566
+ 00:09:06,510 --> 00:09:09,080
567
+ So now this is linked back to these here.
568
+
569
+ 143
570
+ 00:09:09,480 --> 00:09:10,890
571
+ So we keep going forward.
572
+
573
+ 144
574
+ 00:09:11,490 --> 00:09:17,660
575
+ As you can see, the filter counts get larger and larger as we go down, and then we do this here.
576
+
577
+ 145
578
+ 00:09:17,820 --> 00:09:22,740
579
+ This is how we kind of connect it all; the bottleneck is the key point here.
580
+
581
+ 146
582
+ 00:09:23,680 --> 00:09:30,000
583
+ And basically now we just go up and up; U-Net is called U-Net because it looks like a U in
584
+
585
+ 147
586
+ 00:09:30,230 --> 00:09:37,460
587
+ those diagrams, and we're using a different type of convolution here, Conv2DTranspose.
588
+
589
+ 148
590
+ 00:09:37,740 --> 00:09:39,020
591
+ That's basically how we do it.
592
+
593
+ 149
594
+ 00:09:39,030 --> 00:09:44,220
595
+ Deconvolution, essentially. So it's going up and up, and then we have an output here.
596
+
597
+ 150
598
+ 00:09:44,460 --> 00:09:46,720
599
+ So there is something I want you to note as well.
600
+
601
+ 151
602
+ 00:09:46,740 --> 00:09:48,750
603
+ Look at the output of the deconv.
604
+
605
+ 152
606
+ 00:09:49,020 --> 00:09:54,480
607
+ It's basically a grayscale image, 128 by 128, with one channel.
608
+
609
+ 153
610
+ 00:09:54,940 --> 00:09:57,380
611
+ The number of parameters is not that much.
612
+
613
+ 154
614
+ 00:09:57,420 --> 00:10:00,280
615
+ Definitely trainable on CPU.
616
+
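The shape of the network being described (convolutions down to a bottleneck with growing filter counts, then transposed convolutions back up with skip connections) can be traced with a little arithmetic. The depth of 4 pooling steps and base filter count of 16 are assumptions matching a common layout for 128x128 inputs:

```python
def unet_shapes(size=128, base_filters=16, depth=4):
    """Trace spatial size and filter count through an encoder-decoder U."""
    encoder = []
    filters = base_filters
    for _ in range(depth):
        encoder.append((size, filters))  # two convs at this size, then 2x2 max-pool
        size //= 2
        filters *= 2
    bottleneck = (size, filters)
    # Each Conv2DTranspose doubles the spatial size; its output is
    # concatenated with the encoder block of the same size (the skip)
    decoder = list(reversed(encoder))
    return encoder, bottleneck, decoder

enc, bottleneck, dec = unet_shapes()
print(enc)         # [(128, 16), (64, 32), (32, 64), (16, 128)]
print(bottleneck)  # (8, 256)
print(dec)         # ends at (128, 16); a final 1x1 conv with a sigmoid
                   # then gives the 128x128x1 mask
```

This is why the functional API is needed: each decoder stage consumes both the previous stage and a saved encoder tensor, which a purely sequential `model.add` chain cannot express.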
617
+ 155
618
+ 00:10:00,350 --> 00:10:02,280
619
+ So now let's fit our model.
620
+
621
+ 156
622
+ 00:10:04,130 --> 00:10:10,680
623
+ So we just set up all the basic callbacks here, and then we just fit our model.
624
+
625
+ 157
626
+ 00:10:10,980 --> 00:10:12,950
627
+ This probably should not be here.
628
+
629
+ 158
630
+ 00:10:13,020 --> 00:10:14,180
631
+ Let's show what it was.
632
+
633
+ 159
634
+ 00:10:15,460 --> 00:10:16,690
635
+ And here we go.
636
+
637
+ 160
638
+ 00:10:16,930 --> 00:10:20,960
639
+ So you can see, I'm not going to run this now, but I've done it before and it doesn't take that long.
640
+
641
+ 161
642
+ 00:10:21,250 --> 00:10:22,750
643
+ We're training it here.
644
+
645
+ 162
646
+ 00:10:23,070 --> 00:10:24,150
647
+ It's quick.
648
+
649
+ 163
650
+ 00:10:24,250 --> 00:10:25,510
651
+ Just over a minute.
652
+
653
+ 164
654
+ 00:10:25,670 --> 00:10:30,560
655
+ An epoch is quite quick, and we can see the loss here and my metric.
656
+
657
+ 165
658
+ 00:10:30,580 --> 00:10:36,940
659
+ So let's see what function that is; up there, it was this one here.
660
+
661
+ 166
662
+ 00:10:36,970 --> 00:10:38,910
663
+ That was the big one we used.
664
+
665
+ 167
666
+ 00:10:38,920 --> 00:10:42,130
667
+ This was the one, actually, remember, from the Kaggle discussions.
668
+
669
+ 168
670
+ 00:10:42,280 --> 00:10:45,470
671
+ That was deemed the best and most appropriate IoU metric.
672
+
673
+ 169
674
+ 00:10:45,510 --> 00:10:50,800
675
+ I'm not saying that these were wrong or that these are bad; it's just that this one actually had the most
676
+
677
+ 170
678
+ 00:10:50,800 --> 00:10:52,740
679
+ relevance to segmentation for us.
680
+
681
+ 171
682
+ 00:10:52,990 --> 00:10:58,900
683
+ Segmentation is very different to object detection IoU, which uses boxes.
684
+
685
+ 172
686
+ 00:10:58,990 --> 00:11:00,810
687
+ So we needed to develop something custom.
688
+
689
+ 173
690
+ 00:11:00,850 --> 00:11:06,810
691
+ This was definitely a big part of the Kaggle challenge; this project wasn't just applying a model
692
+
693
+ 174
694
+ 00:11:07,030 --> 00:11:11,860
695
+ to the data, it was coming up with a way to actually assess the performance of the model.
696
+
697
+ 175
698
+ 00:11:13,120 --> 00:11:21,160
699
+ So we can see our metric changing as it goes, and our validation metric going up.
700
+
701
+ 176
702
+ 00:11:21,160 --> 00:11:23,020
703
+ So we wanted this to go up actually.
704
+
705
+ 177
706
+ 00:11:23,080 --> 00:11:24,850
707
+ So we want it to go up over time.
708
+
709
+ 178
710
+ 00:11:24,850 --> 00:11:31,810
711
+ So I set 10 epochs and it gave a value of 0.46; in terms of accuracy that would not be good.
712
+
713
+ 179
714
+ 00:11:31,810 --> 00:11:37,480
715
+ However, this is an IoU metric, and IoU metrics don't really map exactly to what accuracy
716
+
717
+ 180
718
+ 00:11:37,480 --> 00:11:38,340
719
+ means.
720
+
721
+ 181
722
+ 00:11:38,350 --> 00:11:45,340
723
+ So 0.46 is actually a pretty good value; as seen on Kaggle, some guys got up to 0.77, 0.8
724
+
725
+ 182
726
+ 00:11:45,880 --> 00:11:51,570
727
+ using this metric, training on some GPUs for multiple epochs and tweaking a lot of the training
728
+
729
+ 183
730
+ 00:11:51,580 --> 00:11:52,900
731
+ parameters above here.
732
+
733
+ 184
734
+ 00:11:53,110 --> 00:11:56,540
735
+ But a final 0.46 is actually pretty good.
736
+
737
+ 185
738
+ 00:11:56,740 --> 00:12:01,420
739
+ And you may have noticed actually the code just changed and that was because I actually had some confusion
740
+
741
+ 186
742
+ 00:12:01,750 --> 00:12:08,250
743
+ here, where I was experimenting with different metrics and I confused myself, because when you look
744
+
745
+ 187
746
+ 00:12:08,250 --> 00:12:13,200
747
+ at a model, or when you create a model, it is specific to this metric here.
748
+
749
+ 188
750
+ 00:12:13,450 --> 00:12:18,490
751
+ So when you load the model you actually have to specify what metric you used when training the model.
752
+
753
+ 189
754
+ 00:12:18,520 --> 00:12:26,320
755
+ So we just trained using my metric, which was a function; scroll all the way up, it's defined right here.
756
+
757
+ 190
758
+ 00:12:26,320 --> 00:12:31,240
759
+ This metric basically used all of these functions here to calculate it.
760
+
761
+ 191
762
+ 00:12:31,270 --> 00:12:36,550
763
+ This was the best one according to these guys on Kaggle, the one most representative
764
+
765
+ 192
766
+ 00:12:37,060 --> 00:12:38,940
767
+ of what a loss should be.
768
+
769
+ 193
770
+ 00:12:39,100 --> 00:12:46,420
771
+ OK, so now what I'm going to do: we basically load the model that we just trained, or you can use the model
772
+
773
+ 194
774
+ 00:12:46,420 --> 00:12:49,590
775
+ if you're training it within the notebook; you don't have to load it.
776
+
777
+ 195
778
+ 00:12:49,630 --> 00:12:50,210
779
+ OK.
780
+
781
+ 196
782
+ 00:12:50,560 --> 00:12:54,100
783
+ And basically we just split this up.
784
+
785
+ 197
786
+ 00:12:54,150 --> 00:13:00,550
787
+ We split the training data into 90 percent basically being training data and the last 10 percent being
788
+
789
+ 198
790
+ 00:13:00,550 --> 00:13:01,640
791
+ the validation data.
792
+
793
+ 199
794
+ 00:13:02,050 --> 00:13:04,860
795
+ And we just create our masks from this now.
796
+
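Turning the network's sigmoid outputs into binary masks is typically done with a simple threshold; in this sketch, `preds_val` stands in for the output of `model.predict`, and 0.5 is the usual cutoff assumption:

```python
import numpy as np

# Stand-in for model.predict output: per-pixel nucleus probabilities
preds_val = np.array([[0.10, 0.72],
                      [0.91, 0.44]])

# Threshold at 0.5 to get a binary mask
preds_val_t = (preds_val > 0.5).astype(np.uint8)
print(preds_val_t)  # [[0 1]
                    #  [1 0]]
```

The thresholded arrays are what get plotted and compared against the ground-truth masks below.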
797
+ 200
798
+ 00:13:04,900 --> 00:13:07,730
799
+ So now let's take a look and see how our masks look.
800
+
801
+ 201
802
+ 00:13:07,780 --> 00:13:08,360
803
+ OK.
804
+
805
+ 202
806
+ 00:13:08,740 --> 00:13:14,610
807
+ So if you just run this function you'll see this was a training image here.
808
+
809
+ 203
810
+ 00:13:14,730 --> 00:13:21,350
811
+ This was the mask that we calculated previously from these here, and then this is the predicted
812
+
813
+ 204
814
+ 00:13:21,350 --> 00:13:21,720
815
+ mask.
816
+
817
+ 205
818
+ 00:13:21,740 --> 00:13:25,770
819
+ As you can see, it is very, very similar to this mask.
820
+
821
+ 206
822
+ 00:13:25,790 --> 00:13:28,550
823
+ There are some slight differences at a pixel level.
824
+
825
+ 207
826
+ 00:13:28,580 --> 00:13:30,750
827
+ However this is actually quite good.
828
+
829
+ 208
830
+ 00:13:30,980 --> 00:13:33,310
831
+ And now we can look at some validation data.
832
+
833
+ 209
834
+ 00:13:33,620 --> 00:13:35,940
835
+ So this was the actual input image here.
836
+
837
+ 210
838
+ 00:13:36,230 --> 00:13:38,850
839
+ And this was the predicted mask for it.
840
+
841
+ 211
842
+ 00:13:38,870 --> 00:13:44,790
843
+ So as you can see this is actually doing a pretty good job at image segmentation and it only took maybe
844
+
845
+ 212
846
+ 00:13:45,010 --> 00:13:47,540
847
+ 15 minutes to train on a CPU system.
848
+
849
+ 213
850
+ 00:13:47,660 --> 00:13:50,240
851
+ So feel free to experiment with this.
852
+
853
+ 214
854
+ 00:13:50,240 --> 00:13:52,640
855
+ You could create your own masks.
856
+
857
+ 215
858
+ 00:13:52,790 --> 00:13:57,380
859
+ If you're not sure how, you can probably get some software; I'll put a link
860
+
861
+ 216
862
+ 00:13:57,770 --> 00:14:03,380
863
+ to some software and resources here where you can start annotating and basically creating masks from
864
+
865
+ 217
866
+ 00:14:03,410 --> 00:14:04,360
867
+ images like this.
868
+
869
+ 218
870
+ 00:14:04,490 --> 00:14:10,430
871
+ So if you want to try some medical imaging segmentation task, or any other sort of segmentation
872
+
873
+ 219
874
+ 00:14:10,430 --> 00:14:13,170
875
+ task, you will know exactly how to do it.
19. Medical Imaging - Image Segmentation with U-Net/5.1 Download U-Net.html ADDED
@@ -0,0 +1 @@
1
+ <script type="text/javascript">window.location = "https://drive.google.com/file/d/1X5vccywUQSv9VF8nTrldyBtIQqM_N5yS/view?usp=sharing";</script>
19. Medical Imaging Segmentation using U-Net/U-Net (not compatible with TensorFlow 2.0, required to downgrade).ipynb ADDED
The diff for this file is too large to render. See raw diff
 
20. Principles of Object Detection/1. Chapter Introduction.srt ADDED
@@ -0,0 +1,43 @@
1
+ 1
2
+ 00:00:00,390 --> 00:00:00,780
3
+ OK.
4
+
5
+ 2
6
+ 00:00:00,810 --> 00:00:04,960
7
+ So welcome to Chapter 20, where we finally get into object detection.
8
+
9
+ 3
10
+ 00:00:05,250 --> 00:00:07,900
11
+ And basically this chapter is split up into four sections.
12
+
13
+ 4
14
+ 00:00:07,920 --> 00:00:14,310
15
+ This is where I introduce the concept, basically how object detection all started and what it evolved into.
16
+
17
+ 5
18
+ 00:00:14,310 --> 00:00:20,760
19
+ And then in 20.2 we start talking about more modern-day CNN-based object detectors.
20
+
21
+ 6
22
+ 00:00:20,970 --> 00:00:23,920
23
+ So we go from R-CNNs to Mask R-CNN.
24
+
25
+ 7
26
+ 00:00:24,220 --> 00:00:30,450
27
+ And then we take a look at Single Shot Detectors, SSD, which is one of the most modern object detection algorithms
28
+
29
+ 8
30
+ 00:00:30,540 --> 00:00:31,260
31
+ out there.
32
+
33
+ 9
34
+ 00:00:31,590 --> 00:00:36,220
35
+ And then there is YOLO, which is a competing object detection algorithm right now.
36
+
37
+ 10
38
+ 00:00:36,240 --> 00:00:41,210
39
+ So these two are the latest and greatest state-of-the-art object detectors, and we're going to test them.
40
+
41
+ 11
42
+ 00:00:41,220 --> 00:00:46,040
43
+ We're going to go through them in detail and test them in the following chapters.
20. Principles of Object Detection/2. Object Detection Introduction - Sliding Windows with HOGs.srt ADDED
@@ -0,0 +1,303 @@
1
+ 1
2
+ 00:00:00,470 --> 00:00:01,000
3
+ OK.
4
+
5
+ 2
6
+ 00:00:01,050 --> 00:00:02,430
7
+ So let's start at the beginning.
8
+
9
+ 3
10
+ 00:00:02,460 --> 00:00:05,520
11
+ Let's talk about object detection, really, object detectors.
12
+
13
+ 4
14
+ 00:00:05,670 --> 00:00:11,910
15
+ So I'm going to introduce you to the history of it. Firstly, object detection is one of the holy grails of
16
+
17
+ 5
18
+ 00:00:11,910 --> 00:00:17,610
19
+ computer vision because previously what we have been doing is just classifying like an entire image
20
+
21
+ 6
22
+ 00:00:17,610 --> 00:00:20,510
23
+ and seeing what object, or what class, it belongs to.
24
+
25
+ 7
26
+ 00:00:20,730 --> 00:00:26,490
27
+ But can we take an image like this and label each major component as being a dog, car, person, horse,
28
+
29
+ 8
30
+ 00:00:26,760 --> 00:00:28,340
31
+ person in the back.
32
+
33
+ 9
34
+ 00:00:28,350 --> 00:00:32,230
35
+ Not until we came across object detection.
36
+
37
+ 10
38
+ 00:00:32,640 --> 00:00:40,620
39
+ So object detection is a mix of object classification and localization. Localization is the identification
40
+
41
+ 11
42
+ 00:00:40,650 --> 00:00:43,120
43
+ of a bounding box outlining the object.
44
+
45
+ 12
46
+ 00:00:43,140 --> 00:00:49,590
47
+ So like my face here, it basically extracts a bounding box around my face, and face detection is perhaps
48
+
49
+ 13
50
+ 00:00:49,590 --> 00:00:53,760
51
+ one of the most popular object detection algorithms that we all know.
52
+
53
+ 14
54
+ 00:00:53,830 --> 00:00:57,220
55
+ We're all quite familiar with from using cameras in our cell phones.
56
+
57
+ 15
58
+ 00:00:57,270 --> 00:00:57,780
59
+ OK.
60
+
61
+ 16
62
+ 00:00:58,290 --> 00:01:04,150
63
+ So basically, instead of telling you this object here is a cat,
64
+
65
+ 17
66
+ 00:01:04,170 --> 00:01:09,070
67
+ it actually tells you where the cat is, and that is the whole point of object detection.
68
+
69
+ 18
70
+ 00:01:10,620 --> 00:01:15,340
71
+ So let's get into the history of it and start with Haar cascade classifiers.
72
+
73
+ 19
74
+ 00:01:15,360 --> 00:01:19,140
75
+ Now there were many object detectors before this.
76
+
77
+ 20
78
+ 00:01:19,140 --> 00:01:24,840
79
+ However, the Haar cascade classifier is what made it mainstream and quite popular, because
80
+
81
+ 21
82
+ 00:01:24,840 --> 00:01:26,340
83
+ it was so fast.
84
+
85
+ 22
86
+ 00:01:26,370 --> 00:01:33,420
87
+ So basically this was developed by Viola and Jones in their face detection algorithm in 2001, not
88
+
89
+ 23
90
+ 00:01:33,420 --> 00:01:35,480
91
+ that long ago, 17 years ago.
92
+
93
+ 24
94
+ 00:01:35,520 --> 00:01:40,960
95
+ To be fair, it was super fast, and it's actually still used in a number of applications.
96
+
97
+ 25
98
+ 00:01:41,280 --> 00:01:43,710
99
+ Basically it's been optimized and tweaked to be even faster.
100
+
101
+ 26
102
+ 00:01:43,710 --> 00:01:49,890
103
+ So it basically reduces the CPU load, and it's very, very accurate.
104
+
105
+ 27
106
+ 00:01:49,890 --> 00:01:52,930
107
+ Basically what it does it's a cascade of classifiers.
108
+
109
+ 28
110
+ 00:01:53,190 --> 00:01:56,640
111
+ That's basically how it got its name, and it uses Haar features.
112
+
113
+ 29
114
+ 00:01:56,640 --> 00:01:58,590
115
+ Basically let's go into the next slide.
116
+
117
+ 30
118
+ 00:01:58,660 --> 00:02:02,760
119
+ Actually I don't have it in this section, but it basically uses Haar features, and Haar features are
120
+
121
+ 31
122
+ 00:02:02,760 --> 00:02:06,210
123
+ basically like you have rectangles
124
+
125
+ 32
126
+ 00:02:06,250 --> 00:02:07,100
127
+ overlapping here.
128
+
129
+ 33
130
+ 00:02:07,240 --> 00:02:12,690
131
+ You imagine a white rectangle here and one here, and then there are different types of Haar features.
132
+
133
+ 34
134
+ 00:02:12,810 --> 00:02:15,590
135
+ So basically it's just feature extraction.
136
+
137
+ 35
138
+ 00:02:15,690 --> 00:02:22,350
139
+ Basically what we learned before, and it slides this box over the window over and over, continuously
140
+
141
+ 36
142
+ 00:02:22,410 --> 00:02:31,950
143
+ looking for a face. They're very good, but they are pretty hard to train, develop and optimize.
144
+
145
+ 37
146
+ 00:02:32,010 --> 00:02:38,010
147
+ So let's move on to histogram of gradients and SVMs with sliding windows. So sliding windows is a method
148
+
149
+ 38
150
+ 00:02:38,010 --> 00:02:43,580
151
+ where we extract segments of a full image piece by piece in the form of a rectangular extraction box.
152
+
153
+ 39
154
+ 00:02:43,590 --> 00:02:48,000
155
+ So I mentioned it in previous slide when I was talking about this box being slid across this image.
156
+
157
+ 40
158
+ 00:02:48,330 --> 00:02:53,430
159
+ What it does here in this image is a picture of my wife from the last bodybuilding bikini competition
160
+
161
+ 41
162
+ 00:02:53,430 --> 00:02:54,560
163
+ two months ago.
164
+
165
+ 42
166
+ 00:02:54,870 --> 00:03:02,550
167
+ And what it does is just imagine this window is being moved here then down here and then down here just
168
+
169
+ 43
170
+ 00:03:02,550 --> 00:03:05,670
171
+ like remember how we moved across the image.
172
+
173
+ 44
174
+ 00:03:05,680 --> 00:03:07,960
175
+ in CNNs. It's exactly the same thing.
176
+
177
+ 45
178
+ 00:03:07,970 --> 00:03:14,430
179
+ And we can actually set the same parameters like stride and the size of this box and what this box does
180
+
181
+ 46
182
+ 00:03:14,430 --> 00:03:17,640
183
+ here in sliding windows with histogram of gradients.
184
+
185
+ 47
186
+ 00:03:17,700 --> 00:03:25,980
187
+ and SVMs, is that it basically extracts the HOGs, all the gradients, in this box at different scales.
188
+
189
+ 48
190
+ 00:03:25,980 --> 00:03:31,620
191
+ So basically it does it with the image at one scale, and then at another, smaller scale, and then this one
192
+
193
+ 49
194
+ 00:03:31,620 --> 00:03:35,480
195
+ here, and this one basically has no room to go right, so it just goes straight down.
196
+
197
+ 50
198
+ 00:03:35,760 --> 00:03:39,480
199
+ And it tries to match the HOG gradients with what it knows
200
+
201
+ 51
202
+ 00:03:39,480 --> 00:03:41,700
203
+ It's supposed to look like to find the object.
204
+
205
+ 52
206
+ 00:03:42,000 --> 00:03:47,400
207
+ Now as you can see this could be an effective way but it's not really that resilient.
208
+
209
+ 53
210
+ 00:03:47,400 --> 00:03:48,410
211
+ Why.
212
+
213
+ 54
214
+ 00:03:48,420 --> 00:03:53,400
215
+ Because imagine we have to do this for every segment of image continuously.
216
+
217
+ 55
218
+ 00:03:53,400 --> 00:03:55,680
219
+ It gets exhaustive and computationally expensive
220
+
221
+ 56
222
+ 00:03:58,720 --> 00:04:05,370
223
+ so the previous section, which is basically this manual feature extraction, I just mentioned that. And why would we
224
+
225
+ 57
226
+ 00:04:05,370 --> 00:04:10,740
227
+ want to actually manually find good features if CNNs actually eliminate that?
228
+
229
+ 58
230
+ 00:04:10,740 --> 00:04:16,350
231
+ All right, CNNs actually automatically find features by just running all this training data
232
+
233
+ 59
234
+ 00:04:16,680 --> 00:04:20,350
235
+ through the algorithm and finding the loss, matching it with the correct class.
236
+
237
+ 60
238
+ 00:04:20,370 --> 00:04:22,770
239
+ So that's what's brilliant about CNN's.
240
+
241
+ 61
242
+ 00:04:22,770 --> 00:04:24,760
243
+ It takes that step away from us.
244
+
245
+ 62
246
+ 00:04:26,340 --> 00:04:31,970
247
+ So as I said, one of the problems with doing this is the issue of scale.
248
+
249
+ 63
250
+ 00:04:32,100 --> 00:04:34,920
251
+ Imagine this is a simple image just 20 by 20.
252
+
253
+ 64
254
+ 00:04:34,920 --> 00:04:36,870
255
+ So this box can be passed over here.
256
+
257
+ 65
258
+ 00:04:36,960 --> 00:04:39,630
259
+ But imagine this was a much bigger HDTV image.
260
+
261
+ 66
262
+ 00:04:39,720 --> 00:04:44,130
263
+ How many different times how many different boxes would we extract.
264
+
265
+ 67
266
+ 00:04:44,130 --> 00:04:46,460
267
+ How do we know what size box should be.
268
+
269
+ 68
270
+ 00:04:46,470 --> 00:04:50,410
271
+ I mean, that's where we rescale the image, but how many different rescalings are we going to do?
272
+
273
+ 69
274
+ 00:04:50,440 --> 00:04:54,830
275
+ So as you can see this is not a very effective way of doing object detection.
276
+
277
+ 70
278
+ 00:04:56,430 --> 00:05:02,600
279
+ So talking a bit about histogram of gradients: I'm not going to go into this in detail. I taught
280
+
281
+ 71
282
+ 00:05:02,600 --> 00:05:05,480
283
+ this in my other OpenCV course.
284
+
285
+ 72
286
+ 00:05:05,480 --> 00:05:07,280
287
+ The video is included free in that section.
288
+
289
+ 73
290
+ 00:05:07,290 --> 00:05:09,230
291
+ So that's why I'm not going to talk about it much here.
292
+
293
+ 74
294
+ 00:05:09,550 --> 00:05:15,290
295
+ But basically the slides are here for you to go through on your own and you can pretty much infer from
296
+
297
+ 75
298
+ 00:05:15,290 --> 00:05:17,720
299
+ these steps here what HOGs really are.
300
+
301
+ 76
302
+ 00:05:20,110 --> 00:05:22,090
303
+ So now we move on to R-CNNs.
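To make the cost of exhaustive sliding windows concrete, here is a small sketch (not from the course materials) that counts how many window positions a sliding-window detector with an image pyramid would have to classify. The image size, stride, window size and pyramid scale factor are illustrative assumptions.

```python
def pyramid(size, scale=1.5, min_size=64):
    """Yield progressively smaller (square) image sizes - the image pyramid."""
    while size >= min_size:
        yield size
        size = int(size / scale)

def count_windows(size, step=32, window=64):
    """Number of sliding-window positions for one scale (per side, squared)."""
    per_side = len(range(0, size - window + 1, step))
    return per_side * per_side

# Even a modest 256x256 image generates dozens of patches, each of which
# would need its own HOG extraction and SVM classification pass.
total = sum(count_windows(s) for s in pyramid(256))
print(total)  # 49 + 16 + 4 + 1 = 70 patches
```

With a larger image, a smaller stride, or more pyramid levels, this count grows very quickly, which is exactly the "exhaustive and computationally expensive" problem the transcript describes.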
20. Principles of Object Detection/3. R-CNN, Fast R-CNN, Faster R-CNN and Mask R-CNN.srt ADDED
@@ -0,0 +1,847 @@
1
+ 1
2
+ 00:00:00,460 --> 00:00:00,840
3
+ OK.
4
+
5
+ 2
6
+ 00:00:00,850 --> 00:00:08,260
7
+ So welcome to 20.2, where we talk about all the different types of R-CNNs, going from the typical original
8
+
9
+ 3
10
+ 00:00:08,520 --> 00:00:10,320
11
+ R-CNN all the way to Fast
12
+
13
+ 4
14
+ 00:00:10,330 --> 00:00:12,380
15
+ R-CNN and Mask R-CNN.
16
+
17
+ 5
18
+ 00:00:12,640 --> 00:00:15,000
19
+ So let's see what this is about.
20
+
21
+ 6
22
+ 00:00:15,010 --> 00:00:20,040
23
+ So what does R-CNN stand for? It stands for Regions, actually, not Recurrent.
24
+
25
+ 7
26
+ 00:00:20,110 --> 00:00:25,210
27
+ That's a different type of neural net, and these R-CNNs were first introduced relatively recently,
28
+
29
+ 8
30
+ 00:00:25,720 --> 00:00:30,300
31
+ in 2014, by researchers at the University of California, Berkeley.
32
+
33
+ 9
34
+ 00:00:30,580 --> 00:00:35,730
35
+ And basically they dramatically improved performance on the PASCAL VOC challenge.
36
+
37
+ 10
38
+ 00:00:35,770 --> 00:00:40,010
39
+ This is the equivalent of ImageNet for object detection testing.
40
+
41
+ 11
42
+ 00:00:40,300 --> 00:00:40,610
43
+ OK.
44
+
45
+ 12
46
+ 00:00:40,650 --> 00:00:48,880
47
+ These are examples of some of the images in that dataset. So R-CNNs attempt to solve the exhaustive search
48
+
49
+ 13
50
+ 00:00:49,360 --> 00:00:55,090
51
+ previously performed by sliding windows by proposing bounding boxes and passing these extracted bounding
52
+
53
+ 14
54
+ 00:00:55,150 --> 00:00:57,380
55
+ boxes to the image classifier.
56
+
57
+ 15
58
+ 00:00:57,700 --> 00:01:03,460
59
+ So how do we find these bounding boxes? How are they proposed? They're proposed by using the selective
60
+
61
+ 16
62
+ 00:01:03,460 --> 00:01:04,760
63
+ search algorithm.
64
+
65
+ 17
66
+ 00:01:04,900 --> 00:01:07,760
67
+ And this is like a simple illustration of what happens here.
68
+
69
+ 18
70
+ 00:01:08,020 --> 00:01:09,360
71
+ So we have an image here.
72
+
73
+ 19
74
+ 00:01:09,610 --> 00:01:15,820
75
+ We have the proposed bounding boxes, and then we basically pass each of these boxes to a CNN and try
76
+
77
+ 20
78
+ 00:01:15,820 --> 00:01:19,140
79
+ to identify what is in that.
80
+
81
+ 21
82
+ 00:01:19,140 --> 00:01:25,290
83
+ So let's talk about the selective search algorithm. Selective search attempts to segment the image into groups
84
+
85
+ 22
86
+ 00:01:25,590 --> 00:01:31,680
87
+ by combining similar areas such as colors textures and propose these regions as interesting bounding
88
+
89
+ 23
90
+ 00:01:31,680 --> 00:01:32,700
91
+ boxes.
92
+
93
+ 24
94
+ 00:01:32,700 --> 00:01:34,680
95
+ So we have this image of some sheep here.
96
+
97
+ 25
98
+ 00:01:35,040 --> 00:01:39,810
99
+ And what selective search is going to do is try to merge similar pixels, similar
100
+
101
+ 26
102
+ 00:01:39,810 --> 00:01:42,120
103
+ lighting conditions that sort of stuff.
104
+
105
+ 27
106
+ 00:01:42,120 --> 00:01:48,250
107
+ So we can eventually get something like this, and then it's going to draw a box around each segmented
108
+
109
+ 28
110
+ 00:01:48,270 --> 00:01:50,630
111
+ region and propose that.
112
+
113
+ 29
114
+ 00:01:50,840 --> 00:01:53,760
115
+ Now there is a lot more tweaking we can do in selective search.
116
+
117
+ 30
118
+ 00:01:53,940 --> 00:01:55,040
119
+ So it all depends.
120
+
121
+ 31
122
+ 00:01:55,050 --> 00:01:58,560
123
+ Basically we can get little boxes or a lot of boxes
124
+
125
+ 32
126
+ 00:02:01,640 --> 00:02:08,240
127
+ So when selective search has identified these regions or boxes, it passes this extracted image to a
128
+
129
+ 33
130
+ 00:02:08,240 --> 00:02:11,420
131
+ CNN, for example one trained on ImageNet.
132
+
133
+ 34
134
+ 00:02:11,520 --> 00:02:13,890
135
+ OK, that would be a very good CNN to use.
136
+
137
+ 35
138
+ 00:02:14,720 --> 00:02:20,740
139
+ We don't use the CNN directly for classification though, although we can. We use an SVM to classify
140
+
141
+ 36
142
+ 00:02:20,740 --> 00:02:22,830
143
+ the CNN-extracted features.
144
+
145
+ 37
146
+ 00:02:22,850 --> 00:02:24,710
147
+ Now I didn't mention that before.
148
+
149
+ 38
150
+ 00:02:24,830 --> 00:02:31,990
151
+ But what happens is that instead of actually using the CNN here to get the classes, what it
152
+
153
+ 39
154
+ 00:02:32,000 --> 00:02:36,890
155
+ does is just give you the CNN features, which are basically feature maps.
156
+
157
+ 40
158
+ 00:02:36,890 --> 00:02:37,690
159
+ All right.
160
+
161
+ 41
162
+ 00:02:37,790 --> 00:02:42,590
163
+ And then we use an SVM to classify the CNN-extracted features.
164
+
165
+ 42
166
+ 00:02:42,590 --> 00:02:49,550
167
+ This was probably done for speed because, if I'm not mistaken, R-CNN was the originally proposed
168
+
169
+ 43
170
+ 00:02:49,580 --> 00:02:51,790
171
+ object detector, and it was meant for video.
172
+
173
+ 44
174
+ 00:02:52,250 --> 00:02:53,680
175
+ So speed was a concern.
176
+
177
+ 45
178
+ 00:02:54,800 --> 00:03:00,460
179
+ So after this region proposal has been classified, we then use a simple linear regression to generate
180
+
181
+ 46
182
+ 00:03:00,550 --> 00:03:07,060
183
+ a tighter bounding box, that is, bounding boxes that are actually sized and fit around the objects in the
184
+
185
+ 47
186
+ 00:03:07,070 --> 00:03:07,740
187
+ image.
188
+
189
+ 48
190
+ 00:03:07,940 --> 00:03:15,440
191
+ But how do we know what the good boxes are, though? And that's where we come up with the IoU metric that
192
+
193
+ 49
194
+ 00:03:15,440 --> 00:03:16,550
195
+ we discussed before.
196
+
197
+ 50
198
+ 00:03:16,760 --> 00:03:23,060
199
+ Remember, it basically measured how good the overlap of the predicted box was over the
200
+
201
+ 51
202
+ 00:03:23,060 --> 00:03:26,090
203
+ labeled box.
204
+
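The IoU metric recalled above is easy to state in code. This is a minimal illustration (not the course's own code), using boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (empty if boxes don't overlap).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # Union = sum of areas minus the double-counted intersection.
    return inter / float(area_a + area_b - inter)

# A predicted box shifted half-way off the ground-truth box:
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / 150 = 0.333...
```

A prediction with IoU above roughly 0.5 against the labeled box is what the transcript counts as a good detection (a true positive).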
205
+ 52
206
+ 00:03:26,090 --> 00:03:31,580
207
+ Now here's a problem that happens in R-CNNs, and basically all object detection algorithms.
208
+
209
+ 53
210
+ 00:03:31,780 --> 00:03:39,490
211
+ They don't often just propose one box; they propose many boxes over the same image sometimes.
212
+
213
+ 54
214
+ 00:03:39,800 --> 00:03:44,100
215
+ So remember, an IoU of 0.5 was considered a good result.
216
+
217
+ 55
218
+ 00:03:44,420 --> 00:03:51,650
219
+ Well, what if we have multiple boxes that all have an IoU of over 0.5?
220
+
221
+ 56
222
+ 00:03:51,650 --> 00:03:52,890
223
+ This is a common problem.
224
+
225
+ 57
226
+ 00:03:53,180 --> 00:03:53,750
227
+ OK.
228
+
229
+ 58
230
+ 00:03:54,050 --> 00:03:56,560
231
+ So in the figures below let's go to this example.
232
+
233
+ 59
234
+ 00:03:56,570 --> 00:03:58,590
235
+ We have four boxes in red.
236
+
237
+ 60
238
+ 00:03:58,680 --> 00:04:00,080
239
+ These are them right here.
240
+
241
+ 61
242
+ 00:04:00,110 --> 00:04:08,930
243
+ The green is the ground truth box, and as such we have one true positive and three false positives.
244
+
245
+ 62
246
+ 00:04:08,930 --> 00:04:09,470
247
+ OK.
248
+
249
+ 63
250
+ 00:04:10,040 --> 00:04:15,460
251
+ So the three false positives here would be these boxes here: one, two and three.
252
+
253
+ 64
254
+ 00:04:15,500 --> 00:04:16,010
255
+ OK.
256
+
257
+ 65
258
+ 00:04:20,120 --> 00:04:26,520
259
+ So the reason they're false positives is because we have one box here, the dotted line. Now this
260
+
261
+ 66
262
+ 00:04:26,520 --> 00:04:31,440
263
+ is basically our best box so that will count as a true positive in our predictions.
264
+
265
+ 67
266
+ 00:04:31,440 --> 00:04:32,000
267
+ All right.
268
+
269
+ 68
270
+ 00:04:32,130 --> 00:04:34,040
271
+ So for now, let's ignore the green box.
272
+
273
+ 69
274
+ 00:04:34,050 --> 00:04:36,010
275
+ It doesn't count in this calculation.
276
+
277
+ 70
278
+ 00:04:36,060 --> 00:04:38,180
279
+ We just have one box that's really good.
280
+
281
+ 71
282
+ 00:04:38,220 --> 00:04:47,740
283
+ It's a true positive, and three that aren't good, or false positives. So mean average precision is a very tricky
284
+
285
+ 72
286
+ 00:04:47,830 --> 00:04:54,510
287
+ metric in my opinion mainly because it's not that intuitive to understand how the formulas fit here.
288
+
289
+ 73
290
+ 00:04:54,730 --> 00:04:56,870
291
+ What I'm going to say though is I'm going to try to explain it to you.
292
+
293
+ 74
294
+ 00:04:56,940 --> 00:05:02,470
295
+ I'll introduce the math of it. I've left it for you in a slide so you can go over it on your own and try
296
+
297
+ 75
298
+ 00:05:02,470 --> 00:05:03,720
299
+ to make sense of it.
300
+
301
+ 76
302
+ 00:05:03,760 --> 00:05:08,840
303
+ There's also a couple of blogs that have some very good but pretty lengthy explanations for it.
304
+
305
+ 77
306
+ 00:05:09,070 --> 00:05:11,100
307
+ But let's go back to the slide, sorry.
308
+
309
+ 78
310
+ 00:05:11,320 --> 00:05:17,170
311
+ So what I'm going to say is that actually in this blog here is a very good explanation for it.
312
+
313
+ 79
314
+ 00:05:17,290 --> 00:05:25,030
315
+ But essentially, what mean average precision tries to do is that it knows all the boxes the algorithm
316
+
317
+ 80
318
+ 00:05:25,060 --> 00:05:31,490
319
+ proposes and basically what it tries to do is try to come up with a metric that defines basically what
320
+
321
+ 81
322
+ 00:05:31,490 --> 00:05:33,130
323
+ it is like.
324
+
325
+ 82
326
+ 00:05:33,300 --> 00:05:41,680
327
+ the object detector predicted: the object to be predicted, maybe four false positives, and maybe
328
+
329
+ 83
330
+ 00:05:41,680 --> 00:05:44,880
331
+ some other boxes that were probably irrelevant and stuff.
332
+
333
+ 84
334
+ 00:05:44,890 --> 00:05:47,580
335
+ And then remember, this is for one class.
336
+
337
+ 85
338
+ 00:05:47,590 --> 00:05:49,280
339
+ Remember there are different classes as well.
340
+
341
+ 86
342
+ 00:05:49,460 --> 00:05:56,300
343
+ So mean average precision is a way to measure how effective all of my object detections were, how
344
+
345
+ 87
346
+ 00:05:56,560 --> 00:05:57,490
347
+ well actually.
348
+
349
+ 88
350
+ 00:05:57,660 --> 00:05:58,070
351
+ OK.
352
+
353
+ 89
354
+ 00:05:58,920 --> 00:06:04,690
355
+ So just remember, this is a metric of how we baseline, how we measure, the performance of object detectors.
356
+
357
+ 90
358
+ 00:06:07,820 --> 00:06:14,420
359
+ So now let's move on to non-maximum suppression, and this is a technique that object detectors use to remove
360
+
361
+ 91
362
+ 00:06:14,420 --> 00:06:18,640
363
+ overlapping boxes and thus improve their mAP scores significantly.
364
+
365
+ 92
366
+ 00:06:18,920 --> 00:06:24,400
367
+ So what this does basically it looks at a probabilities associated with each box that's being generated
368
+
369
+ 93
370
+ 00:06:24,830 --> 00:06:30,590
371
+ and the probabilities it looks at all the probabilities of the object being in the same class effectively.
372
+
373
+ 94
374
+ 00:06:30,590 --> 00:06:35,690
375
+ So if we have, like, two or three boxes that say this is a car, but they have high overlap.
376
+
377
+ 95
378
+ 00:06:35,870 --> 00:06:40,180
379
+ What it does now it looks at the highest probability box.
380
+
381
+ 96
382
+ 00:06:40,230 --> 00:06:41,970
383
+ That's this one here in red.
384
+
385
+ 97
386
+ 00:06:42,200 --> 00:06:46,730
387
+ And what it does is check the IoU it would have with the other boxes.
388
+
389
+ 98
390
+ 00:06:46,730 --> 00:06:52,970
391
+ So in this image here, we see that we have three boxes that the object detector has basically
392
+
393
+ 99
394
+ 00:06:54,080 --> 00:07:00,340
395
+ highlighted, and these are the probabilities of it being a class. So what it does is check the IoU over these
396
+
397
+ 100
398
+ 00:07:00,410 --> 00:07:07,940
399
+ here, and it will drop the boxes that it deems basically not as relevant as the main box.
400
+
401
+ 101
402
+ 00:07:07,950 --> 00:07:11,250
403
+ So that's effectively how we clean up these boxes.
404
+
405
+ 102
406
+ 00:07:11,470 --> 00:07:15,420
407
+ that our object detectors produce.
408
+
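The greedy non-maximum suppression procedure just described can be sketched in a few lines. This is a hypothetical illustration, not the course's code; the boxes, scores and 0.5 IoU threshold are made-up example values.

```python
def _iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / float(union)

def non_max_suppression(boxes, scores, iou_thresh=0.5):
    """Keep the highest-probability box, drop lower-scored overlapping boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)            # highest remaining probability
        keep.append(best)
        # Drop every remaining box that overlaps the best box too much.
        order = [i for i in order if _iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Three heavily overlapping detections of one object, plus one distant box:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (2, 0, 12, 10), (50, 50, 60, 60)]
scores = [0.9, 0.75, 0.6, 0.8]
print(non_max_suppression(boxes, scores))  # -> [0, 3]
```

The two lower-scored boxes overlapping box 0 are suppressed, while the distant box survives, which is exactly the clean-up behaviour the transcript describes.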
409
+ 103
410
+ 00:07:15,450 --> 00:07:17,430
411
+ So now let's move on to Fast
412
+
413
+ 104
414
+ 00:07:17,430 --> 00:07:24,870
415
+ R-CNNs. This was announced, or released, in 2015, a year after the original R-CNN came out,
416
+
417
+ 105
418
+ 00:07:25,380 --> 00:07:31,320
419
+ and basically what happened was that the problem with R-CNNs was that they were effective but pretty
420
+
421
+ 106
422
+ 00:07:31,320 --> 00:07:39,030
423
+ slow, as each proposed bounding box has to be classified by a CNN, and as such, doing
424
+
425
+ 107
426
+ 00:07:39,120 --> 00:07:41,890
427
+ it in real time was often impossible.
428
+
429
+ 108
430
+ 00:07:42,060 --> 00:07:43,970
431
+ So it required three models.
432
+
433
+ 109
434
+ 00:07:44,370 --> 00:07:47,640
435
+ And these three models had to be trained separately.
436
+
437
+ 110
438
+ 00:07:47,640 --> 00:07:53,760
439
+ So we had to have a feature extraction CNN, an SVM to predict the class, and a linear regression model
440
+
441
+ 111
442
+ 00:07:53,760 --> 00:07:55,420
443
+ to tighten the bounding boxes.
444
+
445
+ 112
446
+ 00:07:55,640 --> 00:07:57,690
447
+ You remember all these things we discussed earlier.
448
+
449
+ 113
450
+ 00:07:57,960 --> 00:08:03,220
451
+ So there are definitely a lot of moving parts in R-CNNs.
452
+
453
+ 114
454
+ 00:08:03,230 --> 00:08:06,600
455
+ So what did Fast R-CNNs do? Fast
456
+
457
+ 115
458
+ 00:08:06,630 --> 00:08:11,190
459
+ R-CNNs firstly reduced the number of proposed boxes by removing the overlap generated.
460
+
461
+ 116
462
+ 00:08:11,190 --> 00:08:12,930
463
+ So how did they do this.
464
+
465
+ 117
466
+ 00:08:12,930 --> 00:08:18,750
467
+ We run the CNN across the image just once instead of many times, using a technique called region
468
+
469
+ 118
470
+ 00:08:19,350 --> 00:08:21,930
471
+ of interest pooling, or RoI pool.
472
+
473
+ 119
474
+ 00:08:22,440 --> 00:08:22,880
475
+ OK.
476
+
477
+ 120
478
+ 00:08:23,190 --> 00:08:29,280
479
+ So RoI pool allows us to share the forward pass of the CNN for an image across its
480
+
481
+ 121
482
+ 00:08:29,280 --> 00:08:30,660
483
+ of subregions.
484
+
485
+ 122
486
+ 00:08:30,660 --> 00:08:37,380
487
+ This works because regions are simply extracted from the CNN feature map and then pooled, which
488
+
489
+ 123
490
+ 00:08:37,530 --> 00:08:44,360
491
+ means you only need to run the CNN once per image and basically use that output of the CNN going forward.
492
+
493
+ 124
494
+ 00:08:46,060 --> 00:08:52,510
495
+ So, combining the training of the CNN, the classifier and the bounding box regressor into a single model: that's
496
+
497
+ 125
498
+ 00:08:52,510 --> 00:08:53,030
499
+ what it did.
500
+
501
+ 126
502
+ 00:08:53,110 --> 00:08:59,280
503
+ So we had the SVM feature classifier, and that now became a softmax layer on top of
504
+
505
+ 127
506
+ 00:08:59,280 --> 00:09:05,870
507
+ the CNN, and the old linear regression that was tightening the boxes basically became a bounding box
508
+
509
+ 128
510
+ 00:09:05,910 --> 00:09:08,770
511
+ regressor layer, parallel to our softmax layer.
512
+
513
+ 129
514
+ 00:09:09,040 --> 00:09:12,540
515
+ So basically what it became is this.
516
+
517
+ 130
518
+ 00:09:12,610 --> 00:09:14,430
519
+ So we have a feature extractor.
520
+
521
+ 131
522
+ 00:09:14,440 --> 00:09:22,360
523
+ CNN, we have a softmax layer at the end of the CNN being our classifier for the object type or class.
524
+
525
+ 132
526
+ 00:09:22,510 --> 00:09:26,120
527
+ And then we have, in parallel, a bounding box regressor.
528
+
529
+ 133
530
+ 00:09:26,170 --> 00:09:33,400
531
+ So that is how Fast R-CNNs solved a lot of the delays and sluggishness of the original
532
+
533
+ 134
534
+ 00:09:33,400 --> 00:09:37,670
535
+ R-CNN. And so now, a year later, in 2016,
536
+
537
+ 135
538
+ 00:09:37,950 --> 00:09:42,560
539
+ Faster R-CNNs were released, I should say. So, Fast
540
+
541
+ 136
542
+ 00:09:42,720 --> 00:09:44,650
543
+ R-CNNs made significant speed increases.
544
+
545
+ 137
546
+ 00:09:44,650 --> 00:09:51,290
547
+ However, region proposal still remained relatively slow, as it still relied on the selective search algorithm.
548
+
549
+ 138
550
+ 00:09:51,790 --> 00:09:57,220
551
+ Fortunately a Microsoft Research Team figured out how to eliminate this bottleneck.
552
+
553
+ 139
554
+ 00:09:57,220 --> 00:10:02,770
555
+ So how did it speed up region proposal? Selective search utilizes features extracted from the image.
556
+
557
+ 140
558
+ 00:10:03,280 --> 00:10:07,880
559
+ What if we just reused those features to do region proposal instead?
560
+
561
+ 141
562
+ 00:10:08,100 --> 00:10:11,220
563
+ Okay, so that was the insight that made it faster,
564
+
565
+ 142
566
+ 00:10:11,320 --> 00:10:13,180
567
+ and it is extremely efficient.
568
+
569
+ 143
570
+ 00:10:13,180 --> 00:10:15,670
571
+ So let's take a look at a diagram from the paper here.
572
+
573
+ 144
574
+ 00:10:15,970 --> 00:10:23,150
575
+ So basically this line here is what is important: what if we used those features to do region proposal?
576
+
577
+ 145
578
+ 00:10:23,440 --> 00:10:29,350
579
+ That is exactly why we don't have to keep running this over and over, the equivalent of us using
580
+
581
+ 146
582
+ 00:10:29,770 --> 00:10:32,410
583
+ selective search to generate our proposals.
584
+
585
+ 147
586
+ 00:10:33,880 --> 00:10:35,950
587
+ So how do we do region proposals with Faster
588
+
589
+ 148
590
+ 00:10:35,950 --> 00:10:43,960
591
+ R-CNNs? So Faster R-CNNs add a fully convolutional network on top of the features of the CNN to create
592
+
593
+ 149
594
+ 00:10:43,990 --> 00:10:45,740
595
+ a region proposal network.
596
+
597
+ 150
598
+ 00:10:45,940 --> 00:10:47,370
599
+ That's what we're seeing here.
600
+
601
+ 151
602
+ 00:10:47,680 --> 00:10:48,780
603
+ So it is now.
604
+
605
+ 152
606
+ 00:10:49,390 --> 00:10:56,720
607
+ Let's go back to it. It is now a fully convolutional network on top of the features of the CNN.
608
+
609
+ 153
610
+ 00:10:56,740 --> 00:10:57,480
611
+ So let's think about that.
612
+
613
+ 154
614
+ 00:10:57,480 --> 00:11:04,590
615
+ So we have a CNN producing features here, and a fully convolutional network here that does this region proposal.
616
+
617
+ 155
618
+ 00:11:05,980 --> 00:11:06,730
619
+ OK.
620
+
621
+ 156
622
+ 00:11:06,930 --> 00:11:12,540
623
+ So the authors of the paper state the region proposal network slides a window over the features of the
624
+
625
+ 157
626
+ 00:11:12,570 --> 00:11:14,930
627
+ CNN at each window location.
628
+
629
+ 158
630
+ 00:11:14,940 --> 00:11:22,740
631
+ The network outputs a score and a bounding box per anchor, hence 4k box coordinates, where k is the number of
632
+
633
+ 159
634
+ 00:11:22,740 --> 00:11:23,540
635
+ anchors.
636
+
637
+ 160
638
+ 00:11:23,670 --> 00:11:25,980
639
+ That is basically how it works.
640
+
641
+ 161
642
+ 00:11:26,010 --> 00:11:31,680
643
+ I encourage you to read the paper if you want to get into more detail about Faster R-CNNs. Anyway, back
644
+
645
+ 162
646
+ 00:11:31,680 --> 00:11:34,540
647
+ to this: after each pass of the sliding window,
648
+
649
+ 163
650
+ 00:11:34,740 --> 00:11:41,400
651
+ it outputs k potential bounding boxes and a confidence of how good this box is expected to be.
652
+
653
+ 164
654
+ 00:11:41,400 --> 00:11:44,860
655
+ That's pretty cool and pretty complicated as well.
656
+
657
+ 165
658
+ 00:11:45,180 --> 00:11:47,140
659
+ So, producing the bounding boxes now.
660
+
661
+ 166
662
+ 00:11:47,360 --> 00:11:54,630
663
+ So previously we mentioned we produce k potential bounding boxes. These bounding box proposals are proposals
664
+
665
+ 167
666
+ 00:11:54,630 --> 00:12:01,740
667
+ of commonly expected boxes of set shapes, aspect ratios and sizes. These are called anchor
668
+
669
+ 168
670
+ 00:12:01,740 --> 00:12:02,710
671
+ boxes.
672
+
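The anchor-box idea described above, k predefined shapes per window location giving 4k box coordinates, can be sketched as follows. This is a hypothetical illustration; the base size, scales and ratios are assumed values loosely modelled on the common 3 scales x 3 aspect ratios = 9 anchors setup, not the paper's exact code.

```python
from itertools import product

def make_anchors(base=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Generate k = len(scales) * len(ratios) anchor boxes centred at the
    origin, as (x1, y1, x2, y2). ratio is interpreted as height / width."""
    anchors = []
    for scale, ratio in product(scales, ratios):
        area = (base * scale) ** 2          # each scale fixes the box area
        w = (area / ratio) ** 0.5           # solve w * h = area, h = ratio * w
        h = w * ratio
        anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors

anchors = make_anchors()
k = len(anchors)
print(k, 4 * k)  # k anchors -> 4k box coordinates per window location
```

With the default 3 scales and 3 ratios this gives k = 9 anchors, so the region proposal network regresses 36 coordinates (plus objectness scores) at every sliding-window position.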
673
+ 169
674
+ 00:12:02,730 --> 00:12:09,420
675
+ So what's happening here is that all the boxes we propose in this method are basically predefined,
676
+
677
+ 170
678
+ 00:12:09,630 --> 00:12:13,150
679
+ and they would be of set shapes, ratios and sizes.
680
+
681
+ 171
682
+ 00:12:13,530 --> 00:12:20,040
683
+ So the region proposal outputs a bounding box per anchor and a score of how likely the image in
684
+
685
+ 172
686
+ 00:12:20,040 --> 00:12:29,940
687
+ the box will be an object. So let's move on to Mask R-CNNs, which is pixel-level segmentation. Now Mask
688
+
689
+ 173
690
+ 00:12:29,980 --> 00:12:32,430
691
+ R-CNNs aim to combine
692
+
693
+ 174
694
+ 00:12:32,460 --> 00:12:38,170
695
+ object detection and classification with segmentation. As we previously saw, segmentation is the labeling
696
+
697
+ 175
698
+ 00:12:38,170 --> 00:12:41,880
699
+ of objects at the pixel level, as we can see in the above image.
700
+
701
+ 176
702
+ 00:12:42,980 --> 00:12:45,220
703
+ So how do Mask R-CNNs work?
704
+
705
+ 177
706
+ 00:12:45,450 --> 00:12:47,590
707
+ They are basically an extension of Faster
708
+
709
+ 178
710
+ 00:12:47,600 --> 00:12:53,480
711
+ R-CNNs, where a binary mask is created for the objects detected by the box.
712
+
713
+ 179
714
+ 00:12:53,540 --> 00:13:00,100
715
+ So you remember how we actually created a mask that basically provided segmentation outputs.
716
+
717
+ 180
718
+ 00:13:00,140 --> 00:13:07,640
719
+ That's what's happening here in Mask R-CNNs. So a binary mask, basically bit 0 or 1, is created for
720
+
721
+ 181
722
+ 00:13:07,640 --> 00:13:09,060
723
+ the object in the box.
724
+
725
+ 182
726
+ 00:13:09,080 --> 00:13:11,190
727
+ So we know we're looking at a box.
728
+
729
+ 183
730
+ 00:13:11,240 --> 00:13:15,390
731
+ And we are creating a binary mask for each box that's found.
732
+
733
+ 184
734
+ 00:13:15,560 --> 00:13:21,590
735
+ And this is the architecture of the network and the link to the actual research paper where this publication
736
+
737
+ 185
738
+ 00:13:21,710 --> 00:13:24,390
739
+ was released or made.
740
+
741
+ 186
742
+ 00:13:24,770 --> 00:13:28,440
743
+ So Mask R-CNNs use something called ROI Align.
744
+
745
+ 187
746
+ 00:13:28,820 --> 00:13:33,580
747
+ So the mask output uses the CNN-extracted features to create its binary mask.
748
+
749
+ 188
750
+ 00:13:33,710 --> 00:13:36,130
751
+ So how did the authors of this paper achieve this?
752
+
753
+ 189
754
+ 00:13:36,130 --> 00:13:42,740
755
+ They use ROI Align instead of ROI Pool, as ROI Pool's feature map was misaligned from the regions of the
756
+
757
+ 190
758
+ 00:13:42,740 --> 00:13:43,660
759
+ original image.
760
+
761
+ 191
762
+ 00:13:43,880 --> 00:13:46,990
763
+ That was something I read in the paper; I was confused as to why.
764
+
765
+ 192
766
+ 00:13:47,100 --> 00:13:48,350
767
+ But they explained it pretty well.
768
+
769
+ 193
770
+ 00:13:48,470 --> 00:13:52,010
771
+ So, mapping a region of interest onto a feature map.
772
+
773
+ 194
774
+ 00:13:52,010 --> 00:13:54,580
775
+ So imagine this was the original image here.
776
+
777
+ 195
778
+ 00:13:54,770 --> 00:13:59,000
779
+ And we had a feature map that we generated from the CNN that is smaller.
780
+
781
+ 196
782
+ 00:13:59,210 --> 00:14:00,700
783
+ OK because it usually is smaller.
784
+
785
+ 197
786
+ 00:14:00,710 --> 00:14:07,010
787
+ And as we use ROI Pooling, what happens here is that a 100 by 100 image is now mapped onto a
788
+
789
+ 198
790
+ 00:14:07,040 --> 00:14:08,610
791
+ 32 by 32 feature map.
792
+
793
+ 199
794
+ 00:14:08,690 --> 00:14:14,930
795
+ Therefore a window of 20 by 20, that's this grid-sized window here on your original image, is mapped
796
+
797
+ 200
798
+ 00:14:14,930 --> 00:14:17,470
799
+ to 6.4 pixels here.
800
+
801
+ 201
802
+ 00:14:17,510 --> 00:14:21,320
803
+ This is how a direct linear mapping would be onto the feature map.
804
+
805
+ 202
806
+ 00:14:21,460 --> 00:14:25,750
807
+ ROI Pool however rounds down, so the pixel maps to six by six.
808
+
809
+ 203
810
+ 00:14:25,790 --> 00:14:30,730
811
+ That is why ROI Pool's feature map, when we use ROI Pool instead of ROI Align,
812
+
813
+ 204
814
+ 00:14:30,740 --> 00:14:37,880
815
+ would have misaligned boxes or regions on the image. It may not have been that important if you're looking at a
816
+
817
+ 205
818
+ 00:14:37,880 --> 00:14:39,630
819
+ big image with small objects.
820
+
821
+ 206
822
+ 00:14:39,950 --> 00:14:47,240
823
+ But if you think about it, if it's a big box and you're going to do a proposal on it, being misaligned can
824
+
825
+ 207
826
+ 00:14:47,240 --> 00:14:48,290
827
+ actually be bad.
828
+
829
+ 208
830
+ 00:14:48,710 --> 00:14:55,910
831
+ So ROI Align uses bilinear interpolation to know exactly what the pixel value at 6.4 would be.
832
+
833
+ 209
834
+ 00:14:56,360 --> 00:15:02,260
835
+ So it's a pretty cool, nifty algorithm that gives you an exact mapping without having to round off pixels.
836
+
837
+ 210
838
+ 00:15:03,620 --> 00:15:10,640
839
+ So these are some examples of Mask R-CNN segmenting and classifying images. It's pretty cool to see this
840
+
841
+ 211
842
+ 00:15:10,640 --> 00:15:11,030
843
+ in action.
844
+
845
+ 212
846
+ 00:15:11,030 --> 00:15:14,570
847
+ Actually it's very accurate in some videos.
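The ROI Pool vs. ROI Align mapping arithmetic described above can be sketched numerically. This is a minimal illustration, not the course's code: the 32-pixel feature-map size is inferred from the lecture's 20-pixel window mapping to 6.4 feature-map pixels, and the function name is mine.

```python
# Sketch of the mapping example from the lecture: a 100x100 image is
# mapped onto a 32x32 feature map, so a 20x20 window on the image
# corresponds to 6.4 feature-map pixels.

def map_to_feature_map(window, image_size, feature_size):
    """Linearly map a window length from image to feature-map coordinates."""
    scale = feature_size / image_size
    return window * scale

exact = map_to_feature_map(20, 100, 32)  # 6.4 -- what ROI Align keeps,
                                         # via bilinear interpolation
rounded = int(exact)                     # 6   -- what ROI Pool snaps to

print(exact, rounded)  # 6.4 6
```

The rounding step is exactly the misalignment the Mask R-CNN authors avoid with ROI Align.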
20. Principles of Object Detection/4. Single Shot Detectors (SSDs).srt ADDED
@@ -0,0 +1,115 @@
1
+ 1
2
+ 00:00:00,750 --> 00:00:01,100
3
+ OK.
4
+
5
+ 2
6
+ 00:00:01,140 --> 00:00:06,570
7
+ So now let's move on to Chapter 20.3 where we talk about single shot detectors, also called
8
+
9
+ 3
10
+ 00:00:06,630 --> 00:00:09,510
11
+ SSDs, and no, not solid state drives.
12
+
13
+ 4
14
+ 00:00:09,510 --> 00:00:18,010
15
+ By the way, totally different SSDs. So, single shot detectors. We previously just went through the entire
16
+
17
+ 5
18
+ 00:00:18,310 --> 00:00:21,880
19
+ R-CNN family and we've seen how successfully they can be applied.
20
+
21
+ 6
22
+ 00:00:21,880 --> 00:00:22,590
23
+ All right.
24
+
25
+ 7
26
+ 00:00:22,720 --> 00:00:28,690
27
+ However, the performance on video is still not optimal, and they typically run, even on GPUs, at seven
28
+
29
+ 8
30
+ 00:00:28,690 --> 00:00:29,690
31
+ frames per second.
32
+
33
+ 9
34
+ 00:00:29,860 --> 00:00:36,100
35
+ Now SSDs aim to improve this speed by eliminating the need for region proposals. So you can see the speed
36
+
37
+ 10
38
+ 00:00:36,100 --> 00:00:38,870
39
+ of Faster R-CNNs here, and YOLO.
40
+
41
+ 11
42
+ 00:00:39,160 --> 00:00:40,860
43
+ And this is YOLO version one.
44
+
45
+ 12
46
+ 00:00:41,020 --> 00:00:45,500
47
+ But look at the speed of SSDs on relatively high resolution images.
48
+
49
+ 13
50
+ 00:00:45,520 --> 00:00:49,820
51
+ They're pretty fast, and this is running on a Titan X GPU.
52
+
53
+ 14
54
+ 00:00:50,500 --> 00:00:52,060
55
+ So this is pretty impressive.
56
+
57
+ 15
58
+ 00:00:52,060 --> 00:01:00,130
59
+ So how did it achieve this improvement in speed? SSDs use multi-scale features and default boxes,
60
+
61
+ 16
62
+ 00:01:00,160 --> 00:01:03,400
63
+ as well as dropping the resolution of the images to improve speed.
64
+
65
+ 17
66
+ 00:01:03,400 --> 00:01:05,570
67
+ That doesn't seem that difficult, does it?
68
+
69
+ 18
70
+ 00:01:05,590 --> 00:01:10,740
71
+ But anyway, this allows SSDs near real-time speed with almost no drop in accuracy.
72
+
73
+ 19
74
+ 00:01:11,810 --> 00:01:15,240
75
+ So SSDs are comprised of two main parts.
76
+
77
+ 20
78
+ 00:01:15,380 --> 00:01:21,110
79
+ We have the feature extractor, and typically they use VGG16. That was actually what was used in
80
+
81
+ 21
82
+ 00:01:21,110 --> 00:01:26,360
83
+ the published paper, but a ResNet or a DenseNet could actually provide better results, as they have actually
84
+
85
+ 22
86
+ 00:01:26,510 --> 00:01:31,970
87
+ performed better in the ILSVRC competition.
88
+
89
+ 23
90
+ 00:01:31,980 --> 00:01:37,490
91
+ So anyway, then we have the feature map here and we have the convolutional filters that we use for
92
+
93
+ 24
94
+ 00:01:37,490 --> 00:01:38,930
95
+ object detection.
96
+
97
+ 25
98
+ 00:01:38,930 --> 00:01:40,620
99
+ So this is a diagram of it here.
100
+
101
+ 26
102
+ 00:01:40,670 --> 00:01:43,060
103
+ So basically this is the input image here.
104
+
105
+ 27
106
+ 00:01:43,130 --> 00:01:48,260
107
+ This is the feature extractor, the convolutional network here, which was VGG16.
108
+
109
+ 28
110
+ 00:01:48,590 --> 00:01:53,070
111
+ And then we have the convolutional filter that we use for our detectors here.
112
+
113
+ 29
114
+ 00:01:53,240 --> 00:01:53,730
115
+ OK.
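As a rough sketch of the default boxes mentioned above: the per-feature-map scale formula below is the one from the SSD paper, not shown explicitly in the lecture, and the helper name and default parameters are my assumptions.

```python
# Sketch of SSD's default boxes at multiple feature-map scales.
# Scale formula (from the SSD paper, assumed):
#   s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1)
# so boxes on early (large) feature maps are small and boxes on
# late (small) feature maps are large.

def default_box_scales(m, s_min=0.2, s_max=0.9):
    """Scale of the default boxes for each of m feature maps."""
    return [round(s_min + (s_max - s_min) * (k - 1) / (m - 1), 2)
            for k in range(1, m + 1)]

print(default_box_scales(6))  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```

This is why SSD can detect objects of many sizes in one forward pass: each feature map is responsible for one scale band.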
20. Principles of Object Detection/5. YOLO to YOLOv3.srt ADDED
@@ -0,0 +1,203 @@
1
+ 1
2
+ 00:00:00,600 --> 00:00:05,590
3
+ Hi and welcome to Chapter 20.4 where we talk about YOLO, and we go all the way up from YOLO
4
+
5
+ 2
6
+ 00:00:05,640 --> 00:00:07,470
7
+ to version 3.
8
+
9
+ 3
10
+ 00:00:07,470 --> 00:00:09,700
11
+ So let's see what YOLO really is about.
12
+
13
+ 4
14
+ 00:00:10,710 --> 00:00:12,430
15
+ So you know you only live once.
16
+
17
+ 5
18
+ 00:00:12,480 --> 00:00:15,140
19
+ No it's actually you only look once.
20
+
21
+ 6
22
+ 00:00:15,510 --> 00:00:21,110
23
+ And the idea behind YOLO is that a single neural network is applied to a full image.
24
+
25
+ 7
26
+ 00:00:21,450 --> 00:00:25,830
27
+ And this allows us to reason globally about the image when generating predictions.
28
+
29
+ 8
30
+ 00:00:25,910 --> 00:00:31,210
31
+ So it's basically a neural net that actually looks at the entire image for us with CNN.
32
+
33
+ 9
34
+ 00:00:31,560 --> 00:00:38,220
35
+ So it is a direct development of MultiBox, but it turns MultiBox from a region proposal into
36
+
37
+ 10
38
+ 00:00:38,220 --> 00:00:44,400
39
+ an object recognition method by adding a softmax classifier in parallel with the bounding box
40
+
41
+ 11
42
+ 00:00:44,400 --> 00:00:45,070
43
+ regressor.
44
+
45
+ 12
46
+ 00:00:45,450 --> 00:00:50,580
47
+ So it divides the image into regions and then predicts bounding boxes and probabilities for each region.
48
+
49
+ 13
50
+ 00:00:51,430 --> 00:00:58,840
51
+ YOLO uses a fully convolutional neural network, allowing for inputs of various sizes.
52
+
53
+ 14
54
+ 00:00:58,890 --> 00:00:59,840
55
+ So that's pretty cool.
56
+
57
+ 15
58
+ 00:01:01,680 --> 00:01:04,820
59
+ So let me tell you how it works.
60
+
61
+ 16
62
+ 00:01:04,830 --> 00:01:08,760
63
+ So the input image is divided into an S by S grid.
64
+
65
+ 17
66
+ 00:01:09,100 --> 00:01:14,870
67
+ If the center of an object falls into a grid cell, that cell is responsible for detecting that object.
68
+
69
+ 18
70
+ 00:01:14,900 --> 00:01:20,580
71
+ Now each grid cell predicts a number of bounding boxes and confidence scores for those boxes.
72
+
73
+ 19
74
+ 00:01:20,580 --> 00:01:25,270
75
+ Confidence is defined as the probability of an object multiplied by the IoU
76
+
77
+ 20
78
+ 00:01:25,290 --> 00:01:31,300
79
+ score, and IoU scores of less than 0.5 are given zero confidence.
80
+
81
+ 21
82
+ 00:01:31,630 --> 00:01:36,080
83
+ So the bounding box is defined by these parameters.
84
+
85
+ 22
86
+ 00:01:36,120 --> 00:01:42,690
87
+ x, y, width and height, where x and y are the center of the box and w, h are the width and height. By multiplying
88
+
89
+ 23
90
+ 00:01:42,690 --> 00:01:47,450
91
+ the conditional class probability and individual box confidence predictions.
92
+
93
+ 24
94
+ 00:01:47,460 --> 00:01:50,740
95
+ we get the class-specific confidence score of each box.
96
+
97
+ 25
98
+ 00:01:50,970 --> 00:01:55,910
99
+ That's how YOLO effectively works, and the YOLO model is quite good.
100
+
101
+ 26
102
+ 00:01:55,920 --> 00:01:57,830
103
+ So this is the S by S grid here.
104
+
105
+ 27
106
+ 00:01:58,080 --> 00:02:02,950
107
+ These are the bounding boxes that we generate from each cell, plus the confidence.
108
+
109
+ 28
110
+ 00:02:03,000 --> 00:02:05,160
111
+ This is the class probability map.
112
+
113
+ 29
114
+ 00:02:05,160 --> 00:02:10,350
115
+ So the class probability map basically indicates if you go back to it here the probability of an object
116
+
117
+ 30
118
+ 00:02:10,350 --> 00:02:16,740
119
+ being multiplied by the IoU score, OK, and then these are the final detections here.
120
+
121
+ 31
122
+ 00:02:16,740 --> 00:02:20,220
123
+ So you can see this blue object here which would belong to that class.
124
+
125
+ 32
126
+ 00:02:20,220 --> 00:02:22,270
127
+ Probability of a dog.
128
+
129
+ 33
130
+ 00:02:22,380 --> 00:02:29,400
131
+ Then there was a bicycle and then there was a background which ended up being a car.
132
+
133
+ 34
134
+ 00:02:29,440 --> 00:02:36,160
135
+ So let's talk about the loss function adjustments. During training, YOLO uses differential weights for confidence
136
+
137
+ 35
138
+ 00:02:36,160 --> 00:02:40,790
139
+ predictions from boxes that contain objects and boxes that do not contain objects.
140
+
141
+ 36
142
+ 00:02:41,020 --> 00:02:46,990
143
+ It penalizes errors in small and large objects differently by predicting the square root of the box
144
+
145
+ 37
146
+ 00:02:47,010 --> 00:02:49,780
147
+ width and height.
148
+
149
+ 38
150
+ 00:02:49,780 --> 00:02:51,550
151
+ So this is the YOLO architecture.
152
+
153
+ 39
154
+ 00:02:51,800 --> 00:02:53,980
155
+ It looks a bit simple, doesn't it?
156
+
157
+ 40
158
+ 00:02:53,980 --> 00:02:57,480
159
+ However there's a lot more going on than meets the eye
160
+
161
+ 41
162
+ 00:02:57,590 --> 00:03:04,510
163
+ in this image. So let's talk about the evolution of YOLO. So in 2016,
164
+
165
+ 42
166
+ 00:03:04,580 --> 00:03:06,510
167
+ that was the same year Faster
168
+
169
+ 43
170
+ 00:03:06,770 --> 00:03:14,150
171
+ R-CNN was released, YOLO was voted the OpenCV People's Choice Award at CVPR, the Conference on Computer Vision and
172
+
173
+ 44
174
+ 00:03:14,210 --> 00:03:15,760
175
+ Pattern Recognition.
176
+
177
+ 45
178
+ 00:03:15,880 --> 00:03:22,670
179
+ YOLO version 2 was later released, when batch normalization was added to the CNN.
180
+
181
+ 46
182
+ 00:03:22,670 --> 00:03:30,360
183
+ So it does use batch normalization, as we see here, which resulted in mAP improvements.
184
+
185
+ 47
186
+ 00:03:30,400 --> 00:03:35,230
187
+ mAP, mean average precision, improved by 2 percent, which was quite significant.
188
+
189
+ 48
190
+ 00:03:35,240 --> 00:03:41,810
191
+ It was also fine-tuned a bit on higher resolution images, giving it a 4 percent increase in mAP, and
192
+
193
+ 49
194
+ 00:03:41,810 --> 00:03:48,260
195
+ then YOLO 3 was fine-tuned even further and introduced multi-scale training to better
196
+
197
+ 50
198
+ 00:03:48,290 --> 00:03:51,920
199
+ help detect small objects. So that is it for YOLO.
200
+
201
+ 51
202
+ 00:03:51,950 --> 00:03:56,590
203
+ And now we move on to the next section, which is the TensorFlow Object Detection API.
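The confidence arithmetic described above can be sketched as follows. This is a minimal illustration, not the lecture's code: the IoU helper, the thresholding detail and the example boxes are my assumptions.

```python
# Sketch of YOLO's confidence scores as described in the lecture:
#   box confidence = P(object) * IoU(pred, truth), zeroed below 0.5
#   class-specific confidence = P(class | object) * box confidence

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def box_confidence(p_object, pred, truth, threshold=0.5):
    score = p_object * iou(pred, truth)
    return score if score >= threshold else 0.0

def class_confidence(p_class_given_object, box_conf):
    return p_class_given_object * box_conf

conf = box_confidence(0.9, (0, 0, 10, 10), (0, 0, 10, 8))
print(conf, class_confidence(0.8, conf))  # 0.72 0.576
```

Multiplying the class probability map by the box confidences is what produces the final detections shown on the slide.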
21. TensforFlow Object Detection/Go to the folder speciefid in this file ADDED
@@ -0,0 +1,12 @@
1
+ INSTRUCTIONS
2
+
3
+
4
+ GO TO THIS FOLDER FROM YOUR IPYTHON NOTEBOOK
5
+ /home/deeplearningcv/models/models/research/object_detection
6
+
7
+ OPEN THIS FILE
8
+ /home/deeplearningcv/models/models/research/object_detection/object_detection_tutorial.ipynb
9
+
10
+ OR
11
+
12
+ COPY THE IPYTHON NOTEBOOK FILE IN THIS FOLDER TO THE DIRECTORY - /home/deeplearningcv/models/models/research/object_detection
21. TensforFlow Object Detection/object_detection_tutorial.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
21. TensorFlow Object Detection API/1. Chapter Introduction.srt ADDED
@@ -0,0 +1,27 @@
1
+ 1
2
+ 00:00:00,990 --> 00:00:07,630
3
+ Hi and welcome to Chapter 21, where we talk about the TensorFlow Object Detection API.
4
+
5
+ 2
6
+ 00:00:07,890 --> 00:00:11,050
7
+ So this section is split up into three parts.
8
+
9
+ 3
10
+ 00:00:11,070 --> 00:00:16,890
11
+ In the first part we deal with the API install and setup, then we start experimenting with the actual trained
12
+
13
+ 4
14
+ 00:00:16,950 --> 00:00:24,180
15
+ models, a ResNet SSD, and we use it on a webcam and videos and images, and
16
+
17
+ 5
18
+ 00:00:24,180 --> 00:00:29,980
19
+ then in 21.3 I go into detail on how you actually want to go about training a TensorFlow
20
+
21
+ 6
22
+ 00:00:30,030 --> 00:00:31,230
23
+ object detector.
24
+
25
+ 7
26
+ 00:00:31,500 --> 00:00:34,840
27
+ It's not easy, but it is doable if you have a GPU.
21. TensorFlow Object Detection API/2. TFOD API Install and Setup.srt ADDED
@@ -0,0 +1,255 @@
1
+ 1
2
+ 00:00:00,420 --> 00:00:00,970
3
+ OK.
4
+
5
+ 2
6
+ 00:00:01,020 --> 00:00:06,500
7
+ So in 21.1 we deal with the API install and setup.
8
+
9
+ 3
10
+ 00:00:06,630 --> 00:00:10,060
11
+ So let's talk a bit about TensorFlow's object detection.
12
+
13
+ 4
14
+ 00:00:10,150 --> 00:00:16,380
15
+ The TFOD API is one of the more mature and relatively easy to use object detection frameworks.
16
+
17
+ 5
18
+ 00:00:16,470 --> 00:00:19,600
19
+ Most of them are actually quite finicky and tricky to use.
20
+
21
+ 6
22
+ 00:00:19,650 --> 00:00:22,560
23
+ Typically most other object detection frameworks are finicky.
24
+
25
+ 7
26
+ 00:00:22,560 --> 00:00:28,140
27
+ As I just said, they're difficult to use and they break quite easily, as they have a lot of moving parts.
28
+
29
+ 8
30
+ 00:00:28,430 --> 00:00:33,540
31
+ TensorFlow's object detection attempts to solve that by creating a framework API that uses TensorFlow.
32
+
33
+ 9
34
+ 00:00:33,540 --> 00:00:34,760
35
+ No surprise there.
36
+
37
+ 10
38
+ 00:00:34,770 --> 00:00:40,240
39
+ To create object detection models using both the R-CNN family as well as the SSD family.
40
+
41
+ 11
42
+ 00:00:41,010 --> 00:00:47,550
43
+ So while the TFOD API makes it far easier than the alternatives, it still has a bit of a
44
+
45
+ 12
46
+ 00:00:47,550 --> 00:00:50,100
47
+ learning curve by the way.
48
+
49
+ 13
50
+ 00:00:50,100 --> 00:00:54,890
51
+ This is actually an output of the TensorFlow Object Detection API with an SSD.
52
+
53
+ 14
54
+ 00:00:54,990 --> 00:00:55,890
55
+ It's quite cool isn't it.
56
+
57
+ 15
58
+ 00:00:57,180 --> 00:00:59,350
59
+ So now let's talk about the install and setup.
60
+
61
+ 16
62
+ 00:00:59,370 --> 00:01:05,040
63
+ Now if you're using the virtual machine with this already installed, you don't have to go through this.
64
+
65
+ 17
66
+ 00:01:05,040 --> 00:01:08,600
67
+ However it's not that hard to do so I'm just gonna go through it step by step.
68
+
69
+ 18
70
+ 00:01:08,610 --> 00:01:13,020
71
+ I'm not going to do it with you because I already have it installed on my machine, and I think
72
+
73
+ 19
74
+ 00:01:13,020 --> 00:01:14,790
75
+ if I try to reinstall it I could mess things up.
76
+
77
+ 20
78
+ 00:01:15,360 --> 00:01:19,580
79
+ But this is how I did it; I documented everything and it worked perfectly.
80
+
81
+ 21
82
+ 00:01:19,590 --> 00:01:24,430
83
+ So what you do is basically activate your computer vision environment.
84
+
85
+ 22
86
+ 00:01:24,690 --> 00:01:29,670
87
+ We're not going to install into this environment, because there can be a lot of clashes with packages
88
+
89
+ 23
90
+ 00:01:29,700 --> 00:01:34,480
91
+ and libraries being, you know, messy with each other, not mixing well.
92
+
93
+ 24
94
+ 00:01:34,590 --> 00:01:36,470
95
+ So let's clone this environment.
96
+
97
+ 25
98
+ 00:01:36,490 --> 00:01:44,820
99
+ So we go into the terminal and run conda create, and let's call this tfod_api; we name the environment that,
100
+
101
+ 26
102
+ 00:01:45,070 --> 00:01:49,860
103
+ so in future when you want to activate it you just run source activate tfod_api and it's there.
104
+
105
+ 27
106
+ 00:01:50,560 --> 00:01:58,200
107
+ So anyway, copy this line into your terminal and clone your, sorry, CV environment, and then
108
+
109
+ 28
110
+ 00:01:58,200 --> 00:02:06,070
111
+ run this line here: sudo apt-get install protobuf-compiler python-pil python-lxml python-tk.
112
+
113
+ 29
114
+ 00:02:06,120 --> 00:02:07,400
115
+ Okay okay.
116
+
117
+ 30
118
+ 00:02:07,530 --> 00:02:15,770
119
+ Then we do pip install Cython, pip install contextlib2, jupyter, matplotlib, and then go back.
120
+
121
+ 31
122
+ 00:02:15,780 --> 00:02:21,930
123
+ So just go back to your home directory, make a folder called models, and then git clone
124
+
125
+ 32
126
+ 00:02:22,110 --> 00:02:26,430
127
+ this, basically, from here, from this GitHub
128
+
129
+ 33
130
+ 00:02:26,510 --> 00:02:35,130
131
+ link here, tensorflow/models, and go back again to the directory and git clone this as well. And now as we go
132
+
133
+ 34
134
+ 00:02:35,130 --> 00:02:43,350
135
+ back here, go into the directory cocoapi/PythonAPI and run make. This will compile and build some
136
+
137
+ 35
138
+ 00:02:43,350 --> 00:02:49,050
139
+ stuff that you need. And basically just copy this line here, and these are comments here, so don't actually
140
+
141
+ 36
142
+ 00:02:49,050 --> 00:02:50,850
143
+ run this in a terminal.
144
+
145
+ 37
146
+ 00:02:50,910 --> 00:02:57,240
147
+ So this is where we get the protobuf compiler from, so you use wget on the protobuf zip and you download this
148
+
149
+ 38
150
+ 00:02:57,240 --> 00:03:01,720
151
+ link here and unzip this file and then you can delete this file afterwards.
152
+
153
+ 39
154
+ 00:03:01,740 --> 00:03:02,450
155
+ It's fine.
156
+
157
+ 40
158
+ 00:03:02,760 --> 00:03:06,840
159
+ And then what you do is set the path that we need to get everything working.
160
+
161
+ 41
162
+ 00:03:06,840 --> 00:03:12,540
163
+ So we define this path here, and then we go to the object detection builders and we run the tests, and if the
164
+
165
+ 42
166
+ 00:03:12,540 --> 00:03:19,740
167
+ tests run successfully, by opening this file we will know if the install worked correctly.
168
+
169
+ 43
170
+ 00:03:19,740 --> 00:03:21,410
171
+ So go ahead and try it on your own.
172
+
173
+ 44
174
+ 00:03:21,420 --> 00:03:27,930
175
+ See if it works; if there are any problems, don't hesitate to contact me. A lot of times things change
176
+
177
+ 45
178
+ 00:03:27,960 --> 00:03:29,850
179
+ with updates.
180
+
181
+ 46
182
+ 00:03:29,850 --> 00:03:33,660
183
+ So maybe just check this link first if this doesn't work, before you contact me.
184
+
185
+ 47
186
+ 00:03:33,660 --> 00:03:34,870
187
+ See if there's anything here.
188
+
189
+ 48
190
+ 00:03:34,890 --> 00:03:37,620
191
+ Maybe a new version or a new dependency.
192
+
193
+ 49
194
+ 00:03:37,620 --> 00:03:40,330
195
+ You never know.
196
+
197
+ 50
198
+ 00:03:40,500 --> 00:03:45,190
199
+ Running the demo: so download the Python file in this folder.
200
+
201
+ 51
202
+ 00:03:45,200 --> 00:03:50,420
203
+ I actually do have this file in my IPython notebook files here.
204
+
205
+ 52
206
+ 00:03:50,820 --> 00:03:56,540
207
+ However, to make them work, don't run them from there; actually copy and paste them into this
208
+
209
+ 53
210
+ 00:03:56,540 --> 00:03:57,310
211
+ directory here.
212
+
213
+ 54
214
+ 00:03:57,660 --> 00:04:02,380
215
+ So let me go to our virtual machine and I'll show you exactly where to find this.
216
+
217
+ 55
218
+ 00:04:02,400 --> 00:04:06,560
219
+ Okay, so we're back in the virtual machine, and the directory I want you to go to is,
220
+
221
+ 56
222
+ 00:04:06,750 --> 00:04:11,940
223
+ remember I told you, based on our presentation here, I wanted you to go to models/models/research/object
224
+
225
+ 57
226
+ 00:04:11,940 --> 00:04:17,070
227
+ detection, and put this file here, the resources file that you downloaded.
228
+
229
+ 58
230
+ 00:04:17,160 --> 00:04:21,720
231
+ It's actually stored in your folder as well, in your notebooks folder.
232
+
233
+ 59
234
+ 00:04:22,230 --> 00:04:29,250
235
+ But anyhow, let's go to the directory and I'll show you where to put this file. So, models/models/research
236
+
237
+ 60
238
+ 00:04:29,400 --> 00:04:32,620
239
+ and object. Let's type it in.
240
+
241
+ 61
242
+ 00:04:34,540 --> 00:04:38,500
243
+ Object detection, and you'll see there is a notebook somewhere.
244
+
245
+ 62
246
+ 00:04:39,280 --> 00:04:42,110
247
+ This one here, that is the file we're going to run from now on.
248
+
249
+ 63
250
+ 00:04:42,160 --> 00:04:42,430
251
+ Okay.
252
+
253
+ 64
254
+ 00:04:43,060 --> 00:04:46,540
255
+ So in the next chapter we're going to run this file and go through the detection tutorial.
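The path setup step above is the one people most often get wrong, so here is a minimal sketch of doing the equivalent from inside a notebook. The research path is the one used on the course's virtual machine; the "slim" subdirectory comes from the TFOD install instructions and is an assumption here.

```python
# Sketch: making the TFOD API importable from a notebook by extending
# sys.path, roughly what the PYTHONPATH export in the lecture does.
import os
import sys

# Path from the course's virtual machine; adjust for your own layout.
RESEARCH = "/home/deeplearningcv/models/models/research"

for path in (RESEARCH, os.path.join(RESEARCH, "slim")):
    if path not in sys.path:
        sys.path.append(path)

print(RESEARCH in sys.path)  # True
```

After this, `import object_detection` should resolve when run from that machine, which is what the model_builder tests check.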
21. TensorFlow Object Detection API/2.1 Download the code (for those not using the Virtual Machine).html ADDED
@@ -0,0 +1 @@
1
+ <script type="text/javascript">window.location = "https://1drv.ms/u/s!AkTkTuTv8A66da5SgRE3zXCrtQA";</script>
21. TensorFlow Object Detection API/3. Experiment with a ResNet SSD on images, webcam and videos.srt ADDED
@@ -0,0 +1,471 @@
1
+ 1
2
+ 00:00:00,650 --> 00:00:00,960
3
+ OK.
4
+
5
+ 2
6
+ 00:00:01,020 --> 00:00:07,680
7
+ So welcome to 21.2 where we actually start playing with our SSD, sorry.
8
+
9
+ 3
10
+ 00:00:11,660 --> 00:00:18,240
11
+ Hi, welcome to Chapter 21.2 where we start experimenting with the object detection SSD based on
12
+
13
+ 4
14
+ 00:00:18,280 --> 00:00:23,080
15
+ ResNet, and we do this on images, webcams and videos.
16
+
17
+ 5
18
+ 00:00:23,100 --> 00:00:24,580
19
+ So let's get started.
20
+
21
+ 6
22
+ 00:00:25,170 --> 00:00:27,000
23
+ So now we're here in a virtual machine.
24
+
25
+ 7
26
+ 00:00:27,000 --> 00:00:32,000
27
+ And this project is going to be a bit different from what we have done, because remember we created a new
28
+
29
+ 8
30
+ 00:00:32,010 --> 00:00:38,120
31
+ environment, so we have to launch our IPython notebooks from the new environment. So first let's go
32
+
33
+ 9
34
+ 00:00:38,130 --> 00:00:49,140
35
+ source, caps lock is on, source activate tfod_api, note the underscore.
36
+
37
+ 10
38
+ 00:00:49,140 --> 00:00:49,680
39
+ There we go.
40
+
41
+ 11
42
+ 00:00:49,860 --> 00:00:51,950
43
+ So that's the environment that we're in right now.
44
+
45
+ 12
46
+ 00:00:52,230 --> 00:00:52,850
47
+ So that's good.
48
+
49
+ 13
50
+ 00:00:52,900 --> 00:00:54,010
51
+ I will have to.
52
+
53
+ 14
54
+ 00:00:54,010 --> 00:00:58,450
55
+ This will probably be whatever name you gave it previously, if you're not using my pre-installed
56
+
57
+ 15
58
+ 00:00:58,710 --> 00:01:00,020
59
+ virtual machine.
60
+
61
+ 16
62
+ 00:01:00,180 --> 00:01:07,080
63
+ So I type ipython notebook and it brings up the Jupyter notebook browser.
64
+
65
+ 17
66
+ 00:01:07,560 --> 00:01:12,950
67
+ So now, where I want us to go to: really there are two ways to get set up.
68
+
69
+ 18
70
+ 00:01:13,040 --> 00:01:18,540
71
+ You're supposed to download an IPython notebook file in the resources; that file was basically
72
+
73
+ 19
74
+ 00:01:19,420 --> 00:01:20,090
75
+ this here.
76
+
77
+ 20
78
+ 00:01:20,240 --> 00:01:20,940
79
+ All right.
80
+
81
+ 21
82
+ 00:01:20,940 --> 00:01:26,130
83
+ However I left it in the directory here in case you wanted to manually copy and paste it into the directory.
84
+
85
+ 22
86
+ 00:01:26,130 --> 00:01:28,640
87
+ So let's actually go ahead and do that.
88
+
89
+ 23
90
+ 00:01:28,650 --> 00:01:38,940
91
+ So copy this file, Control-C, and the directory I want you to go to is, let's see here, models/models,
92
+
93
+ 24
94
+ 00:01:39,450 --> 00:01:45,240
95
+ research, sorry, and object detection, and paste that file into here.
96
+
97
+ 25
98
+ 00:01:45,430 --> 00:01:46,160
99
+ OK.
100
+
101
+ 26
102
+ 00:01:46,470 --> 00:01:48,070
103
+ This one here is actually YOLO.
104
+
105
+ 27
106
+ 00:01:48,180 --> 00:01:51,780
107
+ It's not actually going to work to paste it into here.
108
+
109
+ 28
110
+ 00:01:52,260 --> 00:01:59,220
111
+ So now let's go to the IPython notebook browser and find this directory.
112
+
113
+ 29
114
+ 00:01:59,520 --> 00:02:00,740
115
+ So find the file.
116
+
117
+ 30
118
+ 00:02:00,740 --> 00:02:01,670
119
+ I should say so.
120
+
121
+ 31
122
+ 00:02:01,680 --> 00:02:04,270
123
+ Go to models, research.
124
+
125
+ 32
126
+ 00:02:04,560 --> 00:02:07,630
127
+ Scroll down to object detection
128
+
129
+ 33
130
+ 00:02:09,980 --> 00:02:16,610
131
+ right here and let's launch this file.
132
+
133
+ 34
134
+ 00:02:16,610 --> 00:02:23,220
135
+ OK, so here we go. This file here is actually not a file I created; I just modified it slightly.
136
+
137
+ 35
138
+ 00:02:23,260 --> 00:02:28,900
139
+ This is a file that comes with TensorFlow's Object Detection API, and it allows you to basically play with the
140
+
141
+ 36
142
+ 00:02:28,900 --> 00:02:31,850
143
+ different features in it; it's the official demo.
144
+
145
+ 37
146
+ 00:02:32,180 --> 00:02:33,770
147
+ So let's run the first box here.
148
+
149
+ 38
150
+ 00:02:33,780 --> 00:02:40,990
151
+ So, imports. This matplotlib inline, by the way, in case I haven't mentioned it to you, what this does
152
+
153
+ 39
154
+ 00:02:40,990 --> 00:02:47,200
155
+ is that it generates matplotlib plots inside the notebook, as opposed to having it be like an
156
+
157
+ 40
158
+ 00:02:47,200 --> 00:02:49,080
159
+ OpenCV-style new window.
160
+
161
+ 41
162
+ 00:02:49,540 --> 00:02:52,650
163
+ So anyway, let's do these imports here as well.
164
+
165
+ 42
166
+ 00:02:53,230 --> 00:02:56,340
167
+ And let's run this block here.
168
+
169
+ 43
170
+ 00:02:56,470 --> 00:02:59,830
171
+ These are directories that basically point to different models.
172
+
173
+ 44
174
+ 00:02:59,830 --> 00:03:01,230
175
+ This is the SSD.
176
+
177
+ 45
178
+ 00:03:01,240 --> 00:03:04,220
179
+ This is a ResNet SSD; we'll be using that instead. And COCO,
180
+
181
+ 46
182
+ 00:03:04,420 --> 00:03:09,840
183
+ that's the Common Objects dataset, and it's going to download it the first time if you didn't already have
184
+
185
+ 47
186
+ 00:03:09,840 --> 00:03:12,490
187
+ it saved; it's going to load it here.
188
+
189
+ 48
190
+ 00:03:12,490 --> 00:03:16,680
191
+ Actually I do have it saved so it should not download.
192
+
193
+ 49
194
+ 00:03:16,690 --> 00:03:21,960
195
+ I hope it is doing something so maybe it is downloading.
196
+
197
+ 50
198
+ 00:03:22,010 --> 00:03:25,750
199
+ So anyway, let's run this block and wait for that to finish.
200
+
201
+ 51
202
+ 00:03:27,520 --> 00:03:37,780
203
+ And we load the label map, some helper code, and then we do our detection boxes here, and don't mind these
204
+
205
+ 52
206
+ 00:03:38,590 --> 00:03:41,260
207
+ red things that look like it's going to error.
208
+
209
+ 53
210
+ 00:03:41,450 --> 00:03:42,750
211
+ This will still run.
212
+
213
+ 54
214
+ 00:03:42,790 --> 00:03:44,360
215
+ So it's fine.
216
+
217
+ 55
218
+ 00:03:44,430 --> 00:03:44,920
219
+ So right.
220
+
221
+ 56
222
+ 00:03:44,950 --> 00:03:47,010
223
+ These boxes have run now.
224
+
225
+ 57
226
+ 00:03:47,320 --> 00:03:52,450
227
+ So let's do that; we run this one, and we rerun some of these again.
228
+
229
+ 58
230
+ 00:03:53,220 --> 00:03:54,370
231
+ Let's run this one.
232
+
233
+ 59
234
+ 00:03:54,550 --> 00:03:57,760
235
+ And that actually was not what we wanted yet.
236
+
237
+ 60
238
+ 00:03:57,760 --> 00:03:58,720
239
+ This is what we want.
240
+
241
+ 61
242
+ 00:03:58,730 --> 00:04:04,020
243
+ So it goes through the images in the test path, and it still takes a while to run.
244
+
245
+ 62
246
+ 00:04:04,480 --> 00:04:05,110
247
+ To be fair.
248
+
249
+ 63
250
+ 00:04:05,500 --> 00:04:11,980
251
+ And what it's going to do is basically take the image it found and basically run
252
+
253
+ 64
254
+ 00:04:11,980 --> 00:04:17,340
255
+ all of these SSD functions that are required to classify and detect objects here.
256
+
257
+ 65
258
+ 00:04:17,590 --> 00:04:18,590
259
+ So here we go.
260
+
261
+ 66
262
+ 00:04:18,730 --> 00:04:26,440
263
+ So for the first test image it picked up, this is a dog; you can see it clearly here.
264
+
265
+ 67
266
+ 00:04:26,690 --> 00:04:30,740
267
+ I'm pressing control and moving my mouse button and we can see it's a dog here.
268
+
269
+ 68
270
+ 00:04:30,770 --> 00:04:32,080
271
+ And you can see the probabilities.
272
+
273
+ 69
274
+ 00:04:32,260 --> 00:04:33,310
275
+ It's a bit hard to make out.
276
+
277
+ 70
278
+ 00:04:33,310 --> 00:04:37,720
279
+ Maybe we can actually change some parameters here to make this a little more legible.
280
+
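One of the "parameters" he is alluding to is a minimum score threshold: detections below it are dropped before drawing. A minimal sketch, assuming detections come back as parallel lists of boxes, class names and scores (these names and shapes are illustrative, not the exact TensorFlow API):

```python
# Minimal sketch of score-threshold filtering; the parallel-list shape
# (boxes, classes, scores) is an assumption, not the exact TF API.
def filter_detections(boxes, classes, scores, min_score=0.5):
    return [
        (box, cls, score)
        for box, cls, score in zip(boxes, classes, scores)
        if score >= min_score
    ]

kept = filter_detections(
    boxes=[(0.1, 0.1, 0.4, 0.4), (0.5, 0.5, 0.9, 0.9)],
    classes=["dog", "person"],
    scores=[0.92, 0.31],
)
print(kept)  # only the 0.92 "dog" detection survives the default 0.5 cut
```

Raising `min_score` makes the drawn boxes less cluttered and the labels easier to read.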
281
+ 71
282
+ 00:04:37,870 --> 00:04:38,560
283
+ Insightful.
284
+
285
+ 72
286
+ 00:04:38,550 --> 00:04:39,790
287
+ I applied in the book.
288
+
289
+ 73
290
+ 00:04:40,080 --> 00:04:41,500
291
+ It's a dog here and a dog here.
292
+
293
+ 74
294
+ 00:04:41,530 --> 00:04:43,720
295
+ So it's quite good.
296
+
297
+ 75
298
+ 00:04:43,720 --> 00:04:45,790
299
+ This is the image I use in my presentation slide.
300
+
301
+ 76
302
+ 00:04:45,970 --> 00:04:51,940
303
+ You can see this is a kite, kite, kite, person, person, and these are the probabilities, which I can read;
304
+
305
+ 77
306
+ 00:04:52,180 --> 00:04:55,260
307
+ it looks like 63, and this just looks like a hundred.
308
+
309
+ 78
310
+ 00:04:55,270 --> 00:04:57,400
311
+ But it is what it is.
312
+
313
+ 79
314
+ 00:04:57,400 --> 00:05:03,010
315
+ So now let's try it on a webcam and I'm pretty much looking like a bit of a mess right now because it's
316
+
317
+ 80
318
+ 00:05:03,010 --> 00:05:03,630
319
+ quite late.
320
+
321
+ 81
322
+ 00:05:03,640 --> 00:05:07,710
323
+ And I have not combed my hair for a while, but I'm still going to try this.
324
+
325
+ 82
326
+ 00:05:07,720 --> 00:05:12,840
327
+ So let's run this; the webcam should come on any second now.
328
+
329
+ 83
330
+ 00:05:15,680 --> 00:05:15,940
331
+ All right.
332
+
333
+ 84
334
+ 00:05:15,980 --> 00:05:18,470
335
+ This is me in my natural element here.
336
+
337
+ 85
338
+ 00:05:18,680 --> 00:05:26,040
339
+ And you can see that I actually went and put a T-shirt on; nice free advertising for Apple.
340
+
341
+ 86
342
+ 00:05:26,060 --> 00:05:30,520
343
+ So this is me here and this is the person box that's detecting me right now.
344
+
345
+ 87
346
+ 00:05:30,560 --> 00:05:31,780
347
+ So this is actually pretty cool.
348
+
349
+ 88
350
+ 00:05:31,820 --> 00:05:37,640
351
+ So let me just close this, and now let's try it out on the video.
352
+
353
+ 89
354
+ 00:05:37,650 --> 00:05:40,880
355
+ So this is a dash cam video I downloaded off YouTube.
356
+
357
+ 90
358
+ 00:05:40,960 --> 00:05:43,130
359
+ And so let's run it.
360
+
361
+ 91
362
+ 00:05:49,000 --> 00:05:50,170
363
+ it sometimes takes a while to load.
364
+
365
+ 92
366
+ 00:05:50,170 --> 00:05:51,850
367
+ Oh there we go.
368
+
369
+ 93
370
+ 00:05:52,720 --> 00:05:54,630
371
+ So this is a tip.
372
+
373
+ 94
374
+ 00:05:54,850 --> 00:05:56,150
375
+ So this is pretty cool.
376
+
377
+ 95
378
+ 00:05:56,230 --> 00:05:58,190
379
+ If I do say so myself.
380
+
381
+ 96
382
+ 00:05:58,750 --> 00:06:05,170
383
+ So we're running it here, detecting persons and cars; I think we just saw a bike or a mistaken car again
384
+
385
+ 97
386
+ 00:06:06,610 --> 00:06:08,090
387
+ and the frame rate isn't that bad.
388
+
389
+ 98
390
+ 00:06:08,170 --> 00:06:12,010
391
+ Honestly, with it being on a CPU, not a GPU.
392
+
393
+ 99
394
+ 00:06:12,280 --> 00:06:14,210
395
+ This is actually pretty sick.
396
+
397
+ 100
398
+ 00:06:18,900 --> 00:06:20,460
399
+ So let's close this video now.
400
+
401
+ 101
402
+ 00:06:21,560 --> 00:06:27,150
403
+ So you've just run and experimented with SSD, the Single Shot Detector.
404
+
405
+ 102
406
+ 00:06:27,550 --> 00:06:29,090
407
+ So I hope you found this chapter fun.
408
+
409
+ 103
410
+ 00:06:29,090 --> 00:06:32,660
411
+ I found it quite fun to play with this as well.
412
+
413
+ 104
414
+ 00:06:32,660 --> 00:06:35,680
415
+ What you can do is put any video you want here.
416
+
417
+ 105
418
+ 00:06:36,150 --> 00:06:39,840
419
+ Another dashcam video as well and any images you want.
420
+
421
+ 106
422
+ 00:06:39,860 --> 00:06:45,590
423
+ We'll go here to see what folder they look at, the image path.
424
+
425
+ 107
426
+ 00:06:45,600 --> 00:06:52,170
427
+ Basically find fine image that is obviously not defined there.
428
+
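The image path pattern being inspected here can be sketched as in the TF Object Detection demo notebook, which builds a list of numbered test image paths; the `test_images` folder and `image{i}.jpg` naming are assumptions about that layout:

```python
# Sketch of building the list of test image paths; the "test_images"
# folder name and "image{i}.jpg" filename pattern are assumptions.
import os

PATH_TO_TEST_IMAGES_DIR = "test_images"
TEST_IMAGE_PATHS = [
    os.path.join(PATH_TO_TEST_IMAGES_DIR, "image{}.jpg".format(i))
    for i in range(1, 3)
]
print(TEST_IMAGE_PATHS)
```

Dropping your own images into that folder and extending the range is all it takes to run the detector on them.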
429
+ 108
430
+ 00:06:53,410 --> 00:06:54,450
431
+ Keep going.
432
+
433
+ 109
434
+ 00:06:56,320 --> 00:06:57,510
435
+ Test images.
436
+
437
+ 110
438
+ 00:06:57,720 --> 00:07:03,960
439
+ It is as if it's about OK syntactical test images.
440
+
441
+ 111
442
+ 00:07:05,850 --> 00:07:06,590
443
+ This one here.
444
+
445
+ 112
446
+ 00:07:07,020 --> 00:07:08,700
447
+ So this is the directory we're looking at.
448
+
449
+ 113
450
+ 00:07:08,710 --> 00:07:10,650
451
+ We had what else images in it.
452
+
453
+ 114
454
+ 00:07:10,700 --> 00:07:12,010
455
+ I'm not sure what's in this file.
456
+
457
+ 115
458
+ 00:07:12,040 --> 00:07:16,950
459
+ I guess it's a source file; you might want to put your source images there sometimes.
460
+
461
+ 116
462
+ 00:07:17,020 --> 00:07:18,300
463
+ So this is pretty cool.
464
+
465
+ 117
466
+ 00:07:18,530 --> 00:07:23,420
467
+ So you can experiment with a webcam, with your test images, and with your videos.
468
+
469
+ 118
470
+ 00:07:23,520 --> 00:07:23,730
471
+ OK.
21. TensorFlow Object Detection API/4. How to Train a TFOD Model.srt ADDED
@@ -0,0 +1,503 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 1
2
+ 00:00:00,700 --> 00:00:01,040
3
+ OK.
4
+
5
+ 2
6
+ 00:00:01,050 --> 00:00:03,310
7
+ So welcome to twenty one point three.
8
+
9
+ 3
10
+ 00:00:03,510 --> 00:00:09,060
11
+ We'll actually talk about and tell you how to go about creating a custom TensorFlow Object Detection
12
+
13
+ 4
14
+ 00:00:09,260 --> 00:00:10,220
15
+ model.
16
+
17
+ 5
18
+ 00:00:10,890 --> 00:00:17,490
19
+ So the training process of TensorFlow Object Detection is honestly a bit
20
+
21
+ 6
22
+ 00:00:17,490 --> 00:00:23,970
23
+ messy, and this is probably one of the best, well, most mature object detection libraries on the
24
+
25
+ 7
26
+ 00:00:23,970 --> 00:00:25,600
27
+ market today.
28
+
29
+ 8
30
+ 00:00:25,640 --> 00:00:33,540
31
+ In the open source market, you could say. And while it's explained fairly well, it actually is a bit tricky to use.
32
+
33
+ 9
34
+ 00:00:34,020 --> 00:00:36,530
35
+ So these are the steps I broke down from looking at it.
36
+
37
+ 10
38
+ 00:00:36,570 --> 00:00:40,310
39
+ So first we prepare a dataset in the TFRecord format.
40
+
41
+ 11
42
+ 00:00:40,380 --> 00:00:41,440
43
+ That's a specific record.
44
+
45
+ 12
46
+ 00:00:41,460 --> 00:00:43,500
47
+ I'll show you in the next few slides.
48
+
49
+ 13
50
+ 00:00:43,950 --> 00:00:48,570
51
+ Then we need to create a class label file; that's a .pbtxt file.
52
+
53
+ 14
54
+ 00:00:48,570 --> 00:00:53,760
55
+ Then we need to download a pre-trained COCO model, and we need to set up the correct file and directory
56
+
57
+ 15
58
+ 00:00:53,760 --> 00:00:58,490
59
+ structure, configure the object detection pipeline, and then start training.
60
+
61
+ 16
62
+ 00:00:58,500 --> 00:01:00,690
63
+ So let's go step by step.
64
+
65
+ 17
66
+ 00:01:00,720 --> 00:01:03,220
67
+ So what is the TFRecord format?
68
+
69
+ 18
70
+ 00:01:03,270 --> 00:01:07,650
71
+ So TensorFlow Object Detection expects both the training and test data to be in this
72
+
73
+ 19
74
+ 00:01:07,650 --> 00:01:09,220
75
+ TFRecord format.
76
+
77
+ 20
78
+ 00:01:09,300 --> 00:01:11,460
79
+ This is pretty much what it looks like here.
80
+
81
+ 21
82
+ 00:01:11,910 --> 00:01:17,220
83
+ Luckily though, we can convert existing datasets, like the Pascal VOC dataset, which
84
+
85
+ 22
86
+ 00:01:17,220 --> 00:01:22,260
87
+ is stored in XML, directly to a TFRecord file by using the script
88
+
89
+ 23
90
+ 00:01:22,290 --> 00:01:24,300
91
+ they provide for this here.
92
+
93
+ 24
94
+ 00:01:24,690 --> 00:01:26,790
95
+ So it's quite simple.
96
+
97
+ 25
98
+ 00:01:27,510 --> 00:01:30,030
99
+ And now we talk about the class label files.
100
+
101
+ 26
102
+ 00:01:30,030 --> 00:01:32,910
103
+ So basically this is what the class label file looks like.
104
+
105
+ 27
106
+ 00:01:33,850 --> 00:01:41,350
107
+ It's basically just a dictionary structure where we have the IDs and label names.
108
+
109
+ 28
110
+ 00:01:41,350 --> 00:01:44,660
111
+ So that's pretty much what we need to define here.
112
+
113
+ 29
114
+ 00:01:44,830 --> 00:01:50,560
115
+ If we're training a detector to detect let's say London underground tube signs you'll just put the name
116
+
117
+ 30
118
+ 00:01:50,560 --> 00:01:56,740
119
+ of the class here and its ID, and you keep adding more and more items or objects here.
120
+
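A label map of that shape can be generated in a few lines; the single class below ("underground_sign") is only an illustration of the London Underground example, not taken from the course files:

```python
# Sketch: emit a TFOD-style label map (.pbtxt).  The class list is
# illustrative; note that TFOD label IDs start at 1, not 0.
def make_label_map(names):
    items = []
    for idx, name in enumerate(names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}" % (idx, name))
    return "\n".join(items)

pbtxt = make_label_map(["underground_sign"])
print(pbtxt)
```

Writing the result to `label_map.pbtxt` gives you the class label file the pipeline expects.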
121
+ 31
122
+ 00:02:00,660 --> 00:02:03,510
123
+ So now we have to use a pre-trained model.
124
+
125
+ 32
126
+ 00:02:03,840 --> 00:02:04,250
127
+ OK.
128
+
129
+ 33
130
+ 00:02:04,380 --> 00:02:11,860
131
+ So we don't train this model ourselves, and this is like the ResNet model that we used in the previous chapter.
132
+
133
+ 34
134
+ 00:02:12,240 --> 00:02:13,890
135
+ So we download one here.
136
+
137
+ 35
138
+ 00:02:13,920 --> 00:02:20,490
139
+ So TensorFlow has several pre-trained models, and one is on COCO; and COCO is basically a large-scale object
140
+
141
+ 36
142
+ 00:02:20,490 --> 00:02:26,020
143
+ detection, segmentation and captioning dataset, and it basically has a lot of features here.
144
+
145
+ 37
146
+ 00:02:26,370 --> 00:02:31,320
147
+ You can go to the website, cocodataset.org, and you will find it.
148
+
149
+ 38
150
+ 00:02:31,900 --> 00:02:33,950
151
+ And so you download the models here.
152
+
153
+ 39
154
+ 00:02:34,230 --> 00:02:38,620
155
+ This link carries you to all the models that are available in TensorFlow.
156
+
157
+ 40
158
+ 00:02:39,030 --> 00:02:44,560
159
+ And basically you just download it and untar it here.
160
+
161
+ 41
162
+ 00:02:44,730 --> 00:02:46,470
163
+ So that's what we do.
164
+
165
+ 42
166
+ 00:02:47,100 --> 00:02:50,180
167
+ And this is a list of all the models available here.
168
+
169
+ 43
170
+ 00:02:50,280 --> 00:02:55,620
171
+ It gives the speed and the mAP score, which is quite useful; it helps you choose which model is most appropriate
172
+
173
+ 44
174
+ 00:02:55,620 --> 00:02:56,590
175
+ for your application.
176
+
177
+ 45
178
+ 00:02:58,530 --> 00:03:05,310
179
+ So now we get to configuring the object detection pipeline. The object detection pipeline configuration file is composed
180
+
181
+ 46
182
+ 00:03:05,310 --> 00:03:07,780
183
+ of five sections.
184
+
185
+ 47
186
+ 00:03:07,800 --> 00:03:12,610
187
+ Basically we have the model that we define here, its configuration here.
188
+
189
+ 48
190
+ 00:03:13,110 --> 00:03:15,310
191
+ Then we have the training config here.
192
+
193
+ 49
194
+ 00:03:15,480 --> 00:03:23,350
195
+ So we add this here, and the train input reader, which we'll talk about soon, the evaluation config and evaluation
196
+
197
+ 50
198
+ 00:03:23,370 --> 00:03:24,090
199
+ input reader.
200
+
201
+ 51
202
+ 00:03:24,090 --> 00:03:26,760
203
+ So let's take a look at this file in more detail.
204
+
205
+ 52
206
+ 00:03:26,790 --> 00:03:31,400
207
+ So this is a sample of the model file the model section here.
208
+
209
+ 53
210
+ 00:03:31,890 --> 00:03:34,900
211
+ What it looks like is this is the model config.
212
+
213
+ 54
214
+ 00:03:35,130 --> 00:03:35,720
215
+ I should say.
216
+
217
+ 55
218
+ 00:03:35,820 --> 00:03:36,340
219
+ OK.
220
+
221
+ 56
222
+ 00:03:36,690 --> 00:03:42,330
223
+ So we basically what we have to note is that we have some templates already that we can use inside of
224
+
225
+ 57
226
+ 00:03:42,470 --> 00:03:43,240
227
+ TensorFlow.
228
+
229
+ 58
230
+ 00:03:43,530 --> 00:03:49,530
231
+ So we just basically make sure that all the classes match up to the classes our custom model is being
232
+
233
+ 59
234
+ 00:03:49,530 --> 00:03:50,910
235
+ trained to detect.
236
+
237
+ 60
238
+ 00:03:51,420 --> 00:03:53,850
239
+ So this one was taken from the ResNet-101 one.
240
+
241
+ 61
242
+ 00:03:54,480 --> 00:03:56,940
243
+ And it was trained on the Pascal VOC dataset.
244
+
245
+ 62
246
+ 00:03:57,330 --> 00:04:01,950
247
+ So as I said, you don't need to rewrite this file; just edit the one belonging to the pre-trained model
248
+
249
+ 63
250
+ 00:04:02,070 --> 00:04:06,750
251
+ you'll be using. The model section defines all the necessary
252
+
253
+ 64
254
+ 00:04:06,850 --> 00:04:15,300
255
+ Faster R-CNN and SSD parameters, and when using pre-trained models it's best we leave this
256
+
257
+ 65
258
+ 00:04:15,360 --> 00:04:19,170
259
+ configuration file unchanged, except just a bit:
260
+
261
+ 66
262
+ 00:04:19,170 --> 00:04:26,320
263
+ this red box here with the classes. And notice the train input reader, the eval config and the eval
264
+
265
+ 67
266
+ 00:04:26,360 --> 00:04:27,900
267
+ input reader sections.
268
+
269
+ 68
270
+ 00:04:27,900 --> 00:04:31,990
271
+ So this is what we have to change: these in red.
272
+
273
+ 69
274
+ 00:04:32,070 --> 00:04:33,840
275
+ Basically directory mappings.
276
+
277
+ 70
278
+ 00:04:33,840 --> 00:04:38,010
279
+ So we have to make sure that they're actually correct and pointing to the correct files that you want
280
+
281
+ 71
282
+ 00:04:38,010 --> 00:04:38,780
283
+ to use.
284
+
285
+ 72
286
+ 00:04:38,820 --> 00:04:40,740
287
+ This will be your label file.
288
+
289
+ 73
290
+ 00:04:40,800 --> 00:04:44,440
291
+ This will be your record file, your training TFRecord file.
292
+
293
+ 74
294
+ 00:04:44,670 --> 00:04:46,270
295
+ This is what these paths point to here.
296
+
297
+ 75
298
+ 00:04:46,500 --> 00:04:50,930
299
+ So a TFRecord for training and one for validation.
300
+
301
+ 76
302
+ 00:04:51,510 --> 00:04:53,990
303
+ And these are the label paths for both as well.
304
+
305
+ 77
306
+ 00:04:54,110 --> 00:04:54,680
307
+ OK.
308
+
309
+ 78
310
+ 00:04:54,910 --> 00:04:59,450
311
+ So remember the labels are the same, the IDs will be the same; it's the same file.
312
+
313
+ 79
314
+ 00:04:59,890 --> 00:05:00,350
315
+ OK.
316
+
317
+ 80
318
+ 00:05:04,360 --> 00:05:06,870
319
+ So now here's the directory structure of the project.
320
+
321
+ 81
322
+ 00:05:06,910 --> 00:05:09,560
323
+ So this is how and where we put our files.
324
+
325
+ 82
326
+ 00:05:09,850 --> 00:05:14,860
327
+ So the .pbtxt label file goes inside a directory called data.
328
+
329
+ 83
330
+ 00:05:15,250 --> 00:05:22,270
331
+ Then we have the record files, which are the train and evaluation TFRecord files (or validation, whatever you
332
+
333
+ 84
334
+ 00:05:22,270 --> 00:05:24,230
335
+ want to call it; same thing).
336
+
337
+ 85
338
+ 00:05:24,640 --> 00:05:29,820
339
+ And then we have a new directory here models and then we have subdirectory model here.
340
+
341
+ 86
342
+ 00:05:30,160 --> 00:05:32,360
343
+ This is where we put our pipeline config file.
344
+
345
+ 87
346
+ 00:05:32,500 --> 00:05:33,990
347
+ That's this file here.
348
+
349
+ 88
350
+ 00:05:34,390 --> 00:05:40,330
351
+ And then we just have the train directory and the evaluation directory here as well, under the model
352
+
353
+ 89
354
+ 00:05:40,660 --> 00:05:46,340
355
+ folder, which is inside of the models folder here.
356
+
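The layout just described can be reproduced with a few commands; the file names follow the slide, so treat them as a convention rather than hard requirements:

```shell
# Recreate the project layout described above (names follow the slide).
mkdir -p data models/model/train models/model/eval
touch data/label_map.pbtxt data/train.record data/eval.record
touch models/model/pipeline.config
find data models -type f | sort
```

The `train` and `eval` directories start empty; the training and evaluation jobs write their checkpoints and event files there.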
357
+ 90
358
+ 00:05:46,350 --> 00:05:48,820
359
+ So now this is how we start the training process.
360
+
361
+ 91
362
+ 00:05:48,890 --> 00:05:56,810
363
+ So we go to the terminal, basically, and we just copy this line of code here, and make sure to change the lines
364
+
365
+ 92
366
+ 00:05:56,810 --> 00:05:57,970
367
+ in red here.
368
+
369
+ 93
370
+ 00:05:58,280 --> 00:06:03,090
371
+ These correspond to the model you are using and the directory that we just created here.
372
+
373
+ 94
374
+ 00:06:03,260 --> 00:06:03,710
375
+ OK.
376
+
377
+ 95
378
+ 00:06:03,890 --> 00:06:10,010
379
+ These are the directories it actually needs to be pointing to, so make sure it does that.
380
+
381
+ 96
382
+ 00:06:10,010 --> 00:06:16,200
383
+ And then you can actually bring up TensorBoard to monitor your training progress, which is pretty cool.
384
+
385
+ 97
386
+ 00:06:16,700 --> 00:06:18,770
387
+ It's going to be this train directory here.
388
+
389
+ 98
390
+ 00:06:19,030 --> 00:06:22,230
391
+ That again is your data directory, as specified here.
392
+
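In the TF1-era API the line he copies is an invocation of `object_detection/train.py`; here is a sketch of assembling it for the directory layout above (the paths are assumptions, and the command is run from the `models/research` directory of the TensorFlow models repo). TensorBoard would then be pointed at `models/model` with `tensorboard --logdir=models/model`.

```python
# Sketch: assemble the TF1-era TFOD training command for the layout
# described above.  Paths are assumptions about where you put the files.
import shlex

pipeline_config = "models/model/pipeline.config"
train_dir = "models/model/train"

cmd = [
    "python", "object_detection/train.py",
    "--logtostderr",
    "--pipeline_config_path={}".format(pipeline_config),
    "--train_dir={}".format(train_dir),
]
print(" ".join(shlex.quote(c) for c in cmd))
```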
393
+ 99
394
+ 00:06:25,340 --> 00:06:30,590
395
+ And then, before that, I should have mentioned this earlier, but it's important that when you're labeling
396
+
397
+ 100
398
+ 00:06:30,590 --> 00:06:35,450
399
+ images, you use software that actually produces them in the correct format.
400
+
401
+ 101
402
+ 00:06:35,450 --> 00:06:38,120
403
+ So this is how we use annotations here.
404
+
405
+ 102
406
+ 00:06:38,150 --> 00:06:43,710
407
+ So this is my wife with my dog Samuel, and the software we use is called LabelImg.
408
+
409
+ 103
410
+ 00:06:43,800 --> 00:06:50,190
411
+ I think this is how they have it here: it's really "label IMG", label image obviously.
412
+
413
+ 104
414
+ 00:06:50,540 --> 00:06:52,200
415
+ So download it if you want to do it.
416
+
417
+ 105
418
+ 00:06:52,220 --> 00:06:55,910
419
+ It's available for Windows Mac and Linux.
420
+
421
+ 106
422
+ 00:06:55,910 --> 00:07:01,690
423
+ And this is the format, the Pascal VOC XML format, that we use now.
424
+
425
+ 107
426
+ 00:07:01,820 --> 00:07:09,380
427
+ It is not what we use as such, but we generate the image annotations in this format using the software.
428
+
429
+ 108
430
+ 00:07:09,410 --> 00:07:11,160
431
+ It actually does it automatically for you.
432
+
433
+ 109
434
+ 00:07:11,630 --> 00:07:17,420
435
+ And we can use the TensorFlow script I mentioned earlier to convert this file directly to the
436
+
437
+ 110
438
+ 00:07:17,420 --> 00:07:20,170
439
+ TFRecord files.
440
+
441
+ 111
442
+ 00:07:20,270 --> 00:07:21,910
443
+ So this is a summary here.
444
+
445
+ 112
446
+ 00:07:22,010 --> 00:07:24,890
447
+ We didn't do a full project here for the following reasons.
448
+
449
+ 113
450
+ 00:07:25,160 --> 00:07:30,390
451
+ Training an SSD or even a Faster R-CNN on a CPU is very impractical.
452
+
453
+ 114
454
+ 00:07:30,440 --> 00:07:31,630
455
+ It is going to take forever.
456
+
457
+ 115
458
+ 00:07:31,700 --> 00:07:38,840
459
+ So you definitely need a GPU, or a cloud GPU, to effectively train this. Also, the datasets are
460
+
461
+ 116
462
+ 00:07:38,870 --> 00:07:46,190
463
+ huge; they take quite a few gigs of storage. And setting up a GPU on a local system is a nightmare
464
+
465
+ 117
466
+ 00:07:46,190 --> 00:07:46,790
467
+ sometimes.
468
+
469
+ 118
470
+ 00:07:46,880 --> 00:07:53,660
471
+ It's a very scary task but once you get it working it's good you feel very happy because it's so much
472
+
473
+ 119
474
+ 00:07:53,660 --> 00:07:54,950
475
+ faster.
476
+
477
+ 120
478
+ 00:07:55,620 --> 00:07:58,150
479
+ So I've outlined all the general steps here
480
+
481
+ 121
482
+ 00:07:58,220 --> 00:08:05,040
483
+ to train this model. There are also some good tutorials I found online that do this as well.
484
+
485
+ 122
486
+ 00:08:05,090 --> 00:08:10,430
487
+ They basically try to make it as simple as possible, going through all the steps, telling you what to
488
+
489
+ 123
490
+ 00:08:10,430 --> 00:08:13,210
491
+ pay attention to and what's important.
492
+
493
+ 124
494
+ 00:08:13,220 --> 00:08:16,880
495
+ I've actually tried this on my system as well so I know what works.
496
+
497
+ 125
498
+ 00:08:16,910 --> 00:08:23,150
499
+ So I wish you all the best of luck when making your own object detectors.
500
+
501
+ 126
502
+ 00:08:23,300 --> 00:08:23,610
503
+ Thank you.
22. Object Detection with YOLO & Darkflow Build a London Underground Sign Detector/1. Chapter Introduction.srt ADDED
@@ -0,0 +1,23 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 1
2
+ 00:00:00,280 --> 00:00:06,920
3
+ Hi and welcome to Chapter 22, where we look at object detection using YOLO version 3 and DarkFlow.
4
+
5
+ 2
6
+ 00:00:06,970 --> 00:00:09,680
7
+ So this section is split up into three parts.
8
+
9
+ 3
10
+ 00:00:09,690 --> 00:00:16,440
11
+ Firstly, we get you up and running by installing YOLO, DarkNet and DarkFlow; then we start experimenting
12
+
13
+ 4
14
+ 00:00:16,440 --> 00:00:24,750
15
+ with YOLO on still images, webcam feeds and videos; and then we build our own YOLO object detector, and
16
+
17
+ 5
18
+ 00:00:24,750 --> 00:00:28,350
19
+ we're going to detect London Underground signs in our project.
20
+
21
+ 6
22
+ 00:00:28,740 --> 00:00:30,200
23
+ So let's get started.
22. Object Detection with YOLO & Darkflow Build a London Underground Sign Detector/2. Setting up and install Yolo DarkNet and DarkFlow.srt ADDED
@@ -0,0 +1,363 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 1
2
+ 00:00:00,380 --> 00:00:06,570
3
+ Hi, welcome to Chapter 22.1, where we get YOLO, DarkNet and DarkFlow set up and
4
+
5
+ 2
6
+ 00:00:06,570 --> 00:00:07,770
7
+ installed.
8
+
9
+ 3
10
+ 00:00:08,340 --> 00:00:10,010
11
+ So just to remind you.
12
+
13
+ 4
14
+ 00:00:10,010 --> 00:00:14,100
15
+ YOLO stands for You Only Look Once, and it's a pretty awesome object detector.
16
+
17
+ 5
18
+ 00:00:14,370 --> 00:00:20,190
19
+ You can take a look at the Web site here it looks a bit shady but it's very reputable and very very
20
+
21
+ 6
22
+ 00:00:20,190 --> 00:00:20,520
23
+ good.
24
+
25
+ 7
26
+ 00:00:20,610 --> 00:00:21,800
27
+ It's very good technology.
28
+
29
+ 8
30
+ 00:00:21,810 --> 00:00:22,860
31
+ that those guys developed.
32
+
33
+ 9
34
+ 00:00:23,100 --> 00:00:25,790
35
+ So basically we have to install DarkNet first.
36
+
37
+ 10
38
+ 00:00:25,800 --> 00:00:27,170
39
+ And what is DarkNet?
40
+
41
+ 11
42
+ 00:00:27,300 --> 00:00:29,530
43
+ It's the official name for the YOLO framework.
44
+
45
+ 12
46
+ 00:00:29,550 --> 00:00:32,360
47
+ I mean, these guys are kind of awesome; you should read some of these papers.
48
+
49
+ 13
50
+ 00:00:32,380 --> 00:00:36,120
51
+ They're very entertaining and very informative as well.
52
+
53
+ 14
54
+ 00:00:36,120 --> 00:00:41,760
55
+ So to get this installed, we go back to our terminal here.
56
+
57
+ 15
58
+ 00:00:42,000 --> 00:00:44,080
59
+ These are all commands we enter into the terminal.
60
+
61
+ 16
62
+ 00:00:44,430 --> 00:00:48,940
63
+ We stay in the home directory and we make a folder called darknet.
64
+
65
+ 17
66
+ 00:00:49,110 --> 00:00:51,270
67
+ We git clone the repository here.
68
+
69
+ 18
70
+ 00:00:51,660 --> 00:00:58,170
71
+ And then we go into the darknet folder and we run make, and then we use this file to get the
72
+
73
+ 19
74
+ 00:00:58,170 --> 00:00:59,130
75
+ weights here.
76
+
77
+ 20
78
+ 00:00:59,400 --> 00:01:05,190
79
+ And then we just run this line in blue to execute it on the test image that they've provided in one of
80
+
81
+ 21
82
+ 00:01:05,230 --> 00:01:10,330
83
+ their sample directories, and in the darknet directory you'll see a file called predictions.
84
+
85
+ 22
86
+ 00:01:10,500 --> 00:01:13,890
87
+ And this will be the output from the test file.
88
+
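The steps just described follow the commands on the pjreddie.com YOLO page; shown here as a dry run (the `echo`s print the commands instead of running the clone and build, which need network access), and the `yolov3.weights` filename assumes the YOLO v3 release used in this chapter:

```shell
# Dry-run sketch of the DarkNet install-and-test sequence described above.
# Remove the leading 'echo' on each line to actually execute.
echo "git clone https://github.com/pjreddie/darknet"
echo "cd darknet && make"
echo "wget https://pjreddie.com/media/files/yolov3.weights"
echo "./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg"
```

The last command is the one that writes `predictions.jpg` into the darknet directory.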
89
+ 23
90
+ 00:01:13,920 --> 00:01:21,130
91
+ So when you run this line here, as I said, you basically run this line in blue, copy and paste it.
92
+
93
+ 24
94
+ 00:01:21,210 --> 00:01:22,130
95
+ This is what you will see.
96
+
97
+ 25
98
+ 00:01:22,120 --> 00:01:25,430
99
+ It takes about maybe 10 seconds to run; it loads the model for us.
100
+
101
+ 26
102
+ 00:01:25,500 --> 00:01:29,470
103
+ We just see the output of the model here and it basically went in.
104
+
105
+ 27
106
+ 00:01:29,490 --> 00:01:34,530
107
+ Once it's done you'll see this; I didn't paste the output all the way in because it would have
108
+
109
+ 28
110
+ 00:01:34,530 --> 00:01:35,470
111
+ been too small.
112
+
113
+ 29
114
+ 00:01:35,880 --> 00:01:42,210
115
+ But you see this at the end: it's the object probabilities it found here.
116
+
117
+ 30
118
+ 00:01:42,930 --> 00:01:44,000
119
+ And that's pretty cool.
120
+
121
+ 31
122
+ 00:01:44,010 --> 00:01:46,300
123
+ So now you know where to find this file.
124
+
125
+ 32
126
+ 00:01:46,380 --> 00:01:50,390
127
+ It's basically a file in the directory called predictions.
128
+
129
+ 33
130
+ 00:01:50,640 --> 00:01:53,310
131
+ And here you can enter any sample image you want.
132
+
133
+ 34
134
+ 00:01:53,550 --> 00:02:01,280
135
+ Basically there are test files in the darknet data folder, and this is a selfie I took recently when I was
136
+
137
+ 35
138
+ 00:02:01,280 --> 00:02:02,470
139
+ over at my friend's.
140
+
141
+ 36
142
+ 00:02:02,640 --> 00:02:05,140
143
+ And you can just try it quite easily.
144
+
145
+ 37
146
+ 00:02:05,310 --> 00:02:05,790
147
+ You can try.
148
+
149
+ 38
150
+ 00:02:05,970 --> 00:02:08,760
151
+ Dogs, horses, anything you want.
152
+
153
+ 39
154
+ 00:02:08,760 --> 00:02:12,430
155
+ So have fun playing with YOLO.
156
+
157
+ 40
158
+ 00:02:13,230 --> 00:02:15,820
159
+ So what else can we do from here?
160
+
161
+ 41
162
+ 00:02:16,020 --> 00:02:17,970
163
+ We were using YOLO from the command line.
164
+
165
+ 42
166
+ 00:02:18,240 --> 00:02:20,790
167
+ But can we use it inside of Python?
168
+
169
+ 43
170
+ 00:02:21,060 --> 00:02:23,300
171
+ Well, yes we can, with DarkFlow.
172
+
173
+ 44
174
+ 00:02:23,580 --> 00:02:26,040
175
+ And I'll now introduce you to DarkFlow in the next chapter.
176
+
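As a preview of what using YOLO from Python looks like, here is the options dict in the style of the thtrieu/darkflow README; the cfg and weights paths are assumptions about where you downloaded them:

```python
# Preview sketch of DarkFlow's options dict (thtrieu/darkflow style);
# the cfg/weights paths are assumptions about your download locations.
options = {
    "model": "cfg/yolo.cfg",      # network definition
    "load": "bin/yolo.weights",   # pre-trained weights
    "threshold": 0.25,            # minimum confidence to report
}

# With darkflow installed you would then do:
#   from darkflow.net.build import TFNet
#   tfnet = TFNet(options)
#   result = tfnet.return_predict(image)
#   # result is a list of dicts: label, confidence, topleft, bottomright
print(sorted(options))
```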
177
+ 45
178
+ 00:02:27,780 --> 00:02:28,080
179
+ OK.
180
+
181
+ 46
182
+ 00:02:28,080 --> 00:02:32,320
183
+ So just to show you guys how this works, rather than just showing you images,
184
+
185
+ 47
186
+ 00:02:32,370 --> 00:02:35,550
187
+ We're going to actually run this in our machine.
188
+
189
+ 48
190
+ 00:02:35,640 --> 00:02:36,210
191
+ OK.
192
+
193
+ 49
194
+ 00:02:36,510 --> 00:02:40,570
195
+ So as I said, we have to type in here.
196
+
197
+ 50
198
+ 00:02:40,810 --> 00:02:42,670
199
+ cd darknet, right.
200
+
201
+ 51
202
+ 00:02:42,690 --> 00:02:43,090
203
+ Yep.
204
+
205
+ 52
206
+ 00:02:43,170 --> 00:02:44,940
207
+ cd darknet.
208
+
209
+ 53
210
+ 00:02:44,980 --> 00:02:49,430
211
+ So actually I didn't see what they call it actually yeah it is.
212
+
213
+ 54
214
+ 00:02:49,430 --> 00:02:50,300
215
+ You don't know it.
216
+
217
+ 55
218
+ 00:02:50,310 --> 00:02:51,080
219
+ Why didn't it
220
+
221
+ 56
222
+ 00:02:54,890 --> 00:02:55,350
223
+ go.
224
+
225
+ 57
226
+ 00:02:55,770 --> 00:02:58,040
227
+ So we can we're the here.
228
+
229
+ 58
230
+ 00:02:58,050 --> 00:03:05,940
231
+ So now let's just run that line, and I may have done that a bit too quickly because it actually ran
232
+
233
+ 59
234
+ 00:03:06,020 --> 00:03:08,680
235
+ as soon as I pasted that line.
236
+
237
+ 60
238
+ 00:03:08,700 --> 00:03:10,890
239
+ This takes about 10 seconds or so to load the models.
240
+
241
+ 61
242
+ 00:03:10,900 --> 00:03:12,150
243
+ It's going to be done quickly.
244
+
245
+ 62
246
+ 00:03:12,390 --> 00:03:17,170
247
+ So now it's loading the weights, and it's done; and now what it's going to do is classify
248
+
249
+ 63
250
+ 00:03:17,170 --> 00:03:18,390
251
+ that test image.
252
+
253
+ 64
254
+ 00:03:18,630 --> 00:03:20,730
255
+ So let's go to the directory here.
256
+
257
+ 65
258
+ 00:03:21,010 --> 00:03:21,640
259
+ All right.
260
+
261
+ 66
262
+ 00:03:21,720 --> 00:03:23,600
263
+ So that's going to be in the darknet directory.
264
+
265
+ 67
266
+ 00:03:23,640 --> 00:03:27,090
267
+ And this was the output here that we're going to generate.
268
+
269
+ 68
270
+ 00:03:27,150 --> 00:03:30,840
271
+ And let's see if that was probably my previously saved image.
272
+
273
+ 69
274
+ 00:03:31,020 --> 00:03:31,850
275
+ Yep it was.
276
+
277
+ 70
278
+ 00:03:31,850 --> 00:03:34,630
279
+ It is going to make a new image soon.
280
+
281
+ 71
282
+ 00:03:34,770 --> 00:03:35,490
283
+ So let's wait
284
+
285
+ 72
286
+ 00:03:39,510 --> 00:03:47,620
287
+ It usually took, I think, about 25 or so seconds last time, so it should be done
288
+
289
+ 73
290
+ 00:03:47,710 --> 00:03:48,850
291
+ soon.
292
+
293
+ 74
294
+ 00:03:48,850 --> 00:03:49,930
295
+ There we go.
296
+
297
+ 75
298
+ 00:03:49,930 --> 00:03:53,140
299
+ It took six seconds, and I probably have something running in the background.
300
+
301
+ 76
302
+ 00:03:53,440 --> 00:03:54,500
303
+ So here we go.
304
+
305
+ 77
306
+ 00:03:54,550 --> 00:04:00,420
307
+ So this is the test image here that it outputted; it's very nicely and neatly labeled here.
308
+
309
+ 78
310
+ 00:04:00,760 --> 00:04:05,310
311
+ So if you want to do more images, go to your data directory here.
312
+
313
+ 79
314
+ 00:04:05,620 --> 00:04:10,960
315
+ And this is actually where I have the selfie I used in the test, in the presentation I should say.
316
+
317
+ 80
318
+ 00:04:11,200 --> 00:04:15,720
319
+ So let's try this one here called person so let's see how that works.
320
+
321
+ 81
322
+ 00:04:15,730 --> 00:04:17,650
323
+ So let's go back to this line.
324
+
325
+ 82
326
+ 00:04:17,710 --> 00:04:19,410
327
+ However we don't use dog.
328
+
329
+ 83
330
+ 00:04:19,480 --> 00:04:22,750
331
+ We use person.
332
+
333
+ 84
334
+ 00:04:22,790 --> 00:04:33,860
335
+ So again the little the model when it first folded the slides for you guys no.
336
+
337
+ 85
338
+ 00:04:34,130 --> 00:04:35,520
339
+ All right there we go.
340
+
341
+ 86
342
+ 00:04:35,900 --> 00:04:37,840
343
+ So this is the outputted file here.
344
+
345
+ 87
346
+ 00:04:38,330 --> 00:04:42,170
347
+ So let's go back to our data directory and look at our predictions file.
348
+
349
+ 88
350
+ 00:04:42,410 --> 00:04:43,520
351
+ And this is pretty cool.
352
+
353
+ 89
354
+ 00:04:43,550 --> 00:04:48,200
355
+ We have a person a horse and a dog all accurately labeled and very neatly done too.
356
+
357
+ 90
358
+ 00:04:48,530 --> 00:04:51,290
359
+ So YOLO is pretty awesome as you can see.
360
+
361
+ 91
362
+ 00:04:51,590 --> 00:04:56,390
363
+ So I encourage you to experiment with your own pictures and get the hang of playing with YOLO.
22. Object Detection with YOLO & Darkflow Build a London Underground Sign Detector/2.1 Guide to the MacOS Install.html ADDED
@@ -0,0 +1 @@
 
 
1
+ <script type="text/javascript">window.location = "https://gist.github.com/simonw/0f93bec220be9cf8250533b603bf6dba";</script>
22. Object Detection with YOLO & Darkflow Build a London Underground Sign Detector/2.2 Download the YOLO files (if not using the VM).html ADDED
@@ -0,0 +1 @@
 
 
1
+ <script type="text/javascript">window.location = "https://1drv.ms/u/s!AkTkTuTv8A66dAvPsd9zbDSYLeI";</script>
22. Object Detection with YOLO & Darkflow Build a London Underground Sign Detector/3. Experiment with YOLO on still images, webcam and videos.srt ADDED
@@ -0,0 +1,547 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 1
2
+ 00:00:00,470 --> 00:00:06,750
3
+ Welcome to Chapter 22.2, where we talk about Darkflow, and Darkflow is basically how we can
4
+
5
+ 2
6
+ 00:00:06,750 --> 00:00:13,150
7
+ interface with YOLO inside of Python and run some cool tests on images, videos and our webcam as well.
8
+
9
+ 3
10
+ 00:00:14,240 --> 00:00:20,150
11
+ So basically this is actually a GIF of YOLO being used on video.
12
+
13
+ 4
14
+ 00:00:20,150 --> 00:00:21,360
15
+ It's pretty cool.
16
+
17
+ 5
18
+ 00:00:21,380 --> 00:00:22,990
19
+ Hope it's not distracting you right now.
20
+
21
+ 6
22
+ 00:00:23,360 --> 00:00:27,850
23
+ But basically this is how we actually get set up.
24
+
25
+ 7
26
+ 00:00:27,890 --> 00:00:35,360
27
+ But this is how we actually set up the environment for YOLO, Darkflow I should say, as the install isn't
28
+
29
+ 8
30
+ 00:00:35,360 --> 00:00:35,620
31
+ that
32
+
33
+ 9
34
+ 00:00:35,630 --> 00:00:38,950
35
+ hard to do; just a bunch of things we have to do.
36
+
37
+ 10
38
+ 00:00:39,200 --> 00:00:44,490
39
+ So what I've done here basically have created a new environment using Anaconda.
40
+
41
+ 11
42
+ 00:00:44,510 --> 00:00:46,030
43
+ Basically you can call it whatever you want.
44
+
45
+ 12
46
+ 00:00:46,030 --> 00:00:51,830
47
+ I used it as my TensorFlow environment previously, so you can call it that if you
48
+
49
+ 13
50
+ 00:00:51,830 --> 00:00:55,520
51
+ want, or call it, you know, anything you want.
52
+
53
+ 14
54
+ 00:00:55,520 --> 00:00:56,930
55
+ It's already installed right now.
56
+
57
+ 15
58
+ 00:00:56,930 --> 00:01:02,300
59
+ So if you're using this virtual machine, which I hope you are, you don't
60
+
61
+ 16
62
+ 00:01:02,300 --> 00:01:06,360
63
+ have to do this tedious install again, but it doesn't actually take that long.
64
+
65
+ 17
66
+ 00:01:06,410 --> 00:01:12,020
67
+ So just run these commands line by line in the terminal and it should be fine.
68
+
69
+ 18
70
+ 00:01:12,150 --> 00:01:12,660
71
+ OK.
72
+
73
+ 19
74
+ 00:01:13,220 --> 00:01:14,130
75
+ Except for this one.
76
+
77
+ 20
78
+ 00:01:14,180 --> 00:01:18,050
79
+ Basically you're going to have to create the environment manually on your own but that's not too hard
80
+
81
+ 21
82
+ 00:01:18,050 --> 00:01:18,340
83
+ to do.
84
+
85
+ 22
86
+ 00:01:18,340 --> 00:01:21,780
87
+ So on to the Darkflow install.
88
+
89
+ 23
90
+ 00:01:21,790 --> 00:01:28,130
91
+ So now open a terminal and basically start in your home directory in the terminal, and make a
92
+
93
+ 24
94
+ 00:01:28,140 --> 00:01:29,160
95
+ directory called darkflow.
96
+
97
+ 25
98
+ 00:01:29,160 --> 00:01:36,690
99
+ Go into it and basically clone this repository here, go into that folder, do the pip install, and
100
+
101
+ 26
102
+ 00:01:36,690 --> 00:01:43,950
103
+ do this install as well, by the way, and go get the weights here, and then basically test
104
+
105
+ 27
106
+ 00:01:44,010 --> 00:01:48,070
107
+ your setup to see that it works.
108
+
109
+ 28
110
+ 00:01:48,160 --> 00:01:50,860
111
+ And now let's mess around with YOLO in Python.
112
+
113
+ 29
114
+ 00:01:50,900 --> 00:01:57,460
115
+ So now we can actually go into darkflow.net.build and import this
116
+
117
+ 30
118
+ 00:01:57,670 --> 00:01:58,660
119
+ function here.
120
+
121
+ 31
122
+ 00:01:59,110 --> 00:02:04,570
123
+ So now I'm going to go back to this, and we're going to go to our virtual machine and actually use this
124
+
125
+ 32
126
+ 00:02:04,570 --> 00:02:05,070
127
+ now.
128
+
129
+ 33
130
+ 00:02:05,370 --> 00:02:05,680
131
+ OK.
132
+
133
+ 34
134
+ 00:02:05,710 --> 00:02:11,830
135
+ So from the file browser, you go into the Deep Learning CV directory, you go into darkflow
136
+
137
+ 35
138
+ 00:02:12,520 --> 00:02:17,840
139
+ and go to the darkflow-master directory here, and you'll see a file called tutorial.
140
+
141
+ 36
142
+ 00:02:18,100 --> 00:02:19,520
143
+ Open this notebook here.
144
+
145
+ 37
146
+ 00:02:19,570 --> 00:02:22,360
147
+ This is one I compiled for you guys.
148
+
149
+ 38
150
+ 00:02:22,680 --> 00:02:25,850
151
+ And basically this is how we use Darkflow inside of Python.
152
+
153
+ 39
154
+ 00:02:26,250 --> 00:02:29,840
155
+ So firstly let's run this block of code.
156
+
157
+ 40
158
+ 00:02:30,000 --> 00:02:37,020
159
+ What this does here: this actually loads our model, all right, using TensorFlow's TFNet, and all this
160
+
161
+ 41
162
+ 00:02:37,020 --> 00:02:40,590
163
+ stuff; we actually load the model we want from the cfg directory.
164
+
165
+ 42
166
+ 00:02:40,680 --> 00:02:42,190
167
+ We load the weights we want.
168
+
169
+ 43
170
+ 00:02:42,420 --> 00:02:43,720
171
+ We set the threshold.
172
+
173
+ 44
174
+ 00:02:43,980 --> 00:02:49,770
175
+ And if you're using a GPU you can specify the GPU option; however, we use false since this is a virtual
176
+
177
+ 45
178
+ 00:02:49,770 --> 00:02:52,810
179
+ machine and we don't have access to a GPU from here.
180
+
181
+ 46
182
+ 00:02:52,810 --> 00:02:53,380
183
+ All right.
184
+
185
+ 47
186
+ 00:02:53,460 --> 00:02:58,570
187
+ And then we pass these options that we created here to our TFNet class object.
188
+
189
+ 48
190
+ 00:02:58,630 --> 00:02:59,600
191
+ Right.
192
+
193
+ 49
194
+ 00:03:00,420 --> 00:03:06,750
195
+ So you can see it basically built the model here, loaded everything successfully, and
196
+
197
+ 50
198
+ 00:03:06,750 --> 00:03:08,060
199
+ we now have a model.
200
+
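The options just described can be sketched as a plain Python dict. The keys ("model", "load", "threshold", "gpu") are darkflow's documented option names, but the cfg/weights paths below are illustrative placeholders, and the TFNet call is left as a comment so this sketch runs without darkflow installed.

```python
# Sketch of the darkflow options discussed above. Key names follow
# darkflow's conventions; the file paths are placeholder assumptions.
options = {
    "model": "cfg/yolo.cfg",     # network definition from the cfg directory
    "load": "bin/yolo.weights",  # pre-trained weights
    "threshold": 0.3,            # minimum confidence for a detection
    "gpu": 0.0,                  # 0.0 = CPU only, as in the VM here
}

# With darkflow installed, loading the model would then be:
#   from darkflow.net.build import TFNet
#   tfnet = TFNet(options)

def validate_options(opts):
    """Minimal sanity check on the option dict."""
    required = {"model", "load", "threshold", "gpu"}
    missing = required - opts.keys()
    if missing:
        raise ValueError("missing options: %s" % sorted(missing))
    if not 0.0 <= opts["threshold"] <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return True

print(validate_options(options))  # True
```

The gpu value is a fraction of GPU memory to use; on a machine with a GPU you could set it to, say, 0.7 instead of 0.0.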
201
+ 51
202
+ 00:03:08,370 --> 00:03:15,150
203
+ So now what we need to do is convert our OpenCV BGR image to RGB format.
204
+
205
+ 52
206
+ 00:03:15,150 --> 00:03:21,990
207
+ This is something that's pretty annoying about OpenCV: it loads images in BGR by default
208
+
209
+ 53
210
+ 00:03:22,010 --> 00:03:26,160
211
+ instead of RGB, but OpenCV actually has a function that can convert it for us.
212
+
213
+ 54
214
+ 00:03:26,160 --> 00:03:27,340
215
+ So it's not a big deal.
216
+
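In the notebook the conversion is done with OpenCV's cv2.cvtColor(img, cv2.COLOR_BGR2RGB); with NumPy arrays the same channel swap is just img[:, :, ::-1]. A minimal sketch on a plain nested list, so it runs without OpenCV installed:

```python
# BGR -> RGB is just a per-pixel reversal of the three channel values.
def bgr_to_rgb(img):
    """img is rows x cols x 3 in (B, G, R) order; reverse each pixel."""
    return [[pixel[::-1] for pixel in row] for row in img]

bgr = [[[255, 0, 0]]]   # a single pure-blue pixel in BGR order
rgb = bgr_to_rgb(bgr)
print(rgb[0][0])        # [0, 0, 255]
```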
217
+ 55
218
+ 00:03:27,660 --> 00:03:30,600
219
+ So we're running this on our sample horse's image.
220
+
221
+ 56
222
+ 00:03:30,600 --> 00:03:32,850
223
+ And let's go and see what that image looks like.
224
+
225
+ 57
226
+ 00:03:32,860 --> 00:03:38,100
227
+ Go to darkflow, go to the sample images here.
228
+
229
+ 58
230
+ 00:03:38,430 --> 00:03:41,760
231
+ And I believe the one we used was horses, or sample horses.
232
+
233
+ 59
234
+ 00:03:41,760 --> 00:03:42,540
235
+ This one.
236
+
237
+ 60
238
+ 00:03:42,870 --> 00:03:50,780
239
+ So now we've run this image inside of our Python notebook and we got basically some bounding boxes here
240
+
241
+ 61
242
+ 00:03:51,360 --> 00:03:57,410
243
+ and some confidence scores and a label but we don't actually have an image.
244
+
245
+ 62
246
+ 00:03:57,450 --> 00:04:03,190
247
+ So let's display our results using OpenCV.
248
+
249
+ 63
250
+ 00:04:03,190 --> 00:04:05,380
251
+ So now, this is actually pretty cool.
252
+
253
+ 64
254
+ 00:04:05,620 --> 00:04:11,200
255
+ So we've displayed our bounding boxes here using OpenCV, with the labels.
256
+
257
+ 65
258
+ 00:04:11,440 --> 00:04:13,450
259
+ So this is pretty neat.
260
+
261
+ 66
262
+ 00:04:13,450 --> 00:04:16,080
263
+ So we can encapsulate that in a function.
264
+
265
+ 67
266
+ 00:04:16,120 --> 00:04:19,470
267
+ That's actually the function we call here display results.
268
+
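A display_results-style helper like the one mentioned here just walks the list that darkflow's return_predict() returns (dicts with label, confidence, topleft, bottomright) and draws each box. A minimal sketch, with the OpenCV drawing calls left as comments so it runs without cv2; the sample detection is made up for illustration:

```python
# Convert darkflow prediction dicts into drawable (top-left, bottom-right,
# label) tuples, optionally filtering out low-confidence detections.
def boxes_from_predictions(results, min_conf=0.0):
    boxes = []
    for r in results:
        if r["confidence"] < min_conf:
            continue
        tl = (r["topleft"]["x"], r["topleft"]["y"])
        br = (r["bottomright"]["x"], r["bottomright"]["y"])
        label = "%s %.2f" % (r["label"], r["confidence"])
        # With OpenCV you would draw each box like this:
        #   cv2.rectangle(img, tl, br, (0, 255, 0), 2)
        #   cv2.putText(img, label, tl, cv2.FONT_HERSHEY_SIMPLEX,
        #               0.6, (0, 255, 0), 2)
        boxes.append((tl, br, label))
    return boxes

sample = [{"label": "horse", "confidence": 0.87,
           "topleft": {"x": 10, "y": 20},
           "bottomright": {"x": 110, "y": 220}}]
print(boxes_from_predictions(sample))
```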
269
+ 68
270
+ 00:04:19,480 --> 00:04:21,290
271
+ Am I actually using this? I'm not entirely sure.
272
+
273
+ 69
274
+ 00:04:21,330 --> 00:04:23,170
275
+ I think I was using it.
276
+
277
+ 70
278
+ 00:04:23,170 --> 00:04:24,290
279
+ No I'm not using it.
280
+
281
+ 71
282
+ 00:04:24,340 --> 00:04:28,430
283
+ It's pretty much pointless unless you are missing it down here.
284
+
285
+ 72
286
+ 00:04:28,510 --> 00:04:30,530
287
+ Sorry my mistake.
288
+
289
+ 73
290
+ 00:04:30,780 --> 00:04:32,320
291
+ Happens to me a lot.
292
+
293
+ 74
294
+ 00:04:32,320 --> 00:04:35,970
295
+ I read my own code wrong so much.
296
+
297
+ 75
298
+ 00:04:36,080 --> 00:04:41,630
299
+ I think something is wrong with me sometimes. Anyhow, so this is the code here to run YOLO through
300
+
301
+ 76
302
+ 00:04:41,680 --> 00:04:42,970
303
+ your webcam.
304
+
305
+ 77
306
+ 00:04:42,970 --> 00:04:46,100
307
+ So we basically load the model exactly the same way we did before.
308
+
309
+ 78
310
+ 00:04:46,390 --> 00:04:49,320
311
+ We actually don't need to do it twice, but
312
+
313
+ 79
314
+ 00:04:49,320 --> 00:04:55,140
315
+ I just pasted it here in case you wanted to run this separately from the one on top, and we initialize
316
+
317
+ 80
318
+ 00:04:55,140 --> 00:05:00,310
319
+ a webcam here from OpenCV and we run this, so it's going to take a while to run, about 10 seconds,
320
+
321
+ 81
322
+ 00:05:00,370 --> 00:05:01,740
323
+ seven seconds before.
324
+
325
+ 82
326
+ 00:05:02,140 --> 00:05:05,910
327
+ And now our webcam frame is going to pop up here shortly.
328
+
329
+ 83
330
+ 00:05:07,340 --> 00:05:08,810
331
+ Uh oh, 'results'.
332
+
333
+ 84
334
+ 00:05:08,830 --> 00:05:09,660
335
+ It's not defined.
336
+
337
+ 85
338
+ 00:05:09,730 --> 00:05:11,590
339
+ That's quite funny.
340
+
341
+ 86
342
+ 00:05:11,620 --> 00:05:13,500
343
+ So let's do this again sorry.
344
+
345
+ 87
346
+ 00:05:13,530 --> 00:05:14,200
347
+ By the way it's
348
+
349
+ 88
350
+ 00:05:18,120 --> 00:05:23,580
351
+ Now one thing you're going to notice is that it actually runs very slowly on a CPU-based system.
352
+
353
+ 89
354
+ 00:05:23,670 --> 00:05:29,180
355
+ The frame rate is nowhere near as good as SSD.
356
+
357
+ 90
358
+ 00:05:29,290 --> 00:05:34,420
359
+ Any minute, any second now I should say, the webcam frame is going to pop up here.
360
+
361
+ 91
362
+ 00:05:45,750 --> 00:05:50,540
363
+ There we go, we have a webcam image.
364
+
365
+ 92
366
+ 00:05:50,540 --> 00:05:54,410
367
+ Right now it doesn't look smooth at all.
368
+
369
+ 93
370
+ 00:05:54,410 --> 00:06:00,770
371
+ And we see multiple boxes here. Now, the multiple-box problem is actually because of the threshold
372
+
373
+ 94
374
+ 00:06:00,770 --> 00:06:04,960
375
+ we set, since it's set very low.
376
+
377
+ 95
378
+ 00:06:05,030 --> 00:06:08,290
379
+ If we set a threshold... and actually I don't have the threshold parameter here.
380
+
381
+ 96
382
+ 00:06:08,570 --> 00:06:12,550
383
+ So let's go back to the top and set the threshold here.
384
+
385
+ 97
386
+ 00:06:14,110 --> 00:06:15,640
387
+ And redo this here.
388
+
389
+ 98
390
+ 00:06:16,790 --> 00:06:17,200
391
+ All right.
392
+
393
+ 99
394
+ 00:06:17,240 --> 00:06:18,720
395
+ We can set this to 0.5.
396
+
397
+ 100
398
+ 00:06:18,710 --> 00:06:22,570
399
+ You'll actually see far fewer bounding boxes on that image.
400
+
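Raising the threshold option, as just described, simply drops low-confidence detections before any boxes are drawn. The same filtering applied after the fact, on made-up detection dicts for illustration:

```python
# Keep only detections at or above the confidence threshold.
def filter_by_threshold(results, threshold):
    return [r for r in results if r["confidence"] >= threshold]

detections = [
    {"label": "elephant", "confidence": 0.82},
    {"label": "elephant", "confidence": 0.31},  # duplicate, low confidence
]
print(len(filter_by_threshold(detections, 0.5)))  # 1
```

This is why, with threshold at 0.5, the overlapping duplicate boxes from the webcam demo disappear.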
401
+ 101
402
+ 00:06:22,580 --> 00:06:27,480
403
+ So now let's run YOLO basically on video.
404
+
405
+ 102
406
+ 00:06:29,740 --> 00:06:31,400
407
+ See how it looks.
408
+
409
+ 103
410
+ 00:06:31,430 --> 00:06:31,720
411
+ OK.
412
+
413
+ 104
414
+ 00:06:31,760 --> 00:06:33,140
415
+ You may have seen this in my intro video.
416
+
417
+ 105
418
+ 00:06:33,140 --> 00:06:38,890
419
+ So this is it running frame by frame on an elephant, on the couch, and a person sometimes.
420
+
421
+ 106
422
+ 00:06:38,900 --> 00:06:40,820
423
+ So let's stop this for now.
424
+
425
+ 107
426
+ 00:06:41,300 --> 00:06:47,010
427
+ And what we can do if you want to experiment is let's actually load this model.
428
+
429
+ 108
430
+ 00:06:47,100 --> 00:06:49,160
431
+ All right separately though.
432
+
433
+ 109
434
+ 00:06:49,550 --> 00:06:55,430
435
+ So actually we can just run it, load it from up here, and set the threshold to 0.6.
436
+
437
+ 110
438
+ 00:06:55,430 --> 00:06:55,850
439
+ All right.
440
+
441
+ 111
442
+ 00:06:58,450 --> 00:07:02,880
443
+ Doesn't take that long to do maybe about 10 seconds to run.
444
+
445
+ 112
446
+ 00:07:02,880 --> 00:07:03,890
447
+ There we go it's done.
448
+
449
+ 113
450
+ 00:07:03,960 --> 00:07:07,690
451
+ And now let's go back to this video.
452
+
453
+ 114
454
+ 00:07:09,790 --> 00:07:10,660
455
+ There we go.
456
+
457
+ 115
458
+ 00:07:10,660 --> 00:07:11,920
459
+ And you should see definitely.
460
+
461
+ 116
462
+ 00:07:11,920 --> 00:07:12,780
463
+ There we go.
464
+
465
+ 117
466
+ 00:07:12,790 --> 00:07:14,200
467
+ It's just one box.
468
+
469
+ 118
470
+ 00:07:14,200 --> 00:07:16,570
471
+ And this is actually better for me.
472
+
473
+ 119
474
+ 00:07:16,890 --> 00:07:17,970
475
+ And actually, now it disappeared.
476
+
477
+ 120
478
+ 00:07:17,980 --> 00:07:23,260
479
+ So maybe that threshold is too high but you can see we only have one bounding box here and it's getting
480
+
481
+ 121
482
+ 00:07:23,430 --> 00:07:27,480
483
+ it as an elephant right all the time except when it disappears.
484
+
485
+ 122
486
+ 00:07:27,700 --> 00:07:29,310
487
+ So this is pretty sick.
488
+
489
+ 123
490
+ 00:07:30,400 --> 00:07:32,330
491
+ So let's close up now.
492
+
493
+ 124
494
+ 00:07:32,690 --> 00:07:37,200
495
+ So that concludes our YOLO tutorial.
496
+
497
+ 125
498
+ 00:07:37,520 --> 00:07:39,710
499
+ These are the standard options I've left here.
500
+
501
+ 126
502
+ 00:07:40,070 --> 00:07:45,410
503
+ You may have seen me scrolling back and forth trying to figure out why this code wasn't working.
504
+
505
+ 127
506
+ 00:07:45,590 --> 00:07:52,130
507
+ What happens in OpenCV is that if something opens a webcam and then the code crashes, like
508
+
509
+ 128
510
+ 00:07:52,130 --> 00:07:54,500
511
+ this code did here when results was not found,
512
+
513
+ 129
514
+ 00:07:54,830 --> 00:07:57,480
515
+ What happens is that we now need to run this line here.
516
+
517
+ 130
518
+ 00:07:57,740 --> 00:08:00,280
519
+ These two lines, to release the webcam.
520
+
521
+ 131
522
+ 00:08:00,320 --> 00:08:05,210
523
+ So what happened is that when I tried to initialize the webcam again it just got stuck, and I had to go to
524
+
525
+ 132
526
+ 00:08:05,210 --> 00:08:08,020
527
+ the Kernel menu, restart the notebook and wait a bit.
528
+
529
+ 133
530
+ 00:08:08,350 --> 00:08:13,820
531
+ So it's good to know when you're playing with this, in case you mess something up.
532
+
533
+ 134
534
+ 00:08:13,820 --> 00:08:14,790
535
+ You know what to do.
536
+
537
+ 135
538
+ 00:08:15,050 --> 00:08:21,790
539
+ Just run these lines to basically recapture your webcam.
540
+
541
+ 136
542
+ 00:08:21,820 --> 00:08:26,060
543
+ All right, so later on we're going to see how to actually make a model.
544
+
545
+ 137
546
+ 00:08:26,070 --> 00:08:28,640
547
+ So this is the model we're going to make in the next section.
22. Object Detection with YOLO & Darkflow Build a London Underground Sign Detector/4. Build your own YOLO Object Detector - Detecting London Underground Signs.srt ADDED
@@ -0,0 +1,1011 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 1
2
+ 00:00:00,580 --> 00:00:01,190
3
+ I guess.
4
+
5
+ 2
6
+ 00:00:01,200 --> 00:00:08,970
7
+ Hi and welcome back to Chapter 22.3, where we're about to build our very own customized YOLO
8
+
9
+ 3
10
+ 00:00:09,060 --> 00:00:10,180
11
+ object detector.
12
+
13
+ 4
14
+ 00:00:10,590 --> 00:00:15,980
15
+ This one is going to detect London Underground signs so let's get started and see how we do this.
16
+
17
+ 5
18
+ 00:00:15,990 --> 00:00:17,380
19
+ So just something to note.
20
+
21
+ 6
22
+ 00:00:17,380 --> 00:00:22,620
23
+ Firstly, doing this without a GPU is going to be very slow, and we're not going to actually make a
24
+
25
+ 7
26
+ 00:00:22,620 --> 00:00:27,100
27
+ very good object detector; it's going to be limited quite a bit.
28
+
29
+ 8
30
+ 00:00:27,330 --> 00:00:31,890
31
+ And also, because I'm teaching from my VirtualBox I don't have access to my GPU, so we don't have a choice
32
+
33
+ 9
34
+ 00:00:31,920 --> 00:00:38,610
35
+ but to use the CPU. Also, training using a GPU doesn't require a different setup from anything
36
+
37
+ 10
38
+ 00:00:38,730 --> 00:00:39,480
39
+ we're doing here.
40
+
41
+ 11
42
+ 00:00:39,750 --> 00:00:43,640
43
+ Just a few additional commands but I'll show it to you in this slide.
44
+
45
+ 12
46
+ 00:00:43,650 --> 00:00:44,570
47
+ OK.
48
+
49
+ 13
50
+ 00:00:45,460 --> 00:00:53,310
51
+ So first of all, to make a custom object detector, you basically have to create your own custom dataset.
52
+
53
+ 14
54
+ 00:00:53,340 --> 00:00:58,880
55
+ There's this software called labelImg, which is found here, available for Windows, Mac and Linux.
56
+
57
+ 15
58
+ 00:00:58,920 --> 00:01:00,480
59
+ It's quite easy to use.
60
+
61
+ 16
62
+ 00:01:00,510 --> 00:01:06,600
63
+ And what you have to do is basically select and open the directory that you want to
64
+
65
+ 17
66
+ 00:01:06,600 --> 00:01:07,680
67
+ use.
68
+
69
+ 18
70
+ 00:01:07,680 --> 00:01:15,780
71
+ This is the directory I used here for my tube sign images, and you basically set the format to PascalVOC, and
72
+
73
+ 19
74
+ 00:01:15,810 --> 00:01:18,010
75
+ basically you just drag and drop.
76
+
77
+ 20
78
+ 00:01:18,060 --> 00:01:19,860
79
+ Actually I can show it to you right now.
80
+
81
+ 21
82
+ 00:01:19,860 --> 00:01:22,980
83
+ Let's quickly go to label image here.
84
+
85
+ 22
86
+ 00:01:22,980 --> 00:01:23,850
87
+ All right.
88
+
89
+ 23
90
+ 00:01:24,180 --> 00:01:27,960
91
+ So this is actually some images here.
92
+
93
+ 24
94
+ 00:01:28,200 --> 00:01:29,210
95
+ So let's see.
96
+
97
+ 25
98
+ 00:01:29,400 --> 00:01:35,220
99
+ Let's clear this one, let's delete it and actually label some images here, and let's
100
+
101
+ 26
102
+ 00:01:35,230 --> 00:01:38,050
103
+ make sure we're good with it afterward.
104
+
105
+ 27
106
+ 00:01:38,280 --> 00:01:40,850
107
+ Let's create a box here.
108
+
109
+ 28
110
+ 00:01:41,760 --> 00:01:42,050
111
+ All right.
112
+
113
+ 29
114
+ 00:01:42,090 --> 00:01:44,960
115
+ And let's call this London Underground.
116
+
117
+ 30
118
+ 00:01:45,060 --> 00:01:49,560
119
+ And now once we do that it's going to be saved as our class here.
120
+
121
+ 31
122
+ 00:01:49,980 --> 00:01:50,700
123
+ All right.
124
+
125
+ 32
126
+ 00:01:50,700 --> 00:01:51,880
127
+ So that's pretty cool.
128
+
129
+ 33
130
+ 00:01:52,080 --> 00:01:56,740
131
+ What I want to tell you is: make sure the format is set to PascalVOC, not YOLO.
132
+
133
+ 34
134
+ 00:01:56,970 --> 00:01:57,910
135
+ I know that's odd.
136
+
137
+ 35
138
+ 00:01:58,140 --> 00:02:02,470
139
+ But Darkflow actually uses annotations in the PascalVOC format.
140
+
141
+ 36
142
+ 00:02:02,580 --> 00:02:03,910
143
+ So let's create another box.
144
+
145
+ 37
146
+ 00:02:03,910 --> 00:02:08,050
147
+ Here is how we do a second box in an image, and press OK.
148
+
149
+ 38
150
+ 00:02:08,370 --> 00:02:10,950
151
+ So now we have those two boxes here.
152
+
153
+ 39
154
+ 00:02:11,310 --> 00:02:14,050
155
+ And all we have to do is just complete this.
156
+
157
+ 40
158
+ 00:02:14,050 --> 00:02:19,610
159
+ and it saves it here in XML format; usually you want to save it.
160
+
161
+ 41
162
+ 00:02:19,680 --> 00:02:26,380
163
+ You can do this manually or use a script to rename it properly, but call it like 001.xml.
164
+
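A renaming script like the one suggested here can be a few lines of Python. The zero-padded 001.xml naming scheme is taken from the lecture; the throwaway temp directory and sample file names below are assumptions for the demo:

```python
import os
import tempfile

# Rewrite arbitrary file names in a directory to zero-padded sequential
# ones (001.xml, 002.xml, ...). Assumes no existing file already uses a
# target name that a different source file still occupies.
def rename_sequential(directory, ext=".xml"):
    names = sorted(f for f in os.listdir(directory) if f.endswith(ext))
    for i, name in enumerate(names, start=1):
        new = "%03d%s" % (i, ext)
        os.rename(os.path.join(directory, name),
                  os.path.join(directory, new))
    return len(names)

# Demo in a throwaway directory with camera-style names.
d = tempfile.mkdtemp()
for junk in ("IMG_8371.xml", "IMG_0042.xml"):
    open(os.path.join(d, junk), "w").close()
rename_sequential(d)
print(sorted(os.listdir(d)))  # ['001.xml', '002.xml']
```

The same script with ext=".jpg" renames the images, so each 001.xml pairs with 001.jpg.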
165
+ 42
166
+ 00:02:27,120 --> 00:02:28,170
167
+ And there we go.
168
+
169
+ 43
170
+ 00:02:28,470 --> 00:02:32,340
171
+ So let's go back to our presentation.
172
+
173
+ 44
174
+ 00:02:32,340 --> 00:02:40,490
175
+ So this is what we've done so far, and I've actually got 100 images of the London Underground tube sign in various
176
+
177
+ 45
178
+ 00:02:40,940 --> 00:02:42,390
179
+ parts of London.
180
+
181
+ 46
182
+ 00:02:42,500 --> 00:02:45,540
183
+ That's what you're going to have to do; you'll even have to get images yourself,
184
+
185
+ 47
186
+ 00:02:45,550 --> 00:02:48,340
187
+ either through a web scraper or image scraper, sorry,
188
+
189
+ 48
190
+ 00:02:48,680 --> 00:02:53,000
191
+ or just manually using Google Images and finding them on your own.
192
+
193
+ 49
194
+ 00:02:53,030 --> 00:02:55,640
195
+ So there are some things I learned the hard way.
196
+
197
+ 50
198
+ 00:02:55,640 --> 00:03:02,300
199
+ Make sure your image names are labeled like 001, not like some random digits like I did here, because
200
+
201
+ 51
202
+ 00:03:02,330 --> 00:03:06,580
203
+ you're going to have to manually adjust them afterward, though you can easily do a script to fix it.
204
+
205
+ 52
206
+ 00:03:06,590 --> 00:03:08,070
207
+ But I didn't know at the time.
208
+
209
+ 53
210
+ 00:03:08,150 --> 00:03:11,090
211
+ So I actually had to relabel my images twice.
212
+
213
+ 54
214
+ 00:03:11,120 --> 00:03:11,840
215
+ It was a bit tedious.
216
+
217
+ 55
218
+ 00:03:11,840 --> 00:03:12,750
219
+ I didn't have to do that.
220
+
221
+ 56
222
+ 00:03:12,770 --> 00:03:17,210
223
+ But since it was just a hundred images, it was easier than writing a script to do it
224
+
225
+ 57
226
+ 00:03:17,210 --> 00:03:18,840
227
+ At that point in time.
228
+
229
+ 58
230
+ 00:03:19,130 --> 00:03:22,490
231
+ And then we use the saved files in PascalVOC format.
232
+
233
+ 59
234
+ 00:03:22,550 --> 00:03:23,910
235
+ Just remember that.
236
+
237
+ 60
238
+ 00:03:24,090 --> 00:03:30,570
239
+ because you want to end up with just the .xml files. So, what else to know?
240
+
241
+ 61
242
+ 00:03:30,670 --> 00:03:31,160
243
+ OK.
244
+
245
+ 62
246
+ 00:03:31,450 --> 00:03:37,210
247
+ So we need to set up this file structure here, and I'll go to the virtual machine very shortly to show you
248
+
249
+ 63
250
+ 00:03:37,450 --> 00:03:38,770
251
+ the file structure.
252
+
253
+ 64
254
+ 00:03:38,770 --> 00:03:41,500
255
+ Actually, I can show it to you now, but let me discuss it first.
256
+
257
+ 65
258
+ 00:03:41,710 --> 00:03:42,060
259
+ OK.
260
+
261
+ 66
262
+ 00:03:42,160 --> 00:03:48,600
263
+ So I will tell you why this is important; it will fail again if you don't do it this way.
264
+
265
+ 67
266
+ 00:03:48,790 --> 00:03:55,570
267
+ But it will come in handy in the end, mainly because the annotation files have the folder name that
268
+
269
+ 68
270
+ 00:03:55,630 --> 00:03:59,880
271
+ the image is in, the image name, and basically the path to the file.
272
+
273
+ 69
274
+ 00:04:00,130 --> 00:04:08,740
275
+ So I did this in Windows initially, and what I had to do was make a Python script to examine
276
+
277
+ 70
278
+ 00:04:08,740 --> 00:04:14,860
279
+ each XML file and basically make all the changes for me. I'll probably upload the scripts if you
280
+
281
+ 71
282
+ 00:04:14,860 --> 00:04:21,150
283
+ want; afterward I'll clean them up a bit. But it was not fun to correct these mistakes.
284
+
285
+ 72
286
+ 00:04:21,250 --> 00:04:24,010
287
+ So let me go to the virtual machine now.
288
+
289
+ 73
290
+ 00:04:24,390 --> 00:04:25,100
291
+ It's here.
292
+
293
+ 74
294
+ 00:04:26,440 --> 00:04:28,850
295
+ And let's find this directory.
296
+
297
+ 75
298
+ 00:04:29,350 --> 00:04:33,120
299
+ Actually, where was the folder I was looking at now?
300
+
301
+ 76
302
+ 00:04:33,340 --> 00:04:43,010
303
+ So now, I believe it was in darknet here.
304
+
305
+ 77
306
+ 00:04:44,480 --> 00:04:46,800
307
+ Actually, I mean this directory, darkflow.
308
+
309
+ 78
310
+ 00:04:47,060 --> 00:04:51,850
311
+ So: darkflow, darkflow-master, train, images, annotations.
312
+
313
+ 79
314
+ 00:05:02,560 --> 00:05:06,770
315
+ OK, so ignore all of these files I have in this directory.
316
+
317
+ 80
318
+ 00:05:06,860 --> 00:05:09,840
319
+ These were just a backup copy I made of them.
320
+
321
+ 81
322
+ 00:05:09,830 --> 00:05:12,800
323
+ So, annotations: these are the files that are important.
324
+
325
+ 82
326
+ 00:05:12,810 --> 00:05:21,080
327
+ Now what's funny is that usually you train these detectors with hundreds of thousands of images.
328
+
329
+ 83
330
+ 00:05:21,290 --> 00:05:25,220
331
+ I actually trained them only using 5 images.
332
+
333
+ 84
334
+ 00:05:25,220 --> 00:05:26,470
335
+ Now I'll tell you why.
336
+
337
+ 85
338
+ 00:05:26,720 --> 00:05:28,150
339
+ That's because I'm using a CPU.
340
+
341
+ 86
342
+ 00:05:28,460 --> 00:05:34,120
343
+ And every time I tried to go to maybe six, seven images, it would crash during training.
344
+
345
+ 87
346
+ 00:05:34,190 --> 00:05:38,320
347
+ So I actually had to do a lot of trial and error to get the thing to work right.
348
+
349
+ 88
350
+ 00:05:38,330 --> 00:05:42,520
351
+ So basically I spent a lot of time collecting images that I never used.
352
+
353
+ 89
354
+ 00:05:42,740 --> 00:05:48,030
355
+ So it's a bit sad but that's OK at least we've learned from my mistake.
356
+
357
+ 90
358
+ 00:05:48,050 --> 00:05:50,320
359
+ So let's take a look at this XML file.
360
+
361
+ 91
362
+ 00:05:50,420 --> 00:05:55,620
363
+ So as you can see we have annotations annotations being the name of the folder that it's in.
364
+
365
+ 92
366
+ 00:05:55,760 --> 00:05:57,670
367
+ We have the file name here.
368
+
369
+ 93
370
+ 00:05:57,950 --> 00:06:01,130
371
+ This file name here corresponds to the JPEG
372
+
373
+ 94
374
+ 00:06:01,140 --> 00:06:02,260
375
+ file name here.
376
+
377
+ 95
378
+ 00:06:02,750 --> 00:06:03,630
379
+ Sorry.
380
+
381
+ 96
382
+ 00:06:03,670 --> 00:06:04,420
383
+ Images.
384
+
385
+ 97
386
+ 00:06:04,670 --> 00:06:06,750
387
+ File name here. OK, let's go back to it.
388
+
389
+ 98
390
+ 00:06:06,770 --> 00:06:08,630
391
+ Good, it's here.
392
+
393
+ 99
394
+ 00:06:09,140 --> 00:06:14,260
395
+ Another thing to note is that it has the path of the image name here as well.
396
+
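Because the folder, filename, and path fields must match the real file locations, a small ElementTree fix-up can rewrite them in bulk instead of editing each XML by hand. The PascalVOC field names (folder, filename, path) are standard; the fragment, helper name, and target paths below are illustrative:

```python
import xml.etree.ElementTree as ET

# A minimal PascalVOC annotation fragment with stale location fields,
# as produced by labelImg on another machine.
VOC = """<annotation>
  <folder>old_folder</folder>
  <filename>IMG_8371.jpg</filename>
  <path>C:/old/IMG_8371.jpg</path>
</annotation>"""

def fix_annotation(xml_text, folder, filename, base):
    """Rewrite the folder/filename/path fields to the new location."""
    root = ET.fromstring(xml_text)
    root.find("folder").text = folder
    root.find("filename").text = filename
    root.find("path").text = "%s/%s/%s" % (base, folder, filename)
    return root

root = fix_annotation(VOC, "images", "001.jpg", "train")
print(root.find("path").text)  # train/images/001.jpg
```

Looping this over every .xml in the annotations folder (and writing each tree back out) is essentially the correction script described in the lecture.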
397
+ 100
398
+ 00:06:14,360 --> 00:06:18,120
399
+ So actually the training classifier, the training module
400
+
401
+ 101
402
+ 00:06:18,140 --> 00:06:20,720
403
+ in YOLO actually looks at these file names here.
404
+
405
+ 102
406
+ 00:06:20,810 --> 00:06:23,360
407
+ If you have a mistake here it is not going to work.
408
+
409
+ 103
410
+ 00:06:23,490 --> 00:06:23,770
411
+ OK.
412
+
413
+ 104
414
+ 00:06:23,840 --> 00:06:25,760
415
+ So that's one thing to know.
416
+
417
+ 105
418
+ 00:06:25,760 --> 00:06:28,430
419
+ So let's go back to this here.
420
+
421
+ 106
422
+ 00:06:28,550 --> 00:06:33,300
423
+ So this was a directory I told you about just in case you were wondering.
424
+
425
+ 107
426
+ 00:06:33,320 --> 00:06:36,180
427
+ So yeah, as I just mentioned, this needs to be corrected.
428
+
429
+ 108
430
+ 00:06:37,540 --> 00:06:40,410
431
+ And now we need to go to the cfg directory.
432
+
433
+ 109
434
+ 00:06:40,660 --> 00:06:47,200
435
+ We need to go into the cfg directory, sorry, to find the YOLO cfg file, and copy this file and
436
+
437
+ 110
438
+ 00:06:47,200 --> 00:06:48,090
439
+ rename it.
440
+
441
+ 111
442
+ 00:06:48,130 --> 00:06:52,440
443
+ So since we're using one class, you can call it whatever you want, but we'll call it yolo
444
+
445
+ 112
446
+ 00:06:52,550 --> 00:06:54,340
447
+ underscore one.
448
+
449
+ 113
450
+ 00:06:54,400 --> 00:06:56,490
451
+ So let's see what that file looks like now.
452
+
453
+ 114
454
+ 00:07:01,080 --> 00:07:02,310
455
+ Let's go here.
456
+
457
+ 115
458
+ 00:07:02,350 --> 00:07:03,770
459
+ Go to the cfg directory.
460
+
461
+ 116
462
+ 00:07:03,780 --> 00:07:13,490
463
+ And look for the file I made previously. That would be... you know, where is it... let's search for it.
464
+
465
+ 117
466
+ 00:07:13,500 --> 00:07:14,120
467
+ There we go.
468
+
469
+ 118
470
+ 00:07:15,360 --> 00:07:19,230
471
+ So let's open this file and we have some information here.
472
+
473
+ 119
474
+ 00:07:19,470 --> 00:07:21,440
475
+ So now, it looks fine.
476
+
477
+ 120
478
+ 00:07:21,450 --> 00:07:26,760
479
+ However, since this is a file we copied from the original file,
480
+
481
+ 121
482
+ 00:07:26,860 --> 00:07:29,160
483
+ there are some changes we're going to have to make in this file.
484
+
485
+ 122
486
+ 00:07:29,250 --> 00:07:37,880
487
+ And I'll tell you in a second. So you see this red block I have highlighted here, that is at the top of
488
+
489
+ 123
490
+ 00:07:37,880 --> 00:07:39,190
491
+ the file here.
492
+
493
+ 124
494
+ 00:07:39,310 --> 00:07:40,810
495
+ This bit here.
496
+
497
+ 125
498
+ 00:07:40,890 --> 00:07:49,180
499
+ Batch 64, subdivisions, width and height. Let's go back to the presentation; we need to make sure these are these
500
+
501
+ 126
502
+ 00:07:49,180 --> 00:07:49,860
503
+ numbers here.
504
+
505
+ 127
506
+ 00:07:50,110 --> 00:07:54,400
507
+ Now, in the original file these are not like this; these are commented out, I believe.
508
+
509
+ 128
510
+ 00:07:54,400 --> 00:07:59,500
511
+ So we just need to make sure that this looks like this. And generally, lowering or reducing the
512
+
513
+ 129
514
+ 00:07:59,500 --> 00:08:02,310
515
+ height and width makes it run much faster too.
516
+
517
+ 130
518
+ 00:08:02,350 --> 00:08:03,540
519
+ So that's a good thing.
520
+
521
+ 131
522
+ 00:08:04,950 --> 00:08:10,540
523
+ So that's editing the config. The bottom part of the configuration file also needs to be edited.
524
+
525
+ 132
526
+ 00:08:10,560 --> 00:08:12,680
527
+ So there are two things we need to edit here.
528
+
529
+ 133
530
+ 00:08:13,050 --> 00:08:20,220
531
+ One, the last convolutional layer here needs a specific number of filters
532
+
533
+ 134
534
+ 00:08:20,240 --> 00:08:25,800
535
+ here, and that's basically from this formula here, and this formula depends on the number of classes you
536
+
537
+ 135
538
+ 00:08:25,800 --> 00:08:26,510
539
+ use.
540
+
541
+ 136
542
+ 00:08:26,730 --> 00:08:33,400
543
+ So since we use one class, it's just basically five times (one plus five), which is 30.
544
+
545
+ 137
546
+ 00:08:33,420 --> 00:08:39,430
547
+ However, if we use four classes, it will just be five times nine, which is 45.
548
+
549
+ 138
550
+ 00:08:39,900 --> 00:08:42,730
551
+ And the other thing we need to do is set the number of classes here.
552
+
553
+ 139
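The filters arithmetic above can be sketched like this (a minimal illustration; `yolo_filters` is a hypothetical helper name, not part of darkflow):

```python
# YOLOv2 cfg rule of thumb: filters = num_anchors * (num_classes + 5),
# where the 5 covers 4 box coordinates plus 1 objectness score.
def yolo_filters(num_classes, num_anchors=5):
    return num_anchors * (num_classes + 5)

print(yolo_filters(1))  # 1 class  -> 30
print(yolo_filters(4))  # 4 classes -> 45
```

So a one-class config gets filters set to 30, and a four-class config would get 45.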
554
+ 00:08:43,020 --> 00:08:51,080
555
+ So let's quickly take a look at that at the bottom, just to make sure it's what I said it is. Good.
556
+
557
+ 140
558
+ 00:08:51,150 --> 00:08:53,710
559
+ This is it, the last convolutional layer here.
560
+
561
+ 141
562
+ 00:08:53,970 --> 00:08:58,340
563
+ And the number of filters is set to 30, and classes is set to 1.
564
+
565
+ 142
566
+ 00:08:58,440 --> 00:09:00,510
567
+ So we're good to go so far.
568
+
569
+ 143
570
+ 00:09:01,050 --> 00:09:02,920
571
+ Let's go back to that presentation.
572
+
573
+ 144
574
+ 00:09:03,000 --> 00:09:07,910
575
+ And now we also need to create and edit our labels, our label text file.
576
+
577
+ 145
578
+ 00:09:08,160 --> 00:09:14,760
579
+ So this one is fairly easy. We just need to remember when we were labeling it in the LabelImg
580
+
581
+ 146
582
+ 00:09:14,760 --> 00:09:19,360
583
+ program that we had an object category named London Underground.
584
+
585
+ 147
586
+ 00:09:19,380 --> 00:09:21,130
587
+ We just need to list that out there.
588
+
589
+ 148
590
+ 00:09:21,210 --> 00:09:26,610
591
+ So if we had three labels like cat, dog, Donald Trump, we would need to put each one on a new line in this
592
+
593
+ 149
594
+ 00:09:26,610 --> 00:09:27,350
595
+ file.
596
+
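The labels file described above can be written like this (a minimal sketch; the label names are just the examples from the video):

```python
# Write one label per line to labels.txt, which is what darkflow expects.
labels = ["cat", "dog", "donald_trump"]
with open("labels.txt", "w") as f:
    f.write("\n".join(labels) + "\n")

print(open("labels.txt").read().splitlines())  # ['cat', 'dog', 'donald_trump']
```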
597
+ 150
598
+ 00:09:27,400 --> 00:09:30,470
599
+ So let's go to this directory in darkflow-master.
600
+
601
+ 151
602
+ 00:09:31,230 --> 00:09:32,800
603
+ And take a look at that.
604
+
605
+ 152
606
+ 00:09:35,150 --> 00:09:37,580
607
+ Just to make sure it's done correctly.
608
+
609
+ 153
610
+ 00:09:37,660 --> 00:09:38,180
611
+ There we go.
612
+
613
+ 154
614
+ 00:09:38,180 --> 00:09:42,310
615
+ So you see, this is it: London Underground is the label there.
616
+
617
+ 155
618
+ 00:09:49,310 --> 00:09:49,660
619
+ OK.
620
+
621
+ 156
622
+ 00:09:49,770 --> 00:09:50,810
623
+ So let's keep going.
624
+
625
+ 157
626
+ 00:09:52,680 --> 00:10:00,650
627
+ So now we have to do training. To do training, we have to go into the terminal, and in
628
+
629
+ 158
630
+ 00:10:01,190 --> 00:10:04,980
631
+ the darkflow-master directory, execute the following line.
632
+
633
+ 159
634
+ 00:10:05,040 --> 00:10:06,530
635
+ That's this line here.
636
+
637
+ 160
638
+ 00:10:06,820 --> 00:10:07,470
639
+ All right.
640
+
641
+ 161
642
+ 00:10:08,350 --> 00:10:12,960
643
+ So essentially, this is the line here we use if we're using a CPU.
644
+
645
+ 162
646
+ 00:10:13,090 --> 00:10:15,090
647
+ So take note of this.
648
+
649
+ 163
650
+ 00:10:15,100 --> 00:10:16,620
651
+ This has to be correct.
652
+
653
+ 164
654
+ 00:10:16,720 --> 00:10:19,900
655
+ So we have to have all of the directories identified correctly.
656
+
657
+ 165
658
+ 00:10:19,900 --> 00:10:24,110
659
+ So it's the cfg directory and the yolo_one configuration file.
660
+
661
+ 166
662
+ 00:10:24,280 --> 00:10:30,920
663
+ So yolo underscore one, and we load the weights we had previously.
664
+
665
+ 167
666
+ 00:10:30,970 --> 00:10:33,910
667
+ So some people put the weights in a bin directory.
668
+
669
+ 168
670
+ 00:10:34,080 --> 00:10:42,270
671
+ You just make sure it's lined up correctly, and then we have the annotations folder and the training images will
672
+
673
+ 169
674
+ 00:10:42,540 --> 00:10:47,540
675
+ be specified here. If you're using a GPU, you can use this as well.
676
+
677
+ 170
678
+ 00:10:47,770 --> 00:10:52,320
679
+ It tells you what percent of the GPU memory you want to use.
680
+
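The training setup described above can be sketched as a darkflow options dictionary (the paths and file names are assumptions matching this video's setup; actually invoking TFNet requires darkflow to be installed):

```python
# Sketch of darkflow training options; keys mirror the CLI flags discussed.
options = {
    "model": "cfg/yolo_one.cfg",    # our one-class configuration file
    "load": "bin/yolov2.weights",   # pretrained weights (some put them in bin/)
    "train": True,
    "annotation": "./annotations",  # Pascal VOC XMLs exported from LabelImg
    "dataset": "./images",          # the training images
    "epoch": 25,
    "gpu": 0.8,                     # fraction of GPU memory to use (omit for CPU)
}
# from darkflow.net.build import TFNet   # requires darkflow installed
# tfnet = TFNet(options); tfnet.train()
```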
681
+ 171
682
+ 00:10:52,320 --> 00:11:00,300
683
+ So now, after some time, it takes about an hour or two if you want to train 500 epochs, but that's
684
+
685
+ 172
686
+ 00:11:00,300 --> 00:11:05,370
687
+ just with five images. You can imagine how long it'll take to train if you have hundreds of thousands
688
+
689
+ 173
690
+ 00:11:05,370 --> 00:11:06,370
691
+ of images.
692
+
693
+ 174
694
+ 00:11:06,600 --> 00:11:12,420
695
+ So after we do that, training is complete, and then our model is saved; it's actually this checkpoint
696
+
697
+ 175
698
+ 00:11:12,720 --> 00:11:13,530
699
+ file here.
700
+
701
+ 176
702
+ 00:11:13,860 --> 00:11:18,450
703
+ Now when we load it back we just have to specify the checkpoint and it knows basically from the model
704
+
705
+ 177
706
+ 00:11:18,450 --> 00:11:20,790
707
+ name what checkpoint to look for.
708
+
709
+ 178
710
+ 00:11:21,360 --> 00:11:23,600
711
+ So this is how we load it in Python.
712
+
713
+ 179
714
+ 00:11:23,910 --> 00:11:28,610
715
+ Basically the same thing we did before, except we're now specifying this model here.
716
+
717
+ 180
718
+ 00:11:28,890 --> 00:11:31,680
719
+ And load a checkpoint, that's all.
720
+
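Loading the model back from a checkpoint can be sketched like this (a sketch assuming darkflow is installed; in darkflow, load set to -1 picks the most recent checkpoint in ckpt/):

```python
# Sketch of darkflow inference options: same model, but load from a checkpoint.
options = {
    "model": "cfg/yolo_one.cfg",
    "load": -1,          # -1 = latest checkpoint; or a step number, e.g. 400
    "threshold": 0.3,    # confidence threshold for reported boxes
}
# from darkflow.net.build import TFNet   # requires darkflow installed
# tfnet = TFNet(options)
# result = tfnet.return_predict(image)   # image: a BGR numpy array
```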
721
+ 181
722
+ 00:11:32,370 --> 00:11:33,650
723
+ And this is it.
724
+
725
+ 182
726
+ 00:11:33,690 --> 00:11:34,570
727
+ This is what we just built.
728
+
729
+ 183
730
+ 00:11:34,570 --> 00:11:36,770
731
+ This is where we spend all the time doing.
732
+
733
+ 184
734
+ 00:11:36,870 --> 00:11:42,170
735
+ We built a London Underground classifier. Some of these images are from the training dataset, so that's cheating here.
736
+
737
+ 185
738
+ 00:11:42,420 --> 00:11:47,860
739
+ But a few of them are actually out of the set, and it actually picked them out quite well.
740
+
741
+ 186
742
+ 00:11:47,880 --> 00:11:53,160
743
+ I was actually very impressed with how well this worked, given that we only trained on about five images,
744
+
745
+ 187
746
+ 00:11:53,730 --> 00:11:58,600
747
+ so let's actually go to the virtual machine and execute that training.
748
+
749
+ 188
750
+ 00:11:58,890 --> 00:12:01,000
751
+ So let me just get the line here.
752
+
753
+ 189
754
+ 00:12:07,230 --> 00:12:13,620
755
+ That was not the line; that's just a line there. Read these comments, they are relevant.
756
+
757
+ 190
758
+ 00:12:13,620 --> 00:12:16,930
759
+ So this is the line we want to use.
760
+
761
+ 191
762
+ 00:12:17,140 --> 00:12:20,270
763
+ So we go to this directory here in the terminal.
764
+
765
+ 192
766
+ 00:12:20,590 --> 00:12:23,800
767
+ So let's open up with your virtual machine.
768
+
769
+ 193
770
+ 00:12:23,800 --> 00:12:26,230
771
+ So it's good to see the back.
772
+
773
+ 194
774
+ 00:12:26,280 --> 00:12:28,740
775
+ So it's really dark.
776
+
777
+ 195
778
+ 00:12:29,080 --> 00:12:31,560
779
+ It's not always mix them up.
780
+
781
+ 196
782
+ 00:12:31,570 --> 00:12:38,160
783
+ ls... there's darkflow again, and then we just run this line.
784
+
785
+ 197
786
+ 00:12:38,170 --> 00:12:39,350
787
+ Something went wrong.
788
+
789
+ 198
790
+ 00:12:39,640 --> 00:12:42,330
791
+ That is because we're not in the correct environment.
792
+
793
+ 199
794
+ 00:12:42,440 --> 00:12:48,300
795
+ What I do is activate my environment, so I activate the TensorFlow CPU one.
796
+
797
+ 200
798
+ 00:12:49,000 --> 00:12:54,220
799
+ And I don't want to pause this, but I want to paste it and have it run at the same time.
800
+
801
+ 201
802
+ 00:12:54,280 --> 00:13:01,530
803
+ So I'm just going to make sure we do have that directory correct, and so let's copy it here.
804
+
805
+ 202
806
+ 00:13:02,660 --> 00:13:09,510
807
+ Go back to it here, and let's specify, let's say, 25 epochs, so you can actually watch it run.
808
+
809
+ 203
810
+ 00:13:10,040 --> 00:13:13,290
811
+ And this should work fine.
812
+
813
+ 204
814
+ 00:13:13,340 --> 00:13:14,150
815
+ Let's see it go.
816
+
817
+ 205
818
+ 00:13:18,590 --> 00:13:23,710
819
+ I'm actually going to leave those five images for you, so you can actually start training this classifier,
820
+
821
+ 206
822
+ 00:13:23,800 --> 00:13:25,630
823
+ this object, on your own.
824
+
825
+ 207
826
+ 00:13:26,640 --> 00:13:27,860
827
+ So you can have fun doing that.
828
+
829
+ 208
830
+ 00:13:30,370 --> 00:13:33,440
831
+ So it takes a while to get started and you do see some warnings as well.
832
+
833
+ 209
834
+ 00:13:33,810 --> 00:13:39,060
835
+ You see some statistics here saying that there are seven objects detected in the images, but we just had five.
836
+
837
+ 210
838
+ 00:13:39,240 --> 00:13:44,360
839
+ That's because one or two of those images had more than one London Underground sign.
840
+
841
+ 211
842
+ 00:13:44,670 --> 00:13:46,640
843
+ And you can see it's going epoch by epoch.
844
+
845
+ 212
846
+ 00:13:46,860 --> 00:13:47,990
847
+ It's actually going quite quickly.
848
+
849
+ 213
850
+ 00:13:48,010 --> 00:13:52,110
851
+ So training twenty-five epochs isn't going to take that long.
852
+
853
+ 214
854
+ 00:13:52,380 --> 00:13:53,520
855
+ Maybe just over an hour.
856
+
857
+ 215
858
+ 00:13:53,550 --> 00:13:56,370
859
+ Actually And there we go.
860
+
861
+ 216
862
+ 00:13:56,370 --> 00:13:58,600
863
+ It's going to get it's going to be finished soon.
864
+
865
+ 217
866
+ 00:14:00,810 --> 00:14:02,330
867
+ In fact that's probably finished now.
868
+
869
+ 218
870
+ 00:14:03,780 --> 00:14:06,010
871
+ Ah wait, how many epochs did I specify?
872
+
873
+ 219
874
+ 00:14:06,160 --> 00:14:07,150
875
+ Twenty-five epochs.
876
+
877
+ 220
878
+ 00:14:07,150 --> 00:14:11,380
879
+ Oh damn, that was fast. I got so used to training 500 epochs.
880
+
881
+ 221
882
+ 00:14:12,940 --> 00:14:14,520
883
+ Anyway while that's running.
884
+
885
+ 222
886
+ 00:14:14,740 --> 00:14:18,030
887
+ Let's actually use the model we created before.
888
+
889
+ 223
890
+ 00:14:18,250 --> 00:14:18,930
891
+ OK.
892
+
893
+ 224
894
+ 00:14:19,270 --> 00:14:22,420
895
+ So that would actually be in this file here.
896
+
897
+ 225
898
+ 00:14:22,420 --> 00:14:25,920
899
+ No, that's the TensorFlow one... which one was it?
900
+
901
+ 226
902
+ 00:14:26,020 --> 00:14:27,310
903
+ Yes this one.
904
+
905
+ 227
906
+ 00:14:27,340 --> 00:14:29,540
907
+ So that's the test YOLO object detector.
908
+
909
+ 228
910
+ 00:14:29,950 --> 00:14:32,480
911
+ So this is the object we're going to load here.
912
+
913
+ 229
914
+ 00:14:32,650 --> 00:14:33,270
915
+ OK.
916
+
917
+ 230
918
+ 00:14:33,670 --> 00:14:36,800
919
+ So I trained the model up to 400 epochs.
920
+
921
+ 231
922
+ 00:14:36,810 --> 00:14:43,030
923
+ I actually did it once at 500, but then I added in some more images and retrained at 400, just to test it.
924
+
925
+ 232
926
+ 00:14:43,210 --> 00:14:49,030
927
+ This is the model we're going to use, so let's load it. This shouldn't take that long to load.
928
+
929
+ 233
930
+ 00:14:52,490 --> 00:14:55,910
931
+ Almost done thanks for telling me it's funny.
932
+
933
+ 234
934
+ 00:14:55,920 --> 00:14:56,750
935
+ Tell you
936
+
937
+ 235
938
+ 00:15:02,030 --> 00:15:02,480
939
+ OK.
940
+
941
+ 236
942
+ 00:15:02,820 --> 00:15:03,100
943
+ Good.
944
+
945
+ 237
946
+ 00:15:03,150 --> 00:15:04,540
947
+ We've loaded our model.
948
+
949
+ 238
950
+ 00:15:04,760 --> 00:15:09,800
951
+ Now it's actually cycled through some test images here from our test dataset.
952
+
953
+ 239
954
+ 00:15:11,520 --> 00:15:13,800
955
+ Damn it's pretty good.
956
+
957
+ 240
958
+ 00:15:13,860 --> 00:15:14,580
959
+ Good.
960
+
961
+ 241
962
+ 00:15:14,580 --> 00:15:15,690
963
+ This one got it right.
964
+
965
+ 242
966
+ 00:15:15,690 --> 00:15:20,520
967
+ But then it flagged this as a London Underground tube sign, and it clearly is not.
968
+
969
+ 243
970
+ 00:15:20,540 --> 00:15:21,870
971
+ So let's see what else.
972
+
973
+ 244
974
+ 00:15:22,240 --> 00:15:28,200
975
+ Again looking at maybe some curve here perhaps got this one right.
976
+
977
+ 245
978
+ 00:15:28,520 --> 00:15:29,620
979
+ Got this one right.
980
+
981
+ 246
982
+ 00:15:29,630 --> 00:15:31,060
983
+ This one right.
984
+
985
+ 247
986
+ 00:15:31,100 --> 00:15:35,090
987
+ This one got two boxes, so we should have used non-maximum suppression.
988
+
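The non-maximum suppression mentioned here can be sketched in a few lines (a minimal greedy NMS for the duplicate-box case; function names are illustrative, not darkflow's API):

```python
# Boxes are (x1, y1, x2, y2) corner tuples.
def iou(a, b):
    # Intersection-over-union of two boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    # Keep boxes highest-score first; drop any box overlapping a kept one.
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thresh for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping detections: only the higher-scoring one survives.
print(nms([(10, 10, 50, 50), (12, 12, 52, 52)], [0.9, 0.6]))  # [0]
```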
989
+ 248
990
+ 00:15:35,110 --> 00:15:37,450
991
+ Got that one; still didn't get any here.
992
+
993
+ 249
994
+ 00:15:37,490 --> 00:15:44,230
995
+ Oddly got this one surprisingly And this one and this one too.
996
+
997
+ 250
998
+ 00:15:44,280 --> 00:15:50,700
999
+ So as you can see our model actually performs fairly well given that we only used 5 images.
1000
+
1001
+ 251
1002
+ 00:15:50,790 --> 00:15:56,730
1003
+ So I encourage you to experiment with this, make your own object detector, and maybe
1004
+
1005
+ 252
1006
+ 00:15:56,730 --> 00:15:58,270
1007
+ show it to the rest of the students in the class.
1008
+
1009
+ 253
1010
+ 00:15:58,290 --> 00:16:04,710
1011
+ Upload your model weights to the Udemy forum.