blesot committed on
Commit
d788cc0
·
1 Parent(s): 8c6db8f

Update README.md

Files changed (1)
  1. README.md +20 -17
README.md CHANGED
@@ -1,46 +1,49 @@
- Hugging Face's logo
- ---
  tags:
  - object-detection
  - vision
- library_name: faster_rcnn
  datasets:
  - coco

  ---

- # Faster R-CNN

  ## Model description

- This model is an enhanced version of the [Fast R-CNN model](https://arxiv.org/pdf/1504.08083.pdf). Fast R-CNN's main computational bottleneck is its proposal stage, which relies on a selective search algorithm. Faster R-CNN introduces the Region Proposal Network (RPN), which reuses the shared CNN feature map to generate proposals instead of running selective search. The RPN is trained end-to-end to generate high-quality region proposals, which Fast R-CNN uses for detection. The model merges the RPN and Fast R-CNN into a single network by sharing their convolutional features: with an 'attention'-like mechanism, the RPN component tells the unified network where to look. State-of-the-art object detection networks depend on region proposal algorithms such as this to hypothesize object locations.
 
  *This model is based on the pretrained model from [OpenMMlab](https://github.com/open-mmlab/mmdetection)*

- ![Faster R-CNN](https://user-images.githubusercontent.com/40661020/143881188-ab87720f-5059-4b4e-a928-b540fb8fb84d.png)

- ### More information on the model, dataset, training and results:

  #### The model
- By implementing a CNN-based region proposal network, Faster R-CNN addresses the bottleneck that Fast R-CNN faced during the proposal stage. It also uses anchor boxes of various sizes, which speeds up object detection. Convolution layers take the input image and generate a feature map, and region proposals are obtained by adding a convolutional layer on top of this feature map.

- To output box and class information, this convolutional layer slides a 3x3 window across the feature map to create box proposals. At each position, k boxes are generated as coordinate offsets relative to the pre-defined anchor boxes, together with an objectness score: the probability that the box contains an object.
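The sliding-window anchor scheme described above can be sketched in plain Python. This is an illustrative toy, not the OpenMMLab implementation; the stride, scales, and ratios are assumed defaults from the paper.

```python
# Toy sketch of RPN-style anchor generation: at every feature-map
# position, k = len(scales) * len(ratios) anchors are laid out from
# pre-defined scales and aspect ratios.
# (Illustrative only -- not the OpenMMLab implementation.)

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512),
                     ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centred on each feature-map cell."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Centre of this cell in image coordinates.
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # Width/height chosen so the area stays scale**2.
                    w = scale * ratio ** 0.5
                    h = scale / ratio ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

anchors = generate_anchors(2, 3)
print(len(anchors))  # 2 * 3 positions x 9 anchors = 54
```

In the real network, a small convolutional head then predicts four box offsets and an objectness score for each of these k anchors.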
 
  #### Datasets
  [COCO Datasets](https://cocodataset.org/#home)

- #### Training
  Please [read the paper](https://arxiv.org/pdf/1703.06870.pdf) for more information on training, or check the OpenMMLab [repository](https://github.com/open-mmlab/mmdetection/tree/master/configs/mask_rcnn)

- The model is trained in four stages:

- In the first stage, the RPN is trained on the COCO object detection dataset to produce region proposals. The trained RPN is then used to train Fast R-CNN. Next, the detector network is used to initialize RPN training with the shared convolutional layers fixed, and only the layers unique to the RPN are fine-tuned. Finally, the layers unique to Fast R-CNN are fine-tuned, forming a unified network.

  #### Results Summary
- - The RPN model achieves better results than the one that uses selective search.
- - Pascal VOC 2007 & 2012 are used as the test sets.
- - The selective search model takes more time (ms) than the RPN model.
-

  ## Intended uses & limitations
- Because it learns from the training dataset more efficiently than ordinary CNN algorithms, Faster R-CNN is well suited to identifying and classifying moving objects. A disadvantage of Faster R-CNN is that the RPN is trained with all 256 anchors of a mini-batch taken from a single image; because samples from one image may be correlated, the network may take a long time to converge.

  tags:
  - object-detection
  - vision
+ library_name: mask_rcnn
  datasets:
  - coco

  ---

+ # Mask R-CNN

  ## Model description

+ Mask R-CNN extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding-box recognition. The model localizes objects at the pixel level rather than with bounding boxes alone, since Faster R-CNN was not designed for pixel-to-pixel alignment between network inputs and outputs.
 
  *This model is based on the pretrained model from [OpenMMlab](https://github.com/open-mmlab/mmdetection)*

+ ![MMDetection](https://user-images.githubusercontent.com/12907710/137271636-56ba1cd2-b110-4812-8221-b4c120320aa9.png)

+ ### More information on the model and dataset:

  #### The model
+ Mask R-CNN tackles instance segmentation, which combines object detection and semantic segmentation. For object detection, Mask R-CNN uses an architecture similar to Faster R-CNN, while for semantic segmentation it uses a Fully Convolutional Network (FCN).
+ The FCN is added on top of the Faster R-CNN features to generate a mask segmentation output, in parallel with the classification and bounding-box regression heads of the Faster R-CNN model. Building on Fast R-CNN's Region of Interest (RoI) Pooling, Mask R-CNN adds a refinement called RoIAlign, which addresses the quantization loss and misalignment of RoI Pooling; RoIAlign leads to improved results.

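The difference between RoI Pooling's coordinate quantization and RoIAlign's bilinear sampling can be shown with a minimal sketch on a tiny feature map. This is an illustration of the sampling idea only, not the OpenMMLab implementation.

```python
# Minimal contrast between RoI Pooling's coordinate quantization and
# RoIAlign's exact bilinear sampling, on a tiny 2x2 feature map.
# (Illustrative sketch only -- not the OpenMMLab implementation.)

def bilinear_sample(feat, y, x):
    """Sample feature map `feat` (list of rows) at a continuous (y, x)."""
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, len(feat) - 1), min(x0 + 1, len(feat[0]) - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0][x0] * (1 - dy) * (1 - dx)
            + feat[y0][x1] * (1 - dy) * dx
            + feat[y1][x0] * dy * (1 - dx)
            + feat[y1][x1] * dy * dx)

feat = [[0.0, 1.0],
        [2.0, 3.0]]

# RoI Pooling snaps the continuous sampling point (0.6, 0.6) to the
# nearest cell, losing sub-pixel alignment ...
quantized = feat[round(0.6)][round(0.6)]   # feat[1][1] = 3.0
# ... while RoIAlign interpolates at the exact continuous coordinate.
aligned = bilinear_sample(feat, 0.6, 0.6)  # 1.8
```

RoIAlign averages several such bilinearly sampled points per output bin, which is why masks (where pixel alignment matters) benefit so much from it.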
  #### Datasets
  [COCO Datasets](https://cocodataset.org/#home)

+ ## Training Procedure
  Please [read the paper](https://arxiv.org/pdf/1703.06870.pdf) for more information on training, or check the OpenMMLab [repository](https://github.com/open-mmlab/mmdetection/tree/master/configs/mask_rcnn)

+ The model architecture is divided into two parts:
+ - A region proposal network (RPN) that proposes candidate object bounding boxes.
+ - A binary mask classifier that generates a mask for every class.

+ #### Technical Summary
+ - Mask R-CNN is structurally quite similar to Faster R-CNN.
+ - It outputs a binary mask for each Region of Interest.
+ - It applies bounding-box classification and regression in parallel, simplifying the original R-CNN's multi-stage pipeline.
+ - The backbone architectures used are ResNet and ResNeXt, with a depth of either 50 or 101.

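The decoupling noted above (one binary mask per class, with the label chosen by the classification head rather than by the masks competing across classes) can be sketched as follows. The class names, scores, and mask values are made-up illustration data.

```python
# Sketch of Mask R-CNN's decoupled mask prediction (illustrative only):
# the mask head outputs one per-pixel-sigmoid binary mask per class for
# each RoI, and the mask of the class predicted by the classification
# head is selected -- masks do not compete across classes.

def select_mask(per_class_masks, class_scores, threshold=0.5):
    """per_class_masks: {class_name: 2-D list of mask probabilities}."""
    # Classification decides the label ...
    predicted = max(class_scores, key=class_scores.get)
    # ... and only that class's mask is thresholded into a binary mask.
    probs = per_class_masks[predicted]
    binary = [[1 if p >= threshold else 0 for p in row] for row in probs]
    return predicted, binary

# Made-up per-class mask probabilities and classification scores.
masks = {
    "cat": [[0.9, 0.2], [0.8, 0.1]],
    "dog": [[0.1, 0.7], [0.2, 0.6]],
}
scores = {"cat": 0.85, "dog": 0.10}
label, mask = select_mask(masks, scores)
print(label, mask)  # cat [[1, 0], [1, 0]]
```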
  #### Results Summary
+ - Instance Segmentation: On the COCO dataset, Mask R-CNN outperforms the previous state-of-the-art models MNC and FCIS in all categories.
+ - Bounding Box Detection: Mask R-CNN outperforms the base variants of all previous state-of-the-art models, including the winner of the COCO 2016 Detection Challenge.

  ## Intended uses & limitations
+ Image segmentation aids both the identification of relationships between objects and the understanding of object context in a picture. Applications include face recognition, number-plate recognition, and satellite image analysis. Thanks to its generality, Mask R-CNN can be extended to human pose estimation, and it can be used to estimate approaching live traffic on site to aid autonomous driving.