We can think of Spatial Pyramid Matching as an extension of Bag Of Visual Words.
Here, instead of only taking the Histogram of features of the entire image at once, we take the histogram at different levels.
That is, we divide the image into 2x2, 4x4, 8x8 grids and so on and calculate histogram for each block separately and then concatenate the vectors.
Advantage is that image can be of any size or aspect ratio. You will get the same fixed length output.
Also multi-level pooling is shown to be robust to object deformations.
Selective Search used this technique to do object detection.
------------------------
This is a part of the course 'Evolution of Object Detection Networks'.
See full playlist here: [ Ссылка ]
------------------------
References:Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories - Svetlana Lazebnik, Cordelia Schmid, Jean Ponce
Credits:
[ Ссылка ]
Copyright Disclaimer: Under section 107 of the Copyright Act 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.
![](https://i.ytimg.com/vi/6MwuK2wHlOg/maxresdefault.jpg)