STN

Paper: Spatial Transformer Networks, https://arxiv.org/abs/1506.02025 (2015).

STN helps crop out and scale-normalize the relevant region, which simplifies the subsequent classification task and can lead to better classification performance.

Figure: (a) input image with random translation, scale, rotation, and clutter; (b) STN applied to the input image; (c) output of the STN; (d) classification prediction.

1. Quick Review of Spatial Transformation Matrices

The paper mainly considers three transformations learned by the STN: affine, projective, and thin plate spline (TPS). More sophisticated transformations can also be used.

1.1 Affine Transformation
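An affine transformation has six parameters and maps a point $(x, y)$ to $\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a & b & t_x \\ c & d & t_y \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$, which covers translation, rotation, scaling, and shear. The minimal NumPy sketch below builds such a 2×3 matrix from a rotation, a uniform scale, and a translation, then applies it to points; `affine_matrix` and `apply_affine` are hypothetical helper names, not code from the paper.

```python
import numpy as np

def affine_matrix(angle_rad: float, scale: float, tx: float, ty: float) -> np.ndarray:
    """Build a 2x3 affine matrix: rotate, scale uniformly, then translate (hypothetical helper)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[scale * c, -scale * s, tx],
                     [scale * s,  scale * c, ty]])

def apply_affine(theta: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.hstack([points, ones])  # (N, 3): append 1 so the last column acts as translation
    return homogeneous @ theta.T             # (N, 2)

# Rotate 90 degrees counter-clockwise, keep the scale, shift by (1, 0).
theta = affine_matrix(np.pi / 2, 1.0, 1.0, 0.0)
print(np.round(apply_affine(theta, np.array([[1.0, 0.0], [0.0, 1.0]])), 6))
```

With this matrix, $(1, 0)$ is rotated to $(0, 1)$ and shifted to $(1, 1)$, while $(0, 1)$ is rotated to $(-1, 0)$ and shifted to $(0, 0)$.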

1.2 Projective Transformation
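A projective transformation (homography) uses a 3×3 matrix $H$ that is defined only up to scale, so it has eight free parameters. Points are mapped in homogeneous coordinates and then divided by the third component: $x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}$, $y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}$. An affine transformation is the special case whose last row is $(0, 0, 1)$. The NumPy sketch below shows this perspective division; `apply_projective` is a hypothetical helper name.

```python
import numpy as np

def apply_projective(h: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 3x3 projective matrix (homography) to an (N, 2) array of (x, y) points."""
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.hstack([points, ones]) @ h.T  # (N, 3) in homogeneous coordinates
    w = homogeneous[:, 2:3]                        # per-point scale factor
    return homogeneous[:, :2] / w                  # perspective division back to 2D

# H is defined up to scale, so h[2, 2] is fixed to 1 and 8 entries remain free.
h = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.5, 0.0, 1.0]])
print(apply_projective(h, np.array([[0.0, 0.0], [2.0, 1.0]])))
```

Here the point $(2, 1)$ gets $w = 0.5 \cdot 2 + 1 = 2$ and lands at $(1, 0.5)$, so points further along $x$ are shrunk more, which an affine matrix cannot do.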

1.3 Thin Plate Spline (TPS) Transformation

To be explored

2. Spatial Transformer Network (STN)

STN = Localisation Net + Grid Generator + Sampler

2.1 Localisation Net

Input: feature map $U \in \mathbb{R}^{H \times W \times C}$ (height $H$, width $W$, $C$ channels)

Output: $\theta$, the parameters of the transformation $\mathcal{T}_\theta$ (six values for an affine transformation)
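The localisation net can take any form (for example fully connected or convolutional) as long as it ends in a regression layer that outputs $\theta$. The toy NumPy sketch below assumes the affine case, so $\theta$ has six values reshaped into a 2×3 matrix, and reduces the network to a single linear layer on the flattened feature map; the class name is hypothetical. Its weights start at zero and its bias at the identity transform, a common initialisation so that training starts from "no transformation".

```python
import numpy as np

class LinearLocalisationNet:
    """Toy localisation net (hypothetical): flatten U, then one linear regression layer to theta."""

    def __init__(self, in_features: int):
        # Zero weights + identity bias: before training, the net outputs the identity transform.
        self.weight = np.zeros((6, in_features))
        self.bias = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])

    def __call__(self, u: np.ndarray) -> np.ndarray:
        """Map a feature map u of shape (H, W, C) to a 2x3 affine theta."""
        theta = self.weight @ u.reshape(-1) + self.bias
        return theta.reshape(2, 3)

u = np.random.default_rng(0).standard_normal((8, 8, 3))
net = LinearLocalisationNet(in_features=u.size)
print(net(u))
```

In a real STN the weights would be trained end-to-end by backpropagating through the sampler; here they stay at zero, so every input yields the identity $\theta$.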

2.2 Grid Generator
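The grid generator computes, for every pixel $G_i = (x_i^t, y_i^t)$ of a regular output (target) grid, the location in the input to sample from. In the affine case, $\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix} = \mathcal{T}_\theta(G_i) = A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}$, with coordinates normalised to $[-1, 1]$. The NumPy sketch below implements this; the helper name and the convention that $-1$ and $+1$ fall on the corner pixel centres are assumptions.

```python
import numpy as np

def affine_grid(theta: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Return source coordinates (out_h, out_w, 2) as (x_s, y_s) for each target pixel, in [-1, 1]."""
    # Regular target grid G in normalised coordinates (assumed: -1 and +1 are corner pixel centres).
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, out_h),
                         np.linspace(-1.0, 1.0, out_w), indexing="ij")
    target = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (out_h, out_w, 3): (x_t, y_t, 1)
    return target @ theta.T                                  # (out_h, out_w, 2): (x_s, y_s)

# Scale 0.5: every target point samples at half its coordinate, i.e. a 2x zoom on the centre.
theta = np.array([[0.5, 0.0, 0.0],
                  [0.0, 0.5, 0.0]])
grid = affine_grid(theta, 3, 3)
print(grid[0, 0], grid[2, 2])
```

Because the grid is defined on the output, a scale of 0.5 makes the whole output cover only the central part of the input, which is how the STN crops and scale-normalises a region.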

2.3 Sampler
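The sampler reads the input feature map $U$ at each source location produced by the grid generator and writes the result to the output $V$. Source locations are generally fractional, so values have to be interpolated. The NumPy sketch below uses bilinear interpolation with zero padding outside the input and the same $[-1, 1]$ coordinate convention assumed for the grid generator; `bilinear_sample` is a hypothetical name, not the paper's code.

```python
import numpy as np

def bilinear_sample(u: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Sample u (H, W, C) at normalised source coords grid (H', W', 2); zeros outside u."""
    h, w, _ = u.shape
    # Convert [-1, 1] coordinates to fractional pixel indices.
    x = (grid[..., 0] + 1.0) * (w - 1) / 2.0
    y = (grid[..., 1] + 1.0) * (h - 1) / 2.0
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    out = np.zeros(grid.shape[:2] + (u.shape[2],))
    # Sum over the 4 integer neighbours (m, n) with weights max(0, 1-|x-m|) * max(0, 1-|y-n|).
    for n in (y0, y0 + 1):
        for m in (x0, x0 + 1):
            weight = np.maximum(0.0, 1.0 - np.abs(x - m)) * np.maximum(0.0, 1.0 - np.abs(y - n))
            valid = (m >= 0) & (m < w) & (n >= 0) & (n < h)  # out-of-range pixels contribute 0
            mc, nc = np.clip(m, 0, w - 1), np.clip(n, 0, h - 1)
            out += (weight * valid)[..., None] * u[nc, mc]
    return out

u = np.arange(16, dtype=float).reshape(4, 4, 1)            # pixel value = 4 * row + col
grid = np.array([[[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]])  # (1, 3, 2): (x_s, y_s) pairs
print(bilinear_sample(u, grid)[..., 0])
```

The corners map exactly onto pixels, while the centre $(0, 0)$ lands between the four pixels with values 5, 6, 9, and 10 and returns their average, 7.5.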

2.4 Sampling Kernel
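The paper writes the sampler for a generic sampling kernel $k$ with parameters $\Phi_x, \Phi_y$: each output value $V_i^c$ in channel $c$ is a weighted sum over all input pixels $U_{nm}^c$, with $(x_i^s, y_i^s)$ expressed in pixel units. The integer (nearest-neighbour) kernel and the bilinear kernel are two instances:

$$
\begin{aligned}
V_i^c &= \sum_{n}^{H} \sum_{m}^{W} U_{nm}^c \, k(x_i^s - m; \Phi_x) \, k(y_i^s - n; \Phi_y) && \text{(generic kernel)} \\
V_i^c &= \sum_{n}^{H} \sum_{m}^{W} U_{nm}^c \, \delta(\lfloor x_i^s + 0.5 \rfloor - m) \, \delta(\lfloor y_i^s + 0.5 \rfloor - n) && \text{(integer kernel)} \\
V_i^c &= \sum_{n}^{H} \sum_{m}^{W} U_{nm}^c \, \max(0, 1 - |x_i^s - m|) \, \max(0, 1 - |y_i^s - n|) && \text{(bilinear kernel)}
\end{aligned}
$$

The bilinear kernel is (sub-)differentiable with respect to both $U$ and $(x_i^s, y_i^s)$, so gradients flow back through the grid generator to $\theta$ and into the localisation net.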

DCN

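Regular convolution samples the input on a regular grid $\mathcal{R}$; for a 3×3 kernel, $\mathcal{R} = \{(-1,-1), (-1,0), \ldots, (0,1), (1,1)\}$.
Deformable convolution uses the same grid $\mathcal{R}$, but each sampling point $p_n$ is augmented with a learnable offset $\Delta p_n$.
The offsets are produced by an additional convolutional layer that outputs 2N feature maps, i.e. N 2D offsets $\Delta p_n$ (an x- and a y-component each), where $N = |\mathcal{R}|$ is the number of sampling points.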
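For an output location $p_0$, a regular convolution computes $y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n)$, while a deformable convolution computes $y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n + \Delta p_n)$. Because $\Delta p_n$ is typically fractional, $x$ is evaluated by bilinear interpolation, as in the STN sampler. The single-channel NumPy sketch below evaluates one output location with hand-chosen offsets; in DCN the offsets would come from the 2N-channel offset convolution instead, and all names here are hypothetical.

```python
import numpy as np

def bilinear(x: np.ndarray, py: float, px: float) -> float:
    """Bilinearly interpolate the 2D array x at fractional (py, px); zero outside x."""
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    value = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                value += max(0.0, 1 - abs(py - qy)) * max(0.0, 1 - abs(px - qx)) * x[qy, qx]
    return value

# Regular 3x3 grid R: positions p_n relative to the kernel centre, in (dy, dx) order.
R = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def deformable_conv_at(x: np.ndarray, weight: np.ndarray, p0: tuple, offsets: np.ndarray) -> float:
    """y(p0) = sum_n w(p_n) * x(p0 + p_n + delta_p_n) for one output location and one channel.

    offsets: (9, 2) array of (dy, dx) per sampling point (given by hand here, learned in DCN).
    """
    total = 0.0
    for n, (dy, dx) in enumerate(R):
        py = p0[0] + dy + offsets[n, 0]
        px = p0[1] + dx + offsets[n, 1]
        total += weight[dy + 1, dx + 1] * bilinear(x, py, px)
    return total

x = np.arange(25, dtype=float).reshape(5, 5)  # pixel value = 5 * row + col
w = np.ones((3, 3))                           # 3x3 box filter
zero = np.zeros((9, 2))                       # zero offsets: plain regular convolution
shift = np.tile([0.0, 1.0], (9, 1))           # every sampling point moved one pixel right
print(deformable_conv_at(x, w, (2, 2), zero), deformable_conv_at(x, w, (2, 2), shift))
```

With zero offsets the result is the regular box-filter sum around $(2, 2)$, namely $9 \cdot 12 = 108$; shifting every sampling point by $(0, 1)$ reads the neighbourhood one pixel to the right instead, giving $9 \cdot 13 = 117$.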

STN: Spatial Transformer Network (Image Classification) and Deformable Convolutional Networks