EfficientNet

Posted by Rain on April 15, 2020

Depth Scaling (d):

Problem: performance does not improve as expected when more layers are added, because very deep networks suffer from vanishing gradients.

Width Scaling (w):

With wide but shallow models, accuracy saturates quickly as the width grows.

Resolution (r):

  • Object detection commonly uses larger inputs such as 300x300, 512x512, or 600x600.
  • But the accuracy gain diminishes very quickly as the resolution increases.

Proposed Compound Scaling:

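  • $d = \alpha^{\phi}$,
  • $w = \beta^{\phi}$,
  • $r = \gamma^{\phi}$,
  • such that $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$, with $\alpha \ge 1$, $\beta \ge 1$, $\gamma \ge 1$. FLOPs scale roughly with $d \cdot w^2 \cdot r^2$, so the total cost grows by about $2^{\phi}$.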
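As a rough illustration of the scaling rule, the sketch below computes the three multipliers for a given $\phi$. The default coefficients (1.2, 1.1, 1.15) are the values reported in the EfficientNet paper for its B0 baseline, not something stated in this post, and the function names are made up for illustration. The cost estimate assumes FLOPs grow roughly with $d \cdot w^2 \cdot r^2$.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return (depth, width, resolution) multipliers for compound coefficient phi.

    The default alpha, beta, gamma are the values reported for EfficientNet-B0
    in the paper; they are an assumption here, not taken from this post.
    """
    d = alpha ** phi  # depth multiplier
    w = beta ** phi   # width multiplier
    r = gamma ** phi  # resolution multiplier
    return d, w, r


def flops_factor(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Approximate FLOPs growth, assuming cost is proportional to d * w^2 * r^2."""
    d, w, r = compound_scale(phi, alpha, beta, gamma)
    return d * w ** 2 * r ** 2


if __name__ == "__main__":
    for phi in range(4):
        d, w, r = compound_scale(phi)
        print(f"phi={phi}: d={d:.3f} w={w:.3f} r={r:.3f} "
              f"flops_x={flops_factor(phi):.2f}")
```

With these defaults, `flops_factor(1)` comes out to about 1.92, so each unit increase of $\phi$ roughly doubles the cost.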

Efficient Architecture:

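  • Start from a given baseline architecture.
  • Fix $\phi = 1$, assuming that twice as many resources are available, and do a small grid search for $\alpha$, $\beta$, $\gamma$.
  • Fix $\alpha$, $\beta$, $\gamma$ and scale up the baseline with different values of $\phi$.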
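The procedure can be sketched as a small grid search followed by scaling. This is only a sketch under stated assumptions: the search keeps candidates with $\alpha \cdot \beta^2 \cdot \gamma^2$ close to 2 (the constraint used in the EfficientNet paper), and `evaluate` is a hypothetical caller-supplied function that would train and score the baseline scaled by a candidate. The helper names, grid step, and tolerance are illustrative choices.

```python
import itertools


def candidate_coefficients(step=0.05, max_value=2.0, target=2.0, tol=0.1):
    """List (alpha, beta, gamma) >= 1 on a grid with alpha * beta^2 * gamma^2 near target."""
    n = int(round((max_value - 1.0) / step)) + 1
    grid = [round(1.0 + i * step, 10) for i in range(n)]
    return [
        (a, b, g)
        for a, b, g in itertools.product(grid, repeat=3)
        if abs(a * b ** 2 * g ** 2 - target) <= tol
    ]


def search(evaluate, **grid_kwargs):
    """Step with phi = 1: pick the candidate that scores highest.

    `evaluate(alpha, beta, gamma)` is hypothetical; in practice it would
    train the scaled baseline and return its validation accuracy.
    """
    return max(candidate_coefficients(**grid_kwargs), key=lambda c: evaluate(*c))


def scale_baseline(alpha, beta, gamma, phis=(1, 2, 3)):
    """Step with fixed coefficients: multipliers (d, w, r) for each phi."""
    return {phi: (alpha ** phi, beta ** phi, gamma ** phi) for phi in phis}
```

The expensive part in practice is `evaluate`, which is why the search is run once on the small baseline rather than repeated for every larger model.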