Can we reduce the amount of computation by applying 1x1 convolutions first, shrinking the channel count before the expensive 3x3/5x5 convolutions? → Inception
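A quick multiply count shows why the 1x1 "bottleneck" helps. This is a sketch under assumed sizes: a 28x28x192 input, 32 output channels from a 5x5 convolution, and a hypothetical 16-channel 1x1 reduction in between. Stride 1 and "same" padding are assumed, and biases are ignored.

```python
def conv_mults(h, w, c_in, c_out, k):
    """Multiplications for a stride-1, 'same'-padded k x k convolution (no bias)."""
    return h * w * c_out * k * k * c_in

# Illustrative (assumed) sizes, not taken from any specific network.
H, W, C_IN, C_OUT, C_MID = 28, 28, 192, 32, 16

direct = conv_mults(H, W, C_IN, C_OUT, 5)          # 5x5 applied to all 192 channels
bottleneck = (conv_mults(H, W, C_IN, C_MID, 1)     # 1x1 reduces 192 -> 16 channels
              + conv_mults(H, W, C_MID, C_OUT, 5)) # 5x5 on the reduced tensor

print(direct, bottleneck, round(direct / bottleneck, 1))  # 120422400 12443648 9.7
```

With these sizes the bottleneck version needs roughly a tenth of the multiplications. The exact saving depends on how aggressively the 1x1 layer reduces the channels.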
Xception
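Hypothesis: an Inception module tries to explicitly factor two tasks that a single convolution kernel performs jointly: mapping cross-channel correlations and mapping spatial correlations
The assumption behind Inception is that these two kinds of correlation are sufficiently decoupled that it is preferable not to map them jointly: 1x1 convolutions first map cross-channel correlations into a few smaller spaces, then 3x3/5x5 convolutions in parallel branches map the correlations within each of those spaces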
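To make "mapping cross-channel correlations" concrete, here is a minimal sketch of a 1x1 convolution in plain Python. It assumes a feature map stored as nested lists indexed `[channel][row][col]` and a weight matrix indexed `[out_channel][in_channel]` (both layouts are choices for this example). Each output pixel is a weighted sum over channels at the same position only, so no spatial neighbourhood is involved.

```python
def pointwise_conv(x, weights):
    """1x1 convolution (no bias).

    x:       feature map as [C_in][H][W] nested lists (assumed layout)
    weights: [C_out][C_in] mixing matrix
    Each output pixel mixes channels at the SAME (i, j) only.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(weights[o][c] * x[c][i][j] for c in range(c_in)) for j in range(w)]
         for i in range(h)]
        for o in range(len(weights))
    ]

# Two 2x2 input channels; one output channel = channel0 + 2 * channel1.
x = [[[1, 2], [3, 4]],
     [[10, 20], [30, 40]]]
y = pointwise_conv(x, [[1, 2]])
print(y)  # [[[21, 42], [63, 84]]]
```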
Would it be reasonable to make a much stronger hypothesis than that of Inception: that cross-channel and spatial correlations can be mapped completely separately?
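We can apply a 1x1 convolution across channels first, then a 3x3 convolution separately on each output channel (the extreme case of splitting the channels into independent segments: "extreme Inception")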
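The "extreme" version above can be sketched in plain Python as a 1x1 convolution followed by one independent 3x3 kernel per resulting channel. This is an illustration only. It assumes zero padding, stride 1, no biases, the `[channel][row][col]` list layout, and no nonlinearity between the two steps. The function names are hypothetical.

```python
def pointwise_conv(x, weights):
    """1x1 convolution: x is [C_in][H][W], weights is [C_out][C_in]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [[[sum(weights[o][c] * x[c][i][j] for c in range(c_in))
              for j in range(w)] for i in range(h)]
            for o in range(len(weights))]

def depthwise_conv3x3(x, kernels):
    """One 3x3 kernel per channel (zero padding, stride 1); channels never mix."""
    out = []
    for ch, k in zip(x, kernels):
        h, w = len(ch), len(ch[0])
        out.append([
            [sum(k[di + 1][dj + 1] * ch[i + di][j + dj]
                 for di in (-1, 0, 1) for dj in (-1, 0, 1)
                 if 0 <= i + di < h and 0 <= j + dj < w)
             for j in range(w)]
            for i in range(h)])
    return out

def extreme_inception(x, pw_weights, dw_kernels):
    # Step 1: map cross-channel correlations with a 1x1 convolution.
    # Step 2: map spatial correlations separately for each output channel.
    return depthwise_conv3x3(pointwise_conv(x, pw_weights), dw_kernels)

x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],   # channel 0
     [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]   # channel 1
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
# 1x1 step: out0 = ch0 + ch1, out1 = ch0 - ch1; then identity on out0, box sum on out1.
y = extreme_inception(x, [[1, 1], [1, -1]], [identity, box])
print(y[1][1][1], y[1][0][0])  # 36 8
```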
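That's almost a depthwise separable convolution, but in inverted order: a depthwise separable convolution applies the per-channel (depthwise) spatial convolution first and the 1x1 (pointwise) convolution second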
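Counting weights (biases ignored) shows how much cheaper both factorizations are than a regular convolution, and that the order only changes which channel count the k x k filters are applied to. The channel sizes below (64 in, 128 out, 3x3 kernels) are assumptions chosen for illustration.

```python
def regular_params(k, c_in, c_out):
    # One k x k x C_in kernel per output channel: spatial + cross-channel jointly.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise k x k per INPUT channel first, then 1x1 pointwise mixing.
    return k * k * c_in + c_in * c_out

def extreme_inception_params(k, c_in, c_out):
    # 1x1 pointwise mixing first, then k x k per OUTPUT channel.
    return c_in * c_out + k * k * c_out

for f in (regular_params, depthwise_separable_params, extreme_inception_params):
    print(f.__name__, f(3, 64, 128))
# regular_params 73728
# depthwise_separable_params 8768
# extreme_inception_params 9344
```

When the input and output channel counts are equal, the two factorized orders have exactly the same number of weights.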