Calculate the output size in convolution layer

Machine Learning, Deep Learning, Pytorch, Conv Neural-Network

Machine Learning Problem Overview


How do I calculate the output size in a convolution layer?

For example, I have a 2D convolution layer that takes a 3x128x128 input and has 40 filters of size 5x5.

Machine Learning Solutions


Solution 1 - Machine Learning

You can use the formula [(W−K+2P)/S]+1, where the division is rounded down when it is not exact.

  • W is the input size (width or height) - in your case 128
  • K is the kernel size - in your case 5
  • P is the padding - in your case 0, I believe
  • S is the stride - which you have not provided.

So, plugging the values into the formula:

Output_Size = (128-5+0)/1+1 = 124

Output_Shape = (124,124,40)

NOTE: Stride defaults to 1 if not provided, and the 40 in (124, 124, 40) is the number of filters provided by the user. In PyTorch's channels-first layout the same output is written as (40, 124, 124).
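The formula can be wrapped in a small helper to check the arithmetic. This is only a minimal sketch in plain Python: the function name `conv_output_size` and its argument names are made up for illustration, and it assumes the division is rounded down and that dilation is 1.

```python
def conv_output_size(w, k, p=0, s=1):
    """Output size along one spatial dimension: floor((W - K + 2P) / S) + 1."""
    if w + 2 * p < k:
        raise ValueError("kernel is larger than the padded input")
    return (w - k + 2 * p) // s + 1


# Example from the question: 3x128x128 input, 40 filters of size 5x5,
# assuming stride 1 and no padding.
side = conv_output_size(128, 5, p=0, s=1)
output_shape = (side, side, 40)
print(output_shape)  # (124, 124, 40)
```

The same helper works for one dimension at a time, so a non-square input would simply call it once for the height and once for the width.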

Solution 2 - Machine Learning

You can find it in two ways. The simple method is input_size - (filter_size - 1):

W - (K-1)
Here W = Input size
     K = Filter size

This shortcut only holds for a stride of 1 and no padding. The second method is the standard way to find the output size.

Second method: (((W - K + 2P)/S) + 1)
Here W = Input size
     K = Filter size
     S = Stride
     P = Padding
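To see when the two methods agree, here is a short sketch comparing them. The helper names `simple_size` and `general_size` are hypothetical, and the stride-2 and padding-2 cases are invented purely to show where the shortcut stops applying.

```python
def simple_size(w, k):
    # Shortcut: W - (K - 1); valid only for stride 1 and no padding
    return w - (k - 1)


def general_size(w, k, p, s):
    # Standard formula: floor((W - K + 2P) / S) + 1
    return (w - k + 2 * p) // s + 1


same = simple_size(128, 5) == general_size(128, 5, p=0, s=1)
with_stride = general_size(128, 5, p=0, s=2)   # 123 // 2 + 1
with_padding = general_size(128, 5, p=2, s=1)  # 127 + 1

print(simple_size(128, 5), same, with_stride, with_padding)  # 124 True 62 128
```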

Solution 3 - Machine Learning

Let me start simple: since you have square matrices for both the input and the filter, let me work with one dimension; you can then apply the same to the other dimension(s). Imagine you are building fences between trees: if there are N trees, you have to build N-1 fences. Now apply that analogy to convolution layers.

Your output size will be: input size - filter size + 1

This is because the filter can only take (input size - filter size) steps from its starting position, like the fences between trees; counting the starting position as well gives the + 1.

Let's calculate your output with that idea: 128 - 5 + 1 = 124. The same holds for the other dimension, so now you have a 124 x 124 image.

That is for one filter.

If you apply 40 filters, you will have another dimension: 124 x 124 x 40
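The counting argument can also be checked by brute force: slide the filter along one dimension and record every starting offset where it still fits. This is just an illustrative sketch (the function name `filter_positions` is made up), assuming a stride of 1 and no padding as in the reasoning above.

```python
def filter_positions(input_size, filter_size):
    """Return every starting offset where the filter fits inside the input."""
    positions = []
    start = 0
    while start + filter_size <= input_size:
        positions.append(start)
        start += 1  # stride of 1
    return positions


positions = filter_positions(128, 5)
print(len(positions))                # 124
print(positions[0], positions[-1])   # 0 123
```

The last valid offset is 123, which is 128 - 5, and counting offset 0 as well gives the 124 positions.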

Here is a great guide if you want to know more about advanced convolution arithmetic: https://arxiv.org/pdf/1603.07285.pdf

Solution 4 - Machine Learning

Formula: n[i] = (n[i-1] − f[i] + 2p[i]) / s[i] + 1

where

n[i-1] = 128 (input size)
f[i] = 5 (filter size)
p[i] = 0 (padding)
s[i] = 1 (stride)

so

n[i] = (128 - 5 + 0)/1 + 1 = 124

So the size of the output layer is 124x124x40, where 40 is the number of filters.
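Because the formula is indexed by layer, it can be applied repeatedly, feeding each n[i] into the next layer. The sketch below does that for a stack of three layers; only the first layer comes from the question, the other two are hypothetical values chosen for illustration, and the division is assumed to round down.

```python
def next_size(n_prev, f, p, s):
    # n[i] = floor((n[i-1] - f[i] + 2*p[i]) / s[i]) + 1
    return (n_prev - f + 2 * p) // s + 1


layers = [
    {"f": 5, "p": 0, "s": 1},  # the layer from the question
    {"f": 3, "p": 1, "s": 2},  # hypothetical second layer
    {"f": 3, "p": 0, "s": 1},  # hypothetical third layer
]

sizes = [128]  # n[0], the input size
for layer in layers:
    sizes.append(next_size(sizes[-1], layer["f"], layer["p"], layer["s"]))

print(sizes)  # [128, 124, 62, 60]
```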

Solution 5 - Machine Learning

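(124*124*3)*40 = 1845120, where width = 124, height = 124, depth = 3, no. of filters = 40, stride = 1, padding = 0. Note that each filter sums over the 3 input channels, so the output volume itself is 124 x 124 x 40.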
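The numbers in this answer can be checked with a few lines of Python. This sketch reads the product as (124*124*3)*40 and, for comparison, also counts the values in a 124 x 124 x 40 output volume; the variable names are chosen here for illustration.

```python
width, height = 124, 124
in_channels = 3   # depth of the input
filters = 40

# Product as written in the answer
product_from_answer = (width * height * in_channels) * filters

# Number of values in the 124 x 124 x 40 output volume
output_values = width * height * filters

print(product_from_answer)  # 1845120
print(output_values)        # 615040
```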

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Monk247uk | View Question on Stackoverflow
Solution 1 - Machine Learning | The BrownBatman | View Answer on Stackoverflow
Solution 2 - Machine Learning | Ramzan Shahid | View Answer on Stackoverflow
Solution 3 - Machine Learning | Sam Oz | View Answer on Stackoverflow
Solution 4 - Machine Learning | Rahul Verma | View Answer on Stackoverflow
Solution 5 - Machine Learning | Ritesh Pratap Singh | View Answer on Stackoverflow