🛵Motivation
I won't explain the ins and outs of how a CNN works in this article, or how its forward and backward passes work, because there are already tons of tutorials and articles out there. But there are still few implementations of the basic algorithms that are easy to read. I find it hard to learn CNNs from a huge framework like PyTorch, which has a complex code organization and deep class inheritance. To help people (and myself) gain a deeper understanding of convolution, I decided to write my own. From my perspective, sometimes code is worth a thousand words.
🪐Introduction
In my implementation, five versions of CNN plus average pooling are introduced: CNN 1.0, CNN 2.0, CNN 3.0, CNN loops, and CNN img2col.
- CNN 1.0. This is the first version of my CNN. It can only do convolution and pooling in 2D, which means it does not support batch inputs or multiple kernels, and it can only read single-channel images.
- CNN 2.0. A better implementation than CNN 1.0, because this version adds channel support. Images can be multi-channel (for example, colored images with three RGB channels). In addition, the convolution operation can use multiple kernels to process multi-channel inputs and output multi-channel features.
- CNN 3.0. This version is already a fully functional convolutional neural network. It accepts images as a batch, which means the inputs have 4 dimensions: [batch size, channel, height, width].
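To make the CNN 1.0 case concrete, here is a minimal sketch of a single-channel 2D convolution with one kernel. It is not the code from the repo: the function name `conv2d_single`, the `stride` parameter, and the choice of no padding are assumptions made for illustration. The comments note how the shapes grow in the 2.0 and 3.0 versions.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Convolve one 2D image with one 2D kernel (no padding).

    Like most deep learning code, this is technically cross-correlation:
    the kernel is not flipped.
    """
    h, w = image.shape
    kh, kw = kernel.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Shapes across the three versions (hypothetical naming):
#   CNN 1.0: image (H, W),       kernel (kh, kw)
#   CNN 2.0: image (C, H, W),    kernels (K, C, kh, kw)
#   CNN 3.0: images (N, C, H, W), kernels (K, C, kh, kw)
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))
out = conv2d_single(image, kernel)
print(out.shape)  # (3, 3)
```

With a 4x4 input and a 2x2 kernel at stride 1, the output is 3x3, and each entry is the sum of the four pixels under the kernel.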
From here, things get more and more interesting. I implemented CNN x.0 with NumPy arrays, which support various matrix operations. So I didn’t write the convolution forward and backward passes with 6 or 7 nested loops. Instead, I increased the dimensions of the inputs and kernels and broadcast them, which is quite complicated but does improve inference time.
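One possible way to express the broadcasting idea is sketched below. This is an assumption about the general approach, not the repo's actual code: it uses `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later) to expose every kernel-sized window, adds extra axes to the windows and kernels, and lets NumPy broadcast the multiplication. The function name `conv_forward_broadcast` is hypothetical, and the sketch assumes stride 1 and no padding.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_forward_broadcast(x, w):
    """Forward convolution without explicit Python loops.

    x: inputs of shape (N, C, H, W)
    w: kernels of shape (K, C, kh, kw)
    Returns an array of shape (N, K, H - kh + 1, W - kw + 1).
    """
    kh, kw = w.shape[2:]
    # Every kh x kw window of the input: (N, C, out_h, out_w, kh, kw)
    windows = sliding_window_view(x, (kh, kw), axis=(2, 3))
    # Add a kernel axis to the windows and spatial axes to the kernels:
    #   windows -> (N, 1, C, out_h, out_w, kh, kw)
    #   kernels -> (1, K, C, 1,     1,     kh, kw)
    prod = windows[:, None] * w[None, :, :, None, None, :, :]
    # Sum over channels and over the kernel window
    return prod.sum(axis=(2, 5, 6))

x = np.ones((2, 3, 5, 5))
w = np.ones((4, 3, 3, 3))
y = conv_forward_broadcast(x, w)
print(y.shape)  # (2, 4, 3, 3)
```

The trade-off is memory: the broadcast product holds one value per (image, kernel, channel, output position, kernel cell), so it grows quickly with larger inputs.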
- CNN loops. To have a clearer version, I rewrote CNN 3.0 using only ‘for’ loops. The loops reach up to 7 levels of nesting, which could be super time-consuming in plain Python. But I used a Python library called Numba, which compiles Python code into optimized machine code with LLVM, and it improved the inference time a lot!
- CNN img2col. Wow, what an amazing algorithm. I’m not sure who invented it first, but this algorithm (often written im2col) is used by Keras. The basic idea of the img2col function is that it transforms the convolution operation into a matrix multiplication by changing the structure of the inputs and kernels. In my experiments, the CNN with the img2col implementation has the best inference time!
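The two ideas above can be sketched side by side. This is a minimal illustration under stated assumptions (stride 1, no padding, forward pass only), not the repo's code, and the names `conv_loops`, `im2col`, and `conv_im2col` are hypothetical. The loop version is plain Python here; a function like it could in principle be decorated with Numba's `njit` to compile it, which is left out so the sketch only needs NumPy.

```python
import numpy as np

def conv_loops(x, w):
    """Reference convolution with 7 nested loops (stride 1, no padding)."""
    n, c, h, wd = x.shape
    k, _, kh, kw = w.shape
    oh, ow = h - kh + 1, wd - kw + 1
    out = np.zeros((n, k, oh, ow))
    for b in range(n):                      # batch
        for f in range(k):                  # kernel
            for i in range(oh):             # output row
                for j in range(ow):         # output column
                    for ch in range(c):     # input channel
                        for di in range(kh):
                            for dj in range(kw):
                                out[b, f, i, j] += (
                                    x[b, ch, i + di, j + dj] * w[f, ch, di, dj]
                                )
    return out

def im2col(x, kh, kw):
    """Flatten every kh x kw patch into a row: (N, oh*ow, C*kh*kw)."""
    n, c, h, wd = x.shape
    oh, ow = h - kh + 1, wd - kw + 1
    cols = np.empty((n, oh * ow, c * kh * kw))
    for i in range(oh):
        for j in range(ow):
            patch = x[:, :, i:i + kh, j:j + kw]          # (N, C, kh, kw)
            cols[:, i * ow + j, :] = patch.reshape(n, -1)
    return cols

def conv_im2col(x, w):
    """Convolution as one matrix multiplication per image."""
    n, c, h, wd = x.shape
    k, _, kh, kw = w.shape
    oh, ow = h - kh + 1, wd - kw + 1
    cols = im2col(x, kh, kw)        # (N, oh*ow, C*kh*kw)
    w_mat = w.reshape(k, -1)        # (K, C*kh*kw), same flattening order
    out = cols @ w_mat.T            # (N, oh*ow, K)
    return out.transpose(0, 2, 1).reshape(n, k, oh, ow)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 6, 6))
w = rng.standard_normal((4, 3, 3, 3))
print(np.allclose(conv_loops(x, w), conv_im2col(x, w)))  # True
```

Both functions compute the same result. The im2col version trades extra memory (overlapping patches are copied) for a single large matrix product, which NumPy hands off to optimized linear algebra routines.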
Alright, it’s tons of work to implement all of those things. But this is not the end, because I was not just implementing a single CNN operation. To test the performance of my CNNs, I wrote an average pooling operation as well. With these two, the first part of LeNet can be built.
Besides, I had already implemented an ANN before (A Mini Machine Learning Library with Detailed Math Derivation), which provides a fully connected neural network. So I combined the ANN and CNN to build a LeNet network and trained the model on a digit recognition dataset. The result looks great!
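For reference, average pooling over non-overlapping windows can be sketched in a few lines of NumPy. This is an illustrative version under assumptions (square windows, stride equal to the window size, input height and width divisible by it), with hypothetical function names; it is not taken from the repo.

```python
import numpy as np

def avg_pool_forward(x, size=2):
    """Average pooling over non-overlapping size x size windows.

    x: array of shape (N, C, H, W), with H and W divisible by size.
    """
    n, c, h, w = x.shape
    assert h % size == 0 and w % size == 0
    # Split each spatial axis into (blocks, size), then average inside blocks
    blocks = x.reshape(n, c, h // size, size, w // size, size)
    return blocks.mean(axis=(3, 5))

def avg_pool_backward(grad_out, size=2):
    """Each input in a window receives an equal share of the window's gradient."""
    grad = np.repeat(np.repeat(grad_out, size, axis=2), size, axis=3)
    return grad / (size * size)

x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
pooled = avg_pool_forward(x)
print(pooled[0, 0])
# [[ 2.5  4.5]
#  [10.5 12.5]]
```

The backward pass has no parameters to update: it only spreads each output gradient evenly over the inputs that were averaged.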
👨🏻💻Comments
This is actually an update of my mini machine learning library, which surely cannot compete with open-source machine learning tools like sklearn or deep learning frameworks like PyTorch. But it’s still helpful for learning!
More detailed tests of my CNN versions are included in my GitHub repo. All the code and math derivations are uploaded there. If you want to learn more about this, please check it out!
Repo link: