Every day I try to read at least one new paper in the field of machine learning. Today's paper can be found at the following:
Deep Kronecker neural networks: A general framework for neural networks with adaptive activation functions – ScienceDirect
This is an interesting idea that I would say is inspired by the Kronecker product from linear algebra, used here to build a network with adaptive activation functions. I say "inspired by" because the Kronecker product never actually needs to be calculated to perform the methods described in the paper.
The core idea of this paper is that the output of a linear layer can be passed to multiple activation functions, something it refers to as Rowdy activation. The first activation function might be a standard one such as ReLU; the rest would be sinusoidal harmonic functions that add a bounded but noisy contribution. The hope is that this allows a search of the solution space that avoids over-focusing on certain parameters and leads to a more balanced search using the entire network.
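To make the idea concrete, here is a minimal NumPy sketch of how I read it, not the paper's exact formulation. The function name `rowdy_activation`, the weights `alphas`, and the scaling factor `n` are my own illustrative choices, and I am assuming the extra terms are plain sines at increasing harmonics. In a real network the weights would be trained along with everything else; here they are fixed numbers.

```python
import numpy as np

def rowdy_activation(z, alphas, n=1.0):
    """Weighted sum of a ReLU term and sinusoidal harmonic terms.

    alphas[0] weights the base ReLU; alphas[k] for k >= 1 weights the
    k-th harmonic n * sin(k * n * z). All names here are hypothetical.
    """
    z = np.asarray(z, dtype=float)
    # Base activation: standard ReLU.
    out = alphas[0] * np.maximum(z, 0.0)
    # Each extra term is bounded by n * |alphas[k]|, so the total
    # perturbation on top of the base activation stays bounded.
    for k in range(1, len(alphas)):
        out = out + alphas[k] * n * np.sin(k * n * z)
    return out

z = np.linspace(-2.0, 2.0, 5)
plain = rowdy_activation(z, [1.0, 0.0, 0.0])    # reduces to ReLU
rowdy = rowdy_activation(z, [1.0, 0.1, 0.05])   # ReLU plus two harmonics
```

With the harmonic weights set to zero this is just ReLU, so the standard network is a special case. Every extra harmonic is another function to evaluate and differentiate per unit.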
I think this is a powerful idea and well implemented by the paper. However, it could prove to be of limited usefulness in its current form. Each additional activation function adds work to both the forward pass and backpropagation. This means the network would take longer to train, and for the same compute budget it may end up performing worse than a regular network. So while it may be more accurate with fewer parameters, in computationally equivalent situations it could end up worse than the traditional approach.
That said, I feel there is merit here and this is worth exploring, particularly in situations where accuracy is at a premium and adding more parameters is not helping. This could be a tool in the toolbox of a data scientist presented with such a problem.
A purely subjective rating of the paper is below. This shouldn't be taken too seriously:
Explanation: 6/10
Novelty: 7/10
Breakthrough: 3/10
Interest: 8/10
Here, explanation represents how well the paper explains its core ideas. A rating of 10 to me would represent an ideal where an educated individual with little familiarity with the field might still grasp the core concepts.
Breakthrough is the degree to which I feel the ideas in the paper demonstrate, or could demonstrate, better results in some areas of ML than standard or state-of-the-art techniques.