-
Mimic Before Reconstruct – MR-MAE
MAE stands for masked auto-encoder, a technique for pre-training a neural network for image recognition. An auto-encoder in this context takes in an image and then has the task of simply reproducing that image. At some point it will encode the image into a lower-dimensional space before expanding it back out…
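A minimal NumPy sketch of the masked auto-encoder idea described above; the patch count, mask ratio, and linear encoder/decoder are illustrative assumptions, not details of MR-MAE itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image" as 16 patches of 64 values each (sizes are illustrative only).
patches = rng.normal(size=(16, 64))

# Randomly mask 75% of the patches (ratio is an assumption, not from the paper).
mask_ratio = 0.75
n_keep = int(len(patches) * (1 - mask_ratio))
perm = rng.permutation(len(patches))
keep_idx, mask_idx = perm[:n_keep], perm[n_keep:]

# Encoder: only the visible patches are projected into a smaller latent space (64 -> 16).
W_enc = rng.normal(scale=0.1, size=(64, 16))
latent = patches[keep_idx] @ W_enc

# Decoder: expand latents back to patch size; zero "mask tokens" stand in for hidden patches.
W_dec = rng.normal(scale=0.1, size=(16, 64))
full_latent = np.zeros((len(patches), 16))
full_latent[keep_idx] = latent
reconstruction = full_latent @ W_dec

# Reconstruction is scored on the masked patches the encoder never saw.
loss = np.mean((reconstruction[mask_idx] - patches[mask_idx]) ** 2)
print(latent.shape, reconstruction.shape, round(loss, 3))
```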
-
GLU Variants Improve Transformer
Activation functions are crucial to preventing a neural network from collapsing into one big layer. Without them, the stacked layers of a model, sometimes described as its depth, would collapse internally to function as a single linear approximation. Non-linearity therefore needs to be introduced, and the functions that introduce it are called activation functions…
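As an illustration, here is a small NumPy sketch of one of the gated activations the paper studies (the SwiGLU feed-forward block); the layer sizes and weight initialisation are placeholder assumptions:

```python
import numpy as np

def swish(x):
    # Swish / SiLU non-linearity: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def ffn_swiglu(x, W, V, W2):
    # Gated feed-forward block: a non-linear branch gates a linear branch
    # element-wise before the output projection (the SwiGLU variant).
    return (swish(x @ W) * (x @ V)) @ W2

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32          # toy sizes, chosen only for the example
x = rng.normal(size=(4, d_model))
W = rng.normal(scale=0.1, size=(d_model, d_ff))
V = rng.normal(scale=0.1, size=(d_model, d_ff))
W2 = rng.normal(scale=0.1, size=(d_ff, d_model))
print(ffn_swiglu(x, W, V, W2).shape)   # (4, 8)
```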
-
LLaMA – an Open Large Language Model
The rise of ChatGPT has been a huge moment for public engagement with large language models, and I think its release impressed everyone. While the hype has died down a little at the time of writing, I think it may have been one of the largest showcases of machine learning to the general public ever…
-
Text-to-Image Models for Visual Perception
There are now large open-source models and weights for generating images from text prompts. The results of networks like the popular Stable Diffusion are quite impressive and, but for a few issues, they are capable of producing some striking artwork. I’ve used them myself to do some interesting abstract artwork in my…
-
Scalable Federated Learning – Paper of the Day
I will be honest in the introduction here: when selecting this paper I had little exposure to federated learning and wanted to learn more. I lucked out with this paper, as it is a great introduction to federated learning. The main concept is that multiple groups each have…
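A rough NumPy sketch of the federated averaging idea that underlies most work in this area, assuming each group trains a local copy of the weights on its own data and a central server averages them; the weighting by local dataset size is a common convention and an assumption here, not necessarily this paper's exact scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, data, lr=0.1):
    # Stand-in for a round of local training: one gradient step of a least-squares fit.
    X, y = data
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

# Three "groups", each holding private data the others never see.
groups = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(3)]
global_w = np.zeros(4)

for round_ in range(10):
    # Each group trains locally from the current global weights...
    local_ws = [local_update(global_w.copy(), data) for data in groups]
    # ...and the server averages the results, weighted by local dataset size.
    sizes = np.array([len(y) for _, y in groups], dtype=float)
    global_w = np.average(local_ws, axis=0, weights=sizes)

print(global_w)
```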
-
Dropout Reduces Underfitting – Paper of the Day
Dropout is the process that converts a percentage of the outputs of one layer into zero values. It is applied between layers, and the percentage specifies how large the randomly selected sample is. So for one layer outputting 100 values, a dropout rate of 50% would imply that half of those 100 values are going to…
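To make that arithmetic concrete, a small NumPy sketch of dropout applied to a layer of 100 outputs at a 50% rate; the rescaling of the surviving values (inverted dropout) is a common convention and an assumption here:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5):
    # Zero out a randomly selected fraction p of the values and rescale the
    # survivors by 1/(1-p) so the expected output stays the same.
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

layer_out = rng.normal(size=100)       # one layer outputting 100 values
dropped = dropout(layer_out, p=0.5)    # roughly half of them become exactly zero
print(int((dropped == 0).sum()), "of", layer_out.size, "values zeroed")
```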
-
Complex Neural Networks – Paper of the Day
It would be useful to apply deep learning on the complex plane because it would allow us to work with data that has both a phase and a magnitude component. Signal and audio processing are good examples of areas where complex numbers are extremely useful. If neural networks can be adapted to work with…
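A small NumPy sketch of the basic building block such work needs: a dense layer whose inputs, weights, and activation all live on the complex plane, so phase and magnitude are carried through together. The "split" activation used here is one common choice and an assumption, not necessarily this paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def complex_dense(x, W, b):
    # A dense layer where inputs, weights, and bias are all complex-valued,
    # so each multiplication scales the magnitude and rotates the phase.
    return x @ W + b

def crelu(z):
    # "Split" activation: apply ReLU to the real and imaginary parts separately.
    return np.maximum(z.real, 0) + 1j * np.maximum(z.imag, 0)

# Toy complex input, e.g. a few frequency bins from an FFT of an audio frame.
x = rng.normal(size=(2, 4)) + 1j * rng.normal(size=(2, 4))
W = (rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))) * 0.1
b = np.zeros(3, dtype=complex)

out = crelu(complex_dense(x, W, b))
print(np.abs(out), np.angle(out))   # magnitude and phase of each output
```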
-
Learning the Beauty in Songs – Paper of the Day
This is a very interesting paper in which the authors attempt to raise amateur singing to a professional level using machine learning techniques. I see less machine learning work in the audio realm, so papers that tackle something related to sound always stand out for me. A Variational Auto-Encoder is used on training…
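Since the excerpt cuts off, here is only a generic NumPy sketch of the Variational Auto-Encoder building block it mentions, showing the reparameterisation trick; the latent size and linear encoder are placeholder assumptions and say nothing about how the paper applies it to singing:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    # Encoder maps an input (e.g. an acoustic feature frame) to the parameters
    # of a Gaussian over a small latent space.
    return x @ W_mu, x @ W_logvar

def sample_latent(mu, logvar):
    # Reparameterisation trick: z = mu + sigma * eps, so gradients can flow
    # through the sampling step during training.
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

x = rng.normal(size=(1, 16))                  # toy feature frame
W_mu = rng.normal(scale=0.1, size=(16, 4))
W_logvar = rng.normal(scale=0.1, size=(16, 4))
mu, logvar = encode(x, W_mu, W_logvar)
print(sample_latent(mu, logvar).shape)        # (1, 4) latent code
```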
-
Spiking Neural Networks – Paper of the Day
Spiking neural networks incorporate a time dimension into their processing, so that each neuron builds up energy and then releases distinct spikes. This is very different from the typical approach and leads to essentially a whole different subfield of machine learning that pursues this biologically inspired process. In this paper…
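A tiny NumPy sketch of the leaky integrate-and-fire dynamics that most spiking networks build on, where a neuron accumulates input over time steps and emits a spike when a threshold is crossed; the decay constant and threshold here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def lif_neuron(inputs, threshold=1.0, decay=0.9):
    # Leaky integrate-and-fire: the membrane potential accumulates input,
    # leaks toward zero each step, and resets after emitting a spike.
    v, spikes = 0.0, []
    for i in inputs:
        v = decay * v + i
        fired = v >= threshold
        spikes.append(int(fired))
        if fired:
            v = 0.0
    return spikes

inputs = rng.random(20) * 0.4   # a stream of input current over 20 time steps
print(lif_neuron(inputs))       # e.g. [0, 0, 1, 0, ...] -- distinct spikes in time
```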
-
Swarm Parallelism – Paper of the Day
With the rise of massive networks there is a push to parallelize training in new ways to make research on them more accessible. The sheer size of these networks, which now reach into the billions of parameters, cuts many researchers off from making advances in the field. There are various attempts to…