Month: March 2023

  • Mimic Before Reconstruct – MR-MAE

    MAE stands for masked auto-encoder, a technique for pre-training a neural network for image recognition. An auto-encoder in this context takes in an image and has the task of simply reproducing that image. At some point it will encode the image into a smaller dimensional space before expanding it out…
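
    As a minimal sketch of the auto-encoder idea described here (not the MR-MAE architecture itself; the layer sizes and 28×28 input are my own illustrative choices), the network compresses an image into a smaller latent vector and is trained to reproduce the original from it. The masked variant additionally hides a large fraction of image patches and asks the network to fill them back in.

    ```python
    # Minimal auto-encoder sketch: compress an image into a smaller latent space,
    # then expand it back out and train on the reconstruction error.
    # Layer sizes are illustrative assumptions, not taken from the MR-MAE paper.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU())  # 784 -> 64 bottleneck
    decoder = nn.Sequential(nn.Linear(64, 28 * 28), nn.Sigmoid())             # 64 -> 784 reconstruction

    images = torch.rand(8, 1, 28, 28)                    # toy batch of 28x28 greyscale images
    latent = encoder(images)                             # smaller dimensional representation
    reconstruction = decoder(latent).view(8, 1, 28, 28)  # expanded back to image shape
    loss = F.mse_loss(reconstruction, images)            # penalise failure to reproduce the input
    ```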

  • GLU Variants Improve Transformer

    Activation functions are crucial to preventing the collapse of a neural network into one big layer. Without them, the layers of a model, sometimes described as its depth, would collapse internally into a single linear approximation layer. Non-linearity therefore needs to be introduced, and the functions that introduce it are referred to as activation functions…
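
    As a rough sketch of what a GLU-style feed-forward block looks like (the choice of the SwiGLU variant and the dimensions below are my own illustrative assumptions, not necessarily the exact variant the post focuses on), the non-linearity enters through an element-wise gate rather than a plain activation on a single projection:

    ```python
    # Sketch of a gated linear unit (GLU) style transformer feed-forward block.
    # The SiLU gate activation (SwiGLU) and the sizes 512/1024 are illustrative.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SwiGLUFeedForward(nn.Module):
        def __init__(self, d_model=512, d_hidden=1024):
            super().__init__()
            self.gate = nn.Linear(d_model, d_hidden)   # produces the gating signal
            self.up = nn.Linear(d_model, d_hidden)     # produces the values to be gated
            self.down = nn.Linear(d_hidden, d_model)   # projects back to the model dimension

        def forward(self, x):
            # Non-linearity comes from multiplying by an activated gate element-wise.
            return self.down(F.silu(self.gate(x)) * self.up(x))

    x = torch.randn(2, 16, 512)        # (batch, sequence, d_model)
    y = SwiGLUFeedForward()(x)         # same shape out: (2, 16, 512)
    ```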

  • LLaMA – an Open Large Language Model

    The rise of ChatGPT has been a huge moment for public engagement with large language models, and I think its release impressed everyone. While the hype has died down a little at the time of writing, I think it may have been one of the largest showcases of machine learning to the general public ever…

  • Text-to-Image Models for Visual Perception

    There are now large open-source models and weights for generating images from text prompts. The results of networks like the popular Stable Diffusion are quite impressive, and but for a few issues they are capable of producing some interesting artwork. I’ve used them myself to do some interesting abstract artwork in my…

  • Scalable Federated Learning – Paper of the Day

    I am going to be honest in the introduction here: when selecting this paper I had little exposure to federated learning and wanted to learn more. I lucked out with this paper, as it is a great introduction to the concept of federated learning. The main concept here is that multiple groups each have…

  • Dropout Reduces Underfitting – Paper of the Day

    Dropout is the process of converting a percentage of a layer’s output into zero values. It is applied between layers, and the percentage represents a randomly selected sample of that size. So for one layer outputting 100 values, a dropout of 50% would imply that half of those 100 values are going to…
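
    As a small illustration of that behaviour (using PyTorch’s built-in dropout layer; the shapes below simply mirror the 100-value example):

    ```python
    # Sketch of the dropout behaviour described above: zero out a random fraction
    # of a layer's outputs during training.
    import torch
    import torch.nn as nn

    layer_output = torch.ones(1, 100)      # pretend one layer emitted 100 values
    dropout = nn.Dropout(p=0.5)            # 50% dropout

    dropout.train()                        # dropout is only active in training mode
    dropped = dropout(layer_output)
    print((dropped == 0).sum().item())     # roughly 50 of the 100 values are zeroed
    # Note: PyTorch also rescales the surviving values by 1 / (1 - p) so the
    # expected activation magnitude matches what the layer sees at inference time.
    ```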