Month: March 2023

  • Mimic Before Reconstruct – MR-MAE

    MAE stands for masked auto-encoder, a technique for pre-training a neural network for image recognition. An auto-encoder in this context takes in an image and has the task of simply reproducing that image. At some point it will encode the image into a smaller dimensional space before expanding it out…
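
    As a minimal sketch of the auto-encoder idea described here (not the MR-MAE architecture itself; the layer sizes and 28×28 input are my own illustrative choices), the network compresses an image into a smaller latent vector and is trained to reproduce the original from it. The masked variant additionally hides a large fraction of image patches and asks the network to fill them back in.

    ```python
    # Minimal auto-encoder sketch: compress an image into a smaller latent space,
    # then expand it back out and train on the reconstruction error.
    # Layer sizes are illustrative assumptions, not taken from the MR-MAE paper.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU())  # 784 -> 64 bottleneck
    decoder = nn.Sequential(nn.Linear(64, 28 * 28), nn.Sigmoid())             # 64 -> 784 reconstruction

    images = torch.rand(8, 1, 28, 28)                    # toy batch of 28x28 greyscale images
    latent = encoder(images)                             # smaller dimensional representation
    reconstruction = decoder(latent).view(8, 1, 28, 28)  # expanded back to image shape
    loss = F.mse_loss(reconstruction, images)            # penalise failure to reproduce the input
    ```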

  • GLU Variants Improve Transformer

    Activation functions are crucial to preventing the collapse of a neural network into one big layer. Without them, the layers of a model, sometimes described as its depth, would collapse internally into a single linear approximation layer. Non-linearity therefore needs to be introduced, and the functions that introduce it are referred to as activation functions…
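
    As a rough sketch of what a GLU-style feed-forward block looks like (the choice of the SwiGLU variant and the dimensions below are my own illustrative assumptions, not necessarily the exact variant the post focuses on), the non-linearity enters through an element-wise gate rather than a plain activation on a single projection:

    ```python
    # Sketch of a gated linear unit (GLU) style transformer feed-forward block.
    # The SiLU gate activation (SwiGLU) and the sizes 512/1024 are illustrative.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SwiGLUFeedForward(nn.Module):
        def __init__(self, d_model=512, d_hidden=1024):
            super().__init__()
            self.gate = nn.Linear(d_model, d_hidden)   # produces the gating signal
            self.up = nn.Linear(d_model, d_hidden)     # produces the values to be gated
            self.down = nn.Linear(d_hidden, d_model)   # projects back to the model dimension

        def forward(self, x):
            # Non-linearity comes from multiplying by an activated gate element-wise.
            return self.down(F.silu(self.gate(x)) * self.up(x))

    x = torch.randn(2, 16, 512)        # (batch, sequence, d_model)
    y = SwiGLUFeedForward()(x)         # same shape out: (2, 16, 512)
    ```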

  • LLaMA – an Open Large Language Model

    The rise of ChatGPT has been a huge moment for public engagement with large language models, and I think its release impressed everyone. While the hype has died down a little at the time of writing, I think it may have been one of the largest showcases of machine learning to the general public ever…

  • Text-to-Image Models for Visual Perception

    There are now large open-source models and weights for generating images from text prompts. The results of networks like the popular Stable Diffusion are quite impressive, and but for a few issues they are capable of producing some interesting artwork. I’ve used them myself to do some interesting abstract artwork in my…

  • Scalable Federated Learning – Paper of the Day

    I am going to be honest in the introduction here: when selecting this paper I had little exposure to federated learning and wanted to learn more. I lucked out with this paper, as it is a great introduction to the concept of federated learning. The main concept here is that multiple groups each have…

  • Dropout Reduces Underfitting – Paper of the Day

    Dropout is the process of converting a percentage of a layer’s output into zero values. It is applied between layers, and the percentage represents a randomly selected sample of that size. So for one layer outputting 100 values, a dropout of 50% would imply that half of those 100 values are going to…
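
    As a small illustration of that behaviour (using PyTorch’s built-in dropout layer; the shapes below simply mirror the 100-value example):

    ```python
    # Sketch of the dropout behaviour described above: zero out a random fraction
    # of a layer's outputs during training.
    import torch
    import torch.nn as nn

    layer_output = torch.ones(1, 100)      # pretend one layer emitted 100 values
    dropout = nn.Dropout(p=0.5)            # 50% dropout

    dropout.train()                        # dropout is only active in training mode
    dropped = dropout(layer_output)
    print((dropped == 0).sum().item())     # roughly 50 of the 100 values are zeroed
    # Note: PyTorch also rescales the surviving values by 1 / (1 - p) so the
    # expected activation magnitude matches what the layer sees at inference time.
    ```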