A quick little paper. I’ve grown used to reading journal articles or the occasional thesis in the field of machine learning, and I’d call this one more of a short note: a quick look at a small modification of vision transformers. That certainly suited me on what was a very busy day for unrelated reasons.
The authors take a standard vision transformer and add a layer norm both before and after the dense layer that embeds the patches. This small modification is computationally very cheap, and their experiments show that it gives better results.
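To make the idea concrete, here is a minimal NumPy sketch of a patch embedding with a layer norm on either side of the dense layer. It is my own illustration, not the authors' code: the function names (`layer_norm`, `patchify`, `embed_patches`), the image size, patch size, and embedding width are all assumptions, and I have left out the learnable scale and offset that a real layer norm would carry.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalise each row (one patch / one token) to zero mean and unit variance.
    # Simplification: no learnable scale or offset parameters.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def patchify(image, patch):
    # Split an (H, W, C) image into non-overlapping patches and flatten each one.
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    x = image[:rows * patch, :cols * patch].reshape(rows, patch, cols, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return x.reshape(rows * cols, patch * patch * c)

def embed_patches(image, weight, bias, patch):
    x = patchify(image, patch)
    x = layer_norm(x)        # layer norm before the dense embedding
    x = x @ weight + bias    # the usual dense patch embedding
    return layer_norm(x)     # layer norm after the dense embedding

# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
patch, dim = 8, 64
image = rng.normal(size=(32, 32, 3))
weight = rng.normal(size=(patch * patch * 3, dim)) * 0.02
bias = np.zeros(dim)

tokens = embed_patches(image, weight, bias, patch)
print(tokens.shape)  # (16, 64)
```

The extra cost is two normalisations over the patch tokens, which is tiny next to the transformer blocks that follow, so it is easy to see why the change is cheap.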
Well worth a look if you work on vision transformers, though the paper could have been fleshed out a little with a proper conclusion.