Dropout is the process of converting a percentage of one layer's outputs into zero values. It is applied between layers, and the percentage determines the size of the randomly selected sample that gets zeroed. So for a layer outputting 100 values, a dropout rate of 50% means that half of those 100 values are converted into zero. This stops the network from settling on fixed paths through its weights, and dropout has also been interpreted as making the network perform a form of approximate Bayesian inference. Traditionally it has been used to prevent a model from over-fitting the data it is trained on. Over-fitting is where a model becomes so focused on the data presented during training that its performance degrades on unseen data.
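The mechanics above can be sketched in a few lines. This is a minimal illustrative implementation, not the paper's code: it zeroes a randomly chosen fraction of the values and scales the survivors by 1/(1 - rate) (the common "inverted dropout" convention) so the expected magnitude of the layer output stays the same.

```python
import random

def dropout(values, rate, training=True):
    # Zero a randomly selected `rate` fraction of the values.
    # Survivors are scaled by 1/(1 - rate) ("inverted" dropout) so
    # the expected sum of the output matches the input.
    if not training or rate == 0.0:
        return list(values)
    n_drop = int(len(values) * rate)
    dropped = set(random.sample(range(len(values)), n_drop))
    keep = 1.0 - rate
    return [0.0 if i in dropped else v / keep for i, v in enumerate(values)]

# A layer outputting 100 values with 50% dropout: exactly half become zero.
out = dropout([1.0] * 100, rate=0.5)
print(sum(1 for v in out if v == 0.0))  # prints 50
```

At inference time (`training=False`) the values pass through untouched, which is why the scaling during training matters.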
The paper at the bottom examines early dropout as a way to improve under-fitting. Under-fitting is where a model cannot achieve good accuracy even on its training data, and therefore struggles on unseen data as well. This tends to happen when datasets are large and the model fails to converge on a good solution despite having access to plenty of data. According to the paper, early dropout can help with this: dropout is applied during the early iterations of training and then, at some chosen point, turned off entirely. Turned off in this context is equivalent to a dropout rate of 0%.
The paper has some other good qualities beyond this interesting finding: it surveys the current usage of dropout and gives the context needed to understand the authors' work. A quality paper that I believe is well worth the time to read.
Dropout Reduces Underfitting
Zhuang Liu, Zhiqiu Xu, Joseph Jin, Zhiqiang Shen, Trevor Darrell