Recently I came across a dataset and accompanying paper that I found rather fascinating:
CINAR I., KOKLU M. and TASDEMIR S., (2020). “Classification of Raisin Grains Using Machine Vision and Artificial Intelligence Methods”, Gazi Journal of Engineering Sciences, vol. 6, no. 3, pp. 200-209, December 2020, DOI: https://doi.org/10.30855/gmbd.2020.03.03.
The researchers photographed raisins in a dark box and then processed the photos into black-and-white images, like so:
The black-and-white image is the one in the bottom left. From it, they could use image processing techniques to compute features such as area, perimeter, etc., so that each raisin is described by a handful of numbers. This essentially converts what is ordinarily an image recognition task into a tabular data task, which is what interested me in the first place.
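To make the image-to-table idea concrete, here is a minimal NumPy sketch that turns a grayscale photo into a black-and-white mask and then into a few numbers. It is not the paper's pipeline: the mean-brightness threshold, the hypothetical helper names binarize and shape_features, and the "crack length" perimeter are my own assumptions, and the features are only similar in spirit to columns in the dataset such as area, perimeter and eccentricity.

```python
import numpy as np

def binarize(gray, threshold=None):
    """Turn a grayscale image (bright raisin, dark background) into a boolean mask."""
    if threshold is None:
        # Assumption: a crude global threshold at the mean brightness.
        threshold = gray.mean()
    return gray > threshold

def shape_features(mask):
    """Compute a few region descriptors for a single object in a boolean mask."""
    rows, cols = np.nonzero(mask)
    area = rows.size  # number of foreground pixels

    # Perimeter as "crack length": count pixel edges between foreground and background.
    padded = np.pad(mask, 1).astype(np.int8)
    perimeter = int(np.abs(np.diff(padded, axis=0)).sum()
                    + np.abs(np.diff(padded, axis=1)).sum())

    # Second central moments of the pixel coordinates give the axes of an
    # ellipse with the same spread (full axis length = 4 * sqrt(variance)).
    cov = np.cov(np.vstack([rows, cols]), bias=True)
    minor_var, major_var = np.clip(np.linalg.eigvalsh(cov), 0, None)  # ascending
    major_axis = 4 * np.sqrt(major_var)
    minor_axis = 4 * np.sqrt(minor_var)
    # Assumes the object is larger than a single pixel (major_var > 0).
    eccentricity = np.sqrt(1 - minor_var / major_var)

    # Extent: fraction of the bounding box covered by the object.
    bbox_area = (rows.max() - rows.min() + 1) * (cols.max() - cols.min() + 1)
    extent = area / bbox_area

    return {
        "area": area,
        "perimeter": perimeter,
        "major_axis_length": major_axis,
        "minor_axis_length": minor_axis,
        "eccentricity": eccentricity,
        "extent": extent,
    }

# Usage on a synthetic 50x50 "photo": a bright 10x20 rectangle on a dark background.
gray = np.zeros((50, 50))
gray[15:25, 10:30] = 200.0
features = shape_features(binarize(gray))
print(features)
```

Each image collapses to one row of numbers, which is exactly what lets a tabular classifier take over from an image model.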
I am always fascinated when machine learning offers the possibility of converting one kind of task into another, because it opens the door to new approaches. I often experiment with the opposite transformation, from tabular data to images, so this was a new direction for me. I tackled it in the Kaggle notebook here.
Using the same number of cross-validation folds as the paper, I was able to increase the accuracy with an AdaBoost classifier. AdaBoost fits a sequence of weak classifiers (often one-level decision trees, or "stumps"); after each round it increases the weights of the observations the previous classifier got wrong, so the next one concentrates on the hard cases, and the final prediction is a weighted vote of all the weak classifiers. Combining many simple classifiers this way yields a non-linear decision boundary. This proved to be a stronger approach than the classification methods used in the paper.
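To illustrate the reweighting idea, here is a from-scratch sketch of discrete AdaBoost with decision stumps in plain NumPy, run on a toy dataset. It is a textbook version of the algorithm, not the implementation used in my notebook or in any particular library; the function names and the 50-round setting are hypothetical choices for illustration.

```python
import numpy as np

def fit_stump(X, y, w):
    """Search every (feature, threshold, polarity) for the lowest weighted error."""
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            base = np.where(X[:, j] >= thr, 1, -1)
            for polarity in (1, -1):
                err = w[polarity * base != y].sum()
                if err < best_err:
                    best_err, best = err, (j, thr, polarity)
    return best_err, best

def stump_predict(X, stump):
    j, thr, polarity = stump
    return polarity * np.where(X[:, j] >= thr, 1, -1)

def adaboost_fit(X, y, n_rounds=50):
    """Discrete AdaBoost with decision stumps; labels y must be -1 or +1."""
    w = np.full(len(y), 1.0 / len(y))         # start with uniform observation weights
    ensemble = []
    for _ in range(n_rounds):
        err, stump = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # keep log() finite for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err) # lower weighted error -> larger vote
        pred = stump_predict(X, stump)
        w = w * np.exp(-alpha * y * pred)     # misclassified points (y*pred = -1) gain weight
        w = w / w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(X, ensemble):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(X, stump) for alpha, stump in ensemble)
    return np.where(score >= 0, 1, -1)

# Usage on toy data: a diagonal boundary that no single axis-aligned stump can match.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
ensemble = adaboost_fit(X, y, n_rounds=50)
train_acc = (adaboost_predict(X, ensemble) == y).mean()
print(f"training accuracy: {train_acc:.3f}")
```

A single stump can only split on one feature at one threshold, yet the weighted sum of many of them approximates the diagonal boundary, which is the non-linear combination described above.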
I was able to quickly find a better classifier than the one used in the paper thanks to PyCaret’s compare_models function:
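PyCaret's compare_models cross-validates a library of models and ranks them by a metric. The loop below is a rough, hand-rolled analogue in scikit-learn rather than PyCaret's implementation: it scores a few candidate models on the same stratified folds using a synthetic stand-in dataset from make_classification (not the raisin data), so the model list, the 10 folds and whatever scores it prints are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a small tabular dataset: 7 numeric features, 2 classes.
X, y = make_classification(n_samples=900, n_features=7, n_informative=5, random_state=0)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm_rbf": make_pipeline(StandardScaler(), SVC()),
    "adaboost": AdaBoostClassifier(n_estimators=100, random_state=0),
}

# One fixed splitter, so every model is evaluated on exactly the same folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

# Rank models by mean cross-validated accuracy, best first.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>20}: {acc:.3f}")
```

Reusing one splitter object keeps the comparison fair, which is the same reason it matters to match the paper's number of folds.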
You can see the decision boundary here:
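A decision boundary plot like this is usually produced by evaluating the fitted model on a dense grid of points and coloring each grid cell by its predicted class. The sketch below assumes scikit-learn's AdaBoostClassifier on a synthetic two-feature dataset rather than the raisin features; the plotting lines are commented out so the computation runs without matplotlib.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic two-feature data (an assumption; the raisin features are not used here).
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)
clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)

# Predict on a 200x200 grid covering the data; the boundary is where the
# predicted label changes between neighbouring grid cells.
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200), np.linspace(y_min, y_max, 200))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Optional plot:
# import matplotlib.pyplot as plt
# plt.contourf(xx, yy, Z, alpha=0.3)
# plt.scatter(X[:, 0], X[:, 1], c=y, s=10)
# plt.show()
```

With the default one-level tree as the weak learner, every split is on a single feature, so the resulting boundary is built from axis-aligned steps.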
I’m glad this dataset was shared and that I was able to achieve the accuracy I did. I’ll keep an eye out to see whether the researchers involved go on to do more work in this area. I’m definitely thankful they shared this with the machine learning community.