Investigation of the Clustering of High Quality Stingless Bee Honeys using Unsupervised Machine Learning Models
DOI: https://doi.org/10.55373/mjchem.v27i1.110
Keywords: Stingless bee honey; Malaysia; quality; unsupervised machine learning; clustering
Abstract
Honey quality and authenticity are crucial given honey's health benefits and rising demand, yet challenges such as environmental factors and adulteration persist. This study evaluated 106 stingless bee honey samples for quality, bee species distribution, and underlying patterns using unsupervised machine learning models. The clustering techniques applied included K-Means, Agglomerative, Hierarchical Clustering, and DBSCAN. Component plane analysis of a Self-Organizing Map (SOM) highlighted the key factors driving the clustering. Hierarchical clustering (unscaled dendrogram) outperformed the other methods, with a Silhouette score of 0.351, a Davies-Bouldin Index of 0.977, and a Cophenetic Correlation Coefficient of 0.709. Quality was assessed from pH, moisture content, sugar levels, and 5-hydroxymethylfurfural (HMF) against the Malaysian Standard for stingless bee honey (MS 2683:2017) and Codex Alimentarius guidelines. All samples met the quality standards, indicating freshness and high quality. Four clusters emerged, each with distinct physicochemical properties and species distributions. While honey quality assessments are common, combining several unsupervised clustering techniques with a SOM to uncover patterns in honey quality and bee species distribution is relatively novel.
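The three validation metrics reported above can be computed together for a hierarchical clustering, as in the minimal Python sketch below. It is illustrative only and not the authors' pipeline: the data are synthetic (four artificial groups standing in for physicochemical measurements), the helper name `evaluate_hierarchical` is hypothetical, and the average linkage method is an assumption, since the abstract does not state which linkage was used. Features are left unscaled, following the "unscaled dendrogram" wording.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.metrics import davies_bouldin_score, silhouette_score


def evaluate_hierarchical(X, n_clusters=4, method="average"):
    """Cluster X hierarchically and return labels plus three validity metrics.

    `method` is an assumed linkage choice; the abstract does not specify one.
    """
    dists = pdist(X)                       # condensed Euclidean distances
    Z = linkage(dists, method=method)      # build the dendrogram
    # Cut the dendrogram so that at most n_clusters flat clusters remain.
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    # Cophenetic correlation: how well the dendrogram preserves distances.
    ccc, _ = cophenet(Z, dists)
    return {
        "labels": labels,
        "silhouette": silhouette_score(X, labels),          # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),  # lower is better
        "cophenetic": float(ccc),                           # closer to 1 is better
    }


# Synthetic stand-in data: 4 groups of 25 samples with 4 features each.
# These are NOT the study's measurements.
rng = np.random.default_rng(0)
centers = np.array(
    [[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 0], [5, 0, 0, 5]], dtype=float
)
X = np.vstack([c + rng.normal(scale=0.5, size=(25, 4)) for c in centers])

result = evaluate_hierarchical(X, n_clusters=4)
for name in ("silhouette", "davies_bouldin", "cophenetic"):
    print(f"{name}: {result[name]:.3f}")
```

The Silhouette score and Davies-Bouldin Index apply to any flat clustering, so the same two calls can score K-Means or DBSCAN labels for comparison. The Cophenetic Correlation Coefficient is specific to dendrograms.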