Development of Machine Learning-Based QSAR Model for Virtual Screening of Dipeptidyl Peptidase-4 Inhibitors
DOI: https://doi.org/10.55373/mjchem.v25i5.168
Keywords: Diabetes Mellitus Type 2; dipeptidyl peptidase-4; Machine Learning; Quantitative Structure-Activity Relationship (QSAR)
Abstract
Type 2 diabetes mellitus is commonly treated by inhibiting the dipeptidyl peptidase-4 (DPP-4) enzyme with an inhibitor compound; however, these inhibitors may cause side effects such as headaches and indigestion. This study aims to identify new DPP-4 inhibitor candidates for type 2 diabetes mellitus through virtual screening guided by a Machine Learning-based Quantitative Structure-Activity Relationship (QSAR) model. The training dataset was obtained from the ChEMBL database with DPP-4 as the target protein (code ChEMBL284) and used to build a model, which was then applied to screen 884 million molecules from the ZINC database. The best model, with a test-set R² of 0.69, was used for this virtual screening. Compounds were first retained if their predicted activity (pIC50) exceeded the experimental activity of currently available DPP-4 inhibitor drugs, and were then filtered by Lipinski's Rule of 5 to select compounds likely to be absorbed after oral administration. These compounds were docked with AutoDock Vina to estimate their binding free energy and characterize their interaction patterns with the target protein, providing recommendations for new DPP-4 inhibitor candidates. Virtual screening yielded five compounds with the highest predicted pIC50 values that did not violate Lipinski's Rule of 5: ZINC341837061, ZINC001359979988, ZINC001707862778, ZINC001722886251 and ZINC001726358542.
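The abstract gives no implementation details. What follows is only a minimal Python sketch of the screening logic it describes: converting IC50 to pIC50, keeping compounds predicted to be more active than a reference drug, rejecting any Lipinski Rule of 5 violation, and scoring a model with R². It assumes that molecular descriptors (molecular weight, logP, H-bond donor and acceptor counts) and model predictions have already been computed, for example with a cheminformatics toolkit. The `Candidate` class, all function names, the 20 nM reference IC50 and every compound value are hypothetical illustrations, not data or code from the study. Following the abstract, the filter allows zero violations. Lipinski's original formulation tolerates one violation.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Precomputed properties of one screened molecule (all values hypothetical)."""
    compound_id: str
    predicted_pic50: float  # QSAR model output, -log10(IC50 in mol/L)
    mol_weight: float       # molecular weight in Da
    logp: float             # octanol-water partition coefficient
    h_donors: int           # hydrogen-bond donors
    h_acceptors: int        # hydrogen-bond acceptors


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert IC50 in nanomolar to pIC50 = -log10(IC50 in molar) = 9 - log10(IC50 in nM)."""
    return 9.0 - math.log10(ic50_nm)


def lipinski_violations(c: Candidate) -> int:
    """Count Rule of 5 violations: MW > 500, logP > 5, donors > 5, acceptors > 10."""
    return sum([c.mol_weight > 500, c.logp > 5, c.h_donors > 5, c.h_acceptors > 10])


def r2_score(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def screen(candidates, reference_pic50: float, top_n: int = 5):
    """Keep compounds predicted more active than the reference drug and with
    zero Rule of 5 violations, ranked by predicted pIC50 (highest first)."""
    hits = [c for c in candidates
            if c.predicted_pic50 > reference_pic50 and lipinski_violations(c) == 0]
    hits.sort(key=lambda c: c.predicted_pic50, reverse=True)
    return hits[:top_n]


# Hypothetical reference drug with IC50 = 20 nM, i.e. pIC50 of about 7.70
reference = ic50_nm_to_pic50(20.0)
library = [
    Candidate("cmpd_A", 8.4, 420.5, 2.1, 2, 6),
    Candidate("cmpd_B", 8.9, 612.7, 3.0, 3, 8),  # rejected: MW > 500
    Candidate("cmpd_C", 7.1, 350.2, 1.5, 1, 5),  # rejected: below reference pIC50
    Candidate("cmpd_D", 8.1, 480.0, 4.2, 2, 9),
]
for hit in screen(library, reference):
    print(hit.compound_id, hit.predicted_pic50)
# prints:
# cmpd_A 8.4
# cmpd_D 8.1
```

In a real pipeline, the compounds that pass this filter would then go on to docking (AutoDock Vina in the study). The sketch stops before that step.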