Synergy of advanced machine learning and deep neural networks with consensus molecular docking for enhanced potency prediction of ALK inhibitors

Published in Scientific Reports, 2023

Abstract: This study addresses the urgent need for new anaplastic lymphoma kinase (ALK) inhibitors in non-small cell lung cancer (NSCLC), focusing on the ALK-positive subtype (5% of cases). With only five FDA-approved ALK inhibitors available, the demand for effective drugs persists. Our research leverages AI, specifically machine learning and deep learning, to expedite the screening of novel ALK inhibitors. The dataset comprises 26,168 compounds, collected from the scientific literature, that were tested for ALK inhibitory activity. The XGBoost model achieved an external validation (EV) F1 score of 0.921 and an EV average precision (AP) of 0.961, along with a cross-validation (CV) F1 score of 0.888 ± 0.039 and a CV-AP of 0.939 ± 0.032. The artificial neural network (ANN) performed comparably, with an EV-F1 score of 0.930 and an EV-AP of 0.955, complemented by a CV-F1 score of 0.891 ± 0.037 and a CV-AP of 0.934 ± 0.040. We also compare these traditional machine learning models with a graph neural network (GNN) model developed in our recent work. The results show that, despite advances in neural network models, the traditional machine learning models outperform the GNN model. Combining these models with a consensus docking model, we virtually screened 120,571 compounds and identified three promising ALK inhibitor candidates: CHEMBL1689515, CHEMBL2380351, and CHEMBL102714. We recommend further molecular dynamics simulations, in vitro assays, and exploration of advanced AI models such as convolutional and recurrent neural networks (CNNs and RNNs) for molecular optimization.
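The evaluation metrics quoted in the abstract (F1, average precision, and cross-validation mean ± standard deviation) can be illustrated with a minimal pure-Python sketch. It is not the authors' evaluation code: it assumes binary labels with 1 = active inhibitor, computes AP as the step-wise, non-interpolated sum Σ(Rₙ − Rₙ₋₁)·Pₙ, and reports the ± figure as a population standard deviation over folds, since the abstract does not say which variant was used. The function names are hypothetical.

```python
from statistics import mean, pstdev


def f1_score(y_true, y_pred):
    """F1 = 2PR / (P + R) for binary labels, where 1 marks the positive (active) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined), so F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores):
    """AP = sum_n (R_n - R_{n-1}) * P_n, walking thresholds from the highest score down."""
    n_pos = sum(1 for t in y_true if t == 1)
    if n_pos == 0:
        raise ValueError("AP is undefined without positive samples")
    pairs = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    ap, tp, fp, prev_recall = 0.0, 0, 0, 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this score so that ties share one threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def summarize_folds(fold_scores):
    """Return (mean, population std) of per-fold scores, i.e. the 'mean ± std' pair (assumed pstdev)."""
    return mean(fold_scores), pstdev(fold_scores)
```

In practice, scikit-learn's `f1_score` and `average_precision_score` compute the same quantities; the sketch only makes the arithmetic behind the reported EV and CV numbers explicit.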

Keywords: ALK, XGBoost, Artificial Neural Network, Graph Neural Network, Molecular Docking.

This paper is under review.

Link | GitHub code