DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information.

Journal of computer-aided molecular design (2019-05-28)
Farman Ali, Saeed Ahmed, Zar Nawab Khan Swati, Shahid Akbar
ABSTRACT

DNA-binding proteins (DBPs) participate in various biological processes, including DNA replication, recombination, and repair. In the human genome, about 6-7% of genes encode these proteins. Some DBPs shape DNA into a compact structure known as chromatin, while others regulate chromosome packaging and transcription. In the pharmaceutical industry, DBPs are used as a key component of antibiotics, steroids, and cancer drugs. These proteins are also involved in biophysical, biological, and biochemical studies of DNA. Because of their crucial role in these biological activities, the identification of DBPs is an active topic in protein science. A series of experimental and computational methods have been proposed; however, some did not achieve the desired results, while others are inadequate in accuracy and reliability. More intelligent computational predictors are therefore still highly desirable. In this work, we introduce a novel computational method, DP-BINDER, based on physicochemical and evolutionary information. We captured local, highly discriminative features from the physicochemical properties of primary protein sequences via normalized Moreau-Broto autocorrelation (NMBAC), and evolutionary information via position-specific scoring matrix-transition probability composition (PSSM-TPC) and pseudo-position-specific scoring matrix (PsePSSM), using training and independent datasets. The optimal features were selected from the fused features by support vector machine-recursive feature elimination with correlation bias reduction (SVM-RFE + CBR) and were fed into random forest (RF) and support vector machine (SVM) classifiers. Our method attained 92.46% and 89.58% accuracy with jackknife and ten-fold cross-validation, respectively, on the training dataset, and 81.17% accuracy on the independent dataset for the prediction of DBPs. These results indicate that our method attained the highest success rate reported in the literature.
The superiority of DP-BINDER over existing approaches is due to several factors, including the extraction of locally dominant features via effective feature descriptors, the use of an appropriate feature selection algorithm, and an effective classifier.
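
To make the NMBAC descriptor mentioned in the abstract more concrete, the following is a minimal sketch of the commonly used formulation: a physicochemical scale is standardized over the 20 amino acids, and for each lag the products of the values at positions i and i + lag are averaged along the sequence. The abstract does not state which properties or how many lags the authors used, so the Kyte-Doolittle hydropathy scale, the function names `normalize_scale` and `nmbac`, and the `max_lag` parameter are illustrative assumptions, not the paper's implementation.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy index, used here only as an example property.
# Assumption: the abstract does not say which physicochemical scales DP-BINDER uses.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def normalize_scale(scale):
    """Standardize a property scale to zero mean and unit (population) std."""
    values = list(scale.values())
    mu = mean(values)
    sigma = pstdev(values)
    return {aa: (v - mu) / sigma for aa, v in scale.items()}


def nmbac(sequence, scale, max_lag):
    """Return one autocorrelation value per lag in 1..max_lag for one property.

    Each value is the mean of x[i] * x[i + lag] over all valid positions i,
    where x is the standardized property value of each residue.
    Residues missing from the scale (e.g. 'X') raise a KeyError.
    """
    norm = normalize_scale(scale)
    values = [norm[aa] for aa in sequence]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    features = []
    for lag in range(1, max_lag + 1):
        total = sum(values[i] * values[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


if __name__ == "__main__":
    # Hypothetical short peptide, for demonstration only.
    print(nmbac("MKRAVLLISDNE", KYTE_DOOLITTLE, max_lag=3))
```

Repeating the call for several property scales and concatenating the results would give a fixed-length vector per protein, which could then be fused with PSSM-derived features before feature selection.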