Comparison of Feature Selection Methods to Classify Inhibitors in DUD-E Database

Heri Kuswanto, Renny Yunia Nurhidayah, Hayato Ohwada

Research output: Contribution to journal (conference article)

1 Citation (Scopus)

Abstract

In drug design, an inhibitor compound is commonly used to regulate the activity of an enzyme in order to treat a particular disease. Candidate inhibitors are typically classified with docking software, which simulates the binding of a compound (a new inhibitor candidate) to the target enzyme. DUD-E is a database for docking simulation whose data are high-dimensional, which makes machine learning a feasible analytical tool. A compound can be classified as a ligand or a decoy from its characteristics, but the large number of characteristics poses a problem for machine learning algorithms. This paper discusses feature selection analysis for identifying the compound characteristics that effectively distinguish ligands from decoys. It examines Mutual Information-based Feature Selection (MIFS), Correlation-based Feature Selection (CFS), and the Fast Correlation-Based Filter (FCBF). The results show that FCBF always selects the fewest features and gives the fastest classification runtime. The highest classification accuracy is obtained when all features are used in classification by k-NN; however, the accuracy obtained with selected features differs only slightly. CFS performs well for Data-A with an accuracy of 89.55%, while MIFS outperforms the other methods for Data-B and Data-C with classification accuracies of 92.34% and 95.20%, respectively.
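The pipeline summarized in the abstract (select features, then classify compounds as ligand or decoy with k-NN) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the function names (`mutual_information`, `mifs`, `knn_predict`, `project`), the choice of `beta`, and the assumption of discrete features are all hypothetical. The selection step uses the commonly described greedy MIFS criterion, relevance to the label minus `beta` times redundancy with already selected features.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mifs(columns, labels, k, beta=1.0):
    """Greedy MIFS sketch: pick k feature indices by relevance - beta * redundancy."""
    selected, remaining = [], list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            relevance = mutual_information(columns[j], labels)
            redundancy = sum(mutual_information(columns[j], columns[s]) for s in selected)
            return relevance - beta * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    order = sorted(
        range(len(train_rows)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_rows[i], query)),
    )
    return Counter(train_labels[i] for i in order[:k]).most_common(1)[0][0]


def project(row, selected):
    """Keep only the selected feature positions of a row."""
    return tuple(row[j] for j in selected)


# Hypothetical toy data: feature 1 duplicates feature 0; feature 2 is independent of feature 0.
rows = [(0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 1),
        (1, 1, 0), (1, 1, 0), (1, 1, 1), (1, 1, 1)]
labels = ["decoy", "decoy", "ligand", "ligand",
          "ligand", "ligand", "ligand", "ligand"]

columns = list(zip(*rows))            # one tuple per feature
selected = mifs(columns, labels, k=2)  # the duplicated feature is penalized as redundant
train = [project(r, selected) for r in rows]

print("selected features:", selected)
print("prediction:", knn_predict(train, labels, project((0, 0, 0), selected)))
```

In this toy setting the redundancy penalty keeps only one of the two identical features, which mirrors the motivation for feature selection on high-dimensional data: fewer features to compare in the k-NN distance computation, at a possibly small cost in accuracy.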

Original language: English
Pages (from-to): 194-202
Number of pages: 9
Journal: Procedia Computer Science
Volume: 144
Publication status: Published - 1 Jan 2018
Event: 3rd International Neural Network Society Conference on Big Data and Deep Learning, INNS BDDL 2018 - Sanur, Bali, Indonesia
Duration: 17 Apr 2018 to 19 Apr 2018

Keywords

  • DUD-E
  • accuracy
  • feature
  • k-NN
  • runtime
