In drug design, an inhibitor compound is commonly used to regulate enzyme activity in order to treat a particular disease. Inhibitors are classified using docking software, which simulates the binding of a compound (a new inhibitor candidate) to the target enzyme. DUD-E is a database for docking simulation whose data are high-dimensional, which makes machine learning a feasible analytical tool. A compound can be classified as a ligand or a decoy from its characteristics, but using many characteristics creates problems for the machine learning algorithm. This paper discusses feature selection analysis to identify the compound characteristics that effectively distinguish ligands from decoys. We examine Mutual Information-based Feature Selection (MIFS), Correlation-based Feature Selection (CFS), and the Fast Correlation-Based Filter (FCBF). The results show that FCBF always selects the fewest features and yields the fastest classification runtime. The highest classification accuracy is obtained when k-NN uses all features; however, the accuracy obtained with the selected features differs only slightly. CFS performs well on Data-A with an accuracy of 89.55%, while MIFS outperforms the other methods on Data-B and Data-C with classification accuracies of 92.34% and 95.20%, respectively.
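To make the filter idea concrete, the following is a minimal sketch of FCBF-style selection using symmetrical uncertainty, written in plain Python. It is an illustration under stated assumptions, not the implementation used in the paper: features are assumed to be already discretized, the threshold `delta` and the function names are hypothetical, and the toy data are invented rather than drawn from DUD-E.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def fcbf(columns, labels, delta=0.0):
    """Return indices of selected features, most relevant first.

    columns: list of feature columns, each a list of discrete values.
    labels:  list of class labels (e.g. "ligand" / "decoy").
    delta:   relevance threshold on SU(feature, class); assumed parameter.
    """
    relevance = [symmetrical_uncertainty(col, labels) for col in columns]
    # Step 1: keep relevant features, ordered by decreasing SU with the class.
    candidates = sorted(
        (i for i, su in enumerate(relevance) if su > delta),
        key=lambda i: relevance[i],
        reverse=True,
    )
    # Step 2: drop every feature that is at least as correlated with an
    # already selected feature as it is with the class (i.e. redundant).
    selected = []
    while candidates:
        best = candidates.pop(0)
        selected.append(best)
        candidates = [
            j for j in candidates
            if symmetrical_uncertainty(columns[best], columns[j]) < relevance[j]
        ]
    return selected


# Toy data, invented for illustration only (not taken from DUD-E).
labels = ["ligand"] * 4 + ["decoy"] * 4
columns = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # f0: perfectly predicts the class
    [1, 1, 1, 1, 0, 0, 0, 0],  # f1: duplicate of f0, hence redundant
    [1, 0, 1, 0, 1, 0, 1, 0],  # f2: independent of the class
    [1, 1, 1, 0, 0, 0, 0, 1],  # f3: weakly predictive, redundant given f0
]
selected = fcbf(columns, labels)
print(selected)
```

On this toy input only feature 0 survives: f2 carries no information about the class, and f1 and f3 add nothing beyond f0. The reduced feature set would then be passed to a classifier such as k-NN.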
|Number of pages|9|
|Journal|Procedia Computer Science|
|Publication status|Published - 1 Jan 2018|
|Event|3rd International Neural Network Society Conference on Big Data and Deep Learning, INNS BDDL 2018 - Sanur, Bali, Indonesia|
|Duration|17 Apr 2018 → 19 Apr 2018|