Finding unknown disease-related genes by comparing random forest results to secondary data in medical science study

Takahiro Koiwa, Kazutaka Nishiwaki, Hayato Ohwada

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Finding disease-related genes is important in drug discovery. However, Alzheimer's disease involves many genes, and conducting experiments to find disease-related genes is costly. Machine learning on gene expression microarray data is therefore a suitable way to address this problem, although its results are not always fully correct. The present study aims to find disease-related genes using random forest and to compare the results with previous medical research, so that genes are identified as disease-related by two methods. For random forest, we use microarray data from GEO data sets: we take the genes common to every data set as features and combine the data into a single data set. Genes with high importance are selected. Random forest found 51 genes. In contrast, 6167 genes were found from previous medical science research. These genes were ranked using the square of the impact factor, where a journal's impact factor is the average number of citations received per paper published in that journal during the two preceding years. Comparing the random forest results with the previous studies, we found 31 genes in common. Four of these genes are related to the GENE database, and the remaining 27 are potentially related genes or unknown genes found in this study. The high-ranking genes among them are considered very likely to be related, and the low-ranking genes should also be considered. Genes extracted by the first method but not by the second are thought to be new, potentially related genes. In conclusion, comparing random forest results to another related study in medical science is a powerful method to find unknown genes related to Alzheimer's disease.
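The comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The gene symbols and impact factors are hypothetical toy values, the function names `literature_scores` and `compare` are made up for illustration, and summing the squared impact factor over every paper that mentions a gene is an assumption: the abstract only states that the square of the impact factor is used for ranking.

```python
from collections import defaultdict


def literature_scores(mentions):
    """Score each gene from literature mentions.

    `mentions` is an iterable of (gene_symbol, journal_impact_factor) pairs.
    Assumption: a gene's score is the sum of squared impact factors over
    all papers that mention it.
    """
    scores = defaultdict(float)
    for gene, impact_factor in mentions:
        scores[gene] += impact_factor ** 2
    return dict(scores)


def compare(rf_genes, scores):
    """Return genes found by both methods, highest literature score first."""
    shared = set(rf_genes) & set(scores)
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(shared, key=lambda g: (-scores[g], g))


# Hypothetical toy inputs, not data from the study.
rf_genes = ["GENE_A", "GENE_B", "GENE_C"]  # genes selected by random forest
mentions = [
    ("GENE_A", 2.0),
    ("GENE_B", 5.0),
    ("GENE_B", 1.0),
    ("GENE_D", 9.0),
]

scores = literature_scores(mentions)
ranked = compare(rf_genes, scores)
rf_only = sorted(set(rf_genes) - set(scores))  # candidates absent from the literature

print(ranked)   # genes supported by both methods
print(rf_only)  # genes found only by random forest
```

In this toy setup, `ranked` plays the role of the 31 shared genes, and `rf_only` plays the role of the genes that random forest selects but the literature search does not.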

Original language: English
Title of host publication: Proceedings of the 2016 7th International Conference on Computational Systems-Biology and Bioinformatics, CSBio 2016
Publisher: Association for Computing Machinery
Pages: 24-27
Number of pages: 4
ISBN (Electronic): 9781450347945
DOIs: https://doi.org/10.1145/3029375.3029386
Publication status: Published - 19 Dec 2016
Event: 7th International Conference on Computational Systems-Biology and Bioinformatics, CSBio 2016 - Macau, Macao
Duration: 19 Dec 2016 to 22 Dec 2016

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 7th International Conference on Computational Systems-Biology and Bioinformatics, CSBio 2016
Country: Macao
City: Macau
Period: 19/12/16 to 22/12/16

Keywords

  • Big data
  • Gene expression analysis
  • Machine learning

Cite this

Koiwa, T., Nishiwaki, K., & Ohwada, H. (2016). Finding unknown disease-related genes by comparing random forest results to secondary data in medical science study. In Proceedings of the 2016 7th International Conference on Computational Systems-Biology and Bioinformatics, CSBio 2016 (pp. 24-27). (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3029375.3029386