Deep learning model for metagenome fragment classification using spaced k-mers feature extraction

Nur Choiriyati  -  Department of Computer Science, IPB University, Indonesia
Yandra Arkeman  -  Department of Agro-Industrial Engineering, IPB University, Indonesia
*Wisnu Ananta Kusuma  -  Department of Computer Science, IPB University, Indonesia
Received: 3 Jul 2019; Revised: 5 May 2020; Accepted: 25 May 2020; Published: 31 Jul 2020; Available online: 3 Jul 2020.
Open Access. Copyright (c) 2020 Jurnal Teknologi dan Sistem Komputer
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Article Info
Section: Original Research Articles
Language: ID
Abstract
An open challenge in bioinformatics is the analysis of sequenced metagenomes from various environments. Several studies have demonstrated bacterial classification at the genus level using k-mers for feature extraction, where higher values of k give better accuracy but are costly in computational resources and time. In this research, the spaced k-mers method was used to extract sequence features with the pattern 111 1111 10001, where 1 marks a position that must match and 0 marks a position that may or may not match. Deep learning currently provides the best solutions to many problems in image recognition, speech recognition, and natural language processing. Two deep learning architectures, a Deep Neural Network (DNN) and a Convolutional Neural Network (CNN), were trained for taxonomic classification of metagenome data, with spaced k-mers used for feature extraction. The DNN classifier reached 90.89% accuracy and the CNN classifier reached 88.89% accuracy at the genus level.
Keywords: classification; deep learning; metagenomes; spaced k-mers
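
The spaced k-mers idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `spaced_kmer_counts` and `frequency_vector`, the short demo mask `1101`, and the choice to skip windows containing non-ACGT symbols are all assumptions made for illustration. Treating a pattern written with spaces as one continuous mask is likewise an assumption.

```python
from collections import Counter
from itertools import product


def spaced_kmer_counts(sequence, mask):
    """Slide `mask` along `sequence` and count the bases found under '1' positions.

    Positions marked '0' are ignored, so they may or may not match.
    Spaces in `mask` are dropped (assumption: the mask is one continuous pattern).
    """
    mask = mask.replace(" ", "")
    keep = [i for i, m in enumerate(mask) if m == "1"]
    seq = sequence.upper()
    counts = Counter()
    for start in range(len(seq) - len(mask) + 1):
        window = seq[start:start + len(mask)]
        key = "".join(window[i] for i in keep)
        # Assumption: windows with ambiguous symbols (e.g. N) are skipped.
        if set(key) <= set("ACGT"):
            counts[key] += 1
    return counts


def frequency_vector(counts, weight):
    """Turn counts into a fixed-length vector of 4**weight relative frequencies."""
    total = sum(counts.values())
    kmers = ("".join(p) for p in product("ACGT", repeat=weight))
    return [counts[k] / total if total else 0.0 for k in kmers]


# Hypothetical demo: mask 1101 has three '1' positions, giving 4**3 = 64 features.
demo_counts = spaced_kmer_counts("ACGTACGT", "1101")
demo_vector = frequency_vector(demo_counts, weight=3)
```

In this demo the windows `ACGT` at positions 0 and 4 both reduce to `ACT`, so `demo_counts["ACT"]` is 2. The vector length grows as 4 to the power of the number of `1` positions, which is why the mask weight, rather than its total length, determines the feature dimension passed to a classifier.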
