
Prediksi interaksi protein-protein berbasis sekuens protein menggunakan fitur autocorrelation dan machine learning

Sequence-based prediction of protein-protein interaction using autocorrelation features and machine learning

1 Department of Computer Science, IPB University. Jl. Raya Dramaga, Kampus IPB Dramaga, Bogor 16680, Indonesia

2 Tropical Biopharmaca Research Center, IPB University. Jl. Taman Kencana No. 3, Bogor 16128, Indonesia

Received: 18 Nov 2020; Revised: 14 Sep 2021; Accepted: 4 Jan 2022; Published: 31 Jan 2022.
Open Access. Copyright (c) 2022 The authors. Published by Department of Computer Engineering, Universitas Diponegoro.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Abstract
Protein-protein interactions (PPIs) help define a protein's function by revealing the protein's position in a complex network of protein interactions. The number of PPIs identified so far is relatively small, so several studies have sought to predict PPIs from protein sequence information. This research compares the performance of three autocorrelation methods (Moran, Geary, and Moreau-Broto) in extracting protein sequence features for PPI prediction. The features extracted by each method are then used with three machine learning algorithms: k-Nearest Neighbor (KNN), Random Forest (RF), and Support Vector Machine (SVM). Prediction models built on the three autocorrelation methods achieve high average accuracy: 95.34% for Geary with KNN, 97.43% for Geary with RF, and 97.11% for both Geary and Moran with SVM. In addition, interacting protein pairs tend to have similar autocorrelation characteristics. Thus, autocorrelation features can be used to predict PPIs well.
Keywords: autocorrelation; machine learning; protein-protein interaction; protein sequence
Funding: Kementerian Riset, Teknologi, dan Pendidikan Tinggi (Ministry of Research, Technology, and Higher Education) under contract 4168/IT3.I.1/PN/2019
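
The three autocorrelation descriptors named in the abstract can be illustrated with a minimal sketch. It uses the commonly cited lag-based definitions of the normalized Moreau-Broto, Moran, and Geary descriptors; the exact formulation, property scales, and maximum lag used by the authors are not given in this section, so everything below is an assumption. In particular, the Kyte-Doolittle hydropathy index is only a stand-in property scale, and the function names (`profile`, `descriptor`, `pair_features`) and the concatenation of the two proteins' descriptors into one pair vector are hypothetical choices made for illustration.

```python
from statistics import mean

# Assumption: Kyte-Doolittle hydropathy index as a stand-in property scale.
# The paper may use other physicochemical properties.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def profile(sequence, scale=KYTE_DOOLITTLE):
    """Map a one-letter amino acid sequence to a list of property values."""
    return [scale[aa] for aa in sequence]


def moreau_broto(values, lag):
    """Normalized Moreau-Broto: mean product of values `lag` positions apart."""
    n = len(values)
    return sum(values[i] * values[i + lag] for i in range(n - lag)) / (n - lag)


def moran(values, lag):
    """Moran: lagged covariance around the mean, divided by the variance."""
    n = len(values)
    avg = mean(values)
    num = sum((values[i] - avg) * (values[i + lag] - avg)
              for i in range(n - lag)) / (n - lag)
    den = sum((v - avg) ** 2 for v in values) / n
    return num / den


def geary(values, lag):
    """Geary: mean squared lagged difference, divided by the sample variance."""
    n = len(values)
    avg = mean(values)
    num = sum((values[i] - values[i + lag]) ** 2
              for i in range(n - lag)) / (2 * (n - lag))
    den = sum((v - avg) ** 2 for v in values) / (n - 1)
    return num / den


def descriptor(sequence, method, max_lag=3):
    """Descriptor values for lags 1..max_lag (max_lag is a hypothetical default)."""
    values = profile(sequence)
    return [method(values, lag) for lag in range(1, max_lag + 1)]


def pair_features(seq_a, seq_b, method, max_lag=3):
    """Hypothetical pair encoding: concatenate the two proteins' descriptors."""
    return descriptor(seq_a, method, max_lag) + descriptor(seq_b, method, max_lag)


# Usage with two made-up peptide sequences (not real data from the study).
features = pair_features("MKTAYIAKQR", "GSHMLEDPVA", geary)
```

A vector such as `features` could then be passed to a classifier like KNN, RF, or SVM. The sketch assumes each sequence is longer than `max_lag` and has a non-constant property profile; otherwise the Moran and Geary denominators are zero.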


