Comparison of distance measurement on k-nearest neighbour in textual data classification

*Wahyono Wahyono  -  Department of Computer Science and Electronics, Universitas Gadjah Mada, Indonesia
I Nyoman Prayana Trisna  -  Master of Computer Science, Universitas Gadjah Mada, Indonesia
Sarah Lintang Sariwening  -  Master of Computer Science, Universitas Gadjah Mada, Indonesia
Muhammad Fajar  -  Master of Computer Science, Universitas Gadjah Mada, Indonesia
Danur Wijayanto  -  Master of Computer Science, Universitas Gadjah Mada, Indonesia
Received: 15 Jun 2019; Revised: 22 Oct 2019; Accepted: 5 Nov 2019; Published: 31 Jan 2020; Available online: 15 Nov 2019.
Open Access Copyright (c) 2020 Jurnal Teknologi dan Sistem Komputer
Creative Commons License This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Article Info
Section: Original Research Articles
Language: ID
Abstract
One algorithm for classifying textual data in automatic document-organizing applications is k-nearest neighbour (KNN), which converts word representations into vectors. The distance calculation in the KNN algorithm is essential for measuring the closeness between data elements. This study compares four distance measures commonly used in KNN, namely Euclidean, Chebyshev, Manhattan, and Minkowski. The dataset consists of 448 comments on Eminem's YouTube videos. The results show that Euclidean and Minkowski distances in the KNN algorithm achieve the best results compared to Chebyshev and Manhattan. The best KNN results are obtained when the K value is 3.
Keywords: KNN; textual data; distance measurement; Euclidean; Chebyshev; Manhattan; Minkowski
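The four distance measures compared in the abstract, together with a plain majority-vote KNN, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice of p = 3 for Minkowski, and the toy two-dimensional vectors (standing in for word-count or TF-IDF vectors) are all assumptions made for the example.

```python
import math

# Distance measures for two equal-length numeric vectors a and b,
# e.g. bag-of-words or TF-IDF representations of two comments.

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest absolute coordinate difference.
    return max(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p=3):
    # Generalization of Euclidean (p=2) and Manhattan (p=1);
    # p=3 here is an arbitrary illustrative choice.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def knn_predict(train, labels, query, k=3, dist=euclidean):
    # Rank training vectors by distance to the query and take a
    # majority vote among the labels of the k nearest neighbours.
    ranked = sorted(range(len(train)), key=lambda i: dist(train[i], query))
    votes = [labels[i] for i in ranked[:k]]
    return max(set(votes), key=votes.count)

# Toy usage: two "ham" and two "spam" points; the query sits near
# the spam cluster, so with k=3 the vote comes out "spam".
train = [[0, 0], [0, 1], [5, 5], [6, 5]]
labels = ["ham", "ham", "spam", "spam"]
print(knn_predict(train, labels, [5, 4], k=3, dist=euclidean))
```

Swapping `dist=euclidean` for `manhattan`, `chebyshev`, or `minkowski` reproduces the kind of comparison the study performs, since only the neighbour ranking changes between runs.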


