
Identification of the distribution village maturation: Village classification using Density-based spatial clustering of applications with noise

1Informatics Engineering Department, Universitas Islam Negeri Sultan Syarif Kasim Riau. Jl. HR. Soebrantas Panam Km. 15 No. 155, Tuah Madani, Kec. Tampan, Kampar Regency, Riau 28293, Indonesia

2School of Computing, Faculty Engineering, Universiti Teknologi Malaysia. UTM Johor Bahru, Johor 81310, Malaysia

3Information System Department, Universitas Islam Negeri Sultan Syarif Kasim Riau. Jl. HR. Soebrantas Panam Km. 15 No. 155, Tuah Madani, Kec. Tampan, Kampar Regency, Riau 28293, Indonesia

4 Prism Lab, Insa Center Val de Loire. 88 Boulevard Lahitolle, Bourges 18000, France

Received: 3 Dec 2020; Revised: 19 Mar 2021; Accepted: 24 Apr 2021; Published: 31 Jul 2021; Available online: 26 Apr 2021.
Open Access Copyright (c) 2021 The Authors. Published by Department of Computer Engineering, Universitas Diponegoro
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Measuring rural development is difficult because each village has its own needs and conditions. This study classifies village performance using social, economic, and ecological indices. Data on 1,591 villages from the Community and Village Empowerment Office of Riau Province, Indonesia, are grouped into five village maturation classes: very under-developed, under-developed, developing, developed, and independent. Density-based spatial clustering of applications with noise (DBSCAN) is used to mine 13 of the villages’ attributes, and Python is used to run and evaluate the DBSCAN analysis. The grouping achieves a silhouette coefficient of 0.8231, indicating good clustering performance. The epsilon and minimum points values are considered in the DBSCAN evaluation with percentage-split simulations. This grouping can serve as a guideline for governments in distributing rural development subsidies more optimally.
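
The clustering and evaluation steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: it uses only the Python standard library, the data points are synthetic placeholders standing in for three index values per village (the supplementary data names IKS, IKL, and IKE), and the `eps` and `min_pts` values are arbitrary choices for this toy data rather than the parameters used in the study. Excluding noise points from the silhouette average is also an assumption of this sketch.

```python
import math

NOISE = -1  # label given to points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def silhouette_coefficient(points, labels):
    """Mean of s = (b - a) / max(a, b) over non-noise points (assumption)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        if lab != NOISE:
            clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for lab, members in clusters.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(points[i], points[j])
                    for j in members if j != i) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(sum(math.dist(points[i], points[j]) for j in other) / len(other)
                    for other_lab, other in clusters.items() if other_lab != lab)
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Synthetic stand-in data: two tight groups of 27 points plus one outlier.
offsets = [(dx, dy, dz) for dx in (-0.02, 0.0, 0.02)
           for dy in (-0.02, 0.0, 0.02) for dz in (-0.02, 0.0, 0.02)]


def blob(center):
    return [tuple(c + o for c, o in zip(center, off)) for off in offsets]


villages = blob((0.3, 0.3, 0.3)) + blob((0.8, 0.8, 0.8)) + [(0.55, 0.10, 0.90)]

labels = dbscan(villages, eps=0.05, min_pts=5)
score = silhouette_coefficient(villages, labels)
print("clusters found:", len(set(labels) - {NOISE}))
print("noise points:", labels.count(NOISE))
print("silhouette coefficient:", round(score, 4))
```

In practice, scikit-learn's `sklearn.cluster.DBSCAN` and `sklearn.metrics.silhouette_score` cover the same two steps; note that `silhouette_score` treats the noise label as an ordinary cluster unless noise points are filtered out first.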

Note: This article has supplementary file(s).

Supplementary Data
Subject: The collected 2018 data on villages in Riau Province and the results of the DBSCAN analysis of village classification on three main attributes, namely IKS, IKL, and IKE
Type: Dataset, Data Analysis
Keywords: clustering; density-based spatial clustering of applications with noise; Python; silhouette coefficient; village maturity
Funding: Universitas Islam Negeri Sultan Syarif Kasim Riau, Indonesia; Riau Province Community and Village Empowerment Service, Indonesia; Universiti Teknologi Malaysia; Insa Center Val de Loire, Bourges, France


