Kombinasi metode NER-OCR untuk meningkatkan efisiensi pengambilan informasi di poster berbahasa Indonesia

Combining the NER-OCR methods to improve information retrieval efficiency in the Indonesian posters

Ahmad Syarif Rosidy; Tubagus Mohammad Akhriza; Mochammad Husni
STMIK PPKIA Pradnya Paramita, Indonesia

DOI: https://doi.org/10.14710/jtsiskom.2020.13686

Received: 27 Feb 2020; Revised: 9 Jul 2020; Accepted: 10 Jul 2020; Available online: 13 Jul 2020; Published: 31 Oct 2020.


Abstract

Event organizers in Indonesia often use websites to disseminate information about their events through digital posters. However, manually transferring information from posters to websites takes considerable time, especially as the number of uploaded posters keeps growing. Moreover, information retrieval methods such as Named Entity Recognition (NER) are still rarely discussed in the literature for Indonesian posters, and applying NER to Indonesian text makes high accuracy hard to achieve: Indonesian is a low-resource language, so few reference corpora are available. This study proposes a solution that improves the time efficiency of extracting information from digital posters. It combines the Optical Character Recognition (OCR) method, which recognizes the text on a poster, with an NER method developed using a relevant training corpus to improve accuracy. Experiments on 50 digital test posters show that the system increases time efficiency by 94%, with an accuracy of 82-92% for the several information entities extracted.
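The abstract does not spell out the pipeline, so the following is only a rough illustration of the OCR-then-NER idea. It assumes the OCR step has already turned a poster image into plain text, and it stands in for the trained NER model with hand-written regular expressions for a few entity types an Indonesian event poster typically carries. The entity labels (`DATE`, `TIME`, `LOCATION`, `PHONE`), the patterns, the sample text, and the time-efficiency formula (share of manual processing time saved) are all assumptions, not the authors' definitions.

```python
import re

# ASSUMPTION: Indonesian month names, used to spot date entities.
BULAN = ("januari|februari|maret|april|mei|juni|juli|agustus|"
         "september|oktober|november|desember")

# ASSUMPTION: a rule-based stand-in for the trained NER model in the study.
# When a pattern has a capturing group, that group is the entity value.
PATTERNS = {
    "DATE": re.compile(rf"\b\d{{1,2}}\s+(?:{BULAN})\s+\d{{4}}\b", re.IGNORECASE),
    "TIME": re.compile(r"\b\d{1,2}[.:]\d{2}\s*(?:WIB|WITA|WIT)\b", re.IGNORECASE),
    "LOCATION": re.compile(r"^(?:tempat|lokasi)\s*:\s*(.+)$",
                           re.IGNORECASE | re.MULTILINE),
    "PHONE": re.compile(r"(?<!\d)(?:\+62|0)8\d{8,11}(?!\d)"),
}

# Made-up OCR output for an event poster (not from the study's data set).
SAMPLE_OCR_TEXT = """SEMINAR NASIONAL TEKNOLOGI INFORMASI
Hari/Tanggal: Sabtu, 14 Maret 2020
Waktu: 08.00 WIB - selesai
Tempat: Gedung Serbaguna
CP: 081234567890"""


def extract_entities(ocr_text: str) -> dict[str, list[str]]:
    """Return every match of each entity pattern found in the OCR text."""
    found: dict[str, list[str]] = {}
    for label, pattern in PATTERNS.items():
        # group(lastindex or 0): the captured value if the pattern has a group,
        # otherwise the whole match.
        found[label] = [m.group(m.lastindex or 0).strip()
                        for m in pattern.finditer(ocr_text)]
    return found


def time_efficiency(manual_seconds: float, system_seconds: float) -> float:
    """Percentage of manual processing time saved (ASSUMED definition)."""
    return 100.0 * (manual_seconds - system_seconds) / manual_seconds


if __name__ == "__main__":
    for label, values in extract_entities(SAMPLE_OCR_TEXT).items():
        print(label, values)
    # Made-up timings: 120 s by hand versus 30 s with the pipeline.
    print(f"time efficiency: {time_efficiency(120, 30):.1f}%")
```

Running the script prints one list per entity label and a 75.0% saving for the made-up timings. A trained sequence-labeling model, as the abstract describes, would take the place of `PATTERNS` and could also cover entities such as event names or organizers, which fixed patterns handle poorly.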

Note: This article has supplementary file(s): Accuracy calculation spreadsheet (type: Research Results; 33 KB).

Keywords: digital posters; information retrieval; named entity recognition; optical character recognition

Funding: STMIK PPKIA Pradnya Paramita, Indonesia


Article Info

Section: Original Research Articles

Language: ID

In Volume 8, Issue 4, Year 2020 (October 2020)






EDITORIAL OFFICE OF JURNAL TEKNOLOGI DAN SISTEM KOMPUTER