Kombinasi metode NER-OCR untuk meningkatkan efisiensi pengambilan informasi di poster berbahasa Indonesia

Combining the NER-OCR methods to improve information retrieval efficiency in the Indonesian posters

Ahmad Syarif Rosidy  -  STMIK PPKIA Pradnya Paramita, Indonesia
*Tubagus Mohammad Akhriza  -  STMIK PPKIA Pradnya Paramita, Indonesia
Mochammad Husni  -  STMIK PPKIA Pradnya Paramita, Indonesia
Received: 27 Feb 2020; Revised: 9 Jul 2020; Accepted: 10 Jul 2020; Published: 31 Oct 2020; Available online: 13 Jul 2020.
Supplementary file: Accuracy calculation spreadsheet (type: Research Results, 33 KB)
Open Access Copyright (c) 2020 Jurnal Teknologi dan Sistem Komputer under http://creativecommons.org/licenses/by-sa/4.0.

Article Info
Section: Original Research Articles
Language: ID
Event organizers in Indonesia often use websites to publicize their events through digital posters. However, manually transferring information from posters to websites is time-consuming, and the burden grows with the number of posters uploaded. Information retrieval methods such as Named Entity Recognition (NER) have also rarely been discussed for Indonesian posters in the literature. Moreover, applying NER to Indonesian text faces an accuracy challenge: Indonesian is a low-resource language, so few corpora are available as references. This study proposes a solution to reduce the time needed to extract information from digital posters. The solution combines NER with Optical Character Recognition (OCR), which recognizes the text on the posters, and is supported by a relevant training corpus to improve accuracy. Experimental results on 50 test digital posters show that the system increases time efficiency by 94% with 82-92% accuracy for several extracted information entities.
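
The two-stage idea described above (text recognized by OCR is passed to an entity extractor) can be illustrated with a minimal sketch. This is not the authors' implementation: the OCR step is assumed to have already produced a plain-text string, the sample poster text is invented, and a few hand-written regular expressions stand in for a trained NER model. The names `OCR_TEXT`, `PATTERNS`, and `extract_entities` are hypothetical.

```python
import re

# Hypothetical OCR output for one poster; in a real pipeline this string
# would come from an OCR engine rather than being hard-coded.
OCR_TEXT = """SEMINAR NASIONAL TEKNOLOGI INFORMASI
Sabtu, 14 Maret 2020
Pukul 09.00 WIB
Tempat: Aula Kampus Utama, Malang
"""

# Indonesian month names used by the date pattern (an assumption of this sketch).
MONTHS = ("Januari|Februari|Maret|April|Mei|Juni|Juli|"
          "Agustus|September|Oktober|November|Desember")

# Simple rule-based patterns standing in for a trained NER model.
PATTERNS = {
    "DATE": re.compile(rf"\b(\d{{1,2}} (?:{MONTHS}) \d{{4}})\b"),
    "TIME": re.compile(r"\b(\d{1,2}[.:]\d{2}) ?(?:WIB|WITA|WIT)\b"),
    "LOCATION": re.compile(r"^(?:Tempat|Lokasi)\s*:\s*(.+)$", re.MULTILINE),
}


def extract_entities(ocr_text: str) -> dict:
    """Return the first match of each entity pattern found in the OCR text."""
    entities = {}
    for label, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            entities[label] = match.group(1).strip()
    return entities


if __name__ == "__main__":
    print(extract_entities(OCR_TEXT))
```

A rule-based extractor like this only handles posters that follow the assumed layout; the study instead relies on a NER model supported by a relevant training corpus, which this sketch does not attempt to reproduce.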

Note: This article has supplementary file(s).

Keywords: digital posters; information retrieval; named entity recognition; optical character recognition
