Kombinasi metode NER-OCR untuk meningkatkan efisiensi pengambilan informasi di poster berbahasa Indonesia

Combining NER and OCR methods to improve information retrieval efficiency for Indonesian-language posters

Ahmad Syarif Rosidy  -  STMIK PPKIA Pradnya Paramita, Indonesia
*Tubagus Mohammad Akhriza  -  STMIK PPKIA Pradnya Paramita, Indonesia
Mochammad Husni  -  STMIK PPKIA Pradnya Paramita, Indonesia
Received: 27 Feb 2020; Revised: 9 Jul 2020; Accepted: 10 Jul 2020; Published: 31 Oct 2020; Available online: 13 Jul 2020.
Event organizers in Indonesia often use websites to publicize their events through digital posters. However, manually transferring information from posters to websites is time-consuming, and the burden grows as more posters are uploaded. Information retrieval methods such as Named Entity Recognition (NER) have also rarely been discussed in the literature for Indonesian posters. Moreover, applying NER to Indonesian text makes high accuracy hard to achieve, because Indonesian is a low-resource language with few corpora available as references. This study proposes a solution that reduces the time needed to extract information from digital posters. The proposed solution combines NER with Optical Character Recognition (OCR), which recognizes the text on posters, and is developed with a relevant training corpus to improve accuracy. Experimental results on 50 test digital posters show that the system improves time efficiency by 94% and achieves 82-92% accuracy for several extracted information entities.

Keywords: digital posters; information retrieval; named entity recognition; optical character recognition
