Temu kembali dokumen sumber rujukan dalam sistem daur ulang teks

Nathaniel Clarence Haryanto; Lucia Dwi Krisnawati; Antonius Rachmat Chrismanto

doi:10.14710/jtsiskom.8.2.2020.140-149



Retrieval of source documents in a text reuse system


Department of Informatics, Universitas Kristen Duta Wacana, Indonesia

Received: 17 Oct 2019; Revised: 26 Feb 2020; Accepted: 13 Mar 2020; Available online: 20 Mar 2020; Published: 30 Apr 2020.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Abstract

The architecture of a text-reuse detection system consists of three main modules, namely source retrieval, text analysis, and knowledge-based postprocessing. Each module affects the accuracy of the detection output. This research therefore focuses on developing the source retrieval module for cases in which the source documents have been obfuscated to different degrees. Two term-weighting steps were applied to retrieve these documents. The first, local-word weighting, was applied to the test (reused) documents to select a query for each text segment. The second, tf-idf weighting, was used to index all documents in the corpus and served as the basis for computing the cosine similarity between each segment's query and the corpus documents. A two-step filtering technique was then applied to obtain the candidate source documents. On artificial text-reuse test cases, the system achieves a precision and a recall of 0.967 each, while its recall on simulated text-reuse cases is 0.66.
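The retrieval step summarized in the abstract (tf-idf indexing of the corpus, cosine similarity between a segment's query and each document, filtering of candidates, and evaluation by precision and recall) can be illustrated with a minimal Python sketch. It assumes lowercase whitespace tokenization, raw term counts multiplied by idf = log(N/df), and a single hypothetical similarity threshold (`threshold=0.1`) standing in for the filter. It does not reproduce the authors' local-word weighting for query selection or their two-step filtering, and all function names are illustrative only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopword removal
    return text.lower().split()


def build_tfidf_index(docs):
    """Return one tf-idf vector (term -> weight dict) per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per document
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term frequency
        vectors.append({t: c * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_candidates(query_terms, vectors, idf, threshold=0.1):
    """Rank corpus documents against one segment's query terms.

    query_terms: terms already selected from a text segment (selection scheme not shown).
    threshold: hypothetical cut-off standing in for the candidate filter.
    """
    tf = Counter(query_terms)
    query_vec = {t: c * idf.get(t, 0.0) for t, c in tf.items()}
    scores = [(i, cosine(query_vec, v)) for i, v in enumerate(vectors)]
    hits = [(i, s) for i, s in scores if s > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def precision_recall(retrieved, relevant):
    """Precision and recall of a set of retrieved document ids against the relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_pos = len(retrieved & relevant)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, with a three-document corpus in which only the second document contains the query terms `source`, `retrieval`, and `documents`, `retrieve_candidates` returns only that document, with a cosine score of 3/√15 ≈ 0.775. A plain threshold keeps the sketch short; a real system would tune it on held-out cases or combine it with further filtering.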


Keywords: text reuse detection; source retrieval; significant words; local-word weighting scheme

Funding: Universitas Kristen Duta Wacana


Article Info

Section: Original Research Articles

Language: Indonesian (ID)

In Volume 8, Issue 2, Year 2020 (April 2020)





Starting from 2021, authors whose articles are published in JTSiskom retain the copyright to their articles and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. By submitting a manuscript to JTSiskom, the author(s) agree to this policy. No special document approval is required.

The author(s) guarantee that:

their article is original and written by the named author(s),
has never been published before,
does not contain statements that violate the law,
does not violate the rights of others,
is subject to copyright held exclusively by the author(s) and is free from third-party rights, and
is covered by any written permission, obtained by the author(s), needed to quote from other sources.

The author(s) retain all rights to the published work, such as (but not limited to) the following rights:

Copyright and other proprietary rights related to the article, such as patents,
The right to use the substance of the article in their own future works, including lectures and books,
The right to reproduce the article for their own purposes,
The right to archive all versions of the article in any repository, and
The right to enter into separate additional contractual arrangements for the non-exclusive distribution of published versions of the article (for example, posting them to institutional repositories or publishing them in a book), acknowledging its initial publication in this journal (Jurnal Teknologi dan Sistem Komputer).

If the article was prepared jointly by more than one author, the author submitting the manuscript warrants that all co-authors have agreed to the copyright and license notices (agreements) on their behalf and will notify the co-authors of the terms of this policy. JTSiskom will not be held responsible for anything arising from internal disputes among the authors. JTSiskom will communicate only with the corresponding author.

Authors should also understand that their articles (and any additional files, including data sets and analysis/computation data) will become publicly available once published. The license of published articles (and additional data) is governed by a Creative Commons Attribution-ShareAlike 4.0 International License. JTSiskom allows users to copy, distribute, display, and perform the work under this license. Users who distribute the work in journals and other publication media must attribute it to the author(s) and JTSiskom. Unless otherwise stated, the author(s) is a public entity as soon as the article is published.


EDITORIAL OFFICE OF JURNAL TEKNOLOGI DAN SISTEM KOMPUTER