Python-Based Text Similarity Analysis and Detection Using the Levenshtein Distance Algorithm

Haris Sudarman; Y Yulhendri

doi:10.30645/jurasik.v10i1.869

Haris Sudarman (1*), Y Yulhendri (2)

(1) Universitas Esa Unggul, Jakarta, Indonesia
(2) Universitas Esa Unggul, Jakarta, Indonesia
(*) Corresponding Author

Abstract

Advances in information technology have made plagiarism in academia, particularly in higher education, harder to address. This study develops a plagiarism detection tool that measures the similarity of PDF files against established reference documents using the Levenshtein Distance algorithm. The proposed system can identify plagiarism effectively and accurately through a series of steps: text extraction, linguistic preprocessing (tokenisation and stopword removal), and similarity calculation using Levenshtein Distance. Testing covered several scenarios, including variations in document size and plagiarism level. The experimental results show that the higher the similarity between a document and its reference, the longer the computation time required. Nevertheless, the system detects plagiarism with a fairly good success rate, even in documents with a low level of similarity. Black-box testing confirms that the application works according to its expected specifications, namely accepting PDF documents as input, detecting plagiarism, and reporting accurate similarity percentages. This research contributes a plagiarism detection tool that can help maintain academic integrity, with scope for further development through machine learning integration and user interface improvements.
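The pipeline described in the abstract (preprocessing followed by a Levenshtein-based similarity score) can be sketched in Python as below. This is a minimal illustration, not the authors' implementation. It assumes the text has already been extracted from the PDF, and the stopword list, the function names, and the normalisation `1 - distance / max(length)` are assumptions chosen for the example, since the abstract does not specify them.

```python
import re

# Hypothetical stopword list for illustration only; the paper's list is not given.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}


def preprocess(text):
    """Case-fold, tokenise on letters and digits, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def levenshtein(a, b):
    """Edit distance between two sequences using two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity_percent(document, reference):
    """Similarity as a percentage; the normalisation is an assumed convention."""
    a = " ".join(preprocess(document))
    b = " ".join(preprocess(reference))
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty texts are treated as identical
    return (1 - levenshtein(a, b) / longest) * 100
```

The dynamic-programming table grows with the product of the two text lengths, so run time in this sketch depends mainly on document size; any link between run time and similarity level reported in the abstract would come from details of the authors' system that are not shown here.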

DOI: http://dx.doi.org/10.30645/jurasik.v10i1.869

DOI (PDF): http://dx.doi.org/10.30645/jurasik.v10i1.869.g844

JURASIK (Jurnal Riset Sistem Informasi dan Teknik Informatika)
Print/Online ISSN 2527-5771/2549-7839
Organized by LPPM STIKOM Tunas Bangsa
Published by STIKOM Tunas Bangsa
W: https://tunasbangsa.ac.id/ejurnal/index.php/jurasik

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0
