Extractive Text Summarization of Indonesian-Language News Using the Support Vector Machine Algorithm

Thalita Meisya Permata Aulia, Asep Jamaludin, Tesa Nur Padilah

Abstract


According to the Programme for International Student Assessment (PISA) 2018 survey of 61 participating countries, Indonesia still had a low reading score: 358, against an overall average of 472. One consequence of weak reading skills is difficulty understanding what is read, especially long texts or large numbers of texts. Such readers find a summary easier to read. With advances in technology, text summarization can be performed using text mining methods. Text mining retrieves information from large collections of text-based documents. In this work, the summarization process selects the main points or important sentences of a news article without changing its content, an approach known as extractive summarization. To obtain the best results, each sentence is weighted using extracted features: numerical data, quotations, sentence length, sentence position within its paragraph, and sentence position within the whole text. The research follows the Knowledge Discovery in Databases (KDD) methodology, and the model is built with the Support Vector Machine (SVM) algorithm. The model is evaluated using recall, precision, and F-measure. The best result came from the scenario with a 7:3 split between test data and training data using the linear kernel: accuracy 72.4%, precision 63.4%, recall 51.9%, and F-measure 57.1%.
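The abstract names five sentence features but does not say how each one is scored. The Python below is only a minimal sketch of how the paragraphs of a news article could be turned into one feature vector per sentence. Several parts are hypothetical choices, not the authors' definitions: the function names (`split_sentences`, `sentence_features`), the naive regex sentence splitter, and every scoring rule. The rules used are binary flags for digits and quotation marks, length relative to the longest sentence, and linearly decreasing position scores.

```python
import re

# Hypothetical sketch: names and scoring rules are illustrative assumptions,
# not the feature definitions used in the paper.

# Sentence boundary: '.', '!' or '?' (optionally followed by a closing quote)
# and then whitespace. Deliberately naive.
SENTENCE_END = re.compile(r'(?<=[.!?])\s+|(?<=[.!?]["\u201d])\s+')
QUOTE_CHARS = ('"', "\u201c", "\u201d")


def split_sentences(paragraph):
    """Split one paragraph into non-empty sentences."""
    return [s.strip() for s in SENTENCE_END.split(paragraph.strip()) if s.strip()]


def sentence_features(paragraphs):
    """Return one 5-element feature vector (values in [0, 1]) per sentence.

    0. numerical data      1.0 if the sentence contains a digit, else 0.0
    1. quotation           1.0 if it contains a quotation mark, else 0.0
    2. sentence length     word count / word count of the longest sentence
    3. paragraph position  1.0 for a paragraph's first sentence, decreasing linearly
    4. overall position    1.0 for the article's first sentence, decreasing linearly
    """
    sentences = []  # (text, index within paragraph, sentences in that paragraph)
    for paragraph in paragraphs:
        parts = split_sentences(paragraph)
        for i, text in enumerate(parts):
            sentences.append((text, i, len(parts)))

    total = len(sentences)
    longest = max((len(text.split()) for text, _, _ in sentences), default=1)
    vectors = []
    for k, (text, i, n_in_paragraph) in enumerate(sentences):
        vectors.append([
            1.0 if any(ch.isdigit() for ch in text) else 0.0,
            1.0 if any(q in text for q in QUOTE_CHARS) else 0.0,
            len(text.split()) / longest,
            1.0 - i / n_in_paragraph,
            1.0 - k / total,
        ])
    return vectors
```

Consider an invented two-paragraph sample, `['Harga beras naik 5 persen pekan ini. Pedagang mengeluh.', 'Menteri berkata, "Stok aman." Pemerintah akan memantau pasar.']`. Its first sentence gets `[1.0, 0.0, 1.0, 1.0, 1.0]`: it contains a number and no quotation, it is the longest sentence, and it opens both its paragraph and the article.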
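The abstract reports that the best model used an SVM with a linear kernel and a 7:3 data split, but it does not name the implementation or the label scheme. The sketch below rests on assumptions throughout. It trains a linear SVM with the Pegasos stochastic sub-gradient method, using only the Python standard library. It treats label +1 as "sentence belongs in the summary" and -1 as "does not". It also reads the 7:3 split as 70% training and 30% test data. `train_linear_svm`, `predict`, and `split_7_3` are hypothetical names. In practice, a library solver such as scikit-learn's `SVC(kernel="linear")` would be the usual choice.

```python
import random

# Hypothetical sketch: Pegasos is swapped in as the linear-SVM solver;
# the paper does not state which implementation it used.


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM with Pegasos-style stochastic sub-gradient descent.

    Minimises  lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>)
    for labels y_i in {-1, +1}. A constant 1.0 feature is appended to every
    sample, so the last weight acts as a bias (which is also regularised).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Sub-gradient step on the regulariser: shrink every weight.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Sub-gradient step on the hinge loss, only if the margin is violated.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (keep the sentence in the summary) or -1 (leave it out)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


def split_7_3(X, y, seed=0):
    """Shuffle indices, then keep 70% for training and 30% for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = 7 * len(idx) // 10
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])
```

Each row of `X` would be one sentence's feature vector, and each entry of `y` its +1/-1 summary label. Pegasos optimises the same primal objective as a soft-margin linear-kernel SVM, with `lam` playing the role of 1/(C·n). One simplification differs from library solvers such as LIBSVM: the bias is folded into the weight vector, so it is regularised as well, whereas those solvers leave it unregularised.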
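Precision, recall, and F-measure are computed over the sentences the model selects for the summary. The minimal sketch below makes two assumptions. First, it marks a summary sentence with label +1. Second, it uses the balanced F-measure (F1), which the reported figures are consistent with. `f_measure` and `evaluate` are hypothetical names.

```python
# Hypothetical sketch: the label convention (+1 = summary sentence) is an assumption.


def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure, treating `positive` as the summary class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
    }
```

As a consistency check, `f_measure(0.634, 0.519)` is about 0.571. This matches the F-measure of 57.1% reported alongside precision 63.4% and recall 51.9%.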


References


PISA, “Reading performance (Programme for International Student Assessment),” 2020. [Online]. Available: https://data.oecd.org.

N. S. W. Gotami, Indriati and K. R. Dewi, “Peringkasan teks otomatis secara ekstraktif pada artikel berita kesehatan berbahasa Indonesia dengan menggunakan metode latent semantic analysis [Extractive automatic text summarization of Indonesian-language health news articles using the latent semantic analysis method],” Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer, pp. 2821-2828, 2018.

M. Allahyari, S. Pouriyeh, M. Assefi, S. Safaei, E. Trippe, J. B. Gutierrez and K. Kochut, “A brief survey of text mining: classification, clustering and extraction techniques,” arXiv, 2017.

W. S. El-Kassas, C. R. Salama, A. A. Rafea and H. K. Mohamed, “Automatic text summarization: a comprehensive survey,” Expert Systems with Applications (Elsevier), 165, pp. 2-26, 2020.

N. Moratanch and S. Chitrakala, “A survey on extractive text summarization,” IEEE International Conference on Computer, Communication and Signal Processing, pp. 1-7, 2017.

J. Cervantes, F. García-Lamont, L. Rodríguez-Mazahua and A. Lopez, “A comprehensive survey on support vector machine classification: applications, challenges and trends,” Neurocomputing, 408, pp. 189-215, 2019.

D. M. W. Powers, “Evaluation: from precision, recall and F-measure to ROC, informedness, markedness & correlation,” International Journal of Machine Learning Technology, pp. 37-63, 2011.

S. Ramírez-Gallego, B. Krawczyk, S. García, M. Woźniak and F. Herrera, “A survey on data preprocessing for data stream mining: current status and future directions,” Neurocomputing, 239, pp. 39-57, 2017.

M. Yuli, “Data mining: Klasifikasi menggunakan algoritma C4.5 [Data mining: classification using the C4.5 algorithm],” Jurnal Edik Informatika, 2(2), pp. 213-219, 2017.

R. Suresh and R. S. Harshni, “Data mining and text mining: a survey,” International Conference on Computation of Power, Energy, Information and Communication, 21, pp. 412-420, 2017.




DOI: http://dx.doi.org/10.30645/j-sakti.v5i2.371



J-SAKTI (Jurnal Sains Komputer & Informatika)