Analysis of Performance Labelling Sentiment Between K-Means IndoBERT and Inset Lexicon-Based

Rama Dona Ariyatma(1*), Bagus Priambodo(2)

(1) Universitas Mercu Buana, Jakarta, Indonesia
(2) Universitas Mercu Buana, Jakarta, Indonesia
(*) Corresponding Author

Abstract


Sentiment analysis, a natural language processing technique, plays a key role in identifying opinions or sentiments in textual data. The accuracy of the sentiment labels in a dataset significantly affects the performance of sentiment analysis models, but manual labelling is time-consuming. Many researchers therefore use lexicon-based methods for sentiment labelling. Lexicons, however, are often limited in reflecting topic-specific nuances, which can lead to inaccurate sentiment labels and, in turn, degrade classification models. Inset Lexicon (Indonesia Sentiment Lexicon) provides a pre-weighted list of sentiment words for sentiment analysis in Indonesian. This study explores K-means clustering as an automatic sentiment labelling technique and compares its performance with that of Inset Lexicon. For K-means clustering, IndoBERT is employed as the embedding model. The accuracy of each method's automatic labels is evaluated by comparing them with the actual data. In the experiments, K-means with IndoBERT achieves an accuracy of 74.79%, higher than Inset Lexicon, which achieves only 59.82%.
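The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy lexicon, its weights, the example texts, and the 2-D vectors (standing in for IndoBERT sentence embeddings) are all made-up assumptions. So is the rule that names each cluster after the most common true label among its members, since the abstract does not say how clusters were mapped to sentiments. The resulting accuracies are properties of the toy data only and say nothing about the reported 74.79% and 59.82%.

```python
from collections import Counter

# Toy stand-in for a pre-weighted sentiment lexicon (hypothetical words and weights).
TOY_LEXICON = {"bagus": 4, "senang": 3, "buruk": -4, "lambat": -2}

def lexicon_label(text, lexicon=TOY_LEXICON):
    """Sum the weights of known words; the sign of the total gives the label."""
    score = sum(lexicon.get(tok, 0) for tok in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign

def clusters_to_labels(assign, truth):
    """Assumed mapping: name each cluster after its most common true label."""
    names = {}
    for c in set(assign):
        votes = Counter(t for a, t in zip(assign, truth) if a == c)
        names[c] = votes.most_common(1)[0][0]
    return [names[a] for a in assign]

def accuracy(pred, truth):
    """Fraction of predicted labels that match the reference labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

texts = ["layanan bagus", "aplikasi buruk", "pengiriman lambat", "saya senang", "biasa saja"]
truth = ["positive", "negative", "negative", "positive", "negative"]
# Made-up 2-D vectors standing in for sentence embeddings of the texts above.
vectors = [(0.9, 0.8), (0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.3, 0.2)]

lex_pred = [lexicon_label(t) for t in texts]
km_pred = clusters_to_labels(kmeans(vectors, k=2), truth)
print("lexicon accuracy:", accuracy(lex_pred, truth))
print("k-means accuracy:", accuracy(km_pred, truth))
```

In this toy setup the lexicon mislabels "biasa saja" as neutral because none of its words appear in the word list, which mirrors the coverage limitation the abstract attributes to lexicon-based labelling. A real pipeline would replace the made-up vectors with embeddings from an IndoBERT checkpoint and would likely use a library implementation of K-means.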


DOI: http://dx.doi.org/10.30645/jurasik.v10i1.849

DOI (PDF): http://dx.doi.org/10.30645/jurasik.v10i1.849.g824

JURASIK (Jurnal Riset Sistem Informasi dan Teknik Informatika)