SENTIMENT ANALYSIS BASED ON PROBABILISTIC CLASSIFIER TECHNIQUES IN VARIOUS INDONESIAN REVIEW DATA


(Received: 10-Mar.-2022, Revised: 28-Apr.-2022, Accepted: 24-May-2022)
Sentiment analysis is a field of data science that helps build a broader, more holistic view of users' needs and expectations. Indonesian user opinions can be turned into valuable information through sentiment-analysis tasks. One of the most widely used supervised-learning techniques in Indonesian sentiment analysis is the Naïve Bayes classifier, which comes in several model variants that can be tuned to improve sentiment-analysis performance. This research examines the performance of various Naïve Bayes models in sentiment analysis, especially their ability to handle overfitting when trained on small datasets. Four Naïve Bayes models are used: Gaussian, Multinomial, Complement and Bernoulli. We also analyze the effect of various pre-processing techniques on the models' performance. Moreover, we build the first fashion dataset from the Indonesian marketplace, which has unique characteristics compared with datasets from other domains. Finally, we also run the experiments on various datasets to test the Naïve Bayes models' performance. The experimental results show that Complement Naïve Bayes is superior to the other models, especially in handling overfitting, with an F1-score of approximately 0.82.
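To make the difference between two of the named variants concrete, the following is a minimal sketch in plain Python, not the implementation evaluated in this work. It assumes whitespace tokenization, raw term counts and Laplace smoothing with a hypothetical `alpha=1.0`; the six short reviews and all function names are invented for illustration. Multinomial Naïve Bayes estimates each word weight from the documents inside a class, while Complement Naïve Bayes estimates it from the documents of all other classes, so a class with few examples still gets its weights from a larger pool of text.

```python
import math
from collections import Counter

# Hypothetical toy reviews (4 positive, 2 negative), invented for illustration.
docs = [
    "barang bagus sesuai gambar",
    "bahan bagus pengiriman cepat",
    "kualitas bagus harga murah",
    "pengiriman cepat barang sesuai",
    "bahan tipis jahitan jelek",
    "barang jelek tidak sesuai",
]
labels = ["positive"] * 4 + ["negative"] * 2

def train_counts(docs, labels):
    """Collect per-class token counts and the shared vocabulary."""
    counts = {c: Counter() for c in set(labels)}
    for doc, c in zip(docs, labels):
        counts[c].update(doc.lower().split())
    vocab = sorted({w for cnt in counts.values() for w in cnt})
    return counts, vocab

def multinomial_weights(counts, vocab, alpha=1.0):
    """log P(w | c), estimated from the documents that belong to class c."""
    weights = {}
    for c, cnt in counts.items():
        total = sum(cnt.values()) + alpha * len(vocab)
        weights[c] = {w: math.log((cnt[w] + alpha) / total) for w in vocab}
    return weights

def complement_weights(counts, vocab, alpha=1.0):
    """-log P(w | not c), estimated from the documents of every other class."""
    weights = {}
    for c in counts:
        comp = Counter()
        for other, cnt in counts.items():
            if other != c:
                comp.update(cnt)
        total = sum(comp.values()) + alpha * len(vocab)
        weights[c] = {w: -math.log((comp[w] + alpha) / total) for w in vocab}
    return weights

def predict(weights, doc, log_prior=None):
    """Return the class with the highest summed weight; unknown words are skipped."""
    tokens = doc.lower().split()
    scores = {}
    for c, wts in weights.items():
        scores[c] = sum(wts[w] for w in tokens if w in wts)
        if log_prior is not None:
            scores[c] += log_prior[c]
    return max(scores, key=scores.get)

counts, vocab = train_counts(docs, labels)
log_prior = {c: math.log(labels.count(c) / len(labels)) for c in counts}
mnb = multinomial_weights(counts, vocab)
cnb = complement_weights(counts, vocab)  # used without a class prior in this sketch

for text in ["bahan bagus", "jahitan jelek"]:
    print(text, "|", predict(mnb, text, log_prior), "|", predict(cnb, text))
```

The sketch leaves out the weight normalization and term-frequency transforms that complete formulations of Complement Naïve Bayes may apply, and it does not cover the Gaussian or Bernoulli variants. Libraries such as scikit-learn provide all four as `GaussianNB`, `MultinomialNB`, `ComplementNB` and `BernoulliNB`.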
