
		<paper>
			<loc>https://jjcit.org/paper/130</loc>
			<title>HYBRID FEATURE SELECTION FRAMEWORK FOR SENTIMENT ANALYSIS ON LARGE CORPORA</title>
			<doi>10.5455/jjcit.71-1609858713</doi>
			<authors>*Kayode S. Adewole,Abdullateef O. Balogun,Muiz O. Raheem,Muhammed K. Jimoh,Rasheed G. Jimoh,Modinat A. Mabayoje,Fatima E. Usman-Hamza,Abimbola G. Akintola,Ayisat W. Asaju-Gbolagade</authors>
			<keywords>Sentiment analysis,Opinion mining,Hybrid feature selection,Boruta,Recursive feature elimination</keywords>
			<citation>11</citation>
			<views>5464</views>
			<downloads>1255</downloads>
			<received_date>5-Jan.-2021</received_date>
			<revised_date>22-Feb.-2021</revised_date>
			<accepted_date>17-Mar.-2021</accepted_date>
			<abstract>Sentiment analysis has drawn considerable research attention in recent years owing to its applicability
in determining users’ opinions, sentiments and emotions from large collections of textual data. The goal of
sentiment analysis centres on improving users’ experience by deploying robust techniques that mine opinions
and emotions from large corpora. There are several studies on sentiment analysis and opinion mining from
textual information; however, domain-specific words, such as slang and abbreviations, together with
grammatical mistakes, pose serious challenges to existing sentiment analysis methods. In this paper, we
focus on identifying an effective discriminative subset of features that can aid the classification of users’
opinions from large corpora. This study proposes a hybrid feature selection framework based on the
hybridization of filter- and wrapper-based feature selection methods. Correlation feature selection (CFS) is
hybridized with Boruta and Recursive Feature Elimination (RFE) to identify the most discriminative feature
subsets for sentiment analysis. Four publicly available sentiment analysis datasets (Amazon, Yelp, IMDB and
Kaggle) are used to evaluate the performance of the proposed hybrid feature selection framework. This
study evaluates the performance of three classification algorithms, Support Vector Machine (SVM), Naïve Bayes
and Random Forest, to assess the effectiveness of the proposed approach. Experimental results across the
different contexts represented by the datasets considered in this study show that CFS combined with Boruta
produced promising results, especially when the selected features are passed to the Random Forest classifier.
The proposed hybrid framework thus provides an effective way of predicting users’ opinions and emotions
while giving substantial consideration to predictive accuracy. The proposed hybrid feature selection framework
also shortens the computing time of the resulting model.</abstract>
		</paper>


