AN IMPROVED C4.5 MODEL CLASSIFICATION ALGORITHM BASED ON TAYLOR’S SERIES


(Received: 2019-01-03, Revised: 2019-02-25, Accepted: 2019-03-11)
C4.5 is one of the most popular algorithms for rule-based classification. The algorithm offers several practical features, such as the categorization of continuous values, the handling of missing values and the control of over-fitting. However, despite its promising advantages over the Iterative Dichotomiser 3 (ID3), C4.5 has the major drawback of producing the same result as ID3, especially when the same number of attributes is used. This paper proposes a technique that addresses this drawback. The performance of the proposed technique is measured in terms of classification accuracy. Information-theoretic entropy is computed to identify the central attribute of the dataset. The researchers then apply exponential splitting information (EC4.5) to the central attribute of the same dataset. The results obtained with the Taylor series-based splitting information were considerably more accurate than those obtained with the standard C4.5 gain ratio.
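For orientation, the baseline quantities named in the abstract can be sketched in a few lines of Python. This is a minimal illustration of the textbook definitions of entropy, information gain (the ID3 criterion) and gain ratio (the C4.5 criterion); it does not implement the exponential splitting information (EC4.5) proposed in the paper, whose formula is not given in this section. The function names and the toy data are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def partition(values, labels):
    """Group the class labels by the attribute value of each record."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return groups


def information_gain(values, labels):
    """ID3 criterion: class entropy minus the weighted entropy after the split."""
    total = len(labels)
    remainder = sum(len(group) / total * entropy(group)
                    for group in partition(values, labels).values())
    return entropy(labels) - remainder


def split_information(values):
    """C4.5 split information: entropy of the attribute's own value distribution."""
    return entropy(values)


def gain_ratio(values, labels):
    """C4.5 criterion: information gain normalized by split information."""
    si = split_information(values)
    # An attribute with a single value carries no split information;
    # returning 0.0 here is a convention chosen for this sketch.
    return information_gain(values, labels) / si if si > 0 else 0.0


# Hypothetical toy data: one categorical attribute and a binary class.
outlook = ["sunny", "sunny", "overcast", "rain"]
play = ["no", "no", "yes", "yes"]
print(information_gain(outlook, play), gain_ratio(outlook, play))
```

On this toy attribute the gain ratio is lower than the information gain because the split information penalizes attributes that take many distinct values, which is the normalization that distinguishes C4.5 from ID3.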
