
		<paper>
			<loc>https://jjcit.org/paper/268</loc>
			<title>IMPROVING IOT SECURITY: THE IMPACT OF DIMENSIONALITY AND SIZE REDUCTION ON INTRUSION-DETECTION PERFORMANCE</title>
			<doi>10.5455/jjcit.71-1739734998</doi>
			<authors>Remah Younisse,Amal Saif,Nailah Al-Madi,Sufyan Almajali,Basel Mahafzah</authors>
			<keywords>Dimensionality reduction,Data reduction,Autoencoders,Stratified sampling,Machine learning</keywords>
			<views>1615</views>
			<downloads>846</downloads>
			<received_date>16-Mar.-2025</received_date>
			<revised_date>23-May-2025 and 24-Jun.-2025</revised_date>
			<accepted_date>27-Jun.-2025</accepted_date>
			<abstract>Intrusion detection in Internet of Things (IoT) environments is essential to guarantee computer-network 
security. Machine-learning (ML) models are widely used to build efficient detection systems. Meanwhile, with 
the increasing complexity and size of intrusion-detection data, analyzing vast datasets with ML models is 
becoming more challenging and demanding in terms of computational resources, and datasets related to IoT 
environments are usually very large. This study investigates the impact of dataset-reduction techniques 
on ML-based Intrusion-Detection Systems (IDSs) in terms of performance and efficiency. We propose 
a two-stage framework that combines deep autoencoder-based feature reduction with stratified sampling to reduce 
the dimensionality and size of six publicly available IDS datasets, including BoT-IoT, CSE-CIC-IDS2018, and 
others. Multiple machine-learning models, such as Random Forest, XGBoost, K-Nearest Neighbors, SVM, and 
AdaBoost, were evaluated using default parameters. Our results show that dataset reduction can decrease training 
time by up to 99% with minimal loss in F1-score, typically less than 1%. Excessive size reduction can 
compromise detection accuracy for minority attack classes; however, stratified sampling effectively 
maintains class distributions. The study also highlights significant feature redundancy, particularly high 
correlation among features, across multiple IoT security-related datasets, motivating the use of 
dimensionality-reduction techniques. These findings support the feasibility of efficient, scalable IDS 
implementations for real-world environments, especially in resource-constrained or real-time settings. The 
considerable redundancy observed calls into question the necessity of such large datasets, since in many 
cases the reduced datasets yield almost the same F1-scores, raising the alarm about the unnecessarily 
massive amounts of data used to build robust IDSs.</abstract>
		</paper>


