
		<paper>
			<loc>https://jjcit.org/paper/256</loc>
			<title>HAML-IRL: OVERCOMING THE IMBALANCED RECORD-LINKAGE PROBLEM USING HYBRID ACTIVE MACHINE LEARNING</title>
			<doi>10.5455/jjcit.71-1726277421</doi>
			<authors>Mourad Jabrane,Mouad Jbel,Imad Hafidi,Yassir Rochd</authors>
			<keywords>Record linkage,Entity resolution,Active machine learning,Hybrid query</keywords>
			<citation>2</citation>
			<views>3566</views>
			<downloads>712</downloads>
			<received_date>14-Sep.-2024</received_date>
			<revised_date>14-Nov.-2024</revised_date>
			<accepted_date>24-Nov.-2024</accepted_date>
			<abstract>Traditional active machine-learning (AML) methods employed in Record Linkage (RL) or Entity Resolution (ER)
tasks often struggle with model stability, slow convergence and handling imbalanced data. Our study introduces
a novel hybrid Active Machine Learning approach to address RL, overcoming the challenges of limited labeled
data and imbalanced classes. The approach combines and balances informativeness, which selects record pairs that reduce
model uncertainty, with representativeness, which ensures that the chosen pairs reflect the overall dataset patterns.
Our hybrid approach, called Hybrid Active Machine Learning for Imbalanced Record Linkage (HAML-IRL),
demonstrates significant advancements. HAML-IRL achieves an average 12% improvement in F1-scores across
eleven real-world datasets, including structured, textual and dirty data, when compared to state-of-the-art AML
methods. Our approach also requires 60% to 85% fewer labeled samples, depending on the dataset,
accelerates model convergence and offers superior stability across iterations, making it a robust and efficient
solution for real-world record-linkage tasks.</abstract>
		</paper>


