
2-s2.0-84922125967


A comparison for handling imbalanced datasets

Syaripudin A. (a), Khodra M.L. (a)

(a) School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Indonesia

Abstract

© 2014 IEEE. In many real-world cases, such as metal-detecting security or disease diagnosis, imbalanced datasets are inevitable. Existing learning algorithms are limited when faced with imbalanced datasets: prediction errors arise because the majority class dominates the minority class. Various techniques have been proposed to address this problem. This paper compares techniques for handling imbalanced datasets based on resampling and ensembles. From a different standpoint, it also examines how much the number of instances, the number of attributes, the attribute data types, the number of target classes, and missing attribute values affect the classification results, with performance measured using F-measure. The experiment shows that the number of attributes, the attribute data types, and the number of target classes do not affect the classification results, whereas missing attribute values do. For a high F-measure, the experiment shows that the best performer is the combination of SMOTE 500% and AdaBoostM1.

Author keywords

Attribute values, Classification results, Ensembles, Imbalanced data-sets, Imbalanced dataset, Performance analysis, Prediction errors, Resamples

Indexed keywords

ensembles, imbalanced dataset, resamples

DOI

https://doi.org/10.1109/ICAICTA.2014.7005957
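
The abstract reports performance as F-measure rather than plain accuracy. As an illustration only (not the authors' code or data), the sketch below computes precision, recall, and F-measure for a minority class. The toy labels, the two hypothetical classifiers, and the function name `f_measure` are assumptions made for this example. It shows why accuracy can look good on an imbalanced dataset while the F-measure for the minority class stays low.

```python
def f_measure(y_true, y_pred, positive):
    """Return (precision, recall, F1) for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 95 majority ("neg") and 5 minority ("pos") instances.
y_true = ["neg"] * 95 + ["pos"] * 5

# Hypothetical classifier 1: always predicts the majority class.
always_majority = ["neg"] * 100

# Hypothetical classifier 2: 6 false alarms on the majority class,
# but it finds 4 of the 5 minority instances.
minority_aware = ["neg"] * 89 + ["pos"] * 6 + ["pos"] * 4 + ["neg"] * 1

for name, y_pred in [("always_majority", always_majority),
                     ("minority_aware", minority_aware)]:
    p, r, f1 = f_measure(y_true, y_pred, positive="pos")
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f} "
          f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

On this toy data the majority-only classifier has the higher accuracy but an F-measure of zero on the minority class, while the second classifier gives up a little accuracy for a much higher minority-class F-measure.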