Combination of Heuristic, Rule-Based and Machine Learning for Bibliography Extraction
Suryawati E. (a), Widyantoro D.H. (b)
(a) Research Center for Informatics, Indonesian Institute of Sciences, Bandung, Indonesia
(b) School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, Indonesia

Abstract
© 2017 IEEE. As the number of scientific publications grows with the development of science, building better automated information extraction systems remains a challenging task. Knowledge-based approaches (rule-based, machine learning, or a combination of both) have been applied in a number of studies on extracting reference elements. The extraction process assigns each token features that point to a particular type of reference element, using one or two approaches. In this paper, we present a strategy that combines three approaches (heuristic, rule-based and machine learning) to predict the constituent elements of a reference from the set of tokens in a reference line. Each element in a reference line can be predicted by a different approach (heuristic, rule-based or machine learning), depending on the need. By adding heuristics, we can directly classify one or more tokens as a certain type of reference element without using a knowledge-based approach. The heuristics also assist in selecting the element to be extracted. A set of tokens that cannot be predicted heuristically is passed to machine learning (SVM or Ripper) or to rule-based extraction (regular expressions) to obtain the type of the reference element. A combination of rule-based and heuristic methods is also used for element segmentation, while heuristics and machine learning perform the extraction that assigns a type to each element. This strategy is expected to improve the accuracy of the extracted reference element information. The results are compared with a common strategy that uses an SVM classifier on the same dataset (CORA). The experimental results show that the combined approach gives slightly better recall than the common approach, although it requires a longer extraction time in the testing phase.

Author keywords
Automated information, Constituent elements, heuristic, Knowledge-based approach, Regular expressions, Rule based, Rule learning, Scientific publications

Indexed keywords
heuristic, machine learning, rule-based, rule-learning

DOI
https://doi.org/10.1109/ICICI-BME.2017.8537772
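
As a rough illustration of the cascade the abstract describes (heuristics first, then regular-expression rules, then a learned classifier for whatever remains), the following Python sketch labels the tokens of one reference line. It is not the authors' implementation: the label names, the two regular expressions, the first-token heuristic and the `fallback_classifier` stub (standing in for a trained SVM or Ripper model) are all assumptions made for illustration.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical patterns and label names, chosen only for illustration.
YEAR_RE = re.compile(r"^\(?(19|20)\d{2}\)?[.,]?$")
PAGES_RE = re.compile(r"^(pp\.?)?\d+-+\d+[.,]?$")


def heuristic(token: str, position: int) -> Optional[str]:
    """Cheap direct guess: assume a reference line starts with an author token."""
    if position == 0:
        return "author"
    return None


def rule_based(token: str) -> Optional[str]:
    """Regular-expression rules for tokens with a recognisable surface form."""
    if YEAR_RE.match(token):
        return "date"
    if PAGES_RE.match(token):
        return "pages"
    return None


def fallback_classifier(token: str) -> str:
    """Placeholder for a trained model (e.g. an SVM); always returns 'unknown'."""
    return "unknown"


def label_tokens(
    line: str,
    classifier: Callable[[str], str] = fallback_classifier,
) -> List[Tuple[str, str]]:
    """Label each whitespace-separated token, trying the cheapest stage first."""
    labels = []
    for position, token in enumerate(line.split()):
        label = heuristic(token, position) or rule_based(token) or classifier(token)
        labels.append((token, label))
    return labels


if __name__ == "__main__":
    for token, label in label_tokens("Smith, J. Parsing. 1999. 12-34."):
        print(f"{token}\t{label}")
```

Only tokens that neither the heuristic nor the rules can label reach the classifier. In this sketch that keeps the learned model's workload small, at the cost of making the result depend on the order of the stages.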