Information extraction from scientific paper using rhetorical classifier
Khodra M.L. (a), Widyantoro D.H. (a), Aziz E.A. (b), Bambang R.T. (a)
(a) School of Electrical Engineering, Bandung Institute of Technology, Indonesia
(b) Faculty of Language and Arts, Indonesia University of Education, Indonesia
Abstract

Time constraints often lead readers of scientific papers to read only the title and abstract, but reading these parts alone is often ineffective. This study aims to extract information automatically in order to help readers obtain structured information from a scientific paper. The information extraction is performed by rhetorically classifying each sentence of a paper. Rhetorical information is the intention that the author of the paper wants to convey to the reader. This research used a corpus-based approach to build the rhetorical classifier. Since no suitable rhetorical corpus was available, we constructed our own corpus, a collection of sentences labeled with rhetorical information. Each sentence is represented as a vector of content, location, citation, and meta-discourse features. This collection of feature vectors is used to build rhetorical classifiers with machine learning techniques. Experiments were conducted to select the best learning technique for the rhetorical classifier. The training set consists of 7239 labeled sentences, and the testing set consists of 3638 labeled sentences. We used the WEKA (Waikato Environment for Knowledge Analysis) and LibSVM libraries. The learning techniques considered were Naive Bayes, C4.5, Logistic, Multi-Layer Perceptron, PART, Instance-based Learning, and Support Vector Machines (SVM). The best performers were the SVM and Logistic classifiers, each with an accuracy of 0.51. By applying a one-against-all strategy, the SVM accuracy improved to 0.60.
© 2011 IEEE.

Author keywords
Information extraction, rhetorical classifier, rhetorical corpus, scientific papers, SVM classifiers

Indexed keywords
information extraction, rhetorical classifier, rhetorical corpus, scientific paper, SVM classifier

DOI
https://doi.org/10.1109/ICEEI.2011.6021634
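For illustration, the sentence-level classification pipeline described in the abstract can be sketched as follows. This is not the authors' implementation (they report using WEKA and LibSVM); it is a minimal scikit-learn approximation with hypothetical sentences and labels, and it substitutes simple TF-IDF bag-of-words features for the content, location, citation, and meta-discourse features described above. The one-vs-rest wrapper corresponds to the one-against-all strategy mentioned in the abstract.

```python
# Minimal sketch of a one-against-all rhetorical sentence classifier.
# Assumptions: scikit-learn in place of WEKA/LibSVM, TF-IDF features in
# place of the paper's content/location/citation/meta-discourse features,
# and invented example sentences and labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical labelled sentences (label = rhetorical category).
train_sentences = [
    "In this paper we propose a method for extracting information.",
    "Previous work has addressed this problem with rule-based systems.",
    "The results show an improvement in classification accuracy.",
]
train_labels = ["AIM", "BACKGROUND", "RESULT"]

test_sentences = ["We present a new approach to sentence classification."]
test_labels = ["AIM"]

# TF-IDF bag-of-words features feeding a linear SVM, wrapped in a
# one-vs-rest (one-against-all) meta-classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LinearSVC()),
)
model.fit(train_sentences, train_labels)

predicted = model.predict(test_sentences)
print("Predicted rhetorical labels:", predicted)
print("Accuracy:", accuracy_score(test_labels, predicted))
```

In practice the training and test sets would be the 7239 and 3638 labeled sentences from the constructed corpus, and the feature vectors would include the location, citation, and meta-discourse features rather than word n-grams alone.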