2-s2.0-85017278286

Rhetorical Sentence Categorization for Scientific Paper Using Word2Vec Semantic Representation

Rachman G.H. (a), Khodra M.L. (a), Widyantoro D.H. (a)

(a) School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Indonesia

Abstract

© Published under licence by IOP Publishing Ltd. One way to summarize scientific papers is to exploit the rhetorical structure of their sentences. Determining the rhetorical category of a sentence is itself a text categorization task. To achieve good performance, some work in text categorization has made use of semantically similar words. This paper therefore presents rhetorical sentence categorization for scientific papers using selected features, the previous label as an added feature, and Word2Vec to capture semantically similar words. It also reports the results of resampling to balance the number of instances per class, and of combining resampling with the Word2Vec representation. Every experiment is run with two classifiers, IBk and the J48 tree. The results show that using the previous label, Word2Vec (Skip-Gram), and resampling improves performance. Across all experiments under 10-fold cross-validation, the highest F-measure, 84.97%, is achieved by combining Word2Vec (Skip-Gram), all features, and resampling.

Author keywords

10-fold cross-validation, Rhetorical categorization, Rhetorical structure, Scientific papers, Semantic representation, Semantic similarity, Text categorization, word2vec

Indexed keywords

feature extraction, Rhetorical categorization, scientific paper, word2vec

DOI

https://doi.org/10.1088/1742-6596/801/1/012070
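
The abstract names its ingredients (Word2Vec vectors, a previous-label feature, and the IBk nearest-neighbour classifier) without spelling out how they fit together. The following is a minimal Python sketch of one plausible arrangement, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the toy three-dimensional vectors stand in for trained Skip-Gram embeddings, the labels `AIM` and `RESULT` are hypothetical categories, averaging word vectors is one common way to represent a sentence, and a plain 1-nearest-neighbour rule stands in for IBk. Resampling, the J48 tree, and cross-validation are not shown.

```python
import math

# Toy 3-dimensional vectors: hypothetical stand-ins for trained
# Word2Vec (Skip-Gram) embeddings, used only to keep the sketch runnable.
TOY_VECTORS = {
    "we": [0.1, 0.0, 0.2],
    "propose": [0.9, 0.1, 0.0],
    "method": [0.8, 0.2, 0.1],
    "results": [0.0, 0.9, 0.1],
    "show": [0.1, 0.8, 0.0],
    "improvement": [0.1, 0.9, 0.2],
}
DIM = 3
LABELS = ["AIM", "RESULT"]  # hypothetical rhetorical categories


def sentence_features(tokens, previous_label):
    """Average the vectors of known tokens, then append a one-hot
    encoding of the previous sentence's label (all zeros if None)."""
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if known:
        mean = [sum(v[i] for v in known) / len(known) for i in range(DIM)]
    else:
        mean = [0.0] * DIM
    one_hot = [1.0 if previous_label == lab else 0.0 for lab in LABELS]
    return mean + one_hot


def predict_1nn(train, features):
    """Return the label of the closest training instance (Euclidean distance)."""
    nearest = min(train, key=lambda item: math.dist(item[0], features))
    return nearest[1]


# Two labelled training sentences (assumed data, for illustration only).
TRAIN = [
    (sentence_features(["we", "propose", "method"], None), "AIM"),
    (sentence_features(["results", "show", "improvement"], "AIM"), "RESULT"),
]

query = sentence_features(["show", "improvement"], "AIM")
print(predict_1nn(TRAIN, query))
```

In this arrangement the previous-label feature simply widens the feature vector, so any distance-based or tree-based classifier can use it without modification.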