Set of frequent word sequence (SFWS) as document model for feature based document clustering
School of Electrical Engineering & Informatics, Institut Teknologi Bandung, Indonesia
Abstract

© 2019, School of Electrical Engineering and Informatics. All rights reserved.

Word sequences have been considered an appropriate text representation because text is inherently sequential. The representations considered here are Frequent Word Sequence (FWS), Set of Frequent Word Sequence (SFWS), and Frequent Word Itemsets (FWI). Moreover, Maximal Frequent Sequence (MFS) is a text feature that exploits the sequential property of textual data. In this paper, we propose SFWS as the best text representation for document clustering. SFWS treats a document as a set of sentences, where the sentence is the highest level of the grammatical hierarchy and conveys a complete thought. Consequently, document clustering is expected to produce more accurate results. The main contribution of this work is the data pre-processing, feature extraction, and feature selection based on SFWS. Because SFWS operates at the sentence level, we first construct a sequence database from the sentences of all documents. Sequential pattern mining is then applied to extract the set of frequent sentence sequences. Finally, we select features using the maximal set of frequent sequences (MSFS). We conducted experiments on Twenty News Group Text Data (TNTD). To do so, we developed a feature-based clustering (FBC) algorithm that uses MSFS as the text feature under the SFWS representation. The experimental results showed that document clustering based on SFWS had the highest accuracy compared with FWS and FWI.

Indexed keywords

Document clustering, Feature base clustering, Frequent Word Itemset (FWI), Frequent Word Sequence (FWS), Maximal Frequent Sequence (MFS), Set of Frequent Word Sequence (SFWS)

DOI

https://doi.org/10.15676/ijeei.2019.11.4.13
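The three steps named in the abstract (building a sentence-level sequence database, mining frequent sequences, and keeping only the maximal ones) can be illustrated with a minimal sketch. The abstract does not specify the mining algorithm or how support is counted, so everything here is an assumption made for illustration: each sentence is treated as an ordered list of words, support is the number of sentences containing a pattern as a (possibly gapped) subsequence, and candidates are enumerated by brute force up to a hypothetical `max_len` cap rather than by a real sequential pattern miner. The function names and the toy documents are likewise invented.

```python
from itertools import combinations


def tokenize_sentences(document):
    """Split a document into sentences, each a list of lowercase words."""
    normalized = document.replace("!", ".").replace("?", ".")
    sentences = []
    for raw in normalized.split("."):
        words = raw.lower().split()
        if words:
            sentences.append(words)
    return sentences


def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(word in remaining for word in pattern)


def frequent_sequences(sentence_db, min_support, max_len=3):
    """Count word sequences contained in at least `min_support` sentences.

    Assumption: brute-force enumeration of ordered word combinations,
    capped at `max_len` words. This is only practical for toy inputs.
    """
    support = {}
    for sentence in sentence_db:
        patterns_in_sentence = set()
        for n in range(1, max_len + 1):
            # combinations() preserves the original word order.
            patterns_in_sentence.update(combinations(sentence, n))
        for pattern in patterns_in_sentence:
            support[pattern] = support.get(pattern, 0) + 1
    return {p: s for p, s in support.items() if s >= min_support}


def maximal_sequences(frequent):
    """Keep frequent sequences that are not contained in a longer frequent one."""
    patterns = list(frequent)
    return {
        p
        for p in patterns
        if not any(len(q) > len(p) and is_subsequence(p, q) for q in patterns)
    }


# Toy corpus (invented for this example).
documents = [
    "The cat sat on the mat. The dog barked.",
    "The cat sat quietly. A dog barked loudly.",
]

# Step 1: sequence database of sentences from all documents.
sentence_db = [s for doc in documents for s in tokenize_sentences(doc)]
# Step 2: frequent word sequences.
frequent = frequent_sequences(sentence_db, min_support=2)
# Step 3: maximal frequent sequences, usable as clustering features.
features = maximal_sequences(frequent)
print(sorted(features))
```

Because of the `max_len` cap, "maximal" here means maximal among patterns of at most three words. A real implementation would likely replace the brute-force enumeration with a dedicated sequential pattern mining algorithm.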