
Scopus EID: 2-s2.0-84966558958


Document clustering using sequential pattern (SP): Maximal frequent sequences (MFS) as SP representation

Rahmawati D.ᵃ, Putri Saptawati G.A.ᵃ, Widyani Y.ᵃ

ᵃ School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Indonesia

Abstract

© 2015 IEEE. This research proposes the application of Feature-Based Clustering (FBC) to document clustering. A large collection of documents is easier to use when it is grouped into several topics. FBC uses the K-Means algorithm to cluster sequential feature data, and the features of a text document can be represented as sequences of words. To be processed as sequential data, these features must first be extracted from the collection of unstructured text documents, so preprocessing tasks are needed to produce document features in an appropriate form. There are two simple types of sequential pattern: Frequent Word Sequences (FWS) and Maximal Frequent Sequences (MFS). Both are suitable for text data; the difference is that MFS applies a maximality principle, so the number of MFS extracted from a document is smaller than the number of its FWS. In this research, we choose maximal frequent sequences (MFS) as the feature representation and propose a framework for conducting FBC with MFS as features. The framework is tested on a dataset that is a subset of the Twenty Newsgroups text data. The results show that clustering accuracy is affected by the parameter values, the dataset, and the number of target clusters.

Author keywords

Clustering results, Document clustering, Feature representation, Feature-based, K-Means algorithm, Maximal frequent sequences, Sequential patterns, Unstructured texts

Indexed keywords

document clustering, feature-based clustering, maximal frequent sequences, sequential pattern

DOI

https://doi.org/10.1109/ICODSE.2015.7436979
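
To make the pipeline in the abstract concrete, below is a minimal sketch of the MFS-plus-K-Means idea: mine contiguous word sequences that are frequent across documents, keep only the maximal ones, represent each document as a binary MFS-occurrence vector, and cluster those vectors with K-Means. This is not the authors' implementation; the support threshold, the contiguous-sequence definition of MFS, the binary feature matrix, and the toy corpus are illustrative assumptions.

```python
# Minimal sketch of feature-based clustering with maximal frequent sequences.
# Assumptions (not from the paper): support is counted per document, sequences
# are contiguous word runs, and documents become binary MFS-occurrence vectors.
import numpy as np
from sklearn.cluster import KMeans


def frequent_sequences(docs, min_support=2, max_len=5):
    """All contiguous word sequences occurring in at least min_support documents."""
    frequent = set()
    for length in range(1, max_len + 1):
        counts = {}
        for doc in docs:
            seen = {tuple(doc[i:i + length]) for i in range(len(doc) - length + 1)}
            for seq in seen:
                counts[seq] = counts.get(seq, 0) + 1
        level = {seq for seq, c in counts.items() if c >= min_support}
        if not level:  # no frequent sequences of this length, stop growing
            break
        frequent |= level
    return frequent


def occurs(doc, seq):
    """True if seq appears as a contiguous run of words in doc."""
    m = len(seq)
    return any(tuple(doc[i:i + m]) == seq for i in range(len(doc) - m + 1))


def maximal_frequent_sequences(frequent):
    """Keep only sequences not contained in any longer frequent sequence."""
    return {s for s in frequent
            if not any(len(t) > len(s) and occurs(list(t), s) for t in frequent)}


def mfs_feature_matrix(docs, mfs):
    """Binary document-by-MFS occurrence matrix used as K-Means input."""
    mfs = sorted(mfs)
    X = np.zeros((len(docs), len(mfs)))
    for i, doc in enumerate(docs):
        for j, seq in enumerate(mfs):
            if occurs(doc, seq):
                X[i, j] = 1.0
    return X, mfs


# Toy usage with already preprocessed (tokenised, stopword-free) documents.
docs = [
    ["neural", "network", "training", "data"],
    ["neural", "network", "training", "algorithm"],
    ["stock", "market", "price", "data"],
    ["stock", "market", "price", "forecast"],
]
mfs = maximal_frequent_sequences(frequent_sequences(docs, min_support=2))
X, features = mfs_feature_matrix(docs, mfs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(features)
print(labels)
```

The paper's framework involves more elaborate preprocessing and is evaluated on a Twenty Newsgroups subset; the sketch above only illustrates the data flow from documents to MFS features to K-Means cluster labels.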