Topic classification and clustering on Indonesian complaint tweets for Bandung government using supervised and unsupervised learning
Pratama T.ᵃ, Purwarianti A.ᵃ
ᵃ Institut Teknologi Bandung, Bandung, Indonesia
[vc_row][vc_column][vc_row_inner][vc_column_inner][vc_separator css=”.vc_custom_1624529070653{padding-top: 30px !important;padding-bottom: 30px !important;}”][/vc_column_inner][/vc_row_inner][vc_row_inner layout=”boxed”][vc_column_inner width=”3/4″ css=”.vc_custom_1624695412187{border-right-width: 1px !important;border-right-color: #dddddd !important;border-right-style: solid !important;border-radius: 1px !important;}”][vc_empty_space][megatron_heading title=”Abstract” size=”size-sm” text_align=”text-left”][vc_column_text]© 2017 IEEE. Because the citizens of Bandung are active social media users, the Bandung government provides a Twitter channel through which citizens can report their complaints. To make complaint monitoring easier, the topics of complaint tweets (written in Indonesian) need to be detected automatically, helping the government manage the reported complaints. In this paper, a system that automatically detects the topics of Indonesian complaint tweets using supervised and unsupervised learning approaches is proposed. The supervised learning approach is used to classify the topics of complaint tweets, whereas the unsupervised learning approach is used to cluster complaint tweets based on the similarity of the detailed information contained in the complaints. Both approaches are required: the supervised approach classifies the topics of a tweet, and the unsupervised approach captures the detailed information within each detected topic. The topics are classified using single-label and multi-label classification. The supervised learning approach is evaluated using accuracy, precision, recall, and F1 score. Three supervised machine learning algorithms are evaluated: Sequential Minimal Optimization (SMO), Naïve Bayes Multinomial, and Random Forests.
The best algorithm for single-label topic classification is SMO, with an average accuracy of 95%, whereas the best algorithm for multi-label topic classification is Random Forests, with 97.92% accuracy, 98.74% precision, 98.36% recall, and 98.44% F1 score. In the unsupervised learning approach, the Clustering Index Value is used to evaluate the detected topic clusters. Two unsupervised learning algorithms are evaluated: Exemplar Based Topic Detection and the Document Pivot Technique using TF-IDF. Exemplar Based Topic Detection performs best at detecting detailed topic clusters, with a Clustering Index Value of 0.9653.[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Author keywords” size=”size-sm” text_align=”text-left”][vc_column_text]Indonesian complaint text, Multi label classification, Sequential minimal optimization, Supervised and unsupervised learning, Supervised learning approaches, Supervised machine learning, Topic Classification, Topic clustering[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Indexed keywords” size=”size-sm” text_align=”text-left”][vc_column_text]Indonesian complaint text, supervised learning, topic classification, topic clustering, unsupervised learning[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Funding details” size=”size-sm” text_align=”text-left”][vc_column_text][/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”DOI” size=”size-sm” 
text_align=”text-left”][vc_column_text]https://doi.org/10.1109/ICAICTA.2017.8090981[/vc_column_text][/vc_column_inner][vc_column_inner width=”1/4″][vc_column_text][/vc_column_text][/vc_column_inner][/vc_row_inner][/vc_column][/vc_row][vc_row][vc_column][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][/vc_column][/vc_row]
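The abstract reports multi-label accuracy, precision, recall, and F1 score without stating how they are defined. The following is a minimal sketch assuming the common example-based definitions: for each tweet, the gold and predicted topic-label sets are compared, and the per-tweet scores are averaged over all tweets. The function name `multilabel_scores`, the handling of empty label sets, and the topic labels in the usage lines are hypothetical; the paper may have computed these scores differently.

```python
from typing import Dict, List, Set


def multilabel_scores(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Example-based multi-label metrics averaged over tweets (assumed definitions)."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    acc = prec = rec = f1 = 0.0
    for y, z in zip(gold, pred):
        inter = len(y & z)
        # Accuracy per tweet: Jaccard overlap |y & z| / |y | z|.
        # Conventions for empty label sets below are assumptions.
        acc += inter / len(y | z) if (y | z) else 1.0
        prec += inter / len(z) if z else 0.0
        rec += inter / len(y) if y else 0.0
        f1 += 2 * inter / (len(y) + len(z)) if (y or z) else 1.0
    n = len(gold)
    return {"accuracy": acc / n, "precision": prec / n, "recall": rec / n, "f1": f1 / n}


# Hypothetical gold and predicted topic labels for two tweets.
gold = [{"road", "traffic"}, {"waste"}]
pred = [{"road"}, {"waste", "flood"}]
print(multilabel_scores(gold, pred))  # accuracy 0.5, precision 0.75, recall 0.75, F1 about 0.667
```

Under these definitions, a prediction is rewarded for every correct topic it shares with the gold set rather than only for an exact match, which is why partial predictions still earn nonzero scores.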
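The Document Pivot Technique named in the abstract is commonly described as follows: each incoming document is compared with the earlier documents, and it joins the cluster of its most similar earlier document when that similarity reaches a threshold; otherwise it opens a new cluster. The following is a minimal sketch of that idea using TF-IDF vectors and cosine similarity. It assumes pre-tokenized tweets, a smoothed IDF computed over the whole batch, and a hypothetical threshold of 0.5. None of these details are given in the abstract, so this is not the authors' implementation; the function names and example tweets are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tfidf_vectors(docs: List[List[str]]) -> List[Vector]:
    """TF-IDF weights per document; the smoothed IDF formula is an assumption."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def document_pivot(docs: List[List[str]], threshold: float = 0.5) -> List[int]:
    """Join the nearest earlier document's cluster, or open a new cluster
    when no earlier document reaches `threshold` (hypothetical default)."""
    vectors = tfidf_vectors(docs)
    labels: List[int] = []
    next_label = 0
    for i, vec in enumerate(vectors):
        best_j, best_sim = -1, 0.0
        for j in range(i):  # only documents that arrived earlier
            sim = cosine(vec, vectors[j])
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j >= 0 and best_sim >= threshold:
            labels.append(labels[best_j])
        else:
            labels.append(next_label)
            next_label += 1
    return labels


# Hypothetical tokenized complaint tweets: two about a damaged road, two about garbage.
tweets = [
    ["jalan", "rusak", "dago"],
    ["jalan", "rusak", "parah", "dago"],
    ["sampah", "menumpuk", "pasar"],
    ["sampah", "pasar", "bau"],
]
print(document_pivot(tweets))  # [0, 0, 1, 1]
```

Raising the threshold yields finer, more numerous clusters. For example, with `threshold=0.9` each of these four tweets ends up in its own cluster. In a streaming setting, the IDF would normally be updated incrementally rather than computed over the full batch as in this sketch.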