Comparison on the rule based method and statistical based method on emotion classification for Indonesian Twitter text
Atmadja A.R.a, Purwarianti A.a
a School of Electrical and Informatics Engineering, Bandung Institute of Technology, Bandung, Indonesia
[vc_row][vc_column][vc_row_inner][vc_column_inner][vc_separator css=”.vc_custom_1624529070653{padding-top: 30px !important;padding-bottom: 30px !important;}”][/vc_column_inner][/vc_row_inner][vc_row_inner layout=”boxed”][vc_column_inner width=”3/4″ css=”.vc_custom_1624695412187{border-right-width: 1px !important;border-right-color: #dddddd !important;border-right-style: solid !important;border-radius: 1px !important;}”][vc_empty_space][megatron_heading title=”Abstract” size=”size-sm” text_align=”text-left”][vc_column_text]© 2015 IEEE. In this study, we conducted experiments on emotion classification of Indonesian Twitter text. For these experiments, we built a corpus of 7622 labeled tweets taken from 69 Twitter accounts and manually labeled by 5 native speakers. We used 6 basic emotion labels (angry, disgust, fear, joy, sad, surprise) and added a seventh label for the neutral class. We compared a rule-based method with a statistical method. For the rule-based method, we employed the existing Synesketch algorithm with two types of emotion word list: a manually written list and a translated WordNet-Affect list. For the statistical method, we employed a Support Vector Machine (SVM) with unigram features, using Information Gain and Minimum Frequency for feature selection. In addition to the purely statistical method, we also used the manually built emotion word list in the SVM-based classification. In text pre-processing, we compared several steps: normalization, emotion conversion, stop word removal, number removal, and one-character token removal. The experimental results showed that the statistical method reached 71.740% accuracy, higher than the 63.172% accuracy of the rule-based method. To improve on this, we applied SMOTE to handle the imbalanced data and achieved the best result, an F-measure of 83.203%. 
In another experiment, we combined the purely statistical method with the rule-based method by adding the manually built word list to the classification features. The F-measure for this experiment reached only 81.592%.[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Author keywords” size=”size-sm” text_align=”text-left”][vc_column_text]Emotion classification, Indonesian Twitter text, Rule-based method, SMOTE, statistical based method[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Indexed keywords” size=”size-sm” text_align=”text-left”][vc_column_text]Emotion Classification, feature selection, Indonesian Twitter text, rule based method, SMOTE, statistical based method, Support Vector Machine[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Funding details” size=”size-sm” text_align=”text-left”][vc_column_text][/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”DOI” size=”size-sm” text_align=”text-left”][vc_column_text]https://doi.org/10.1109/ICITSI.2015.7437692[/vc_column_text][/vc_column_inner][vc_column_inner width=”1/4″][vc_column_text]Widget Plumx[/vc_column_text][/vc_column_inner][/vc_row_inner][/vc_column][/vc_row][vc_row][vc_column][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][/vc_column][/vc_row]
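The abstract names Minimum Frequency and Information Gain as the feature selection steps applied to unigram features, but gives no detail on how they were configured. The following is a minimal sketch of one common way to combine the two, not the authors' implementation: unigrams below a document-frequency threshold are dropped, and the remainder are ranked by information gain with respect to the emotion label. The function names, the `min_freq` and `top_k` values, and the English toy tokens are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_unigrams(docs, labels, min_freq=2, top_k=3):
    """Rank unigrams by information gain after a minimum-frequency cut.

    docs: list of token lists; labels: one class label per document.
    min_freq and top_k are illustrative defaults, not values from the paper.
    """
    # Document frequency: in how many documents does each unigram occur?
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    base = entropy(labels)
    scores = {}
    for tok, df in doc_freq.items():
        if df < min_freq:
            continue  # minimum-frequency filter
        with_tok = [y for doc, y in zip(docs, labels) if tok in doc]
        without_tok = [y for doc, y in zip(docs, labels) if tok not in doc]
        # Conditional entropy of the label given presence/absence of the unigram.
        cond = sum(
            len(part) / len(labels) * entropy(part)
            for part in (with_tok, without_tok)
            if part
        )
        scores[tok] = base - cond  # information gain
    # Highest gain first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical toy corpus (already tokenized), using two of the seven labels.
docs = [["happy", "day"], ["happy", "win"], ["sad", "day"], ["sad", "loss"]]
labels = ["joy", "joy", "sad", "sad"]
selected = select_unigrams(docs, labels)
```

In this toy corpus, `win` and `loss` are removed by the frequency cut, while `day` survives the cut but carries no information about the label, so it ranks last. The surviving unigrams would then serve as the feature vocabulary for a classifier such as an SVM.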