2-s2.0-85011304724


InaNLP: Indonesia natural language processing toolkit, case study: Complaint tweet classification

Purwarianti A.^a, Andhika A.^a, Wicaksono A.F.^a, Afif I.^a, Ferdian F.^a

^aSchool of Electrical Engineering and Informatics, Bandung Institute of Technology, Indonesia

Abstract

© 2016 IEEE. This research describes the development of InaNLP, a natural language processing (NLP) toolkit for Indonesian formal text and social media text. Several NLP modules were integrated into InaNLP to make it easier to build NLP systems for the Indonesian language. The toolkit contains a sentence splitter, tokenizer, part-of-speech (POS) tagger, phrase chunker, named entity (NE) tagger, syntactic parser, semantic analyzer, and word normalizer. Some modules were built with a rule-based approach, whereas others use a statistical approach. The accuracy of the POS tagger, NE tagger, syntactic parser, and semantic analyzer is reported. The NE tagger uses a five-word window with POS, orthography, and word-list features. In the NE tagger experiment for evaluating these features, using the SMO algorithm and 1500 sentences for 15 NE classes, a token classification accuracy of 93.419% was achieved, which outperforms related work. The POS tagger, using 12,000 tokens as training data and 3,000 tokens as testing data, achieved an accuracy of 96.50%. The syntactic parser, using the CYK algorithm with 100 sentences as training data and 36 sentences as testing data, achieved an accuracy of 47.22%. The semantic analyzer, using 200 sentences as training data, achieved an accuracy of 62.50%. This research also gives an example of building an Indonesian NLP system with InaNLP for complaint tweet classification. In the complaint classification experiment, using 7440 data items, an average F-measure of 0.892 was achieved.

Author keywords

Classification accuracy, F-measure scores, InaNLP, Indonesia, Named entities, Natural language processing, Rule-based approach, Syntactic parsers

Indexed keywords

InaNLP, Indonesia language, natural language processing toolkit

DOI

https://doi.org/10.1109/ICAICTA.2016.7803103
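The abstract states that the NE tagger uses a five-word window with POS, orthography, and word-list features. The record does not show how InaNLP computes them, so the following is only a minimal sketch of what such a feature extractor might look like. The function names, the orthographic classes, the `PAD` placeholder, the POS tag labels, and the sample word list are all assumptions made for illustration, not part of InaNLP.

```python
from typing import Dict, List, Set


def orthography(token: str) -> str:
    """Return a coarse orthographic class (hypothetical categories)."""
    if token.isdigit():
        return "digit"
    if token.isupper():
        return "all_caps"
    if token[:1].isupper():
        return "init_cap"
    if token.islower():
        return "lower"
    return "other"


def window_features(
    tokens: List[str],
    pos_tags: List[str],
    index: int,
    word_list: Set[str],
    size: int = 5,
) -> Dict[str, str]:
    """Collect POS, orthography, and word-list features for the tokens
    in a window of `size` words centred on `index`.

    Positions outside the sentence receive the placeholder value "PAD"
    (an assumption; the source does not say how boundaries are handled).
    """
    half = size // 2
    features: Dict[str, str] = {}
    for offset in range(-half, half + 1):
        position = index + offset
        key = f"{offset:+d}"  # "-2", "-1", "+0", "+1", "+2"
        if 0 <= position < len(tokens):
            token = tokens[position]
            features[f"pos[{key}]"] = pos_tags[position]
            features[f"orth[{key}]"] = orthography(token)
            features[f"in_list[{key}]"] = str(token.lower() in word_list)
        else:
            features[f"pos[{key}]"] = "PAD"
            features[f"orth[{key}]"] = "PAD"
            features[f"in_list[{key}]"] = "PAD"
    return features


if __name__ == "__main__":
    # Illustrative sentence, tags, and gazetteer; none come from the paper.
    tokens = ["Presiden", "Joko", "Widodo", "tiba", "di", "Bandung"]
    pos_tags = ["NN", "NNP", "NNP", "VB", "IN", "NNP"]
    locations = {"bandung"}
    for name, value in window_features(tokens, pos_tags, 5, locations).items():
        print(name, value)
```

Each token yields 15 features (3 feature types across 5 window positions). A feature dictionary like this would then be encoded and passed to a classifier; the abstract names SMO as the algorithm used in the reported experiment.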
