Investigating Bi-LSTM and CRF with POS Tag Embedding for Indonesian Named Entity Tagger
Hoesen D.ᵃ, Purwarianti A.ᵇ
ᵃ Prosa Solusi Cerdas, Bandung, Indonesia
ᵇ School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia
Abstract

© 2018 IEEE. Research on Indonesian named entity (NE) taggers has been conducted for years, but without deep learning. Most studies employed traditional machine learning algorithms such as association rules, support vector machines, random forests, and naïve Bayes. In those studies, word lists such as gazetteers or clue words were provided to improve accuracy. Here, we employ deep learning in our Indonesian NE tagger. We use long short-term memory (LSTM) as the topology, since it is the state of the art for NE tagging. With LSTM, we do not need a word list to improve accuracy. We investigate two main aspects. The first is the output layer of the network: Softmax vs. conditional random field (CRF). The second is the use of a part-of-speech (POS) tag embedding input layer. Using 8,400 sentences as training data and 97 sentences as evaluation data, we found that the POS tag embedding input layer improved the performance of our Indonesian NE tagger.

As for the comparison between Softmax and CRF, we found that both architectures have a weakness in classifying an NE tag.

Author keywords
Conditional random field, Indonesian NE Tagger, Named entities, Part Of Speech, Random forests, Softmax, State of the art, Training data

Indexed keywords
Bi-LSTM, CRF, Indonesian NE Tagger, POS Tag, Softmax

DOI
https://doi.org/10.1109/IALP.2018.8629158
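The abstract compares two output layers on top of the LSTM (Softmax vs. CRF) and adds a POS tag embedding to the input. The following is a minimal plain-Python sketch of both ideas, not the authors' implementation. Everything in it is an assumption for illustration: the BIO tag set, the embedding values, the "NNP" tag, and the helper names (`embed_token`, `softmax_decode`, `bio_transitions`, `viterbi_decode`) are hypothetical. The Bi-LSTM itself is omitted, so its per-token scores are written by hand. The CRF transition scores are hand-set here, whereas a trained CRF would learn them.

```python
import math

# Hypothetical BIO tag set for illustration; not the paper's actual label set.
TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]


def embed_token(word, pos, word_table, pos_table):
    """Input-layer sketch: concatenate a word vector with a POS tag vector.

    The concatenated vector is what a Bi-LSTM would read for this token.
    """
    return word_table[word] + pos_table[pos]  # list concatenation


def softmax(scores):
    """Numerically stable softmax over one token's tag scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_decode(emissions):
    """Softmax output layer: choose each token's most probable tag independently."""
    tags = []
    for scores in emissions:
        probs = softmax(scores)
        tags.append(max(range(len(probs)), key=probs.__getitem__))
    return tags


def bio_transitions(tags, penalty=-10.0):
    """Hand-set transition scores (an assumption; a trained CRF learns these).

    Moving into I-X is penalized unless the previous tag is B-X or I-X.
    """
    trans = [[0.0] * len(tags) for _ in tags]
    for j, to_tag in enumerate(tags):
        if to_tag.startswith("I-"):
            entity = to_tag[2:]
            for i, from_tag in enumerate(tags):
                if from_tag not in ("B-" + entity, "I-" + entity):
                    trans[i][j] = penalty
    return trans


def viterbi_decode(emissions, transitions):
    """CRF output layer at inference: find the single best-scoring tag sequence.

    A sequence's score is the sum of its emission scores plus transitions[i][j]
    for each move from tag i to tag j. Start/end transitions are omitted.
    """
    n_tags = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for scores in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i for ending at tag j at this position.
            candidates = [score[i] + transitions[i][j] for i in range(n_tags)]
            best_i = max(range(n_tags), key=candidates.__getitem__)
            new_score.append(candidates[best_i] + scores[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Walk the backpointers from the best final tag to recover the path.
    path = [max(range(n_tags), key=score.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Input layer: word vector followed by POS vector (toy values).
print(embed_token("Bandung", "NNP", {"Bandung": [0.1, 0.2]}, {"NNP": [1.0, 0.0, 0.5]}))
# prints [0.1, 0.2, 1.0, 0.0, 0.5]

# Hand-written per-token scores standing in for Bi-LSTM outputs.
emissions = [
    [2.0, 1.8, 0.0, 0.0, 0.0],  # token 1: "O" narrowly beats "B-PER"
    [0.0, 0.0, 2.0, 0.0, 0.0],  # token 2: "I-PER" is clearly best
]
print([TAGS[t] for t in softmax_decode(emissions)])  # ['O', 'I-PER']
print([TAGS[t] for t in viterbi_decode(emissions, bio_transitions(TAGS))])  # ['B-PER', 'I-PER']
```

In this toy case the Softmax layer labels each token on its own and produces the invalid BIO sequence O followed by I-PER. The CRF-style decoder scores whole sequences, so the transition penalty steers it to B-PER followed by I-PER. This illustrates the general design difference only; it does not reproduce the paper's results.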