Experiments on character and word level features for text classification using deep neural network
Gumilang M.a, Purwarianti A.b
a PT Prosa Solusi Cerdas, Indonesia
b School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Indonesia
Abstract

© 2018 IEEE. Text classification is the task of automatically assigning text documents to one or more classes according to their content. Recently, character-level models using deep neural networks have been developed for text classification, and in some cases they have outperformed word-level and traditional models, especially on user-generated datasets. The topologies used for character-level models are convolutional neural networks (CNN) and bidirectional recurrent neural networks (Bi-RNN), with their variants long short-term memory (LSTM) and gated recurrent units (GRU). In this paper, CNN, Bi-RNN, and their combination are tested with character-level and word-level features for text classification on English and Indonesian social media datasets. On small datasets, the word-level model outperformed the character-level models; however, on a dataset with millions of examples, the character-level model outperformed the word-level model.
Further analysis of the evaluation of word-level and character-level models is also discussed in this paper.

Author keywords

Bidirectional, Character level, LSTM, Text classification, Tweet, Word level

Indexed keywords

Bidirectional, Character-level, CNN, GRU, LSTM, Text classification, Tweet, Word-level

DOI

https://doi.org/10.1109/IAC.2018.8780509
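To illustrate the character-level features the abstract contrasts with word-level ones, the sketch below shows a common encoding scheme for character-level CNN/Bi-RNN classifiers: each character is mapped to an index in a fixed alphabet and one-hot encoded, so every document becomes a fixed-size matrix. This is a minimal illustration, not the authors' implementation; the alphabet and the tweet-length cap of 140 characters are assumptions for the example.

```python
# Hypothetical fixed alphabet; real systems choose one per language/dataset.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 .,;:!?'\"-()"
CHAR_TO_IDX = {c: i for i, c in enumerate(ALPHABET)}

def encode_chars(text, max_len=140):
    """One-hot encode a text at the character level, padded/truncated to max_len.

    Returns a max_len x len(ALPHABET) matrix; positions past the end of the
    text, and characters outside the alphabet, stay all-zero.
    """
    matrix = [[0.0] * len(ALPHABET) for _ in range(max_len)]
    for pos, ch in enumerate(text.lower()[:max_len]):
        idx = CHAR_TO_IDX.get(ch)
        if idx is not None:
            matrix[pos][idx] = 1.0
    return matrix

# A tweet-length input becomes a fixed-size matrix regardless of vocabulary,
# which is why character-level models need no word segmentation or
# out-of-vocabulary handling — a useful property for noisy social media text.
features = encode_chars("Selamat pagi, dunia!")
```

Unlike a word-level model, this encoding is unaffected by misspellings or slang producing unseen words, which is one plausible reason character-level models do well on large user-generated datasets.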