Systemic Risk Document Classification on Indonesian News Articles using Deep Learning and Active Learning
Gumilang M.a, Purwarianti A.b, Nurdinasari F.c
a PT Prosa Solusi Cerdas, Bandung, Indonesia
b School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Indonesia
c Bank Indonesia, Indonesia
Abstract
© 2019 IEEE. Indonesian online news articles have grown rapidly in this decade. Among them is economic news, which includes information on financial systemic risk. To obtain information on financial systemic risk in real time, systemic risk document classification should be performed automatically. Here, we employ deep learning and active learning to classify systemic risk documents automatically. We use 15 classes of financial systemic risk, as previously defined by Bank Indonesia. The task is a multi-label classification, where a text document may contain more than one type of systemic risk information. For the deep learning strategy, we conducted several experiments with CNN, Bi-LSTM, and Bi-GRU. We also compared these with a two-step classification approach. In the experiments, using 1752 documents as training data and 228 documents as testing data, the highest F1 score was achieved by the Bi-LSTM topology with one classification step and a large common corpus as the resource for the word embedding. The highest F1 score was 45.37% for 15 classes, with the probability threshold set to 0.15. In the two-step approach, the first step, a two-class classification (contains risk information or not), reached an accuracy of 82.46%. To handle the limited data, we used active learning to select the next candidates to be labeled as training data. In the experiment, which added 420 new data items in iterations of 20, active learning did not improve the performance.

Author keywords
Active Learning, Document Classification, Learning strategy, Multi label classification, Probability threshold, Risk information, Systemic risks, Text document

Indexed keywords
active learning, Bi-LSTM, deep learning, Systemic risk document classification

DOI
https://doi.org/10.1109/ICEEI47359.2019.8988829
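
As an illustration of the probability threshold mentioned in the abstract, the following is a minimal sketch of how per-class probabilities might be turned into a multi-label prediction and scored. It is not the authors' implementation: the function names, the example probabilities, the use of `>=` for the comparison, and the choice of micro-averaged F1 are all assumptions, since the abstract does not specify them.

```python
from typing import Iterable, List, Sequence


def labels_from_probabilities(probs: Sequence[float],
                              threshold: float = 0.15) -> List[int]:
    """Return the indices of all classes whose probability reaches the threshold.

    Assumption: a class is assigned when its probability is >= threshold.
    """
    return [i for i, p in enumerate(probs) if p >= threshold]


def micro_f1(true_sets: Iterable[Iterable[int]],
             pred_sets: Iterable[Iterable[int]]) -> float:
    """Micro-averaged F1 over documents (assumed averaging scheme)."""
    tp = fp = fn = 0
    for true, pred in zip(true_sets, pred_sets):
        t, p = set(true), set(pred)
        tp += len(t & p)   # labels predicted and correct
        fp += len(p - t)   # labels predicted but wrong
        fn += len(t - p)   # labels missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical output of a classifier for one document over 15 classes.
example_probs = [0.02, 0.40, 0.10, 0.16, 0.01, 0.05, 0.03, 0.00,
                 0.12, 0.07, 0.01, 0.02, 0.00, 0.01, 0.00]
predicted = labels_from_probabilities(example_probs, threshold=0.15)
```

With a threshold this low, a document can receive several labels even when no single class is dominant, which suits a multi-label task where one article may carry more than one kind of risk information.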