Statistical-based approach for Indonesian complex factoid question decomposition
Basuki S. (a), Purwarianti A. (b)
(a) Informatics Department, Universitas Muhammadiyah Malang, Indonesia
(b) School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia

Abstract
© 2016, School of Electrical Engineering and Informatics. All rights reserved.
This research proposes a method to decompose a complex factoid question into several independent questions. The method comprises four stages: (1) classifying the input question into categories such as sub-question, coordination, exemplification, or double question, (2) generating all possible question boundary candidates, (3) selecting the best question boundary, and (4) applying the question decomposition rule using the best question boundary. The study compares several machine learning algorithms for the first stage (complex factoid question classification) and the third stage (question decomposition boundary selection). The classification features are specific word lists with their related information, including the syntactic feature of POS (part-of-speech) tags. For the experiments, we annotated 916 sentences as training data and 226 sentences as testing data. The annotated corpus has a perplexity of 1.000586 with 307 out-of-vocabulary (OOV) words. Complex factoid question classification reached 93.8% accuracy with the Random Forest algorithm. Question decomposition boundary selection reached 93.80% accuracy for sub-question (Random Forest), 86.11% for double question (Random Forest), 88.23% for coordination (SMO), and 60.87% for exemplification (kNN, NB, and RF). A revision rule for question decomposition boundary selection improved the accuracy to 97.22% for double question, 94.11% for coordination, and 65.21% for exemplification.

Indexed keywords
Complex factoid question, Coordination, Double question, Exemplification, Question decomposition, Sub-question

DOI
https://doi.org/10.15676/ijeei.2016.8.2.9
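
The four stages named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the paper trains machine learning classifiers for stages 1 and 3, whereas this sketch substitutes simple keyword lookups so that it stays self-contained. The cue words in `CATEGORY_CUES`, the function names, and the split rule in stage 4 are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical cue words per category (an assumption for illustration;
# the paper's actual word lists are not given in the abstract).
CATEGORY_CUES: Dict[str, Set[str]] = {
    "coordination": {"dan"},         # "and"
    "exemplification": {"seperti"},  # "such as"
    "sub-question": {"yang"},        # "which" / "that"
}


def classify(tokens: List[str]) -> str:
    """Stage 1 stand-in: keyword lookup instead of a trained classifier."""
    for category, cues in CATEGORY_CUES.items():
        if cues & set(tokens):
            return category
    return "double question"


def boundary_candidates(tokens: List[str]) -> List[int]:
    """Stage 2: every interior token position is a candidate boundary."""
    return list(range(1, len(tokens)))


def select_boundary(tokens: List[str], category: str, candidates: List[int]) -> int:
    """Stage 3 stand-in: score 1.0 if the token is a cue word, else 0.0.

    A trained model would supply this score in a real system.
    """
    cues = CATEGORY_CUES.get(category, set())
    return max(candidates, key=lambda i: 1.0 if tokens[i] in cues else 0.0)


def decompose(question: str) -> List[str]:
    """Stage 4 stand-in: split at the chosen boundary, dropping a cue word."""
    tokens = question.lower().rstrip("?").split()
    candidates = boundary_candidates(tokens)
    if not candidates:
        return [" ".join(tokens)]
    category = classify(tokens)
    b = select_boundary(tokens, category, candidates)
    cues = CATEGORY_CUES.get(category, set())
    right = tokens[b + 1:] if tokens[b] in cues else tokens[b:]
    return [" ".join(tokens[:b]), " ".join(right)]


if __name__ == "__main__":
    # Toy input: "Who is the president of Indonesia and who is the president of Malaysia?"
    print(decompose("Siapa presiden Indonesia dan siapa presiden Malaysia?"))
```

Keeping each stage as a separate function mirrors the staged design in the abstract, so a learned classifier or boundary scorer could replace the keyword stand-ins without changing the surrounding flow.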