2-s2.0-84964978165


A multi-strategy approach for information extraction of financial report documents

^aDepartment of Statistical Computation, Sekolah Tinggi Ilmu Statistik, Jakarta, Indonesia
^bSchool of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Indonesia

Abstract

© 2015 IEEE. Information extraction studies have been conducted to improve the efficiency and accuracy of information retrieval. We developed information extraction techniques that automatically extract the company name, document period, currency, revenue, and number of employees from financial report documents. Unlike other work, we applied a multi-strategy approach to developing the extraction techniques. We grouped the information by shared characteristics before designing the techniques, on the assumption that information with different characteristics calls for a different extraction strategy.

The first strategy applies a rule-based extraction method to information that shows good regularity in its orthographic and layout features, such as company name, document period, and currency. The second strategy applies a machine learning-based extraction method to information that has rich contextual and list look-up features, such as revenue and number of employees.

In the first strategy, rule patterns are defined by combining orthographic, layout, and limited contextual features. The defined rule patterns extract the information successfully, with precision, recall, and F1-measure above 0.98. In the second strategy, we treated the extraction task as a classification task. First, we built classification models using the Naïve Bayes and Support Vector Machines algorithms. Then, we extracted the most informative features to train the classification models. The best classification model is used for the extraction task. Contextual and list look-up features play an important role in improving extraction performance. The second strategy extracts revenue and number of employees successfully, with precision, recall, and F1-measure above 0.93.

Author keywords

Accuracy of information, Classification models, Contextual feature, Extraction techniques, Information extraction techniques, Lookups, orthographic features, Support vector machines algorithms

Indexed keywords

contextual features, information extraction, list lookup features, Naïve Bayes, orthographic features, Support Vector Machines

DOI

https://doi.org/10.1109/ICTS.2015.7379893
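
The abstract does not list the actual rule patterns, so the following is only a minimal sketch of what the first (rule-based) strategy could look like, and of how precision, recall, and F1-measure are computed for extracted fields. The regular expressions, the sample cover-page text, the company name, and the function names `extract_fields` and `precision_recall_f1` are all assumptions made for illustration; they are not taken from the paper.

```python
import re

# Hypothetical rule patterns (NOT the paper's rules). Each combines
# orthographic cues (capitalisation, digits), layout cues (line position),
# and a small amount of context (trigger words such as "ended").
RULES = {
    # A whole line that starts with "PT" and ends with "Tbk".
    "company": re.compile(r"^(PT [A-Z][\w&. ]*? Tbk)\s*$", re.MULTILINE),
    # A date that follows a trigger phrase such as "ended" or "as of".
    "period": re.compile(r"(?:ended|as of|as at)\s+(\d{1,2}\s+[A-Z][a-z]+\s+\d{4})"),
    # A parenthesised reporting-currency note.
    "currency": re.compile(
        r"\(\s*Expressed in (?:\w+ of )?(Rupiah|US Dollars?)\s*\)", re.IGNORECASE
    ),
}

# Invented sample text standing in for a financial report cover page.
SAMPLE = """PT Contoh Sejahtera Tbk
Consolidated Financial Statements
For the year ended 31 December 2014
(Expressed in millions of Rupiah)
"""


def extract_fields(text):
    """Apply every rule and keep the first match for each field."""
    fields = {}
    for name, rule in RULES.items():
        match = rule.search(text)
        if match:
            fields[name] = match.group(1).strip()
    return fields


def precision_recall_f1(predicted, gold):
    """Score extracted (field, value) pairs against a gold standard."""
    pred_items = set(predicted.items())
    gold_items = set(gold.items())
    true_positives = len(pred_items & gold_items)
    precision = true_positives / len(pred_items) if pred_items else 0.0
    recall = true_positives / len(gold_items) if gold_items else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {
        "company": "PT Contoh Sejahtera Tbk",
        "period": "31 December 2014",
        "currency": "Rupiah",
    }
    predicted = extract_fields(SAMPLE)
    print(predicted)
    print(precision_recall_f1(predicted, gold))
```

Keeping one pattern per field mirrors the idea of designing a separate technique for each kind of information. Fields such as revenue and number of employees would not fit this scheme well, which is presumably why the abstract assigns them to the classification-based strategy.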
