Co-citation & co-reference concepts to control focused crawler exploration
Maimunah S. (a), Widyantoro D.H. (b), Kuspriyanto (b), Sastramihardja H.S. (b)
(a) Information System Dept., Surabaya Adhi Tama Institute of Technology, Indonesia
(b) School of Electrical Engineering and Informatics, Bandung Institute of Technology, Indonesia
Abstract
A focused crawler is an agent that indexes information on a specific topic. To traverse the WWW, a focused crawler predicts the visiting priority of each hyperlink so as to download as many relevant documents as possible while minimizing the number of irrelevant documents downloaded. Many researchers have proposed methods that improve focused crawling precision by minimizing irrelevant documents. However, there is a trade-off between precision and recall: the more precise the results, the lower the recall. This research studied the conventional focused crawling search strategy (forward crawling) and the structure of Web documents. The results show that the low recall of conventional focused crawling is caused by some structural characteristics of the WWW. This research therefore proposes a new focused crawling strategy that combines bidirectional (forward and backward) crawling with bibliometric concepts (co-citation & co-reference). Bidirectional crawling improves exploration, while the co-citation & co-reference concepts control focusing. With this new strategy, a focused crawler can obtain relevant documents that are connected through co-citations, or relevant communities that are connected through co-references.
Experiments show that a focused crawler using this new strategy, named CT-FC (more Comprehensive Traversal Focused Crawler), has better exploration capability, so that recall increases while precision can remain high. © 2011 IEEE.

Author keywords
Cocitation, Focused crawler, Focused crawling, Hyperlinks, Index information, New strategy, Precision and recall, recall, Relevant documents, Search strategies, Structural characteristics, Web document

Indexed keywords
co-citation & co-reference, focused crawler, forward & backward crawling, recall

DOI
https://doi.org/10.1109/ICEEI.2011.6021677
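
The two bibliometric relations named in the abstract can be illustrated on a small link graph. The following is a minimal sketch, not the CT-FC implementation: it assumes the standard bibliometric definitions (two pages are co-cited when some third page links to both; two pages are co-referencing when both link to some common page). The function names and the toy graph are hypothetical. Each relation is reached by one backward and one forward step, which is why forward crawling alone cannot discover them.

```python
from collections import defaultdict


def build_backlinks(out_links):
    """Invert a forward-link map {page: set of pages it links to}."""
    in_links = defaultdict(set)
    for source, targets in out_links.items():
        for target in targets:
            in_links[target].add(source)
    return in_links


def co_cited(page, out_links, in_links):
    """Pages linked from some page that also links to `page`.

    Reached by one backward step (to a citing page) followed by
    one forward step (to that page's other link targets).
    """
    found = set()
    for parent in in_links.get(page, ()):
        found |= out_links.get(parent, set())
    found.discard(page)
    return found


def co_referencing(page, out_links, in_links):
    """Pages that link to some page that `page` also links to.

    Reached by one forward step (to a link target) followed by
    one backward step (to that target's other citing pages).
    """
    found = set()
    for child in out_links.get(page, ()):
        found |= in_links.get(child, set())
    found.discard(page)
    return found


# Hypothetical toy graph: "hub" links to "a" and "b";
# "a" and "c" both link to "ref".
out_links = {
    "hub": {"a", "b"},
    "a": {"ref"},
    "b": set(),
    "c": {"ref"},
}
in_links = build_backlinks(out_links)

print(co_cited("a", out_links, in_links))
print(co_referencing("a", out_links, in_links))
```

In this toy graph, page "b" is reachable from "a" only through the shared citing page "hub", and page "c" only through the shared target "ref". A forward-only crawl starting at "a" would visit "ref" and miss both.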