Exploiting comparable corpora to enhance bilingual lexicon induction from monolingual corpora
Sholikah R.W.a, Morimoto Y.b, Arifin A.Z.a, Fatichah C.a, Purwarianti A.c
a Department of Informatics, Faculty of Intelligent Electrical and Information Technology, Institut Teknologi Sepuluh Nopember, Indonesia
b Graduate School of Advanced Science and Engineering, Hiroshima University, Japan
c Informatics Engineering, School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia
[vc_row][vc_column][vc_row_inner][vc_column_inner][vc_separator css=”.vc_custom_1624529070653{padding-top: 30px !important;padding-bottom: 30px !important;}”][/vc_column_inner][/vc_row_inner][vc_row_inner layout=”boxed”][vc_column_inner width=”3/4″ css=”.vc_custom_1624695412187{border-right-width: 1px !important;border-right-color: #dddddd !important;border-right-style: solid !important;border-radius: 1px !important;}”][vc_empty_space][megatron_heading title=”Abstract” size=”size-sm” text_align=”text-left”][vc_column_text]© 2020, Intelligent Network and Systems Society. Bilingual lexicons are essential resources in natural language processing (NLP) and information retrieval (IR). Automatic bilingual lexicon acquisition relies on large parallel corpora, which can be scarce or even unavailable for several languages. On the other hand, other resources can be used to build a bilingual lexicon, such as comparable corpora (aligned documents) and monolingual corpora, which are easy to obtain and available in any language, including resource-limited languages. Hence, this paper proposes a two-stage framework that learns bilingual lexicons from monolingual corpora enhanced with comparable corpora, without any additional resources. The framework consists of two stages: comparable dictionary building and monolingual mapping. Comparable dictionary building creates a coarse dictionary from comparable corpora using a topic modeling approach. Monolingual mapping then uses that coarse dictionary as the seed initialization for bi-directional projection learning. Using comparable corpora in this way replaces the need for a bilingual dictionary. The experiment was conducted on three language pairs: English→Indonesian, English→Arabic, and Arabic→Indonesian.
The experimental results show that the proposed method improves the accuracy of lexicon induction from monolingual corpora and outperforms previous methods.[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Author keywords” size=”size-sm” text_align=”text-left”][vc_column_text][/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Indexed keywords” size=”size-sm” text_align=”text-left”][vc_column_text]Bilingual lexicon, Comparable corpora, Enhanced-mono, Hubness problem, Linear mapping, Monolingual corpora[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Funding details” size=”size-sm” text_align=”text-left”][vc_column_text]This work was supported by the Ministry of Research, Technology and Higher Education of the Republic of Indonesia (No. 135/SP2H/LT/DRPM/IV/2017).[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”DOI” size=”size-sm” text_align=”text-left”][vc_column_text]https://doi.org/10.22266/ijies2020.1031.34[/vc_column_text][/vc_column_inner][vc_column_inner width=”1/4″][vc_column_text]Widget Plumx[/vc_column_text][/vc_column_inner][/vc_row_inner][/vc_column][/vc_row][vc_row][vc_column][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][/vc_column][/vc_row]