Rule-based Reordering and Post-processing for Indonesian-Korean Statistical Machine Translation
Mawalim C.O., Lestari D.P., Purwarianti A.
School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia
Abstract

Copyright © 2017 Candy Olivia Mawalim, Dessi Puji Lestari and Ayu Purwarianti

This paper presents several experiments on constructing an Indonesian-Korean Statistical Machine Translation (SMT) system. A parallel corpus containing around 40,000 segments on each side was developed to train the baseline SMT system, which is built on an n-gram language model and a phrase-based translation table model. This baseline still exhibits several problems, including untranslated phrases, mistranslations, incorrect phrase order, and Korean particles remaining in the target-language output. To overcome these problems, several techniques are employed: a POS (part-of-speech) tag model, POS-based reordering rules, multiple-step translation, an additional post-processing step, and their combinations. We then test the SMT system on segments randomly extracted from the parallel corpus.
In general, the additional techniques lead to better performance in terms of BLEU score compared to the baseline system.

Author keywords

Baseline systems, N-gram language models, Parallel corpora, Part of speech, Post-processing, Statistical machine translation, Table modeling, Target language
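To illustrate the kind of POS-based reordering the abstract refers to, here is a minimal sketch in Python. The function name, tag set, and the single example rule are all hypothetical, not the paper's actual rule inventory; it only demonstrates the general mechanism of permuting a source-side POS pattern before translation. The example exploits the fact that Indonesian places modifiers after the noun ("rumah besar", lit. "house big"), whereas Korean modifiers precede the noun, so a NOUN-ADJECTIVE pair is swapped.

```python
# Hypothetical sketch of POS-pattern reordering; tag names and the
# example rule are illustrative, not taken from the paper.

def apply_reordering_rules(tagged_tokens, rules):
    """Reorder (word, POS) pairs wherever a POS pattern matches a rule.

    tagged_tokens: list of (word, pos) pairs.
    rules: dict mapping a tuple of POS tags to a permutation of indices.
    """
    out = list(tagged_tokens)
    i = 0
    while i < len(out):
        for pattern, perm in rules.items():
            n = len(pattern)
            window = out[i:i + n]
            # Apply the permutation when the POS sequence matches.
            if tuple(pos for _, pos in window) == pattern:
                out[i:i + n] = [window[j] for j in perm]
                i += n - 1  # skip past the reordered span
                break
        i += 1
    return out

# Swap noun-adjective pairs so modifier order matches Korean.
rules = {("NN", "JJ"): (1, 0)}
sentence = [("rumah", "NN"), ("besar", "JJ")]
print(apply_reordering_rules(sentence, rules))
# -> [('besar', 'JJ'), ('rumah', 'NN')]
```

In a real system such rules would be applied to the POS-tagged source corpus before training and decoding, so that the phrase table learns alignments over the already-reordered word order.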