Enter your keyword

2-s2.0-85083261511

[vc_empty_space][vc_empty_space]

Automatic Pronunciation Generator for Indonesian Speech Recognition System Based on Sequence-to-Sequence Model

Hoesen D.a, Putri F.Y.a, Lestari D.P.b

a Prosa.ai, Bandung, Indonesia
b Institut Teknologi Bandung, Department of Informatics, Bandung, Indonesia

[vc_row][vc_column][vc_row_inner][vc_column_inner][vc_separator css=”.vc_custom_1624529070653{padding-top: 30px !important;padding-bottom: 30px !important;}”][/vc_column_inner][/vc_row_inner][vc_row_inner layout=”boxed”][vc_column_inner width=”3/4″ css=”.vc_custom_1624695412187{border-right-width: 1px !important;border-right-color: #dddddd !important;border-right-style: solid !important;border-radius: 1px !important;}”][vc_empty_space][megatron_heading title=”Abstract” size=”size-sm” text_align=”text-left”][vc_column_text]© 2019 IEEE.Pronunciation dictionary plays an important role in a speech recognition system. Expert knowledge is required to obtain an accurate dictionary by manually giving pronunciation for each word. On account of the continually increasing vocabulary size, especially for Indonesian language, it is impractical to manually give the pronunciation for each word. Indonesian spelling-to-pronunciation rules are relatively regular; thus, it is plausible to produce pronunciation for a word by using the predefined rules. Nevertheless, the rules still contain a few irregularities for some spellings and they still cannot handle the presence of code-mixed words and abbreviations. In this paper, we employ a sequence-to-sequence (seq2seq) approach to generate pronunciation for each word in an Indonesian dictionary. It is demonstrated that by using this approach, we can obtain a similar speech-recognition error-rate while requiring only a fractional amount of resource. Our cross-validation experiment for validating the resulting phonetic sequences achieves 4.15-6.24% phone error rate (PER). When an automatically produced dictionary is applied in a speech recognition system, the word accuracy only degrades 2.22 percentage point compared to the one produced manually. Therefore, creating a new large pronunciation dictionary using the proposed model is more efficient without degrading the recognition accuracy significantly.[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Author keywords” size=”size-sm” text_align=”text-left”][vc_column_text]Indonesian languages,Percentage points,Phone error rate,Pronunciation dictionaries,Recognition accuracy,Recognition error,Sequence modeling,Speech recognition systems[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Indexed keywords” size=”size-sm” text_align=”text-left”][vc_column_text]Indonesian,pronunciation dictionary,sequence-to-sequence,speech recognition[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Funding details” size=”size-sm” text_align=”text-left”][vc_column_text][/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”DOI” size=”size-sm” text_align=”text-left”][vc_column_text]https://doi.org/10.1109/O-COCOSDA46868.2019.9041182[/vc_column_text][/vc_column_inner][vc_column_inner width=”1/4″][vc_column_text]Widget Plumx[/vc_column_text][/vc_column_inner][/vc_row_inner][/vc_column][/vc_row][vc_row][vc_column][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][/vc_column][/vc_row]