
Scopus EID: 2-s2.0-85062891368


Long Short-Term Memory Based Language Model for Indonesian Spontaneous Speech Recognition

Putri F.Y., Lestari D.P., Widyantoro D.H.

School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, Indonesia

Abstract

© 2018 IEEE. Robust recognition of daily, spontaneous conversation is essential for a speech recognizer deployed in real-world applications, yet Indonesian automatic speech recognition (ASR) still performs poorly on spontaneous speech compared to dictated speech. In this work, we used a deep neural network approach, focusing primarily on long short-term memory (LSTM) networks to improve language model performance, since LSTMs have been applied successfully to many long-context-dependent problems, including language modeling. We tried different architectures and parameters to find the optimal combination, including deep LSTMs and the LSTM with a projection layer (LSTMP). We then employed a different type of corpus to enrich the language model linguistically. All our LSTM language models achieved significant improvements in perplexity and word error rate (WER) over the n-gram baseline: perplexity improved by up to 50.6%, and the best WER reduction was 3.61%, evaluated with a triphone GMM-HMM acoustic model. The optimal architecture we found is a deep LSTMP with L2 regularization.

Author keywords

Language model, LSTM, N-grams, Perplexity, Speech recognition systems, Spontaneous

Indexed keywords

ASR, Language model, LSTM, N-gram, Perplexity, Speech recognition system, Spontaneous

Funding details

Thanks are due to PMDSU Research for financial support of this project. We also thank PT. Prosa Solusi Cerdas and the other parties who contributed to this experiment or to the writing of this paper.

DOI

https://doi.org/10.1109/IC3INA.2018.8629500
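The paper itself does not publish code. As a minimal sketch of the kind of model the abstract describes, the following PyTorch snippet builds a deep LSTM language model with a projection layer (LSTMP, via nn.LSTM's proj_size argument), applies L2 regularization through the optimizer's weight_decay, and computes perplexity as the exponential of the mean per-token cross-entropy. All layer sizes and hyperparameters below are illustrative assumptions, not values taken from the paper.

# Minimal sketch (not the authors' code): deep LSTMP language model with
# L2 regularization, plus perplexity evaluation. Sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMPLanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512,
                 proj_dim=256, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # proj_size adds the per-layer linear projection that defines LSTMP;
        # it shrinks the recurrent output from hidden_dim to proj_dim.
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=num_layers,
                            proj_size=proj_dim, batch_first=True)
        self.out = nn.Linear(proj_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer word IDs
        h, _ = self.lstm(self.embedding(tokens))
        return self.out(h)  # (batch, seq_len, vocab_size) logits

vocab_size = 10000  # assumed vocabulary size
model = LSTMPLanguageModel(vocab_size)

# weight_decay applies L2 regularization to all model parameters.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-5)

def train_step(inputs, targets):
    # inputs/targets: (batch, seq_len); targets are inputs shifted by one word
    optimizer.zero_grad()
    logits = model(inputs)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def perplexity(inputs, targets):
    # Perplexity is exp of the mean per-token cross-entropy.
    logits = model(inputs)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    return torch.exp(loss).item()

The projection layer is the main appeal of LSTMP over a plain deep LSTM: it cuts the number of recurrent parameters while keeping a large cell state, which makes stacking layers cheaper.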