
2-s2.0-85084087030


Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition

Zakiah I. (a), Lestari D.P. (a)

(a) School of Electrical Engineering and Informatics, Institut Teknologi Bandung (ITB), Indonesia

Abstract

© 2019 IEEE. Automatic Speech Recognition (ASR) is increasingly being developed, including for the lecture domain. Building an ASR system from scratch requires a very large amount of data, so another approach, transfer learning, can be used instead. Transfer learning builds models by utilizing existing models as source models. This experiment began with data collection from lectures of the Informatics Undergraduate Program at ITB. We used a spontaneous-speech language model from the news domain as the source model. We built three systems, using the news domain, the lecture domain, and both. In all three systems, the acoustic model was a triphone GMM-HMM; MAP adaptation was applied only in the third system. The language model uses N-grams and an LSTM with a projection layer. Transfer learning is implemented both as N-gram interpolation and as weight initialization of the LSTM model. The news-domain system gives a WER of 78.30% (5-fold) and 85.18% (10sp), the lecture-domain system 58.232% (5-fold) and 62.18% (10sp), and the transfer learning system 52.734% (5-fold) and 67.0% (10sp). The best ASR for the lecture domain is system B, but the transfer learning approach gives better results when the acoustic conditions of the test and training data are the same. We also conducted topic-specific modeling in the proposed system, dividing all courses into six clusters and adding text corpora to the LM, to observe the effect of reducing the scope. The experiments show that the OOV rate is correlated with WER and that reducing the data scope does not always improve performance.

Author keywords

Acoustic conditions, Acoustic model, Automatic speech recognition, Data collection, Language model, Undergraduate program, Very large datum, Weight initialization

Indexed keywords

AM, ASR, interpolation, LM, LSTM, MAP, N-gram, transfer learning, triphone, WER

Funding details

This research is part of the research roadmap of the Artificial Intelligence and Graphics Laboratory ITB and was partially funded by "Program Penelitian, Pengabdian kepada Masyarakat, dan Inovasi (P3MI) Kelompok Keahlian ITB". Thanks to PT. Prosa Solusi Cerdas for supporting the hardware, software, and environment for this research.

DOI

https://doi.org/10.1109/ICAICTA.2019.8904225
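
For readers unfamiliar with the two language-model transfer mechanisms the abstract names, the sketch below illustrates them in Python/PyTorch. It is a minimal illustration, not the authors' implementation: the interpolation weight, model dimensions, vocabulary size, and the checkpoint name news_lm.pt are assumptions made for the example.

```python
# Minimal sketch of the two LM transfer-learning mechanisms named in the abstract:
# N-gram interpolation and LSTM weight initialization. All concrete values below
# (lambda, dimensions, vocabulary size, checkpoint name) are illustrative assumptions.
import torch
import torch.nn as nn


def interpolate_ngram(p_news: float, p_lecture: float, lam: float = 0.5) -> float:
    """Linearly interpolate a word probability from the news-domain (source) and
    lecture-domain (target) n-gram LMs: P = lam * P_news + (1 - lam) * P_lecture."""
    return lam * p_news + (1.0 - lam) * p_lecture


class LSTMLM(nn.Module):
    """LSTM language model with a projection layer, as mentioned in the abstract."""

    def __init__(self, vocab_size: int, embed_dim: int = 256,
                 hidden_dim: int = 512, proj_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # proj_size > 0 adds a per-step projection layer to the LSTM output.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            proj_size=proj_dim)
        self.out = nn.Linear(proj_dim, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.lstm(self.embed(tokens))
        return self.out(hidden)  # logits over the vocabulary


# Transfer learning by weight initialization: start the lecture-domain model from
# the news-domain model's weights, then fine-tune on lecture transcripts.
# (Assumes source and target share one vocabulary; "news_lm.pt" is hypothetical.)
source_lm = LSTMLM(vocab_size=50_000)
# source_lm.load_state_dict(torch.load("news_lm.pt"))
target_lm = LSTMLM(vocab_size=50_000)
target_lm.load_state_dict(source_lm.state_dict())  # initialize from the source model
```

In practice the interpolated probability would be computed per n-gram context when combining the two LMs, and the weight-initialized target model would then be trained further on lecture-domain text.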