2-s2.0-85062809612

Hybrid HMM-BLSTM-Based Acoustic Modeling for Automatic Speech Recognition on Quran Recitation

Thirafi F. [a], Lestari D.P. [a]

[a] School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Indonesia

Abstract

© 2018 IEEE. Many software applications now help people access the Quran on their own devices, and some of them also include a feature that recognizes the user's Quran recitation. The ability of such applications to recognize Quran recitation is therefore worth studying. Automatic Speech Recognition (ASR) for Quran recitation is a relatively new research area compared with ASR for English or other spoken languages. In some studies, the Hidden Markov Model (HMM)-Gaussian Mixture Model (GMM) is still a popular choice for acoustic modeling. However, HMM-GMM generalizes poorly on high-variance data and has difficulty with non-linearly separable data. To tackle these problems, this paper proposes a new method for training the acoustic model for Quran speech recognition using a deep learning approach. Bidirectional Long Short-Term Memory (BLSTM), one of the deep learning topologies, was used in the experiment and combined with HMM as a hybrid system. In some studies, this method has worked well for other languages, e.g. English speech recognition. Overall, the results showed that the method also works well for Quran speech recognition compared with our HMM-GMM baseline system. The baseline models achieved an average word error rate (WER) of 18.39%. In contrast, our experimental model (an acoustic model with hybrid HMM-BLSTM) performed far better, with an average WER of 4.63% on the same testing scenario. This research also analyzed the effect of Quran recitation style by building models that depend on the recitation style (Maqam).

Author keywords

Automatic speech recognition, BLSTM, Gaussian Mixture Model, hybrid, Maqam, Non-linearly separable data, Quran recitation, Software applications

Indexed keywords

BLSTM, deep learning, hybrid, Maqam, Quran recitation

DOI

https://doi.org/10.1109/IALP.2018.8629184
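
The WER figures quoted in the abstract (18.39% for the HMM-GMM baseline and 4.63% for the hybrid HMM-BLSTM model) correspond to a relative error reduction of roughly 75%. The sketch below shows how a word error rate is conventionally computed, as the word-level edit distance divided by the number of reference words, and how that relative reduction follows from the two averages. The abstract does not say which scoring tool the authors used, so the function names `word_error_rate` and `relative_reduction` and the placeholder transcripts are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper, not the authors' scoring code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Placeholder transcripts (hypothetical): one substitution and one
    # deletion against a four-word reference.
    print(word_error_rate("w1 w2 w3 w4", "w1 x w3"))
    # Average WER values reported in the abstract.
    print(round(relative_reduction(18.39, 4.63), 3))
```

Relative reduction is shown alongside the absolute WER values because it expresses the improvement independently of how hard the test set is for the baseline.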