2-s2.0-85045153899

Specific acoustic models for spontaneous and dictated style in Indonesian speech recognition

Vista C.B.a, Satriawan C.H.a, Lestari D.P.a, Widyantoro D.H.a

a School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Indonesia

Abstract

© Published under licence by IOP Publishing Ltd. The performance of an automatic speech recognition system is affected by differences in speech style between the data on which the model was originally trained and the incoming speech to be recognized. In this paper, the use of GMM-HMM acoustic models for specific speech styles is investigated. We develop two systems for the experiments. The first employs a speech style classifier to predict the style of the incoming speech, either spontaneous or dictated, and then decodes the speech using an acoustic model trained specifically for that style. The second system decodes the incoming speech with both acoustic models and selects the final result by comparing the confidence scores of the two decodings. Results show that training specific acoustic models for spontaneous and dictated speech styles confers a slight recognition advantage over a baseline model trained on a mixture of spontaneous and dictated training data. In addition, the speech style classifier approach of the first system produced slightly more accurate results than the confidence scoring employed in the second system.

Author keywords

Acoustic model, Automatic speech recognition system, Baseline models, Confidence score, First systems, Indonesian languages, Spontaneous speech, Training data

Indexed keywords

Indonesian language, Specific model, speech recognition, spontaneous speech

Funding details

This research was partially supported by the Master’s Program toward Doctoral Degree for Excellent Graduate (Program Pendidikan Magister Menuju Doktor untuk Sarjana Unggul/PMDSU) from Kemenristekdikti Indonesia, within the research entitled “Indonesian Automatic Speech Recognition System”.

DOI

https://doi.org/10.1088/1742-6596/978/1/012059
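The control flow of the two systems described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the GMM-HMM decoders and the style classifier are replaced by hypothetical callables, every name (`recognize_with_classifier`, `recognize_with_confidence`, the toy decoders and their fixed confidence values) is an assumption made for illustration, and an utterance is represented by a plain string rather than audio.

```python
from typing import Callable, Dict, Tuple

# Hypothetical interfaces (assumptions, not taken from the paper):
# a decoder maps an utterance to (transcript, confidence score),
# a style classifier maps an utterance to a style label.
Decoder = Callable[[str], Tuple[str, float]]
StyleClassifier = Callable[[str], str]


def recognize_with_classifier(
    utterance: str, classify: StyleClassifier, decoders: Dict[str, Decoder]
) -> Tuple[str, str]:
    """First system: predict the style, then decode with that style's model."""
    style = classify(utterance)
    transcript, _confidence = decoders[style](utterance)
    return style, transcript


def recognize_with_confidence(
    utterance: str, decoders: Dict[str, Decoder]
) -> Tuple[str, str]:
    """Second system: decode with every model, keep the most confident result."""
    results = {style: decode(utterance) for style, decode in decoders.items()}
    best_style = max(results, key=lambda style: results[style][1])
    return best_style, results[best_style][0]


# Toy stand-ins for the style-specific acoustic models (illustrative only).
def toy_spontaneous_decoder(utterance: str) -> Tuple[str, float]:
    return "spontaneous:" + utterance, 0.4


def toy_dictated_decoder(utterance: str) -> Tuple[str, float]:
    return "dictated:" + utterance, 0.7


def toy_classifier(utterance: str) -> str:
    # Placeholder rule: treat a filler word as a sign of spontaneous speech.
    return "spontaneous" if "um" in utterance.split() else "dictated"


decoders = {
    "spontaneous": toy_spontaneous_decoder,
    "dictated": toy_dictated_decoder,
}

print(recognize_with_classifier("um halo", toy_classifier, decoders))
print(recognize_with_confidence("um halo", decoders))
```

With these toy components the two systems can disagree on the same input: the classifier routes the utterance to the spontaneous model, while confidence-based selection picks whichever decoder reports the higher score. In this sketch the first system decodes once per utterance and the second decodes once per acoustic model.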