2-s2.0-85020228366

Analyzing and classifying Indonesian spontaneous and dictated speech

Satriawan C.H.a, Lestari D.P.a

a School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, 40132, Indonesia

Abstract

© 2016 IEEE. Accurate recognition of spontaneous speech is crucial to practical speech recognition. Statistical recognition models are typically trained on large amounts of read or dictated speech, which often yields poor recognition performance on spontaneous speech. Many approaches have been proposed to improve performance, including model adaptation and model switching. In an effort to improve spontaneous speech recognition for the Indonesian language, we attempt to pinpoint the acoustic differences between spontaneous and dictated Indonesian speech. At the phoneme level, we find differences in the distribution and pronunciation of several key phonemes associated with filled pauses. Across speakers, there is a consistent reduction in segment duration and segment energy, with a less marked spectral reduction. Using these differences as a starting point, we train a number of classifiers that identify spontaneous and read Indonesian utterances with F1 scores consistently above 90%.

We show that classification is achievable by considering segment features together with the feature differences between consecutive segments, termed ‘delta’ and ‘delta-delta’ segments.

Author keywords

Feature differences, Improve performance, Indonesian languages, Model Adaptation, Recognition models, Segment duration, Spectral reductions, Spontaneous speech

Indexed keywords

Indonesian language, Speech recognition, spontaneous speech

DOI

https://doi.org/10.1109/ICSDA.2016.7918982
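The abstract classifies utterances from segment features plus the differences between consecutive segments. The sketch below shows one plausible reading of ‘delta’ and ‘delta-delta’ segments as first and second differences over a sequence of per-segment feature vectors. It is not the authors' implementation: the feature set (duration in ms, energy in dB), the helper names `consecutive_differences`, `delta_features` and `utterance_vector`, and the mean pooling into one fixed-length utterance vector are all hypothetical choices for illustration.

```python
from statistics import mean
from typing import List, Sequence, Tuple

# Hypothetical per-segment feature vector, e.g. [duration_ms, energy_db].
Segment = Sequence[float]


def consecutive_differences(rows: Sequence[Segment]) -> List[List[float]]:
    """Element-wise difference between each row and the row before it.

    For n input rows this returns n - 1 rows (none if n < 2):
    output row i is rows[i + 1] - rows[i].
    """
    return [
        [curr_val - prev_val for prev_val, curr_val in zip(prev, curr)]
        for prev, curr in zip(rows, rows[1:])
    ]


def delta_features(
    segments: Sequence[Segment],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (delta segments, delta-delta segments) for one utterance."""
    deltas = consecutive_differences(segments)
    delta_deltas = consecutive_differences(deltas)
    return deltas, delta_deltas


def utterance_vector(segments: Sequence[Segment]) -> List[float]:
    """Pool an utterance into one vector: per-dimension means of its segment
    features, delta segments and delta-delta segments (assumed pooling).

    Requires at least one segment; missing difference orders are zero-filled.
    """
    dims = len(segments[0])
    deltas, delta_deltas = delta_features(segments)
    vector: List[float] = []
    for block in (segments, deltas, delta_deltas):
        if not block:  # too few segments for this order of difference
            vector.extend([0.0] * dims)
        else:
            vector.extend(mean(column) for column in zip(*block))
    return vector
```

For example, `utterance_vector([[80.0, 60.0], [120.0, 66.0], [40.0, 57.0]])` returns `[80.0, 61.0, -20.0, -1.5, -120.0, -15.0]`. Mean pooling is only one way to obtain a fixed-length input for a standard classifier; a sequence model could consume the delta rows directly instead.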
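The reported F1 scores combine precision P and recall R as F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN). A minimal sketch, assuming a binary task in which one class (here the hypothetical label `"spontaneous"`) is treated as positive; the abstract does not state whether its scores are per-class or averaged over both classes.

```python
from typing import Sequence


def f1_score(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "spontaneous",  # hypothetical label for the positive class
) -> float:
    """Binary F1 for the `positive` class, computed as 2TP / (2TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denominator = 2 * tp + fp + fn
    # No positive labels or predictions at all: define F1 as 0.0.
    return 2 * tp / denominator if denominator else 0.0
```

With five spontaneous and five read utterances, of which one of each is misclassified, TP = 4, FP = 1 and FN = 1, giving F1 = 8 / 10 = 0.8.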