[vc_empty_space][vc_empty_space]
Study and Implementation of Prosody Manipulation Method for Indonesian Speech Synthesis System
Prini S.U.a, Prihatmanto A.S.b, Jatmiko D.A.c
a Research Center for Electronics and Telecommunication, Indonesian Institute of Sciences, Bandung, Indonesia
b School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, Indonesia
c Department of Informatics Engineering, Indonesian Computer University, Bandung, Indonesia
[vc_row][vc_column][vc_row_inner][vc_column_inner][vc_separator css=”.vc_custom_1624529070653{padding-top: 30px !important;padding-bottom: 30px !important;}”][/vc_column_inner][/vc_row_inner][vc_row_inner layout=”boxed”][vc_column_inner width=”3/4″ css=”.vc_custom_1624695412187{border-right-width: 1px !important;border-right-color: #dddddd !important;border-right-style: solid !important;border-radius: 1px !important;}”][vc_empty_space][megatron_heading title=”Abstract” size=”size-sm” text_align=”text-left”][vc_column_text]© 2018 IEEE. A speech synthesis system converts text in a language into sound. The focus of this research is to ‘humanize’ the pronunciation of a speech synthesis system. The main requirements for the Text-To-Speech system in this research are eSpeak, the MBROLA id1 database for Indonesian, a Human Speech Corpus database derived from a website that lists the most commonly used words in a country, and three basic types of emotion or intonation designed for happy, angry, and sad emotions. The approach used to develop the emotional filter is to manipulate prosody values (especially pitch and duration) using predetermined level factors. The Human Speech Corpus perception test results are 95% for happy emotion, 96.25% for angry emotion, and 98.75% for sad emotion. For the clarity aspect, the accuracy of the audible sound against the original sentence is 93.3%, and the per-sentence clarity level is 62.8%.
For the naturalness aspect, the accuracy of emotion selection is 75.6% overall: 90% for happy emotion, 73.3% for angry emotion, and 60% for sad emotion.[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Author keywords” size=”size-sm” text_align=”text-left”][vc_column_text]Emotions,ESpeak,Human speech,MBROLA,Prosody manipulations[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Indexed keywords” size=”size-sm” text_align=”text-left”][vc_column_text]Emotions,ESpeak,Human Speech Corpus,MBROLA,Prosody manipulation,Speech Synthesis[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Funding details” size=”size-sm” text_align=”text-left”][vc_column_text][/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”DOI” size=”size-sm” text_align=”text-left”][vc_column_text]https://doi.org/10.1109/ICITSI.2018.8696028[/vc_column_text][/vc_column_inner][vc_column_inner width=”1/4″][vc_column_text]Widget Plumx[/vc_column_text][/vc_column_inner][/vc_row_inner][/vc_column][/vc_row][vc_row][vc_column][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][/vc_column][/vc_row]
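The prosody-manipulation approach summarized in the abstract (scaling pitch and duration values by predetermined level factors per emotion) can be sketched as follows. This is a minimal illustration over MBROLA’s plain-text .pho input format, where each line holds a phoneme, a duration in milliseconds, and optional (position %, pitch Hz) pairs; the emotion names and factor values below are hypothetical placeholders, not the paper’s actual level factors.

```python
# Sketch of emotion-based prosody manipulation on MBROLA .pho data.
# The level factors below are illustrative placeholders, not the
# values determined in the paper.

EMOTION_FACTORS = {
    # emotion: (pitch_factor, duration_factor) -- assumed values
    "happy": (1.2, 0.9),   # higher pitch, faster speech
    "angry": (1.3, 0.8),   # much higher pitch, fastest speech
    "sad":   (0.8, 1.2),   # lower pitch, slower speech
}

def apply_emotion(pho_text: str, emotion: str) -> str:
    """Scale duration and pitch values in MBROLA .pho lines by level factors."""
    pitch_f, dur_f = EMOTION_FACTORS[emotion]
    out_lines = []
    for line in pho_text.splitlines():
        parts = line.split()
        # Pass through comments (';') and lines without prosody data.
        if len(parts) < 2 or line.startswith(";"):
            out_lines.append(line)
            continue
        phoneme, duration, rest = parts[0], int(parts[1]), parts[2:]
        new = [phoneme, str(round(duration * dur_f))]
        # 'rest' holds (position %, pitch Hz) pairs; scale only the pitch.
        for i, val in enumerate(rest):
            if i % 2 == 0:
                new.append(val)  # pitch position within the phoneme stays
            else:
                new.append(str(round(int(val) * pitch_f)))
        out_lines.append(" ".join(new))
    return "\n".join(out_lines)
```

For example, `apply_emotion("a 100 50 120", "sad")` slows the phoneme to 120 ms and lowers its pitch target to 96 Hz; the resulting .pho text can be piped to MBROLA with an Indonesian voice database in the usual way.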