[vc_empty_space][vc_empty_space]
Rude-Words Detection for Indonesian Speech Using Support Vector Machine
Novitasari S. (a), Lestari D.P. (a), Sakti S. (b), Purwarianti A. (a)
(a) School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Indonesia
(b) Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan
[vc_row][vc_column][vc_row_inner][vc_column_inner][vc_separator css=”.vc_custom_1624529070653{padding-top: 30px !important;padding-bottom: 30px !important;}”][/vc_column_inner][/vc_row_inner][vc_row_inner layout=”boxed”][vc_column_inner width=”3/4″ css=”.vc_custom_1624695412187{border-right-width: 1px !important;border-right-color: #dddddd !important;border-right-style: solid !important;border-radius: 1px !important;}”][vc_empty_space][megatron_heading title=”Abstract” size=”size-sm” text_align=”text-left”][vc_column_text]© 2018 IEEE. This paper presents an approach for detecting rude-words (swear words) in transcribed Indonesian speech using a Support Vector Machine and various combinations of textual and acoustic features. Rude-words are defined as words that are prohibited from being broadcast and are therefore usually censored. In the proposed framework, these words are detected by identifying the rudeness of each word in a given speech utterance. The identification considers the context of the speech in addition to the word itself, since a word's rudeness depends on the context in which it is spoken. Experimental results show that rude-words detection using a textual feature set consisting of word embeddings, trigram POS tags, a word list, and sentence embeddings achieved the best performance among the feature sets tested. This model also outperformed rude-words detection using an acoustic model, a multi-modal model, and a keyword-matching technique. 
The F1-scores of the best model are 83.62% for word-level detection and 87.07% for sentence-level detection.[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Author keywords” size=”size-sm” text_align=”text-left”][vc_column_text]Acoustic features, Acoustic model, Indonesian languages, Key word matching, rude-words, Sentence level, Speech utterance, Textual features[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Indexed keywords” size=”size-sm” text_align=”text-left”][vc_column_text]detection, Indonesian language, rude-words, support vector machine[/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”Funding details” size=”size-sm” text_align=”text-left”][vc_column_text][/vc_column_text][vc_empty_space][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][vc_empty_space][megatron_heading title=”DOI” size=”size-sm” text_align=”text-left”][vc_column_text]https://doi.org/10.1109/IALP.2018.8629145[/vc_column_text][/vc_column_inner][vc_column_inner width=”1/4″][vc_column_text][/vc_column_text][/vc_column_inner][/vc_row_inner][/vc_column][/vc_row][vc_row][vc_column][vc_separator css=”.vc_custom_1624528584150{padding-top: 25px !important;padding-bottom: 25px !important;}”][/vc_column][/vc_row]