A non deterministic Indonesian stemmer
School of Electrical Engineering and Informatics, Bandung Institute of Technology, Indonesia

Abstract
A stemmer is a basic natural language processing tool that is widely used in text-based applications such as information retrieval and question answering. Existing Indonesian stemmers return only one result per word, which is a deterministic approach even though the problem itself is non-deterministic. The existing algorithms select only the first matching morphological rule defined in the system. This gives inaccurate results in two cases: words with more than one candidate result (such as “perbaikan” with “per – an” or “per – kan”) and words with more than one possible affix combination (such as “beruang” or “mereka”). To handle these cases, this research proposes a stemmer that produces more accurate results by employing a non-deterministic algorithm that yields more than one candidate word and more than one affix combination. Here, the result does not depend on the order of the morphological rules: all rules are checked and the results are kept in a candidate list. To keep the stemmer efficient, two word lists (vocabularies) are used: a list of words that have more than one candidate word, and a list of root words used as the candidate reference. The final results are selected with several heuristic rules. In the experiments, the proposed approach achieved higher accuracy than the two best-known Indonesian stemmers.
© 2011 IEEE.

Author keywords
Affix combination, Candidate list, First fit, Heuristic rules, Indonesian stemmer, Morphologically ambiguous word, Natural language processing, Non deterministic, Nondeterministic algorithms, Question answering, Word lists

Indexed keywords
Affix combination, Indonesian stemmer, Morphologically ambiguous word, Non deterministic

DOI
https://doi.org/10.1109/ICEEI.2011.6021829
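
The candidate-list idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the affix lists, the tiny root-word list, and the function name `candidate_stems` are all assumptions made for illustration, and the final heuristic selection step is left out because the abstract does not spell out the rules. The sketch only shows the contrast with first-fit stemming: every prefix and suffix combination is tried, and every result found in the root-word list is kept.

```python
# Hypothetical, heavily simplified affix rules (not the paper's rule set).
PREFIXES = ["ber", "be", "me", "per"]
SUFFIXES = ["kan", "an"]

# Hypothetical root-word list used as the candidate reference.
ROOT_WORDS = {"baik", "uang", "ruang", "beruang", "reka", "mereka"}


def candidate_stems(word, roots=ROOT_WORDS):
    """Return every (stem, prefix, suffix) whose stem is a known root.

    All prefix/suffix combinations are checked, so the result does not
    depend on the order in which the rules are listed.
    """
    candidates = []
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            if stem in roots:
                candidates.append((stem, prefix, suffix))
    return candidates


if __name__ == "__main__":
    for w in ("perbaikan", "beruang", "mereka"):
        print(w, candidate_stems(w))
```

With these toy lists, “perbaikan” yields a single candidate because only the “per – an” split produces a known root, while “beruang” and “mereka” each keep several candidates. A later selection step, which the paper handles with heuristic rules, would have to choose among them.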