
2-s2.0-85050581734


A statistical and rule-based spelling and grammar checker for Indonesian text

Fahda A. (a), Purwarianti A. (a)

(a) School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Indonesia

Abstract

© 2017 IEEE. Spelling and grammar checkers are widely-used tools which aim to help in detecting and correcting various writing errors. However, there are currently no proofreading systems capable of checking both spelling and grammar errors in Indonesian text. This paper proposes an Indonesian spelling and grammar checker prototype which uses a combination of rules and statistical methods. The rule matcher module currently uses 38 rules which detect, correct, and explain common errors in punctuation, word choice, and spelling. The spelling checker module examines every word using a dictionary trie to find misspellings and Damerau-Levenshtein distance neighbors as correction candidates. Morphological analysis is also added for certain word forms. A bigram/co-occurrence Hidden Markov Model is used for ranking and selecting the candidates. The grammar checker uses a trigram language model from tokens, POS tags, or phrase chunks for identifying sentences with incorrect structures. By experiment, the co-occurrence HMM with an emission probability weight coefficient of 0.95 is selected as the most suitable model for the spelling checker. As for the grammar checker, the phrase chunk model which normalizes by chunk length and uses a threshold score of -0.4 gave the best results. The document evaluation of this system showed an overall accuracy of 83.18%. This prototype is implemented as a web application.

Author keywords

Grammar checkers, Indonesians, Rule based, Spelling checker, statistical checker

Indexed keywords

Indonesian grammar checker, Indonesian proofreading system, Indonesian spelling checker, rule-based checker, statistical checker

DOI

https://doi.org/10.1109/ICODSE.2017.8285846
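
The spelling checker step named in the abstract (look each word up in a dictionary trie, then collect Damerau-Levenshtein neighbors as correction candidates) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Trie` class, the `osa_distance` and `suggest` functions, the four-word vocabulary, and the `max_distance=1` cutoff are all assumptions made for illustration. The distance shown is the restricted (optimal string alignment) variant of Damerau-Levenshtein, and candidates are found by a linear scan rather than a trie traversal. The HMM ranking and morphological analysis are not covered.

```python
_END = ""  # end-of-word marker; cannot collide with a one-character key


class Trie:
    """Hypothetical minimal dictionary trie supporting membership tests."""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return False
        return _END in node


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def suggest(word, vocabulary, trie, max_distance=1):
    """Return [] for a known word, else the vocabulary words within max_distance."""
    if word in trie:
        return []
    return sorted(w for w in vocabulary if osa_distance(word, w) <= max_distance)


# Illustrative vocabulary only; a real checker would load a full dictionary.
VOCABULARY = ["makan", "makna", "minum", "buku"]
TRIE = Trie(VOCABULARY)
```

With this toy vocabulary, `suggest("mkaan", VOCABULARY, TRIE)` yields `["makan"]`, since a single adjacent transposition turns "mkaan" into "makan" while "makna" is two edits away. In the system described above, such candidates would then be ranked by the HMM rather than returned in alphabetical order.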