
2-s2.0-85065905457


Vulnerability Detection in PHP Web Application Using Lexical Analysis Approach with Machine Learning

Anbiya D.R. (a, b), Purwarianti A. (a), Asnar Y. (a)

(a) Bandung Institute of Technology, School of Electrical Engineering and Informatics, Bandung, Indonesia
(b) Agency for the Assessment and Application of Technology, Laboratory for Information and Communication Technology Services, Tangerang Selatan, Indonesia

Abstract

© 2018 IEEE.

Security is an important concern and continues to be a challenging topic, especially in web applications. Today, 78.9% of websites use PHP as their programming language. Given PHP's popularity, web applications written in it tend to contain many vulnerabilities, and these vulnerabilities are reflected in their source code. Static analysis can be used to detect vulnerabilities in source code; however, it usually requires an additional method that involves expert knowledge. In this paper, we propose a vulnerability detection technique that uses lexical analysis with machine learning as the classification method. We focused on using native PHP tokens and Abstract Syntax Tree (AST) tokens as features, and then manipulated them to obtain the best features. We pruned the AST to discard unusable nodes and subtrees, and then extracted node-type tokens with the Breadth First Search (BFS) algorithm. In addition, unusable PHP tokens were filtered out, and the remaining tokens were combined with one another to enrich the features, which were extracted using TF-IDF. These features were used for machine learning classification to determine which feature set, AST tokens or PHP tokens, performs better. The classification methods we used were Gaussian Naïve Bayes (GNB), Support Vector Machine (SVM), and Decision Tree. As a result, we obtained the highest recall score, 92%, using PHP tokens as features and Gaussian Naïve Bayes as the classification method.

Author keywords

Abstract Syntax Trees, Breadth first search algorithms, Classification methods, Expert knowledge, Imbalanced Data-sets, Machine learning classification, PHP web applications, Vulnerability detection

Indexed keywords

Classification, Imbalanced data set, Machine learning, Vulnerability detection

DOI

https://doi.org/10.1109/ICODSE.2018.8705809
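The AST step described in the abstract (prune unusable nodes or subtrees, then read node types in breadth-first order) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary-based tree, the node kind names, and the `PRUNE_KINDS` set are all hypothetical, because the abstract names neither the parser nor which node types were pruned.

```python
from collections import deque
from typing import Any, Dict, List, Optional

# Hypothetical AST node shape: {"kind": str, "children": [node, ...]}.
Node = Dict[str, Any]

# Hypothetical set of node kinds treated as "unusable"; the paper's actual set is not given.
PRUNE_KINDS = {"Stmt_Nop", "Stmt_InlineHTML", "Comment"}


def prune(node: Node) -> Optional[Node]:
    """Return a copy of the tree with pruned nodes (and their whole subtrees) removed."""
    if node["kind"] in PRUNE_KINDS:
        return None
    kept = []
    for child in node.get("children", []):
        pruned_child = prune(child)
        if pruned_child is not None:
            kept.append(pruned_child)
    return {"kind": node["kind"], "children": kept}


def bfs_node_types(root: Optional[Node]) -> List[str]:
    """Visit the tree breadth-first and emit each node's kind as a token."""
    if root is None:
        return []
    tokens: List[str] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        tokens.append(node["kind"])
        queue.extend(node.get("children", []))
    return tokens


# Hand-written (hypothetical) tree, roughly for: <?php echo $_GET["name"]; // comment
ast = {
    "kind": "Root",
    "children": [
        {"kind": "Stmt_Echo", "children": [
            {"kind": "Expr_ArrayDimFetch", "children": [
                {"kind": "Expr_Variable", "children": []},
                {"kind": "Scalar_String", "children": []},
            ]},
        ]},
        {"kind": "Stmt_Nop", "children": [{"kind": "Comment", "children": []}]},
    ],
}

tokens = bfs_node_types(prune(ast))
print(tokens)
```

A `deque` gives constant-time pops from the front, so the traversal stays linear in the number of remaining nodes; the resulting token sequence is what a TF-IDF step would consume.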
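The PHP-token features (filter unusable tokens, combine tokens, weight with TF-IDF) can be sketched in the same spirit. The token streams below are hand-written lists of token names of the kind PHP's `token_get_all()` and `token_name()` produce; the `UNUSABLE` filter set is an assumption, and "combining" tokens is interpreted here as appending adjacent-token pairs (bigrams), which the abstract does not confirm. TF-IDF uses the plain form tf = count / document length and idf = ln(N / df); libraries often use smoothed variants.

```python
import math
from collections import Counter
from typing import Dict, List

# Assumed "unusable" tokens: real PHP token names, but the paper's exact filter list is not given.
UNUSABLE = {"T_OPEN_TAG", "T_CLOSE_TAG", "T_WHITESPACE", "T_COMMENT", "T_DOC_COMMENT"}


def filter_tokens(tokens: List[str]) -> List[str]:
    """Drop tokens in the assumed UNUSABLE set."""
    return [t for t in tokens if t not in UNUSABLE]


def with_bigrams(tokens: List[str]) -> List[str]:
    """Keep single tokens and append adjacent pairs (an assumed reading of 'combining')."""
    pairs = [f"{a}|{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + pairs


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors with tf = count / len(doc) and idf = ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({term: (c / len(doc)) * math.log(n_docs / df[term])
                        for term, c in counts.items()})
    return vectors


# Hand-written token-name streams for two tiny PHP files (hypothetical, simplified).
file_a = ["T_OPEN_TAG", "T_ECHO", "T_WHITESPACE", "T_VARIABLE", "T_COMMENT"]
file_b = ["T_OPEN_TAG", "T_ECHO", "T_WHITESPACE", "T_CONSTANT_ENCAPSED_STRING"]

docs = [with_bigrams(filter_tokens(f)) for f in (file_a, file_b)]
vectors = tfidf(docs)
print(vectors[0])
```

With this unsmoothed idf, a token present in every file (here `T_ECHO`) gets weight 0, while tokens that distinguish the files keep a positive weight.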
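Of the three classifiers named in the abstract, Gaussian Naïve Bayes is the simplest to show without dependencies. The hypothetical sketch below fits per-class feature means and variances, predicts the class with the largest log-posterior, and scores recall = TP / (TP + FN) for the vulnerable class (assumed here to be labelled 1). The feature vectors are made-up numbers standing in for TF-IDF weights, not data from the paper; in practice a library implementation such as scikit-learn's `GaussianNB` would normally be used instead of this re-implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class GaussianNB:
    """Minimal Gaussian Naive Bayes (illustrative re-implementation, not a library API)."""

    def __init__(self, var_smoothing: float = 1e-9) -> None:
        # Small constant added to every variance so zero-variance features do not divide by zero.
        self.var_smoothing = var_smoothing
        self.params: Dict[int, Tuple[float, List[float], List[float]]] = {}

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "GaussianNB":
        groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        for label, rows in groups.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            variances = [sum((v - m) ** 2 for v in col) / n + self.var_smoothing
                         for col, m in zip(columns, means)]
            # Store the log prior plus per-feature means and variances for this class.
            self.params[label] = (math.log(n / len(y)), means, variances)
        return self

    def _log_posterior(self, row: Sequence[float], label: int) -> float:
        log_prior, means, variances = self.params[label]
        # Sum of per-feature Gaussian log-densities (the "naive" independence assumption).
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return [max(self.params, key=lambda label: self._log_posterior(row, label))
                for row in X]


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """TP / (TP + FN) for the positive (vulnerable) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


# Made-up 2-feature vectors (e.g., TF-IDF weights of two tokens); 1 = vulnerable, 0 = safe.
X_train = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.1, 0.9], [0.2, 0.8], [0.1, 0.7]]
y_train = [1, 1, 1, 0, 0, 0]
X_test = [[0.85, 0.15], [0.15, 0.85], [0.6, 0.3]]
y_test = [1, 0, 1]

model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)
print(y_pred, recall(y_test, y_pred))
```

Working in log space avoids underflow when many small per-feature densities are multiplied, which matters once the feature vectors have one entry per token type.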