State thresholding to accelerate reinforcement learning
Sari S.C.a, Prihatmanto A.S.b, Adiprawita W.b, Kuspriyantob
a Department of Electrical Engineering, Faculty of Engineering, General Achmad Yani University (UNJANI), Cimahi, 40533, Indonesia
b Department of Electrical Engineering, School of Electrical Engineering and Informatics (STEI), Bandung Institute of Technology, Bandung, 40132, Indonesia
Abstract
© 2014 IEEE. Along with learning convergence and nonstationary equilibria, an important problem to be solved in Reinforcement Learning (RL) is slow learning. In highly dynamic and stochastic systems, a Markov Decision Process (MDP) is often used to model the situation, and RL is used to produce optimal control values expressed as an optimal policy. However, approaches to solving MDPs with RL that store the optimal value function or Q-value function and action models as tables do not scale to large state spaces. The number of states grows exponentially with the number of agents and with the sizes of the state and action spaces, which makes learning very slow: it requires a very large amount of memory and more computation than most computers today can offer. This paper addresses the curse-of-dimensionality problem by iteratively reducing the states in the MDP using a novel algorithm called State Thresholding in Reinforcement Learning (STRL). STRL accelerates the learning process and is empirically shown to outperform the Q-learning algorithm.

Author keywords
Computer performance, Curse of dimensionality, Learning convergence, Learning problem, Markov Decision Processes, Optimal value functions, Q-learning algorithms, Robot navigation

Indexed keywords
gridworld robot navigation, learning acceleration, Markov Decision Process, Reinforcement Learning

DOI
https://doi.org/10.1109/ICSEngT.2014.7111787
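The paper itself is not reproduced on this page, but the idea described in the abstract can be illustrated with a short sketch. The Python snippet below is an assumption-laden illustration, not the authors' STRL algorithm: it runs tabular Q-learning on a toy gridworld (matching the "gridworld robot navigation" keyword) and, after each episode, prunes states whose best Q-value has fallen below a fixed threshold from an active set that subsequent action selection prefers. The grid size, threshold value, and pruning criterion are all invented for illustration.

```python
# Minimal illustrative sketch (not the paper's exact STRL procedure):
# tabular Q-learning on a gridworld, with an assumed thresholding step
# that iteratively drops low-valued states from the active state set.
import random

N = 5                                  # N x N gridworld, goal at (N-1, N-1)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
THRESHOLD = -4.0                       # assumed pruning threshold on max_a Q(s, a)

states = {(r, c) for r in range(N) for c in range(N)}
Q = {(s, a): 0.0 for s in states for a in range(len(ACTIONS))}
active = set(states)                   # states still considered during learning

def step(s, a):
    """Deterministic transition: reward -1 per move, 0 on reaching the goal."""
    (r, c), (dr, dc) = s, ACTIONS[a]
    ns = (min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1))
    done = ns == (N - 1, N - 1)
    return ns, (0.0 if done else -1.0), done

def choose(s):
    """Epsilon-greedy action selection, preferring moves that stay in the active set."""
    candidates = [a for a in range(len(ACTIONS)) if step(s, a)[0] in active]
    if not candidates:
        candidates = list(range(len(ACTIONS)))
    if random.random() < EPSILON:
        return random.choice(candidates)
    return max(candidates, key=lambda a: Q[(s, a)])

for episode in range(200):
    s, done = (0, 0), False
    while not done:
        a = choose(s)
        ns, reward, done = step(s, a)
        best_next = max(Q[(ns, b)] for b in range(len(ACTIONS)))
        Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])
        s = ns
    # Assumed thresholding step: after each episode, prune states whose best
    # Q-value has fallen below THRESHOLD (never the start or goal state).
    for st in list(active):
        if st in {(0, 0), (N - 1, N - 1)}:
            continue
        if max(Q[(st, a)] for a in range(len(ACTIONS))) < THRESHOLD:
            active.discard(st)

print(f"{len(active)} of {len(states)} states remain after thresholding")
```

In this sketch the pruned set only biases action selection; the point is simply that iteratively shrinking the set of states the learner attends to reduces the table it must maintain, which is the acceleration effect the abstract attributes to STRL.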