Improving TensorFlow’s Memory Swapping by Prioritizing Earlier Tensors
Alvaro D., Kistijantoro A. I.
School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, Indonesia
Abstract

© 2019 IEEE. The ever-increasing sizes of deep learning models and datasets increase the need for memory, as insufficient memory may abort a training process. Adding to this problem, deep learning training tends to use GPUs rather than CPUs for better training speed, and a GPU generally has significantly less memory than its CPU counterpart. One solution is memory swapping, a way to free memory by temporarily moving data to another memory pool, in this case from GPU memory to CPU memory. However, since moving data takes time, and the larger the data the longer it takes, performing memory swapping during training can significantly increase the training duration. An ideal memory swapping scheme therefore has to be selective about how much and which data to swap. TensorFlow, a machine learning framework, features memory swapping that, based on our analysis of its implementation, can be improved to reduce the increase in training duration it causes. The improvement is done by prioritizing earlier tensors (a tensor is the basic data unit in TensorFlow), which, because of a property of backpropagation, allows the asynchronicity of program execution to increase, ultimately reducing training duration. Based on our experiments on Char-RNN models with various hyperparameters and datasets (of size 285 KB and 4.4 MB), the improvement reduces training duration by up to around 3% in certain cases.

Author keywords

Asynchronicity, Data units, Hyperparameters, Learning models, Memory pool, Program execution, Training process, Training speed

Indexed keywords

asynchronous, backpropagation, memory swapping, TensorFlow

DOI

https://doi.org/10.1109/ICAICTA.2019.8904300
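For illustration, the sketch below shows the basic GPU-to-CPU swap-out and swap-in idea that the abstract describes, using only public TensorFlow 2.x APIs (tf.device, tf.identity). It is a minimal illustration under the assumption that a GPU is available, not the authors' modification to TensorFlow's internal swapping mechanism; the function names and device strings are chosen for the example.

```python
# Illustrative sketch only, not the paper's patch to TensorFlow.
# It demonstrates the swap-out / swap-in idea: an activation produced on the
# GPU is copied to host (CPU) memory during the forward pass and copied back
# only when backpropagation needs it. Assumes a GPU is visible as "/GPU:0".
import tensorflow as tf


def swap_out(gpu_tensor):
    # Copy an activation from GPU memory to host (CPU) memory so the GPU copy
    # can be freed for the rest of the forward pass.
    with tf.device("/CPU:0"):
        return tf.identity(gpu_tensor)


def swap_in(cpu_tensor):
    # Copy the activation back to GPU memory right before backpropagation
    # needs it. Tensors produced earlier in the forward pass are needed last
    # by backpropagation, so swapping them gives the most time to overlap
    # this copy with computation, which is the intuition behind prioritizing
    # earlier tensors.
    with tf.device("/GPU:0"):
        return tf.identity(cpu_tensor)


if __name__ == "__main__":
    with tf.device("/GPU:0"):
        activation = tf.random.normal([1024, 1024])  # produced on the GPU
    host_copy = swap_out(activation)   # now resident in CPU memory
    restored = swap_in(host_copy)      # back in GPU memory for the backward pass
```

In the actual framework the swap decisions are made inside TensorFlow's memory management rather than in user code; the snippet only conveys why the order in which tensors are swapped matters for overlapping transfers with computation.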