Multimodal fusion algorithm and reinforcement learning-based dialog system in human-machine interaction

Fakhrurroja H.a, Machbub C.a, Prihatmanto A.S.a, Purwarianti A.a

a Institut Teknologi Bandung, School of Electrical Engineering and Informatics, Indonesia

Abstract

© 2020, School of Electrical Engineering and Informatics. All rights reserved. Studies on human-machine interaction systems report positive results for system accuracy. However, problems remain, especially when using input modalities such as speech, gesture, face detection, and skeleton tracking. These include how to design an interface that lets a machine contextualize an ongoing conversation, how to activate the system through various modalities, how to choose an appropriate multimodal fusion method, how the machine can understand human intentions, and how to develop its knowledge. This study developed a method for building a human-machine interaction system. It involved several stages: a multimodal activation system; methods for recognizing the speech, gesture, face detection, and skeleton tracking modalities; multimodal fusion strategies; understanding of human intent and an Indonesian dialogue system; and methods for developing machine knowledge and selecting the right response. The research contributes to easier and more natural human-machine interaction through multimodal fusion. The average accuracy rates for multimodal activation, the Indonesian dialogue system test, gesture recognition interaction, and multimodal fusion are 87.42%, 92.11%, 93.54%, and 93%, respectively. User satisfaction with the multimodal recognition-based human-machine interaction system was 95%. According to 76.2% of users, the interaction was natural, while 79.4% agreed that the machine responded well to their wishes.

Author keywords

Indexed keywords

Human-machine interaction, Indonesian dialogue system, Multimodal fusion, Natural language understanding, Reinforcement learning

Funding details

Kinect is an active sensor suited to face detection and gesture tracking applications because its camera has an integrated infrared sensor and captures streaming colour images with accurate data. The Kinect sensor acquires three-dimensional data using a colour camera together with infrared transmitters and receivers. It is supported by a face tracking software development kit [16].

DOI