Profit Sharing (PS), a reinforcement learning method that strongly reinforces successful experiences, has been shown to improve learning speed when combined with a deep Q-network (DQN). We expect a further improvement in learning speed from combining PS-based learning with the Ape-X DQN, which has state-of-the-art learning speed, in place of the DQN. However, PS-based learning does not use replay memory, whereas the Ape-X DQN requires it because exploration of the environment to collect experiences and network training are performed asynchronously. In this study, we propose Learning-accelerated Ape-X, which integrates the Ape-X DQN and PS-based learning with several improvements, including the use of replay memory. Numerical experiments show that the proposed method improves scores in Atari 2600 video games in a shorter time than the Ape-X DQN.
|Journal||IEEJ Transactions on Electrical and Electronic Engineering|
|Publication status||Published - 1 Aug 2020|