Distributed Deep Reinforcement Learning Method Using Profit Sharing for Learning Acceleration

Naoki Kodama, Taku Harada, Kazuteru Miyazaki

Research output: Article, peer-reviewed

1 citation (Scopus)

Abstract

Profit Sharing (PS), a reinforcement learning method that strongly reinforces successful experiences, has been shown to improve learning speed when combined with a deep Q-network (DQN). We expect a further improvement in learning speed by integrating PS-based learning with the Ape-X DQN, which achieves state-of-the-art learning speed, instead of the DQN. However, PS-based learning does not use replay memory, whereas the Ape-X DQN requires it because the exploration of the environment for collecting experiences and the network training are performed asynchronously. In this study, we propose Learning-accelerated Ape-X, which integrates the Ape-X DQN and PS-based learning with several improvements, including the use of replay memory. We show through numerical experiments that the proposed method improves scores in Atari 2600 video games in a shorter time than the Ape-X DQN.
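The abstract's key point is adapting PS-style credit assignment, which reinforces a whole successful episode at once, to flow through a replay buffer so that Ape-X's asynchronous actors and learner can share it. Below is a minimal sketch of that general idea, not the paper's actual method: the buffer class, the decay constant `PS_DECAY`, and the function names are illustrative assumptions.

```python
import random
from collections import deque

# Hypothetical decay rate for the profit-sharing credit function;
# the paper's actual assignment scheme may differ.
PS_DECAY = 0.5


class ReplayMemory:
    """Minimal FIFO replay buffer shared between actors and the learner."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


def store_episode_with_ps_credit(memory, episode, final_reward):
    """Assign PS-style credit to a finished episode and push it into replay memory.

    `episode` is a list of (state, action) pairs. The reward earned at the
    end of the episode is propagated backward with geometric decay, so
    transitions closer to the success receive stronger reinforcement.
    """
    credit = final_reward
    for state, action in reversed(episode):
        memory.add((state, action, credit))
        credit *= PS_DECAY  # earlier steps receive geometrically smaller credit


# Usage: an actor finishes an episode and deposits credited transitions;
# a learner process would later call memory.sample() asynchronously.
memory = ReplayMemory()
episode = [((0, 0), 1), ((0, 1), 0), ((1, 1), 1)]  # toy (state, action) pairs
store_episode_with_ps_credit(memory, episode, final_reward=1.0)
print(memory.sample(2))
```

Routing the PS credits through the buffer, rather than applying them immediately as classic PS does, is what lets the learner consume them on its own schedule, matching the asynchronous actor/learner split described above.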

Original language: English
Pages (from-to): 1188-1196
Number of pages: 9
Journal: IEEJ Transactions on Electrical and Electronic Engineering
Volume: 15
Issue number: 8
DOI
Publication status: Published - 1 Aug 2020
