
Xianchao Zhu, Ruiyuan Zhang, Tianyi Huang, Xiaoting Wang, Visual transfer for reinforcement learning via gradient penalty based Wasserstein domain confusion

DOI: 10.23952/jnva.6.2022.3.05

Volume 6, Issue 3, 1 June 2022, Pages 227-238

 

Abstract. Transferring learned policies across visually different environments is challenging. The recently proposed Wasserstein Adversarial Proximal Policy Optimization (WAPPO) addresses this difficulty by explicitly learning a representation that is sufficient for both the source and the target domains. Specifically, WAPPO uses the Wasserstein Confusion objective to force reinforcement learning (RL) agents to map visually distinct environments to domain-independent representations, thereby achieving better domain adaptation performance in RL. However, WAPPO enforces the Lipschitz continuity required by the Wasserstein Confusion objective through weight clipping, which leads to poor performance. In this paper, we present Gradient Penalty based Wasserstein Adversarial Proximal Policy Optimization (GPWAPPO), a new approach to visual transfer in RL that learns to match the distributions of extracted features between a source domain and a target domain. Specifically, we propose a new objective, Gradient Penalty based Wasserstein Confusion (GPWC), which replaces weight clipping with a penalty on the norm of the gradient of the objective with respect to its input. GPWAPPO outperforms previous methods in visual transfer and successfully transfers policies across Visual Cartpole and 16 OpenAI Procgen domains.
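To make the gradient-penalty idea concrete, the following is a minimal PyTorch sketch of a WGAN-GP-style penalty applied to features from the two domains. The names critic, source_feats, target_feats, and the coefficient lambda_gp are illustrative assumptions, not the paper's implementation.

    import torch

    def gradient_penalty(critic, source_feats, target_feats, lambda_gp=10.0):
        # Interpolate between source-domain and target-domain features
        # (critic, source_feats, target_feats are assumed names).
        alpha = torch.rand(source_feats.size(0), 1, device=source_feats.device)
        interp = (alpha * source_feats + (1 - alpha) * target_feats).requires_grad_(True)
        scores = critic(interp)
        # Gradient of the critic's output with respect to its input
        grads = torch.autograd.grad(
            outputs=scores,
            inputs=interp,
            grad_outputs=torch.ones_like(scores),
            create_graph=True,
            retain_graph=True,
        )[0]
        # Penalize deviation of the gradient norm from 1,
        # a soft version of the 1-Lipschitz constraint.
        return lambda_gp * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

In training, such a penalty would be added to the critic's Wasserstein loss in place of weight clipping, softly enforcing the Lipschitz constraint rather than hard-limiting the weights.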

 

How to Cite this Article:
X. Zhu, R. Zhang, T. Huang, X. Wang, Visual transfer for reinforcement learning via gradient penalty based Wasserstein domain confusion, J. Nonlinear Var. Anal. 6 (2022), 227-238.