Abstract
Fault-tolerant control policies that automatically restart programmable logic controller-based automated production systems during fault recovery can increase system availability. This article provides a proof of concept that such policies can be synthesized with deep reinforcement learning. The authors specifically focus on systems with multiple end-effectors that are actuated in only one or two axes and are commonly used for assembly and logistics tasks. Because multi-end-effector systems have a large number of actuators and offer only limited possibilities to track workpieces in a single coordinate system, learning control policies for them is especially challenging. This article demonstrates that a hierarchical multi-agent deep reinforcement learning approach, combined with a separate coordinate prediction module per agent, can overcome these challenges. An evaluation of the proposed approach on a simulation of a small laboratory demonstrator shows that it is capable of restarting the system and completing open tasks as part of fault recovery.