Abstract

Fault-tolerant control policies that automatically restart programmable logic controller (PLC)-based automated production systems during fault recovery can increase system availability. This article provides a proof of concept that such policies can be synthesized with deep reinforcement learning. The authors focus on systems with multiple end-effectors that are actuated in only one or two axes, as commonly used for assembly and logistics tasks. Because of the large number of actuators in multi-end-effector systems and the limited possibilities to track workpieces in a single coordinate system, these systems are especially challenging to learn. This article demonstrates that a hierarchical multi-agent deep reinforcement learning approach, combined with a separate coordinate-prediction module per agent, can overcome these challenges. An evaluation of the approach in a simulation of a small laboratory demonstrator shows that it can restart the system and complete open tasks as part of fault recovery.
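To make the described decomposition concrete, the following is a minimal sketch of how a per-agent coordinate-prediction module could feed a restart policy in a multi-agent setup. All class and function names here are hypothetical illustrations under simplifying assumptions (one agent per end-effector, a single actuated axis, a linear sensor-to-coordinate map, and a placeholder greedy rule standing in for a trained deep reinforcement learning policy); this is not the authors' implementation.

```python
class CoordinatePredictor:
    """Hypothetical per-agent module: maps an agent's local sensor reading
    to an estimated workpiece coordinate in that agent's own frame.
    A learned regressor would replace this fixed linear map."""

    def __init__(self, scale: float, offset: float):
        self.scale = scale
        self.offset = offset

    def predict(self, sensor_value: float) -> float:
        return self.scale * sensor_value + self.offset


class RestartAgent:
    """Hypothetical agent controlling one end-effector along one axis.
    In the article's approach, a trained deep RL policy would replace
    the placeholder greedy rule in `act`."""

    def __init__(self, predictor: CoordinatePredictor):
        self.predictor = predictor

    def act(self, sensor_value: float, target: float) -> str:
        # First estimate the workpiece position in the local frame,
        # then pick the discrete restart action that reduces the gap.
        position = self.predictor.predict(sensor_value)
        return "move_right" if position < target else "move_left"


# Multi-agent decomposition: one agent per end-effector, each with its
# own coordinate predictor (no shared global coordinate system).
agents = [
    RestartAgent(CoordinatePredictor(scale=1.0, offset=0.0)),
    RestartAgent(CoordinatePredictor(scale=0.5, offset=2.0)),
]

# Each agent acts independently on its local sensor reading.
actions = [agents[0].act(sensor_value=1.0, target=5.0),
           agents[1].act(sensor_value=10.0, target=5.0)]
```

In this sketch the hierarchical element would sit one level above: a coordinator deciding which agent may move next, while each agent only reasons in its own coordinate frame.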
