Abstract

Models based on the transformer architecture have been widely applied in fields such as natural language processing (NLP), computer vision, and robotics. Large language models (LLMs) like ChatGPT have revolutionized machine understanding of human language and demonstrate impressive memory capacity and reproduction capabilities. Traditional machine learning algorithms, however, suffer from catastrophic forgetting, which undermines the diverse and generalized abilities required for robotic deployment. This article investigates the receptance weighted key value (RWKV) framework, known for efficient and effective sequence modeling, and its integration with the decision transformer (DT) and experience replay architectures. The focus is on potential performance gains in sequential decision-making and lifelong robotic learning tasks. We introduce the decision-RWKV (DRWKV) model and conduct extensive experiments on the D4RL benchmark datasets within the OpenAI Gym environment and on the D’Claw platform to assess its performance in single-task tests and lifelong learning scenarios; the results show its ability to handle multiple subtasks efficiently. The code for all algorithms, training, and image rendering in this study is open source and available online.
