Abstract

As artificial intelligence and industrial automation advance, human–robot collaboration (HRC) with advanced interaction capabilities has become an increasingly significant area of research. In this paper, we design and develop a real-time, multi-modal HRC system using speech and gestures. A set of 16 dynamic gestures is designed for communication from a human worker to an industrial robot, and a dataset of these dynamic gestures is constructed and will be shared with the community. A convolutional neural network is developed to recognize the dynamic gestures in real time using motion history images and deep learning methods. An improved open-source speech recognizer is used for real-time recognition of the human worker's speech. An integration strategy is proposed to fuse the gesture and speech recognition results, and a software interface is designed for system visualization. A multi-threading architecture is constructed to run multiple tasks simultaneously, including gesture and speech data collection and recognition, data integration, robot control, and software interface operation. These methods and algorithms are integrated into the HRC system, and a physical platform is constructed to demonstrate its performance. The experimental results validate the feasibility and effectiveness of the proposed algorithms and the HRC system.
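To illustrate the motion history image (MHI) representation that the gesture-recognition CNN consumes, the sketch below computes an MHI by frame differencing, following the classical formulation of Bobick and Davis. It is a minimal illustration rather than the paper's implementation: the threshold, decay step, camera source, and the helper `update_mhi` are all assumed for the example.

```python
import cv2
import numpy as np

# Illustrative parameters (assumed, not taken from the paper):
TAU = 1.0          # value assigned to pixels with current motion
DELTA = 1.0 / 30   # per-frame decay applied to older motion
DIFF_THRESH = 32   # frame-difference threshold for detecting motion

def update_mhi(mhi, prev_gray, curr_gray):
    """Update a motion history image with one new frame.

    Pixels that moved in the current frame are set to TAU; all other
    pixels decay toward zero, so brighter regions indicate more recent
    motion. The resulting single image summarizes a dynamic gesture.
    """
    diff = cv2.absdiff(curr_gray, prev_gray)
    motion = diff > DIFF_THRESH
    return np.where(motion, TAU, np.maximum(mhi - DELTA, 0.0)).astype(np.float32)

cap = cv2.VideoCapture(0)              # assumed camera index
ok, frame = cap.read()
prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
mhi = np.zeros(prev.shape, dtype=np.float32)
while ok:
    ok, frame = cap.read()
    if not ok:
        break
    curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mhi = update_mhi(mhi, prev, curr)
    prev = curr
    # The MHI (scaled to 0-255) would be fed to the gesture CNN here.
```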

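The multi-threading architecture can be pictured as producer threads (gesture and speech recognition) feeding an integration thread through queues, with the fused commands consumed by a robot-control loop. The following is a minimal sketch under that assumption: the recognizers are replaced by timed stubs, and the fusion policy shown (speech preferred over the latest gesture within a short window) is one simple possibility, not necessarily the paper's integration strategy.

```python
import queue
import threading
import time

# Queue-based hand-off between threads; names are illustrative.
gesture_q = queue.Queue()
speech_q = queue.Queue()
command_q = queue.Queue()

def recognize_gesture():
    time.sleep(0.5)                # stand-in for CNN inference on an MHI
    return "gesture:STOP"

def recognize_speech():
    time.sleep(0.7)                # stand-in for the speech recognizer
    return "speech:RESUME"

def producer(recognize, out_q):
    # Continuously push recognition results into the shared queue.
    while True:
        out_q.put(recognize())

def integration_worker():
    # One simple fusion policy: take a speech command if one arrived
    # recently, otherwise fall back to the most recent gesture.
    while True:
        try:
            command_q.put(speech_q.get(timeout=0.1))
        except queue.Empty:
            try:
                command_q.put(gesture_q.get_nowait())
            except queue.Empty:
                pass

threads = [
    threading.Thread(target=producer, args=(recognize_gesture, gesture_q), daemon=True),
    threading.Thread(target=producer, args=(recognize_speech, speech_q), daemon=True),
    threading.Thread(target=integration_worker, daemon=True),
]
for t in threads:
    t.start()

for _ in range(5):                 # robot-control loop: consume fused commands
    print("robot <-", command_q.get())
```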