In recent years, deep learning has achieved great success in various fields. In particular, since image data contains a large amount of information, deep learning is expected to be applied to nuclear security technology for detecting malicious actions. However, most deep-learning-based human action recognition systems simply assign a predetermined OK/NG label to each combination of object-identification results, so they are difficult to apply to complicated nuclear security situations. On the other hand, at most nuclear facilities, the nuclear security rules are specified as detailed text, so it is desirable to detect malicious actions in captured images according to this rule text. If it were possible to flexibly judge whether a captured scene violates or complies with the nuclear security rules, the practicality of such systems would improve greatly. However, this requires an interface between two different kinds of deep learning: image identification and natural language processing.
Most deep learning research currently being conducted is closed within its own field, with no mutual access between fields, that is, no interface. The human brain, in contrast, routinely links and processes all kinds of cognitive information, including language, sight, speech, and touch. Like the brain, deep learning should develop new abilities when different types of deep learning are interconnected. One reason this is difficult is that there is no common data format between different types of deep learning.
In this study, we determined that the logical expression is the most suitable common data format. A logical expression is a way of representing the relation between an object and a person. In this research, a deep learning interface method between image recognition and natural language processing was developed via logical expressions. This allows flexible judgment of malicious actions according to the rule text.
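The role of the logical expression as an interface can be illustrated with a minimal sketch. Here it is assumed, purely for illustration, that the image-recognition side emits (action, subject, object) triples for a scene and that the natural-language side has already converted the rule text into a set of forbidden triples; the predicate names (`carry`, `enter`, etc.) are hypothetical and not the actual vocabulary used in this work.

```python
# Illustrative sketch only: predicate names and rules are assumptions,
# not the actual vocabulary or rule set of this study.

# A logical expression encodes a relation between a person and an object
# as an (action, subject, object) triple.

# Forbidden triples, assumed to have been extracted from the security
# rule text by the natural language processing side:
FORBIDDEN = {
    ("carry", "person", "camera"),
    ("enter", "person", "restricted_area"),
}

def judge(scene_expressions):
    """Return the expressions in a scene that violate a rule."""
    return [expr for expr in scene_expressions if expr in FORBIDDEN]

# Logical expressions produced by the image-recognition side
# for one captured scene:
scene = [("carry", "person", "camera"), ("hold", "person", "badge")]
print(judge(scene))  # [('carry', 'person', 'camera')]
```

Because both sides speak the same triple format, either side can be retrained or replaced independently: the image model only needs to output triples, and the rule parser only needs to produce them, which is the interface property the study aims for.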