Abstract

The rapid advance in sensing technology has expedited data-driven innovation in manufacturing by enabling the collection of large amounts of data from factories. Big data provides an unprecedented opportunity for smart decision-making in the manufacturing process. However, big data also attracts cyberattacks and makes manufacturing systems vulnerable due to the inherent value of sensitive information. The increasing integration of artificial intelligence (AI) within smart factories also makes manufacturing equipment susceptible to cyber threats, posing a critical risk to the integrity of smart manufacturing systems. Cyberattacks targeting manufacturing data can result in considerable financial losses and severe business disruption. Therefore, there is an urgent need to develop AI models that incorporate privacy-preserving methods to protect the sensitive information implicit in the models against model inversion attacks. Hence, this paper presents the development of a new approach called mosaic neuron perturbation (MNP) to preserve latent information in the framework of the AI model, ensuring differential privacy requirements while mitigating the risk of model inversion attacks. MNP is flexible to integrate into AI models, balancing the trade-off between model performance and robustness against cyberattacks while remaining highly scalable for large-scale computing. Experimental results, based on real-world manufacturing data collected from the computer numerical control (CNC) turning process, demonstrate that the proposed method significantly improves the ability to prevent inversion attacks while maintaining high prediction performance. The MNP method shows strong potential for making manufacturing systems both smart and secure by addressing the risk of data breaches while preserving the quality of AI models.

1 Introduction

Rapid advances in sensing technology have enabled the collection of vast amounts of data from manufacturing operations, which has expedited big-data-driven innovations in manufacturing. By analyzing big data, valuable insights can be generated, leading to the development of manufacturing artificial intelligence (AI) systems [1]. These AI systems can remarkably enhance various decision-making processes in factories, ultimately raising the smartness level of manufacturing systems. To enhance smart manufacturing systems, cloud computing environments connect these AI systems and enable efficient training in a distributed way. However, due to the value of the information involved, the use of big data and the interconnectivity of AI models expose manufacturers to significant risk of cyberattacks, potentially leading to breaches of sensitive information, including parameters related to manufacturing processes or products.

Despite the enhanced decision-making capabilities offered by AI-related models, they are vulnerable to exploitation in the absence of adequate privacy-preserving methods. Adversaries can manipulate model responses to reach latent information that data owners would not wish to disclose voluntarily. Cyberattacks pose a greater threat to the manufacturing industry than to any other industry. According to a recent report, manufacturing accounted for 23.2% of cyberattacks in 2022, overtaking the finance and insurance fields [2]. Among cyberattacks against manufacturing industries, ransomware ranked first, and data theft was the third most common. Failure to protect against these attacks can result in unprecedented disruptions, leading to significant business losses. The average total cost of a data breach across all industries is more than $4.35 million. Small and medium-sized manufacturers (SMMs) are particularly vulnerable to exploitation: in 2017, 61% of small businesses experienced cyberattacks, and the median cost of a data breach was over $60,000 [3].

Cyberthreats to manufacturing analytics can be categorized into four types: membership inference, property inference, model extraction, and reconstruction [4]. Membership inference attacks determine whether an input is used as part of the training set. Property inference attacks aim to extract latent properties that are not explicitly expressed in the dataset. Model extraction attacks are cyberattacks in which an adversary attempts to reconstruct an alternative model with limited information about the target model. Reconstruction attacks are known as attribute inference attacks or model inversion attacks. An adversary infers sensitive features or datasets based on their knowledge of some features and the target model.

To address this issue, various privacy-preserving techniques, including cryptography, anonymization, and differential privacy, have been studied. While cryptographic countermeasures are effective in ensuring data confidentiality and integrity through encryption algorithms, they can also impede data processing due to their high computational requirements [5]. Anonymization, which masks sensitive attributes before leveraging the dataset, is susceptible to re-identification attacks, where adversaries can infer hidden attributes by linking external knowledge to a publicly shared dataset [6]. Differential privacy, on the other hand, is a technique that adds random perturbation to the data, making it challenging to extract individual information from the dataset while still providing useful information for efficient analysis [7]. Differential privacy allows manufacturers to ensure the privacy of their sensitive business-related information while leveraging the benefits of data analytics.

Differential privacy provides a solution to the privacy leakage issue in manufacturing data by guaranteeing that an individual’s participation in a dataset is not disclosed. However, concerns about sensitive information breaches in manufacturing are not limited to the participation of individuals. Information obtained from predictive models can also be exploited through model inversion attacks, where the adversary manipulates released predictive models and background knowledge to infer sensitive information that the data owner does not want to share with others [8]. Unprotected AI models can unintentionally reveal business-related sensitive information, further increasing the risk of exploitation. Therefore, it is imperative to develop privacy-preserving algorithms incorporated into predictive models to resist the risk of model inversion attacks while maintaining the utility of the models.

This paper presents the development of novel privacy-preserving techniques for manufacturing predictive analytics that fully leverage the smartness of AI models while preserving the latent information in the AI models against model inversion attacks to mitigate the risk of privacy breaches. First, we develop a mosaic neuron perturbation (MNP) algorithm capable of preserving sensitive information, perturbing input neurons with designed noise to ensure differential privacy. The MNP algorithm injects more perturbation to neurons learning sensitive attributes than nonsensitive features, ultimately allowing the protection of sensitive information. Second, we extend the MNP technique to a distributed version called a multi-party MNP algorithm to increase robustness and smartness in collaborative learning. Finally, we conduct an experimental study based on real-world manufacturing data collected from the computer numerical control (CNC) turning process to evaluate and compare the performance of the proposed algorithm in terms of AI model effectiveness, robustness to model inversion, and computational efficiency. AI models integrated with privacy-preserving techniques show strong promise in allowing manufacturers to leverage the smartness of AI systems to make informed decisions with confidence while mitigating the risk of sensitive data breaches.

The remainder of this paper is organized as follows: Sec. 2 introduces the research background of privacy-preserving and differential privacy for predictive analytics; Sec. 3 provides further details on differential privacy and model inversion attacks to build the conceptual foundation; Sec. 4 details the proposed methodology of MNP algorithms; Sec. 5 provides a design of experiments based on real-world CNC turning data; Sec. 6 evaluates and analyzes experimental results to demonstrate the effectiveness of the MNP algorithm; and Sec. 7 concludes this study.

2 Research Background

2.1 Distributed Artificial Intelligence in Manufacturing Systems.

The rapid development of Internet of Things technologies has led to an exponential increase in the amount of data collected from manufacturing operations. While this brings an unprecedented opportunity to generate rich information about manufacturing systems, the value of these data exposes manufacturers to greater risk of cyberattacks on manufacturing data [1]. SMMs can leverage the benefits of collaboratively networked AI models through cloud computing in a service-oriented approach by sharing service data, operational decisions, and other information related to their manufacturing systems.

Distributed predictive models provide both effectiveness and efficiency in learning performance. The tiers of the distributed AI paradigm [9] are delineated as follows:

  • Level 0: sharing data. After locally collecting and pre-processing data, each user uploads their data to a cloud. The global AI model is subsequently constructed based on the aggregated data within the cloud.

  • Level 1: sharing model. Individual users train local AI models using their own data and share these trained AI models with the cloud. The global model is constructed through an aggregation of these local models within the cloud. The resultant global model is then distributed back to each local user.

  • Level 2: sharing results. Each user undertakes the entire process of locally training AI models and subsequently shares the obtained results or outputs with the cloud.

In this study, we develop a privacy-preserving method and multi-party distributed learning for level 1. The framework of a distributed predictive model is illustrated in Fig. 1. Within this architecture, each local data owner (i.e., manufacturer) configures an independent local predictive model based on their own dataset. These local models maintain identical structures. Subsequently, the local data owner uploads the parameters of their local model to the cloud. The cloud consolidates these parameters to build a global model mirroring the structure of the local models. Once the global model construction is complete, the cloud distributes this global model to local data owners. By operating within this framework, manufacturers can leverage highly accurate predictive models without sharing their own data.
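To make the level-1 workflow concrete, the sketch below shows one possible round of local training followed by cloud-side averaging. It is a minimal illustration, not the paper's implementation: the linear local model, the NumPy usage, and names such as `local_update` and `aggregate` are assumptions introduced here.

```python
import numpy as np

def local_update(global_weights, X, y, lr=0.01, epochs=5):
    """One owner's local training pass (a plain linear model, for illustration only)."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
        w -= lr * grad
    return w

def aggregate(local_weight_list):
    """Cloud step: average the uploaded local parameters into the global model."""
    return np.mean(local_weight_list, axis=0)

# Three data owners, each holding a private dataset with the same feature dimension
rng = np.random.default_rng(0)
datasets = [(rng.normal(size=(100, 4)), rng.normal(size=100)) for _ in range(3)]
global_w = np.zeros(4)

for _ in range(10):                                       # communication rounds
    uploads = [local_update(global_w, X, y) for X, y in datasets]
    global_w = aggregate(uploads)                         # redistributed to every owner
```

In each round, only model parameters leave the owners' sites; the raw datasets stay local, which is the property the privacy-preserving methods in this paper build on.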

Fig. 1. Collaborative learning through local model aggregation and global model distribution

However, cloud environments can be compromised, potentially exposing information to unauthorized parties [10]. While the number of reported cyberattacks on manufacturers remains relatively small, this is because many are not aware that they are being attacked or fail to associate system failures with the possible cyberattacks [5]. In smart manufacturing, data often include sensitive information about operations and customers, so privacy-preserving methods for manufacturing analytics are essential. Predictive models that do not prioritize privacy protection are vulnerable to exploitation, as adversaries can manipulate model responses to access latent information that data owners would not wish to voluntarily disclose. Therefore, there is an urgent need to develop privacy-preserving methods for manufacturing analytics.

2.2 Privacy-Preserving Techniques.

To address privacy issues, privacy-preserving methods, such as cryptography, anonymization, and differential privacy, have been developed. Cryptographic approaches are implemented to secure identity and access to the system, effectively ensuring data confidentiality and integrity [5,11]. However, these methods require high computation power, which can hinder data processing. Data anonymization techniques that remove sensitive information before query evaluation have also been proposed [12]. While anonymization techniques can handle high-dimensional data and limit disclosure risks, they have some disadvantages. For example, the original data are lost when sensitive information is removed [13]. Additionally, anonymization techniques may not ensure complete privacy protection, as an increase in dataset attributes can lead to re-identification. For example, the anonymized data released by Netflix for a technology challenge were processed to identify users by matching their Netflix reviews with data from other sites such as IMDb, disclosing individuals' viewing histories [6]. Similarly, combining patient-level health data from Washington state with information from state news articles revealed the identities of individual patients, even though the data contained only zip codes and no patient names or addresses [14]. To address these issues, Dwork introduced differential privacy, which protects data by adding noise to algorithms [7,15]. Under differentially private models, neighboring databases differing by one record cannot be distinguished by the same algorithm.

However, individual participation in the dataset is not the only concern of privacy-preserving. AI models are under threat of being exploited for a target individual’s sensitive information in the presence of available auxiliary information. These types of threats are called model inversion attacks. Predictive models developed from big data can be exploited to infer sensitive features that data owners do not want to expose [8]. Predictive models contain information about the correlated attributes of the training data, which can be exposed by model inversion attacks [16]. While conventional research in manufacturing informatics tends to focus on improving effectiveness, the primary goal of learning AI models should be to achieve the best prediction or accuracy while ensuring the security and privacy of sensitive data. Therefore, it is necessary to develop privacy-preserving algorithms integrated into AI models that prevent model inversion attacks while maintaining prediction performance.

Differential privacy was developed to mitigate the risk of models being exploited, and several studies have investigated the application of differential privacy to protect sensitive data while maintaining the accuracy of predictive models. For example, Chaudhuri et al. [17] developed a logistic regression model incorporating differential privacy with two types of perturbation applied to a parametric learning process: output perturbation added noise to the model's output regression coefficients, while the other approach perturbed the model's objective function used to train the coefficients. Zhang et al. [18] proposed a functional mechanism for linear regression and logistic regression analysis under differential privacy, perturbing the objective functions used for coefficient training. Prediction models are commonly trained with back-propagation, which updates parameters by gradient descent. Song et al. [19] first introduced a gradient perturbation technique that adds noise to the gradient when algorithms are trained by stochastic gradient descent, which facilitated the development of differentially private machine learning models. In the healthcare industry, a linear regression model with differential privacy was developed to prevent model inversion attacks while maintaining model accuracy by adding noise to the coefficients [20]. Krall et al. [21,22] developed differential privacy algorithms for logistic regression by applying different levels of perturbation to the gradient based on the sensitivity of features; their algorithms effectively prevented model inversion attacks on sensitive features. Hu et al. [23] developed a regression model with differential privacy and evaluated the model with real manufacturing data by optimizing different perturbation mechanisms. These studies concentrated on linear or logistic regression models, but big data requires more complex machine learning models such as neural networks.

Early studies of differential privacy in neural networks have been widely conducted in model training with image datasets, but recent research has explored various perturbation techniques in both the input and weight layers. Abadi et al. [24] improved the computational efficiency of differential privacy training models with non-convex objectives through the privacy accounting method. Arachchige et al. [25] redesigned the learning process with a local differentially private algorithm. Their study suggested adding a randomization layer between convolution and fully connected layers to perturb weights. Another perturbation technique has also been proposed that adds noise to the input. With the US census dataset, Wang et al. [26] presented a neural network model that estimated the importance of features, adaptively adding noise into the input data based on the importance. Kang et al. [27] explored input perturbation for empirical risk minimization. This study presented the experimental results of linear regression and multi-layer perceptron models with the KDD archive dataset. Nori et al. [28] developed a differential privacy mechanism added to explainable boosting machines. Their purpose was to achieve both high accuracy and interpretability while securing differential privacy. This method injected Gaussian noise in the residual summation step. Meanwhile, Li et al. [29] proposed personalized local differential privacy by injecting multiple Gaussian variables into the covariance matrix. Their method avoids the risk of model extraction attacks. Differential privacy can provide a robust solution to membership inference attacks. Jarin and Eshete [30] have provided a comprehensive study of differential privacy in all perturbation cases including input, objective, gradient, output, and prediction. Also, they established a framework to conduct a comprehensive privacy-utility trade-off analysis.

However, although these studies on differential privacy in neural networks have shown promise in enhancing prediction accuracy, very little has been done to assess the robustness of these methods against model inversion attacks. The fundamental goal of differential privacy is to strike a balance between preserving privacy and prediction power. Therefore, it becomes crucial to evaluate the robustness of these approaches against white-box inversion attacks. In this study, we conduct a comprehensive evaluation of the proposed MNP algorithms, considering both prediction accuracy and robustness against white-box inversion attacks proposed in Ref. [31].

3 Differential Privacy

In predictive modeling, a dataset D comprises n tuples, each with a feature vector xi and a response variable yi. Each tuple xi has p features, denoted as xi = (xi1, …, xip). The goal of predictive modeling is to approximate the predictive function f: X → y, where X is the domain of input feature vectors, assumed to satisfy the condition ‖xi‖2 ≤ 1. Predictive models can be categorized into regression and classification based on the type of the response variable y. For example, classification models are suitable for discrete response variables such as y = 0 or 1, while regression models are more appropriate for continuous response variables, where y ∈ ℝ. The choice of predictive model depends on the nature of the problem and the features of the dataset. In this study, we aim to develop differentially private algorithms for neural network models to solve regression problems.

3.1 Conceptual Foundation.

Under differential privacy, the inclusion of individual inputs in a dataset does not lead to statistical differences in the algorithm’s output. Therefore, differentially private algorithms ensure that a neighboring dataset is indistinguishable from the original dataset [7].

(ε,δ)-differential privacy

Definition 1
A randomized algorithm A is (ε,δ)-differentially private if for all sets S ⊆ Range(A) and for all datasets D and D′ differing by at most one row:
$$\Pr\{\mathcal{A}(D) \in S\} \le e^{\varepsilon}\,\Pr\{\mathcal{A}(D') \in S\} + \delta$$

ε and δ are the privacy budget and the privacy loss threshold, respectively. ε controls the level of privacy, and δ provides an upper bound on the probability that the privacy guarantee fails. The premise of (ε,δ)-differential privacy is illustrated in Fig. 2.

Fig. 2. The output of a (ε,δ)-differential private algorithm when two databases differ by at most one row

The Gaussian mechanism achieves (ε,δ)-differential privacy by adding independently and identically distributed (i.i.d.) Gaussian noise to the output of a function g(D) that maps a database D to a p-dimensional vector. This Gaussian noise is zero-mean and has a variance that depends on the desired privacy level and the sensitivity of g(D).

The Gaussian mechanism

Definition 2
Given any function g: D → ℝp, the Gaussian mechanism is denoted by the algorithm A, defined as
$$\mathcal{A}(D) = g(D) + (Z_1, \dots, Z_p) \tag{1}$$
where Z1, …, Zp are i.i.d. random variables drawn from a Gaussian distribution with mean zero and variance σ2. The algorithm preserves (ε,δ)-differential privacy.

ℓ2 sensitivity

Definition 3
The ℓ2 sensitivity of a function g: D → ℝp is denoted as
$$\Delta_2(g) = \max_{D,\,D' \,\text{s.t.}\, D' \in \Gamma(D)} \big\|g(D) - g(D')\big\|_2$$
where Γ(D) is a set of all neighboring datasets differing by one row.
Theorem 1
Under the assumption that ε ∈ (0, 1) is arbitrary, the Gaussian mechanism preserves (ε,δ)-differential privacy, where
$$\sigma > \frac{\sqrt{2\ln(1.25/\delta)}\,\Delta_2}{\varepsilon}$$
Algorithm 1

Input: Predictive model: f̂Pred(XS; X′, W),

    nonsensitive inputs: X′, queried response: ŷ,

    number of epochs: T, learning rate: η

Output: Estimated sensitive input values: XS*

   1: Let τ = 0

   2: Define JATK(XS) = (1/n) Σi=1,…,n ℓATK(XS)

   3: Initialize XS(0) ← random values

   4: while τ < T

   5:   XS(τ+1) = XS(τ) − η∇JATK(XS(τ))

   6:   Set τ = τ + 1

   7: end while

   8: Let XS* = XS(τ)

The proof of Theorem 1 is provided in Ref. [7].
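As a concrete illustration of Definition 2 and Theorem 1, the sketch below adds Gaussian noise with scale σ = √(2 ln(1.25/δ)) Δ2 / ε to the output of a query. The query (a per-feature mean over rows bounded in [0, 1]) and its sensitivity bound are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def gaussian_mechanism(g_value, l2_sensitivity, epsilon, delta, rng=None):
    """Release g(D) with i.i.d. zero-mean Gaussian noise calibrated per Theorem 1."""
    rng = rng if rng is not None else np.random.default_rng()
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * l2_sensitivity / epsilon
    return g_value + rng.normal(0.0, sigma, size=np.shape(g_value))

# Example query: per-feature mean of a dataset whose rows lie in [0, 1]^3.
D = np.random.default_rng(1).uniform(0.0, 1.0, size=(1000, 3))
g = D.mean(axis=0)
# Changing one row moves each coordinate of the mean by at most 1/n,
# so the l2 sensitivity of this query is at most sqrt(3)/n.
private_mean = gaussian_mechanism(g, l2_sensitivity=np.sqrt(3) / len(D),
                                  epsilon=0.5, delta=1e-5)
```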

3.2 White-Box Inversion Attack.

Differential privacy is a powerful tool for protecting sensitive information in statistical databases. However, it is not the only concern for privacy, as predictive models can also be exploited through model inversion attacks [8]. As illustrated in Fig. 3, an adversary manipulates their knowledge of the model’s structure and other auxiliary information to reconstruct one or more training samples, including sensitive features.

Fig. 3. The process of model inversion attack

Despite the complexity inherent in neural network models, their susceptibility to model inversion attacks persists when adversaries gain access to information concerning the model’s structure, including weight parameters, hyper-parameters, and nonsensitive variables. In this study, we assume that an adversary participates as a local data owner contributing to the distributed learning process. Armed with the knowledge of the global model, the adversary can construct an inversion attack model. The white-box model inversion attack is designed to reconstruct sensitive input values based on published information. The outlined process for this approach is presented in Algorithm 1. The effectiveness of inversion attacks is typically evaluated through the loss function of the target predictive model.

The inputs of Algorithm 1 include the predictive model f̂Pred, nonsensitive inputs X′, the queried response ŷ, the number of epochs T, and a learning rate η. The predictive model is a converted form of the trained target neural network model ŷ = f̂(X), denoted as ŷ = f̂Pred(XS; X′, W) + ε, where the input is the sensitive variable XS and ε is random noise. The algorithm begins by setting the iteration counter τ to 0 and defining the cost function for the model inversion attack, JATK, where ℓATK(XS) is a loss function. The initial sensitive input values XS(0) are randomly generated. In each epoch τ, the sensitive input values are updated by the gradient ∇JATK. Once XS(τ) is updated, τ is incremented by one. This process continues until τ reaches T.
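The sketch below mirrors the loop of Algorithm 1 for a toy released model. The quadratic attack loss, the finite-difference gradient, and the linear surrogate standing in for f̂Pred are assumptions made for illustration; they are not the attack model used in the experiments.

```python
import numpy as np

def inversion_attack(predict, x_nonsensitive, y_hat, n_sensitive,
                     lr=0.01, epochs=5000, rng=None):
    """Estimate sensitive inputs X_S by gradient descent on an attack loss
    J_ATK(X_S) = (f_Pred(X_S; X', W) - y_hat)^2, using a numerical gradient."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x_s = rng.normal(size=n_sensitive)                 # step 3: random initialization
    eps = 1e-5
    for _ in range(epochs):                            # steps 4-7
        grad = np.zeros_like(x_s)
        for j in range(n_sensitive):                   # finite-difference gradient of J_ATK
            d = np.zeros_like(x_s)
            d[j] = eps
            loss_plus = (predict(x_s + d, x_nonsensitive) - y_hat) ** 2
            loss_minus = (predict(x_s - d, x_nonsensitive) - y_hat) ** 2
            grad[j] = (loss_plus - loss_minus) / (2 * eps)
        x_s -= lr * grad                               # step 5: gradient update
    return x_s                                         # step 8: estimated sensitive values

# Toy released model: the adversary knows the weights w of a linear predictor.
w = np.array([0.7, -0.2, 0.4])                         # [sensitive, nonsensitive, nonsensitive]
predict = lambda xs, xn: w[0] * xs[0] + w[1:] @ xn
x_prime = np.array([1.0, 2.0])                         # known nonsensitive inputs
y_obs = predict(np.array([0.9]), x_prime)              # queried response of the true record
x_s_est = inversion_attack(predict, x_prime, y_obs, n_sensitive=1)
```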

4 Privacy-Preserving Analytics

To ensure the preservation of sensitive features from model inversion attacks in manufacturing analytics, we develop privacy-preserving methods that mitigate risks from model inversion attacks. As illustrated in Fig. 4, the proposed method effectively protects sensitive features from white-box model inversion attacks. The adversaries in this type of attack have access to nonsensitive feature information (X′) as well as the structural information of the targeted predictive model, including weights (W), activation functions (H), and loss function (J(W)). In this study, the proposed algorithm is integrated into neural network model learning.

Fig. 4. Overview of privacy-preserving data analytics against model inversion attacks
The neural network model with two hidden layers is denoted as
$$\hat{f}(x) = W_2\, h_2\!\big(W_1\, h_1(W_0\, x)\big)$$
where x is an input vector, W0 is a weight matrix that corresponds to the input layer, and Wl for l = 1, 2 represents the weights for the lth hidden layer. Furthermore, hidden activation functions hl for each hidden layer l are employed to compute the output of the corresponding hidden layer.
Training a neural network model involves searching the set of coefficient parameters W that minimize prediction errors on the given training data. This is achieved by minimizing the cost function denoted as
$$J(W; X, y) = \frac{1}{n}\sum_{i=1}^{n} \ell(W; x_i, y_i)$$
where ℓ(·) is a loss function that measures the discrepancy between the predicted output and the actual output for a given input sample (xi, yi).
The optimal set of coefficients W* is obtained by a gradient descent method that iteratively updates the weights. Specifically, at iteration τ, the weights are updated as
$$W^{(\tau+1)} \leftarrow W^{(\tau)} - \eta\,\nabla_W\!\left(\frac{1}{n}\sum_{i}\ell(W^{(\tau)}; x_i, y_i)\right)$$
where W(τ) is the set of weights at iteration τ, η is the learning rate that controls the step size of the weight update, and ∇W denotes the gradient of the cost function with respect to the weights.

Differential privacy in predictive models is ensured by adding noise during model training. Specifically, this perturbation can be injected into the model’s optimal coefficients W*, the objective function J, or the gradient of the objective function ∇J. The proposed neuron perturbation algorithm perturbs the objective’s gradient by multiplying the input neurons by Bernoulli noise within the gradient descent updates.

4.1 Neuron Perturbation.

Differential privacy is a technique that preserves sensitive data in predictive models by adding controlled amounts of noise during the training process. This approach aims to balance privacy preservation and prediction power. To achieve both, we propose a perturbation technique inspired by dropout regularization, a way to prevent over-fitting in machine learning models [32]. The proposed method, called neuron perturbation, perturbs the weight coefficients of the input neurons (W0) by multiplying them with Bernoulli random variables (ξ) during each training iteration. This mechanism effectively adds noise to the model, making it more resistant to model inversion attacks.

Predictive modeling with neuron perturbation

Algorithm 2

Input: Dataset: D with features X and response y, learning rate: η,

    perturbation probability: pperturb, number of epochs: K, batch size: nb

Output: Approximate set of weights W*

   1: Initialize W(0) ← random values, τ = 0, κ = 1

   2: Split D into a set of batches B, each of size nb

   3: while κ < K

   4:   Generate a random vector ξ(κ) whose elements are
        ξjk(κ) ∼ Bernoulli(1 − pperturb) for j = 1, …, p and k = 1, …, |h1|

   5:   for each b = 1, …, |B| do

   6:     Update weights
          W(τ+1) = W(τ) − η∇W J(W(τ); ξ(κ) ⊙ X, y)

   7:     Set τ = τ + 1

   8:   end for

   9:   Set κ = κ + 1

  10: end while

  11: Let W* = W(τ)

Algorithm 2 outlines steps of the neuron perturbation approach. It takes input parameters including learning rate η, perturbation probability pperturb, number of epochs K, and batch size nb. To begin, the algorithm randomly generates initial weight values of W(0), sets the iteration counter τ to 0, and initializes the epoch counter κ to 1. The dataset D is split into batches of size nb, and at each epoch κ, the algorithm iterates over all batches. During each epoch, the algorithm generates a Bernoulli random vector ξ(κ), where ξjk(κ)Bernoulli(1pperturb) for j = 1, …, p and k = 1, …, |h1|. The algorithm multiplies the input variable by this vector and updates weights based on the gradient WJ computed with the perturbed variable. After each weight update, the value of τ is incremented by one. After the algorithm updates the weights according to every mini-batch, the κ value is incremented by one. This process continues iteratively until the number of epochs κ reaches K.
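A compact sketch of this procedure is shown below. To keep it short it uses a single hidden layer and hand-coded backpropagation with a mean squared error loss, which departs from the two-hidden-layer architecture and sum-of-squared-errors loss used later in the experiments; the function and variable names are assumptions introduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_neuron_perturbation(X, y, n_hidden=4, p_perturb=0.05,
                              lr=0.005, epochs=200, batch_size=50, rng=None):
    """Sketch of Algorithm 2: input-to-hidden weights are masked by
    Bernoulli(1 - p_perturb) noise that is redrawn once per epoch."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n, p = X.shape
    W0 = rng.normal(scale=0.5, size=(n_hidden, p))     # input -> hidden weights
    W1 = rng.normal(scale=0.5, size=n_hidden)          # hidden -> output weights
    for _ in range(epochs):
        # Step 4: one Bernoulli element per input-to-hidden connection, fixed for the epoch
        xi = rng.binomial(1, 1.0 - p_perturb, size=W0.shape)
        for start in range(0, n, batch_size):          # steps 5-8: mini-batch updates
            xb, yb = X[start:start + batch_size], y[start:start + batch_size]
            W0_eff = W0 * xi                            # perturbed input weights
            h = sigmoid(xb @ W0_eff.T)                  # hidden activations
            y_hat = h @ W1
            err = y_hat - yb                            # residuals
            grad_W1 = h.T @ err / len(yb)
            dh = np.outer(err, W1) * h * (1.0 - h)      # backpropagate through the sigmoid
            grad_W0 = (dh.T @ xb) / len(yb) * xi        # gradient seen through the mask
            W0 -= lr * grad_W0
            W1 -= lr * grad_W1
    return W0, W1
```

As in dropout, the connections whose mask entries are zero receive no update during that epoch, which is how the Bernoulli perturbation enters the gradient descent step.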

Because the expected value of each ξjk(κ) is 1 − pperturb, the predictive model integrated with neuron perturbation can be denoted as
$$f(x) = W_2\, h_2\!\big(W_1\, h_1(W_0\,(1 - p_{perturb})\, x)\big) = W_2\, h_2\!\big(W_1\, h_1((1 - p_{perturb})\, W_0\, x)\big)$$
Another equivalent approach to achieve the neuron perturbation effect is to train the model with initially scaled-up input weights (Θ) by multiplying a factor of 1/(1 − pperturb) and predict responses with scaled-down weights (W0). The transformed input weights are defined as Θ≜(1 − pperturb)W0. Then, the weight update of the transformed input weights in each iteration can be denoted as
$$\Theta^{(\tau+1)} \leftarrow \xi'^{(\tau)} \odot \Theta^{(\tau)} - \eta\,\nabla_\Theta J \tag{2}$$
where the elements of ξ′ are denoted by ξ′jk and follow the distribution ξ′jk ∼ (1/(1 − pperturb)) · Bernoulli(1 − pperturb) for all j = 1, …, p and k = 1, …, |h1|.
The Bernoulli random variable ξ′jk can be approximated by a Gaussian distribution. Thus, ξ′jk is transformed into a Gaussian random variable z′jk ∼ N(1, pperturb/(1 − pperturb)). Then, Eq. (2) can be changed to
$$\Theta^{(\tau+1)} \leftarrow z'^{(\tau)} \odot \Theta^{(\tau)} - \eta\,\nabla_\Theta J \tag{3}$$
where the elements of z′ are z′jk.
Equation (3) can be approximated as
$$\Theta^{(\tau+1)} \leftarrow \Theta^{(\tau)} - \eta\,\nabla_\Theta J + z^{(\tau)} \tag{4}$$
where the elements of z follow zjk ∼ N(0, (pperturb/(1 − pperturb))(θjk(τ))²) for all j = 1, …, p and k = 1, …, |h1|, and θjk(τ) is an element of Θ(τ). As a result, the Bernoulli multiplicative noise term ξ′ is replaced by the Gaussian additive noise term z. Equation (4) is the form of the Gaussian mechanism for the weight update when training neural network models with neuron perturbation.
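A quick numerical check of the moments used in this approximation (purely illustrative):

```python
import numpy as np

p = 0.05                                                    # p_perturb
rng = np.random.default_rng(0)
xi = rng.binomial(1, 1 - p, size=1_000_000) / (1 - p)       # xi' = Bernoulli(1 - p) / (1 - p)
print(xi.mean(), xi.var())                                  # close to 1 and p/(1 - p) = 0.0526...
```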

To establish that the process of updating scaled input weights (Θ) in Algorithm 2 satisfies Theorem 1, a proof is required. The proof shown in Theorem 2 is supported by Lemma 1 and Corollary 1. For all proofs, we make the assumption that g(D) = η∇Θ J(Θ; D), where D and D′ are two neighboring datasets that differ by one row.

Lemma 1

The global sensitivity of the gradient descent weight update is at most (2η/n) maxd ‖∇Θ ℓ(Θ(τ); d)‖2.

Proof.
$$\begin{aligned}
\Delta_2^{(\tau)} &= \big\|g(D) - g(D')\big\|_2 \\
&= \big\|\eta\,\nabla_\Theta J(\Theta^{(\tau)}; D) - \eta\,\nabla_\Theta J(\Theta^{(\tau)}; D')\big\|_2 \\
&= \Big\|\eta\Big(\tfrac{1}{n}\textstyle\sum_{d_i \in D}\nabla_\Theta \ell(\Theta^{(\tau)}; d_i)\Big) - \eta\Big(\tfrac{1}{n}\textstyle\sum_{d_i' \in D'}\nabla_\Theta \ell(\Theta^{(\tau)}; d_i')\Big)\Big\|_2 \\
&= \frac{\eta}{n}\big\|\nabla_\Theta \ell(\Theta^{(\tau)}; d) - \nabla_\Theta \ell(\Theta^{(\tau)}; d')\big\|_2 \\
&\le \frac{\eta}{n}\big(\|\nabla_\Theta \ell(\Theta^{(\tau)}; d)\|_2 + \|\nabla_\Theta \ell(\Theta^{(\tau)}; d')\|_2\big) \\
&\le \frac{2\eta}{n}\max_{d}\|\nabla_\Theta \ell(\Theta^{(\tau)}; d)\|_2
\end{aligned}$$
where d and d′ are arbitrary tuples in datasets D and D′, respectively.

Corollary 1

If ‖∇Θ ℓ(Θ(τ); d)‖2 ≤ 1, the global sensitivity of gradient descent weight updates is at most 2η/n.

Theorem 2

Neuron perturbation in Algorithm 2 preserves (ε,δ)-differential privacy under the assumption that c2 > 2ln(1.25/δ).

Proof
The gradient update algorithm for τ with neuron perturbation is defined in Eq. (4). Specifically, the algorithm for a dataset D can be expressed as follows:
$$\mathcal{A}(D) = \eta\,\nabla_\Theta J(\Theta^{(\tau)}; D) + z^{(\tau)} = g(D) + z^{(\tau)}$$
The elements of z(τ) follow a normal distribution, with zjk(τ) ∼ N(0, (pperturb/(1 − pperturb))(θjk(τ))²) for all j = 1, …, p and k = 1, …, |h1|. Then the variance of z, which represents the elements of z(τ), is denoted as
$$\sigma^2 = \frac{p_{perturb}}{1 - p_{perturb}}\,\theta^2 > \frac{c^2\Delta_2^2}{\varepsilon^2} \tag{5}$$
where θ is an arbitrary value representing θjk(τ), and the inequality follows from Theorem 3.22 and Theorem A.1 in Ref. [7]. Because ‖θ‖2 ≤ 1, Eq. (5) is transformed to
$$\frac{p_{perturb}}{1 - p_{perturb}} \;\ge\; \frac{p_{perturb}}{1 - p_{perturb}}\,\theta^2 \;>\; \frac{c^2\Delta_2^2}{\varepsilon^2} \tag{6}$$
Then the privacy budget can be expressed as
$$\varepsilon > \sqrt{\frac{1 - p_{perturb}}{p_{perturb}}}\,(c\,\Delta_2), \qquad \sqrt{\frac{1 - p_{perturb}}{p_{perturb}}}\,(c\,\Delta_2) \le \sqrt{\frac{1 - p_{perturb}}{p_{perturb}}}\left(\frac{2\eta c}{n}\right) \tag{7}$$
where the first inequality is obtained from Eq. (5), and the second inequality follows from Corollary 1. When 2ηc/n ≤ √(pperturb/(1 − pperturb)), the value of ε does not exceed its upper bound of 1, so this theorem satisfies the assumptions of Theorem 1.
We now compare the function g in the presence of v, where v is the vector that results from the difference between datasets D and D′, and ‖v‖2 ≤ Δ2. According to the theorems in Ref. [7], the relationship between g(D) and g(D′) is denoted as
$$\begin{aligned}
\Pr\{g(D) + v \in S\} &= \Pr\{g(D) + v \in S_1\} + \Pr\{g(D) + v \in S_2\} \\
&\le \Pr\{g(D) + v \in S_1\} + \delta \\
&\le \exp(\varepsilon)\,\Pr\{g(D') + v \in S_1\} + \delta
\end{aligned}$$
The second inequality follows from Lemma 2.
S is the range of g, and S1 and S2 are partitioned sets defined as
$$S_1 = \Big\{g(D) + v \;\Big|\; \|v\|_2 \le \sqrt{\tfrac{p_{perturb}}{1 - p_{perturb}}}\,(c\,\Delta_2)\Big\}, \qquad S_2 = \Big\{g(D) + v \;\Big|\; \|v\|_2 > \sqrt{\tfrac{p_{perturb}}{1 - p_{perturb}}}\,(c\,\Delta_2)\Big\} \tag{8}$$

Lemma 2
The following equation holds:
$$\Pr\{g(D) + v \in S_1\} \le \exp(\varepsilon)\,\Pr\{g(D') + v \in S_1\}$$
Proof.
$$\begin{aligned}
\Pr\{g(D) + v \in S_1\} &= \frac{1}{(\sqrt{2\pi}\,\sigma)^p}\prod_{j=1}^{p}\exp\!\left(-\left|\frac{1}{2}\Big(\frac{g_j(D) - v_j}{\sigma}\Big)^{2}\right|\right) \\
&= \frac{1}{(\sqrt{2\pi}\,\sigma)^p}\exp\!\left(-\left|\frac{1}{2\sigma^2}\sum_{j=1}^{p}\big(g_j(D) - v_j\big)^{2}\right|\right) \\
&= \frac{1}{(\sqrt{2\pi}\,\sigma)^p}\exp\!\left(-\left|\frac{1}{2\sigma^2}\sum_{j=1}^{p}\big(g_j(D) - g_j(D') + g_j(D') - v_j\big)^{2}\right|\right) \\
&\le \frac{1}{(\sqrt{2\pi}\,\sigma)^p}\exp\!\left(\left|\frac{\sum_{j=1}^{p}\big(g_j(D) - g_j(D')\big)^{2}}{2\sigma^2}\right| - \left|\frac{1}{2\sigma^2}\sum_{j=1}^{p}\big(g_j(D') - v_j\big)^{2}\right|\right) \\
&= \exp\!\left(\frac{\Delta_2^2}{2\sigma^2}\right)\Pr\{g(D') + v \in S_1\} \\
&\le \exp\!\left(\frac{\Delta_2^2}{2}\,\frac{1 - p_{perturb}}{p_{perturb}}\right)\Pr\{g(D') + v \in S_1\} \\
&\le \exp\!\left(\frac{\varepsilon}{2}\right)\Pr\{g(D') + v \in S_1\} \\
&\le \exp(\varepsilon)\,\Pr\{g(D') + v \in S_1\}
\end{aligned}$$
The second and third inequalities follow from Eq. (6), and the fourth inequality results from ε ∈ (0, 1).

4.2 Mosaic Neuron Perturbation.

Algorithm 3

Input: Dataset: D with features X and response y, set of feature labels: Φ, learning rate: η, perturbation probability: pperturb, epoch number: K, batch size: nb, sensitive ratio: γ, fraction of sensitive features: ψS

Output: Approximate set of weights W*

  1: Initialize W(0) ← random values, τ = 0, κ = 1, ψN = 1 − ψS

  2: Set pS = 1 / (1 + ((1 − pperturb)/pperturb)(γ/(ψN + ψSγ))), and
     pN = 1 / (1 + ((1 − pperturb)/pperturb)(1/(ψN + ψSγ)))

  3: Split D into a set of batches B, each of size nb

  4: while κ < K

  5:   Generate a random vector ξ(κ) whose elements are
       ξjk(κ) ∼ Bernoulli(1 − pS) for j ∈ ΦS and k = 1, …, |h1|
       ξjk(κ) ∼ Bernoulli(1 − pN) for j ∈ ΦN and k = 1, …, |h1|

  6:   for each b = 1, …, |B| do

  7:     Update weights
         W(τ+1) = W(τ) − η∇W J(W(τ); ξ(κ) ⊙ X, y)

  8:     Set τ = τ + 1

  9:   end for

 10:   Set κ = κ + 1

 11: end while

 12: Let W* = W(τ)

The neuron perturbation mechanism effectively balances the trade-off between prediction power and privacy preservation against model inversion attacks. However, this mechanism treats all features equally and does not differentiate between sensitive and nonsensitive attributes. This can be problematic because sensitive features may carry more privacy risk than nonsensitive features. Furthermore, features highly correlated with the response may be more vulnerable to privacy leakage.

Therefore, a more sophisticated approach is required to enhance the privacy of sensitive features. We propose a MNP algorithm that addresses this issue by weakening the correlation between sensitive attributes and responses, as outlined in Algorithm 3. The algorithm introduces different levels of perturbation with corresponding perturbation probabilities (pS for sensitive and pN for nonsensitive features) to inject more noise into the sensitive features. Specifically, the weight gradient update corresponding to sensitive features is more heavily perturbed to diminish the correlation. The relationship between pS and pN is determined by the sensitive ratio parameter, which is defined as γ = ((1 − pS)/pS)/((1 − pN)/pN), and 0 ≤ γ ≤ 1. A smaller γ indicates that more perturbation is injected into sensitive attributes.

In the MNP algorithm, features are labeled by a set Φ and partitioned into sets ΦS and ΦN such that Φ=ΦSΦN and ΦSΦN=. This partitioning allows the algorithm to determine the contributions of sensitive and nonsensitive features to the perturbation probability (pperturb). The separated perturbation probabilities pS and pN are applied to their corresponding partitioned attributes, with the sensitive features being perturbed more intensively to weaken their correlation with the response. The contributions of sensitive and nonsensitive features to pperturb are expressed as ψS and ψN, respectively.

Algorithm 3 takes input parameters including learning rate η, perturbation probability pperturb, number of epochs K, and batch size nb. It then generates initial weight values of W(0), initializes the iteration counter τ to 0, and sets the epoch counter κ to 1. To prepare the dataset D for training, the algorithm splits it into batches of size nb. During each epoch κ, the algorithm iterates over all batches, generating a Bernoulli random vector ξ(κ). The vector is constructed such that ξjk(κ)Bernoulli(1pS) for j ∈ ΦS and ξjk(κ)Bernoulli(1pN) for j ∈ ΦN and k = 1, …, |h1|. For a given batch b, the Bernoulli vector is multiplied with corresponding features, and the weights are updated based on the gradient WJ computed with the perturbed features. Each time weights are updated, the τ value is incremented by one. When all mini-batches b have been used to update the weights, the value of κ is incremented by one. This iterative process ends when the κ value reaches K.
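The sketch below illustrates step 2 of Algorithm 3 and the construction of the mosaic Bernoulli mask of step 5. The helper names, the feature partition, and the value of ψS are assumptions made for illustration; note that with γ = 1 the two probabilities collapse to pperturb, as expected.

```python
import numpy as np

def mosaic_probs(p_perturb, gamma, psi_s):
    """Step 2 of Algorithm 3: split p_perturb into sensitive/nonsensitive rates."""
    psi_n = 1.0 - psi_s
    ratio = (1.0 - p_perturb) / p_perturb
    p_s = 1.0 / (1.0 + ratio * gamma / (psi_n + psi_s * gamma))
    p_n = 1.0 / (1.0 + ratio * 1.0 / (psi_n + psi_s * gamma))
    return p_s, p_n

def mosaic_mask(sensitive_idx, n_features, n_hidden, p_s, p_n, rng):
    """Step 5: Bernoulli mask with heavier perturbation on sensitive features."""
    keep = np.full((n_hidden, n_features), 1.0 - p_n)
    keep[:, sensitive_idx] = 1.0 - p_s
    return rng.binomial(1, keep)

p_s, p_n = mosaic_probs(p_perturb=0.015, gamma=0.1, psi_s=0.02)   # psi_s is an assumed value
xi = mosaic_mask(sensitive_idx=[0], n_features=14, n_hidden=4,
                 p_s=p_s, p_n=p_n, rng=np.random.default_rng(0))
# The mask xi is then applied in the weight update exactly as in Algorithm 2.
```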

The differential privacy properties of Algorithm 3 are formally proven in Theorem 3.

Theorem 3

MNP in Algorithm 3 preserves (ε,δ)-differential privacy.

Proof
We compare the function g in the presence of an arbitrary v arising from the difference between datasets D and D′, where ‖v‖2 ≤ Δ2, S is the range of g, and S1 and S2 are defined in Eq. (8). Then, Pr{g(D) + v ∈ S1} is denoted as
$$\begin{aligned}
\Pr\{g(D) + v \in S_1\} &= \prod_{j=1}^{p}\frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\!\left(-\left|\frac{1}{2}\Big(\frac{g_j(D) - v_j}{\sigma_j}\Big)^{2}\right|\right) \\
&= \prod_{j\in\Phi_S}\frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\!\left(-\left|\frac{1}{2}\Big(\frac{g_j(D) - v_j}{\sigma_j}\Big)^{2}\right|\right)\prod_{j\in\Phi_N}\frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\!\left(-\left|\frac{1}{2}\Big(\frac{g_j(D) - v_j}{\sigma_j}\Big)^{2}\right|\right) \\
&\le \exp\!\left(\left|\sum_{j\in\Phi_S}\frac{\big(g_j(D) - g_j(D')\big)^{2}}{2\sigma_j^2}\right| + \left|\sum_{j\in\Phi_N}\frac{\big(g_j(D) - g_j(D')\big)^{2}}{2\sigma_j^2}\right|\right)\Pr\{g(D') + v \in S_1\}
\end{aligned}$$
where σj² is the variance corresponding to the perturbation probability applied to the jth variable.
Since the following equation holds:
$$\begin{aligned}
&\exp\!\left(\left|\sum_{j\in\Phi_S}\frac{\big(g_j(D) - g_j(D')\big)^{2}}{2\sigma_j^2}\right| + \left|\sum_{j\in\Phi_N}\frac{\big(g_j(D) - g_j(D')\big)^{2}}{2\sigma_j^2}\right|\right) \\
&\quad= \exp\!\left(\frac{1}{2}\,\frac{1 - p_S}{p_S}\sum_{j\in\Phi_S}\big(\nabla_{\Theta_j}\ell(\Theta^{(\tau)}; D) - \nabla_{\Theta_j}\ell(\Theta^{(\tau)}; D')\big)^{2}\right)
\times \exp\!\left(\frac{1}{2}\,\frac{1 - p_N}{p_N}\sum_{j\in\Phi_N}\big(\nabla_{\Theta_j}\ell(\Theta^{(\tau)}; D) - \nabla_{\Theta_j}\ell(\Theta^{(\tau)}; D')\big)^{2}\right)
\end{aligned}$$
Pr{g(D) + v ∈ S1} can be denoted as
$$\begin{aligned}
\Pr\{g(D) + v \in S_1\} &\le \exp\!\left(\frac{\Delta_2^2}{2}\,\frac{1 - p_S}{p_S}\,\psi_S + \frac{\Delta_2^2}{2}\,\frac{1 - p_N}{p_N}\,\psi_N\right)\Pr\{g(D') + v \in S_1\} \\
&= \exp\!\left(\frac{\Delta_2^2}{2}\left(\frac{\gamma\psi_S}{\gamma\psi_S + \psi_N} + \frac{\psi_N}{\gamma\psi_S + \psi_N}\right)\frac{1 - p_{perturb}}{p_{perturb}}\right)\Pr\{g(D') + v \in S_1\} \\
&= \exp\!\left(\frac{\Delta_2^2}{2}\,\frac{1 - p_{perturb}}{p_{perturb}}\right)\Pr\{g(D') + v \in S_1\} \\
&\le \exp\!\left(\frac{\varepsilon}{2}\right)\Pr\{g(D') + v \in S_1\} \\
&\le \exp(\varepsilon)\,\Pr\{g(D') + v \in S_1\}
\end{aligned}$$
The second inequality results from Eq. (6), and the third inequality follows from ε ∈ (0, 1).

Lemma 3

Parameters for the fractions of sensitive and nonsensitive features, ψS and ψN, are at most maxd(‖∇ΘS ℓ(Θ(τ); d)‖2² / ‖∇Θ ℓ(Θ(τ); d)‖2²) and maxd(‖∇ΘN ℓ(Θ(τ); d)‖2² / ‖∇Θ ℓ(Θ(τ); d)‖2²), respectively.

Proof
ψS and ψN are defined as fractional contributions to the L2 norm of the loss gradients for sensitive and nonsensitive features, respectively.
$$\begin{aligned}
\psi_S &= \Big(\frac{\eta}{n\,\Delta_2}\Big)^{2}\sum_{j\in\Phi_S}\big(\nabla_{\Theta_j}\ell(\Theta^{(\tau)}; D) - \nabla_{\Theta_j}\ell(\Theta^{(\tau)}; D')\big)^{2} \\
&\le \Big(\frac{\eta}{n\,\Delta_2}\Big)^{2}\max_{d}\sum_{j\in\Phi_S}\big(\nabla_{\Theta_j}\ell(\Theta^{(\tau)}; d)\big)^{2} \\
&= \max_{d}\frac{\sum_{j\in\Phi_S}\big(\nabla_{\Theta_j}\ell(\Theta^{(\tau)}; d)\big)^{2}}{\|\nabla_{\Theta}\ell(\Theta^{(\tau)}; d)\|_2^{2}} = \max_{d}\frac{\|\nabla_{\Theta_S}\ell(\Theta^{(\tau)}; d)\|_2^{2}}{\|\nabla_{\Theta}\ell(\Theta^{(\tau)}; d)\|_2^{2}}
\end{aligned}$$
Following similar steps, ψN can be denoted as
$$\psi_N = \max_{d}\frac{\sum_{j\in\Phi_N}\big(\nabla_{\Theta_j}\ell(\Theta^{(\tau)}; d)\big)^{2}}{\|\nabla_{\Theta}\ell(\Theta^{(\tau)}; d)\|_2^{2}} = \max_{d}\frac{\|\nabla_{\Theta_N}\ell(\Theta^{(\tau)}; d)\|_2^{2}}{\|\nabla_{\Theta}\ell(\Theta^{(\tau)}; d)\|_2^{2}}$$

Corollary 2

If ‖∇Θ ℓ(Θ(τ); d)‖2² ≤ θ², then ψS and ψN can be configured arbitrarily, subject to the constraint ψS + ψN = 1.

Proof
The constraint ψS + ψN = 1 and the condition ‖∇Θ ℓ(Θ(τ); d)‖2² ≤ θ² can be denoted as
$$1 = \psi_S + \psi_N = \max_{d}\frac{\|\nabla_{\Theta_S}\ell(\Theta^{(\tau)}; d)\|_2^{2} + \|\nabla_{\Theta_N}\ell(\Theta^{(\tau)}; d)\|_2^{2}}{\|\nabla_{\Theta}\ell(\Theta^{(\tau)}; d)\|_2^{2}} = \frac{\|\nabla_{\Theta_S}\ell\|_2^{2} + \|\nabla_{\Theta_N}\ell\|_2^{2}}{\theta^{2}} \;\Longrightarrow\; \|\nabla_{\Theta_S}\ell\|_2^{2} + \|\nabla_{\Theta_N}\ell\|_2^{2} = \theta^{2}$$
Because d is arbitrarily chosen, the selection of a maximal loss gradient varies. The choice of d impacts ‖∇ΘS ℓ‖2² and ‖∇ΘN ℓ‖2². The magnitudes of the contributions of ‖∇ΘS ℓ‖2² and ‖∇ΘN ℓ‖2² to ‖∇Θ ℓ‖2² differ under the constraints that ‖∇ΘS ℓ‖2² ≤ θ² and ‖∇ΘN ℓ‖2² ≤ θ².

4.3 Multi-party Mosaic Neuron Perturbation.

Cloud computing has become an integral part of smart factory operations, facilitating the interconnection of intelligent manufacturing things and enabling distributed and collaborative learning for informed decision-making. In particular, distributed predictive models significantly enhance prediction accuracy and computational efficiency when processing large amounts of data. However, such models are vulnerable to model inversion attacks, which occur when an adversary is able to infer the dataset of a single participant, thereby exposing information from all participants contributing to the distributed model.

To address this challenge, we propose a distributed learning version of MNP, called the multi-party MNP algorithm. The framework of distributed learning is depicted in Fig. 1. In this framework, each local data owner configures a local predictive model based on their own dataset. The local model parameters are uploaded to the cloud, where the global model aggregates them and updates its parameters accordingly. Once the aggregation is complete, the cloud distributes the updated parameters, which local users can then leverage to achieve high prediction performance.

Predictive modeling with multi-party MNP

Algorithm 4

Input: The number of local models: nparty,

    local dataset: Dm for m = 1, …, nparty, set of feature labels: Φ,

    learning rate: η, perturbation probability: pperturb, epoch number: K,

    batch size: nb, sensitive ratio: γ, fraction of sensitive features: ψS

Output: Approximate set of weights of the global model WG*

  1: Assign each local model fm to an independent processor

  2: for all m = 1, …, nparty do

  3:    Initialize Wm(0) ← random values, τ = 0, κ = 1, ψN = 1 − ψS

  4:    Set pS = 1 / (1 + ((1 − pperturb)/pperturb)(γ/(ψN + ψSγ))), and
        pN = 1 / (1 + ((1 − pperturb)/pperturb)(1/(ψN + ψSγ)))

  5:    Split Dm into a set of batches B, each of size nb

  6:    while κ < K

  7:      Generate a random vector ξ(κ) whose elements are
          ξjk(κ) ∼ Bernoulli(1 − pS) for j ∈ ΦS and k = 1, …, |h1|
          ξjk(κ) ∼ Bernoulli(1 − pN) for j ∈ ΦN and k = 1, …, |h1|

  8:      for each b = 1, …, |B| do

  9:        Update weights
            W(τ+1) = W(τ) − η∇W J(W(τ); ξ(κ) ⊙ X, y)

 10:        Set τ = τ + 1

 11:      end for

 12:      Set κ = κ + 1

 13:   end while

 14:   Let Wm* = W(τ)

 15: end for

 16: WG* = (1/nparty) Σm Wm*

Protecting both the global and local models is essential in collaborative learning systems. To achieve this, the multi-party MNP algorithm introduces perturbation into local model learning. This approach helps effectively prevent access to sensitive information by adversaries outside the collaborative learning system as well as inside the system. Moreover, the distributed learning model offers higher computational efficiency compared to the serial learning model, making it a viable option for processing large datasets. The proposed method is outlined in detail in Algorithm 4.

The implementation of Algorithm 4 involves several input parameters, including learning rate η, perturbation probability pperturb, number of epochs K, batch size nb, and number of participants nparty. The global model learns its global weights WG by aggregating local weights Wm of a local prediction model fm, where m = 1, …, nparty. The procedure outlined in Algorithm 3 is independently applied to train each local model. Overall, the multi-party MNP algorithm ensures differential privacy, effectively preserving sensitive information.
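A minimal serial sketch of this procedure is shown below; `train_mnp` is a hypothetical stand-in for Algorithm 3 (for example, the neuron perturbation sketch of Sec. 4.1 extended with the mosaic mask) and is expected to return a tuple of weight arrays. In practice, each party's call would be dispatched to an independent processor.

```python
import numpy as np

def train_multiparty_mnp(local_datasets, train_mnp, **params):
    """Algorithm 4 sketch: each party trains an MNP model on its own data,
    and the cloud averages the resulting weights into the global model."""
    local_weights = []
    for X_m, y_m in local_datasets:                 # steps 2-15, one pass per party
        W_m = train_mnp(X_m, y_m, **params)         # Algorithm 3 on the local dataset
        local_weights.append(W_m)
    # Step 16: element-wise average of the local weight sets, layer by layer
    return [np.mean(layer, axis=0) for layer in zip(*local_weights)]
```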

5 Experimental Design

5.1 Design of Experiments.

The proposed MNP and multi-party MNP algorithms are evaluated in this study through experiments based on real-world CNC turning data. These data were collected through sensors in a machine shop to develop monitoring solutions for energy usage [33]. The dataset contains information regarding 1973 workpieces made from 22 different materials. During operation, workpiece profiles were recorded, and workstation descriptions and machining parameters were collected through sensors. Detailed information about features is explained in Table 1. The response for this dataset is the power consumption of seven lathe machines.

Table 1
Feature descriptions in the machining case study

Group | Feature | Explanation
Workpiece profile | Level of refined processing | Rough, half-refined, thoroughly refined
 | Workpiece diameter | Real number (mm)
 | Workpiece material | 45Fe, 40Cr, 60Si2Mn, HT300, 1Cr13, HT200, 45Fe(T235), 35Fe, 0Cr18Ni9Ti, Q235, 40Cr(HRC 48), 40Cr(T235), 45(T235), 1Gr13, 20CrMnTi, 2Cr13, 20CrMnTi(HB170), 6061Al, 38CrMoALA, T10A, G10, 20Cr
 | Material hardness | Real number (HB)
 | Tensile strength | Real number (MPa)
Machining parameter | Machine models | C2-360HK, C2-50HK/1, C2-6150, C2-6150HK/1, CHK360, CHK460, CHK560
 | Angle of blades | Real number (deg)
 | Blade inclination | Real number (deg)
 | Cutting speed | Real number (m/min)
 | Spindle speed | Real number (r/min)
 | Feed rate | Real number (mm/r)
 | Cutting depth | Real number (mm)
Workstation description | Air-cutting power (W) | Referring to the total power of the machine tool when cutting (removing materials) according to certain parameters
 | Idle power (W) | Referring to the total power of the machine tool when the machine tool spindle rotates at a certain speed, but the workpiece material is not removed

In this experimental study, air-cutting power, which refers to the total power of the machine tool during material removal based on input parameters, was considered the sensitive feature. The reason for setting this attribute as the sensitive feature is that it contains characteristics of the overall process. The duration of cutting and air-cutting power data can reveal other machining parameters that manufacturers do not want to share. Idle power serves as a feature for the comparative analysis, representing the total power of the tool when its spindle rotates without material removal.

The proposed MNP techniques aim to balance model prediction power with robustness against model inversion attacks. To evaluate the models’ performance, two metrics were measured: prediction accuracy (R²Pred) and attack risk (R²ATK). Prediction accuracy was computed from the predictions ŷ of the prediction model. Attack risk was defined as the coefficient of determination of the white-box model inversion attack for an arbitrary feature Xj as follows:
$$R^2_{ATK} = 1 - \frac{\sum_{i \in D}(x_{ij} - \hat{x}_{ij})^2}{\sum_{i \in D}(x_{ij} - \bar{x}_{j})^2}$$
where x̂ij represents the value reconstructed by the attack and x̄j stands for the average value of the feature; the metric ranges between 0 and 1. While the baseline model without perturbation provides high prediction accuracy, it remains vulnerable to model inversion attacks. In contrast, the neural network model incorporating the proposed MNP algorithm is robust against model inversion attacks while maintaining prediction power. Therefore, the experimental results are discussed in terms of both prediction accuracy and attack risk.
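Both metrics are coefficients of determination and can be computed with the same helper; a minimal sketch (names are illustrative) is shown below.

```python
import numpy as np

def r_squared(actual, predicted):
    """Coefficient of determination used for both R2_Pred and R2_ATK."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# R2_Pred compares test responses with model predictions:
#   r2_pred = r_squared(y_test, y_hat)
# R2_ATK compares the true sensitive feature values with the values
# reconstructed by the white-box inversion attack:
#   r2_atk = r_squared(x_sensitive_true, x_sensitive_reconstructed)
# Negative values can be truncated to zero so the reported risk stays in [0, 1].
```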

The experimental study comprises five phases. First, the baseline model provides a benchmark for prediction accuracy and attack risk. Second, the performance metrics of the neuron perturbation algorithm are evaluated. Third, the proposed MNP algorithm is validated by comparing its performance to that of the baseline model. Also, the results are analyzed based on varied levels of perturbation probability (pperturb) and sensitive ratio (γ). Furthermore, to assess the effectiveness of MNP, the risk of attack on sensitive and nonsensitive attributes is analyzed. Fourth, to examine the computational efficiency of the multi-party MNP, the model learning speed is investigated. The performance of multi-party distributed models is compared to that of serial models for different-sized training datasets. Last, results of MNP with the case of two sensitive features are presented and analyzed.

5.2 Model Configuration.

The experiment involved partitioning the dataset into a training set comprising 1500 inputs and a test set containing 463 inputs. The architecture of the neural network model encompassed two hidden layers with (4, 3) hidden nodes, and the activation function employed for each hidden layer was the sigmoid function. The selected loss function was the sum of squared errors. For each scenario of training prediction models, a total of K = 5000 epochs were executed with a batch size nb of 500, resulting in three training batches. A learning rate of η = 0.005 was applied. To ensure the robustness of the outcomes, the model training process was repeated 30 times. Additionally, the training dataset was employed to train the white-box attack model, spanning a total of T = 50,000 epochs. The learning rate for the attack model was set at 0.005. As with the targeted prediction model, the attack model training was replicated 30 times.
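For reference, the settings described above can be collected into a single configuration; the dictionary below is only a convenient restatement with hypothetical key names.

```python
experiment_config = {
    "train_size": 1500, "test_size": 463,
    "hidden_nodes": (4, 3), "activation": "sigmoid", "loss": "sum_of_squared_errors",
    "epochs_K": 5000, "batch_size_nb": 500, "learning_rate_eta": 0.005,
    "training_replications": 30,
    "attack_epochs_T": 50_000, "attack_learning_rate": 0.005,
    "attack_replications": 30,
}
```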

In exploring the computational efficiency of the multi-party MNP algorithm, the number of processors varied between 1 and 16, while the training data were augmented to 12,000 instances by an autoencoder. The autoencoder consisted of a hidden layer with 25 nodes, and its reconstruction performance was R² = 0.991. All experimentation was conducted in Python 3.7 on a system furnished with 128 GB of RAM and an Intel 10-core Xeon E5-2680 processor operating at 2.2 GHz.

6 Experimental Results

Performance metrics of the benchmark model show that it is vulnerable to model inversion attacks, with a high attack risk of R²ATK = 0.951, although it accurately predicts responses with R²Pred = 0.926. This result shows that adversaries can infer sensitive information almost perfectly. To address this problem, adding neuron perturbation to predictive model learning mitigates the risk of model inversion attacks. As illustrated in Fig. 5, the attack risk decreases as pperturb increases. When pperturb = 0.05, the attack risk is 0.209, and prediction accuracy is 0.888. For pperturb ≥ 0.1, the attack risk converges to zero, but prediction accuracy also decreases as the perturbation probability grows. Therefore, carefully designed noise is required to mitigate the attack risk on sensitive attributes while maintaining prediction power.

Fig. 5. Prediction accuracy and white-box attack risk of neuron perturbation under varying pperturb

6.1 Mosaic Neuron Perturbation Results.

Changes in γ control the perturbation probabilities for the sensitive and nonsensitive features in the MNP algorithm. For example, while pperturb remains constant at 0.015, the consequential impact of γ is outlined in Table 2. It is notable that pN maintains its value while pS increases drastically as γ decreases. As the perturbation probability increases, more noise is injected into the learning process to preserve the latent information of sensitive features. This relationship shows that the MNP algorithm adds relatively more noise to sensitive attributes while perturbing nonsensitive features less.

Table 2
The variation of perturbation probabilities pS and pN with respect to γ for pperturb = 0.015

γ | pS | pN
1.00 | 0.015 | 0.015
0.90 | 0.017 | 0.015
0.80 | 0.019 | 0.015
0.70 | 0.021 | 0.015
0.60 | 0.025 | 0.015
0.50 | 0.029 | 0.015
0.40 | 0.036 | 0.015
0.30 | 0.048 | 0.015
0.20 | 0.069 | 0.015
0.10 | 0.130 | 0.015

The effect of varying the perturbation probability and sensitive ratio on prediction accuracy and attack risk is illustrated in Fig. 6. The results show that increasing pperturb or decreasing γ reduces the attack risk. In contrast, prediction accuracy is not significantly influenced by these parameters. As shown in Fig. 6(a), prediction accuracy remained around 0.9 regardless of pperturb and γ. On the other hand, predictive models face a higher risk of model inversion attacks when γ is high and pperturb is low, as illustrated in Fig. 6(b). Therefore, it is concluded that a low perturbation probability (pperturb ≤ 0.02) combined with a lower sensitive ratio (γ ≤ 0.1) can balance high prediction accuracy with robustness against model inversion attacks.

Fig. 6. Experimental results of the MNP algorithm with respect to the perturbation probability and the sensitive ratio for (a) prediction accuracy and (b) white-box model inversion attack risk

The impact of each parameter on prediction accuracy and attack risk is illustrated in Fig. 7. When pperturb is set to 0.015, the prediction model applying MNP guarantees the same level of prediction accuracy as the baseline model regardless of the sensitive ratio, as shown in Fig. 7(a). On the other hand, the attack risk on the sensitive feature (air-cutting power) decreases as γ decreases. This is because pS increases when γ is low, as outlined in Table 2. Also, with a fixed value of γ = 0.35, the attack risk decreases more rapidly as pperturb increases. With the neuron perturbation algorithm in Fig. 5, the attack risk reaches zero only when pperturb > 0.1, whereas the MNP algorithm provides stronger privacy, as depicted in Fig. 7(b): the attack risk converges to zero when pperturb > 0.04, because pS is 0.105 when γ = 0.35. Overall, these results demonstrate that the MNP algorithm effectively balances prediction accuracy with privacy.

Fig. 7. Prediction accuracy and white-box attack risk of MNP (a) under varying γ where pperturb = 0.015 and (b) under varying pperturb where γ = 0.35

Comparative experimental results are presented to verify the attack mitigation performance of the MNP algorithm on sensitive attributes. In this section, idle power serves as the nonsensitive feature. The attack risk on idle power is shown in Fig. 8(a). This feature has a correlation of 0.461 with the response, and its overall attack risk is higher than that of the sensitive feature. Furthermore, the attack risk on this feature decreases as the perturbation probability increases, but it is not meaningfully impacted by the sensitive ratio. This trend is depicted in Fig. 8(b) with a fixed value of pperturb = 0.015. As γ decreases, the MNP algorithm effectively mitigates the attack risk on the sensitive feature (air-cutting power), while the attack risk on idle power does not significantly depend on γ and remains above 0.8 regardless of changes in γ.

Fig. 8. (a) Attack risks on idle power as XN and (b) comparison of attack risks between air-cutting power and idle power where pperturb = 0.015

6.2 Multi-Party Mosaic Neuron Perturbation Results.

Distributed learning ensures high effectiveness and computational efficiency. The effect of varying the perturbation probability and sensitive ratio on prediction accuracy and attack risk is illustrated in Fig. 9. pperturb and γ are parameters of the local predictive models, and performance metrics are measured for the global predictive model distributed to the data owners. Incorporating MNP into the local models of the multi-party framework increases robustness against model inversion attacks while maintaining prediction accuracy.

Fig. 9. Performance comparison of (a) prediction accuracy and (b) white-box model inversion attack risk on air-cutting power with respect to the perturbation probability and the sensitive ratio

As shown in Fig. 9, a lower sensitive ratio reduces the attack risk by providing relatively higher attack prevention for sensitive features. In contrast, the sensitive ratio has little impact on prediction accuracy of the multi-party MNP as displayed in Fig. 9(a). When pperturb is fixed, attack risks decrease as the sensitive ratio reduces. In this case, the multi-party MNP algorithm can effectively prevent model inversion attacks when γ ≤ 0.1 while maintaining high prediction power above 0.9.

As illustrated in Fig. 9(b), distributed prediction models face higher threats of model inversion attacks. Compared to the single MNP result in Fig. 6, the attack risk on the sensitive feature is higher in the distributed learning case. Nevertheless, the multi-party MNP algorithm can still provide high levels of privacy when γ is low, with the attack risk falling below 0.015 in this case.

In terms of the efficiency of the multi-party MNP algorithm, the more cores that participate in the distributed learning, the faster the learning on large datasets, as shown in Fig. 10. Regardless of the data size, the learning time of the serial model is longer than that of the multi-party models. In particular, with 16 participants, the learning efficiency is more than 10 times higher than that of serial learning. For example, with 12,000 inputs, training the serial model requires 68 s, whereas the model with 16 cores needs only 6.067 s. These results strongly support that the proposed multi-party MNP algorithm is computationally efficient and accurate, mitigating attack risk while maintaining prediction power.
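A minimal sketch of this kind of timing comparison is shown below: the same workload is run serially and then split across worker processes. The quadratic-loss "training" stand-in, the 16-worker pool, and the shard sizes are illustrative assumptions; actual speedups depend on the model, the data size, and the hardware.

```python
import time
from multiprocessing import Pool

import numpy as np

def fit_shard(shard):
    """Stand-in for training one local model on a data shard."""
    X, y = shard
    w = np.zeros(X.shape[1])
    for _ in range(300):
        w -= 0.01 * 2.0 * X.T @ (X @ w - y) / len(y)
    return w

def make_shards(n_rows, n_parties, seed=2):
    rng = np.random.default_rng(seed)
    true_w = np.array([0.8, 0.5, -0.3, 0.2])
    X = rng.normal(size=(n_rows, 4))
    y = X @ true_w
    return list(zip(np.array_split(X, n_parties), np.array_split(y, n_parties)))

if __name__ == "__main__":
    shards = make_shards(12_000, n_parties=16)

    t0 = time.perf_counter()
    serial = [fit_shard(s) for s in shards]          # serial baseline
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with Pool(processes=16) as pool:                 # 16-worker multi-party run
        parallel = pool.map(fit_shard, shards)
    t_parallel = time.perf_counter() - t0

    print(f"serial: {t_serial:.2f} s   parallel: {t_parallel:.2f} s")
```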

Fig. 10
Performance comparison of learning efficiency between an MNP algorithm and multi-party MNP algorithms with 4, 8, and 16 cores

6.3 Discussion.

The proposed MNP algorithm injects noise during model training, which increases the computational cost of learning a prediction model. The neuron perturbation algorithm multiplies the inputs by Bernoulli noise, re-drawing the perturbation at each iteration of the weight updates; this additional step increases the computational cost. The MNP algorithm adds noise in further steps, separately generating perturbation probabilities according to the sensitivity of the features and then injecting Bernoulli noise based on those probabilities. Training times were collected to investigate the computational complexity of the proposed algorithm. As illustrated in Fig. 11, the training time varies significantly across cases: the median training time for the benchmark case is 10.23 s, whereas the median training times for the neuron perturbation and MNP cases are 11.12 and 11.39 s, respectively. The 99% confidence interval of each case does not overlap with that of any other case. These results show that the proposed algorithm requires additional computational resources to train the predictive model.
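To make the extra per-iteration work concrete, the following is a minimal NumPy sketch of feature-wise Bernoulli perturbation applied at every weight update. The one-layer regressor, the synthetic four-feature data, and the gradient-descent loop are illustrative stand-ins rather than the authors' implementation; the values pS = 0.130 and pN = 0.015 are the ones quoted for pperturb = 0.015 and γ = 0.1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the CNC power features; column 0 plays the role of
# the sensitive feature (air-cutting power).
X = rng.normal(size=(256, 4))
y = X @ np.array([0.8, 0.5, -0.3, 0.2]) + 0.05 * rng.normal(size=256)

p_S, p_N = 0.130, 0.015                 # example values for p_perturb = 0.015, gamma = 0.1
keep_prob = 1.0 - np.array([p_S, p_N, p_N, p_N])

w = np.zeros(4)
lr = 0.01
for _ in range(200):
    # A fresh Bernoulli mask is drawn at every weight update, so each
    # iteration sees a slightly different perturbed copy of the inputs;
    # this repeated sampling is the source of the extra training cost.
    mask = rng.binomial(1, keep_prob, size=X.shape)
    X_pert = X * mask
    grad = 2.0 * X_pert.T @ (X_pert @ w - y) / len(y)
    w -= lr * grad

print("weights learned under MNP-style perturbation:", np.round(w, 3))
```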

Fig. 11
Computational complexity comparison of algorithms: (a) benchmark, (b) neuron perturbation, and (c) MNP

Another consideration for the proposed model is the balance of perturbation probabilities between sensitive and nonsensitive features. Reducing γ increases pS and decreases pN; an extreme reduction in γ therefore increases the risk of model inversion attacks on nonsensitive features by decreasing their perturbation.

The case study in this section was conducted under the assumption of a single sensitive feature, but the proposed model also performs well when there are multiple sensitive features. The number of sensitive attributes impacts the values of pS and pN, leading to varying degrees of perturbation. For example, for a case with three sensitive features, pperturb = 0.015, and γ = 0.1, pS and pN are 0.119 and 0.013, respectively, whereas pS and pN are 0.130 and 0.015 for a single sensitive attribute, as described in Table 2. As the number of sensitive attributes increases, both pS and pN decrease relative to the single-feature case. This is due to changes in the contributions of the sensitive and nonsensitive features to pperturb, denoted by ψS and ψN, respectively.
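The short sketch below assembles the feature-wise perturbation probabilities for the single- and three-sensitive-feature cases using the (pS, pN) values quoted above for pperturb = 0.015 and γ = 0.1. The feature names other than air-cutting power and idle power are hypothetical placeholders, and the derivation of (pS, pN) from ψS and ψN follows the paper and is not reproduced here.

```python
import numpy as np

features = ["air_cutting_power", "idle_power", "feature_3", "feature_4"]

def perturb_probs(sensitive, p_S, p_N):
    """Per-feature probability of perturbing each input value."""
    return np.array([p_S if f in sensitive else p_N for f in features])

single = perturb_probs({"air_cutting_power"}, p_S=0.130, p_N=0.015)
triple = perturb_probs({"air_cutting_power", "idle_power", "feature_3"},
                       p_S=0.119, p_N=0.013)

print("single sensitive feature :", single)
print("three sensitive features :", triple)
```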

The sensitivity of an attribute impacts the relationship between the attack risk and the sensitive ratio. Assuming that there are two sensitive features (air-cutting power and idle power), the experimental results for idle power are illustrated in Fig. 12. The prediction accuracy shown in Fig. 12(a) across various parameter levels does not differ from that presented in Fig. 6(a), the case in which idle power served as a nonsensitive feature. The attack risk, on the other hand, differs from Fig. 6(b): when γ = 0.9, the attack risk is indistinguishable from the previous case, but it now decreases as γ decreases, and setting the sensitive ratio to 0.1 drives the attack risk to zero. These results show that the proposed MNP algorithm works effectively in the case of multiple sensitive attributes.

Fig. 12
Performance comparison of (a) prediction accuracy and (b) white-box model inversion attack risk on idle power as XS

7 Conclusions

In this paper, we design and develop a novel privacy-preserving algorithm, MNP, for manufacturing data analytics that fully leverages the smartness of AI models while effectively reducing the risk of sensitive data leakage due to model inversion attacks. The MNP technique perturbs neural network model training by injecting carefully designed noise, ensuring differential privacy. The algorithm can also be extended to a distributed version, the multi-party MNP algorithm, which addresses the privacy risk of collaborative learning while providing computational efficiency and prediction accuracy. MNP introduces two control parameters, pperturb and γ: pperturb determines the level of privacy, and γ controls the ratio of perturbation between sensitive and nonsensitive attributes. Together, they allow the algorithm to minimize the risk of inversion attacks on the sensitive feature while keeping the predictive model's prediction power as high as that of the unperturbed model. Experimental results with real-world CNC data showed that the proposed algorithm provides higher robustness to white-box model inversion attacks while maintaining prediction accuracy. By employing this novel and flexible MNP algorithm, manufacturers can fully leverage the smartness of AI to make informed decisions while enhancing cybersecurity and mitigating the risk of sensitive information leakage.

Acknowledgment

This material is based on research sponsored by the Office of the Under Secretary of Defense for Research and Engineering, Strategic Technology Protection and Exploitation, and the Defense Manufacturing Science and Technology Program under agreement number W15QKN-19-3-0003. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The authors attest that all data for this study are included in the paper.
