## Abstract

The rapid advance in sensing technology has expedited data-driven innovation in manufacturing by enabling the collection of large amounts of data from factories. Big data provides an unprecedented opportunity for smart decision-making in the manufacturing process. However, big data also attracts cyberattacks and makes manufacturing systems vulnerable due to the inherent value of sensitive information. The increasing integration of artificial intelligence (AI) within smart factories also leaves manufacturing equipment susceptible to cyber threats, posing a critical risk to the integrity of smart manufacturing systems. Cyberattacks targeting manufacturing data can result in considerable financial losses and severe business disruption. Therefore, there is an urgent need to develop AI models that incorporate privacy-preserving methods to protect the sensitive information implicit in the models against model inversion attacks. This paper presents the development of a new approach called mosaic neuron perturbation (MNP) to preserve latent information in the framework of the AI model, ensuring differential privacy requirements while mitigating the risk of model inversion attacks. MNP is flexible to implement in AI models, balancing the trade-off between model performance and robustness against cyberattacks while remaining highly scalable for large-scale computing. Experimental results, based on real-world manufacturing data collected from the computer numerical control (CNC) turning process, demonstrate that the proposed method significantly improves the ability to prevent inversion attacks while maintaining high prediction performance. The MNP method shows strong potential for making manufacturing systems both smart and secure by addressing the risk of data breaches while preserving the quality of AI models.

## 1 Introduction

Rapid advances in sensing technology have enabled the collection of vast amounts of data from manufacturing operations, which has expedited big-data-driven innovations in manufacturing. By analyzing big data, valuable insights can be generated, leading to the development of manufacturing artificial intelligence (AI) systems [1]. These AI systems can remarkably enhance various decision-making processes in factories, ultimately raising the smartness level of manufacturing systems. To enhance smart manufacturing systems, cloud computing environments provide connections between these AI systems to enable efficient training in a distributed way. However, due to the value of the information involved, the use of big data and the interconnectivity of AI models pose significant risks of cyberattacks, potentially leading to breaches of sensitive information, including parameters related to manufacturing processes or products.

Despite the enhanced decision-making capabilities offered by AI-related models, they are vulnerable to exploitation in the absence of adequate privacy-preserving methods. Adversaries can manipulate model responses to reach latent information that data owners would not voluntarily disclose. Cyberattacks pose a greater threat to the manufacturing industry than to any other industry. According to a recent report, manufacturing accounted for 23.2% of cyberattacks in 2022, overtaking the finance and insurance fields [2]. Among cyberattacks against the manufacturing industry, ransomware ranked first, and data theft was the third most common. The failure to protect against these attacks can result in unprecedented disruptions, leading to significant business losses. The average total cost of a data breach across all industries is more than $4.35 million. Small and medium-sized manufacturers (SMMs) are especially vulnerable to exploitation: in 2017, 61% of small businesses experienced cyberattacks, and the median cost of a data breach was over $60,000 [3].

Cyberthreats to manufacturing analytics can be categorized into four types: membership inference, property inference, model extraction, and reconstruction [4]. Membership inference attacks determine whether an input was used as part of the training set. Property inference attacks aim to extract latent properties that are not explicitly expressed in the dataset. Model extraction attacks are cyberattacks in which an adversary attempts to reconstruct an alternative model with limited information about the target model. Reconstruction attacks, also known as attribute inference attacks or model inversion attacks, occur when an adversary infers sensitive features or datasets based on knowledge of some features and the target model.

To address this issue, various privacy-preserving techniques, including cryptography, anonymization, and differential privacy, have been studied. While cryptographic countermeasures are effective in ensuring data confidentiality and integrity through encryption algorithms, they can also impede data processing due to their high computational requirements [5]. Anonymization, which masks sensitive attributes before leveraging the dataset, is susceptible to re-identification attacks, where adversaries can infer hidden attributes by linking external knowledge to a publicly shared dataset [6]. Differential privacy, on the other hand, is a technique that adds random perturbation to the data, making it challenging to extract individual information from the dataset while still providing useful information for efficient analysis [7]. Differential privacy allows manufacturers to ensure the privacy of their sensitive business-related information while leveraging the benefits of data analytics.

Differential privacy provides a solution to the privacy leakage issue in manufacturing data by guaranteeing that an individual’s participation in a dataset is not disclosed. However, concerns about sensitive information breaches in manufacturing are not limited to the participation of individuals. Information obtained from predictive models can also be exploited through model inversion attacks, where the adversary manipulates released predictive models and background knowledge to infer sensitive information that the data owner does not want to share with others [8]. Unprotected AI models can unintentionally reveal business-related sensitive information, further increasing the risk of exploitation. Therefore, it is imperative to develop privacy-preserving algorithms incorporated into predictive models to resist the risk of model inversion attacks while maintaining the utility of the models.

This paper presents the development of novel privacy-preserving techniques for manufacturing predictive analytics that fully leverage the smartness of AI models while preserving the latent information in the AI models against model inversion attacks to mitigate the risk of privacy breaches. First, we develop a mosaic neuron perturbation (MNP) algorithm capable of preserving sensitive information, perturbing input neurons with designed noise to ensure differential privacy. The MNP algorithm injects more perturbation to neurons learning sensitive attributes than nonsensitive features, ultimately allowing the protection of sensitive information. Second, we extend the MNP technique to a distributed version called a multi-party MNP algorithm to increase robustness and smartness in collaborative learning. Finally, we conduct an experimental study based on real-world manufacturing data collected from the computer numerical control (CNC) turning process to evaluate and compare the performance of the proposed algorithm in terms of AI model effectiveness, robustness to model inversion, and computational efficiency. AI models integrated with privacy-preserving techniques show strong promise in allowing manufacturers to leverage the smartness of AI systems to make informed decisions with confidence while mitigating the risk of sensitive data breaches.

The remainder of this paper is organized as follows: Sec. 2 introduces the research background of privacy-preserving and differential privacy for predictive analytics; Sec. 3 provides further details on differential privacy and model inversion attacks to build the conceptual foundation; Sec. 4 details the proposed methodology of MNP algorithms; Sec. 5 provides a design of experiments based on real-world CNC turning data; Sec. 6 evaluates and analyzes experimental results to demonstrate the effectiveness of the MNP algorithm; and Sec. 7 concludes this study.

## 2 Research Background

### 2.1 Distributed Artificial Intelligence in Manufacturing Systems.

The rapid development of Internet of Things technologies has led to an exponential increase in the amount of data collected from manufacturing operations. While this brings an unprecedented opportunity to generate rich information about manufacturing systems, its value exposes manufacturers to more risk of cyberattacks on manufacturing data [1]. SMMs leverage the benefits of collaborative networked AI models through cloud computing for a service-oriented approach by sharing service data, operation decisions, and some information related to their manufacturing systems.

Distributed predictive models offer both effectiveness and efficiency in learning. The tiers of the distributed AI paradigm [9] are delineated as follows:

- *Level 0: sharing data.* After locally collecting and pre-processing data, each user uploads their data to a cloud. The global AI model is subsequently constructed based on the aggregated data within the cloud.
- *Level 1: sharing model.* Individual users train local AI models using their own data and share these trained AI models with the cloud. The global model is constructed through an aggregation of these local models within the cloud. The resultant global model is then distributed back to each local user.
- *Level 2: sharing results.* Each user undertakes the entire process of locally training AI models and subsequently shares the obtained results or outputs with the cloud.

In this study, we develop a privacy-preserving method and multi-party distributed learning for level 1. The framework of a distributed predictive model is illustrated in Fig. 1. Within this architecture, each local data owner (i.e., manufacturer) configures an independent local predictive model based on their own dataset. These local models maintain identical structures. Subsequently, the local data owner uploads the parameters of their local model to the cloud. The cloud consolidates these parameters to build a global model mirroring the structure of the local models. Once the global model construction is complete, the cloud distributes this global model to local data owners. By operating within this framework, manufacturers can leverage highly accurate predictive models without sharing their own data.
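As a minimal sketch of the level-1 aggregation step described above (not the paper's implementation; the function name `aggregate_global_model` is illustrative), the cloud can build the global model by averaging identically structured local parameter vectors:

```python
import numpy as np

def aggregate_global_model(local_weights):
    """Average identically structured local parameter vectors into a global model."""
    return np.mean(np.stack(local_weights), axis=0)

# Three manufacturers upload locally trained parameters; raw data never leaves a site.
w_global = aggregate_global_model([np.array([1.0, 2.0]),
                                   np.array([3.0, 2.0]),
                                   np.array([2.0, 2.0])])  # -> [2.0, 2.0]
```

Because only parameters (not data) are uploaded, each manufacturer's dataset stays on site, which is the property the level-1 framework relies on.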

However, cloud environments can be compromised, potentially exposing information to unauthorized parties [10]. While the number of reported cyberattacks on manufacturers remains relatively small, this is because many are not aware that they are being attacked or fail to associate system failures with possible cyberattacks [5]. In smart manufacturing, data often include sensitive information about operations and customers, so privacy-preserving methods for manufacturing analytics are essential. Predictive models that do not prioritize privacy protection are vulnerable to exploitation, as adversaries can manipulate model responses to access latent information that data owners would not wish to voluntarily disclose. This underscores the urgent need to develop privacy-preserving methods for manufacturing analytics.

### 2.2 Privacy-Preserving Techniques.

To address privacy issues, privacy-preserving methods, such as cryptography, anonymization, and differential privacy, have been developed. Cryptographic approaches are implemented to secure identity and access to the system, effectively ensuring data confidentiality and integrity [5,11]. However, these methods require high computational power, which can hinder data processing. Data anonymization techniques that remove sensitive information before query evaluation have also been proposed [12]. While anonymization techniques can work with high-dimensional data and limit disclosure risks, they have disadvantages. For example, information in the original data is lost when sensitive attributes are removed [13]. Additionally, anonymization techniques may not ensure complete privacy preservation, as an increase in dataset attributes can lead to re-identification. For example, the anonymized data released by Netflix for a technology challenge were processed to identify users by matching their Netflix reviews with data from other sites like IMDb, disclosing individuals' viewing histories [6]. Similarly, combining patient-level health data from Washington state with information from state news articles revealed individual patients' identities, even though the data contained only zip codes and no patient name or address [14]. To address these issues, Dwork introduced differential privacy, which protects data by adding noise within algorithms [7,15]. Under differentially private models, neighboring databases differing by one record cannot be distinguished through the output of the same algorithm.

However, individual participation in a dataset is not the only privacy concern. AI models are also under threat of being exploited to reveal a target individual's sensitive information when auxiliary information is available. These threats are called model inversion attacks. Predictive models developed from big data can be exploited to infer sensitive features that data owners do not want to expose [8], because the models contain information about the correlated attributes of the training data [16]. While conventional research in manufacturing informatics tends to focus on improving effectiveness, the primary goal of learning AI models should be to achieve the best prediction accuracy while ensuring the security and privacy of sensitive data. Therefore, it is necessary to develop privacy-preserving algorithms integrated into AI models that prevent model inversion attacks while maintaining prediction performance.

Differential privacy was developed to mitigate the risk of models being exploited, and several studies have investigated its application to protect sensitive data while maintaining the accuracy of predictive models. For example, Chaudhuri et al. [17] developed a logistic regression model incorporating differential privacy with two types of perturbation to a parametric learning process: output perturbation added noise to the model's output regression coefficients, while objective perturbation modified the objective function used to train the coefficients. Zhang et al. [18] proposed a functional mechanism for linear and logistic regression analysis under differential privacy, perturbing the objective functions used for coefficient training. Training methods such as back-propagation update model parameters with gradient descent techniques; Song et al. [19] first introduced a gradient perturbation technique that adds noise to the gradient when models are trained by stochastic gradient descent, facilitating the development of differentially private machine learning models. In the healthcare industry, a linear regression model with differential privacy was developed to prevent model inversion attacks while maintaining model accuracy by adding noise to the coefficients [20]. Krall et al. [21,22] developed differential privacy algorithms for logistic regression by applying different levels of perturbation to the gradient based on the sensitivity of features; their algorithms effectively prevented model inversion attacks on sensitive features. Hu et al. [23] developed a regression model with differential privacy and evaluated it on real manufacturing data by optimizing different perturbation mechanisms. These studies concentrated on linear or logistic regression models, but big data often requires more complex machine learning models such as neural networks.

Early studies of differential privacy in neural networks focused on model training with image datasets, but recent research has explored various perturbation techniques in both the input and weight layers. Abadi et al. [24] improved the computational efficiency of differentially private training for models with non-convex objectives through a privacy accounting method. Arachchige et al. [25] redesigned the learning process with a local differentially private algorithm, suggesting a randomization layer between the convolution and fully connected layers to perturb weights. Another perturbation technique adds noise to the input: with the US census dataset, Wang et al. [26] presented a neural network model that estimated the importance of features and adaptively added noise to the input data based on that importance. Kang et al. [27] explored input perturbation for empirical risk minimization, presenting experimental results for linear regression and multi-layer perceptron models on the KDD archive dataset. Nori et al. [28] developed a differential privacy mechanism for explainable boosting machines, aiming to achieve both high accuracy and interpretability while guaranteeing differential privacy; this method injected Gaussian noise in the residual summation step. Meanwhile, Li et al. [29] proposed personalized local differential privacy by injecting multiple Gaussian variables into the covariance matrix, which avoids the risk of model extraction attacks. Differential privacy can also provide a robust solution to membership inference attacks. Jarin and Eshete [30] provided a comprehensive study of differential privacy across all perturbation cases, including input, objective, gradient, output, and prediction, and established a framework for comprehensive privacy-utility trade-off analysis.

However, although these studies on differential privacy in neural networks have shown promise in enhancing prediction accuracy, very little has been done to assess the robustness of these methods against model inversion attacks. The fundamental goal of differential privacy is to strike a balance between preserving privacy and prediction power. Therefore, it becomes crucial to evaluate the robustness of these approaches against white-box inversion attacks. In this study, we conduct a comprehensive evaluation of the proposed MNP algorithms, considering both prediction accuracy and robustness against white-box inversion attacks proposed in Ref. [31].

## 3 Differential Privacy

In predictive modeling, a dataset *D* comprises *n* tuples, each consisting of a feature vector **x**_{i} and a response variable *y*_{i}. Each tuple **x**_{i} has *p* features, denoted as **x**_{i} = (*x*_{i1}, …, *x*_{ip}). The goal of predictive modeling is to approximate the predictive analytic function $f: X \to y$, where $X$ is the domain of input feature vectors, assumed to satisfy the condition ‖**x**_{i}‖_{2} ≤ 1. Predictive models can be categorized into regression and classification based on the type of the response variable *y*. For example, classification models are suitable for discrete response variables such as *y* = 0 or 1, while regression models are more appropriate for continuous response variables, where $y \in \mathbb{R}$. The choice of predictive model depends on the nature of the problem and the features of the dataset. In this study, we aim to develop differentially private algorithms for neural network models to solve regression problems.
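The norm condition ‖**x**_{i}‖_{2} ≤ 1 can be enforced by a simple preprocessing step. The following sketch (an illustrative assumption, not part of the paper's method) rescales a dataset so every row satisfies the bound:

```python
import numpy as np

def scale_to_unit_ball(X):
    """Rescale the dataset so every row satisfies ||x_i||_2 <= 1."""
    max_norm = np.linalg.norm(X, axis=1).max()
    return X / max_norm if max_norm > 1 else X

X = np.array([[3.0, 4.0], [0.5, 0.5]])
X_scaled = scale_to_unit_ball(X)   # largest row norm becomes exactly 1
```

Dividing by the single largest row norm preserves the relative geometry of the tuples while satisfying the assumption used throughout the sensitivity analysis.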

### 3.1 Conceptual Foundation.

Under differential privacy, the inclusion of individual inputs in a dataset does not lead to statistical differences in the algorithm’s output. Therefore, differentially private algorithms ensure that the adjacency dataset is indistinguishable from the original dataset [7].

#### (ε,δ)-differential privacy

*A randomized algorithm* $A$ *is* $(\epsilon, \delta)$*-differentially private if for all sets* $S \subseteq \mathrm{Range}(A)$ *and for all datasets D and D*′ *differing by at most one row:*

$$\Pr[A(D) \in S] \leq e^{\epsilon} \Pr[A(D') \in S] + \delta$$

$\epsilon $ and *δ* are the privacy budget and the privacy loss threshold, respectively. $\epsilon $ controls the level of privacy and *δ* provides an upper bound on the probability that the privacy guarantee fails. The visualized premise of the $(\epsilon ,\delta )$-differential privacy is illustrated in Fig. 2.

The Gaussian mechanism achieves $(\epsilon ,\delta )$-differential privacy by adding independently and identically distributed (i.i.d.) Gaussian noise to the output of a function *g*(*D*) that maps a database *D* to a *p*-dimensional vector. This Gaussian noise is zero-mean and has a variance that depends on the desired privacy level and the sensitivity of *g*(*D*).

#### The Gaussian mechanism

*Given any function* $g: D \to \mathbb{R}^p$*, the Gaussian mechanism is denoted by the algorithm* $A$*, defined as*

$$A(D) = g(D) + (Z_1, \ldots, Z_p)$$

where *Z*_{1}, …, *Z*_{p} are i.i.d. random variables drawn from a Gaussian distribution with mean zero and variance *σ*^{2}. The algorithm preserves $(\epsilon, \delta)$-differential privacy.
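A minimal sketch of the Gaussian mechanism, with σ calibrated from the ℓ2 sensitivity and the privacy parameters in the standard way (the helper name and parameter values are illustrative):

```python
import numpy as np

def gaussian_mechanism(g_value, l2_sensitivity, epsilon, delta, rng=None):
    """Release g(D) + (Z_1, ..., Z_p) with i.i.d. Z_k ~ N(0, sigma^2), where
    sigma is calibrated from the sensitivity and the privacy parameters."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return g_value + rng.normal(0.0, sigma, size=np.shape(g_value))

# Releasing a 2-dimensional statistic with small sensitivity and a tight budget.
rng = np.random.default_rng(0)
noisy = gaussian_mechanism(np.array([0.2, 0.5]), l2_sensitivity=0.01,
                           epsilon=0.5, delta=1e-5, rng=rng)
```

Note how σ grows with the sensitivity and with 1/ε: a tighter privacy budget or a more sensitive query demands proportionally more noise.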

#### ℓ2 sensitivity

*The* ℓ_{2} *sensitivity of a function* $g: D \to \mathbb{R}^p$ *is denoted as*

$$\Delta_2(g) = \max_{D,\, D' \in N(D)} \|g(D) - g(D')\|_2$$

where *N*(*D*) is the set of all neighboring datasets differing by one row.

*Under the assumption that* $\epsilon \in (0, 1)$ *is arbitrary, the Gaussian mechanism preserves* $(\epsilon, \delta)$*-differential privacy, where* $\sigma \geq c\Delta_2(g)/\epsilon$ *and* $c^2 > 2\ln(1.25/\delta)$.

#### White-box model inversion attack

**Input:** Predictive model: $\hat{f}^{Pred}(X_s; X', W)$, nonsensitive inputs: $X'$, queried response: $\hat{y}$, number of epochs: $T$, learning rate: $\eta$

**Output:** Estimated sensitive input values: $X_s^*$

1: Let $\tau = 0$

2: Define $J_{ATK}(X_s) = \frac{1}{n}\sum_{i=1}^{n} \ell_{ATK}(X_s)$

3: Initialize $X_s^{(0)} \leftarrow$ random values

4: **while** $\tau < T$ **do**

5: $X_s^{(\tau+1)} = X_s^{(\tau)} - \eta \nabla J_{ATK}(X_s^{(\tau)})$

6: Set $\tau = \tau + 1$

7: **end while**

8: Let $X_s^* = X_s^{(\tau)}$

### 3.2 White-Box Inversion Attack.

Differential privacy is a powerful tool for protecting sensitive information in statistical databases. However, it is not the only concern for privacy, as predictive models can also be exploited through model inversion attacks [8]. As illustrated in Fig. 3, an adversary manipulates their knowledge of the model’s structure and other auxiliary information to reconstruct one or more training samples, including sensitive features.

Despite the complexity inherent in neural network models, their susceptibility to model inversion attacks persists when adversaries gain access to information concerning the model’s structure, including weight parameters, hyper-parameters, and nonsensitive variables. In this study, we assume that an adversary participates as a local data owner contributing to the distributed learning process. Armed with the knowledge of the global model, the adversary can construct an inversion attack model. The white-box model inversion attack is designed to reconstruct sensitive input values based on published information. The outlined process for this approach is presented in Algorithm 1. The effectiveness of inversion attacks is typically evaluated through the loss function of the target predictive model.

The inputs of Algorithm 1 include the predictive model $\hat{f}^{Pred}$, nonsensitive inputs **X**′, the queried response $\hat{y}$, the number of epochs *T*, and a learning rate *η*. The predictive model is a converted form of a trained target neural network model $\hat{y} = \hat{f}(X)$, denoted as $\hat{y} = \hat{f}^{Pred}(X_s; X', W) + \epsilon$, where the input *X*_{s} is a sensitive variable and $\epsilon$ is random noise. The algorithm begins by setting the iteration counter *τ* to 0 and defining the cost function for the model inversion attack, *J*_{ATK}, where ℓ_{ATK}(*X*_{s}) is a loss function. The initial sensitive input values $X_s^{(0)}$ are randomly generated. In each epoch *τ*, the sensitive input values are updated by the gradient $\nabla J_{ATK}$. Once $X_s^{(\tau)}$ is updated, *τ* is incremented by one. This process continues until *τ* reaches *T*.
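The loop above can be sketched for a linear surrogate model, where the adversary knows the weights and nonsensitive inputs and performs gradient descent on the attack loss $(\hat{y} - \hat{f}^{Pred}(X_s))^2$; all names and values here are illustrative, not the paper's experimental setup:

```python
import numpy as np

def invert_sensitive_input(w_s, w_ns, x_ns, y_hat, lr=0.1, epochs=200):
    """Recover a sensitive scalar input by gradient descent on the attack loss
    (y_hat - f(x_s))^2, mirroring Algorithm 1 for a linear surrogate model."""
    x_s = np.random.default_rng(0).normal()      # step 3: random initialization
    for _ in range(epochs):                      # steps 4-7: descend on J_ATK
        residual = w_s * x_s + w_ns @ x_ns - y_hat
        x_s -= lr * 2.0 * residual * w_s         # gradient of the attack loss
    return x_s

# The adversary knows the weights and nonsensitive inputs, and queries y_hat.
w_s, w_ns, x_ns = 0.8, np.array([0.3, -0.2]), np.array([1.0, 2.0])
y_true = w_s * 0.5 + w_ns @ x_ns                 # true sensitive value is 0.5
x_rec = invert_sensitive_input(w_s, w_ns, x_ns, y_true)
```

With an unprotected model, the recovered value converges to the true sensitive input, which is exactly the leakage the MNP perturbation is designed to disrupt.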

## 4 Privacy-Preserving Analytics

To preserve sensitive features in manufacturing analytics, we develop privacy-preserving methods that mitigate the risk of model inversion attacks. As illustrated in Fig. 4, the proposed method effectively protects sensitive features from white-box model inversion attacks. The adversaries in this type of attack have access to nonsensitive feature information (**X**′) as well as the structural information of the targeted predictive model, including weights (**W**), activation functions (**H**), and loss function (*J*(**W**)). In this study, the proposed algorithm is integrated into neural network model learning.

Consider a neural network model in which **x** is an input vector, **W**^{0} is the weight matrix corresponding to the input layer, and **W**^{l} for *l* = 1, 2 represents the weights for the *l*th hidden layer. Furthermore, hidden activation functions **h**^{l} for each hidden layer *l* are employed to compute the output of the corresponding hidden layer. The goal of training is to find the optimal weights **W*** that minimize prediction errors on the given training data. This is achieved by minimizing the cost function denoted as

$$J(W) = \frac{1}{n}\sum_{i=1}^{n} \ell(W; \mathbf{x}_i, y_i)$$

where ℓ is the loss for a single tuple (**x**_{i}, *y*_{i}). **W*** is obtained by a gradient descent method that iteratively updates the weights. Specifically, at iteration *τ*, the weights are updated as

$$W^{(\tau+1)} = W^{(\tau)} - \eta \nabla_W J(W^{(\tau)})$$

where *W*^{(τ)} is the set of weights at iteration *τ*, *η* is the learning rate that controls the step size of the weight update, and $\nabla_W$ denotes the gradient of the cost function.

Differential privacy in predictive models is ensured by adding noise during model training. Specifically, this perturbation can be injected into the model’s optimal coefficients *W**, the objective function *J*, or the gradient of the objective function $\u2207J$. The proposed neuron perturbation algorithm perturbs the objective’s gradient by multiplying Bernoulli noise in gradient descent methods.

### 4.1 Neuron Perturbation.

Differential privacy is a technique that preserves sensitive data in predictive models by adding controlled amounts of noise during the training process. This approach aims to balance privacy preservation and prediction power. To achieve both, we propose a perturbation technique inspired by dropout regularization, a way to prevent over-fitting in machine learning models [32]. The proposed method, called neuron perturbation, perturbs the weight coefficients of the input neurons (**W**^{0}) by multiplying them with Bernoulli random variables (** ξ**) during each training iteration. This mechanism effectively adds noise to the model, making it more resistant to model inversion attacks.
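A sketch of one perturbation draw, assuming one Bernoulli variable per (input neuron, hidden unit) pair as described above (shapes and names are illustrative, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(42)
p, n_hidden, p_perturb = 5, 8, 0.3     # inputs, hidden units, perturbation prob.

# One Bernoulli(1 - p_perturb) draw per (input neuron j, hidden unit k) pair.
xi = rng.binomial(1, 1.0 - p_perturb, size=(p, n_hidden))

x = rng.normal(size=p)                 # a single input vector
W0 = rng.normal(size=(p, n_hidden))    # input-layer weights
# Hidden unit k sees input x_j only when xi[j, k] == 1.
pre_activation = (x[:, None] * xi * W0).sum(axis=0)
```

Because the mask is redrawn during training, each hidden unit sees a randomly thinned view of the inputs, which is the multiplicative noise the later derivation converts into an additive Gaussian term.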

#### Predictive modeling with neuron perturbation

**Input:** Dataset: $D$ with features $X$ and response $y$, learning rate: $\eta$, perturbation probability: $p_{perturb}$, number of epochs: $K$, batch size: $n_b$

**Output:** Approximate set of weights $W^*$

1: Initialize $W^{(0)} \leftarrow$ random values, $\tau = 0$, $\kappa = 1$

2: Split $D$ into a set of batches $B$, each of size $n_b$

3: **while** $\kappa < K$ **do**

4: Generate $\xi^{(\kappa)}$ with $\xi_{jk}^{(\kappa)} \sim \mathrm{Bernoulli}(1 - p_{perturb})$ for $j = 1, \ldots, p$ and $k = 1, \ldots, |h^1|$

5: **for** each $b = 1, \ldots, |B|$ **do**

6: $W^{(\tau+1)} = W^{(\tau)} - \eta \nabla_W J$, with the gradient computed on the batch whose inputs are perturbed by $\xi^{(\kappa)}$

7: Set $\tau = \tau + 1$

8: **end for**

9: Set $\kappa = \kappa + 1$

10: **end while**

11: Let $W^* = W^{(\tau)}$

Algorithm 2 outlines steps of the neuron perturbation approach. It takes input parameters including learning rate *η*, perturbation probability *p*_{perturb}, number of epochs *K*, and batch size *n*_{b}. To begin, the algorithm randomly generates initial weight values of *W*^{(0)}, sets the iteration counter *τ* to 0, and initializes the epoch counter *κ* to 1. The dataset *D* is split into batches of size *n*_{b}, and at each epoch *κ*, the algorithm iterates over all batches. During each epoch, the algorithm generates a Bernoulli random vector *ξ*^{(κ)}, where $\xi jk(\kappa )\u223cBernoulli(1\u2212pperturb)$ for *j* = 1, …, *p* and *k* = 1, …, |*h*^{1}|. The algorithm multiplies the input variable by this vector and updates weights based on the gradient $\u2207WJ$ computed with the perturbed variable. After each weight update, the value of *τ* is incremented by one. After the algorithm updates the weights according to every mini-batch, the *κ* value is incremented by one. This process continues iteratively until the number of epochs *κ* reaches *K*.
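The epoch/batch structure of Algorithm 2 can be sketched with a linear model standing in for back-propagation through a network (a simplifying assumption; names and hyperparameters are illustrative):

```python
import numpy as np

def train_neuron_perturbation(X, y, eta=0.05, p_perturb=0.2, K=20, n_b=4, seed=0):
    """Skeleton of Algorithm 2 for a linear model: mini-batch gradient descent
    in which the inputs are masked by a fresh Bernoulli(1 - p_perturb) vector
    each epoch. The squared-error gradient stands in for back-propagation."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=X.shape[1])                               # step 1
    batches = [slice(i, i + n_b) for i in range(0, len(X), n_b)]  # step 2
    for _ in range(K):                                            # epochs (kappa)
        xi = rng.binomial(1, 1.0 - p_perturb, size=X.shape[1])    # fresh mask
        for b in batches:
            Xb = X[b] * xi                                        # perturbed inputs
            grad = 2.0 * Xb.T @ (Xb @ W - y[b]) / len(y[b])
            W -= eta * grad                                       # update (tau)
    return W

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([0.5, -0.3, 0.2])
W_hat = train_neuron_perturbation(X, y)
```

The key structural point is that the mask is redrawn per epoch (step 4) but the counter *τ* advances per batch, exactly as in the walkthrough above.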

Given the perturbation probability *p*_{perturb}, the predictive model integrated with neuron perturbation can be expressed through transformed input weights. The transformed input weights are defined as **Θ** ≜ (1 − *p*_{perturb})**W**^{0}; after training, the learned weights (**Θ**) are scaled by a factor of 1/(1 − *p*_{perturb}) to recover the input weights (**W**^{0}) used for prediction. The weight update of the transformed input weights in each iteration can then be denoted as

$$\Theta^{(\tau+1)} = \xi' \odot \Theta^{(\tau)} - \eta \nabla_{\Theta} J(\Theta^{(\tau)}) \tag{2}$$

The elements of **ξ′** are denoted by *ξ*′_{jk} and follow the distribution *ξ*′_{jk} ∼ 1/(1 − *p*_{perturb}) · Bernoulli(1 − *p*_{perturb}) for all *j* = 1, …, *p* and *k* = 1, …, |*h*^{1}|.

Each *ξ*′_{jk} has mean one and variance *p*_{perturb}/(1 − *p*_{perturb}) and can be approximated by a Gaussian distribution. Thus, *ξ*′_{jk} is transformed into a Gaussian random variable $z'_{jk} \sim N(1, p_{perturb}/(1 - p_{perturb}))$. Then, Eq. (2) can be changed to

$$\Theta^{(\tau+1)} = \mathbf{z}' \odot \Theta^{(\tau)} - \eta \nabla_{\Theta} J(\Theta^{(\tau)}) \tag{3}$$

where the elements of **z′** are *z*′_{jk}. Because multiplying $\theta_{jk}^{(\tau)}$ by the unit-mean variable *z*′_{jk} is equivalent to adding zero-mean Gaussian noise scaled by $\theta_{jk}^{(\tau)}$, Eq. (3) can be rewritten as

$$\Theta^{(\tau+1)} = \Theta^{(\tau)} + \mathbf{z} - \eta \nabla_{\Theta} J(\Theta^{(\tau)}) \tag{4}$$

where the elements of **z** follow $z_{jk} \sim N(0, (p_{perturb}/(1 - p_{perturb})) \cdot (\theta_{jk}^{(\tau)})^2)$ for all *j* = 1, …, *p* and *k* = 1, …, |*h*^{1}|, and $\theta_{jk}^{(\tau)}$ is an element of **Θ**^{(τ)}. As a result, the Bernoulli multiplicative noise term **ξ′** is replaced by the Gaussian additive noise term **z**. Equation (4) is the form of the Gaussian mechanism for the weight update algorithm when training neural network models with neuron perturbation.
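A quick Monte Carlo check, under the distributions stated above, that the multiplicative Bernoulli noise and the additive Gaussian noise agree in mean and variance (values chosen for illustration only):

```python
import numpy as np

# Empirical check that the multiplicative noise xi' (mean 1, variance
# p/(1-p)) and the additive Gaussian noise z of Eq. (4) give matching
# first and second moments when applied to a weight theta.
rng = np.random.default_rng(7)
p, theta, n = 0.3, 0.7, 1_000_000

xi_prime = rng.binomial(1, 1.0 - p, n) / (1.0 - p)   # xi' ~ Bernoulli(1-p)/(1-p)
mult = xi_prime * theta                              # multiplicative form, Eq. (2)
add = theta + rng.normal(0.0, np.sqrt(p / (1.0 - p)) * theta, n)  # additive, Eq. (4)

mean_gap = abs(mult.mean() - add.mean())   # both should be ~theta = 0.7
var_gap = abs(mult.var() - add.var())      # both should be ~(p/(1-p)) * theta^2
```

The two gaps shrink toward zero as the sample grows, which is the moment-matching argument behind replacing the Bernoulli term with the Gaussian term.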

To establish that the process of updating scaled input weights (**Θ**) in Algorithm 2 satisfies Theorem 1, a proof is required. The proof shown in Theorem 2 is supported by Lemma 1 and Corollary 1. For all proofs, we make the assumption that $g(D)=\u2212\eta \u2207\Theta J(\Theta ;D)$, where *D* and *D*′ are two neighboring datasets that differ by one row.

*The global sensitivity of the gradient descent weight update is at most*$(2\eta /n)maxd\Vert \u2207\Theta \u2113(\Theta (\tau );d)\Vert 2$.

*Proof.* Let *d* and *d*′ be arbitrary tuples in datasets *D* and *D*′, respectively. Since the neighboring datasets differ in only these tuples, the difference between $g(D) = -\eta \nabla_{\Theta} J(\Theta^{(\tau)}; D)$ and $g(D')$ reduces to the difference between the per-tuple loss gradients of *d* and *d*′, each weighted by 1/*n*. By the triangle inequality, $\|g(D) - g(D')\|_2 \leq (2\eta/n)\max_d \|\nabla_{\Theta}\ell(\Theta^{(\tau)}; d)\|_2$.

*If*$\Vert \u2207\Theta \u2113\Vert 2\u22641$*, the global sensitivity of gradient descent weight updates is at most 2η/n.*

*Neuron perturbation in Algorithm 2 preserves (*$\epsilon ,\delta $*)-differential privacy under the assumption that c ^{2} > 2ln(1.25/δ).*

*Proof.* The weight update at iteration *τ* with neuron perturbation is defined in Eq. (4). Specifically, the algorithm for a dataset *D* can be expressed as

$$A(D) = g(D) + \mathbf{z}^{(\tau)}$$

where the elements of **z**^{(τ)} follow a normal distribution, with $z_{jk}^{(\tau)} \sim N(0, (p_{perturb}/(1 - p_{perturb})) \cdot (\theta_{jk}^{(\tau)})^2)$ for all *j* = 1, …, *p* and *k* = 1, …, |*h*^{1}|. Then the variance of *z*, which represents the elements of **z**^{(τ)}, is denoted as

$$\sigma^2 = \frac{p_{perturb}}{1 - p_{perturb}} \cdot \theta^2 \geq \frac{2\ln(1.25/\delta)\,\Delta_2^2}{\epsilon^2} \tag{5}$$

where *θ* is an arbitrary value representing $\theta_{jk}^{(\tau)}$, and the inequality follows from Theorem 3.22 and Theorem A.1 in Ref. [7]. Because ‖*θ*‖_{2} ≤ 1, Eq. (5) is transformed to

$$\frac{p_{perturb}}{1 - p_{perturb}} \geq \frac{2\ln(1.25/\delta)\,\Delta_2^2}{\epsilon^2}$$

Consider *g* in the presence of **v**, where **v** is the vector that results from the difference between datasets *D* and *D*′, and ‖**v**‖_{2} ≤ Δ_{2}. According to the theorems in Ref. [7], the relationship between *g*(*D*) and *g*(*D*′) is denoted as

$$\Pr[A(D) \in S] \leq e^{\epsilon}\Pr[A(D') \in S] + \delta$$

where *S* is the range of *g*, and *S*_{1} and *S*_{2} are the partitioned subsets of *S* on which the privacy loss is bounded by $\epsilon$ and on which it may exceed $\epsilon$ with total probability at most *δ*, respectively.

*The following equation holds:*

### 4.2 Mosaic Neuron Perturbation.

#### Predictive modeling with mosaic neuron perturbation

**Input:** Dataset: $D$ with features $X$ and response $y$, set of feature labels: $\Phi$, learning rate: $\eta$, perturbation probability: $p_{perturb}$, epoch number: $K$, batch size: $n_b$, sensitive ratio: $\gamma$, fraction of sensitive features: $\psi_S$

**Output:** Approximate set of weights $W^*$

1: Initialize $W^{(0)} \leftarrow$ random values, $\tau = 0$, $\kappa = 1$, $\psi_N = 1 - \psi_S$

2: Partition $\Phi$ into $\Phi_S$ and $\Phi_N$, and determine $p_S$ and $p_N$ from $p_{perturb}$, $\gamma$, $\psi_S$, and $\psi_N$

3: Split $D$ into a set of batches $B$, each of size $n_b$

4: **while** $\kappa < K$ **do**

5: Generate $\xi^{(\kappa)}$ with $\xi_{jk}^{(\kappa)} \sim \mathrm{Bernoulli}(1 - p_S)$ for $j \in \Phi_S$ and $\xi_{jk}^{(\kappa)} \sim \mathrm{Bernoulli}(1 - p_N)$ for $j \in \Phi_N$

6: **for** each $b = 1, \ldots, |B|$ **do**

7: $W^{(\tau+1)} = W^{(\tau)} - \eta \nabla_W J$, with the gradient computed on the batch whose inputs are perturbed by $\xi^{(\kappa)}$

8: Set $\tau = \tau + 1$

9: **end for**

10: Set $\kappa = \kappa + 1$

11: **end while**

12: Let $W^* = W^{(\tau)}$

The neuron perturbation mechanism effectively balances the trade-off between prediction power and privacy preservation against model inversion attacks. However, this mechanism treats all features equally and does not differentiate between sensitive and nonsensitive attributes. This can be problematic because sensitive features may carry more privacy risk than nonsensitive features. Furthermore, features highly correlated with the response may be more vulnerable to privacy leakage.

Therefore, a more sophisticated approach is required to enhance the privacy of sensitive features. We propose an MNP algorithm that addresses this issue by weakening the correlation between sensitive attributes and responses, as outlined in Algorithm 3. The algorithm introduces different levels of perturbation with corresponding perturbation probabilities (*p*_{S} for sensitive and *p*_{N} for nonsensitive features) to inject more noise into the sensitive features. Specifically, the weight gradient update corresponding to sensitive features is more heavily perturbed to diminish the correlation. The relationship between *p*_{S} and *p*_{N} is determined by the sensitive ratio parameter, which is defined as *γ* = ((1 − *p*_{S})/*p*_{S})/((1 − *p*_{N})/*p*_{N}), with 0 ≤ *γ* ≤ 1. A smaller *γ* indicates that more perturbation is injected into sensitive attributes.
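To make this relationship concrete, the sketch below solves for $p_S$ and $p_N$ given $p_{perturb}$ and $\gamma$ using the odds-ratio definition above, together with the assumption (ours, based on the description of $\psi_S$ and $\psi_N$ as contributions to $p_{perturb}$) that $\psi_S p_S + \psi_N p_N = p_{perturb}$; the function names and bisection scheme are illustrative:

```python
def p_sensitive(p_n, gamma):
    # gamma = ((1 - p_S)/p_S) / ((1 - p_N)/p_N)  =>  solve for p_S given p_N
    return p_n / (p_n + gamma * (1.0 - p_n))

def solve_probs(p_perturb, gamma, psi_s):
    """Find (p_S, p_N), with 0 < gamma <= 1, such that the odds-ratio definition
    holds and psi_S*p_S + psi_N*p_N = p_perturb (assumed decomposition)."""
    psi_n = 1.0 - psi_s
    lo, hi = 0.0, p_perturb              # p_N lies below p_perturb, p_S above
    for _ in range(200):                 # bisection on p_N (objective is monotone)
        mid = 0.5 * (lo + hi)
        if psi_s * p_sensitive(mid, gamma) + psi_n * mid < p_perturb:
            lo = mid
        else:
            hi = mid
    p_n = 0.5 * (lo + hi)
    return p_sensitive(p_n, gamma), p_n

# With a small sensitive fraction, gamma = 0.5 gives p_S near the 0.029 of Table 2
p_s, p_n = solve_probs(0.015, 0.5, 0.01)
```

With $\gamma = 1$ this returns $p_S = p_N = p_{perturb}$; as $\gamma$ shrinks, $p_S$ grows well above $p_{perturb}$ while $p_N$ stays slightly below it, the behavior reported in Table 2.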

In the MNP algorithm, features are labeled by a set Φ and partitioned into sets Φ_{S} and Φ_{N} such that $\Phi = \Phi_S \cup \Phi_N$ and $\Phi_S \cap \Phi_N = \emptyset$. This partitioning allows the algorithm to determine the contributions of sensitive and nonsensitive features to the perturbation probability (*p*_{perturb}). The separated perturbation probabilities *p*_{S} and *p*_{N} are applied to their corresponding partitioned attributes, with the sensitive features being perturbed more intensively to weaken their correlation with the response. The contributions of sensitive and nonsensitive features to *p*_{perturb} are expressed as *ψ*_{S} and *ψ*_{N}, respectively.

Algorithm 3 takes input parameters including learning rate *η*, perturbation probability *p*_{perturb}, number of epochs *K*, and batch size *n*_{b}. It then generates initial weight values *W*^{(0)}, initializes the iteration counter *τ* to 0, and sets the epoch counter *κ* to 1. To prepare the dataset *D* for training, the algorithm splits it into batches of size *n*_{b}. During each epoch *κ*, the algorithm iterates over all batches, generating a Bernoulli random vector *ξ*^{(κ)}. The vector is constructed such that $\xi_{jk}^{(\kappa)} \sim \mathrm{Bernoulli}(1-p_S)$ for *j* ∈ Φ_{S} and $\xi_{jk}^{(\kappa)} \sim \mathrm{Bernoulli}(1-p_N)$ for *j* ∈ Φ_{N} and *k* = 1, …, |*h*^{1}|. For a given batch *b*, the Bernoulli vector is multiplied with the corresponding features, and the weights are updated based on the gradient $\nabla_W J$ computed with the perturbed features. Each time the weights are updated, *τ* is incremented by one. When all mini-batches *b* have been used to update the weights, *κ* is incremented by one. This iterative process ends when *κ* reaches *K*.
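The loop described above can be sketched as follows. This is a minimal single-hidden-layer illustration, not the authors' implementation: the Bernoulli mask is applied entrywise to the input weights $\xi_{jk}$, with rows for sensitive features masked with probability $p_S$ and the remaining rows with $p_N$; network size, loss scaling, and variable names are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mnp_train(X, y, sensitive_idx, p_s, p_n, h1=4, eta=0.005, epochs=100, batch=50):
    """Sketch of Algorithm 3: train a one-hidden-layer regressor while masking
    input weights with Bernoulli noise, more aggressively for sensitive features."""
    p = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (p, h1))     # input weights (p x |h1|)
    b1 = np.zeros(h1)
    w2 = rng.normal(0.0, 0.1, h1)          # output weights
    keep = np.full((p, h1), 1.0 - p_n)
    keep[sensitive_idx, :] = 1.0 - p_s     # xi_jk ~ Bernoulli(1 - p_S) on sensitive rows
    for _ in range(epochs):                # each epoch draws a fresh mask xi^(kappa)
        xi = rng.random((p, h1)) < keep
        for start in range(0, len(X), batch):
            Xb, yb = X[start:start + batch], y[start:start + batch]
            W1p = W1 * xi                  # perturbed input weights
            H = sigmoid(Xb @ W1p + b1)
            err = H @ w2 - yb              # squared-error gradient terms
            g_h = np.outer(err, w2) * H * (1.0 - H)
            W1 -= eta * (Xb.T @ g_h) * xi  # gradient flows only through kept weights
            b1 -= eta * g_h.sum(axis=0)
            w2 -= eta * (H.T @ err)
    return W1, b1, w2
```

A call such as `mnp_train(X, y, sensitive_idx=[0], p_s=0.13, p_n=0.015)` mirrors the single-sensitive-feature setting of the case study.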

*MNP in Algorithm 3 preserves* $(\epsilon, \delta)$*-differential privacy.*

*Proof.* Consider $g$ in the presence of an arbitrary $\mathbf{v}$ arising from the differing datasets $D$ and $D'$, where $\|\mathbf{v}\|_{2} \leq \Delta_{2}$; $S$ is a range of $g$, and $S_{1}$ and $S_{2}$ are defined in Eq. (8). Then, $\Pr\{g(D) + v \in S_{1}\}$ is bounded termwise for each $j$th variable.

*Parameters for fractions of sensitive and nonsensitive features, $\psi_S$ and $\psi_N$, are at most* $\max_{d}(\|\nabla_{\Theta}\ell_{S}\|_{2}^{2}/\|\nabla_{\Theta}\ell\|_{2}^{2})$ *and* $\max_{d}(\|\nabla_{\Theta}\ell_{N}\|_{2}^{2}/\|\nabla_{\Theta}\ell\|_{2}^{2})$*, respectively.*

*Proof.* $\psi_S$ and $\psi_N$ are defined as fractional contributions to the $L^{2}$ norm of the loss gradients for sensitive and nonsensitive features, respectively, and $\psi_N$ is denoted accordingly.

*If* $\|\nabla_{\Theta}\ell\|_{2}^{2} \leq \theta^{2}$*, then $\psi_S$ and $\psi_N$ can be configured arbitrarily, subject to the constraint $\psi_S + \psi_N = 1$.*

*Proof.* The constraint $\psi_S + \psi_N = 1$ holds together with the condition $\|\nabla_{\Theta}\ell\|_{2}^{2} \leq \theta^{2}$. Because $d$ is arbitrarily chosen, the selection of a maximal loss gradient varies, and the choice of $d$ affects $\|\nabla_{\Theta}\ell_{S}\|_{2}^{2}$ and $\|\nabla_{\Theta}\ell_{N}\|_{2}^{2}$. The magnitudes of the contributions of $\|\nabla_{\Theta}\ell_{S}\|_{2}^{2}$ and $\|\nabla_{\Theta}\ell_{N}\|_{2}^{2}$ to $\|\nabla_{\Theta}\ell\|_{2}^{2}$ therefore differ under the constraints $\|\nabla_{\Theta}\ell_{S}\|_{2}^{2} \leq \theta^{2}$ and $\|\nabla_{\Theta}\ell_{N}\|_{2}^{2} \leq \theta^{2}$.
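The fractional-contribution definition underlying these statements can be illustrated numerically; the split of gradient entries into sensitive and nonsensitive groups below is a toy example (our construction, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(1)
grad = rng.normal(size=10)              # loss gradient over all input weights
sensitive = np.zeros(10, dtype=bool)
sensitive[:3] = True                    # first three entries belong to sensitive features

total = np.sum(grad ** 2)               # squared L2 norm of the full gradient
psi_s = np.sum(grad[sensitive] ** 2) / total
psi_n = np.sum(grad[~sensitive] ** 2) / total
# The fractional contributions always satisfy psi_S + psi_N = 1
assert abs(psi_s + psi_n - 1.0) < 1e-12
```

Because the two groups partition the gradient entries, the contributions sum to one regardless of which tuple $d$ maximizes the gradient norm.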

### 4.3 Multi-party Mosaic Neuron Perturbation.

Cloud computing has become an integral part of smart factory operations, facilitating the interconnection of intelligent manufacturing things and enabling distributed and collaborative learning for informed decision-making. In particular, distributed predictive models significantly enhance prediction accuracy and computational efficiency when processing large amounts of data. However, such models are vulnerable to model inversion attacks, which occur when an adversary infers the dataset of a single participant, thereby exposing the information of all participants contributing to the distributed model.

To address this challenge, we propose the development of a distributed learning version of MNP, called the multi-party MNP algorithm. The framework of distributed learning is depicted in Fig. 1. In this framework, each local data owner configures a local predictive model based on their own dataset. The local model parameters are uploaded to the cloud, where the global model is structured. Then, the global model aggregates and updates its parameters accordingly. Once the aggregation is complete, the global model distributes the updated parameters, which can then be leveraged by local users to achieve high prediction performance.

#### Predictive modeling with multi-party MNP

**Input:** The number of local models: $n_{party}$,

local dataset: $D_m$ for $m = 1, \ldots, n_{party}$, set of feature labels: $\Phi$,

learning rate: $\eta$, perturbation probability: $p_{perturb}$, epoch number: $K$,

batch size: $n_b$, sensitive ratio: $\gamma$, fraction of sensitive features: $\psi_S$

**Output:** Approximate set of weights of the global model $W_G^*$

1: Assign each local model $f_m$ to an independent processor

2: **for all** $m = 1, \ldots, n_{party}$ **do**

3: Initialize $W_m^{(0)}$, $\tau = 0$, $\kappa = 1$, $\psi_N = 1 - \psi_S$

4: Compute perturbation probabilities $p_S$ and $p_N$ from $p_{perturb}$, $\gamma$, $\psi_S$, and $\psi_N$

5: Split $D_m$ into a set of batches $B$, each of size $n_b$

6: **while** $\kappa < K$ **do**

7: Generate $\xi^{(\kappa)}$ with $\xi_{jk}^{(\kappa)} \sim \mathrm{Bernoulli}(1-p_S)$ for $j \in \Phi_S$, $\xi_{jk}^{(\kappa)} \sim \mathrm{Bernoulli}(1-p_N)$ for $j \in \Phi_N$, and $k = 1, \ldots, |h^1|$

8: **for** each $b = 1, \ldots, |B|$ **do**

9: Update $W_m^{(\tau)}$ with the gradient $\nabla_W J$ computed on batch $b$ with features perturbed by $\xi^{(\kappa)}$

10: Set $\tau = \tau + 1$

11: **end for**

12: Set $\kappa = \kappa + 1$

13: **end while**

14: Let $W_m^* = W_m^{(\tau)}$

15: **end for**

16: $W_G^* = \frac{1}{n_{party}} \sum_m W_m^*$

Protecting both the global and local models is essential in collaborative learning systems. To achieve this, the multi-party MNP algorithm introduces perturbation into local model learning. This approach helps effectively prevent access to sensitive information by adversaries outside the collaborative learning system as well as inside the system. Moreover, the distributed learning model offers higher computational efficiency compared to the serial learning model, making it a viable option for processing large datasets. The proposed method is outlined in detail in Algorithm 4.

The implementation of Algorithm 4 involves several input parameters, including learning rate *η*, perturbation probability *p*_{perturb}, number of epochs *K*, batch size *n*_{b}, and number of participants *n*_{party}. The global model learns its global weights *W*_{G} by aggregating local weights *W*_{m} of a local prediction model *f*_{m}, where *m* = 1, …, *n*_{party}. The procedure outlined in Algorithm 3 is independently applied to train each local model. Overall, the multi-party MNP algorithm ensures differential privacy, effectively preserving sensitive information.
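The aggregation step, the only step that couples the parties, reduces to an element-wise average of the locally trained weights; a minimal sketch (function and variable names are ours):

```python
import numpy as np

def aggregate(local_weights):
    """Global weights: element-wise average of the local weights,
    i.e., W_G* = (1 / n_party) * sum over m of W_m*."""
    return np.mean(np.stack(local_weights), axis=0)

# Three parties with identically shaped (toy) weight matrices
parties = [np.full((2, 2), 1.0), np.full((2, 2), 2.0), np.full((2, 2), 3.0)]
W_G = aggregate(parties)   # every entry equals 2.0
```

In the full algorithm, each `W_m` would first be trained independently with the MNP procedure before averaging, so the perturbation applied locally also protects the aggregated global model.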

## 5 Experimental Design

### 5.1 Design of Experiments.

The proposed MNP and multi-party MNP algorithms are evaluated in this study through experiments based on real-world CNC turning data. These data were collected through sensors in a machine shop to develop monitoring solutions for energy usage [33]. The dataset contains information regarding 1973 workpieces made from 22 different materials. During operation, workpiece profiles were recorded, and workstation descriptions and machining parameters were collected through sensors. Detailed information about features is explained in Table 1. The response for this dataset is the power consumption of seven lathe machines.

Group | Feature | Explanation |
---|---|---|
Workpiece profile | Level of refined processing | Rough, half-refined, thoroughly refined |
 | Workpiece diameter | Real number (mm) |
 | Workpiece material | 45Fe, 40Cr, 60Si2Mn, HT300, 1Cr13, HT200, 45Fe(T235), 35Fe, 0Cr18Ni9Ti, Q235, 40Cr(HRC 48), 40Cr(T235), 45(T235), 1Gr13, 20CrMnTi, 2Cr13, 20CrMnTi(HB170), 6061Al, 38CrMoALA, T10A, G10, 20Cr |
 | Material hardness | Real number (HB) |
 | Tensile strength | Real number (MPa) |
Machining parameter | Machine models | C2-360HK, C2-50HK/1, C2-6150, C2-6150HK/1, CHK360, CHK460, CHK560 |
 | Angle of blades | Real number (deg) |
 | Blade inclination | Real number (deg) |
 | Cutting speed | Real number (m/min) |
 | Spindle speed | Real number (r/min) |
 | Feed rate | Real number (mm/r) |
 | Cutting depth | Real number (mm) |
Workstation description | Air-cutting power (W) | Total power of the machine tool when cutting (removing materials) according to certain parameters |
 | Idle power (W) | Total power of the machine tool when the spindle rotates at a certain speed but the workpiece material is not removed |


In this experimental study, air-cutting power, which refers to the total power of the machine tool during material removal based on input parameters, was considered the sensitive feature. The reason for setting this attribute as the sensitive feature is that it contains characteristics of the overall process. The duration of cutting and air-cutting power data can reveal other machining parameters that manufacturers do not want to share. Idle power serves as a feature for the comparative analysis, representing the total power of the tool when its spindle rotates without material removal.

The risk of a model inversion attack is evaluated for each feature (*X*_{j}).

The experimental study comprises five phases. First, the baseline model provides a benchmark for prediction accuracy and attack risk. Second, the performance metrics of the neuron perturbation algorithm are evaluated. Third, the proposed MNP algorithm is validated by comparing its performance to that of the baseline model. Also, the results are analyzed based on varied levels of perturbation probability (*p*_{perturb}) and sensitive ratio (*γ*). Furthermore, to assess the effectiveness of MNP, the risk of attack on sensitive and nonsensitive attributes is analyzed. Fourth, to examine the computational efficiency of the multi-party MNP, the model learning speed is investigated. The performance of multi-party distributed models is compared to that of serial models for different-sized training datasets. Last, results of MNP with the case of two sensitive features are presented and analyzed.

### 5.2 Model Configuration.

The experiment involved partitioning the dataset into a training set comprising 1500 inputs and a test set containing 463 inputs. The architecture of the neural network model encompassed two hidden layers with (4, 3) hidden nodes, while the activation function employed for each hidden layer was the sigmoid function. The selected loss function was the sum of squared errors. For each scenario of training prediction models, a total of *K* = 5000 epochs were executed, with a batch size *n*_{b} set at 500, resulting in three training batches. A learning rate of *η* = 0.005 was applied. To ensure the outcome's robustness, the model training process was repeated 30 times. Additionally, the training dataset was employed to train the white-box attack model, spanning a total of *T* = 50,000 epochs. The learning rate for the attack model was established at 0.005. Similar to the targeted prediction model, the attack model training was replicated 30 times.

In exploring the computational efficiency of the multi-party MNP algorithm, the number of processors varied between 1 and 16, while the training data were augmented to 12,000 instances by an autoencoder. The autoencoder consisted of a hidden layer with 25 nodes, and its performance was *R*^{2} = 0.991. All experimentation was conducted in Python 3.7 on a system furnished with 128 GB of RAM and an Intel 10-core Xeon E5-2680 processor operating at 2.2 GHz.

## 6 Experimental Results

Performance metrics of the benchmark model show that it is vulnerable to model inversion attacks, with a high attack risk of $R_{ATK}^{2} = 0.951$, although it accurately predicts responses with $R_{Pred}^{2} = 0.926$. This result shows that adversaries can infer sensitive information almost perfectly. To address this problem, adding neuron perturbation to predictive model learning mitigates the risk of model inversion attacks. As illustrated in Fig. 5, the attack risk decreases as *p*_{perturb} increases. When *p*_{perturb} = 0.05, the attack risk is 0.209, and prediction accuracy is 0.888. For *p*_{perturb} ≥ 0.1, the attack risk converges to zero, but prediction accuracy also decreases as the perturbation probability grows. Therefore, carefully designed noise is required to mitigate the attack risk on sensitive attributes while maintaining prediction power.

### 6.1 Mosaic Neuron Perturbation Results.

Changes in *γ* control the perturbation probabilities for the sensitive and nonsensitive features in the MNP algorithm. For example, while *p*_{perturb} remains constant at 0.015, the consequential impact of *γ* is outlined in Table 2. It is notable that *p*_{N} maintains its value while *p*_{S} increases drastically as *γ* decreases. As the perturbation probability increases, more noise is injected into the learning process to preserve the latent information of sensitive features. This relationship therefore shows that the MNP algorithm adds relatively more noise to sensitive attributes while perturbing nonsensitive features less.

γ | p_{S} | p_{N} | γ | p_{S} | p_{N} |
---|---|---|---|---|---|
1.00 | 0.015 | 0.015 | 0.50 | 0.029 | 0.015 |
0.90 | 0.017 | 0.015 | 0.40 | 0.036 | 0.015 |
0.80 | 0.019 | 0.015 | 0.30 | 0.048 | 0.015 |
0.70 | 0.021 | 0.015 | 0.20 | 0.069 | 0.015 |
0.60 | 0.025 | 0.015 | 0.10 | 0.130 | 0.015 |


The effect of varying perturbation probability and sensitive ratio on prediction accuracy and attack risk is illustrated in Fig. 6. The results show that increasing *p*_{perturb} or decreasing *γ* reduces the attack risk. In contrast, prediction accuracy is not significantly influenced by these parameters. As shown in Fig. 6(a), prediction accuracy remained around 0.9, regardless of *p*_{perturb} and *γ*. On the other hand, predictive models face more risk of model inversion attacks when *γ* is high and *p*_{perturb} is low, as illustrated in Fig. 6(b). Therefore, it is concluded that a low perturbation probability (*p*_{perturb} ≤ 0.02) combined with a lower sensitive ratio (*γ* ≤ 0.1) can balance high prediction accuracy with robustness against model inversion attacks.

The impact of each parameter on prediction accuracy and attack risk is illustrated in Fig. 7. When *p*_{perturb} is set to 0.015, the prediction model applying MNP guarantees the same level of prediction accuracy as the baseline model regardless of the sensitive ratio, as shown in Fig. 7(a). On the other hand, the attack risk on the sensitive feature (air-cutting power) decreases as *γ* decreases. This is because *p*_{S} increases when *γ* is low, as outlined in Table 2. Also, with a fixed value of *γ* = 0.35, the attack risk decreases more rapidly as *p*_{perturb} increases. In Fig. 5, with the neuron perturbation algorithm, the attack risk reaches zero when *p*_{perturb} > 0.1. In contrast, the MNP algorithm provides higher privacy, as depicted in Fig. 7(b): the attack risk converges to zero when *p*_{perturb} > 0.04, because *p*_{S} is 0.105 when *γ* = 0.35. Overall, it is noteworthy that the MNP algorithm effectively balances prediction accuracy with privacy.

Comparative experimental results are presented to verify the attack mitigation performance of the MNP algorithm on sensitive attributes. In this section, idle power serves as the nonsensitive feature. The attack risk on idle power is shown in Fig. 8(a). This feature has a correlation of 0.461 with the response, and its overall attack risk is higher than that of the sensitive feature. Furthermore, the attack risk on this feature decreases as the perturbation probability increases but is not meaningfully impacted by the sensitive ratio. This trend is depicted in Fig. 8(b) with a fixed value of *p*_{perturb} = 0.015. As *γ* decreases, the MNP algorithm effectively mitigates the attack risk on the sensitive feature (air-cutting power), while the attack risk on idle power does not significantly depend on *γ* and remains above 0.8 regardless of changes in *γ*.

### 6.2 Multi-Party Mosaic Neuron Perturbation Results.

Distributed learning ensures high effectiveness and computational efficiency. The effect of varying the perturbation probability and sensitive ratio on prediction accuracy and attack risk is illustrated in Fig. 9. *p*_{perturb} and *γ* are parameters of the local predictive models, and performance metrics are measured for the global predictive model distributed to the data owners. The use of multi-party models incorporating local models into the MNP increases robustness against model inversion attacks while maintaining prediction accuracy.

As shown in Fig. 9, a lower sensitive ratio reduces the attack risk by providing relatively higher attack prevention for sensitive features. In contrast, the sensitive ratio has little impact on prediction accuracy of the multi-party MNP as displayed in Fig. 9(a). When *p*_{perturb} is fixed, attack risks decrease as the sensitive ratio reduces. In this case, the multi-party MNP algorithm can effectively prevent model inversion attacks when *γ* ≤ 0.1 while maintaining high prediction power above 0.9.

As illustrated in Fig. 9(b), distributed prediction models face higher threats of model inversion attacks. Compared to the single MNP result in Fig. 6, the attack risk on the sensitive feature is higher in the distributed learning case. However, it is remarkable that multi-party MNP algorithms can provide high levels of privacy with low *γ*, for example, less than 0.015 in this case.

In terms of the effectiveness of the multi-party MNP algorithm, the more cores participating in the distributed learning, the faster the learning on large datasets, as shown in Fig. 10. Regardless of the data size, the learning time of serial learning models is longer than that of multi-party models. In particular, with 16 participants, the learning efficiency is more than 10 times higher than that of serial learning. For example, with 12,000 inputs, training a serial model requires 68 s, whereas a model with 16 cores needs 6.067 s. These results strongly support that the proposed multi-party MNP algorithm is computationally efficient and effective at mitigating attack risk while maintaining prediction power.

### 6.3 Discussion.

The proposed MNP algorithm injects noise during model training, which increases the computational expense of learning a prediction model. The neuron perturbation algorithm multiplies inputs by Bernoulli noise, perturbing them at each weight-update iteration; this additional process increases the computational cost. The MNP algorithm adds noise in more steps, including separately generating perturbation probabilities according to the sensitivity of features and injecting Bernoulli noise based on those probabilities. Training times were collected to investigate the computational complexity of the proposed algorithm. As illustrated in Fig. 11, the training time for each case varies significantly. The median training time for the benchmark case is 10.23 s, whereas the median training times for the neuron perturbation and MNP cases are 11.12 and 11.39 s, respectively. Each case's 99% confidence interval does not overlap with any other case. This result shows that the proposed algorithm requires more computational resources to train the predictive model.

Another consideration of the proposed model is the balance of perturbation probabilities between sensitive and nonsensitive features. Reducing *γ* increases *p*_{S} and decreases *p*_{N}. For example, extreme reductions in the value of *γ* increase the risk of model inversion attacks by decreasing perturbation on nonsensitive features.

The case study in this section was conducted under the assumption of one sensitive feature. However, the proposed model can perform well when there are multiple sensitive features. The number of sensitive attributes impacts the values of *p*_{S} and *p*_{N}, leading to varying degrees of perturbation. For example, in a case study with three sensitive features, *p*_{perturb} = 0.015, and *γ* = 0.1, *p*_{S} and *p*_{N} are 0.119 and 0.013, respectively. On the other hand, *p*_{S} and *p*_{N} are 0.130 and 0.015, respectively, assuming a single sensitive attribute as described in Table 2. As the number of sensitive attributes increases, both *p*_{S} and *p*_{N} decrease relative to the single-feature case. This is due to changes in the contributions of sensitive and nonsensitive features to *p*_{perturb}, denoted by *ψ*_{S} and *ψ*_{N}, respectively.

The sensitivity of an attribute impacts the correlation between the risk of an attack and the sensitive ratio. Assuming that there are two sensitive features (air-cutting power and idle power), the experimental results for idle power are illustrated in Fig. 12. Prediction accuracy shown in Fig. 12(a) across various parameter levels does not differ from that presented in Fig. 6(a), the case in which idle power served as a nonsensitive feature. On the other hand, the attack risk shows different results from Fig. 6(b). When *γ* = 0.9, the attack risk is indistinguishable from the previous case. However, the attack risk depends on changes in the *γ* values: setting the sensitive ratio to 0.1 drives the attack risk to zero. These results show that the proposed MNP algorithm works effectively in cases with multiple sensitive attributes.

## 7 Conclusions

In this paper, we design and develop a novel privacy-preserving algorithm called MNP for manufacturing data analytics that fully leverages the smartness of AI models while effectively reducing the risk of sensitive data leakage due to model inversion attacks. The MNP technique perturbs neural network model training by injecting carefully designed noise, ensuring differential privacy. Additionally, the algorithm can be extended to a distributed version, called the multi-party MNP algorithm, to address the privacy risk of collaborative learning while providing the benefits of computational efficiency and prediction accuracy. The MNP technique introduces two control parameters, *p*_{perturb} and *γ*: *p*_{perturb} determines the level of privacy, and *γ* controls the ratio of perturbation applied to sensitive and nonsensitive attributes, allowing the algorithm to minimize the risk of inversion attacks on the sensitive feature while keeping the model's prediction power as high as that of the unperturbed model. Experimental results with the real-world CNC data showed that the proposed algorithm provides higher robustness to white-box model inversion attacks while maintaining prediction accuracy. By employing this novel and flexible MNP algorithm, manufacturers can fully leverage the smartness of AI to make informed decisions while concurrently enhancing cybersecurity and mitigating the risk of sensitive information leakage.

## Acknowledgment

This material is based on research sponsored by Office of the Under Secretary of Defense for Research and Engineering, Strategic Technology Protection and Exploitation, and Defense Manufacturing Science and Technology Program under agreement number W15QKN-19-3-0003. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government.

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The authors attest that all data for this study are included in the paper.