## Abstract

The recent COVID-19 pandemic reveals the vulnerability of global supply chains: unforeseen supply crunches and unpredictable variability in customer demands lead to catastrophic disruptions to production planning and management, causing wild swings in productivity for most manufacturing systems. A smart and resilient manufacturing system (S&RMS) is therefore envisioned to withstand such unexpected perturbations and adjust promptly to mitigate their impacts on the system’s stability. However, modeling the system’s resilience to the impacts of disruptive events has not been fully addressed. We investigate a generalized polynomial chaos (gPC) expansion-based discrete-event dynamic system (DEDS) model to capture uncertainties and irregular disruptive events in manufacturing systems. The analytic approach enables real-time optimization of production planning to mitigate the impacts of intermittent disruptive events (e.g., supply shortages) and enhance the system’s resilience. The case study on a hybrid bearing manufacturing workshop suggests that the proposed approach allows a timely intervention in production planning that significantly reduces the downtime (to around one-fifth of the downtime without controls) while guaranteeing maximum productivity under the system perturbations and uncertainties.

## 1 Introduction

Manufacturing systems are increasingly intertwined through globally expanding supply chains [1]. Diverse activities within manufacturing firms, including design, fabrication, production planning, and sales, heavily rely on the sustainable flow of materials from upstream to downstream of the supply chains. It is reported that the proportion of organizations that experienced setbacks due to unexpected catastrophic events (e.g., natural disasters and transportation failures) reached its highest level at ∼77% in 2020 [2]. Notably, the COVID-19 pandemic relentlessly reveals the vulnerability of manufacturing systems within global supply chains across a variety of sectors and industries: the unforeseen supply crunches, in conjunction with the unpredictable fluctuation in customer demands, lead to catastrophic disruptions to the production planning and management [1,3,4].

Resilience embodies a system’s capability to withstand significant shocks and perturbations, promptly adapt to sudden changes, and recover rapidly before any adverse impacts materialize [5]. Zhang and Van Luttervelt [6] included such system responses in the design of resilient manufacturing systems. They provided guidelines in multiple areas across manufacturing systems, including system design and management, external impacts on supply operating models, and human factors, to enhance the system’s resilience. Following this, the resilience concept has gained momentum in production planning and supply chains, particularly as COVID-19 has inflicted massive perturbations on manufacturing systems globally [1,7]. Indeed, customer demands and upstream supplies have varied dramatically since the onset of COVID-19 [8–10], resulting in delays in manufacturing systems. Such drawbacks cause significant disruptions in manufacturing, which in turn render existing production schedules futile. Therefore, prudent strategies and flexible production planning are urgently needed to ensure a manufacturing system’s endurance amid unexpected disruptive events.

Recent investigations provide analytic approaches for modeling and characterizing system resilience at the shop floor level. Gu et al. [11] modeled multistage configurable manufacturing systems. They presented the system dynamics using a stationary Markov chain to evaluate the impacts of different control measures on systematic resilience based on specific resilience metrics. Hu et al. [12] introduced a deterministic model for manufacturing systems and provided various criteria for resilience assessment. Abimbola and Khan [13] modeled system resilience via a dynamic object-oriented Bayesian network and proposed three capacities that embody resilience, i.e., absorption, adaptation, and restoration. These models of resilient manufacturing systems are rooted mainly in deterministic formulations or stationarity assumptions. However, a real-world system rarely resides in a steady state because activities such as production time, material logistics, and their coupling effects introduce significant variations and non-stationarity into the process. Most existing models, which overwhelmingly rely on stationarity assumptions and deterministic formulations, cannot deal with these process uncertainties, which may create unexpected machine downtime and lead to significant disturbances in production planning [14].

The growing industrial adoption of plant floor automation and information systems (PFS) can create mega- to terabytes of fast-streaming sensor data per hour from the shop floor (see Fig. 1). These data are mainly generated from data acquisition and control systems and, more recently, industrial internet-of-things (IIoT) enabled sensors and devices [15]. The implementation of PFS and IIoT in manufacturing systems offers unprecedented opportunities to track shop floor events (e.g., buffer levels, material and logistics flows, and machine production rates), capture process variations, and provide real-time system evaluation and control. Production scheduling and shop floor management under such a smart system need to accommodate this norm via a framework of adaptive, real-time scheduling and control for the dynamics of manufacturing systems [16]. Simulation techniques such as discrete-event simulation and system dynamics have been widely applied in manufacturing system management and planning [17]. The advances in PFS, IIoT, and simulation techniques bring an opportunity to realize a smart and resilient manufacturing system, which consists of mathematical modeling for evaluating resilience, production planning, and forecasting disruptive events. The proposed smart and resilient manufacturing system (S&RMS) is projected to provide instant decisions on shop floor scheduling to mitigate the impacts of unforeseen disruptive events (and machine breakdowns) and therefore enhance system resilience.

This paper develops a discrete-event dynamic modeling approach for the production system. A set of parameters, including work-in-process, processing rates, buffer stocks, upstream and downstream supplies at different workstations, and their coupling effects on the process dynamics, is used to trace the production system dynamics. The generalized polynomial chaos (gPC) expansion is further introduced to capture the process uncertainties during production (i.e., the processing rates, the supply insufficiencies, and the delays of arrivals) due to random disruptive events. An optimization approach is then developed, which provides instant strategic shop floor planning and mitigates the impacts of the perturbations [18,19]. The case study demonstrates the presented method’s performance in maintaining shop floor production at a real-world bearing manufacturing factory. Therefore, the presented model provides optimal solutions for shop floor scheduling by coordinating all workstations within the shop floor to enhance the system’s resilience against the adverse impacts of disruptive events and sudden fluctuations (and insufficiencies) in the material flows.

The remainder of the paper is organized as follows: the developed approach that formulates the gPC expansion for process characterization and optimization is presented in Sec. 2; Sec. 3 details the implementation procedure of the presented method, carried out on a real-world manufacturing factory to demonstrate its advantage over the existing methods tested; concluding remarks are provided in Sec. 4.

## 2 Methodology

### 2.1 Discrete-Event Dynamic System With Disruptive Events.

The linchpin to realizing S&RMS is to characterize the uncertainties and disruptive events across the manufacturing system. From the perspective of system dynamics, the uncertainties in manufacturing systems can be formulated as intermittent perturbations: the processes switch between static behaviors and emanating chaotic bursts (due to external shocks or perturbations) [20–22]. The duration of those static phases follows a power/scaling law [23]. Hence, one can conveniently formulate such intermittent perturbations in a discrete manufacturing process/system using a random variable from the exponential family.
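As a concrete illustration of this alternation (with hypothetical rates, not parameters from the paper), static phases and disruptive bursts with exponential-family durations can be sampled as:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_phases(n_cycles, lam_static=30.0, lam_burst=5.0):
    """Sample alternating quiet-phase and burst durations (illustrative units).

    Static (quiet) phases and disruptive bursts are drawn from exponential
    distributions, a member of the exponential family.
    """
    static = rng.exponential(lam_static, n_cycles)  # quiet-phase durations
    burst = rng.exponential(lam_burst, n_cycles)    # perturbation durations
    return static, burst

static, burst = sample_phases(10_000)
```

Concatenating the pairs in order yields one realization of an intermittently perturbed process timeline.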

The proposed method provides a continuous optimization tool to resolve instability and enhance resilience when the system undergoes intermittent shortages in the material flow and the resultant perturbations/disturbances in production planning. Essentially, the presented approach continuously adjusts the processing rate of each machine station to coordinate all the machines on the shop floor, maintaining the production flow and reducing overall system-level downtime. Controllable processing time has been considered an important optimization variable for shop floor scheduling and production planning problems [24]. In real-world manufacturing processes, the processing conditions can be controlled according to the real-time monitored quality of the process; hence, the actual processing time is not constant but adjustable within a certain scope. Adjusting the machining parameters (e.g., cutting speeds and feed rates for lathing and milling, and polishing time for chemical mechanical planarization), when such interventions are technically reasonable, provides the controllability needed to change the processing time of the machining processes. Essentially, tuning the processing time can coordinate all workstations to reduce production downtime, mitigate bottleneck impacts, and maintain the material flows throughout the system [25,26], which consequently affects system performance criteria such as operating costs and inventory carrying costs. Related analytical approaches have been studied extensively to evaluate the performance of manufacturing systems via adjusting the processing time: Kayan and Akturk [25] investigated how to control the processing time in a computer numerical control (CNC) machine scheduling problem.
Their bi-criteria scheduling problem on a single CNC machine suggests that, by properly adjusting the process parameters (e.g., cutting speed and feed rate), the production system can efficiently utilize its resources given specific makespan requirements. Renna [26] proposed new policies for shop floor scheduling that adjust the processing time of machine stations. The results suggested a significant system performance improvement under several dynamic conditions. In addition to controlling processing time, process uncertainty is another critical aspect of production planning. Shabtay and Zofi [27] investigated a single machine scheduling problem to minimize the overall makespan via adjusting controllable processing time; this problem was solved using a constant-factor approximation algorithm and a polynomial-time approximation scheme. Kim and Lee [28] first considered uncertain processing times and pick-up constraints in the scheduling problem. Wu and Zhou [29,30] further derived optimality and feasibility conditions under this assumption and extended the work by developing a Petri-net model and control policy to obtain offline periodic scheduling solutions. In addition, adjusting the processing rate maintains high system performance even when production complexities and uncertainties exist in intelligent machine systems [31,32].

In this paper, we consider optimizing the production planning in an *I*-stage assembly line consisting of workstations with buffers and storage. As shown in Fig. 2, the process starts from the material replenished buffer *B*_{0} and each stage *i* is followed by a buffer *B*_{i} with finite capacity *C*_{i} < ∞.

Here, *t* is the time index; *x*_{0} records the state of storage size for raw materials at *B*_{0}; *X*_{C} is an (*I* − 1) × 1 column vector, i.e., $X_C=[x_1,x_2,\ldots,x_{I-1}]^\top$, that records all the buffer levels from stage 1 to stage *I* − 1; and *x*_{I} represents the level of machined products at *B*_{I}. $\mathbb{I}$ is the identity matrix such that $\mathbb{I}\in\mathbb{R}^{(I-1)\times(I-1)}$. Note that a fixed time interval is usually selected between two adjacent time indices to discretize the continuous time (in the numerical case study, we set the time interval to 20 min). The vector *V*(*t*) contains the processing rates of all machines at time index *t*, i.e., $V(t)=[V_{1,1},\ldots,V_{1,S_1},V_{2,1},\ldots,V_{2,S_2},\ldots,V_{I,1},\ldots,V_{I,S_I}]^\top$, and $M\in\mathbb{R}^{(I+1)\times\sum_{i=1}^{I}S_i}$ is the machine control matrix, where *S*_{i} denotes the number of machines at stage *i*. *R*(*t*) represents the amount of raw materials (at *B*_{0}), and *P*(*t*) represents the number of machined products (at *B*_{I}). The equations for *R*(*t*) and *P*(*t*) are formulated as

Raw materials are replenished at times *φ*_{In} and products are sent to the downstream customer at times *φ*_{out}. *t*_{0} and $t_0'$ are the beginning time indices, and $T_\tau$ and $T_{\tau'}'$ are the random intervals of the replenishment and take-out from the manufacturing system, respectively. *τ*, *τ*′ ∈ ℤ^{+} (*τ* = 1, 2, 3, …, *r*; *τ*′ = 1, 2, 3, …, *p*), where *r* is the number of replenishment periods and *p* is the number of take-outs of machined parts. Here, we model the time interval between every two events (*Q*_{In}’s and/or *Q*_{Out}’s) using a Poisson distribution. Such intermittent demand patterns in material flows/supplies can be characterized by periods with positive demand alternated with (many, or a few) periods without demand. Researchers have reported that such intermittent demand follows a Poisson process [30]. The intervals between two incidents, i.e., $T_\tau$ and $T_{\tau'}'$, follow Poisson distributions: $T_\tau\sim\mathrm{Poisson}(h_r)$ and $T_{\tau'}'\sim\mathrm{Poisson}(h_p)$. After the disruptive event(s), the parameters of these Poisson distributions change to $h_r'$ and $h_p'$, correspondingly. Also, *Q*_{In} and *Q*_{Out} are in turn replaced by $Q_{In}'$ and $Q_{Out}'$ after the disruptive event.
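A minimal sketch of this event process (hypothetical units and rates, following the Poisson-interval assumption above, with the rate parameter switching at the disruptive event):

```python
import numpy as np

rng = np.random.default_rng(1)

def event_times(t0, horizon, h, h_prime, t_disrupt):
    """Generate replenishment event times whose intervals follow Poisson(h)
    before the disruptive event and Poisson(h') after it."""
    t, times = t0, []
    while t < horizon:
        rate = h if t < t_disrupt else h_prime
        t += max(1, rng.poisson(rate))  # discrete interval to the next Q_In event
        times.append(t)
    return times

times = event_times(0, 200, h=10, h_prime=15, t_disrupt=100)
```

After `t_disrupt`, the mean interval grows from `h` to `h_prime`, thinning the replenishment events exactly as the disrupted supply described above.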

*x*_{i} describes the buffer level at *B*_{i}. Here, we denote its derivative at time *t* as Δ*x*_{i}(*t*) = [*x*_{i}(*t*) − *x*_{i}(*t* − Δ*t*)]/Δ*t*, subject to *x*_{i}(*t*) ≤ *C*_{i}. Δ*x*_{0}(*t*) describes the releasing rate of raw materials at *B*_{0}, and Δ*x*_{I}(*t*) is the throughput rate capturing the productivity at the last stage *I*.

*V*_{i,j}(*t*) and *V*_{i+1,j}(*t*) are the processing rates of the *j*th machine at the *i*th and (*i* + 1)th stages at time *t*, respectively. If there is no material in the (*i* − 1)th storage, i.e., *x*_{i−1}(*t*) = 0, then all the machines at stage *i* will be starved. Meanwhile, if the buffer level at *B*_{i} surpasses its capacity, i.e., *x*_{i}(*t*) ≥ *C*_{i}, a flow jam will form for the machines at stage *i*. Hence, the equation can be formulated as

*u*_{i,j} is the actual processing rate when the machine *M*_{i,j} is under operation. When the machine *M*_{i,j} is unoccupied, the processing rate is set to zero. Essentially, the discontinuous nature of Eq. (8) introduces nonlinear behaviors in the system dynamics. Assume that the processing rate for the machine *M*_{i,j} has mean $\mu_{i,j}^0$ and standard deviation $\sigma_{i,j}^0$ to describe the process uncertainties. As most products are processed in batches, one can estimate the processing rate per item [33,34] by a normal distribution with the sample mean *μ*_{i,j} and the sample variance $\sigma_{i,j}^2$, i.e., $u_{i,j}\sim N(\mu_{i,j},\sigma_{i,j}^2)$. Let the vectors $\mu=(\mu_{1,1},\ldots,\mu_{1,S_1},\ldots,\mu_{I,1},\ldots,\mu_{I,S_I})^\top\in\mathbb{R}^{\sum_{i=1}^{I}S_i}$ and $\sigma=(\sigma_{1,1},\ldots,\sigma_{1,S_1},\ldots,\sigma_{I,1},\ldots,\sigma_{I,S_I})^\top$ represent the means and standard deviations of the processing rates for all the machines, respectively. In addition, we apply the signal-to-noise ratio (SNR) [35], i.e., SNR = *μ*_{i,j}/*σ*_{i,j}, to denote the level of uncertainty in the processing rate. During production, the *V*_{i,j}’s are affected by other machines upstream and downstream. Hence, we rewrite the notation *V*_{i,j}(*t*) as *V*_{i,j}(*μ*, *t*) to denote its dependence on *μ*. Similarly, *x*_{i}(*μ*, *t*) replaces the notation *x*_{i}(*t*) in the following (sub)sections. In summary, the time intervals between two events follow Poisson distributions and the processing rates follow normal distributions. These probability distributions capture the manufacturing system uncertainties that cause unexpected intermittent shortages/congestion of material and/or perturbations in production planning.
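The two uncertainty mechanisms above (a normally distributed rate with SNR = μ/σ, zeroed out when a machine is starved or blocked) can be sketched as follows; the function names and parameter values are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_rate(mu, snr):
    """Draw u_ij ~ N(mu, sigma^2) with sigma = mu / SNR (since SNR = mu/sigma)."""
    return rng.normal(mu, mu / snr)

def effective_rate(u_ij, upstream_level, buffer_level, capacity):
    """V_ij equals the sampled rate only when the machine is neither starved
    (empty upstream buffer) nor blocked (full downstream buffer)."""
    if upstream_level <= 0 or buffer_level >= capacity:
        return 0.0
    return u_ij

u = np.array([sample_rate(12.0, 10.0) for _ in range(50_000)])
```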

At each stage *i*, all the machines *M*_{i,j}’s operate in parallel, and the material flows assigned to each machine are proportional to each machine’s processing rate. For example, since the machines *M*_{i,1} and *M*_{i,2} at stage *i* are located in parallel, they receive the material quantities from upstream in the proportions *V*_{i,1}(*μ*, *t*)/(*V*_{i,1}(*μ*, *t*) + *V*_{i,2}(*μ*, *t*)) and *V*_{i,2}(*μ*, *t*)/(*V*_{i,1}(*μ*, *t*) + *V*_{i,2}(*μ*, *t*)), respectively. In addition, these parallel machines share the same buffer *B*_{i}. In other words, once the upstream buffer *B*_{i−1} is empty or the buffer at stage *i* (*B*_{i}) is full, all the machines at stage *i*, {*M*_{i,j}} for *j* = 1, 2, …, *S*_{i}, would be idle (starved or blocked).
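The proportional allocation rule above amounts to a simple weighted split; a minimal sketch (illustrative helper, not from the paper):

```python
def split_flow(quantity, rates):
    """Allocate an upstream material quantity to parallel machines in
    proportion to their processing rates V_{i,j}."""
    total = sum(rates)
    return [quantity * r / total for r in rates]

# Two parallel machines with rates 2 and 3 share 10 units of material:
shares = split_flow(10.0, [2.0, 3.0])  # → [4.0, 6.0]
```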

### 2.2 The Generalized Polynomial Chaos-Based Model to Capture Manufacturing System Uncertainties.

To capture the propagation of uncertainties, we utilize the gPC expansion-based discrete-event dynamic system (DEDS) for manufacturing systems under intermittent disruptive events [36]. The gPC expansion has recently garnered tremendous traction for its efficient uncertainty quantification in complex dynamic systems [37]. Uncertainties are always present in the manufacturing environment, especially in the production process, e.g., in processing rates and delivery lead times. In this paper, we model the processing time and its variations as a stochastic process *u*(*ξ*), which can be represented as a convergent series of polynomials in the gPC expansion, i.e., $u(\xi)=\sum_{k=0}^{\infty}u_k\phi_k(\xi)$ [38], and the expansion is truncated at a finite order *K* as an approximation, i.e., $u(\xi)\approx\sum_{k=0}^{K}\hat{u}_k\phi_k(\xi)$. Here, *u*_{k} is the polynomial coefficient in the gPC expansion while $\hat{u}_k$ is the polynomial coefficient in the truncated gPC expansion, *ξ* is the random variable inherent in the stochastic process *u*, and the functional *ϕ*_{k} represents the polynomial chaos basis function conforming to the distribution of *ξ*. Note that the *ϕ*_{k} for *k* = 1, …, *K* are mutually orthogonal in *L*_{2} space, namely, $\langle\phi_k,\phi_l\rangle=\int\phi_k(\xi)\phi_l(\xi)\rho(\xi)\,d\xi=\delta_{kl}\langle\phi_k^2\rangle$, where *ρ*(*ξ*) is the probability density function of *ξ*, and *δ*_{kl} = 1 if *k* = *l* and *δ*_{kl} = 0 otherwise. The expansion coefficients $\hat{u}_k$ can be approximated by the Galerkin projection approach, such that $\hat{u}_k=\langle u,\phi_k\rangle/\langle\phi_k^2\rangle$ [22,37].
The numerator is approximated using a Gaussian quadrature rule as $\langle u,\phi_k\rangle=\sum_{z=1}^{Z}\omega^{(z)}u(\xi^{(z)})\phi_k(\xi^{(z)})$ and the denominator is evaluated as $\langle\phi_k^2\rangle=\sum_{z=1}^{Z}\omega^{(z)}\phi_k(\xi^{(z)})\phi_k(\xi^{(z)})$, where $Z$ is the total number of quadrature points, $\xi^{(z)}$ is the *z*th quadrature point, and $\omega^{(z)}$ represents its weight. Hermite, Legendre, and Jacobi polynomials satisfy this orthogonality criterion for Gaussian, uniform, and beta random variables, respectively [39,40]. A Gaussian distribution is assumed in this study without loss of generality, and correspondingly, the functional *ϕ*_{k} is selected as the univariate Hermite polynomials.
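The projection above can be reproduced numerically with probabilists' Hermite polynomials and Gauss–Hermite quadrature; the sketch below (using NumPy's `hermite_e` module, with an illustrative response u(ξ) = μ + σξ) recovers the expected coefficients û₀ = μ and û₁ = σ:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def gpc_coefficients(u_func, K=4, Z=20):
    """Galerkin projection u_hat_k = <u, phi_k> / <phi_k^2> via quadrature,
    with phi_k the probabilists' Hermite polynomials and xi ~ N(0, 1)."""
    xi, w = hermegauss(Z)  # nodes/weights for the weight exp(-xi^2 / 2)
    w = w / w.sum()        # normalize weights to the N(0, 1) density
    coeffs = np.zeros(K + 1)
    for k in range(K + 1):
        basis = np.zeros(K + 1)
        basis[k] = 1.0
        phi_k = hermeval(xi, basis)           # phi_k at the quadrature nodes
        num = np.sum(w * u_func(xi) * phi_k)  # <u, phi_k>
        den = np.sum(w * phi_k * phi_k)       # <phi_k^2> (equals k!)
        coeffs[k] = num / den
    return coeffs

mu, sigma = 12.0, 1.2
c = gpc_coefficients(lambda xi: mu + sigma * xi)
# c[0] ≈ mu, c[1] ≈ sigma, higher-order coefficients ≈ 0
```

Because the response is linear in ξ, the quadrature is exact here; for general responses the accuracy depends on the truncation order K and the number of quadrature points Z.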

In general, two functions are used to model the controllable time for manufacturing processes, namely the bounded linear and the scaling decreasing functions [41]. The bounded linear function is $V_j=\bar{V}_j-\alpha_j u_j$, $0\le u_j\le\bar{u}_j\le\bar{V}_j/\alpha_j$, where for job *j*, $\bar{V}_j$ is the non-compressed processing time, $\bar{u}_j$ is the maximum (upper bound) of the resource in the buffer that can be allocated to the current machine *M*_{i,j}, and the parameter *α*_{j} represents the compression rate. The other formula utilizes a scaling function, *V*_{j} = (*θ*_{j}/*u*_{j})^{k}, where the parameter *θ*_{j} is the workload of job *j* and *k* is a positive scaling factor. In our presented method, the processing time is controllable within a specific range, with induced randomness to describe the uncertainty of the processes. The detailed formulation is summarized in Sec. 2.3.
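Both controllable-time models can be written down directly; a minimal sketch with hypothetical job parameters:

```python
def linear_bounded_time(V_bar, alpha, u, u_bar):
    """Bounded linear model: V_j = V_bar_j - alpha_j * u_j,
    valid for 0 <= u_j <= u_bar_j <= V_bar_j / alpha_j."""
    assert 0 <= u <= u_bar <= V_bar / alpha, "resource allocation out of range"
    return V_bar - alpha * u

def scaling_time(theta, u, k):
    """Scaling model: V_j = (theta_j / u_j) ** k for workload theta_j, k > 0."""
    return (theta / u) ** k

# A job with uncompressed time 10 and compression rate 2, given 3 resource units:
t_lin = linear_bounded_time(V_bar=10.0, alpha=2.0, u=3.0, u_bar=4.0)  # → 4.0
t_scl = scaling_time(theta=8.0, u=2.0, k=1.0)                          # → 4.0
```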

### 2.3 A Smart DEDS Based Production Optimization Approach.

Next, we investigate an optimization strategy for production planning under perturbations and uncertainties in the aforementioned manufacturing system. The discontinuous nature causes perturbation behaviors in the system dynamics, i.e., the processing rate of machine *M*_{i,j} is set to *u*_{i,j} only when the machine is running, as shown in Eq. (8). Meanwhile, the uncertainty of the manufacturing process resides in the process variations, i.e., $u_{i,j}\sim N(\mu_{i,j},\sigma_{i,j}^2)$. As suggested in Sec. 2.2, we introduce the surrogate gPC expansion estimates, i.e., $u_{i,j}(\mu_{i,j},\sigma_{i,j},\xi)=\sum_{k=0}^{q}\hat{u}_k(\mu_{i,j},\sigma_{i,j},\xi)\phi_k(\xi)$, to approximate *u*_{i,j}, where *q* is the finite truncation order, *ξ* is the Gaussian random variable, the functional *ϕ*_{k}(*ξ*) denotes the univariate Hermite polynomials conforming to the distribution of *ξ*, and $\hat{u}_k$ is the corresponding coefficient determined by the Galerkin projection approach [39]. Essentially, implementing the gPC approximation avoids uncertainty terms in the optimization problem, and the mean processing rates ($\mu_{i,j}$’s) are considered the variables in the optimization problem, as stated in Eqs. (9)–(16).

As demonstrated in the objective function of Eq. (9), we seek to maximize the production with minimal system downtime by adjusting the mean processing rates $\mu=(\mu_{1,1},\ldots,\mu_{1,S_1},\ldots,\mu_{I,1},\ldots,\mu_{I,S_I})^\top$ for the machines in the system. Here, *T*_{0} and *T*_{f} denote the starting and finishing time stamps of the production process. The first term in the objective function is the total output of the system between *T*_{0} and *T*_{f}, where *V*_{I,j} represents the processing rate of the machines at the last stage. When a material shortage occurs in production, the decrease in the throughput rate is mainly caused by the material deficiency. This term alone does not guarantee minimal downtime. The second term is added to the objective function to ensure a maximal production rate with minimal downtime under the perturbations of the material supplies. Frequent system/machine breakdowns have adverse effects on the equipment effectiveness, management activities, and profits at any capacity level of a manufacturing system [42]. For example, unplanned system breakdowns may result in a decline in labor efficiency, and frequently starting and stopping machines and changing the operation parameters (e.g., revolutions per minute, cutting speed, and feed rate) may accelerate equipment degradation [43]. In addition, machine downtime increases the cost of labor, spares, and other facility resources in the industry. Therefore, decreasing downtime while maintaining consistent productivity is a value-added activity that consequently enhances machine utilization efficiency and business profitability in the system [44,45]. *d*_{i,j}(*t*) is a binary variable indicating whether the machine *j* at stage *i* is down at time *t*, i.e., *d*_{i,j}(*t*) = 0 when the machine *M*_{i,j} is in operation, and *d*_{i,j}(*t*) = 1 when *M*_{i,j} is either blocked or starved.
The variable *a* is a penalty term, which controls the tolerance of the total downtime of the system (in most of our case studies, *a* is set to 100 to properly weigh the downtime in the objective function) [46]. The constraints are delineated in Eqs. (10)–(16). Here, Eq. (10) captures the process perturbation and uncertainty based on the gPC expansion: $V_{i,j}(\mu,t)=\sum_{k=0}^{q}\hat{u}_k(\mu_{i,j},\sigma_{i,j},\xi)\phi_k(\xi)$ when the machine *j* at stage *i* is under operation, and *V*_{i,j}(*μ*, *t*) = 0 during the downtime. Equations (11)–(16) are the state equations. The first derivative of the inventory level at the *i*th buffer, $\Delta x_i(\mu,t)=\sum_{j=1}^{S_i}V_{i,j}(\mu,t)-\sum_{j=1}^{S_{i+1}}V_{i+1,j}(\mu,t)$, denotes the rate of buffer level change, which is determined by the difference in throughput between its upstream and downstream machines (Eq. (12)). In particular, Δ*x*_{0}(*μ*, *t*) is the releasing rate of raw materials (Eq. (11)), and Δ*x*_{I}(*μ*, *t*) is the throughput rate of the final production (Eq. (13)). In addition, *x*_{0}(*t*) increases due to the replenishment (Eq. (14)), and *x*_{I}(*μ*, *t*) decreases by the outflow (Eq. (15)) as the produced parts are sent to downstream customers. The inventory level *x*_{i}(*μ*, *t*) of the buffer *B*_{i} at time *t* is subject to the capacity limits in Eq. (16), i.e., *x*_{i}(*μ*, *t*) is non-negative and no greater than the storage capacity *C*_{i}. The intermittent perturbations of the material flow are reflected by the uncertainties of replenishment and take-out in both quantities and time durations. Therefore, we model such perturbations as stated in Sec. 2.1 (Eqs. (3)–(6)); these intermittent perturbations enter the optimization model through Eqs. (14) and (15). Note that the main purpose of presenting explicit formulas and parameters for the system uncertainties is to demonstrate the validity of the presented approach for optimizing the system under uncertainties; the framework also opens up an opportunity for implementing quantification approaches (to estimate the distribution forms and parameters) that capture real-world uncertainties. In this paper, a constrained optimization by linear approximation (COBYLA) based algorithm [47] is applied in the optimization process. If multiple equivalent optimal solutions are available, it is preferred to select the optimum with the minimal adjustment of the processing times; in other words, given $K$ equivalent optimal solutions, we select the solution *μ*^{(*)} whose change from the current mean processing rates is smallest. Therefore, we obtain the optimal mean processing rates ($\mu^*$) after solving the optimization problem using the COBYLA algorithm [47], which ensures maximizing the production rate with minimum system breakdowns.
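As a toy illustration of this optimization step (a drastically simplified single-stage stand-in for Eqs. (9)–(16), with a hypothetical downtime proxy rather than the paper's full DEDS constraints), SciPy's COBYLA implementation can be driven as follows:

```python
import numpy as np
from scipy.optimize import minimize

a = 100.0  # downtime penalty weight, as in the objective of Eq. (9)

def objective(mu, demand=20.0):
    """Maximize output minus a * downtime-proxy; negated because
    scipy.optimize.minimize minimizes."""
    output = np.sum(np.minimum(mu, demand))        # throughput capped by supply
    downtime = np.sum(np.maximum(demand - mu, 0))  # starvation proxy (illustrative)
    return -(output - a * downtime)

constraints = [
    {"type": "ineq", "fun": lambda mu: mu},         # mu >= 0
    {"type": "ineq", "fun": lambda mu: 30.0 - mu},  # mu <= machine capacity 30
]

res = minimize(objective, x0=np.array([10.0, 10.0]), method="COBYLA",
               constraints=constraints)
# res.x ends up at mean rates that meet demand, driving the downtime proxy to 0.
```

The heavy penalty `a` pushes the mean rates up until the starvation proxy vanishes, mirroring how the second term of Eq. (9) trades throughput against downtime.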

### 2.4 Discrete Event-Based Simulation With Real-Time Optimal Solutions for Production Planning.

A simulation of the DEDS model of a production system is carried out in SimPy [48] to test the performance of the presented gPC-SRPOC. The manufacturing system simulation was conducted given the optimal production plan generated by the gPC-SRPOC. As shown in Fig. 3, objects such as the workstations (machines), buffers, and material flows are initialized. The material flows (shown as arrows in Fig. 3) indicate the production line structure and the connectivity between workstations. The overall flowchart describing the simulation procedure is summarized in Fig. 4. First, the parameters, including the material quantity, the time interval between arrivals, and the processing rate as well as the uncertainty level for each machine, are initialized. The dashed block in Fig. 4 describes the logic for determining the status of the current machine (i.e., occupied or idle). The throughput rate and the total machine downtime (i.e., the time during which a machine is starved or blocked) through the simulation are recorded to evaluate the performance of the presented method [17].
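The occupied/starved/blocked logic in that dashed block can be sketched as a compact discrete-time loop (a pure-Python stand-in for the SimPy model, with hypothetical parameters):

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate(steps=1000, cap=20, mu=(1.2, 1.0), snr=10.0, replenish=0.9):
    """Two machines in series with buffers B0 -> M1 -> B1 -> M2 -> B2.

    Each step, a machine moves material only when it is neither starved
    (empty upstream buffer) nor blocked (full downstream buffer); otherwise
    the step counts toward its downtime."""
    levels = [50.0, 0.0, 0.0]  # B0 (raw), B1 (intermediate), B2 (finished)
    downtime = [0, 0]
    for _ in range(steps):
        for i in (0, 1):
            starved = levels[i] <= 0
            blocked = (i + 1 < 2) and levels[i + 1] >= cap
            if starved or blocked:
                downtime[i] += 1
                continue
            u = max(rng.normal(mu[i], mu[i] / snr), 0.0)   # uncertain rate
            room = cap - levels[i + 1] if i + 1 < 2 else u  # B2 is unbounded
            moved = min(u, levels[i], room)
            levels[i] -= moved
            levels[i + 1] += moved
        levels[0] += replenish  # steady trickle of raw-material replenishment
    return levels, downtime

levels, downtime = simulate()
```

Throughput is eventually limited by the replenishment rate, and the recorded `downtime` counters correspond to the starved/blocked statistics the paper evaluates.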

To compare the performance of the gPC-SRPOC with other methods, two strategies are carried out: plan A keeps a fixed production plan throughout the whole time span (i.e., the same shop floor schedule before and after the disruptive events); plan B applies the presented gPC-SRPOC to generate an adaptive optimal solution, i.e., the production plan adjusts promptly after the onset of the disruptive events. The roadmap for the proposed approach is illustrated in Fig. 5:

1. Initialization: set the parameters, including the mean processing rate *μ*_{i,j} of each machine, the period $T_\tau^{(1)}$, and the quantity *Q*_{I} of replenished raw materials before the perturbation in the multistage configurable manufacturing system;
2. During the iteration of the dynamic model, the buffer levels *x*_{i}(*μ*, *t*), *i* = 0, 1, 2, …, *I*, are updated. The objective function value in Eq. (9) is calculated once *t* = *T*_{f};
3. The optimal $\mu_{i,j}^*$ is generated by solving the optimization problem described in Sec. 2.3. Then the measures, including the throughput rate and total downtime rate, are calculated;
4. Initialize the parameters, such as $T_\tau^{(1)\prime}$ and $Q_I^\prime$, at the starting point of phase II;
5. In phase II, plan A applies the original optimal solution for the production planning. The measures, including the throughput rate and total downtime, are also calculated;
6. Under plan B, the machine processing rates $\mu_{i,j}^{*\prime}$ are re-optimized by solving the optimization problem with the new inputs (e.g., $T_\tau^{(1)\prime}$ and $Q_I^\prime$). The measures for plan B are then calculated;
7. The optimized production plans are used as inputs to a discrete-event simulation module. The simulation result is used to evaluate the performance of the gPC-SRPOC.

## 3 Case Study and Results

### 3.1 Manufacturing System Simulation.

We consider a bearing manufacturing workshop, as portrayed in Fig. 6. It consists of three production lines for machining the three parts of a rolling bearing, i.e., the inner rings, the outer rings, and the rolling elements. The inner and outer rings go through raw material removal procedures (e.g., turning, chamfering, and grooving). Before being sent to assembly and inspection, these machined parts undergo heat treatment, grinding, and polishing. The simulation model of this manufacturing workshop is depicted in Fig. 7. This multistage manufacturing system can be simplified into parallel production lines: the lines producing the inner and outer rings (i.e., material flows 1 and 2, respectively) include 2, 1, and 2 machines at their three stages, while the production line for material flow 3 contains only one machine at each stage. These three production lines merge into a single unit for assembly, inspection, and packaging. The buffer size for each *B*_{i} (for *i* = 1, 2, …, 6) is set as 20 in the simulation. The processing rate *u*_{i,j} is time-varying with uncertainties, i.e., $u_{i,j}\sim N(\mu_{i,j},\sigma_{i,j}^2)$, for each machine *M*_{i,j}, and the estimated gPC coefficients $\hat{u}_k$ are obtained using 20 quadrature points (i.e., $Z=20$). SNR values are set as 100, 10, 5, 2, and 1 to simulate the system from a nearly deterministic process (SNR = 100) to a highly uncertain manufacturing process (SNR = 1).

In addition, note that the intervals between the events (*Q*_{In}’s and *Q*_{Out}’s) are discrete values in this dynamic model, so this point process can be characterized by Poisson statistics [49]. Therefore, in the presented model, the material flow demand at a random interval *Int*_{r} follows a Poisson distribution, i.e., *Int*_{r} ∼ Poisson(*h*_{R}). At the onset of the disruptive event, the delivery time from upstream suppliers varies significantly. This leads to a longer period between material replenishments, i.e., $Int_{r}^{\prime}\sim \mathrm{Poisson}(h_{R}^{\prime})$, where $h_{R}<h_{R}^{\prime}$. Due to the disruptive event, the quantity of replenished raw materials is reduced to $Q_{In}^{\prime}$ ($Q_{In}^{\prime}<Q_{In}$). As a result, such an uncertain disruptive incident causes supply insufficiency, and the downstream machines will be starved. The total quantity of raw material *Q*_{In} is set as 300 with a simulation period of 33.5 h and *h*_{R} as 10 for phase I. A disruptive event then occurs at the beginning of phase II (at time index *t*_{p} = 33.5). The total quantity $Q_{In}^{\prime}$ is set as 250 with 50 h of simulation and $h_{R}^{\prime}$ as 15 in phase II. A 20 min interval (Δ*t* = 20 min) is set for this discrete-event simulation.
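The replenishment process described above can be simulated with a short sketch. The per-delivery quantity is our assumption for illustration; the paper fixes only the totals ($Q_{In}=300$, $Q_{In}^{\prime}=250$) and the Poisson rates.

```python
import numpy as np

def simulate_replenishment(total_qty, h_rate, qty_per_delivery, horizon_h, seed=0):
    """Generate material arrivals whose inter-arrival intervals (in hours)
    follow Poisson(h_rate), until total_qty is exhausted or the horizon ends."""
    rng = np.random.default_rng(seed)
    t, delivered, arrivals = 0.0, 0, []
    while delivered < total_qty:
        t += rng.poisson(h_rate)                      # random interval Int_r
        if t > horizon_h:
            break                                     # past the simulation period
        q = min(qty_per_delivery, total_qty - delivered)
        delivered += q
        arrivals.append((t, q))
    return arrivals

# Phase I: Q_In = 300, h_R = 10, 33.5 h; phase II: Q'_In = 250, h'_R = 15, 50 h
phase1 = simulate_replenishment(300, 10, 50, 33.5)
phase2 = simulate_replenishment(250, 15, 50, 50.0)
```

With the larger rate $h_{R}^{\prime}=15$ and the reduced total, phase II yields fewer, more widely spaced deliveries, which is what starves the downstream machines.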

### 3.2 Data-Driven Modeling Results.

A DEDS simulation is carried out based on the two production planning strategies A and B. The gPC-SRPOC generates the solution for the process parameters (i.e., the dynamic processing rate *μ*_{i,j} for each machine) in the DEDS simulation model. Figure 8 shows the evolution of the throughput rate and the downtime ratio for one case with SNR = 100. The results of plan A (dashed line) are obtained using the optimal solution when the system has no perturbations (i.e., no observable disruptive events). When the perturbations emerge (at time index *t*_{p}), plan A still sticks to the original production plan. The manufacturing system then exhibits lower productivity, as indicated by the decreased throughput rate in Fig. 8(b). Moreover, the system suffers significant machine downtime: the downtime ratio exceeds 10% within 50 h after the disruptive event emerges.

By contrast, strategy B promptly adapts the production plan to the system uncertainty (the sudden perturbation introduced at *t*_{p}). The overall downtime ratio stays under 3% (shown by the dash-dotted lines in Fig. 8(a)). Specifically, the overall downtime for all machines is 18 h under plan B, compared to 87 h under plan A. This illustrative example shows the significant improvement the presented gPC-SRPOC method provides in enhancing the system resilience.

Next, we compute the results with 20 replications under each SNR level. The throughput rates and downtime ratios from the 20 replications under strategies A and B are summarized in Table 1 and Fig. 9. The results generated from plan A in phase I are plotted as boxes with backslashes, and plan B’s results in phase I are shown as boxes with slashes. For comparison between plans A and B, Figs. 9(a) and 9(b) give the throughput rate and the downtime ratio before and after the perturbations. There are no significant differences in throughput rate between the two plans after the disruptive events, owing to the deficiency of materials. However, the downtime ratios of the two strategies differ significantly: under an SNR of 100, plan A’s downtime ratio increases to 14.7%, whereas plan B’s downtime ratio is only 1.83% using the gPC-SRPOC. Therefore, plan B, which re-optimizes the production plan after disruptive events, shows significant advantages in enhancing machine utilization and reducing system downtime (which may save associated marginal costs). This case study suggests that the proposed approach allows a timely intervention in production planning to significantly reduce downtime while guaranteeing maximum productivity under system perturbations and uncertainties.

| Measurement | SNR | Before, plan A, mean | Before, plan A, std. | Before, plan B, mean | Before, plan B, std. | After, plan A, mean | After, plan A, std. | After, plan B, mean | After, plan B, std. |
|---|---|---|---|---|---|---|---|---|---|
| Throughput rate | 100 | 3.007 | 0.0021 | 3.005 | 0.0027 | 2.015 | 0.0196 | 2.008 | 0.0186 |
| | 10 | 2.990 | 0.0070 | 2.986 | 0.0081 | 2.001 | 0.0166 | 2.005 | 0.0168 |
| | 5 | 2.954 | 0.0085 | 2.956 | 0.0076 | 2.000 | 0.0221 | 2.002 | 0.0238 |
| | 2 | 2.809 | 0.0116 | 2.809 | 0.0119 | 2.001 | 0.0239 | 1.998 | 0.0263 |
| | 1 | 2.746 | 0.0145 | 2.751 | 0.0131 | 1.997 | 0.0288 | 1.991 | 0.0257 |
| Downtime ratio (%) | 100 | 0.0 | 0.0 | 0.01 | 0.05 | 14.70 | 0.41 | 1.83 | 0.36 |
| | 10 | 0.03 | 0.08 | 0.03 | 0.08 | 14.84 | 0.49 | 2.20 | 0.51 |
| | 5 | 0.06 | 0.12 | 0.02 | 0.05 | 16.94 | 0.49 | 2.52 | 0.58 |
| | 2 | 0.15 | 0.23 | 0.09 | 0.15 | 21.04 | 0.56 | 3.33 | 0.57 |
| | 1 | 0.22 | 0.32 | 0.15 | 0.26 | 26.22 | 0.85 | 3.67 | 0.60 |


To compare with the gPC-SRPOC, we apply a conventional Monte Carlo-based production optimization and control (MCPOC) approach [50]. The detailed procedures for implementing MCPOC are as follows:

1. A set of random numbers $\epsilon_{i,j}$, $\forall (i,j)$, is generated by sampling from the distribution $N(0,\sigma_{i,j}^{2})$ [7];

2. Based on the sampled $\epsilon_{i,j}$, the processing rates are estimated as $u_{i,j}=\mu_{i,j}+\epsilon_{i,j}$, $\forall (i,j)$;

3. The optimal processing rate $\mu_{i,j}^{*}$, $\forall (i,j)$, is obtained by solving the mathematical model of the optimization problem;

4. Steps (1), (2), and (3) are iterated *n* times to obtain the processing rate solutions $\{\mu_{i,j}^{*(1)},\mu_{i,j}^{*(2)},\ldots,\mu_{i,j}^{*(n)}\}$;

5. The expectation $\bar{\mu}_{i,j}^{*}$ of all the $\mu_{i,j}^{*(\cdot)}$’s is used as the input to a discrete-event simulation module. The simulation result is used for evaluating the performance of the MCPOC.
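The Monte Carlo procedure above amounts to a sample-average loop. A minimal sketch follows, with a caller-supplied `solve` standing in for the optimization model (an assumption for illustration; the paper does not prescribe this interface):

```python
import numpy as np

def mcpoc_rates(mu, sigma, solve, n=1000, seed=1):
    """Monte Carlo production optimization sketch: sample the rate
    perturbations eps ~ N(0, sigma^2), solve per sample, and average."""
    rng = np.random.default_rng(seed)
    solutions = []
    for _ in range(n):
        eps = rng.normal(0.0, sigma, size=np.shape(mu))  # sample perturbations
        u = np.asarray(mu) + eps                         # sampled processing rates
        solutions.append(solve(u))                       # per-sample optimum
    return np.mean(solutions, axis=0)                    # averaged plan for the DES
```

The cost is one full optimization per sample, which is why the sample sizes of 1000 and 10,000 used below dominate the MCPOC runtime.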

Specifically, this MCPOC method uses two different sample sizes, i.e., 1000 and 10,000. The throughput rate and the downtime ratio with gPC-SRPOC and MCPOC are summarized in Table 2 and Fig. 10, where the gPC-SRPOC results are represented by boxplots with slashes, and MCPOC results are plotted by boxes with dots/crosses. Note that both methods are applied only under plan B to test their performances in optimizing the production plans after disruptive events occur.

| Measurement | SNR | gPC, mean | gPC, std. | MC1 (size = 1000), mean | MC1, std. | MC2 (size = 10,000), mean | MC2, std. |
|---|---|---|---|---|---|---|---|
| Throughput rate | 100 | 2.605 | 0.0024 | 2.608 | 0.0088 | 2.602 | 0.0048 |
| | 10 | 2.580 | 0.0043 | 2.570 | 0.0111 | 2.579 | 0.0047 |
| | 5 | 2.554 | 0.0100 | 2.503 | 0.0506 | 2.543 | 0.0171 |
| | 2 | 2.412 | 0.0163 | 2.363 | 0.0933 | 2.408 | 0.0278 |
| | 1 | 2.348 | 0.0170 | 2.311 | 0.0912 | 2.346 | 0.0378 |
| Downtime ratio (%) | 100 | 6.1 | 0.03 | 6.6 | 0.85 | 6.1 | 0.09 |
| | 10 | 6.1 | 0.06 | 7.1 | 0.91 | 6.1 | 0.15 |
| | 5 | 7.1 | 0.14 | 9.2 | 1.59 | 7.0 | 0.24 |
| | 2 | 10.1 | 0.28 | 12.1 | 1.90 | 10.0 | 0.80 |
| | 1 | 13.4 | 0.44 | 14.4 | 2.01 | 13.4 | 1.09 |


First, we apply a sample size of 1000 for the MCPOC method. As shown in Fig. 10, the presented gPC expansion exhibits less variance than the MCPOC (we denote the results based on the 1000-sample size as MC1) and consistently lower downtime ratios under all SNR levels. The throughput rate obtained using MCPOC decreases as the system uncertainty increases. This is because a considerable process variation may disrupt the production planning and increase the chance that downstream workstations sit idle. For instance, under the condition with SNR = 5, the throughput rate decreases to 2.503, and the downtime ratio increases up to 9.2%. In contrast, the presented gPC method has less downtime (nearly a 22% improvement over the tested MC-based method) and a higher throughput rate (a 1.99% increment over the MC-based method). Besides, the variances of the throughput rate and downtime ratio generated using MCPOC are larger than those of the gPC expansion. Specifically, as the uncertainty increases (SNR decreases), the standard deviation of the throughput rate using gPC varies from 0.0024 to 0.0170, whereas that of the MCPOC changes from 0.0088 to 0.0912. Meanwhile, the standard deviation of the total downtime using gPC varies from 0.03% to 0.44%, but the MC method possesses relatively higher values, increasing from 0.85% to 2.01% as the SNR decreases.

We further test the MCPOC approach with a larger sample size of 10,000. The results of MCPOC (denoted by MC2) and gPC are summarized in Table 2 and Fig. 10. The variances of the MC method shrink as its sample size increases. However, the presented gPC-SRPOC still exhibits less variance and lower downtime ratios under all SNR levels. For even lower SNR levels (i.e., larger uncertainties), the presented gPC-SRPOC is more stable than the MCPOC. Besides, in real-world applications, especially for monitoring a manufacturing platform/system, it is often infeasible to acquire sufficient information to infer the underlying probability distribution of a collection of empirical observations due to limited data availability. Compared to the MC-based method, the gPC expansion possesses high accuracy and avoids the time-consuming sampling scheme. The gPC-SRPOC can generate consistent solutions throughout different levels of system dynamic uncertainty. Therefore, gPC-SRPOC can serve as an efficient analytic approach to enhance system resilience and mitigate the impacts of disruptive perturbations in manufacturing firms.

## 4 Conclusions and Discussions

This paper presents a DEDS modeling approach to capture the manufacturing system dynamics under production disturbances. A gPC-based optimization and control approach is proposed to realize a smart and resilient shop floor by providing responsive production planning strategies. The contributions of the paper are summarized as follows:

The core issues of characterizations of nonstationary behaviors and uncertainties in manufacturing systems have been addressed using a gPC-based DEDS model.

A gPC-based differential evolution global optimization tool is further developed. It provides instantaneous optimization and control of the shop floor activities to enhance the system’s resilience against unforeseen catastrophic events.

The experimental case study suggests that the proposed gPC-SRPOC approach maintains production when the system undergoes intermittent material flow shortages, without significant system downtime. In addition, this approach avoids large sampling efforts, saving computation time, and performs consistently compared to the existing random-sampling method (MCPOC). Overall, the presented method captures the variations and uncertainties for responsive strategic planning on the shop floor, mitigating the impacts of disruptive events to enhance the system’s resilience.

Currently, the presented approach focuses on the *I*-stage system, which is a general representation of most manufacturing factory platforms. However, further investigations are needed to implement gPC-SRPOC in manufacturing systems with various structures, such as custom workshops and flexible manufacturing systems, which involve more complex logistics and manufacturing processes (e.g., feedback material flows into earlier workstations). The DEDS model needs to be extended for such scenarios. In addition, approaches such as Bayesian inference and Markov chain Monte Carlo need to be incorporated for further uncertainty quantification.

## Acknowledgment

This work is supported by the National Science Foundation (Award Number 2119334) of the United States.

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The datasets generated and supporting the findings of this article are available from the corresponding author upon reasonable request.