## Abstract

In this study, we carry out robust optimal design for the machining operations, one key process in wafer polishing in chip manufacturing, aiming to avoid the peculiar regenerative chatter and maximize the material removal rate (MRR) considering the inherent material and process uncertainty. More specifically, we characterize the cutting tool dynamics using a delay differential equation (DDE) and enlist the temporal finite element method (TFEM) to derive its approximate solution and stability index given process settings or design variables. To further quantify the inherent uncertainty, replications of TFEM under different realizations of random uncontrollable variables are performed, which however incurs extra computational burden. To eschew the deployment of such a crude Monte Carlo (MC) approach at each design setting, we integrate the stochastic TFEM with a stochastic surrogate model, stochastic kriging, in an active learning framework to sequentially approximate the stability boundary. The numerical result suggests that the nominal stability boundary attained from this method is on par with that from the crude MC, but only demands a fraction of the computational overhead. To further ensure the robustness of process stability, we adopt another surrogate, the Gaussian process, to predict the variance of the stability index at unexplored design points and identify the robust stability boundary per the conditional value at risk (CVaR) criterion. Therefrom, an optimal design in the robust stable region that maximizes the MRR can be identified.

## 1 Introduction

The ongoing devastating COVID-19 pandemic has triggered turmoil in global supply chains, manifesting in shipping delays, rising costs for raw materials, shortages of finished components and parts, and as well as capacity crunches in transportation networks. In particular, a shortfall in semiconductors throws a wrench into a host of industries, such as the automobile, home appliances, and computer gadgets. In wake of this supply upheaval, automakers have scrambled to stockpile chips and other key parts, in antithesis of the “Just in Time” philosophy [1]. As the coronavirus contagion subsides and we gradually return to normalcy, the demand for chips is roaring up, which further complicates the logistical snags in chip acquisition. Against this backdrop, the scaleup of chip manufacturing has been brought to sharp relief. Quality assurance is the linchpin in scaleup manufacturing for the sought-after chips, which calls for optimal process design.

Remarkably, machining is a fundamental procedure in chip fabrication, including ultra-precision machining and chemical mechanical polishing [2–4]. Also known as material removal or subtractive manufacturing, the unwanted parts from a workpiece or wafer are scraped by cutting tools to achieve the desired surface roughness. The surface roughness here determines the chip performance: while the run-of-the-mill chips are made from the 28 nm wafers, chips in advanced AI settings require the 5 nm ones. One detrimental effect in a variety of machining operations (e.g., turning, drilling, and milling) is the self-exciting dynamics or regenerative chatter arising from the intrinsic time delay phenomenon, which results in exceedingly fierce tool vibration, premature tool wear/failure, and scrap parts due to unacceptable surface finish. It is noted that a high depth of cut usually induces a large cutting force, which in turn, magnifies the chance of chatter. On the other front, large-scale fabrication enforces a high production rate and hence a high depth of cut. Therefore, optimal design for machining operations hinges on the selection of process parameters (e.g., spindle speed and depth of cut) to achieve high productivity (or equivalently, material removal rate (MRR)) and avert the adverse chatter. Identification of the boundary that separates the stable from unstable design configurations in machining processes is a profound quest.

Craftsman knowledge and trial-and-error tests are traditionally used in machining process design, which, however, incurs tedious efforts and enormous costs [5]. Alternatively, computer simulations have been enlisted to characterize the machining process dynamics. This includes the delay differential equation (DDE) models that capture the quintessential phenomenon of time delay [6,7]. In contrast to ordinary differential equations, solutions to DDEs are of infinite dimensions, and sophisticated numerical approaches, such as the temporal finite element method (TFEM), are carried out to construct the approximate solutions. Thus, chatter avoidance boils down to discerning optimal machine settings that induce stable solutions of the DDEs. In addition, the inherent uncertainty in machining operations (e.g., material and process calibration) begs the question of robust process design. Indeed, a workshop on “Uncertainty in Machining” sponsored by the National Science Foundation in 2010 underscored uncertainty quantification and risk mitigation for machining and related manufacturing operations [8]. More recently, the Bayesian network was utilized to evaluate the impact of the random residual stress on machining performance [9]. The challenge resides in how to efficiently navigate the design space to delineate the contour of the stable regimes, graphically represented as a stability lobe diagram (SLD), and pinpoint the optimal design that maximizes the MRR in a robust sense considering the underlying uncertainty.

In this exposition, we propose an active learning multi-fidelity approach that integrates a high-fidelity simulation oracle (TFEM here) to solve the DDE model and a low-fidelity surrogate to mimic this simulator. More specifically, stochastic kriging (SK) is used as the surrogate to elucidate the relationship between the stochastic response (here, the stability index) and the design parameters and predict the response distribution at unexplored design points. On the back of such predictive distributions, active learning provides guidance on the sequential batch selection of the most informative design points that will be further evaluated by the high-fidelity TFEM. This considerably trims the computational cost compared to the high-fidelity simulator-only experimental design, and the nominal stability boundary can be attained cost-effectively. To tackle the uncertainty, we also incorporate the Gaussian process (GP) to model the heterogeneous variance of stability index across the design space and predict the variance at unexplored design points. This avoids the crude Monte Carlo (MC) sampling at each design point to derive the variance and further ameliorates the computational overhead. Subsequently, a risk measure, conditional value-at-risk (CVaR), is imposed on each candidate design point to construct the robust stability boundary. Next, design points within the robust stable region will be queried to maximize MRR.

The remainder of this article is organized as follows: related works on stability modeling in machining processes are presented in Sec. 2; the problem description, including the DDE model and TFEM, is illustrated in Sec. 3; the methodology, including sequential design, SK-active learning, Gaussian process, and CVaR are explained in Sec. 4; numerical results are be demonstrated in Sec. 5; and finally the last section concludes this study.

## 2 Related Work

Regenerative chatter is a major destructive phenomenon in machining operations, the avoidance of which via optimal design is indispensable for quality assurance and scaleup manufacturing. Numerical analytical approaches [10–12] have long been sought after to predict the regenerative chatter, including the finite element method (FEM) and Nyquist plots. In FEM, the governing equations of system dynamics are discretized into multiple elements to predict the behavior of cutting tools or other components in the design stage [13,14]. Nyquist plots are commonly used in the frequency response of the dynamical system, and the stability boundary is derived by repeatedly adjusting the cutting conditions (i.e., depth of cut or spindle speed) in the feasible range [15]. The numerical analytical approaches can also be classified into time and frequency domain methods. Landers and Ulsoy [16] adopted time domain simulation (TDS) with a nonlinear force model to track the evolution of the cutting tool dynamics in a face-turning process. Whereas the TDS is a straightforward simulation approach in chatter analysis, it entails tremendous computation effort, particularly considering a large design space. Greis et al. [17] applied the frequency domain stability analysis to identify the SLD by integrating physical experimental study and numerical analysis. In the frequency domain stability analysis reported in Ref. [18], Fourier transformation was implemented to transform the time domain force signal into the frequency domain displacement-to-force ratio signal to determine the analytical stability limit [18].

With the recent leap forward in sensing techniques to monitor the machining processes, data-driven methods have been extensively investigated. For instance, Khasawneh and Munch applied topological data analysis on vibration signals from a turning process to predict chatter: the time series was first transformed into a point cloud, and persistent homology was then applied to quantify the topological difference under stable and unstable cuttings [19]. Schmitz et al. [20] extracted the statistical variance of the once-per-revolution sampled audio signal to detect chatter based on the fact that regenerative chatter leads to a larger variance for the audio signal due to asynchronous motion. Axinte et al. [21] suggested that cutting forces were more sensitive to the burr formation and chatter marks caused by regenerative instability. Denkena et al. [22] treated the identification of SLD as a classification problem and compared the performance of kernel interpolation, support vector machine (SVM), and artificial neural network to pinpoint the stability boundary with training data from simulation and experimental studies. With in situ acceleration signals, continuous learning models, such as SVM, were developed to identify the SLD under different process dynamics [23]. In a similar vein, Friedrich et al. [24] combined reinforcement learning and nearest neighbor classification to extend the SLD obtained at a certain time frame to changing machining dynamics based on in situ vibration signal. Ringgaard et al. [5] formulated a constrained optimization problem to maximize MRR, under the penalty of chatter and forced vibration, and capture the trade-off between robustness and optimality for this optimization problem. Yet, uncertainty was not explicitly included in this study.

To quantify the impact of such uncertainty on machining stability, the probabilistic characterization of the regenerative chatter in turning was studied using both the MC method and the advanced first-order second moment method, and the reliability of stability was determined by comparing the actual depth of cut and the limiting values [25]. Park and Qin [26] applied the edge theorem to predict the stability boundary considering epistemic uncertainty of process parameters for both single- and multiple-degrees-of-freedom milling systems. Totis [27] treated the model parameters as random variables to obtain the probabilistic instead of deterministic stability lobes. A new criterion for system stability based on level curves and gradient of the probabilistic lobes was then applied to identify optimal robust stable cutting conditions. Che and Cheng applied generalized polynomial chaos to quantify the uncertainty propagation from the material properties to process stability via a TFEM solver [6]. Similarly, the fuzzy mathematical theory was adopted to account for the fuzzy stability lobes, each of which was assigned a membership value for the chatter status [28,29]. Löser et al. [28] investigated the robust SLD via the frequency domain approach. The tedious MC method was only implemented at a few key design points for stability assessment, and the solution was only approximated for other design points. However, how to select those critical design points and build the approximation solutions remain elusive.

Following the principle of Bayesian optimization, active learning has gained enormous traction in the literature on optimal engineering design. Active learning aims to select the most informative data points for further investigation and has been adopted in semi-supervised learning for data labeling, particularly in data imbalance settings. To explore the design space in a cost-effective manner, the probability of improvement [30,31], expected improvement (EI) [32,33], lower confidence bound [34], and knowledge gradient [35] have been attempted. For instance, the acquisition of high-dimensional data (e.g., imaging) in advanced manufacturing prevails for process quality monitoring, which demands, however, labor-intensive data annotation. An adaptive weighted uncertainty sampling was investigated in Ref. [36] that reduces the amount of necessary annotations by 20–70% in the data query of additive manufacturing. Che and Cheng [37] integrated active learning and relevance vector machine to sequentially probe the critical design points to locate the decision boundary in stability analysis of power systems. Botcha et al. [38] developed a query-by-committee active learning approach for selecting the next optimal experimental point to increase the prediction accuracy of surface roughness in a grinding process. More recently, deep learning (e.g., convolutional neural network (CNN)) has been adopted in active learning to handle wafer map pattern classification [39]. The CNN platform has been extensively used for spatial map data, and a softmax layer is added to represent the classification uncertainty, which is amenable for active learning. Similarly, Bayesian deep learning has been designed to automatically handle data and model uncertainty in active learning for labeling of high-dimensional image data [40]. Compared to those existing efforts, our research explicitly considers the intrinsic uncertainty at each design point to identify the nominal decision boundary, and a reliability-based design is further pursued using CVaR criteria.

## 3 Problem Description

Here, the time delay *τ* = 1/Ω is the time period for one revolution of the workpiece rotation determined by the spindle speed Ω (rounds/min). *z*(*t*) and *z*(*t* − *τ*) are the cutting tool displacement at time *t* and the previous revolution *t* − *τ*, respectively. The nominal feed rate *f*_{0} = 5 × 10^{−2} mm is kept constant in this study. The tunable parameters, depth of cut *b*, and spindle speed Ω, are deemed as the design variables. For illustration, we choose the range of *b* ∈ [0.0, 0.4]mm and Ω ∈ [1.5 × 10^{3}, 7.5 × 10^{3}] rounds/min in this study, which covers broad machine settings. Thanks to this intrinsic time delay, the effective feed rate *f*_{1} = *f*_{0} − *z*(*t*) + *z*(*t* − *τ*), consequently, *b*(*f*_{0} − *z*(*t*) + *z*(*t* − *τ*)) signifies the instantaneous cutting area. Thus, −*kb*(*f*_{0} − *z*(*t*) + *z*(*t* − *τ*)) represents the effective cutting force, proportional to the instantaneous cutting area at time *t*. That said, the phase difference of surface finish at two successive revolutions could amplify the cutting force and magnify fierce tool vibration, also called chatter, which typically results in inferior machined surfaces and premature tool wear. Moreover, with randomness in material and process calibration, we impose a normal distribution on each of the uncontrollable variables, including damping ratio *ξ*, natural frequency *ω*_{n}, and force coefficient *k*: $\xi \u223cN(0.02,0.0022)$, $\omega n\u223cN(600\pi ,(2\pi )2)$ rounds/s, and $k\u223cN(2\xd71011,52)N/m2$. The distributions of parameters were obtained from our prior investigation, and a comparatively small variability of the force coefficient *k* was reported [2]. To achieve high productivity, a certain level of MRR is oftentimes desired. Here we use the nominal MRR = *π*(*r*^{2} − (*r* − *b*)^{2}) × *f*_{0} × Ω and *r* is the radius of the workpiece. Yet, increase in the depth of cut inevitably leads to a high chance of chatter. Hence, it is indispensable to design the machining process to circumvent the chatter and simultaneously maximize the MRR.

The DDE of Eq. (1) does not admit a closed-form solution, and approximate algorithms, including Chebyshev collocation and TFEM, have been extensively adopted in literature to approximate *z*(*t*) [41]. To make this article self-contained, we elaborate on the TFEM solution for machining stability analysis. Concretely, the time delay or workpiece rotation period is divided into *E* elements, such that each element amounts to a time interval of *T*_{E} = *τ*/*E*. Then for the *n*th revolution, a polynomial representation for tool displacement in the *j*th element is given as

*ρ*(

*t*) =

*t*− (

*n*− 1)

*τ*− (

*j*− 1)

*T*

_{E}, and

*ρ*(

*t*) ∈ [0,

*T*

_{E}] represents local time in the

*j*th element in the

*n*th period.

*H*

_{i}(

*ρ*) are the cubic Hermite polynomial functions [41]:

*a*)

*b*)

*c*)

*d*)

The coefficient *a*_{ji} characterizes the displacement *z*(*t*) and velocity $z\u02d9(t)$ at the two end nodes of each temporal element (i.e., *ρ* = 0 or *ρ* = *T*_{E}). As depicted in Fig. 2 with *E* = 2 temporal elements for demonstration, the upper cuve represents the tool displacement *z*(*t*) and the lower curve represents the velocity $z\u02d9(t)$ at time *t*. $a11n\u22121$ and $a12n\u22121$ signify the displacement *z*(*t*) and velocity $z\u02d9(t)$ at the starting node (*ρ* = 0) of the first element in the (*n* − 1)th revolution; $a13n\u22121$ and $a14n\u22121$ represent *z*(*t*) and $z\u02d9(t)$ at the ending node (*ρ* = *T*_{E}) of the first temporal element in the (*n* − 1)*th* revolution. Likewise, $a21n$ and $a22n$ ($a23n$ and $a24n$) stand for *z*(*t*) and $z\u02d9(t)$ at the starting (ending) node for the second temporal element in the *n*th revolution. As the feasible solution complies with the continuity and boundary conditions, consistent *z*(*t*) and $z\u02d9(t)$ at the common node between two adjacent elements are imposed to regulate the TFEM solution, namely, $a13n\u22121=a21n\u22121$, $a14n\u22121=a22n\u22121$, as well as $a23n\u22121=a11n$ and $a24n\u22121=a12n$. In the computational implementation, a large *E* is the requisite for accurate approximation, and we use *E* = 100 in this study.

*a*

_{ji}, and this inevitably leads to the approximation residual. To minimize this approximation residual, all terms on both sides of Eq. (2) are projected onto a set of test functions Ψ

_{1}(

*ρ*) = 1 and Ψ

_{2}(

*ρ*) =

*ρ*/

*T*

_{E}− 1/2, in a Galerkin projection [42]. Therefore, discretization of the cutting tool dynamics for the

*j*th element in the

*n*th revolution is expressed as

*τ*in Eq. (1). Moreover, according to the aforementioned continuity and boundary conditions,

*j*th element, Eq. (5) here, into a global matrix equation for all the

*E*elements:

Denote *λ*_{G} as the maximum absolute eigenvalue of the transition matrix $G=N\u22121P$. According to the finite-dimensional Monodromy operator theory [6], the stability condition of Eq. (1) is *λ*_{G} < 1. This is equivalent to stability index *λ* = max(Re(log(*λ*_{G})))/*τ* < 0, which is also the criterion to gauge stability of the machining process in this study. In the deterministic case, the stability boundary is the contour of *λ* = 0 in the design space of *x* = [Ω, *b*]^{T}. Yet, the uncontrollable variables *ζ* = [*ξ*, *ω*_{n}, *k*]^{T} render the quest of stability boundary a stochastic experimental design problem. This enforces a large number of replications of TFEM according to *ζ* = [*ξ*, *ω*_{n}, *k*]^{T} at each *x* = [Ω, *b*]^{T} to derive the average response $\lambda (x)=E\zeta [\lambda (x,\zeta )]$ and quantify the intrinsic variance $V$(*x*) = Var(*λ*(*x*, *ζ*)), incurring the onerous computational cost. As shown in Sec. 5, $V$(*x*) is heterogeneous in the design space. The objective of this study is to seek the optimal machine settings (the design variables) that can achieve MRR above a certain level and, at the same time, avert the regenerative chatter in a robust sense.

## 4 Methodology

We adopt the SK surrogate to emulate the TFEM under uncertainty to capture how *λ*(*x*) varies across *x*, and an active learning scheme is further developed to seek the trade-off between the high-fidelity TFEM and low-fidelity SK. Thus, the nominal stability boundary, or the contour of *λ*(*x*) = 0, can be sequentially estimated by SK. We note that SK generates a predictive distribution for *λ*(*x*) as $\lambda ^(x)\u223cN(\lambda \xaf(x),V(\lambda (x)))$, where $\lambda \xaf(x)$ is the predictive mean and *V*(*λ*(*x*)) is the corresponding predictive variance. It bears mentioning that $V$(*x*) and *V*(*λ*(*x*)) are different. Furthermore, the intrinsic uncertainty $V$(*x*) associated with *λ*(*x*, *ζ*) entails a robust quantification of the nominal stability boundary, which necessitates further quantification of $V$(*x*). To avoid brute force replications at each *x* to estimate $V$(*x*), we fit a GP surrogate using existing TFEM simulations and MC runs and predict $V$(*x*) for design points near the nominal stability boundary estimated from SK-active learning. A CVaR measure is then imposed on those design points, and a robust boundary can be derived.

The overall flowchart for the optimal design of machining processes which seeks to simultaneously maximize the MRR and eschew the adverse regenerative chatter in a robust sense is depicted in Fig. 3. In this iterative process, design points from the initial slice of batch sampling and their MC replications at uncontrollable variables are evaluated via the TFEM, which constitutes the initial training set for SK. This statistical learning model is then adopted to predict the mean response of the stability index *λ*(*x*) at design points from the next slice. To ameliorate the prohibitive computational cost associated with TFEM and MC, we leverage the EI criterion in active learning to pick out the most informative design points in this new slice, and only those selected informative design points will be further evaluated by MC-TFEM. The evaluation results will be annexed into the training set. Therefrom, the SK model along with the estimated stability boundary is updated. This iteration continues until the estimated nominal stability boundary converges in the average sense. To ensure a robust stability boundary, we adopt risk measurement CVaR on design points close to the nominal stability boundary, which entails the valuation of the average response *λ*(*x*) and variance $V$(*x*) of the stability index *λ*(*x*, *ζ*). GP is utilized to estimate the variance $V$(*x*) for design points near the stability boundary. Therefrom, a grid search is conducted in the robust stability region to seek the design that maximizes the MRR. We note that this active learning approach, which reconciles the high-fidelity stochastic physical simulation and low-fidelity surrogate, only incurs a fraction of the computational cost for the stochastic simulation.

### 4.1 Fully-Sequential Space-Filling Design.

*C*

_{N}= {

*x*

_{1},

*x*

_{2}, …,

*x*

_{N}} with

*N*design points, which is a quasi-random fully-sequential sequence to uniformly cover a rectangular design region, albeit with poor space-filling properties. Then one design point is removed from

*C*

_{N}at each stage, and we denote the sequence of indices for the removed points as {

*i*

_{N},

*i*

_{N−1}, …,

*i*

_{1}}.

*i*

_{N}is the index for the first removed point and

*i*

_{1}is for the last removed one. At each stage

*m*=

*N*− 1,

*N*− 2, …, 0, we have the design points $Sm={xi1,\u2026,xim}$ with index set

*I*

_{m}= {

*i*

_{1}, …,

*i*

_{m}}. Thus, a sequence of nested design points with size from 1 to

*N*

_{c}are formed as $S1\u2282S2\u2282\u2026\u2282SNc\u2282CN$. At the core of FSSF-b, given

*I*

_{m+1}and

*S*

_{m+1}from stage

*m*+ 1, we seek to find the index

*i*

_{m+1}∈

*I*

_{m+1}for the next point $xim+1\u2208Sm+1$ to remove to form $Sm=Sm+1\u2216xim+1$ for each stage

*m*. The criterion to find the index

*i*

_{m+1}is the maximum of the minimal pairwise distance among the design

*S*

_{m+1}, i.e.,

*a*)

*b*)

*d*(

*x*

_{j},

*x*

_{k}) =

*x*

_{j}−

*x*

_{k}is the Euclidean distance between point

*x*

_{j}and

*x*

_{k}. The removed points are recorded reversely to explicate the sequential addition of design points. Generally, a large size

*N*for the Sobol sequence is required for achieving a number

*N*

_{c}of space-filling design points, and the relationship is empirically expressed as

*N*= 1000

*d*

_{m}+ 2

*N*

_{c}according to [43], where

*d*

_{m}is the dimensionality of the design points. Denote the

*N*

_{c}space-filling design points as $B={xi1,\u2026,xiNc}$, then the sequential batch sampling is given by design points on each of the

*n*slices as $B1={xi1,\u2026,xi\phi}$, $B2={xi\phi +1,\u2026,xi2\xd7\phi}$,…,$Bk={xi(k\u22121)\xd7\phi +1,\u2026,xik\xd7\phi}$,

*k*= 1, …,

*n*. Here,

*φ*=

*N*

_{c}/

*n*is the number of design points in each slice, and

*n*is the total number of slices. For demonstration purposes, we showcase the procedure of sequential batch sampling in Fig. 4, with

*φ*= 20.

### 4.2 Stochastic Kriging.

*x*= [Ω,

*b*]

^{T}. Therefore, for

*x*

_{i}= [Ω

_{i},

*b*

_{i}]

^{T}the simulation output at the

*j*

^{th}replication can be estimated as

*λ*

_{j}(

*x*

_{i}) =

*λ*(

*x*

_{i},

*ζ*

_{j}) denotes the stability index from TFEM in the

*j*th replication.

*f*is a predefined basis function that captures the overall trend (e.g.,

*f*(

*x*) = [1,

*x*,

*x*

^{2}, …]

^{T}), and

*β*represents the basis coefficients.

*M*is a zero-mean Gaussian random field and encapsulates the extrinsic uncertainty, characterized by a covariance function $Cov(x,x\u2032;L)=\sigma f2exp(\u221212(x\u2212x\u2032)L\u22121(x\u2212x\u2032)T)$, where $\sigma f2$ is the signal variance and

*L*is the diagonal length-scale matrix.

*ɛ*

_{j}(

*x*

_{i}) ∼ ℕ(0, $V$(

*x*

_{i})). $V$(

*x*

_{i}) is the intrinsic variance, and it is independent of the extrinsic uncertainty induced by

*M*. With

*N*

_{s}design points

*X*= [

*x*

_{1}, …,

*x*

_{Ns}]

^{T}and

*n*

_{i}replications at

*x*

_{i},

*i*= 1, 2, …,

*N*

_{s}, the response, namely, the average stability index

*λ*(

*x*) and the intrinsic variance $V$(

*x*), are acquired from TFEM and MC runs:

*a*)

*b*)

*x*

_{0}is expressed in terms of its first two moments:

_{ɛ}=

*diag*{$V$(

*x*

_{i})/

*n*

_{i}} is the intrinsic uncertainty matrix and $F=[f(x1),\u2026,f(xNs)]T$. The hyperparameters $\theta =[\beta ,\sigma f2,L]$ can be trained by the maximum log likelihood [7,49]. It is noteworthy that in SK, we estimate the mean of the stability index $\lambda ^(x0)$, rather than the stability index itself

*λ*

_{j}(

*x*

_{i}). Whereas the variance of

*λ*

_{j}(

*x*

_{i}) is irreducible as shown in Eq. (11), entries in the intrinsic variance matrix tend to vanish with a large number of replications.

### 4.3 Active Learning.

*λ*(

*X*) and $V$(

*X*), we obtain the predictive distribution of $\lambda ^(x)$ according to Eqs. (13) and (14) from SK. The nominal stability boundary is the contour of

*λ*(

*x*) = 0. Intuitively, design points far away from this nominal boundary barely contribute to refining the estimation. Therefore, active learning is promising to single out the most informative design points that will be further evaluated by the simulation oracle or the TFEM simulator with MC runs in this study. We adopt the EI criterion on estimation for sampling points. Specifically, new sampling points in each FSSF-b slice are fed into SK to estimate their response $\lambda \xaf(x)$ and the uncertainty

*V*(

*λ*(

*x*)), and then only the most informative ones according to the EI criterion will be further evaluated by the TFEM and MC runs to obtain

*λ*(

*x*) and $V$(

*x*). Those selected points along with their responses will be annexed into the training set to update the SK model and the nominal boundary. The EI criterion [47,50,51] involves a utility function at a new sampling point

*x*

_{0}:

*γ*

_{min}= min(|

*λ*

_{0}−

*λ*|), $\gamma e=|\lambda 0\u2212\lambda ^(x0)|$, and

*λ*

_{0}= 0 is the target value. $\lambda =[\lambda (x1),\u2026,\lambda (xNs)]T$ signifies the set of

*λ*(

*x*) computed from TFEM. $\lambda ^(x0)$ is the estimated stability index for input

*x*

_{0}from SK. Considering the uncertainty associated with SK prediction, the EI is defined as:

*x*

_{0}.

*ϕ*(

*γ*) and $\Phi (\gamma )$ are the probability density function and cumulative distribution function for the standard normal distribution, respectively. Intuitively, the EI tends to select

*x*

_{0}with mean response $\lambda \xaf(x0)$ close to the target

*λ*

_{0}or with large variance. Fifteen points with the highest EI scores are selected from each slice

*B*

_{k}in this study, and they will be annexed into the existing training set to update the design points and intrinsic variance. This iterative process is halted when the estimated nominal stability boundary converges. Here, we evaluate the convergence of the hyperparameters $\theta =[\beta ,\sigma f2,L]$ of SK. If the change of

*θ*is within 1%, then both the SK model and the fitted nominal boundary converge. For convenience, we denote the set of all design points selected for TFEM and MC as

*X*

_{G}.

### 4.4 Gaussian Process.

Yet, owing to the intrinsic uncertainty, the machining process may still lose stability for design points below the nominal boundary from the sequential SK algorithm. Thereby, it is necessary to assess the intrinsic uncertainty $V$(*x*) for design points in the vicinity of the nominal boundary to seek a more reliable boundary, which can be attained from TFEM and MC runs in the same way as we assemble the training set for SK. To avoid this laborious process, we draw on GP to predict the intrinsic variance for un-investigated design points in this vicinity, based upon the existing TFEM simulation and MC runs.

*x*) is the variance of

*λ*(

*x*,

*ζ*) at design point

*x*= [Ω,

*b*]

^{T}associated with uncontrollable variables

*ζ*= [

*ξ*,

*ω*

_{n},

*k*]

^{T}.

*f*

_{G},

*β*

_{G}, and

*M*

_{G}are similar to those terms defined in SK. Thus, given the training set $XG=[x1,\u2026,xNG]T$ and $VG=[V(x1),\u2026,V(xNG)]T$, the predictive distribution of the response at a test point

*x*

_{0}is given in terms of the first two moments

Here, $\Sigma MG\u2208RNG\xd7NG$ is the extrinsic covariance matrix with entity $\Sigma MG(i,j)=cov(MG(xi),MG(xj))$, and $\Sigma MG(x0,XG)=[cov(MG(x0),,\u2026$$MG(x1)),\u2026$. $cov(MG(x0),MG(xNG))]T$. $FG=[fG(x1),\u2026,fG(xNG)]T$. We use the first moment $V\xaf(x0)$ to characterize the variance of stability index *λ*(*x*_{0}, *ζ*), at a design point *x*_{0}.

### 4.5 Conditional Value-at-Risk.

*λ*(

*x*) = 0. With the uncontrollable variables

*ζ*= [

*ξ*,

*ω*

_{n},

*k*]

^{T},

*λ*(

*x*,

*ζ*) ∼ ℕ(

*λ*(

*x*), $V$(

*x*)), where $\lambda (x)=E\zeta [\lambda (x,\zeta )]$ and the intrinsic variance $V$(

*x*) = Var(

*λ*(

*x*,

*ζ*)). We define the probability of instability as

*p*

_{c}=

*P*[

*λ*(

*x*,

*ζ*) ≥ 0] at design point

*x*= [Ω,

*b*]

^{T}during the machining process. Given a target reliability or confidence level

*α*, the probability of instability

*p*

_{c}≤ 1 − α is the shaded area under the distribution density curve of

*λ*(

*x*,

*ζ*) in Fig. 5. In this illustrative example, we set $\alpha =99%$ and

*x*= [4.53 × 10

^{3}, 2.22 × 10

^{−1}]

^{T}, then the predicted stability index

*λ*(

*x*) from SK is $\lambda \xaf(x)=\u221213.4088$, and the predicted variance of

*λ*(

*x*,

*ζ*) from GP is $V$(

*x*) = 5.0876. The

*α*-quantile

*Q*

_{α}(

*λ*(

*x*,

*ζ*)) of

*λ*(

*x*,

*ζ*) at

*x*is given as

*λ*(

*x*,

*ζ*). The CVaR denoted as $Q\xaf\alpha (\lambda (x,\zeta ))$ is

*λ*(

*x*,

*ζ*) −

*Q*

_{α}(

*λ*(

*x*,

*ζ*))]

^{+}= max(0,

*λ*(

*x*,

*ζ*) −

*Q*

_{α}(

*λ*(

*x*,

*ζ*))). Following this, the CVaR can be derived at each design point

*x*= [Ω,

*b*]

^{T}, and the reliable stability regime of the machining process is $R={x|Q\xaf\alpha (\lambda (x,\zeta ))<0}$.

## 5 Numerical Result

We first conduct a full factorial design of 100 levels of spindle speed Ω ∈ [1.5 × 10^{3}, 7.5 × 10^{3}] round/min and depth of cut *b* ∈ [0.0, 0.4]mm. At each of the 100 × 100 design points of *x* = [Ω, *b*]^{T}, we enlist 500 crude MC runs of the TFEM for the uncontrollable variables *ζ* = [*ξ*, *ω*_{n}, *k*]^{T} to gauge the intrinsic uncertainty $V$(*x*). As depicted in Fig. 6, the uncontrollable variables induce heterogeneous variance $V$(*x*) of the response *λ*(*x*, *ζ*) across the design space. This further justifies the SK to accommodate the variance heterogeneity.

The true nominal stability boundary *λ*(*x*) = 0 attained from the MC runs at those 10,000 design points is showcased as the solid curves in Fig. 7, which, however, is prohibitively expensive. Next, the SK-active learning framework is called upon to ameliorate the computational expense. We start the initial slice with *N*_{s} = 100 space-filling points from the LHD, evaluate those design points with the TFEM and 500 MC runs to derive *λ*(*x*) and $V$(*x*), and train the SK and GP. This leads to a rough estimate of the nominal stability boundary. For the sequential batch sampling, a set *B* of *N*_{c} = 10,000 space-filling design points across *n* = 10 slices are included, and their responses are estimated from SK. Hence, the active learning algorithm selects the top 15 informative design points among *N*_{c}/*n* = 1,000 space-filling design points in each slice *B*_{k}, *k* = 1, …, 10 via the EI criterion. Those informative points are further fed into the TFEM and MC runs and then annexed into the training dataset to train SK and GP. Therefrom, the SK and estimated boundary are updated. We halt this iterative process after *n* = 10 slices since the estimated boundary converges.

The sequential estimate of the nominal stability boundary via SK-active learning at different slice numbers is shown in Fig. 7. The dots above the dashed line symbolize the informative design points in the unstable region, whereas the ones below the dashed line represent the informative design points in the stable region. The dashed curve is the predictive stability boundary in each slice, which approaches the true boundary (the solid curve) with the addition of new slices. To discern the subtle difference in the boundary estimation at the 6th and 10th slices, we further zoom into Figs. 7(c) and 7(d) and manifest the local contour in Figs. 7(e) and 7(f), respectively. Therefore, with only 100 + 15 × 10 = 250 design points, the estimated stability contour derived from this novel active learning framework is congruous with the nominal contour obtained from the crude MC with 10,000 design points.

As aforementioned, the SK-active learning framework only derives the nominal contour *λ*(*x*) = 0. Design points below this nominal boundary could still lead to unstable dynamics owing to the intrinsic uncertainty. In an attempt to certify the stochastic optimal design, it is imperative to appraise the risk and quantify the variance $V$(*x*) at design points on the underside in the vicinity of the estimated stability boundary. To shun the computation of $V$(*x*) via the crude MC, we apply GP, trained on the same training dataset in SK-active learning, to predict $V$(*x*) for design points in this critical proximity. To demonstrate the accuracy of variance prediction, 300 design points are randomly selected from the underside vicinity, and the comparison of predictive $V$(*x*) from GP and the valuation from the onerous TFEM and MC runs is illustrated in Fig. 8. A high predictive accuracy registered here suggests that the predictive variance can be adopted for the CVaR formulation. Consequently, the CVaR is utilized to acquire the reliable boundary under the 99% reliability constraint, shown as the blue curve in Fig. 9. That said, we can safely select the design points below this reliable stability boundary to ensure the avoidance of chatter with a probability of 99%. The dashed line here is the stability boundary estimated by the SK-active learning the same as those in Fig. 7.

To verify the robust stability boundary, we randomly select three points in the design space and evaluate the response according to the tedious TFEM and 500 MC runs of the uncontrollable variables. We also indicate the probability of instability *P*[*λ*(*x*, *ζ*) ≥ 0] at each verifying point. As depicted in Fig. 9, point A (the square) is just above the nominal stability boundary registers *P*[*λ*(*x*, *ζ*) ≥ 0] = 0.72, point B (the triangle) residing between the nominal and robust stability boundary showcases *P*[*λ*(*x*, *ζ*) ≥ 0] = 0.36, and point C (the star) below the robust stability boundary notches *P*[*λ*(*x*, *ζ*) ≥ 0] = 0.00. This underscores the necessity to include intrinsic uncertainty to identify the robust stability boundary. Moreover, we test the running time for crude MC and our novel active learning framework on a computer with a processor of i7-8700 CPU, 3.7 GHz, and six cores. It takes 365.87 min for the crude MC to estimate the stability boundary, whereas it only costs 12.71 min in our approach. Thus, our novel approach slashes over 95% of the computational overhead required by the crude MC.

Next, we evaluate MRR at each design point within the robust stability region and identify the maximal MRR. As illustrated in Fig. 9, at point D with Ω = 5984.85 rounds/min and *b* = 0.35 mm (the diamond), the maximal MRR of $3.26\xd7104mm3/min$ is achieved. Further, a baseline MRR is often needed in engineering practice. For demonstration, we set the baseline for MRR as $2.50\xd7104mm3/min$, and then seek the design that achieves an MRR above this baseline, showcased as the shaded area in Fig. 9. This gives the operators leeway to choose from a set of machine settings that avoid the adverse chatter and at the same time achieves an MRR above this baseline.

## 6 Discussion and Conclusions

In the wake of the ongoing COVID-19 pandemic, the chip shortfall is exacting huge economic tolls on a variety of industries. This underscores the scaleup manufacturing of chips, which necessitates optimal design for quality assurance. In this present work, we investigate the robust optimal design of machining processes, one key operation in chip fabrication, to maximize the MRR and eschew the regenerative chatter, considering the inherent uncertainty. While computer simulations have been widely used to locate the stability boundary in machining, they generally incur a huge computational cost, not to mention the replications to account for the uncertainty. We adopt an active learning framework that integrates the stochastic surrogate SK with the high-fidelity yet tedious TFEM and crude MC runs to predict the stability index at querying design points. The EI criterion is called upon to sequentially identify the critical design points that are informative to locate the stability boundary, and only those critical design points are evaluated via the TFEM. Following this, a GP surrogate is utilized to predict the variance of the stability index at design points below the nominal stability boundary, and CVaR is used to ensure a robust solution. A maximal MRR of $3.26\xd7104mm3/min$ is achieved, and a set of reliable optimal designs that attain a baseline MRR of $2.50\xd7104mm3/min$ are also derived in the illustrative example. We emphasize that the stability boundary is sequentially identified in a cost-effective manner, taking into account the uncontrollable variables *ζ*. Whereas numerous studies have been reported in decision boundary approximation in a stochastic setting, including efficient global reliability analysis [57,58] and SVM-based limit state analysis [59–62], very few works consider intrinsic uncertainty, particularly in the study of machine stability. The stability boundary is oftentimes bumpy and difficult to fit compared with other reliability boundaries in those existing works. We hope our novel approach can be used in a variety of machining operations (e.g., nano/micromachining) and stimulate the interest in stochastic optimal design in the broad community. We highlight the following implications:

The intrinsic variance attributed to the uncontrollable variables

*ζ*provokes the robust design problem. TFEM and MC run at different realizations of*ζ*are used to establish the ground truth. To ameliorate the computational overhead, SK, as opposed to the kriging or Gaussian process, is utilized to emulate the high-fidelity TFEM and MC runs. It is noted that in SK, we are modeling the average stability index $\lambda (xi)=1/ni\u2211j=1ni\lambda j(xi)$ rather than the stability index*λ*_{j}(*x*_{i}).Identification of machining stability boundary is essentially a binary classification problem. In the deterministic case, both GP [7,63–65] and SVM have been adopted to approximate the decision boundary. As reported in our earlier work, SVM fails to capture the bumpy stability boundary in a sequential framework [7]. GP has mostly been used for deterministic problems. Very few, if any, existing works consider the active learning-driven sequential design under uncertainty. Hence, we compare the result from our innovative approach to the ground truth given by the TFEM and MC runs.

While we corroborate the proposed approach on a one-dimensional example, it can be easily extended to other high-dimensional machining operations (e.g., five-axis milling). Such high-dimensional processes typically incur enormous computational costs to derive the stability boundary via the high-fidelity simulation, and it is also cumbersome to explore the design space through active learning. Therefore, active learning is imperative for the optimal design for such complex processes.

## Acknowledgment

This work is supported by the National Science Foundation (Award Number 2119334) of the United States and the Smart Energy Transdisciplinary Area of Excellence at Binghamton University.

## Author’s Statement of Original Work

This work has been neither published nor submitted for publication, in whole or in part, either in a serial, professional journal, or as a part of a book that is formally published and made available to the public.

## Conflict of Interest

The authors claim that there is no conflict of interest.

## Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.