Reliability analysis involving high-dimensional, computationally expensive, highly nonlinear performance functions is a notoriously challenging problem in simulation-based design under uncertainty. In this paper, we tackle this problem by proposing a new method, high-dimensional reliability analysis (HDRA), in which a surrogate model is built to approximate a performance function that is high dimensional, computationally expensive, implicit, and unknown to the user. HDRA first employs the adaptive univariate dimension reduction (AUDR) method to construct a global surrogate model by adaptively tracking the important dimensions or regions. Then, the sequential exploration–exploitation with dynamic trade-off (SEEDT) method is utilized to locally refine the surrogate model by identifying additional sample points that are close to the critical region (i.e., the limit-state function (LSF)) with high prediction uncertainty. The HDRA method has three advantages: (i) alleviating the curse of dimensionality and adaptively detecting important dimensions; (ii) capturing the interactive effects among variables on the performance function; and (iii) flexibility in choosing the locations of sample points. The performance of the proposed method is tested through three mathematical examples and a real world problem, the results of which suggest that the method can achieve an accurate and computationally efficient estimation of reliability even when the performance function exhibits high dimensionality, high nonlinearity, and strong interactions among variables.

## Introduction

*x*

_{1},…,

*x*)

_{d}^{T}is a vector of

*d*random variables;

*f*(

**x**) is the joint probability distribution function (PDF) of

**x**; and $\Omega S$ denotes the safety region, defined as $\Omega S$ ={

**x**:

*G*(

**x**) < 0} with

*G*(

**x**) being the performance function (or response). The boundary,

*G*(

**x**) = 0, that separates the safety and failure regions is known as the limit-state function (LSF). In engineering practice, the increasing dimension and complexity of the performance function often make the solution of Eq. (1) computationally intractable. An alternative solution in light of this is to perform the direct Monte Carlo simulation (MCS), where the random variable (

**x**) space is randomly discretized to a large number of samples, $xk,k=1:N,$ based on the joint PDF of

**x,**$fx$. Then, the reliability can be estimated based on these random samples as

where the system safety indicator $I\Omega S$ equals 1 (0) given a safety (failure) state. Although the direct MCS can yield an accurate reliability estimate, achieving satisfactory accuracy often requires a large number of *G* function evaluations [2]. A popular alternative to the direct MCS is to use a surrogate model, $G\u0302(x)$, which is built with a much smaller number of sample points, to approximate the original performance function, $G(x)$ [3]. A large number of surrogate modeling methods have been proposed in the literature, among which our review is mainly focused on two categories of these methods: high-dimensional model representation (HDMR)-based methods [4–19] and kriging-based methods [20–24].

The HDMR-based methods simplify the construction of the surrogate model by decomposing a high-dimensional performance function into several low-dimensional functions (or low-order component functions) [4,5]. In most practical applications, the response of an engineered system can be largely determined by the low-order interactions among the input variables, which makes HDMR an attractive option in these applications. HDMR was initially applied for efficient multivariate representations in chemical system modeling [6–8]. Numerical experiments showed that HDMR-based methods are promising for simulation-based design of high-dimensional systems [9]. Two of the primary and well-known HDMR-based methods are univariate dimension reduction (UDR) and bivariate dimension reduction (BDR), which get their names on the basis of whether the performance function is decomposed into first- or second-order component functions [10,11]. Each component function can be approximated with an interpolation technique, and a global surrogate model of the performance function can then be built based on the summation of the approximated component functions. The HDMR-based methods thereafter have been further developed for the application in an expanded range of areas (e.g., design optimization [12], metamodeling [13], and reliability analysis [14]). In design optimization, Li et al. [12] integrated HDMR with an expected improvement sampling strategy that compensates for the underestimation of the response uncertainty by the kriging predictor. The resulting HDMR-based method achieved satisfactory search performance in deterministic optimization. In metamodeling, Foo and Karniadakis [13] proposed the combined use of HDMR and stochastic collocation, where HDMR is first used to decompose a high-dimensional function into several low-dimensional functions, and multi-element probabilistic collocation is then used to approximate these low-dimensional functions. In reliability analysis, the random input variables could be statistically dependent and in the form of intervals. Xie et al. [14] proposed an efficient interval analysis algorithm based on the HDMR expansion to handle-dependent interval input variables in reliability analysis. Hu and Youn [15] proposed the adaptive dimension decomposition and reselection method that can automatically identify potentially important low-order component functions and adaptively reselect the truly important ones. Li et al. [16] coupled HDMR with an intelligent sampling strategy for building global surrogate models on nonlinear problems. To integrate HDMR with the sampling strategy, a projection-based intelligent method was introduced to locate the sample points along the corresponding cuts (lines, planes, or hyperplanes of high orders). This method treats all dimensions equally and is able to track the nonlinear regions in the input space. The component functions in HDMR, resulting from the decomposition of a performance function, can also be approximated by a family of linearly independent basis functions and then metamodeling or reliability analysis can be performed over the approximated component functions [17–19]. The basis functions, such as orthogonal polynomials and cubic splines, are defined as uni- or multivariate functions in a way that a proper combination of these basis functions can reflect the true behavior of the performance function. Hajikolaei and Wang [17] proposed the integration of principal component analysis with HDMR for finding the coefficients of orthogonal basis functions that provide the best linear combination of the bases with minimum error. Liu et al. [18] proposed the generalized radial basis function-based HDMR method, which employs the virtual regular points, projected from the random points, to update the predictions by the basis functions. The radial basis function-based HDMR method was further developed by using a recently proposed proportional sampling strategy that allows adding first-order sample points to efficiently construct each component basis function [19].

Given the negligibly weak high-order interactions, HDMR-based methods with low-order component functions would provide an efficient and accurate approximation of the performance function. However, several limitations of these HDMR-based methods do exist: (i) they usually consider only the low-order interactive effects among input variables; (ii) increasing the order of component functions may require a dramatic increase in the number of function evaluations (e.g., building the first-order component functions for a *d*-dimensional performance function only needs *nd* + 1 function evaluations with *n* + 1 equal to the number of sample points along each dimension; however, building the *m*th-order component functions with *m* ≥ 2 requires $dmnm$ function evaluations^{1}); and (iii) most of these methods build a surrogate of the performance function based on the function values determined at a set of structured and predefined grid points, thus lacking the flexibility in choosing the locations of the sample points.

Kriging-based methods first construct an initial surrogate model of the performance function with an initial set of observations (e.g., at a set of sample points generated by Latin Hypercube sampling), and then continuously refine the surrogate model by identifying and adding new sample points until adequate accuracy is achieved [20–29]. Two well-known kriging-based reliability analysis methods are efficient global reliability analysis and maximum confidence enhancement, both of which consider the prediction uncertainty and LSF proximity in selecting a new sample point [20–22]. A recently developed method, namely, sequential exploration–exploitation with dynamic trade-off (SEEDT), introduces an exploration–exploitation trade-off coefficient that adaptively weighs exploration and exploitation throughout sampling process to achieve a more optimal balance between these two criteria [23,24]. Two unique features that kriging-based methods possess are: (i) they are flexible in choosing the location of a new sample point, which enables these methods to sample in an optimized sequence; and (ii) they provide a probabilistic surrogate model that quantifies both the prediction of a performance function and its uncertainty. With these features, kriging-based methods have been successful in reliability analysis involving system performance functions that are computationally expensive to evaluate, and implicit and unknown to the user. Additionally, kriging-based surrogate modeling has also been successfully applied in real-time processing of multisensor signals for structural health monitoring [25]. However, these success stories have been on problems of low (typically < 10) dimension. When it comes to high-dimensional problems, the efficiency of these methods diminishes dramatically. It has been reported in the machine learning area that kriging does not scale to high-dimensional problems [30,31]. This severely limits the application of kriging to reliability analysis of systems that involve the use of complex simulation models with large dimensions (e.g., > 30 parameters) and expensive simulations (e.g., hours of run-time for numerically solving differential equations in one simulation).

In this paper, we tackle the above challenges by addressing efficient reliability analysis involving high-dimensional, computationally expensive, highly nonlinear performance functions. Due to the above mentioned limitations, neither HDMR- nor kriging-based methods are capable of providing an accurate and computationally efficient solution for these challenging problems. However, we find that appropriately modifying these two methods and optimally combining their strengths can lead to the development of a feasible solution that is capable of (i) capturing the strong interactions among variables; (ii) alleviating the curse of dimensionality; and (iii) flexibly choosing the locations of new sample points. Built upon this finding, this study proposes a new method for reliability analysis, named as high-dimensional reliability analysis (HDRA). The proposed HDRA method decomposes the task of surrogate model construction for reliability analysis into two sequential steps: (i) adaptive univariate sampling (with UDR and SEEDT) for building a global surrogate model; and (ii) kriging-based multivariate sampling (with SEEDT) to refine the global surrogate model in highly uncertain regions close to the LSF. At the first step, a newly proposed method, named as adaptive UDR (AUDR), adaptively locates significant univariate sample points by decomposing the important multivariate points identified by SEEDT. A UDR-based global surrogate model is then built based on these univariate sample points. At the second step, the global surrogate model is used as the trend function in the kriging model, and SEEDT further refines the surrogate model by sequentially locating critical multivariate sample points in highly uncertain regions close to the LSF.

The reminder of this paper is organized as follows: Section 2 gives a brief review on two related methods, UDR and SEEDT. Section 3 introduces the proposed HDRA method, a hybrid of AUDR and SEEDT, for reliability analysis. The effectiveness of the proposed method is demonstrated using four mathematical and engineering examples in Sec. 4. Section 5 provides several concluding remarks.

## Review of Previous Methods

### Univariate Dimension Reduction.

*d*-dimensional performance function can be decomposed in a hierarchical manner as [11]

*G*(

_{i}*x*) is the first-order component function and captures the effect of a single variable

_{i}*x*on the performance function [11]. The interactive effects among variables are represented by the rest of the terms which are the second- and higher-order component functions. Considering only the zeroth- and first-order component functions leads to the definition of the UDR method, which is described by [10]:

_{i}where $\u03f5(x)$ is the truncation error of the univariate approximation, caused by dropping the interactive effects among variables on the response. Table 1 describes explicitly the UDR method in a pseudo-code. The algorithm starts by generating 3 or 5 sample points along each dimension (line 2). After evaluating the performance function at the sample points, each component function $Gi(xi)$ is approximated based on the cubic spline interpolation (line 5). At the end, the global surrogate model is built based on Eq. (4) (line 7).

Algorithm 1: Univariate dimension reduction |

1 fori = 1: ddo |

2 Initialize sample points with an initial set of univariate points: |

$xi,jj=1:3=\mu i,\mu i\xb13\sigma i$ for n = 2, or |

$xi,jj=1:5={\mu i,\mu i\xb11.5\sigma i,\mu i\xb13\sigma i}$for n = 4 |

3 Observe the performance function at $xi,j$: |

$yi,j=G(x=(xi,j\u2212\mu i)\xb7ei+\mu )$ |

4 Build a sample data set for surrogate modeling: |

$Di={(xi,j,yi,j)}j=1:n+1$ |

5 Approximate the ith first-order component function with cubic spline interpolation: |

$G\u0302i(xi)=Gspline(xi|Di)$ |

6 end for |

7 Build the global surrogate model: |

$G\u0302x=G0+\u2211i=1dG\u0302ixi$ |

8 Perform MCS on the approximated performance function for reliability estimation |

Algorithm 1: Univariate dimension reduction |

1 fori = 1: ddo |

2 Initialize sample points with an initial set of univariate points: |

$xi,jj=1:3=\mu i,\mu i\xb13\sigma i$ for n = 2, or |

$xi,jj=1:5={\mu i,\mu i\xb11.5\sigma i,\mu i\xb13\sigma i}$for n = 4 |

3 Observe the performance function at $xi,j$: |

$yi,j=G(x=(xi,j\u2212\mu i)\xb7ei+\mu )$ |

4 Build a sample data set for surrogate modeling: |

$Di={(xi,j,yi,j)}j=1:n+1$ |

5 Approximate the ith first-order component function with cubic spline interpolation: |

$G\u0302i(xi)=Gspline(xi|Di)$ |

6 end for |

7 Build the global surrogate model: |

$G\u0302x=G0+\u2211i=1dG\u0302ixi$ |

8 Perform MCS on the approximated performance function for reliability estimation |

### Sequential Exploration–Exploitation With Dynamic Trade-Off.

The SEEDT method first builds an initial surrogate model with kriging and then sequentially refines the model. For the updating of the surrogate model, expected utility is used as the acquisition function to locate the next sample point. The sample points are sequentially chosen according to a new exploration–exploitation scheme that adaptively weighs exploring the region with high prediction uncertainty and exploiting the probable location of the LSF throughout the sampling process [23]. The exploration criterion ensures that the sequential sample points would not cluster in one region because the algorithm in SEEDT explores the whole input space for selecting the new sample points. In other words, if the algorithm places a number of sample points in one nonlinear area, the acquisition function will tend to guide the exploration to other nonlinear areas with higher prediction uncertainty. The sequential sampling procedure of this method is listed in Table 2. The algorithm starts by generating several initial Latin hypercube points using the Latin hypercube sampling technique. The initial surrogate model is built with kriging upon the initial sample points (line 1). At each iteration, a new sample point is located by maximizing the expected utility function *EU* (lines 3–6). The surrogate model is then updated with the augmented data set (lines 7, 8).

Algorithm 2: Sequential exploration-exploitation with dynamic trade-off |

1 Construct an initial surrogate model of the performance function G(x) with initial data set $D0$ |

2 fort = 1: Tdo |

3 Compute the exploration-exploitation trade-off (EET) coefficient, α^{t} |

4 Construct the acquisition function EU over x |

5 Locate the new sample point $xt$ that maximizes EU: |

$xt=argmaxEU(x|Dt\u22121)$ |

6 Observe the performance function at $xt$: |

$\u2003yt=G(xt)$ |

7 Supplement the data set with the new data: |

$Dt=Dt\u22121\u222a{xt,yt}$ |

8 Refine the surrogate model according to the updated data set $Dt$ |

9 end for |

10 Evaluate the reliability with the surrogate model by performing MCS |

Algorithm 2: Sequential exploration-exploitation with dynamic trade-off |

1 Construct an initial surrogate model of the performance function G(x) with initial data set $D0$ |

2 fort = 1: Tdo |

3 Compute the exploration-exploitation trade-off (EET) coefficient, α^{t} |

4 Construct the acquisition function EU over x |

5 Locate the new sample point $xt$ that maximizes EU: |

$xt=argmaxEU(x|Dt\u22121)$ |

6 Observe the performance function at $xt$: |

$\u2003yt=G(xt)$ |

7 Supplement the data set with the new data: |

$Dt=Dt\u22121\u222a{xt,yt}$ |

8 Refine the surrogate model according to the updated data set $Dt$ |

9 end for |

10 Evaluate the reliability with the surrogate model by performing MCS |

**x**, and $\mu G\u0302$ and $\sigma G\u0302$ are the mean and standard deviation, respectively, of the prediction derived from the kriging model. The kriging model builds a Gaussian process using the residual

*Z*over the trend function

*m*(

**x**), which can be expressed as [26]

*G*(

**x**) is captured by $m(x)$, while $Z$ predicts the process noise. Given a sample data set $Dt={x1,y1,\u2026,xt,yt}$, the correlation matrix $\Psi $ is expressed as

In both kernel functions, $\theta $ is the hyper-parameter vector that determines the smoothness of the prediction. The maximum likelihood estimation technique is performed to estimate these hyper-parameters. Figure 1 schematically illustrates the correlation between two arbitrary points $xandx\u2032,$using both the squared exponential kernel (*k*_{se}) and additive kernel (*k*_{add}) functions with $\theta =1,1T$ in a two-dimensional (2D) space. It can be observed that *k*_{se} is related to the effect of the direct bivariate distance, *l*, between $xandx\u2032$, while *k*_{add} corresponds to the sum of the effects of the decomposed univariate distances, *l*_{1} and *l*_{2}, between these two points.

**x**. The mean and standard deviation of the prediction are expressed, respectively, as [26]

where $y=y1,\u2026,ytT$ is a vector of *t* responses, $M$ is the trend function value at sample points {$x1,\u2026,xt$}, and $r(x)=\psi (x,x1),\u2026,\psi (x,xt)T$ is a correlation vector [26].

## High-Dimensional Reliability Analysis

The overall flowchart of the proposed HDRA method is shown in Fig. 2. The method is composed of two sequential steps. At the first step, AUDR adaptively identifies significant univariate sample points by first identifying important multivariate sample points with SEEDT and then decomposing the multivariate points into univariate points along all active dimensions. This step builds a global surrogate model with Eq. (4) while ensuing satisfactory approximation accuracy of the univariate component function along each dimension. Upon the completion of adaptive univariate sampling in AUDR, the global surrogate model is used in step 2 as the trend function in the kriging model, and SEEDT further refines the surrogate model by sequentially locating critical multivariate sample points in regions near the LSF and with high prediction uncertainty. Table 3 describes explicitly the procedure of HDRA. Steps 1 and 2 are described in lines 1–14 and 15–24 of the pseudo-code, respectively. In what follows, the two main steps are explained in further detail.

Algorithm 3: High-dimensional reliability analysis |

Step 1: Building Global Surrogate with AUDR |

1 Initialize active index vector: |

$I=ones(d,1)$ |

2 Initialize sample points with an initial set of univariate points: |

${xi,j0}j=1:3={\mu i,\mu i\xb13\sigma i}$, for n = 3, or |

${xi,j0}j=1:5={\mu i,\mu i\xb11.5\sigma i,\mu i\xb13\sigma i}$, for n = 5 |

3 Observe the performance function at the initial univariate points: |

$yi,j0=Gi(x=(xi,j0\u2212\mu i)\xb7ei+\mu )$ |

4 Build an initial sample data set for surrogate modeling: |

$D0=(xi,j0,yi,j0)i=1:d,j=1:n+1$ |

5 t = 1 |

6 whilet < T & I ≠ 0 do |

7 Update the active index vector I and the surrogate model: |

$[I,G\u0302AUDR]=update(Dt\u22121)$ |

8 Construct the kriging model of $G(x)$ based on $Dt\u22121$: |

$G\u0302x=GPxG\u0302AUDR,\Psi ,Dt\u22121$ |

9 Identify the new multivariate point $xt$ that maximizes the acquisition function EU: |

$xt=argmaxEU(x|Dt\u22121)$ |

10 Generate univariate sample points according to I: |

$xit=(xt\u2212\mu ).ei+\mu i$ if $Ii=1$ |

11 Observe the performance function at $xit$: |

$yit=Gxit=xit\u2212\mu i\xb7ei+\mu $ |

12 Supplement the data set with the new data: |

$Dt$ = $Dt\u22121\u222axit,yit1\u2264i\u2264d|Ii=1$ |

13 t = t +1 |

14 end while |

Step 2: Refining Global Surrogate with SEEDT |

15 Build an initial kriging model based on $Dt$ from Step 1: |

$G\u0302x=GPxG\u0302AUDR,\Psi ,Dt$ |

16 $k=1$ |

17 whilek < K & CE > CE_{th}do |

18 Select the new multivariate point $xt+k$ that maximizes EU: |

$xt+k=argmaxEUxDt+k\u22121$ |

19 Observe the performance function at $xt+k$: |

$yt+k=G(xt+k)$ |

20 Supplement the data set with the new data: |

$Dt+k=Dt+k\u22121\u222a{xt+k,yt+k}$ |

21 Update the kriging model with $Dt+k:$ |

$G\u0302x=GP(x|G\u0302AUDR,\Psi ,Dt+k)$ |

22 Evaluate the convergence estimator (CE): |

$CE=\u222b|G(x)|<a\sigma (x))fx(x)dx$ |

23 k = k+1 |

24 end while |

25 Evaluate the reliability with the kriging model $G\u0302x$ by performing MCS. |

26 $funtion[I,G\u0302AUDR]=update(Dt)$ |

27 fori = 1: ddo |

28 Approximate the ith first-order component function with spline interpolation: |

$G\u0302ixi=GsplinexiDit$ |

29 Compute information gain $\xi it$ for the ith dimension with Eq. (12). |

30 Evaluate I with Eq. (13)._{i} |

31 end for |

32 Build the global surrogate model |

$G\u0302AUDRx=G0+\u2211i=1dG\u0302ixi$ |

33 end function |

Algorithm 3: High-dimensional reliability analysis |

Step 1: Building Global Surrogate with AUDR |

1 Initialize active index vector: |

$I=ones(d,1)$ |

2 Initialize sample points with an initial set of univariate points: |

${xi,j0}j=1:3={\mu i,\mu i\xb13\sigma i}$, for n = 3, or |

${xi,j0}j=1:5={\mu i,\mu i\xb11.5\sigma i,\mu i\xb13\sigma i}$, for n = 5 |

3 Observe the performance function at the initial univariate points: |

$yi,j0=Gi(x=(xi,j0\u2212\mu i)\xb7ei+\mu )$ |

4 Build an initial sample data set for surrogate modeling: |

$D0=(xi,j0,yi,j0)i=1:d,j=1:n+1$ |

5 t = 1 |

6 whilet < T & I ≠ 0 do |

7 Update the active index vector I and the surrogate model: |

$[I,G\u0302AUDR]=update(Dt\u22121)$ |

8 Construct the kriging model of $G(x)$ based on $Dt\u22121$: |

$G\u0302x=GPxG\u0302AUDR,\Psi ,Dt\u22121$ |

9 Identify the new multivariate point $xt$ that maximizes the acquisition function EU: |

$xt=argmaxEU(x|Dt\u22121)$ |

10 Generate univariate sample points according to I: |

$xit=(xt\u2212\mu ).ei+\mu i$ if $Ii=1$ |

11 Observe the performance function at $xit$: |

$yit=Gxit=xit\u2212\mu i\xb7ei+\mu $ |

12 Supplement the data set with the new data: |

$Dt$ = $Dt\u22121\u222axit,yit1\u2264i\u2264d|Ii=1$ |

13 t = t +1 |

14 end while |

Step 2: Refining Global Surrogate with SEEDT |

15 Build an initial kriging model based on $Dt$ from Step 1: |

$G\u0302x=GPxG\u0302AUDR,\Psi ,Dt$ |

16 $k=1$ |

17 whilek < K & CE > CE_{th}do |

18 Select the new multivariate point $xt+k$ that maximizes EU: |

$xt+k=argmaxEUxDt+k\u22121$ |

19 Observe the performance function at $xt+k$: |

$yt+k=G(xt+k)$ |

20 Supplement the data set with the new data: |

$Dt+k=Dt+k\u22121\u222a{xt+k,yt+k}$ |

21 Update the kriging model with $Dt+k:$ |

$G\u0302x=GP(x|G\u0302AUDR,\Psi ,Dt+k)$ |

22 Evaluate the convergence estimator (CE): |

$CE=\u222b|G(x)|<a\sigma (x))fx(x)dx$ |

23 k = k+1 |

24 end while |

25 Evaluate the reliability with the kriging model $G\u0302x$ by performing MCS. |

26 $funtion[I,G\u0302AUDR]=update(Dt)$ |

27 fori = 1: ddo |

28 Approximate the ith first-order component function with spline interpolation: |

$G\u0302ixi=GsplinexiDit$ |

29 Compute information gain $\xi it$ for the ith dimension with Eq. (12). |

30 Evaluate I with Eq. (13)._{i} |

31 end for |

32 Build the global surrogate model |

$G\u0302AUDRx=G0+\u2211i=1dG\u0302ixi$ |

33 end function |

### Building Global Surrogate With Adaptive Univariate Dimension Reduction (Step 1).

*d*-dimensional index vector,

**I**= (

*I*

_{1},…,

*I*)

_{d}^{T}, indicating which dimensions are active. Let $G\u0302itxi$ and $G\u0302it+1xi$ be the approximation of the first-order component function at two sequential iterations,

*t*and

*t*+ 1, along the

*i*th dimension. The amount of improvement in $G\u0302ixi$, by adding a new sample point $xit+1$ to the data set, can be considered as the amount of

*information*gained. The

*information*gain takes the following form:

*i*th dimensions, $G\u0302max$ and $G\u0302min$ are the maximum and minimum predicted values, respectively, of the performance function at the latest iteration (i.e., iteration

*t*), and $Efxi\xb7$ is the expectation operator with respect to the marginal probability distribution of the

*i*th random variable

*x*. Let $e0$ denote a predefined threshold. If $\xi it<e0$, then adding more sample points hardly changes our current belief about the

_{i}*i*th component function. On the contrary, if $\xi it>e0$, more sample points are required to be added to enhance our belief about the

*i*th component function. The

*i*th element in the index vector at iteration

*t*indicates the activeness of the

*i*th dimension, and can be evaluated based on the information gain along the

*i*th dimension as

It should be mentioned that the first part of the algorithm (lines 1–14) does not consider the interactive effects among variables, while the second part (lines 15–24) efficiently captures the interactions among variables by identifying multivariate sample points in regions near the LSF and with high uncertainty in prediction. We use the additive kernel function, Eq. (9), in the correlation matrix $\Psi $ for determining the correlation between sample points.

It can be observed from the LSF that the performance function is more highly nonlinear along the $x1$ dimension. UDR treats both dimensions to be of equal importance, while AUDR identifies the higher degree of nonlinearity along the $x1$ dimension and adaptively assigns more univariate points to the $x1$ dimension and fewer points to the $x2$ dimension. Furthermore, AUDR allocates more points to regions that potentially contribute more to improving the accuracy in the LSF and reliability predictions.

### Refining Global Surrogate With Sequential Exploration–Exploitation With Dynamic Trade-Off (Step 2).

AUDR does not incorporate any multivariate sample points during the sampling process, and thus the resulting global surrogate of the performance function lacks the ability to capture the interactive effects among variables. To address this limitation, after step 1 is completed, the SEEDT method is utilized again to adaptively refine the surrogate model by sequentially locating multivariate sample points in regions close to the LSF and with high prediction uncertainty. For the refinement of the global surrogate, multivariate sample points are chosen according to a sequential exploration–exploitation scheme in SEEDT that dynamically weighs exploring the regions with high prediction uncertainty and exploiting the ones close to the LSF. The procedure of SEEDT (see Algorithm 2 in Sec. 2.2) is used here as step 2 of the HDRA method, and the surrogate model built in step 1, $G\u0302AUDRx,$ is employed as the trend function of the kriging model in Eq. (6). To allow for the consideration of the interactions among variables, this step uses the squared exponential kernel in Eq. (8) for building the correlation matrix. After building the initial kriging model, a new sample point $xk$ is chosen by maximizing the expected utility function (line 18). This new point is then added to the data set, and the kriging model is updated with the augmented data set (lines 19–21). This procedure is repeated until the target accuracy (see the convergence estimator in line 22) is achieved or the maximum number of iterations is reached (line 17). Interested readers are referred to Ref. [23] for more information about the convergence estimator in SEEDT.

### Discussion.

As described earlier, HDRA consists of two sequential steps: (i) adaptive univariate sampling (with AUDR) for building a global surrogate that captures the global trend of a performance function (see Sec. 3.1) and (ii) kriging-based multivariate sampling (with SEEDT) to refine the global surrogate for capturing the variate interactions in the performance function (see Sec. 3.2). AUDR in step 1 does not add any multivariate samples to the sample data set and thus the resulting global surrogate of the performance function lacks the ability to capture the interactive effects among variables. To address this limitation of the global surrogate, SEEDT in step 2 sequentially identifies multivariate samples in regions close to the LSF and with high prediction uncertainty. The addition of these multivariate samples produces a refined surrogate that is capable of capturing the variate interactions in the performance function.

It is important to note that the initial samples (i.e., the univariate samples added in step 1) greatly influence the accuracy and efficiency of the proposed HDRA method. In what follows, we will elaborate this influence based on two extreme scenarios:

Scenario 1—considering too many initial samples: The addition of more initial samples in step 1 (AUDR) would result in higher accuracy in the approximation of the first-order component functions in the UDR model of the performance function (see Eq. (4)). This UDR model, regardless of how many univariate samples are incorporated, cannot capture the interactive effects among variables. If the performance function contains strong variate interactions, the kriging-based multivariate sampling in step 2 (SEEDT) is always essential, as it overcomes the limitation of the UDR approximation in step 1. In this regard, considering too many initial univariate samples in step 1 could exhaust the computing budget, thereby allowing for the addition of only few multivariate samples in step 2 and producing an inaccurate surrogate model that does not adequately represent the variate interactions.

Scenario 2—considering too few initial samples: The smallest number of initial samples in step 1 is 2

*d*+ 1, where*d*is the number of random variables in the performance function. Assuming only 2*d*+ 1 univariate samples are considered under this scenario, step 1 produces an initial global surrogate model built with UDR without adaptive enrichment of univariate samples along any dimension. This initial UDR model may fail to capture the global trend of the performance function, especially along dimensions with high degrees of nonlinearity. In such cases, no matter how many multivariate samples are added in step 2 to refine the global surrogate, the surrogate may always fail to capture both the high dimensionality and strong variate interactions. Under this scenario, the final HDRA model would likely be a slightly improved version of ordinary kriging with the UDR model as the trend function.

To conclude, adaptive univariate sampling (AUDR) is used in step 1 to efficiently build a global trend function for dealing with high dimensionality, and kriging-based multivariate sampling (SEEDT) is adopted in step 2 to efficiently refine the global trend function for capturing the interactive effect among variables. It is important to determine a proper number of initial samples in step 1 and this sample size determination is automatically performed in AUDR by stopping the adaptive univariate sampling when the information gain in all the dimensions by adding a new sample is lower than a predefined threshold (see Eqs. (12) and (13)).

## Examples

Three mathematical examples and one real-world problem are used to evaluate the performance of the proposed HDRA method. The purpose of each example is summarized in Table 4. Example 1 compares the reliability errors by four methods (UDR, AUDR, SEEDT, and HDRA) for different degrees of response nonlinearity. Example 2 employs a highly nonlinear performance function and investigates the effect of the reliability level on the efficiency of UDR, AUDR, SEEDT, and HDRA. Example 3 is also a mathematical example chosen to evaluate the effect of dimensionality on the performance of the three methods. Then, Example 4 estimates the practicality of the proposed HDRA method with a real-world application. In this example, the performance function is defined as the power generation of a piezoelectric energy harvester and can be represented by a function of six intrinsic input variables (i.e., three geometric design variables and three material property variables), embedded in a space of extrinsic dimensionality *d* = 15. That is, we add nine extrinsic input variables, each of which contains uncertainty but does not have any effect on the performance function.

Example | # of dimensions | Purpose | Methods |
---|---|---|---|

1 | 40 | Influence of function nonlinearity | UDR, AUDR, SEEDT, and HDRA |

2 | 8 | Influence of reliability level | UDR, AUDR, SEEDT, and HDRA |

3 | 2–14 | Influence of function dimensionality | UDR, SEEDT, and HDRA |

4 | 15 | Practicality in real-world application | UDR, SEEDT, HDRA, and MCS |

Example | # of dimensions | Purpose | Methods |
---|---|---|---|

1 | 40 | Influence of function nonlinearity | UDR, AUDR, SEEDT, and HDRA |

2 | 8 | Influence of reliability level | UDR, AUDR, SEEDT, and HDRA |

3 | 2–14 | Influence of function dimensionality | UDR, SEEDT, and HDRA |

4 | 15 | Practicality in real-world application | UDR, SEEDT, HDRA, and MCS |

In these four examples, the reliability estimate by each method is compared with that by the direct MCS to evaluate the accuracy of the method. If one wants to directly estimate the accuracy of the surrogate model produced by one method, the LSF error estimator defined in Ref. [23] can be used that may address the potential issue that two methods give very similar reliability estimates but produce very different surrogate models and thus estimates of the LSF. In the examples of this study, the reliability estimation results by the proposed method have consistently shown better accuracy than those by the benchmark methods. As a result, only the reliability estimation accuracy is used to compare the performance of these methods.

Due to the randomness in the initial sample selection and MCS (for reliability analysis), the reliability estimation errors by UDR, AUDR, SEEDT, and HDRA contain uncertainty. To capture this uncertainty, we repeatedly run each method with the same parameter setting for ten times for all the examples. The performance metric of each method is presented by the mean and uncertainty ($\xb1\sigma $) of the metric over the ten repeated runs.

### Example 1: Influence of Function Nonlinearity.

where the coefficient *b* is used to adjust the nonlinearity of the performance function along the first direction. All 40 input variables are independent and normally distributed (mean *μ* = 1.5, and standard deviation *σ* = 1). The corresponding reliability is defined as *R* = *P*(*G*(**x**) < 0), which is kept in the same level for all chosen values of *b*. MCS is performed to set the benchmark results of reliability analysis for each level of nonlinearity (sample size: 1 million). The relative reliability error, which is the ratio of the absolute difference between the reliability estimate by each method and that by the direct MCS to the reliability estimate by the direct MCS, serves as the criterion for evaluating the performance of each method. The relative reliability errors by UDR, AUDR, SEEDT, and HDRA for different nonlinearity levels are graphically compared in Fig. 4. The numbers of sample points (or function evaluations) required by UDR, SEEDT, and HDRA are 161, 95, and 92, respectively. The number of sample points for AUDR varies from 83 to 86. From Fig. 4, HDRA, in general, is more accurate than UDR, with considerably fewer sample points, and more accurate than AUDR and SEEDT, with approximately the same number of sample points. SEEDT, as a kriging-based reliability analysis method, fails to produce satisfactory accuracy (i.e., <1% reliability error) in this high-dimensional problem, and this suggests that the limitation of kriging in high-dimensional problems reported in the machine learning area also applies to reliability analysis.

### Example 2: Influence of Reliability Level With Strong Variate Interactions.

*F*. This problem has been investigated by many researchers due to its highly nonlinearity [2,35]. The mathematical expression of the performance function of this problem is given by:

_{s}where $\omega p=kp/mp$, $\omega s=ks/ms$, $\omega a=(\omega p+\omega s)/2$, $\xi a=(\xi p+\xi s)/2$, $\gamma =ms/mp$ and $\theta =(\omega p\u2212\omega s)/\omega a$, **x**=[$\omega p,\omega s,\omega a,\xi a,\gamma ,\theta $]. Table 5 summarizes the statistical information of the input variables.

Variable | Distribution | Mean | St. dev. |
---|---|---|---|

$kp$ | Lognormal | 1 | 0.2 |

$ks$ | Lognormal | 0.01 | 2 × 10^{−3} |

$mp$ | Lognormal | 1.5 | 0.15 |

$ms$ | Lognormal | 0.01 | 1 × 10^{−3} |

$\xi p$ | Lognormal | 0.05 | 0.02 |

$\xi s$ | Lognormal | 0.02 | 0.01 |

$Fs$ | Lognormal | 13–16 | 1.5 |

$S0$ | Lognormal | 100 | 10 |

Variable | Distribution | Mean | St. dev. |
---|---|---|---|

$kp$ | Lognormal | 1 | 0.2 |

$ks$ | Lognormal | 0.01 | 2 × 10^{−3} |

$mp$ | Lognormal | 1.5 | 0.15 |

$ms$ | Lognormal | 0.01 | 1 × 10^{−3} |

$\xi p$ | Lognormal | 0.05 | 0.02 |

$\xi s$ | Lognormal | 0.02 | 0.01 |

$Fs$ | Lognormal | 13–16 | 1.5 |

$S0$ | Lognormal | 100 | 10 |

With the reliability estimate by MCS as the benchmark, changing the mean value of $Fs$ from 15.0 N to 27.5 N results in a reliability interval between 98.22% and 99.77%. Figure 5 graphically compares the relative reliability errors of UDR, AUDR, SEEDT, and HDRA at seven reliability levels within the interval. Table 6 summarizes the numbers of function evaluations, reliability estimates, and relative errors by the three methods. As can be seen in the figure and table, the proposed HDRA method yields the minimum error among the four methods with the fewest function evaluations. It is important to emphasize that although UDR and AUDR succeed in accurately approximating all first-order component functions by considering only a small number of sample points at each dimension, they fail to produce an accurate prediction of the reliability. This indicates that the performance function in Eq. (18) may not show high nonlinearity along each dimension, but it is most affected by high-order interactions among the random variables. HDRA captures these interactive effects by adding to the sample data set multivariate sample points in regions close to the LSF and with high prediction uncertainty.

$Fs$ | |||||||||
---|---|---|---|---|---|---|---|---|---|

Method | No. of G evaluations | Reliability and error (both in %) | 13 | 13.5 | 14 | 14.5 | 15 | 15.5 | 16 |

MCS | 1,000,000 | Reliability | 98.22 | 98.69 | 99.07 | 99.33 | 99.51 | 99.67 | 99.77 |

SEEDT | 79 | Reliability | 99.23 | 99.43 | 99.71 | 99.67 | 99.81 | 99.78 | 99.9 |

Error | 1.02 | 0.74 | 0.64 | 0.342 | 0.301 | 0.110 | 0.13 | ||

UDR | 65^{a} | Reliability | 96.40 | 97.13 | 97.95 | 98.47 | 98.998 | 99.297 | 99.52 |

Error | 1.82 | 1.563 | 1.123 | 0.856 | 0.521 | 0.372 | 0.245 | ||

AUDR | 19–21 | Reliability | 96.54 | 97.24 | 98.04 | 98.58 | 98.97 | 99.34 | 99.55 |

Error | 1.68 | 1.42 | 1.03 | 0.75 | 0.54 | 0.33 | 0.22 | ||

HDRA | 26 | Reliability | 98.54 | 99.07 | 99.38 | 99.38 | 99.53 | 99.71 | 99.80 |

Error | 0.32 | 0.385 | 0.31 | 0.054 | 0.045 | 0.047 | 0.037 |

$Fs$ | |||||||||
---|---|---|---|---|---|---|---|---|---|

Method | No. of G evaluations | Reliability and error (both in %) | 13 | 13.5 | 14 | 14.5 | 15 | 15.5 | 16 |

MCS | 1,000,000 | Reliability | 98.22 | 98.69 | 99.07 | 99.33 | 99.51 | 99.67 | 99.77 |

SEEDT | 79 | Reliability | 99.23 | 99.43 | 99.71 | 99.67 | 99.81 | 99.78 | 99.9 |

Error | 1.02 | 0.74 | 0.64 | 0.342 | 0.301 | 0.110 | 0.13 | ||

UDR | 65^{a} | Reliability | 96.40 | 97.13 | 97.95 | 98.47 | 98.998 | 99.297 | 99.52 |

Error | 1.82 | 1.563 | 1.123 | 0.856 | 0.521 | 0.372 | 0.245 | ||

AUDR | 19–21 | Reliability | 96.54 | 97.24 | 98.04 | 98.58 | 98.97 | 99.34 | 99.55 |

Error | 1.68 | 1.42 | 1.03 | 0.75 | 0.54 | 0.33 | 0.22 | ||

HDRA | 26 | Reliability | 98.54 | 99.07 | 99.38 | 99.38 | 99.53 | 99.71 | 99.80 |

Error | 0.32 | 0.385 | 0.31 | 0.054 | 0.045 | 0.047 | 0.037 |

We do not observe any noticeable improvement in accuracy when further increasing the number of function evaluations.

### Example 3: Influence of Dimensionality With Strong Variate Interactions.

The above definition allows us to create performance functions of varying dimensions by simply changing the value of input dimension *d*. Figure 6 compares the reliability errors of UDR, SEEDT, and HDRA under four settings of input dimension *d*. To make the comparison fair, we keep the numbers of function evaluations the same (i.e., 20*d* + 1) for the three methods. The number of initial sample points for HDRA is set as 2*d* + 1, and in most cases, AUDR automatically considers three more sample points along each direction to refine the univariate component functions. As can be seen in Fig. 6, HDRA yields the lowest error value among the three methods when *d* = 6, 10, and 14. SEEDT produces slightly better accuracy than HDRA when *d* = 2, and somewhat comparable accuracy when *d* = 6. These results suggest that kriging-based methods have a strong capability to handle low-dimensional reliability analysis problems. However, SEEDT fails to accurately predict the reliability for the cases of *d* = 10 and *d* = 14. Furthermore, the relative performance of UDR, as compared to SEEDT, improves with the input dimension, *d*, which shows the unique strength of UDR in handling high-dimensional problems. By taking the advantages of the unique strengths of SEEDT and UDR for low- and high-dimensional problems, HDRA is able to achieve satisfactory performance in all cases.

### Example 4: Reliability Analysis of Piezoelectric Energy Harvester.

Vibration energy harvesting is widely used to transform commonly wasted energy of vibration into accessible energy, which can then be applied to charge supercapacitors, batteries, or enable self-powering sensors. The typical configuration of an energy harvester is shown in Fig. 7. It consists of a shim laminated by piezoelectric materials at one end and a tip mass attached at the other end. The tip and shim mass are constructed from tungsten/nickel alloy and Blue steel, respectively [36].

Under the piezoelectric effect, mechanical strain is transferred into electric voltage or current. This study considers 31 modes, which allows for higher longitudinal strain when energy harvester is subject to smaller input forces. Under longitudinal stress/strain, voltage is produced along the thickness direction. The piezoelectric harvester plays similar role as a transformer circuit. Per the Kirchhoff's voltage Law, the circuit can be expressed by the coupled differential equations that describe the conversion from mechanical stress/strain to electrical voltage. The conversion process can be simulated by matlabsimulink. The harvester output is determined by the geometries of input and material properties. A detailed description of the energy harvester model can be found in the authors' previous publication [36]. The study in Ref. [36] aims to optimize the design of the energy harvester with reliability-based design optimization (RBDO). Three geometric terms, $x=lb,lm,hm$, and three material properties, $\upsilon =\upsilon 1,\upsilon 2,\upsilon 3$, were considered as the random design variables and the random parameters, respectively. For all these random terms, the means and standard deviations were considered fixed. In this RBDO problem, the size of the energy harvester was minimized while fulfilling the reliability requirement with respect to power generation.

*P*, needs to be higher than the minimum required power,

*P*

_{0}. Then, the reliability of the energy harvester can be expressed as the probability that

*P*is larger than

*P*

_{0}given the random design variables $x$ and parameters $\upsilon $. The performance function is expressed as

Besides $x$ and $\upsilon $, we add nine random input variables ($z1,\u2026,z9$) that do not affect the power output of the energy harvester, making the performance function of high extrinsic dimension (i.e., *d* = 15). This treatment is to demonstrate the capability of HDRA in dealing with high-dimensional reliability analysis problems. The distributional information of the 15 random variables is summarized in Table 7.

Random input variable | Description | Dist. type | Mean | Std. dev. |
---|---|---|---|---|

$x1(m)$ | Length of shim ($lb$) | Normal | $8.75\xd710\u22122$ | $1.92\xd710\u22123$ |

$x2(m)$ | Length of tip mass ($lm$) | Normal | $1.42\xd710\u22122$ | $2.40\xd710\u22124$ |

$x3(m)$ | Height of tip mass ($hm$) | Normal | $8.00\xd710\u22123$ | $1.27\xd710\u22124$ |

$\upsilon 1(m/V)$ | Piezoelectric strain coefficient | Normal | $\u2212153.9\xd710\u221212$ | $7.7\xd710\u221212$ |

$\upsilon 2(Pa)$ | Young's modulus for PZT_5A | Normal | $66\xd710+9$ | $3.3\xd710+9$ |

$\upsilon 3(Pa)$ | Young's modulus for the shim | Normal | $20\xd710+10$ | $1.00\xd710+10$ |

$z1,\u2026,z9$ | Ineffective dimension | Normal | 0 | $10\u22123$ |

Random input variable | Description | Dist. type | Mean | Std. dev. |
---|---|---|---|---|

$x1(m)$ | Length of shim ($lb$) | Normal | $8.75\xd710\u22122$ | $1.92\xd710\u22123$ |

$x2(m)$ | Length of tip mass ($lm$) | Normal | $1.42\xd710\u22122$ | $2.40\xd710\u22124$ |

$x3(m)$ | Height of tip mass ($hm$) | Normal | $8.00\xd710\u22123$ | $1.27\xd710\u22124$ |

$\upsilon 1(m/V)$ | Piezoelectric strain coefficient | Normal | $\u2212153.9\xd710\u221212$ | $7.7\xd710\u221212$ |

$\upsilon 2(Pa)$ | Young's modulus for PZT_5A | Normal | $66\xd710+9$ | $3.3\xd710+9$ |

$\upsilon 3(Pa)$ | Young's modulus for the shim | Normal | $20\xd710+10$ | $1.00\xd710+10$ |

$z1,\u2026,z9$ | Ineffective dimension | Normal | 0 | $10\u22123$ |

The reliability level is estimated with the UDR, SEEDT, and HDRA method with a total of 121, 120 and 120 sample points, respectively. In step 1 of HDRA, 37 initial sample points are selected to build the global trend function. In HDRA, it is observed that in step 1 of the algorithm, all nine ineffective dimensions are considered as inactive dimensions. Therefore, no more sample points are selected along these directions. This feature helps the algorithm focus more on the effective variables (i.e., the first six variables) by locating more sample points along them. The direct MCS serves as the benchmark method for the reliability estimation. Since the uncertainty of mean reliability estimates of the direct MCS with 10,000 sample points is much smaller than that of the HDRA method and the process of function evaluation is computationally expensive, it is concluded that 10,000 sample points are enough for the benchmark estimation of reliability. Figure 8 graphically summarizes the reliability estimates by UDRA, SEEDT, HDRA, and MCS for different levels of *P*_{0}. It can be observed that the reliability estimates by HDRA are closer to the benchmark reliability levels acquired by the direct MCS in comparison with those by UDR and SEEDT. The reliability analysis results of this real-world engineering application further demonstrate the effectiveness and accuracy of the proposed method in high-dimensional reliability analysis.

### Discussion.

Table 8 lists the main features of four reliability analysis methods, UDR, AUDR, SEEDT, and HDRA. These features are identified by analyzing the intrinsic structures of the algorithms in the methods and interpreting the results from the above examples. UDR is capable of alleviating the curse of dimensionality, and AUDR further enhances this capability by incorporating an adaptive sampling strategy. Both methods are particularly useful for high-dimensional problems with weak variate interactions. However, neither possesses the capability to handle performance functions with strong variate interactions. Although this limitation can be alleviated by considering higher order component functions (e.g., in BDR and trivariate DR), the computational efficiency quickly diminishes with the increase of the input dimension. Kriging-based methods such as SEEDT can capture strong variate interactions and are favored in low-dimensional problems (typically *d* < 6) with strong variate interactions. But similar to the limitation of BDR and trivariate DR, these methods lose efficiency in high-dimensional problems. HDRA optimally combines the strengths of UDR and SEEDT, and can achieve satisfactory accuracy and efficiency for problems of both ranges of dimension and with various degrees of variate interactions. The HDRA method provides a better alternative to these popular existing methods by alleviating the fundamental limitation of UDR in tackling strong variate interactions and high computational cost of SEEDT in high-dimensional problems.

Features | UDR | AUDR | SEEDT | HDRA |
---|---|---|---|---|

Alleviate the curse of dimensionality | ✓ | ✓ | ⊗ | ✓ |

Adaptive sampling strategy for variable selection | ⊗ | ✓ | ⊗ | ✓ |

Flexibility in locating the sample points | ⊗ | ⊗ | ✓ | ✓ |

Capture the interactions among variables | ⊗ | ⊗ | ✓ | ✓ |

Features | UDR | AUDR | SEEDT | HDRA |
---|---|---|---|---|

Alleviate the curse of dimensionality | ✓ | ✓ | ⊗ | ✓ |

Adaptive sampling strategy for variable selection | ⊗ | ✓ | ⊗ | ✓ |

Flexibility in locating the sample points | ⊗ | ⊗ | ✓ | ✓ |

Capture the interactions among variables | ⊗ | ⊗ | ✓ | ✓ |

## Conclusion

The high-dimensional reliability analysis (HDRA) method has been proposed to solve reliability analysis problems involving high dimensionality, computationally expensive simulations, high nonlinearity, and strong variate interactions. The basic idea of HDRA is to first build a global surrogate model with AUDR and then locally refine the surrogate model using SEEDT. AUDR adaptively identifies important univariate sample points considering both response nonlinearity and criticality (with respect to accurate reliability prediction), while SEEDT captures the interactive effects among variables on the performance function by adding multivariate sample points in highly uncertain regions close to the LSF. Results from four mathematical and real-world examples with up to 40 dimensions suggest that the HDRA method can achieve significantly higher efficiency in reliability analysis than the existing DR- and kriging-based methods, and is especially useful for high-dimensional, computationally expensive problems with strong interactions among random variables. Future research will further develop the proposed method for a wider range of applications, such as sequential experimental design for high-dimensional uncertainty quantification and simulation-based design under high-dimensional uncertainty.

## Acknowledgment

This research was in part supported by the U.S. National Science Foundation (NSF) and the NSF I/UCRC Center for e-Design. Any opinions, findings or conclusions in this paper are those of the authors and do not necessarily reflect the views of the sponsoring agency.

## Funding Data

National Science Foundation (Grant Nos. CNS-1566579 and ECCS-1611333).

To put this into perspective, it requires a total number of 3040 function evaluations to build the second-order component functions (*m* = 2) for a 20-dimensional performance function (*d* = 20) with five sample points along each dimension (*n* = 4).