
# Integrating Bayesian Calibration, Bias Correction, and Machine Learning for the 2014 Sandia Verification and Validation Challenge Problem

Author and Article Information
Wei Li

School of Aeronautics,
Northwestern Polytechnical University,
Hangkong Building C506,
Xi'an, Shaanxi 710072, China
e-mail: liwiair@gmail.com

Shishi Chen

School of Aerospace Engineering,
Beijing Institute of Technology,
5 South Zhongshancun Street,
Beijing 100081, China
e-mail: shishi.chen@northwestern.edu

Zhen Jiang

Department of Mechanical Engineering,
Northwestern University,
Evanston, IL 60208
e-mail: ZhenJiang2015@u.northwestern.edu

Daniel W. Apley

Department of Industrial Engineering and
Management Sciences,
Northwestern University,
Evanston, IL 60208
e-mail: apley@northwestern.edu

Zhenzhou Lu

School of Aeronautics,
Northwestern Polytechnical University,
Hangkong Building C506,
Xi'an, Shaanxi 710072, China
e-mail: zhenzhoulu@nwpu.edu.cn

Wei Chen

Department of Mechanical Engineering,
Northwestern University,
Evanston, IL 60208
e-mail: weichen@northwestern.edu

1Corresponding author.

Manuscript received February 6, 2015; final manuscript received October 15, 2015; published online February 19, 2016. Guest Editor: Kenneth Hu.

J. Verif. Valid. Uncert 1(1), 011004 (Feb 19, 2016) (12 pages) Paper No: VVUQ-15-1007; doi: 10.1115/1.4031983

## Abstract

This paper describes an integrated Bayesian calibration, bias correction, and machine learning approach to the validation challenge problem posed at the Sandia Verification and Validation Challenge Workshop, May 7–9, 2014. Three main challenges are recognized as: I—identification of unknown model parameters; II—quantification of multiple sources of uncertainty; and III—validation assessment when there are no direct experimental measurements associated with one of the quantities of interest (QoIs), i.e., the von Mises stress. This paper addresses these challenges as follows. For challenge I, sensitivity analysis is conducted to select model parameters that have significant impact on the model predictions for the displacement, and then a modular Bayesian approach is performed to calibrate the selected model parameters using experimental displacement data from lab tests under the “pressure only” loading conditions. Challenge II is addressed using a Bayesian model calibration and bias correction approach. For improving predictions of displacement under “pressure plus liquid” loading conditions, a spatial random process (SRP) based model bias correction approach is applied to develop a refined predictive model using experimental displacement data from field tests. For challenge III, the underlying relationship between stress and displacement is identified by training a machine learning model on the simulation data generated from the supplied tank model. Final predictions of stress are made via the machine learning model and using predictions of displacements from the bias-corrected predictive model. The proposed approach not only allows the quantification of multiple sources of uncertainty and errors in the given computer models, but also is able to combine multiple sources of information to improve model performance predictions in untested domains.


## Introduction

Assessing the uncertainty and predictive capability of computational models becomes increasingly crucial as industries and government entities depend more heavily on the predictions from computer models to justify decisions. Unfortunately, as asserted by Box and Draper in Ref. [1], all models of any physical reality are wrong; or, put more narrowly, models are never perfect. In light of this, there are at least three interrelated issues to be addressed when using a computer model for engineering decision making. First, how well does the computer model represent the underlying physical reality? Second, how do we assess the impact of different sources of uncertainty when using the model for prediction? Finally, how can the data gathered from experimental studies and/or other sources be utilized to improve model predictions in untested domains?

This paper addresses these three issues in the context of the challenge problem posed by the V&V, uncertainty quantification (UQ), and Credibility Processes Department of Sandia National Laboratories at the 2014 Verification and Validation Challenge Workshop. This challenge problem [2,3] considers a large number of pressurized storage tanks that hold a mystery liquid, where both the pressure and the liquid weight cause deformation of the tank walls. The main objectives of the challenge problem are to (1) predict the probability of failure Pf at the nominal operating conditions of the tank and (2) determine a range of operating conditions that satisfies Pf < 10^−3 when the standard operating limits on each loading variable are given. Other issues that need to be addressed along with the prediction tasks are: the quantification of different sources of uncertainty in the entire V&V process, credibility assessment for the predictions of interest, and ultimately, how to make decisions based on all the analyses. Here, credibility refers to the accuracy of and confidence in the predictions.

Distinct from the challenge problem posed by Sandia in the 2008 Model Validation Workshop [4–7], three unique challenges associated with this new challenge problem are:

• (I) How to identify unknown model parameters when multiple sources of data are available? Although data sets 1, 2, and 4 already provide legacy data and certain measurements on each material and dimensional parameter, since the service ages of the tanks range from 4 yr to 12 yr, the legacy data may not be current. Furthermore, as shown in Table 1, the experimental data for some parameters like Young's modulus E, Poisson's ratio ν, and the wall thickness T have very large variability across the ten measurements, which might lead to significant variation in the final predictions of interest.

Table 1: Prior knowledge on model parameters (material properties and tank dimensions). Mean, lower bound, and upper bound refer to the experimental data (ten measurements).

| Model parameters | Legacy data | Mean | Lower bound | Upper bound |
| --- | --- | --- | --- | --- |
| Young's modulus, E (psi) | 3 × 10^7 | 2.814 × 10^7 | 2.720 × 10^7 | 2.927 × 10^7 |
| Poisson's ratio, ν | 0.27 | 0.272 | 0.266 | 0.286 |
| Yield stress, σy (psi) | 4.5 × 10^4 | 4.420 × 10^4 | 3.983 × 10^4 | 4.591 × 10^4 |
| Length, L (in.) | 60 | 60.684 | 60.071 | 60.979 |
| Radius, R (in.) | 30 | 31.153 | 30.574 | 31.674 |
| Thickness, T (in.) | 0.25 | 0.231 | 0.225 | 0.244 |
• (II)How to quantify multiple sources of uncertainty? Based on the supplied experimental data, it is clear that the experimental uncertainty is an important source of uncertainty for the challenge problem. On the other hand, the supplied tank model assumes that the tank has flat, as opposed to hemispherical, end caps, which introduces model discrepancy. Furthermore, as shown in Table 1, the uncertainty introduced by model parameters might also have substantial influence on the model predictions.
• (III) How to make validation assessments when there are no data directly measured for a quantity of interest (QoI) in prediction? Although the final predictions of Pf all depend on the von Mises stress σ, there are no measurement data associated with σ. The experimental data sets 5 and 6 available at the system level are for a different system QoI, i.e., the normal displacement w. As shown in Fig. 1 (Illustration of challenge III), in the test domain, the two sets of measurement data are for displacements. In the prediction domain, we have the supplied computer model for σ, but due to the absence of experimental data for σ, quantification of model bias is not straightforward. Since w and σ are related, the issue becomes: when the data available for validation are not immediately relevant to the prediction of interest, how do we fill the gap between the available data and the data needed for the QoI?

For challenge III shown in Fig. 1, we propose a machine learning technique that identifies and exploits the underlying relationship between σ and the predictions of w at multiple adjacent locations. Based on the assumption that the relationship between w and σ is well captured by the simulation model even though simulation predictions for individual QoIs may have errors, we fit a neural network regression model to predict stress σ as a function of a minimum set of model input variables and displacements from multiple adjacent locations. Using this machine learning model for σ predictions, the entire stress field under nominal operating condition can be analyzed to find the maximum stress, which allows the estimation of Pf for prediction scenario 1. For prediction scenario 2, a presampling scheme is developed for efficiently identifying the failure frontiers. Credibility of the final predictions is addressed via validation and uncertainty analyses through the proposed approach.

The remainder of the paper is organized as follows: Section 2 introduces the basics of model updating (including both calibration and bias correction), UQ, and model validation. A detailed work flow for addressing the challenge problem, the associated methods with preliminary results, and comparative studies are given in Sec. 3. Section 4 presents the details for the predictions under two scenarios. A summary of pros and cons of the proposed approach is provided in Sec. 5.

## Basics of Model Updating, UQ, and Model Validation

Model validation, as defined in ASME V&V 10 [11], is the process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model. An extensive discussion of fundamental validation concepts, terminologies, and literature in computational mechanics can be found in Oberkampf et al. [12]. In Liu et al. [13], existing validation methods are classified into four categories, i.e., hypothesis testing, Bayes factor, frequentist metric, and area metric. Although there is no unified approach for model validation, various validation metrics have been developed to accommodate different challenging scenarios, such as uncertainty in both computer models and experimental data [14,15], multivariate responses of interest [16,17], and multiple validation input settings [14,17].

Although computer simulation itself is often deterministic in nature, using a model instead of physical reality for various engineering applications can introduce uncertainty from several sources. In this research, we follow the modular Bayesian calibration and bias correction approach proposed by KOH [9,18], where five sources of uncertainty in model prediction are considered: (1) parameter uncertainty, which is caused by constant but unknown parameters of a computer model, e.g., Young's modulus, Poisson's ratio, etc.; (2) model discrepancy or model bias, which is due to insufficient modeling of physical realities, such as model form errors or incorrect assumptions; (3) numerical uncertainty, which is mainly caused by the numerical implementation of the computer model; (4) experimental variability, which is the result of measurement errors; and (5) interpolation uncertainty when predicting the response at input settings without simulation or experimental data [18]. A general formulation for quantifying different sources of uncertainty has been widely adopted in the existing works on UQ [8–10,19,20]:

(1) $y^e(x) = y^m(x, \theta^*) + \delta(x) + \varepsilon$
where x is a vector of model input variables; ye(x) denotes the experimental response, which is a function of all input variables x; θ* are the true values of the unknown model parameters θ; ym(x, θ) is the response from the computer model, which is a function of both x and θ; δ(x) is the discrepancy function or bias function that accounts for model discrepancy; and ε is the measurement error that accounts for experimental variability. Estimation of δ(x) and θ* using this formulation fulfills the goal of quantifying uncertainty from different sources. For the challenge problem, the input vector x is [x, $φ$, P, γ, H], and the unknown model parameter vector θ is [E, ν, L, R, T, m].
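To make Eq. (1) concrete, the following minimal Python sketch (our illustration, not code from the paper; the one-dimensional toy functions and the helper names `rbf_kernel` and `delta_hat` are hypothetical) fits a Gaussian process to the residuals ye − ym, which is the basic mechanism by which the discrepancy function δ(x) is estimated:

```python
import numpy as np

def rbf_kernel(a, b, length=1.0, var=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

# Hypothetical toy: "experiments" follow y^e(x) = sin(x) + 0.3*x, while the
# computer model at the true parameter only captures y^m(x) = sin(x),
# so the true discrepancy is delta(x) = 0.3*x.
x_train = np.linspace(0.0, 5.0, 12)
residuals = (np.sin(x_train) + 0.3 * x_train) - np.sin(x_train)  # y^e - y^m

# GP regression on the residuals: posterior mean of delta at new inputs.
K = rbf_kernel(x_train, x_train) + 1e-6 * np.eye(len(x_train))  # jitter
alpha = np.linalg.solve(K, residuals)

def delta_hat(x_new):
    """Posterior mean of the discrepancy function at new inputs."""
    return rbf_kernel(np.atleast_1d(x_new), x_train) @ alpha

print(delta_hat(2.5))  # close to the true bias 0.3 * 2.5 = 0.75
```

In the actual framework, the GP hyperparameters (length scale and variance) are estimated from the data rather than fixed, and both x and θ are multidimensional.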

In most of the existing research, additional data from experimental studies are only used as references for showing the degree to which a model agrees to the data [14,21] or testing whether to accept or reject a model [22,23]. However, we believe that the most important issue is not whether a model is accepted or rejected in the face of existing data, but rather the sufficiency of the model for its intended predictions, i.e., taking into account the model bias for the predictions of interest, as well as the setting of the model parameters. For this objective, a more elaborate validation approach must be developed to make use of available data for identifying model parameters, quantifying model errors, and other sources of uncertainty in the model predictions.

In this paper, we propose to use the general model updating and validation framework that we developed in Refs. [10] and [20], based on the overarching KOH framework, for the challenge problem. The end goal of this model updating (including both calibration and bias correction) and validation framework is to create an updated model that combines the simulations from the computer model and observations from the physical experiments to make better predictions of the QoI in untested regions. As illustrated in Fig. 2, the iterative process begins by generating simulation data from the computer model ym (x, θ) at a set of combinations of input variable x and calibration parameter θ settings. In addition, a set of experimental data is collected from physical testing. Usually, the experimental data are partitioned into two subsets: one subset is the training data used for model updating and the other is reserved for validation. Based on Eq. (1), an updated model is then created by incorporating the simulations and experimental training data using the modular Bayesian calibration approach from KOH [9,18]. Meanwhile, uncertainty in the calibration parameters θ and discrepancy function δ(x) as well as experimental variability ε and interpolation uncertainty in the predictions can all be assessed. After the model updating and UQ process, the accuracy and predictive capability of the updated model are assessed using quantitative validation metrics [13,14] by comparing the predictions of the updated model with the reserved experimental validation data set. If the metrics indicate that the updated model is not an adequate representation of reality, additional data must be collected to further update the model or refine the model from the perspective of physics [10,20]. For either case, the predictive capability of the original model at any input settings is assessed by checking the model bias function and its uncertainty.

## Bayesian Model Calibration, Bias Correction, and Machine Learning for the Challenge Problem

Based on the model validation framework shown in Fig. 2, we present an integrated Bayesian model updating (calibration and bias correction) and machine learning technique to address the aforementioned challenges in model UQ and validation. Details of the proposed approach are demonstrated through the challenge problem work flow shown in Fig. 3. The overall process consists of three major parts: (1) data preprocessing for understanding the usage of the supplied data sets and studying the importance of model parameters; (2) model updating, model validation, and UQ of the simulation model under untested input settings; and (3) making predictions under two specified scenarios. Legacy data set 1 and measurement data sets 2 and 4 all provide some prior information on the model parameters. Therefore, they are used as references to build the prior distribution of the model parameters during data preprocessing, model calibration, and bias correction. Given that experimental data sets 5 and 6 for displacement, collected at the system level, are measured under two different loading conditions, i.e., the lab-controlled pressure only (without liquid) condition for the former and the pressure plus liquid condition from field tests for the latter, we incorporate the two sets of data separately. Data set 5 is used for model calibration under the pressure only loading conditions. Data set 6 is divided into two subsets: one is the training subset for model bias correction, and the other is reserved for model validation. The details of each subprocess shown in the work flow are elaborated in Secs. 3.1, 3.2, and 3.4. The empirical liquid model provided in data set 3 will be used for transforming between the liquid composition and the liquid-specific weight to facilitate the final predictions, the details of which are described in Sec. 4.2.

###### Data Preprocessing.

To begin the whole process shown in Fig. 3, all supplied data sets are examined carefully to determine their potential usages, check the associated errors, and determine whether there are outliers. The computational results using different meshes are compared with the experimental observations, and it is found that the results of meshes 1 and 4 generally match the data observed at different conditions better than those of meshes 2 and 3. Since mesh 4 is much more computationally expensive than mesh 1, we decided to use only mesh 1 for addressing the challenge problem. As shown in Table 2, the errors associated with all measurements are treated as normal distributions with zero means and standard deviations (SD) calculated from the supplied error bands using the three-sigma rule. In data set 5, some measurements in the second repeat of tank 2 deviate far from the other three repeats under identical loading conditions. Since the problem statement [2,3] presumed that these measurements are extremely accurate (experimental variability within ±3% or 0.002 in.), this phenomenon is more likely caused by operational error than by experimental variability. Therefore, they are considered outliers and will not be used as training data for model calibration in Sec. 3.2.
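The three-sigma conversion used for Table 2 amounts to a one-line rule: treat a symmetric error band ±b as a zero-mean normal distribution with SD b/3. A minimal sketch (the ±0.002 in. band is the displacement accuracy quoted above; the helper name is ours):

```python
def band_to_sd(half_width, k=3.0):
    """Convert a symmetric error band +/- half_width into a normal SD
    using the k-sigma rule (k = 3 for the three-sigma rule)."""
    return half_width / k

# Displacement gauges quoted as accurate to +/- 0.002 in.:
sd = band_to_sd(0.002)
print(sd)  # 0.002 / 3
```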

As mentioned earlier, data sets 1, 2, and 4 provide the prior knowledge on material properties and tank dimensions, such as Young's modulus E, Poisson's ratio ν, wall thickness T, tank radius R, and length L, which can all be treated as unknown model parameters. Since there are ten measurements associated with each parameter, small variations in some of them might not have much impact on the response of interest. Because including more calibration parameters in model calibration makes the analysis more prone to “identifiability” issues [10,24], it is beneficial to choose a small number of critical parameters for calibration. To determine the critical parameters, global sensitivity analysis is conducted to identify the important model parameters using Sobol's first-order sensitivity index [25], i.e.,

(2) $S_i = \dfrac{V_{X_i}\left(E_{X_{\sim i}}(Y \mid X_i)\right)}{V(Y)} = \dfrac{V_i}{V(Y)}$
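A generic Monte Carlo (pick-freeze) estimator for the first-order index in Eq. (2) can be sketched as follows. This is an illustration using a Saltelli-type estimator, not the authors' implementation, and the additive test function is hypothetical:

```python
import numpy as np

def first_order_sobol(f, dim, n=100_000, seed=0):
    """Monte Carlo (pick-freeze) estimate of Sobol' first-order indices
    S_i = V_i / V(Y) for a function f of `dim` independent U(0,1) inputs."""
    rng = np.random.default_rng(seed)
    A = rng.random((n, dim))
    B = rng.random((n, dim))
    fA, fB = f(A), f(B)
    total_var = np.var(np.concatenate([fA, fB]))
    S = np.empty(dim)
    for i in range(dim):
        AB = A.copy()
        AB[:, i] = B[:, i]  # resample only coordinate i
        S[i] = np.mean(fB * (f(AB) - fA)) / total_var  # Saltelli-type estimator
    return S

# Additive test function with known indices: Y = 2*X1 + X2
# => S1 = 4/5 = 0.8, S2 = 1/5 = 0.2
S = first_order_sobol(lambda X: 2 * X[:, 0] + X[:, 1], dim=2)
print(S)  # approximately [0.8, 0.2]
```

Parameters whose indices are negligible can then be fixed at nominal values, leaving only the influential ones for calibration.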

###### Model Calibration and Bias Correction.

In this section, we use a modular Bayesian approach for model calibration and bias correction based on the formula given in Eq. (1). The modular Bayesian approach is a significant part of the comprehensive model updating framework in Fig. 2 and is based on Eq. (1) [8–10,19,20]. As shown in the flowchart of Fig. 6 (see the Appendix), the computer model ym(x, θ) and the discrepancy function δ(x) are represented separately by two independent Gaussian process (GP) models to incorporate both computer simulations and experimental observations. As a special case of SRPs, GPs have been widely used in statistical modeling due to the computational convenience attributable to the properties of the multivariate normal distribution. As shown in the Appendix, the computer model is first replaced by a GP model using simulation data in module 1; then, based on the prior distribution of the calibration parameters, the other GP model for the discrepancy function is fitted using both simulation data and experimental data in module 2. The advantage of using separate GP models, compared to a full Bayesian approach, is investigated in Refs. [9] and [10]. The details associated with the four modules are elaborated in the Appendix.

The modular Bayesian approach is applied here for combining measurements from lab tests under pressure only loading with the simulation model to calibrate the selected calibration parameters. Since γ and H are set to zero for the pressure only loading case, the controllable inputs become x = [x, $φ$, P], the calibration parameters are θ = [E, T], and the response of calibration interest y corresponds to the normal displacement w. By following the first three modules, 100 Latin Hypercube samples are generated from the simulation model to create a GP model of the computer simulation for w. The joint prior distribution p(E, T) of the calibration parameters is a bivariate uniform distribution, the lower and upper bounds of which are determined by data sets 1 and 2. By using experimental data set 5 and the prior distribution p(E, T), the hyperparameters of the GP model for the discrepancy function δ(x) are estimated, which allows the calculation of the joint posterior distribution of the model parameters shown in Fig. 7 by using Bayes' theorem. The peak of the distribution indicates the maximum a posteriori probability (MAP) estimate of E and T. Table 3 shows the detailed calibration results, where SD stands for standard deviation and CoV is the coefficient of variation. The MAP is quite close to the mean value, and the parameter uncertainty is considerably reduced compared to the prior distribution. A comparison between the measurement data and the updated model predictions through calibration is shown in Fig. 8. Most of the measurements lie within the 95% prediction intervals (PIs), except for a few measurements with large experimental variability.
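The essence of this posterior computation can be illustrated with a deliberately simple, hypothetical toy: a one-parameter linear "computer model", a flat prior on a grid, and a Gaussian likelihood for the measurements. (In the real problem the likelihood involves the GP surrogate and the discrepancy model; all names and numbers below are invented for illustration.)

```python
import numpy as np

# Hypothetical toy: computer model y^m(x, theta) = theta * x; experiments
# are generated at the "true" theta* = 2.0 with a known measurement SD.
rng = np.random.default_rng(1)
x_obs = np.linspace(0.5, 3.0, 8)
sd_meas = 0.05
y_obs = 2.0 * x_obs + rng.normal(0.0, sd_meas, x_obs.size)

# Uniform prior over a plausible range (playing the role of data sets 1 and 2).
theta_grid = np.linspace(1.5, 2.5, 1001)

# Gaussian log-likelihood of the observations for each candidate theta.
resid = y_obs[None, :] - theta_grid[:, None] * x_obs[None, :]
log_lik = -0.5 * np.sum((resid / sd_meas) ** 2, axis=1)

# Posterior on the grid (the flat prior cancels up to a constant).
post = np.exp(log_lik - log_lik.max())
post /= np.trapz(post, theta_grid)

theta_map = theta_grid[np.argmax(post)]  # MAP estimate
print(theta_map)  # near the true value 2.0
```

The MAP estimate and the spread of `post` play the same roles here that the peak of Fig. 7 and the SD/CoV columns of Table 3 play in the full analysis.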

The updated model from calibration is suitable for predictions under pressure only loading. For the pressure plus liquid loading cases, the computer simulations must be combined with data from field testing for model bias correction to assess the related model discrepancy and experimental error. The field-test data set 6 contains 240 displacement measurements at 20 locations from four tanks, each under three different loading conditions. Therefore, we randomly choose 200 measurements from data set 6 for training the GP models and leave out 40 measurements for validation purposes. Once the calibration parameters are identified, a simplified UQ formula without unknown model parameters is adopted, i.e.,

(3) $y^e(x) = y^m(x) + \delta(x) + \varepsilon$
where x=[x, $φ$, P, H, γ], which includes two additional input variables (H and γ) as compared with the preceding model calibration process. Similar to the modular Bayesian approach, 200 Latin Hypercube samples are generated based on the lower and upper limits of the input variable x, and simulations are run at these input samples. Then, the simulations and measurement training data are combined for estimating the hyperparameters of GP models for both the computer simulations and the discrepancy function to predict the experimental response ye (x).

A graphical comparison between the reserved validation data and the updated model predictions after model bias correction is shown in Fig. 9, and it is found that almost all 40 validation data points lie within the 95% PIs of the updated GP model, which means the bias-corrected model matches quite well with the experimental data. However, it should be noted that although the tank model involves the same set of parameters under different loading scenarios, the KOH approach does not guarantee that the calibrated parameters obtained from the pressure only conditions will be useful under the pressure plus liquid loading cases. More specifically, it is difficult to distinguish between the effects of the unknown calibration parameters and the discrepancy function when both are included in the KOH model, due to statistical identifiability issues. In spite of this, it is still possible for the “adjusted” model (computer model plus the estimated discrepancy) to agree well with the experimental data over the experimental region. Our presumption is that the adjusted model will still be reasonable for the pressure plus liquid loading cases, even though they are outside the experimental region. We refer the reader to Ref. [27] and the references therein for a recent discussion of this identifiability problem in the KOH approach, including an investigation into when identifiability may or may not be reasonably achieved and how to predict and assess the level of identifiability prior to conducting the physical experiment (for the purpose of designing that experiment). The predictive capability of the updated model through both calibration and bias correction is further assessed in Sec. 3.2.2 via a quantitative validation metric using the reserved data.

###### Model Validation.

To further assess the credibility of the updated model for displacement, we use a u-pooling metric for an overall validation assessment of the reserved data collected from 40 different input settings. The u-pooling method proposed by Ferson et al. [14] is an area-based validation metric that aims to measure the agreement between a predictive model and experimental data observed at multiple input settings. The original area metric assesses the difference between model and data at a single input setting by measuring the area difference between the cumulative distribution function (CDF) of the model response and the empirical CDF of the experimental observations, as shown in the equation below:

(4) $d(F^m, S_n^e) = \int_{-\infty}^{+\infty} \left| F^m(y) - S_n^e(y) \right| \, dy$
where $F^m(y)$ denotes the CDF of the model response and $S_n^e(y)$ is the empirical CDF of the experimental observations. Given the same number of experimental observations, a smaller area difference for a model among alternative choices would indicate a more accurate representation of the physical reality. The underlying idea behind the u-pooling method is that any continuous probability distribution can be transformed into a standard uniform distribution by using the so-called probability integral transformation [28]. The method first assumes that if a model is 100% accurate, the experimental data can be treated as random samples generated from the population of the model response; hence, substituting the experimental observations into the corresponding CDF of the model produces a sample from a standard uniform distribution. Figure 10 illustrates the u-pooling method for experimental data collected at three validation sites, i.e., $Y^e$ = { $Y_1^e$, $Y_2^e$, $Y_3^e$ }. For each observed datum in $Y^e$, a u-value is calculated as the CDF of the model prediction at the validation site, i.e., $u_i = F_i^m(Y_i^e)$, i $∈$ {1, 2, 3}. The difference between the empirical CDF of the u-values and the standard uniform distribution is shown by the shaded areas in Fig. 10(a). The sum of the area differences is the associated metric value that accounts for the overall disagreement between model and data from multiple input sites. The value of the u-pooling metric is between 0 and 0.5, with 0 indicating a perfect match between the model and experiments, and 0.5 indicating the worst match.
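A minimal sketch of the u-pooling computation (our illustration with hypothetical validation sites; the paper pools 40 sites): each observation is pushed through the model's CDF at its site, and the pooled u-values are compared against the standard uniform CDF by an area metric.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(y, mu, sd):
    """CDF of a normal model response at a single validation site."""
    return 0.5 * (1.0 + erf((y - mu) / (sd * sqrt(2.0))))

def u_pooling_metric(u_values, grid=20001):
    """Area between the empirical CDF of the pooled u-values and the CDF of
    the standard uniform distribution; 0 = perfect match, 0.5 = worst."""
    u = np.sort(np.asarray(u_values, dtype=float))
    t = np.linspace(0.0, 1.0, grid)
    ecdf = np.searchsorted(u, t, side="right") / u.size
    return np.trapz(np.abs(ecdf - t), t)

# Hypothetical example: at each validation site the model predicts a normal
# distribution; each observation is transformed into a u-value via that CDF.
sites = [(1.0, 0.1), (2.0, 0.2), (3.0, 0.15)]  # (mean, SD) per site
observations = [1.05, 1.85, 3.10]
u_vals = [norm_cdf(y, mu, sd) for y, (mu, sd) in zip(observations, sites)]
print(u_pooling_metric(u_vals))
```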

An illustration of the u-pooling metric for comparing the updated model predictions against the reserved data is shown in Fig. 11 for the challenge problem. The dashed black line is the CDF of the standard uniform distribution, while the empirical CDF of the transformed data is shown as a solid blue curve, together with dashed blue curves indicating its 99% confidence bounds. The metric value of 0.0341, which is quite small, suggests that the updated model is acceptable for representing the experimental response and will be further used for the predictions of displacement w.

###### Comparative Study: Performing the Validation Process Without Model Calibration.

In this subsection, we demonstrate the significant impact of calibration on model predictions by eliminating model calibration from the proposed approach. As in the model bias correction of Sec. 3.2, the same amount of simulation data is generated from the original tank model and combined with the same set of experimental data to identify the model discrepancy. Figure 12 shows a comparison between the 95% PIs and the validation data points. Compared to the results shown in Fig. 9, the PIs are much larger due to the negative impact of the unquantified parameter uncertainty. Figure 11 provides a comparison of the u-pooling metric with and without model calibration (the calibrated model being that of Sec. 3.2). The metric value of the updated model without calibration is 0.1563, much larger than the 0.0341 value for the calibrated model. This indicates that model calibration provides substantial benefit for the quantification of model parameter uncertainty and the improvement of the model's predictive capability.

###### Machine Learning for Identifying and Exploiting the Relevancy Between Two QoIs.

After the model calibration, bias correction, and model validation assessment performed in Sec. 3.2, we obtain an updated GP model in the prediction domain of displacement w. However, estimation of Pf requires prediction of the von Mises stress σ. As shown earlier in Fig. 1, since there are no experimental data associated with σ, we cannot directly correct the bias in the original model for predicting σ. Therefore, the problem becomes how to make use of the updated predictions for w to appropriately infer bias-corrected predictions of σ.

To accomplish this, we propose a machine learning technique to identify and exploit the relationship between the two responses of interest. Theoretically, the normal displacement w and the von Mises stress σ should be connected via certain laws of physics. However, exploring analytical relationships between the two can be intractable for analysts possessing little physical knowledge about the tank, especially since the fields can be skewed by the asymmetric loading conditions. Fortunately, the simulation model is built on sophisticated equations from well-established theories of plates and shells (see Chap. 15 of Ref. [29]). Therefore, even though the supplied simulation model has bias, we assume that the relationship between w and σ is well captured by the model. Based on simultaneous simulations of these two QoIs generated from the model, their connection can be learned empirically using machine learning methods. Since w and σ share the same set of input variables x = [x, $φ$, P, γ, H] as well as model parameters θ = [E, ν, L, R, T, m] in the supplied model, they can be written separately as follows:

(5) $w = M_1(x, \varphi, P, \gamma, H;\ \theta)$

(6) $\sigma = M_2(x, \varphi, P, \gamma, H;\ \theta)$
However, we can also consider viewing σ as a function of the input variables, the parameters, and the set of displacements {w, w1, w2, w3, w4} at locations neighboring the location (x, $φ$) at which to predict σ (Fig. 13), i.e.,

(7) $\sigma = f(x, \varphi, P, H, \gamma, w, w_1, w_2, w_3, w_4;\ \theta)$

Clearly, Eq. (7) has redundant input variables since, according to Eq. (6), σ can be determined solely via the original input sets x = [x, $φ$, P, γ, H] and θ. Based on the well-known tight physical relationship between displacement and stress fields, it is reasonable to conjecture that σ can also be accurately predicted as a function of the neighboring displacements (perhaps augmented with some subset of the inputs and parameters) with the important input P omitted. If we can identify such a predictive relationship empirically, this gives us a mechanism to use the bias-corrected displacement model to also correct for the bias in the stress model. To identify an appropriate predictive relationship and also an appropriate subset of input variables to augment the displacement inputs, we first generated a Latin Hypercube design with 2000 samples for all input variables in x, and then we conducted simulations of w, w1, w2, w3, w4 and σ from the supplied tank model at each input combination. With these data serving as the training data, we then fit neural network regression models for predicting σ as a function of {w, w1, w2, w3, w4} and various subsets of the other input variables. The best model (i.e., with the highest predictive power) was of the form

(8)$σ=f(x,φ,H,w,w1,w2,w3,w4;θ*)+εσ$
where θ* denotes the MAP values of the model parameters obtained from model calibration and εσ ∼ N(0, MSE) is the neural network regression model error. To determine the best model, we set aside a test set of 600 simulation data points, separate from the training set (1400 simulation data points), and used the fitted models to predict the test set. The validation results for predicting the test set are shown in Fig. 14. The coefficient of determination between the predicted and actual σ for the test cases was R² = 0.9969, a nearly perfect fit. The root mean square error was 3.0809 × 10², which quantifies the uncertainty introduced by using the fitted neural network model to predict σ. The learned relationship between w and σ in Eq. (8), together with the bias-corrected predictive model for w, will be used as a surrogate for the bias-corrected model for stress to generate σ predictions.
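The design-then-train-then-validate workflow above can be sketched as follows. Everything problem-specific here is a hypothetical stand-in: the input bounds, the `tank_model` function (replacing the supplied finite-element simulator), and the response values are invented for illustration, and scikit-learn's `MLPRegressor` stands in for whatever neural network implementation the authors used.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score, mean_squared_error

# Latin hypercube design with 2000 samples over [x, phi, P, gamma, H];
# the bounds are illustrative stand-ins, not the challenge-problem values.
lb = [0.0, 0.0, 15.0, 2.7138, 1.0]
ub = [30.0, np.pi, 75.0, 3.2361, 55.0]
X = qmc.scale(qmc.LatinHypercube(d=5, seed=0).random(n=2000), lb, ub)

def tank_model(X):
    """Hypothetical stand-in for the supplied simulator: returns the local
    displacement, four neighboring displacements, and the stress."""
    w = np.sin(X[:, 0] / 10.0) * X[:, 2] / 50.0 + X[:, 3] * X[:, 4] / 100.0
    w_nbr = np.column_stack([w * s for s in (0.95, 0.97, 1.03, 1.05)])
    sigma = 2.0 * w + 0.03 * X[:, 4]
    return w, w_nbr, sigma

w, w_nbr, sigma = tank_model(X)
# Features: x, phi, H plus the five displacements; P is deliberately omitted.
feats = np.column_stack([X[:, 0], X[:, 1], X[:, 4], w, w_nbr])

# 1400 training points and a held-out test set of 600, as in the paper.
Xtr, Xte, ytr, yte = feats[:1400], feats[1400:], sigma[:1400], sigma[1400:]

net = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(32, 32),
                                 max_iter=5000, random_state=0))
net.fit(Xtr, ytr)
pred = net.predict(Xte)
rmse = np.sqrt(mean_squared_error(yte, pred))
print(f"R^2 = {r2_score(yte, pred):.4f}, RMSE = {rmse:.4f}")
```

With the real simulator in place of `tank_model`, the same loop over candidate feature subsets would reproduce the model-selection step that led to Eq. (8).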

###### Uncertainty Management.

The proposed integrated Bayesian calibration, bias correction, and machine learning approach can also be viewed as an uncertainty management procedure. During data preprocessing, removing outliers reduced the experimental variability. Parameter uncertainty is first reduced by selecting, as calibration parameters, the significant model parameters identified through sensitivity analysis, and is further reduced through the Bayesian calibration and bias correction approach, which accounts for experimental variability, model discrepancy, and interpolation uncertainty. Similarly, during model bias correction, uncertainties from the experimental data, model bias, and simulation are all quantified and propagated to the predictions of displacement w. Finally, the predictions of displacement at neighboring locations carry uncertainty into the prediction of σ through the machine learning regression model in Eq. (8), which introduces an additional source of uncertainty characterized by the MSE of the regression model. The overall uncertainty of σ is then quantified using uncertainty propagation approaches [30].
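A minimal Monte Carlo sketch of this last propagation step, assuming a hypothetical learned relationship f in place of the fitted network of Eq. (8) and illustrative (not challenge-problem) means, standard deviations, and MSE:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the learned relationship sigma = f(w, ...) of Eq. (8).
def f(w, w_nbr):
    return 2.0 * w + 0.5 * w_nbr.sum(axis=1)

# Posterior means and SDs of the five displacement predictions
# (illustrative numbers only).
mu = np.array([1.0, 0.98, 0.99, 1.01, 1.02])
sd = np.array([0.05, 0.04, 0.04, 0.06, 0.05])
mse_nn = 0.02  # regression-model error variance from Eq. (8)

n = 100_000
W = rng.normal(mu, sd, size=(n, 5))             # sample the displacement predictions
eps = rng.normal(0.0, np.sqrt(mse_nn), size=n)  # sample the regression error term
sigma = f(W[:, 0], W[:, 1:]) + eps              # propagate both to sigma

print(f"mean(sigma) = {sigma.mean():.3f}, sd(sigma) = {sigma.std():.3f}")
```

The sampled `sigma` carries both the displacement-prediction uncertainty and the machine-learning-model uncertainty, which is exactly the aggregation described above; Ref. [30] compares this simple Monte Carlo scheme against cheaper alternatives.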

## Final Predictions

With the updated predictions of w and the neural network model relating the two QoIs, the remaining task focuses on the two prediction scenarios depicted in Fig. 15. For the first prediction scenario, estimating Pf requires predicting σ, as well as an optimization algorithm for finding the maximum stress over the entire stress field of the tank under the nominal operating condition. For the second prediction scenario, we propose a presampling scheme to efficiently locate the loading levels on the failure frontier.

###### Prediction Scenario 1: Probability of Failure at Nominal Conditions.

As shown in the left part of Fig. 15, in order to estimate the probability of failure Pf based on the failure criterion, the maximum stress under the nominal operating condition must be located. The flowchart in Fig. 16 details how the prediction of σ is maximized and Pf is estimated. First, a starting point (x0, φ0) for the location variables x and φ is chosen in the design space. The GP model for displacement is then used to predict w, w1, w2, w3, and w4 under the nominal conditions. With these predictions of displacements at neighboring locations, σ under the nominal conditions is determined via the machine learning model created in Sec. 3.4. The stress field is maximized by solving the following optimization problem:

Display Formula

(9)$max mean(σ)=f(x,φ,H,w,w1,w2,w3,w4;θ*)$
$s.t.: 0≤x≤L/2, 0≤φ≤π$
Once the maximum stress is located, its variance is estimated by uncertainty propagation approaches [30]. The uncertain estimate of the maximum stress and the yield stress σy (with its experimental uncertainty) are then combined in the limit state function g = σy − max(σ) for estimating Pf.
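A compact sketch of this maximize-then-propagate step. The smooth stress surface, its bounds, and the distribution parameters below are hypothetical stand-ins for the GP/neural-network chain and the challenge-problem data; with both the maximum stress and σy taken as Gaussian, Pf from the limit state g = σy − max(σ) has a closed form.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical smooth stand-in for the mean stress prediction over the
# tank surface (the paper uses the GP + neural-network chain instead).
L = 60.0
def mean_stress(z):
    x, phi = z
    return 30.0 + 5.0 * np.sin(phi) * np.exp(-((x - 10.0) / 8.0) ** 2)

# Maximize by minimizing the negative, subject to 0 <= x <= L/2, 0 <= phi <= pi.
res = minimize(lambda z: -mean_stress(z), x0=[8.0, 1.0],
               bounds=[(0.0, L / 2), (0.0, np.pi)])
sigma_max = -res.fun

# Limit state g = sigma_y - max(sigma); both terms Gaussian, so
# Pf = P(g < 0) is a normal CDF (illustrative means and SDs).
mu_y, sd_y = 40.0, 1.0  # yield stress mean and SD from experiments
sd_max = 1.5            # propagated SD of the maximum stress
pf = norm.cdf((sigma_max - mu_y) / np.hypot(sd_y, sd_max))
print(f"max stress = {sigma_max:.3f}, Pf = {pf:.4f}")
```

In the paper the optimization is restarted from multiple points, which is how the two local maxima A and B of Fig. 17 are found.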

For the challenge problem, two local maxima are found in the design space of x and φ, marked as pentagram A and triangle B in Fig. 17. Table 4 lists the coordinates of the two local maxima together with their corresponding mean predictions, SDs, and estimated probabilities of failure. Although the maximum mean stress prediction at location B is larger than that at location A, the prediction uncertainty at B is much more significant because there are very few data points around B. Since no expert opinion regarding the failure of the tank is available, we prefer to trust the σ prediction at A rather than at B: the former has more support from data in the surrounding area and thus a smaller prediction error. For both locations A and B, the failure probability estimates exceed 10−3, which represents an unacceptably large probability that the tank will fail under the nominal operating condition. Because the square root of the MSE for the neural network model in Eq. (8) is much smaller than the SD of the maximum stress predictions, the uncertainty introduced by the machine learning model has very little influence on the failure probability estimates.

###### Prediction Scenario 2: Determine “Safe” Operating Conditions.

In order to determine a set of safe operating conditions for the tank within the operating limits shown in Fig. 15, Pf must be compared with the 10−3 threshold at every point in the 3D space of P, χ, and H. Estimating Pf over the whole operating space can be computationally expensive, considering that each operating condition requires solving an optimization problem to maximize σ. However, since every χ can be converted to a γ using the empirical liquid model provided in data set 3, as shown in Fig. 18, the operating limits can be transformed from

Display Formula

(10)$P∈[15,75] psig, χ∈[0.1,1], H∈(0,55] in.$
to
(11)$P∈[15,75] psig, γ∈[2.7138, 3.2361], H∈(0,55] in.$
Because P, γ, and H are all loading variables, the higher the loading levels, the larger the probability of failure, and vice versa. Therefore, if the probability of failure Pf is lower than the threshold under the lowest loading level while higher under the highest loading level, i.e., Display Formula
(12)$Pf(P=15 psig, γ=2.7138, H=0 in.)<10−3 and Pf(P=75 psig, γ=3.2361, H=55 in.)>10−3$
we can conclude that there exists a continuous convex failure surface SGamma in the operating space of P, γ, and H, on which each point corresponds to the threshold failure probability (i.e., 10−3).

To efficiently obtain SGamma, we propose a presampling scheme that specifies the locations at which to perform Pf estimations in the three-dimensional space of P, γ, and H. Analogous to the three-dimensional operating space of the current problem, Fig. 19 shows two different cases in a two-dimensional normalized space to illustrate the proposed presampling scheme. The failure frontiers are indicated by blue curves. The normalized 2D spaces are first partitioned into several identical squares; the black and red nodes are the first set of presamples, which represent the highest and lowest loading levels of each square, respectively. Failure probabilities are estimated at these presamples and compared with the threshold 10−3. If the tank fails under the loading condition of the corresponding presample, i.e., if Pf > 10−3, a code 1 is assigned to the node; otherwise, a code 0 indicating "safe" is assigned. Since SGamma can only lie between areas with code 0 at the lowest loading level (red node) and code 1 at the highest loading level (black node), areas with identical codes at the highest and lowest loading levels, such as the green and red areas, are excluded from the sampling space. The areas with different codes at the high and low levels are further partitioned to generate new presamples for failure probability estimation. By iteratively repeating this presampling process, a desired number of loading levels close to the failure frontier are identified. Using these as starting points, the corresponding set of n loading levels on the failure frontier SGamma, denoted by $LGamma*={[P,γ,H]1*,…,[P,γ,H]n*}$, is accurately located by solving Eq. (13)

Display Formula

(13)$SGamma: 10−3−Pf(P,γ,H)=0$
$s.t. P∈[15,75] psig, γ∈[2.7138, 3.2361], H∈(0,55] in.$
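The 2D version of this scheme (Fig. 19) can be sketched as below; the failure-probability function is a hypothetical monotone stand-in for the actual Pf estimator, and the recursion depth is an assumed tuning choice.

```python
import math

THRESH = 1e-3

def pf(load):
    """Hypothetical monotone stand-in for the failure-probability estimator."""
    return 1e-4 * math.exp(3.0 * (load[0] + load[1]))

def fails(load):
    return pf(load) > THRESH  # code 1 (fail) vs code 0 (safe)

def presample(depth=5):
    """Iteratively partition the normalized 2D loading space, keeping only
    squares whose lowest-load corner is safe and highest-load corner fails."""
    cells = [((0.0, 0.0), (1.0, 1.0))]
    for _ in range(depth):
        new = []
        for lo, hi in cells:
            if fails(lo) == fails(hi):
                continue  # frontier cannot cross this square: exclude it
            mid = ((lo[0] + hi[0]) / 2.0, (lo[1] + hi[1]) / 2.0)
            # split the surviving square into four subcells, whose corners
            # become the next round of presamples
            new += [(lo, mid),
                    ((mid[0], lo[1]), (hi[0], mid[1])),
                    ((lo[0], mid[1]), (mid[0], hi[1])),
                    (mid, hi)]
        cells = new
    # keep the finest squares still straddling the frontier; their centers
    # serve as starting points for solving Eq. (13)
    cells = [c for c in cells if fails(c[0]) != fails(c[1])]
    return [((l[0] + h[0]) / 2.0, (l[1] + h[1]) / 2.0) for l, h in cells]

starts = presample()
print(len(starts), "candidate starting points near the failure frontier")
```

Because Pf is monotone in the loads, a square whose two extreme corners share a code can never contain the frontier, so each round of pruning is exact; the 3D version used in the paper splits cubes into eight subcells instead of four.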

According to Fig. 18, γ decreases monotonically with χ on [0.1, 0.3299] and increases on (0.3299, 1]. Therefore, as χ increases from 0.1 to 1, each γ corresponds to two different values of the liquid composition χ. As a result, SGamma in the operating space of P, γ, and H is transformed into two failure frontiers in the operating space of P, χ, and H, denoted $SChi1$ and $SChi2$. Based on the liquid model, $LGamma*$ is converted into two sets of operating conditions, $LChi1*$ and $LChi2*$, which belong to $SChi1$ and $SChi2$, respectively. The two failure frontiers $SChi1$ and $SChi2$ in the space of P, χ, and H are then obtained by building regression models using the data sets $LChi1*$ and $LChi2*$, i.e., Display Formula

(14)$χ=r1(P,H)+ε1, 0.1≤χ≤0.3299$
$χ=r2(P,H)+ε2, 0.3299<χ≤1$

where Display Formula

(15)$ε1∼N(0,MSE12)ε2∼N(0,MSE22)$

Therefore, Display Formula

(16)$SChi1(P,H,χ)=χ−(r1(P,H)+ε1), 0.1≤χ≤0.3299$
$SChi2(P,H,χ)=χ−(r2(P,H)+ε2), 0.3299<χ≤1$

Figure 20 shows the mean predictions of the failure frontiers $SChi1$ and $SChi2$. Operating conditions above $SChi1$ and below $SChi2$ are the safe operating conditions for the tank, i.e.,

Display Formula

(17)$safe[P,H,χ]={P,H,χ | SChi1(P,H,χ)>0, χ∈[0.1,0.3299], P∈[15,75], H∈(0,55]} ∪ {P,H,χ | SChi2(P,H,χ)<0, χ∈(0.3299,1], P∈[15,75], H∈(0,55]}$
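The branch regressions and the safe-set test of Eq. (17) can be sketched as follows. The frontier samples are synthetic stand-ins for $LChi1*$ and $LChi2*$, and plain linear regression is an assumed form for r1 and r2 (the paper does not specify the regression family).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

# Synthetic frontier samples: (P, H) -> chi on each branch of the liquid model.
PH = rng.uniform([15.0, 1.0], [75.0, 55.0], size=(40, 2))
chi1 = 0.33 - 0.002 * (PH[:, 0] - 15.0) / 60.0 - 0.1 * PH[:, 1] / 55.0  # branch 1
chi2 = 0.33 + 0.5 * (PH[:, 0] - 15.0) / 60.0 + 0.1 * PH[:, 1] / 55.0    # branch 2

r1 = LinearRegression().fit(PH, chi1)  # chi = r1(P, H) on [0.1, 0.3299]
r2 = LinearRegression().fit(PH, chi2)  # chi = r2(P, H) on (0.3299, 1]

def is_safe(P, H, chi):
    """Eq. (17): safe above S_Chi1 on branch 1, below S_Chi2 on branch 2."""
    z = np.array([[P, H]])
    if chi <= 0.3299:
        return chi - r1.predict(z)[0] > 0.0  # S_Chi1 > 0
    return chi - r2.predict(z)[0] < 0.0      # S_Chi2 < 0
```

`is_safe` mirrors the two-branch classification of operating conditions shown in Fig. 20.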

## Conclusion

An integrated Bayesian calibration, bias correction, and machine learning approach is presented for addressing the Sandia V&V challenge problem. Three challenges of the problem are identified and addressed in detail via the presented approach. The benefits of the Bayesian model updating are that it not only improves model predictions in untested regions by combining multiple sources of information but also allows the quantification of uncertainty from multiple sources through the Bayesian model calibration and bias correction formulation in the model updating process. Furthermore, by employing the proposed machine learning technique for identifying and exploiting the relationship between stress and displacement, we develop a surrogate bias-corrected model for σ (which was not measured experimentally) based on an explicit bias-corrected model for w (which was experimentally measured). The bias-corrected model for σ allows us to predict Pf under the nominal operating condition in prediction scenario 1. For prediction scenario 2, a presampling scheme is developed for efficiently locating the failure frontier that divides the operating space.

In the proposed approach, credibility of the final predictions is addressed by quantifying multiple sources of uncertainty in predictions and aggregating their impacts on the end quantity of interest. As a result, a large uncertainty will lead to low reliability in meeting desired performance. A potential shortcoming of this treatment is that the reliability assessment might be overconservative, due to its consideration of all sources of uncertainty, including the interpolation uncertainty associated with lack of data. To further improve our approach to the challenge problem, the computational costs of different meshes could be taken into account by applying multifidelity UQ methods. The level of identifiability of model parameters could also be assessed prior to observing any experimental data [27]. Finally, if experimental data for stress are available, multiresponse Bayesian calibration and bias correction methods can be applied for improving the identifiability of model parameters and enhancing the predictive power of the updated model [10,24].

## Acknowledgements

The grant support from the U.S. National Science Foundation (CMMI-1233403) is gratefully acknowledged. Wei Li's and Shishi Chen's predoctoral visits at Northwestern University were sponsored by the China Scholarship Council.

## Appendices

###### Appendix: Modular Bayesian Model Calibration Approach

The four modules in the flowchart of Fig. 6 are implemented as follows:

###### Module 1: GP Model for the Computer Model.

In this module, a GP model is fitted using data from model simulations to predict the computer response at untried simulation inputs. The prior mean and covariance function of the GP model are expressed as Display Formula

(A1)$ym(x,θ)∼GP(hm(x,θ)βm,σm2Rm((x,θ),(x′,θ′)))$
where $hm(⋅,⋅)$ is a vector of user-predefined regression functions and $βm$ is a vector of the corresponding unknown regression coefficients. $σm2$ is the unknown prior variance and $Rm((⋅,⋅),(⋅,⋅))$ is the spatial correlation function between two sets of input variables and parameters. In this paper, we define a constant prior mean for the GP model, i.e., $hm(⋅,⋅)=1$, and we choose a Gaussian correlation function for $Rm((⋅,⋅),(⋅,⋅))$, i.e., Display Formula
(A2)$Rm((x,θ),(x′,θ′))=exp {−∑k=1dωkm(xk−xk′)2−∑k=1rωd+km(θk−θk′)2}$
in which $ωm=[ω1m,ω2m,…,ωd+rm]T$ are the length scale parameters for capturing the nonlinearity of the process, and d and r are the numbers of input variables and calibration parameters, respectively.

Let $Ym=[ym(x1m,θ1m),…,ym(xnmm,θnmm)]T$ denote the simulation data collected at nm input sites. The multivariate normal log-likelihood function of Ym is maximized to estimate the hyperparameters Φm = [βm, $σm2$, ωm]. The posterior distribution of ym(x, θ) is then obtained by substituting the maximum likelihood estimates of Φm into Eq. (A1).
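As a concrete illustration of Module 1, the sketch below fits a GP with a constant mean and a Gaussian (RBF) correlation, with one length scale per dimension, by maximizing the log marginal likelihood. scikit-learn is used in place of the authors' implementation, and the simulator is a toy function with d + r = 3 combined inputs; all numbers are assumed.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(2)

# Simulation data at n_m = 80 sites; columns play the role of (x, theta_1,
# theta_2), and the response is a toy stand-in for the tank simulator.
Z = rng.uniform(0.0, 1.0, size=(80, 3))
ym = np.sin(4.0 * Z[:, 0]) + Z[:, 1] * Z[:, 2]

# Constant prior mean (via normalize_y) and a Gaussian correlation with a
# separate length scale per dimension, as in Eq. (A2); hyperparameters are
# estimated by maximizing the log marginal likelihood.
kernel = ConstantKernel(1.0) * RBF(length_scale=[1.0, 1.0, 1.0])
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                              n_restarts_optimizer=3, random_state=0)
gp.fit(Z, ym)

# Posterior mean and SD at an untried input/parameter combination.
mean, sd = gp.predict(np.array([[0.5, 0.5, 0.5]]), return_std=True)
print(f"posterior mean = {mean[0]:.3f} +/- {sd[0]:.3f}")
```

The posterior SD returned here is the interpolation uncertainty that Modules 2–4 carry forward into the calibration and prediction steps.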

###### Module 2: GP Model for the Bias Function.

The GP prior for the bias function δ(x) is Display Formula

(A3)$δ(x)∼GP(hδ(x)βδ,σδ2Rδ(x,x′))$
Similar to the GP model for the computer simulation, we define a constant prior mean and a Gaussian correlation function for the GP model of the bias function. Assuming a priori independence between the computer model, the discrepancy function, and the experimental variability, and further assuming that the experimental variability ε follows a zero-mean normal distribution ε ∼ N(0, λ), the two GP models in Eqs. (A1) and (A3) are combined to form the GP model for the experimental response Display Formula
(A4)$ye(x)|θ∼GP(me(x,θ),Ve((x,θ),(x′,θ′)))$
where the prior mean is Display Formula
(A5)$me(x,θ)=hm(x,θ)βm+hδ(x)βδ$
and the prior covariance is Display Formula
(A6)$Ve((x,θ),(x′,θ′))=σm2Rm((x,θ),(x′,θ′))+σδ2Rδ(x,x′)+λ$

The detailed procedure for obtaining the maximum likelihood estimates of the hyperparameters Φδ = [βδ, $σδ2$, ωδ, λ], using only a uniform prior distribution p(θ) for the calibration parameters together with the collected simulation and experimental data, was developed by Kennedy and O'Hagan (KOH) [9]; a closed-form expression for the likelihood function with a normal p(θ), a constant prior mean, and Gaussian correlation functions is derived in Sec. 3 of Ref. [9].

###### Module 3: Posterior Distribution for Calibration Parameters.

Let d denote all the data collected from the computer simulations Ym and the experimental observations Ye, i.e., d = [(Ym)T, (Ye)T]T, and let $Φ̂$ denote the maximum likelihood estimates of the hyperparameters [Φm, Φδ]. The posterior distribution of the calibration parameters is Display Formula

(A7)$p(θ|d,Φ̂)=p(d|θ,Φ̂)p(θ)p(d|Φ̂)$
The multivariate normal likelihood function p(d|θ, $Φ̂$) is determined by the mean and covariance functions of the GP models for both the computer simulation (Eq. (A1)) and the experimental response (Eq. (A4)).
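A one-parameter grid version of Eq. (A7) can be sketched as follows; the Gaussian likelihood below is a hypothetical stand-in for the multivariate normal p(d | θ, $Φ̂$) implied by the GP models, and the grid range and resolution are assumed.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import trapezoid

# Grid over a single calibration parameter theta on its prior support [0, 1].
theta = np.linspace(0.0, 1.0, 201)

# Hypothetical stand-in for the GP-based likelihood p(d | theta, Phi_hat).
log_lik = norm.logpdf(0.42, loc=theta, scale=0.1)
log_prior = np.zeros_like(theta)  # uniform prior p(theta)

# Bayes' rule on the grid: the evidence p(d | Phi_hat) is the normalizing
# integral, computed here by the trapezoidal rule (shifting by the max
# first avoids numerical underflow).
log_post = log_lik + log_prior
post = np.exp(log_post - log_post.max())
post /= trapezoid(post, theta)

theta_map = theta[np.argmax(post)]
print(f"MAP estimate of theta = {theta_map:.3f}")
```

For the multi-parameter case of the actual problem, the same normalization is typically done by MCMC sampling rather than on a grid; the MAP values θ* used in Eq. (8) are the maximizers of this posterior.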

###### Module 4: Prediction of Experimental Response.

The equations for calculating the posterior mean and variance of the experimental response for a given calibration parameter vector θ are provided in Sec. 4.1 of Ref. [9]. This predictive distribution for the experimental response can then be marginalized with respect to the posterior distribution p(θ|d, $Φ̂$) obtained in module 3. Multiple sources of uncertainty including model discrepancy, interpolation uncertainty, experimental variability, and parameter uncertainty are all taken into account in the posterior distribution for the final predictions. The posterior distribution of the bias function can be calculated in a manner similar to that of the experimental response.

## References

Box, G. E. , and Draper, N. R. , 1987, Empirical Model-Building and Response Surfaces, Wiley, New York.
Hu, K. , 2014, “ 2014 V&V Challenge: Problem Statement,” SAND Report No. 2013-10486P.
Hu, K. , and Orient, G. , 2015, “ The 2014 Sandia V&V Challenge Problem: A Case Study in Simulation, Analysis, and Decision Support,” ASME J. Verif., Validation Uncertainty Quantif. (submitted).
Hills, R. G. , Pilch, M. , Dowding, K. J. , Red-Horse, J. , Paez, T. L. , Babuška, I. , and Tempone, R. , 2008, “ Validation Challenge Workshop,” Comput. Methods Appl. Mech. Eng., 197(29), pp. 2375–2380.
Dowding, K. J. , Pilch, M. , and Hills, R. G. , 2008, “ Formulation of the Thermal Problem,” Comput. Methods Appl. Mech. Eng., 197(29), pp. 2385–2389.
Babuška, I. , Nobile, F. , and Tempone, R. , 2008, “ Formulation of the Static Frame Problem,” Comput. Methods Appl. Mech. Eng., 197(29), pp. 2496–2499.
Red-Horse, J. , and Paez, T. , 2008, “ Sandia National Laboratories Validation Workshop: Structural Dynamics Application,” Comput. Methods Appl. Mech. Eng., 197(29), pp. 2578–2584.
Bayarri, M. J. , Berger, J. O. , Paulo, R. , Sacks, J. , Cafeo, J. A. , Cavendish, J. , Lin, C.-H. , and Tu, J. , 2007, “ A Framework for Validation of Computer Models,” Technometrics, 49(2), pp. 138–154.
Kennedy, M. C. , and O'Hagan, A. , 2001, “ Bayesian Calibration of Computer Models,” J. R. Stat. Soc.: Ser. B (Stat. Methodol.), 63(3), pp. 425–464.
Arendt, P. D. , Apley, D. W. , and Chen, W. , 2012, “ Quantification of Model Uncertainty: Calibration, Model Discrepancy, and Identifiability,” ASME J. Mech. Des., 134(10), p. 100908.
Schwer, L. , Mair, H. , and Crane, R. , 2006, “ Guide for Verification and Validation in Computational Solid Mechanics,” ASME V&V 10-2006, New York.
Oberkampf, W. L. , Sindir, M. , and Conlisk, A. , 1998, Guide for the Verification and Validation of Computational Fluid Dynamics Simulations, Reston, VA, AIAA-G-077-1998.
Liu, Y. , Chen, W. , Arendt, P. , and Huang, H.-Z. , 2011, “ Toward a Better Understanding of Model Validation Metrics,” ASME J. Mech. Des., 133(7), p. 071005.
Ferson, S. , Oberkampf, W. L. , and Ginzburg, L. , 2008, “ Model Validation and Predictive Capability for the Thermal Challenge Problem,” Comput. Methods Appl. Mech. Eng., 197(29), pp. 2408–2430.
Roy, C. J. , and Oberkampf, W. L. , 2011, “ A Comprehensive Framework for Verification, Validation, and Uncertainty Quantification in Scientific Computing,” Comput. Methods Appl. Mech. Eng., 200(25), pp. 2131–2144.
Jiang, X. , and Mahadevan, S. , 2008, “ Bayesian Validation Assessment of Multivariate Computational Models,” J. Appl. Stat., 35(1), pp. 49–65.
Li, W. , Chen, W. , Jiang, Z. , Lu, Z. , and Liu, Y. , 2014, “ New Validation Metrics for Models With Multiple Correlated Responses,” Reliab. Eng. Syst. Saf., 127, pp. 1–11.
Apley, D. W. , Liu, J. , and Chen, W. , 2006, “ Understanding the Effects of Model Uncertainty in Robust Design With Computer Experiments,” ASME J. Mech. Des., 128(4), pp. 945–958.
Jiang, Z. , Chen, W. , Fu, Y. , and Yang, R.-J. , 2013, “ Reliability-Based Design Optimization With Model Bias and Data Uncertainty,” SAE Int. J. Mater. Manuf., 6(3), pp. 502–506.
Chen, W. , Tsui, K.-L. , Wang, S. , and Xiong, Y. , 2008, “ A Design-Driven Validation Approach Using Bayesian Prediction Models,” ASME J. Mech. Des., 130(2), p. 021101.
Oberkampf, W. L. , and Barone, M. F. , 2006, “ Measures of Agreement Between Computation and Experiment: Validation Metrics,” J. Comput. Phys., 217(1), pp. 5–36.
Zhang, R. , and Mahadevan, S. , 2003, “ Bayesian Methodology for Reliability Model Acceptance,” Reliab. Eng. Syst. Saf., 80(1), pp. 95–103.
Loehle, C. , 1997, “ A Hypothesis Testing Framework for Evaluating Ecosystem Model Performance,” Ecol. Modell., 97(3), pp. 153–165.
Arendt, P. D. , Apley, D. W. , Chen, W. , Lamb, D. , and Gorsich, D. , 2012, “ Improving Identifiability in Model Calibration Using Multiple Responses,” ASME J. Mech. Des., 134(10), p. 100909.
Sobol′, I. M. , 2001, “ Global Sensitivity Indices for Nonlinear Mathematical Models and Their Monte Carlo Estimates,” Math. Comput. Simul., 55(1–3), pp. 271–280.
Saltelli, A. , Tarantola, S. , and Chan, K.-S. , 1999, “ A Quantitative Model-Independent Method for Global Sensitivity Analysis of Model Output,” Technometrics, 41(1), pp. 39–56.
Arendt, P. D. , Apley, D. W. , and Chen, W. , 2015, “ A Preposterior Analysis to Predict Identifiability in Experimental Calibration of Computer Models,” IIE Trans., 48(1), p. 75.
Angus, J. E. , 1994, “ The Probability Integral Transform and Related Results,” SIAM Rev., 36(4), pp. 652–654.
Timoshenko, S. , Woinowsky-Krieger, S. , and Woinowsky-Krieger, S. , 1959, Theory of Plates and Shells, McGraw-Hill, New York.
Lee, S. , and Chen, W. , 2009, “ A Comparative Study of Uncertainty Propagation Methods for Black-Box Type Problems,” Struct. Multidiscip. Optim., 37(3), pp. 239–253.

## Figures

Fig. 2

General framework for model updating (calibration and bias correction), UQ, and model validation [10]

Fig. 3

Detailed work flow for the challenge problem

Fig. 4

First-order sensitivity index of model parameters under the pressure only loading conditions: six different pressure loading levels are evenly spaced over [0, 150] psig

Fig. 5

First-order sensitivity index of model parameters under the nominal loading condition (P = 73.5 psig, χ = 0.1, and H = 50 in.)

Fig. 6

Flowchart of the modular Bayesian approach

Fig. 7

Joint posterior distribution of calibration parameters E and T

Fig. 8

Comparison between experimental data and model predictions for displacement w after model calibration under the pressure only loading conditions

Fig. 9

Comparison between reserved validation data and updated model predictions for displacement w after model bias correction under the pressure plus liquid loading conditions

Fig. 10

U-pooling approach: Transformation of observations through predictive distributions to a standard uniform probability scale

Fig. 11

Comparison of u-pooling metrics for the calibrated model and the bias-corrected model without parameter calibration

Fig. 12

Comparison between validation data and their corresponding predictions for the bias-corrected model without parameter calibration for displacement w under the pressure plus liquid loading conditions

Fig. 13

Displacements at neighboring locations, which serve as input variables in the machine learning model for predicting σ at location (x, φ)

Fig. 14

Validation results of the neural network model for σ

Fig. 15

Two prediction scenarios

Fig. 16

Flow chart for maximizing the stress and estimating failure probability

Fig. 17

Local maxima of stress predictions

Fig. 18

Transformation between liquid composition χ and liquid-specific weight γ

Fig. 19

Presampling scheme for efficiently locating the failure frontier

Fig. 20

Failure frontiers in the operating space of P, χ, and H

## Tables

Table 2 Representations of measurement errors
Table 3 Results of model parameter calibration
Table 4 Predictions at local maxima
