0
Research Papers

Sandia Verification and Validation Challenge Problem: A PCMM-Based Approach to Assessing Prediction CredibilityPUBLIC ACCESS

[+] Author and Article Information
Lauren L. Beghini

Multi-Physics Modeling and Simulation,
Sandia National Laboratories,
P.O. Box 969, MS 9042,
Livermore, CA 94550-0969
e-mail: llbeghi@sandia.gov

Patricia D. Hough

Quantitative Modeling and Analysis,
Sandia National Laboratories,
P.O. Box 969, MS 9159,
Livermore, CA 94550-0969
e-mail: pdhough@sandia.gov

1Corresponding author.

Manuscript received February 7, 2015; final manuscript received December 18, 2015; published online February 19, 2016. Guest Editor: Kenneth Hu.

J. Verif. Valid. Uncert 1(1), 011002 (Feb 19, 2016) (10 pages) Paper No: VVUQ-15-1009; doi: 10.1115/1.4032369 History: Received February 07, 2015; Revised December 18, 2015

Abstract

The process of establishing credibility in computational model predictions via verification and validation (V&V) encompasses a wide range of activities. Those activities are focused on collecting evidence that the model is adequate for the intended application and that the errors and uncertainties are quantified. In this work, we use the predictive capability maturity model (PCMM) as an organizing framework for evidence collection activities and summarizing our credibility assessment. We discuss our approaches to sensitivity analysis, model calibration, model validation, and uncertainty quantification and how they support our assessments in the solution verification, model validation, and uncertainty quantification elements of the PCMM. For completeness, we also include some limited assessment discussion for the remaining PCMM elements. Because the computational cost of performing V&V and the ensuing predictive calculations is substantial, we include discussion of our approach to addressing computational resource considerations, primarily through the use of response surface surrogates and multiple mesh fidelities.

<>

Introduction

The process of verification and validation (referred to hereafter as V&V) is useful for establishing credibility in simulation-based predictions and assessing the associated uncertainty of those predictions. As defined in Ref. [1], verification is the process of assessing software correctness and numerical accuracy of the solution to a given mathematical model while validation is the process of assessing the physical accuracy of the mathematical model using comparisons between experimental data and computational results. In this paper, we discuss our approach to the 2014 Sandia V&V challenge problem.

Background.

In short, the goal of the 2014 V&V challenge problem was, given a computational model of liquid-storage tanks, to (i) compute the probability of failure and uncertainty at nominal test conditions and (ii) determine the loading levels which will violate the probability of failure threshold, P(Fail) < 10−3, and (iii) assess the credibility of those predictions. We note that “failure” is defined for this problem strictly based on stress (i.e., failure occurs when the Von Mises stress exceeds the yield stress at any point).

A complete description of the problem statement and detailed information appears in Ref. [2]. We do summarize some of the key pieces here. The model simulates an idealized tank under pressure and liquid loading. It can also simulate a pressure loading only scenario. Response quantities reported by the model are displacements and stresses at a range of locations on the tank. Four meshes of different levels of discretization are provided; however, the model is a black box, which prohibits the ability to explore geometry and physics aspects of the model.

In support of the computational activity, the following data was supplied:

• legacy data from the manufacturer (including material properties and tank dimensions)

• coupon tests in a controlled lab environment

• liquid characterization tests in a controlled lab environment

• full tank tests in a controlled lab environment with no loading applied

• full tank tests in a controlled lab environment under a pressure loading

• full tank tests in a production environment under a pressure and liquid loading

Analysis Philosophy.

We take a comprehensive approach that employs a range of technical activities to collect evidence in support of assessing the predictive credibility of the model used to make the predictions and address (i) the probability of failure and (iii) assess the credibility of those predictions. Our aim for this work is to present a practical, deliverable driven approach. Rather than focusing on the development of new methods, our approach instead utilizes existing methods within Dakota [3,4] to conduct a full, end-to-end engineering analysis. As the participants were advised to consider cost and time constraints, our team placed emphasis on the big picture analysis rather than on performing deep dives into any given individual piece in order to better answer the questions posed and provide recommendations. Thus, by utilizing off-the-shelf tools rather than writing custom codes, we were able to spend more time on generating evidence to inform the PCMM [1,5] which was used to assess the credibility of the predictions. We felt such an assessment would be crucial in delivery to the customer. As a result, we did not address (ii) determine the loading levels which violate the failure threshold due to time constraints.

Summary of Approach.

The key technical activities of the work discussed in this paper, namely, solution verification, sensitivity analysis, calibration, validation, and prediction, and how they tie together with each other and bridge between the different experimental data sets provided are highlighted in Fig. 1. In summary, the tank measurements, material properties and liquid properties were used to establish initial bounds on the material properties. Those bounds were fed forward to the laboratory tests under controlled pressure to perform solution verification, sensitivity analysis, and a multifidelity calibration of the material properties. Using the results of the calibration, the provided code was then used to simulate the field tests, which were validated against the field test data. Finally, predictions of the probability of failure were made, and the credibility of those predictions was discussed in light of the evidence collected throughout the activities leading up to the prediction. For all of these computational studies, we make use of readily available methods in the Dakota software package [3].

The context in which we evaluate the credibility of the predictions we will ultimately make is the PCMM [1,5]. The PCMM is a model which assesses the confidence one should place in the simulation and its results through evaluation of the maturity level of the following six fundamental elements:

• geometric fidelity: level of detail included in the spatial and temporal definition of the system

• physics fidelity: range of physics modeling fidelity ranging from empirical models to first-principles physics

• code verification: correctness of source code and numerical algorithm

• solution verification: assessment of numerical solution errors and confidence in computational results

• model validation: assessment of the physical accuracy of the model by comparing the experimental data and computational results

• uncertainty quantification: identification and characterization of uncertainties, and thoroughness of sensitivity analysis to determine sources of uncertainties

At the end of this paper, we will return to these elements and provide an assessment and maturity level score associated with each of these elements based on the evidence we generate as part of the analysis activities. The maturity level scores can then be used to aid decision-making activities.

In Sec. 2, we discuss a sensitivity analysis we did for the pressure loading only scenario. This analysis is done across all mesh fidelities in order to simultaneously address solution verification and identification of driving model parameters. The intent is to base our choice of mesh fidelity not just on error at the nominal choice of parameter values but also on consistency of parameter sensitivities. Section 3 describes our approach to calibration, which makes use of multiple model fidelities. The intent of this activity is primarily to refine our estimates of model parameters such that model predictions are consistent with experimental data. Additionally, this activity can support identification of model form error and help inform distributions on model parameters. Section 4 describes validation of model predictions against displacement data collected in the field. This includes the use of a polynomial chaos expansion (PCE) representation of model predictions both to perform sensitivity analysis and to generate a large number of instantiations over the parameter uncertainties to compute a broad range for errors between model and data. Section 5 discusses the probability of failure prediction, which makes use of a global reliability method based on a Gaussian process (GP) to incorporate parameter uncertainties in a computationally tractable way. Section 6 describes our assessment of the credibility of the failure predictions in the context of the PCMM, and Sec. 7 concludes the paper with a final recommendation and suggestions for future work that would further this effort.

Sensitivity Study and Solution Verification

The first step in our analysis was a sensitivity study conducted over the model parameters for the pressure-only model. We included the mesh as a parameter in order to assess the importance of the mesh discretization on the quantity of interest relative to the model parameters and to inform our choice of mesh for the validation and prediction elements of the analysis. The sensitivity study also serves as a screening of the parameters to identify those with the most influence on the quantity of interest and thus those we would want to include in model calibration and uncertainty quantification associated with validation and prediction.

For the purposes of the sensitivity study, we constructed tolerance intervals [6,7] using the provided data on tank measurements, material properties, and liquid properties in order to establish initial bounds on those model parameters. Given a number of data samples, a tolerance interval provides the bounds within which a specified percentage of the true population falls with a specified amount of confidence. For this problem, we used 99/90 tolerance intervals, i.e., the bounds in which 99% of the population lies with 90% confidence. We used minitab [8] to compute the intervals. The Minitab approach is based on Ref. [9], which defines the tolerance interval to be (μ − , μ + ), where μ is the sample mean, σ is the standard deviation, and Display Formula

(1)$k=ν(1+1N)z(1−p)/22χ1−γ,ν2$

N is the number of samples, $χ1−γ,ν2$ is the critical value of the chi-square distribution with degrees of freedom ν that is exceeded with probability γ, and z(1−p)/2 is the critical value of the normal distribution associated with cumulative probability (1 − p)/2. We note that much of this notation duplicates that used to define the material properties. We do this to remain consistent with common notation used for tolerance interval definition. This is the only place in the paper where the notation takes on these meanings; the material property definitions apply everywhere else. While this approach makes a normality assumption that we cannot confirm given the very small amount of data, normality tests indicate that we cannot rule out a normal distribution. Furthermore, Ref. [10] argues that this approach more reliably estimates conservative bounds when data is scarce than other approaches and other distributions. The bounds we obtained for the model parameters are as follows:

• E ∈ [2.522 × 107, 3.139 × 107] (psi)

• ν ∈ [0.250, 0.293]

• L ∈ [59.317, 61.926] (in.)

• R ∈ [29.221, 32.875] (in.)

• T ∈ [0.203, 0.263] (in.)w

We took pressure P ∈ [27.911, 154.245] (psi) to encompass the range reported in the pressure-only lab tests plus the 5% measurement error.

To establish the relationship between model fidelities, a small initial Latin hypercube sampling (LHS) [11] study over the aforementioned parameters was conducted at each level of mesh discretization. More specifically, we took an incremental LHS approach that allowed us to progressively double the sample size in such a way that reused existing samples and that generated a new LHS sample that preserved the stratification and correlation structures of the original sample. The total number of simulations runs using this approach were:

• mesh 4 (finest) = 8 runs

• mesh 3 = 16 runs

• mesh 2 = 32 runs

• mesh 1 (coarsest) = 64 runs

Note that the number of runs was increased as the mesh discretization decreased. Because more runs can be accommodated with faster run times, we did this as a means of having as large a basis of comparison from one mesh level to the next as possible as well as putting in place the information needed to generate surrogates for further analysis once a mesh has been chosen. The minimum number of samples required to establish linear correlations is one more than the number of parameters. Given that we were considering six model parameters, we chose eight as the initial sample size and ran all meshes at those samples. We then doubled the number of samples via the incremental LHS approach to get the next set at which to run meshes 1–3, bringing those to a total of 16 samples. We repeated this process until we reached the reported number of samples for the remaining two meshes. Given experience-based projections of the level of parallelism we could leverage for each mesh, the number of runs we could do simultaneously, and the throughput of the available computing platforms, we determined this allocation of samples to be consistent with what could be completed in a week to 10 days of time. The general observations of the study were as follows:

• As maximum stress and displacement values are the quantities of interest for the failure predictions that will need to be made, we examined the consistency of these values across meshes and found them to be consistent across fidelities. Maximum stress results for mesh 2 and mesh 3 appear in Fig. 2Fig. 2

Maximum values of stress for mesh 2 relative to values for mesh 3. Each data point corresponds to a single LHS run. Stress values have very small differences.

. Other mesh-to-mesh comparisons are similar, as are comparable comparisons for displacements at locations where lab measurements were taken.

• The location of the maximum stress was consistently along the bottom of the tank ($ϕ=0$) with the distance from the centerline between roughly 26.0 in. and 28.5 in. and with a typical deviation between meshes of about 0.5 in. Further examination of the locations revealed the variance in location might be due to numerical precision and resolution of the meshes (flat slope of curve between reported max locations, as illustrated in Fig. 3Fig. 3

Variance in location of maximum stress for mesh 2 and mesh 3 is likely due to small numerical differences between meshes

).

• For each of the initial eight samples, the differences were calculated between the maximum stress from mesh 4 and the stress at the same location from each of the other meshes. The differences between stress predictions ranged from 0.001% to 0.065% for mesh 3, 0.002% to 0.091% for mesh 2, and 0.009% to 0.127% for mesh 1.

• For sensitivity analysis, we focused on partial rank correlations [12], which capture the influence of parameters on the trend in the model response(s) of interest. We placed the emphasis on trends in order to identify parameters that will be most important in calibration. Furthermore, we used rank correlations because of the large difference in magnitude between the model parameters. Results for maximum stress appear in Table 1Table 1

Sensitivity results: partial rank correlations for max stress and corresponding location using mesh 3. The closer the magnitude of the correlation to one, the stronger the influence that parameter has on the trend in the response.

Max stressLocationE−0.3420.343ν0.240−0.085L−0.2370.852R0.780−0.233T−0.940−0.842P0.988−0.108; results are similar for displacements. Based on these results, we included all parameters in the model calibration.

Given our observations, we decided to move ahead with model calibration using mesh 3 and to leverage meshes 1 and 2 in the process.

Calibration

Since limited test data was provided, we opted to use the displacements from the full tank laboratory tests under a pressure loading (Tanks 1 and 2) for the calibration and to save the displacement data from the field tests (full tanks under pressure and liquid loading—Tanks 3–6) for the validation. The specific objective was to find model parameter values that would yield predicted displacements that most closely fit, on average, those measured in the lab tests. The fit is formulated as a sum of squared displacement differences calculated across all measurement locations, i.e., a nonlinear least-squares objective, with model parameter bounds defined by the tolerance intervals identified in Sec. 2. To perform the calibration using Dakota, we used a multistart nonlinear least-squares solver, nl2sol [13] with ten unique starting points. This allowed us to find possible multiple local minima, corresponding to multiple acceptable sets of nominal values for the model parameters.

Based on our observations regarding the consistent trends in displacement predictions across model fidelities, we incorporated a multifidelity approach in our calibration in order to reduce the number of expensive (mesh 3) simulations that needed to be done. For each step of the process, we report the total number of runs required for a given mesh discretization. We note that this total is computed over all ten optimizations (one for each starting point) and includes the simulations done to compute finite-difference gradients. We additionally note that we leveraged the independence of the computations and performed them simultaneously in order to reduce the time required. The steps were as follows:

1. (1)Using ten starting points, calibrate mesh 1 against all data (327 runs).
2. (2)Feed mesh 1 solutions (see Table 2Table 2

Calibration solutions from each step of the multifidelity calibration process. Solution k·l refers to mesh k, lth local optimization.

SolutionE (107)νLRT1.13.000.29159.4730.550.2301.22.900.29159.4729.950.2321.32.780.29159.4729.440.2371.42.690.25359.6129.440.2541.52.850.29159.4729.660.2341.62.620.29159.5929.440.2521.72.580.29159.6629.440.2591.82.980.25359.4729.940.2331.92.730.29159.4729.440.2411.103.080.28261.5231.070.2502.13.000.28759.4729.700.2252.22.890.29159.4729.440.2292.32.770.29159.4729.440.2382.42.690.29159.6629.440.2462.52.840.29159.4929.440.2322.62.620.29159.5929.440.2522.72.570.29159.7529.440.2592.82.980.29059.5230.370.2302.92.730.29159.6129.440.2422.103.070.28261.5231.070.2503.13.000.28459.5129.700.2253.22.890.29159.5429.440.2293.32.770.29159.6629.440.2383.42.690.29159.6629.440.2463.52.840.29159.5929.440.2333.62.620.29159.5929.440.2523.72.570.29159.8229.440.2593.82.980.29059.5230.370.2303.92.730.29159.6729.440.2423.103.070.27859.4729.440.220) forward to calibrate mesh 2 against all data (243 runs).
3. (3)Feed mesh 2 solutions (see Table 2) forward to calibrate mesh 3 against all data (168 runs).

One might expect larger reductions in the number of simulations required from one calibration level to the next; however, some noise in the least-squares objective function caused by small interacting changes in individual residuals leads to extra steps needed by the optimization method to reach the convergence criteria. Even so, without this multifidelity approach, we estimated that we would have needed around 400 simulations with the mesh 3 model. The solutions for all three steps are reported in Table 2. The final total residuals for all of these solutions fell in the following range:

$res∈(9.0×10−3,1.2×10−2)$

This small range indicates that all solutions found are roughly equivalent and that the residual landscape is likely fairly flat (with some noise) in the region containing these solutions. As a result, there is little evidence for narrowing the original parameter ranges.

In addition to considering the point solutions computed using the deterministic nonlinear least squares solver nl2sol [13] in Dakota, we also examined the 95% confidence intervals for the individual optimal parameters. We do not report the confidence intervals for all of the solutions here, but we note that the intervals became smaller after each step of the calibration process described above. The following confidence intervals are typical of what we observed for all of the solutions obtained using mesh 3:

• E ∈ [−5.098 × 109, 5.158 × 109] (psi)

• ν ∈ [−24.397, 24.965]

• L ∈ [57.911, 61.102] (in.)

• R ∈ [−2.442 × 103, 2.5028 × 103] (in.)

• T ∈ [−9.621, 10.071] (in.)

With the exception of the length parameter, these intervals are clearly unrealistically large and encompass parameter values that are not physically meaningful. This could be the result of insufficiencies in the experimental data, insufficiencies in the model, or errors in confidence interval computation due to nonlinearities. Regardless, we were unable to derive useful information from these confidence intervals to further inform bounds or uncertainty characterizations for the model parameters. As such, the results leave open questions as to whether or not the experimental data is adequate to inform the calibration computation and whether or not there is model form uncertainty that should be characterized. Both clearly point to the need for further investigation.

In addition to computational savings, the multistart approach afforded us additional opportunities to explore the local minima. As a result, we found an inconsistency between the solutions and the material data. In the course of trying to understand the experimental data, we examined the measurements taken from the coupons, shown in Table 3, for correlations between the properties. Figure 4 shows a positive correlation we observed between Young's modulus and the tank thickness. The results of the calibration, however, demonstrate the inverse trend. Measurements of Young's modulus and thickness were taken independently, and physical intuition provides no basis for believing that there would necessarily be a relationship between Young's modulus and thickness, so this brings into question whether or not there is some unintended correlation captured in the experimental data against which we were calibrating as well as whether or not it is resulting in questionable values of the material property values identified. This compounds the question of whether or not the calibration data and the material property data are credible.

We speculated that this error may be due to the difference in orders of magnitude amongst the parameters and recalibrated using parameter scaling. By scaling the parameters, this inconsistency was reduced, but the calibration converged to only one solution, which lays somewhat outside the trend of the samples and materials data (t = 0.221 in. with E = 3.1 × 107 psi). Furthermore, we were unable to obtain confidence intervals for the scaled variant of the problem. As an alternative, we also fit the materials data with a linear regression (E = 8 × 109 t + 9 × 106) which was used as the basis for an additional linear constraint for the calibration, −10 × 106 ≤ 8 × 107 t − E ≤ −8 × 106. With the linear constraint, the calibration again converged to one solution, t = 0.236 in. with E = 2.74 × 107 psi which was closer to the trend given by the samples. However, it begged the question of whether or not we were force fitting the model to an unrealistic trend. As a result, we moved forward with the parameter values determined by the first calibration approach described.

Before we address validation, we note that the extremely large confidence intervals and the discrepancy between calibrated and experimental material relationships raise concerns regarding credibility of the ensuing analysis. In short, we have no pedigree or uncertainty information for the materials data and very little for the calibration data. As a result, we cannot determine if the data sets are adequate for determining material properties. We also cannot yet assess to what extent model form uncertainty may be responsible for our observations. For example, if the correlation between Young's modulus and thickness is real, then the model may be missing the physics that propagates the effects of those parameters appropriately. Thus, we are moving forward with a model with unknown and uncharacterized error.

Validation

With the data from the laboratory tests used previously for the material calibration, the field data for displacements of the tanks under a combined pressure and liquid loading were left to use for validation of the numerical models. The field data provided was somewhat limited, but included displacements for a variety of liquid compositions, liquid heights, and pressures at a set of circumferential angles and axial locations for tanks 3–6.

As with the pressure loading only scenario, we first conducted an incremental LHS study across the model parameters for each of the meshes in the same manner as in Sec. 2. We include two additional parameters in this study

• H ∈ [0.0, 58.442] (psi)

• γ ∈ [2.232, 3.575]where H ∈ [0.0, 58.442] is based off the problem statement specification, H ∈ [0, 2R] with R as the lower bound on the radius (i.e., the liquid height cannot exceed twice the radius) and γ ∈ [2.232, 3.575] using the 99/90 tolerance interval from the provided pressure plus liquid loading data for tanks 3–6. The number of runs done at each fidelity are

• mesh 4 (finest) = 10 runs

• mesh 3 = 20 runs

• mesh 2 = 40 runs

• mesh 1 (coarsest) = 80 runs

We did not conduct a formal mesh convergence analysis, as the meshes are not known to be in the asymptotic region. We did evaluate the differences between the displacement and maximum stress predictions from one fidelity to the next. The relative differences in these predictions between mesh 4 and mesh 3 was on the order of 10−5, far smaller than the variability observed by varying the model parameters. The errors induced by moving to mesh 2 or mesh 1 were slightly larger, so we chose to use mesh 3 for any analysis going forward.

Reusing the results from the LHS runs for mesh 3, a PCE was constructed to compute higher-order sensitivities. As a brief description, a PCE is an approximate response constructed using global multivariate orthogonal polynomial basis functions defined over standard random variables as follows: Display Formula

(2)$R=∑j=0PαjΨj(ξ)$

where Display Formula

(3)$αj=〈R,Ψj〉〈Ψj2〉=1〈Ψj2〉∫ΩRΨjρ(ξ)dξ$

Once approximated, it is possible to calculate statistical quantities analytically, thereby eliminating the need for extensive sampling of the simulation. A complete discussion of PCE can be found in Ref. [14].

The expansion that best fit the model predictions of displacement at all of the measurement locations was of order one. The quality of the expansion was determined by a cross validation process [4,15]. In this approach, a fixed-size subset of the computational data used to construct the PCE is withheld. The PCE is then constructed over the remaining data and used to predict the computational model responses for the withheld subset of data. This is repeated for all subsets of the given size. The discrepancies between PCE prediction and computational data over all of these subsets can then be used to estimate the amount of error introduced when using the PCE to predict the computational model response. For this problem, the order one polynomial had an estimated prediction error on the order of 10−5.

Given the PCE, we then used it to perform variance-based decomposition. Variance-based decomposition is an approach to global sensitivity analysis that apportions the uncertainty in the model response to the model parameters. This method quantifies both main effects and total effects in the form of sensitivity indices for each model parameter. The main effects sensitivity index represents the fraction of uncertainty in the model response for which a single given model parameter alone is responsible. The total effects sensitivity index represents the fraction of uncertainty in the model response for which a given parameter and its interactions with other model parameters are responsible. More details can be found in Ref. [12]. Variance-based decomposition based on sampling a computational model is typically computationally intractable due to the large number of samples needed. However, it can be done extremely efficiently by analytically computing the sensitivity indices using the PCE [4]. In this manner, we identified pressure as the parameter that most contributes to the variance (approximately 63%) in the displacement predictions. Young's modulus (16%) and tank wall thickness (15%) are also contributors. These are therefore the driving uncertainties we took forward into our probability of failure predictions.

In addition to using the PCE-based sensitivity analysis to identify the model parameters that most contribute to the uncertainty in the predicted displacements, more extensive sampling over the model parameters was conducted on the PCE. These displacement predictions were compared to the experimental field data as follows:

1. (1)For a given measurement location and instantiation of model parameters, compute the relative difference between the PCE prediction and one of the experimental measurements.
2. (2)Repeat the first step for the PCE prediction (using the same parameter instantiation) and each of the other experimental measurements at that location.
3. (3)Repeat the first two steps with a new instantiation of parameters for the PCE prediction.

This comparison was done for each measurement location, resulting in an extensive set of possible relative errors between model and data at every location. A representative subset of those errors is shown in Fig. 5.

As previously mentioned, computational studies done to this point consistently showed the largest stresses and displacements along the bottom of the tank ($ϕ=0$). In the absence of experimental data at the bottom of the tank, we are unable to confirm whether or not this predicted behavior is consistent with the actual physical behavior. The measurements closest to the bottom occur at $ϕ=30$, which corresponds to locations 1, 6, 11, and 16 in Fig. 5. We see that the smallest relative errors occur at those locations and place more weight on these than on some of the larger errors we see at locations farther from the bottom of the tank. We consider this a preliminary judgement, however. Without confirmation that the largest displacements occur at the bottom of the tank in the field, we cannot definitively say that it is acceptable to have large errors at locations farther from the bottom. Furthermore, we have been unable to explain the cause of the especially large errors at locations 17 and 18. The lack of understanding further calls into question the credibility of the model prediction.

It is worth noting that from a physical standpoint, further examination of the model results (stresses and displacements) for variations in the angle and distance from the centerline showed axisymmetric behavior. Typically, the expected behavior for an axisymmetric shell under such a uniform loading would demonstrate the highest deformation at the location furthest from the support (or along the centerline), but we note here that the highest deformation occurs toward the supports located 30 in. from the centerline.

Similarly, the provided experimental data for tanks 1 and 2 showed consistent trends (i.e., higher displacement near the support than the centerline) for a few of the points provided (see, for example, pressure = 73.45 for tank 1 plotted in Fig. 6). This observation might be indicative of a region of high curvature or buckling occurring at the point of the highest stress. As this behavior is counterintuitive to the expected physical behavior, we cannot draw any conclusions about the validation with any confidence.

Predicting Probability of Failure

One of the main goals of this activity was to determine the probability of failure at nominal operating conditions. Accurately computing failure probabilities can require an intractable number of simulations, especially if failure is extremely rare. In order to keep the number of simulations as small as possible, we used a global reliability method.

Efficient Global Reliability Analysis (EGRA).

The specific method we used was efficient global reliability analysis (EGRA) [16]. This is a reliability method that seeks to represent the simulation response quantity of interest with a GP. It first constructs an initial GP using an LHS sample taken over the entire uncertain parameter space. It then identifies a new sample point by solving a global optimization problem on the GP using an objective that balances exploration of unknown space with refinement around the likely failure boundary. A simulation is done at that sample point, and the resulting data is used to update the GP. This process is repeated until the objective falls below a hard-coded tolerance of 0.001. The idea is that the GP will be accurate near the failure boundary, thereby allowing accurate computation of the probability of failure. Unfortunately, however, a cross validation error on the GP surrogate is not currently reported. The GP is trivial to evaluate, so the probability computation, which uses importance sampling on the GP, becomes tractable. The targeted, iterative construction of the GP keeps the number of simulations required to a minimum.

Since it is fundamental to EGRA, we here briefly define the GP surrogate. The GP is a stochastic process defined by mean and covariance functions. It can have a constant, linear, or quadratic mean trend. The covariance function at points a and b is defined as Display Formula

(4)$Cov[Z(a),Z(b)]=σZ2R(a,b)$

where $σZ2$ is the process variance and R() is the following correlation function: Display Formula

(5)$R(a,b)=exp [−∑i=1dθi(ai−bi)2]$

where d represents the problem dimension and θi is a scale parameter, found by solving a maximum likelihood problem that indicates the correlation between the points in dimension i. A more complete introduction to GPs can be found in Ref. [17].

One key piece of information needed for the failure analysis is a defined failure threshold. Since the challenge problem definition defines failure to occur when von Mises stress exceeds yield stress, we derive the failure threshold from the yield stress values reported in the experimental measurements taken from tank 0. Since there is scatter in that data, the value that should be used for the failure threshold is uncertain. Thus, we considered three different possible values to account for that variability. The chosen values are overlaid on the experimental measurements in Fig. 7 and span the range of the data (with the exception of one outlier). The probability of failure was computed with respect to each of the identified failure thresholds.

Another key piece of information needed is the uncertainty characterizations for the most influential uncertain parameters. Based on the sensitivity analysis performed in Sec. 4, those parameters are pressure, Young's modulus, and thickness. We fixed pressure, as well as χ and H, at nominal conditions of the out-of-spec tank as specified in the problem statement (P = 73.5 psi, χ = 1, and H = 50 in.) [2]. This left us with only Young's modulus, E, and thickness, T, as uncertain, and we set all other parameters to their average calibrated values identified in Sec. 3. Since the calibration found multiple values of these parameter within the tolerance intervals defined in Sec. 2, we consider all of them possible and therefore assign uniform distributions with bounds as defined by the tolerance intervals. Normality tests on the material property data indicate that normal distributions cannot be ruled out, so we also consider normal distributions with mean and standard deviation as computed from the experimental data.

For both uncertainty characterizations, the probability was found to be 0.0110020000 for all three failure thresholds, as reported out to ten digits (and computed to machine precision), with only 50 unique simulations needed. We found this surprising but found no evidence of stress values near the failure thresholds in any of the simulations we ran as part of the analyses done in Secs. 24. We also tested EGRA on other problems with known solutions and found it to be working correctly. The results warrant further investigation into not just the correctness of EGRA, but also into whether or not the yield stress material data is reliable and into whether or not the yield stress values are appropriate to define failure thresholds.

Credibility Assessment (PCMM)

While we completed the task of predicting the probability of failure of the storage tanks under nominal operating conditions through application of existing Dakota methods, we must also comment on the credibility of our prediction. To do this, we return to the PCMM and assess how well the evidence gathered during supporting activities (sensitivity analysis, calibration, and validation) convey confidence associated with each element of the PCMM.

Our qualitative level of confidence associated with each element of the PCMM and a summary of the rationale behind it are discussed below. A visual summary of our assessment appears in Table 4.

Geometric Fidelity.

For the challenge problem discussed here, since we were given a model that “can be treated as a black box” to use for the analyses, little information is known about the geometric fidelity, therefore, we cannot comment too much to this end. The problem statement does state that the tank model is “built off a simplified geometry” [2]. For example, the real geometry consists of a cylinder with two half-sphere end caps and supports at the ends of the cylinder, while the provided model includes only the cylindrical portion with simple supports and flat end caps. Without the ability to explore the importance of such geometric differences on model predictions, we have low confidence regarding whether the geometric fidelity is sufficient, and we have no evidence to quantify how it affects predictions.

Physics Fidelity.

By definition, the physics fidelity was also low. In the problem statement, participants were told to treat the model as if it were a finite element model “for which the numerical behavior and the complex physics are not completely understood” [2]. Despite this, the model did a fairly good job of predicting displacements in both the lab (pressure loading only) and the field (pressure and liquid loading) cases. While not rigorous, this provides some small measure of confidence that the fidelity of the physics is reasonable for this application.

Code Verification.

For code verification, the problem statement says it was verified that the code accurately computes the equations given in the series solution model. However, comparable verification was not done for models as complex as those used in this analysis. Furthermore, during several of our simulation runs, we noticed that the code crashed for some parameter sets (most often when H ≥ 50 in.). Thus, we have low confidence in the code verification.

Solution Verification.

Solution verification is the one element for which we can express reasonably high confidence. While we did not complete a formal convergence study, we did assess the effects of mesh discretization on the displacement and stress predictions. The relationship between meshes was well understood, and our initial LHS studies demonstrated very small changes and trend consistencies in the maximum stresses and displacements across fidelities.

Model Validation.

While validation using the experimental data was completed, conclusions that can be drawn are somewhat limited, as displacements were provided for the field tests only at the noncritical locations (circumferential angles, $ϕ∈[30,150]$, rather than along the bottom of the tanks, $ϕ=0$ where the model reported the highest stresses and displacements). That being said, we did find the best agreement between model and data at $ϕ=30$, the location closest to the bottom of the tank, so there is some small amount of confidence that can be placed in the validation.

Uncertainty Quantification.

Some uncertainty quantification was conducted, but incorporation of the experimental data uncertainty was incomplete. Measurement uncertainties, including displacement measurements (presumed to be within 3%); pressure measurements (within 5% but no additional documentation on confidence level), liquid height (varying with axial position due to tank orientation) and χ measurements (within 0.05 mass fraction), in particular, were not factored into the analysis. These uncertainties are, for the most part, an order of magnitude smaller than those that were included. Thus, we expect they will not change any of our conclusions. Additionally, we did not include errors in the response surface surrogates nor did we aggregate all of the uncertainties together in the final probability of failure predictions. Finally, the calibration activity revealed little information to refine experimentally determined uncertainty characterizations of the model parameters and identified possible model form uncertainty that needs exploration.

Conclusions and Future Work

The approach described in this work demonstrated a practical, project driven response to the challenge problem. By utilizing currently available tools rather than developing our own, we were constrained by what methods were easily available through Dakota as well as by what information they report, but we were able to perform a more comprehensive analysis and to better understand both the experimental and the computational data. In our case, the benefits were (1) a wider range of analysis tools at our disposal that could be used within the time constraints, (2) a well-informed assessment of the analysis credibility, and (3) the ability to identify avenues for future work that would strengthen the analysis.

Conclusions.

Though we reported a zero probability of failure (to ten digits), we are skeptical about whether the yield stress data and threshold values are reliable and would recommend further investigation before drawing conclusions and making recommendations to the customer. Based on our engineering judgement and PCMM (discussed previously), we note that the credibility of some data, computations, or both is questionable throughout this process. The most substantial of these include:

• possibly unrealistic correlation in calibration data

• excessively large confidence intervals on calibration parameter estimates

• unknown and uncharacterized model form error

• lack of experimental data at the bottom of the tank

• nonintuitive “buckling” behavior

• unknown error in the GP surrogate used for probability of failure prediction

Thus, we report our overall confidence in the failure predictions as low to medium. As such, we would consider it unwise to base any decisions solely on these predictions. However, the supporting sensitivity, calibration, and validation activities did yield insights into future work that would improve the credibility of the model. Those suggestions are described in Sec. 7.2.

Suggestions for Future Work.

Based on the activities done in support of the tank failure assessment, we have three main proposals for future work: one associated with calibration, one with validation, and an additional failure assessment prediction.

A substantial red flag was raised during model calibration in that the correlation between some of the material properties found for the model was the inverse of those represented in the materials data provided. It is critical that there be a follow up on the pedigree of the data to ensure that it was gathered and reported correctly. Additionally, further investigation into the model and understanding of the physics is necessary to determine whether or not model form uncertainty is the cause of the discrepancy.

The results of laboratory tests for the full tank under a pressure loading, in addition to the finite element model results, provided valuable insight into the behavior of the tank system. We observed that for each test, the model output that the angle, $ϕ$, where the maximum stress occurred was always zero (along the bottom of the tank) and the corresponding maximum displacement was near the supports. A plot of the deformation curve and corresponding stress contour is shown in Fig. 8.

Based on the physical observations discussed in the validation section, more studies should be conducted regarding possible imperfections in the tank geometry or misalignment of the connections, which could lead to premature failure. To be able to make such a decision in the future, we would need to prioritize and request resources to address the most pressing model and data needs. For example, the displacement field should be experimentally sampled on tanks 3–6 along the bottom in the critical area of failure.

In addition to computing the probability of failure at nominal conditions, participants were also asked to determine the loading levels which will violate the probability of failure threshold, P(Fail) < 10−3. In order to determine if the operating regime is acceptable, we propose an optimization approach. In particular, we would seek to find the parameter values within the operating range that resulted in the highest probability of failure. This would be done by nesting EGRA computations within the optimization iterations. If the highest probability of failure is less than 10−3, then the current operational range can be considered adequate to ensure safety. If not, then we must discover the boundaries that are safe. This may be revealed as a side effect of the computations done for the optimization but will likely require additional model exploration.

Acknowledgements

The authors would like to thank Ken Hu for all of his hard work organizing the V&V Challenge Problem activities and for making sure the participants had multiple opportunities to interact with one another, George Orient for his efforts in putting together the software and models used, and the other participants for stimulating discussion and intriguing ideas. We would also like to thank three anonymous referees for their helpful comments; they were extremely useful in improving the transparency and completeness of the paper. Finally, we would like to thank the ASC V&V program under whose funding this work was performed.

This work was performed at Sandia National Laboratories. Sandia National Laboratories is a multiprogram laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under Contract No. DE-AC04-94AL85000.

Nomenclature

• d =

tank wall displacement, normal to the surface

• E =

Young's modulus

• H =

liquid height

• L =

length

• m =

mesh ID

• P =

gauge pressure

• R =

• T =

wall thickness

• x =

axial location

• γ =

liquid specific weight

• ν =

Poisson's ratio

• σ =

Von Mises stress

• $ϕ$ =

circumferential angle

• χ =

liquid composition (mass fraction)

References

Oberkampf, W. , and Roy, C. , 2010, Verification and Validation in Scientific Computing, Cambridge University Press, New York.
Hu, K. , 2013, “ 2014 V&V Challenge: Problem Statement,” Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report No. SAND2013-10486P.
Adams, B. M. , Bauman, L. E. , Bohnhoff, W. J. , Dalbey, K. R. , Eddy, J. P. , Ebeida, M. S. , Eldred, M. S. , Hough, P. D. , Hu, K. T. , Jakeman, J. D. , Swiler, L. P. , Stephens, J. A. , Vigil, D. M. , and Wildey, T. M. , 2014, “ Dakota, a Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 6.1 Users Manual,” Sandia National Laboratories, Albuquerque, NM, Technical Report No. SAND2014-4633.
Adams, B. , Ebeida, M. , Eldred, M. , Jakeman, J. , Swiler, L. , Bohnhoff, W. , Dalbey, K. , Eddy, J. , Hu, K. , Vigil, D. , Bauman, L. , and Hough, P. , 2011, “ Dakota, a Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis,” Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report No. SAND2011-9106.
Oberkampf, W. , Pilch, M. , and Trucano, T. , 2007, “ Predictive Capability Maturity Model for Computational Modeling and Simulation,” Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report No. SAND2007-5948.
Montgomery, D. , and Runger, G. , 1994, Applied Statistics and Probability for Engineers, Wiley, New York.
Hahn, G. , and Meeker, W. , 1991, Statistical Intervals—A Guide for Practitioners, Wiley, New York.
Computer Software by Minitab, Inc., “ Minitab 17 Statistical Software,”
Howe, W. , 1969, “ Two-Sided Tolerance Limits for Normal Populations—Some Improvements,” J. Am. Stat. Assoc., 64(326), pp. 610–620.
Romero, V. , Swiler, L. , Urbina, A. , and Mullins, J. , 2013, “ A Comparison of Methods for Representing Sparsely Sampled Random Quantities,” Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report No. SAND2013-4561.
Iman, R. L. , and Shortencarier, M. J. , 1984, “ A Fortran 77 Program and User's Guide for the Generation of Latin Hypercube Samples for Use With Computer Models,” Sandia National Laboratories, Albuquerque, NM, Technical Report No. NUREG/CR-3624, SAND83-2365.
Saltelli, A. , Tarantola, S. , Compolongo, F. , and Ratto, M. , 2004, Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models, Wiley, New York.
Dennis, J. E. , Gay, D. M. , and Welsch, R. E. , 1981, “ ALGORITHM 573: NL2SOL—An Adaptive Nonlinear Least-Squares Algorithm,” ACM Trans. Math. Software, 7(3), pp. 369–383.
Xiu, D. , 2010, Numerical Methods for Stochastic Computations: A Spectral Method Approach, Princeton University Press, Princeton, NJ.
Hastie, T. , Tibshirani, R. , and Friedman, J. , 2001, The Elements of Statistical Learning: Data Mining, Inference, and Prediction: With 200 Full-Color Illustrations, Springer-Verlag, Berlin.
Bichon, B. , Eldred, M. , Swiler, L. , Mahadevan, S. , and McFarland, J. , 2008, “ Efficient Global Reliability Analysis for Nonlinear Implicit Performance Functions,” AIAA J., 46(10), pp. 2459–2468.
MacKay, D. , 1998, “ Introduction to Gaussian Processes,” Neural Networks and Machine Learning, Vol. 168, C. M. Bishop , ed., Springer, Berlin, pp. 133–165.
View article in PDF format.

References

Oberkampf, W. , and Roy, C. , 2010, Verification and Validation in Scientific Computing, Cambridge University Press, New York.
Hu, K. , 2013, “ 2014 V&V Challenge: Problem Statement,” Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report No. SAND2013-10486P.
Adams, B. M. , Bauman, L. E. , Bohnhoff, W. J. , Dalbey, K. R. , Eddy, J. P. , Ebeida, M. S. , Eldred, M. S. , Hough, P. D. , Hu, K. T. , Jakeman, J. D. , Swiler, L. P. , Stephens, J. A. , Vigil, D. M. , and Wildey, T. M. , 2014, “ Dakota, a Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 6.1 Users Manual,” Sandia National Laboratories, Albuquerque, NM, Technical Report No. SAND2014-4633.
Adams, B. , Ebeida, M. , Eldred, M. , Jakeman, J. , Swiler, L. , Bohnhoff, W. , Dalbey, K. , Eddy, J. , Hu, K. , Vigil, D. , Bauman, L. , and Hough, P. , 2011, “ Dakota, a Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis,” Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report No. SAND2011-9106.
Oberkampf, W. , Pilch, M. , and Trucano, T. , 2007, “ Predictive Capability Maturity Model for Computational Modeling and Simulation,” Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report No. SAND2007-5948.
Montgomery, D. , and Runger, G. , 1994, Applied Statistics and Probability for Engineers, Wiley, New York.
Hahn, G. , and Meeker, W. , 1991, Statistical Intervals—A Guide for Practitioners, Wiley, New York.
Computer Software by Minitab, Inc., “ Minitab 17 Statistical Software,”
Howe, W. , 1969, “ Two-Sided Tolerance Limits for Normal Populations—Some Improvements,” J. Am. Stat. Assoc., 64(326), pp. 610–620.
Romero, V. , Swiler, L. , Urbina, A. , and Mullins, J. , 2013, “ A Comparison of Methods for Representing Sparsely Sampled Random Quantities,” Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report No. SAND2013-4561.
Iman, R. L. , and Shortencarier, M. J. , 1984, “ A Fortran 77 Program and User's Guide for the Generation of Latin Hypercube Samples for Use With Computer Models,” Sandia National Laboratories, Albuquerque, NM, Technical Report No. NUREG/CR-3624, SAND83-2365.
Saltelli, A. , Tarantola, S. , Compolongo, F. , and Ratto, M. , 2004, Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models, Wiley, New York.
Dennis, J. E. , Gay, D. M. , and Welsch, R. E. , 1981, “ ALGORITHM 573: NL2SOL—An Adaptive Nonlinear Least-Squares Algorithm,” ACM Trans. Math. Software, 7(3), pp. 369–383.
Xiu, D. , 2010, Numerical Methods for Stochastic Computations: A Spectral Method Approach, Princeton University Press, Princeton, NJ.
Hastie, T. , Tibshirani, R. , and Friedman, J. , 2001, The Elements of Statistical Learning: Data Mining, Inference, and Prediction: With 200 Full-Color Illustrations, Springer-Verlag, Berlin.
Bichon, B. , Eldred, M. , Swiler, L. , Mahadevan, S. , and McFarland, J. , 2008, “ Efficient Global Reliability Analysis for Nonlinear Implicit Performance Functions,” AIAA J., 46(10), pp. 2459–2468.
MacKay, D. , 1998, “ Introduction to Gaussian Processes,” Neural Networks and Machine Learning, Vol. 168, C. M. Bishop , ed., Springer, Berlin, pp. 133–165.

Figures

Fig. 1

This figure shows the flow of information from experimental data through to predictions about tank failure. Key activities in the collection of credibility evidence are noted.

Fig. 4

Calibration yielded parameter values for Young's modulus and thickness that were inconsistent with experimentally measured material properties, calling into question both data and model

Fig. 5

Relative errors between model and data displacement values at all measurement locations. Computational studies consistently showed the largest stresses and displacements along the bottom of the tank. The model to experiment comparisons closest to the bottom occur at ϕ=30, which corresponds to locations 1, 6, 11, and 16.

Fig. 6

Laboratory tests showing the deformed shape of tank 1 under pressure loading

Fig. 7

Yield stress thresholds used for probability of failure computations are overlaid on experimental yield stress measurements. Given scatter in the data, failure threshold is considered to be uncertain, so we chose three possible values that span the data.

Fig. 8

Deformed shape and corresponding stresses along the bottom of the tank

Tables

Table 3 Material characterization data (tank 0), from Ref. [2]
Table 4 Summary of our PCMM assessment: The darkest shade of gray is the target requirement. The second darkest does not meet the requirement by one level, the third does not meet it by two levels, and the lightest does not meet it by three levels.

Errata

Some tools below are only available to our subscribers or users with an online account.

Related Content

Customize your page view by dragging and repositioning the boxes below.

Related Journal Articles
Related Proceedings Articles
Related eBook Content
Topic Collections