Discussion

Summary of the 2014 Sandia Verification and Validation Challenge Workshop

Author and Article Information
Benjamin B. Schroeder

V&V, UQ, Credibility Processes Department,
Sandia National Laboratories,
P.O. Box 5800, MS 0828,
Albuquerque, NM 87185-0828
e-mail: bbschro@sandia.gov

Kenneth T. Hu

V&V, UQ, Credibility Processes Department,
Sandia National Laboratories,
P.O. Box 5800, MS 0828,
Albuquerque, NM 87185-0828
e-mail: khu@sandia.gov

Joshua G. Mullins

V&V, UQ, Credibility Processes Department,
Sandia National Laboratories,
P.O. Box 5800, MS 0828,
Albuquerque, NM 87185-0828
e-mail: jmullin@sandia.gov

Justin G. Winokur

V&V, UQ, Credibility Processes Department,
Sandia National Laboratories,
P.O. Box 5800, MS 0828,
Albuquerque, NM 87185-0828
e-mail: jgwinok@sandia.gov

Manuscript received December 4, 2015; final manuscript received January 13, 2016; published online February 19, 2016. Editor: Ashley F. Emery.

J. Verif. Valid. Uncert 1(1), 015501 (Feb 19, 2016) (9 pages) Paper No: VVUQ-15-1055; doi: 10.1115/1.4032563 History: Received December 04, 2015; Revised January 13, 2016

A discussion of the five responses to the 2014 Sandia Verification and Validation (V&V) Challenge Problem presented within this special issue is provided hereafter. Overviews of the challenge problem workshop, the workshop participants, and the problem statement are also included. Brief summaries of the teams' responses to the challenge problem are provided. The main focus of this paper is the set of issues that arose across the responses and that are applicable to the general verification, validation, and uncertainty quantification (VVUQ) community. The discussion is organized into a big-picture comparison of data and model usage, the VVUQ activities performed, and the conceptual themes that differentiate the teams' VVUQ strategies. Significant differences are noted in the teams' approaches to all VVUQ activities, and those deemed most relevant are discussed. Beyond the specific details of VVUQ implementations, thematic concepts are found to create differences among the approaches; some of the major themes are discussed. Finally, an encapsulation of the key contributions, the lessons learned, and advice for the future is presented.


The 2014 Sandia V&V Challenge Workshop [1] was held at the 3rd ASME Verification and Validation Symposium in Las Vegas, NV on May 5–8, 2014. The workshop was built around a challenge problem that Sandia National Laboratories posed to the VVUQ community [2,3]. The problem was a hypothetical engineering investigation that required integration of experimental data, modeling and simulation, and VVUQ. It served as a focal point for demonstrating methodology and for discussions about the role of VVUQ in establishing credibility and supporting decisions. It was the third such challenge workshop organized by Sandia National Laboratories [4,5].

Workshop Participants.

At the workshop, nine teams presented responses to the challenge problem. They represented academia, a national laboratory, the U.S. Department of Defense, and private industry, and they were predominantly engineers with significant experience in VVUQ. Some of the participants, listed in Table 1, prepared journal articles describing their responses. In addition, two discussion papers were produced that did not give solutions to the challenge problem but provided perspectives on key VVUQ issues [11,12]. These papers are included in this special edition of the ASME Journal of Verification, Validation, and Uncertainty Quantification. A team led by Professor Michael Shields at the Johns Hopkins University also presented a response at the workshop [13] but did not prepare a paper for this special issue. It should be noted that the Sandia team responding to the challenge problem was independent of the team that proposed the problem.

The 2014 V&V Challenge Problem.

The 2014 V&V Challenge Problem was posed as a realistic engineering project in which the participating teams had to complete an engineering analysis in order to support a significant decision. The project goal was to estimate the probability of failure, Pfail, for a population of liquid storage tanks under various loading conditions. Pfail is the crucial piece of information that will be used to determine whether the tanks should be replaced, so the estimates must also be credible [2,3]. The biggest challenges were to apply a wide range of VVUQ methods, to aggregate uncertainties from many sources, and to make predictions for which no comparative data are available. As in previous challenge problems, a preferred VVUQ strategy was not specified; each team had to develop its own approach by choosing which methods to apply, which models to run, which datasets to use, and which VVUQ activities to perform. The participants were tasked with the following items: make predictions with uncertainty for Pfail, assess the credibility of those predictions, and communicate the VVUQ strategy used.

Two distinct predictions were requested: Pfail at a specified operating condition and an operating region with acceptable Pfail. All the participants provided the first prediction, but only the Northwestern team [8] reported on the second prediction. The second prediction was designed as a follow-up to the first, focused on the teams' strategies for generalizing their approaches across operating conditions. Because the second prediction was only reported by one team, no comparison of the approaches is possible. The remainder of this paper refers only to the first prediction.

Outline of Challenge Problem Summary.

This paper summarizes the contributed solutions to the challenge problem and adds a discussion of the major ideas from the workshop and special edition. Section 2 summarizes the analyses and conclusions for each response. Because the challenge problem was so open-ended, the responses were difficult to compare. Ultimately, three classifications were used to highlight the significant commonalities and distinctions between VVUQ strategies: an overview of how the data and models were used (Sec. 3), which VVUQ activities were performed (Sec. 4), and the themes that appear throughout the overall approaches (Sec. 5). The first mode of comparison describes what was done, the second depicts how it was accomplished, and the third extracts the philosophical positions or priorities that influenced the VVUQ strategies, or why things were done. It should also be noted that all the comparisons and commentary are based on the published papers and workshop presentations; these reported processes are not necessarily a complete description of the work that was done. Concluding remarks are presented in Sec. 6.

This section gives summaries of the five responses, along with some commentary on their approaches. The five teams reported predictions of Pfail, as well as the associated uncertainty and credibility, as shown in Table 2. To determine Pfail, teams had to predict maximum stresses and compare them to experimental yield stress data. Both the model predictions and the experimental data contained uncertainty, leading to uncertainty in the Pfail predictions. Credibility of the uncertain predictions was an additional element of the problem that is discussed later. Within Table 2, it can be noted that two teams predicted approximately no probability of failure, while the other three teams found a probability on the order of 1 × 10−3.
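
Although the teams' propagation and reliability methods differed, the basic calculation underlying Table 2 can be pictured with a short Monte Carlo sketch. The distributions, parameter values, and sample sizes below are hypothetical illustrations, not any team's actual characterization.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    # Hypothetical uncertainty in the predicted maximum stress, e.g., from
    # propagating parameter uncertainty through the tank model or a surrogate.
    max_stress = rng.normal(loc=480.0, scale=25.0, size=n)    # MPa, illustrative
    # Hypothetical yield-stress variability inferred from material-property data.
    yield_stress = rng.normal(loc=560.0, scale=30.0, size=n)  # MPa, illustrative

    # A tank "fails" when its predicted maximum stress exceeds its yield stress.
    p_fail = np.mean(max_stress > yield_stress)
    print(f"Monte Carlo estimate of Pfail: {p_fail:.2e}")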

The workshop organizers stressed that the challenge problem was not a competition between participants. Due to the very different priorities, time commitments, and VVUQ methods and approaches, it was not clear how the results could be ranked or scored. Instead, the goal here is to survey the state-of-the-art in VVUQ methodology and gain experience formulating VVUQ strategies in a decision support context.

Sandia Response.

The Sandia team applied a practical approach that relied on readily available simulation analysis software [14]. This meant that the methods and uncertainty descriptions were not consistent throughout the analysis, and transferring information from one activity to the next was challenging. For this reason, model parameters were mainly described as independent intervals. Initial bounds were used for sensitivity analysis and an optimization-based parameter calibration. All the meshes were used in a multifidelity modeling approach in order to reduce computational costs. Formal mesh sensitivity analysis was not performed; the mesh for subsequent analyses was selected based on judgment informed by errors in the stress prediction. The calibration information was not transferred to the uncertainty propagation or Pfail calculation steps—the initial bounds were used instead. Pfail was also computed using normal distributions derived from the material property data. A validation comparison was performed, but the results did not impact the predictions of stress. In fact, a stress distribution was not explicitly computed. Instead, a reliability method, the efficient global reliability analysis [15], was used to directly compute Pfail based on a deterministic failure criterion. A significant amount of engineering judgment was applied to simplify the activities that were performed and to recover when analysis results were deemed untrustworthy. The Sandia team was the only one to complete a formal credibility assessment, the predictive capability maturity model [16], and they reported point estimates of Pfail with a subjective “low to medium” credibility.

Virginia Tech Response.

The Virginia Tech team built a strategy to carefully deal with epistemic and aleatory uncertainties and to conservatively capture the effects of lack of knowledge. Sensitivity analyses and a parametric study were conducted to understand parametric sensitivities and their relative importance. They chose to skip calibration and instead focus on assessing the quality of their stress predictions. Model inputs were characterized as intervals or distributions, using a mix of statistical models and expert judgment. Because of the combination of uncertainty types, the model predictions were described with p-boxes, a hybrid of intervals and distributions. This team was the only one to perform a formal solution verification study and estimate the numerical error of the tank model. An empirical model was created to relate stresses at a small number of locations to displacements across the tank. This was not a surrogate that related the inputs of the tank model to predictions of stress, but an empirical model that related model predictions of displacement to model predictions of stress. The empirical model was used to transform displacement data into stresses. These transformed data were then compared to model predictions in a validation comparison using u-pooling [17] and a modified area validation metric [18,19]. The validation metric value was used to adjust the existing p-box (from input uncertainty and numerical uncertainty) to account for model form uncertainty. All the uncertainty sources were treated as independent. The adjusted p-box for predicted stress was then compared to a p-box for the failure criterion. The final result was an interval estimate of Pfail that serves as both prediction and uncertainty estimate. Credibility was not explicitly mentioned, but one interpretation is that the credibility indication comes from the interval-valued probability resulting from the analysis.
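
To make the comparison step concrete, the sketch below illustrates the general u-pooling idea and an area-type metric in the spirit of Refs. [17–19]: each observation is mapped through the model's predicted CDF at the corresponding condition, and the pooled values are compared against a standard uniform distribution. The observations and predicted CDFs are invented for illustration, and the modified metric of Refs. [18,19] includes refinements not shown here.

    import numpy as np
    from scipy import stats

    # Hypothetical experimental observations at three conditions, with the
    # model's predicted CDF (here a normal distribution) at each condition.
    observations = [12.1, 14.8, 9.5]
    predicted_cdfs = [stats.norm(12.0, 1.0), stats.norm(15.5, 1.2), stats.norm(10.0, 0.9)]

    # u-pooling: map each observation through its predicted CDF.  If the model
    # were perfect, the pooled u-values would be uniform on [0, 1].
    u = np.sort([F.cdf(y) for F, y in zip(predicted_cdfs, observations)])

    # Area-type metric: area between the empirical CDF of the pooled u-values
    # and the CDF of the standard uniform distribution.
    grid = np.linspace(0.0, 1.0, 1001)
    ecdf = np.searchsorted(u, grid, side="right") / len(u)
    area_metric = np.sum(np.abs(ecdf - grid)) * (grid[1] - grid[0])
    print(f"pooled u-values: {u}, area metric: {area_metric:.3f}")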

Northwestern Response.

The Northwestern team applied Bayesian methods to a broad range of VVUQ activities. All the sources of uncertainty were described with probability distributions. A sensitivity analysis was run, and the parameters were aggressively down-selected to two. The available data were used extensively to improve the knowledge of model parameters and model form error; put another way, the data had significant impact on the model predictions. The team used statistical analysis and judgment to eliminate data outliers. Since the data from full tanks were limited, they used those data in several ways: calibration and model form error correction via a Kennedy and O'Hagan framework [20], followed by u-pooling validation to check the quality of the model form correction. The first step resulted in parameter estimates, model predictions, and a model correction for displacement that was deemed acceptable. The comparison of the corrected model to data is referred to as a cross-validation because the training data and validation data are partitions of the same dataset. A model correction for displacement was created in the first step, but not for stress. Instead of building a correction from data, an empirical model was created to relate tank model predictions of displacement to tank model predictions of stress. This empirical model was used to transform predictions from the corrected displacement model into stress predictions. These were then compared against a probabilistic failure criterion. Bayesian methods were used throughout the analysis for calibration and model form error modeling, so the joint probability densities had to be transferred between analyses. To reduce the computational cost, the team used Gaussian process surrogate models. In the end, point estimates of Pfail were given without a formal assessment of uncertainty or credibility, as the authors believe that all the uncertainties should be lumped into the prediction of Pfail.
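
For readers unfamiliar with the framework of Ref. [20], the sketch below shows a heavily simplified, modular two-step stand-in: first calibrate the model parameter, then fit a Gaussian process discrepancy to the remaining residuals. Northwestern's implementation treats the calibration and discrepancy jointly within a Bayesian posterior, so this is only a structural illustration; the toy model, data, and parameter values are hypothetical.

    import numpy as np
    from scipy.optimize import minimize_scalar
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(1)

    def model(x, theta):
        # Toy stand-in for the tank model's displacement prediction.
        return theta * np.sin(x)

    # Hypothetical displacement "data" generated by an imperfect process.
    x_data = np.linspace(0.5, 3.0, 8)
    y_data = 1.3 * np.sin(x_data) + 0.1 * x_data + rng.normal(0, 0.02, x_data.size)

    # Step 1: calibrate theta (a deterministic stand-in for the Bayesian posterior).
    theta_hat = minimize_scalar(lambda t: np.sum((y_data - model(x_data, t)) ** 2)).x

    # Step 2: fit a Gaussian process to the residuals as an additive discrepancy delta(x).
    residuals = y_data - model(x_data, theta_hat)
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-4)
    gp.fit(x_data.reshape(-1, 1), residuals)

    # Corrected prediction at a new condition: model(x, theta_hat) + delta(x).
    x_new = np.array([[2.2]])
    delta, delta_std = gp.predict(x_new, return_std=True)
    y_corrected = model(x_new.ravel(), theta_hat) + delta
    print(theta_hat, y_corrected, delta_std)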

UM-Dearborn Response.

The UM-Dearborn contribution focused on correcting for model form error using a copula-based approach and utilized several methods for uncertainty modeling and uncertainty quantification. Pearson distributions were demonstrated as a method to characterize aleatory (or irreducible) parameter uncertainty, but a Bayesian method with normal prior distributions was ultimately used because of its simplicity and better fit to the limited data. Parametric uncertainty was propagated through the physics code with the eigenvector dimension reduction method. To account for model bias, a constant correction was calculated at each tank operating location where experimental data existed by minimizing the u-pooling metric. Bias corrections and tank operating locations were correlated using copulas. Copulas were also used to correlate the bias with maximum displacement and maximum stress. Combining all these analyses, parametric uncertainty was propagated through the model to displacement and stress predictions, and these were compared to a probabilistic failure criterion. A point estimate and confidence interval for Pfail were reported, along with qualitative concerns about the approach and assumptions.
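
A Gaussian copula is one simple way to encode the kind of dependence described above between the model bias and a response quantity: sample correlated standard normals, map them to uniforms, and push the uniforms through the chosen marginal distributions. The marginals, the correlation value, and the variable names below are invented for illustration; the team's paper [9] should be consulted for the copula families and fitting procedure actually used.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n = 20_000

    # Assumed correlation between the model bias and the maximum displacement.
    rho = 0.7
    cov = np.array([[1.0, rho], [rho, 1.0]])

    # Gaussian copula: correlated standard normals mapped to correlated uniforms.
    z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    u = stats.norm.cdf(z)

    # Push the correlated uniforms through hypothetical marginal distributions.
    bias = stats.norm(loc=0.5, scale=0.2).ppf(u[:, 0])        # bias (mm), illustrative
    max_disp = stats.lognorm(s=0.15, scale=8.0).ppf(u[:, 1])  # max displacement (mm)

    # The sampled pairs carry both the marginals and the assumed dependence.
    print(np.corrcoef(bias, max_disp)[0, 1])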

Vanderbilt Response.

The Vanderbilt team developed an approach to compensate for model form error while also accounting for epistemic and aleatory uncertainty. The approach used a Bayesian framework in order to integrate varied sources of information and uncertainty. Each set of experimental data provided was applied to a different step within the overall VVUQ strategy. The concept of a hierarchy was implicitly adopted to guide the VVUQ strategy (e.g., prior distributions from coupon-level data, calibration using data from simplified loading scenarios, and validation using data most closely resembling the application of interest). A key focus of the methodology was to avoid over-fitting the parameters to the sparse and incomplete information. Relevant model parameters were described with both aleatory and epistemic uncertainty components by modeling the parameters with Johnson distributions whose hyperparameters were themselves uncertain, with that uncertainty determined using a Markov chain Monte Carlo technique. Validation was then performed by applying the model reliability metric to compare samples from a family of model output distributions with the experimental data. The validation result was used to make a statement about model credibility. This validation information was also used to add conservatism by applying the reliability metric result as a weight for the combination of the uncertain parameter distribution and a distribution created by combining legacy data with expert opinion. The resulting predictions of stress were a combination of model predictions and expert opinion, which compensated for model form error. The stresses were compared to a probabilistic failure criterion to compute a point estimate of Pfail. Key assumptions were listed, and a qualitative connection to credibility was made.
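
The hierarchical idea can be pictured with a short double-loop sampling sketch: the outer loop samples the uncertain hyperparameters (epistemic), and the inner loop samples the parameter from the resulting Johnson distribution (aleatory). The hyperparameter ranges below are invented stand-ins for the posteriors Vanderbilt obtained by Markov chain Monte Carlo, and the reliability-metric weighting step is not shown.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n_epistemic = 50    # outer loop: samples of the uncertain hyperparameters
    n_aleatory = 1000   # inner loop: samples of the parameter for each family member

    samples = np.empty((n_epistemic, n_aleatory))
    for i in range(n_epistemic):
        # Hypothetical epistemic uncertainty on the Johnson SU hyperparameters.
        a = rng.uniform(-0.5, 0.5)       # shape (skewness-like)
        b = rng.uniform(1.0, 2.0)        # shape (tail weight)
        loc = rng.uniform(195.0, 205.0)  # location
        scale = rng.uniform(8.0, 12.0)   # scale
        # Aleatory variability of the material parameter for this family member.
        samples[i] = stats.johnsonsu(a, b, loc=loc, scale=scale).rvs(n_aleatory, random_state=rng)

    # Each row is one plausible population distribution; the spread across rows
    # reflects the epistemic uncertainty about that distribution.
    print(samples.mean(axis=1).min(), samples.mean(axis=1).max())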

Three Tasks From the Problem Statement.

Challenge workshop participants were asked (1) to make predictions and to estimate uncertainty, (2) to assess the credibility of their predictions, and (3) to describe their VVUQ strategy [2,3]. All the participants made a prediction of Pfail, but only two teams provided uncertainty estimates: Virginia Tech in the form of an interval and UM-Dearborn with an interval and confidence level. The Sandia team found Pfail = 0 at three locations spanning the data, which can potentially be interpreted as zero uncertainty. Sandia did not make a distinction between epistemic and aleatory uncertainty, so it was not clear how to compute the uncertainty of a probabilistic quantity. Northwestern did not separate the two uncertainty types, which leads to a combined effect on the predictions. Although Vanderbilt did separate the two types of uncertainty, they also combined them in their final prediction by integrating out the aleatory uncertainty in both the model and failure criteria.

The problem statement [3] defined credibility as the quality of being trusted and believed in. The proper way to assess credibility is still an open question, though a description of the available data and models, the VVUQ strategy, and an interpretation of the VVUQ results should be important components. Only the Sandia team had a formal process for organizing and evaluating credibility evidence, and they explicitly tied their assessment and evidence to a (qualitative) credibility statement. The other teams discussed various shortcomings in the data, model, and analysis, but did not make a credibility statement.

Beyond simply claiming credibility, the ability to clearly articulate the process is an important component of convincing others that the results are believable. Sandia National Laboratories has often used a hierarchy [21] to describe the available models and data. Figure 1 illustrates how this visualization focuses on the complexity of the relevant physical systems and environments. This type of visual hierarchy was not used in the responses to communicate VVUQ strategy; however, Sandia, Vanderbilt, and Northwestern each effectively described their strategies visually. These teams included workflows that focused on describing their process (i.e., methods) rather than the available information or the products of each analysis. All of the teams also described their processes and their use of data and models in text.

The visual and written descriptions were helpful to elucidate VVUQ strategies. However, the participants did not effectively assess and communicate credibility. This may have been a function of the journal article format, as compared to a project report or in-person briefing.

The responses are first compared by the way they utilized the provided experimental data and models. The full problem statement provided to the participants is given in Ref. [2], and a more compact version is published in this edition [3], so the details are only minimally repeated here. A visual depiction of the relationship between the available data and models is shown in Fig. 1. Although not shown in the figure, a material model was also given for the liquid.

Data Usage.

Major differences between the VVUQ strategies and philosophies were reflected in the way each team used the available data. Six distinct datasets were provided, as described in Table 3, and no further external data were permitted. Direct measurements were available for some quantities, while other quantities were presented as processed information instead of raw data. The datasets varied in terms of quality and quantity, but there was consensus among participants that the data severely limited the quality of their predictions.

Datasets 1, 2, and 4 are informative about the same model inputs (material properties) and the failure criterion, but the quality of the data varies. No tolerances or uncertainty estimates were given for dataset 1, whereas datasets 2 and 4 had uncertainty estimates and multiple repeated tests, which revealed correlations. All of the responses used datasets 1, 2, and 4 to characterize uncertainty in the model parameters.

Northwestern and Sandia used datasets 1, 2, and 4 as a whole to inform their initial parameter estimates; Northwestern used Bayesian prior parameter distributions and Sandia used tolerance intervals. In both cases, it is not clear how the heterogeneous data were combined.
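
For reference, a two-sided normal tolerance interval of the kind Sandia describes can be approximated with Howe's tolerance factor, as sketched below. The coverage level, confidence level, and data values are illustrative assumptions rather than values taken from the challenge datasets.

    import numpy as np
    from scipy import stats

    # Hypothetical sparse material-property measurements.
    data = np.array([200.1, 203.4, 198.7, 201.9, 199.5, 202.2])
    n = data.size
    mean, s = data.mean(), data.std(ddof=1)

    coverage, confidence = 0.95, 0.95
    dof = n - 1
    z = stats.norm.ppf((1.0 + coverage) / 2.0)
    chi2 = stats.chi2.ppf(1.0 - confidence, dof)
    # Howe's approximate two-sided tolerance factor.
    k = z * np.sqrt(dof * (1.0 + 1.0 / n) / chi2)

    lower, upper = mean - k * s, mean + k * s
    print(f"{coverage:.0%}/{confidence:.0%} tolerance interval: [{lower:.1f}, {upper:.1f}]")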

Virginia Tech and UM-Dearborn used datasets 1 and 2 to specify the uncertainties in the material parameters. They both used normal distributions, setting the means with dataset 1 and the variances with dataset 2. This was the final uncertainty description of those parameters for Virginia Tech, but UM-Dearborn used the distributions within conjugate Bayesian updating to further calibrate the uncertainties with dataset 2. These uses of the datasets avoided the issues of directly combining heterogeneous data and the lack of information about dataset 1. UM-Dearborn used the same approach for datasets 1 and 4—the tank dimension parameters. However, Virginia Tech treated these as fundamentally different from material parameters and created imprecise probability distributions. The standard deviations of those probability distributions were chosen to encompass 95% of the data, and the mean intervals were chosen such that both the legacy and experimental means were bounded.
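
A minimal sketch of the kind of conjugate normal updating mentioned above is shown below, assuming a known observation variance so that a normal prior on the mean remains normal after updating. The prior and data values are hypothetical and are not taken from the challenge datasets.

    import numpy as np

    # Hypothetical prior on a material-property mean (e.g., from legacy data).
    prior_mean, prior_var = 200.0, 4.0
    # Hypothetical new measurements with an assumed known noise variance.
    y = np.array([203.1, 201.5, 202.8, 204.0])
    noise_var = 9.0

    # Conjugate normal-normal update of the mean (known-variance case).
    n = y.size
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + y.sum() / noise_var)
    print(post_mean, post_var)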

The Vanderbilt team used dataset 1 to create an "alternative distribution" when accounting for model form uncertainty, while datasets 2 and 4 were used to create Bayesian prior distributions. Within Vanderbilt's approach, the alternative distribution would more typically be the prior distribution; it was not clear why the datasets were used in this fashion.

The liquid characterization provided by dataset 3 was used by all teams because the equation was embedded in the tank model. Participants had the option of performing the calculations outside the tank model and incorporating uncertainty, but all either ignored this option or argued that the associated uncertainty was negligible.

Datasets 5 and 6 were from tests on “full tanks”—meaning intact, functional tanks. The tests for dataset 6 were similar to the loading conditions of the desired prediction, while the tests for dataset 5 lacked the liquid loading and were less representative of the ultimate prediction. Several teams noted that the measurement uncertainty associated with the displacement data was quite large [13] compared to other sources. This observation was used to argue that some sources of uncertainty were negligible. All the teams separated these two datasets and used them differently. Three teams used dataset 5 for calibration of model parameters (Sandia, Northwestern, and Vanderbilt). Northwestern's approach also created a model form error correction, using the Kennedy and O'Hagan framework [20]. Virginia Tech and UM-Dearborn did not use these data at all, because the response in the pressure-only case was not representative of the response to combined pressure and liquid loading. Dataset 6 was used to either assess model accuracy (Sandia, Vanderbilt, and UM-Dearborn), correct for model form uncertainty (Northwestern and UM-Dearborn), or extrapolate (Virginia Tech). Interestingly, the traditional approach of applying a validation metric to assess accuracy was not employed by all the teams. Instead, UM-Dearborn and Virginia Tech prioritized extrapolation to unmeasured prediction quantities and built corrections. Within the space of the prediction quantities, Virginia Tech then validated the model against the transformed experimental data.

Model Usage.

Since the tank model was provided with fixed capabilities, there was little flexibility in the way it could be used. However, unlike the dataset situation, participants had to create a wide range of models to address the full range of VVUQ activities. The tank model was a finite-element model that simulated the tank response to a wide range of loading conditions. Four meshes of increasing fidelity were available, all based on the same geometry. In addition to the provided model, the teams created or assumed various models—statistical models to characterize datasets, surrogate models to replace the tank model, and empirical models to mathematically relate two quantities of interest.

Most teams ran the tank model at multiple mesh levels as part of their analysis, from level 1 (coarsest) to level 4 (finest). Only Virginia Tech performed a formal mesh convergence study to justify the selection of a mesh level (level 2) and estimate the numerical uncertainty. Sandia (level 3), UM-Dearborn (level 2), and Northwestern (level 1) used ad hoc methods to select a mesh level, but did not address the resulting numerical uncertainty as an independent source of uncertainty. Sandia used multiple meshes in a multifidelity optimization scheme, but used a single mesh (level 3) for most activities. Vanderbilt did not explain their choice of mesh (level 2). Little can be said in comparing these choices.

Vanderbilt and UM-Dearborn both applied very general statistical models in order to better characterize the data, while other teams used normal or uniform distributions. Only Virginia Tech separated epistemic and aleatory uncertainty by using different uncertainty models. These different approaches show the tradeoff between accuracy and simplicity for uncertainty modeling.

Surrogate models were a popular tool. Gaussian process surrogates were extensively used by Northwestern, Vanderbilt, and Sandia for various analyses. Additionally, Sandia and UM-Dearborn used surrogate-based uncertainty quantification methods: polynomial chaos expansions and eigenvector dimension reduction, respectively. Virginia Tech avoided surrogate models entirely, instead relying on the tank model.

Finally, three teams used empirical models to address the need for predicting stress, for which there was no experimental measurement. Instead of building a surrogate model to relate the tank model's inputs and outputs, an empirical model was built to relate the tank model's outputs to each other. Each empirical model used a different mathematical form, and the models differed in the way they were built and applied.

The first comparison of the responses was in the way data and models were used. Another way to describe the VVUQ strategies is as an assortment of activities connected together in order to improve or assess the model predictions. The responses showed differences for every VVUQ activity, but only the most significant differences are discussed in detail below.

Characterizing Uncertainty.

Understanding and modeling uncertainty is a major issue for VVUQ and for this challenge problem [22]. The participants had to decide how to deal with experimental data of varying quality, including a lack of specified tolerances, high measurement noise, and questionable relevance to the current project. Each team chose particular mathematical descriptions of uncertainty and variability, although these choices were not always explicitly justified. Teams used probability distributions, intervals, and imprecise probabilities via hierarchical probability distributions or p-boxes. The imprecise probability concept models a parameter with a family of distributions—because the data are too limited to select one specific distribution. The family is defined by distribution parameters that are themselves uncertain—either described by probabilistic hyperparameters or by intervals (p-boxes).
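
As a concrete picture of the p-box idea, the sketch below bounds the CDF of a normally distributed quantity whose mean is known only to within an interval; any distribution in the family lies between the two bounding CDFs. The numbers are arbitrary, and for brevity the standard deviation is treated as known.

    import numpy as np
    from scipy import stats

    # Imprecise probability model: a normal family with an interval-valued mean.
    mean_lo, mean_hi = 195.0, 205.0
    sigma = 10.0

    x = np.linspace(150.0, 250.0, 501)
    # At any x, the CDF is smallest when the mean is at its upper bound and
    # largest when the mean is at its lower bound, giving the p-box envelope.
    cdf_lower = stats.norm(mean_hi, sigma).cdf(x)
    cdf_upper = stats.norm(mean_lo, sigma).cdf(x)

    # Bounds on P(X <= 200) read directly from the envelope.
    print(cdf_lower[250], cdf_upper[250])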

Which uncertainty characterization method a team chose depended upon its overall VVUQ approach (e.g., Bayesian versus interval-based) and upon whether it differentiated types of uncertainty in the parameters (i.e., aleatory versus epistemic). Most teams at least mentioned the distinction between epistemic and aleatory uncertainty, though the results and degree of sophistication varied. Vanderbilt and Virginia Tech utilized methods that could separately account for the uncertainty types. Only Virginia Tech had a separate modeling scheme for epistemic and aleatory uncertainty and developed the methodology to maintain that separation. In contrast, Sandia accommodated whatever uncertainty interpretation enabled them to apply their methods. UM-Dearborn and Northwestern fell in the middle ground. They relied on Bayesian methods and interpreted probability distributions to represent either aleatory variability or epistemic beliefs, and they were not precise in separating epistemic and aleatory uncertainty. Finally, Vanderbilt did separate epistemic and aleatory uncertainty throughout the characterization, calibration, and validation activities, but did not carry this separation to the final prediction of Pfail.

The details of teams' characterizations are outlined below:

  • Sandia used tolerance intervals to bound the model input uncertainties, providing conservative estimates of uncertainty based on sparse data. The tolerance intervals incorporated datasets 1, 2, and 4—including the legacy data—though it was unclear how this information was combined. Despite this original characterization, they later gave the bounds a probabilistic connotation for the calculation of Pfail. Yield stress was characterized with three specific values deemed to span the available data.

  • Vanderbilt considered parametric uncertainty to be epistemic for a specific tank, but aleatory across the tank population. To account for this description of uncertainty, they used hierarchical probability distributions. The Johnson distribution family was used to describe the aleatory uncertainty of each tank model parameter. That family is defined by four hyperparameters, also random variables, that captured the epistemic uncertainty about the form of the aleatory uncertainty. The hyperparameters were essentially calibrated to datasets 2, 4, and 5. Additionally, Vanderbilt assumed that the uncertainty in the legacy data was normally distributed, where the mean was the reported value and the standard deviation was based upon external literature. This was not an approved addition to the challenge problem statement, but is equivalent to using expert judgment.

  • The Northwestern team used normal distributions to describe data measurement error. These distributions had zero mean and standard deviation based upon the three-sigma rule applied to the data's spread. Legacy data were included when defining the distributions, and some of the measurements from dataset 5 were assumed to be outliers and removed.

  • Virginia Tech distinguished between aleatory and epistemic uncertainty in a fashion similar to the Vanderbilt team. However, while the Vanderbilt approach assumed that all the parameters had components of both aleatory and epistemic uncertainty, the Virginia Tech team made more specific assumptions. Most material parameters were deemed aleatory, except for the failure criterion and tank dimensions, which were mixed epistemic and aleatory. The mixed uncertainties were represented with a hierarchical family of normal distributions, for which the mean value was an interval. The interval bounds and the means and standard deviations were based upon a mix of statistical properties of the experimental data and the legacy data, as described in Sec. 3.1.

  • The UM-Dearborn team also considered parameter uncertainty as either epistemic or aleatory, which they denoted reducible or irreducible. They tried modeling epistemic uncertainty with normal distributions and aleatory uncertainty with Pearson distributions, but found that, due to data sparsity, the Pearson distributions could introduce significant statistical error. The construction of these distributions was described in Sec. 3.1.

Although several teams found a strong correlation between model parameters, only the Vanderbilt team carried this information forward in their analyses. It was not clear if this was for convenience or an intentional interpretation of what the data and models represent.

Code and Solution Verification.

Code verification is the process of demonstrating that the code is computing the model results correctly [23]. Code verification was not expected; however, all the participants discovered a code bug that prevented solution of the problem at certain input values. Each team had to work around the bug in order to complete their analyses, and several cited the bug during qualitative discussions of credibility. The bug, and the resulting lack of information from the maximum loading regime, did not appear to impact any team's predictions.

Analyses to estimate numerical error and uncertainty fall under the umbrella term “solution verification” [24]. The problem statement instructed participants to treat the provided software as a finite-element code that provides numerical solutions to the tank model. Four finite-element meshes of different refinement were provided, which could be used to estimate the numerical uncertainty from discretization. Only the Virginia Tech team performed a formal mesh convergence study, using Richardson extrapolation and the grid convergence index to estimate a bias and a numerical uncertainty, which was represented with an interval. The Sandia and UM-Dearborn teams did ad hoc mesh comparisons, but assumed that numerical uncertainty was negligible. The other teams did not investigate numerical uncertainty from mesh discretization.
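
For readers unfamiliar with the procedure, the sketch below applies Richardson extrapolation and the grid convergence index (GCI) to three hypothetical solutions obtained with a constant refinement ratio. The stress values and refinement ratio are illustrative and are not taken from the challenge problem meshes.

    import numpy as np

    # Hypothetical peak-stress solutions on fine, medium, and coarse meshes
    # (f1 finest, f3 coarsest) with a constant refinement ratio r.
    f1, f2, f3 = 101.2, 101.8, 104.2
    r = 2.0

    # Observed order of convergence from the three solutions.
    p = np.log((f3 - f2) / (f2 - f1)) / np.log(r)

    # Richardson-extrapolated estimate of the mesh-converged solution.
    f_exact = f1 + (f1 - f2) / (r**p - 1.0)

    # Grid convergence index on the fine mesh (safety factor 1.25 for three grids).
    gci_fine = 1.25 * abs((f2 - f1) / f1) / (r**p - 1.0)

    print(f"order p = {p:.2f}, extrapolated value = {f_exact:.2f}, GCI = {gci_fine:.2%}")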

Another possible source of numerical uncertainty is the use of surrogate models to reduce the computational cost. Many teams used surrogate models, and some performed assessments of the surrogate quality. None of these teams examined the ability of the surrogate models to extrapolate beyond their training data (cross-validation is regarded as an interpolative check) or incorporated the numerical error estimates into the surrogate predictions. Some teams mentioned that these effects were assumed to be negligible, but did not verify this claim. A related issue is the quality of the empirical models used to transform displacements to stresses. Several teams developed such models based upon relationships discovered through studying the provided code. It is unclear how these empirical models would be tested, and none of the teams addressed numerical uncertainty for these models.

Sensitivity Analysis, Calibration, and Propagation.

Although the computational cost was not a factor when solving the challenge problem, the participants were instructed to formulate VVUQ strategies as if they were under computational constraints. Computational cost is often dictated by the number of inputs in the analysis, so sensitivity analysis is used to identify which inputs have a significant impact on the quantity of interest [25]. The responses differed in the method and fidelity of the sensitivity analysis, the selection of the quantity of interest, and the number of parameters deemed significant. The quantities of interest also reveal the teams' priorities; Sandia and Virginia Tech focused on the stresses in order to prioritize parameters that impact the final prediction, while the Northwestern team relied heavily on calibration to displacement data and therefore computed sensitivities with respect to displacements. Table 4 reviews the sensitivity analysis and calibration work for each team.

The calibration step had less variety in methods, but large differences in formulation [26]. Vanderbilt and Northwestern used Bayesian calibration, Sandia used a deterministic parameter optimization, and Virginia Tech and UM-Dearborn did not calibrate model parameters. Northwestern used a stochastic approach to calibrate the model parameters to match data, while Sandia applied a deterministic approach. The Vanderbilt team calibrated hyperparameters that controlled the model parameter distributions, so the result was a family of posterior parameter distributions. A direct comparison of postcalibration parameters would be very enlightening. The Northwestern (Fig. 6 of Ref. [8]) and Sandia (Fig. 4 of Ref. [6]) results are, as expected, quite consistent, even though Sandia included additional parameters, but the Vanderbilt paper does not show any samples from the family of posterior joint densities. The postcalibration parameters are significantly different from any uncertainty characterization based only on data. This inconsistency was noted by the Sandia team.
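
As an illustration of the deterministic, optimization-based end of this spectrum, the sketch below calibrates two hypothetical parameters to displacement data by bounded least squares. The toy response function, the parameter bounds, and the data are assumptions for illustration only and do not represent the Sandia team's actual setup, which relied on available analysis software [14].

    import numpy as np
    from scipy.optimize import least_squares

    rng = np.random.default_rng(4)

    def displacement(params, pressure):
        # Toy stand-in for the tank model's displacement response.
        a, b = params
        return a * pressure + b * pressure**2

    # Hypothetical measured displacements over a range of pressures.
    pressure = np.linspace(10.0, 60.0, 6)
    measured = displacement([0.05, 0.002], pressure) + rng.normal(0, 0.05, pressure.size)

    def residuals(params):
        return displacement(params, pressure) - measured

    # Bounded deterministic calibration, starting from within the initial intervals.
    result = least_squares(residuals, x0=[0.03, 0.001],
                           bounds=([0.0, 0.0], [0.2, 0.01]))
    print("calibrated parameters:", result.x)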

Propagation refers to the quantification of model prediction uncertainty caused by uncertainty in the model parameters. The challenge problem was not demanding for propagation methods and this was not the focus of the paper, so these are simply noted within Table 4. When a surrogate was used in lieu of the tank model, it was assumed that random sampling of some kind was exhaustively performed for propagation.

Validate Model Versus Estimate Model Form Uncertainty.

Validation was the focus of the previous challenge workshop [5] and of various guides and standards [27,28,30]. A wide range of validation methods and metrics for comparing predictions to experimental data exists in the V&V literature. Validation has traditionally been regarded as the final and most important step for assessing model accuracy and prediction credibility. However, only two of the teams performed a validation comparison directly between the provided experimental data and the corresponding model predictions. Sandia used relative differences between point estimates as a metric—sampled over the uncertain parameters. Vanderbilt used their reliability metric to compare a family of distributions from the model with the experimental data.

The biggest issue with the traditional validation approach is that the comparisons rely exclusively on the data to assess accuracy and do not provide a mechanism to improve the accuracy. An alternative approach is to use the data to characterize model form uncertainty, with the goal of correcting any model inaccuracies [29]. Northwestern took this approach, using the popular Kennedy and O'Hagan framework [20] to simultaneously calibrate model parameters and produce an additive correction. UM-Dearborn used similar ideas, but used copulas instead of Gaussian processes as the surrogate.

Between these two philosophies there is a tradeoff: assessing accuracy through validation in order to infer the credibility of model extrapolation versus ensuring accuracy by correcting, but not assessing, the calibrated model. Data used for validation could instead be applied toward creating a correction, but the validation result would then no longer hold for the corrected model. Northwestern actually used a hybrid approach, in which they calibrated and corrected the model with a subset of the dataset and then assessed the accuracy with the remainder. Even they experienced the tradeoff, because each of these activities was based on half as much data.

Virginia Tech also used a novel approach, in which they transformed the data so that they were most relevant to the final prediction and then compared the transformed data to the model predictions. In this approach, the extrapolation required is minimal, but the data transformation makes use of the same model used for the predictions. Unlike other validation approaches, the discrepancy between the model and the data was directly available in terms of the final prediction quantity.

Aggregate Uncertainties and Make Extrapolative Predictions.

In this paper, uncertainty quantification is used generally to refer to the quantitative estimation of uncertainty in model predictions. This covers all sources of uncertainty: propagated uncertainty from model parameters, numerical uncertainties from discretization or surrogate modeling, data-related uncertainties, and model form uncertainties. Aggregation refers to the combination of all these effects to describe a single uncertainty about the model prediction [22]. A large body of literature exists about treating individual sources of uncertainty, but little has been published about proper aggregation methods.

The Virginia Tech team was the only one that explicitly aggregated model prediction uncertainties. Their methodology provided corrections to the propagated parametric uncertainty, based on mesh-related uncertainty and estimated model form uncertainty. Aggregation was not addressed by the other teams, because the only quantified source of uncertainty at the prediction level was parametric. If other sources of uncertainty were investigated, they were rolled into the parametric uncertainty or neglected. In particular, estimates of numerical uncertainty, validation metrics, and model form uncertainties were discussed but did not directly impact the model predictions. One notable exception is the approach of Vanderbilt, which modified the model parameters in response to the validation result.

In addition to quantifying uncertainty in model predictions, the final challenge was to account for the fact that the predictions are extrapolative [5]—in the sense that the loading conditions of the final prediction are beyond those that were experimentally observed, and also in the sense that the quantity of interest being predicted (stress) had not been experimentally measured. Sandia and Vanderbilt did not address the fact that they used the provided model in an extrapolative capacity. The other three teams relied on empirical relationships between the observed quantity (displacement) and the predicted quantity (stress). These are described in Sec. 4.4. The general idea was to either assess the accuracy of the model for displacement and then transform those predictions to stress, or transform the data and then assess the accuracy of the model for stress. The teams noted that it was not clear what could be said about the accuracy of the stress predictions from this approach. All of the teams implicitly trusted that the modeled relationship between displacement and stress was correct.
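
The general transformation idea can be sketched as follows: run the tank model over its input uncertainty, fit an empirical relationship between the predicted displacement and the predicted stress, and then apply that relationship to measured displacements (or, going the other direction, to corrected displacement predictions). The linear form, the toy numbers, and the variable names below are placeholders; the three teams used different functional forms and construction details.

    import numpy as np

    rng = np.random.default_rng(5)

    # Hypothetical paired model outputs from sampled model-input sets:
    # predicted maximum displacement (mm) and predicted maximum stress (MPa).
    disp_model = rng.uniform(6.0, 10.0, 200)
    stress_model = 45.0 * disp_model + 60.0 + rng.normal(0, 5.0, 200)

    # Empirical model relating the two model outputs (here a simple linear fit).
    slope, intercept = np.polyfit(disp_model, stress_model, deg=1)

    # Transform measured displacements into "pseudo-measured" stresses that can
    # be compared against the model's stress predictions.
    disp_measured = np.array([7.2, 8.9, 9.4])
    stress_inferred = slope * disp_measured + intercept
    print(stress_inferred)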

In addition to comparing differences within VVUQ activities, conceptual themes that appeared throughout the workshop and papers were identified. Even when two teams used a similar approach for a VVUQ activity, different mindsets led to contrasting implementations and results. Secs. 5.1–5.4 highlight some of the themes that emerged from the workshop and this special edition.

Mathematical Descriptions of Uncertainty and Variability.

Uncertainty characterization and the treatment of epistemic and aleatory uncertainty is a recurring issue within the VVUQ field, having also been the focal point of the first challenge workshop [22]. When constructing a strategy to tie together many VVUQ activities, the treatment of uncertainty should be compatible throughout. The experimental uncertainties should factor into the model parameters and predictions and then be combined with the effects of numerical uncertainty, model form uncertainty, and extrapolation.

This section examines the aspects of separating epistemic and aleatory uncertainty and how to model them mathematically, but not the application of specific methods or assumptions. Some teams struggled to communicate the precise assumptions about the nature of their uncertainties and to maintain a consistent interpretation of uncertainty throughout multiple analyses. As an example, consider whether it makes sense to capture the spatial variation of a material property on one tank and to use that to represent the variation of the bulk material properties over a population of tanks (with a model that cannot use spatially varying parameters). Even if data clearly contain one type of uncertainty, perhaps a model of that type of uncertainty is not the right way to model the effects when propagating. If the materials data were characterized with both aleatory (from natural variation) and epistemic (from limited data) uncertainty, what impact should that have on predictions of stress, the failure criterion, and Pfail—and are methods available to produce those predictions?

No consensus was developed about how to distinguish epistemic from aleatory uncertainty. Teams looked at the same data and reached opposite interpretations. The selected description of uncertainty had a significant effect upon how uncertainties were aggregated and propagated. However, it was not clear which problem aspect was the driver: did the uncertainty model dictate the choice of methods, or vice versa? This was just a small part of the challenge, and ultimately these issues come down to project-specific, subjective choices. It may be difficult to develop a universal approach for separating and modeling epistemic and aleatory uncertainty. More experience and attention are required to fully explore this issue.

Information Relevance.

One significant augmentation relative to prior challenges was the requirement to predict a quantity of interest (stress) for which no experimental data were available. More generally, one of the biggest open questions in the VVUQ field is how to understand and account for the relevance of experimental data with respect to the desired predictions. This covers many aspects of the challenge problem: the prediction at maximum loading based on VVUQ activities at lower loading conditions, the characterization of tank materials based on coupon tests, or the prediction of stress when only displacement data are available. For each pair, what is the relevance of the available information to the task at hand? The issue of information relevance is of paramount concern when constructing the VVUQ strategy.

The same issue arises after the VVUQ activities are complete and the results must be interpreted. All the participants gave quantitative results plus qualitative statements. What is the relevance of subjective evaluation of assumptions or uncertainty characterization and how should this information be incorporated with quantitative results? This was a major hurdle for participants, who often struggled to make concrete credibility assertions. Many teams presented a list of questionable assumptions, results, or other evidence, but did not connect this to the credibility of their predictions.

Engineering Judgment and Decision Support.

Another open question for VVUQ as an engineering discipline is what a VVUQ analysis can and should provide. This theme did not receive explicit attention in the papers, but the resulting strategies and the discussion at the 2014 Sandia Challenge Workshop [13,31] hint that it is a significant issue. At one end of the spectrum, the Virginia Tech team believes that VVUQ analyses should make minimal assumptions and avoid introducing unsupported information and should instead describe the state of knowledge as accurately as possible. A contrasting viewpoint from Michael Shields at the Johns Hopkins University is that VVUQ analysis should introduce expert judgment when possible and justified in order to interpret the information from available data and models. The rest of the participants spanned this spectrum of philosophies on expert judgment, which reflected their different backgrounds and priorities. These attitudes had subtle influence on the numerous choices and tradeoffs that were required in the challenge problem—especially between accuracy and efficiency—due to the broad scope. The consequences are significant because of the implications for how VVUQ evidence is gathered and presented on real engineering projects.

Balancing Activities Within the VVUQ Strategy.

The participants were asked to make predictions, to estimate uncertainty, and to assess credibility. The challenge was to first determine what credibility means, and then how to prioritize VVUQ activities to achieve these three objectives. Unfortunately, no one has been able to explicitly define credibility or explain how VVUQ activities can measure it, which are two prerequisites for prioritization. Instead, this workshop attempted to investigate this issue by trial and error. The workshop gathered VVUQ experts, and they demonstrated a diverse set of approaches. From the results, we can evaluate the significance of each VVUQ activity and the impact on credibility.

We will highlight just a few examples. One strategy prioritizes understanding the effects of uncertain model parameters. A careful separation of epistemic and aleatory uncertainty is performed via analysis of the data, calibration, and propagation of the parameters in order to estimate the model prediction uncertainty. A second strategy focuses on how to extrapolate and make predictions of an unobserved quantity of interest. That would entail a large degree of calibration to ensure model accuracy with respect to the most relevant data and a model-based transformation to predict the quantity of interest. A third strategy examines mesh convergence to assess the numerical uncertainty. Certainly, these are all incomplete strategies, but what can be said about the credibility of the predictions and uncertainty estimates, if these strategies were followed? How would it impact credibility if we could afford all the three strategies? If time and resources do not allow for a complete effort, should these be balanced or should one dominate?

What the workshop participants have done is to propose and execute different strategies that will serve as case studies. Demonstrating a few strategies and reporting the outcomes allow a subjective comparison about what each VVUQ analysis contributes to the credibility story. Computational costs should also be an important point of consideration, but most responses did not include this information.

The 2014 Sandia V&V Challenge Workshop showcased the encouraging progress in the VVUQ field, especially when viewed as the latest in a series of challenge workshops. The 2002 Epistemic Uncertainty Workshop was held to explore ways to model uncertainty [22]. The 2006 Model Validation Challenge Workshop focused on validation comparisons and the consequences for prediction [5]. The current workshop extended upon these ideas with the addition of experimental errors, numerical uncertainty, and significant extrapolation from data.

The problem was significantly more complex and less prescriptive, which required a major time commitment and a wide range of VVUQ expertise. The teams' efforts showed that the diverse set of established VVUQ methodologies continues to mature and that new ideas and approaches are being developed and applied. The results also revealed a new set of challenges for the VVUQ community. Many of these are not in the realm of quantitative methods, but are related to subjective engineering judgment and interpretation, e.g., credibility and strategy. The primary goal of the workshop was to demonstrate the state of the art; a secondary effect was to expose these topics and start discussions. The organizers hope that this workshop has provided valuable experience and motivated the VVUQ community to address the new challenges.

This paper has summarized each of the challenge workshop responses in this special edition. Each team had to develop a strategy to deal with the limited-quantity, low-quality data, as well as with imperfect models. Table 2 and Sec. 2 give an overview of the results, while Secs. 3 and 4 compare the VVUQ strategies. The comparisons revealed nuances in the philosophy and priorities of each team, which were discussed in Sec. 5.

The paper concludes with a list of comments, lessons learned, and thoughts for the future:

  • The participants must be recognized for their efforts to complete the challenge problem. From a practical point of view, the problem was too open-ended and too broad; very few teams had the ability and resources to address the problem satisfactorily. This paper was not even able to include all the aspects of the problem.

  • Teams' philosophies affect their strategies and assumptions, which in turn impact the results.

  • The larger scope forced participants to prioritize between VVUQ activities. Interestingly, validation was not included in all the responses to this V&V challenge problem.

  • VVUQ has been depicted as a cyclic process [21,32]. Typically, if a VVUQ study finds too much uncertainty in a prediction, an additional analysis cycle could be recommended. Here, however, a decision maker considering the challenge problem responses (Table 2) would be faced with contradictory advice: two analyses found approximately no probability of failure, while three determined the probability to be greater than the safety requirement specified by the challenge problem's second prediction. What course of action can be taken when faced with such polarity is less clear, leaving the decision maker in a difficult position.

  • A diversity of methods will continue to be used for the various VVUQ activities. A more targeted challenge problem is required to properly compare methods. Although more experience would be valuable in all the areas of VVUQ, the practical, subjective aspects seem to lag behind the quantitative methods in terms of maturity.

  • In particular, formality and consistency were lacking in the following areas: characterizing epistemic and aleatory uncertainty, assessing and communicating credibility, and connecting VVUQ activities together.

The responses were not evaluated or ranked against each other; it would have little meaning due to the heterogeneity of the approaches. Instead, readers are encouraged to learn the methodologies, to evaluate the effect of VVUQ evidence on credibility, and to imagine what a decision maker could glean from the responses.

This challenge problem and the resulting workshop were made possible with support from the Sandia National Laboratories and ASME. We wish to recognize George Orient, Brian Carnes, Vicente Romero, Greg Weirs, Laura Swiler, Walt Witkowski, and the Dakota team at the Sandia National Laboratories, and Ryan Crane and the V&V Standards committees at ASME for their work in creating and organizing the workshop. We are especially grateful to the workshop participants, who made this special edition possible, including Thomas Brodrick and James Elele with the U.S. Navy, Michael Shields at the Johns Hopkins University, and Thomas Paez, Paul Paez, and Timothy Hasselman. Sandia National Laboratories is a multiprogram laboratory managed and operated by the Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under Contract No. DE-AC04-94AL85000.


References

[1] Hu, K. T., Carnes, B., and Romero, V., "The 2014 Sandia V&V Challenge Problem Workshop," J. Verif., Validation Uncertainty Quantif., 1(1).
[2] Hu, K. T., 2013, "2014 V&V Challenge: Problem Statement," Sandia National Laboratories, Albuquerque, NM, Technical Report No. SAND2013-10486P.
[3] Hu, K. T., and Orient, G. E., "The 2014 Sandia V&V Challenge Problem Statement," J. Verif., Validation Uncertainty Quantif., 1(1).
[4] Helton, J. C., and Oberkampf, W. L., 2004, "Alternative Representations of Epistemic Uncertainty," Reliab. Eng. Syst. Saf., 85, pp. 1–10.
[5] Hills, R. G., Pilch, M., Dowding, K. J., Red-Horse, J., Paez, T. L., Babuška, I., and Tempone, R., 2008, "Validation Challenge Workshop," Comput. Methods Appl. Mech., 197, pp. 2375–2380.
[6] Beghini, L. L., and Hough, P. D., "Sandia V&V Challenge Problem: A PCMM-Based Approach to Assessing Prediction Credibility," J. Verif., Validation Uncertainty Quantif., 1(1).
[7] Choudhary, A., Voyles, I. T., Roy, C. J., Oberkampf, W. L., and Patil, M., "Probability Bounds Analysis Applied to the Sandia Verification and Validation Challenge Problem," J. Verif., Validation Uncertainty Quantif., 1(1).
[8] Li, W., Chen, S., Jiang, Z., Apley, D. W., Lu, Z., and Chen, W., "Integrating Calibration, Bias Correction, and Machine Learning for the Challenge Problem," J. Verif., Validation Uncertainty Quantif., 1(1).
[9] Xi, Z., and Yang, R.-J., "Reliability Analysis With Model Uncertainty Coupling With Parameter and Experimental Uncertainties: A Case Study of 2014 V&V Challenge Problem," J. Verif., Validation Uncertainty Quantif., 1(1).
[10] Mullins, J., and Mahadevan, S., "Bayesian Information Fusion for Model Calibration, Validation, and Prediction," J. Verif., Validation Uncertainty Quantif., 1(1).
[11] Hu, K. T., and Paez, T. L., "Why Do Verification and Validation?" J. Verif., Validation Uncertainty Quantif., 1(1).
[12] Paez, P. J., Paez, T. L., and Hasselman, T. J., "The Economics of V&V," J. Verif., Validation Uncertainty Quantif., 1(1).
[13] Shields, M. D., Teferra, K., and Kim, H., 2014, "V&V Challenge Problem: An Efficient Monte Carlo Method Incorporating the Effects of Model Error," ASME Paper No. V&V2014-7214.
[14] Adams, B. M., Bauman, L. E., Bohnhoff, W. J., Dalbey, K. R., Eddy, J. P., Ebeida, M. S., Eldred, M. S., Hough, P. D., Hu, K. T., Jakeman, J. D., Swiler, L. P., Stephens, J. A., Vigil, D. M., and Wildey, T. M., 2014, "Dakota—A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 6.1 User's Manual," Sandia National Laboratories, Albuquerque, NM, Technical Report No. SAND2014-4633.
[15] Bichon, B. J., Eldred, M. S., Swiler, L. P., Mahadevan, S., and McFarland, J. M., 2008, "Efficient Global Reliability Analysis for Nonlinear Implicit Performance Functions," AIAA J., 46(10), pp. 2459–2468.
[16] Oberkampf, W., Pilch, M., and Trucano, T., 2007, "Predictive Capability Maturity Model for Computational Modeling and Simulation," Sandia National Laboratories, Albuquerque, NM, Technical Report No. SAND2007-5948.
[17] Ferson, S., Oberkampf, W. L., and Ginzburg, L., 2008, "Model Validation and Predictive Capability for the Thermal Challenge Problem," Comput. Methods Appl. Mech., 197, pp. 2408–2430.
[18] Voyles, I. T., and Roy, C. J., 2014, "Evaluation of Model Validation Techniques in the Presence of Uncertainty," AIAA Paper No. 2014-0120.
[19] Voyles, I. T., and Roy, C. J., 2015, "Evaluation of Model Validation Techniques in the Presence of Aleatory and Epistemic Input Uncertainties," AIAA Paper No. 2015-1374.
[20] Kennedy, M. C., and O'Hagan, A., 2001, "Bayesian Calibration of Computer Models," J. R. Stat. Soc. B, 63(3), pp. 425–464.
[21] Oberkampf, W. L., and Trucano, T. G., 2002, "Verification and Validation in Computational Fluid Dynamics," Prog. Aerosp. Sci., 38(3), pp. 209–272.
[22] Ferson, S., Joslyn, C. A., Helton, J. C., Oberkampf, W. L., and Sentz, K., 2004, "Summary From the Epistemic Uncertainty Workshop: Consensus Amid Diversity," Reliab. Eng. Syst. Saf., 85, pp. 355–369.
[23] Knupp, P., and Salari, K., 2002, Verification of Computer Codes in Computational Science and Engineering, CRC Press, Boca Raton, FL.
[24] Roy, C. J., 2005, "Review of Code and Solution Verification Procedures for Computational Simulation," J. Comput. Phys., 205(1), pp. 131–156.
[25] Saltelli, A., Tarantola, S., Campolongo, F., and Ratto, M., 2004, Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models, Wiley, Chichester, UK.
[26] Trucano, T. G., Swiler, L. P., Igusa, T., Oberkampf, W. L., and Pilch, M., 2006, "Calibration, Validation, and Sensitivity Analysis: What's What," Reliab. Eng. Syst. Saf., 91, pp. 1331–1357.
[27] ASME V&V 10 Committee, 2006, "Guide for Verification and Validation in Computational Solid Mechanics," The American Society of Mechanical Engineers, New York, Technical Report No. V&V 10-2006.
[28] ASME V&V 20 Committee, 2009, "Standard for Verification and Validation in Computational Fluids and Heat Transfer," The American Society of Mechanical Engineers, New York, Technical Report No. V&V 20-2009.
[29] Ling, Y., Mullins, J., and Mahadevan, S., 2014, "Selection of Model Discrepancy Priors in Bayesian Calibration," J. Comput. Phys., 276, pp. 665–680.
[30] AIAA Standards, 2002, "Guide for the Verification and Validation of Computational Fluid Dynamics Simulations," AIAA Paper No. G-077-1998.
[31] Hu, K. T., 2014, "The Sandia National Laboratories 2014 Verification & Validation Challenge Workshop," ASME Paper No. V&V2014-7211.
[32] Sargent, R. G., 2011, "Verification and Validation of Simulation Models," Proceedings of the Winter Simulation Conference, Phoenix, AZ, pp. 183–198.

Figures

Fig. 1  A VVUQ hierarchy for the challenge problem. Numbers identify the datasets, and the labels describe the system and environment of the experiments. The capabilities of the provided model are also indicated. (Reproduced with permission from Hu [2] (Figs. 2 and 3). Copyright 2013 by Sandia National Laboratories.)

Tables

Table 1  Reporting challenge problem participants
Table 2  Overview of challenge problem responses
Table 3  The provided datasets
Table 4  Summary of the sensitivity analysis, calibration, and propagation activities. Only the primary method or quantity of interest is listed.
