Validation exercises for computational models of materials under impact must contend with sparse experimental data as well as with uncertainties arising from microstructural stochasticity and variability in the thermomechanical properties of the material. This paper develops statistical methods for determining confidence levels for the verification and validation of computational models subject to aleatoric and epistemic uncertainties and sparse stochastic experimental datasets. To demonstrate the method, the classical problem of Taylor impact of a copper bar is simulated. Ensembles of simulations are performed to cover the range of variability in the material properties of copper, specifically the nominal yield strength A, the hardening constant B, and the hardening exponent n in a Johnson–Cook material model. To quantify uncertainties in the simulation models, we construct probability density functions (PDFs) of two ratios serving as quantities of interest (QoIs): the final bar diameter to the original diameter, and the final bar length to the original length. The uncertainties in the experimental data are quantified by constructing target output distributions for these QoIs from the sparse experimental results reported in the literature. The simulation and experimental output distributions are compared to compute two metrics: the median of the model prediction error and the model confidence at a user-specified error level. It is shown that the median error is lower and the model confidence is higher for the length ratio than for the diameter ratio, implying that the simulation models predict the final length of the bar more accurately than the diameter. The calculated confidence levels are shown to be consistent with expectations from the physics of the impact problem and the assumptions in the computational model. Thus, this paper develops and demonstrates physically meaningful metrics for validating simulation models using limited stochastic experimental datasets.
The tools and techniques developed in this work can be used to validate a wide range of computational models subject to input uncertainties and supported by only sparse experimental data.
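The two validation metrics described above can be sketched numerically. The following is a minimal illustration, not the paper's method or data: the simulation ensemble and the experimental target distribution for a QoI (e.g., the length ratio) are replaced by hypothetical normal distributions with made-up parameters, and the median prediction error and the confidence at a user-specified error level are estimated by Monte Carlo sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the QoI samples (illustrative values only):
# simulation-ensemble outputs and a target distribution representing
# the sparse experimental data for, e.g., the length ratio L_f/L_0.
sim_qoi = rng.normal(loc=0.70, scale=0.020, size=10_000)
exp_qoi = rng.normal(loc=0.69, scale=0.015, size=10_000)

# Relative model prediction error between draws of the two distributions.
error = np.abs(sim_qoi - exp_qoi) / np.abs(exp_qoi)

# Metric 1: median of the model prediction error.
median_error = np.median(error)

# Metric 2: model confidence at a user-specified error level epsilon,
# i.e., the fraction of samples whose error does not exceed epsilon.
epsilon = 0.05
confidence = np.mean(error <= epsilon)

print(f"median error = {median_error:.3f}, "
      f"confidence at {epsilon:.0%} error = {confidence:.2f}")
```

Under this sketch, a QoI whose simulated distribution sits closer to (and overlaps more with) the experimental target yields a lower median error and a higher confidence, mirroring the comparison between the length and diameter ratios reported above.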