## Abstract

One major impediment to wider adoption of additive manufacturing (AM) is the presence of larger-than-expected shape deviations between an actual print and the intended design. Since large shape deviations/deformations lead to costly scrap and rework, effective learning from previous prints is critical to improve build accuracy of new products for cost reduction. However, products to be built often differ from the past, posing a significant challenge to achieving learning efficacy. The fundamental issue is how to learn a predictive model from a small set of training shapes to predict the accuracy of a new object. Recently an emerging body of work has attempted to generate parametric models through statistical learning to predict and compensate for shape deviations in AM. However, generating such models for 3D freeform shapes currently requires extensive human intervention. This work takes a completely different path by establishing a random forest model through learning from a small training set. One novelty of this approach is to extract features from training shapes/products represented by triangular meshes, as opposed to point cloud forms. This facilitates fast generation of predictive models for 3D freeform shapes with little human intervention in model specification. A real case study for a fused deposition modeling (FDM) process is conducted to validate model predictions. A practical compensation procedure based on the learned random forest model is also tested for a new part. The overall shape deviation is reduced by 44%, which shows a promising prospect for improving AM print accuracy.

## 1 Introduction

Additive manufacturing (AM) has moved from solely a tool for prototyping to a critical technology for the production of functional parts used in a growing number of fields such as aerospace and medicine [1–6]. Yet, a common problem for AM is the presence of undesirable geometric shape deformations that may lead to scrap or rework [7–9]. This can aggravate existing high costs in AM and hamper further industrial adoption. Geometric complexity of three-dimensional (3D) objects is one major issue, among others, that hinders efforts to achieve consistent shape accuracy across a large variety of products, particularly considering the nature of one-of-a-kind manufacturing or low-volume production in AM [10,11]. Unlike in mass production, learning to accurately produce one product or one family of products is insufficient for AM.

A growing body of research seeks to address this shape deformation issue through predictive modeling and compensation approaches. As summarized in Fig. 1, there are two main categories of predictive modeling approaches reported in the literature for shape deformation control: physics-based approaches utilizing finite element modeling [12–16] and data-driven approaches based on statistical and machine learning [7,17–22].

Physics-based modeling uses first principles to simulate the physical phenomena underlying an AM process. Results from these simulations can be effective for predicting thermal and mechanical behavior of parts during a print. For instance, physics-based models have been applied to simulate residual stresses in produced parts, give insight into part distortion, and predict spatiotemporal temperature of feedstock in a build envelope, among many other uses [12–15]. Challenges faced by physics-based modeling include the computational complexity of simulations and the need to account for a wide variety of physical phenomena that affect a process [16]. Furthermore, these phenomena can often be specific to a single method of AM, i.e., results from a simulation of a selective laser melting machine would not be useful for modeling a machine using material extrusion.

Data-driven approaches for shape deformation control utilize data either from processes or from products to establish process-oriented models or product shape-oriented models. These surrogate models greatly reduce computational costs. Process-oriented models seek to address geometric differences caused by process variables. Empirical and statistical methods have been applied to the investigation and modeling of AM processes [23–26]. Factors such as layer thickness and flowrate are varied to discover optimal settings for quality control. Tong et al. [17,27], for example, utilize polynomial regression models to predict shrinkages in spatial directions and correct material shrinkage and kinematic errors caused by motion of the extruder by altering computer-aided design (CAD) designs. One downside to process-oriented models is that the product shapes and their impact on shape deformations are often not considered.

Product shape-oriented models seek to account for this by using the geometry of the manufactured part to inform error predictions. A critical step in shape-oriented modeling is the mathematical representation of shape deformation for freeform 3D objects. Three main representation approaches have been reported in the literature: point cloud representation, parametric representation, and triangular mesh representation. Point cloud-based approaches have sought to describe geometry using coordinates of points on a product boundary. Xu et al. [28], for example, presented a framework for establishing the optimal correspondence between points on a deformed shape and a CAD model. A compensation profile based on this correspondence is then developed and applied to prescriptively alter the CAD model. A different point cloud-based approach is presented in Ref. [29], which sought to utilize deep learning to enable thermal distortion modeling. A part's thermal history was captured using a thermal camera focused on the build plate of a printer employing laser-based additive manufacturing. This information was then used to train a convolutional neural network, which gave a distortion prediction for each point. The method was demonstrated using a number of 3D printed disks. Another related study focused on the use of transfer learning between models for different AM materials [30]. The model that was employed utilized information regarding a point's position on the disk shape that was printed to predict geometric distortion. In addition to proper shape registration and correspondence, one challenge for this approach is that models based on point cloud representations of shape deformation can be highly shape dependent, making it hard to translate knowledge from shape to shape. As a result, the datasets in the previous articles are highly homogeneous.

Parametric representation approaches transform the point cloud data to extract deformation patterns or reduce complexity due to shape variety. Huang et al., for example, demonstrated a modeling and compensation method by representing the shape deformation in a polar coordinate system [7]. One advantage of this approach is that it decouples geometric shape complexity from deformation modeling through transformation, making systematic spatial error patterns more apparent and easier to analyze. Huang et al. showed [7,31–33] that this approach was able to reduce errors in stereolithography (SLA) printed cylinders by up to 90% and in SLA printed 2D freeform shapes by 50% or more. One disadvantage of this method is that it requires a parametric function to be fit to the surface of each shape that is to be modeled. Unfortunately, this can become prohibitively tedious for complex 3D shapes [34].

To account for this problem and expedite model building, this article proposes a shape-oriented modeling approach based on features extracted from triangular mesh shape representations of printed objects. This form of shape representation is an ideal candidate because of the ease with which it can describe complex 3D geometries. Furthermore, parts manufactured using AM are almost universally handled as triangular mesh files. The STL file is the most common format for transferring 3D shapes from CAD software or databases to slicing software for a 3D printer. It stores 3D shapes in the form of a simple triangular mesh and has maintained widespread popularity over the past several decades due to its simplicity and wide compatibility across systems. Other more recent file formats for 3D printing, such as the additive manufacturing file format (AMF) and 3D manufacturing format (3MF) [35], incorporate functionality beyond the storage of a single triangular mesh, such as color and texture, more naturally defined curves, and more. These formats have found support from government and industry and are growing in adoption. Because this modeling method operates on the same data structure used to produce the part, it gains both simplicity and accuracy.
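To make the mesh representation concrete, the sketch below writes the ASCII flavor of the STL format mentioned above: a solid is simply a list of triangular facets, each with a normal and three vertices. This is an illustrative, self-contained writer, not tied to any particular CAD library; facet normals are derived here from the vertex winding order.

```python
def write_ascii_stl(name, triangles):
    """Serialize triangles (each a tuple of three (x, y, z) vertices)
    into the ASCII STL format. Facet normals are computed from the
    vertex winding order via a cross product."""
    def normal(a, b, c):
        ux, uy, uz = (b[i] - a[i] for i in range(3))
        vx, vy, vz = (c[i] - a[i] for i in range(3))
        n = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
        length = max((n[0] ** 2 + n[1] ** 2 + n[2] ** 2) ** 0.5, 1e-12)
        return tuple(x / length for x in n)

    lines = [f"solid {name}"]
    for a, b, c in triangles:
        nx, ny, nz = normal(a, b, c)
        lines.append(f"  facet normal {nx:e} {ny:e} {nz:e}")
        lines.append("    outer loop")
        for v in (a, b, c):
            lines.append(f"      vertex {v[0]:e} {v[1]:e} {v[2]:e}")
        lines.append("    endloop")
        lines.append("  endfacet")
    lines.append(f"endsolid {name}")
    return "\n".join(lines)

# One facet lying in the x-y plane; its normal points along +z.
stl_text = write_ascii_stl("demo", [((0, 0, 0), (1, 0, 0), (0, 1, 0))])
```

The same facet-by-facet structure is what both slicers and the feature-extraction procedure in Sec. 2 consume.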

Other work has used triangular mesh representations of geometry in seeking to improve print accuracy often by selecting ideal orientations for printing or by focusing on geometric differences for specific error-prone features. Chowdhury et al. [36] proposed an approach for selecting the optimal build orientation of a part using a model with orientation-based variables relevant to a part's final geometric accuracy. These variables were derived from the part's STL file. Their method combined this model with compensation produced by a neural network trained on finite element analysis data to reduce the overall error of the part [36,37]. Moroni et al. [38] demonstrated a means of identifying cylindrical voids in a part's shape using a triangular mesh. This approach then predicted the dimensional error of the cylindrical voids based on their geometric properties. Moroni et al. [39] also extended this method to selecting optimal print orientation.

The method proposed in this article seeks to predict and compensate for geometric deviations across the entire surface of a given part, making it a useful tool for increasing the shape accuracy of an AM machine. It begins by performing feature extraction from triangular mesh representations of manufactured parts. These features are used alongside deviation measurement data for the respective parts to train a random forest machine learning model. This model can then be used to predict errors for future prints. Finally, a compensated 3D model based on these predictions can be generated and printed, resulting in a part with reduced geometric deviations. This process is illustrated in Fig. 2. One major contribution of this approach is that it quickly facilitates modeling of freeform surfaces that would likely be exceedingly difficult to model using parametric function-based approaches.

An experiment to validate this approach using a number of benchmarking objects produced on a fused deposition modeling (FDM) 3D printer will be presented. The experiment used a dataset of four objects and their corresponding geometric deviations to train a machine learning model. This model was then used to make predictions for a new shape that was treated as a testing dataset. The predicted deviations for this shape compared favorably to the actual deviations of the shape when printed, demonstrating the potential of this approach for applications in error prediction. Finally, these predictions were utilized to generate a compensated CAD file of the shape, which was printed and evaluated. This compensated part was found to have average deviations that were 44% smaller than those of the uncompensated original print.

## 2 Feature Extraction for Triangular Mesh-Based Shape Deviation Representation

Modeling complex surfaces that cannot be easily described analytically presents a challenge to many existing modeling methodologies. One way to address this challenge is the use of a finite number of predictor variables that capture certain geometric properties of a surface that are deemed relevant based on prior engineering-informed knowledge. These predictor variables can be computed for an evenly distributed set of points across the surface of an object. These points then function as instances in the model for which predictions can be made and to which position modifications can be applied for the purpose of compensation. Here, a set of eight predictor variables **x**, corresponding to each relevant property under consideration, is constructed using feature extraction from a triangular mesh describing the shape to be printed. For simplicity, the vertices that make up the shape's triangular mesh can be considered the instances in the model. To produce an unbiased model, it is necessary that the triangular mesh be uniformly dense across the surface of the shape and have triangles of consistent size. This can be achieved by remeshing an object's STL file using one of several algorithms [40]. This article considers three broad areas of phenomena that have been shown to affect print accuracy: position within the print bed, orientation and curvature of a surface, and thermal expansion effects.

### 2.1 Position-Related Predictors.

The first area of significance for feature extraction is the physical position of a vertex in a print bed. Several studies have demonstrated that position within a printer's print bed is significantly correlated with the resulting accuracy of printed parts [17,19]. In the context of FDM, this location dependency can be connected to extruder positioning, while for other processes like digital light processing, it can be connected to optical variation [41]. For the *n*th vertex, the first three predictor variables (*x*_{n,1}, *x*_{n,2}, and *x*_{n,3}) used in this model correspond to the *x*, *y*, and *z* coordinates of that vertex. These predictors seek to capture errors related to the actual position of the printed object within the print bed. For the validation experiment that will follow, objects were positioned in the slicing software so that the position values from the STL file were exactly each vertex's position within the 3D printer's print bed. One implication of this is that the same object printed in different orientations or locations will have different predictor sets.

### 2.2 Surface Orientation and Curvature Predictors.

The orientation and curvature of a surface relative to the *x*–*y* plane can influence how the material is deposited. The next four predictor variables are derived from the set of normal vectors corresponding to the *V*_{n} triangular faces adjacent to a given vertex *n*. Each normal vector $S_i = (1, \varphi_i, \vartheta_i)$, *i* = 1, 2, …, *V*_{n}, is expressed in spherical notation with radius 1, an elevation angle $\varphi_i$, and an azimuth angle $\vartheta_i$. The predictor variables are calculated as follows and illustrated in Fig. 3, which depicts how these predictor variables would be calculated for a single vertex *n* (or instance) on the triangular mesh, shown as a black dot on the mesh and in the expanded view to the left.

The first two of these variables are the median and the range of the azimuth angles $\vartheta_i$, which describe the direction the surrounding surface faces within the *x*–*y* plane and the degree to which that direction varies across the set of elevation angles $\varphi_i$. The third of these variables is the median value of the elevation angles in the set. This can be interpreted as the slope of the geometric features. This is of particular interest due to the correlation between slope and common print errors. This variable is also useful for detecting overhangs, which can be difficult to print accurately. Finally, the fourth of these variables is the range in elevation angles (i.e., max $\varphi_i$ − min $\varphi_i$). This can be interpreted as the degree to which the slope changes over the surface described by the triangular faces. This has relevance to shape-dependent errors.

### 2.3 Material Expansion/Shrinkage Predictor.

The eighth and final predictor variable is the distance between each vertex and an axis in the *z*-direction placed at the center of each shape (which in the case of the validation experiment intersects the point (0, 0, 0) on the printer's build platform):

$$x_{n,8} = \sqrt{x_{n,1}^2 + x_{n,2}^2}$$

This distance between the *z*-axis placed at the center of the shape and a single vertex (or instance) *n* on the triangular mesh is shown in Fig. 4. The feature is of significance due to the thermal expansion effects of the printed materials [43]. If an object is formed at a high temperature, then as it cools, the printed material's coefficient of linear thermal expansion dictates the degree to which its overall size is reduced. Such temperature changes can lead to warping, residual stresses, and dimensional inaccuracies [44,45]. This is further complicated by the fact that heat can be concentrated at different locations over short periods of time. Objects of larger size expand and contract by a greater absolute distance due to scaling. Points on the surface that are at a greater distance from what can be considered the center of the object will therefore experience a greater degree of displacement, necessitating a proxy for a point's distance from the rough center of expansion.
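The full eight-variable predictor set for one vertex can be sketched as below. This is an illustrative implementation under stated assumptions: the azimuth/elevation conventions and the exact statistics (median and range) for the azimuth-based predictors follow the descriptions above but are not taken verbatim from the study's code.

```python
import math
from statistics import median

def vertex_predictors(position, adjacent_normals):
    """Build the eight predictor variables for one mesh vertex.

    position         : (x, y, z) of the vertex in print-bed coordinates.
    adjacent_normals : unit normals (nx, ny, nz) of the faces touching the
                       vertex, from which elevation and azimuth angles are
                       derived (angle conventions assumed here).
    """
    x, y, z = position
    elevations, azimuths = [], []
    for nx, ny, nz in adjacent_normals:
        elevations.append(math.asin(max(-1.0, min(1.0, nz))))  # angle above x-y plane
        azimuths.append(math.atan2(ny, nx))                    # direction in x-y plane
    return (
        x, y, z,                            # x1-x3: position in the print bed
        median(azimuths),                   # x4: median azimuth angle
        max(azimuths) - min(azimuths),      # x5: azimuth range
        median(elevations),                 # x6: median elevation (local slope)
        max(elevations) - min(elevations),  # x7: elevation range (slope change)
        math.hypot(x, y),                   # x8: distance to the central z-axis
    )

# Vertex on a 45 deg slope: two adjacent faces tilted toward +x.
preds = vertex_predictors((3.0, 4.0, 1.0),
                          [(0.7071, 0.0, 0.7071), (0.7071, 0.0, 0.7071)])
```

Computing this tuple for every vertex of a remeshed STL file yields the design matrix used for training.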

Given an STL file, each of these predictor variables can be quickly calculated for every vertex. Together, they capture the relevant geometric factors that can influence the accuracy of a 3D print. The relative efficacy of each of these predictor variables will be briefly evaluated in Sec. 5.4.

## 3 Shape Deviation Measurement and Calculation

A procedure for measuring deviations across the surface of a printed object is presented here. It is important that deviation values be calculated at each vertex on an object's triangular mesh. This then allows for deviations to be used as the response variable corresponding to each set of predictor variables.

This procedure begins by producing a dense point cloud of measurements of the surface of a 3D printed object. In the validation experiment described later, each object was scanned using a ROMER Absolute Arm with an attached laser scanner manufactured by Hexagon Manufacturing Intelligence. According to the manufacturer, this scanner has an accuracy of 80 *μ*m. The objects were each scanned with several passes from different angles so as to create scans with between 500,000 and 1.6 million points. In comparison, each design STL file has approximately 50,000 data points.

Registration is performed according to the methodology presented in Ref. [46]. Each point cloud is first aligned against its corresponding STL file manually. Kinematic constraints are applied in this process. For example, the scan points on the table (ground points) are used to fix the height and the orientation about the *x*- and *y*-axes of the scan point cloud. Ground points are produced when the laser scanner detects the surface the scanned object is resting on and are illustrated in Fig. 5. Alignment is then refined using a modified version of the iterative closest point (ICP) algorithm. In this version, translation is only allowed along the *x*- and *y*-axes, while rotation is only allowed about the *z*-axis, so as to preserve the initial alignment according to the table points.
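One iteration of this constrained refinement can be sketched as follows. This is a simplified, brute-force illustration (not the registration code used in the study): scan points are matched to their nearest reference points, and the best rigid motion restricted to rotation about *z* plus translation in *x*–*y* is then found in closed form via a 2D Procrustes fit on the *x*–*y* coordinates.

```python
import math

def constrained_icp_step(scan, reference):
    """One iteration of a z-rotation / x-y-translation constrained ICP step.
    Returns (theta, tx, ty) and the transformed scan points."""
    # 1. Nearest-neighbor correspondences (brute force for clarity).
    def nearest(p):
        return min(reference, key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))
    pairs = [(p, nearest(p)) for p in scan]

    # 2. Closed-form 2D rigid fit on x-y; z is left untouched.
    cx_s = sum(p[0] for p, _ in pairs) / len(pairs)
    cy_s = sum(p[1] for p, _ in pairs) / len(pairs)
    cx_r = sum(q[0] for _, q in pairs) / len(pairs)
    cy_r = sum(q[1] for _, q in pairs) / len(pairs)
    num = den = 0.0
    for (px, py, _), (qx, qy, _) in pairs:
        ax, ay = px - cx_s, py - cy_s
        bx, by = qx - cx_r, qy - cy_r
        num += ax * by - ay * bx   # cross term -> sin(theta)
        den += ax * bx + ay * by   # dot term   -> cos(theta)
    theta = math.atan2(num, den)

    c, s = math.cos(theta), math.sin(theta)
    tx = cx_r - (c * cx_s - s * cy_s)
    ty = cy_r - (s * cx_s + c * cy_s)
    moved = [(c * x - s * y + tx, s * x + c * y + ty, z) for x, y, z in scan]
    return (theta, tx, ty), moved

# Scan = reference shifted by (0.5, -0.5) in the bed plane; one step recovers it.
ref = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.5), (0.0, 3.0, 1.0)]
scan = [(x + 0.5, y - 0.5, z) for x, y, z in ref]
(theta, tx, ty), moved = constrained_icp_step(scan, ref)
```

In practice the step is repeated until the alignment converges, with the *z* height already fixed by the ground points.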

Once registration is completed, it is necessary to calculate the distances between each vertex in the designed triangular mesh and the 3D scan point cloud. To reduce noise from outlier points in the scan point cloud, a mesh of the scan point cloud was generated using screened Poisson surface reconstruction (SPSR) [47]. The shortest distance between each vertex on the triangular mesh *v*_{n} and the surface of the scanned triangular mesh was calculated. Because shortest distance deviation is used, this SPSR reduces error minimization bias caused by always selecting the noisy points in the cloud that are closest to the designed shape. Instead, distance to the “averaged” or smoothed surface is used. The shortest distance between *v*_{n} and the scanned mesh is returned in the form of a vector *d*_{n}. The magnitude of deviation in the direction normal to the triangular mesh at each vertex is then calculated as *y*_{n} = *d*_{n} · *N*_{n}, where *N*_{n} is the vector (*x*_{n,4}, *x*_{n,6}, 1) expressed in Cartesian coordinates. Signs correspond with whether the deviation represents a dimension that is too large or too small. This results in a set of response values representing deviation values that are normal to the surface of the designed triangular mesh. These values are used as the set of response variables *y*_{1} through *y*_{N}.
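The projection step above reduces to a dot product once the normal is converted from the spherical form used for the predictors to Cartesian form. The sketch below assumes a standard elevation/azimuth convention; the exact convention in the study may differ.

```python
import math

def signed_deviation(d, elevation, azimuth):
    """Project the shortest-distance vector d (from a design vertex to the
    reconstructed scan surface) onto the outward surface normal.

    The normal is given in the spherical form (azimuth, elevation, radius 1)
    and converted to Cartesian before taking the dot product. A positive
    result means the print is too large at this vertex; negative, too small."""
    n = (math.cos(elevation) * math.cos(azimuth),
         math.cos(elevation) * math.sin(azimuth),
         math.sin(elevation))
    return sum(di * ni for di, ni in zip(d, n))

# Deviation of 0.2 mm straight up, measured against an upward-facing normal.
y = signed_deviation((0.0, 0.0, 0.2), elevation=math.pi / 2, azimuth=0.0)
```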

For a training dataset containing multiple printed parts, the data {(**x**_{n}, *y*_{n}), *n* = 1, 2, …, *N*} is the ensemble of the total *N* vertices from all of the shapes. Note that each vertex may have a different number of adjacent triangular faces. For the validation experiment conducted in Sec. 5, for example, there are four triangular mesh files that correspond to four different shapes that are all included in the training dataset.

## 4 Random Forest Model to Predict Shape Deviation With Extracted Features

To learn and predict shape deviations, it is necessary to develop a predictive model based on the training data. Because triangular mesh files often contain tens of thousands of vertices, the size of the datasets generated by this method can be cumbersome, posing a computational challenge for machine learning methods. Conversely, because of the small number of example shapes that might be available for model training, the approach must also be flexible and generalize well under covariate shift. One computationally efficient modeling approach that can be utilized in this situation is the random forest method. One way to quantify the computational efficiency of a machine learning algorithm is time complexity, which reflects the number of computations that must be performed to generate a model and thus the time required. The random forest algorithm has a worst-case time complexity on the order of $O(MK\tilde{N}^2 \log \tilde{N})$, where *M* is the number of trees in the random forest, *K* is the number of variables drawn at each node, and *Ñ* is the number of data points *N* multiplied by 0.632, since bootstrap samples draw 63.2% of the data points on average [48]. As a point of comparison, an algorithm such as Gaussian process regression has a worst-case time complexity on the order of *O*(*N*^{3}) [49]. For the training sets utilized in the proof-of-concept experiments that will follow, this is roughly three orders of magnitude more complex.
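The "three orders of magnitude" claim can be checked with back-of-envelope arithmetic. The dataset size below is an assumption standing in for the experiments (roughly 50,000 vertices per shape and four training shapes, per the figures quoted elsewhere in the text); *M* and *K* follow Sec. 5.2 and the *P*/3 rule with *P* = 8 predictors.

```python
import math

N = 200_000          # total training vertices (assumed: 4 shapes x ~50,000)
M = 30               # trees in the forest (Sec. 5.2)
K = 3                # variables drawn per node, ~ P/3 with P = 8 predictors
N_tilde = 0.632 * N  # expected unique points in a bootstrap sample

rf_cost = M * K * N_tilde**2 * math.log(N_tilde)  # O(M K N~^2 log N~)
gp_cost = N**3                                    # O(N^3) for GP regression
ratio = gp_cost / rf_cost                         # roughly several hundred
```

Under these assumptions the ratio lands in the hundreds, i.e., close to three orders of magnitude, consistent with the comparison above.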

### 4.1 Random Forest Method.

Researchers have successfully applied machine learning to make accurate predictions in a wide range of applications related to manufacturing. One particularly popular algorithm for applications is random forest modeling, which has been applied to predicting surface roughness of parts produced with AM, fault diagnosis of bearings, and tool wear, to name just a few use cases [50–52]. The random forest algorithm is a means of supervised ensemble learning originally conceived by Breiman [53]. It utilizes regression or classification trees, which are a method of machine learning that recursively segments a given dataset into increasingly small groups based on predictor variables, allowing it to produce a response value given a new set of predictors [54]. The resulting structure of this segmentation process resembles the roots of a tree and is shown in Fig. 6. The random forest algorithm constructs an ensemble, or forest, of these trees, each trained on a subset of the overall dataset [40]. This process is explained in further detail later and is illustrated in Fig. 7.

The goal of a regression tree is to generate a set of rules that efficiently segment the given training set using predictor variables in a way that generates accurate predictions of a response variable. This process begins with a single node and randomly chooses a set of predictor variables to be used in dividing the dataset. Given *P* total predictor variables, it is generally recommended that the number of predictor variables sampled for each node be set to *P*/3 in the case of regression and $\sqrt{P}$ in the case of classification [55]. By using this subset of predictor variables, the algorithm seeks to split the data at the node into two regions $R_1$ and $R_2$ in a manner that minimizes the sum of the squares of error for each response label *y*_{i}:

$$\min_{j,s} \left[ \sum_{i:\, \mathbf{x}_i \in R_1(j,s)} (y_i - \bar{y}_{R_1})^2 + \sum_{i:\, \mathbf{x}_i \in R_2(j,s)} (y_i - \bar{y}_{R_2})^2 \right]$$

where *j* indexes the splitting variable, *s* is the split point, and $\bar{y}_{R_1}$ and $\bar{y}_{R_2}$ are the mean responses within each resulting region.

This process is then repeated for each resulting node until a predetermined stopping condition is met; two common conditions are a minimum number of data observations at a node and a maximum tree depth. Once the stopping condition is met, each of the terminal nodes is labeled with the average value of the responses for the observations contained by that node. New predictions are generated by using a set of predictor variable values to navigate down the tree until arriving at a terminal node, whose label is the predicted response value.
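The split criterion can be illustrated for a single predictor variable. This minimal sketch scans candidate thresholds (midpoints between consecutive sorted values) and keeps the one minimizing the summed squared error of the two child groups:

```python
def best_split(xs, ys):
    """Find the threshold on a single predictor that minimizes the sum of
    squared errors of the two resulting groups, as in the regression tree
    growing step. Returns (threshold, sse)."""
    def sse(values):
        if not values:
            return 0.0
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    # Candidate thresholds: midpoints between consecutive predictor values.
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (thr, total)
    return best

# Two clearly separated response regimes; the split lands between them.
xs = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3]
ys = [0.0, 0.1, 0.0, 0.1, 1.0, 1.1, 1.0]
thr, err = best_split(xs, ys)
```

A full tree repeats this search over a random subset of the predictors at every node, as described above.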

The random forest algorithm begins by generating subsets or “bootstrap samples” from the overall dataset. These bootstrap samples are drawn randomly from the overall dataset with replacement, allowing for some data to be shared between samples [53]. A regression tree is then trained for each bootstrap sample.

To make predictions using a generated forest, the predictor variables are used to generate individual predictions from each tree. The average of this set of predictions is then given as the overall output of the ensemble.
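The bootstrap-and-average logic of the two paragraphs above can be condensed into a toy sketch. To keep it self-contained, each "tree" is deliberately reduced to a one-split stump; a real forest would grow full trees with random predictor subsets at each node.

```python
import random
from statistics import mean

def train_forest(data, n_trees, seed=0):
    """Sketch of the random forest ensemble step: draw bootstrap samples
    with replacement, fit one (deliberately tiny) regression stump per
    sample, and predict with the average of the stump outputs."""
    rng = random.Random(seed)

    def fit_stump(sample):
        # One-split tree: threshold at the median x, each leaf labeled
        # with the mean response of its observations.
        xs = sorted(x for x, _ in sample)
        thr = xs[len(xs) // 2]
        left = [y for x, y in sample if x <= thr] or [0.0]
        right = [y for x, y in sample if x > thr] or [0.0]
        return thr, mean(left), mean(right)

    stumps = []
    for _ in range(n_trees):
        # Bootstrap: same size as the data, drawn with replacement.
        sample = [rng.choice(data) for _ in data]
        stumps.append(fit_stump(sample))

    def predict(x):
        return mean(lo if x <= thr else hi for thr, lo, hi in stumps)
    return predict

# Step-shaped data: response jumps from ~0 to ~1 at x = 0.5.
data = [(x / 10, 0.0 if x < 5 else 1.0) for x in range(10)]
predict = train_forest(data, n_trees=25)
```

Averaging over many bootstrapped trees is what smooths out the variance of any individual tree's prediction.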

One benefit of the random forest algorithm for this application is that the addition of irrelevant data (predictor sets highly dissimilar to those for the predicted part) does not strongly affect predictions based on the most relevant data. In this way, the individual trees can naturally accommodate diverse datasets in training without substantial degradation in prediction quality.

### 4.2 Feature Selection.

To evaluate the relative significance of each predictor variable, an out-of-bag permutation approach can be used. For a given predictor variable, e.g., *x*_{:,1}, each of its values in the dataset is permuted so as to randomize the values of that predictor variable's input. A new set of predictions is generated using this data, and the mean squared error (MSE) of these predictions is calculated. The change in prediction error is defined as the difference between the changed and original MSE values:

$$\Delta \mathrm{Error}_{x_1} = \mathrm{MSE}_{\mathrm{permuted}} - \mathrm{MSE}_{\mathrm{original}}$$

A large value of $\Delta \mathrm{Error}_{x_1}$ indicates that this is a significant predictor variable, since randomizing its input causes the predictions of the regression tree to become much worse.

An overall significance measure is then obtained by averaging these changes in prediction error across all of the trees in the ensemble (each evaluated on its out-of-bag observations **x**_{n}) for each predictor variable:

$$\mathrm{Significance}(x_{:,i}) = \frac{1}{M} \sum_{m=1}^{M} \Delta \mathrm{Error}_{x_i}^{(m)}$$
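The permutation step can be demonstrated on a toy model whose response truly depends on only one predictor. This sketch permutes a single column and reports the resulting change in MSE; the model here is a stand-in, not a trained forest.

```python
import random
from statistics import mean

def permutation_importance(predict, X, y, col, seed=0):
    """Estimate a predictor's significance by permuting one input column
    and measuring the resulting increase in mean squared error."""
    def mse(rows):
        return mean((predict(r) - t) ** 2 for r, t in zip(rows, y))

    rng = random.Random(seed)
    shuffled = [row[col] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:col] + (v,) + row[col + 1:] for row, v in zip(X, shuffled)]
    return mse(X_perm) - mse(X)  # Delta Error for this column

# Toy model that truly depends only on the first predictor.
X = [(float(i), float(i % 3)) for i in range(30)]
y = [2.0 * a for a, _ in X]
model = lambda row: 2.0 * row[0]

important = permutation_importance(model, X, y, col=0)
irrelevant = permutation_importance(model, X, y, col=1)
```

Permuting the influential column degrades the predictions, while permuting the ignored column changes nothing, which is exactly the signal the significance measure exploits.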

### 4.3 Measuring Covariate Shift to Determine Feasibility of Prediction.

Let the distributions of the predictor variables for the training and testing datasets be denoted *P* and *Q*, respectively. Then *P* = *Q* is the ideal case, and the predictions can be made with confidence. In practice, however, the test distribution *Q* will differ arbitrarily from the training distribution *P*. Such a change is known as covariate shift [57,58]. This is due to the fact that we wish to predict errors for shapes that are different from the shapes that have already been printed. Sugiyama et al. [59,60] note that the Kullback–Leibler divergence between the two distributions for the datasets can be interpreted as an estimator of the level of covariate shift between them. An approach based on this is utilized here. Jensen–Shannon divergence is utilized instead in order to gain symmetry between distance measurements, and independent distributions for each predictor variable are calculated for the sake of computational cost. To estimate the distributions *P* and *Q* for each feature, kernel density estimation [61,62] is applied to get the density estimates of features *i* = 1, …, 8 as follows:

$$\hat{P}_i(x) = \frac{1}{N_P} \sum_{j=1}^{N_P} K(x - x_{j,i})$$

using the values *x*_{j,i} in the dataset for the first shape, and

$$\hat{Q}_i(x) = \frac{1}{N_Q} \sum_{j=1}^{N_Q} K(x - x_{j,i})$$

using the values *x*_{j,i} in the dataset of the second shape, where *K*(·) is the normal kernel, which is the same for both distributions:

$$K(u) = \frac{1}{h\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2h^2}\right)$$

The divergence between the estimated distributions for each feature *i* can be quantified using the Jensen–Shannon divergence [63]:

$$JSD_i(P \,\|\, Q) = \tfrac{1}{2} KL_i(P \,\|\, M) + \tfrac{1}{2} KL_i(Q \,\|\, M), \qquad M = \tfrac{1}{2}(P + Q)$$

where *KL*_{i}(*P* ‖ *Q*) is the Kullback–Leibler divergence [64] defined as follows:

$$KL_i(P \,\|\, Q) = \int \hat{P}_i(x) \log \frac{\hat{P}_i(x)}{\hat{Q}_i(x)} \, dx$$

A final divergence metric between two shapes can be given as the sum of the Jensen–Shannon divergences for each predictor variable:

$$D(P, Q) = \sum_{i=1}^{8} JSD_i(P \,\|\, Q)$$
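The per-feature computation above can be sketched numerically: estimate each density with a Gaussian kernel on a grid, then evaluate the Jensen–Shannon divergence with discretized integrals. Bandwidth and grid here are illustrative choices, not those of the study.

```python
import math

def kde(samples, grid, h=0.5):
    """Gaussian kernel density estimate of one feature, on a fixed grid."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-((g - s) / h) ** 2 / 2) for s in samples)
            for g in grid]

def kl(p, q, dx):
    """Discretized Kullback-Leibler divergence between two densities."""
    eps = 1e-12  # guards against log(0) where a density vanishes
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q)) * dx

def js_divergence(a, b, grid, dx):
    """Jensen-Shannon divergence between KDE estimates of one feature for
    two shapes (symmetric in the two arguments, unlike plain KL)."""
    p, q = kde(a, grid), kde(b, grid)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m, dx) + 0.5 * kl(q, m, dx)

# Feature values for two shapes: identical vs. well-separated distributions.
grid = [i * 0.1 for i in range(-100, 200)]
same = js_divergence([0.0, 0.5, 1.0], [0.0, 0.5, 1.0], grid, dx=0.1)
shifted = js_divergence([0.0, 0.5, 1.0], [8.0, 8.5, 9.0], grid, dx=0.1)
```

Identical feature distributions give a divergence of zero, while disjoint ones approach the Jensen–Shannon maximum of ln 2; summing this quantity over the eight features gives the final shape-to-shape metric.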

### 4.4 Prescriptive Compensation of Shape Deviation.

Once predictions are made for a part that is to be printed, it becomes necessary to leverage these predictions to improve the part's eventual quality. This method for compensating for positioning error was utilized in Refs. [7,27,28]. The general idea is that if a portion of the object is predicted to be too large or too small by a certain amount, the shape of the object can be altered in the opposite direction by a corresponding amount before the object is printed, resulting in a part with less error. For each vertex on the triangular mesh *v*_{n}, a new compensated vertex is generated by translating the vertex a distance of $-\hat{y}_n$ along a vector normal to the surface at that point. This vector can be calculated in spherical coordinates as (*x*_{n,4}, *x*_{n,6}, 1). This process is illustrated in Fig. 8. It should be noted that this is not an optimal approach like that presented by Huang et al. [7,18], but is instead a heuristic. One implication of this is that in many situations, the optimal compensation for a part differs from the negative value of the observed deviation.

The reason a nonoptimal approach is taken here is due to the nature of random forest modeling: small changes in the predictor set do not yield large, if any, changes in the response from the prediction function. This is because, under the regression tree algorithm, all values within a certain region of the *n*-dimensional predictor space return the same value. In the validation experiment that will follow, for instance, only 29% of the points on the compensated STL file showed different values of predicted error after compensation (assuming the compensated STL file is then considered the ideal shape). Of those that did, the average change in predicted error was 0.0013 mm, which is well below the resolution of the 3D printer used in this study. Once each vertex is modified, a new compensated STL file is generated for printing.
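The per-vertex compensation step can be sketched as below: each vertex is pulled back along its outward normal by the predicted deviation. The spherical-to-Cartesian conversion assumes a standard elevation/azimuth convention, as in the earlier sketches.

```python
import math

def compensate_vertex(position, predicted_dev, elevation, azimuth):
    """Translate a mesh vertex a distance of -y_hat along the outward
    surface normal (given in the spherical form used by the predictors),
    so a region predicted to print too large is shrunk, and vice versa."""
    n = (math.cos(elevation) * math.cos(azimuth),
         math.cos(elevation) * math.sin(azimuth),
         math.sin(elevation))
    return tuple(p - predicted_dev * ni for p, ni in zip(position, n))

# A vertex on a vertical wall facing +x, predicted to print 0.2 mm too
# large: it is pulled 0.2 mm inward (toward -x) before regenerating the STL.
new_v = compensate_vertex((10.0, 5.0, 3.0), 0.2, elevation=0.0, azimuth=0.0)
```

Applying this to every vertex and re-serializing the mesh yields the compensated STL file sent to the printer.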

## 5 Validation Experiment

### 5.1 Test Object Design, Printing, and Measurement.

To test the efficacy of the proposed method, a case study utilizing an FDM 3D printer was constructed. The goal of the experiment was to determine whether the geometric accuracy of a previously unseen shape could be improved using accuracy data from several other related shapes using the proposed method. In addition, the predictive accuracy of the model for the unseen shape was also evaluated. The experiment is designed to mirror a situation that might be encountered in an industrial setting—when a manufacturer is about to print a new part, but only has accuracy data for a small number of somewhat related shapes. The previously discussed methodologies for finding the most relevant accuracy data one possesses, leveraging that data to generate predictions, and using those predictions to improve accuracy through compensation are all evaluated.

A dataset of 3D printed shapes was generated on an FDM printer, with four shapes used for model training and one always withheld for model testing. These included a half-ovoid, a half-teardrop, a triangular pyramid, a half-snail shell, and a knob shape [65]. These objects were chosen to represent different geometries, including varying curved and flat faces, various topologies, and edges of differing angles. The edge length of each triangle in the mesh of each object was set to approximately half a millimeter during remeshing. It is important to note that because this process takes the original parts to a much higher mesh density, accuracy is preserved: small triangles can express a freeform shape with greater accuracy than large triangles. Accuracy for freeform parts would likely not be preserved in the opposite direction. Following this remeshing process, the benchmarking objects were printed on a MakerBot Replicator FDM 3D printer using MakerBot brand polylactic acid (PLA) filament. Each object was printed with full infill. Care was taken to ensure that the point defined as the origin in the triangular mesh file for each object was printed at the exact center of the print bed. This ensured that the positions of each vertex in the triangular mesh directly corresponded to the positions of the printed objects within the printer's build envelope. The printed test objects are shown in Fig. 9.

The deviation values for each of the 3D printed shapes were calculated according to the procedure in Sec. 3. These deviation values are shown in Fig. 10, which is a heatmap of deviation values across the surface of each shape. The color at each point indicates the extent of the deviations across the surface. Red points correspond to parts of the shape that are too large, while blue points correspond to parts of the shape that are too small.

To better understand the distribution of deviation magnitudes, Fig. 11 shows a histogram of the frequencies with which various magnitudes of deviation occur. This histogram is specifically for the half-ovoid shape, which is withheld as the testing dataset for one iteration of the experiment. No values from the bottom surface of this shape are included, as those deviations are assumed to be zero under the assumptions used during registration. It can be seen that most deviations are within 0.3 mm of the desired dimension.

### 5.2 Model Training Results.

To better understand the efficacy of the method for prediction, two different models were trained. For the first model, the half-teardrop, triangular pyramid, half-snail shell, and knob shape were used as the training data, while the half-ovoid was used as the testing dataset. For the second model, the half-ovoid, triangular pyramid, half-snail shell, and knob shape were used as the training data, while the half-teardrop was used as the testing dataset. For each model, an ensemble of regression trees was trained using the random forest method and MATLAB's Statistics and Machine Learning Toolbox. The minimum number of observations at each node was set to 200, and the number of trees in the ensemble was set to 30. This ensemble size was chosen because experiments indicated that further increases in ensemble size for this dataset yield diminishing gains in out-of-bag error, as shown in Fig. 12.
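
The paper uses MATLAB's toolbox; as a hedged, runnable analogue, the same hyperparameters (30 trees, 200-observation leaf minimum, out-of-bag predictions retained) can be reproduced with scikit-learn. The feature matrix below is a synthetic stand-in, not the mesh-derived predictors of Sec. 4:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in for the mesh-derived training matrix: each row is one
# mesh vertex, each column one predictor variable; y is the deviation in mm.
rng = np.random.default_rng(0)
X = rng.uniform(-30.0, 30.0, size=(5000, 6))
y = 0.002 * X[:, 2] + 0.01 * np.sign(X[:, 3]) + rng.normal(0.0, 0.02, 5000)

# Mirror the reported settings: 30 trees, a 200-observation minimum per
# node, and out-of-bag predictions for the "same shape" error estimate.
model = RandomForestRegressor(n_estimators=30, min_samples_leaf=200,
                              oob_score=True, random_state=0)
model.fit(X, y)

oob_mae = np.mean(np.abs(model.oob_prediction_ - y))
print(f"out-of-bag MAE: {oob_mae:.4f} mm")
```

The out-of-bag MAE computed this way corresponds to the "same shape error" column reported later in Table 2.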

Thanks to the simplicity of the random forest algorithm, the total training time was less than 30 s, and predictions can be generated at a rate of roughly 110,000 predictions per second. The relative significance of each predictor variable was calculated for the trained models according to the procedure described in Sec. 4.2; these values are shown in Fig. 13. The results suggest that each of the predictor variables contributes to the overall accuracy of the model, though to differing degrees.
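
MATLAB estimates predictor significance by permuting each variable's out-of-bag values and measuring the loss in accuracy; scikit-learn's `permutation_importance` is the closest analogue. The sketch below uses synthetic stand-in data in which only the first two predictors carry signal:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic stand-in data: columns 0 and 1 carry signal, columns 2 and 3
# are pure noise, so a sound importance measure should rank 0 and 1 first.
rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 4))
y = 0.05 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(0.0, 0.005, 3000)

model = RandomForestRegressor(n_estimators=30, random_state=0).fit(X, y)

# Shuffle one predictor at a time and record the drop in model score,
# analogous to MATLAB's out-of-bag permuted-predictor importance.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
```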

Table 1 compares the covariate shift metrics between each shape in the dataset. The final values are divided by the maximum covariate shift in the table to produce a normalized set. It can be seen that the half-ovoid dataset withheld for testing in the first model is most similar to the half-teardrop and triangular pyramid shapes. Conversely, the knob shape shows a greater magnitude of covariate shift from most of its peers, indicating that predictions made for this shape would likely be of poorer quality. If one wished to generate predictions for the knob, it would be advisable to train the model on data more representative of its unique shape.

| | Half-teardrop | Half-ovoid | Triangular pyramid | Knob | Half-snail shell |
|---|---|---|---|---|---|
| Half-teardrop | 0 | 0.20 | 0.35 | 0.78 | 0.77 |
| Half-ovoid | 0.20 | 0 | 0.25 | 0.91 | 1 |
| Triangular pyramid | 0.35 | 0.25 | 0 | 0.64 | 0.74 |
| Knob | 0.78 | 0.91 | 0.64 | 0 | 0.27 |
| Half-snail shell | 0.77 | 1 | 0.74 | 0.27 | 0 |
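
The exact covariate shift metric is defined in Sec. 4. Purely as an illustration of how a table like Table 1 is assembled, the sketch below uses a hypothetical proxy (distance between per-feature means plus distance between per-feature spreads) and then normalizes by the table's largest entry, as the paper does:

```python
import numpy as np

def shift(A, B):
    """Hypothetical covariate-shift proxy between two predictor matrices:
    distance between per-feature means plus distance between per-feature
    standard deviations. The paper's actual metric is defined in Sec. 4."""
    return (np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))
            + np.linalg.norm(A.std(axis=0) - B.std(axis=0)))

def normalized_shift_table(feature_sets):
    """Pairwise shift table scaled by its largest entry, as in Table 1."""
    n = len(feature_sets)
    table = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            table[i, j] = shift(feature_sets[i], feature_sets[j])
    return table / table.max()
```

Any symmetric divergence between the predictor distributions of two shapes can be slotted into `shift` without changing the rest of the procedure.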


### 5.3 Model Prediction Results.

By using the testing shapes that were withheld from the training sets, a new set of predictions was generated for each random forest model. The mean absolute error (MAE) of predictions for out-of-bag data in the training dataset, as well as the MAE of predictions for the withheld shapes, are provided in Table 2. The first error quantifies the accuracy of the model when making new predictions for the overall shapes (but not individual datapoints) that it has already seen in training. The second error quantifies the accuracy of predictions made for a new shape that the model has not seen during training. The predictions for deviation across the surface of the half-ovoid are graphed alongside the actual deviation values for the shape, allowing for comparison. This is illustrated in Fig. 14 with the same coloring scheme as shown in Fig. 10.

| | MAE for out-of-bag data in training dataset (same-shape error) (mm) | MAE for testing dataset (new-shape error) (mm) |
|---|---|---|
| First model | 0.0564 | 0.0457 |
| Second model | 0.0513 | 0.0708 |


Plots of predicted deviation values versus actual deviation values for the out-of-bag data used in model training, as well as for the testing shape, are given in Figs. 15 and 16. For reference, the lines $\hat{y} = y + 0.1\,\mathrm{mm}$ and $\hat{y} = y - 0.1\,\mathrm{mm}$ are provided. Predictions that fall outside these bounds might be considered of low quality. It can be seen from these results that this method is capable of producing reasonably accurate predictions for a previously unseen shape from a small training set of just four related shapes.
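
The tolerance band in these plots reduces to a one-line check. The sketch below simply computes the share of predictions inside the ±0.1 mm reference band:

```python
import numpy as np

def fraction_within_bounds(y_true, y_pred, tol=0.1):
    """Share of predictions inside the +/- tol band (mm), i.e., between
    the reference lines y_hat = y + tol and y_hat = y - tol."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) <= tol))
```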

The predictions shown in Figs. 15 and 16 might be useful for an operator of a 3D printer seeking to determine whether a specific 3D printed shape will be within a prespecified tolerance before beginning the print. This procedure might also be of use when determining the best orientation with which to print an object to maximize accuracy. Figure 16 also demonstrates that there is room for improving the accuracy of the model. This would likely include expansion of the initial training set and refinement of the initial predictor variables. This article builds upon the work presented in Ref. [66]. For an additional example of how prediction using this methodology can be implemented, see Section 5 of Ref. [66].

### 5.4 Compensation Results.

The error metrics for the uncompensated and compensated half-ovoid prints are summarized below.

| | Uncompensated half-ovoid (mm) | Compensated half-ovoid (mm) |
|---|---|---|
| MAE | 0.0723 | 0.0404 |
| RMS | 0.1047 | 0.0528 |


It can be seen from the results in Table 3 that the application of the presented compensation methodology results in a 44% reduction in average vertex error and a 50% reduction in root-mean-square vertex error for the testing shape.
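
As a simplified illustration of vertex-level compensation (the paper's exact compensation procedure is the authority and may differ), one common approach offsets each mesh vertex against its predicted deviation along the outward unit surface normal:

```python
import numpy as np

def compensate(vertices, normals, predicted_dev):
    """Pull each vertex inward by its predicted deviation along the outward
    unit normal: a vertex predicted to print 0.1 mm oversized is moved
    0.1 mm inward before slicing. Illustrative sketch only."""
    vertices = np.asarray(vertices, dtype=float)
    normals = np.asarray(normals, dtype=float)
    dev = np.asarray(predicted_dev, dtype=float)[:, None]
    return vertices - dev * normals
```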

## 6 Conclusion and Future Work

This study establishes a new data-driven, nonparametric model to predict shape accuracy of 3D printed products by learning from triangular meshes of a small set of training shapes. The accuracy of a new 3D shape can be quickly predicted through a trained random forest model with little human intervention in specifying models for complicated 3D geometries. With features extracted from triangular meshes, the proposed modeling approach is shown to produce reasonable predictions of shape deviation for a new part based on a limited training set of previous print data. The trained model's out-of-bag prediction error was 0.0580 mm, while its testing dataset error was 0.0713 mm. Compensation leveraging these predictions is also shown to be effective, resulting in a 44% reduction in average vertex deviation.

A further insight gained from the presented experiment is that data quality is a necessary condition for reasonable predictions. Table 1 shows that only two of the shapes in the training set had low covariate shift relative to the testing dataset; this is likely near the lower bound of what can sustain accurate predictions. Those wishing to use this methodology should therefore ensure that their training dataset contains an adequate amount of data similar to the shapes they wish to predict. Applications where this is naturally the case arise under “mass customization,” where similarly shaped products are produced with small custom differences introduced per customer specifications. Examples include 3D-printed retainers, custom footwear, and medical implants, among many other fields. The methodology for determining shape similarity based on covariate shift of the presented predictor variables might also be applied to other shape deviation modeling methodologies for which these conditions are significant, to ensure the sufficiency of training data.

Future work might focus on a number of areas. First, incorporating information about the overall topology could further improve prediction accuracy. Second, new predictor variables based on local surface geometry might be added in future studies. Predictor variables in this study were developed empirically from domain knowledge; a more rigorous mathematical selection process might be examined in future work. If new predictors are developed, they can be evaluated using the procedure given in Sec. 4.2 and, if shown to be effective, easily added to the methodology. Finally, modeling methodologies that incorporate spatial autocorrelation might be investigated as a means of improving accuracy.


## Acknowledgment

This research was supported by the National Science Foundation (NSF) (Grant No. CMMI-1544917) and by a graduate research fellowship from the Rose Hills Foundation.

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.