## Abstract

For composite laminates, a rising R-curve is observed for their fracture toughness under Mode I stress, which is important for a comprehensive failure analysis of the materials. Since it is laborious to measure the R-curve due to its dependence on both the load and the crack extension, we put forward a novel compact tension specimen by modifying its geometry to eliminate the relation between fracture toughness and crack extension, so as to simplify the experimental process of the R-curve measurement by only recording the load history. Two machine-learning models were developed for the optimum sample design based on the finite element analysis of the effect of sample geometries on the R-curve. A simple neural network model was built for designing tapered specimen and a reinforcement learning model was created for further finding the best design from a broader design space. The results showed that, in contrast to the specimens with a tapered shape, which only ensure the independence between the R-curve and crack extension in the case of a small extension, the design provided by the reinforcement learning provides such independence across a wider range of crack length and an improved accuracy.

## 1 Introduction

Fiber-reinforced plastics (FRP) are widely used in aerospace engineering as structural materials. An important research direction in the field of the mechanics of composites is how to evaluate the damage initiation, damage evolution, and crack propagation of composite materials. A typical progressive damage feature is shown as the load is redistributed in other constituents or plies once failure occurs in a certain part of composite laminates. In this context, researchers have proposed the continuum damage model that has been proven very effective in revealing the failure mechanism in both static and dynamic loading cases [1–3]. While lots of the research effort has been taken on the criteria of damage initiation [4–6], the post-failure model has received relatively less attention. Among others, the most widely-used model is the fracture energy-based model. With a pre-assumed strain-softening law, the damage variable under certain stress conditions can be determined and stress components are then changed accordingly to describe the post-failure behavior [1,7,8]. While researchers have proposed different shapes of the strain-softening curve for considering various features of materials, such as the bi-linear model for brittle interfaces [9], the tri-linear model for fiber bridging [10], and the exponential model for nonlinear softening of materials [7], the area under these curves of damaged response is always equal to the fracture energy. It suggests that fracture toughness, both trans-laminar and inter-laminar, plays an important role in the damage modeling of composite laminates.

The fracture toughness, or the critical energy release rate (ERR) $Gc$ can be thought of as the energy barrier of an additional infinitesimal increment of crack extension, which is an intrinsic property of material. While for perfectly brittle solids like ceramics, the value remains a constant. However, for ductile fracture, the crack grows in a way that leads to accelerated energy dissipation due to the increasing size of the plastic zone [11]. In FRP composites, the rising trend is also experimentally founded [12–15]. Two mechanisms are ascribed, that are the increasing size of fracture process zone and the rougher fracture surface leading to the additional energy dissipation as the additional area of free surfaces in generated. Even though ASTM developed a standard practice (E1820-18) for determining R-curve to accommodate the widespread need for this type of data, the experiments are challenging to perform due to the requirement of exact capture of crack extension. The compliance method which indirectly predicts the crack length by measuring the continuously changing compliance of the specimen is also presented in the ASTM standard. However, multiple loading–unloading processes or multiple specimens are required for obtaining one curve. Some alternative ways have been brought out by researchers to avoid this difficulty of real-time measurement of crack extensions. The modified compliance calibration method [16] determines a compliance curve from the range of discrete crack lengths under linear-elastic FE modeling rather than loading–unloading sequences. The compliance combination method [17], in which a strain gauge is put on the top surface for independently obtain the compliance during the test, allows calculating the crack length. The double compliance method [18] utilized a two-dimensional (2D) elasticity model for a more accurate compliance-crack length relation and directly used data on load–displacement curve to calculate compliance. While these above methods are theoretically extracting the crack length from structural compliance, the J-integral-based method, on the other hand, seeks the solution for crack resistance from the full-field measurement. The contour integral can be calculated by integrating the 2D digital image correlation technique into standard fracture test [19]. Some researchers managed to find alternative test methods instead of the traditional double cantilever beam or compact tension (CT) tests. Size-effect law of the laminates, tested on cracked laminates, was used for deducing the R-curve based on a simple linear-elastic fracture mechanics (LEFM) model [20,21]. This approach has been effectively applied on both Mode I and Mode II loading cases. Similarly, machine learning-based model is proposed to indirectly predict the R-curve from the tensile strength of centre-cracked [22]. The connection between the strength value and R-curve was provided by a finite fracture mechanics model (FFM). These two methods completely eliminate the need to measure crack length, but rely on the accuracy of the size-effect law or FFM model. Meanwhile, data reduction is not straightforward compared to the standard tests.

The geometries of fracture test specimens can be an influencing factor for the fracture behavior. It has been widely studied that for CT tests, different geometric parameters leads to the different failure mechanisms in the loading process. Many researchers performed specimen design with the objective to determine the appropriate geometry to ensure crack extension for fracture toughness characterization before the specimen fails by any other damage mechanism. Typical geometries include the extended compact tension [23], the widened compact tension [24], the tapered compact tension (TCT) [24], the doubly-tapered compact tension (2TCT) [25], and the notched curved compact tension [26]. Among them, the configurations of TCT and 2TCT specimen, proposed with the aim of reducing the compressive stresses generated in the specific area of the specimen which might cause failure due to vertical compressive fiber breaking. This shows that the geometries can play an important role in regulating the stress distribution. Since the fracture toughness in LEFM is equivalent to the extent of stress concentration, it inspires us whether it is feasible to realize the independence of crack extension by modifying specimen geometries. In the field of self-healing polymer composite, a special fracture specimen with similar contour shape as that of the TCT called tapered double cantilever beam (TDCB) is widely used for evaluating the healing efficiency [27,28]. The theoretical analysis [29,30] shows that with the linear tapered shape, the compliance changes approximately linearly with the extending crack, thus resulting in a constant value of compliance rate change determined by the height of the cross section and consequently the no need of measuring crack length. However, such independence only works for a certain range of crack length, making it inappropriate for determining the whole R-curve involving a relatively wider range of crack extension. In this context, we are going to perform a throughout analysis of the geometries of CT specimens regarding to its effect on the R-curve measurement. We aim to provide an optimum design of the contour shape of CT specimens without restricting to the linear tapered shape. Since a vast design space may be explored by the indefinite shape, a machine learning algorithm combined with the finite element model (FEM), especially the reinforcement learning (RL), is employed as a search tool for finding the optimum design ensuring long-range independence of crack extension.

This paper is organized as follows. The finite element analysis was performed on both CT and TDCB specimens to identify the formulation of the problem and validate the numerical model in Sec. 2. In the following section, the effect of the geometries of specimens was analyzed in detail, with a particular emphasis on the contour shape of the tapered line and quadratic curve. In Sec. 4, a RL model was constructed for the best possible design of a CT specimen with a larger range of crack extension, after the disclosure of the geometric effect’s mechanism. Then the results and discussion about the optimized design are presented. The paper is concluded with some conclusions in Sec. 6.

## 2 Theoretical and Finite Element Method Analysis of TDCB Design

The $f(a/w)$ given by the FEM agrees well with that by Eq. (2) in ASTM E1820 for CT specimen, indicating the effectiveness of the FEM model for evaluating the data reduction of the Mode I fracture toughness. For the TDCB specimen, with FEM predictions as the reference, the results by different beam theories are compared in Fig. 3(c) including Euler beam theory of Eq. (5), SBT of Eq. (6), corrected effective length (CEF) of Eq. (7), and our analysis considering the varying height profile of Eq. (19). The results clearly show that while all the models provides values close to FEM when the crack is relatively small ($0.2\u2264a/w\u22640.5$), only our model predicts a sudden rising trend with the increasing crack ($0.5\u2264a/w\u22640.8$). This comparison suggests that the beam root rotation contributes more significantly to the load-line displacement when the crack approaches the boundary since the first term in Eq. (19) whch involves the Timoshenko beam theory decreases with longer crack. It can be found that the $f(a/w)$ value is monotonically decreasing for Euler, SBT, and CEF analysis due to that the continuously increasing height $h(x)$ is accounted rather than a uniform height $h0$. However, the way in which the height profile is considered is different from our model. In our model, $h(x)$ is included initially in solving the bending governing equations. In Euler beam model of Eq. (5) and SBT model of Eq. (7), $h$ is first assumed to be a constant when deriving the compliance, thus these two equations provide lower $f(a/w)$ values as the uniformly changing height is accounted instead of the real-tapered shape. For the ASTM CT specimen, the $f(a/w)$ value continues to increase since the $K$ should obviously increase with growing crack. The TDCB specimen was designed to counter the increasing trend by making the specimen stiffer with a changing section. However, the analysis by Euler and SBT beam theory would overestimate the counter effect since they consider a uniformly changing section instead of the real tapered shape.

For the CEF model where beam root rotation is introduced as an extended crack length, results demonstrate that the correction is good enough for smaller crack but fails in accuracy when crack extends near the boundary. In our model, the correction of beam rotation is dependent on the un-cracked ligament $(w\u2212a)$, which gives rise to the prediction of larger beam root rotation as the remaining ligament goes small. Even though our model shows a same trend as the FEM reference, there is still discrepancy especially for the part after the sudden increase. Such error might come from the elastic foundation model for calculating the beam root rotation where the stiffness of the foundation and the height profile were simplified to reduce the complexity. In the following Fig. 4, the contribution of the two terms in Eq. (19) is shown. The first term refers to Timoshenko beam theory, while the second term relates to the rotation of an elastic foundation. It can be found that the contribution of the term of rotation is negligible for small cracks but significant for large cracks. As a result, the simplification of not taking the varied section into account while considering the rotation of elastic foundation would only affect the part of curve after sudden increase. The simplification would lead to larger increase as the uniformly changing section causes a more significant change of the rotation. This can be validated by comparing the two curves of the FEM and Eq. (19) in Fig. 3(c). Since the main point of the paper is to investigate the shape effect of the specimen on the scenarios of large crack extensions, therefore the FEM results by VCCT technique is utilized as our references.

## 3 Geometric Effect in Mode I R-Curve Measurement

It can be inferred from Eq. (1) that, if the specimen is properly designed to keep a constant value of $f(a/w)$, then the measurement of fracture toughness, especially the crack resistance curve would be easy to implement without the need of capturing the crack lengths. Given that the specimen’s susceptibility to fracture will rise as the crack propagates, it is obvious that this increasing trend on the $f(a/w)$ curve is unavoidable. However, if the specimen can be designed to be stiffener as crack extends, it would be possible to weaken the trend. To this end, in this section, we will conduct analysis on the geometric effect of specimen, or effect of the height profile, on the $f(a/w)$ to discuss the optimum design of specimens for Mode I R-curve measurement.

### 3.1 Tapered Design.

Effect of the tapered shape of TDCB specimen on $f(a/w)$ was investigated here. On the basis of the geometry of standard CT specimen where $h0=30mm$ and $w=30mm$, several different tapered shapes were represented by the varied height of the right-hand end $l$. The results are shown in Fig. 5(a) for a total of five different $l/h0$ from 0 to 2. Compared to the standard CT specimen, where the ratio $l/h0=1$, such tapered design by only varying $l$ at the right-hand end did not make much difference except for the shape of $l/h0=0$. When the $l/h0<1$, the $f(a/w)$ value is higher than the standard CT specimen since the decreasing height leads to a larger value of compliance. But for $l/h0>1$, the $f(a/w)$ is nearly the same. Since the main focus of the design is to ensure a constant $f(a/w)$, the ratio of maximum and minimum values of $f(a/w)$ with $a/w$ in the range of [0.3,0.6] is extracted to evaluate the performance. The closer the ratio to 1, the better the design. As illustrated in the bar plot of Fig. 5(a), all the max/min ratios are over 2 for $l/h0=$1, 1.5, and 2 and up to 2.9 for the case of $l/h0=0$, indicating the tapered design can hardly meet the need of constant $f(a/w)$ value when $h0=30mm$.

Then, we changed the $h0$ to 15 mm as shown in Fig. 5(b), which is similar to the recommended TDCB shape. The same trend can be found that the function values become higher as the $l/h0$ increases. While for both cases the height at the right-hand $l$ make slight difference with $l/h0$ greater than 1.0, a substantial influence is shown with the change of the $h0$. Compared to the shape of $h0=30mm$, the max/min ratio of $f$ function values is significantly reduced to 1.5. We further compared the two cases of $h0=15mm$ and $h0=30mm$ with a same $l=30$. The two curves agrees well with $a/w>0.6$, but shows a clear discrepancy when $a/w<0.6$ as the initial $f$ values are higher for $h0=15mm$, thus resulting in a smaller max/min ratio.

Therefore, two hints can be concluded for making the geometry design:

The $h0$ should be small to ensure a higher initial value of $f(a/w)$.

A positive slope of the tapered shape is needed for the postpone the increasing trend of the $f(a/w)$ thus making it remains small values at the desired range of crack extension.

### 3.2 Quadratic Curve.

Despite that the tapered shape is able to provide a good design for the crack size of $0.3<a/w<0.5$ compared to the standard CT specimen, the $f$ function will still rise significantly for $a/w>0.5$. Inspired by the two hints concluded by the simple tapered design, we further explored the shape of quadratic curve for considering the more possible shapes with small $h0$ and increasing height profile.

The shapes with quadratic curves and the corresponding results are presented in Fig. 7. A total of seven curved shapes were considered and determined by the $yC$ of the controlling point. The value of $yC$ for the tapered shape is 15.5. Then, two convex curves and five concave curves were included here. Interestingly, the convex and concave shapes showed distinct feature on the $f(a/w)$ curves. At the beginning of the crack extension where $0.3<a/w<0.5$, the convex shapes ($yC>$15) close to the tapered shape were clearly better as the increasing height prevents the ERR from rising. A slight increasing trend can be found for tapered shape as shown in Fig. 5(c), which was alleviated by the convex shapes. However, when the crack continues to propagate where $a/w>0.5$, the concave shape is preferred as smaller variance was found. It can be inferred from the theoretical analysis of TDCB that the majority of the compliance is contributed by the root rotation. Therefore, the smaller variance on the $f(a/w)$ might be due to the suppression of the root rotation by the concave–curved shape at the ends. The $f(a/w)$ values for the concave shape of $yC=11$ keeps stable for cracks at $a/w=0.65$ while significant increase has been found early at $a/w=0.55$ for convex shapes.

Compared to the Max/Min ratio of 1.12 for the tapered shape, the minimum value of this ratio is 1.09 for $yC=16$ for the region of $0.3<a/w<0.6$, suggesting that this quadratic curved design can achieve better performance than simple tapered shape with the specific range of crack lengths.

### 3.3 Combination of the Convex and Concave Curve Designs.

Since the convex and concave quadratic curved shapes are preferred for the small and large crack extensions, respectively, in this section, the combined shape design of the convex and concave curve was investigated in order to take both of the two advantages.

Six different shapes by combining the convex and concave curves were compared. As presented in Fig. 8(a), the shapes were determined using the spline approach containing four controlling points, i.e., $(35,yD)$, $(40,yE)$, $(52.5,yF)$, and $(65,yG)$. The four variables of the *y*-axis coordinates were chosen to ensure a convex-to-concave transition on the height profile. The results of $f(a/w)$ of the six shapes are plotted with two curves of convex ($yC=16$) and concave ($yC=11$) as the references. It can be seen that, with the chosen parameters, the combing design strategy is able to generate intermediate curves of $f(a/w)$ that lie between the typical curves of convex and concave designs. The results of the six shapes are clearly divided into two groups. For $yD=11$ and $yE=11.5$, which shows a small curvature for the initial convex shape, the function curves exhibits a slightly higher initial value and a rising phase. Compared to the fully convex shape denoted by the black dash line in the figure, the incorporation of concave shape suppresses further increasing and make the curve maintain around a certain value. For $yD=13$ and $yE=14$, which shows a relatively large curvature for the initial convex shape, a different type of curve path was presented. No obvious initial rising is found and the curve starts to go up as $a/w$ is over 0.5. As the slope brought by the convex shape is higher which is controlled by $yF$ and $yG$, the rise in the curve can be hindered more significantly. Among these six shapes, the smallest Max/Min ratio is 1.13 from the Spine(13,14,24,40) for $0.3<a/w<0.6$. However, the smallest Max/Min ratio is 1.23 from the Spine(11,11.5,24,40) if $a/w<0.65$ was considered. Therefore, it suggests that by choosing the appropriate shapes of the spline, a balance might be obtained between the two groups which may lead to a more consistent curve throughout the larger range of crack extension. With such minds, we then manage to find one spline design with better performance. By using a trial and error method, the shape of Spline(11,12,29,50) was found to show both of the advantages of convex and concave designs, which are the stability at the onset of crack extension and a postponed increasing starting point of $a/w>0.6$. With this shape, the Max/Min ratios are reduced to 1.06 for $0.3<a/w<0.6$ and 1.15 for $a/w<0.65$.

In the process of iteratively modifying the shapes of the spline using the four *y*-axis coordinates of controlling points to search for better results, it is observed that there is no significant improvement when continues to increase the $yE$ and $yG$ which are supposed to lower the Max/Min ratio for $a/w<0.65$. This may due to that the adopted spline generating method only causes a tiny change on the shape around the controlling points, as shown by the enlarged figure in Fig. 8(a). Therefore, we adopted a convex curve and tapered line combination design as illustrated in Fig. 8(b). It is shown that the tapered line between controlling points (40, $yE$) and (65,$yG$) is more favorable than the concave curve. Compared to the curved design, the influence of the $yG$ was more remarkable for $yG$ at smaller values. With the same convex shape at $30<x<40$, the tapered design of $yG=40$ offers the Max/Min ratios of 1.03 for $0.3<a/w<0.6$ and 1.12 for $a/w<0.65$, which is better than the concave design of $yG=50$. The smaller value of $yG$ is very important considering the material consume and fixture adaptability of the CT specimen.

## 4 Reinforcement Learning Model for Layout Design

The above analysis indicates that the best design for a constant $f(a/w)$ value is possibly hidden in the huge design space with irregular shape. Among the strategies for solving the inverse problems in design without investigating the entire design space such as heuristic algorithms [33], gradient-based topology optimization [34], and generative networks-based schemes [35,36], the RL algorithms do not rely on the prior knowledge nor large amount of initialization samples, are not likely to yield an local minimum. The RL method can select the actions that maximize the profit by iteratively interacting with the operation environment. In this section, we proposed a novel reinforcement learning system for the layout design of CT specimen based on off-policy Q-learning structure with the FEM as simulation environment.

The major challenge by using the RL on solving the inverse design problem is the proper expression of the problem by the Marov decision process and the action-state update strategies. To demonstrate our proposed RL based design system, the architecture of the model is shown in Fig. 9. The key components in the proposed intelligent design systems include:

State and action. The design problem is converted to the Marov decision process in which the agent would go through seven states and is trained to execute one action from a pool of four possible actions. At the beginning, the design domain is assigned (e.g. $w=51$, $y0=10$ as illustrated in the figure) and then the seven states refer to the seven controlling points where $x=35,40,45,50,55,60,65$. The action is to determine the

*y*-axis coordinate at each state by introducing a variable of $m$ denoting the $y$ increment. The action space is $m$ choosing from [1,2,4,8] with $yi+1=yi+m$ to ensure the increasing trend on the height profile. The shape design in one training episode can then be identified with the completion of this sequential drawing process.- Q-learning algorithm. The core unit of the RL is the decision maker which gives Q-values representing the expected total rewards of taking the corresponding action. By searching the position of maximum Q-value, the next action for the new state can be determined. The Q-values comprise a tabular collection of data which is updated after each training episode to approach the feedback from the FEM simulation. In this work, the simplest form of $7\xd74$ Q-table is used with states as rows and actions as columns. The values are updated via off-policy based on the following Bellman equation.where $Q(s,a)$ is the current Q-value, $Qnew(s,a)$ is the updated Q-value, $\alpha $ is the learning rate, $R(s,a)$ is the reward, $\gamma $ is a number between [0,1] and is used to discount the reward as the step progresses, valuing rewards received earlier higher than those received later. Besides, an epsilon-greedy exploration strategy is used to balance exploration and exploitation. During training, the agent selects a random action with probability $\epsilon $ and selects the action with the highest Q-value with probability (1 - $\epsilon $).(21)$Qnew(s,a)=Q(s,a)+\alpha (R(s,a)+\gamma maxaQ(s\u2032,a\u2032)\u2212Q(s,a))$
Reward function. The Q-values are updated based on the rewards received for the action taken, even if it is not the optimal action according to the current policy. For our problem, the reward function $R(s,a)$ is calculated by the Min/Max ratio of $f(a/w)$ value from the FEM simulation to accommodate to the goal of maximize the cumulative reward.

## 5 Result and Discussion

### 5.1 Training Performance and Parameters Tuning.

The whole training process was written in a python script and executed in the abaqus environment as the reward was generated by the post-processing of the finite element analysis. It is also feasible to train a surrogate machine learning model to replace this numerical analysis. However, the time efficiency would be a problem as a large dataset is normally required to ensure the accuracy of the surrogate model. In the current study, a pre-processing script was also included to automatically build finite element models, apply boundary conditions, and run solvers according to the shape design provided by the trajectory in RL model. Therefore, the training process contains loops of trajectory chosen according to policy, finite element analysis, and Q-value update. The training process ends after a given number of episodes.

The whole design process shown in Fig. 9 has a sparse reward problem. The reward is obtained after all the seven steps are finished instead of a prompt feedback. Therefore, the parameters of the RL model were studied to adapt to an effective and efficient training process. The parameters in the RL model include $\gamma $ for discount factor, $\epsilon $ for greedy policy, and $\alpha $ for learning rates. A total of four cases were investigated with the parameters shown in Table 1.

Case | Discount factor $\gamma $ | Greedy probability $\epsilon $ | Learning rates $\alpha $ |
---|---|---|---|

1 | 0.8 | 0.8 | 0.1 |

2 | 1.0 | 0.8 | 0.1 |

3 | 1.0 | 0.8 | 0.2 |

4 | 1.0 | 0.9 | 0.2 |

Case | Discount factor $\gamma $ | Greedy probability $\epsilon $ | Learning rates $\alpha $ |
---|---|---|---|

1 | 0.8 | 0.8 | 0.1 |

2 | 1.0 | 0.8 | 0.1 |

3 | 1.0 | 0.8 | 0.2 |

4 | 1.0 | 0.9 | 0.2 |

The training performance of each case is shown by the curve of reward-episode relation, as shown in Fig. 10. Unlike other optimization method like the genetic algorithm, the RL model does not monotonically improve the design quality with more sample numbers. Some perturbations can be observed in the curves due to the $\epsilon $-greedy policy adopted in the training process, leading to a randomness in choosing actions at each step. It can be seen that the best performance was found from the case2 and case3 where the averaged reward reaches it historical best with the training episode increases. The best reward can be up to 0.97 after 180 samples in case3. This sample number is very impressive as 180 is only 1% of all the 16,384 possible action trajectories of the proposed Markov decision process, showing the high efficiency of the potential of RL model in designing structures. However, it is also observed that the choice of the parameters of RL model significantly impact the performance.

The discount factor $\gamma $, as presented in the Bellman equation of Eq. (21), allows to model the agent behavior with respect to future rewards. Generally, when the discount factor is low, the agent becomes greedy and choose actions that yield the maximum immediate reward. This is because, when the cumulative reward is calculated, future rewards will have a limited impact due to their heavy discounting. Besides, it also impacts the convergence of the algorithms. In problems of finite horizons (or time-steps), discount factor can be safely set to one. However, in problems of infinite horizons, the discount factor must be set lower than 1, otherwise convergence is not guaranteed. It can be seen that with the discount factor of 0.8 in case1, the model fails to discover better designs in the given training episodes. By adopting the lower value of discount factor, the Q-values for at the first two actions were one order of magnitude smaller compared to that of other cases. This may lead to the improper shape design at the beginning stage, which significantly impacts the reward according to previous analysis.

The $\epsilon $ in the greedy policy marks the tradeoff between exploration and exploitation. The higher the value of $\epsilon $, the more possible that we chosen actions according to the current policy. This causes issue in exploration as we can get stuck easily at a local optima. Here, we only discussed the influence of the fixed value of $\epsilon $. However, some adaptive greedy policies have been found to make a better balance at both the initial and subsequent training episodes. Comparative study between case3 and case4 shows that the $\epsilon $ may also greatly influence the training performance. Higher value of $\epsilon $ resulted in smaller deviation but failed to discover the better design even tough the RL model somehow has found several good design at the episode of 10.

The learning rate is the magnitude at which a leap is taken in determining optimal policy. In terms of simple Q-learning, this refers to how much the Q value is updated with each step. When training an RL agent, the learning process should have a visible impact on some metric of problem solving. If the learning rate is set too high, the periodic evaluation score will fluctuate. However, the Q value will change significantly after each optimizer step, which may prevent it from converge to an optimum. Therefore, a trail-and error study on the learning rate should be performed with attention paid on the changes of the reward. In our study, RL model with the learning rate of 0.1 can find optimal design faster than that with learning rate of 0.2. It has to be noted that the chosen parameters are by no means optimal. However, they are tuned within a certain range based on their mathematical meaning in the RL algorithm.

### 5.2 Optimum Shape Design.

The final proposed shape of CT specimen via the RL model and the corresponding $f(a/w)$ curve are shown in Fig. 11. According to the finalized Q-table, the positions of the controlling points of the optimum shape are $y=12,13,17,25,33,41,49$, for $x=35,40,45,50,55,60,65$ respectively. It can be observed that the performance of the RL-designed specimen is better than the best output of the combined curve and line design in the last section. For the crack extension ranges in $0.3<a/w<0.6$, the Max/Min ratio is 1.02. The error is less than 2% by replacing the *f*(*a*/*w*) with a constant number of 15.6 in calculating the fracture toughness in Eq. (1). This improved performance may comes from the enlarged design space that were considered in the RL model. Some common features can be found as the slope of the height profile is always positive and gradually increase from the initial low value. However, the RL model plots the transition area more thoroughly than the previous approach, which did not display. The existing design framework cannot prevent the growing trend in $f(a/w)$ for large cracks $a/w>0.6$, implying that the RL-designed shape is only effective for cracks $a/w<0.6$.

### 5.3 Validation by Virtual Testing.

To validate the proposed design of CT specimen for the measurement of Mode I fracture toughness, especially with the effect of R-curve, we built a virtual testing model in which the crack extensions were simulated via cohesive zone model. Two FEM models were created for the standard CT specimen and the proposed design respectively. The deformed models are shown in Fig. 12. The only difference of the model compared to the previous FEM analysis in Sec. 2 is that the whole specimen was modeled instead of the half considering symmetry. Then the properties of fracture toughness were imported by defining the cohesive contact behavior in the interaction module of abaqus.

Figure 13 presents the results for the fracture process of materials with a constant critical fracture energy. In this model, the base material was an isotropic material with $E=10$ GPa and $\mu =0.3$. The cohesive contact behavior was defined with the damage initiation being 60 MPa and critical ERR being $0.33N/mm$, referring to the values from Ref. [3]. A displacement control was utilized, with a total of 1 mm applied to each of the loading points over a one-second step period. The FEM models were solved by the abaqus/standard solver. To ensure the stability and the convergence of the crack propagation process, a damage stabilization factor of $1\xd710\u22125$ was assigned. The creep dissipated energy was kept to a tiny proportion ($<5%$) of the energy dissipated by cohesive damage, indicating that the damping factor value was chosen properly. The load–displacement curve at the load-line was shown in Fig. 13(a). For the ASTM standard CT specimen, a progressive decrease in load was observed load following the commencement of the crack propagation, which is the typical pheromone of the Mode I fracture of brittle material. This decrease in load was counterbalanced by the growing values of $f(a/w)$ in Eq. (2), thus making a relatively constant critical ERR calculated by the data reduction method in ASTM E1820 standard. The critical ERR values for CT specimen, as shown in Fig. 13(b), were in good agreement with the material input of $0.33N/mm$, indicates the effectiveness of the FEM model as the virtual testing method. The higher values at the initial stage were resulted from the damage stabilization. For the specimen designed by RL, the load-displacement curve shows that first, the structural compliance and failure load is different from that in CT specimen. The load remains nearly unchanged after the onset of crack extension at a lower load value. More importantly, by adopting the constant $f(a/w)$ value of 16, the RL-designed specimen is able to provide as good results of the critical ERR as that by ASTM standard CT specimen. The constant $f(a/w)$ means that the measurement of crack extension is avoid and the data reduction process is simplified. Moreover, the trend of crack resistance curve can be simply observed from the load history.

## 6 Conclusion

In this paper, we have proposed a novel geometric design of compact tension specimen to simplify the measurement of Mode I fracture toughness. The data reduction of the current commonly used CT specimen requires the concurrent determination of both loads and crack extensions, which greatly increases the complexity of experimental operations. We revisited the widely-used ASTM compact tension and TDCB specimen, and aimed to avoid the measurement of the crack lengths through the shape design of the Mode I specimen.

First, by performing a theoretical analysis on the deformation of the specimen, it is claimed that the former analysis on the TDCB design fails to provide accurate fracture toughness especially when the crack propagates to a large size ($a/w>0.5$). The error mainly comes from the neglect of the beam root rotation and the varying height profile.

Second, by taking a throughout study on the geometric design, the independence of the fracture toughness on the crack extension is minimized with the height profile outputted by the machine learning model. It is validated by the virtual testing method that the proposed design shows better performance compared to the CT specimen in ASTM standard and the widely-used TDCB specimen in the literature.

Finally, with the proposed specimen, the experiments for determining the crack resistance curve can be done in the following way:

Cutting the specimen into our proposed shape shown in Fig. 11.

Pin loading for stable Mode I fracture.

Recording the load history. The history of critical fracture toughness can then be calculated by replacing $f(a/w)$ in Eq. (1) with a constant value of 15.5.

If the curve of crack resistance versus crack extensions was desired, capture one ($\Delta a(i)$, $K(i)$) at any crack extension. Then, the complete R-curve can be fitted by $Kinitial$, $Kpropagated$, and ($\Delta a(i)$, $K(i)$) with the form of Eq. (22).

Fracture toughness, especially R-curve, is an important parameter for the evaluation of damage tolerance of composites. Deeper research has been conducted on the analysis of extreme ambient conditions, loading rate dependency, fatigue, and aging. This brings challenge to the experimental measurement of fracture toughness as it is not an easy task even under normal experimental conditions. The current work simplifies the test procedure without introducing any new equipment or techniques, which would provide insights into the experiments in the aforementioned situations. The work also shows the great potential of the machine learning approaches for the structural design with the ability to efficiently explore the vast design space while avoiding the problem of dropping into local optimum. Our ongoing further work extends the current ML-based design system on the problem of Mode II fracture toughness measurement.

## Acknowledgment

This work was supported in part by the Project of Hetao Shenzhen-Hong Kong Science and Technology Innovation Cooperation Zone (HZQB-KCZYB-2020083), the Department of Science and Technology of Guangdong Province (Project #: 2022A0505030023), and the international partnership program of the Chinese Academy of Sciences (Grant No. 025GJHZ2022103FN).

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.

## Appendix: TDCB Compliance Analysis

- The continuity of displacement $u2(x=x0)=u1(x=x0)$(A3)$x0A1\u2212x0A2+B1\u2212B2=\u22126F(h0\u2212kx0)Etk3h0\u2212ln(h0)(6Fk2+24F+3Fk2v)2Etk3\u22122Fx03Eth03$
- The continuity of rotation $\varphi 1(x=x0)=\varphi 2(x=x0)$(A4)$A1\u2212A2=\u22126Fx02Eth03\u221212F(h0+kx0)+3Fk2(v+2)2Etk2h02$
- The boundary condition of displacement $u2(x=a)=0$(A5)$aA2+B2=ln(h0+ka\u2212kx0)(6Fk2+24F+3Fk2v)2Etk3+6F(h0\u2212kx0)Etk3(h0+ka\u2212kx0)$
- The boundary condition of rotation $\varphi 2(x=a)=0$(A6)$A2=6F(h0+2ka\u2212kx0)Etk2(h0+ka\u2212kx0)2$