Abstract
Powder bed fusion (PBF) is an additive manufacturing process in which laser heat melts powder particles on top of a powder bed, and cooling solidifies the melted powder. During this process, the laser beam interacts with the powder, causing thermal emission and affecting the melt pool. This paper aims to predict heat emission in PBF by harnessing the strengths of recurrent neural networks. Long short-term memory (LSTM) networks are developed to learn from sequential data (emission readings), while the learning is guided by process physics, including laser power, laser speed, layer number, and scanning patterns. To reduce the computational effort of model training, the LSTM models are integrated with a new approach for down-sampling the raw pyrometry data and extracting useful statistical features from them. The structure and hyperparameters of the LSTM model reflect several iterations of tuning based on training with the pyrometer readings. Results reveal how raw pyrometer data should be processed to work best with LSTM, how informative physics features are in predicting overheating, and how effective physics-guided LSTM is in emission prediction.
1 Introduction
Powder bed fusion (PBF) is one of the most popular techniques in metal additive manufacturing (AM), with a variety of applications [1]. Its popularity stems from several advantages over traditional technologies: layer-by-layer fabrication, which gives considerable freedom in geometry design and allows process adjustment; less material waste during printing; and a significant reduction in printing time. Since the technology appeared in the 1980s, PBF has developed into different metal AM technologies and applications, which are generally categorized by the power source (laser or electron beam) used during printing.
Laser powder bed fusion (L-PBF) is an implementation of PBF that employs a laser as the heat source for melting and fusing powdered materials. In the L-PBF process, also known as selective laser melting (SLM), metal powder particles are selectively melted by a laser beam. As the laser scans the surface of the powder bed, it follows a predefined pattern to melt the metal powder particles, thus creating a thin layer of the final part. Upon cooling, the melt pool solidifies, fusing with the underlying layer to form a solid part. With its powerful heat source, L-PBF enables a range of capabilities for functional printing. These capabilities include rapid fabrication or prototyping, multimaterials, tunable material properties, and the ability to alter the fabrication of devices on metallic materials [2].
The increasing use of 3D printed components in industries such as aerospace, automotive, and medical has made the reliability of these printed parts critical, resulting in a surge of research interest. However, the expanding adoption of additive manufacturing technologies, particularly for complex geometries, introduces challenges in metrology and quality control [3]. Ensuring the reliability and quality of L-PBF produced parts is crucial for meeting industry standards and requirements. To bridge the gap between the capabilities of L-PBF and the need for reliable, high-quality printed parts, researchers have attempted to enhance L-PBF quality by understanding thermal behavior and structural formation. Studies have shown that complex geometry designs and parameters lead to instability of the melt pool [4], which underlines the importance of identifying the thermal dynamics of the process. It has also been pointed out that process parameters such as laser power and laser speed can affect the L-PBF process dynamics [5].
However, defect formation during the melting process is a significant concern in ensuring high-quality printed parts. Defects have various causes, such as insufficient energy input, localized heating and rapid cooling, and local vaporization in the melt pool, leading to common defect types such as overheating, porosity, oxidation, cracking, balling, and residual stresses [6]. Extensive research has been conducted on process optimization of SLM to mitigate these defects. Harrison et al. pointed out that laser power has a significant impact on crack density [7]. Tan et al. explored the effects of varying laser linear energy ranges on sintering formability and performance [8]. Benoit et al. used a Q-learning approach to optimize the 3D printing process pipeline, focusing on improving the sample preparation process through policy-based action control to improve the sample surface roughness [9]. They found that insufficient laser energy will eventually lead to balling, while the molten pool’s flowability and wettability are decreased due to the formation of incompletely melted powders. Furthermore, previous experiments indicate that powder particles melted by the laser beam generate a series of heat-transfer phenomena that depend on the process parameters. In addition, emission generated through this process could serve as a more accurate indicator of overheating than temperature readings, as the acquired temperature data can potentially be lower than actual temperatures [10]. In summary, existing literature emphasizes the significance of overheating in SLM due to its detrimental effects on material integrity, dimensional accuracy, desired microstructural characteristics, and the occurrence of common defects such as porosity, oxidation, and cracking. Further analysis of overheating is warranted to enhance the overall quality of SLM-printed parts.
Generally, overheating is the phenomenon in which the melt pool’s temperature rises drastically. Related efforts to address overheating in L-PBF can be divided into two categories: modeling methods and in situ monitoring methods.
Modeling methods to address overheating in L-PBF fall into two types. Some aim to design the heating process to improve part quality, while others employ simulation technologies to depict the melt pool status during the heating process under specific conditions. However, given the dynamic properties of the printing process, none of these methods can provide real-time analysis of thermal behavior. To address this issue, several studies have explored in situ sensing and process monitoring techniques for L-PBF. These include high-resolution imaging using high-speed cameras [11], online ultrasonic measurements [12], and X-ray tomography for dimensional measurement and porosity analysis [13]. In particular, pyrometers can capture emission in the wavelength from the process laser area [14]. Because high-speed pyrometry can monitor melt pool dynamics through emission and temperature, it provides an opportunity to identify inconsistent heat distribution across built layers. Recent advancements in sensor technology, such as the use of pyrometers, open opportunities for developing innovative methods for in situ monitoring and prediction in L-PBF processes. Therefore, we propose an in situ monitoring method that leverages the rich information in thermal emissions collected by the pyrometers during L-PBF.
The melt pool temperature serves as a crucial indicator of the process’s health. With in situ pyrometer measurement, adjustments to process parameters can be made either between or within layers, provided that data analysis is promptly completed [15]. Our study utilizes this flexibility and further segments the pyrometry data for each printed layer into smaller sequential sections. The process parameters can then be adjusted according to the thermal behavior of the preceding series of sections.
To meet the demand for analyzing complex sensing data, increasingly popular data-driven methods are being leveraged for in situ monitoring. In a recent review, Razvi et al. discussed the emerging applications and research in monitoring AM processes and measuring AM materials. Three-dimensional printed parts have become increasingly commonplace and increasingly precise, and machine learning (ML) has been used as a tool for understanding AM processes at a fundamental level and identifying predictive recommendations to optimize part quality and process design [16]. However, ML methods lack interpretability and understanding of the underlying system.
To tackle this problem, there is increasing interest and research effort in integrating physics features with ML models, thereby enhancing the data-driven models’ comprehension of physical properties. Scime and Beuth [17] developed a method for the classification of melt pool morphology using computer vision techniques. They employed various feature extraction methods, including bag-of-words (BoW), scale invariant feature transforms (SIFT), and histogram of oriented gradients (HOG), on data collected from a high-speed visible-light camera. A k-means clustering algorithm was applied to identify in situ melt pool signatures. While combining flaws observed ex situ did improve prediction accuracy, the method fell short of providing real-time guidance to the printing process. Yang et al. utilized a convolutional neural network (CNN) to classify the melt pool size, acknowledging its impact on overheating, but they did not quantify overheating directly [18]. Yuan et al. designed a CNN to predict L-PBF track properties based on in situ melt pool video data, circumventing time-consuming ex situ measurements [19]. Ye et al. developed a classification method involving deep belief networks based on plume and spatter image data for the in situ monitoring of melted states [20]. However, their attempt to use features extracted from the image samples did not work well, as the classification rates were even lower than when the original images were used as inputs, which highlights the importance of a proper feature extraction method. Such a method not only improves the interpretability of the model through the extracted features but also bolsters model training efficiency. Existing research underscores the importance of identifying suitable physical features for defect detection, often employing ex situ data such as thermographic images [21,22].
However, it has been demonstrated that solely integrating physical features into ML models is inadequate for the timely and accurate analysis of thermal emissions in L-PBF. Recent research has been exploring physics-informed and physics-guided ML models, which emphasize handling data from physical systems in the design and training of ML models.
While these two terms, physics-informed and physics-guided, are often used interchangeably, they can imply slightly different concepts. Physics-informed models typically refer to ML models that integrate physical laws or principles into their structure or training process. The goal of such models is to leverage existing knowledge about the system to improve the performance and interpretability of the model [23]. This can be achieved, for example, by encoding physical laws into the model’s architecture, customized loss function, or data preprocessing steps. Physics-guided models, on the other hand, can be seen as a more general term, in which physical understanding of the system guides the model in different ways [24]. One physical understanding that has been explored is the sequential nature of emission during L-PBF, leading to time-series ML models. For example, Mahato et al. discussed the sequential time-series nature of heat emission sensor data and explored preliminary performance based on k-nearest neighbors (k-NN) classification [25]. In their follow-up work, more comprehensive distance measures were applied to the k-NN algorithm and improved the classification accuracy [26]. Gawade et al. built time-series regression models to predict overheating in L-PBF [27]. They found that previously printed layers affect the subsequent layers and assessed several regression models based on the average emission of the previous layer. Their findings illustrate the potential of time-series data for improving prediction performance and understanding layer-to-layer interactions in L-PBF. Building on this foundation, our study further explores section-wise analysis to enhance the early detection of defects. Our study aims to capture the inherent time-series characteristics of sequential pyrometer data, incorporating them as valuable features into the model.
Furthermore, we extract additional informative features to improve the model’s interpretability and overall performance.
Analyzing sensor data as time series can help find potential patterns during the melting process and capture melt pool signatures, including heat transfer rates and thermal gradients. Representative statistical methods such as autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) are widely used to capture the dependency between an observation and several previous time-steps. Bisheh et al. proposed a layer-wise framework on printed layer image data to control process quality. ARIMA filters were applied to adjacent layers to remove autocorrelation, with a control chart mechanism for monitoring printed layer quality [28]. However, classical time-series approaches require the stationarity assumption, i.e., that the statistical moments of the observations do not change over time, which is difficult to satisfy given the disturbances in L-PBF [29], making traditional statistical methods less feasible. Montazeri et al. improved the detection accuracy of material cross-contamination by implementing a spectral graph theoretic approach [30]. They compared the method's performance and found that it outperformed traditional ARMA modeling. Although the objectives of these studies are not necessarily related to overheating, they provide insights into the potential applications of time-series properties. In line with these insights, our paper also leverages the time-series property and uses it as a feature for the deep learning prediction model.
Deep learning methods have garnered significant attention among researchers for analyzing complex sensing data obtained from pyrometers, particularly in the context of data classification and regression problems. Neural networks (NNs) have many variants and can be applied to different application scenarios, effectively bridging the gap between AM processes, properties, and performance and the input data properties [31]. Data-driven physics-informed features lend adequate explainability to NN models. Mao et al. leveraged information from melt pool temperature fields and photodiode signals and built correlation models from several popular NN models [32]. In this study, we focus on pyrometry emission data, framing the emission prediction as a supervised learning task. In this regard, long short-term memory (LSTM), a specific type of recurrent neural network, presents an attractive option due to its ability to remember past information and handle long sequences effectively [33]. Unlike ARMA and ARIMA, LSTM does not require the data to be stationary, making it particularly suitable for dynamic processes such as L-PBF. Pandiyan et al. developed a CNN-LSTM model for L-PBF process monitoring. The proposed model can analyze signals collected by a heterogeneous sensing system consisting of four sensors, namely back reflection, visible, infrared, and acoustic emissions [34]. Zhang et al. proposed an LSTM-based approach for melt pool size prediction during L-PBF utilizing melt pool images [23]. Shi et al. developed an LSTM-autoencoder based approach for cyber-physical attack detection in additive manufacturing, utilizing sensor signals collected from side channels [35]. To summarize, a comprehensive comparison of the mentioned works that are related to overheating is presented in Table 1.
Table 1 Summary of recent works related to overheating
Ref. | Sensor | Sensing data | Additional features | Output | Method | Comments |
---|---|---|---|---|---|---|
[17] | High-speed camera | Melt pool morphology image (false-color and coaxially transformed image of the melt pool, 1024 × 1024 pixels) | Flaws observed ex situ | Label of melt pool morphology | Classification, k-means, BoW, SIFT, HOG | Offline monitoring |
[18] | High-speed camera | Melt pool image (grayscale image, 128 × 120 pixels) | Melt pool size | Melt pool size | Classification, CNN | No direct quantification on overheating |
[25,26] | Pyrometer | Melt pool temperature (heat emission light in the range of 1500–1700 nm) | N/A | Porosity prediction | Classification, k-NN | Only consider time-series properties |
[27] | Pyrometer | Melt pool emission | N/A | Predict average emission for next layer | Regression, linear regression | Did not consider time-series properties |
[28] | Top-view camera | Layer-by-layer part image | N/A | Whether the printed object is qualified part or not | Classification, neural network (NN), support vector machine (SVM), gradient boosting classifier (GBC), exponential weighted moving average (EWMA), ARIMA | The prediction based solely on the visual appearance of the printed object can result in significant errors |
Most of the research mentioned employs directly recorded temperature data as output. Gawade et al. instead used the uniformity of emission for each printed cube layer as the regression output [27]. However, these studies typically preprocess the observation data layer by layer. While this approach substantially reduces the data volume, it may result in the loss of information within each layer. This can be a critical limitation of current in situ monitoring designs, because it limits the flexibility and quick adjustment that SLM offers. Technologically, the process parameters can be altered even within a specific layer during printing, depending on the specific printer capabilities [29]. Therefore, a gap exists for more agile methods of adjusting process parameters, as compared to the existing layer-by-layer monitoring methods. This paper addresses this gap by proposing a method for section-wise in situ monitoring in additive manufacturing.
In this paper, a physics-guided ML method is proposed that incorporates the underlying physics of the problem across multiple aspects, including feature engineering, data segmentation and splitting strategy, performance evaluation, and hyperparameter tuning. Statistical characteristics of segmented sections are leveraged to enhance layer-wise observations typically employed in the L-PBF process. We give a detailed and applicable exploration into the smaller granularity of time-series properties, using pyrometer emission data to address the overheating issue. Our study focuses on the following objectives:
Propose an end-to-end pipeline to address the overheating issue by enabling more granular in situ monitoring, which will eventually enable faster adjustment of process parameters.
Use statistical features extracted from serialized section emission data as input to our predictive model, effectively transforming these features into a measure of potential overheating.
Combine physics-based features and the time-series property to enhance model explainability. A baseline LSTM model to predict the overheating issue during L-PBF is constructed, and an improved model with fine-tuned hyperparameters is presented.
Discuss a tuning method for key hyperparameters, offering insights into their influence on model performance and providing a guide for optimal configuration.
With this approach, we offer a predictive perspective on managing overheating, and contribute to the optimization of the L-PBF process.
The remainder of the paper is organized as follows. Section 2 describes the data collection process and the preprocessing method. Section 3 discusses the overall pipeline of data processing on the emission data, proposes the LSTM models, and illustrates the hyperparameter tuning steps. Section 4 discusses the results from the LSTM models and compares their performance with that of the physics-features-only and statistical-features-only models. Section 5 provides concluding remarks and future directions.
2 Sensing and Data Collection
Our study uses the in situ pyrometry sensing data from the L-PBF experiment described in Ref. [36]. The experiment used an AconityMINI machine, equipped with a fiber laser source with a capacity of up to 400 W and a spot size of 50 μm, to print blocks from SS 316L powder. The environment within the build chamber was carefully controlled, maintaining an argon atmosphere with oxygen levels below 300 ppm and continuous recirculation to remove metal vapor and condensate, ensuring a consistent build environment. A total of 16 blocks were printed with different laser power and laser speed combinations, as shown in Fig. 1. Each block is 10 mm × 10 mm × 5 mm with a layer thickness of 30 μm, resulting in 166 layers in each build. The scanning strategy comprised a raster scan to fill in the interior of the sample, followed by a frame scan conducted with the same power and speed as those of the raster scan. Details of the experiment can be found in Ref. [36].
Two pyrometers were used to record emission readings at 100 kHz. To ensure the accuracy of measurements, a calibration value of 1310.72 bit/mm was set for the scanner and pyrometers to cover coordinate values (x and y) in the range of $-2^{19}$ to $2^{19}$ bits covering a 400 × 400 mm area. The pyrometers were coaxial with the process laser, measuring thermal radiation from the melt pool. The scanner was also set up to take one measurement every 10 μs, which corresponds to every 10 μm of travel at a scanning speed of 1000 mm/s. These consistent setups and parameters help ensure repeatability in the data collection process. Point-by-point readings of thermal emissions directly at the melt pool were recorded. There are about 300,000 emission readings from each layer, varying from 240,864 to 397,391.
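The sampling arithmetic above can be checked with a few lines of code. The following is a minimal sketch using only the calibration value and sampling rate quoted in the text; the function and constant names are illustrative and do not come from the experiment's software.

```python
# Values quoted in the text; helper names are illustrative only.
BITS_PER_MM = 1310.72        # scanner calibration (bit/mm)
SAMPLING_RATE_HZ = 100_000   # pyrometer sampling rate (100 kHz)


def bits_to_mm(coord_bits: float) -> float:
    """Convert a scanner coordinate from bits to millimetres."""
    return coord_bits / BITS_PER_MM


def sample_spacing_um(scan_speed_mm_s: float) -> float:
    """Distance travelled by the laser between two consecutive readings (micrometres)."""
    sampling_interval_s = 1.0 / SAMPLING_RATE_HZ   # 10 microseconds at 100 kHz
    return scan_speed_mm_s * sampling_interval_s * 1000.0


# Spacing between readings for the four laser speeds used in the experiment.
for speed in (600, 800, 1000, 1200):
    print(f"{speed} mm/s -> {sample_spacing_um(speed):.1f} um between readings")
```

With this calibration, `bits_to_mm(2**19)` evaluates to 400 mm, and the spacing between readings scales linearly with scanning speed, so slower scans are sampled more densely in space.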
3 Method
3.1 Method Overview.
The proposed method consists of several stages to incorporate the understanding learned from physics, as illustrated in Fig. 2. This comprehensive approach ensures that the subsequent ML model is guided by a thorough understanding of the underlying physical process, enhancing both its interpretability and predictive performance. Raw emission data are cleaned using the pipeline built by Gawade et al. [27]. Cleaned data are then segmented into smaller pieces. Features are extracted from each segment of each layer to derive the additional informative dimensions on top of the original input space. Lastly, several LSTM structures are developed and compared, and their hyperparameters are tuned to find the optimal model for the best prediction performance. Results from different LSTM designs and feature combinations are compared. The emission predicted by our model can be used to track and monitor overheating issues, enabling future opportunities for process adjustment to optimally control the process.
3.2 Data Segmentation.
During sensing, noise points are generated when the beam stops printing but the pyrometer continues collecting data. Noise points are first removed from the raw data through several rounds of cleaning, including radius-based clustering, visual inspection, manual override, and automation [27]. Cleaned data are then aligned by rotating the coordinates 20 deg clockwise. More details about this data preprocessing pipeline are described in Ref. [27]. After preprocessing, all emission readings from each layer are organized in an array consisting of the emission reading at each (x, y) location, ordered by time.
Data segmentation is performed on the cleaned emission data to reduce the dataset size, in an attempt to balance computational burden and model performance while keeping the rich information of the pyrometer data intact. Figure 3 illustrates the data segmentation process using one of the layers from one of the blocks as an example. As shown in the top figure, layer l is one of the 166 layers printed to form the block, and the remaining layers were printed on top one after another. The middle portion of Fig. 3 displays the emission readings (about 370,000 readings) from block 1’s layer l, in printing order. The first section of the emission readings is shown in the bottom left subplot of Fig. 3. Segmentation is performed following the emission reading sequence layer by layer, with each section of equal size (1250 readings).
After segmentation, each section is represented by the average emission in that section, and it is mapped to a new space using the average (x, y) coordinates of the section as its new location, as shown in the bottom right subplot of Fig. 3. The segmentation represents each original layer by around 300 sections in the new space. The bottom right subplot of Fig. 3 is the heatmap of the average emission across the new space, with deeper colors indicating higher emission readings. The section size is selected based on the physical understanding that it should be sufficiently large to capture the spatial extent of potential localized overheating areas while being small enough to provide section-wise descriptive information and to effectively reduce the dimensionality of the data. The precise choice of section size, however, is not deterministic and may need to be adjusted based on specific print conditions, such as part geometry, material properties, or machine settings, as well as the quality of the available data. The section size of 1250 is chosen from preliminary experiments. With a section size of 125, overall training is about 6 times slower, while the model performance does not improve significantly. In this way, we are able to reduce the original data scale by around 1000 times. Segmentation facilitates finer granularity in understanding the thermal behavior within a layer, which is particularly valuable for detecting and analyzing localized anomalies such as overheating.
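The segmentation step described above can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function name, the tuple layout of a reading, and the choice to drop a trailing partial section are assumptions, and the layer data are synthetic placeholders.

```python
from statistics import fmean

SECTION_SIZE = 1250  # readings per section, as chosen in the text


def segment_layer(readings, section_size=SECTION_SIZE):
    """Split one layer's time-ordered readings into equal-size sections.

    `readings` is a list of (x, y, emission) tuples in printing order.
    Returns one (mean_x, mean_y, mean_emission) tuple per section.
    Assumption: a trailing partial section is dropped; the source does not
    say how leftover readings are handled.
    """
    sections = []
    n_full = len(readings) // section_size
    for k in range(n_full):
        chunk = readings[k * section_size:(k + 1) * section_size]
        xs, ys, es = zip(*chunk)
        sections.append((fmean(xs), fmean(ys), fmean(es)))
    return sections


# Synthetic layer with 370,000 readings (placeholder values, not real data).
layer = [(i % 100, i // 100, 1000.0 + (i % 7)) for i in range(370_000)]
sections = segment_layer(layer)
print(len(sections))  # 296 sections for 370,000 readings
```

A layer of about 370,000 readings thus collapses to roughly 300 section records, each carrying an average location and an average emission.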
3.3 Feature Extraction.
We propose to use both physical features and statistical features for the model inputs. Physical features refer to the predefined manufacturing process parameters before printing and the serial information generated during printing and data processing. Process parameters are inherently part of the L-PBF process, such as laser power, laser speed, energy density, and scanning phase. They are considered informative as inputs. These parameters directly influence the melt pool behavior and resultant part quality, serving as the primary descriptors of the process conditions, correlating with physical phenomena like energy absorption, melt pool dynamics, and layer-wise material deposition.
Moreover, sequential information such as layer number and section order are the features acquired in this specific dataset, and they are highly related to the order of printing. This information is intrinsically linked to the sequential and layer-wise nature of L-PBF. Layer number, for example, captures the cumulative thermal history and residual stress buildup, both of which are critical to the process outcome. We normalize the features into a common, comparable scale. This can help to improve the model’s performance. The physical features are listed in Table 2. The converted locations of the emission records in the new spaces are also included as key physical features.
Table 2 Summary of physical features
Feature | Description |
---|---|
Laser power (W) | [120, 150, 180, 210] |
Laser speed (mm/s) | [600, 800, 1000, 1200] |
Energy density (J/mm³) | Set to ensure sample density >99% |
Layer number | 1–166 for each block |
Scanning phase (deg) | Periodic angles by layer |
x | x coordinate value in the new space |
y | y coordinate value in the new space |
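The text states that the features are normalized to a common scale but does not name the method. As one possible choice, min-max scaling to [0, 1] is sketched below; the scaler and its name are assumptions made for illustration, not the authors' documented procedure.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Levels taken from the physical features table.
laser_power_w = [120, 150, 180, 210]
laser_speed_mm_s = [600, 800, 1000, 1200]

print(min_max_scale(laser_power_w))
print(min_max_scale(laser_speed_mm_s))
```

After scaling, laser power (hundreds of watts) and laser speed (around a thousand mm/s) occupy the same numeric range, so neither dominates the network inputs purely because of its units.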
Statistical features of each section of each layer are extracted and listed in Table 3. Extraction is performed on all the segmented sections of each layer. These features are selected for their ability to describe the emission distribution within a single section, which provides richer section-wise information than using only the average observed emission of each layer (as in the regression models in Ref. [27]). This information offers an instructive view of the localized thermal behavior and energy distribution. The features serve as a physics-guided means of capturing section-wise variations and potential anomalies in the thermal process, contributing to the early detection of overheating.
Table 3 Summary of statistical features
Feature | Description |
---|---|
Mean | Average emission value in each section |
Std | Standard deviation of emission values in each section |
Min | Minimum emission value in each section |
Max | Maximum emission value in each section |
q1 | 25th percentile of the emission values in each section |
q3 | 75th percentile of the emission values in each section |
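The six statistics listed above can be computed per section with the Python standard library. This is a minimal sketch: the function name, the use of the sample standard deviation, and the percentile interpolation method are assumptions, since the text does not specify them.

```python
from statistics import fmean, quantiles, stdev


def section_features(emissions):
    """Return the six per-section statistics for one section's emission values."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = quantiles(emissions, n=4, method="inclusive")
    return {
        "mean": fmean(emissions),
        "std": stdev(emissions),   # sample standard deviation (assumption)
        "min": min(emissions),
        "max": max(emissions),
        "q1": q1,
        "q3": q3,
    }


# Placeholder section of 1250 readings (synthetic values, not real data).
example = [float(v) for v in range(1, 1251)]
features = section_features(example)
print(features)
```

Each section of 1250 raw readings is thereby summarized by six numbers, which keeps information about spread and extremes that a layer-wise average would discard.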
3.4 LSTM Model.
In order to predict emission for the next section or next layer using information from previous sections and/or previous layers, a model that is capable of handling sequential data is needed. Traditional modeling approaches that assume independent and identically distributed data may not be suitable for this type of dataset. LSTM is a popular recurrent neural network model in the time-series domain. Figure 4 presents the original internal architecture of LSTMs [37], which can be broken down into several gate functions. These gates work together to take care of the long-term dependencies between the current layer’s status and the previous ones. It has been found that previous layers’ emission values are correlated to the current layer’s because of the predefined physical features and the heat transfer characteristic across layers [27].
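For reference, the gate functions of a commonly used LSTM formulation (with input, forget, and output gates) can be written as below. The symbols are assumptions introduced here for illustration and may differ from the notation in Fig. 4: $x_t$ is the input at step $t$, $h_t$ the hidden state, $c_t$ the cell state, $W$, $U$, and $b$ the learned weights and biases, $\sigma$ the logistic sigmoid, and $\odot$ element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

The cell state $c_t$ is updated additively, which is what lets information from earlier sections or layers persist across many steps.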
Two network structures are proposed in this paper, as shown in Fig. 5. Design 1 has three layers: an input layer, an LSTM layer, and an output layer. This is a standard LSTM network layout, and it is usually considered to have a good performance in other studies, as explained in Ref. [11]. The model is designed to make one-step-ahead predictions using the previous multiple steps of records. The input taken at section i, $z_i \in \mathbb{R}^{n \times d}$, contains the previous n sections’ information, including all the d = 14 physical and statistical features listed in Tables 2 and 3, and it is used to predict $y_{i+1} \in \mathbb{R}^{1 \times s}$, which is a vector of all s = 6 statistical features in the next section i + 1. A sliding-window approach is used to construct training data. The amount of overlap between consecutive windows is n − 1 as the window moves along the data sequence.
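The sliding-window construction can be sketched as follows. The helper `make_windows` and the toy data are hypothetical; the real inputs have d = 14 features per section and s = 6 targets.

```python
def make_windows(features, targets, n):
    """Pair each window of n consecutive sections with the next section's target."""
    X, y = [], []
    for end in range(n, len(features)):
        X.append(features[end - n:end])  # sections end-n .. end-1, shape n x d
        y.append(targets[end])           # statistical features of section `end`
    return X, y

# Hypothetical toy sequence: 6 sections, d = 2 input features, s = 1 target
features = [[k, 10 * k] for k in range(6)]
targets = [[100 + k] for k in range(6)]
X, y = make_windows(features, targets, n=3)
```

Consecutive windows share n − 1 sections, so a sequence of N sections yields N − n training pairs.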
As plotted in Fig. 5(b), design 2 has two additional LSTM layers. Between consecutive LSTM layers, dropout regularization is applied to randomly ignore certain nodes during training, which helps mitigate overfitting. The second design is constructed for comparison with the baseline model. In some practices, structures with an increased number of LSTM layers yield good performance on large datasets.
3.5 Hyperparameter Tuning.
Hyperparameter tuning is important because no single parameter combination has been found to be superior to all others for all datasets. We consider a range of values for several selected hyperparameters to find the optimal model. There are many existing hyperparameter optimization methods, such as Bayesian optimization and grid search. Given the huge amount of data and the computational burden in this study, we follow the traditional tuning strategy [38] of tuning each hyperparameter one at a time while fixing the other hyperparameters. However, due to the large number of hyperparameters and combinations involved, a brute-force search is inefficient. We therefore modify the strategy to improve tuning efficiency: we focus on selected hyperparameters, and if a hyperparameter barely affects the experiment result, we stop tuning it in future iterations. We adopt the widely accepted Adam optimizer, which many other studies have found to be effective. Specifically, the Adam optimizer has shown high tolerance to the values of the other hyperparameters, thereby facilitating the tuning process.
Weight initialization is critical in deep learning because it influences both training speed and final model performance. The “He initialization” approach is adopted in this study. It sets the initial random weights of the layers in a way that helps overcome the vanishing/exploding gradients problem. It is designed for layers that use ReLU (or variants of ReLU) activation functions, and it aims to maintain a controlled and normalized variance of the activations and back-propagated gradients through the network layers [39]. This contributes to more efficient learning and better performance of deep neural networks. Although different initializations may lead to different local minima after training, effective methods such as He initialization, together with ample training epochs, typically result in stable, reliable neural network performance.
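For reference, the normal variant of He initialization draws each weight from a zero-mean Gaussian with variance 2/fan_in, where fan_in is the number of inputs to the layer. The sketch below illustrates this rule in plain Python; the helper name `he_normal` and the layer sizes are illustrative assumptions, and deep learning frameworks provide their own implementations.

```python
import math
import random

def he_normal(fan_in, fan_out, rng):
    """Draw a fan_out x fan_in weight matrix from N(0, 2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

rng = random.Random(0)  # fixed seed for reproducibility
# Hypothetical layer: 14 inputs feeding 6 units
W = he_normal(fan_in=14, fan_out=6, rng=rng)
```

Scaling the variance by 2/fan_in compensates for ReLU-type activations zeroing roughly half of their inputs, which is how the variance of the activations is kept approximately constant from layer to layer.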
The hyperparameter tuning process is summarized in Fig. 6. Considering both model performance and training efficiency, we start by tuning the computationally expensive hyperparameters on the relatively simple LSTM structure (design 1) and a smaller training dataset (block 1 data only), as shown in Fig. 6(a). Specifically, we note that the timestamp length is important due to the transitory period of heating at the beginning and the end of a track [1], while datasets with a larger timestamp require significantly longer training periods. To capture this characteristic, we consider multiples of 5 for the timestamp and find the best value from τ ∈ {5, 10, 15, 20, 100, 150} in step 1.1. Given that there are approximately 300 sections in one layer, the choice of timestamp can be seen as the proportion of a layer whose previous information is included. Similarly, both batch size and hidden units affect model training speed. Step 1.2 uses the best timestamp from step 1.1 and finds the best number of hidden units from hu ∈ {4, 6, 10, 12}. Step 1.3 then uses the best timestamp together with the best number of hidden units to find the best batch size from bs ∈ {8, 16, 32, 64}. In this way, step 1 of the tuning process tunes the computationally expensive hyperparameters one at a time using the relatively simple structure of design 1 with block 1 data to help save model training time.
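The one-at-a-time procedure of step 1 can be sketched as a sequential search in which each hyperparameter is fixed at its best value before the next one is tuned. In this sketch, `tune_one_at_a_time`, the default values, and especially `toy_score` are hypothetical: the scoring function is a contrived stand-in for training design 1 on block 1 data and returning a test error, and it does not reproduce any result from this study. The selection here also uses the score alone, whereas the choices reported in this paper additionally weigh training time.

```python
def tune_one_at_a_time(grids, defaults, evaluate):
    """Tune hyperparameters sequentially, fixing each at its best value in turn."""
    best = dict(defaults)
    for name, candidates in grids.items():  # dict order sets the tuning order
        scores = {}
        for value in candidates:
            trial = {**best, name: value}
            scores[value] = evaluate(trial)  # lower score is better
        best[name] = min(scores, key=scores.get)
    return best

grids = {
    "timestamp": [5, 10, 15, 20, 100, 150],  # step 1.1
    "hidden_units": [4, 6, 10, 12],          # step 1.2
    "batch_size": [8, 16, 32, 64],           # step 1.3
}
defaults = {"timestamp": 5, "hidden_units": 4, "batch_size": 8}

def toy_score(cfg):
    # Contrived placeholder for "train the model and return its test error"
    return (abs(cfg["timestamp"] - 10)
            + abs(cfg["hidden_units"] - 6)
            + abs(cfg["batch_size"] - 32))

best = tune_one_at_a_time(grids, defaults, toy_score)
```

This search evaluates 6 + 4 + 4 = 14 configurations instead of the 6 × 4 × 4 = 96 required by a full grid, which is the efficiency argument for tuning the expensive hyperparameters one at a time.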

Flowchart of hyperparameter tuning: (a) tuning the computationally expensive hyperparameters, (b) tuning the computationally efficient hyperparameters, and (c) notations
After determining the optimal values for these computationally expensive hyperparameters (timestamp, hidden units, and batch size), we move on to tune the computationally efficient hyperparameters in step 2, as shown in Fig. 6(b). Since these parameters are relatively efficient to tune, we implement a full factorial design involving all of them and tune them using the more complicated structure of design 2 with the entire dataset (all 16 blocks together). The full factorial design involves four hyperparameters: activation function af ∈ {ReLU, SeLU, PReLU}, learning rate lr ∈ {0.01, 0.02, 0.05}, dropout rate dr ∈ {0.2, 0.5}, and loss function lf ∈ {RMSE, MAE}, totaling 3 × 3 × 2 × 2 = 36 combinations.
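The 36 combinations of the full factorial design can be enumerated with `itertools.product`, as in the short sketch below. The dictionary keys follow the notation used in this section (af, lr, dr, lf); the variable names are otherwise illustrative.

```python
from itertools import product

grid = {
    "af": ["ReLU", "SeLU", "PReLU"],  # activation function
    "lr": [0.01, 0.02, 0.05],         # learning rate
    "dr": [0.2, 0.5],                 # dropout rate
    "lf": ["RMSE", "MAE"],            # loss function
}

# One dict per combination, e.g. {"af": "ReLU", "lr": 0.01, "dr": 0.2, "lf": "RMSE"}
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]
```

Each entry of `combinations` corresponds to one training run of design 2 on the entire dataset.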
In addition to tuning these hyperparameters in step 2, we consider three different data splitting strategies (“fixed,” “by dataset,” and “by layer”) for creating the train/test split. For each strategy, to maintain the continuity of the sequence, we set a minimum cutting length lc as a multiple of the chosen timestamp parameter, which minimizes the loss of sequential information during splitting. The “fixed” strategy splits at the block level in a fixed 80/20 proportion, meaning that the first 80% of the data in each block are used for training and the last 20% in that block are used for testing. The “by dataset” strategy adds more randomness to the splitting by ignoring the block information: it cuts the entire dataset into sequences of length lc and randomly picks 80% of them to form the training set, leaving the rest as the test set. The “by layer” strategy applies a similar random split, but it cuts the sequences within each layer of each block and then forms the 80/20 train/test sets. This yields a more balanced dataset while retaining the advantage of randomness.
The data splitting strategy also derives its basis from the physics of L-PBF. The L-PBF process operates in a layer-wise fashion, and the characteristics of each layer can depend on the previous layers due to residual heat and evolving material properties. Splitting the data by layers and blocks while maintaining their temporal sequence therefore preserves the inherent dependencies and temporal patterns in the L-PBF process data. Moreover, this strategy aligns with the practical operation of L-PBF, where the build platform is typically sectioned into separate blocks, each containing multiple layers, so that the training and test sets are representative of actual operational scenarios. Hence, it can serve as another key aspect of the proposed physics-guided approach, ensuring that the temporal and spatial dependencies inherent to L-PBF are properly accounted for in the model training and evaluation stages. By combining the two-step hyperparameter tuning process with the data splitting strategies, we improve the efficiency of the tuning process while ensuring reasonably good model performance on the larger dataset.
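The three splitting strategies can be sketched on section indices as follows. This is an illustrative reading of the description above, not the exact implementation used in this study: the helper names, the toy layer sizes, and the cutting length lc = 10 are all assumptions.

```python
import random

def chunk(indices, lc):
    """Cut a sequence of indices into consecutive pieces of length lc."""
    return [indices[k:k + lc] for k in range(0, len(indices), lc)]

def split_fixed(block, train_frac=0.8):
    """'Fixed': first 80% of a block for training, last 20% for testing."""
    cut = int(len(block) * train_frac)
    return block[:cut], block[cut:]

def split_random_chunks(groups, lc, rng, train_frac=0.8):
    """Randomly assign lc-length chunks to train/test within each group."""
    train, test = [], []
    for group in groups:
        pieces = chunk(group, lc)
        rng.shuffle(pieces)
        cut = int(round(len(pieces) * train_frac))
        train.extend(pieces[:cut])
        test.extend(pieces[cut:])
    return train, test

rng = random.Random(0)
# Two hypothetical layers of 100 sections each
layers = [list(range(0, 100)), list(range(100, 200))]
whole = [layers[0] + layers[1]]

# "By layer": every layer contributes its own 80/20 split
by_layer_train, by_layer_test = split_random_chunks(layers, lc=10, rng=rng)
# "By dataset": one random 80/20 split over all chunks, ignoring layer/block
by_dataset_train, by_dataset_test = split_random_chunks(whole, lc=10, rng=rng)
fixed_train, fixed_test = split_fixed(layers[0])
```

Because the “by layer” variant samples within each layer, every layer is represented in both the training and test sets in the same proportion, whereas the “by dataset” variant may over- or under-sample individual layers. In both cases each chunk stays contiguous, so windows built inside a chunk keep their temporal order.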
4 Results and Discussion
4.1 Comparing With Benchmark Models.
To demonstrate the effectiveness of the proposed physics-guided LSTM model, its performance is compared with other state-of-the-art approaches, including CNNs, one-dimensional CNNs (1DCNN), multilayer perceptron (MLP), and bidirectional LSTM (Bi-LSTM). CNN, primarily designed for processing grid-like data such as an image, is known for its capacity to extract high-level features from local, fixed-sized patches, maintaining spatial relationships between different parts of the data. One-dimensional CNN, a variant of CNN, has been widely applied to sequential data due to its strength in capturing temporal dependencies within local input patches. MLP, on the other hand, is a type of artificial neural network composed of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Despite its simplicity, MLP has proven effective across a range of applications. Finally, Bi-LSTM, an extension of the traditional LSTM, operates on an input sequence in both forward and backward directions, offering advantages in applications where sequential information is crucial [40].
Two comparative analyses of various baseline models are conducted. Considering computational time, a set of default hyperparameters λc = {τ = 10, hu = 6, bs = 32, af = ReLU, lr = 0.005, dr = 0.5, lf | lf ∈ {RMSE, MAE, MAPE}} is employed for each model. The first experiment compares these model performances on relatively simple structures with preprocessed data from block 1. Here, the efficacy of each model in handling data of lower complexity is assessed. The second experiment constructs all the models as a three-layer structure and uses all available data for training. Although this approach may not yield the absolute best performance from each model, it provides a useful comparison in terms of understanding each model’s capability.
The performance metrics used are those typical in regression tasks: the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE). In this study, the RMSE is defined as the standard deviation of the prediction residuals of the section-wise average emission. MAE is the mean of the absolute values of the prediction residuals for the section-wise average emission, and MAPE measures the size of the residuals relative to the observed values. MAE is expressed on the same scale as the emission data, whereas MAPE is scale-free, which makes both easy to interpret. For all three metrics, lower values indicate better model performance.
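The three metrics can be written out as below using their textbook definitions. Two points are assumptions rather than statements from this paper: the RMSE here is the root of the mean squared residual, which coincides with the standard deviation of the residuals only when their mean is zero, and MAPE is returned as a fraction rather than a percentage.

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    # Returned as a fraction; requires that no true value is zero
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical section-wise average emissions and predictions
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
scores = {"RMSE": rmse(y_true, y_pred),
          "MAE": mae(y_true, y_pred),
          "MAPE": mape(y_true, y_pred)}
```

In this toy example the RMSE exceeds the MAE because squaring penalizes the larger residual more heavily, which is why the two metrics can rank models differently.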
Tables 4 and 5 present the model performance from the two experiments, along with the training time. LSTM demonstrates competitive performance in comparison with the other state-of-the-art models in both experimental settings. In Table 4, LSTM achieves the smallest RMSE and MAE on the training set (tied with Bi-LSTM). As for the test performance, LSTM has the second smallest RMSE and MAE after 1DCNN, suggesting a commendable generalization ability to unseen data. Moreover, the MAPE for LSTM is among the smallest on both the training and testing sets. The second experimental setting provides further evidence for the effectiveness of LSTM. Although the RMSE and MAE of LSTM on the training set are not the smallest, its performance on the testing set is among the best, essentially matching Bi-LSTM and improving on 1DCNN, CNN, and MLP. This indicates LSTM’s robustness when dealing with more complex structures and larger datasets. Bi-LSTM also shows strong performance, and in both experimental settings it is closely competitive with LSTM: its test errors are slightly higher than those of LSTM in Table 4 and marginally lower in Table 5. A two-sample t-test on the difference between the LSTM and Bi-LSTM models yielded a p-value greater than 0.05, indicating that there is no statistically significant difference in performance between the two models based on the chosen evaluation metric. While Bi-LSTM’s ability to utilize information from both past and future contexts could be beneficial in some applications, the L-PBF process predominantly relies on the history of the manufacturing process to predict future states, making LSTM’s unidirectional processing more suitable. Furthermore, Bi-LSTM models are generally more complex and computationally intensive due to their bidirectional nature, as reflected in their longer training times in Tables 4 and 5, which could be a disadvantage in scenarios where computational efficiency is crucial. In conclusion, considering its consistently competitive performance on both simple and complex structures and its lower training cost, LSTM is selected as the basis for the proposed physics-guided approach in this study.
Performance comparison of single-layer structure models on block 1 data
Model | RMSE (train) | RMSE (test) | MAE (train) | MAE (test) | MAPE (train) | MAPE (test) | Time (min) |
---|---|---|---|---|---|---|---|
LSTM | 7.94 | 18.70 | 2.32 | 3.73 | 0.08 | 0.13 | 2 |
1DCNN | 9.13 | 18.16 | 2.60 | 3.69 | 0.09 | 0.13 | 2 |
CNN | 8.14 | 19.09 | 2.37 | 3.76 | 0.08 | 0.13 | 4 |
MLP | 20.42 | 24.54 | 3.65 | 4.35 | 0.13 | 0.15 | 1 |
Bi-LSTM | 7.94 | 19.37 | 2.32 | 3.77 | 0.08 | 0.13 | 3 |
Performance comparison of three-layer structure models across all data
Model | RMSE (train) | RMSE (test) | MAE (train) | MAE (test) | MAPE (train) | MAPE (test) | Time (min) |
---|---|---|---|---|---|---|---|
LSTM | 8.19 | 11.49 | 2.30 | 2.75 | 0.08 | 0.10 | 74 |
1DCNN | 7.35 | 13.79 | 2.16 | 3.06 | 0.08 | 0.11 | 195 |
CNN | 9.75 | 12.94 | 2.53 | 2.91 | 0.09 | 0.10 | 194 |
MLP | 9.68 | 13.18 | 2.64 | 2.96 | 0.09 | 0.10 | 18 |
Bi-LSTM | 8.24 | 11.46 | 2.31 | 2.74 | 0.08 | 0.10 | 125 |
4.2 LSTM Model Performance.
The same performance metrics are considered: RMSE, MAE, and MAPE. Table 6 lists the results from the step 1.1 tuning on timestamp τ. Both RMSE and MAE test scores for timestamps 5, 10, 15, and 20 are very close, while the scores for 100 and 150 show a slight improvement. However, training with a longer timestamp incurs significant computational demands, and overfitting issues persist despite the score improvements. We therefore decide to use τ = 10.
Summary of step 1.1 tuning on timestamp based on design 1 model and block 1 data
Timestamp | RMSE (train) | RMSE (test) | MAE (train) | MAE (test) |
---|---|---|---|---|
5 | 3.28 | 17.30 | 1.56 | 3.55 |
10 | 3.39 | 17.35 | 1.59 | 3.55 |
15 | 3.24 | 17.36 | 1.54 | 3.56 |
20 | 3.68 | 16.59 | 1.64 | 3.51 |
100 | 11.03 | 15.88 | 2.90 | 3.49 |
150 | 10.20 | 13.61 | 2.66 | 3.23 |
Table 7 lists the results from the step 1.2 tuning on hidden units hu. The model has the best test performance when hu = 6. Moreover, the selected values for the number of hidden units do not seem to affect the model performance significantly, while the computational cost increases as hu increases. Therefore, we decide to use hu = 6.
Summary of step 1.2 tuning on hidden units based on design 1 model and block 1 data
Hidden units | RMSE (train) | RMSE (test) | MAE (train) | MAE (test) |
---|---|---|---|---|
4 | 3.47 | 17.24 | 1.59 | 3.56 |
6 | 3.45 | 16.68 | 1.58 | 3.51 |
10 | 3.38 | 17.29 | 1.58 | 3.55 |
12 | 3.16 | 18.03 | 1.51 | 3.62 |
Table 8 lists the results from the step 1.3 tuning on batch size bs. There is not much difference in performance between different batch sizes, but the model training time can vary significantly. Training a model with a batch size of eight requires almost four times the time needed for training a model with a batch size of 32. It can be concluded that a larger batch size takes less time to train. Therefore, we decide to use bs = 32.
Summary of step 1.3 tuning on batch size based on design 1 model and block 1 data
Batch size | RMSE (train) | RMSE (test) | MAE (train) | MAE (test) |
---|---|---|---|---|
8 | 11.62 | 13.05 | 2.83 | 3.08 |
16 | 3.38 | 17.56 | 1.58 | 3.59 |
32 | 3.39 | 17.30 | 1.59 | 3.57 |
64 | 3.90 | 17.82 | 1.75 | 3.63 |
Since the hyperparameters are determined from design 1 but are to be used in the more complicated LSTM structure of design 2, we need to confirm that these values are indeed suitable for design 2. Take timestamp as an example. Table 9 shows the results of training design 2 with different timestamps using block 1’s data. Despite the overfitting issue due to the small dataset (block 1 data only), we still recommend τ = 10 for step 2 to balance training time and model performance.
Summary of tuning on timestamp based on design 2 model and block 1 data
Timestamp | RMSE (train) | RMSE (test) | MAE (train) | MAE (test) |
---|---|---|---|---|
5 | 3.59 | 20.96 | 1.60 | 3.99 |
10 | 3.73 | 20.36 | 1.63 | 3.90 |
15 | 3.30 | 22.20 | 1.55 | 4.10 |
20 | 3.46 | 20.79 | 1.57 | 3.95 |
100 | 3.47 | 21.30 | 1.59 | 3.97 |
150 | 3.34 | 21.83 | 1.56 | 4.05 |
Selected results from the step 2 tuning on design 2 and the entire dataset are shown in Table 10 and Fig. 7. The full factorial design based on λ3 = {τ = 10, hu = 6, bs = 32, af, lr, dr, lf | af ∈ {ReLU, SeLU, PReLU}, lr ∈ {0.01, 0.02, 0.05}, dr ∈ {0.2, 0.5}, lf ∈ {RMSE, MAE}} involves a total of 3 × 3 × 2 × 2 = 36 combinations, with three different data splitting strategies at each combination. To focus on the hyperparameters and data splitting strategies, Table 10 shows the results from using PReLU only. Comparing the RMSE values shows that overfitting is reduced when training on the entire dataset, and increasing the dropout rate from 0.2 to 0.5 can also significantly reduce overfitting. Training batch size does slightly affect the model performance, but given the sharply increased training time with smaller batch sizes and the insignificant improvement in performance, we prefer a training batch size of 32. In this way, design 2 can be trained in less than 2 h on a desktop PC with a 12th-generation Intel i7 CPU and an Nvidia RTX 3060 Ti GPU. Different data splitting strategies influence the training performance, as demonstrated in Table 10. Compared with the “fixed” splitting strategy, the other two strategies appear to have a positive impact on mitigating overfitting.
Summary of step 2 tuning on data splitting strategy and the computationally efficient hyperparameters based on design 2 model and entire dataset
Experiment index | Data splitting strategy | Training batch | Hidden units | Activation function | Dropout rate | Loss function | Learning rate | RMSE (train) | RMSE (test) | MAE (train) | MAE (test) | MAPE (train) | MAPE (test) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | By layer | 32 | 6 | PReLU | 0.5 | RMSE | 0.01 | 4.09 | 9.07 | 1.66 | 2.37 | 0.06 | 0.08 |
2 | By layer | 32 | 6 | PReLU | 0.5 | RMSE | 0.02 | 4.25 | 8.69 | 1.81 | 2.29 | 0.06 | 0.08 |
3 | By layer | 32 | 6 | PReLU | 0.5 | RMSE | 0.05 | 6.44 | 10.29 | 1.91 | 2.47 | 0.07 | 0.09 |
4 | By layer | 32 | 6 | PReLU | 0.2 | RMSE | 0.01 | 3.62 | 9.10 | 1.62 | 2.33 | 0.06 | 0.08 |
5 | By layer | 32 | 6 | PReLU | 0.2 | RMSE | 0.02 | 4.27 | 10.21 | 1.63 | 2.56 | 0.06 | 0.09 |
6 | By layer | 32 | 6 | PReLU | 0.2 | RMSE | 0.05 | 6.12 | 9.34 | 2.00 | 2.35 | 0.07 | 0.08 |
7 | By layer | 32 | 6 | PReLU | 0.5 | MAE | 0.01 | 3.71 | 8.95 | 1.61 | 2.30 | 0.06 | 0.08 |
8 | By layer | 32 | 6 | PReLU | 0.5 | MAE | 0.02 | 4.28 | 8.99 | 1.74 | 2.28 | 0.06 | 0.08 |
9 | By layer | 32 | 6 | PReLU | 0.5 | MAE | 0.05 | 6.15 | 9.64 | 1.91 | 2.39 | 0.07 | 0.08 |
10 | By layer | 32 | 6 | PReLU | 0.2 | MAE | 0.01 | 3.72 | 8.82 | 1.67 | 2.29 | 0.06 | 0.08 |
11 | By layer | 32 | 6 | PReLU | 0.2 | MAE | 0.02 | 3.88 | 9.75 | 1.63 | 2.45 | 0.06 | 0.08 |
12 | By layer | 32 | 6 | PReLU | 0.2 | MAE | 0.05 | 6.71 | 10.61 | 1.95 | 2.57 | 0.07 | 0.09 |
13 | By dataset | 32 | 6 | PReLU | 0.5 | RMSE | 0.01 | 5.48 | 9.33 | 2.12 | 2.41 | 0.07 | 0.08 |
14 | By dataset | 32 | 6 | PReLU | 0.5 | RMSE | 0.02 | 6.14 | 9.81 | 1.92 | 2.38 | 0.07 | 0.08 |
15 | By dataset | 32 | 6 | PReLU | 0.5 | RMSE | 0.05 | 7.38 | 11.63 | 2.06 | 2.73 | 0.07 | 0.09 |
16 | By dataset | 32 | 6 | PReLU | 0.2 | RMSE | 0.01 | 5.42 | 9.11 | 1.92 | 2.28 | 0.07 | 0.08 |
17 | By dataset | 32 | 6 | PReLU | 0.2 | RMSE | 0.02 | 8.03 | 9.17 | 2.62 | 2.61 | 0.09 | 0.09 |
18 | By dataset | 32 | 6 | PReLU | 0.2 | RMSE | 0.05 | 6.44 | 10.34 | 1.92 | 2.45 | 0.07 | 0.08 |
19 | By dataset | 32 | 6 | PReLU | 0.5 | MAE | 0.01 | 4.32 | 9.22 | 1.64 | 2.31 | 0.06 | 0.08 |
20 | By dataset | 32 | 6 | PReLU | 0.5 | MAE | 0.02 | 7.57 | 10.69 | 2.15 | 2.50 | 0.07 | 0.09 |
21 | By dataset | 32 | 6 | PReLU | 0.5 | MAE | 0.05 | 6.44 | 10.28 | 1.91 | 2.43 | 0.07 | 0.08 |
22 | By dataset | 32 | 6 | PReLU | 0.2 | MAE | 0.01 | 4.33 | 9.85 | 1.60 | 2.43 | 0.06 | 0.08 |
23 | By dataset | 32 | 6 | PReLU | 0.2 | MAE | 0.02 | 6.12 | 8.98 | 2.09 | 2.31 | 0.07 | 0.08 |
24 | By dataset | 32 | 6 | PReLU | 0.2 | MAE | 0.05 | 5.35 | 9.62 | 1.73 | 2.34 | 0.06 | 0.08 |
25 | Fixed | 32 | 6 | PReLU | 0.5 | RMSE | 0.01 | 4.53 | 10.43 | 1.94 | 2.48 | 0.07 | 0.09 |
26 | Fixed | 32 | 6 | PReLU | 0.5 | RMSE | 0.02 | 7.01 | 11.16 | 2.01 | 2.58 | 0.07 | 0.09 |
27 | Fixed | 32 | 6 | PReLU | 0.5 | RMSE | 0.05 | 6.80 | 10.32 | 2.11 | 2.47 | 0.07 | 0.09 |
28 | Fixed | 32 | 6 | PReLU | 0.2 | RMSE | 0.01 | 6.90 | 9.67 | 2.51 | 2.60 | 0.09 | 0.09 |
29 | Fixed | 32 | 6 | PReLU | 0.2 | RMSE | 0.02 | 6.92 | 10.04 | 2.20 | 2.47 | 0.08 | 0.09 |
30 | Fixed | 32 | 6 | PReLU | 0.2 | RMSE | 0.05 | 8.26 | 9.79 | 2.63 | 2.64 | 0.09 | 0.09 |
31 | Fixed | 32 | 6 | PReLU | 0.5 | MAE | 0.01 | 3.46 | 11.08 | 1.51 | 2.58 | 0.05 | 0.09 |
32 | Fixed | 32 | 6 | PReLU | 0.5 | MAE | 0.02 | 7.10 | 11.34 | 2.01 | 2.61 | 0.07 | 0.09 |
33 | Fixed | 32 | 6 | PReLU | 0.5 | MAE | 0.05 | 6.98 | 11.10 | 2.01 | 2.57 | 0.07 | 0.09 |
34 | Fixed | 32 | 6 | PReLU | 0.2 | MAE | 0.01 | 2.72 | 11.12 | 1.40 | 2.60 | 0.05 | 0.09 |
35 | Fixed | 32 | 6 | PReLU | 0.2 | MAE | 0.02 | 6.84 | 10.17 | 2.16 | 2.47 | 0.07 | 0.09 |
36 | Fixed | 32 | 6 | PReLU | 0.2 | MAE | 0.05 | 11.38 | 16.14 | 3.02 | 3.59 | 0.11 | 0.12 |
As for the activation function af ∈ {ReLU, SeLU, PReLU}, it is found that SeLU performs worse than the other two based on the RMSE and MAE scores, whereas ReLU and PReLU perform similarly. Figure 7 compares the RMSE and MAE scores from using activation functions ReLU and PReLU, respectively. Out of the 36 cases, PReLU performed better than ReLU in 19 cases, yet the values are very close. The mean and median RMSE for PReLU are 10.102 and 9.830, whereas the mean and median RMSE for ReLU are 10.078 and 9.820. The mean and median MAE values for PReLU are 2.486 and 2.460, whereas the mean and median MAE for ReLU are 2.488 and 2.460, respectively. Therefore, we recommend the use of PReLU as the activation function since it outperforms ReLU in more than half of the 36 cases.
The learning rate shows a correlation with performance: in general, the smaller learning rates of 0.01 and 0.02 achieve better performance than 0.05 in this experiment. We also tried different loss functions for the same set of hyperparameters, as can be seen in Table 10. Previous studies have shown that the loss function is not the key factor, and the results of our own experiment agree with this observation.
Results from the above comparisons provide several key insights. Values for the computationally expensive parameters were chosen to be {τ = 10, hu = 6, bs = 32} based on a consideration of both model performance and computational efficiency. We observed that increasing the learning rate did not always result in better performance and that a higher dropout rate helped prevent overfitting. Our experiments showed that using PReLU as the activation function resulted in a more stable training process compared with the other activation functions tested. While we found that the loss function did not have a significant impact on the model performance, we chose RMSE for our purposes. Therefore, a set of hyperparameters with the “by layer” data splitting strategy is selected and recommended.
In summary, our study underscores the crucial role of thoughtful hyperparameter selection in achieving optimal performance in LSTM models. To further validate our results, we used the same set of hyperparameters to train the model in three additional scenarios: design 1, design 2 with statistical features only, and design 2 with physical features only. As presented in Table 11, design 2, with its more complex structure, outperforms design 1 on this larger dataset across all three performance metrics. Furthermore, the comparison based on partial features of the input dataset emphasizes the significance of the physical features: design 2 trained with physical features only achieves lower test errors than design 2 trained with statistical features only. Combining both physical and statistical features yields the lowest test errors. These observations highlight the significance of the interplay between model complexity and the physical characteristics of the input dataset when constructing LSTM models.
Model effectiveness comparison
Model | RMSE (train) | RMSE (test) | MAE (train) | MAE (test) | MAPE (train) | MAPE (test) |
---|---|---|---|---|---|---|
Design 1 | 9.03 | 14.89 | 2.46 | 3.15 | 0.10 | 0.12 |
Design 2 with statistical features only | 7.78 | 12.42 | 2.13 | 2.82 | 0.07 | 0.10 |
Design 2 with physical features only | 4.35 | 10.31 | 1.67 | 2.58 | 0.06 | 0.09 |
Design 2 | 4.25 | 8.69 | 1.81 | 2.29 | 0.06 | 0.08 |
Quantifying overheating is a crucial aspect of this work. However, the current model focuses on predicting the thermal characteristics within a section, not on defining or determining an overheating threshold. A potential solution is to integrate the model with a threshold determination mechanism in the future, linking the model’s predicted thermal characteristics to an adaptively defined threshold. For example, a control chart could be used to establish the upper control limit (UCL). The initial thresholds could first be set by the researcher and then refined over time using a moving-average technique such as the exponentially weighted moving average (EWMA). The statistical features predicted by the LSTM model for the next section could be treated as a new observation. The EWMA is computed from this prediction and the previous EWMA values and then compared with the UCL; if it exceeds the UCL, overheating is likely in the next section. Because the EWMA control chart by definition gives more weight to recent sections, it is also well suited to the time-series nature of emission observations and to the segmented sequence of data.
With this method, small section-wise shifts can be monitored continuously, which enables early detection of potential overheating and allows prompt corrective adjustments to process parameters.
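The EWMA check described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the standard one-sided EWMA chart with a time-varying upper limit, and the smoothing weight `lam`, the limit width `L`, the in-control mean `mu0`, the in-control standard deviation `sigma`, and the sample values are all assumptions chosen for illustration.

```python
import math

def ewma_overheat_flags(observations, mu0, sigma, lam=0.2, L=3.0):
    """Run a one-sided (upper) EWMA control chart over section-wise values.

    observations: predicted or observed statistical feature per section.
    mu0, sigma:   in-control mean and standard deviation (assumed known,
                  e.g. estimated from baseline sections).
    lam:          smoothing weight in (0, 1]; larger values weight recent
                  sections more heavily (assumed value).
    L:            width of the control limit in standard deviations
                  (assumed value).
    Returns a list of (ewma, ucl, exceeds_ucl) tuples, one per section.
    """
    results = []
    z = mu0  # the EWMA is initialized at the in-control mean
    for t, x in enumerate(observations, start=1):
        # Update the EWMA with the newest section value.
        z = lam * x + (1.0 - lam) * z
        # Time-varying upper control limit of the standard EWMA chart.
        width = L * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        )
        ucl = mu0 + width
        results.append((z, ucl, z > ucl))
    return results

# Hypothetical section-wise values: stable at first, then a sustained upward shift.
sections = [100.0] * 5 + [130.0] * 5
chart = ewma_overheat_flags(sections, mu0=100.0, sigma=5.0)
flags = [exceeds for _, _, exceeds in chart]
```

In a predictive setting, the LSTM output for the next section would be appended as the newest observation, so a flag would be raised before that section is printed.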
5 Conclusion
Data from high-speed coaxial pyrometry during the laser-based powder bed fusion process provide opportunities for in situ monitoring and overheating prognostics. In this paper, we develop a physics-guided long short-term memory network to predict section-wise thermal characteristics in the L-PBF process, based on process parameters and the time-series nature of emission history. Incorporating physics-based features improves the model’s explainability and performance. The method offers an advantage over traditional layer-wise approaches by providing finer granularity in process monitoring. In traditional L-PBF processes, parameters are usually set before printing begins and remain constant throughout. Analysis typically occurs on a layer-by-layer basis: a whole layer is completed before it is analyzed, and any adjustments to process parameters, such as laser power or speed, are made before printing the next layer. This approach does not allow modifications during the printing of a single layer, so a problem that arises in a specific section of the layer cannot be addressed until the entire layer has been completed. In contrast, our section-wise approach divides each layer into smaller sections, allowing more detailed observations during printing and providing the opportunity to adjust process parameters within a single layer. We compare the prediction performance of various combinations of hyperparameters and data splitting strategies. Results demonstrate the feasibility and effectiveness of the proposed model.
There are several directions for future research. First, real-time adjustment of process parameters during the L-PBF process can be explored. Given our section-wise analysis approach, such adjustments could potentially enhance the print quality and efficiency of the L-PBF process by mitigating issues like overheating as soon as they are detected. For instance, if overheating is detected in a particular section, laser parameters could be adjusted in real time to mitigate the issue. The feasibility and impact of adjusting the laser input frequency can therefore be examined. This calls for a more detailed investigation of the thermal dynamics during printing and a more comprehensive integration of process physics into the current model. It will also be valuable to investigate exhaustively the optimal combination of hyperparameters for the proposed model. Moreover, it will be of great interest to resolve the trade-off between the number of hidden layers and the computational cost, which will help achieve satisfactory prediction accuracy at minimum cost. Importantly, this predictive model serves as a foundation for future work aimed at quantifying overheating. Lastly, the proposed model can serve as a building block to be combined with transfer learning and encoder tools to predict emission in L-PBF for other geometry designs or other metals.
Acknowledgment
The authors gratefully acknowledge the financial support of the National Science Foundation under Grant CMMI-2152908.
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.