Abstract
Phase-resolved wave prediction capability, even if only two wave periods in advance, is of value for the optimal control of wave energy converters, enabling a dramatic increase in power generation efficiency. Previous studies on wave-by-wave prediction have shown that an artificial neural network (ANN) model can outperform the traditional linear wave theory-based model in terms of both prediction accuracy and prediction horizon when using synthetic wave data. However, the prediction performance of ANN models is significantly reduced by the varying wave conditions and buoy positions that occur in the field. To overcome these limitations, a novel wave prediction method is developed based on a neural network with an attention mechanism. This study validates the new model using wave data measured at sea. The model utilizes past time histories of three Sofar Spotter wave buoys at upwave locations to predict the vertical motion of a Datawell Waverider-4 at a downwave location. The results show that the attention-based neural network model is capable of capturing the slow variation in the displacement of the buoys, which reduces the prediction error compared to standard ANN and long short-term memory models.
1 Introduction
Phase-resolved wave prediction in real-time is valuable for offshore operations and the renewable energy sector, such as facilitating safe ship-to-ship transfer [1], mitigating extreme loads on floating wind turbines [2], and personnel transfer for offshore wind maintenance [3]. The present study focuses on leveraging surface wave prediction to enhance the efficiency of wave energy converters (WECs). Earlier studies [4–6] have shown the significance of accurate wave predictions over a horizon of two wave periods for active control, which could result in a dramatic increase in the power output. Additionally, wave predictions can help ensure the survivability of WECs under severe wave conditions.
Physics-based models, such as the “algebraic” model [7] based on linear wave theory, have shown success in predicting waves with small to moderate directional spreading. However, challenges persist in highly directionally spread waves, caused by swells from different storm sources and local refraction due to sea-bed topography [8]. A data-driven approach, such as an artificial neural network (ANN), can potentially address the complexity of wave prediction. Chen et al. [9] demonstrated that ANN-based models can provide more accurate predictions over longer horizons than linear wave theory-based models using synthetic waves with moderate spreading. However, in real-world conditions, the wave buoys move on their moorings around their watch circles. A recent study [10] highlighted the challenges in generalizing the ANN model due to substantial variations in the positions of the upwave and downwave buoys, as well as varying wave conditions during the buoy deployment. These factors resulted in poor prediction accuracy, emphasizing the importance of addressing these challenges.
This paper builds upon the prior work of Ref. [10], addressing the difficulties associated with adapting a standard ANN model to real-world field data. In particular, this work focuses on the development of an advanced machine-learning method based on an attention mechanism for wave prediction. Incorporating the attention mechanism enables the model to learn and adapt to critical features, such as the relative locations of the upwave and downwave buoys.
To evaluate the prediction performance, we compare models that use the past time histories of an upwave array consisting of three Sofar Spotter wave buoys to predict the motion of a Datawell Waverider-4 (DWR4) at a downwave location. The results show that the proposed method can achieve accurate phase-resolved wave prediction typically up to two wave periods ahead. The prediction error increases rapidly beyond two wave periods, primarily due to the loss of wave information from the upwave measurements. Notably, the variations in positions and wave conditions are well captured, demonstrating the capability of the attention mechanism for wave prediction.
This paper is organized as follows. Section 2 provides an overview of field measurements. Section 3 describes the proposed attention-based machine learning methods. The prediction performance of the proposed model on varying wave conditions and buoy positions is discussed in Sec. 4. Section 5 concludes with a discussion of our major results and findings.
2 Field Measurements and Variations in Buoy Positions
Wave data used in this work were collected from July 21, 2022, to Oct. 16, 2022 (an 88-day deployment), in the Southern Ocean off Albany, Western Australia. As illustrated in Fig. 1, a DWR4 buoy and a Sofar Spotter buoy are deployed to measure the surface wave elevation and the horizontal displacement components to the east and the north in a local coordinate system relative to the centroid of the buoys. The DWR4 buoy, manufactured by Datawell BV [11], has a 0.90 m diameter and is equipped with a battery and electrical components to measure waves at a sampling rate of 2.56 Hz. Its Earth-fixed global positioning system (GPS) coordinates are recorded at one-second and ten-minute intervals. One can refer to Ref. [12] for details of the mooring configuration of the DWR4. The Sofar Spotter buoy [13] has a diameter of 0.42 m and a hemispherical shape below the water surface, and is powered by solar panels. Waves are measured at a sampling frequency of 2.50 Hz, with the GPS position recorded every minute. It is noted that, for the following studies, the buoy GPS positions were converted into east and north coordinates with respect to the center of the watch circle of Spotter 1 (coordinate (0, 0)).
Figure 2 depicts the initial anchoring positions of the DWR4 (red star) and the Spotter wave buoys (blue dots). All buoys were anchored to the seafloor at the centers of their watch circles using conventional moorings; the watch circles are shaded red (DWR4) and blue (Spotters). The difference in watch-circle sizes is due to the use of different buoy types and mooring systems. It is noted that the DWR4 was previously deployed for other purposes, and the three Spotters were later deployed specifically for wave prediction. The wave buoys are deployed away from the shore at a water depth of 33 m, and the buoy array (blue dashed triangle and red star in Fig. 2) is aligned with the average mean wave direction, pointing toward the north–northeast.
Figure 3 indicates the instantaneous GPS positions of the DWR4 wave buoy during the 88 days of deployment. The heat maps represent the relative time the buoy spent at a particular location. Notably, a significant portion of its time was spent to the west and east of the anchor position, suggesting that the buoy positions were mainly affected by currents flowing approximately parallel to the coast. It is worth noting that the DWR4 stopped recording during the 88-day deployment due to an unknown accident, which leaves a gap in the prediction errors reported in Sec. 4 between Aug. 26 and Aug. 28.
It is noted that the measurement data from the Spotters and DWR4 have been interpolated in time to obtain a consistent sampling frequency. To speed up the training process, the sampling resolution for all models used in this work has been set to 0.5 s (equivalent to a sampling frequency of 2 Hz); reducing the resolution further was observed not to enhance accuracy. The GPS positions of the DWR4 were also interpolated to match those of the Spotters. It should also be noted that the data have been band-pass filtered to the linear frequency range, defined as fixed multiples of the spectral peak frequency $f_p$. This filtering retains components within the linear frequency range and eliminates nonlinear harmonics outside it, thereby improving prediction accuracy, as demonstrated in previous work [8].
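For illustration, a minimal preprocessing sketch is given below (Python with NumPy/SciPy, consistent with the software stack described in Sec. 3.5). The band-pass limits are expressed as multiples of the spectral peak frequency $f_p$; the multiplier arguments `f_lo_mult` and `f_hi_mult` and the filter order are illustrative assumptions, not the exact values used in this work.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import butter, sosfiltfilt

FS = 2.0  # common sampling frequency in Hz (0.5 s resolution)

def resample(t, eta, t_common):
    """Interpolate a buoy record in time onto the shared 2 Hz grid."""
    return interp1d(t, eta, bounds_error=False, fill_value="extrapolate")(t_common)

def bandpass_linear_range(eta, f_p, f_lo_mult, f_hi_mult, fs=FS):
    """Band-pass filter around the spectral peak f_p, retaining the linear
    frequency range and removing nonlinear harmonics (multipliers assumed)."""
    sos = butter(4, [f_lo_mult * f_p, f_hi_mult * f_p],
                 btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, eta)  # zero-phase filtering preserves wave phase
```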
As shown in Figs. 4(a) and 4(b), the variation in the east and north GPS positions of each buoy is relatively large during the 88 days of deployment. The relative distance in Fig. 4(c) is the distance between the mean GPS positions of the detecting (Spotter) array and the prediction point (DWR4), obtained as the square root of the sum of the squares of the east and north position differences. Although the GPS positions appear to move in a similar direction, the relative distance shows a large variation, ranging from 190 m to 320 m. Further, the variation in relative distance is expected to increase if the DWR4 at the prediction location is replaced with a WEC, which has different time scales and magnitudes of horizontal motion compared to the buoys. This is problematic for the standard ANN model: as illustrated in Ref. [10], the prediction performance is significantly limited by large variations in relative distance, resulting in phase offsets in the predictions.
It is therefore important to incorporate the relative distance into the machine learning models. However, it is difficult to train a standard ANN model to learn the variations in field data effectively. With more input features, the ANN can become overly complex and start to overfit the training data rather than learning the underlying patterns. This overfitting can cause the model to struggle with accurately mapping less relevant input features, such as relative distance, to the downwave target surface wave elevation, potentially compromising the overall accuracy. Indeed, the prediction accuracy drops significantly when the relative distance is considered as an additional input, as will be demonstrated in the following section. Hence, a more robust model is required to improve generalization.
3 Methodology
In Sec. 3.1, we discuss the preprocessing steps for the training data. Sections 3.2 and 3.3 describe the standard neural network and the long short-term memory (LSTM) models. In Sec. 3.4, we present the proposed attention-based model. Section 3.5 describes the data-driven models utilized in Sec. 4. Section 3.6 provides the formula for calculating prediction errors.
3.1 Preprocessing of Training Data.
In this study, the model inputs consist of the records of surface wave elevation and horizontal displacements in the east and north directions of the three Spotters at the upwave locations, together with the GPS positions (east and north). The output (target prediction) is the surface wave elevation of the DWR4 at the downwave location, denoted by $\eta_0(t_j)$. Consider the five data streams measured by each Spotter (horizontal displacements $x$, $y$, surface wave elevation $\eta$, and GPS positions $E$, $N$), denoted by $x_i(t_j)$, $y_i(t_j)$, $\eta_i(t_j)$, $E_i(t_j)$, $N_i(t_j)$ for $i = 1, 2, 3$ and $j = 1, \dots, N_t$, where $i$ indicates the Spotter index and $N_t$ is the number of time-steps measured at the upwave location. The GPS positions measured by the DWR4 are denoted by $E_0(t_j)$ and $N_0(t_j)$ for $j = 1, \dots, N_t$.
We used 14 days of data from July 21 to Aug. 4, 2022, to train our models, split into training (80%) and validation (20%) sets. A sliding-window technique was used to rearrange the data into approximately 240,000 sets, with a sliding offset of 5 s. Each training set consisted of 204 s (equivalent to 408 time-steps with the data sampled every 0.5 s) of inputs and 264 s (equivalent to 528 time-steps) of output (204 s for reconstruction and 60 s for forecasting). For instance, the initial input set comprised $x_i(t_j)$, $y_i(t_j)$, $\eta_i(t_j)$, $E_i(t_j)$, $N_i(t_j)$ for $i = 1, 2, 3$ and $j = 1, \dots, 408$; the corresponding output at the downwave location was $\eta_0(t_j)$ for $j = 1, \dots, 528$.
It is noted that the input time histories of the buoy motions in three degrees-of-freedom (DOF) are normalized by the significant wave height $H_s$, and the GPS positions for all buoys are normalized by subtracting their mean and dividing by their standard deviation.
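As a concrete sketch of this preprocessing, the sliding-window rearrangement and normalization described above can be written as follows; the array layout (17 feature columns: 3 Spotters × 5 streams plus the two DWR4 GPS streams) and variable names are illustrative assumptions.

```python
import numpy as np

FS = 2.0              # 2 Hz sampling (0.5 s resolution)
IN_STEPS = 408        # 204 s of upwave input
OUT_STEPS = 528       # 264 s of output (204 s reconstruction + 60 s forecast)
OFFSET = int(5 * FS)  # 5 s sliding-window offset

def normalize_features(motions, gps, hs):
    """Scale the 3DOF buoy motions by the significant wave height Hs and
    standardize the GPS positions (subtract mean, divide by std)."""
    motions = motions / hs                                  # (T, 9)
    gps = (gps - gps.mean(axis=0)) / gps.std(axis=0)        # (T, 8)
    return np.concatenate([motions, gps], axis=-1)          # (T, 17)

def sliding_windows(features, target_eta):
    """Rearrange continuous records into (input, output) training pairs."""
    X, Y = [], []
    for s in range(0, len(target_eta) - OUT_STEPS + 1, OFFSET):
        X.append(features[s:s + IN_STEPS])       # upwave input window
        Y.append(target_eta[s:s + OUT_STEPS])    # downwave target window
    return np.stack(X), np.stack(Y)
```

Applied to the 14-day record with a 5 s offset, this yields on the order of 240,000 input/output pairs, consistent with the figure quoted above.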
3.2 Artificial Neural Network Model.
An ANN model is a universal function approximator mapping a set of input values $X$ to output values $Y$. The primary objective of an ANN model is to establish the optimal approximating function $f$ by learning the model parameters (sets of weights $W$ and biases $b$) such that $Y \approx f(X; W, b)$.
A simple ANN has an input layer, one hidden layer, and an output layer. The input layer takes in the data, which is then passed to the hidden layer. Each layer has several neurons, and each neuron in one layer is connected to every neuron in the next layer. Neurons calculate their values by applying a nonlinear activation function to the weighted sum of their inputs with a bias term. For more details of the ANN model architecture, one can refer to Refs. [9,15,16]. Section 3.5 describes the two different versions of the ANN model used in this paper.
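For concreteness, a minimal sketch of such a fully connected network is given below in Keras, using the layer sizes reported in Sec. 3.5 and the input/output shapes from Sec. 3.1; flattening the input time histories into a single vector is an assumption about the input handling.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_ann(in_steps=408, n_features=17, out_steps=528):
    """Two hidden layers of 200 ReLU neurons mapping flattened upwave
    time histories to the downwave surface elevation (sizes per Sec. 3.5)."""
    inputs = keras.Input(shape=(in_steps, n_features))
    x = layers.Flatten()(inputs)                 # stack the time histories
    x = layers.Dense(200, activation="relu")(x)
    x = layers.Dense(200, activation="relu")(x)
    outputs = layers.Dense(out_steps)(x)         # 528 predicted time-steps
    return keras.Model(inputs, outputs)
```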
3.3 Long Short-Term Memory Model.
The LSTM is a variant of the recurrent neural network (RNN), a class of neural networks specifically designed to handle temporal input data, making them well suited for tasks involving sequential data, such as text or time series [17]. The LSTM model extends the traditional ANN model by incorporating feedback connections and internal memory. These feedback connections allow the network to retain information from previous inputs, as the activation values are fed back into the network at each time-step, enabling the model to consider past inputs in its processing.
A typical LSTM unit comprises a cell, an input gate, an output gate, and a forget gate, all of which regulate the flow of information. These gates enable the network to retain important information while discarding irrelevant data, thereby capturing long-term dependencies more effectively and mitigating the vanishing gradient problem, leading to improved performance over the standard RNN model [18]. Therefore, the LSTM model is preferred over the RNN model for the subsequent predictions in this study. For more details of the LSTM model architecture, one can refer to Refs. [19,20].
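A minimal sketch of such a model, using the LSTM* sizes reported in Sec. 3.5 (two stacked LSTM layers of 100 cell units followed by a 200-neuron feed-forward layer), is given below; the exact wiring is an assumption.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lstm(in_steps=408, n_features=17, out_steps=528):
    """Two stacked LSTM layers (100 cell units each) followed by a
    200-neuron feed-forward layer, per the sizes given in Sec. 3.5."""
    inputs = keras.Input(shape=(in_steps, n_features))
    x = layers.LSTM(100, return_sequences=True)(inputs)  # full sequence out
    x = layers.LSTM(100)(x)                              # final hidden state
    x = layers.Dense(200, activation="relu")(x)
    outputs = layers.Dense(out_steps)(x)
    return keras.Model(inputs, outputs)
```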
The LSTM model can outperform a simple ANN model in handling sequential data due to its advanced structure, which includes feedback connections and internal memory to retain information from previous inputs. However, LSTM may not always prioritize the most relevant information, leading to suboptimal utilization of available data. This limitation is particularly important in this study, where the measurement data involve numerous variables and complexities, as will be further discussed in Sec. 4. To overcome this limitation, we propose the attention-based model to enhance the prediction performance.
3.4 Attention-Based Neural Network Model.
Recently, the attention mechanism [21] has emerged as a novel set of layers within neural networks, garnering substantial interest, particularly for sequence-based tasks. The attention mechanism can be understood as a vector of importance weights: the attention vector determines the level of significance (which part of the input needs to be paid attention to) of a specific feature (such as the GPS positions) relative to the other features in the input data. Both the relative distance between the wave buoys and the sea states themselves experience slow variations over time, and the attention mechanism is promising for capturing such variations to improve the forecasting capability.
3.4.1 Sequential Encoding.
As shown in Fig. 6, each row represents a sinusoidal wave, with the wavelength differing across dimensions (following Ref. [21], the wavelengths form a geometric progression from $2\pi$ to $10000 \cdot 2\pi$). Specifically, the first row corresponds to the vector added to one of the features of the input sequence (i.e., the horizontal displacement of Spotter 1). Within each row there are 408 values, each falling within the range of $1$ (white) to $-1$ (black), calculated using Eq. (4). The 17 rows in the sequential encoding represent the 17 dimensions of the input data streams, and the encoding is added directly to the input to create distinctive patterns for different positions in the sequence.
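A sketch of this encoding is given below, assuming it follows the standard sinusoidal form of Ref. [21] (including the 10,000 wavelength base, which is the value used in that reference):

```python
import numpy as np

def sequential_encoding(seq_len=408, d_model=17, base=10000.0):
    """Sinusoidal sequential encoding added elementwise to the input; each
    feature dimension carries a sinusoid of a different wavelength, with
    values in [-1, 1] (cf. the black/white shading in Fig. 6)."""
    positions = np.arange(seq_len)[:, None]          # (408, 1) time indices
    dims = np.arange(d_model)[None, :]               # (1, 17) feature dims
    angles = positions / base ** (2 * (dims // 2) / d_model)
    return np.where(dims % 2 == 0, np.sin(angles), np.cos(angles))  # (408, 17)
```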
3.4.2 Attention Mechanism.
The general attention mechanism consists of three main components: queries $Q$, keys $K$, and values $V$. These components play a crucial role in determining how the attention mechanism focuses on different parts of the input sequence when computing the weighted sum for each time-step in the output sequence. For an input sequence to which the sequential encoding has been added, the attention mechanism can be implemented by the following steps [21], as highlighted in the orange box in Fig. 5:
- For each input vector $x$, we create a query $q = W^Q x$, a key $k = W^K x$, and a value $v = W^V x$, where $W^Q$, $W^K$, and $W^V$ are the learnable weight matrices that need to be estimated.
- The attention value for each query is computed by mapping the query and all the key–value pairs to an output

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V \qquad (5)$$

where $\sqrt{d_k}$ is the scaling factor introduced to help stabilize the training process, with $d_k$ being the input length for the queries/keys. The output is a weighted sum of the values, where each value's weight is determined by the dot-product of the query with all keys through the softmax activation function [22].
- The attention mechanism is applied $h$ times (also called multi-head attention) to enhance the model's capacity. Each time, the input vectors are transformed into a different query, key, and value vector using different weight matrices $W_i^Q$, $W_i^K$, and $W_i^V$ for $i = 1, \dots, h$. The outputs from each attention head are concatenated and linearly transformed through the output weight matrix $W^O$, which can be expressed as [21]

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^O \qquad (6)$$

where $\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V)$.
The output obtained from the multi-head attention block is added to the original input through a residual connection, followed by layer normalization [23]. It is noted that layer normalization helps to enhance the training stability and convergence speed.
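These steps map directly onto standard library components; a minimal sketch using Keras' built-in multi-head attention layer (self-attention with $h = 8$ heads, per Sec. 3.5) is shown below, where the key dimension is an illustrative assumption.

```python
from tensorflow.keras import layers

def attention_block(x, num_heads=8, key_dim=17):
    """Multi-head self-attention (Eqs. (5) and (6)) followed by a residual
    connection and layer normalization, as in the orange box of Fig. 5."""
    attn = layers.MultiHeadAttention(num_heads=num_heads,
                                     key_dim=key_dim)(x, x)  # self-attention
    return layers.LayerNormalization()(layers.Add()([x, attn]))
```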
3.4.3 Fully Connected Feed-Forward Networks.
After the attention block, global average pooling is applied to obtain a final representation as a single vector, which contributes to more efficient computation during training and reduces the risk of overfitting. Finally, this vector is fed into a standard feed-forward network to predict the surface wave elevation at the downwave location. It is noted that for the standard ANN model used in Ref. [10], the highlighted gray box in Fig. 5 is simply replaced by a single fully connected feed-forward layer.
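Putting the pieces together, a hedged end-to-end sketch of the ANN*-Attention architecture as described above (sequential encoding, attention block, global average pooling, feed-forward head) is shown below, reusing the sketches from Secs. 3.4.1 and 3.4.2; the use of a single attention block is an assumption.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_attention_model(in_steps=408, n_features=17, out_steps=528):
    """Sequential encoding + multi-head self-attention + global average
    pooling + 200-neuron feed-forward head (gray box in Fig. 5)."""
    inputs = keras.Input(shape=(in_steps, n_features))
    enc = tf.constant(sequential_encoding(in_steps, n_features),
                      dtype=tf.float32)
    x = inputs + enc                           # add encoding to the inputs
    x = attention_block(x)                     # attention + residual + norm
    x = layers.GlobalAveragePooling1D()(x)     # collapse to a single vector
    x = layers.Dense(200, activation="relu")(x)
    outputs = layers.Dense(out_steps)(x)       # downwave surface elevation
    return keras.Model(inputs, outputs)
```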
3.5 Model Specifications.
In this study, we consider the following four models:
- The ANN model that uses the time histories of the 3DOF measurements from the three Spotters as the only inputs is denoted as ANN.
- The ANN model that incorporates both the 3DOF measurements from the three Spotters and the east and north GPS positions of all buoys is denoted as ANN*.
- The LSTM model with both the 3DOF and GPS position measurements is denoted as LSTM*.
- The attention-based neural network model with both the 3DOF and GPS position measurements is denoted as ANN*-Attention.
Based on sensitivity checks, both ANN models were implemented with two hidden layers, each with 200 neurons. The LSTM model consists of two hidden layers with 100 cell units followed by a feed-forward layer with 200 neurons. The attention-based neural network model employed $h = 8$ heads for the multi-head attention, and the number of neurons in the feed-forward neural network layer was set to 200. It is noted that increasing the number of neurons, hidden layers, or heads does not improve the prediction performance. The Adam optimizer [25] and ReLU activation function [24] are used for training, with a training batch size of 128 and a fixed learning rate. A quantile loss function, Eq. (8), and early stopping were used. All models were implemented in Python using the Keras [27] and TensorFlow [28] packages. The computer used for implementation had an 8-core CPU, 14-core integrated GPU, and 16 GB RAM (Apple M1 Pro chip). The training time was approximately 15 min for the ANN models, increasing to 80 min and 110 min for the LSTM and attention-based neural network models, respectively. Once fully trained, predictions are completed in a fraction of a second.
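A hedged sketch of this training setup is given below, reusing the model builder from Sec. 3.4.3; the quantile level, learning rate, epoch budget, and early-stopping patience shown are illustrative values only, not necessarily those used in this work.

```python
import tensorflow as tf
from tensorflow import keras

def quantile_loss(tau):
    """Pinball loss for quantile level tau (cf. Eq. (8)); low/high tau
    pairs yield the bounds of the prediction intervals shown in Sec. 4."""
    def loss(y_true, y_pred):
        err = y_true - y_pred
        return tf.reduce_mean(tf.maximum(tau * err, (tau - 1.0) * err))
    return loss

model = build_attention_model()
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),  # illustrative
              loss=quantile_loss(0.5))
early_stop = keras.callbacks.EarlyStopping(patience=10,             # illustrative
                                           restore_best_weights=True)
# model.fit(X_train, Y_train, validation_data=(X_val, Y_val),
#           batch_size=128, epochs=200, callbacks=[early_stop])
```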
3.6 Error Assessment.
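The prediction errors reported in Sec. 4 are computed from the difference between the predicted and measured downwave surface elevation. As an illustration only (an assumed form, not necessarily the definition used in this work), a root-mean-square error normalized by the significant wave height can be written as:

```python
import numpy as np

def normalized_prediction_error(eta_pred, eta_meas, hs):
    """Illustrative (assumed) error metric: RMS difference between the
    predicted and measured surface elevation, normalized by Hs."""
    return np.sqrt(np.mean((eta_pred - eta_meas) ** 2)) / hs
```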
4 Results
Figure 7 shows the training and validation loss for all models. It can be seen that the ANN*-Attention model exhibits the best performance, with lower loss values than the other models during both training and validation. It is also noted that the large discrepancy between the training and validation losses of the ANN model is due to the missing GPS input variables, which leads to phase offsets in the predictions on the validation set. In particular, the lower accuracy of the LSTM* model compared to the attention-based model (ANN*-Attention) can be attributed to several factors inherent in their architectures. The LSTM* incorporates feedback connections and internal memory to retain information from previous inputs. However, it may not necessarily prioritize the most relevant information. This limitation is particularly significant in our case, as the measurement data involve a high number of variables and complexities, such as the necessity of incorporating the GPS positions and the variations in wave conditions.
With the highly complex data patterns, the forget gate in LSTM* may struggle to accurately identify and retain essential information. This can potentially lead to forgetting relevant information that is important for accurate wave predictions. Additionally, the interaction between the forget gate and other gates (input and output gates) might influence the retention of relevant information. If these gates do not function harmoniously, the overall performance of the LSTM* may be compromised, resulting in the loss of accuracy.
To overcome this shortcoming, the attention-based model is used to improve the prediction performance. The attention mechanism enhances model accuracy by explicitly allowing the model to weigh different parts of the input sequence differently. This selective focus enables the attention-based model to capture complex hidden representations and variations in wave conditions and buoy positions more effectively than the LSTM* and ANN* models. Moreover, by leveraging the most relevant information at each time-step, the attention mechanism also allows the model to generalize better to unseen sea states and make more accurate predictions for complex data patterns.
Figure 8(a) depicts a typical directional spectrum based on one-hour displacement records measured by the DWR4 from 00:00 a.m. to 01:00 a.m. on Aug. 09, 2022. The spectrum shows a long-period swell at Albany propagating from 205 deg from north. Figure 8(b) is computed based on the maximum entropy method [29] and shows waves that are narrow-banded in frequency with small directional spreading angles, with the mean wave direction aligned with the buoy array. Hence, the prediction for these waves is expected to be more accurate, as long-period waves with small spreading angles reduce the complexity of the prediction.
The comparison of prediction results using different models is shown in Fig. 9. The corresponding bulk parameters (significant wave height $H_s$, peak period $T_p$, separation angle, spreading angle, and the relative distance (Dist)) for the tested case are shown at the top. Here, the separation angle is the angle between the array direction and the mean wave direction in the hour. The directional bulk parameters (mean wave direction and spreading angle) are determined from the Fourier coefficients [30].
Figure 9(a) shows that the predicted phase using the ANN* model is significantly better than that of the ANN model. This improvement is attributed to the inclusion of the GPS position information, which is missing in the ANN model, resulting in a phase offset in its predictions as it relies only on the 3DOF buoy motions. However, the ANN* model appears to struggle with predicting the amplitude compared to the ANN model. This is because the GPS position information adds extra complexity to the model, making it difficult to generalize well, particularly with large variations in buoy positions. It is also noted that the 95% prediction interval is only plotted for the ANN* model in Fig. 9(a); the ANN model has a wider prediction interval than the ANN* model due to the phase offset. The level of uncertainty increases considerably beyond 30 s in the predictions due to the loss of relevant wave information from the upwave measurements.
In Fig. 9(b), the LSTM* model results show a significant improvement in both phase and amplitude prediction compared to the simpler ANN and ANN* models. The LSTM* model benefits from its architecture, which is designed to handle sequential data and retain relevant information over longer periods. Hence, the LSTM* model effectively captures the temporal dependencies in the data, resulting in more accurate predictions. Furthermore, the prediction intervals for the LSTM* model are slightly narrower than those of the ANN* model, indicating reduced uncertainty and enhanced reliability in the predictions.
As shown in Fig. 9(c), the new model using the attention mechanism (ANN*-Attention) yields notable enhancements in accuracy compared to the models without attention mechanisms, especially at the crests and troughs. Additionally, the narrower prediction interval further suggests that the attention mechanism effectively allows the model to selectively learn and adapt to the complex hidden representations and variations in GPS positions and wave conditions within the input features.
To compare the prediction performance further, we select four different cases with conditions taken from the three months of field data. The prediction results for the selected cases are shown in Fig. 10. It is noted that we only compare the ANN*, LSTM*, and ANN*-Attention models, because the phase-offset problem in the ANN model leads to very large prediction errors, making it unsuitable for practical applications. The models have been trained on the initial 14-day data. In cases (a) and (b), the ANN* and LSTM* models are less accurate because the variations of different sea state combinations and relative distances over the 14 days introduce dissimilarities in the training data, which reduce the prediction accuracy and generalization capability of these models. However, by dynamically assigning different weights to various components of the input data during the prediction process, the attention mechanism enables the model to generalize better, contributing to more accurate predictions.
Case (c) presents a more challenging prediction scenario due to its larger separation and directional spreading angles compared to cases (a) and (b). It is expected that the prediction accuracy decreases with increasing directional spreading angles, which is consistent with previous studies using synthetic data, as demonstrated by Chen et al. [9]. To enhance model accuracy for waves with significant directional spreading angles, additional upwave measurement buoys would be helpful.
In case (d), the separation and spreading angles are comparable to those in cases (a) and (b). However, the prediction accuracy is notably lower, with some phase offsets observed in the ANN* and LSTM* models in particular. The loss of accuracy is primarily attributed to the relative distance of approximately 216 m; such relative distances occur rarely, resulting in an insufficient number of observations in the training set. Hence, the prediction accuracy for this case would be expected to improve if more training data had been available.
The overall prediction error over the 88-day deployment is shown in Fig. 11. The dataset includes samples taken every 6 h over the 88 days. Figure 11(a) indicates the separation angles and their variation range within an hour. Alongside the separation angles, the average angle away from the mean wave direction is plotted. In Fig. 11(b), we show the averaged prediction error over a 12 s horizon for all models.
The prediction performance aligns with expectations, with the ANN*-Attention model outperforming the other models. The ANN model exhibits numerous error spikes due to the missing GPS position information, and the models without the attention mechanism are generally less accurate owing to their limitations in addressing the variations in both GPS positions and wave conditions. Prediction performance is poor for waves with large directional spreading angles for all models, because large-spreading waves are often associated with shorter wave periods and a lack of wave information when using only three Spotters at the upwave locations. Therefore, additional upwave measurement buoys are required to improve model accuracy for waves with large directional spreading angles.
5 Conclusion
In this work, an attention mechanism-based model was applied to predict phase-resolved waves in the Southern Ocean near Albany, Western Australia. The model utilizes past measurement data on ocean surface elevation and horizontal displacements obtained from a set of three Spotter buoys to predict the surface elevation measured by a DWR4 located downwave of the Spotter array.
The relative positions of the buoys keep changing during the measurement, making it challenging to obtain accurate predictions. To overcome this limitation, we proposed a new method based on the neural network with an attention mechanism. The prediction performance of this method was compared with artificial neural network and long short-term memory models.
The analysis results suggest that the proposed method is capable of capturing the variation in both buoy positions and wave conditions and is thus able to improve the forecasting capability. Furthermore, the attention-based model is shown to be significantly more robust and accurate than the conventional neural network model applied in Ref. [10]. The results indicate the promising potential of the proposed model for predicting moderately directionally spread waves based on an array of three Spotters. For improved accuracy in predicting more highly spread waves, future work will investigate the use of additional wave buoys and larger, optimized arrays at upwave locations.
Acknowledgment
This research is supported by the ARC ITRH for Transforming Energy Infrastructure through Digital Engineering (TIDE) which is funded by the Australian Research Council (ARC), INPEX Operations Australia, Shell Australia, Woodside Energy, Fugro Australia Marine, Wood Group Kenny Australia, RPS Group, Bureau Veritas, and Lloyd’s Register Global Technology (Grant No. IH200100009). W.Z. is grateful for the Future Fellowship (Grant No. FT230100109) funded by ARC. We also would like to thank Professor Jeff Hansen and Dr. Thobani Hlophe from The University of Western Australia for providing access to the processed field data.
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.