Graphical Abstract Figure

Abstract

Phase-resolved wave prediction, even if only two wave periods in advance, is valuable for the optimal control of wave energy converters and can dramatically increase their power generation efficiency. Previous studies on wave-by-wave prediction have shown that an artificial neural network (ANN) model can outperform the traditional linear wave theory-based model in both prediction accuracy and prediction horizon when using synthetic wave data. However, the prediction performance of ANN models is significantly reduced by the varying wave conditions and buoy positions that occur in the field. To overcome these limitations, a novel wave prediction method is developed based on a neural network with an attention mechanism. This study validates the new model using wave data measured at sea. The model uses the past time histories of three Sofar Spotter wave buoys at upwave locations to predict the vertical motion of a Datawell Waverider-4 at a downwave location. The results show that the attention-based neural network model is capable of capturing the slow variation in the displacement of the buoys, which reduces the prediction error compared with a standard ANN and a long short-term memory (LSTM) model.

1 Introduction

Phase-resolved wave prediction in real time is valuable for the offshore operations and renewable energy sectors, for example, to facilitate safe ship-to-ship transfer [1], mitigate extreme loads on floating wind turbines [2], and support personnel transfer for offshore wind maintenance [3]. The present study focuses on leveraging surface wave prediction to enhance the efficiency of wave energy converters (WECs). Earlier studies [4–6] have shown the importance of accurate wave predictions two wave periods ahead for active control, which could dramatically increase the power output. Wave predictions can also help to ensure the survivability of WECs under severe wave conditions.

Physics-based models, such as the “algebraic” model [7] based on linear wave theory, have been successful in predicting waves with small to moderate directional spreading. However, challenges persist for highly directionally spread waves, which arise from swells generated by different storm sources and from local refraction caused by seabed topography [8]. A data-driven approach, such as an artificial neural network (ANN), can potentially address the complexity of wave prediction. Chen et al. [9] demonstrated that ANN-based models can provide more accurate predictions than linear wave theory-based models for synthetic waves with moderate spreading. In real-world conditions, however, the wave buoys move on their moorings around their watch circles. A recent study [10] highlighted the difficulty of generalizing the ANN model because of substantial variations in the positions of the upwave and downwave buoys, as well as the varying wave conditions during the buoy deployment. These factors resulted in poor prediction accuracy, emphasizing the need to address these challenges.

This paper builds upon the prior work of Ref. [10] and addresses the difficulties associated with adapting a standard ANN model to real-world field data. In particular, this work develops an advanced machine learning method based on an attention mechanism for wave prediction. Incorporating the attention mechanism enables the model to learn and adapt to critical features, such as the relative locations of the upwave and downwave buoys.

To evaluate the prediction performance, we compare models that use the past time histories of an upwave array of three Sofar Spotter wave buoys to predict the motion of a Datawell Waverider-4 (DWR4) buoy at a downwave location. The results show that the proposed method can achieve accurate phase-resolved wave prediction typically up to two wave periods ahead. Beyond two wave periods, the prediction error increases rapidly, primarily because of the loss of wave information from the upwave array. The variations in buoy positions and wave conditions are well captured, demonstrating the capability of the attention mechanism in wave prediction.

This paper is organized as follows. Section 2 provides an overview of field measurements. Section 3 describes the proposed attention-based machine learning methods. The prediction performance of the proposed model on varying wave conditions and buoy positions is discussed in Sec. 4. Section 5 concludes with a discussion of our major results and findings.

2 Field Measurements and Variations in Buoy Positions

Wave data used in this work were collected from July 21, 2022, to Oct. 16, 2022 (an 88-day deployment), in the Southern Ocean off Albany, Western Australia. As illustrated in Fig. 1, a DWR4 buoy and Sofar Spotter buoys were deployed to measure the surface wave elevation η and the horizontal displacement components to the east, ζx, and to the north, ζy, in a local coordinate system relative to the buoy centroid. The DWR4 buoy, manufactured by Datawell BV [11], has a diameter of 0.90 m. It is equipped with a battery and electronics that measure waves at a sampling rate of 2.56 Hz. Its Earth-fixed global positioning system (GPS) coordinates are recorded at intervals of every second and tenth minute. Details of the DWR4 mooring configuration are given in Ref. [12]. The Sofar Spotter buoy [13] has a diameter of 0.42 m and a hemispherical shape below the water surface, and it is powered by solar panels. It measures waves at a sampling frequency of 2.50 Hz and records its GPS position every minute. In the following analysis, the GPS positions of all buoys were converted into east and north coordinates relative to the center of the watch circle of Spotter 1, which is taken as the origin (0, 0).
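The text does not say how the GPS fixes were converted into east and north coordinates. A minimal sketch, assuming a spherical Earth and a local equirectangular (flat-Earth) projection, which is usually adequate over the few hundred metres spanned by the array, is shown below. The function name `latlon_to_east_north` and the reference coordinates are hypothetical; in practice the reference point would be the center of the Spotter 1 watch circle.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical-Earth assumption)


def latlon_to_east_north(lat_deg, lon_deg, lat0_deg, lon0_deg):
    """Convert a GPS fix (degrees) to east/north offsets (m) from a reference point.

    Local equirectangular projection centred on (lat0, lon0); adequate for
    offsets of a few hundred metres, such as a buoy watch circle.
    """
    lat0 = math.radians(lat0_deg)
    d_lat = math.radians(lat_deg - lat0_deg)
    d_lon = math.radians(lon_deg - lon0_deg)
    east = EARTH_RADIUS_M * d_lon * math.cos(lat0)  # a degree of longitude shrinks with latitude
    north = EARTH_RADIUS_M * d_lat
    return east, north


# Hypothetical reference point (stand-in for the Spotter 1 watch-circle centre)
LAT0, LON0 = -35.0, 118.0
east, north = latlon_to_east_north(-34.999, 118.001, LAT0, LON0)
```

A 0.001 deg step in latitude is about 111 m north, while the same step in longitude is shortened by the cosine of the latitude.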

Fig. 1
(a) Datawell Waverider-4 (DWR4) buoy and (b) Sofar Spotter buoy deployed in Albany

Figure 2 depicts the initial anchoring positions of the DWR4 (red star) and the Spotter wave buoys (blue dots). All buoys were anchored to the seafloor with conventional moorings at the centers of their watch circles, which are shaded red (DWR4) and blue (Spotters). The difference in watch circle sizes is due to the different buoy types and mooring systems. The DWR4 was previously deployed for other purposes, and the three Spotters were later deployed specifically for wave prediction. The wave buoys are deployed 1 km from the shore in a water depth of 33 m, and the buoy array (blue dashed triangle and red star in Fig. 2) is aligned with the average mean wave direction and points toward the north–northeast.

Fig. 2
(a) Location of the buoy deployment in Albany and (b) initial anchor positions of the DWR4 (red star) and Spotter buoys (blue dots). The blue and red shaded areas correspond to the Spotter and DWR4 watch circle sizes along their mooring lines. A dashed triangle indicates the Spotter array geometry. (Color version online.)

Figure 3 shows the instantaneous GPS positions of the DWR4 wave buoy during the 88-day deployment. The heat map represents the relative time the buoy spent at each location. A significant portion of the time was spent to the west and east of the anchor position, suggesting that the buoy position was mainly affected by currents flowing approximately parallel to the coast. The DWR4 stopped recording between Aug. 26 and Aug. 28 owing to an unknown incident, so the prediction errors in Sec. 4 are unknown for this period.

Fig. 3
Instantaneous GPS positions of the DWR4 buoy during the 88-day deployment. Red regions indicate the positions where the buoy spent the longest time. (Color version online.)

The measurement data from the Spotters and the DWR4 were interpolated in time to obtain a consistent sampling frequency. To speed up training, the sampling interval for all models in this work was set to 0.5 s (a sampling frequency of 2 Hz); using a smaller time step was observed not to improve accuracy. The GPS positions of the DWR4 were also interpolated to match those of the Spotters. In addition, the data were band-pass filtered to the linear frequency range [0.6ωp, 3.0ωp], where ωp is the spectral peak frequency. This filter retains the components within the linear frequency range and eliminates nonlinear harmonics outside it, thereby improving prediction accuracy, as demonstrated in previous work [8].
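The resampling and filtering steps can be sketched as follows. The text does not state the interpolation scheme or the filter design, so linear interpolation onto a uniform 0.5 s grid and an ideal (brick-wall) FFT band-pass filter are assumptions here, as are the function names; ωp is passed in rather than estimated.

```python
import numpy as np


def resample_uniform(t, x, dt=0.5):
    """Linearly interpolate a record onto a uniform grid with spacing dt (s)."""
    t = np.asarray(t, dtype=float)
    t_new = np.arange(t[0], t[-1] + 1e-9, dt)
    return t_new, np.interp(t_new, t, x)


def bandpass_fft(x, dt, omega_p, lo=0.6, hi=3.0):
    """Keep only Fourier components with lo*omega_p <= omega <= hi*omega_p (rad/s).

    An ideal FFT mask: it treats the record as periodic and removes the mean.
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    omega = 2.0 * np.pi * np.fft.rfftfreq(x.size, d=dt)  # angular frequency of each bin
    keep = (omega >= lo * omega_p) & (omega <= hi * omega_p)
    return np.fft.irfft(spectrum * keep, n=x.size)


# Example: a 2.5 Hz record resampled to 2 Hz, then filtered for Tp = 15 s
t_spotter = np.arange(0.0, 600.0, 0.4)
eta_spotter = np.sin(2.0 * np.pi * t_spotter / 15.0)
t_grid, eta_grid = resample_uniform(t_spotter, eta_spotter, dt=0.5)
eta_filtered = bandpass_fft(eta_grid, dt=0.5, omega_p=2.0 * np.pi / 15.0)
```

Because an FFT mask treats each record as periodic, a tapered window or a zero-phase IIR filter may be preferable for long, non-periodic field records.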

As shown in Figs. 4(a) and 4(b), the east and north GPS positions of each buoy vary considerably over the 88-day deployment. The relative distance in Fig. 4(c) is the distance between the mean GPS position of the detecting (Spotter) array and the prediction point (DWR4), computed as the square root of the sum of the squares of the east and north offsets between them. Although the buoys appear to move in similar directions, the relative distance varies widely, from 190 m to 320 m. This variation is expected to be even larger if the DWR4 at the prediction location is replaced with a WEC, whose horizontal motion has different time scales and magnitudes from that of the buoys. Such variation is problematic for the standard ANN model: as illustrated in Ref. [10], its prediction performance is significantly limited by large variations in relative distance, which result in phase offsets in the predictions.
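The relative distance can be computed directly from the east/north coordinates. A short sketch, with hypothetical buoy positions and a hypothetical function name, that also works on time series by broadcasting over leading dimensions:

```python
import numpy as np


def relative_distance(spotter_en, dwr4_en):
    """Distance (m) between the mean Spotter position and the DWR4.

    spotter_en: (..., 3, 2) east/north positions of Spotters 1-3 (m);
    dwr4_en:    (..., 2) east/north position of the DWR4 (m).
    Leading dimensions (e.g. time) are broadcast.
    """
    spotter_en = np.asarray(spotter_en, dtype=float)
    dwr4_en = np.asarray(dwr4_en, dtype=float)
    centroid = spotter_en.mean(axis=-2)               # mean position of the array
    offset = dwr4_en - centroid
    return np.hypot(offset[..., 0], offset[..., 1])   # sqrt(east^2 + north^2)


# Hypothetical positions (m) relative to the Spotter 1 watch-circle centre
spotters = np.array([[0.0, 0.0], [30.0, 0.0], [0.0, 30.0]])
dist = relative_distance(spotters, [160.0, 210.0])
```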

Fig. 4
GPS positions from the first hour of every 6 h during the 88-day deployment, with (a) for east and (b) for north; (c) shows the relative distance between the mean position of the Spotter array and the DWR4
The relative distance is a critical parameter for wave prediction because, according to linear wave theory, each spectral component propagates in space, so the predicted surface elevation at the downwave location can be expressed as a function of the relative distance. The physics-based representation of the linear surface elevation η(x,t) with random amplitudes can be expressed as a double summation over frequency and direction [14]
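$$\eta(\mathbf{x},t)=\sum_{i=1}^{N_\omega}\sum_{j=1}^{N_\theta}A_{ij}\cos\left(\mathbf{k}_{ij}\cdot\mathbf{x}-\omega_i t+\varepsilon_{ij}\right) \tag{1}$$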
where Aij are the wave amplitudes, t is time, x=(x,y) is the GPS position in the east and north directions, kij are the wavenumber vectors with components in the x and y directions, ωi are the wave frequencies, εij are the initial phase shifts, and Nθ and Nω are the numbers of directional and frequency components, respectively. The wavenumber magnitude k and the wave frequency ω are related through the linear dispersion relation
$$\omega^2=gk\tanh(kd) \tag{2}$$
where g is the gravitational acceleration and d is the water depth.
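To make the dependence on position concrete, the sketch below solves the dispersion relation of Eq. (2) for k by Newton iteration and evaluates the double summation of Eq. (1) at a given position and time. The direction convention (θ measured counterclockwise from east, in the direction of propagation), the initial guess, and the function names are assumptions for illustration; the phase inside the cosine uses the common k·x − ωt form.

```python
import math

import numpy as np

G = 9.81  # gravitational acceleration (m/s^2)


def wavenumber(omega, depth, g=G, tol=1e-12, max_iter=50):
    """Solve the dispersion relation omega^2 = g k tanh(k d) for k with Newton's method."""
    k0 = omega**2 / g                                   # deep-water wavenumber
    k = k0 / math.sqrt(math.tanh(k0 * depth))           # Eckart-type initial guess
    for _ in range(max_iter):
        th = math.tanh(k * depth)
        residual = g * k * th - omega**2
        slope = g * th + g * k * depth * (1.0 - th**2)  # d(residual)/dk
        step = residual / slope
        k -= step
        if abs(step) < tol * k:
            break
    return k


def surface_elevation(x, y, t, A, omega, theta, eps, depth):
    """Linear surface elevation: sum over i, j of A_ij cos(k_ij . x - omega_i t + eps_ij).

    A, eps: (N_omega, N_theta); omega: (N_omega,) in rad/s; theta: (N_theta,)
    propagation directions in rad, counterclockwise from east (assumed convention).
    """
    k = np.array([wavenumber(w, depth) for w in omega])
    kx = k[:, None] * np.cos(theta)[None, :]
    ky = k[:, None] * np.sin(theta)[None, :]
    phase = kx * x + ky * y - omega[:, None] * t + eps
    return float(np.sum(A * np.cos(phase)))


omega_p = 2.0 * math.pi / 15.0          # Tp = 15 s swell
k_p = wavenumber(omega_p, depth=33.0)   # 33 m water depth at the site
```

With Tp = 15 s and d = 33 m, the peak wavenumber is roughly 0.026 rad/m (a wavelength of about 240 m), so a 130 m change in separation, the difference between the 190 m and 320 m extremes in Fig. 4(c), shifts the phase of the peak component by about 3.4 rad, more than half a cycle.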

Equations (1) and (2) show that the phase of each wave component at the downwave location depends on the buoy positions, so it is important to incorporate the relative distance in the machine learning models. However, it is difficult to train a standard ANN model to learn the variations in field data effectively. With more input features, the ANN can become overly complex and start to overfit the training data rather than learning the underlying patterns. Such overfitting can make it hard for the model to map input features that are less directly related to the target, such as the relative distance, to the downwave surface elevation, potentially compromising the overall accuracy. As will be demonstrated in Sec. 4, the prediction accuracy drops significantly when the relative distance is included as an additional input. Hence, a more robust model is required to improve generalization.

3 Methodology

In Sec. 3.1, we discuss the preprocessing steps for the training data. Sections 3.2 and 3.3 describe the standard neural network and the long short-term memory (LSTM) models. In Sec. 3.4, we present the proposed attention-based model. Section 3.5 describes the data-driven models utilized in Sec. 4. Section 3.6 provides the formula for calculating prediction errors.

3.1 Preprocessing of Training Data.

In this study, the model inputs consist of the records of surface wave elevation and horizontal displacements in the east and north directions of the three Spotters at the upwave locations, together with the east and north GPS positions. The output (target prediction) is the surface wave elevation of the DWR4 at the downwave location. Consider five data streams (horizontal displacements ζx, ζy, surface wave elevation η, and GPS positions GE, GN) measured by each Spotter, denoted by Sα,t=(ζx(xα,t), ζy(xα,t), η(xα,t), GE(xα,t), GN(xα,t)) for α=1,2,3 and t=1,…,n, where α is the Spotter index and n is the number of time-steps measured at the upwave locations. The GPS positions measured by the DWR4 are denoted by D4,t=(GE(x4,t), GN(x4,t)) for t=1,…,n.

The surface elevation of the DWR4 at position x4, ηx4=(ηx4,1,…,ηx4,m), where m is the number of time-steps predicted at the downwave location, can be obtained using the approximation function F
$$\eta_{\mathbf{x}_4}=F\left(S_1,S_2,S_3,D_4\right) \tag{3}$$
where Sα=(Sα,1,…,Sα,n) for α=1,2,3, D4=(D4,1,…,D4,n), and F is approximated using the ANN, LSTM, and attention-based neural network models in this paper.

We used 14 days of data, from July 21 to Aug. 4, 2022, to train the models, split into training (80%) and validation (20%) sets. A sliding-window technique with an offset of 5 s was used to rearrange the data into approximately 240,000 samples. Each sample consisted of 204 s of inputs (408 time-steps at 0.5 s) and 264 s of outputs (528 time-steps), covering −204 s ≤ t ≤ 0 s for reconstruction and 0 < t ≤ 60 s for forecasting. For instance, the first input sample comprised Sα,1, Sα,2, …, Sα,408 for α=1,2,3 and D4,1, D4,2, …, D4,408, and the corresponding output at the downwave location was ηx4,1, ηx4,2, …, ηx4,528.
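The windowing described above can be sketched as follows. The function names are hypothetical, and the split of the resulting samples into training and validation sets is left out because the text does not say whether it was chronological or random.

```python
import numpy as np


def sliding_windows(inputs, target, n_in=408, n_out=528, stride=10):
    """Cut aligned input/output samples from synchronised 2 Hz records.

    inputs: (N, F) upwave features; target: (N,) DWR4 elevation.
    Sample s uses inputs[s:s+n_in] (204 s) and target[s:s+n_out] (264 s): the
    first n_in target steps overlap the input window (reconstruction) and the
    last n_out - n_in = 120 steps (60 s) are the forecast. stride=10 steps = 5 s.
    """
    inputs = np.asarray(inputs)
    target = np.asarray(target)
    starts = range(0, len(target) - n_out + 1, stride)
    X = np.stack([inputs[s:s + n_in] for s in starts])
    y = np.stack([target[s:s + n_out] for s in starts])
    return X, y


def count_windows(n_samples, n_out=528, stride=10):
    """Number of samples produced by sliding_windows for a record of n_samples."""
    return (n_samples - n_out) // stride + 1


# Synthetic placeholder record: 2000 steps (1000 s) of 17 input streams
rng = np.random.default_rng(0)
inputs = rng.standard_normal((2000, 17))
target = rng.standard_normal(2000)
X, y = sliding_windows(inputs, target)
```

For 14 days sampled at 2 Hz, `count_windows(14 * 86400 * 2)` gives 241,868 samples, consistent with the approximately 240,000 sets quoted above.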

It is noted that the input time histories of the buoy motions in three degrees-of-freedom (DOF) are normalized by the significant wave height Hs, and the GPS positions for all buoys are also normalized by subtracting their mean and dividing by their standard deviation.
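A minimal sketch of this normalization follows. Two details are assumptions not stated in the text: Hs is estimated as four times the standard deviation of the surface elevation, and the statistics are computed on whatever record is passed in (in practice, presumably the training period). The function names are hypothetical.

```python
import numpy as np


def significant_wave_height(eta):
    """Spectral estimate Hs = 4 * std(eta) (assumed definition)."""
    return 4.0 * float(np.std(eta))


def normalise_inputs(motions, gps, hs):
    """Scale 3DOF motions by Hs and z-score each GPS coordinate stream.

    motions: (N, 9) Spotter displacements (m); gps: (N, 8) east/north positions (m).
    """
    motions_n = np.asarray(motions, dtype=float) / hs
    gps = np.asarray(gps, dtype=float)
    gps_n = (gps - gps.mean(axis=0)) / gps.std(axis=0)
    return motions_n, gps_n


t = np.arange(0.0, 600.0, 0.5)
eta = np.sin(2.0 * np.pi * t / 15.0)  # unit-amplitude test wave
hs = significant_wave_height(eta)
rng = np.random.default_rng(0)
motions_n, gps_n = normalise_inputs(np.tile(eta[:, None], (1, 9)),
                                    100.0 + 20.0 * rng.standard_normal((t.size, 8)), hs)
```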

3.2 Artificial Neural Network Model.

An ANN can be viewed as a universal function approximator that maps a set of input values X to output values Y. The primary objective of an ANN model is to establish the optimal approximation function by learning the model parameters, namely the weights W and biases b, such that ηx4=FANN(S1,S2,S3,D4;W,b).

A simple ANN has an input layer, one hidden layer, and an output layer. The input layer takes in the data, which is then passed to the hidden layer. Each layer has several neurons, and each neuron in one layer is connected to every neuron in the next layer. Neurons calculate their values by applying a nonlinear activation function to the weighted sum of their inputs with a bias term. For more details of the ANN model architecture, one can refer to Refs. [9,15,16]. Section 3.5 describes the two different versions of the ANN model used in this paper.
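A minimal NumPy sketch of the forward pass described above is given below, with untrained random weights. The two hidden layers of 200 neurons follow Sec. 3.5; flattening the 408-step window of 17 inputs into one vector, the Glorot-uniform initialization, and the linear output layer are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_dense(n_in, n_out):
    """Glorot-uniform weights and zero biases for one fully connected layer."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out)), np.zeros(n_out)


def ann_forward(x, params):
    """Forward pass: ReLU hidden layers, linear output layer.

    x: (batch, n_features); params: list of (W, b) tuples, one per layer.
    """
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # weighted sum plus bias, then ReLU
    W_out, b_out = params[-1]
    return h @ W_out + b_out  # linear output: predicted elevation time series


# Hypothetical sizes: a flattened 408-step window of 17 inputs, two hidden layers
# of 200 neurons, and 528 output time-steps
layer_sizes = [408 * 17, 200, 200, 528]
params = [init_dense(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
y = ann_forward(rng.standard_normal((4, 408 * 17)), params)
```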

3.3 Long Short-Term Memory Model.

The LSTM is a variant of the recurrent neural network (RNN), a class of neural networks specifically designed to handle temporal input data, which makes them well suited to tasks involving sequential data, such as text or time series [17]. The LSTM model extends the traditional ANN model by incorporating feedback connections and internal memory. The feedback connections allow the network to retain information from previous inputs, because the activation values are fed back into the network at each time-step, enabling the model to consider past inputs in its processing.

A typical LSTM unit comprises a cell, an input gate, an output gate, and a forget gate, all of which regulate the flow of information. These gates enable the network to retain important information while discarding irrelevant data, thereby capturing long-term dependencies more effectively and mitigating the vanishing gradient problem, leading to improved performance over the standard RNN model [18]. Therefore, the LSTM model is preferred over the RNN model for the subsequent predictions in this study. For more details of the LSTM model architecture, one can refer to Refs. [19,20].
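A single LSTM time-step can be written out explicitly to show how the gates interact. This is the standard LSTM cell formulation sketched in NumPy with untrained weights; the gate ordering [input, forget, candidate, output] and the function names are choices made for this illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """Advance one LSTM unit by one time-step.

    x_t: (n_in,), h_prev and c_prev: (H,), W: (n_in, 4H), U: (H, 4H), b: (4H,).
    Gate blocks are stacked in the order [input, forget, candidate, output].
    """
    H = h_prev.shape[-1]
    z = x_t @ W + h_prev @ U + b      # the previous hidden state is fed back in
    i = sigmoid(z[0:H])               # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])           # forget gate: how much old memory to keep
    g = np.tanh(z[2 * H:3 * H])       # candidate values for the cell state
    o = sigmoid(z[3 * H:4 * H])       # output gate: how much memory to expose
    c = f * c_prev + i * g            # updated cell state (internal memory)
    h = o * np.tanh(c)                # updated hidden state
    return h, c


def lstm_last_state(xs, W, U, b):
    """Run the unit over a sequence xs of shape (n_steps, n_in); return the final (h, c)."""
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h, c
```

Setting the forget-gate bias high and the input-gate bias low makes the cell carry its memory unchanged, which is the mechanism that lets an LSTM retain long-term dependencies.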

The LSTM model can outperform a simple ANN model in handling sequential data due to its advanced structure, which includes feedback connections and internal memory to retain information from previous inputs. However, LSTM may not always prioritize the most relevant information, leading to suboptimal utilization of available data. This limitation is particularly important in this study, where the measurement data involve numerous variables and complexities, as will be further discussed in Sec. 4. To overcome this limitation, we propose the attention-based model to enhance the prediction performance.

3.4 Attention-Based Neural Network Model.

The attention mechanism [21] has recently become a widely used set of layers within neural networks, particularly for sequence-based tasks. It can be understood as a vector of importance weights that determines how much significance a specific feature (such as the GPS positions) should receive relative to the other features in the input data, that is, which parts of the input the model should pay attention to. Both the relative distance between the wave buoys and the sea state itself vary slowly over time. The attention mechanism therefore appears promising for capturing such variations and improving forecasting capability.

3.4.1 Sequential Encoding.

The schematic of the proposed attention-based model is shown in Fig. 5. Sequential encoding (SE) is a technique used to incorporate sequential information into the original input, which enables the model to capture the input order (i.e., the sequential order of input time-steps). In the proposed attention-based neural network model, the SE was calculated using specific sine and cosine functions of different frequencies as follows [21]:
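$$SE(\mathrm{Seq},D)=\begin{cases}\sin\left(\mathrm{Seq}/10000^{D/\mathrm{dim}}\right), & D \bmod 2=0\\ \cos\left(\mathrm{Seq}/10000^{(D-1)/\mathrm{dim}}\right), & D \bmod 2=1\end{cases} \tag{4}$$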
where Seq is the index of the input sequential order, Seq=1,…,n, with n the number of input time-steps (n=408, equivalent to 204 s), and D is the index of the input variable dimension, D=1,…,dim, where dim is the number of input variables (dim=17: nine data streams from the 3DOF measurements of the three Spotters and eight from the east and north GPS positions of the four buoys). Even (D mod 2=0) and odd (D mod 2=1) dimensions use the sine and cosine functions, respectively.
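A sketch of the sinusoidal sequential encoding, following the sine/cosine scheme of Ref. [21] as described above, is given below. The 1-based indexing of Seq and D and the exponent 2⌊D/2⌋/dim (so that an even dimension and the odd one after it share a wavelength) are assumptions about how Eq. (4) is indexed; the function name is hypothetical.

```python
import numpy as np


def sequential_encoding(n=408, dim=17):
    """Sinusoidal sequential encoding as an (n, dim) array.

    Seq = 1..n indexes the time-step and D = 1..dim the input variable; even D
    uses sine and odd D uses cosine, and an even D shares its wavelength with D + 1.
    """
    seq = np.arange(1, n + 1, dtype=float)[:, None]   # (n, 1)
    D = np.arange(1, dim + 1)[None, :]                # (1, dim)
    angle = seq / 10000.0 ** (2 * (D // 2) / dim)     # wavelengths grow from 2*pi towards 10,000*2*pi
    return np.where(D % 2 == 0, np.sin(angle), np.cos(angle))


X = np.random.default_rng(0).standard_normal((408, 17))  # placeholder input window
X_encoded = X + sequential_encoding()                    # added directly to the input
```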
Fig. 5
Schematic of an attention-based neural network model for mapping input of 3DOF time history measurements of Spotters (blue dashed box) and GPS positions for all buoys (green dashed box) to the output of DWR4 surface wave at downwave location (Color version online.)

As shown in Fig. 6, each row represents a sinusoid whose wavelength depends on the dimension, ranging from 2π to 10,000·2π. For example, the first row is the vector added to the first input feature (the horizontal displacement ζx of Spotter 1). Each row contains 408 values ranging from −1 (black) to 1 (white), calculated using Eq. (4). The 17 rows of the sequential encoding correspond to the 17 input data streams and are added directly to the input to create distinctive encoding patterns for different positions in the sequence.

Fig. 6
Sequential encoding with sinusoidal patterns, where colors represent values ranging from −1 (black) to 1 (white), with 0 represented as gray (Color version online.)

3.4.2 Attention Mechanism.

The general attention mechanism consists of three main components: queries Q, keys K, and values V. These components determine how the attention mechanism focuses on different parts of the input sequence when computing the weighted sum for each time-step of the output sequence. For an input sequence X to which the sequential encoding has been added, the attention mechanism can be implemented in the following steps [21], highlighted in the orange box in Fig. 5:

  1. For each input vector, we create a query Q=XWQ, key K=XWK, and value V=XWV, where WQ, WK, and WV are learnable weight matrices that need to be estimated.

  2. The attention value for each query Q is computed by mapping the query and all the key values to an output
    $$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \tag{5}$$
    where √dk is a scaling factor introduced to help stabilize the training process, with dk being the length of the query/key vectors. The output is a weighted sum of the values, where the weight of each value is obtained by passing the dot products of the query with all keys through the softmax function [22].
  3. The attention mechanism is applied h times (multi-head attention) to enhance the model’s capacity. For each head, the input vectors are transformed into different query, key, and value vectors using different weight matrices WτQ, WτK, WτV for τ=1,…,h. The outputs of all heads are concatenated and linearly transformed through the output weight matrix WO, which can be expressed as [21]
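    $$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}\left(\mathrm{head}_1,\ldots,\mathrm{head}_h\right)W^{O} \tag{6}$$
    where $\mathrm{head}_\tau=\mathrm{Attention}\left(QW_\tau^{Q},KW_\tau^{K},VW_\tau^{V}\right)$ for $\tau=1,\ldots,h$.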
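The three steps above can be sketched in NumPy as follows, for self-attention in which the queries, keys, and values of every head are projections of the same encoded input X. The weights are random and untrained, and the per-head key dimension d_k = 16 is a hypothetical choice; the text does not state it.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for Q, K: (n, d_k) and V: (n, d_v)."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)  # (n, n); each row sums to 1
    return weights @ V, weights


def multi_head_attention(X, WQ, WK, WV, WO):
    """Self-attention with h heads: concat(head_1..head_h) @ WO.

    X: (n, d_model); WQ, WK, WV: lists of h (d_model, d_k) matrices; WO: (h*d_k, d_model).
    """
    heads = [scaled_dot_product_attention(X @ wq, X @ wk, X @ wv)[0]
             for wq, wk, wv in zip(WQ, WK, WV)]
    return np.concatenate(heads, axis=-1) @ WO


rng = np.random.default_rng(1)
n, d_model, h, d_k = 408, 17, 8, 16  # d_k per head is a hypothetical choice


def random_projections(count, shape):
    return [0.1 * rng.standard_normal(shape) for _ in range(count)]


WQ = random_projections(h, (d_model, d_k))
WK = random_projections(h, (d_model, d_k))
WV = random_projections(h, (d_model, d_k))
WO = 0.1 * rng.standard_normal((h * d_k, d_model))
X = rng.standard_normal((n, d_model))  # encoded input window (placeholder)
Z = multi_head_attention(X, WQ, WK, WV, WO)
```

When all keys are identical, every row of attention weights is uniform and the output reduces to the mean of the values, a quick sanity check on the implementation.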

The output of the multi-head attention block is added to the block input through a residual connection, followed by layer normalization [23], which helps to improve training stability and convergence speed.
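The residual connection and layer normalization can be sketched as follows. The learnable gain and bias of layer normalization are omitted (set to identity), the small constant eps = 1e-6 is an assumed value, and the function names are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalise each time-step (row) to zero mean and unit variance over its features.

    The learnable gain and bias of layer normalisation are omitted (identity) here.
    """
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def add_and_norm(block_input, block_output):
    """Residual connection followed by layer normalisation."""
    return layer_norm(block_input + block_output)


rng = np.random.default_rng(2)
x = rng.standard_normal((408, 17))          # input to the attention block
attn_out = rng.standard_normal((408, 17))   # placeholder for the attention output
y = add_and_norm(x, attn_out)
```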

3.4.3 Fully Connected Feed-Forward Networks.

The output of the multi-head attention is then fed into a fully connected feed-forward network, which is applied to each position of the sequence from the previous layer separately using two linear transformations with a rectified linear unit (ReLU) activation function in between [24]. The feed-forward network with a residual connection can be expressed as
$$\mathrm{FFN}(x)=x+\max\left(0,\;xW_1+b_1\right)W_2+b_2 \tag{7}$$
where x is the output of the previous layer, W and b are the weight matrices and bias terms, respectively, and subscripts 1 and 2 denote the first and second layers.
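A sketch of this feed-forward block, assuming Eq. (7) takes the standard position-wise form of Ref. [21] plus the residual term, FFN(x) = x + max(0, xW1 + b1)W2 + b2. The 200-neuron inner width follows Sec. 3.5; the random weights are placeholders and the function name is hypothetical.

```python
import numpy as np


def position_wise_ffn(x, W1, b1, W2, b2):
    """x + max(0, x W1 + b1) W2 + b2 for x of shape (n_steps, d_model).

    The same weights act on every time-step (row) independently; the leading
    `x +` is the residual connection.
    """
    return x + np.maximum(0.0, x @ W1 + b1) @ W2 + b2


rng = np.random.default_rng(3)
d_model, d_ff = 17, 200  # 200 neurons as stated in Sec. 3.5
W1 = 0.1 * rng.standard_normal((d_model, d_ff)); b1 = np.zeros(d_ff)
W2 = 0.1 * rng.standard_normal((d_ff, d_model)); b2 = np.zeros(d_model)
x = rng.standard_normal((408, d_model))
y = position_wise_ffn(x, W1, b1, W2, b2)
```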

Global average pooling is then applied to obtain a single-vector representation, which makes training computationally more efficient and reduces the risk of overfitting. Finally, this vector is fed into a standard feed-forward network to predict the surface wave elevation at the downwave location. For the standard ANN model used in Ref. [10], the highlighted gray box in Fig. 5 is simply replaced by a single fully connected feed-forward layer.
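The pooling and output stages can be sketched as below. Which axis is averaged (here the time-step axis), the 200-neuron hidden layer in the output head, and the 528-step output for a single quantile are assumptions for illustration, as is the function name.

```python
import numpy as np


def pool_and_predict(z, W_h, b_h, W_out, b_out):
    """z: (batch, n_steps, d_model) output of the attention block.

    Global average pooling over the time-step axis (assumed), then a ReLU layer
    and a linear layer giving the downwave elevation series.
    """
    pooled = z.mean(axis=1)                     # (batch, d_model)
    hidden = np.maximum(0.0, pooled @ W_h + b_h)
    return hidden @ W_out + b_out               # (batch, n_out)


rng = np.random.default_rng(4)
z = rng.standard_normal((2, 408, 17))
W_h = 0.1 * rng.standard_normal((17, 200)); b_h = np.zeros(200)
W_out = 0.1 * rng.standard_normal((200, 528)); b_out = np.zeros(528)
eta_pred = pool_and_predict(z, W_h, b_h, W_out, b_out)
```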

3.5 Model Specifications.

In this study, we consider the following four models:

  1. The ANN model that uses the time histories of the 3DOF measurements from the three Spotters as its only inputs is denoted as ANN.

  2. The ANN model that incorporates both the 3DOF measurements from the three Spotters and the east and north GPS positions of all buoys is denoted as ANN*.

  3. The LSTM model with both the 3DOF measurements and the GPS positions is denoted as LSTM*.

  4. The neural network model based on the attention mechanism with both the 3DOF measurements and the GPS positions is denoted as ANN*-Attention.

For the following results, we employ the Adam variant of the stochastic gradient descent algorithm [25] to estimate the parameters of all models. The objective is to minimize a loss between the measured and predicted surface wave elevation at position x4. To obtain both the median predictions and the 95% prediction intervals, we utilize a quantile loss function, expressed as follows [26]:
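$$\mathcal{L}_q(P)=\frac{1}{m}\sum_{t=1}^{m}\max\left[q\left(\eta_t-\tilde{\eta}_t\right),\;(q-1)\left(\eta_t-\tilde{\eta}_t\right)\right] \tag{8}$$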
where {ηt}t=1,…,m is the measured surface elevation of the DWR4 at position x4, {η̃t}t=1,…,m are the corresponding predictions, P denotes the model parameters to be estimated, and q is the quantile level between 0 and 1; q=0.5 gives the median prediction, while the 0.025 and 0.975 quantiles give the lower and upper bounds of the 95% prediction interval, respectively.
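The quantile (pinball) loss can be sketched as follows. Averaging over the m predicted time-steps, and summing the losses of three quantile outputs into one objective, are assumptions; the text does not say whether the three quantiles come from one multi-output model or separate models. The function names are hypothetical.

```python
import numpy as np


def quantile_loss(eta, eta_pred, q):
    """Pinball loss for quantile level q, averaged over the predicted time-steps."""
    e = np.asarray(eta, dtype=float) - np.asarray(eta_pred, dtype=float)
    return float(np.mean(np.maximum(q * e, (q - 1.0) * e)))


def interval_loss(eta, pred_lower, pred_median, pred_upper):
    """Hypothetical combined objective for the 2.5 %, 50 % and 97.5 % outputs."""
    return (quantile_loss(eta, pred_lower, 0.025)
            + quantile_loss(eta, pred_median, 0.5)
            + quantile_loss(eta, pred_upper, 0.975))


eta = np.array([1.0, 2.0, 3.0])
pred = np.array([0.0, 2.0, 5.0])
```

Under-prediction (η > η̃) is weighted by q and over-prediction by 1 − q, so training with q = 0.975 pushes the output above about 97.5% of the observations, which gives the upper bound of the interval.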

Based on sensitivity checks, both ANN models were implemented with two hidden layers of 200 neurons each. The LSTM model consists of two hidden layers with 100 cell units, followed by a feed-forward layer with 200 neurons. The attention-based neural network model uses h=8 heads for the multi-head attention, and its feed-forward layer has 200 neurons. Increasing the number of neurons, hidden layers, or heads did not improve the prediction performance. The Adam optimizer [25] and the ReLU activation function [24] are used for training. The training batch size and learning rate are set to 128 and 10⁻³, respectively. The quantile loss function in Eq. (8) and early stopping were used. All models were implemented in Python using the Keras [27] and TensorFlow [28] packages on a computer with an 8-core CPU, a 14-core integrated GPU, and 16 GB of RAM (Apple M1 Pro chip). Training took approximately 15 min for the ANN models, and 80 min and 110 min for the LSTM and attention-based neural network models, respectively. Once fully trained, each model completes a prediction in a fraction of a second.
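The stated hyperparameters and the early-stopping rule can be collected as in the sketch below. The patience value is hypothetical (the text does not give it), and the pure-Python helper only illustrates the rule; in Keras the same behavior is typically configured through the `EarlyStopping` callback with its `monitor`, `patience`, and `restore_best_weights` arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    batch_size: int = 128          # Sec. 3.5
    learning_rate: float = 1e-3    # Adam learning rate, Sec. 3.5
    dense_units: int = 200         # ANN hidden layers / feed-forward layers
    lstm_units: int = 100          # per LSTM hidden layer
    n_heads: int = 8               # multi-head attention
    patience: int = 10             # hypothetical: not stated in the text


def early_stopping_epoch(val_losses, patience):
    """Index of the epoch whose weights are kept.

    Training stops once the validation loss has not improved for `patience`
    consecutive epochs; the best epoch seen so far is returned.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


cfg = TrainingConfig()
history = [1.0, 0.8, 0.6, 0.65, 0.7, 0.9, 0.5]
```

With `patience=3`, training on this hypothetical history stops before the late improvement at the last epoch and keeps the weights from epoch 2; a longer patience would have reached it.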

3.6 Error Assessment.

To compare the prediction performance of the different models, we calculate the normalized prediction error between the target value η and the prediction η̃
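$$E(t)=\frac{1}{\sigma}\sqrt{\frac{1}{N_R}\sum_{r=1}^{N_R}\left[\eta_r(t)-\tilde{\eta}_r(t)\right]^2} \tag{9}$$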
where σ is the standard deviation of the target surface elevation, and NR is the number of predictions, issued at 20 s increments over one hour, that are averaged at each time-step t=1,…,m.
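A sketch of the error metric follows, assuming Eq. (9) is a root-mean-square error across the NR predictions at each time-step, normalized by σ; if a mean absolute error is intended instead, only the reduction line changes. The function name is hypothetical.

```python
import numpy as np


def normalised_error(eta_true, eta_pred):
    """Normalised prediction error at each time-step.

    eta_true, eta_pred: (N_R, m) arrays holding N_R predictions of m time-steps
    (e.g. one issued every 20 s over an hour). Returns an (m,) array: the RMS
    error across the N_R predictions at each time-step, divided by sigma.
    """
    eta_true = np.asarray(eta_true, dtype=float)
    eta_pred = np.asarray(eta_pred, dtype=float)
    sigma = eta_true.std()  # standard deviation of the target elevation
    return np.sqrt(np.mean((eta_true - eta_pred) ** 2, axis=0)) / sigma


eta_true = np.array([[1.0, -1.0], [-1.0, 1.0]])
err = normalised_error(eta_true, np.zeros_like(eta_true))
```

Predicting zero everywhere gives E = 1 by construction, so values well below 1 indicate useful skill.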

4 Results

Figure 7 shows the training and validation losses of all models. The ANN*-Attention model exhibits the best performance, with lower loss values than the other models during both training and validation. The large discrepancy between the training and validation losses of the ANN models is due to the missing GPS input variables, which lead to phase offsets in the predictions for the validation set. The lower accuracy of the LSTM* model compared with the attention-based model (ANN*-Attention) can be attributed to several factors inherent in their architectures. The LSTM* incorporates feedback connections and internal memory to retain information from previous inputs. However, it does not necessarily prioritize the most relevant information. This limitation is particularly significant here, because the measurement data involve many variables and complexities, such as the need to incorporate the GPS positions and the variations in wave conditions.

Fig. 7
Training (solid lines) and validation (dashed lines) losses of the different models during training. To prevent overfitting, early stopping is applied to halt training when the performance on the validation set starts to deteriorate.

Given such complex data patterns, the forget gate in the LSTM* may struggle to identify and retain essential information, potentially discarding information that is important for accurate wave prediction. In addition, the interaction between the forget gate and the other gates (input and output gates) might influence how relevant information is retained. If these gates are not well coordinated, the overall performance of the LSTM* may be compromised, resulting in a loss of accuracy.

To overcome these shortcomings, the attention-based model is used to improve the prediction performance. The attention mechanism enhances accuracy by explicitly allowing the model to weigh different parts of the input sequence differently. This selective focus enables the attention-based model to capture complex hidden representations and the variations in wave conditions and buoy positions more effectively than the LSTM* and ANN* models. Moreover, by leveraging the most relevant information at each time-step, the attention mechanism allows the model to generalize better to unseen sea states and to make more accurate predictions for complex data patterns.

Figure 8(a) depicts a typical directional spectrum based on one-hour displacement records from the DWR4 between 00:00 and 01:00 on Aug. 9, 2022. The spectrum shows a long-period swell at Albany with a peak period of 15 s, propagating from 205 deg relative to north. Figure 8(b) shows the directional spreading function at the spectral peak frequency, computed using the maximum entropy method [29]. The waves are narrow-banded in frequency with small directional spreading angles, and the mean wave direction is aligned with the buoy array. Predictions for these waves are therefore expected to be more accurate, because long-period waves with small spreading angles reduce the complexity of the prediction.

Fig. 8
Directional wave spectra (in m²/Hz/deg) derived from one-hour buoy displacement records. Angular coordinates indicate the direction from which the waves propagate in degrees, with the radii representing wave periods in seconds: (a) a typical wave condition with relatively small directional spreading and (b) directional spreading function at the spectral peak frequency ωp.

The prediction results of the different models are compared in Fig. 9. The corresponding bulk parameters for the tested case (significant wave height Hs, peak period Tp, separation angle θ̄−θA, spreading angle σθ, and relative distance Dist) are shown at the top. Here, θA is the array direction and θ̄ is the mean wave direction over the hour. The directional bulk parameters (θ̄, σθ) are determined from the Fourier coefficients [30].
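The directional bulk parameters can be estimated from the first-order Fourier coefficients a1 and b1 of the directional spreading function using the common circular-moment definitions θ̄ = atan2(b1, a1) and σθ = [2(1 − √(a1² + b1²))]^½. Whether Ref. [30] uses exactly these definitions is an assumption, as are the function names and the example spreading function; the separation angle θ̄ − θA is wrapped into [−180°, 180°).

```python
import numpy as np


def directional_parameters(theta_deg, spreading):
    """Mean direction and directional spread (deg) of a discretised spreading function.

    Uses the first-order circular moments a1 = sum D cos(theta), b1 = sum D sin(theta).
    """
    theta = np.radians(np.asarray(theta_deg, dtype=float))
    D = np.asarray(spreading, dtype=float)
    D = D / D.sum()                                  # normalise to unit weight
    a1 = np.sum(D * np.cos(theta))
    b1 = np.sum(D * np.sin(theta))
    mean_dir = np.degrees(np.arctan2(b1, a1)) % 360.0
    r1 = min(np.hypot(a1, b1), 1.0)                  # guard against round-off above 1
    spread = np.degrees(np.sqrt(2.0 * (1.0 - r1)))
    return float(mean_dir), float(spread)


def separation_angle(mean_dir_deg, array_dir_deg):
    """theta_bar - theta_A wrapped into [-180, 180) degrees."""
    return (mean_dir_deg - array_dir_deg + 180.0) % 360.0 - 180.0


# Hypothetical spreading function: equal energy at 195 and 215 deg
theta_bar, sigma_theta = directional_parameters([195.0, 215.0], [1.0, 1.0])
```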

Fig. 9
Prediction comparison for a single time series using different models: (a) ANN model with 3DOF motion measurements of the three Spotters only (ANN), and ANN model with 3DOF measurements and GPS positions (ANN*) with its corresponding 95% prediction intervals; (b) LSTM* model; and (c) attention-based model using 3DOF measurements and GPS positions (ANN*-Attention) with its corresponding 95% prediction intervals. The bulk parameters and the relative distance (Dist) are shown at the top.

Figure 9(a) shows that the phase predicted by the ANN* model is considerably more accurate than that predicted by the ANN model. This improvement is attributed to the inclusion of GPS position information: the ANN model relies only on the 3DOF buoy motions and has no information on the buoy positions, which results in a phase offset in its predictions. However, the ANN* model predicts the amplitude less accurately than the ANN model. This is because the GPS position information adds complexity to the model, making it harder to generalize, particularly when the buoy positions vary widely. Note that the 95% prediction interval is plotted only for the ANN* model in Fig. 9(a); the ANN model has a wider prediction interval than the ANN* model because of its phase offset. The uncertainty increases considerably for prediction horizons beyond 30 s because the upwave measurements no longer contain the relevant wave information.
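To illustrate why missing position information produces a phase offset, the sketch below computes the phase change of a single linear wave component when a buoy drifts by a given distance relative to the wave direction. It assumes deep water, where the dispersion relation is $\omega^2 = g k$, and a single monochromatic component; real records are spread over frequency and direction, and the water depth at the field site is not taken into account. The function name is hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def phase_shift_deg(period_s, displacement_m, angle_deg=0.0):
    """Phase change (degrees) of a deep-water linear wave component when the
    measurement point moves by displacement_m at angle_deg to the wave direction."""
    omega = 2.0 * math.pi / period_s   # angular frequency (rad/s)
    k = omega ** 2 / G                 # deep-water dispersion relation: omega^2 = g k
    return math.degrees(k * displacement_m * math.cos(math.radians(angle_deg)))


for drift in (5.0, 10.0, 20.0):
    print(f"{drift:4.0f} m drift -> {phase_shift_deg(10.0, drift):5.1f} deg phase shift at T = 10 s")
```

Under these assumptions, a 10 m drift along the direction of a 10 s wave shifts its phase by roughly 23°, a change that a model without GPS inputs has no way to account for.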

In Fig. 9(b), the LSTM* model predicts both phase and amplitude noticeably better than the simpler ANN and ANN* models. The LSTM* architecture is designed for sequential data: its gated memory cells retain relevant information over long input windows, so the model captures the temporal dependencies in the buoy records more effectively and predicts more accurately. Its prediction interval is also slightly narrower than that of the ANN* model, indicating lower uncertainty and more reliable predictions.
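The gating that lets an LSTM retain information is easiest to see in a single cell update. The NumPy sketch below implements one step of a standard LSTM cell [18] and runs it over a sequence; it is not the authors' LSTM* configuration, and the gate ordering, layer sizes, and random weights are illustrative assumptions. The key line is the additive cell-state update $c_t = f_t \odot c_{t-1} + i_t \odot g_t$: when the forget gate is near 1 and the input gate near 0, the cell state passes through unchanged, so information from early in the input window can still influence the output.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h, c, W, U, b):
    """One time step of a standard LSTM cell.

    x: (d,) input; h, c: (n,) hidden and cell states.
    W: (4n, d), U: (4n, n), b: (4n,); rows are stacked in the (assumed)
    order input gate, forget gate, output gate, candidate.
    """
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:n])            # input gate: how much new content to write
    f = sigmoid(z[n:2 * n])        # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * n:3 * n])    # output gate: how much cell state to expose
    g = np.tanh(z[3 * n:4 * n])    # candidate content
    c_new = f * c + i * g          # additive update lets information persist
    h_new = o * np.tanh(c_new)
    return h_new, c_new


def lstm_forward(xs, W, U, b):
    """Run the cell over a sequence xs of shape (T, d); return the final (h, c)."""
    n = U.shape[1]
    h, c = np.zeros(n), np.zeros(n)
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
    return h, c


# Hypothetical sizes: 9 input channels (e.g. 3 buoys x 3DOF), 16 hidden units.
rng = np.random.default_rng(0)
d, n, T = 9, 16, 50
W = rng.normal(scale=0.1, size=(4 * n, d))
U = rng.normal(scale=0.1, size=(4 * n, n))
b = np.zeros(4 * n)
h_T, c_T = lstm_forward(rng.normal(size=(T, d)), W, U, b)
print(h_T.shape)
```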

As Fig. 9(c) shows, the new attention-based model (ANN*-Attention) is notably more accurate than the models without an attention mechanism, particularly at crests and troughs. Its narrower prediction interval further suggests that the attention mechanism allows the model to weight the input features selectively and to adapt to the complex hidden representations and the variations in GPS positions and wave conditions that they contain.
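One common way to obtain 95% prediction intervals from a neural network, assumed here for illustration and related to the generalized quantile loss of Ref. [26], is to train additional outputs on the 2.5% and 97.5% quantiles with the pinball loss. Whether the authors used exactly this formulation is an assumption. The sketch below implements the pinball loss and an interval check, and verifies numerically that minimizing the loss over a constant recovers the empirical quantile. A narrower interval indicates lower uncertainty only if its empirical coverage stays near 95%, which is why the check reports both coverage and width.

```python
import numpy as np


def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss for quantile level q in (0, 1)."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction (err > 0) is weighted by q, over-prediction by (1 - q).
    return float(np.mean(np.maximum(q * err, (q - 1.0) * err)))


def interval_stats(y_true, lower, upper):
    """Empirical coverage and mean width of a prediction interval."""
    y = np.asarray(y_true, dtype=float)
    lo = np.asarray(lower, dtype=float)
    hi = np.asarray(upper, dtype=float)
    coverage = float(np.mean((y >= lo) & (y <= hi)))
    width = float(np.mean(hi - lo))
    return coverage, width


# Minimising the pinball loss over a constant recovers the empirical quantile.
rng = np.random.default_rng(0)
sample = rng.normal(size=10_000)
grid = np.linspace(-4.0, 4.0, 801)
best = grid[np.argmin([pinball_loss(sample, c, 0.975) for c in grid])]
print(best, np.quantile(sample, 0.975))  # both should be close to 1.96
```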

To compare prediction performance further, we select four cases with different conditions from the three months of field data; the prediction results are shown in Fig. 10. Only the ANN*, LSTM*, and ANN*-Attention models are compared, because the phase offset of the ANN model leads to prediction errors too large to be useful in practice. All models were trained on the first 14 days of data. In cases (a) and (b), the ANN* and LSTM* models are less accurate because the range of sea states and relative distances encountered over those 14 days makes the training data heterogeneous, which reduces the prediction accuracy and generalization capability of these models. By contrast, the attention mechanism dynamically assigns different weights to different components of the input data during prediction, which enables the model to generalize better and predict more accurately.
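The dynamic weighting described above is what scaled dot-product attention [21] computes: for every input window, softmax-normalized similarity scores between queries and keys determine how much each value contributes to the output. The NumPy sketch below shows only this generic operation. How the ANN*-Attention model forms its queries, keys, and values from the 3DOF and GPS inputs is not reproduced here, and the token layout, dimensions, and random projection matrices are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Q: (m, d_k) queries, K: (n, d_k) keys, V: (n, d_v) values.

    Returns the attended output (m, d_v) and the attention weights (m, n)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # input-dependent weights; each row sums to 1
    return weights @ V, weights


# Hypothetical example: 12 input "tokens" (e.g. one per buoy channel or time
# window) embedded in 8 dimensions and projected with random matrices.
rng = np.random.default_rng(1)
tokens = rng.normal(size=(12, 8))
Wq, Wk, Wv = (rng.normal(scale=0.3, size=(8, 8)) for _ in range(3))
out, w = scaled_dot_product_attention(tokens @ Wq, tokens @ Wk, tokens @ Wv)
print(out.shape, w.shape)
```

Because the weights are recomputed from the inputs themselves, the same trained projections can emphasize different channels when buoy positions or sea states change, unlike the fixed weights of a plain feed-forward layer.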

Fig. 10
Comparison of time series predictions by the ANN*, LSTM*, and ANN*-Attention models for different dates and wave conditions. The 95% prediction interval is displayed only for the ANN*-Attention model; it is narrower than those of the ANN* and LSTM* models. The corresponding bulk parameters and the relative distance (Dist) are shown to the right of each plot.

Case (c) is more challenging to predict because its separation and directional spreading angles are larger than those of cases (a) and (b). Prediction accuracy is expected to decrease as directional spreading increases, which is consistent with the earlier study by Chen et al. [9] using synthetic data. For waves with large directional spreading, additional upwave measurement buoys would help improve model accuracy.

In case (d), the separation and spreading angles are comparable to those of cases (a) and (b), yet the prediction accuracy is notably lower, and phase offsets appear in the ANN* and LSTM* predictions. This loss of accuracy is attributed primarily to the relative distance of approximately 216 m: such distances occurred rarely, so the training set contains too few comparable observations. The prediction accuracy for this case would therefore be expected to improve if more training data were available.
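One simple way to quantify whether a test condition is under-represented in the training set is to count training samples with a similar relative distance. The sketch below does this for a window of ±5 m around the query distance. The function name, the window half-width, and the toy distances are hypothetical and are not taken from the field data.

```python
import numpy as np


def training_support(train_dist_m, query_dist_m, half_width_m=5.0):
    """Count and fraction of training samples whose relative buoy distance lies
    within +/- half_width_m of query_dist_m (the half-width is a hypothetical choice)."""
    d = np.asarray(train_dist_m, dtype=float)
    inside = np.abs(d - query_dist_m) <= half_width_m
    return int(np.count_nonzero(inside)), float(np.mean(inside))


# Toy distances for illustration only; these are NOT the field measurements.
toy_train = [180.0, 185.0, 190.0, 192.0, 195.0, 198.0, 200.0, 214.0]
print(training_support(toy_train, 216.0))
```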

Figure 11 shows the overall prediction error over the 88-day deployment, evaluated on samples taken every 6 h. Figure 11(a) shows the separation angle $\bar{\theta} - \theta_A$ and its range of variation within each hour, together with the spreading angle $\sigma_\theta$, i.e., the average angular deviation from the mean wave direction. Figure 11(b) shows the prediction error of each model averaged over a 12 s prediction horizon.
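With one evaluation window every 6 h, the 88-day deployment yields roughly 88 × 4 = 352 windows. The sketch below computes a per-window error averaged over the first 12 s of the prediction horizon. The mean absolute error used here, the optional normalization by $H_s$, the sampling-rate argument, and the function name are assumptions for illustration and may differ from the measure plotted in Fig. 11(b).

```python
import numpy as np

# 88-day deployment with one prediction window every 6 h -> 4 windows per day.
WINDOWS_PER_DAY = 24 // 6
N_WINDOWS = 88 * WINDOWS_PER_DAY


def time_averaged_error(eta_pred, eta_meas, fs_hz, horizon_s=12.0, hs=None):
    """Mean absolute error over the first horizon_s seconds of one prediction.

    fs_hz is the sampling rate of both series. If hs (significant wave height)
    is given, the error is normalised by it; this normalisation is an assumption.
    """
    n = int(round(horizon_s * fs_hz))              # samples inside the horizon
    pred = np.asarray(eta_pred, dtype=float)[:n]
    meas = np.asarray(eta_meas, dtype=float)[:n]
    mae = float(np.mean(np.abs(pred - meas)))
    return mae / hs if hs is not None else mae


print(N_WINDOWS)
```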

Fig. 11
(a) Hourly mean wave direction and spreading angles and (b) comparison of time-averaged prediction errors between the different models every 6 h. The vertical dash-dotted purple lines mark the cases shown in Fig. 10. (Color version online.)

The results align with expectations: the ANN*-Attention model outperforms the other models. The ANN model exhibits numerous error spikes because it lacks GPS position information, and the models without an attention mechanism are generally less accurate because they cannot effectively handle the variation in both GPS positions and wave conditions. All models perform poorly for waves with large directional spreading angles, because strongly spread waves are often associated with shorter wave periods and because only three Spotters at the upwave locations capture too little of the incoming wave information in such conditions. Additional upwave measurement buoys are therefore required to improve prediction accuracy for waves with large directional spreading.

5 Conclusion

In this work, an attention-based model was applied to predict phase-resolved waves in the Southern Ocean near Albany, Western Australia. The model uses past measurements of surface elevation and horizontal displacement from an array of three Spotter buoys to predict the surface elevation measured by a DWR4 located downwave of the Spotter array.

The relative positions of the buoys changed continually during the measurements, which makes accurate prediction challenging. To overcome this limitation, we proposed a new method based on a neural network with an attention mechanism and compared its prediction performance with that of artificial neural network (ANN) and long short-term memory (LSTM) models.

The results suggest that the proposed method can capture the variation in both buoy positions and wave conditions and thus improves forecasting capability. Furthermore, the attention-based model substantially enhances the robustness and accuracy of the conventional neural network model applied in Ref. [10]. The results indicate that the proposed model is promising for predicting moderately directionally spread waves based on an array of three Spotters. To improve accuracy for more strongly spread waves, future work will investigate the use of additional wave buoys and larger, optimized arrays at upwave locations.

Acknowledgment

This research is supported by the ARC ITRH for Transforming Energy Infrastructure through Digital Engineering (TIDE), which is funded by the Australian Research Council (ARC), INPEX Operations Australia, Shell Australia, Woodside Energy, Fugro Australia Marine, Wood Group Kenny Australia, RPS Group, Bureau Veritas, and Lloyd’s Register Global Technology (Grant No. IH200100009). W.Z. is grateful for the Future Fellowship (Grant No. FT230100109) funded by the ARC. We would also like to thank Professor Jeff Hansen and Dr. Thobani Hlophe from The University of Western Australia for providing access to the processed field data.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.

References

1. Naaijen, P., and Blondel-Couprie, E., 2012, “Wave Induced Motion Prediction as Operational Decision Support for Offshore Operations,” Proceedings of the International Conference Marine Heavy Transport & Lift III, London, UK, Vol. 3, pp. 24–25.
2. Ma, Y., Sclavounos, P. D., Cross-Whiter, J., and Arora, D., 2018, “Wave Forecast and Its Application to the Optimal Control of Offshore Floating Wind Turbine for Load Mitigation,” Renewable Energy, 128, pp. 163–176.
3. Halstensen, S. O., Vasilyev, L., Zinchenko, V., and Liu, Y., 2020, “‘Next Minutes’ Ocean Waves and Vessel Motion Predictions for More Efficient Offshore Lifting Operations,” SNAME Maritime Convention, Virtual, September 2020.
4. Hals, J., Bjarte-Larsson, T., and Falnes, J., 2002, “Optimum Reactive Control and Control by Latching of a Wave-Absorbing Semisubmerged Heaving Sphere,” Proceedings of the 21st International Conference on Offshore Mechanics and Arctic Engineering, Oslo, Norway, June 23–28, Vol. 36142, pp. 415–423.
5. Falnes, J., 2001, “Optimum Control of Oscillation of Wave-Energy Converters,” The Eleventh International Offshore and Polar Engineering Conference, Stavanger, Norway, June 17, Vol. 12, pp. 147–155.
6. Henriques, J. C. C., Gato, L. M. C., Falcão, A. F. D. O., Robles, E., and Faÿ, F.-X., 2016, “Latching Control of a Floating Oscillating-Water-Column Wave Energy Converter,” Renewable Energy, 90, pp. 229–241.
7. Hlophe, T., Taylor, P. H., Kurniawan, A., Orszaghova, J., and Wolgamot, H., 2023, “Phase-Resolved Wave Prediction in Highly Spread Seas Using Optimised Arrays of Buoys,” Appl. Ocean Res., 130, p. 103435.
8. Hlophe, T., Taylor, P. H., Kurniawan, A., Orszaghova, J., and Wolgamot, H., 2023, “Optimised Wave-by-Wave Prediction of Spread Waves: Comparison With Field Data,” International Conference on Offshore Mechanics and Arctic Engineering, Vol. 86878, American Society of Mechanical Engineers, p. V005T06A103.
9. Chen, J., Taylor, P. H., Milne, I. A., Gunawan, D., and Zhao, W., 2023, “Wave-by-Wave Prediction for Spread Seas Using a Machine Learning Model With Physical Understanding,” Ocean Eng., 285, p. 115450.
10. Chen, J., Hlophe, T., Zhao, W., Milne, I., Gunawan, D., Kurniawan, A., Wolgamot, H., Taylor, P., and Orszaghova, J., 2023, “Comparison of Physics-Based and Machine Learning Methods for Phase-Resolved Prediction of Waves Measured in the Field,” Proceedings of the 15th European Wave and Tidal Energy Conference, Bilbao, Spain, Sept. 3–7, pp. 1–9.
11. Datawell B.V., 2023, “Datawell Waverider-4 (DWR4) Manual.”
12. Yue, D., Taylor, P. H., Hlophe, T., and Zhao, W., 2024, “Comparison of Two Types of Wave Buoys: Linear and Second-Order Motion,” Proceedings of the ASME 2024 43rd International Conference on Ocean, Offshore and Arctic Engineering, Singapore, June 9–14.
13. Sofar Ocean, 2023, “Sofar Spotter.”
14. Hlophe, T., Wolgamot, H., Taylor, P. H., Kurniawan, A., Orszaghova, J., and Draper, S., 2022, “Wave-by-Wave Prediction in Weakly Nonlinear and Narrowly Spread Seas Using Fixed-Point Surface-Elevation Time Histories,” Appl. Ocean Res., 122, p. 103112.
15. Law, Y. Z., Santo, H., Lim, K. Y., and Chan, E. S., 2020, “Deterministic Wave Prediction for Unidirectional Sea-States in Real-Time Using Artificial Neural Network,” Ocean Eng., 195, p. 106722.
16. Chen, J., Hlophe, T., Gunawan, D., Taylor, P. H., Milne, I. A., and Zhao, W., 2024, “Phase-Resolved Wave Prediction With Varying Buoy Positions in the Field Using Machine Learning-Based Methods,” Ocean Eng., 307, p. 118107.
17. Chung, J., Gulcehre, C., Cho, K., and Bengio, Y., 2015, “Gated Feedback Recurrent Neural Networks,” Proceedings of the 32nd International Conference on Machine Learning, Lille, France, July 7–9, PMLR, pp. 2067–2075.
18. Hochreiter, S., and Schmidhuber, J., 1997, “Long Short-Term Memory,” Neural Comput., 9(8), pp. 1735–1780.
19. Ma, X., Duan, W., Huang, L., Qin, Y., and Yin, H., 2022, “Phase-Resolved Wave Prediction for Short Crest Wave Fields Using Deep Learning,” Ocean Eng., 262, p. 112170.
20. Yu, Y., Si, X., Hu, C., and Zhang, J., 2019, “A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures,” Neural Comput., 31(7), pp. 1235–1270.
21. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I., 2017, “Attention Is All You Need,” 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA.
22. Liu, W., Wen, Y., Yu, Z., and Yang, M., 2016, “Large-Margin Softmax Loss for Convolutional Neural Networks,” arXiv preprint arXiv:1612.02295.
23. Ba, J. L., Kiros, J. R., and Hinton, G. E., 2016, “Layer Normalization,” arXiv preprint arXiv:1607.06450.
24. Jarrett, K., Kavukcuoglu, K., Ranzato, M., and LeCun, Y., 2009, “What Is the Best Multi-Stage Architecture for Object Recognition?” IEEE 12th International Conference on Computer Vision, Kyoto, Japan, Sept. 29, IEEE, pp. 2146–2153.
25. Kingma, D. P., and Ba, J., 2015, “Adam: A Method for Stochastic Optimization,” The 3rd International Conference on Learning Representations, San Diego, CA, May 7–9.
26. Or, D. B., Kolomenkin, M., and Shabat, G., 2020, “Generalized Quantile Loss for Deep Neural Networks,” arXiv preprint arXiv:2012.14348.
27. Chollet, F., 2015, “Keras: Deep Learning Library for Theano and TensorFlow.”
28. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., and Ghemawat, S., 2016, “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems,” arXiv preprint arXiv:1603.04467.
29. Lygre, A., and Krogstad, H. E., 1986, “Maximum Entropy Estimation of the Directional Distribution in Ocean Wave Spectra,” J. Phys. Oceanogr., 16(12), pp. 2052–2060.
30. Kuik, A. J., Van Vledder, G. P., and Holthuijsen, L. H., 1988, “A Method for the Routine Analysis of Pitch-and-Roll Buoy Wave Data,” J. Phys. Oceanogr., 18(7), pp. 1020–1034.