## Abstract

The computational cost of modern simulation-based optimization tends to be prohibitive in practice. Complex design problems often involve expensive constraints evaluated through finite element analysis or other computationally intensive procedures. To speed up the optimization process and deal with expensive constraints, a new dimension selection-based constrained multi-objective optimization (MOO) algorithm is developed combining least absolute shrinkage and selection operator (LASSO) regression, artificial neural networks, and grey wolf optimizer, named L-ANN-GWO. Instead of considering all variables at each iteration during the optimization, the proposed algorithm only adaptively retains the variables that are highly influential on the objectives. The unselected variables are adjusted to satisfy the constraints through a local search. With numerical benchmark problems and a simulation-based engineering design problem, L-ANN-GWO outperforms state-of-the-art constrained MOO algorithms. The method is then applied to solve a highly complex optimization problem, the design of a high-temperature superconducting magnet. The optimal solution shows significant improvement as compared to the baseline design.

## 1 Introduction

Two fundamental challenges exist for a complex simulation-based optimization problem: (1) long simulation time and (2) expensive constraints. Finite element analysis (FEA) and other numerical simulation methods are often employed to predict the performance of a complex system, where a single simulation can take hours or days to complete. Since optimization, especially multi-objective optimization (MOO), usually needs to call the simulation iteratively to search for optima, the computational cost for optimization often becomes prohibitive. On the other hand, constraints involved in complex engineering problems are usually obtained together with objective values as a result of simulations. Therefore, how to deal with expensive constraints becomes a question in improving the optimization efficiency for complex problems.

Surrogate models are generally employed to reduce the number of expensive simulation calls during the optimization of such problems. The common way to use a surrogate model in optimization is to construct and validate it before optimization, so that the structure of the model remains fixed during optimization [1]. In the cases studied in this article, the accuracy of the surrogate model significantly influences the optimization results. If the approximation accuracy is low, the Pareto frontier obtained from the surrogate model will not be acceptable. Therefore, state-of-the-art surrogate model-based optimization algorithms usually update the surrogate model adaptively during the optimization iterations to improve the accuracy in specific areas of interest. This is the case for the Pareto set pursuing method [2,3]. Employing a kernel function and a prior distribution, Bayesian optimization (BO) [4] uses acquisition functions to guide new sample generation. In multi-objective BO, two kinds of operations are usually applied: one is to convert the multi-objective problem into a single-objective problem [5–8], and the other is to employ Pareto frontier-related acquisition functions, such as expected hypervolume improvement [9–11] and predictive entropy search [12]. In addition, adaptive surrogate model-assisted evolutionary algorithms can be employed as a design space exploration strategy, where surrogate models are updated when new samples are generated [13–15].

To further improve the efficiency of the optimization algorithm, the idea of focusing on only a subset of the dimensions at each iteration has been proposed in single-objective optimization algorithms. In dynamic coordinate search (DYCORS) [16], the dimensions are selected randomly and only the selected dimensions are perturbed during the iteration. A more aggressive dimension selection method, based on the sensitivity of the variables, has been developed in methods such as mode-pursuing sampling using discriminative coordinate perturbation [17] and partial metamodel optimization [18].

With the development of artificial intelligence (AI), more and more AI-based MOO algorithms have been proposed. Machine learning methods, such as extreme learning machines [19] and convolutional neural networks [20], are employed as surrogate models to deal with MOO problems. However, the networks used in both studies are static, and a large number of initial samples are needed to ensure the accuracy of the surrogate. Additionally, compared to other surrogate models, such as radial basis function (RBF), deep neural networks usually need many more samples to train the model. On the other hand, a k-means clustering method is used in a large-scale MOO algorithm to divide the design variables into two groups [21]. Since the clustering requires samples obtained by perturbing each design variable, extra computational costs are incurred before optimization.

How to deal with expensive constraints is another research direction. Constraint handling strategies can be classified into four categories: using a penalty function, separating objectives and constraints, converting constraints into an extra objective, and hybrid methods [22]. Different penalty types are utilized in the penalty function category. The main issue with using a penalty function is the difficulty of determining the penalty factors, and the adaptive penalty factor is the most attractive option in this category [23]. In the category of separating objectives and constraints, one common approach is to consider constraints in the dominance principles, such as the constraint dominance principle [24] and epsilon constraint handling [25]. Other constraint handling approaches have been developed recently. In the push and pull search framework [26], the optimization process is divided into push and pull stages: only the objectives are considered in the push stage, and all objectives and constraints are involved in the pull stage. In Ref. [27], a coevolutionary framework is constructed by optimizing two separate population groups for two problems: one is the original problem with constraints and the other is the original problem without constraints. During optimization, the two population groups are connected to share information and converge to the feasible Pareto solutions. The method of converting constraints into an extra objective is proposed in Ref. [28], where the degree of constraint violation is calculated and treated as an extra objective to be optimized to find feasible solutions located at the constraint boundaries. However, in MOO, the Pareto solutions are not always located on the constraint boundaries. Therefore, hybrid methods have been developed, such as infeasibility-driven evolutionary algorithms (IDEA) [29]. In IDEA, two population groups are generated: one solves the original constrained MOO problem, and the other, smaller group treats the constraint violation as an extra objective to focus on the boundaries of the constraints. However, the number of function evaluations for this kind of method is usually higher than for other kinds of methods.

Meanwhile, surrogate models are employed to handle expensive constraints and reduce the computational cost. One way is to employ surrogate models in constrained MOO algorithms. For example, a surrogate model-based evolution strategy uses the predictions from RBF models of the objectives and constraints to estimate the constraint dominance scores for each offspring within the framework of the nondominated sorting genetic algorithm-II (NSGA-II) [30]. When expensive simulations are directly replaced with surrogate models, it becomes harder to find feasible solutions if the accuracy of the surrogate model is low. Another way is to adapt the surrogate model-based constraint handling strategies of single-objective optimization to multi-objective problems. In Refs. [31,32], a surrogate-based constraint handling process, named constrained optimization by radial basis function approximation [33], is employed to deal with the expensive constraints by optimizing a problem based on surrogate models. In both cases, an optimization problem based on surrogate models is solved at each iteration to produce a new sample for the MOO problem. As a result, the extra optimization process increases the complexity when dealing with high-dimensional problems. In multi-objective BO, one way to deal with constraints is to add constraints to the acquisition function and solve a constrained suboptimization problem to find feasible new samples [34]. Another way is to include constraint satisfaction in the acquisition function [9]. However, multi-objective BO needs to solve a suboptimization problem on the acquisition function, which is inefficient when high-dimensional problems are considered.

In this article, we focus on the strategy of separating the objectives and constraints to avoid involving unnecessary complexity in the original problem. To inherit the high efficiency of a partial model and fully utilize each dimension to deal with constrained optimization problems, an AI technique-based dimension selection strategy and related constraints handling processes are developed in this article. A novel sensitivity-based dimension selection method using the least absolute shrinkage and selection operator (LASSO) [35] regression is proposed to separate the variables into two subsets, one for objectives and one for constraints. The subset of variables related to objectives is updated by a grey wolf optimizer (GWO) only considering the objectives, while the other subset of variables is adjusted to find feasible solutions through a random local search process. To reduce the number of expensive simulations evaluated during optimization, artificial neural networks (ANNs) for objectives and constraints are trained based on the existing samples and updated when new samples are generated along with optimization.

Superconducting magnetic field optimization is an excellent example of a complex multi-objective optimization problem with expensive constraints. Superconducting magnets have highly nonlinear material properties, and FEA simulations are needed to obtain the magnetic field properties. However, few studies have applied optimization algorithms to superconducting magnetic field optimization [23,24]. In this article, the proposed MOO algorithm is employed to solve a superconducting magnetic field design problem to minimize the amount of required superconductor material and to obtain the largest magnetic field under the requirements of the field qualities.

This article is organized as follows. Section 2 details the proposed method. To validate the performance of the proposed algorithm, comparisons with ANN-GWO and well-known state-of-the-art optimization algorithms are performed in Sec. 3. Then, the proposed method is applied to solve solenoid magnetic design problems using Cu tapes and second-generation high-temperature superconductor (2G HTS) tapes in Sec. 4. Section 5 concludes the article by summarizing the properties of the algorithm.

## 2 Proposed Optimization Algorithm

The proposed constrained multi-objective optimization algorithm is introduced in this section. At the beginning of each iteration, the dimensions are divided into two subsets: one for objectives and the other for constraints. The GWO updating mechanism updates the values of the samples, focusing on the subset of dimensions for objectives. Constraint handling operations are then performed on the subset of dimensions for constraints to find feasible solutions. To utilize the expensive simulation efficiently, only the candidates selected based on the values predicted by the ANNs are evaluated by the simulation. In this section, the components of the algorithm are introduced first, and the overall flowchart of the algorithm is summarized at the end.

### 2.1 Dimension Selection Strategy Using LASSO.

LASSO is a variable selection method derived from the ordinary least squares (OLS) regression method [35]. It is a method of choice for cases where only several of the potential features or variables are most influential. LASSO is not employed alone in our approach but in conjunction with an L2 regularization approach such as OLS methods or ANN, for example. As an L1 regularization method, LASSO requires only a small number of samples to achieve a relatively accurate prediction. Additionally, since the goal of using LASSO in this article is variable selection rather than function prediction, using nonlinear selection methods such as random forest or autoencoder methods would bring additional computing costs. Those nonlinear approaches also demand large training data sets and thus are not compatible with the goal of optimization using the least number of samples in this work. Moreover, the process is iterative, and the selection is updated with the iterations. Thus, LASSO is selected to realize the dimension selection strategy. The details are presented as follows.

A linear regression model is expressed as $\hat{Y} = w_0 + \mathbf{X}\mathbf{w}$, where $\mathbf{X}$ represents the position, $\hat{Y}$ is the predicted response from the regression model, and $w_0$ and $\mathbf{w}$ are the coefficients for the linear model. The LASSO regression is to find the values of $w_0$ and $\mathbf{w}$ that solve the minimization problem shown in Eq. (1).

$$\min_{w_0,\mathbf{w}} \; \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - w_0 - \mathbf{x}_i^{T}\mathbf{w}\right)^2 + \lambda\sum_{j=1}^{p}\left|w_j\right| \tag{1}$$

where $n$ is the number of positions, $\lambda$ is a nonnegative parameter, and $p$ is the number of the weights $\mathbf{w}$. For a given $\lambda$, due to the property of the L1 regularization term (i.e., the second term in Eq. (1)), some elements in $w_0$ and $\mathbf{w}$ will be zero. By gradually increasing the value of $\lambda$, more coefficients become zero, which means fewer variables contribute to the linear model. However, the prediction accuracy of the linear model becomes worse as the number of selected variables decreases. Thus, the question is which value of $\lambda$ should be selected to determine the regression model. In this article, to find the optimal set of selected variables, an accuracy criterion is proposed to select the value of $\lambda$. The largest $\lambda$ value ($\lambda_{max}$) that gives a nonnull model is calculated. Then, a geometric sequence of $\lambda$ values is generated in the range $[10^{-4}\lambda_{max}, \lambda_{max}]$. For each $\lambda$ value, the problem in Eq. (1) is solved to obtain a regression model, and the regression error of the model is estimated. The regression errors (i.e., mean square error (MSE)) of the models with two adjacent $\lambda$ values are compared. If the relative difference between the two regression errors is larger than a threshold (e.g., $10^{-2}$), an important variable is missing in the model with the larger $\lambda$ value. Thus, the regression model using the smaller $\lambda$ value is selected, and the variables whose coefficients are not zero are selected as the important variables. Note that the important variables are those that have a large impact on the objectives. According to the selection result, the set of variables can be divided into two subsets: one includes the variables selected by LASSO regression, which have a large impact on the objectives, and the other includes the rest of the variables, which are used in constraint handling. In this article, $X_{var\_obj}$ represents the variables selected in the objective subset, while $X_{var\_con}$ represents the variables in the constraint subset.
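The L1-penalized least squares problem and the resulting split of the variables can be illustrated with a minimal NumPy sketch. It assumes the objective $\frac{1}{2n}\sum_i (y_i - w_0 - \mathbf{x}_i^T\mathbf{w})^2 + \lambda\sum_j |w_j|$ and solves it with plain cyclic coordinate descent, which is not necessarily the solver used in this work. The function names (`lasso_cd`, `split_variables`) and the synthetic data are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=500):
    """Minimize (1/(2n)) * sum((y - w0 - X @ w)**2) + lam * sum(|w|)
    by cyclic coordinate descent (assumed scaling, illustrative solver)."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean           # centering absorbs the intercept w0
    col_sq = (Xc ** 2).sum(axis=0) / n
    w = np.zeros(p)
    r = yc.copy()                             # residual yc - Xc @ w
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += Xc[:, j] * w[j]              # remove variable j from the fit
            rho = Xc[:, j] @ r / n
            # Soft-thresholding: weights with |rho| <= lam become exactly zero.
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= Xc[:, j] * w[j]              # put the updated variable back
    w0 = y_mean - x_mean @ w
    return w0, w

def split_variables(w):
    """Nonzero weights form the objective subset, zero weights the constraint subset."""
    var_obj = np.flatnonzero(w != 0.0)
    var_con = np.flatnonzero(w == 0.0)
    return var_obj, var_con

# Synthetic data (hypothetical): the response depends on variables 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.5
w0, w = lasso_cd(X, y, lam=0.05)
var_obj, var_con = split_variables(w)
```

In this toy case the two inactive variables receive zero weight and would be handed to the constraint handling step, while the shrunken but nonzero weights identify the objective subset.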

An example of how the dimension selection strategy works is presented as follows. In this case, the first objective of the CF6 problem (equations in the Appendix) is approximated using LASSO regression. First, $\lambda_{max}$ is determined as 0.1519, since all the weights ($w_0$ and $\mathbf{w}$) in Eq. (1) shrink to zero when $\lambda$ is larger than this value. Then, a geometric sequence of 100 $\lambda$ values is generated in the range [1.5e-5, 0.1519]. For each $\lambda$ value, Eq. (1) is solved to obtain the regression model. The prediction errors, expressed as MSE, are plotted in Fig. 1(a), while the relative differences of the MSE values for two adjacent $\lambda$ values are plotted in Fig. 1(b). As the threshold is set to 0.01, the 75th $\lambda$ is selected, where $\lambda = 0.0148$. For this $\lambda$, the weight for the ninth variable ($X_9$) of the CF6 problem is zero. Therefore, $X_{var\_obj}$ is $[X_1, X_2, \ldots, X_8, X_{10}]$ and $X_{var\_con}$ is $X_9$.
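The numbers in this example can be checked with a short sketch of the $\lambda$ grid and the accuracy criterion. The grid follows the text directly. The scan direction (from the smallest $\lambda$ upward) and the normalization of the relative difference by the MSE of the smaller-$\lambda$ model are assumptions, and the function names are hypothetical.

```python
import numpy as np

def lambda_grid(lam_max, n_values=100, ratio=1e-4):
    """Geometric sequence of lambda values in [ratio * lam_max, lam_max]."""
    return np.geomspace(ratio * lam_max, lam_max, n_values)

def select_lambda_index(mse, threshold=1e-2):
    """Return the index of the smaller lambda of the first adjacent pair whose
    relative MSE difference exceeds the threshold (ascending scan, assumed)."""
    mse = np.asarray(mse, dtype=float)
    for k in range(len(mse) - 1):
        rel_diff = abs(mse[k + 1] - mse[k]) / max(mse[k], 1e-300)
        if rel_diff > threshold:
            return k
    return len(mse) - 1   # no jump found: keep the largest lambda

grid = lambda_grid(0.1519)
lam_75 = grid[74]         # 75th value of the sequence (0-based index 74)
```

With $\lambda_{max} = 0.1519$, the 75th value of this grid is approximately 0.0148, which agrees with the value quoted for the CF6 example.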

…($\mathbf{X}$) and their objective values ($\mathbf{Y}$) to create nonlinearity. Then, the regression problem in Eq. (1) is modified to Eq. (2).

An adaptive strategy is used in the proposed algorithm in that the LASSO regression model is updated at each iteration. In the early iterations, few samples have been generated, so some variables are not selected for the objective subset, which means more variables can be used in constraint handling (details in Sec. 3.1) to find feasible solutions. Additionally, the uncertainty caused by the lack of samples may cause the selection to vary from one iteration to the next, which increases the explorative ability of the algorithm. On the other hand, by updating the LASSO regression with new samples, errors in the selection can be corrected. As a result, the optimization can focus on the truly important variables in the later iterations for exploitation. More details are presented in Sec. 3.1.

### 2.2 Artificial Neural Network Approximation.

Multilayer neural networks [36] are employed to approximate the objectives and constraints to reduce the number of expensive simulations in the optimization. Compared to conventional surrogates, such as RBF and Kriging, ANN is a regression model that does not necessarily pass through every sample point. The partial model with the selected variables can be regarded as a sparse dataset with zero values for the nonselected variables. In Ref. [37], it is reported that ANN performs better than the Kriging model in sparse data regression problems. In this article, the ANNs are used to judge which newly generated samples are not dominated by others at the current iteration. New ANNs are constructed at each iteration and used only at that iteration. For this purpose, a small multilayer ANN is sufficient. Thus, an ANN with two hidden layers and four hidden neurons per layer is constructed for each objective and each constraint. The nonlinear tan-sigmoid function is used as the activation function in all ANNs. For the objective functions, only the variables selected by the LASSO regression are used as the input nodes. At each iteration, the number of inputs for the objective ANNs is updated and the ANNs are retrained with all the existing samples. On the other hand, all the variables are used to construct the ANNs for the constraints, and these ANNs are also retrained with all the samples.
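The network size described above can be made concrete with a small forward-pass sketch. It assumes a linear output neuron and random initial weights, omits training entirely, and uses `np.tanh` because the tan-sigmoid function is mathematically identical to the hyperbolic tangent. The function names are hypothetical.

```python
import numpy as np

def init_ann(n_inputs, n_hidden=4, seed=0):
    """Two hidden layers with n_hidden neurons each and one output (untrained)."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, n_hidden, n_hidden, 1]
    return [(rng.normal(scale=0.5, size=(m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]

def ann_predict(params, X):
    """Forward pass: tanh hidden layers, linear output layer (assumption)."""
    h = np.asarray(X, dtype=float)
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return (h @ W + b).ravel()

def n_parameters(params):
    """Total number of weights and biases in the network."""
    return sum(W.size + b.size for W, b in params)

# Objective ANN fed with nine selected variables (hypothetical case).
params = init_ann(n_inputs=9)
n_par = n_parameters(params)          # 4 * 9 + 29 weights and biases
y_hat = ann_predict(params, np.zeros((5, 9)))
```

For $d$ selected inputs such a network has $4d + 29$ trainable parameters, so dropping variables from the objective subset directly reduces the number of weights that must be fitted from the few available samples.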

### 2.3 GWO Updating Mechanism.

$r_1$ and $r_2$ are random numbers generated in the range [0, 1]. To speed up the updating process, only the dimensions that have a large influence on the objective values participate in the updating mechanism. The nonselected dimensions remain at their original values and are updated in the constraint handling process to find a feasible solution. The parameter $a$ is used to balance exploitation and exploration in the GWO updating process and is controlled by the current number of function evaluations and the maximal number of function evaluations (maxNFE).

$x_\alpha$, $x_\beta$, and $x_\delta$ represent the three leaders of the wolves (i.e., the three best-performing samples) that guide the position updating process. They are selected following the criteria in the original multi-objective GWO algorithm [39]. The entire design space is divided into multiple subspaces, and a subspace with a smaller number of nondominated solutions has a larger chance of being selected. In this way, the proposed algorithm employs the GWO updating mechanism to ensure the spread of the Pareto set.
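A position update restricted to the selected dimensions can be sketched as follows. The sketch uses the standard update rule of the original GWO ($A = 2a r_1 - a$, $C = 2 r_2$, followed by averaging the three leader-guided moves), which is assumed here to correspond to the update used in this section. The linear decrease of $a$ from 2 to 0 with the number of function evaluations is also an assumption, and the function names are hypothetical.

```python
import numpy as np

def control_parameter(nfe, max_nfe):
    """Assumed schedule: a decreases linearly from 2 to 0 as NFE approaches maxNFE."""
    return 2.0 * (1.0 - nfe / max_nfe)

def gwo_update(x, leaders, a, selected, rng):
    """Move one wolf toward the leaders on the selected dimensions only."""
    x = np.asarray(x, dtype=float)
    x_new = x.copy()                        # nonselected dimensions keep their values
    moves = []
    for leader in leaders:                  # alpha, beta, delta
        r1 = rng.random(len(selected))
        r2 = rng.random(len(selected))
        A = 2.0 * a * r1 - a
        C = 2.0 * r2
        D = np.abs(C * leader[selected] - x[selected])
        moves.append(leader[selected] - A * D)
    x_new[selected] = np.mean(moves, axis=0)
    return x_new

rng = np.random.default_rng(1)
x = np.array([0.1, 0.2, 0.3, 0.4])
leaders = [np.full(4, 1.0), np.full(4, 2.0), np.full(4, 3.0)]
selected = np.array([0, 2])                 # indices of the objective subset
x_next = gwo_update(x, leaders, control_parameter(100, 500), selected, rng)
```

Large values of $a$ early in the run allow $|A| > 1$ and therefore moves away from the leaders (exploration), whereas $a$ close to zero pulls the selected coordinates toward the mean of the three leaders (exploitation).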

### 2.4 Constraint Handling Operations.

The constraint handling operations are performed at the initial sampling step, the potential nondominated solution determination step, and the current Pareto set determination step. Note that the term nondominated solution refers to a sample that is not dominated by any other sample newly generated at the same iteration. In contrast, a sample that is not dominated by any other existing sample is called a Pareto solution. At initial sampling, the goal is to generate an initial Pareto set. The second handling operation is the key part, which adjusts the values of the subset of variables for constraints to satisfy the constraints. This operation is performed after updating the positions of the samples using GWO. The last operation is a checking process to ensure that all the solutions in the Pareto set are feasible. The details are introduced as follows.

*Constraint operation 1*: *Initial Pareto set determination considering the feasibility*

Initial samples are generated randomly in the design space, and their feasibility is checked to determine the initial Pareto set. The GWO updating mechanism requires at least one solution in the current Pareto set. Thus, when no feasible solution is found from the initial sampling, a constraint operation is performed to find a potential Pareto solution for GWO. If a solution is the optimum for one objective, it could be a Pareto solution. Following this logic, an ANN is constructed for one objective, and a single-objective local search is performed on the ANN to obtain the potential Pareto solution. To clarify, the ANN constructed in *constraint operation 1* is only used in the initial sampling step. ANNs are constructed for each objective in the subsequent optimization steps.

*Step 1.1: Start point determination*

- Calculate the maximal constraint violation value for each sample.
- Select the sample with the smallest violation value as the starting point, $x_0$.

*Step 1.2: ANN construction*

- Construct ANN_obj($\mathbf{X}$) based on a randomly selected objective.
- Construct ANN_con($\mathbf{X}$) for all constraints.

*Step 1.3: Local search*

- Use sequential quadratic programming (SQP) to perform a local search on ANN_obj and ANN_con. Obtain the solution as $x_{fe}$.
- Evaluate $x_{fe}$ using simulations to obtain the objective responses ($f_{fe}$) and constraint responses ($g_{fe}$).
- Create the Pareto set, $[X_P, F_P, G_P] \leftarrow [x_{fe}, f_{fe}, g_{fe}]$.
- Store the new sample $[x_{fe}, f_{fe}, g_{fe}]$ in the sample base.
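The start point determination of Step 1.1 can be sketched in a few lines. The sketch assumes that the constraints are written in the form $g_i(x) \le 0$, so that positive values measure violation. The function name and the sample values are hypothetical.

```python
import numpy as np

def pick_start_point(X, G):
    """X: (n_samples, n_vars) positions; G: (n_samples, n_constraints) constraint
    values, where g <= 0 means satisfied (assumed convention).
    Returns the sample with the smallest maximal violation and all violations."""
    violation = np.maximum(G, 0.0).max(axis=1)   # maximal violation per sample
    idx = int(np.argmin(violation))
    return X[idx], violation

X_init = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
G_init = np.array([[0.5, -1.0], [0.1, 0.2], [-0.3, 0.05]])
x0, violation = pick_start_point(X_init, G_init)
```

The subsequent local search of Step 1.3 could, for example, be run from `x0` with SciPy's `minimize(..., method="SLSQP")`, using the ANN predictions as the objective and constraint functions.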

*Constraint operation 2: Local search considering feasibility*

After updating the dimensions related to objectives, the new positions of the samples can be presented as $[X_{var\_obj}^{new}, X_{var\_con}]$. For each sample $x^{new} = [x_{var\_obj}^{new}, x_{var\_con}]$, the following constraint operation is performed.

*Step 2.1: Feasibility checking*

- Estimate the constraint values for the sample using ANNs, $\hat{g} = \mathrm{ANN\_con}(x^{new})$.
- Check the feasibility of the sample. If the sample is feasible, return $x^{new,c} = [x_{var\_obj}^{new}, x_{var\_con}]$; else go to *Step 2.2*.

*Step 2.2: Local search*

- Randomly change the values of the dimensions for constraints, $x_{var\_con}^{new}$.
- Check the feasibility of the new sample, $x^{new,c} = [x_{var\_obj}^{new}, x_{var\_con}^{new}]$, using the values predicted by the ANNs.
- If the sample is feasible, return the new sample $x^{new,c}$; otherwise, repeat this step until a feasible solution is found or the maximum repetition number is reached. In this case, the maximum repetition number is set to 20.

Note that no feasible solution may be found at this step. In this case, one sample will be randomly selected from the set of samples to be evaluated by the simulation.
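Steps 2.1 and 2.2 amount to a bounded random search over the constraint subset, which can be sketched as follows. The sketch assumes constraints of the form $g(x) \le 0$, resamples the constraint dimensions uniformly within their bounds (the text only states that they are changed randomly), and takes any callable surrogate in place of ANN_con. All names are hypothetical.

```python
import numpy as np

def constraint_operation_2(x_new, con_idx, lower, upper, predict_g, rng, max_tries=20):
    """Return (sample, found). predict_g(x) gives predicted constraint values;
    the sample is treated as feasible when all of them are <= 0 (assumption)."""
    x = np.array(x_new, dtype=float)
    if np.all(predict_g(x) <= 0.0):               # Step 2.1: already feasible
        return x, True
    for _ in range(max_tries):                    # Step 2.2: random local search
        trial = x.copy()
        # Only the constraint subset is resampled; objective dimensions stay fixed.
        trial[con_idx] = rng.uniform(lower[con_idx], upper[con_idx])
        if np.all(predict_g(trial) <= 0.0):
            return trial, True
    return x, False                               # no feasible sample predicted

# Toy surrogate (hypothetical): feasible when the second variable is <= 0.5.
surrogate_g = lambda x: np.array([x[1] - 0.5])
rng = np.random.default_rng(0)
x_fix, found = constraint_operation_2(
    [0.2, 0.9], np.array([1]), np.zeros(2), np.ones(2), surrogate_g, rng)
```

Because feasibility is judged on the surrogate only, the search costs no simulation calls. The `found` flag lets the caller fall back to a randomly selected sample when the 20 attempts fail.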

*Constraint operation 3.* This operation is performed at the end of each iteration when determining the current Pareto set. As infeasible solutions may remain from the initial samples, the feasibility of the solutions in the current Pareto set is checked. When the current set contains more than one solution and an infeasible solution exists, the infeasible solution is removed from the current set; otherwise, the infeasible solution is the only solution in the current set, and it is kept until new solutions are found.
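A minimal sketch of this check is given below, again assuming constraints of the form $g(x) \le 0$. The case in which several solutions are all infeasible is not described in the text; the sketch leaves the set unchanged in that case, which is an assumption. The function name is hypothetical.

```python
import numpy as np

def constraint_operation_3(X_P, F_P, G_P):
    """Drop infeasible members of the current Pareto set unless that would
    leave the set empty (g <= 0 means feasible, assumed convention)."""
    feasible = np.all(G_P <= 0.0, axis=1)
    if len(X_P) > 1 and feasible.any() and not feasible.all():
        return X_P[feasible], F_P[feasible], G_P[feasible]
    return X_P, F_P, G_P          # single solution or nothing to remove

X_P = np.array([[0.0], [1.0], [2.0]])
F_P = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
G_P = np.array([[-1.0], [0.5], [-0.2]])
X_kept, F_kept, G_kept = constraint_operation_3(X_P, F_P, G_P)
```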

### 2.5 Overall Algorithm.

Figure 2 shows the flowchart of the L-ANN-GWO algorithm. The details of each step are listed as follows.

*Step 1: Initial random sampling*

- Generate $N$ uniform random points $X_{wolf}$, where $N$ is the number of wolves.
- Evaluate the samples by the simulation, $[F_{wolf}, G_{wolf}] \leftarrow \mathrm{Simulation}(X_{wolf})$.
- Store the initial samples in the sample base, $[\mathbf{X}, \mathbf{F}, \mathbf{G}] \leftarrow [X_{wolf}, F_{wolf}, G_{wolf}]$.

*Step 2: Pareto set determination*

- If a feasible solution exists, determine the current Pareto set, $[X_P, F_P, G_P]$.
- Else, perform *constraint operation 1* to produce the Pareto set.

*Step 3: Dimension selection*

- Use LASSO to divide the variables into two subsets: one for objectives, $X_{var\_obj}$, and the other for constraints, $X_{var\_con}$.

*Step 4: ANN approximation*

- Train ANN_obj with respect to $X_{var\_obj}$.
- Train ANN_con with respect to $\mathbf{X}$.

*Step 5: Position updating*

- Update the position of the selected variables via Eq. (3), $X_{wolf,var\_obj}^{new} \leftarrow \mathrm{GWO}(X_{wolf,var\_obj}, X_P)$.

*Step 6: Constraint operation 2*

- Perform *constraint operation 2* to generate the feasible sample set, $X_{wolf}^{new,c} = [X_{wolf,var\_obj}^{new,c}, X_{wolf,var\_con}^{new,c}]$.
- Output the feasible solution set $X_{fe}^{new,c}$.

*Step 7: Nondominated solution determination*

- If $X_{fe}^{new,c} \neq \varnothing$, find the nondominated solutions in $X_{fe}^{new,c}$ according to the objective values predicted by the ANNs, $X_{\text{non-dom}} \leftarrow \mathrm{findNonDom}(X_{fe}^{new,c}, \mathrm{ANN\_obj}(X_{fe}^{new,c}))$.
- Otherwise, a sample is randomly selected from $X_{wolf}^{new,c}$, $X_{\text{non-dom}} \leftarrow \mathrm{randSelect}(X_{wolf}^{new,c})$.
- Evaluate the nondominated solution(s) by simulation, $[X_{\text{non-dom}}, F_{\text{non-dom}}, G_{\text{non-dom}}] \leftarrow \mathrm{Simulation}(X_{\text{non-dom}})$.
- Store the samples in the sample base, $[\mathbf{X}, \mathbf{F}, \mathbf{G}] \leftarrow [[\mathbf{X}, \mathbf{F}, \mathbf{G}]; [X_{\text{non-dom}}, F_{\text{non-dom}}, G_{\text{non-dom}}]]$.

*Step 8: Pareto set updating*

- Check the feasibility of $X_{\text{non-dom}}$ using the real constraint values ($G_{\text{non-dom}}$), $X_{\text{non-dom},fe} \leftarrow \mathrm{checkFeasible}(X_{\text{non-dom}}, G_{\text{non-dom}})$.
- If $X_{\text{non-dom},fe} \neq \varnothing$, update the Pareto set, $[X_P, F_P, G_P] \leftarrow \mathrm{updatePareto}([X_P, F_P, G_P], [X_{\text{non-dom},fe}, F_{\text{non-dom},fe}, G_{\text{non-dom},fe}])$.
- Otherwise, the Pareto set remains unchanged.

*Step 9: Constraint operation 3*

- Perform *constraint operation 3* to check the current Pareto set.

*Step 10: Stopping criterion checking*

- If the number of simulation evaluations is larger than the preset maximum number of function evaluations, terminate the optimization; otherwise, go back to *Step 3*.
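The nondominated filtering used in Steps 7 and 8 can be sketched as follows. The sketch assumes that all objectives are minimized and uses a simple pairwise comparison. The function name `find_non_dominated` is hypothetical and stands in for the findNonDom operation above.

```python
import numpy as np

def find_non_dominated(F):
    """Boolean mask of the rows of F (n_points, n_objectives) that are not
    dominated by any other row, assuming minimization of every objective."""
    F = np.asarray(F, dtype=float)
    mask = np.ones(F.shape[0], dtype=bool)
    for i in range(F.shape[0]):
        # Row j dominates row i if it is no worse everywhere and better somewhere.
        dominates_i = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        if dominates_i.any():
            mask[i] = False
    return mask

F_pred = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
mask = find_non_dominated(F_pred)
```

In Step 7 the filter would be applied to the objective values predicted by the ANNs, so that only the surviving candidates are passed to the expensive simulation.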

## 3 Numerical Benchmark Problems Tests and Results

To test and verify the performance of the proposed algorithm, seven constrained multi-objective optimization problems [3] are employed in this section. The function of each benchmark is shown in the Appendix. Note that the objectives and constraints of the benchmarks are assumed to be black-box and expensive simulation models. In this case, the maximum number of function evaluations is set to a small number (i.e., 500) for all the algorithms to test the effectiveness of the proposed algorithm when the number of simulations is limited.

### 3.1 Comparison with ANN-GWO.

The effectiveness of the dimension selection strategies used in the proposed algorithm is tested first by comparing it to the ANN-GWO, which does not contain the dimension selection stage. Different from L-ANN-GWO, in ANN-GWO, all variables are used to construct the ANNs for objectives and participate in the GWO updating process. Accordingly, *constraint operation 2* is not performed in the ANN-GWO. Instead, the candidate solutions are checked by the predicted values from the constraint ANN models, and only the feasible nondominated solutions are evaluated by the actual objective and constraint functions. In the tests, the maximum number of function evaluations for both methods is set to 500. The number of wolves is set to be 20.

The hypervolume [40] is used as the performance criterion to judge each algorithm. The hypervolume measures the volume in the objective space that is enclosed by the Pareto frontier and a reference point. To make the reference point consistent in the comparison of different algorithms, the point is defined as the maximum of all objective values in all three sets of Pareto solutions. For a given Pareto solution in a set of solutions, the hypercube that has the solution and the reference point as its diagonal corners is defined. The volume of the union of the hypercubes defined by all the Pareto solutions is the hypervolume of the Pareto set. As shown in Fig. 3, the area within the dotted line indicates the hypervolume of the Pareto set. A large hypervolume value means that the obtained solutions are close to the actual Pareto frontier, which indicates a better performance of the optimization algorithm. In this article, each benchmark problem is solved 30 times by each algorithm, and the mean and standard deviation values are shown in Table 1.
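For two objectives, the hypervolume can be computed with a simple sweep, as in the following sketch. It assumes minimization of both objectives and a reference point that is no better than any solution in either objective. The function name and the example points are hypothetical.

```python
import numpy as np

def hypervolume_2d(F, ref):
    """Area jointly covered by the rectangles spanned by each point of F and
    the reference point ref (two objectives, minimization assumed)."""
    F = np.asarray(F, dtype=float)
    ref = np.asarray(ref, dtype=float)
    F = F[np.all(F <= ref, axis=1)]               # ignore points beyond the reference
    F = F[np.lexsort((F[:, 1], F[:, 0]))]         # sort by f1, then by f2
    hv, best_f2 = 0.0, ref[1]
    for f1, f2 in F:
        if f2 < best_f2:                          # dominated points add nothing
            hv += (ref[0] - f1) * (best_f2 - f2)  # new horizontal strip only
            best_f2 = f2
    return hv

front = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
hv = hypervolume_2d(front, ref=[4.0, 4.0])
```

For this front the three rectangles have areas 3, 4, and 3, but their overlapping regions are counted only once, so the covered area is 6 rather than 10.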

Problem | L-ANN-GWO: Hypervolume | L-ANN-GWO: Rate of success | ANN-GWO: Hypervolume | ANN-GWO: Rate of success |
---|---|---|---|---|
CF6 | **21.71 (1.24)** | 100% | 21.00 (1.08) | 100% |
CF7 | **254.77 (9.75)** | 100% | 245.15 (19.13) | 100% |
P113mod | **8.60E+05 (4.81E+04)** | 100% | 6.89E+05 (5.28E+04) | 33.3% |
TP3mod | **89.40 (27.09)** | 100% | 74.08 (32.72) | 66.7% |
P106mod^{a} | 6.6E+03 (0.54E+03) | – | – | – |
P116mod | **8.45E+03 (1.16E+03)** | 100% | 6.26E+03 (2.46E+03) | 100% |
Beam | **0.026 (0.001)** | 100% | 0.025 (0.001) | 60% |

Note: Data are presented as mean (standard deviation). The best performances are presented in boldface.

^{a}P106mod is a special case, which is explained in Sec. 4.

As shown in Table 1, by adding the dimension selection strategy, L-ANN-GWO outperforms ANN-GWO in all the test problems with higher hypervolume values. The column named rate of success shows the percentage of the runs that find feasible Pareto solutions out of the total 30 runs. For example, for P113mod, the number of runs finding feasible Pareto solutions using ANN-GWO is 10 out of 30; therefore, the rate of success is 33.3%. Tables 2 and 3 show examples of the variables selected by L-ANN-GWO over the iterations when solving the CF6 and P116mod problems. Note that the algorithm terminates according to the maximum number of function evaluations. As shown in Step 7 in Sec. 2.5, every feasible nondominated solution generated at one iteration is evaluated by the simulation, which means the number of function evaluations at each iteration may vary for different test problems. Thus, the total number of iterations (i.e., the iteration number in the last row of Tables 2 and 3) is different for different problems. For the functions in the CF6 problem, every dimension has similar importance to both objectives. As shown in Table 2, a few dimensions are not selected at the early stage of the optimization due to the lack of samples. By updating the LASSO selection with newly generated samples, all the dimensions are selected from the 15th iteration onward, which reflects well the mathematical definition of the problem. At the early stage, the update mechanism focusing on some of the dimensions makes the algorithm more aggressive in generating nondominated solutions. As a result, when all the variables participate in the optimization, the larger number of nondominated solutions increases the spread of the Pareto frontier according to the leader selection strategy in the GWO updating mechanism.

No. of iteration | x_{1} | x_{2} | x_{3} | x_{4} | x_{5} | x_{6} | x_{7} | x_{8} | x_{9} | x_{10} |
---|---|---|---|---|---|---|---|---|---|---|
1 | ● | ● | ● | ● | ● | ● | ● | ● | ● | |
2 | ● | ● | ● | ● | ● | ● | ● | ● | ● | |
3 | ● | ● | ● | ● | ● | ● | ● | ● | ● | |
4 | ● | ● | ● | ● | ● | ● | ● | ● | ● | |
5 | ● | ● | ● | ● | ● | ● | ● | ● | ● | |
10 | ● | ● | ● | ● | ● | ● | ● | ● | ● | |
15 | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● |
20 | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● |
30 | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● |
50 | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● |
99 | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● |

No. of iteration | x_{1} | x_{2} | x_{3} | x_{4} | x_{5} | x_{6} | x_{7} | x_{8} | x_{9} | x_{10} | x_{11} | x_{12} | x_{13} |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

1 | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | |||

2 | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | |||

3 | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ||

4 | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | |

5 | ● | ● | ● | ● | ● | ● | ● | ||||||

10 | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | |

15 | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | |

20 | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | |||

30 | ● | ● | ● | ● | ● | ● | ● | ● | ● | ||||

35 | ● | ● | ● | ● | ● | ● | ● | ||||||

47 | ● | ● | ● | ● | ● | ● | ● |

Unlike the CF6 problem, the objective functions of P116mod are influenced by only six variables, [*x*_{1}, *x*_{2}, *x*_{3}, *x*_{11}, *x*_{12}, *x*_{13}]. As shown in Table 3, the variables *x*_{4}, *x*_{5}, …, *x*_{10} are mistakenly selected in the early iterations. The error is then corrected by updating the LASSO regression with new samples. In the last few iterations, seven variables, [*x*_{1}, *x*_{2}, *x*_{3}, *x*_{10}, *x*_{11}, *x*_{12}, *x*_{13}], are selected to participate in the GWO updating process. In this case, the dimension selection strategy identifies the variables that affect the objectives. The GWO update can focus on those variables, while the other variables are used to ensure the feasibility of the solutions without changing the objective values. Notably, *x*_{10} is still mistakenly selected over the last 12 iterations. A possible cause is the nonlinearity that the log10 transformation introduces for each variable.
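The idea of selecting influential dimensions with LASSO can be illustrated with a minimal, self-contained sketch. It assumes a linear model with a fixed penalty `lam` solved by cyclic coordinate descent on synthetic data; the penalty value, the function names, and the data are hypothetical, and the accuracy-threshold rule used in this article to tune the regression is not reproduced.

```python
import random


def soft_threshold(rho, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw for w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def select_dimensions(X, y, lam, tol=1e-8):
    """Indices whose LASSO coefficients are nonzero."""
    w = lasso_coordinate_descent(X, y, lam)
    return [j for j, wj in enumerate(w) if abs(wj) > tol]


# Synthetic samples: the response depends only on dimensions 0 and 2.
data_rng = random.Random(0)
X_toy = [[data_rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(60)]
y_toy = [3.0 * row[0] - 2.0 * row[2] for row in X_toy]
selected_toy = select_dimensions(X_toy, y_toy, lam=0.1)
print(selected_toy)
```

In this formulation, a smaller penalty generally keeps more variables, whereas a larger penalty removes weakly influential ones.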

Compared to ANN-GWO, by performing the dimension selection strategy and the related *constraint operation 2*, L-ANN-GWO succeeds in finding feasible solutions in all the tests. In the constraint handling process proposed in this article, the key is to find the first feasible solution to serve as the leader. Since the GWO updating mechanism used in this article has no constraint handling operations, the feasibility of the new samples generated by GWO is unknown. In ANN-GWO, the absence of *constraint operation 2* leaves the discovery of feasible solutions to chance. In L-ANN-GWO, in contrast, the lack of samples at the early stage commonly leaves some dimensions unselected by the LASSO selection method. A local search on these nonselected variables then increases the chance of finding feasible solutions at an early stage of the optimization. Therefore, the proposed L-ANN-GWO can successfully find feasible solutions in those problems where ANN-GWO fails (i.e., P113mod, TP3mod, and Beam).
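The role of the nonselected variables can be illustrated with a minimal sketch of a feasibility repair. It is not the local search of *constraint operation 2*, whose details are given in Sec. 2; it is a simple stochastic hill climb, assumed here for illustration, that perturbs only the nonselected coordinates and accepts a move when the total constraint violation decreases. All names and the toy constraint are hypothetical.

```python
import random


def total_violation(x, constraints):
    """Sum of violations for constraints written as g(x) <= 0."""
    return sum(max(0.0, g(x)) for g in constraints)


def repair_with_nonselected(x, selected, constraints, bounds,
                            step=0.1, max_iters=500, seed=0):
    """Reduce constraint violation by moving only the nonselected coordinates."""
    rng = random.Random(seed)
    free = [j for j in range(len(x)) if j not in selected]
    best = list(x)
    best_v = total_violation(best, constraints)
    if not free:
        return best, best_v == 0.0
    for _ in range(max_iters):
        if best_v == 0.0:
            break
        cand = list(best)
        j = rng.choice(free)
        lo, hi = bounds[j]
        cand[j] = min(hi, max(lo, cand[j] + rng.gauss(0.0, step * (hi - lo))))
        v = total_violation(cand, constraints)
        if v < best_v:  # keep the move only if it is less infeasible
            best, best_v = cand, v
    return best, best_v == 0.0


# Toy example: the constraint x[2] + x[3] <= 1 involves only nonselected variables.
toy_constraints = [lambda x: x[2] + x[3] - 1.0]
toy_start = [0.5, 0.5, 0.9, 0.9]
repaired, is_feasible = repair_with_nonselected(
    toy_start, selected=[0, 1], constraints=toy_constraints, bounds=[(0.0, 1.0)] * 4
)
print(repaired, is_feasible)
```

Since the selected coordinates are never changed, the repaired point keeps the objective-related variables proposed by the update step.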

For the P106mod problem, the objective functions are influenced by [*x*_{1}, *x*_{2}, *x*_{3}, *x*_{4}, *x*_{6}], and the constraints depend on [*x*_{4}, *x*_{5}, *x*_{7}, *x*_{8}]. A Pareto solution [*f*_{1} = 2100, *f*_{2} = −0.95] with [*x*_{1} = 100, *x*_{2} = 1000, *x*_{3} = 1000, *x*_{4} = 10, *x*_{6} = 10] can be found that dominates almost any other solution. The values of *x*_{5}, *x*_{7}, and *x*_{8} vary to satisfy the constraints. In other words, for P106mod, there exist multiple feasible Pareto solutions that have the same objective values. It is observed that ANN-GWO always finds solutions with the same values for *x*_{1}, *x*_{2}, *x*_{3}, *x*_{4}, and *x*_{6}, but different values for *x*_{5}, *x*_{7}, and *x*_{8}. In this case, the rate of success for ANN-GWO is 100%, but only one point appears on the Pareto frontier in the objective space. That is why the hypervolume value and rate of success are not reported in Table 1. L-ANN-GWO encounters the same situation in 10 out of 30 runs, but it also finds other solutions such as [*f*_{1} = 2104, *f*_{2} = −0.95] and [*f*_{1} = 2100, *f*_{2} = −0.93] in other runs. One reason is that when one dimension always has the same value across the samples, the LASSO dimension selection process may exclude this dimension from the GWO updating mechanism. In the dimension selection example shown in Table 4, *x*_{6} is not selected in the 23rd iteration. The local search in *constraint operation 2* then has a chance to modify the value of *x*_{6} according to the feasibility and thereby reach other Pareto solutions. As a result, a Pareto solution different from [*f*_{1} = 2100, *f*_{2} = −0.95] may be generated during the constraint handling step.

### 3.2 Comparison With Other Algorithms.

Two multi-objective optimization algorithms based on evolutionary algorithms (NGPM [41] and MOFEPSO [42]) and one multi-objective BO method (MOEGO [34]) are employed as comparative benchmarks for the proposed algorithm. NGPM (NSGA-II Program in Matlab v1.4) adds a rudimentary expensive constraint handling process to the NSGA-II algorithm. MOFEPSO is a multi-objective particle swarm optimization-based algorithm that employs constraint handling strategies for feasible and infeasible particles. In MOEGO, a constrained achievement scalarization function optimization is solved to generate new samples at each iteration under the Bayesian optimization framework. NGPM, MOFEPSO, and MOEGO are all available in matlab [41–43]. The maximum number of function evaluations for each algorithm is set to 500. The population size is set to 20 for L-ANN-GWO and NGPM and to 10 for MOFEPSO, following the setting in Ref. [3]. Each problem is repeated 30 times, and the mean and standard deviation of the hypervolume are shown in Table 5. Note that since the reference points are different, the hypervolume values of L-ANN-GWO differ between Tables 1 and 5.

Problem | L-ANN-GWO | NGPM | MOFEPSO | MOEGO |
---|---|---|---|---|
CF6 | 76.18 (2.64) | 78.44 (2.08) | 79.86 (1.32) | 66.97 (4.85) |
CF7 | 549.48 (14.92) | 534.55 (28.44) | 518.26 (20.06) | 439.13 (25.98) |
P113mod | 8.75E+05 (4.95E+04) | 8.11E+05 (3.70E+04) | 8.44E+05 (1.10E+04) | 7.21E+05 (6.36E+04) |
TP3mod | 92.29 (23.29) | 84.53 (32.65) | – | 48.75 (7.33) |
P106mod | 9.44E+04 (1.86E+04) | 8.57E+04 (2.85E+04) | 9.04E+04 (2.91E+04) | 5.44E+04 (0.69E+04) |
P116mod | 7.46E+03 (1.05E+03) | 4.19E+03 (1.06E+03) | 6.30E+03 (1.98E+03) | 0.44E+03 (0.13E+03) |
Beam | 0.067 (0.0050) | 0.042 (0.0376) | 0.037 (0.0052) | – |

Note: Data are presented as mean (standard deviation).

As shown in Table 5, L-ANN-GWO has the largest mean hypervolume in six out of seven tests, while MOFEPSO performs best for CF6. For the TP3mod problem, however, MOFEPSO cannot find any feasible solution within the given computational budget. MOEGO fails to find feasible solutions for the Beam problem, which is high-dimensional (30 dimensions) with many constraints (21 constraints). Compared to MOEGO, the other surrogate model-based optimization algorithm, the proposed algorithm performs better in all the tests at the same computational budget.

To illustrate the comparison clearly, one single run that represents the average performance of each algorithm is selected and plotted in Fig. 4. For the CF6 problem, most of the solutions from L-ANN-GWO are gathered at lower *f*_{1} values, which reduces the performance of L-ANN-GWO. For CF7, L-ANN-GWO generates more solutions with lower *f*_{1} values, which leads to higher performance compared to NGPM. For the P113mod problem, although the performance of each algorithm is similar in the middle part, more solutions are found by L-ANN-GWO at each end of the frontier. The solutions for TP3mod generated by L-ANN-GWO clearly outperform the solutions found by NGPM. For the P106mod problem, L-ANN-GWO finds solutions with smaller values for the second objective, while MOFEPSO focuses on smaller values of the first objective. Both algorithms perform better than NGPM. The Beam problem is a difficult problem with 30 design variables and 21 constraints. As shown in the results and the plot, L-ANN-GWO reaches better solutions than the other two algorithms when dealing with a large number of design variables and many constraints. For the tri-objective optimization problem, P116mod, the results of L-ANN-GWO and MOFEPSO are similar, and L-ANN-GWO finds more results on the edges, while the results of NGPM are gathered in a smaller area in the objective space. Overall, L-ANN-GWO performs well in constrained numerical optimization problems when the number of function evaluations is limited. For some specific problems (e.g., CF6), the variable selection process using LASSO may generate results concentrated at lower values of one objective. In that case, tightening the regression model accuracy threshold in LASSO regression will enable the algorithm to select more variables, so that the variables influencing any objective have a high chance of being selected. On the other hand, the Pareto frontiers found by MOEGO are always dominated by the frontiers found by the other three algorithms.
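The statement that one frontier is dominated by another can be made precise with a short sketch. It assumes that all objectives are minimized and that a frontier counts as dominated when each of its points is dominated by at least one point of the other frontier; the function names and toy fronts are hypothetical.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def front_is_dominated(front_b, front_a):
    """True if every point of front_b is dominated by some point of front_a."""
    return all(any(dominates(a, b) for a in front_a) for b in front_b)


# Toy fronts for illustration only (not results from this article).
front_a = [(1.0, 2.0), (2.0, 1.0)]
front_b = [(2.0, 3.0), (3.0, 2.0)]
print(front_is_dominated(front_b, front_a), front_is_dominated(front_a, front_b))
```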

## 4 Application in Solenoid Magnetic Design

Due to the capability of generating high magnetic fields, the magnet using 2G HTS tape is becoming an attractive solution for different engineering fields, such as particle accelerators. However, the nonlinear material property of the HTS tape makes it difficult and time consuming to calculate the magnetic field for an HTS magnet. Thus, the proposed algorithm is applied to deal with an optimization design problem of an HTS magnet using YBCO tape. Figure 5(a) shows the CAD model of a solenoid.

When applying an HTS magnet in the particle accelerators to guide the particle beam, a larger central magnetic field is needed when the speed of the particle increases. Additionally, superconducting materials are usually very expensive. Therefore, the goal is to design a magnet geometry so that the magnetic field at the center of the magnet is maximized with a minimal amount of required superconductor materials (measured by the length of the HTS tape). Finite element method (FEM) simulation with commercial software (comsol) is used to compute the magnetic field distribution in the coil, and each simulation can take up to 30 min. It is estimated that detailed mapping of the design space (e.g., six dimensions in this study) takes hundreds of simulations, so that there is a clear motivation to apply the L-ANN-GWO algorithm to reduce the magnet design time. Before going to the nonlinear HTS tape, we first solved a simpler case where the HTS tape was replaced with Cu tape. FEM simulation for Cu tape lasts only 2–3 min. We used the Cu tape simulation to benchmark the algorithm performance with the other three algorithms, NGPM, MOFEPSO, and MOEGO. After showing the capability of the proposed algorithm in dealing with the FEM simulation-based optimization problem, L-ANN-GWO is applied to solve the HTS tape magnetic field design problem.

It is worth mentioning that the model is parametric so that the design constraints (e.g., area of the uniform field at the center of the coil, or field homogeneity requirement) can be easily changed for future optimization studies. Moreover, the optimization algorithm is directly linked with the FEM solver, so that the expert user's time for preparing the FEM simulations and interpreting the results is minimized.

### 4.1 Formulation of the Optimization Problems.

The magnetic design includes defining the coil geometry and choosing the transport current (*I*_{t}). The geometry is described by the number of turns in each winding (in each horizontal coil layer), the number of windings (number of coil layers stacked vertically), the thickness of the tape, the vertical distance between tapes, and the horizontal (radial) distance between tapes; see Fig. 5(b) and Table 6. The goal of this design optimization case is to

– maximize the central magnetic field (*B*0)

– minimize the length of the tape conductor (*L*)

The following design criteria must be fulfilled:

Parameters | Descriptions |
---|---|
L | The total length of the tape |
th_{tape} | The thickness of the tape |
n_{r} | Number of tapes in each winding |
n_{z} | Number of windings in the coil |
r_{m} | The radius of the magnet bore |
d_{br} | Distance between tapes in each winding |
d_{bz} | Distance between windings |

– In the 2 cm × 2 cm area at the center of the solenoid, the difference between the maximal and minimal magnetic field (*Bsq*_{diff}) is less than 0.1 T. (The unit T is tesla, the unit of magnetic flux density.)

– In Cu tape: the maximum magnetic field (norm) in any of the tapes (*B*_{tape}) is 2 T.

– In HTS tape: the transport current does not exceed the critical current, *I*_{c}(*B*), in any of the tapes. This ensures that the tape stays in the superconducting state. Note that the tape *I*_{c} is a function of the magnetic field seen by the tape.

– The upper and lower bounds of each design variable are shown in Table 7.

The length of each winding (*l*) is calculated from the equation for the length of a spiral in polar coordinates, as shown in Eq. (4).

Design variables | Lower and upper bounds |
---|---|
n_{r} | [9, 36] |
n_{z} | [4, 12] |
r_{m} (cm) | [2.5, 10] |
d_{br} (mm) | [0.05, 0.1] |
I_{t} (A) | [100, 300] |
d_{bz} (mm) | [0.25, 1] |
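The design criteria and the bounds of Table 7 can be collected into a small feasibility check for the Cu tape case. The sketch assumes that the field quantities are returned by the FEM simulation and passed in as plain numbers, and that the 2 T criterion is an upper limit on *B*_{tape}; the dictionary keys and function names are hypothetical.

```python
# Bounds taken from Table 7; the key names are assumptions of this sketch.
CU_BOUNDS = {
    "n_r": (9, 36),
    "n_z": (4, 12),
    "r_m_cm": (2.5, 10.0),
    "d_br_mm": (0.05, 0.1),
    "I_t_A": (100.0, 300.0),
    "d_bz_mm": (0.25, 1.0),
}


def within_bounds(design, bounds=CU_BOUNDS):
    """True if every design variable lies inside its [lower, upper] interval."""
    return all(lo <= design[name] <= hi for name, (lo, hi) in bounds.items())


def is_feasible_cu(design, bsq_diff_T, b_tape_T,
                   bsq_limit_T=0.1, b_tape_limit_T=2.0):
    """Check the bounds and the two field criteria for the Cu tape problem.

    bsq_diff_T and b_tape_T are assumed to come from the FEM simulation.
    """
    return (
        within_bounds(design)
        and bsq_diff_T < bsq_limit_T
        and b_tape_T <= b_tape_limit_T
    )


# Illustrative design and field values (not results from this article).
toy_design = {"n_r": 9, "n_z": 8, "r_m_cm": 4.0,
              "d_br_mm": 0.05, "I_t_A": 200.0, "d_bz_mm": 0.5}
print(is_feasible_cu(toy_design, bsq_diff_T=0.05, b_tape_T=1.5))
```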

The total length of the tape used is calculated by multiplying the number of windings, *n*_{z}, by the length of each winding, *l*, i.e., *L* = *n*_{z} × *l*. In this article, a 2G HTS tape provided by S-innovations [44] is used in the tests. The thickness of the tape, *th*_{tape}, is 109 *μ*m. The Cu tape problem uses the same *th*_{tape} value.
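The length calculation can be sketched under the assumption that each winding is an Archimedean spiral, *r*(*θ*) = *r*_{in} + *bθ*, whose arc length in polar coordinates is the integral of sqrt(*r*² + (d*r*/d*θ*)²) over *θ*. The radial pitch per turn is assumed here to be the tape thickness plus the radial gap; this assumption, the function names, and the numerical values are illustrative and may differ from the exact form of Eq. (4).

```python
import math


def spiral_length(r_inner, pitch, n_turns):
    """Arc length of r(theta) = r_inner + b*theta for theta in [0, 2*pi*n_turns].

    b = pitch / (2*pi) is the radial growth per radian; pitch must be positive.
    """
    b = pitch / (2.0 * math.pi)

    def antiderivative(r):
        # Integral of sqrt(r^2 + b^2) dr, up to a constant.
        return 0.5 * (r * math.hypot(r, b) + b * b * math.asinh(r / b))

    r_outer = r_inner + pitch * n_turns
    return (antiderivative(r_outer) - antiderivative(r_inner)) / b


def total_tape_length(r_inner, pitch, n_r, n_z):
    """Total length L = n_z * l, with l the length of one spiral winding."""
    return n_z * spiral_length(r_inner, pitch, n_r)


# Illustrative values in meters; the pitch definition is an assumption.
th_tape = 109e-6
d_br = 50e-6
toy_l = spiral_length(r_inner=0.04, pitch=th_tape + d_br, n_turns=9)
toy_L = total_tape_length(r_inner=0.04, pitch=th_tape + d_br, n_r=9, n_z=8)
print(toy_l, toy_L)
```

Because the pitch is much smaller than the bore radius, the result is very close to the mean-radius estimate 2π*n*_{r}(*r*_{in} + *n*_{r} × pitch/2).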

The magnetic field in the center and the magnetic field-related criteria, i.e., *B*0, *Bsq*_{diff}, *B*_{tape}, and *I*_{c}, are computed from the FEM simulation. Maxwell's equations are solved for both the Cu tape magnet and the HTS magnet using the FEM solver in comsol. A 2D axisymmetric geometry for the solenoid model is used for the simulation. An A-H formulation, presented in Ref. [45], is chosen for Maxwell's equations because of its high simulation speed and stability. The mesh used in the simulation is shown in Fig. 6. The simulation and optimization are run on a laptop with an Intel Core i7-7700HQ @ 2.8 GHz and 32 GB RAM.

*d*_{br} and *d*_{bz} are related to the Kapton insulation thickness (*hKp*) around the tape. The relation to the distance parameters is expressed as follows:

Therefore, in the HTS tape problem, *hKp* is defined as one of the design variables to replace *d*_{br} and *d*_{bz}.

### 4.2 Optimization Results for Cu Tape Problem.

L-ANN-GWO, along with NGPM, MOFEPSO, and MOEGO, is applied to solve the Cu tape solenoid magnetic design problem. The maximum number of function evaluations for each algorithm is set to 300. The population size for NGPM and the number of wolves for L-ANN-GWO are set to 20, while the number of particles in MOFEPSO is set to 10. The Pareto frontiers obtained from the algorithms are shown in Fig. 7. The hypervolume values for the Pareto sets are presented in Table 8. The running time for each of the algorithms is around 6 h.

Metric | L-ANN-GWO | NGPM | MOFEPSO | MOEGO |
---|---|---|---|---|
Hypervolume | 89.21 | 59.73 | 77.00 | 80.66 |

As shown in the results, with the limited number of function evaluations, L-ANN-GWO generates better solutions than NGPM and MOFEPSO. When the length of the tape is between 20 m and 50 m, the results of the three algorithms are comparable, with L-ANN-GWO outperforming the other two. Specifically, L-ANN-GWO finds more solutions with a tape length below 50 m and a higher magnetic field. When the tape is longer than 50 m, the advantage of L-ANN-GWO becomes greater. Compared to MOEGO, L-ANN-GWO generates more solutions along the Pareto frontier, especially when the tape length is below 50 m. That is why the hypervolume value of L-ANN-GWO is larger than that of MOEGO. As a result, L-ANN-GWO leads to designs with a much stronger magnetic field than the other three algorithms at the same tape length.

### 4.3 Second-Generation High-Temperature Superconductor Tape Problem Results and Analysis.

A baseline 2G HTS tape design is simulated first. The values of the design variables, the length of the tape, and the central magnetic field are shown in Table 9. The magnetic field distribution for the baseline design is shown in Fig. 8. Then, L-ANN-GWO is applied to the 2G HTS tape magnetic design problem to test the performance of the proposed algorithm in a real superconducting magnet design case. Because the 2G HTS tape is used in the simulation, one simulation may take from several minutes to half an hour to obtain the magnetic field results, depending on the number of windings and the number of tapes in each winding. Considering the computational cost of the simulation, the number of function evaluations is limited to 200. The number of wolves is set to 20. The total running time for the optimization is around 50 h.

Design | n_{r} | n_{z} | r_{m} (cm) | hKp (µm) | I_{t} (A) | B0 (T) | Length (m) |
---|---|---|---|---|---|---|---|
Baseline | 9 | 8 | 4 | 30 | 200 | 0.1929 | 18.4213 |
Solution 1 | 11 | 9 | 2.56 | 18.99 | 146.30 | 0.2677 | 16.3799 |
Solution 2 | 9 | 8 | 2.50 | 46.22 | 146.68 | 0.2046 | 11.7015 |

Figure 9 shows the Pareto frontier obtained from the optimization, compared with the baseline design. In summary, L-ANN-GWO provides 15 feasible Pareto solutions using 200 simulation evaluations in the 2G HTS tape magnetic design case. The central magnetic field varies from 0.08 T to 0.30 T with the length of the tape between 5.7 m and 22.0 m. Table 9 shows two examples of the Pareto solutions obtained. Compared to the baseline design, the magnetic field can be improved by about 39% (from 0.1929 T to 0.2677 T) with a similar tape length (solution 1), or the length of the tape can be reduced by about 36% (from 18.42 m to 11.70 m) when the magnetic field is similar to the baseline (solution 2). The magnetic field distributions for the two solutions are shown in Fig. 10. The magnetic field strength at the origin of Figs. 8 and 10 gives the value of *B0* for the three designs. Thus, solution 1 in Fig. 10(a) shows the highest *B0* value, while solution 2 is comparable to the baseline. Compared to the baseline design, the HTS tape windings in both solutions are closer to the origin, as shown by the *x*-axis values in Figs. 8 and 10. In other words, the radius of the magnet bore (*r*_{m}) in both solutions is smaller than that in the baseline design. In Fig. 8, the magnetic field decreases from 0.02 m toward the center of the bore, which means that the HTS tape is too far from the center. By reducing *r*_{m}, solution 1 and solution 2 generate a larger magnetic field at the center of the bore. Additionally, for solution 1, *B0* is further improved by increasing the number of windings and the number of tapes in each winding, while the total length of the tape is reduced by reducing the thickness *hKp*. The current (*I*_{t}) is reduced to fulfill the constraints. For solution 2, reducing *r*_{m} leads to a decrease in the length. The values of *hKp* and *I*_{t} are adjusted to fulfill the constraints.
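The relative changes of the two example solutions with respect to the baseline can be recomputed directly from the entries of Table 9. The short sketch below only performs this arithmetic; the helper name and the dictionary layout are hypothetical.

```python
def percent_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


# B0 in tesla and tape length in meters, copied from Table 9.
baseline = {"B0": 0.1929, "length": 18.4213}
solutions = {
    "solution 1": {"B0": 0.2677, "length": 16.3799},
    "solution 2": {"B0": 0.2046, "length": 11.7015},
}

changes = {
    name: (
        percent_change(sol["B0"], baseline["B0"]),
        percent_change(sol["length"], baseline["length"]),
    )
    for name, sol in solutions.items()
}
for name, (d_b0, d_len) in changes.items():
    print(f"{name}: B0 {d_b0:+.1f}%, length {d_len:+.1f}%")
```

With the Table 9 values, solution 1 gains about 38.8% in *B0* with a tape about 11.1% shorter, and solution 2 shortens the tape by about 36.5% with *B0* about 6.1% higher.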

## 5 Conclusion

This work proposes a dimension selection-based multi-objective optimization algorithm with the capability of handling expensive constraints. The motivation for this effort lies in the very tight requirements involved in the 2G HTS magnetic field design. The results demonstrate that the new L-ANN-GWO algorithm outperforms NGPM, MOFEPSO, and MOEGO in most of the numerical benchmark problems and in the Cu tape solenoid magnetic field optimization problem. Finally, 15 Pareto solutions were generated from the optimization of the 2G HTS design.

Besides finding good designs for the HTS design case, key contributions of the proposed method are listed as follows:

(1) A new dimension selection-based multi-objective optimization algorithm is designed for simulation-based problems. The algorithm is benchmarked and successfully applied to a complex real-world engineering design problem.

(2) The dimension selection strategy using LASSO regression improves the efficiency of the optimization when the number of simulations is limited. The dimensions selected by LASSO have a large influence on the objective values. Thus, focusing on those dimensions in the GWO updating mechanism generates a large improvement in the objective values. Additionally, since the LASSO regression model is updated along the optimization process, more accurate regression models are generated and gradually more dimensions are considered in the GWO updating process, which helps to find new nondominated solutions at the later stage of the optimization. Selection errors caused by the lack of data in the early iterations are also corrected by this update at each iteration.

(3) The separation of variables for objective minimization and constraint satisfaction shows promise. Since the nonselected dimensions have less impact on the objective values, those dimensions are used more efficiently to find a feasible solution than to pursue lower objective values. Moreover, by modifying the values of selected and nonselected dimensions for separate purposes, new nondominated and feasible solutions can be generated more efficiently at the early stage of the optimization to provide high-quality leaders for GWO updates.

Since the algorithm is designed to solve the multi-objective optimization problem with two or three objectives, the ability of the proposed algorithm in dealing with many-objective optimization problems is not evaluated in this article. This will be our future work.

## Funding Data

• Business Finland Project #3972/31/2019 (SMARAGDI).

## Conflict of Interest

There are no conflicts of interest.

## Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.