Surrogate-Based Design Optimisation Tool for Dual-Phase Fluid Driving Jet Pump Apparatus

A comparative study of four well-established surrogate models used to predict the non-linear entrainment performance of a dual-phase fluid driving jet pump (JP) apparatus is performed. A JP design flow configuration comprising a dual-phase (air and water) flow driving a secondary gas-air flow, for which no unique set of design solutions has previously been provided, is described. For the construction of the global approximations (GA), the response surface methodology (RSM), Kriging and the radial basis function artificial neural network (RBFANN) were primarily used. The stacked/ensemble models methodology was integrated in this study to improve the predictive results, thus providing accurate GA that facilitate the multi-variable non-linear response design optimisation. An error analysis of all four models, along with a multiple-model accuracy analysis of each case study, was performed. The RSM, Kriging, RBFANN and stacked models formed part of the surrogate-based optimisation, with the entrainment ratio as the main objective function. Optimisation problems were solved by the interior-point algorithm, the genetic algorithm, and a hybrid formulation of both algorithms. A total of 60 optimisation problems were formulated and solved with the approximation models. Results showed that the hybrid formulation with the level-2 ensemble Kriging model performed best, predicting the experimental performance results for all JP models within an error margin of less than 10 % in 90 % of the cases.


Introduction
Methodologies applied to build adequate learning models are crucial in performing a model-based optimisation (MBO). With the rapid advances in computer science, MBO is becoming increasingly applicable to modelling, simulation, experimental and optimisation processes. It has proved to be one of the most efficient techniques for expensive and time-demanding real-world optimisation problems [1]. Several studies which considered MBO in the context of global optimisation (GO) were performed to solve design problems closely related to the one considered in this study.
Here, MBO is used for predicting the non-linear entrainment performance of a dual-phase fluid driving jet pump (JP) apparatus, a technology well known as an artificial lifting method for oil and gas wells.
Among all pumping equipment, one of the simplest and most effective ways to revive low-pressure oil and gas wells, or to boost production from such wells, is the use of JPs. A JP, also known as an ejector, eductor, thermo-compressor or injector, is a piece of equipment readily known for the pumping and mixing of fluids within a wide range of applications in various engineering industrial segments such as water, nuclear and aviation technologies [2]. In contrast to other Improved Oil Recovery (IOR) and Enhanced Oil Recovery (EOR) solutions, the surface jet pump (SJP) technology is classified as the cheapest and most effective solution, which highlights its importance in the current oil and gas situation [3][4][5]. Besides, among all SJP technological advancements, to date, no unique design solution has been established for a JP which operates effectively under motive multiphase fluid sources. Although a commercialised variety of single-phase and multiphase JPs/ejectors exists, multiphase JPs (for applications in the oil and gas industry, where the motive fluid contains more than a single phase, such as liquid and gas mixtures) undergo several drawbacks. Such drawbacks are mainly attributed to the degradation of entrainment and boost performance, and therefore result in a reduction of the overall operating efficiency. This limits their applicability due to the lack of flexibility, rangeability and versatility. These drawbacks, either individually or collectively, restrict onshore and remote offshore applications and entirely preclude subsea JP applications. A feasible and practical means to deal with a typical complex (non-linear) multi-variable design problem is presented here, and a surrogate-based design optimisation is considered.
Relevant work in the literature can be categorised into two types. The first type includes studies which consider one surrogate model, while the second type involves multiple learning algorithms (either comparison or stacked models used to conduct ensemble modelling techniques), thus involving more than one surrogate model to perform surrogate-based optimisation. Several studies comprising the build-up of models based on a single surrogate methodology include the work of Kajero et al. [6], who used the Kriging meta-model approach. These authors coupled Kriging with the expected improvement (EI) criterion to assist in the calibration of computational fluid dynamics (CFD) models. They successfully calibrated a single-model CFD parameter with experimental data and considered a case of single-phase flow in straight-pipe and convergent-divergent-type annular JPs. This study involved fixed design parameters based on the experimental data of Shimizu et al. [7], and considered HP and LP static pressures to compute the pressure coefficient. Yang et al. [8] made use of the RSM and desirability approach to investigate and optimise a JP in a thermoacoustic Stirling heat engine (TASHE). This study considered four design parameters: position, length, diameter and tapered angle of the nozzle. Also, slightly different from global optimisation (GO), Lyu et al. [9] worked on the design of experiments (DOE) methodology in combination with CFD to structurally optimise the design of annular liquid-liquid JPs. This work considered the volumetric flow ratio, angle of suction chamber, throat length and the diffuser diverging angle as the input design parameters. For another engineering design optimisation problem, Di Piazza et al. [10] used the Kriging estimation method to investigate partial shading in photovoltaic fields. It was shown that this learning method provided a cheaper and simpler characterisation of the photovoltaic plant output power, and as a result allowed energy forecasting.
It is difficult to establish if one of these surrogate modelling methods is superior to the others. A comprehensive comparison analysis of multiple surrogate models (built with different methods) has therefore been conducted here to try to answer this question. Simpson et al. [11] compared a maximum of three models, contrasting the polynomial-based response surface with Kriging surrogates for the aerodynamic design optimisation of hypersonic spiked blunt bodies. Similarly, Shyy et al. [12] compared the relative performance of polynomial and neural network surrogate models for aerodynamics and rocket propulsion problems. Other methodologically related studies include the work of Simpson et al. [13], who emphasised the robustness of Kriging over the RSM surrogate model. The authors used Kriging for global approximation in simulation-based multidisciplinary (multiple input and multiple response) design optimisation problems. In that study, Kriging was used for a real aerospace engineering application, a problem related to the design of an aerospike nozzle. The most comprehensive work found in the literature is that of Luo and Lu [14]. The authors performed a comparison between the RSM, Kriging and RBFANN surrogate models for building surrogates of a dual-phase flow simulation model in a simplified nitrobenzene-contaminated aquifer remediation problem. The surrogate-based optimisation methodology identified the most cost-effective remediation strategy.
However, among all the referenced work related to ejectors/jet pumps, none considered a dual-phase driving fluid JP configuration. In addition, to the best knowledge of the authors, none of the referenced work applied ensemble modelling, an approach which in many cases has proved to improve global approximations (GA), relative to GA based on single-methodology approaches, when applied to non-linear systems. The potential of a unique experimental data-set (an unpublished data-set which forms part of ongoing investigatory work related to the performance of JPs under dual-phase driving fluids) motivated the authors to use these data to build surrogate models and perform surrogate-based optimisation. The objective thus entails the following task: develop surrogate models using the RSM, Kriging, RBFANN and ensemble methodologies for constructing global approximations (GA) for use in a real dual-phase JP design application, in particular, the design of the injection body and parts of the ejection portions of a JP apparatus. This research shows novelty as (1) different surrogate modelling methodologies were applied and compared in the design of a dual-phase (water and air) driving gas jet pump apparatus, and (2) it is shown that the ensemble surrogate modelling/stacked-generalisation approach with Kriging as a level-2 learning model is suitable for predicting the optimised design parameters of a highly non-linear design problem attributed to the complex design of a dual-phase JP apparatus.

Response Surface Models
The response surface methodology is the simplest and most common method applied for analysing the results of physical experiments and for generating empirically based models of the response values [15,16].
The RSM is defined by:

$$ y(\mathbf{x}) = f(\mathbf{x}) + \epsilon \quad (1) $$

where $y(\mathbf{x})$ is the unknown function of interest, $f(\mathbf{x})$ is the polynomial approximation of $y(\mathbf{x})$, and $\epsilon$ denotes the normally distributed error (with mean 0 and variance $\sigma^2$). The errors $\epsilon$ are independent and identically distributed. The polynomial function $f(\mathbf{x})$ typically comprises a low-order polynomial, which in most cases is assumed to be either linear or quadratic. A quadratic polynomial is expressed as follows:

$$ \hat{y} = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j \quad (2) $$

where $\hat{y}$ is the quadratic polynomial estimate, $\beta_0$, $\beta_i$, $\beta_{ii}$, $\beta_{ij}$ are the polynomial regression coefficients determined through least-squares regression (LSM), $k$ is the number of variables, while $x_i$ and $x_j$ are the input variables.
The coefficients of Eq. (2) can be found using the following equation:

$$ \hat{\boldsymbol{\beta}} = \left(\mathbf{X}^{T}\mathbf{X}\right)^{-1}\mathbf{X}^{T}\mathbf{y} \quad (3) $$

where $\mathbf{X}$ is the design matrix of sample data points, $\mathbf{X}^{T}$ is its transpose, while $\mathbf{y}$ denotes a column vector that includes the response values for each corresponding sample point. Further details about response surface modelling can be found in Myers et al. [17], Myers et al. [18], Lin and Tu [19], Myers [20] and Myers et al. [21].
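The least-squares fit of Eqs. (2) and (3) can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation; the function names are hypothetical:

```python
import numpy as np

def quadratic_design_matrix(X):
    """Design matrix for a full quadratic polynomial in k variables:
    intercept, linear, pure quadratic and two-way interaction terms (Eq. 2)."""
    n, k = X.shape
    cols = [np.ones(n)]                       # beta_0
    cols += [X[:, i] for i in range(k)]       # beta_i * x_i
    cols += [X[:, i] ** 2 for i in range(k)]  # beta_ii * x_i^2
    cols += [X[:, i] * X[:, j]                # beta_ij * x_i * x_j
             for i in range(k) for j in range(i + 1, k)]
    return np.column_stack(cols)

def fit_rsm(X, y):
    """Least-squares estimate of the polynomial coefficients (Eq. 3)."""
    D = quadratic_design_matrix(X)
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return beta

def predict_rsm(beta, X):
    """Evaluate the fitted response surface at new points."""
    return quadratic_design_matrix(X) @ beta
```

Using `lstsq` rather than the explicit normal-equation inverse of Eq. (3) is numerically safer, though mathematically equivalent for full-rank design matrices.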

Kriging Models
The Kriging method is a probabilistic approach that involves a statistical estimation technique for spatial interpolation of random quantities. The initial mathematical formulation of the Kriging method was developed on the basis of experiments performed by Daniel Krige, who established the distribution of minerals in the subsoil by performing punctual surveys. Later, Sacks et al. [22] developed the method into a surrogate model, thus shaping it into the form mostly known nowadays. This model is also known as the design and analysis of computer experiments (DACE) [23,24]. The ooDACE toolbox, developed by Couckuyt et al. [25], is a versatile Matlab toolbox which incorporates the popular Gaussian-Process-based Kriging surrogate models.
Unlike the RSM, Kriging models were purposely developed for mining, geostatistical and spatial applications, which induce spatially and temporally correlated data [26,27]. More recently, the model gained popularity and started to be used for other engineering applications, such as real aerospace design engineering applications and the build-up of a meta-model to assist the calibration of computational fluid dynamics models [6,13].
The Kriging model combines two main components: a deterministic function/global model and localised departures, as given in Eq. (4) [28,29]:

$$ y(\mathbf{x}) = \sum_{i=1}^{k} \beta_i f_i(\mathbf{x}) + Z(\mathbf{x}) \quad (4) $$

where $\beta_i$ denotes the coefficients of the deterministic function, $f_i(\mathbf{x})$ are $k$ known regression functions (typically polynomial functions) providing a "global" model of the design space, and $Z(\mathbf{x})$ is the realization of a stationary stochastic process with mean zero, process variance $\sigma^2$ and covariance given by the covariance matrix in Eq. (5). In contrast to $f_i(\mathbf{x})$, $Z(\mathbf{x})$ accounts for "localised" deviations in the interpolation of the $n_s$ data points.
$$ \mathrm{Cov}\left[Z(\mathbf{x}^i), Z(\mathbf{x}^j)\right] = \sigma^2 R\left(\mathbf{x}^i, \mathbf{x}^j\right) \quad (5) $$

where $R(\mathbf{x}^i, \mathbf{x}^j)$ is the spatial correlation function (SCF) between any two of the sampled data points $\mathbf{x}^i$ and $\mathbf{x}^j$. This function controls the smoothness, the influence of nearby points, and the differentiability of the response surface model. Equation (4) amalgamates a first part which models the drift of the process mean over the domain, and a second part which models the systematic deviations from the linear model, pulling the response surface through the data by weighing the correlation of nearby points. Moreover, the spatial correlation matrix $\mathbf{R}$ is an $(n_s \times n_s)$ symmetric matrix with unity along its diagonal, where $n_s$ is the number of sampled points. The correlation function is selected by the user, and a variety of functions exists. The most applicable correlation functions are: (a) the Gaussian, (b) exponential, (c) linear and (d) spline functions [22,30]. In this study the Gaussian correlation function, given by Eq. (6), was used:

$$ R\left(\mathbf{x}^i, \mathbf{x}^j\right) = \exp\left(-\sum_{k=1}^{n_{dv}} \theta_k \left(x_k^i - x_k^j\right)^2\right) \quad (6) $$
where $\theta_k$ are the unknown correlation parameters, $x_k^i$ and $x_k^j$ are the $k$th components of the sample points $\mathbf{x}^i$ and $\mathbf{x}^j$, and $n_{dv}$ is the number of design variables. In many cases, such as in McKay et al. [31], Sacks et al. [22], and Osio and Amon [32], a single value of $\theta$ provided good results; in this study, however, a different value of $\theta$ is used for each design variable.
Eventually, the predicted estimates $\hat{y}$ of the response surface $y(\mathbf{x})$ at untried values of $\mathbf{x}$ for a universal Kriging model are found by Eq. (7):

$$ \hat{y}(\mathbf{x}) = \mathbf{f}(\mathbf{x})^{T}\hat{\boldsymbol{\beta}} + \mathbf{r}(\mathbf{x})^{T}\mathbf{R}^{-1}\left(\mathbf{y} - \mathbf{F}\hat{\boldsymbol{\beta}}\right) \quad (7) $$
where $\mathbf{y}$ is a column vector of length $n_s$ which contains the sample values of the response, and $\mathbf{f}(\mathbf{x})$ is a column vector of the regression functions; in the case of ordinary Kriging (not universal Kriging), $\mathbf{f}(\mathbf{x}) = 1$, so the first term reduces to a scalar function fixed at unity times $\hat{\beta}$. The literature comprises the work of Deutsch and Journel [33], Cassie [27], Simpson et al. [13], Emery [34] and Bayraktar and Turalioglu [35], which involve ordinary Kriging models, while in the work of Zimmerman et al. [36], Brus and Heuvelink [37] and Sampson et al. [38], universal Kriging models are applied.
If a first-order polynomial is involved, then $\mathbf{f}(\mathbf{x})^{T} = \left[1, x_1, \ldots, x_{n_{dv}}\right]$, and so on; in this study the 0th, 1st and 2nd order polynomials were considered. $\mathbf{F}$ is a matrix of the form $\mathbf{F} = \left[\mathbf{f}(\mathbf{x}^1), \mathbf{f}(\mathbf{x}^2), \ldots, \mathbf{f}(\mathbf{x}^m)\right]^{T}$, containing the regression functions evaluated at all $m$ training data points. The term $\mathbf{r}(\mathbf{x})^{T}$ denotes the correlation vector of length $n_s$ between an untried point $\mathbf{x}$ and the sampled data points:

$$ \mathbf{r}(\mathbf{x})^{T} = \left[R\left(\mathbf{x}, \mathbf{x}^1\right), R\left(\mathbf{x}, \mathbf{x}^2\right), \ldots, R\left(\mathbf{x}, \mathbf{x}^{n_s}\right)\right] \quad (8) $$

The regression coefficients $\hat{\boldsymbol{\beta}}$ are estimated by the maximum likelihood method (MLE), as given in Eq. (9) [39]:

$$ \hat{\boldsymbol{\beta}} = \left(\mathbf{F}^{T}\mathbf{R}^{-1}\mathbf{F}\right)^{-1}\mathbf{F}^{T}\mathbf{R}^{-1}\mathbf{y} \quad (9) $$
The process variance $\sigma^2$ between the global model $\mathbf{F}\hat{\boldsymbol{\beta}}$ and $\mathbf{y}$ is estimated by Eq. (10):

$$ \hat{\sigma}^2 = \frac{\left(\mathbf{y} - \mathbf{F}\hat{\boldsymbol{\beta}}\right)^{T}\mathbf{R}^{-1}\left(\mathbf{y} - \mathbf{F}\hat{\boldsymbol{\beta}}\right)}{n_s} \quad (10) $$

Since $R(\cdot)$ is parameterised by $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_d)$, setting the partial derivative of the likelihood function to zero does not always yield an analytical solution for $\boldsymbol{\theta}$. Instead, a constrained iterative search is used, where an optimisation algorithm is applied to evaluate the optimal parameter values. Lastly, the correlation parameters $\boldsymbol{\theta}$ are estimated by solving the optimisation problem given in Eq. (11):

$$ \max_{\boldsymbol{\theta} > 0} \; -\frac{1}{2}\left(n_s \ln \hat{\sigma}^2 + \ln |\mathbf{R}|\right) \quad (11) $$

All estimated parameters are first used as inputs in Eq. (7) to obtain the prediction mean. Secondly, the corresponding prediction variance $\hat{\sigma}^2$ is obtained as the estimated mean square error (MSE) of the predictor.
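An ordinary-Kriging predictor built from Eqs. (6), (7) and (9) with a constant regression function can be sketched as follows. This is illustrative only: the correlation parameters θ are fixed by hand rather than MLE-optimised, and a small nugget term (not part of the formulation above) is added for numerical stability:

```python
import numpy as np

def gaussian_corr(XA, XB, theta):
    """Gaussian SCF of Eq. (6): R_ij = exp(-sum_k theta_k (xa_k - xb_k)^2)."""
    d2 = (XA[:, None, :] - XB[None, :, :]) ** 2
    return np.exp(-np.tensordot(d2, theta, axes=([2], [0])))

class OrdinaryKriging:
    """Minimal ordinary Kriging with constant regression function f(x) = 1."""
    def __init__(self, theta):
        self.theta = np.asarray(theta, float)

    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), np.asarray(y, float)
        n = len(self.y)
        R = gaussian_corr(self.X, self.X, self.theta) + 1e-10 * np.eye(n)  # nugget
        self.Rinv = np.linalg.inv(R)
        ones = np.ones(n)
        # Eq. (9) with F = 1: generalised least-squares mean
        self.beta = (ones @ self.Rinv @ self.y) / (ones @ self.Rinv @ ones)
        return self

    def predict(self, Xnew):
        # Eq. (7): beta + r(x)^T R^{-1} (y - 1*beta)
        r = gaussian_corr(np.asarray(Xnew, float), self.X, self.theta)
        return self.beta + r @ self.Rinv @ (self.y - self.beta)
```

Because Kriging interpolates, the predictor reproduces the training responses (up to the nugget) while smoothly blending between them elsewhere.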

Radial Basis Function Neural Network (RBFANN)
The radial basis function neural network, initially developed by Broomhead and Lowe [40], comprises an artificial neural network with a 3-layer feed-forward structure and makes use of radial basis functions as activation functions. As illustrated in Fig. 1, the network comprises input, hidden and output layers. The hidden layer involves a non-linear RBF activation function, while the output layer contains a linear combination of the radial basis functions of the inputs and the neuron parameters. Considering the input vector $X$, with $X \in \mathbb{R}^n$, the output of the $i$th neuron in the RBFANN hidden layer is given by:

$$ h_i(X) = \phi\left(\lVert X - c_i \rVert\right) \quad (13) $$

where $c_i$ is the centre vector for the $i$th neuron in the RBF hidden layer, $i = 1, 2, \ldots, N$, where $N$ is the number of neurons in the hidden layer, $\lVert X - c_i \rVert$ is the norm of $X - c_i$ (either the Euclidean or the Mahalanobis distance), while $\phi(\cdot)$ is the radial basis function, commonly taken to be Gaussian, as given in Eq. (14); the 'Cauchy' or the 'Multiquadratic' function may be used instead [41]:

$$ \phi(r) = \exp\left(-\frac{r^2}{2\sigma^2}\right) \quad (14) $$
The Gaussian basis functions are local to the centre vector in the sense that:

$$ \lim_{\lVert X \rVert \to \infty} \phi\left(\lVert X - c_i \rVert\right) = 0 \quad (15) $$

Ultimately, the output of the network, being a scalar function of the input vector, is given by Eq. (16):

$$ y_k(X) = \sum_{i=1}^{N} w_{ki}\,\phi\left(\lVert X - c_i \rVert\right) + \theta_k \quad (16) $$

where $w_{ki}$ are the connecting weights from the $i$th hidden layer neuron to the $k$th output neuron and $\theta_k$ is the threshold value of the $k$th output layer neuron.
This model is widely applicable for approximation, classification, prediction and system control. Park and Sandberg [42] added that the RBF network performs well as a universal approximator on compact subsets of $\mathbb{R}^n$, implying that an RBF network with enough hidden neurons will approximate a continuous function on a closed, bounded set with arbitrary precision.
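A minimal sketch of Eqs. (13)-(16), with the centres fixed at the training points and the output weights solved by linear least squares. This is one common training strategy, not necessarily the one used in this study (which trained via MATLAB's Levenberg-Marquardt/Bayesian regularisation, described later):

```python
import numpy as np

def rbf_features(X, centres, sigma):
    """Gaussian hidden-layer outputs of Eqs. (13)-(14) for each input/centre pair."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_rbfann(X, y, centres, sigma):
    """Solve for output weights and threshold (Eq. 16) by linear least squares."""
    H = np.column_stack([rbf_features(X, centres, sigma), np.ones(len(X))])
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return w

def predict_rbfann(w, X, centres, sigma):
    """Network output: weighted sum of basis functions plus threshold."""
    H = np.column_stack([rbf_features(X, centres, sigma), np.ones(len(X))])
    return H @ w
```

With the centres placed at the samples themselves, the network behaves as an interpolator, illustrating the universal-approximation property noted by Park and Sandberg.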

Model Ensembling
Ensemble methods are algorithms which amalgamate several machine learning methodologies into one predictive model, to either decrease the variance or improve predictions. This methodology is illustrated in Fig. 2, and includes a mechanism which is divided into level-1 and level-2 processes. Typical ensemble models are used to provide global approximations for solving optimisation problems of highly non-linear situations. Typical applications include diverse topics such as weather forecasting [43], climate change [44], ecological models [45], wind speed [46] and a chemical flooding process for an oil and gas application [47].

Model Averaging
Model averaging is the most commonly used of all listed methods. It involves the averaging of $N$ predictions via Eq. (17), where each prediction is the output of one of the $N$ trained models used to perform the scalar ensemble prediction:

$$ \bar{y}_k = \frac{1}{N} \sum_{n=1}^{N} \hat{y}_n \quad (17) $$

where $\bar{y}_k$ is the averaged scalar ensemble output for a sample $k$, and $\hat{y}_n$ is the predictive output from each learning model.
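Eq. (17) amounts to a one-line operation over the stacked model outputs (a trivial sketch, with the predictions array layout assumed as models-by-samples):

```python
import numpy as np

def ensemble_average(predictions):
    """Eq. (17): average the outputs of N trained models for each sample.
    `predictions` has shape (N_models, n_samples)."""
    return np.mean(np.asarray(predictions, float), axis=0)
```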

Bagging Model/Bootstrap Aggregating
Bagging is very similar to model averaging, but it comprises a slightly modified training procedure. Bagging trains each base learner on a different subset of samples drawn with replacement from the training data; this is known as bootstrap sampling. Thus, if the RBFANN method is used, multiple models are generated, each from its own bootstrap sample of the data. The corresponding responses from the different models are then averaged to obtain a scalar ensemble output.
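The bootstrap-aggregating procedure can be sketched as follows. Here `fit` and `predict` are placeholders standing in for any base learner (e.g. an RBFANN); they are not functions from the study:

```python
import numpy as np

def bagging_predict(X_train, y_train, X_new, fit, predict, n_models=10, seed=0):
    """Bootstrap aggregating: train each base learner on a resampled
    (with replacement) copy of the data, then average the predictions."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)      # bootstrap sample indices
        model = fit(X_train[idx], y_train[idx])
        preds.append(predict(model, X_new))
    return np.mean(preds, axis=0)             # scalar ensemble output
```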

Stacking Model
Stacking, also known as meta-ensembling, is a model ensemble technique which combines the response data from a plurality of predictive models to generate a new, improved model. It is therefore an extension (or second procedure), applied after responses have been generated by each individual predictive model [50].
Stacking requires a modified training procedure relative to the other described methods. In this case, the N trained models are used to predict the output of each new sample. The outputs from the 1st-level (separate-model) learners are used as inputs for another model that is 'stacked' upon them, leading to a layer-chain of models. The 2nd-level model is then used to predict the actual output for the new samples. In most cases, the stacked model is expected to outperform each of the individual models, due to its smoothing nature and its capability to select each base model in the regions where it performs best, while avoiding the regions where it performs poorly. Consequently, stacking is most effective when the base model predictions are significantly different.
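The two-level procedure can be sketched as follows. The level-2 combiner here is a simple linear least-squares model, a stand-in for the level-2 learners used in the study (e.g. Kriging); `base_learners` holds hypothetical `(fit, predict)` pairs for the level-1 models:

```python
import numpy as np

def out_of_fold_predictions(X, y, fit, predict, n_folds=5):
    """Level-1 step: K-fold out-of-sample predictions for one base learner,
    so the level-2 model never sees predictions made on the training folds."""
    n = len(y)
    folds = np.array_split(np.arange(n), n_folds)
    oof = np.empty(n)
    for hold in folds:
        train = np.setdiff1d(np.arange(n), hold)
        model = fit(X[train], y[train])
        oof[hold] = predict(model, X[hold])
    return oof

def fit_stack(X, y, base_learners, n_folds=5):
    """Level-2 step: least-squares combination of the base learners'
    out-of-fold predictions (a simple stand-in for the level-2 learner)."""
    Z = np.column_stack([out_of_fold_predictions(X, y, f, p, n_folds)
                         for f, p in base_learners])
    Z = np.column_stack([Z, np.ones(len(y))])
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    full_models = [f(X, y) for f, _ in base_learners]  # refit on all data
    return w, full_models

def predict_stack(w, full_models, base_learners, X_new):
    """Combine the refitted base learners' predictions with the level-2 weights."""
    Z = np.column_stack([p(m, X_new)
                         for (_, p), m in zip(base_learners, full_models)])
    Z = np.column_stack([Z, np.ones(len(X_new))])
    return Z @ w
```

The key design choice is that the level-2 model is trained only on out-of-fold level-1 predictions, which is what lets it learn where each base model performs best without overfitting to in-sample accuracy.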

Boosting Model
Boosting involves a family of algorithms which transform weak learners into effective learners. This method deals with weak learner models such as decision trees. It functions by combining the predictions via a weighted majority vote (classification) or a weighted sum (regression) to generate the final prediction. It differs from bagging in that the base learners are trained sequentially, on weighted versions of the data.
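The sequential weighted-sum idea can be sketched as a minimal squared-error gradient-boosting loop with decision stumps. This is included only to illustrate the mechanism; boosting is not one of the models applied in this study:

```python
import numpy as np

def fit_stump(X, r):
    """Weak learner: a depth-1 regression tree (stump) on a single feature."""
    x = X.ravel()
    best = None
    for t in np.unique(x)[:-1]:
        left, right = r[x <= t].mean(), r[x > t].mean()
        err = ((np.where(x <= t, left, right) - r) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, left, right)
    return best[1:]

def predict_stump(stump, X):
    t, left, right = stump
    return np.where(X.ravel() <= t, left, right)

def fit_boosting(X, y, n_rounds=200, lr=0.1):
    """Sequential boosting for regression: each stump is fitted to the
    residuals left by the weighted sum of its predecessors."""
    pred = np.zeros(len(y))
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y - pred)        # fit the current residuals
        pred += lr * predict_stump(stump, X)  # weighted sum of weak learners
        stumps.append(stump)
    return stumps

def predict_boosting(stumps, X, lr=0.1):
    return lr * sum(predict_stump(s, X) for s in stumps)
```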

Applied Methodology: JP Device Apparatus
As described in the introduction, a JP is a passive apparatus and therefore cannot adapt to changes in operating conditions. Investigative work by Mifsud et al. [51] clearly demonstrates that the performance of a JP apparatus is dictated by its internal geometric features, mainly the injection, entraining and mixing bodies. These three bodies include unique geometrical features, which vary in both shape and clearance under different flow conditions; namely, the type of fluid and, more specifically, the fluid properties such as density, viscosity, compressibility and diffusivity, mutually dictating the hydrodynamic behaviour. Thus, a gas-driving-gas JP varies in design from a liquid-driving-liquid JP or a liquid-driving-gas JP. This work focuses specifically on a JP application with a dual-phase (water and air) HP fluid driving a relatively low-pressure air stream. An analysis of experimental results from unpublished work showed that, under dual-phase operating conditions, there is no linear behaviour correlating the design parameters with entrainment performance.
A total of five different nozzle bodies were used in this study. A brief description of each injection body is provided in Table 1, accompanied by schematics. Table 1 includes models M-01 to M-05, which describe the JP injection bodies without the use of any swirl-body mechanism, while M-06 to M-10 involve the use of a swirl-body mechanism. Such a swirl-body mechanism (illustrated in Fig. 3a-d) is fixed within the motive fluid conduit, close to the converging portion of the respective nozzle. Note that for cases with no swirl-body mechanism, a void cylindrical transition piece is used instead.
A total of five design variables were considered, see Fig. 4. These are: (1) the nozzle-to-throat clearance X, (2) the throat-inlet angle At, (3) the two-phase mixture composition in terms of the gas volume fraction GVF, (4) the nozzle body N, and (5) the swirl-body mechanism S.

JP Device Design Analysis
The selected variables are classified into: (a) discrete design variables, including X, At and GVF, and (b) non-discrete variables, N and S. Figure 5 shows a data flow diagram, with a black box that bridges the inputs and outputs with a learning algorithm.
As the variables N and S are categorical (neither continuous nor discrete), all learning models (both level-1 and level-2) were developed separately for each JP model. This reduced the number of input variables from five to three, as illustrated in Fig. 6.
Also, for further simplification, the bi-response methodology included in Fig. 5 was reduced to a single output. This step was justified on the basis that the entrainment ratio will always tend to increase while the magnitude of pressure vibration decreases. The entrainment ratio was selected over the magnitude of pressure vibration for the main reason that the focus of this study is on the design parameters which have a significant effect on the LP pressure and/or the LP/secondary flowrate.

Approximations for the JP Device Design Problem
The sample data for all learning models were obtained from a unique experimental data-set comprising tests performed on a dual-phase (water and air) facility located in the Process Systems Engineering Lab at Cranfield University, UK. The experimental data-set includes a total of 1440 test-setup combinations, comprising both 100 % liquid-water and dual-phase (0 ≤ GVF ≤ 50) motive fluid flows driving a secondary gas-air flow. All experiments were performed at pressures below 8 bara.
The whole data-set was divided into 10 data sub-sets. The data were discretised according to the type of injector body, for the JP configurations equipped with or without the swirl-body mechanism. This categorised the data into 10 unique models (M-01 to M-10), as listed in Table 2. Note that the range of the nozzle-to-throat clearance X comprised eight divisions (0.1, 0.25, 0.6, 0.9, 1.4, 1.8, 2, …). The arrangement shows that each At is linked to every value of X first, then the joint combination is further linked to every value of GVF, and ultimately the full joint combinations (comprising a value of GVF, X and At) are linked with every JP model.
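The nesting of the design variables described above amounts to a full-factorial arrangement, which can be sketched as follows. The variable levels shown are illustrative assumptions, not the full set used in the study (which used eight X divisions and the GVF and At ranges discussed below):

```python
from itertools import product

# Illustrative levels only; see Table 2 for the actual discretisation.
At_levels = [13, 30, 50]           # throat-inlet angles, degrees
X_levels = [0.1, 0.25, 0.6, 0.9, 1.4, 1.8]  # nozzle-to-throat clearances
GVF_levels = [0, 10, 25, 50]       # gas volume fractions, percent
models = [f"M-{i:02d}" for i in range(1, 11)]

# Each At is linked to every X, the pair to every GVF,
# and the full triple to every JP model.
design_grid = [(m, gvf, x, at)
               for at, x, gvf, m in product(At_levels, X_levels, GVF_levels, models)]

print(len(design_grid))  # 3 * 6 * 4 * 10 = 720 combinations
```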
The respective ranges of each discrete design variable were set according to specific justifications. The range of the secondary-nozzle/throat inlet angle At includes an upper-bound limit of 50°, being an optimal design angle for liquid-driving-liquid JP applications. A lower bound (as low as reasonably possible) of 13° and an intermediate angle of 30° were then considered.
The range of the nozzle-to-throat clearance X varied between a ratio of 0.1 and 4. This covered the well-known design ratios (as applicable for gas-gas, liquid-liquid and liquid-gas JP applications) and beyond. Also, given that a swirl-induced flow was included in half the test setups, lower values of X were considered than those used for gas-gas applications; in the latter cases, optimal values of X can even go below 0.4. Besides, more closely spaced values of X were considered to avoid blind spots caused by the high sensitivity of the response, even for small increments of X.
Lastly, the range of GVF values covered a range of HP fluid compositions, including a 100 % liquid motive flow and a combination of liquid-dominant two-phase (water-air) motive fluid compositions. One should consider that some practical intuition was also applied, based on real field applications (mainly for selecting the range of GVF), and limitations were considered due to the difficulties of manufacturing the designed components (mainly for selecting the range of At). It was concluded that such complex behaviour could only be exemplified via polynomial equations. These complex relationships led to the development of a series of third-order response surface models for the entrainment ratio ER. These models fitted the 118 sample points using ordinary least-squares regression. The polynomial expressions of the developed response surface models are given in Eqs. (27) to (36) in "Appendix 1".
A pair of polynomial expressions is generated for each JP model design: a first expression for a device body without the swirl-body mechanism, and a second for a device body with the same injection body but amalgamated with the swirl-body mechanism. Eventually, a total of 10 expressions covers all 5 JP models.
The resulting R², adjusted R² and root mean square error for each response surface model, given in Table 6 of "Appendix 2", show high (ideally close to 1) R² and adjusted R² values and low (ideally close to 0) RMSE values. Thus, the response surface models appear to capture a large portion of the observed variance, resulting in an acceptably good fit.

Kriging Models for the Jet Pump Device Study
For the Kriging models, a Gaussian correlation function with either a 0th, 1st or 2nd order regression function was applied, while Eq. (10) was used for the local deviations. It was also noted that a single parameter θ was insufficient to model the data accurately, so a simulated annealing algorithm was used to determine the maximum likelihood estimates (MLEs) of the three parameters needed (one for each variable) to generate the best Kriging model.
The optimal parameter values for each case study are given in Table 3. This was accomplished via Eq. (11), and the simulations were executed via a dedicated script written in MATLAB. Eventually, the Kriging models were identified once all parameters of the Gaussian correlation function and the 118 sample data points were obtained.
For each Kriging model, the testing included a total of 26 data points, namely the samples which were not used for training during the build-up of the models. The regression R values for the Kriging models are given in Table 6 of "Appendix 2".

Neural Network Models for the Jet Pump Device Study
For the Neural Network models, a script was generated by the Neural Fitting application provided in MATLAB 2018.
The input and response sample data sets were randomly selected by the syntax 'dividerand' and divided into three subsets: training, validation and testing. Furthermore, this function performs backpropagation to calculate the Jacobian jX of the performance syntax 'perf' with respect to the weight and bias variables X. Each variable is adjusted according to the Levenberg-Marquardt method. Further details about Bayesian regularisation can be found in MacKay [52] and Foresee and Hagan [53]. The mean squared error performance function syntax 'mse' was applied as the model performance function. The mean squared error is the average squared difference between outputs and targets; lower values are better. Additionally, the number of hidden neurons, denoted by the letter 'N' in Fig. 9, varied according to the results illustrated in the model accuracy analysis section. The first trial consisted of N = 15. Regression R values are presented and discussed in Sect. 4.2.
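The random division performed by 'dividerand' can be sketched in Python as follows. The 70/15/15 split ratio is an assumption (matching MATLAB's default), as the ratios used in the study are not stated here:

```python
import numpy as np

def divide_rand(n_samples, ratios=(0.70, 0.15, 0.15), seed=0):
    """Randomly split sample indices into training, validation and test sets,
    mimicking MATLAB's 'dividerand' (70/15/15 default ratios assumed)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(ratios[0] * n_samples))
    n_val = int(round(ratios[1] * n_samples))
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])
```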

Model Averaging and Stacking for the Jet Pump Device Study
For the ensemble-stacking model, a more advanced command script (more complex than for the single-model build-up) was formulated and executed in MATLAB 2018.
The ensemble-modelling approach denotes a 2nd-level algorithm which ultimately leads to the final scalar predictions. The applied method for the segregation of sample data involved a different procedure than that applied for the other three models discussed above. To formulate the out-of-sample predictions, the data were divided in a similar way to the well-known 'K-fold' cross-validation method.
The out-of-sample method involved the division of the training sample into a number of folds N. In each of the N folds, some of the sample data were held out for validation and/or testing (the holdout fold), while the remaining folds were used to obtain predictions for all 26 samples. This is demonstrated in Fig. 11a, b; each sub-figure represents one fold and the highlighted cells indicate the holdout sample.
The cross-validation method was selected because out-of-sample predictions have a higher chance of capturing the distinct regions where each model performs best.

Learning Model's Accuracy Analysis
In this study, the absolute error (AE) was selected as the main loss function to estimate the accuracy of the learning models. Figures 23 to 37 in "Appendix 3" illustrate the boxplots of absolute error for the RSM, Kriging and RBFANN models, as applicable to the JP models, both with and without the swirl-body mechanism. Table 4 details the varied parameters in each learning model. Table 5 includes the optimal parameter values resulting from the presented box-plots for each JP model (M-01 to M-10).
The results demonstrated that, for the RSM models, the approximation accuracy of the 3rd order polynomial is higher than that of the 1st and 2nd order in all cases. The Kriging models showed that the 2nd order polynomial (as a regression function) obtained the highest approximation accuracy, except for models M-02 and M-04.

Error Analysis of the RSM, Kriging and RBFANN Models
Finally, an error comparison was performed to determine the capability of producing accurate global approximations for a dual-phase fluid-driving SJP. The error is defined between the actual response samples y (26 samples used in each model) and the predicted values ŷ from either the RSM, Kriging, RBFANN or ensemble models.
The accuracy of the 26 validation points for all models was estimated via the numerical error analysis equations given in Eqs. (18) to (22).
The absolute value in this calculation is summed over every predicted value and divided by the number of testing points. Note that in this work the mean absolute error was calculated, thus Eq. (19) was divided by 100. Also, similarly to MaxAPE, a lower MAPE indicates a smaller difference, with variance reducing as MAPE → 0.
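The percentage-error measures discussed above can be sketched as follows. This is an illustrative Python sketch; the function names are assumptions and do not reproduce the exact notation of Eqs. (18) and (19).

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error: the sum of |y - ŷ| over the testing points,
    divided by the number of testing points."""
    return np.mean(np.abs(y - y_hat))

def mape(y, y_hat):
    """Mean absolute percentage error, in percent; approaches 0 as the
    predictions improve."""
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def max_ape(y, y_hat):
    """Maximum absolute percentage error over all testing points."""
    return 100.0 * np.max(np.abs((y - y_hat) / y))
```

Note that MAPE and MaxAPE are undefined when any actual response value is zero, so entrainment-ratio samples must be non-zero when these measures are used.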

RMSE: Root Mean Square Error
The RMSE quantifies the residuals (prediction errors) between predicted and observed values. This error estimator, given in Eq. (20), aggregates the individual error magnitudes into a single measure of predictive power.

Lower values of RMSE indicate lower variance; variance tends to decrease as RMSE → 0.
This is known as the coefficient of determination, or the coefficient of multiple determination in this case, since the models involve multiple regression. It measures the correlation between outputs and targets via R² = 1 − Σᵢ eᵢ² / Σᵢ (yᵢ − ȳ)², where eᵢ denotes the residual for each yᵢ, while ȳ is the mean of the observed data. 5. Adjusted R Squared (R²adj): the adjusted R², developed by Henri Theil (1961), includes a modification that accounts for the number of explanatory terms in a model relative to the number of data points, via R²adj = 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the total number of variables, and n denotes the sample size.
For both regression R² values, a close relationship is indicated by results close to 1. The R²adj result is always less than or equal to that of R².
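The remaining quality criteria (RMSE, R² and R²adj) can be sketched similarly. This is illustrative Python; the function names are assumptions, and the formulas are the standard definitions described above.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error: aggregates the residual magnitudes into a
    single measure of prediction power."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def r_squared(y, y_hat):
    """Coefficient of determination: 1 minus the ratio of residual sum of
    squares to total sum of squares about the mean."""
    e = y - y_hat
    return 1.0 - np.sum(e ** 2) / np.sum((y - np.mean(y)) ** 2)

def adjusted_r_squared(y, y_hat, p):
    """Theil's adjusted R², penalising the p explanatory terms against the
    sample size n; always less than or equal to R²."""
    n = len(y)
    return 1.0 - (1.0 - r_squared(y, y_hat)) * (n - 1) / (n - p - 1)
```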
The results of all quality criteria for all considered JP models (M-01 to M-10) are presented in Table 6 of "Appendix 2". Figure 12 shows a collective set of three plots: Fig. 12a includes the R² errors, Fig. 12b the RMSE errors, and Fig. 12c the MaxAPE errors. The comparison of errors exhibits consistent behaviour across all three error methodologies.
In general, it can be concluded that the stacked-ensemble model performed best, followed by the RBFANN, then the Kriging and lastly the RSM. It is reassuring to note that, besides the need to find the optimal values of the parameters for the Gaussian correlation function used by the Kriging model, in most cases it is only slightly less accurate than the 3rd order RSM. The low RMSE values of the RBFANN model show that it performed slightly better than the ensemble model in predicting the response of several JP models. As shown in the three sub-figures, all models predicted values within reasonable prediction errors.

Results Comparison of Actual Against Predicted via Scatter Plots for RSM, Kriging, RBFANN and Ensemble Models
This section illustrates a comparative analysis of the behaviour between actual and predicted entrainment-ratio results. The scatter plots shown in Figs. 13, 14, 15, 16, and 17 provide a comprehensive way to illustrate and analyse the behavioural fit of all four learning models.
Each figure comprises two sub-figures, comparing results for models without and with swirl-induced flow. Generally, in most cases the points are scattered symmetrically around the 45° diagonal line and fall within the ±10 % error band. However, the scattering tends to increase in cases involving a swirl-induced flow. As expected, this can be attributed to the complex hydrodynamic behaviour of dual-phase flow inside the JP, and it further emphasises the nonlinear relationship between the design parameters and the JP performance under dual-phase flow conditions. Once again, all cases showed that the RSM registered the highest scattering, while the RBFANN and ensemble models resulted in minimal scattering and were highly similar to one another. Another point of interest is the plotting behaviour of the ensemble model: its overall results drastically reduced the scattering and smoothed the predictions in areas where the other models failed to perform within the ±10 % error band.
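The visual ±10 % band check on the scatter plots can be quantified directly. This is an illustrative Python sketch; the function name is an assumption.

```python
import numpy as np

def fraction_within_band(y_actual, y_pred, band=0.10):
    """Fraction of predictions whose relative error is at most `band`,
    i.e. the points that would fall inside the ±10 % error margin around
    the 45° diagonal of an actual-vs-predicted scatter plot."""
    rel_err = np.abs(y_pred - y_actual) / np.abs(y_actual)
    return np.mean(rel_err <= band)
```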

Optimisation Heuristics
Several types of optimisation algorithms exist. Some can handle both constrained and unconstrained optimisation, while others are suited to only one of the two.
In this work, constrained optimisation was applied. This comprises optimising an objective function with respect to some variables in the presence of constraints on those variables. The objective function is to be maximised, so the negative of the process function f(x) is taken in the constrained minimisation problems. Hard-type constraints set conditions that the variables are required to satisfy [54].
Three optimisation algorithms well suited to engineering-related problems are: (1) the multiple response desirability approach, (2) the interior-point algorithm (IPA), and (3) the augmented Lagrangian genetic algorithm (ALGA). A combination of the latter two can form a hybrid function, which in most cases proves more robust and accurate than a single optimisation algorithm. A brief overview of the interior-point and augmented Lagrangian genetic algorithms (both applied in this study) is given hereunder.

Interior-Point Algorithm
The interior-point algorithm comprises a variety of solvers capable of solving both linear and nonlinear convex optimisation problems with inequality constraints.
The interior-point solvers found in the MATLAB Optimisation Toolbox™ include 'fmincon', 'quadprog', 'lsqlin', and 'linprog'. All have good characteristics, such as low memory usage and the ability to solve large problems quickly. However, despite their simplicity, they are considered slightly less accurate than other algorithms. Such inaccuracy may result from the calculation process, in which the internally calculated barrier function tends to keep the iterates away from the inequality-constraint boundaries. The constrained minimisation involves finding a vector x that is a local minimum of a scalar function f(x), subject to the set constraints on the allowable x,
such that one or more of the equality and inequality constraints in Eq. (23) hold. Moreover, 'fmincon' from the MATLAB Optimisation Toolbox™ solvers (the algorithm used in this work) is based on a trust-region method for nonlinear minimisation. The trust-region method approximates the function f with a simpler function q, so as to better understand the behaviour of f in a neighbourhood around the point x; this neighbourhood is referred to as the trust region [55, 56]. As provided in Eq. (24), this takes the form of a sub-problem, min_s {q(s), s ∈ N}, computed in parallel to the main minimisation problem.
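A minimal analogue of this constrained set-up can be sketched with SciPy's interior-point-style 'trust-constr' solver standing in for MATLAB's 'fmincon'. The quadratic `surrogate` below is a hypothetical stand-in for the study's real predictor functions, and the bounds and constraint are chosen purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize, NonlinearConstraint

# Hypothetical smooth surrogate response (a stand-in for the RSM/Kriging/
# RBFANN predictor functions called in the actual study).
def surrogate(x):
    return -(x[0] - 1.0) ** 2 - (x[1] - 2.0) ** 2 + 5.0

# Maximise the surrogate by minimising its negative, subject to variable
# bounds and a nonlinear inequality constraint x0 + x1 <= 3.
con = NonlinearConstraint(lambda x: x[0] + x[1], -np.inf, 3.0)
res = minimize(lambda x: -surrogate(x), x0=[0.0, 0.0], method="trust-constr",
               bounds=[(0.0, 3.0), (0.0, 3.0)], constraints=[con])
x_opt = res.x  # the surrogate peaks at (1, 2), which lies on the constraint boundary
```

Negating the objective mirrors the procedure described later in the paper, where the negative predictive response is minimised because the toolbox always seeks a minimum.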

ALGA: Augmented Langrangian Genetic Algorithm
The genetic algorithm (GA) is a method able to solve both constrained and unconstrained problems, and involves a natural selection process that imitates biological evolution. It can solve problem classes that defeat 'standard' optimisation algorithms, namely stochastic, nonlinear and discontinuous problems.
The solving procedure is iterative: the algorithm repeatedly modifies a population of individual solutions, picking individuals at random (via random number generators) from the current population to serve as 'parents' that produce the 'children' of the next generation. This procedure is repeated until the population evolves towards the optimal solution [57].
By default, the genetic algorithm uses the Augmented Lagrangian Genetic Algorithm (ALGA) to solve nonlinear problems without integer constraints.
The optimisation problem solved by the ALGA algorithm is given by Eq. (25), where c(x) and ceq(x) denote the nonlinear inequality and equality constraints respectively, while m and mt describe the number of nonlinear inequality constraints and the total number of nonlinear constraints respectively.
However, in this study, bound-constrained optimisation problems were solved. Bounds and linear constraints were handled separately from nonlinear constraints. Thereby the sub-problem formulation for the ALGA, as given in Eq. (26), included only the fitness function and excluded the nonlinear constraint function. In Eq. (26), the components λᵢ of the vector λ are non-negative Lagrange multiplier estimates, the elements sᵢ of the vector s are non-negative shifts, while ρ is a positive penalty parameter.
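The genetic-algorithm workflow described above can be sketched as a minimal real-coded GA for a bounded maximisation problem. This is an illustrative Python toy: the function name and operator choices (tournament selection, blend crossover, Gaussian mutation, elitism) are assumptions, and the augmented-Lagrangian machinery is omitted since the study's problems were bound-constrained.

```python
import numpy as np

def genetic_maximise(fitness, bounds, pop_size=40, generations=60, seed=1):
    """Minimal real-coded GA: tournament selection of parents, blend
    crossover, Gaussian mutation, and elitism (the best individual is
    always carried into the next generation)."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    pop = rng.uniform(lo, hi, size=(pop_size, len(bounds)))
    for _ in range(generations):
        fit = np.array([fitness(ind) for ind in pop])
        new_pop = [pop[np.argmax(fit)]]              # elitism
        while len(new_pop) < pop_size:
            # tournament selection: each parent is the fitter of two random picks
            i = rng.integers(0, pop_size, 2)
            j = rng.integers(0, pop_size, 2)
            p1 = pop[i[np.argmax(fit[i])]]
            p2 = pop[j[np.argmax(fit[j])]]
            alpha = rng.uniform(0.0, 1.0)            # blend crossover
            child = alpha * p1 + (1.0 - alpha) * p2
            child += rng.normal(0.0, 0.05 * (hi - lo))  # Gaussian mutation
            new_pop.append(np.clip(child, lo, hi))   # respect the bounds
        pop = np.array(new_pop)
    fit = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(fit)], float(np.max(fit))
```

Because the elite individual is never discarded and the fitness is deterministic, the best fitness found is non-decreasing across generations.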

Hybrid (Fmincon and Genetic Algorithm)
The hybrid optimisation method combines two or more single optimisation algorithms. To solve a hybrid optimisation problem, a hybrid function must first be formulated; a typical function specifies the order of execution of each algorithm. When the ALGA stops, the hybrid function starts from the final point returned by the genetic algorithm. In this work, the hybrid function 'fminunc' was set to be called automatically, initiating its execution from the optimised point found by the former method. Since 'fminunc' has its own options structure, an additional argument has to be provided when specifying the hybrid function. The hybrid function can improve the accuracy of the solution.
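The two-stage hybrid idea can be sketched as follows. This is illustrative Python under stated assumptions: a simple random search stands in for the ALGA stage, SciPy's 'L-BFGS-B' stands in for the local gradient solver, and `surrogate` is a hypothetical response.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical smooth response to be maximised (illustrative only).
def surrogate(x):
    return float(np.exp(-((x[0] - 1.2) ** 2 + (x[1] - 0.8) ** 2)))

bounds = [(0.0, 2.0), (0.0, 2.0)]

# Stage 1: coarse global search; a random search plays the role of the
# genetic-algorithm (ALGA) stage of the hybrid formulation.
rng = np.random.default_rng(0)
cands = rng.uniform([b[0] for b in bounds], [b[1] for b in bounds], size=(200, 2))
x_global = cands[np.argmax([surrogate(c) for c in cands])]

# Stage 2: local refinement started from the global stage's best point,
# mirroring how the hybrid function hands the GA result to a gradient solver.
res = minimize(lambda x: -surrogate(x), x_global, method="L-BFGS-B", bounds=bounds)
x_opt = res.x
```

The design rationale is the one given in the text: the global stage avoids poor local optima, while the local stage sharpens the final point beyond the accuracy the population-based search alone can reach.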

Optimisation Results using the RSM, Kriging and RBFANN Models
As a final comparison of the accuracy of the individual and ensemble learning models, 6 pairs of optimisation problems were formulated and solved for each JP model; the term 'pair' indicates that a total of 12 optimisation problems were formulated and solved per JP model. In all the optimisation problems, the entrainment ratio was to be maximised at specific GVFs. This led to a total of 60 optimisation problems.
As provided in Tables 7, 8, 9, 10, and 11 in "Appendix 4", the set objective functions denote traditional, single-objective/discipline optimisation problems. Note that each of the six boxes, namely 'Prob. #1 M-X & M-X', includes a common objective function for (a) swirl and (b) no-swirl flow respectively. In each optimisation problem, constraints are placed on the maximum and minimum allowable values of responses not forming part of the objective function. Each optimisation problem is formulated and solved using a plurality of optimisation algorithms, all adopted to solve the developed nonlinear optimisation models within the MATLAB platform. The three optimisation algorithms used were: (1) the 'interior-point' algorithm, (2) the augmented Lagrangian genetic algorithm (ALGA) and (3) a hybrid formulation combining the former two.
Each optimisation problem is solved four times: first for the RSM, second for Kriging, third for the RBFANN and finally for the ensemble model. In each case, three different starting points (the lower, middle and upper bounds) are used for each objective function, to assess the number of analysis and gradient calls necessary to obtain the optimum design. To drive the three types of optimisation algorithms, four separate predictive functions (one for each learning model) were created. For the RSM, a dedicated script was generated to formulate a predictive function, which was later called from the respective optimisation algorithms. For Kriging, the predictor function (based on the developed model, referred to as 'dmodel') was generated by the ooDACE toolbox, while for the RBFANN, a dedicated function called 'MyNeuralNetworkFunction' was set to be generated automatically during the execution of the neural network model when using the MATLAB Neural Network Toolbox application. For the ensemble model, the same procedure as for Kriging was followed, but this time a new model, namely 'dmodel2', was generated, containing samples derived from the combination of the model-averaging and stacking procedures.
When solving all optimisation problems, the negative of the predictive response function was taken in order to maximise the objective function, since the MATLAB toolbox software always seeks the minimum of the fitness function.
The results of all 60 optimisation problems using all discussed learning models are summarised in Tables 12, 13, 14, 15, and 16 of "Appendix 4". Note that each table includes results for two JP bodies, i.e. a single nozzle body with and without swirl.
From the numerical results (Tables 12, 13, 14, 15, and 16), it can be noticed that, in general, the optimisation requires fewer iterations and/or generations for the RSM than for the Kriging and RBFANN models. Note that iterations for the level-2 ensemble model are not included in the tables, but results were identical to the level-1 Kriging model. The variance in computational time and iterations is attributed to the complexity of the respective model: fewer iterations were needed for the simple 3rd order polynomial equations of the RSM, while more iterations and generations were required for the Kriging models [comprising nonlinear equations as given in Eqs. (4) to (12)] and the RBFANN models. Nevertheless, the computational expense for all sets of approximations still lies in the order of seconds per evaluation. The optimum designs obtained from the RSM, Kriging, RBFANN and ensemble models are largely identical for each objective function. However, there are some drastic variations in both X and At when maximising the entrainment ratio Er using the interior-point algorithm based on predictions from the RSM.
Furthermore, to check the accuracy of the predicted optima and the prediction errors, the optimum design values X, GVF and At for the level-1 Kriging model and the level-2 ensemble-Kriging model were used again as inputs to the predictor function of their respective model. The predicted responses of each model were then used to calculate the percentage difference between the actual and the predicted values, where the actual values were taken from the corresponding experimental results. Ultimately, to illustrate the benefits of the ensemble model, error residual plots were generated for each of the 10 JP models. Each pair of plots, as exemplified in Figs. 18, 19, 20, 21, and 22, includes the same type of nozzle body but a different setup configuration: all figures labelled 'a' describe the results for JP bodies with no swirl-body mechanism, whereas those labelled 'b' involve the use of the swirl-body mechanism.
A comparison between the level-1 Kriging models and the level-2 ensemble-Kriging models in each error residual plot demonstrated that the ensemble-Kriging models predicted optima with an error of less than 10 % in all cases, and less than 4 % in 90 % of the results.

Conclusions
This study has demonstrated the use of four learning models for constructing global approximations to facilitate single-discipline nonlinear design optimisation.
The accuracy of each set of approximations was compared via numerical error analysis, graphical analysis and the ability to generate accurate solutions for 60 different optimisation problems. It was found that the response surfaces registered the highest model-fit error among all models. Neither 1st nor 2nd order polynomial models proved capable of modelling the nonlinear performance behaviour of the dual-phase-fluid driving JP. The 3rd order response surface models could approximate the nonlinear design space within a reasonable margin of error; however, instabilities arose when considering higher-order polynomials, possibly owing to a lack of sample points for estimating all coefficients of the polynomial equation.
Kriging models in conjunction with a Gaussian correlation function (comprising either a 0th, 1st or 2nd order regression function) yielded global approximations slightly more accurate than the RSM.
The RBFANN models (with a varied optimal number of hidden neurons) showed a drastic decrease in prediction error, although this improvement was not registered throughout all prediction ranges for all JP model cases.
Ultimately, the ensemble models (combining model-averaging and stacking methodologies) gave the best overall performance throughout all prediction ranges. These results illustrate the benefits of the stacked generalisation approach, which combines model information, including information from level-1 models with poor approximation capabilities. However, the performance increase attributed to the ensemble approach is not computationally cheap: the added complexity and additional evaluations slowed down the modelling process. Furthermore, the three optimisation algorithms were compared over a total of 60 optimisation problems. The three algorithms, namely (a) the 'interior-point' algorithm, (b) the augmented Lagrangian genetic algorithm (ALGA) and (c) a hybrid formulation, produced similar results for the same learning model, but results varied across the four types of applied global approximation methods.
As expected, higher accuracy and consistency were obtained for the optimisation algorithms based on the predicted data from both RBFANN and ensemble models.
The ensemble model (having Kriging as the level-2 learning model) estimated optimised design parameters that closely matched the actual data, proving that such a model-based optimiser is a powerful tool. Thus, where the dual-phase driving-fluid JP is considered, and where the data or the structure of the fitness function are highly nonlinear, the stacked generalisation approach may be an adequate first approach.
This comprehensive study should serve as a model-based optimisation tool to assist in the design of dual-phase surface JPs, in particular for cases involving a dual-phase fluid composition as the driving fluid. Also, owing to the nature of the input model variables, mainly the nozzle-to-throat clearance X (considered an adjustable parameter), such a model-based optimisation tool has the potential to be implemented for on-line control purposes.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Error Analysis of RSM, Kriging, RBFANN and Ensemble Models
See Table 6.

Here rᵀ denotes the correlation vector between an untried point x and the sample points x₁, …, x_ns. Three parameters are included in the Kriging model: (a) θ, which is included in the correlation function (θℓ, ℓ = 1, 2, …, n), (b) the process variance σ², and (c) the regression coefficients.

Fig. 1
Fig. 1 Architecture of a decomposed RBFANN with hidden neurons

Fig. 2
Fig. 2 Concept diagram of ensemble modelling

Model | Injection body description | Flow
M-01 | Standard converging-nozzle (single-orifice) | No swirl-induced flow
M-02 | Converging-diverging nozzle (single-orifice) | No swirl-induced flow
M-03 | Converging-diverging nozzle (multiple-orifice), horizontally drilled | No swirl-induced flow
M-04 | Converging-diverging nozzle (multiple-orifice), drilled at an inclination angle of 5° | No swirl-induced flow
M-05* | A bi-nozzle design configuration (multiple-orifices) | No swirl-induced flow
M-06 | Standard converging-nozzle (single-orifice) | Swirl-induced flow
M-07 | Converging-diverging nozzle (single-orifice) | Swirl-induced flow
M-08 | Converging-diverging nozzle (multiple-orifice), horizontally drilled | Swirl-induced flow
M-09 | Converging-diverging nozzle (multiple-orifice), drilled at an inclination angle of 5° | Swirl-induced flow
M-10* | A bi-nozzle design configuration (multiple-orifices) | Swirl-induced flow

The injection bodies for models M-01 to M-04 are shown in Fig. 3, while the designs of models M-05 and M-10 cannot be shown, to protect their proprietary nature. Figure 3 highlights the differences attributed to the number of orifice holes and their positioning.

Fig. 3
Fig. 3 Schematics of the swirlbody mechanism and injection body for: a M-01 and M-06, b M-02 and M-07, c M-03 and M-08 and d M-04 and M-09

Fig. 4
Fig. 4 Schematic of a typical JP device configuration denoting the selected design variables. Three throat-inlet angles At were considered (13°, 30° and 50°), and the motive-fluid GVFs included the values 0, 10, 20, 30, 40 and 50. Furthermore, each of the 10 data sub-sets was further divided into two mutually exclusive subsets, the training and validation sets. The training set included three quarters of the whole sub-set samples.

Fig. 5 Fig. 6
Fig. 5 Problem formulation for a multi-variable input and multiple non-linear response global approximations for the dualphase driving fluid JP device

Fig. 7
Fig. 7 Schematic illustration of the build-up of this study's orthogonal array

Fig. 8
Fig. 8 3D-surface plots for models M-01 and M-06, having: a At = 13°, b At = 30° and c At = 50°. The dataset was divided into three main categories: training, validation and testing. Each category was assigned a portion of the whole dataset: 118 samples for training, 7 samples for validation, and 19 samples for testing. The training samples are presented to the network during training, while the network adjusts according to its error. The validation samples are used to measure network generalisation and to stop training when generalisation stops improving, while testing offers an independent measure of network performance during and after training. The Bayesian regularisation backpropagation training function was used via the syntax 'trainbr'. This type of algorithm typically requires more time than the Levenberg-Marquardt and scaled conjugate-gradient algorithms, but has the potential to generalise well for difficult, small or noisy datasets. Using the Bayesian regularisation algorithm, training stops according to adaptive weight minimisation (regularisation).

Fig. 9
Fig. 9 Neural network diagram for this problem

Fig. 12 A
Fig. 12 A comparison of the error analysis: a R 2 , b RMSE [%] and c MaxAPE [%], for the RSM, Kriging, RBFANN and Ensemble models and for each JP model

Table 1
Details about the considered test-setup models. *Model not illustrated in Fig. 3

Table 3
Theta parameters for Kriging models for all case studies. *The optimal parameter values for each case study

Table 4
Varied model settings to estimate the accuracy of the surrogate models

Table 10
Optimisation problem for the JP design models: M-04 and M-09

Table 12
Columns: X sole, X stack, GVF, At sole, At stack for each algorithm (Interior-pt., ALGA, Hybrid); problems include 'maximise [Er] at 0 % GVF' for M-04 & M-09 and for M-05 & M-10.