Energy Savings in EAF Steelmaking by Process Simulation and Data-Science Modeling on the Reproduced Results

The Electric-Arc-Furnace (EAF)-based process route in modern steelmaking for the production of plates and special-quality bars requires a series of stations for secondary metallurgy treatment (ladle furnace and, potentially, vacuum degasser) before final casting into slabs and blooms in the corresponding continuous casting machines. Since every steel grade has its own melting characteristics, the melting (liquidus) temperature generally differs per grade and plays an important role in the final casting temperature, which has to exceed the liquidus temperature by an amount called superheat. The superheat is adjusted at the ladle-furnace (LF) station by the operator, who decides mostly on personal experience; however, because the ladle has to pass through downstream processes, the liquid steel loses temperature not only due to the duration of the processes until casting but also due to the ladle refractory history. Simulation software was developed in order to reproduce the phenomena that occur in a meltshop and influence downstream superheats. Data-science models were deployed in order to check the potential of controlling casting temperatures by adjusting liquid-steel exit temperatures at the LF.


Introduction
The effect of superheat (SPH) on the generation of surface and sub-surface defects in continuously cast products has been known for many years. Ayata et al. [1] pointed out the advantage of low-SPH teeming for product quality in 1995, and in the same year Thomas [2] discussed the need to include SPH in thermal-mechanical models for continuous casting. Guyot et al. [3] discussed the effect of SPH on surface quality issues in peritectic slabs. Jansto [4] pointed out the effect of SPH on quality issues for Nb-based microalloyed steels. Jacobi and Schwerdtfeger [5], discussing ripple marks on cast steel surfaces, noted the importance of keeping SPH values low at casting; furthermore, if the superheat is too high for a grade, it may give rise to defects in the product. As a 10°C temperature increase per ton of liquid steel theoretically requires 2.2 kWh of electrical energy [6], one may realize how much energy is lost annually at casting if the superheat is much larger than required. A system that notifies the LF operator to adjust the liquid-steel SPH so that it matches the casting temperature required later at the continuous caster is therefore of paramount importance and has been under research for a long time. Offline models based on heat transfer and thermodynamics have been developed in the past, but the focus is now mostly on online statistical models, which are faster to generate and can be tuned. Nevertheless, due to the nature of liquid steel processing, a great deal of work on the subject remains to be carried out to reach this milestone.
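The energy figure quoted above can be checked with a simple sensible-heat balance. The sketch below assumes a specific heat of liquid steel of about 0.8 kJ/(kg·K), a typical order-of-magnitude value that is not given in the text, and it ignores all thermal and electrical losses.

```latex
% Sensible heat needed to raise 1 t of liquid steel by 10 K.
% c_p = 0.8 kJ/(kg K) is an assumed value, not taken from the source.
\begin{align*}
E &= m \, c_p \, \Delta T
   = 1000~\mathrm{kg} \times 0.8~\frac{\mathrm{kJ}}{\mathrm{kg\,K}} \times 10~\mathrm{K}
   = 8000~\mathrm{kJ} \\
  &= \frac{8000~\mathrm{kJ}}{3600~\mathrm{kJ/kWh}} \approx 2.2~\mathrm{kWh}
\end{align*}
```

Under this assumption, the result agrees with the theoretical value of 2.2 kWh per ton and per 10°C cited from [6].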
Gupta and Chandra [7] developed a coupled heat-transfer and simple regression model in order to control SPH at the caster floor. Great attention was given to the holding time of liquid steel in the ladle, as well as to the ladle turnaround time, that is, the time from teeming until the next tapping for a ladle. A fourth-degree polynomial was derived as regression formula (1), simulating the initial temperature at the tundish (T_tun1) as a function of the holding time (t), ladle life (LL), ladle turnaround time (TAT), exit temperature at the LF (T_LF), and the temperature of the previous liquid steel in the tundish (T_past). Based on plant data, the regression coefficient R² was found to be 0.73. Addes et al. [8] tried to control the casting superheat temperature by specific factors depending upon the heat sequence in the tundish, steel residence in the ladle, grade, ladle condition, tundish preheat time, and casting speed. On the other hand, Fredman et al. [9] applied the solution of the heat transfer equation in 2D in order to simulate the thermal state of the ladles. Tian et al. [10,11] developed a hybrid model based on the energy transfer at the LF and on the ensemble ELM algorithm with the modified AdaBoost.RT method; the model was trained and validated with plant data collected from a 300 t LF. Chen et al. [12,13] developed a model that recommends the liquid-steel exit temperature at the LF in order to achieve the proper casting SPH; the model follows the input and output liquid-steel energy in a ladle and has been applied in a steelmaking plant. Sonoda et al. [14] also developed a statistical model for predicting the liquid steel temperature at the casting floor.
In recent years, ladle-tracking systems [15,16] have been developed that follow the route of each ladle, so that the refractory history can be recorded; consequently, a more reliable statistical model can be developed to predict the casting-floor superheat temperatures over time. A Monte-Carlo-type simulation program was developed for this study in order to reproduce the phenomena involved in a meltshop with respect to process times, ladle-refractory history, the presence or absence of vacuum degasser (VD) treatment, and 30 different grades for blooms and slabs produced in the Stomana plant, Pernik, Bulgaria. The purpose of this study was to illustrate the potential benefits of installing a ladle tracking system that feeds online data to a supervising data-science model, which will ultimately notify the LF operator of the proper superheat adjustment. On this basis, two data-science models (a distributed random forest, DRF, and a gradient boosting machine, GBM) were derived to analyze the reproduced data. DRF and GBM models were also derived from existing plant data and, even though these data did not come from a ladle tracking system, the analysis of variance showed high statistical significance. Furthermore, a GBM model was derived for the prediction of the first liquid-steel SPH at the tundish, following the problem formulation of Gupta and Chandra [7].

Simulation tests
The approach to the problem consisted of two procedures. First, a Monte-Carlo type of simulation [17] was developed in order to quantify the effect of various parameters upon the required superheat (SPH) correction at the ladle-furnace (LF) station, as well as upon the final SPH attained at the continuous casters. Second, the generated results were fed into machine-learning systems in order to identify the degree of correlation of the predicted superheat values at the casting machines with the reproduced corrected SPH values at the LF. Table 1 presents the selected times for the processes involved in the computations. Although two different casters were involved in the computations, the same transfer-time values from LF or VD were used.

The simulation software was developed exclusively in R [18], as it offers built-in functions that are well suited to simulation. For example, the following two commands generate 10,000 EAF process-time values drawn from a normal distribution with an average value of μ = 60.0 and a standard deviation of σ = 10.0:

    HeatNr <- 1:10000
    EAF_Proc <- rnorm(HeatNr, 60.0, 10.0)    (3)

The greatest advantage of R is the very fast execution of instructions that are written in a form compatible with vectorization. Commands similar to (3) were written for the generation of process-time values for the rest of the processes illustrated in Table 1.
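The same idea extends naturally to several stations at once. The following is a minimal sketch, not the author's program: only the EAF parameters (60.0, 10.0) come from the text, while the LF and VD values, the object names `stations` and `proc_times`, and the clipping of negative durations are assumptions made for illustration, since the Table 1 values are not reproduced here.

```r
# Sketch: vectorized generation of process times for several stations.
# Only the EAF values (60.0, 10.0) come from the text; the LF and VD
# values below are hypothetical stand-ins for the Table 1 entries.
set.seed(42)                      # reproducible draws
n_heats <- 10000
stations <- data.frame(
  name = c("EAF", "LF", "VD"),
  mu   = c(60.0, 45.0, 30.0),     # LF and VD means are assumed
  sd   = c(10.0,  8.0,  5.0)      # LF and VD deviations are assumed
)
# One vectorized rnorm() call per station; no explicit loop over heats.
proc_times <- as.data.frame(
  mapply(function(mu, sd) rnorm(n_heats, mu, sd), stations$mu, stations$sd)
)
names(proc_times) <- paste0(stations$name, "_Proc")
# Negative durations are unphysical, so clip at zero (an assumption).
proc_times[] <- lapply(proc_times, pmax, 0)
summary(proc_times$EAF_Proc)
```

Each column holds one simulated process time per heat, so per-heat totals can be obtained with `rowSums(proc_times)`.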
Twenty percent of the heats produced by the EAF pass through VD treatment; furthermore, 97.5% of the VD-treated steels were selected to be cast as billets (or blooms) and the rest as slabs. The thermal history of the ladle refractory insulation is of paramount importance for the amount of heat that the contained liquid steel will absorb during reheating at the LF. Every time a ladle is placed in position for tapping from the furnace, it may come either from a previous heat (almost immediately after casting) or from a refractory maintenance process that has lasted long enough for the lining to absorb some heat and thus resist the liquid-steel temperature increase at the LF. The refractory insulation also has a limited life cycle, so a new ladle may enter the production cycle at some point. Table 2 presents some plant data related to ladle refractory maintenance that were taken into consideration in the development of the simulation program, together with the need for extra liquid-steel temperature (SPH). Figure 1 presents the process times that were taken into consideration in the simulation part; the total process time is the sum of (1) the actual process time, that is, the total time spent at the EAF, LF, VD (if the grade is VD-treated), and CCM (which is either the bloom caster, BCCM, or the slab caster, SCCM, depending upon whether the cast product is billet/bloom or slab, respectively) and (2) the transfer time, that is, the time required for the liquid steel to move between the process stations.
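The routing percentages given above can be sketched in R as follows. This is an illustrative sketch under stated assumptions, not the original program: the variable names are hypothetical, and the product split for heats that are not VD-treated is not given in the text, so those heats are left unassigned.

```r
# Sketch: random routing of simulated heats through VD and to a caster.
set.seed(1)
n_heats <- 100000
# 20% of the heats receive VD treatment (from the text).
is_vd <- runif(n_heats) < 0.20
# 97.5% of VD-treated heats are cast as blooms, the rest as slabs (from the text).
# The product split for non-VD heats is not stated, so it is left as NA here.
product <- rep(NA_character_, n_heats)
product[is_vd] <- ifelse(runif(sum(is_vd)) < 0.975, "bloom", "slab")
# Blooms go to the bloom caster, slabs to the slab caster; NA stays NA.
caster <- ifelse(product == "bloom", "BCCM", "SCCM")
mean(is_vd)                  # share of VD-treated heats
prop.table(table(product))   # bloom/slab shares among VD-treated heats
```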
Liquid steel is transferred from the EAF to the LF station; then it may be transferred directly to the caster, or to the VD station if this type of treatment is required, and finally to the CCM (BCCM or SCCM). Figure 2 depicts the time spent in this type of transfer, as generated by the simulation software. Since VD-treated production is limited to 20% of the products, the average transfer values from LF to VD and from VD to CCM are small; on the other hand, since the LF may send the ladle to the CCM either directly or via the VD, two distinct regions of points appear.

Furthermore, Figure 3 illustrates the partial process-time distributions of the five metallurgical stations: EAF, LF, VD, SCCM, and BCCM. One may notice that, because the greatest percentage (80%) of the products is not VD-treated, the related process-time distributions are broadly extended. These data sets are also generated during the simulation runs. Based on the ladle refractory maintenance data presented in Table 2, the simulation program generated the refractory history of the ladle just before EAF tapping in a probabilistic fashion, as illustrated in Figure 4. Depending upon the ladle refractory condition, an SPH correction as presented in Table 2 was applied at the LF. Here again, the advantage of R for very fast SPH-correction computation should be noted: as described by (4), instructions like replicate can execute a set of commands (wrapped in a function like get_Ladle_SPH_Correction) for a large number of repetitions within a very short period of time.

At the Stomana meltshop, a great number of grades are produced. In this study, a total of 24 grades for blooms and 6 grades for slabs were selected. Figure 5 depicts the average liquidus temperatures based on results gathered over the last 17 months. As seen on the graph, the grades are designated in the range of 1-24 for blooms and 51-56 for slabs (Figure 6).
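The use of replicate with a function such as get_Ladle_SPH_Correction might look like the sketch below. The function body is an assumption: the ladle states, their probabilities, and the SPH corrections are hypothetical placeholders for the Table 2 data, which are not reproduced here.

```r
# Sketch: probabilistic ladle refractory state and matching SPH correction.
set.seed(7)
# Hypothetical ladle states, probabilities, and SPH corrections (deg C);
# the real values belong to Table 2 and are not reproduced here.
ladle_states <- c("hot_cycle", "after_maintenance", "new_lining")
state_prob   <- c(0.85, 0.12, 0.03)
sph_corr     <- c(hot_cycle = 0, after_maintenance = 10, new_lining = 20)

# Draw one ladle state and return the matching SPH correction.
get_Ladle_SPH_Correction <- function() {
  state <- sample(ladle_states, size = 1, prob = state_prob)
  unname(sph_corr[state])
}

# replicate() repeats the call once per simulated heat.
corrections <- replicate(10000, get_Ladle_SPH_Correction())
table(corrections)
```

For a lookup as simple as this one, a single call such as `sph_corr[sample(ladle_states, 10000, replace = TRUE, prob = state_prob)]` would give an equivalent result without repeated function calls; replicate is most useful when the per-heat logic is more involved.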

Deploying the DRF and GBM models
The present chapter was based upon data provided by the Stomana meltshop, which is part of a steelmaking plant located in Pernik, Bulgaria, and belongs to the SIDENOR/VIOHALCO group of companies; furthermore, another set of data was reproduced by a Monte-Carlo simulation, as explained in the previous section. The main task was to generate at least one supervised model that identifies the critical parameters affecting the casting-floor SPH when the liquid-steel SPH is adjusted at the LF. The H2O Flow package [19] was deployed for this type of work. This package is freely available on the web and is extensively used by many companies and scientific institutions worldwide. Two machine-learning algorithms (models) from this package were used: the distributed random forest (DRF) [20] and the gradient boosting machine (GBM) [21]. A GBM is an ensemble of either regression or classification tree models. Both variants are forward-learning ensemble methods that obtain predictive results using gradually improved estimations. Boosting is a flexible nonlinear regression procedure that helps improve the accuracy of trees. Weak classification algorithms are sequentially applied to the incrementally changed data to create a series of decision trees, producing an ensemble of weak prediction models. While boosting trees increases their accuracy, it also decreases speed and user interpretability. The gradient boosting method generalizes tree boosting to minimize these drawbacks. Finally, the distributed random forest (DRF) is a variation of a general technique called ensemble learning. An ensemble model is composed of a combination of several smaller, simple models (often small decision trees). The random forest approach tries to de-correlate the trees by randomizing the set of variables that each tree is allowed to use. The final ensemble of trees is then bagged to make the random forest predictions [22].
In total, up to 100,000 cases (rows) of data were collected by the simulation software; each case comprised a heat produced at the EAF, processed at the LF, and then transferred to the CCM either directly or after an extra treatment at the VD. The software was run on a DELL Alienware laptop with an Intel i7-6700HQ CPU (8 logical cores) @2.6 GHz and 16 GB RAM, running a 64-bit Windows 10 Professional OS. At first, a cluster was generated on a 64-bit Java Virtual Machine, called by a program developed for this purpose in R, in which the memory size and the number of CPU cores were initialized and the H2O Flow connection was established. Then the set of data (data frame) was imported into the cluster. Each time, the data frame was split randomly into two frames: the training data frame consisted of 75% of the data and the validation data frame of the remaining 25%. The models (algorithms) were trained on the 75% and tested (validated) on the remaining 25%, generating supervised models that are valid within a measurable statistical error. Two types of programs were executed per algorithm: in the first, a grid search was performed in order to deduce the tuning parameters that potentially minimized the validation error; in the second, the execution of the tuned model resulted in the derivation of the final supervised model. The grid search is time-consuming, as it requires a trial-and-error procedure. One final remark concerning the deployment of the H2O Flow package: it may be initiated and run from a stand-alone program in R, or run through its web-based interface in a browser (e.g., Mozilla Firefox); the latter was extensively used in this study.
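The workflow described above (cluster start, import, 75/25 split, training of both algorithms) could be driven from R roughly as sketched below. This is a sketch under assumptions rather than the program used in the study: the function name `fit_sph_models`, its arguments, the memory size, and the tuning values are all hypothetical, and the grid-search step is omitted. The function is only defined here; calling it requires the h2o package and a Java runtime.

```r
# Sketch: train a GBM and a DRF on a 75/25 split with the h2o R package.
# Tuning values below are illustrative, not the tuned values of the study.
fit_sph_models <- function(df, target, predictors) {
  # Start (or attach to) a local H2O cluster on the JVM.
  h2o::h2o.init(nthreads = -1, max_mem_size = "8G")
  hf <- h2o::as.h2o(df)                      # import the data frame
  # Random split; h2o.splitFrame gives approximately 75% / 25%.
  parts <- h2o::h2o.splitFrame(hf, ratios = 0.75, seed = 1)
  train <- parts[[1]]
  valid <- parts[[2]]
  gbm <- h2o::h2o.gbm(x = predictors, y = target,
                      training_frame = train, validation_frame = valid,
                      ntrees = 100, max_depth = 5, learn_rate = 0.1, seed = 1)
  drf <- h2o::h2o.randomForest(x = predictors, y = target,
                               training_frame = train, validation_frame = valid,
                               ntrees = 100, seed = 1)
  list(gbm = gbm, drf = drf,
       gbm_rmse = h2o::h2o.rmse(gbm, valid = TRUE),   # validation error
       drf_rmse = h2o::h2o.rmse(drf, valid = TRUE),
       gbm_varimp = h2o::h2o.varimp(gbm))             # relative importance
}
```

A call might look like `fit_sph_models(sim_data, "SPH_Overall_ESt", c("SPH_Corr3", "Holding_Time"))`, where `sim_data` is a hypothetical data frame of simulated heats.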

Results and discussion
Preliminary investigations showed that, of the initial set of parameters reproduced by the simulation runs, only a few were critical enough to be included in this type of study. Although in data-science modeling all parameters are normally included in the computations and the algorithms are allowed to select the most critical ones, in this analysis it was decided to reduce the number of parameters to the most important ones in order to appraise the phenomena involved more clearly. Table 3 presents the parameters that were finally selected in this part.
The parameter SPH_Overall_ESt was computed based on some assumptions for the temperature loss at the casting floor. Table 4 presents the values used for the calculation of this term.
Table 3. Parameters selected for the analysis.
Name: Description
SPH_Corr3: The liquid steel SPH at the LF exit
Holding_Time: The time liquid steel is contained in a ladle
VD_Proc_tot_Time: Total processing time of VD process (if any for a heat)
SPH_HtInSeq_CORR: SPH correction if the heat is supposed to be first in a sequence of castings in a tundish
VAR_Grade_Sel: The 30 selected grades for analysis
VAR_Grade_SPH_CCM: The casting floor SPH for the 30 selected grades as experienced in the current actual meltshop practice
SPH_Overall_ESt: The simulated expected/estimated SPH at the casting floor

The values Cte1, Cte2, etc., used in every simulated test were drawn randomly from normal distributions with the corresponding (μ, σ) values shown in Table 4, and these values entered the formula used for the parameter SPH_Overall_ESt. The SPH_HtInSeq_CORR term is drawn randomly from a normal distribution with (μ, σ) equal to (15.0, 2.0) for the heats that are cast first in a tundish casting sequence. From practical experience, an extra 15°C is generally required for the first heat in a casting sequence, as the tundish comes from a preheating station at about 1100°C and absorbs some heat from the liquid steel. Normally, the ladle-to-tundish liquid-steel transfer operation absorbs some heat; the Tund_Temp_Drop term corresponds to that effect and is also drawn randomly from a normal distribution with (μ, σ) equal to (35.0, 5.0) for all heats. Figures 7 and 8 illustrate the DRF and GBM results with respect to predicting the SPH_Overall_ESt term.
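The two random terms with stated distributions can be drawn as in the sketch below. Two points are assumptions made only for illustration: the rule that every sixth heat starts a new tundish sequence (the actual sequence lengths are not given), and the value of zero assigned to SPH_HtInSeq_CORR for heats that are not first in a sequence. The formula that combines these terms into SPH_Overall_ESt is not reproduced here.

```r
# Sketch: drawing the first-heat correction and the tundish temperature drop.
set.seed(3)
n_heats <- 10000
# Hypothetical rule: every 6th heat starts a new tundish sequence.
# The actual sequence lengths are not given in the text.
first_in_seq <- (seq_len(n_heats) - 1) %% 6 == 0
# First-heat correction ~ N(15.0, 2.0); assumed zero for all other heats.
SPH_HtInSeq_CORR <- ifelse(first_in_seq, rnorm(n_heats, 15.0, 2.0), 0)
# Ladle-to-tundish temperature drop ~ N(35.0, 5.0) for all heats.
Tund_Temp_Drop <- rnorm(n_heats, 35.0, 5.0)
c(mean_first_heat_corr = mean(SPH_HtInSeq_CORR[first_in_seq]),
  mean_tundish_drop    = mean(Tund_Temp_Drop))
```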
For both cases, the ANOVA (analysis of variance) [23] gave good statistical figures; reporting results for the GBM model only, the residual standard error was 3.159 on 99,998 degrees of freedom, the multiple R-squared was 0.9484, and the F-statistic was 1.838·10^6 on 1 and 99,998 DF, with a p-value < 2.2·10^−16. Normally, the GBM algorithm suffices to come up with a reasonable supervised model; however, the DRF algorithm was added for comparison purposes.

Table 5. Relative importance of variables for the prediction of SPH_Overall_ESt (GBM model).

Table 5 shows the relative importance of the considered parameters for the prediction of SPH_Overall_ESt given by the GBM model; the recommended LF-exit SPH (SPH_Corr3) indeed plays a great role. Setting the SPH_Overall_ESt term aside, one interesting analysis is the prediction of the current-practice superheats (actual SPH, term VAR_Grade_SPH_CCM) at the casting floor for the selected grades; it should be pointed out that the selection of these grades is completely random, that is, the simulated heats do not follow the SPH data from the current meltshop practice at all. Nevertheless, the derived DRF and GBM supervised models exhibited a remarkable statistical significance: again reporting results for the GBM model only, the residual standard error was 4.988 on 99,998 degrees of freedom, the multiple R-squared was 0.5352, and the F-statistic was 1.152·10^5 on 1 and 99,998 DF. Table 6 illustrates the relative importance of the parameters that were considered for the prediction of the current-practice superheats (VAR_Grade_SPH_CCM) given by the GBM model. The great importance of the selected grade parameter (VAR_Grade_Sel) is expected, given the nature of this supervised model; however, SPH_Corr3 still appears to be very important. Apart from the analysis so far, one extra step was taken in order to test whether the derived results may be attributed to pure coincidence.
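The reported quantities (residual standard error, multiple R-squared, F-statistic on 1 and n − 2 degrees of freedom) have the form that R prints for a linear fit with one predictor. The sketch below assumes, without confirmation from the text, that predicted values were compared with target values in this way; the data are synthetic and the variable names are hypothetical.

```r
# Sketch: summary statistics of a one-predictor linear fit on synthetic data.
set.seed(11)
n <- 1000
# Synthetic stand-ins for the target and the model prediction (assumed data).
actual    <- rnorm(n, mean = 40, sd = 10)
predicted <- actual + rnorm(n, mean = 0, sd = 3)
fit <- summary(lm(actual ~ predicted))
rse <- fit$sigma                          # residual standard error
r2  <- fit$r.squared                      # multiple R-squared
f   <- unname(fit$fstatistic["value"])    # F-statistic
df  <- fit$df[2]                          # residual degrees of freedom (n - 2)
# With a single predictor, F follows directly from R-squared:
f_from_r2 <- r2 / (1 - r2) * df
c(rse = rse, r2 = r2, F = f, F_check = f_from_r2, df = df)
```

Applying the same identity to the GBM figures reported for SPH_Overall_ESt gives 0.9484 / 0.0516 × 99,998 ≈ 1.84·10^6, in line with the quoted F-statistic.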
In place of the SPH_Overall_ESt term, the term SPH_tun1 was used. This more closely resembles the initial tundish temperature (1) of the work of Gupta and Chandra [7], which exhibited a correlation coefficient R² = 0.73; indeed, after some manipulation, an equation (6) for SPH_tun1 was derived. One should recall that for the term SPH_LF the known term SPH_Corr3 can be used. T_liq is the liquidus temperature of the selected grades, and f is a function of the Holding_Time. The T_liq and SPH_past terms were drawn randomly from normal distributions with (μ, σ) equal to (1490.0, 10.0) and (40.0, 5.0), respectively. Figure 11 illustrates the derived GBM supervised model for the prediction of the SPH_tun1 term as computed in (6).
The ANOVA for the model results presented in Figure 11 exhibited the following statistical significance: the residual standard error was 2.446 on 56,189 degrees of freedom, the multiple R-squared was 0.9659, and the F-statistic was 1.589·10^6 on 1 and 56,189 DF, with a p-value < 2.2·10^−16. In Table 7, the recommended LF-exit superheat (SPH_Corr3) still appears to be of great importance.
Although 100,000 heats were simulated, the heats for which the SPH_tun1 values fell outside the (10.0, 70.0) range had to be excluded from the data-science analysis. The statistical significance appears to be more than satisfactory, considering that the parameters presented in Table 3 were used with the only substitution being the term SPH_tun1 in place of the term SPH_Overall_ESt. One final point has to be mentioned: normally, Monte-Carlo-type simulations converge to an average value (μ) and a standard deviation (σ) that tends to decrease as the number of repetitions (the number of heats in this case) increases. Figure 12 describes these findings by simulating meltshop production from 1000 to 250,000 heats. The computed SPH values for μ + 3σ exhibit a tendency to decrease as the number of heats increases. At the same time, the reduction of the expected SPH values as the number of heats increases seems to indicate a tendency for improvement once some logic is involved in the recommendation of LF-exit SPH temperatures.

Table 7. Relative importance of variables for the prediction of SPH_tun1 (GBM model).

Conclusions
A Monte-Carlo simulation program was developed in order to reproduce meltshop data concerning process times, ladle refractory history, and their effect on liquid-steel temperature loss at the casting floor. Data-science modeling was applied in order to derive supervised models for the prediction of casting-floor superheats based on critical parameters from reproduced and plant data. The results were also related to findings from a published work. In most cases, the derived supervised models exhibited a remarkable statistical significance, which is unlikely to be due to pure coincidence. It is very likely that a ladle tracking system will substantially improve the achievement of the desired casting-floor superheats and will therefore bring important economic savings.