MATLAB-Assisted Regression Modeling of Mean Daily Global Solar Radiation in Al-Ain, UAE

Introduction
Many researchers have modeled weather data using classical regression, time-series regression and Artificial Neural Network (ANN) techniques. MATLAB is used to handle data and to write task-specific code for the models, as well as to perform statistical analysis and curve fitting. This is due to the dynamic nature of MATLAB and its rich toolboxes, which cover almost every aspect of mathematical and statistical engineering applications. Numerous authors (Abdalla & Feregh, 1988; Assi & Jama, 2010; Akinoglu & Ecevit, 1990; Al Mahdi et al., 1992; Ampratwum & Drovlo, 1999; Elagib & Mansell, 2000; Khalil & Alnajjar, 1995; Menges et al., 2006; Newland, 1988; Podestá et al., 2004; Şahin, 2007; Samuel, 1991; Ulgen & Hepbasli, 2002), to name a few, developed empirical regression models to predict the monthly average daily global solar radiation (GSR) in their region using various parameters. The mean daily sunshine duration was the most commonly used and most widely available parameter. The most popular model was the linear Angström-Prescott model (Podestá et al., 2004; Assi & Jama, 2010), which establishes a linear relationship between GSR and sunshine duration given the extra-terrestrial solar radiation and the theoretical maximum daily solar hours.
Many studies with empirical regression models were done for diverse regions around the world. Menges et al. (2006) reviewed 50 empirical GSR models available in the literature for computing the monthly average daily GSR on a horizontal surface. They tested the models on data recorded in Konya, Turkey to compare model accuracy. The number of weather parameters varied between models. The regression models used include linear, logarithmic, quadratic, third-order polynomial, logarithmic-linear, exponential and power models relating the normalized GSR to the normalized sunshine hours. Other models included in the work of Menges et al. used direct regression involving various weather parameters such as precipitation and cloud cover, in addition to geographical data (altitude, latitude). Şahin (2007) presented a novel method for estimating the solar irradiation and sunshine duration by incorporating the atmospheric effects due to extraterrestrial solar irradiation and length of day. The author compares his model favourably with Angström's equation, since his method does not use the least squares method and has no procedural restrictions or assumptions. Ulgen & Hepbasli (2002) developed two empirical correlations to estimate the monthly average daily GSR on a horizontal surface for Izmir, Turkey. Their models resemble Angström-type equations. They [...] seasonal component accounting for the annual periodicity and a linear trend.
The residual error is studied using Box-Jenkins ARIMA modeling techniques in order to further enhance the predicted solar radiation data. The resulting residual error yields Auto-Correlation Function (ACF) and Partial Auto-Correlation Function (PACF) plots that stay within the 95% confidence interval bounds, and a quasi-normal noise error with zero mean and constant variance. The stationary form of the resulting time series and the white-noise type residual error provide extra confidence in the long-term prediction accuracy of the estimated model. The MATLAB codes written by the authors and their group employ various data modeling techniques such as linear regression, curve fitting, detrending, the FFT algorithm and statistical data analysis, as well as the Box-Jenkins method to predict a time-series model for the residual error. The current work on solar radiation data in Al-Ain City, UAE will be correlated with other ongoing studies by the authors to develop solar radiation prediction models for the UAE cities of Abu Dhabi and Sharjah. The final objective is a national weather model capable of predicting the mean monthly GSR for the whole UAE within an acceptable prediction error.

Methodology
The meteorological data (provided by the National Center of Meteorology & Seismology (NCMS) in Abu Dhabi) for the period between 1995 and 2007 was divided into two sets: a model data set with daily records of the variables air temperature, wind speed, sunshine hours and relative humidity for the years 1995-2004 (10 years), and a test data set for the years 2005-2007. All regression modeling and simulation work is done using MATLAB tools and with the help of the statistical software packages Minitab (Minitab, 2010) and SPSS (IBM SPSS Statistics, 2010). The next section explains the modeling procedure for both the classical empirical regression and the time-series regression techniques. Validation of model accuracy is presented along with the corresponding error statistics. Then, a comparison between both empirical approaches is made.

Procedure
In this section we discuss two methods used to generate the weather data models for Al-Ain, UAE for years 1995-2004. Selected models are validated with data from years 2005-2007. The two methods discussed in this chapter are:
1. Classical regression methods (empirical models)
2. Time-series regression model (regression with the Box-Jenkins ARIMA method)
The use of MATLAB in each approach is discussed along with the commands used and the computation results for the models under investigation.

Classical regression modelling approach (empirical models)
The empirical regression models are generated with the help of MATLAB and verified using SPSS. The appropriate MATLAB commands are addressed with each procedural step.

Extra-terrestrial radiation parameters
Mean daily values of GSR data are calculated from the knowledge of the latitude and longitude of the city of Al-Ain (latitude = 24° 16' and longitude = 55° 36'). The extraterrestrial solar radiation on a horizontal surface in kWh/m² ($G_0$) and the theoretical maximum daily sun hours ($S_0$) are calculated from eqs. (1)-(2) (Assi et al., 2010), where n is the day index, $\omega_s$ the mean sunrise hour angle for the month, $\varphi$ the latitude, and $\delta$ the declination angle. $G_{SC}$ is the solar constant, 1.367 kW/m². The declination angle ($\delta$) is defined by the equation:
$$\delta = \sin^{-1}\left[\sin\left(\frac{23.45\,\pi}{180}\right)\,\sin\left(\frac{2\pi\,(n + 284)}{365}\right)\right] \qquad (3)$$
The GSR and SSH data are next normalized to the extraterrestrial values $G_0$ and $S_0$ described in eqs. (1)-(2), and the resulting normalized data arrays, denoted by clearness index (RGSR = GSR/$G_0$) and sunshine duration ratio (RSSH = SSH/$S_0$), are stored in the Excel file solardata.xlsx in the form:
>> gsrdata=[N, S0, G0, RSSH, RGSR];
>> s=xlswrite('solardata.xlsx',gsrdata,'Sheet1','A1:E365');
The RGSR-RSSH data is then fitted to different nonlinear regression models as per Table 1.
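As a rough illustration of the quantities above, the following sketch evaluates the declination, sunrise hour angle, $S_0$ and $G_0$ for each day of the year at the latitude of Al-Ain. It assumes the standard textbook expressions for the sunrise hour angle, the day length and the daily extraterrestrial radiation, together with a solar constant of 1.367 kW/m²; these forms and all variable names are assumptions for illustration and are not a reproduction of eqs. (1)-(2).

```matlab
% Sketch: daily extraterrestrial radiation G0 and maximum sunshine hours S0.
% ASSUMPTION: standard textbook formulas; names below are illustrative only.
phi   = 24 + 16/60;              % latitude of Al-Ain in degrees (24 deg 16')
Gsc   = 1.367;                   % solar constant, kW/m^2
n     = (1:365)';                % day index, leap days excluded

% Declination angle in degrees
delta = asind(sind(23.45) .* sind(360 * (n + 284) / 365));

% Sunrise hour angle in degrees and maximum daily sun hours
ws = acosd(-tand(phi) .* tand(delta));
S0 = (2/15) * ws;

% Eccentricity correction and daily extraterrestrial radiation (kWh/m^2)
E0 = 1 + 0.033 * cosd(360 * n / 365);
G0 = (24/pi) * Gsc .* E0 .* ( cosd(phi) .* cosd(delta) .* sind(ws) ...
     + (pi/180) * ws .* sind(phi) .* sind(delta) );
```

With measured column vectors GSR and SSH of the same length, the normalized arrays would then follow as RGSR = GSR ./ G0 and RSSH = SSH ./ S0.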

MATLAB application in nonlinear regression
The nonlinear regression in MATLAB is performed using two tools:
1. Interactive nonlinear regression tool (nlintool)
2. Nonlinear mixed-effects estimation (nlmefit)
The procedure followed in each approach is discussed in the next sections, including sample output results from the weather GSR modeling. Alternative approaches include specifically written MATLAB m-files or the use of commercial statistical software packages such as SPSS, Minitab or SAS. All our MATLAB results agree well with the results obtained using SPSS and Minitab.
Table 2 shows the coefficients obtained for each regression model shown in Table 1. The statistical error parameters resulting from each regression model can be computed using SPSS, Minitab or MATLAB from the expressions displayed in section 2.3. The nonlinear regression in MATLAB can also be performed using the nonlinear mixed-effects estimation (nlmefit). This tool fits the model by maximizing an approximation to the marginal likelihood with the random effects integrated out, assuming (see MATLAB help) that the random effects are multivariate normally distributed and independent between groups, and that the observation errors are independent, identically normally distributed, and independent of the random effects. As an example, the format for the log-linear regression equation is as follows:
>> model = @(phi,t)(phi(:,1)+ phi(:,2).*t+ phi(:,3).*log(t));
>> phi0 =[0 0 0];
>> group=[1:365];
>> [beta,psi,stats] = nlmefit(RSSH(:),RGSR(:),group,[],model,phi0)
The arguments of the MATLAB function 'nlmefit' used for our regression are:
RSSH: an n-by-1 array of n observations on 1 predictor.
RGSR: an n-by-1 vector of responses.
group: a grouping variable indicating m groups in the observations. Here, we enter the number of rows of the predictor, i.e. group=[1:365].
The regression model parameters obtained using the 'nlmefit' function match the values computed using the 'nlintool' function and listed in Table 2. The best regression model should yield the lowest values of AIC, BIC and RMSE, in addition to meeting the autocorrelation test requirements discussed later in the chapter. Fig. 2 shows the ten-year mean daily GSR computed from the log-linear regression model with the estimated coefficients listed in Table 2. Note the very good agreement between the regression model and the measured data. The error statistics yield a coefficient of determination of R² = 97.74%, in addition to low error statistics: RMSE = 0.2104, MBE = -0.0755, MABE = 0.1872, and MAPE = 3.11%. The remaining regression models also show excellent agreement with the measured data, with R² values exceeding 96%.
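Because the log-linear model is linear in its three coefficients, a fit of this form can be cross-checked with an ordinary least-squares solve in base MATLAB. The sketch below is not the nlintool or nlmefit procedure described above; it is a minimal alternative check run on synthetic data, and the coefficient values and variable names are hypothetical.

```matlab
% Sketch: least-squares fit of RGSR = a + b*RSSH + c*log(RSSH).
% ASSUMPTION: synthetic, noise-free data with hypothetical coefficients;
% these are NOT the Al-Ain measurements or the Table 2 values.
RSSH  = linspace(0.55, 0.95, 365)';          % sunshine duration ratio
aTrue = [0.30; 0.45; -0.10];                 % hypothetical [a; b; c]
RGSR  = aTrue(1) + aTrue(2)*RSSH + aTrue(3)*log(RSSH);

% Design matrix: one column per coefficient
X    = [ones(size(RSSH)), RSSH, log(RSSH)];
beta = X \ RGSR;                             % least-squares estimate

% Residual-based error measure for the fitted curve
res  = RGSR - X*beta;
rmse = sqrt(mean(res.^2));
```

On real, noisy data the same two lines (building X and solving X \ RGSR) give coefficients that can be compared against those reported by the interactive tools.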

Time-series regression modelling with ARIMA approach
The time-series regression modelling approach makes use of four weather parameters, together with the GSR response, measured daily over a period of 10 years (1995-2004) in Al-Ain, UAE. These variables are:
1. Mean daily temperature (°C), abbreviated as T
2. Mean daily wind speed (knots), abbreviated as W
3. Daily sunshine hours, abbreviated as SSH
4. Mean daily relative humidity (%), abbreviated as RH
5. Mean daily global solar radiation (kWh/m²), abbreviated as GSR
The modeling procedure steps are as follows:
Step 1. Explore the form and characteristics of the dependent variable (GSR) and the predictors (T, W, SSH and RH) by looking at their time-series plots with the help of MATLAB. These plots help to identify the behavior of each variable and to check the trend and seasonality of the GSR data.
Step 2. Compute the descriptive statistics of the four independent variables (T, W, SSH, and RH) and the dependent variable (mean daily GSR). The correlation between these variables is shown in Table 3. The Pearson correlation values between the response (GSR) and the four predictor variables (T, W, SSH, RH) are found in MATLAB using the command:
>> corr([T, W, SSH, RH, GSR])
Table 3 shows that temperature and sunshine hours have the dominant effect on GSR, followed closely by relative humidity, with wind speed having less influence. Measures of shape, on the other hand, are found using quantiles (0 < p < 1) or percentiles (0 < p < 100). The percentiles for a data sequence xdata are found in the Statistics toolbox using the commands:
>> y = prctile(xdata,p); % p = percentile needed
>> y = quantile(xdata,p); % p = quantile needed
The shape of a data distribution is also measured by the Statistics Toolbox functions skewness, kurtosis and, more generally, moment.
Table 5 shows the descriptive statistics obtained for the mean daily global solar radiation data for Al-Ain, UAE for the period 1995-2004.
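The correlation step can be sketched with the base MATLAB function corrcoef, which returns the full Pearson correlation matrix when the variables are passed as columns of one array. The series below are synthetic stand-ins with hypothetical amplitudes, chosen only so that the example runs on its own; they are not the measured Al-Ain data.

```matlab
% Sketch: Pearson correlation matrix for predictors and response.
% ASSUMPTION: all series are synthetic placeholders for T, W, SSH, RH, GSR.
t   = (1:3650)';                               % ten years of daily samples
T   = 28  + 8   * sin(2*pi*t/365);             % temperature, annual cycle
W   = 7   + 2   * cos(2*pi*t/37);              % wind speed, unrelated cycle
SSH = 9.5 + 1.5 * sin(2*pi*t/365 + 0.3);       % sunshine hours
RH  = 45  - 15  * sin(2*pi*t/365 - 0.2);       % relative humidity
GSR = 1 + 0.10*T + 0.30*SSH - 0.01*RH;         % hypothetical response

R        = corrcoef([T, W, SSH, RH, GSR]);     % 5-by-5 correlation matrix
rWithGSR = R(1:4, 5);                          % predictors vs GSR
```

The last column of R holds the correlation of each predictor with GSR, in the column order T, W, SSH, RH.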
Step 3. Fig. 4 shows the time-series plot of the mean daily GSR for years 1995-2004 with leap days excluded. The plot shows a clear periodicity of one year (365 days).
Step 4. The partial least squares regression technique is used to model the relation between the GSR data for the ten years (1995-2004) and the four aforementioned independent weather variables. The model equation obtained using MATLAB, or using either of the software packages SPSS or Minitab, is the multi-linear regression model of eq. (4). The regression model of eq. (4) yields a coefficient of determination R² = 0.7824 and MSE = 0.4247. In MATLAB, the regression between GSR and the predictor variables (T, W, SSH, RH) is done using the Statistics Toolbox command
>> regcoef = robustfit(xdata, GSR)
where 'xdata' is a matrix containing the four predictors, i.e. xdata=[T, W, SSH, RH], and 'regcoef' is a vector returning the regression coefficients between GSR and the four predictor weather variables resulting from the robust multi-linear regression process. The correlation between the regression component GSRregression and GSR can be found using
>> corr(GSR, GSRregression)
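A multi-linear regression of this kind, together with its R² and MSE, can be sketched in base MATLAB as follows. This uses an ordinary least-squares solve rather than robustfit, so it is a simplified stand-in for the command above; the predictor series and the coefficient vector are hypothetical and do not correspond to eq. (4).

```matlab
% Sketch: ordinary least-squares multi-linear regression with R^2 and MSE.
% ASSUMPTION: synthetic predictors and hypothetical coefficients bTrue.
t   = (1:730)';
T   = 28  + 8   * sin(2*pi*t/365);
W   = 7   + 2   * cos(2*pi*t/37);
SSH = 9.5 + 1.5 * cos(2*pi*t/365);
RH  = 45  + 10  * sin(4*pi*t/365);

X     = [ones(size(t)), T, W, SSH, RH];        % intercept + four predictors
bTrue = [1.0; 0.10; -0.02; 0.30; -0.01];       % hypothetical coefficients
GSR   = X * bTrue;                             % noise-free response

b      = X \ GSR;                              % least-squares coefficients
GSRreg = X * b;                                % fitted regression component
SSE    = sum((GSR - GSRreg).^2);
SST    = sum((GSR - mean(GSR)).^2);
R2     = 1 - SSE/SST;                          % coefficient of determination
MSE    = SSE / (numel(GSR) - numel(b));        % error variance estimate
```

Since the synthetic response is noise-free, R2 is essentially 1 here; with measured data the same expressions give the fit quality of the regression component.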

www.intechopen.com
Engineering Education and Research Using MATLAB
Step 5. Trend and seasonality components of the residual regression error. The residual error from the multivariate linear regression model in eq. (4), denoted by GSRresidue1(t) = GSR(t) - GSRregression(t), is shown in Fig. 5. The objective is to first examine it for trends and/or seasonality and to decompose it into a trend component, a seasonal component and a remaining residual (eq. (5)).
Fig. 5. Residual term of mean daily GSR after subtracting the regression model of eq. (4)
The trend component is extracted in MATLAB using the statements
>> gsrdet=detrend(gsrresidue1'); % This gives GSRresidue1 with trend removed
>> gsrtrend=gsrresidue1'-gsrdet; % trend = residual minus its detrended version
The trend equation, eq. (6), can be deduced using the cftool command in MATLAB, which invokes the curve fitting GUI and performs a linear polynomial fit.
The mathematical model for the seasonal component, GSRseasonal(t), given in eq. (7), is found from MATLAB using the FFT algorithm. The Fourier coefficients for a periodic sequence (period NP = 365) of data x[n], n = 1, 2, ..., N = 3650, are generated using:
>> N=3650;
>> y=fft(x,N); % here x is an array representing the de-trended GSR residual
The overall decomposition leads to the regression model of eq. (8), in which the multivariable linear regression, trend and seasonal components are described in eqs. (4), (6) and (7), respectively. The overall residual error of the time-series regression-based model is shown in Fig. 6.
The Box-Jenkins models (Box & Jenkins, 1994) are only applicable to stationary time series. The identification of an appropriate Box-Jenkins model for a particular time series therefore first requires a check for stationarity. If the residual term exhibits a normal distribution with zero mean and constant variance, then it resembles white noise and there is no need for further ARIMA modeling. The behavior of the ACF and PACF plots can help identify the ARMA model that best describes the resulting stationary time series. Table 7 summarizes the ARMA model selection criteria (Enders, 2010). In general, if the ACF of the time series either cuts off or dies down fairly quickly, then the time series should be considered stationary. On the other hand, if the ACF dies down extremely slowly, then the time series may be considered non-stationary. If the model is adequate, then these plots should show all spikes within the 95% Confidence Interval (CI) bounds ($\pm 1.96/\sqrt{N}$), where N is the sample size. If the series proves non-stationary, one could try differencing or a log-transformation and then check whether these make the series stationary. A stationary time series would have a quasi-normal distribution with zero mean and constant variance. Fig. 7 shows the ACF and PACF plots for the residual component as obtained from Minitab. Note that the ACF decays in an oscillating form after a few lags within the CI bounds, thus implying a fairly stationary time series. Differencing the residual component made the ACF and PACF more unstable, with further deterioration of their behavior, implying that differencing is inappropriate for this data. The PACF plot cuts off quickly after 1 lag, indicating that an AR(2) or higher could be adequate.
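The detrending and FFT steps described above can be sketched on a synthetic residual as follows. The sketch assumes a residual made of a linear trend plus a single annual cosine, with hypothetical amplitude and phase, and recovers the annual harmonic from the FFT bin that corresponds to one cycle per 365 days; the measured residual and its fitted equations are not reproduced here.

```matlab
% Sketch: remove a linear trend, then estimate the annual harmonic by FFT.
% ASSUMPTION: synthetic residual with hypothetical trend and seasonal terms.
NP = 365;                      % period in days
N  = 10 * NP;                  % ten years of daily samples
t  = (0:N-1)';
x  = 0.2 + 1e-4*t + 0.5*cos(2*pi*t/NP - 0.8);

xd        = detrend(x);        % residual with best-fit line removed
trendPart = x - xd;            % the fitted linear trend itself

Y     = fft(xd);               % Fourier coefficients of detrended series
k     = N / NP;                % annual harmonic index (10 cycles in N samples)
amp   = 2 * abs(Y(k+1)) / N;   % amplitude of the annual component
phase = angle(Y(k+1));         % phase of the annual component, radians

seasonal = amp * cos(2*pi*k*t/N + phase);   % reconstructed seasonal term
remainder = xd - seasonal;                  % what is left for ARMA modeling
```

The recovered amplitude is close to, but not exactly, the value used to build the series, because the least-squares line absorbs a small part of the seasonal term.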

Model | ACF | PACF
AR(p) | Spikes decay towards zero. Coefficients may oscillate. | Spikes decay to zero after lag p
MA(q) | Spikes decay to zero after lag q | Spikes decay towards zero. Coefficients may oscillate.
ARMA(p,q) | Spikes decay (either direct or oscillatory) to zero beginning after lag q | Spikes decay (either direct or oscillatory) to zero beginning after lag p
Table 7. Behavior of ACF and PACF for each of the general non-seasonal models
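The ACF check against the 95% CI bounds can be sketched in base MATLAB without any toolbox. The series below is a synthetic, strongly autocorrelated example with hypothetical parameters, so most of its lags fall outside the bounds; for a white-noise residual nearly all lags would fall inside.

```matlab
% Sketch: sample autocorrelation function with 95% confidence bounds.
% ASSUMPTION: synthetic series; names and lag count are illustrative.
N      = 500;
t      = (1:N)';
y      = sin(2*pi*t/50);            % periodic, hence strongly autocorrelated
maxLag = 20;

yc  = y - mean(y);                  % centre the series
acf = zeros(maxLag, 1);
for k = 1:maxLag
    % lag-k autocovariance divided by the lag-0 autocovariance
    acf(k) = sum(yc(1+k:N) .* yc(1:N-k)) / sum(yc.^2);
end

bound   = 1.96 / sqrt(N);           % 95% CI half-width
outside = find(abs(acf) > bound);   % lags with significant correlation
```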

Description of ARMA modeling process
The time-series analysis of the residual stochastic component of the mean daily GSR data is conducted using SPSS and Minitab with the help of MATLAB. The nature of the ARMA model used is described first, followed by an explanation of the model selection process and the diagnostic measures used to validate the selected model. The non-seasonal autoregressive-moving average model of order (p, q), i.e. ARMA(p, q), is described by eq. (9) (Enders, 2010), which expresses $y_t$ in terms of its p previous values and of the current and q previous random shocks $\epsilon_t, \epsilon_{t-1}, \epsilon_{t-2}, \dots, \epsilon_{t-q}$. These are statistically independent random shocks assumed to be randomly selected from a normal distribution with zero mean and constant variance. The ARMA(p, q) model parameters ($\phi_i$, i = 1, 2, ..., p and $\theta_j$, j = 1, 2, ..., q) described in eq. (9) are estimated using least squares methods with known values of $y_t$ representing the residual component data. The estimated parameters of the selected ARMA(p, q) models should have t-values higher than 2.0 in order to be judged significantly different from zero at the 5% level. Moreover, the coefficients should not be strongly correlated with each other, so as to yield a parsimonious ARMA model that still passes the diagnostic checks. A parsimonious model is desirable because including irrelevant lags in the model increases the coefficient standard errors and therefore reduces the t-statistics. Models that incorporate large numbers of lags tend not to forecast well, as they fit data-specific features and explain much of the noise or random features in the data. Therefore, model coefficients with p-values higher than 0.05 are insignificant and should be eliminated to avoid over-fitting, as they have little effect on the prediction model (Enders, 2010). Different models can be obtained for various combinations of AR and MA terms, individually and collectively. The best model is selected using the following diagnostics:
(a) Low Akaike Information Criterion (AIC) / Schwarz-Bayesian Information Criterion (SBC, BIC)
These criteria depend on the data sample size, the model mean-square error (MSE), and the (p, q) values of the ARMA model. Their definitions can be found in MATLAB help (MATLAB, 2010) or in (Enders, 2010). SBC selects the more parsimonious model and is better than AIC for large samples. The best model should have the lowest AIC/SBC value and the least MSE.

(b) Plot of residual autocorrelation function (ACF)
The appropriate ARMA model, once fitted, should have a residual error whose ACF plot varies within the 95% CI bounds ($\pm 1.96/\sqrt{N}$), where N is the number of observations upon which the model is based.

(c) Non-significance of autocorrelations of residuals via Portmanteau tests (Q-tests based on Chi-square statistics) such as the Box-Pierce or Ljung-Box tests (white noise tests)
Once the optimal ARMA(p, q) model for the residual GSR time series is selected, a white noise test is needed if the ACF/PACF correlograms show significant spikes at one or more lags that could be due to chance alone. These tests indicate whether there is any correlation in the time series, or whether the abnormal spikes encountered in the ACF and PACF of the residual error are just a set of random, identically distributed variables overall. The Ljung-Box Q-statistic can be used to check whether the residuals from the ARMA(p, q) model behave as a white-noise process (Ljung & Box, 1978; Enders, 2010):
$$Q = n\,(n+2) \sum_{k=1}^{K} \frac{r_k^2}{n-k}$$
This statistic yields a more accurate variance of the ACF [variance becomes (n-K)/n² instead of 1/n] compared to the statistic defined earlier by (Box & Pierce, 1970). K is the degrees of freedom, representing the maximum number of lags considered (normally 20); n = N - d, with N being the number of data points and d the degree of differencing (no differencing is assumed in this work, so d = 0); and $r_k$ is the sample ACF at lag k. Under the null hypothesis that all values of the autocorrelation $r_k$ = 0, the Q statistic is compared to critical values from the chi-square distribution $\chi^2$ with K degrees of freedom. If the model is correctly specified, the residuals should be uncorrelated, Q should be small, and consequently the probability value should be large. A white noise process would ideally have Q = 0. Therefore, if $Q > \chi^2_{\alpha,DF}$ at the specified DF and significance level α, then we can reject the null hypothesis.
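A minimal sketch of the Ljung-Box computation in base MATLAB is given below. The autocorrelation values are hypothetical, and the chi-square tail probability is obtained from the regularized incomplete gamma function gammainc so that no toolbox is required; the degrees of freedom are taken as K here, which is an assumption of this sketch (some treatments subtract the number of estimated ARMA parameters).

```matlab
% Sketch: Ljung-Box Q statistic and its chi-square p-value.
% ASSUMPTION: r holds hypothetical residual autocorrelations at lags 1..K.
r = [0.05; -0.03; 0.02; 0.04; -0.01];
n = 3650;                       % number of observations (d = 0)
K = numel(r);                   % maximum lag considered
k = (1:K)';

Q = n * (n + 2) * sum(r.^2 ./ (n - k));

% Upper-tail chi-square probability with K degrees of freedom;
% gammainc(Q/2, K/2) equals the chi-square CDF evaluated at Q.
p = 1 - gammainc(Q/2, K/2);

rejectWhiteNoise = p < 0.05;    % true means the residuals look correlated
```

With n this large, even small autocorrelations produce a sizeable Q, which is why the lag count and sample size should always be reported alongside the statistic.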
Several ARMA models were analyzed based on the recommended criteria, and two models emerged as the best, namely ARMA(2,1) and ARMA(4,3). Any further increase in the number of q-coefficients above 3 (q > 3) leads to over-fitting, as witnessed by p-values exceeding 0.05. The best parsimonious model obtained based on the suggested diagnostics is the ARMA(2,1) model. Table 8 shows the ARMA(2,1) model parameters obtained from SPSS. All the estimated model coefficients have p-values less than 0.05 (α = 5%). This implies that all the coefficients of the selected Box-Jenkins model are significant, since the null hypothesis $H_0$: φ = 0 (AR) or θ = 0 (MA) can be rejected for the preset significance level α (which can be chosen as 0.05 or 0.01). Table 9 shows the model fit statistics for the ARMA(2,1) model. The Ljung-Box statistic value Q = 15.462 at lag 18 has a corresponding p = 0.419 > 0.05. Hence, we cannot reject the adequacy of the model at α = 0.05. The resulting ARMA(2,1) model will therefore yield a residual error that resembles white noise. The ACF and PACF plots of the residual error of the ARMA(2,1) model, shown in Fig. 8, vary within the 95% CI bounds. The spikes in the ACF and PACF at lag 7 are due to random events and thus cannot be explained. However, since this is a 95% confidence interval, one can expect this to happen once in every twenty lags, and so it is not a concern. Another method to ensure that the selected ARMA model yields a stationary residual error is to check the unit-root rule for stationarity. Assume that the lag operator in eq. (9) is B.

Then $B y_t = y_{t-1}$. Eq. (9) can then be written in the form (assume zero mean; δ = 0):
$$\phi(B)\, y_t = \theta(B)\, \epsilon_t$$
where φ(B) and θ(B) are the AR and MA coefficient polynomials in B. The stationarity of the time-series sequence $y_t$ requires that all the roots of the AR(p) coefficient polynomial φ(B) lie outside the unit circle (Enders, 2010). Next, the ARMA(2,1) model parameters φ(B) obtained from SPSS are used in MATLAB to perform the stationarity unit-root test as follows:
>> [...]
[...] where WN(t) is the final residual error, which resembles white noise with zero mean and constant variance.
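The unit-root check can be sketched with the base MATLAB function roots. The sketch assumes the AR polynomial has the form φ(B) = 1 - φ1·B - φ2·B² and uses hypothetical coefficient values; the actual SPSS estimates from Table 8 are not reproduced here.

```matlab
% Sketch: stationarity check for an AR(2) polynomial via its roots.
% ASSUMPTION: phi1 and phi2 are hypothetical, and the polynomial is taken
% as phi(B) = 1 - phi1*B - phi2*B^2.
phi1 = 0.5;
phi2 = 0.3;

% roots() expects coefficients in descending powers of B
c = [-phi2, -phi1, 1];
r = roots(c);

% Stationary when every root lies outside the unit circle
isStationary = all(abs(r) > 1);
```

For these values the roots are real, approximately 1.17 and -2.84, so both lie outside the unit circle.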

Comparison of Time-series regression model with measured data
The MATLAB cftool GUI is used to study the correlation between the regression model without (eq. (8)) and with ARMA modeling (eq. (14)) and the measured data set for years 1995-2004. This MATLAB tool plots the data and finds the polynomial fit and descriptive statistics, as depicted in Fig. 9.
Table 10 shows the result of running Fisher's two-tailed F-test on the regression and measured data for years 1995-2004. This variance ratio test follows the null hypothesis ($H_0$) that the ratio between the variances is equal to 1, against the alternative ($H_a$) that the ratio between the variances is different from 1. The computed p-values are reported in Table 10.
In the error statistics, $GSR_m^i$ is the i-th measured value, $GSR_p^i$ is the predicted value from the regression model, and N is the total number of data points.
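The error statistics named in this chapter (RMSE, MBE, MABE, MAPE and R²) can be computed in a few lines of base MATLAB. The sketch below uses their common definitions, with the error taken as predicted minus measured; that sign convention, the variable names and the four sample values are assumptions for illustration only.

```matlab
% Sketch: common error statistics between measured and predicted GSR.
% ASSUMPTION: hypothetical sample values; error = predicted - measured.
GSRm = [5.0; 6.0; 7.0; 8.0];        % measured values
GSRp = [5.2; 5.9; 7.1; 7.6];        % predicted values

e    = GSRp - GSRm;
RMSE = sqrt(mean(e.^2));            % root mean square error
MBE  = mean(e);                     % mean bias error
MABE = mean(abs(e));                % mean absolute bias error
MAPE = 100 * mean(abs(e ./ GSRm));  % mean absolute percentage error, %

% Coefficient of determination relative to the measured mean
R2 = 1 - sum(e.^2) / sum((GSRm - mean(GSRm)).^2);
```

A negative MBE under this convention indicates that the model under-predicts on average.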

Conclusion
This chapter addresses the MATLAB tools employed in finding appropriate models for the mean daily and monthly global solar radiation in the city of Al-Ain, United Arab Emirates. A detailed description of the tools used in obtaining the classical empirical regression models as well as the time-series ARMA models is presented, including the computation of error statistics and the use of diagnostic tests to validate the selected models. Excellent agreement is observed between the empirical regression and time-series prediction models and the measured test data, with coefficients of determination exceeding 90% and low MBE, MABE, MAPE and RMSE error statistics that attest to the suitability of these models for long-term weather data prediction. The same MATLAB tools will be used to develop prediction models for other UAE cities.

Fig. 1. Nonlinear regression curve with a log-linear function between clearness index (RGSR) and sunshine duration ratio (RSSH)
For the nlmefit call, with group=[1:365], the remaining arguments are:
model: a function handle that accepts predictor values and model parameters and returns fitted values.
phi0: contains the initial values of the regression equation parameters.
The program outputs are:
beta: contains the estimated regression model parameters.
psi: an r-by-r estimated covariance matrix for the random effects. By default, r is equal to the number of model parameters p.
stats: returns a structure with the following fields:
• logl - The maximized log-likelihood for the fitted model
• mse - The estimated error variance for the fitted model
• aic - The Akaike information criterion for the fitted model
• bic - The Bayesian information criterion for the fitted model
• sebeta - The standard errors for beta
• dfe - The error degrees of freedom for the model
The following is the MATLAB output for the log-linear regression model:

Fig. 2. Ten-year mean daily GSR data comparison between the empirical log-linear regression model and measured data in Al-Ain, UAE for years 1995-2004

Validation of empirical regression models
The six selected empirical regression models are validated by computing the predicted GSR data from these models using the test data set of years 2005-2007. The empirical models compare very well with the measured data for the test data period, as depicted in Fig. 3. All models yield coefficients of determination R² better than 98%. The models also yield low values for RMSE, MBE, MABE and MAPE, indicating their adequacy as weather prediction models for Al-Ain, UAE. The third-order polynomial (cubic) regression model outperforms the other five empirical models, with the lowest error statistics and the highest coefficient of determination.

Fig. 3. Comparison of monthly mean regression model GSR with measured data for test period of 2005-2007

Fig. 7. Auto-Correlation Function (ACF) and Partial Auto-Correlation Function (PACF) for residual component of the mean daily GSR data

Fig. 8. ACF and PACF plots for the residual error of the ARMA(2,1) model
The MATLAB function "roots(c)" computes the roots of the polynomial P(x) whose coefficients are the elements of the vector c. If c has (n+1) components, the polynomial is P(x) = c(1)*x^n + ... + c(n)*x + c(n+1).
Fig. 9. MATLAB "cftool" GUI comparison between regression and measured data for years 1995-2004 (a) with, and (b) without ARMA model
Fig. 12 also shows comparison results for Multi-layer Perceptron (MLP) and Radial Basis Function (RBF) Artificial Neural Networks (ANN), obtained by the co-author (Al-Shamisi & Assi, 2011) using the same model and test data sets. Fig. 12 shows a better prediction performance for the regression models than for the ANN-based models on the Al-Ain test data. The low error parameters (RMSE, MBE, MABE, and MAPE) obtained for the regression models provide a clear indication of the potential of these techniques for long-term GSR data prediction.
Fig. 12. Mean GSR data comparison for test data 2005-2007 in Al-Ain city

Table 3 shows that wind speed has the least effect on GSR. From the time-series plots of the GSR, T and SSH data we can observe a cyclical behavior. Wind and humidity data, on the other hand, are more random. The best model is obtained when all four independent variables are considered in the GSR regression model.

The computation of the descriptive statistics can be done directly in SPSS or Minitab. In MATLAB we can either write a script m-file to compute all the needed statistical parameters, or use the functions already available in MATLAB and in the Statistics toolbox. One MATLAB command that generates some statistical parameters is:
>> [xds,yds] = datastats(xdata, ydata)
which returns statistics for the column vectors xdata and ydata in the structures xds and yds, respectively. xdata and ydata must be of the same size. The returned statistics include: sample size, maximum value, minimum value, mean, median, range, and standard deviation. MATLAB also has standalone functions for statistical values such as max, min, length (sample size), mean, median, mode, std (standard deviation), and var (variance). The coefficient of variation is found from the ratio std(x)/mean(x). One can also use hist(x) to get a histogram of the sample data. Other statistical measures need to be programmed or can be determined using the MATLAB Statistics toolbox. Table 4 lists the commands that can be employed to obtain the central and dispersion measures for the data sets under study.
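As a small illustration of the base MATLAB route, the sketch below computes central, dispersion and shape measures for a short hypothetical sample. The skewness and kurtosis lines use the plain moment ratios, which is an assumption of this sketch standing in for the Statistics toolbox functions of the same name.

```matlab
% Sketch: descriptive statistics using only base MATLAB functions.
% ASSUMPTION: x is a hypothetical GSR sample in kWh/m^2.
x = [5.1; 6.3; 7.8; 6.9; 4.2; 8.1; 7.4; 6.0];

m   = mean(x);                 % central measures
med = median(x);
s   = std(x);                  % dispersion measures
rng_x = max(x) - min(x);       % range
cv  = s / m;                   % coefficient of variation

z    = x - m;                  % shape measures from central moments
skew = mean(z.^3) / mean(z.^2)^1.5;
kurt = mean(z.^4) / mean(z.^2)^2;
```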

Table 5. Descriptive data statistics for the measured weather data parameters for the city of Al-Ain (years 1995-2004), where N = 3650 data samples
Fig. 4. Daily mean Global Solar Radiation (GSR) in Al-Ain, UAE for years 1995-2004

The computed p-value shown in Table 10 is greater than the significance level α = 0.05, and thus we cannot reject the null hypothesis $H_0$. The risk of rejecting the null hypothesis $H_0$ while it is true is 22.04% and 18.23% for the Regression and Regression with ARMA cases, respectively.
Table 10. Fisher's two-tailed F-test for the model data of 10 years
Levene's equal variances test can also be applied, as shown in Table 11. As the computed p-value in Table 11 is greater than the significance level α = 0.05, one cannot reject the null hypothesis $H_0$. The risk of rejecting the null hypothesis $H_0$ in Levene's test while it is true is 55.62% and 43.96% for the Regression and Regression with ARMA cases, respectively. Levene's test is less sensitive than the F-test to departures from normality.

Validation of time-series regression model
In this section, the time-series regression model is validated with the measured test data set for years 2005-2007 in Al-Ain, UAE. A MATLAB code is written to implement the regression model for the input test data (1095 points). The generated regression model data is then compared with the test data samples to check the correlation and to study the accuracy of the prediction model. An excellent agreement is noted between the mean daily regression and measured test data for years 2005-2007. The statistical error data computed with MATLAB yields coefficients of determination R² = 92.6% and 90.77% with and without ARMA modeling, respectively, thus indicating good model prediction performance. MATLAB is again used to determine the monthly mean GSR data for the test data period (2005-2007). The resulting monthly mean GSR data comparison between the test (measured) data and the regression model for years 2005-2007 is shown in Fig. 12. Note the excellent agreement between the regression model and the test data.
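The monthly averaging of a daily series over the test period can be sketched with accumarray, which groups values by an index vector. The daily series below is a hypothetical annual cosine rather than the measured or modeled GSR, and leap days are assumed to have been excluded, as elsewhere in the chapter.

```matlab
% Sketch: monthly means of a daily series spanning several 365-day years.
% ASSUMPTION: gsrDaily is synthetic; leap days are excluded.
daysInMonth = [31 28 31 30 31 30 31 31 30 31 30 31];
edges       = [0, cumsum(daysInMonth)];

% Month number (1..12) for each day of a 365-day year
monthOfDay = zeros(365, 1);
for m = 1:12
    monthOfDay(edges(m)+1 : edges(m+1)) = m;
end

nYears   = 3;                                   % e.g. a three-year test set
monthIdx = repmat(monthOfDay, nYears, 1);       % month label for every day
d        = (1:365*nYears)';
gsrDaily = 6 - 2*cos(2*pi*(d - 1)/365);         % hypothetical daily GSR

% Mean over all days that share the same calendar month
gsrMonthly = accumarray(monthIdx, gsrDaily, [12 1], @mean);
```

Applying the same grouping to the measured and the modeled daily series gives two 12-element vectors that can be compared month by month.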