A Learning Algorithm Based on PSO and L-M for Parity Problem

Despite the many successful applications of backpropagation (BP), it has several drawbacks: for complex problems it may require a long time to train the network, it may become trapped in local minima, and it may fail to train at all. The particle swarm optimization (PSO) algorithm is a global, stochastic algorithm based on population evolution; its model is simple, and it is an effective method for optimizing complex models. This paper uses the PSO algorithm as the learning algorithm of a neural network applied to the parity problem, and combines PSO with the Levenberg-Marquardt (L-M) algorithm (a modified BP algorithm) to improve its performance. The simulation results show that this method not only increases the convergence rate of learning but also increases the likelihood of escaping from local minima.


Introduction
The back-propagation (BP) network is the most representative model of the artificial neural network and has wide application (J. L. McClelland, D. E. Rumelhart & the PDP Research Group). Owing to its hidden layers, its learning rules and the error back-propagation algorithm, the BP network can be used to recognize and classify nonlinear patterns (Zhou Zhihua & Cao Cungen, 2004). Current applications include handwriting recognition, speech recognition, text-to-speech conversion, image recognition and intelligent control. Because the BP algorithm is a gradient-descent learning algorithm, it has drawbacks such as slow convergence, a tendency to fall into local minima, and poor robustness. In the last decade, a series of intelligent algorithms developed from the simulation of nature has received wide attention; in particular, global stochastic optimization algorithms based on individual organisms and groups have developed rapidly and achieved remarkable results in engineering design and intelligent control. The most famous are the genetic algorithm, the particle swarm optimization (PSO) algorithm, and others. In this chapter, the research focuses on the integration of an improved PSO algorithm with the Levenberg-Marquardt (L-M) algorithm for neural networks, and its application to the parity problem, which enhances the optimization properties of the algorithm and alleviates problems such as slow convergence and falling into local minima.
Particle Swarm Optimization Algorithm
The search of PSO spreads over the whole solution space, so the global optimal solution can be found relatively easily; moreover, PSO requires neither continuity nor differentiability of the target function, nor even an explicit functional form; the only requirement is that the problem be computable. To realize the PSO algorithm, a swarm of random particles is initialized first, and the optimal solution is then obtained through iterative calculation. In each iteration, every particle tracks its individual optimal value pbest and the global optimal value gbest found by the whole swarm. The following formulas are used to update the velocity and position:

v_id^(k+1) = w * v_id^k + c1 * rand() * (p_id - x_id^k) + c2 * rand() * (p_gd - x_id^k)   (1)
x_id^(k+1) = x_id^k + v_id^(k+1)   (2)
In formulas (1) and (2), i = 1, 2, …, m, where m is the total number of particles in the swarm, and d = 1, 2, …, n, where n is the dimension of the particle. Moreover, in order to prevent excessive particle velocity, a speed limit Vmax is set: whenever the updated velocity exceeds Vmax, it is clamped to Vmax. The specific steps of the PSO algorithm are as follows:
(1) Set the number of particles m, the acceleration constants c1 and c2, the inertia weight coefficient w and the maximum evolution generation Tmax; in the n-dimensional space, generate the initial positions X(t) and velocities V(t) of the m particles at random.
(2) Evaluate the swarm X(t):
i. Calculate the fitness value of each particle.
ii. Compare the fitness value of the current particle with its individual optimal value fpbest. If fitness < fpbest, update fpbest to the current fitness, and set pbest to the current location of the particle in the n-dimensional space.
iii. Compare the fitness value of the current particle with the optimal value fGbest of the swarm. If fitness < fGbest, update fGbest to the current fitness, and set gbest, the optimal location of the swarm in the n-dimensional space, to the current location of the particle.
(3) In accordance with formulas (1) and (2), update the positions and velocities of the particles and generate a new swarm X(t+1).
(4) Check the end condition; if it is met, stop optimizing; otherwise, set t = t+1 and return to step (2). The end condition refers to either of two situations: the optimization reaches the maximum evolution generation Tmax, or the fitness value of gbest meets the required precision.
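The iterative procedure of steps (1)-(4) can be sketched in a few lines of Python. This is an illustrative implementation, not the authors' code; the function name `pso` and the default parameter values are choices made for this example.

```python
import random

def pso(fitness, dim, n_particles=30, w=0.7, c1=1.45, c2=1.45,
        x_range=(-5.0, 5.0), v_max=1.0, t_max=200):
    """Minimal standard PSO that minimizes `fitness` following steps (1)-(4)."""
    lo, hi = x_range
    # Step (1): random initial positions and velocities.
    X = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[random.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    f_pbest = [fitness(x) for x in X]
    g = min(range(n_particles), key=lambda i: f_pbest[i])
    gbest, f_gbest = pbest[g][:], f_pbest[g]
    for _ in range(t_max):                      # step (4): end condition is Tmax here
        for i in range(n_particles):
            for d in range(dim):
                # Formula (1): inertia + cognitive + social terms.
                V[i][d] = (w * V[i][d]
                           + c1 * random.random() * (pbest[i][d] - X[i][d])
                           + c2 * random.random() * (gbest[d] - X[i][d]))
                V[i][d] = max(-v_max, min(v_max, V[i][d]))  # clamp to Vmax
                X[i][d] += V[i][d]              # formula (2)
            # Step (2): update pbest and gbest.
            f = fitness(X[i])
            if f < f_pbest[i]:
                pbest[i], f_pbest[i] = X[i][:], f
                if f < f_gbest:
                    gbest, f_gbest = X[i][:], f
    return gbest, f_gbest
```

For instance, `pso(lambda x: sum(t * t for t in x), dim=2)` minimizes the 2-D Sphere function and returns a point close to the origin.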

Improved Particle Swarm Optimization Algorithm
The PSO algorithm is simple, but research shows that when the particle swarm becomes over-concentrated, its global search capability declines and the algorithm easily falls into local minima. If the aggregation degree of the swarm can be controlled effectively, the capability of the swarm to optimize toward the global minimum will be improved. According to formula (1), the velocity v of a particle gradually becomes smaller as the particles move together toward the global optimal location gbest. Once both the social and cognitive parts of the velocity become small, the velocity cannot grow again; when both are close to zero, since w < 1 the velocity is rapidly reduced to 0, which leads to the loss of space-exploration ability. When the initial velocity of a particle is nonzero, the particle first moves away from the global optimal location gbest by inertia; as the velocity approaches zero, all particles move closer to gbest and stop. In fact, the PSO algorithm does not guarantee convergence to the global optimum, only to the optimal location gbest found by the swarm (Lu Zhensu & Hou Zhirong, 2004). Furthermore, as shown in formula (2), the magnitude of a particle's velocity also reflects its distance from the optimal location gbest: the farther a particle is from gbest, the greater its velocity; the closer it is, the smaller its velocity gradually becomes. Therefore, as shown in formula (1), by mutating the optimal locations of the swarm individuals, the velocity of the particles can be controlled so as to prevent them from gathering at gbest too quickly, which controls the swarm diversity effectively. From formula (1), when this mutation is applied, both the social and cognitive parts of each particle's velocity are enlarged, which enhances particle activity and increases the global search capability of the swarm to a large extent.

The improved PSO (MPSO) is built on the standard PSO by adding a mutation operation on the optimal locations of the swarm individuals. The method includes the following steps:
(1) Initialize the positions and velocities of the particle swarm at random.
(2) Set pbest of each particle to its current position, and gbest to the optimal particle location of the initial swarm.
(3) Determine whether the convergence criterion is met; if so, go to step (6); otherwise, go to step (4).
(4) In accordance with formulas (1) and (2), update the positions and velocities of the particles, and determine the current pbest and gbest.
(5) Determine whether the convergence criterion is met; if so, go to step (6); otherwise, carry out the mutation of the individuals' optimal locations according to formula (3), then go to step (4).
(6) Output gbest and stop.
In formula (3), the optimal location of an individual is disturbed multiplicatively:

pbest_id = pbest_id * (1 + β * η)   (3)

where the parameter η is a random number drawn from the standard Gaussian distribution. The initial value of the parameter β is 1.0, and β is reset every 50 generations as β = r, where r is a random number in [0.01, 0.9]. In this way the method usually produces a small disturbance, which realizes local search with high probability, but from time to time produces a significant disturbance, which allows a timely large-step migration out of a local-minimum area.
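Assuming the multiplicative form of the mutation in formula (3) (the formula itself is only partially recoverable from the text), the variation operation can be sketched as follows; the names `mutate_pbest` and `update_beta` are this sketch's own.

```python
import random

def mutate_pbest(pbest, beta):
    """Gaussian mutation of an individual's optimal location, formula (3):
    each coordinate is perturbed as p <- p * (1 + beta * eta), eta ~ N(0, 1)."""
    return [p * (1.0 + beta * random.gauss(0.0, 1.0)) for p in pbest]

def update_beta(generation, beta):
    """Beta starts at 1.0 and every 50 generations is reset to a
    random number r in [0.01, 0.9], as described in the text."""
    if generation > 0 and generation % 50 == 0:
        return random.uniform(0.01, 0.9)
    return beta
```

Small values of beta give the small local disturbances; an occasional large draw of eta produces the big jump that lets a particle leave a local-minimum area.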

Test Functions
Six benchmark functions frequently used in PSO and GA (genetic algorithm) research (Wang Xiaoping & Cao Liming, 2002) are selected as the test functions; the Sphere and Rosenbrock functions are unimodal, and the other four functions are multimodal. Table 1 gives the definition, the value range and the maximum speed limit Vmax of these benchmark functions, where x is a real vector of dimension n and x_i is its i-th element.
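Table 1 itself is not reproduced here, but the six named functions have standard definitions, which a test harness might encode as below (the exact variants used in the paper's Table 1 may differ slightly; the multimodal four are assumed to be Rastrigin, Griewank, Ackley and the 2-D Schaffer F6).

```python
import math

def sphere(x):
    return sum(t * t for t in x)

def rosenbrock(x):
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))

def rastrigin(x):
    return sum(t * t - 10.0 * math.cos(2.0 * math.pi * t) + 10.0 for t in x)

def griewank(x):
    s = sum(t * t for t in x) / 4000.0
    p = 1.0
    for i, t in enumerate(x, start=1):
        p *= math.cos(t / math.sqrt(i))
    return s - p + 1.0

def ackley(x):
    n = len(x)
    s1 = sum(t * t for t in x) / n
    s2 = sum(math.cos(2.0 * math.pi * t) for t in x) / n
    return -20.0 * math.exp(-0.2 * math.sqrt(s1)) - math.exp(s2) + 20.0 + math.e

def schaffer(x):
    """2-D Schaffer F6 function."""
    s = x[0] ** 2 + x[1] ** 2
    return 0.5 + (math.sin(math.sqrt(s)) ** 2 - 0.5) / (1.0 + 0.001 * s) ** 2
```

All six have minimum value 0, at X = 1 for Rosenbrock and at X = 0 for the others, matching the optima stated in the next section.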

Simulation and Analysis of the Algorithm
In order to study the properties of the improved algorithm, the performance of the standard PSO (with linearly decreasing inertia weight) and the improved PSO (mPSO) is compared on the benchmark functions. The comparison test is performed on the common functions shown in Table 1. For each algorithm, the maximum evolution generation is 3000, the number of particles is 30 and the dimension is 10, 20 and 30 respectively, except that the dimension of the Schaffer function is 2. For the inertia weight coefficient w, the initial value is 0.9 and the final value is 0.4 in the PSO algorithm, while in the mPSO algorithm w is fixed at 0.375. The theoretical optimum of the Rosenbrock function is at X = 1, while for the other functions the optimum is at X = 0; in all cases the optimum value is f(x) = 0. Fifty independent optimization runs are performed for each dimension of each function. The results are shown in Table 2, where Avg/Std is the average and standard deviation of the optimal fitness value over the 50 runs, iterAvg is the average number of evolution generations, and Ras is the ratio of the number of runs that reach the target value to the total number of runs. The desired value of function optimization is set as 1.0e-10; a fitness value less than 1.0e-10 is recorded as 0. As shown in Table 2, except for the Rosenbrock function, the optimization results of the other functions reach the given target value and the average number of evolution generations is also very small. For the Schaffer function the test is performed in 2 dimensions, while for the other functions the tests are performed from 10 to 30 dimensions. Compared with the standard PSO algorithm, both the convergence accuracy and the convergence speed of the mPSO algorithm are significantly improved, and the mPSO algorithm shows excellent stability and robustness.

In order to illustrate the relationship between particle activity and algorithm performance, the diversity of the particle swarm is used to indicate particle activity: the higher the diversity of the swarm, the greater the particle activity, and the stronger the global search capability of the particles. The diversity of the swarm is represented by the average distance of the particles, defined in terms of the Euclidean distance. Here L is the maximum diagonal length of the search space; S and N are the population size and the dimension of the solution space, respectively; p_id is the d-th dimension coordinate of particle i; and pbar_d is the average of the d-th dimension coordinate over all particles. The average distance of the particles is then defined as:

D(t) = (1 / (S * L)) * sum_{i=1..S} sqrt( sum_{d=1..N} (p_id - pbar_d)^2 )   (4)

For the 30-D functions (the Schaffer function is 2-D), the optimal fitness value and the particles' average distance are shown in Fig. 1-6, which contrast the optimization results of the mPSO and PSO algorithms on the different functions. As can be seen from Figures 1-6, except for the Rosenbrock function, the average distance of the particle swarm under mPSO varies considerably, which indicates high particle activity and good dynamic flight characteristics; this favors the global search by helping to avoid local minima. When a particle approaches the global extreme point, the amplitude of its fluctuation reduces gradually, and the particle then converges quickly to the global extreme point. The mPSO algorithm thus demonstrates high convergence accuracy and speed. In the corresponding plots for the PSO algorithm, the particles' average distance decreases steadily as the evolution generation increases, the fluctuation of the particles is weak, and the activity of the particles disappears little by little; this is reflected in the algorithm performance as slow convergence and the possibility of falling into a local minimum. Since weak fluctuation means very little swarm diversity, once the particles fall into a local minimum it is quite difficult for them to get out. The above experiments on the test functions show that the higher the diversity of the particle swarm, the greater the particle activity and the better its dynamic properties, which results in stronger optimization performance. Therefore, effectively controlling the activity of the particle swarm is a key step for PSO. Besides, from the optimization results of the mPSO algorithm shown in Table 2, it can be seen that, except for the Rosenbrock function, not only does the mean for the other functions reach the given target value, but the variance is also within the given target value, which shows that the mPSO algorithm is highly stable and performs better than the PSO algorithm. In addition, the figures also indicate that, for the optimization of the Rosenbrock function, whether the mPSO or the PSO algorithm is applied, the particles have high activity at the beginning, then quickly gather around an adaptive value, after which the particle swarm falls into a local minimum and loses its activity. Though the mPSO result for the Rosenbrock function is better than that of the standard PSO algorithm, it still does not escape the local minimum. Hence, further study is needed on PSO optimization of the Rosenbrock function.
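The average-distance measure can be computed directly from the particle positions. This sketch assumes the form of the diversity formula given above, with `L` supplied by the caller as the search-space diagonal; the function name is this example's own.

```python
import math

def swarm_diversity(X, L):
    """Average Euclidean distance of the particles from the swarm centroid,
    normalized by the population size S and the search-space diagonal L."""
    S = len(X)       # population size
    N = len(X[0])    # solution-space dimension
    centroid = [sum(x[d] for x in X) / S for d in range(N)]
    total = sum(math.sqrt(sum((x[d] - centroid[d]) ** 2 for d in range(N)))
                for x in X)
    return total / (S * L)
```

A value near 0 means the swarm has collapsed onto one point (the low-activity regime described above); larger values indicate a spread-out, active swarm.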

BP Neural Network
The Artificial Neural Network (ANN) is an engineering system that simulates the structure and intelligent activity of the human brain, based on knowledge of the brain's structure and operating mechanism. According to the manner of neuron interconnection, neural networks are divided into feedforward and feedback networks. According to the hierarchical structure, they are divided into single-layer and multi-layer networks. In terms of the manner of information processing, they are divided into continuous and discrete networks, deterministic and stochastic networks, or global and local approximation networks. According to the learning manner, they are divided into supervised and unsupervised learning, or weight learning and structure learning. There are several dozen neural network structures, such as the MLP, Adaline, BP, RBF and Hopfield networks. From a learning viewpoint, the feedforward neural network (FNN) is a powerful learning system with a simple structure that is easy to program. From a systems viewpoint, the feedforward network is a static nonlinear mapping, which achieves complex nonlinear processing through the composition of simple nonlinear processing units. As the core of the feedforward neural network, the BP network is the most essential part of the artificial neural network field. Owing to its clear mathematical meaning and steps, the back-propagation network and its variants are used in more than 80% of artificial neural network models in practice.

BP Network Algorithm Based on PSO
The BP algorithm is highly dependent on the initial connection weights of the network, and therefore tends to fall into local minima when the initial weights are improper. However, the optimization search of the BP algorithm is guided (in the direction of the negative gradient), which is an advantage over the PSO algorithm and other stochastic search algorithms; it thus provides a method for optimization with derivative information. The remaining problem is how to overcome the BP algorithm's dependence on the initial weights. The PSO algorithm is strongly robust with respect to the initial weights of the neural network (Wang Ling, 2001). Combining the PSO and BP algorithms can improve the precision, speed and convergence rate of the BP algorithm, making full use of the advantages of both: PSO excels at global search and BP excels at local optimization. Compared with traditional optimization problems, training a feedforward neural network involves many variables, a large search space and a complex optimization surface. In order to apply the PSO algorithm to the BP network with a given structure, the weight vector of the network is used to represent the FNN: each dimension of a particle represents one connection weight or threshold value of the FNN, and these particles constitute the individuals of the swarm. Taking an FNN with one input layer, one hidden layer and one output layer as an example, when the number of input nodes is R, the number of hidden nodes is S1 and the number of output nodes is S2, the dimension N of the particles can be obtained from formula (5):

N = (R + 1) * S1 + (S1 + 1) * S2   (5)

The dimensions of a particle and the weights of the FNN are related by a simple code conversion. When training the BP network with the PSO algorithm, the position vector X of the particle swarm is defined as the whole set of connection weights and threshold values of the BP network.
On the basis of the vector X, the individuals of the optimization process are formed, and the particle swarm is composed of these individuals. The method is as follows: first initialize the position vectors, then minimize the sum of squared errors (the adaptive value) between the actual output and the ideal output of the network; the optimal position is searched by the PSO algorithm. The error is given by formula (6):

E = sum_{i=1..N} sum_{k=1..C} (T_ik - Y_ik)^2   (6)

When the PSO algorithm is used to optimize the BP network weights (PSOBP), the method includes the following main steps:
(1) The position parameters of a particle are determined by the connection weights and the threshold values between the nodes of the neural network.
(2) Set the value range [Xmin, Xmax] of the connection weights in the neural network, generate corresponding uniform random numbers for the particle swarm, and thus generate the initial swarm.
(3) Evaluate the individuals in the swarm: decode each individual and assign the values to the corresponding connection weights (including the threshold values); introduce the learning samples to calculate the corresponding network output; then obtain the learning error E and use it as the individual's adaptive value.
(4) Execute the PSO operations on the individuals of the swarm.
(5) Judge whether the PSO operation terminates: if not, return to step (3); otherwise, go to step (6).
(6) Output the optimal weights.

The Levenberg-Marquardt (L-M) algorithm updates the weights as

Δw = -(J^T J + λI)^(-1) J^T e

where J is the Jacobian matrix of the network errors with respect to the weights, e is the error vector, I is the unit matrix, and λ is a non-negative value. By varying the magnitude of λ, the method moves smoothly between two extremes: the Gauss-Newton method (when λ → 0) and the standard gradient method (when λ → ∞). So the L-M algorithm is in fact a combination of the Gauss-Newton method and the gradient descent method, and has the advantages of both.
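The code conversion between a particle and the network parameters, together with the error of formula (6), can be sketched as follows. The helper names and the ordering of weights inside the vector are choices made for this example; the paper does not specify the exact layout.

```python
def particle_dim(R, S1, S2):
    """Formula (5): hidden and output weights plus one threshold per neuron."""
    return (R + 1) * S1 + (S1 + 1) * S2

def decode(x, R, S1, S2):
    """Split a particle position vector into (W1, b1, W2, b2):
    input-to-hidden weights, hidden thresholds, hidden-to-output
    weights, output thresholds (layout chosen for this sketch)."""
    i = 0
    W1 = [x[i + r * S1 : i + (r + 1) * S1] for r in range(R)]; i += R * S1
    b1 = x[i : i + S1]; i += S1
    W2 = [x[i + s * S2 : i + (s + 1) * S2] for s in range(S1)]; i += S1 * S2
    b2 = x[i : i + S2]
    return W1, b1, W2, b2

def sse(targets, outputs):
    """Formula (6): sum of squared errors over all samples and output nodes."""
    return sum((t - y) ** 2
               for T, Y in zip(targets, outputs)
               for t, y in zip(T, Y))
```

For a 2-2-1 network, `particle_dim(2, 2, 1)` gives 9, so each particle is a 9-dimensional vector whose fitness is the `sse` of the decoded network on the learning samples.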
The main idea of the combined PSO and L-M algorithm (the PSOLM algorithm) is to take the PSO algorithm as the main framework: first run the PSO optimization, and after the evolution of several generations, choose the optimal individual from the particle swarm and carry out the optimization search of the L-M algorithm for several steps, which performs an in-depth local search. The specific steps of the algorithm are as follows: (1) Generate the initial particle swarm X at random, and set k = 0.
(2) Perform the optimization search on X with the PSO algorithm.
(3) If the evolution generation k of the PSO is greater than a given constant dl, choose the optimal individual of the particle swarm and carry out the optimization search of the L-M algorithm for several steps. (4) Based on the returned individual, re-evaluate the new individual optima and the global optimum according to the PSO algorithm. (5) If the target function value meets the precision requirement ε, terminate the algorithm and output the result; otherwise, set k = k + 1 and return to step (2). The PSO algorithm used above is in fact the particle swarm optimization algorithm with mutation of the individuals' optimal locations (MPSO); the number of particles is 30, c1 = c2 = 1.45, and w = 0.728.
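The L-M refinement stage of the PSOLM loop can be illustrated on the 2-D Rosenbrock function, written in least-squares form (the same function the paper identifies as the hardest benchmark). This is a self-contained sketch: a fixed starting point stands in for the best particle handed over by the PSO stage, and the damping-update rule is a common textbook choice rather than the authors' exact schedule.

```python
def rosen_residuals(p):
    """Rosenbrock as a residual vector: f(p) = r1^2 + r2^2."""
    x, y = p
    return [10.0 * (y - x * x), 1.0 - x]

def rosen_jacobian(p):
    x, _ = p
    return [[-20.0 * x, 10.0],   # d r1 / d(x, y)
            [-1.0, 0.0]]         # d r2 / d(x, y)

def lm_refine(p, steps=20, lam=1e-3):
    """A few Levenberg-Marquardt steps: solve (J^T J + lam*I) dp = -J^T r.
    lam -> 0 behaves like Gauss-Newton, lam large like gradient descent."""
    p = list(p)
    for _ in range(steps):
        r = rosen_residuals(p)
        J = rosen_jacobian(p)
        # Normal equations, written out for the 2x2 case.
        A = [[sum(J[k][i] * J[k][j] for k in range(2)) + (lam if i == j else 0.0)
              for j in range(2)] for i in range(2)]
        g = [-sum(J[k][i] * r[k] for k in range(2)) for i in range(2)]
        det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
        if abs(det) < 1e-15:
            break
        dp = [(A[1][1] * g[0] - A[0][1] * g[1]) / det,
              (A[0][0] * g[1] - A[1][0] * g[0]) / det]
        new_p = [p[0] + dp[0], p[1] + dp[1]]
        f_old = sum(t * t for t in r)
        f_new = sum(t * t for t in rosen_residuals(new_p))
        if f_new < f_old:
            p, lam = new_p, lam * 0.5   # accept: move toward Gauss-Newton
        else:
            lam *= 10.0                 # reject: move toward gradient descent
    return p
```

In the full PSOLM loop, step (3) would call `lm_refine` on the decoded gbest particle and step (4) would re-insert the refined point into the swarm.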

XOR Problem
First, take the XOR problem (the 2-bit parity problem) as an example. The XOR problem is one of the classical problems in NN learning-algorithm research; its error surface is irregular and contains many local minima. The learning samples of the XOR problem are shown in Table 3.

Sample  Input  Output
1       00     0
2       01     1
3       10     1
4       11     0
Table 3. Learning samples of XOR

Different network structures result in different numbers of learning generations for a given precision 10^-n (where n is the accuracy index). In this part, the learning generations and the actual learning error are compared. The initial weights of the BP network range over [-1, 1], and 50 random experiments are conducted.
As shown in Table 4, which displays the experimental results for the 2-2-1 NN structure, the activation functions are the S-shaped hyperbolic tangent function (Tansig), the S-shaped logarithmic function (Logsig) and the linear function (Purelin), and the learning algorithms include BP, improved BP (BP with momentum, BPM) and BP based on Levenberg-Marquardt (BPLM). Judging from the results for the XOR problem, when the number of neurons in the hidden layer is 2, the BP and improved BP (BPM, BPLM) algorithms cannot converge in all of the 50 experiments. It can also be seen that the performance of the improved BP is better than that of the basic BP, and among the improved variants, BPLM performs better than BPM. In addition, the initial values have a great influence on the convergence of the BP algorithm, as does the functional form of the neurons in the output layer. As shown in Table 6, the algorithms combining PSO with BP or L-M have good convergence, which is hard to achieve with the BP (including BPLM) or PSO algorithm alone. It is especially worth noting that the combination of the PSO and L-M algorithms brings a very high convergence speed, and the PSOBPLM algorithm converges much faster than the PSOBP algorithm at high accuracy indices. For example, when the network structure is 2-2-1 and the accuracy index is 10 and 20 respectively, the mean time of the PSOBP algorithm is 8.31 and 13.37, while for the PSOBPLM algorithm the mean time is reduced to 0.73 and 1.97. Obviously, the PSOBPLM algorithm has excellent speed performance.

Parity Problem
The parity problem is one of the famous problems in neural network learning and is much more complex than the 2-bit XOR problem. The learning samples of the parity problem consist of 4- to 8-bit binary strings: when the number of 1s in the binary string is odd, the output value is 1; otherwise, the value is 0. When the PSO (including the improved PSO) and PSOBP algorithms are applied to the parity problem, the learning speed is quite low and convergence to the target value within the given number of iterations is impossible. The PSOBPLM algorithm proposed in this article is applied to test the 4- to 8-bit parity problems. The network structure for the 4-bit parity problem is 4-4-1, with Tansig and Logsig as the activation functions of the hidden and output layers respectively; the activation functions and the other network parameters for the 5- to 8-bit parity problems are obtained from those of the 4-bit network by analogy. For each parity problem, 50 random experiments are carried out. Table 7 shows the experimental results of the PSOBPLM algorithm for the 4- to 8-bit parity problems under various accuracy indices. As seen in Table 7, the integration of the PSO and L-M algorithms can solve the parity problem. The PSOBPLM algorithm makes full use of the advantages of PSO and L-M: PSO excels at global search and L-M excels at local optimization, so each compensates for the other's drawbacks. The PSOLM algorithm therefore not only converges well but also optimizes quickly.
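The learning samples for any bit width can be generated mechanically; for n = 2 this reproduces the XOR samples of Table 3. The function name is this example's own.

```python
def parity_samples(n_bits):
    """Learning samples of the n-bit parity problem: the input is a binary
    string of n bits; the target is 1 when the number of 1s is odd, else 0."""
    samples = []
    for v in range(2 ** n_bits):
        bits = [(v >> (n_bits - 1 - b)) & 1 for b in range(n_bits)]
        samples.append((bits, sum(bits) % 2))
    return samples
```

The 4-bit problem thus has 16 samples, and the 8-bit problem 256, which is what makes the error surface so much harder than XOR's.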

Conclusion
As a global evolutionary algorithm, PSO has a simple model and is easy to implement. The integrated algorithm makes full use of the advantages of both components: PSO excels at global search and L-M at fast local optimization, which helps it avoid local minima and find the global optimal solution of the parity problem effectively. Meanwhile, the PSOBPLM algorithm has good efficiency and robustness. The only shortcoming of the algorithm is that it needs derivative information, which increases the algorithm's complexity to some extent.
In formulas (1) and (2), v_id^k is the d-th dimension component of the velocity vector of particle i at iteration k; x_id^k is the d-th dimension component of the position vector of particle i at iteration k; p_id is the d-th dimension component of the optimal position (pbest) of particle i; p_gd is the d-th dimension component of the optimal position (gbest) of the swarm; w is the inertia weight; c1 and c2 are the acceleration constants; rand() is the random function, which generates a random number in [0, 1].

In formula (6), N is the number of samples in the training set; T_ik is the ideal output of output node k for sample i; Y_ik is the actual output of output node k for sample i; and C is the number of output neurons in the network.
In formula (1), i refers to the serial number of the particle.

Table 2. Performance comparison between mPSO and PSO on the benchmark problems

Table 4. Convergence statistics of BP, BPM and BPLM (accuracy index n = 3)

Table 5 shows the training results under different accuracy indices. The activation functions are Tansig-Purelin and Tansig-Logsig respectively, and the NN algorithms include BPLM and the PSO with constriction factor (cPSO; Clerc, M., 1999). The results indicate that the basic PSO, applied to the BP network for the XOR problem, cannot converge in all cases either. In these experiments, the number of neurons in the hidden layer is 2.

Table 5. BP training results of BPLM, cPSO and mPSO

Besides, the BP and improved BP algorithms never converge in the given number of experiments when the activation function of the output layer is Logsig, while the form of the activation function has relatively little influence on the PSO-based algorithms. It can be seen from the table that the form of the activation function has a certain influence on the learning speed of the PSO-based NN algorithms: the learning algorithm that adopts the Tansig-Logsig functions converges faster than the one that adopts Tansig-Purelin. Table 6 shows the optimization results of the PSOBP and PSOBPLM algorithms, which combine MPSO with the standard BP algorithm (PSOBP) and with the L-M-based BP algorithm (PSOBPLM), respectively. As seen in Table 6, for the given number of experiments, the optimization results of both algorithms reach the specified target value within the given iteration number.

Table 6. BP optimization results of the PSOBP and PSOBPLM algorithms

In addition, Table 6 also displays the average iteration number and the mean time of the PSO and BP stages under different accuracy indices over the 50 experiments.

Table 7. Results of the PSOBPLM algorithm for the 4- to 8-bit parity problems

In Table 7, Mean, Max and Min represent the average, maximum and minimum iteration numbers, respectively; the numbers in the PSO and BP columns represent the iteration numbers needed by the corresponding algorithm stage.