Fast evolutionary image processing using Multi-GPUs

In this paper, the authors propose a fast evolutionary image processing system. The authors apply a graphics processing unit (GPU) to Automatic Construction of Tree-structural Image Transformation (ACTIT) in order to reduce optimization time. In addition, the system computes in parallel on multiple GPUs for further speed. The optimization speed of the proposed system is several hundred times faster than that of ordinary ACTIT. Experimental results show that the proposed system is effective.


I. INTRODUCTION
For the realization of machine intelligence, image processing and recognition technologies are becoming more and more important. However, it is difficult to construct an image-processing procedure for each individual problem. Therefore, a general-purpose method that constructs image processing without depending on the problem is necessary.
Meanwhile, Evolutionary Computation [1][2][3] is widely applied to image processing [4]. Evolutionary Computation is a family of optimization algorithms inspired by the evolutionary processes of living things. The authors have already proposed a system that automatically constructs image-processing filters, Automatic Construction of Tree-structural Image Transformation (ACTIT) [5]. ACTIT approximates a target image-processing procedure by combining several image-processing filters, prepared in advance, into a tree structure by means of Genetic Programming (GP) [3], a kind of Evolutionary Computation. We have previously shown that ACTIT is an effective method for many problems.
However, ACTIT requires much computing time to optimize a tree-structural image transformation when it is applied to problems that use many large images. Therefore, fast evolutionary image processing is important. Processing can be made faster in several ways: improving the algorithm, implementing it on fast hardware, and parallel processing.
In this paper, we apply a Graphics Processing Unit (GPU) [6][7] to ACTIT as fast hardware in order to speed up the optimization of image processing. In addition, the system computes in parallel on multiple GPUs, which makes it faster still. We show experimentally that the optimization speed of the proposed method is faster than that of ordinary ACTIT.
This paper is organized as follows. Section 2 explains related work: ACTIT, General Purpose GPU (GPGPU), and parallel processing in Evolutionary Computation. Section 3 describes ACTIT using multiple GPUs, which is the system proposed in this paper. Section 4 shows experimentally that the proposed system is effective. Finally, Section 5 presents conclusions and future work.

A. ACTIT
Automatic Construction of Tree-structural Image Transformation (ACTIT) [5] is a study of image processing using Genetic Programming (GP) [3]. ACTIT automatically constructs a tree-structural image transformation by using GP to combine several image-processing filters prepared in advance, with reference to training image sets. An individual in GP is a tree-structural image transformation. It is composed of terminal nodes, which represent input images; non-terminal nodes, which represent image-processing filters of several kinds; and a root, which represents the output image. Fig. 1 shows the processing flow of the ACTIT system. We prepare training image sets consisting of several original images, their target images, and weight images that indicate the importance of each pixel. We set the parameters that GP uses to optimize the tree structure and give the training image sets to ACTIT. ACTIT then optimizes the tree-structural image transformation with GP. As a result, we obtain the optimized tree-structural image transformation that has the maximum fitness.
A tree-structural image transformation applies a fixed processing to images that share the same characteristics. If the constructed transformation is appropriate, we can expect similar results on images that have the same characteristics as the training images. We have previously shown that ACTIT is an effective method for many problems, for instance 2D image processing for defect detection and 3D medical image processing [8].
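As an illustration only, the following Python sketch shows how a tree-structural image transformation could be evaluated and scored against a target image. The two filters, the tuple encoding of the tree, and the fitness formula (one minus the weighted mean absolute difference) are assumptions made for this sketch and are not taken from the ACTIT implementation. Images are small lists of rows with pixel values in [0, 1].

```python
# Hypothetical sketch of a tree-structural image transformation.
# Images are lists of rows; pixel values are floats in [0, 1].

def invert(img):
    """One-input filter: invert every pixel."""
    return [[1.0 - p for p in row] for row in img]

def difference(a, b):
    """Two-input filter: absolute difference of two images."""
    return [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Illustrative filter set; ACTIT itself prepares many more filters.
FILTERS = {"invert": invert, "difference": difference}

def evaluate(tree, original):
    """Evaluate a tree. The string "I" is a terminal node (input image);
    a tuple (filter_name, child, ...) is a non-terminal node."""
    if tree == "I":
        return original
    name, *children = tree
    return FILTERS[name](*(evaluate(child, original) for child in children))

def fitness(output, target, weight):
    """Assumed fitness: 1 - weighted mean absolute difference.
    Requires the weights to sum to a positive value."""
    num = sum(w * abs(o - t)
              for ro, rt, rw in zip(output, target, weight)
              for o, t, w in zip(ro, rt, rw))
    den = sum(w for rw in weight for w in rw)
    return 1.0 - num / den

original = [[0.0, 1.0], [0.25, 0.75]]
target = invert(original)              # pretend the ideal processing is inversion
weight = [[1.0, 1.0], [1.0, 1.0]]      # every pixel equally important
tree = ("invert", "I")
score = fitness(evaluate(tree, original), target, weight)
```

With this encoding, GP would search over trees such as `("difference", "I", ("invert", "I"))` and keep the one whose score is highest.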

B. General Purpose GPU (GPGPU)
The computational power of GPUs on general graphics boards has been improving rapidly. The raw computational power per unit time of a GPU already exceeds that of a CPU. Earlier GPUs performed only fast, fixed CG processing. The latest GPUs, however, have graphics pipelines that we can freely program in order to perform complex CG processing. Consequently, research on using GPUs for general-purpose computation, known as General Purpose GPU (GPGPU), has become popular [9][10].
Fig. 2 shows the progress of the computational power of CPUs and GPUs over the last several years. Over this period, the raw computational power per unit time of the GPU has exceeded that of the CPU, and its yearly growth rate has exceeded that of the CPU as well.
Table I shows the history of GPGPU. Research on GPGPU has started only recently [9][10]. The NVIDIA GeForce 3 series GPU, which actually supports a programmable shader architecture, appeared in 2001. In 2002, NVIDIA released the high-level shader language Cg (C for graphics) [11] and a toolkit that includes its compiler. Cg is a 3D graphics language similar to C, which NVIDIA developed with Microsoft. Formerly, we had to write assembly language by hand to program a GPU. With Cg, we can now generate optimized code that makes the best use of NVIDIA GPUs. In addition, a GPGPU session was newly established in 2005 at SIGGRAPH, the CG conference sponsored by the Association for Computing Machinery (ACM).
GPU programming is decidedly different from CPU programming. For instance, a GPU does not have a random-access memory space that it can freely read and write during computation. A GPU has an architecture specialized for parallel processing. In other words, a GPU is a stream processor. Therefore, GPGPU is effective for applications that satisfy the following three conditions.
• The data to be processed are very large.
• There is little dependency between data elements.
• The data can be processed in a highly parallel manner.
GPGPU is therefore effective for matrix computation, image processing, physical simulation and so on. Recently, programming languages specialized for GPGPU, such as Sh, Scout and Brook [12], have been released. In addition, in 2006 NVIDIA released CUDA (Compute Unified Device Architecture), which runs general-purpose applications on the GPU. Thus GPU programming has become easier.

C. Parallel Processing in Evolutionary Computation
Many studies have demonstrated the performance of parallel GA (Genetic Algorithm) and GP [13][14]. The main parallel models in GA and GP are as follows.
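1) Island Model: In the Island model, the population in GA and GP is divided into subpopulations (islands). Each subpopulation is assigned to one of multiple processors, and normal genetic operators are applied to each subpopulation in parallel. In addition, individuals are exchanged between subpopulations (migration). In the Island model, each subpopulation is evaluated independently. Therefore, we expect the diversity of the whole population to be maintained. Fig. 3 shows the Island model.
2) Master-slave Model: In the Master-slave model, the fitness of individuals in GA and GP is calculated quickly in parallel. The Master-slave model is generally composed of one control node (master) and multiple calculation nodes (slaves). One control node performs the genetic operators, which consist of selection, crossover and mutation, and multiple calculation nodes share the fitness calculation of individuals, which dominates the computing time. Fig. 4 shows the Master-slave model.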
3) Parallel-MGG Model: The Parallel-MGG model [15] is based on the Master-slave model and is designed for fast processing. In the Parallel-MGG model, a control node sends two individuals as parents to each calculation node. Each calculation node updates the two individuals in parallel by using Minimal Generation Gap (MGG) [16]. The control node then receives two individuals of the next generation as children from each calculation node. In Parallel-MGG, the transport time between nodes is reduced because processing is asynchronous. Fig. 5 shows the Parallel-MGG model.
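The division of work in Parallel-MGG can be sketched in Python as follows. This is a simplified illustration under several assumptions. Individuals are bit lists rather than trees, and the fitness is a toy function. The MGG step keeps the best family member plus one roulette-selected member, which is the commonly described form of MGG. A thread pool stands in for the calculation nodes. Unlike the asynchronous model described above, this sketch waits for all nodes at the end of each round, and Python threads only illustrate the structure, not a real speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fitness(individual):
    """Toy fitness for illustration: fraction of 1-bits (not the ACTIT fitness)."""
    return sum(individual) / len(individual)

def mgg_step(parent_a, parent_b, seed, n_children=8):
    """One MGG update on a calculation node; returns the two replacements."""
    rng = random.Random(seed)
    children = []
    for _ in range(n_children):
        cut = rng.randrange(1, len(parent_a))      # one-point crossover
        child = parent_a[:cut] + parent_b[cut:]
        flip = rng.randrange(len(child))           # single-bit mutation
        child[flip] = 1 - child[flip]
        children.append(child)
    family = [parent_a, parent_b] + children
    best = max(family, key=fitness)                # elite survivor
    rest = [ind for ind in family if ind is not best]
    # Roulette-wheel choice of the second survivor
    # (the small offset avoids all-zero weights).
    weights = [fitness(ind) + 1e-9 for ind in rest]
    second = rng.choices(rest, weights=weights, k=1)[0]
    return best, second

def parallel_mgg_round(population, n_nodes, seed=0):
    """Control node: send one parent pair to each node, then write the children back.
    Requires len(population) >= 2 * n_nodes."""
    rng = random.Random(seed)
    picked = rng.sample(range(len(population)), 2 * n_nodes)
    pairs = [(picked[2 * k], picked[2 * k + 1]) for k in range(n_nodes)]
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        futures = [pool.submit(mgg_step, population[i], population[j], seed + 1 + k)
                   for k, (i, j) in enumerate(pairs)]
        results = [f.result() for f in futures]
    updated = list(population)
    for (i, j), (best, second) in zip(pairs, results):
        updated[i], updated[j] = best, second
    return updated
```

Because each family contains its own parents, the best fitness in the population never decreases from one round to the next in this sketch.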

A. ACTIT using GPU
ACTIT requires much computing time to optimize a tree-structural image transformation when it is applied to problems that use many large training image sets, because it repeatedly creates tree-structural image transformations and calculates their fitness. The image transformation part of ACTIT accounts for 99 percent of the whole computing time. Therefore, we implement the image-processing filters on the programmable graphics pipelines of the GPU in order to reduce optimization time.
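The 99 percent figure also bounds what accelerating that part alone could achieve. As a rough check using Amdahl's law, which is not part of the original analysis, the sketch below computes the overall speedup when a fraction of the run time is accelerated by a given factor. The function name and the sample factors are illustrative assumptions.

```python
def amdahl_speedup(accelerated_fraction, factor):
    """Overall speedup when `accelerated_fraction` of the run time
    becomes `factor` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)

# With 99 percent of the time in image transformation, the overall
# speedup approaches 1 / (1 - 0.99) = 100 as the filter speedup grows.
for factor in (10, 100, 1000):
    print(factor, round(amdahl_speedup(0.99, factor), 1))
```

Under this assumption, going well beyond a 100-fold overall gain on one processor would require moving the remaining work off the CPU as well.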
1) CPU and GPU Parts: Fig. 6 shows the processing flow on the CPU and GPU in the proposed system. First, in initialization, the system loads the training image sets and the image-processing filters, which are written in Cg and compiled for the GPU. The system performs the alternation-of-generations part, composed of the selection, crossover and mutation operators of GP, on the CPU. It performs the image transformation part and the fitness calculation part on the GPU.
In the image transformation part, the CPU tells the GPU, one by one, which image-processing filter of the tree-structural image transformation to execute and which image to apply it to. The GPU performs the tree-structural image transformation according to these instructions. In the fitness calculation part, the GPU calculates the fitness of each individual, that is, of each tree-structural image transformation, from the difference between the target image and the output image produced by the transformation. These processes are repeated until the fitness of all individuals updated in the iteration has been calculated. The CPU then reads back the fitness values from the GPU all at once.
The system repeats these processes until the fitness of the best individual becomes 1.0 or the iteration count reaches its maximum. Finally, we obtain the optimized tree-structural image transformation that has the maximum fitness. ACTIT becomes faster still by reducing the number of data transfers between the CPU and GPU: the training image sets are loaded once at the start, and the fitness values are returned all at once. Almost all of the processing that costs computing time is performed on the GPU.
2) Implementation on GPU: Programs written for the CPU cannot be applied to the GPU as they are, because the GPU has limitations that the CPU does not have. Therefore, we implement only simple image-processing filters on the GPU at this time. The image-processing filters implemented on the GPU are of the following kinds.
1. Calculation on the current and neighboring pixels (Mean Filter, Sobel Filter and so on).
2. Calculation on two images (Difference Filter and so on).
3. Calculation of the average, maximum or minimum value over the whole image (Binarization with Average Value, Linear Transformation of Histogram and so on).
We calculate the average, maximum and minimum values over the whole image quickly with parallel reductions.
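The idea of a parallel reduction can be sketched in Python as below. This is a sequential emulation under simplifying assumptions: the image is flattened to a one-dimensional list, and each loop pass stands for one parallel step in which every pair is combined independently, so the data shrinks by half per pass. A real GPU implementation would typically reduce two-dimensional blocks of a texture instead, and the function name is hypothetical.

```python
def parallel_reduce(values, op):
    """Pairwise (tree) reduction. Each pass combines element i with
    element i + half; all combinations in a pass are independent,
    so a GPU could perform them simultaneously."""
    data = list(values)
    if not data:
        raise ValueError("cannot reduce an empty image")
    while len(data) > 1:
        half = (len(data) + 1) // 2
        data = [op(data[i], data[i + half]) if i + half < len(data) else data[i]
                for i in range(half)]
    return data[0]

pixels = [3, 1, 4, 1, 5, 9, 2, 6]                     # a flattened toy image
average = parallel_reduce(pixels, lambda a, b: a + b) / len(pixels)
maximum = parallel_reduce(pixels, max)
minimum = parallel_reduce(pixels, min)
```

For n pixels the number of passes grows with log2(n) rather than n, which is why whole-image statistics remain cheap on parallel hardware.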

B. Proposed Parallel Model
We run ACTIT using GPU in parallel on multiple GPUs for faster processing. Parallel processing is effective for ACTIT because the parallelizable part of ACTIT accounts for most of the whole computing time. Fig. 7 shows ACTIT using Multi-GPUs. The proposed system is composed of multiple PCs, each having one GPU. The factors that prevent fast processing are synchronization time and transport time. There is no synchronization time because Parallel-MGG processes asynchronously. In addition, we improve Parallel-MGG in order to reduce transport time. In the new Parallel-MGG, a waiting buffer is located in each calculation node, and the next individuals are sent to the waiting buffer in advance. Because of the waiting buffer, the next processing starts as soon as the previous processing has finished.
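The effect of the waiting buffer can be illustrated with a simple timing model. The sketch below is an assumption-laden simplification, not a measurement: it considers one calculation node, a fixed transfer time and a fixed computing time per job, a buffer that holds one job, and it ignores the time needed to return results. The function name and the sample times are hypothetical.

```python
def node_time(n_jobs, transfer, compute, buffered):
    """Total time for one calculation node to finish n_jobs (simplified model)."""
    if n_jobs == 0:
        return 0.0
    if not buffered:
        # Each job waits for its own transfer before computing starts.
        return n_jobs * (transfer + compute)
    # With a one-slot waiting buffer, the transfer of the next job
    # overlaps the computation of the current job.
    return transfer + (n_jobs - 1) * max(transfer, compute) + compute

without_buffer = node_time(4, transfer=1.0, compute=3.0, buffered=False)
with_buffer = node_time(4, transfer=1.0, compute=3.0, buffered=True)
```

In this model the buffer hides the transfer time completely whenever computing a job takes at least as long as transferring the next one.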

A. Experimental Setting
We compare the optimization speed of the proposed system with that of ordinary ACTIT. The proposed system is composed of five PCs (one server and four clients) connected by a LAN. Fig. 8 shows the exterior of the system.
Table II shows the specification of the PCs. An Intel Core 2 Duo E6400 CPU and an NVIDIA GeForce 7900 GS GPU are used in these experiments. We program the GPU by using OpenGL and Cg.
We implement thirty-seven kinds of simple image-processing filters, each with one or two inputs and one output. The GPU can calculate four planes (red, green, blue and alpha) at the same time. Therefore, we prepare four training image sets. The image sizes are 64 × 64, 128 × 128, 256 × 256, 512 × 512 and 1024 × 1024. General values are used for the GP parameters. The generation alternation model used by GP is Minimal Generation Gap (MGG) [16].
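The reason for preparing four training image sets can be sketched as follows. Four grayscale images are packed into the red, green, blue and alpha planes of one image, so a single filter pass processes all four at once. The Python below only emulates this packing on the CPU; the function names and the inversion filter are illustrative assumptions, not the Cg code used in the experiments.

```python
def pack_rgba(img_r, img_g, img_b, img_a):
    """Pack four grayscale images of equal size into one image of 4-tuples."""
    return [list(zip(*rows)) for rows in zip(img_r, img_g, img_b, img_a)]

def invert_rgba(packed):
    """Apply one filter pass to all four planes of every pixel."""
    return [[tuple(1.0 - c for c in px) for px in row] for row in packed]

def unpack_rgba(packed):
    """Split a packed image back into its four grayscale planes."""
    return [[[px[k] for px in row] for row in packed] for k in range(4)]

# Four 1 x 2 toy training images, one per plane.
images = [[[0.0, 1.0]], [[0.25, 0.75]], [[0.5, 0.5]], [[1.0, 0.0]]]
packed = pack_rgba(*images)
results = unpack_rgba(invert_rgba(packed))
```

Here `results[k]` is the filtered version of `images[k]`, obtained from one pass over the packed data rather than four separate passes.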

B. Experimental Results
1) Comparison of Ordinary ACTIT and ACTIT using One GPU: First, we compared the optimization speed of ACTIT using one GPU with that of ordinary ACTIT. Fig. 9 and Table III show the experimental results. The horizontal axis denotes image size and the vertical axis denotes optimization speed. In Fig. 9 and Table III, values are optimization speeds relative to that of ordinary ACTIT on 64 × 64 images, which is taken as 1.0.
The proposed method was about 10 times faster than ordinary ACTIT when the images were small and about 100 times faster when the images were large. It is well known that a GPU is effective when it processes large data. Therefore, the proposed method is very effective, because real problems almost always use many large training image sets.
Next, we conducted an experiment to examine the influence of data transport and synchronization time between the CPU and GPU. Fig. 10 shows the details of the processing time.
"ACTIT using GPU (Load and Read)" loads and reads back images whenever it calculates the fitness of an individual. Loading and reading back images affects the performance. The results show that in the proposed method almost all of the processing that costs large computing time was performed on the GPU.
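2) Parallelization of ACTIT using GPU: We compared the optimization speed of ACTIT using multiple GPUs with that of ACTIT using one GPU. The parallel models are Master-slave, Parallel-MGG, and Parallel-MGG with waiting buffer. The number of GPUs is 1 to 4. Only one image size is used. Fig. 11 and Table IV show the experimental results. The horizontal axis denotes the number of processors and the vertical axis denotes optimization speed. In Fig. 11 and Table IV, values are optimization speeds relative to that of ordinary ACTIT on images of this size, which is taken as 1.0. Parenthetic values are optimization speeds relative to that of ACTIT using one GPU on images of this size, which is taken as 1.0. With the proposed parallel model, ACTIT using four GPUs was about 3.8 times faster than ACTIT using one GPU and about 360 times faster than ordinary ACTIT. We showed experimentally that the proposed parallel method is efficient.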
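The two reported ratios can be related by simple arithmetic. The short Python sketch below is only a reading aid: it uses the rounded figures stated above (3.8 and 360) as inputs, and the function names are hypothetical. Parallel efficiency is taken here as speedup divided by the number of processors.

```python
def parallel_efficiency(speedup, n_processors):
    """Fraction of the ideal linear speedup that was achieved."""
    return speedup / n_processors

def implied_single_gpu_speedup(total_speedup, multi_gpu_speedup):
    """Speedup of one GPU over ordinary ACTIT implied by the two ratios."""
    return total_speedup / multi_gpu_speedup

efficiency = parallel_efficiency(3.8, 4)               # about 0.95
single_gpu = implied_single_gpu_speedup(360.0, 3.8)    # roughly 95
```

An efficiency of about 0.95 on four GPUs indicates that little time is lost to transport and synchronization, and the implied single-GPU factor is in line with the roughly 100-fold gain reported for large images.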
Tree-structural Image Transformation (I: Input Image, O: Output Image, F_i: i-th Image-processing Filter). Application of Optimized Tree-structural Image Transformation to Non-training Image.

Fig. 2. The computational power of CPU and GPU in recent years.

Fig. 8. The exterior of the system.

Fig. 11. Horizontal axis: number of processors; vertical axis: optimization speed. Parenthetic values are optimization speeds relative to that of ACTIT using one GPU on images of the size used in this experiment, which is taken as 1.0.

TABLE I

TABLE II
THE SPECIFICATION OF PC.

TABLE IV
EXPERIMENTAL RESULTS OF ACTIT USING MULTIPLE GPUS.
Values are optimization speeds relative to that of ordinary ACTIT on images of the size used in this experiment, which is taken as 1.0.