Fuzzy System with Positive and Negative Rules


Thanh Minh Nguyen and Q. M. Jonathan Wu

Introduction
The rule base of a traditional fuzzy system typically contains only positive rules (rules whose weight is positive). In this case, mining algorithms search only for positive associations such as "IF A Then do B", while negative associations such as "IF A Then do not do B" are ignored.
The concept of fuzzy sets was introduced by Zadeh in 1965 as a mathematical tool for modelling partial membership. Since then, fuzzy set theory (Zadeh, 1973) has found a promising field of application in image processing, as fuzziness is an intrinsic property of images and a natural outcome of many image processing techniques. The interest in fuzzy rule-based models arises from the fact that they provide a good platform for dealing with noisy, imprecise or incomplete information, which the human cognitive system often handles with ease. In a fuzzy system, we can generate fuzzy rule-bases of one of the following three types:

(a) Fuzzy rules with a class in the consequent (Abe & Thawonmas, 1997; Gonzalez & Perez, 1998). This kind of rule has the following structure:

$$\text{Rule } r:\ \text{IF } x_1 \text{ is } A_{r1} \text{ and } \dots \text{ and } x_N \text{ is } A_{rN}\ \text{Then } y \text{ is class } C_m \tag{1}$$

where $x = (x_1, \dots, x_N)$ is an N-dimensional pattern, $A_{rn}$, $n = 1, 2, \dots, N$, is an antecedent fuzzy set, and $y$ is the class $C_m$ to which the pattern belongs.

(b) Fuzzy rules with a class and a rule weight in the consequent (Ishibuchi et al., 1992; Ishibuchi & Nakashima, 2001):

$$\text{Rule } r:\ \text{IF } x_1 \text{ is } A_{r1} \text{ and } \dots \text{ and } x_N \text{ is } A_{rN}\ \text{Then } y \text{ is class } C_m \text{ with } W_r \tag{2}$$

where $W_r$ is the rule weight, a real number in the unit interval [0,1].

(c) Fuzzy rules with a rule weight for every class in the consequent (Pal & Mandai, 1992; Mandai & Murthy, 1992; Ishibuchi & Yamamoto, 2005):

$$\text{Rule } r:\ \text{IF } x_1 \text{ is } A_{r1} \text{ and } \dots \text{ and } x_N \text{ is } A_{rN}\ \text{Then } y \text{ is class } C_1 \text{ with } W_{r1} \text{ and } \dots \text{ and } y \text{ is class } C_M \text{ with } W_{rM} \tag{3}$$

where $W_{rm}$, $m = 1, 2, \dots, M$, is the rule weight for class $C_m$.

From Eq. (1), Eq. (2) and Eq. (3), we can see that the rule-base of a typical fuzzy system contains only positive rules (weight is positive). This is one of the limitations of a traditional association mining algorithm (Han, 2006). In this case, mining algorithms search only for positive associations such as "IF A Then do B", while negative associations such as "IF A Then do not do B" are ignored.

In addition to the positive rules, negative rules (weight is negative) can provide valuable information. For example, a negative rule can guide the system away from situations to be avoided; once these areas have been avoided, the positive rules take over again and direct the process. Interestingly, very few papers have focused on negative association rules, owing to the difficulty of discovering them. Although some researchers point out the importance of negative associations (Brin & Silverstein, 1997), only a few groups (Savasere et al., 1998; Wu et al., 2002; Teng et al., 2002) have proposed systems to mine these types of associations. This indicates not only the novelty of using negative association rules, but also the challenge of discovering them.
In this chapter, we propose a new fuzzy rule-based system for image classification problems. A significant advantage of the proposed system is that each fuzzy rule can represent more than one class. Moreover, while traditional fuzzy systems consider positive fuzzy rules only, in this chapter we combine negative fuzzy rules with traditional positive ones in a single fuzzy inference system. This new approach has been tested on image classification problems with promising results.

Positive and Negative Association Rules
Fuzzy systems can be broadly categorized into two families. The first includes linguistic models based on a collection of fuzzy rules whose antecedents and consequents both use fuzzy values. The Mamdani model (Mamdani et al., 1975) falls into this group. The second category, based on Sugeno-type systems (Takagi & Sugeno, 1985), uses a rule structure with fuzzy antecedents and functional consequent parts.

A typical rule in the rule-base of a fuzzy system is of the "IF-Then" type, i.e., "IF A then do B", where A is the premise of the rule and B is the consequent of the rule. This type of rule is called a positive rule (weight is positive) because the consequent prescribes something that should be done, or an action to be taken. Another type of reasoning, which has not been exploited much, involves negative rules (weight is negative), which prescribe actions to be avoided. Thus, in addition to the positive rules, it is possible to augment the rule-base with rules of the form "IF A, Then do not do B". Let us consider the following two fuzzy IF-Then rules: (4)

In the example above, the negative rule (rule 1) guides the system away from situations to be avoided, after which the positive rule (rule 2) takes over and directs the process. Depending on the probability of such an association, marketing personnel can develop better planning of the shelf space in the store, or can base their discount strategies on correlations that can be found in the data itself. In some situations (Branson & Lilly, 1999; Branson & Lilly, 2001; Lilly, 2007), a combination of positive and negative rules can form a more efficient fuzzy system. One of the limitations of the fuzzy IF-Then rules in Eq. (4) is that the two classes (Coke, bottled water) appearing in the consequent parts of the rules have the same degree of importance. Clearly, to help the marketing personnel develop better planning of different …

Based on these considerations, we propose a new adaptive fuzzy system that applies to the image classification problem (Thanh & Jonathan, 2008). The main advantage of this fuzzy model is that every fuzzy rule in its rule-base can describe more than one class. Moreover, it combines both positive and negative rules in its structure. This approach is expressed by:

$$\text{Rule } r:\ \text{IF } x_{1k} \text{ is } A_{r1} \text{ and } \dots \text{ and } x_{Nk} \text{ is } A_{rN}\ \text{Then } y_k \text{ is class } C_1 \text{ with } W_{r1} \text{ and } \dots \text{ and } y_k \text{ is class } C_M \text{ with } W_{rM} \tag{5}$$

where $W_{rm}$, $r = 1, 2, \dots, R$, $m = 1, 2, \dots, M$, is the weight of class $C_m$ in rule $r$. We use a rule weight of the form:

$$W_{rm} = w_{rm0} + w_{rm1} x_{1k} + \dots + w_{rmN} x_{Nk} \tag{6}$$

where the parameters $w_{rml}$, $l = 0, 1, \dots, N$, are determined by the least squares estimator, which is discussed in detail in the following sections. R, M, K, and N denote the number of fuzzy rules, the number of classes, the number of patterns and the dimension of the patterns, respectively. Classes are denoted by $C_1, C_2, \dots, C_M$, and the N-dimensional pattern is denoted by $x_k = (x_{1k}, x_{2k}, \dots, x_{Nk})$, $k = 1, 2, \dots, K$.

Consider the multiple-input, multiple-output (MIMO) fuzzy system in Eq. (5), which is similar to Takagi-Sugeno fuzzy models (Takagi & Sugeno, 1985; Purwar et al., 2005). The m-th output of the MIMO system with product inference, centroid defuzzifier and Bell membership functions is given by:

$$y_{km} = \sum_{r=1}^{R} \bar{\beta}_r(x_k)\, W_{rm} \tag{7}$$

where the normalized degree of activation of the r-th rule is

$$\bar{\beta}_r(x_k) = \frac{\beta_r(x_k)}{\sum_{i=1}^{R} \beta_i(x_k)} \tag{8}$$

The fuzzy set $A_{rn}(x_{nk})$ and the corresponding rule weight $W_{rm}$ are discussed in detail in the following section. The output of the classifier is determined by the winner-take-all strategy shown in Eq. (9), whereby "$x_k$ will belong to the class with the highest activation":

$$x_k \in \text{class } C_{m^*}, \qquad m^* = \arg\max_{1 \le m \le M} y_{km} \tag{9}$$
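The effect of a negative rule weight under the winner-take-all strategy can be illustrated with a minimal sketch. It assumes the class output is the sum of normalized rule activations multiplied by the rule weights, as in a Takagi-Sugeno system; the activation and weight values are hypothetical and chosen only for illustration.

```python
def normalize(activations):
    """Divide each rule activation by the sum over all rules."""
    total = sum(activations)
    return [b / total for b in activations]


def class_outputs(activations, weights):
    """Weighted sum of normalized activations; weights[r][m] is the weight of class m in rule r."""
    beta_bar = normalize(activations)
    n_classes = len(weights[0])
    return [
        sum(beta_bar[r] * weights[r][m] for r in range(len(weights)))
        for m in range(n_classes)
    ]


def winner_take_all(outputs):
    """Index of the class with the highest output."""
    return max(range(len(outputs)), key=lambda m: outputs[m])


# Hypothetical example: two rules, two classes.
activations = [0.8, 0.2]
weights = [
    [0.5, -0.9],  # rule 0 supports class 0 and argues against class 1
    [0.1, 0.6],   # rule 1 mildly supports both classes
]
outputs = class_outputs(activations, weights)
predicted = winner_take_all(outputs)
```

Because rule 0 fires strongly and carries a negative weight for class 1, the output for class 1 is pushed below that of class 0, so class 0 wins even though rule 1 favours class 1.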

Structure of the proposed fuzzy system
So far, our discussion has focused on the class estimate in Eq. (9), which determines the class to which the pattern $x_k$ is assigned. In this section, we suggest a new adaptive fuzzy system that can automatically adjust the values of the fuzzy sets $A_{rn}(x_{nk})$ and the rule weights $W_{rm}$. After training the fuzzy system, we can determine which class the pattern $x_k$ should be assigned to.
The proposed structure consists of two visible layers (the input and output layers) and three hidden layers, as shown in Fig. 1. This fuzzy system can be expressed as a directed graph corresponding to Eq. (7).

Layer 1 (Input layer): Each node in this layer only transmits the input $x_{nk}$, $n = 1, 2, \dots, N$, $k = 1, 2, \dots, K$, directly to the next layer. No computation is performed in this layer. There are a total of N nodes in this layer, where the output of each node is $O_{1n} = x_{nk}$.

Layer 2: The number of nodes in this layer is equal to the number of fuzzy rules. Each node in this layer has N inputs from the N nodes of the input layer, and feeds its output to the corresponding node of layer 3. One of the major disadvantages of the Anfis model (Jang et al., 1997) is that an explosion in the number of inference rules limits the number of possible inputs. Thus, grid partitioning is not advised when the input dimension is more than six (Nayak et al., 2004). To overcome this problem, a fuzzy scatter partition is used in this layer. Therefore, our system can work well even when the dimension of the pattern (N) is high.

We use the bell type distribution defined over an N-dimensional pattern $x_k$ for each node in this layer. The degree of activation of the r-th rule $\beta_r(x_k)$ with the antecedent part $A_r = (A_{r1}, \dots, A_{rN})$ is expressed as follows:

$$\beta_r(x_k) = \prod_{n=1}^{N} A_{rn}(x_{nk}) = \prod_{n=1}^{N} \frac{1}{1 + \left| \dfrac{x_{nk} - c_{rn}}{a_{rn}} \right|^{2 b_{rn}}} \tag{10}$$

where the parameters $a_{rn}$, $b_{rn}$, $c_{rn}$, $r = 1, 2, \dots, R$, $n = 1, 2, \dots, N$, are constants that characterize the value of $\beta_r(x_k)$. The optimal values of these parameters are determined by training, which is discussed in the next section. There are R distribution nodes in this layer, where each node has 3xN parameters. The output of each node in this layer is $O_{2r} = \beta_r(x_k)$.

Layer 3: This layer performs the normalization operation. The output of each node in this layer is represented by:

$$O_{3r} = \bar{\beta}_r(x_k) = \frac{\beta_r(x_k)}{\sum_{i=1}^{R} \beta_i(x_k)} \tag{11}$$

Layer 4: Each node of this layer represents the rule weight in Eq. (6), $W_{rm} = w_{rm0} + w_{rm1} x_{1k} + \dots + w_{rmN} x_{Nk}$, where the parameters $w_{rml}$, $r = 1, 2, \dots, R$, $m = 1, 2, \dots, M$, $l = 0, 1, \dots, N$, are determined by the least squares estimator, which is discussed in the next section. In the proposed model, the output of the classifier for pattern $x_k$ is determined by the winner-take-all strategy. Therefore, when the rule weight $W_{rm}$ has a negative value, it narrows the choices for class $C_m$ (the more negative the value of $W_{rm}$, the smaller the value of $y_{km}$ in Eq. (13)). In other words, a negative rule weight prescribes actions to be avoided rather than performed. The output of each node in this layer is:

$$O_{4rm} = \bar{\beta}_r(x_k)\, W_{rm} \tag{12}$$

There are MxR nodes in this layer, where each node has (1+N) parameters.

Layer 5 (Output layer): Each node in the output layer determines the value of $y_{km}$ in Eq. (7):

$$O_{5m} = y_{km} = \sum_{r=1}^{R} O_{4rm} \tag{13}$$

There are M nodes in the output layer.
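The five layers described above can be sketched as a single forward pass. This is a minimal illustration, not the authors' implementation: it assumes the standard generalized bell membership function with parameters (a, b, c) combined by product inference, and the names `premise` and `consequent` as well as all numeric values are hypothetical.

```python
import math


def bell(x, a, b, c):
    """Generalized bell membership value (assumed form); requires a != 0."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2.0 * b))


def forward(x, premise, consequent):
    """
    x          : list of N input values (layer 1 passes them through unchanged)
    premise    : premise[r][n] = (a, b, c) for rule r and input n
    consequent : consequent[r][m] = [w0, w1, ..., wN] for rule r and class m
    Returns (outputs, beta_bar): the M class outputs and the R normalized activations.
    """
    # Layer 2: degree of activation of each rule (product of memberships).
    beta = [
        math.prod(bell(xn, a, b, c) for xn, (a, b, c) in zip(x, rule))
        for rule in premise
    ]
    # Layer 3: normalization.
    total = sum(beta)
    beta_bar = [value / total for value in beta]
    # Layers 4 and 5: linear rule weights, then summation over rules.
    n_classes = len(consequent[0])
    outputs = []
    for m in range(n_classes):
        y = 0.0
        for r, bb in enumerate(beta_bar):
            w = consequent[r][m]
            rule_weight = w[0] + sum(wl * xn for wl, xn in zip(w[1:], x))
            y += bb * rule_weight
        outputs.append(y)
    return outputs, beta_bar


# Hypothetical system: N = 2 inputs, R = 2 rules, M = 2 classes.
premise = [
    [(1.0, 1.0, 0.0), (1.0, 1.0, 1.0)],  # rule 0 centred at (0, 1)
    [(1.0, 1.0, 1.0), (1.0, 1.0, 0.0)],  # rule 1 centred at (1, 0)
]
consequent = [
    [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],  # rule 0: for class 0, against class 1
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],   # rule 1: for class 1
]
outputs, beta_bar = forward([0.0, 1.0], premise, consequent)
```

The input sits exactly at the centre of rule 0, so that rule dominates after normalization and its negative weight drives the class 1 output below zero.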

Parameter Learning
The goal of the work presented here is to perform parameter learning that minimizes the sum-squared error with respect to the parameters $\Theta = [a_{rn}, b_{rn}, c_{rn}, w_{rml}]$. The objective function $E(\Theta)$ for all the training data is defined as:

$$E(\Theta) = \sum_{k=1}^{K} \sum_{m=1}^{M} \left( y_{km} - y_{dkm} \right)^2 \tag{14}$$

where $y_{km}$ is the output of class m obtained from Eq. (7). For a training data pair $\{x_k, y_{dk}\}$, the input is $x_k = (x_{1k}, x_{2k}, \dots, x_{Nk})$, $k = 1, 2, \dots, K$, and the desired output $y_{dk}$ is of the form:

$$y_{dk} = (y_{dk1}, y_{dk2}, \dots, y_{dkM}) = \begin{cases} (1, 0, \dots, 0), & x_k \in \text{class } C_1 \\ (0, 1, \dots, 0), & x_k \in \text{class } C_2 \\ \quad\vdots \\ (0, 0, \dots, 1), & x_k \in \text{class } C_M \end{cases} \tag{15}$$

Once the initial structure has been identified with N inputs, R rules and M classes, the fuzzy system performs parameter identification to tune the parameters of the existing structure. To minimize the sum-squared error $E(\Theta)$, a two-phase hybrid parameter learning algorithm (Jang et al., 1997; Wang et al., 1999; Wang & George Lee, 2002; Lee & Lin, 2004) is applied with a given network structure. In hybrid learning, each iteration is composed of a forward and a backward pass. In the forward pass, after the input pattern is presented, we calculate the node outputs in the network layers. In this step, the parameters $a_{rn}$, $b_{rn}$, and $c_{rn}$ in layer 2 are fixed, and the parameters $w_{rml}$ in layer 4 are identified by the least squares estimator. In the backward pass, the error signal propagates from the output towards the input nodes. In this step, the $w_{rml}$ are fixed, and the error signals are propagated backward to update $a_{rn}$, $b_{rn}$ and $c_{rn}$ by the steepest descent method. This process is repeated until the system converges.

The parameters $w_{rml}$ in layer 4 are optimized with the least-squares algorithm in the forward step. To minimize the error $E(\Theta)$ in Eq. (14), we have to minimize each output error (m-th output):

$$E_m = \sum_{k=1}^{K} \left( y_{km} - y_{dkm} \right)^2 \tag{16}$$

When the training pattern $x_k$ is fed into the fuzzy system, Eq. (13) can be written as:

$$y_{km} = \sum_{r=1}^{R} \bar{\beta}_r(x_k) \left( w_{rm0} + w_{rm1} x_{1k} + \dots + w_{rmN} x_{Nk} \right) \tag{17}$$

For all training patterns, we have K equations of the form of Eq. (17). Thus, Eq. (16) can be expressed as:

$$E_m = \left\| Y_m - A\, W_m \right\|^2 \tag{18}$$

where $W_m$, $Y_m$, and $A$ are matrices of size ((N+1)*R)x1, Kx1, and Kx((N+1)*R), respectively:

$$W_m = \left[ w_{1m0}, \dots, w_{1mN}, \dots, w_{Rm0}, \dots, w_{RmN} \right]^T, \qquad Y_m = \left[ y_{d1m}, \dots, y_{dKm} \right]^T$$

and the k-th row of $A$ is $\left[ \bar{\beta}_1(x_k),\ \bar{\beta}_1(x_k) x_{1k},\ \dots,\ \bar{\beta}_1(x_k) x_{Nk},\ \dots,\ \bar{\beta}_R(x_k),\ \bar{\beta}_R(x_k) x_{1k},\ \dots,\ \bar{\beta}_R(x_k) x_{Nk} \right]$.

Next, we apply the linear least-squares algorithm (Jang et al., 1997) to each output (m-th output) to tune the parameters $w_{rml}$:

$$W_m = \left( A^T A \right)^{-1} A^T Y_m$$

After the forward pass, error signals are propagated backward to update the premise parameters $a_{rn}$, $b_{rn}$ and $c_{rn}$ by gradient descent on the error function $E(\Theta)$ in Eq. (14). The learning rule is given by:

$$a_{rn}(t+1) = a_{rn}(t) - \eta \frac{\partial E}{\partial a_{rn}}, \qquad b_{rn}(t+1) = b_{rn}(t) - \eta \frac{\partial E}{\partial b_{rn}}, \qquad c_{rn}(t+1) = c_{rn}(t) - \eta \frac{\partial E}{\partial c_{rn}}$$

where $\eta$ is the learning rate. The formulae used to update the parameters $a_{rn}$, $b_{rn}$ and $c_{rn}$ are given in the Appendix.
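The forward-pass estimation of the consequent parameters can be sketched as an ordinary least-squares fit. The sketch below is one possible realization under stated assumptions: the premise parameters are held fixed, the memberships take the generalized bell form, and the fit solves the normal equations with a small Gaussian elimination routine. The helper names and the data are hypothetical, and a production implementation would more likely use a numerically robust solver.

```python
def bell(x, a, b, c):
    """Generalized bell membership value (assumed form)."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2.0 * b))


def normalized_activations(x, premise):
    """Normalized rule activations for one pattern; premise[r][n] = (a, b, c)."""
    beta = []
    for rule in premise:
        value = 1.0
        for xn, (a, b, c) in zip(x, rule):
            value *= bell(xn, a, b, c)
        beta.append(value)
    total = sum(beta)
    return [v / total for v in beta]


def design_row(x, beta_bar):
    """One row of the design matrix: each rule contributes beta_bar * [1, x1, ..., xN]."""
    row = []
    for bb in beta_bar:
        row.append(bb)
        row.extend(bb * xn for xn in x)
    return row


def solve_linear(matrix, rhs):
    """Solve matrix * sol = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    sol = [0.0] * n
    for i in reversed(range(n)):
        residual = aug[i][n] - sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = residual / aug[i][i]
    return sol


def least_squares(A, y):
    """Minimize ||y - A w||^2 through the normal equations (A^T A) w = A^T y."""
    cols = len(A[0])
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(cols)] for i in range(cols)]
    Aty = [sum(row[i] * yk for row, yk in zip(A, y)) for i in range(cols)]
    return solve_linear(AtA, Aty)


# Hypothetical check: N = 1 input, R = 2 rules, one output; recover known weights.
premise = [[(0.5, 1.0, 0.0)], [(0.5, 1.0, 1.0)]]
patterns = [[-0.5], [0.0], [0.25], [0.5], [0.75], [1.0], [1.5], [2.0]]
true_w = [0.2, -1.0, 0.7, 0.5]  # [w_10, w_11, w_20, w_21]
A = [design_row(x, normalized_activations(x, premise)) for x in patterns]
targets = [sum(a * w for a, w in zip(row, true_w)) for row in A]
w_hat = least_squares(A, targets)
```

Because the targets are generated from `true_w` without noise, the fit should recover those weights up to rounding error; with real training data the same call would be repeated once per output class.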

Simulation Results
In the first set of simulations, the proposed method is compared with Fuzzy C-Means (Höppner et al., 1999), the K-Means algorithm (Dubes, 1993), the Feedforward Backpropagation Network (Schalkoff & Robert, 1997; Russell et al., 2003) and Anfis methods (Jang, 1991; Jang, 1993; Russell et al., 1997). The performance of our classifier system is demonstrated on a SAR image and a natural image.
To test the effectiveness of our proposed method, in the next set of simulations the fuzzy system is used to detect the edges of an image that is significantly degraded by high noise. The proposed system is compared with other edge-detection methods: Prewitt (Prewitt, 1970), Roberts (Roberts, 1965), LoG (Marr & Hildreth, 1980), Sobel (Sobel, 1970), and Canny (Canny, 1986).

SAR Image Classification
The JPL L-band polarimetric SAR image (size: 1024x900 pixels) of San Francisco Bay (Tzeng & Chen, 1998; Khan & Yang, 2005; Khan et al., 2007), shown in Fig. 2(a), is used for this simulation. The goal is to train the fuzzy system to classify three different terrains in this image, namely water, park and urban areas. In this example, the proposed system was configured with three distinct classes (M=3), 3 inputs corresponding to the 3 polarimetric channels hh, vv, and vh (Tan et al., 2007), and 4 rules (R=4). The desired outputs for the urban, park and water classes were chosen to be [0 0 1], [0 1 0], and [1 0 0], respectively. After training with the patterns, the system was used to classify the whole image.

For comparison, an Anfis system with 3 inputs and 8 rules was trained on the same training areas for 100 training iterations. Its desired outputs for the urban, park and water classes were chosen to be 1, 2, and 3, respectively. Compared with the Anfis method, our classifier achieves higher accuracy and is much less affected by noise.
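The one-hot target vectors used for the proposed system in this experiment, and the way a trained output vector is mapped back to a terrain label, can be sketched as follows. The class order is taken from the target vectors quoted above; the function names and the example output vector are hypothetical.

```python
# Position of each class in the target vector: water -> [1 0 0], park -> [0 1 0], urban -> [0 0 1].
CLASS_ORDER = ["water", "park", "urban"]


def one_hot(label):
    """Desired output vector for a terrain label."""
    return [1 if name == label else 0 for name in CLASS_ORDER]


def decode(outputs):
    """Winner-take-all: return the label whose output is largest."""
    best = max(range(len(outputs)), key=lambda m: outputs[m])
    return CLASS_ORDER[best]


urban_target = one_hot("urban")
predicted_label = decode([0.1, -0.3, 0.9])  # hypothetical system outputs
```

With one output per class, a negative output simply removes that class from contention, whereas a single scalar target such as 1, 2 or 3 imposes an ordering on the classes.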

Natural Image Classification
In this experiment, the proposed system is compared to other classification algorithms by testing them on a natural image taken from the Berkeley Dataset (Berkeley Dataset, 2001), as shown in Fig. 4.

To train the proposed system for edge detection, simple images (see Fig. 6) of size 128x128 pixels are utilized (Yüksel, 2007). Fig. 6(a) shows the original image, in which all pixels within each square box of size 4x4 pixels share the same random luminance value. The input to the fuzzy system is the original image corrupted with 40% salt and pepper noise, as shown in Fig. 6(b). The target image shown in Fig. 6(c) is a black and white image, with black pixels indicating the locations of true edges in the input training image.

Once trained, the model is tested by applying it to a set of natural images taken from the Berkeley Dataset (Berkeley Dataset, 2001), as shown in Fig. 7(a). The images are corrupted with 20% "salt" (with value 1) and "pepper" (with value 0) noise with equal probability, as shown in Fig. 7(b). The proposed detector is then compared to the existing methods: the Prewitt, Roberts, LoG, Sobel and Canny detectors. It is not an easy task to select good threshold values for these methods. In this case, all these methods are executed using MATLAB with default values for auxiliary parameters. It can be easily seen that most of the edge structures of the noisy image cannot be detected by Prewitt in Fig. 7(c), Roberts in Fig. 7(d), LoG in Fig. 7(e), Sobel in Fig. 7(f) and Canny in Fig. 7(g). Moreover, the effect of noise is still clearly visible: real edges are significantly distorted by the noise, and many noise pixels are incorrectly detected as edges. Compared with these operators, the classification accuracy of the proposed method, shown in Fig. 7(h), is quite high, the effect of noise on the performance of the detector is much smaller, and the edges in the input images are successfully classified. These results indicate that the proposed system performs well even when image quality is significantly degraded by high noise.

Fig. 8 shows the edge images detected by our proposed system for various natural images with different percentages of salt and pepper noise. The proposed fuzzy model uses 16 rules (R=16) and was trained for 250 epochs. The 1st, 2nd and 4th columns show the original images and the images corrupted by 10% and 20% salt and pepper noise, respectively. The final edge images detected by the proposed system for these noisy images are shown in the 3rd and 5th columns. It can be easily seen that the proposed fuzzy system is highly robust with respect to noise in the natural images.
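The salt and pepper corruption used in these edge-detection experiments can be sketched as below. This is an illustrative assumption about the noise model, not a description of the routine used by the authors: each pixel is corrupted independently with probability `density`, and a corrupted pixel becomes salt (1) or pepper (0) with equal probability. The function name and the test image are hypothetical.

```python
import random


def add_salt_and_pepper(image, density, rng):
    """
    Return a corrupted copy of `image` (a list of rows with values in [0, 1]).
    Each pixel is replaced with probability `density`; a replaced pixel
    becomes 1.0 (salt) or 0.0 (pepper) with equal probability.
    """
    noisy = [row[:] for row in image]  # copy so the input image is left untouched
    for row in noisy:
        for j in range(len(row)):
            if rng.random() < density:
                row[j] = 1.0 if rng.random() < 0.5 else 0.0
    return noisy


# Hypothetical usage: a flat grey 64x64 image corrupted with 20% noise.
rng = random.Random(0)  # fixed seed so the corruption is reproducible
clean = [[0.5] * 64 for _ in range(64)]
noisy = add_salt_and_pepper(clean, 0.2, rng)
```

Passing an explicit `random.Random` instance keeps the corruption reproducible, which makes it easier to compare detectors on exactly the same noisy input.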

Conclusions
In this chapter, we have introduced a fuzzy rule-based system that combines both positive and negative association rules in its structure. A major advantage of this system is that each rule can represent more than one class. Experimental tests and comparisons with existing algorithms on a number of natural images show that the proposed system is a powerful tool for image classification.

Acknowledgement
This research has been supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC).

APPENDIX:
We apply the gradient descent technique to modify the parameters $a_{rn}$, $b_{rn}$, and $c_{rn}$. The parameter update formula of $a_{rn}$ for the k-th data pattern is given in Eq. (24). Similarly, the update rule of $c_{rn}$ is derived in Eq. (25). The update rule of $b_{rn}$ is derived in Eq. (26).
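The building block of any such update is the derivative of a single bell membership value with respect to its three parameters; the chain rule then links it to the error through the rule activation and the class outputs. The sketch below is not the update formulae of Eq. (24) to Eq. (26): it assumes the generalized bell form, derives the three partial derivatives by hand, and checks them against central finite differences. All names and numeric values are hypothetical.

```python
import math


def bell(x, a, b, c):
    """Generalized bell membership value (assumed form); requires a != 0."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2.0 * b))


def bell_gradients(x, a, b, c):
    """
    Analytic partial derivatives (dA/da, dA/db, dA/dc) of the bell membership A.
    Valid for x != c and a != 0; at x == c the membership is at its maximum.
    """
    A = bell(x, a, b, c)
    common = A * (1.0 - A)
    dA_da = 2.0 * b / a * common
    dA_db = -2.0 * math.log(abs((x - c) / a)) * common
    dA_dc = 2.0 * b / (x - c) * common
    return dA_da, dA_db, dA_dc


def numeric_gradients(x, a, b, c, h=1e-6):
    """Central finite-difference approximation of the same three derivatives."""
    da = (bell(x, a + h, b, c) - bell(x, a - h, b, c)) / (2.0 * h)
    db = (bell(x, a, b + h, c) - bell(x, a, b - h, c)) / (2.0 * h)
    dc = (bell(x, a, b, c + h) - bell(x, a, b, c - h)) / (2.0 * h)
    return da, db, dc


def descent_step(value, gradient, learning_rate):
    """One steepest-descent update: move against the gradient."""
    return value - learning_rate * gradient


analytic = bell_gradients(0.7, 0.5, 1.5, 0.2)
numeric = numeric_gradients(0.7, 0.5, 1.5, 0.2)
```

Comparing `analytic` with `numeric` is a cheap way to catch sign or index mistakes before the derivatives are wired into the full backward pass.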

Fig. 2. SAR image classification: (a) original image, (b) training data with 3 classes.

The training patterns are shown enclosed in red boxes in Fig. 2(b). The proposed system was trained using these features to estimate the parameters. The algorithm was run for 100 training iterations.

Fig. 3(d) shows the classification results of the proposed method. The results of the K-Means classifier and the Fuzzy C-Means classifier are shown for comparison in Fig. 3(a) and Fig. 3(b), respectively. These two methods were executed using MATLAB with the same 3 inputs (hh, vv, and vh), 3 outputs and default values for auxiliary parameters. As can be seen from Fig. 3, the classification accuracy of the K-Means and Fuzzy C-Means methods was lower in the water and park regions than that of the proposed method. Fig. 3(c) shows the simulation result of Anfis. In this example, the same training areas, shown in red boxes in Fig. 2(b), were used to train the Anfis system.

Fig. 4. Natural image classification.

Fig. 5(a) shows the image corrupted by Gaussian noise (0 mean, 0.1 variance) that we want to segment into 3 classes (snow, wolf, and tree). This input image is scanned left-to-right by taking a square window of size 5x5 pixels around a centre pixel, which is then fed into the trained fuzzy system for classification into snow, wolf or tree. To train our proposed system, the training patterns are generated as shown by the red boxes in Fig. 5(a). For this experiment, we have chosen a fuzzy system with 25 inputs (corresponding to the 5x5 window), 8 rules (R=8) and 3 distinct classes (M=3), with the desired outputs for the snow, wolf and tree classes set to [0 0 1], [0 1 0], and [1 0 0], respectively.

Fig. 5(b) shows the clustering results of the Fuzzy C-Means classifier with 25 inputs and 3 outputs. The image shown in Fig. 5(c) is the result obtained using a Feedforward Backpropagation network. In this example, the network has the structure 25-8-8-8-3: a five-layer network with 3 hidden layers, 8 neurons in each hidden layer and 3 neurons in the output layer. We use tansig for the hidden layers and purelin for the output layer. Both the Fuzzy C-Means and Feedforward Backpropagation networks in this example were executed using …
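The window scan described above, which turns each pixel into a 25-element input pattern, can be sketched as follows. The source does not say how border pixels are handled, so replicating the nearest edge pixel is an assumption here, and the function names and test image are hypothetical.

```python
def window_features(image, row, col, size):
    """
    Flatten the size x size window centred on (row, col) into a list,
    scanning the window left-to-right and top-to-bottom.
    Pixels outside the image are replaced by the nearest border pixel (assumption).
    """
    half = size // 2
    height, width = len(image), len(image[0])
    features = []
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            r = min(max(row + dr, 0), height - 1)
            c = min(max(col + dc, 0), width - 1)
            features.append(image[r][c])
    return features


def scan(image, size):
    """Yield ((row, col), features) for every pixel, left-to-right and top-to-bottom."""
    for row in range(len(image)):
        for col in range(len(image[0])):
            yield (row, col), window_features(image, row, col, size)


# Hypothetical 6x6 test image whose pixel value encodes its position (10 * row + col).
image = [[10 * r + c for c in range(6)] for r in range(6)]
centre_pattern = window_features(image, 2, 2, 5)   # 25 inputs for a 5x5 window
corner_pattern = window_features(image, 0, 0, 3)   # 9 inputs, border replicated
all_patterns = list(scan(image, 5))
```

Each yielded feature list is one pattern for the classifier, so an image with H x W pixels produces H x W patterns of 25 values each for the 5x5 case.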

Fig. 6. Edge detection training data: (a) original image, (b) original image corrupted with 40% salt and pepper noise, (c) target image.

In principle, edge detection is a two-class image classification problem in which each pixel in the image is classified as either part of the background or part of an edge. For this reason, a fuzzy system with 2 output nodes corresponding to the 2 classes (edge, background) is chosen. In this experiment, a window of size 3x3 is scanned left-to-right across an image taken from the training set, and a determination is made as to whether the centre pixel …

Fig. 8. Edge images detected by the proposed system for different natural images at different levels of salt and pepper noise.