Confocal Laser Scanning Microscopy in Dermatology: Manual and Automated Diagnosis of Skin Tumours



Introduction
The incidence of melanocytic tumours has increased over the last decades, and the frequency of melanoma doubles approximately every 20 years. At present, the risk of developing a melanoma is about 1:100. According to a WHO report, about 48,000 melanoma-related deaths occurred worldwide in 2006 (Lucas et al., 2006). Malignant melanoma is one of the less common types of skin cancer, but it causes the majority (75%) of skin cancer related deaths. Most melanomas are brown to black looking lesions (Friedman et al., 1985). Warning signs that might indicate a malignant melanoma include changes in size, shape and colour; early signs of melanoma are changes to the shape or colour of existing moles (ABCD rule). Skin cancer has many potential causes, including: overexposure to UV radiation (extreme sun exposure during sunbathing), which may cause skin cancer via either the direct or the indirect DNA damage mechanism; tobacco smoking, which can double the risk of skin cancer; chronic non-healing wounds; genetic predisposition; and the human papilloma virus (HPV), which is often associated with squamous cell carcinoma of the genitals, anus, mouth and fingers (Oliveria et al., 2006). Detecting malignant changes of skin tissue at an early stage considerably improves the success of therapy (Marcovic et al., 2009). At an advanced stage, melanoma may metastasize to the lymph nodes or to more distant sites such as the lungs, brain, bone and liver; such metastatic melanoma may cause general symptoms like fatigue, vomiting and loss of appetite. The greatest chance of cure lies in the early surgical resection of thin melanomas. Therefore, periodic screening of persons at risk is necessary. In the fight against skin cancer, researchers place high hopes in improved and fast screening methods for the clinical routine. Confocal laser scanning microscopy (CLSM) is a novel imaging technique enabling the noninvasive examination of skin lesions in real time (Rajadhyaksha et al., 1999). CLSM is therefore very suitable for routine screening and early recognition of skin tumours. The CLSM technique allows the viewing of micro-anatomic structures and individual cells. In contrast to the conventional examination, where suspicious skin tumours have to be excised, embedded in paraffin and stained, this method is faster and far less burdensome for the patient. However, training and experience are necessary for a successful and accurate diagnosis with this new and powerful imaging technique. To diminish the need for training and to improve diagnostic accuracy, dermatopathologists require computer-aided diagnostic systems.
Computer aided diagnosis means that harmless (nevi) and malignant cases are discriminated by automatic analysis on a computer, providing optimised preventive medical checkups and an accurate and reliable detection of skin tumours. Automated diagnostic systems need no input by the clinician but rather report a likely diagnosis based on computer algorithms. A main task in automated image analysis is the selection of appropriate features for a "computer friendly" description of the tissue. The choice and development of the features is driven by the diagnostic guidelines of the dermatopathologists. In the diagnosis of CLSM views of skin lesions, architectural structures at different scales play a crucial role. The images of benign common nevi show pronounced architectural structures, such as arrangements of nevus cells around basal structures and tumour cell nests. The images of malignant melanoma show melanoma cells and connective tissue with few or no architectural structures. Features based on the wavelet transform have been shown to be particularly suitable for the automatic analysis of CLSM images because they enable an exploration of images at different scales (Wiltgen et al., 2008). A further task in automated analysis is the choice of the machine learning algorithm for classification, which, after a training phase, predicts the class of a lesion (nevus or malignant melanoma). For medical diagnosis, the algorithm should make the automated diagnostic process transparent and understandable for the human interpreter. With the CART (Classification and Regression Trees) algorithm, the inferring rules are generated automatically during the training of the algorithm. The generated rules have a syntax that is understandable for the human interpreter, and they can be discussed and explicitly used as diagnostic rules. As a diagnostic aid, the classification results are relocated onto the images by means of the inferring rules: the regions with high discrimination power are highlighted in the images, showing tissue with features in good accordance with typical diagnostic CLSM features. In this paper, we introduce the basic principles of automated diagnosis of CLSM images of skin lesions. Special attention is given to the wavelet transform for the description of the tissues in a way that conforms to the guidelines of the human interpreter. Further, the machine learning algorithm for the class prediction and its diagnostic rules are discussed in some detail. The application and performance of the discussed methods are demonstrated by a selected study. The procedure for image analysis presented in this paper was developed with the "Interactive Data Language" software tool IDL, a computing environment for image analysis, data visualization and software application development (IDL 7.1, ITT Visual Information Solutions (ITT VIS), formerly known as Research Systems Inc. (RSI), http://www.ittvis.com/). IDL belongs to the Fourth Generation Languages (4GL) and includes image processing procedures and tools for rapid prototyping and rapid application development. All software runs on a PC under Windows and supports the development of applications with a graphical user interface (GUI). The CART (Classification and Regression Trees) analysis was done with the software from Salford Systems, San Diego, USA.
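The overall pipeline can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' IDL/Salford Systems implementation: the PyWavelets package stands in for the wavelet feature extraction, scikit-learn's DecisionTreeClassifier (a CART-style learner) stands in for the Salford CART package, and the feature choice is a simplified stand-in.

```python
# Minimal sketch of the pipeline: wavelet-based feature vectors per image
# tile, classified by a CART-style binary decision tree. Stand-in data only.
import numpy as np
import pywt
from sklearn.tree import DecisionTreeClassifier

def wavelet_features(tile, wavelet="db4", level=4):
    """Illustrative features: standard deviation of the coefficients in
    each sub-band of a 2-D wavelet decomposition of the tile."""
    coeffs = pywt.wavedec2(tile, wavelet, level=level)
    feats = [np.std(coeffs[0])]              # coarsest approximation band
    for details in coeffs[1:]:               # (cH, cV, cD) per level
        feats.extend(np.std(band) for band in details)
    return np.array(feats)

rng = np.random.default_rng(0)
tiles = rng.random((20, 128, 128))           # stand-ins for CLSM tiles
X = np.array([wavelet_features(t) for t in tiles])
y = rng.integers(0, 2, size=20)              # stand-in labels (0 nevus, 1 melanoma)

tree = DecisionTreeClassifier().fit(X, y)    # binary decision tree (CART-style)
print(tree.predict(X[:3]))
```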

Confocal laser scanning microscopy
The principle of confocal microscopy was developed by Marvin Minsky in 1957. After the development of lasers, confocal laser scanning microscopy became a standard technique toward the end of the 1980s (Patel et al., 2007). Confocal laser scanning microscopy (CLSM) is a technique for obtaining high-resolution optical images with depth selectivity (Paoli et al., 2009). This technique enables the acquisition of in-focus images from selected depths (Pellacani et al., 2008). Therefore, for non-opaque specimens, such as biological tissue, interior structures can be imaged. The images are acquired point by point and reconstructed using a computer. For interior imaging, the quality of the image is greatly enhanced over conventional microscopy because the image information from multiple depths in the specimen is not superimposed. In a confocal laser scanning microscope, a laser beam passes through a pinhole and is directed by a beam splitter (dichroic mirror) to the objective lens, where it is focused into a small focal volume at a layer within the biological specimen (Fig. 1). The scattered and reflected laser light from the illuminated spot is then re-collected by the objective lens. The beam splitter separates it from the incident light and deflects it to the detector. After passing through a further pinhole, the light intensity is detected by a photon detection device (usually a photomultiplier tube or an avalanche photodiode), which transforms the intensity of the reflected light into an electrical signal that is recorded by a computer. The depth of the imaged layer (its vertical position) inside the specimen is selected by shifting the focal plane, while the detection pinhole rejects light from outside the focal layer. Each layer shows a horizontal section of the lesion. The electrical signal, obtained from the intensity of the reflected light out of the illuminated volume element at a layer within the specimen, represents one pixel in the resulting image (Fig. 1). As the laser scans over the plane of interest, a whole image is obtained pixel by pixel and line by line. The brightness of a resulting image pixel corresponds to the relative intensity of the reflected light. The contrast in the images results from variations in the refractive index of microstructures within the biological specimen. In this paper, confocal laser scanning microscopy is performed with a near-infrared reflectance confocal microscope (Vivascope 1000, Lucid Inc., Rochester, NY, USA, http://www.lucid-tech.com/). The microscope uses a diode laser at 830 nm wavelength and a power of <35 mW at tissue level. A 30x water-immersion objective lens with a numerical aperture of 0.9 is used, with water (refractive index 1.33) as immersion medium. The spatial resolution is 0.5-1.0 μm in the lateral and 3-5 μm in the axial dimension. The images cover a field of view of 0.5x0.5 mm on the skin tissue, providing insight into cellular structures. Up to 16 layers per lesion can be scanned; usually an examination depth of 350 μm can be reached (beyond this depth, the reflected light is absorbed by the overlying layers). All images, stored in BMP file format, are monochrome images with a spatial resolution of 640x480 pixels and a grey level resolution of 8 bits per pixel. (For the image analysis, the image matrix is rescaled to 512x512 pixels.) The images are taken from the centre of the tumours, representing the dermo-epidermal junction.

Automated image analysis
Prior to the automated image analysis, the CLSM views of the lesions are dissected into square elements. The size of the square elements ranges from 128x128 pixels to 512x512 pixels (whole image). This enables the analysis of different parts of the images which are of interest for the evaluation of the diagnostically relevant regions in the CLSM views. For an automated, or computer aided, diagnosis of CLSM views of skin lesions, the following steps are necessary (Wiltgen et al., 2003). First, the images must be appropriately analysed to enable a formulation of suitable features (Fig. 2).

Fig. 2. Steps in the image analysis system
The formulation of appropriate features follows the dermatopathologists' diagnostic guidelines for CLSM views. Second, the images must be categorized according to their features, enabling a class prediction for every CLSM view of skin lesions. For medical purposes, it is important that the knowledge acquired during the training phase of the machine learning algorithm is represented in an understandable and readable form; knowledge representation by rules, with a syntax that permits their use as diagnostic rules, is suitable. Third, the class prediction performance is evaluated. For this purpose, the single square elements are superimposed (relocated) onto the corresponding images by use of the diagnostic rules generated by the machine learning algorithm. The categorized square elements are highlighted in the images, enabling an identification of diagnostically highly relevant regions and an evaluation of typical diagnostic CLSM features by comparison with the diagnostic decisions of the human observer.
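As an illustration of the dissection step described above, a minimal sketch in Python (NumPy assumed; the function name and tile size are chosen here for illustration, not taken from the authors' IDL code):

```python
# Cut a 512x512 CLSM view into non-overlapping square elements.
import numpy as np

def dissect(image, size=128):
    """Return the list of size x size square elements of the image."""
    h, w = image.shape
    return [image[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

view = np.zeros((512, 512), dtype=np.uint8)   # placeholder for a CLSM image
squares = dissect(view, size=128)
print(len(squares))                           # 16 square elements (4 x 4 grid)
```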

Diagnosis of CLSM images
CLSM is a non-invasive method which takes 3-5 minutes per lesion, whereby the microscope optical unit is positioned on the patient's skin (Fig. 3). An important step for a successful automated image analysis of histological tissue is the choice of appropriate texture features. The selection of the features can be made according to the diagnostic guidelines used by the dermatopathologist. For the diagnosis of the corresponding CLSM views, architectural structures such as micro-anatomic structures, cell nests etc. play an important role (Fig. 4). Melanocytic cytomorphology and architecture and keratinocyte cell borders are taken into account for diagnostic decisions by the dermatopathologist (Pellacani et al., 2008). Due to the high refraction of melanin, basal keratinocytes appear very bright. The images of benign common nevi show, besides the nevus cells, pronounced architectural structures, whereas images of malignant melanoma show melanoma cells and connective tissue with little or no architectural structures (Fig. 4). Therefore, the information at different scales (from coarse structures to details) plays a crucial role in the diagnosis of CLSM images of skin lesions (Scope et al., 2007). This can be compared with the view of a human observer at different distances from an image. When the observer is close enough to the image, he can study details. The greater the distance between the observer and the image, the better he can study the image as a whole without being overwhelmed by the details (the details are smoothed out). In other words, he studies the image at different scales. This procedure is mathematically reflected by the wavelet transform. Therefore, features based on the properties of the wavelet transform enable an exploration of architectural structures of different sizes at different spatial scales. This makes the wavelet transform suitable for the automatic analysis of CLSM views of skin lesions (Wiltgen et al., 2008). The diagnostic guidelines are reflected by different groups and properties of wavelet coefficients.

Wavelet analysis and wavelet transform
A wavelet is a fast-decaying, wave-like oscillation that starts with a given (non-zero) amplitude and decreases rapidly to zero; it is zero-valued (or negligibly small) outside of a defined interval. It can typically be visualized as a "brief oscillation".

Fig. 4. CLSM images of common benign nevi (left) and malignant melanoma (right)
The wavelet transform is the representation of a given function by wavelets (Press et al., 1992). The basis wavelets are scaled and translated copies ("daughter wavelets") of a finite-length waveform ("mother wavelet"). The goal of the transformation is to represent the data in another way, one which is more suitable for analysis and interpretation (Prasad et al., 1997). Similar to the Fourier transformation, the wavelet transformation checks a given function for conformity with the basis functions and calculates the transform from it. But in contrast to the Fourier transformation, where the basis functions (sines and cosines) are non-local in the spatial domain (they stretch out to infinity) and global properties of the analysed function are localized in prominent peaks of the frequency spectrum, the wavelet basis functions are localized in both the spatial and the frequency domain. Wavelet transforms therefore have advantages over traditional Fourier transforms for accurately decomposing and reconstructing finite, non-periodic functions. This is a crucial advantage for the analysis of the local properties of a function or a texture (Chui, 1992). By use of wavelet basis functions with increasing spatial extension, the function or texture can be analysed at different scales, showing detailed (local) to coarse (global) properties (Burrus et al., 1988). This gives the wavelet transform a multi-resolution property, enabling the study of functions and textures at varying resolutions (Daubechies, 1992). The wavelet transforms can be divided into three classes: continuous transforms (CWT), discrete transforms (DWT) and multi-resolution-analysis (MRA) based transforms. The continuous wavelet transforms operate over every possible scale and translation, whereas the discrete wavelet transforms use a discrete subset of scale and translation values. Whereas the CWTs are mainly used in mathematical analysis, the DWTs have many practical applications, such as image processing. The MRA represents a design method for most of the practically relevant discrete wavelet transforms.
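The localization argument can be made concrete with a small numerical experiment. This sketch (NumPy and the PyWavelets package assumed; neither is part of the chapter's IDL tool chain) shows that a single sharp transient spreads over essentially the whole Fourier spectrum, while only a handful of wavelet coefficients per band respond to it:

```python
# An isolated spike: global in the Fourier spectrum, local in wavelet bands.
import numpy as np
import pywt

signal = np.zeros(256)
signal[100] = 1.0                                  # isolated transient

spectrum = np.abs(np.fft.rfft(signal))
print("active Fourier bins:", np.sum(spectrum > 1e-3))   # all 129 bins

coeffs = pywt.wavedec(signal, "haar", level=5)     # approximation + 5 detail bands
for i, band in enumerate(coeffs):
    print(f"band {i}: {np.sum(np.abs(band) > 1e-3)} active coefficient(s)")
```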

The continuous wavelet transforms
The wavelet transform (an integral transform) is a linear and often invertible operation that defines a relationship between the spatial and the frequency domain. If the wavelet transform is invertible then, besides the analysis of a function, the synthesis of the function is also possible. The relationship between the spatial and the frequency domain is established by the wavelet function ("mother wavelet") $\Phi$. The wavelet function provides the source function for the generation of the wavelets ("daughter wavelets"), which are translated and scaled versions of the wavelet function:

$$\Phi_{s,l}(x) = \frac{1}{\sqrt{s}}\, \Phi\!\left( \frac{x - l}{s} \right)$$

The scale index $s$ defines the width of the wavelet and the location index $l$ its position. In this way, wavelets are able to localize behaviour in space (via translation) and at a characteristic scale (via dilation/contraction). The "wavelet family" is then given by the set $\{\Phi_{s,l} : s > 0,\; l \in \mathbb{R}\}$. There exists a variety of different types of wavelet functions, such as the Hermitian wavelet, the Shannon wavelet, the Morlet wavelet, the Mexican-hat wavelet, the Beta wavelets etc.; the different wavelet types have different properties. For illustration, the Mexican-hat wavelet is presented (Fig. 5). Wavelet functions are taken from the space $L^1(\mathbb{R}) \cap L^2(\mathbb{R})$, the space of measurable functions that are absolutely integrable ($L^1(\mathbb{R})$) and square-integrable ($L^2(\mathbb{R})$):

$$\int_{-\infty}^{\infty} |\Phi(x)|\, dx < \infty, \qquad \int_{-\infty}^{\infty} |\Phi(x)|^2\, dx < \infty$$

The functions in this space guarantee the admissibility requirement and the square norm. In principle, every finite-length function or fast-decaying oscillating waveform can be used as a wavelet function. But if the wavelet transform is required to be invertible, not every function can be used: for the wavelet transform to be invertible, the wavelet must satisfy the admissibility criterion

$$C_\Phi = \int_{-\infty}^{\infty} \frac{|\hat{\Phi}(\omega)|^2}{|\omega|}\, d\omega < \infty$$

To determine the admissibility criterion for a given wavelet $\Phi(x)$, first its Fourier transform $\hat{\Phi}(\omega)$ must be calculated. If a wavelet function fulfils the admissibility criterion, it is called an admissible wavelet. Because the integrand in the admissibility criterion has a null in the denominator at $\omega = 0$, the Fourier transform must vanish there, $\hat{\Phi}(0) = 0$. The admissibility criterion can then be simplified to

$$\int_{-\infty}^{\infty} \Phi(x)\, dx = 0$$

whereby the admissibility criterion can be reconsidered with the original wavelet $\Phi(x)$. In the integral wavelet transform, the wavelet functions, with given translation and scaling values, are compared successively with different sections of the function (Fig. 6):

$$W_f(s,l) = \frac{1}{\sqrt{s}} \int_{-\infty}^{\infty} f(x)\, \Phi^{*}\!\left( \frac{x - l}{s} \right) dx$$

The wavelet transform coefficients are determined by scanning the function with the wavelets at different scales; the coefficients describe the conformity of the function, at every position along the function, with the analysing wavelets. By analysing a function with differently scaled wavelets, the degree of conformity is determined. A small wavelet (lower scale) at a certain position is well localized and delivers detailed information about this part of the function. An expanded wavelet (larger scale) has a lower spatial resolution and delivers coarse information about the function (details are smoothed out). By the translation of the wavelets, the function is then analysed at every position (Fig. 6). The wavelet transform coefficients reflect the degree of correspondence between the wavelet (at a certain position and scale) and the function: the product of a wavelet with a function section gives a new function, and the enclosed area (obtained by integrating) is the corresponding wavelet coefficient. By use of wavelets with different widths, the function is analysed at different scales (spatial resolutions).
With the wavelet transform coefficients, together with the corresponding wavelets, the original function $f(x)$ can be recovered by the inverse continuous wavelet transform

$$f(x) = \frac{1}{C_\Phi} \int_{0}^{\infty} \int_{-\infty}^{\infty} W_f(s,l)\, \Phi_{s,l}(x)\, \frac{dl\, ds}{s^2}$$

satisfying the admissibility condition $C_\Phi < \infty$. The function is synthesized by the superposition of the analysing wavelets, weighted by their coefficients.
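As a small illustration of scanning a function with scaled wavelets, the following sketch computes a continuous wavelet transform with the Mexican-hat wavelet; PyWavelets is assumed here purely for demonstration (the chapter's own software was written in IDL):

```python
# CWT of a two-scale test signal with the Mexican-hat wavelet: coef[s, l]
# measures the conformity of the signal with the wavelet at scale s, position l.
import numpy as np
import pywt

x = np.linspace(0, 1, 512)
f = np.sin(2 * np.pi * 5 * x) + np.sin(2 * np.pi * 40 * x)   # coarse + fine

scales = np.arange(1, 64)                   # dilation parameter s
coef, freqs = pywt.cwt(f, scales, "mexh")
print(coef.shape)                           # (63, 512): scales x positions
```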

The discrete wavelet transforms
The discrete wavelet transform is obtained by extracting a discrete subset of the set of wavelet functions of the continuous wavelet transform; in other words, only wavelet functions with discrete scaling and translation parameters are used. This means that the wavelet functions are still continuous, but they exist only at discrete positions with discrete scales (Daubechies, 1988). The basis wavelet functions are dilated and translated by whole numbers $m, n \in \mathbb{Z}$:

$$\Phi_{m,n}(x) = 2^{-m/2}\, \Phi(2^{-m} x - n)$$

In spite of the reduction to a discrete subset, it can be shown that the information remains completely preserved. For many applications of the discrete wavelet transform, orthonormal basis wavelets are used.
A wavelet $\Phi \in L^2(\mathbb{R})$ is called an orthonormal wavelet if it can be used to define a Hilbert basis (an orthonormal and complete basis) for the Hilbert space $L^2(\mathbb{R})$ of square integrable functions. The Hilbert basis is constructed as the family of basis functions

$$\Phi_{m,n}(x) = 2^{-m/2}\, \Phi(2^{-m} x - n), \qquad m, n \in \mathbb{Z}$$

with the orthonormality relation

$$\langle \Phi_{m,n}, \Phi_{m',n'} \rangle = \delta_{m m'}\, \delta_{n n'}$$

Thereby, $\delta_{nm}$ is the Kronecker delta and $\langle \Phi, \Phi' \rangle$ is the standard inner product on $L^2(\mathbb{R})$:

$$\langle \Phi, \Phi' \rangle = \int_{-\infty}^{\infty} \Phi(x)\, \overline{\Phi'(x)}\, dx$$

The requirement of completeness is that every given function $f \in L^2(\mathbb{R})$ can be decomposed into a linear combination of the basis wavelet functions weighted by the wavelet coefficients:

$$f(x) = \sum_{m,n} \langle f, \Phi_{m,n} \rangle\, \Phi_{m,n}(x)$$

The convergence of the series is understood to be convergence in norm. Such a representation of a function $f(x)$ is known as a wavelet series, and the basis functions are the building stones of the function (an orthonormal wavelet is self-dual). Any function $f \in L^2(\mathbb{R})$ is therefore characterized by the set of its discrete wavelet coefficients $\{\langle f, \Phi_{m,n} \rangle : m, n \in \mathbb{Z}\}$, and it is possible to recover any function from its discrete wavelet coefficients in a numerically stable procedure. (Numerically stable means that small perturbations in the wavelet coefficients correspond to small perturbations of the function.) The wavelet transform thus maps $L^2(\mathbb{R})$ (the space of square integrable functions) into $\ell^2(\mathbb{Z} \times \mathbb{Z})$ (the space of square-summable, infinite sequences of numbers). For the wavelet coefficients, the following two-sided condition is valid:

$$A\, \|f\|^2 \le \sum_{m,n} |\langle f, \Phi_{m,n} \rangle|^2 \le B\, \|f\|^2, \qquad 0 < A \le B < \infty$$

Thereby, $A$ and $B$ are given constant values. Numerical stability is then defined as follows: given two functions $f, g \in L^2(\mathbb{R})$, the characterization of the functions by their wavelet coefficients in $\ell^2(\mathbb{Z} \times \mathbb{Z})$ is stable if, whenever two sequences of wavelet coefficients in $\ell^2(\mathbb{Z} \times \mathbb{Z})$ are close, the corresponding functions are close in $L^2(\mathbb{R})$. As in the case of continuous wavelets, there also exists a variety of discrete wavelets: the Coiflet wavelets, the Legendre wavelets, the Haar wavelets, etc. For illustration, the Haar wavelet is presented (Fig. 7). It is now recognized as the first known wavelet, proposed in 1909 by A. Haar (Alfréd Haar was a Hungarian mathematician), and it is also the simplest possible wavelet. The Haar wavelets give an example of a countable orthonormal system for the space of square-integrable functions on the real line. A large subclass of wavelets arises from special structures imposed on $L^2(\mathbb{R})$, known as multi-resolution analysis (MRA). MRA is a framework for understanding and constructing wavelet bases; it generates discrete wavelet families of dilations and translations that form orthonormal bases of $L^2(\mathbb{R})$.
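The orthonormality of the Haar family can be checked numerically; a minimal sketch (NumPy assumed; the grid-based inner product approximates the L2 integral):

```python
# Check <psi_{m,n}, psi_{m',n'}> = delta_mm' * delta_nn' for the Haar wavelet.
import numpy as np

def haar(x):
    return np.where((0 <= x) & (x < 0.5), 1.0,
           np.where((0.5 <= x) & (x < 1.0), -1.0, 0.0))

def psi(m, n, x):
    # psi_{m,n}(x) = 2^(-m/2) * psi(2^(-m) x - n)
    return 2.0 ** (-m / 2) * haar(2.0 ** (-m) * x - n)

x = np.linspace(-8, 8, 2 ** 16)
dx = x[1] - x[0]
idx = [(0, 0), (0, 1), (1, 0), (2, 1)]
for m, n in idx:
    row = [round(float(np.sum(psi(m, n, x) * psi(m2, n2, x)) * dx), 3)
           for m2, n2 in idx]
    print((m, n), row)        # 1.0 on the diagonal, 0.0 elsewhere
```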

Multi-resolution analysis
Multi-resolution analysis (MRA) is a design method for generating many of the practically relevant discrete wavelet transforms (DWT) and is the mathematical basis of the fast wavelet transform (FWT). An MRA satisfies certain self-similarity relations in the spatial and the scale domain, as well as completeness and regularity relations (Daubechies & Lagarias, 1991). The MRA describes the approximation properties of the discrete wavelet transform. The generated wavelet transformation is iterative: the analysed function is split into successively "smoother" versions, which with progressive iterations contain successively coarser information. Of special importance in the multi-resolution analysis are the so-called two-scale equations for a scaling function and a wavelet function.
The generating functions $\phi$ are known as scaling functions or "father wavelets". An MRA of $L^2(\mathbb{R})$ consists of a nested sequence of closed subspaces

$$\cdots \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots$$

with the following properties: (a) completeness, i.e. $\bigcup_j V_j$ is dense in $L^2(\mathbb{R})$ and $\bigcap_j V_j = \{0\}$; (b) invariance under translation, i.e. $f(x) \in V_0 \Rightarrow f(x-k) \in V_0$ for all $k \in \mathbb{Z}$; (c) the scaling property, i.e. $f(x) \in V_j \Leftrightarrow f(2x) \in V_{j+1}$; (d) the existence of a scaling function $\phi \in V_0$ whose integer translates form an orthonormal basis of $V_0$. The structure of the MRA allows several conclusions for the construction of wavelet bases for practical applications: the orthonormal basis of each subspace, their scaling properties and resolutions follow from the MRA definition. The scaling property (c) implies that each subspace $V_j$ is a scaled version of the central subspace $V_0$ (Fig. 7). Together with the invariance under translation (b) and the existence of a scaling function (d), it implies that

$$\phi_{j,k}(x) = 2^{j/2}\, \phi(2^{j} x - k), \qquad k \in \mathbb{Z}$$

is an orthonormal Hilbert basis of the subspace $V_j$. The sequence of scaling subspaces can then be defined as the spaces spanned by these orthonormal basis systems:

$$V_j = \overline{\operatorname{span}}\{\phi_{j,k} : k \in \mathbb{Z}\}$$

Further, the scaling property (c) implies that the resolution of the $l$-th subspace $V_l$ is higher than the resolution of the $k$-th subspace $V_k$ for $l > k$. Since $V_0 \subset V_1$, the scaling function can be expanded in the basis of $V_1$; this yields the two-scale equation

$$\phi(x) = \sum_k a_k\, \phi(2x - k)$$

with the scaling sequence $\{a_k\}$. To fulfil filter properties, several conditions must be imposed on the coefficients of the scaling sequence. To demonstrate this, first the Fourier transform $\hat{\phi}$ of the scaling function $\phi$ is calculated; with

$$\hat{a}(\omega) = \frac{1}{2} \sum_k a_k\, e^{-ik\omega}$$

the resulting equation can be formulated as follows:

$$\hat{\phi}(\omega) = \hat{a}(\omega/2)\, \hat{\phi}(\omega/2)$$

The requirements for a low-pass filter are that the Fourier series must have the value 1 at the zero point $\omega = 0$ ($\hat{a}(0) = 1$) and the value 0 at $\omega = \pi$ ($\hat{a}(\pi) = 0$). From these requirements follow immediately the following restrictions for the coefficients of the scaling sequence:

$$\sum_k a_k = 2, \qquad \sum_k (-1)^k a_k = 0$$

One task of wavelet design is to impose several conditions on the coefficients $a_k$ in order to obtain the desired properties of the scaling function $\phi$. For example, if $\phi$ is required to be orthogonal to its own even integer translates, the coefficients of the scaling sequence must fulfil the condition

$$\sum_k a_k\, a_{k+2n} = 2\, \delta_{n,0}$$

The requirement that the scaling sequence is orthogonal to any shift of itself by an even number of coefficients is the necessary condition for the orthogonality of the wavelets. In terms of the Fourier transform, the orthogonality is given by the relation

$$|\hat{a}(\omega)|^2 + |\hat{a}(\omega + \pi)|^2 = 1$$

The trigonometric polynomial $\hat{a}(\omega)$ (filter function or transfer function) plays an important role in wavelet theory. The "mother wavelet" is defined by a similar two-scale equation:

$$\Phi(x) = \sum_k b_k\, \phi(2x - k)$$

where the sequence of coefficients, called the wavelet sequence or wavelet mask, is given by

$$b_k = (-1)^k\, a_{K-k}$$

whereby $K \in \mathbb{Z}$ is an arbitrary odd number. The corresponding wavelet subspaces $W_j$ are spanned by

$$W_j = \overline{\operatorname{span}}\{\Phi_{j,k} : k \in \mathbb{Z}\}, \qquad \Phi_{j,k}(x) = 2^{j/2}\, \Phi(2^{j} x - k)$$

The space $W_0 \subset V_1$ is defined as the linear hull of the integer shifts of the mother wavelet. $W_0$ is the orthogonal complement of $V_0$ inside $V_1$ (that means: $V_1$ is the orthogonal sum of $W_0$ and $V_0$). By successive application of the orthogonal sum, the orthogonal dissection of the scaling spaces is obtained:

$$V_{j+1} = V_j \oplus W_j = V_0 \oplus W_0 \oplus W_1 \oplus \cdots \oplus W_j$$

This is illustrated for the Haar wavelet in Figure 7. By self-similarity, there exist scaled versions $W_k$ of $W_0$, and by completeness one has

$$L^2(\mathbb{R}) = \overline{V_0 \oplus \bigoplus_{j \ge 0} W_j}$$

This is the basic analytical requirement for an MRA. Starting from the two-scale equation, respectively the equation for the "mother wavelet", the multi-resolution decomposition of $f \in L^2(\mathbb{R})$ follows by calculating the wavelet coefficients.
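For the Haar case, the two-scale equation and the low-pass conditions can be verified directly. A small sketch (NumPy assumed), using the Haar scaling sequence a = (1, 1) in the normalization sum a_k = 2 adopted above:

```python
# Haar scaling function (box on [0,1)): phi(x) = phi(2x) + phi(2x - 1).
import numpy as np

a = np.array([1.0, 1.0])                      # Haar scaling sequence
print("sum a_k        =", a.sum())            # 2.0 -> a-hat(0) = 1
print("sum (-1)^k a_k =", (a * (-1.0) ** np.arange(2)).sum())   # 0.0 -> a-hat(pi) = 0

phi = lambda x: ((0 <= x) & (x < 1)).astype(float)
x = np.linspace(-1.0, 2.0, 3001)
lhs = phi(x)
rhs = a[0] * phi(2 * x) + a[1] * phi(2 * x - 1)
print("max |lhs - rhs| =", np.max(np.abs(lhs - rhs)))           # 0.0
```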

Multi-resolution wavelet decomposition of functions
The next task is to compute the wavelet coefficients $\langle f, \Phi_{j,k} \rangle$ of the discrete wavelet transform. The starting point is the two-scale equation for the "mother wavelet" (formula 27): the "daughter wavelets" $\Phi_{j,k}$ are generated from the "mother wavelet" $\Phi$ and inserted into the inner product with $f$. By use of the two-scale equations, the coefficients of a coarser level can be expressed through the coefficients of the next finer level:

$$s_{j,k} = \langle f, \phi_{j,k} \rangle = \frac{1}{\sqrt{2}} \sum_n a_{n-2k}\, s_{j+1,n}, \qquad d_{j,k} = \langle f, \Phi_{j,k} \rangle = \frac{1}{\sqrt{2}} \sum_n b_{n-2k}\, s_{j+1,n}$$

Thus, successively coarser approximations of the function $f \in L^2(\mathbb{R})$ are computed, along with the difference in information between successive levels of approximation. The wavelet decomposition can be considered as an orthonormal basis transformation. For the projections onto the subspaces it results that

$$f_j = P_{V_j} f = \sum_k s_{j,k}\, \phi_{j,k}, \qquad g_j = P_{W_j} f = \sum_k d_{j,k}\, \Phi_{j,k}, \qquad f_{j+1} = f_j + g_j$$

The wavelet decomposition of a function $f$ means that $f$ is successively projected onto the scaling and wavelet subspaces. The $d_j$ are the coefficients of the projection $g_j$ onto the subspace $W_j$: $d_j$ is the sequence of wavelet coefficients representing the difference in information between two consecutive approximations, and $s_0$ is the coefficient sequence of the coarsest approximation $f_0$ (the projection onto $V_0$). The procedures discussed above define wavelets abstractly by their properties; they enable a systematic construction of wavelet bases with certain desired properties. The abstract formulation can be illustrated by a wavelet decomposition of a function with filter cascades, which also enables the implementation of the fast wavelet transform. The fast wavelet transform is an algorithm which implements the discrete wavelet transform by use of the multi-scale analysis: the calculation of the inner product of the function with every basis wavelet is replaced by a successive dissection of the frequency bands, realized by a sequence of filters. The next step is to determine the scaling sequence $\{a_k\}$ and the wavelet sequence $\{b_k\}$.

Construction of wavelets
The Z-transform makes it possible to express the decomposition procedure in a form that is more suitable for a discussion in terms of sub-band filtering. The Z-transform converts a discrete function (a sequence of numbers) into a complex representation. Given a discrete sequence $\{a_n\}$, $n \in \mathbb{Z}$, the Z-transform is defined by

$$a(Z) = \sum_{n} a_n\, Z^{-n}, \qquad Z = A\, e^{i\varphi}$$

where $A$ is the magnitude of $Z$ and $\varphi$ is the complex argument. In terms of the Z-transform, evaluated on the unit circle $Z = e^{i\omega}$, the transfer function $\hat{a}(\omega)$ is expressed as

$$\hat{a}(\omega) = \frac{1}{2}\, a\!\left(e^{i\omega}\right)$$

Then the conditions for the coefficients of the scaling sequence to fulfil a low-pass filtering are (formula 24):

$$a(1) = 2 \qquad \text{and} \qquad a(-1) = 0$$

We demonstrate the construction of orthogonal wavelets using the example of the Daubechies wavelets (named after Ingrid Daubechies, a Belgian physicist and mathematician).
In other words, the scaling sequence $\{a_k\}$ and in consequence the wavelet sequence $\{b_k\}$ have to be determined (Fig. 8). The Daubechies wavelet transform can easily be implemented for practical purposes by use of the fast wavelet transform. Ingrid Daubechies required that $\hat{a}(\omega)$ has an $A$-fold zero at $\omega = \pm\pi$ and therefore formulated the following expression for $a(Z)$:

$$a(Z) = 2^{1-A}\, (1 + Z)^{A}\, p(Z)$$

The Daubechies wavelets are characterized by a maximal number of vanishing moments ($A$) for a given support. The maximal power $A$ enabling $(1+Z)^A$ to be a factor of $a(Z)$ is called the polynomial approximation degree; this degree reflects the ability of the scaling function $\phi$ to represent polynomials up to degree $A-1$ as linear combinations of translations (by whole numbers) of the scaling function. By use of the Z-transform, the orthogonality condition (formula 25) can be written as

$$a(Z)\, a(Z^{-1}) + a(-Z)\, a(-Z^{-1}) = 4$$

The index number of a Daubechies $DN$ wavelet refers to the number $N$ of coefficients, and each wavelet has a number of vanishing moments (zero moments) equal to half the number of coefficients ($N = 2A$). In the case of Daubechies 4 ($A = 2$), the polynomial $p(Z)$ is linear. By use of the factorization of the scaling sequence, the orthogonality condition can be formulated as a Laurent polynomial in the auxiliary variable

$$u = \frac{(1 - Z)(1 - Z^{-1})}{4}$$

which leads to the linear solution $P(u) = 1 + 2u$. Spectral factorization of this solution yields the Daubechies-4 scaling coefficients

$$a_0 = \frac{1 + \sqrt{3}}{4}, \quad a_1 = \frac{3 + \sqrt{3}}{4}, \quad a_2 = \frac{3 - \sqrt{3}}{4}, \quad a_3 = \frac{1 - \sqrt{3}}{4}$$

The wavelet coefficients $b_k$ follow from the scaling coefficients $a_k$ by use of $b_k = (-1)^k\, a_{K-k}$ (formula 28); in the case of Daubechies 4, the index $K$ has the value 3. In the next section, the Daubechies 4 scaling and wavelet coefficients are used as filter coefficients to realize the wavelet transform as filter operations.
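A numerical check of these conditions for the Daubechies-4 coefficients (NumPy assumed; normalization sum a_k = 2 as above):

```python
# Daubechies-4: low-pass conditions, orthogonality of even shifts, wavelet mask.
import numpy as np

s3 = np.sqrt(3.0)
a = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 4.0   # scaling sequence

print("a(1)  =", a.sum())                              # 2.0
print("a(-1) =", (a * (-1.0) ** np.arange(4)).sum())   # 0.0
print("sum a_k a_k     =", np.dot(a, a))               # 2.0  (n = 0)
print("sum a_k a_{k+2} =", np.dot(a[:2], a[2:]))       # 0.0  (n = 1)

K = 3                                                  # odd index for D4
b = np.array([(-1) ** k * a[K - k] for k in range(4)]) # wavelet sequence
print("b =", b, " b(1) =", b.sum())                    # high-pass: b(1) = 0
```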

The relation between the discrete wavelet transform and digital filters
By use of discrete (digitized) data, the function is represented by a data vector with N components, where N is the dimension of the data vector, i.e. the number of discrete data points. There exists a close relation between wavelet theory and digital filter theory (Strang & Nguyen, 1996). Like the wavelet transform, digital filters extract details (high-pass filter) or smooth out details (low-pass filter). This correspondence between wavelets and digital filters enables the realization of a wavelet transform operation without an explicit formulation of basis functions: the discrete wavelet transform can be implemented and performed as a convolution of the data vector with a filter bank. To realize a discrete wavelet transform by a filter operation, first a transformation matrix with appropriate filter properties is defined. In terms of the matrix operation, the scaling and wavelet functions apply to the matrix rows (row basis). The transformation matrix consists of a set of filter coefficients that define special filter characteristics along the matrix rows. The matrix structure is determined by these filter coefficients, which are ordered using two dominant patterns: one works as a smoothing filter, the other works as a sharpening filter that brings out the detail information of the data. The associated wavelets are represented by the filter coefficients. In alternation, one row acts as a low-pass (smoothing) filter and the next row acts as a high-pass (detail) filter. The filter coefficients are associated with the corresponding "mother wavelet" and scaling function, which determine the number of filter coefficients. The filter matrix for Daubechies 4 is built from the four scaling coefficients $a_0, a_1, a_2, a_3$, which define the scaling function (low-pass filter), and the four coefficients $b_0, b_1, b_2, b_3$, which define the wavelet function (high-pass filter); the two sets of coefficients are related according to formula (28). A data vector is thereby decomposed into parts containing "smooth" information and parts with "detail" information, and the low-passed (smooth) part serves as input for the next level (Fig. 9). This is called a pyramidal algorithm, and the multi-resolution analysis is the theoretical foundation of the filter bank. It is an iterative procedure where the resulting output data vectors are used successively as input vectors: the output of level (i) becomes the input of the matrix operation in the next level (i+1). The dimension of the transformation matrix is defined by the length of the input data vector. It should be noticed that in the first iteration the data vector with the data to be analysed is used as the input vector, and the output vector contains the smooth and detail parts of the data components.
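A sketch of such a transformation matrix for Daubechies 4 (NumPy assumed; the coefficients are taken here in the unit-norm normalization, i.e. divided by sqrt(2), so that the matrix has orthonormal rows; wrap-around is used at the edges):

```python
# One level of the D4 transform as an orthogonal matrix: rows alternate the
# smoothing filter (a_k) and the detail filter (b_k), shifted by two columns.
import numpy as np

s3 = np.sqrt(3.0)
a = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4.0 * np.sqrt(2.0))
b = np.array([a[3], -a[2], a[1], -a[0]])      # detail filter from a_k

def d4_matrix(n):
    W = np.zeros((n, n))
    for i in range(n // 2):
        for k in range(4):
            W[2 * i, (2 * i + k) % n] = a[k]      # smooth row
            W[2 * i + 1, (2 * i + k) % n] = b[k]  # detail row
    return W

W = d4_matrix(8)
print(np.allclose(W @ W.T, np.eye(8)))        # True: orthonormal rows
out = W @ np.arange(8.0)                      # alternating smooth/detail parts
```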

The 1-dimensional and 2-dimensional discrete wavelet transform
The matrix is applied to the data and produces alternating "smooth" (even rows) and "detail" (odd rows) components. The resulting output data vector is subsequently rearranged into a part with "detail" components and a part with "smooth" components (Fig. 9). Then the matrix is applied to the "smooth" part (halved vector, down-sampling) of the output vector, then again to the "smooth-smooth" part of the resulting new output vector, and so on, until a trivial number of "smooth-…-smooth" components remains. The "detail" parts obtained always remain conserved at the next operation levels. During the iterative procedure, the "smooth" parts of the resulting output vectors are successively filtered and their components rearranged into smooth ($s^{(i)}(n)$) and detail ($d^{(i)}(n)$) parts at each operation level. The output of the wavelet transform consists of the remaining "smooth-…-smooth" components and all the accumulated "detail" components. The application of the filter bank to the smooth part, successively decimated by a factor of 2, corresponds to a successive stretching of the scaling function; in other words, the data vector is analysed at successively larger scales (from high resolution to low resolution). At every operation level, the input data vectors are reduced to half the resolution (sub-sampling). By the halving of the resolution at every level, the respective output vector always has the same number of components (wavelet coefficients) as the respective input vector: there is no loss of information during the procedure. The 2-dimensional wavelet transform works similarly to the 1-dimensional case. In the first step (or level), the rows of the input image are filtered by a high-pass filter and, independently, by a low-pass filter (Fig. 10). From both operations two images (sub-bands) result: one shows details and the other is smoothed out. Every second column is then removed in both sub-bands, which corresponds to a sub-sampling (half resolution). Subsequently, the columns of both sub-bands are high-pass filtered and low-pass filtered. Again, two sub-bands result from each of the two input sub-bands, so that a total of four sub-bands, which differ by the kind of filtering, is obtained in the first step. At the end of the first step, every second row is removed from each of the four sub-bands. Only the double low-passed sub-band is used in the next step; the other three sub-bands remain conserved for the following operation steps. During the next steps, the same procedure is applied again and again: at each step the double low-passed sub-band resulting from the previous step is used as the input sub-band, and the resulting sub-bands are reduced to half the resolution. The sub-bands with higher spatial resolution contain the detail information (high frequencies), whereas the sub-bands with low resolution represent the large-scale coarse information (low frequencies). After the dissection into quadratic sub-bands, they are usually arranged in a quadratic configuration where the sub-bands of the first level fill 3/4 of the square, the sub-bands of the second level fill 3/16 of the square, etc. (Fig. 11). As in the 1-dimensional case, for the 2-dimensional image array the wavelet transform is computed along each dimension: the array is first filtered on its first index (rows), then on its second (columns), and so on.
Fig. 9. The 1-dimensional wavelet transformation as filter operation
Fig. 10. The 2-dimensional wavelet transformation as filter operation
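The 2-dimensional pyramidal decomposition just described can be reproduced with PyWavelets (assumed here as a stand-in for the authors' IDL implementation):

```python
# Multi-level 2-D decomposition: one coarsest approximation sub-band plus
# three detail sub-bands (horizontal, vertical, diagonal) per level.
import numpy as np
import pywt

image = np.random.default_rng(1).random((512, 512))   # stand-in CLSM view
coeffs = pywt.wavedec2(image, "db4", level=4)

print("approximation:", coeffs[0].shape)              # "smooth-...-smooth" band
for level, (cH, cV, cD) in enumerate(coeffs[1:], 1):
    print("details", level, cH.shape)                 # from coarse to fine
```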
The multi-resolution analysis takes scale information into consideration and successively decomposes the original image into approximations (smooth parts) and details. That means that, by the wavelet transformation, the two-dimensional image array is split up into several frequency bands (containing various numbers of wavelet coefficients), which represent information at different scales. At each scale, the original image is approximated with more or fewer details. The output of the last low-pass filtering is the mean grey level of the image. The frequency bands representing information at large scales are labelled with low indices, and the frequency bands representing successively decreasing scales are labelled with higher indices (Fig. 12). The architectural structure information is thus accumulated along the frequency bands (from coarse to fine), enabling the analysis of a given texture by its frequency components $c_i(k,l)$, where the subset of coefficients $c_i(k,l)$ comprises the coefficients contained in the $i$-th frequency band. In the low frequency bands, the detail structures are smoothed out and the large architectural structures of the texture are analysed; in the higher frequency bands, successively finer structures and details are registered. In wavelet texture analysis, the features are mostly derived from statistical properties of the wavelet coefficients. For the texture analysis, we choose the Daubechies 4 wavelet transform, because it has good localization properties in space and is computationally efficient (fast wavelet transform).

Wavelet texture analysis and texture features
After the wavelet transformation of the CLSM images, it is necessary to define texture features which reflect and describe the tissue properties. These features must be defined so that they allow a unique distinction between common benign nevus and malignant melanoma tissue in the square elements. The texture features are based on the variations of the wavelet coefficients within the frequency bands and on the distribution of the energy of the frequency bands in the power spectrum. As features, the standard deviations of the coefficients inside the frequency bands and the energy and entropy of the different frequency bands are used. The standard deviation of the coefficients in the frequency band $F_i$ is calculated by

$$STD(F_i) = \sqrt{\frac{1}{N_i} \sum_{(k,l) \in F_i} \left( c_i(k,l) - \bar{c}_i \right)^2}$$

where $N_i$ is the number of coefficients and $\bar{c}_i$ the mean coefficient value of the band; the energy and the entropy of a band are derived analogously from its coefficients (Fig. 12). In total, 39 different features are calculated, representing large-scale and small-scale information. The features are calculated for 16 frequency bands (labelled from 0 to 15); the mean value is calculated over the 4 first frequency bands, so that 13 values result for each of the 3 features (Fig. 12). The highest frequency bands contain only information about very fine grey level variations, such as noise, and are therefore not considered for the image analysis. Each single square element is represented by a feature vector

$$\vec{F}^{(n)} = \left( F_0^{(n)}, F_1^{(n)}, \ldots, F_{38}^{(n)} \right)$$

where the index $n$ refers to the $n$-th square element. The next step is the splitting of the square elements, on the basis of these features, into square elements with common benign nevus tissue and square elements with malignant melanoma tissue.
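A sketch of such a feature extraction (PyWavelets and NumPy assumed; the grouping into bands and the entropy definition here are illustrative simplifications of the 39-feature scheme described above):

```python
# Per-band texture features: standard deviation, energy, and an entropy
# computed from the normalized coefficient energies of the band.
import numpy as np
import pywt

def band_features(tile, wavelet="db4", level=4):
    coeffs = pywt.wavedec2(tile, wavelet, level=level)
    bands = [coeffs[0].ravel()] + [np.concatenate([c.ravel() for c in d])
                                   for d in coeffs[1:]]
    feats = []
    for c in bands:
        energy = np.sum(c ** 2)
        p = c ** 2 / (energy + 1e-12)            # coefficient energy distribution
        entropy = -np.sum(p * np.log2(p + 1e-12))
        feats += [np.std(c), energy, entropy]
    return np.array(feats)

tile = np.random.default_rng(2).random((128, 128))
print(band_features(tile).shape)                 # 3 features per band
```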

Classification and class prediction
By the classification procedure, the inhomogeneous set of square elements (with respect to the feature values) is split into homogeneous subsets, which are assigned to one of the two classes (common benign nevi or malignant melanoma). A homogeneous subset means that it contains only square elements with similar feature values. In the following, we will refer to the square elements as instances. (The input to a machine learning scheme is, in its general form, a set of instances; the instances are the things to be classified.) In the context of machine learning, the features are generally called attributes, and the thing to be learned, in our case the discrimination of lesion tissues, is called the concept. We will use this terminology in the following. Classification is done by machine learning algorithms (Witten & Frank, 2005). After a training phase, these algorithms enable the automated prediction of the class of a new instance. Machine learning can be defined operationally: machine learning algorithms learn when they change their behaviour in a way that makes them perform better in the future. In other words, the algorithm learns, on the basis of a training set, how to assign the instances to given classes; afterwards, it can apply the gained knowledge to predict the class of unknown instances. The task of machine learning algorithms is in general the search for patterns in large amounts of data, known as data mining (Van Rijsbergen, 1979). The set of square elements is used as the set of instances. To simplify the expressions, the features (attributes) in the feature vector are rewritten as

$$\vec{a}^{(l)} = \left( a_0^{(l)}, a_1^{(l)}, \ldots, a_{L-1}^{(l)} \right)$$

where $a_i^{(l)}$ is the $i$-th attribute of the $l$-th instance (L is the number of attributes); as an example, the attribute $a_0^{(n)}$ refers to $STD(F_0)^{(n)}$. The set of square elements is split into more or less homogeneous subsets, which are assigned to classes. The partition of the set of instances $I$ into $g$ classes is given by $\{C_1, \ldots, C_g\}$, and for every class $C_l$ the following conditions should be satisfied:

$$\text{(i)} \quad \bigcup_{l=1}^{g} C_l = I \qquad\qquad \text{(ii)} \quad C_l \cap C_m = \emptyset \;\; \text{for} \;\; l \neq m$$

The first condition (i) means that every instance is assigned to a specific class: the union of all classes equals the set of instances, so no instance is left over. The second condition (ii) means that no two different classes contain the same instances (the classes are disjoint): no instance belongs to two or more classes. If the attributes have numerical values (as in our case), then for an optimal partition the Euclidean distance of every feature vector of a class $k$ to the class mean value $\bar{x}_k$ is less than or equal to its distance to the mean value of every other class ($j = 1, \ldots, g$):

$$\left\| \vec{a}^{(n)} - \bar{x}_k \right\| \le \left\| \vec{a}^{(n)} - \bar{x}_j \right\|, \qquad \vec{a}^{(n)} \in C_k$$

whereby the mean value of a class $k$ containing $K$ instances is $\bar{x}_k = \frac{1}{K} \sum_{n \in C_k} \vec{a}^{(n)}$ and $\|\cdot\|$ is the Euclidean distance. The first step for a successful use of machine learning algorithms is the representation of the knowledge. In computers, knowledge can be represented in different ways, for example by numeric values, trees, rules, or instance-based (Brachman & Levesque, 1985). For medical diagnosis purposes it is important that the automated diagnostic process can be reproduced and understood; therefore, the rules that the algorithm uses to predict the class of an instance should be understandable for the human interpreter. Here we chose the CART (Classification and Regression Trees) algorithm for classification and class prediction, which uses a tree representation but where the inferring rules are automatically generated out of the tree.
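The optimal-partition criterion above amounts to each instance lying closest to its own class mean. A minimal nearest-mean sketch (NumPy assumed, synthetic data):

```python
# Check how many instances satisfy the nearest-class-mean criterion.
import numpy as np

def nearest_class_mean(X, y):
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]          # closest class mean

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (30, 5)),         # synthetic "nevi" features
               rng.normal(3, 1, (30, 5))])        # synthetic "melanoma" features
y = np.array([0] * 30 + [1] * 30)
print(np.mean(nearest_class_mean(X, y) == y))     # fraction fulfilling criterion
```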

Classification and Regression Trees (CART)
The CART algorithm was published for the first time in the year 1984 by Leo Breiman et al.
CART is an algorithm which is used for optimal decision finding (Breiman et al., 1993). It is based on the generation of so-called binary decision trees. The choice of the attributes is guided by the optimization of an information measure at each decision step. An important feature of the CART algorithm is that, for every decision in a given node of the generated tree, the node is split into only two sub-nodes (every node has at most two child nodes). This is called a binary decision, and the generated tree is called a binary tree. The central task of the algorithm is to determine the optimal binary decision for the separation of the attributes at each step. Therefore, the special merit of the CART algorithm is its ability to separate the data optimally for classification purposes. Furthermore, the CART algorithm has the ability to capture the decision structure explicitly: the decision rules generated from the tree are intelligible in the sense that they can be understood, discussed and explicitly used as diagnostic rules. A binary tree has a root node (starting point), several leaves (end points) and inner nodes which are branched into two succeeding nodes (Fig. 13). For a given node, the preceding node is the parent node and its successors are the child nodes. The root node of a tree is the node with no parents; there is at most one root node in a rooted tree. A leaf node has no children. The depth of a node in the tree is the length of the path from the root node to the considered node. The set of all nodes at a given depth is called a level of the tree. The height of a tree is the length of the path from the root to the deepest leaf node in the tree. To classify an unknown instance, it is routed down the tree according to the values of the different features (Steinberg & Colla, 1995).
Fig. 13. In computers, knowledge can be represented as binary trees, consisting of a root node, inner nodes and leaves

Generation of the decision tree
Divide-and-conquer (lat. divide et impera) algorithms are used to produce decision trees in a recursive way. A divide-and-conquer algorithm works by recursively breaking down a set of different objects (distinguishable by their specific feature values) into two (or more) subsets of similar kinds of objects, until the subsets contain only objects of the same kind. A node in the decision tree involves the testing of a particular feature. The root node, as the first node in a decision tree, contains all instances (square elements). A terminal (leaf) node contains instances that belong to the same class. The divide-and-conquer algorithm for constructing a decision tree consists in principle of three parts: 1. the determination of the optimal split at every node; 2. the decision whether the node is a leaf node or an inner node; 3. the assignment of a leaf node to a specific class. First, a feature (attribute) is selected at the root node and a branch is made for each possible value; through this operation, the set of instances is split up into subsets. Then this process is repeated recursively for every branch, at each inner node, with the instances in the corresponding subset (Fig. 14). If the nodes cannot be divided any more, the process is stopped; all the instances at a terminal node then belong to the same class. The features that produce the purest daughter nodes are used for an optimal split. In the case of numerical features, the splitting is usually done by use of numeric thresholds that divide the range of the feature values. Trees with averaged numerical values at the leaves are called regression trees. Trees with tests involving more than one feature at a time are called multivariate decision trees (in contrast to univariate trees, where only one feature is used at a time). Determining the optimal split: each step in the tree construction consists of a binary decision leading to two daughter nodes. At every inner node, the actual instance set (parent set $t$) is split into two child sets ($t_L$ and $t_R$), and the quality of the split is evaluated by an information (impurity) measure. The information measure quantifies the amount of uncertainty associated with each element (object) in a given discrete set, depending on the probabilities of occurrence in the set. For a pure node (only elements of one kind), the information measure has its minimum value (= 0). For an impure node with an equal distribution of the elements, the information measure has its maximum value.

Fig. 14. The decision tree is generated recursively
For sets with other distributions of the elements, the value of the information measure lies between these two extremes. If the information measure is calculated by use of the binary logarithm, which is closely connected to the binary numeral system, the corresponding values are expressed in bits. The information measure of an attribute is considered to be high when it enables a set splitting in such a way that the classification can be done with a high score; the higher the information measure of an attribute with respect to the classification, the higher up in the tree the attribute is selected. The difference between the impurities of the parent set and the child sets is given by

$$\Delta i(s,t) = i(t) - p_L\, i(t_L) - p_R\, i(t_R)$$

where $p_L$ and $p_R$ are the fractions of instances sent to the left and the right child set. The parent set is then split into child sets, by use of a splitting rule $s^*$ generated by a test $q^* \in Q$, in such a way that the difference between the impurities of the parent and the child sets has the maximum value:

$$\Delta i(s^*, t) = \max_{s} \Delta i(s, t)$$

This means that the set of parent square elements is split into child sets so that the feature values of the square elements in each of the child sets are as similar as possible. At every node, a feature is interrogated and a decision is made about the selection of the successor nodes; this procedure is repeated until a leaf is reached. Decision whether the node is a leaf node: the decision whether the node is a terminal node or an inner node depends on the purity of a child node and is given by the stop rule

$$i(t) \le B$$

The threshold $B$ is a measure of how homogeneous the terminal nodes must be; ideally, the information measure for a set of equal elements is zero. Assignment of a leaf node to a specific class: when a leaf node is reached, the instance is classified according to the class assigned to the leaf. Because the texture features are represented by numeric values, the classification task is to predict a numeric quantity by which the instance is assigned to the averaged numeric value of a class.
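A small numerical illustration of the impurity measure and the split criterion (NumPy assumed; the entropy in bits is used as the information measure):

```python
# Entropy impurity: 0 for a pure node, 1 bit for an even two-class mixture;
# the impurity decrease of a perfect split equals the full parent entropy.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p)) + 0.0          # +0.0 avoids printing -0.0

parent = np.array([0] * 10 + [1] * 10)            # impure node
left, right = parent[:10], parent[10:]            # a perfect binary split
gain = entropy(parent) - (0.5 * entropy(left) + 0.5 * entropy(right))
print(entropy(parent), entropy(left), gain)       # 1.0 0.0 1.0
```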

Inferring rules
The divide-and-conquer algorithm for generating decision trees enables a good separation of the instances, but the knowledge representation by binary trees is less suitable for the interpretation of the classification process. In contrast, the knowledge representation by rules is very suitable in medicine, because rules are easily understandable for the human interpreter. With the CART algorithm, the knowledge is extracted from the dataset in the form of inferring rules, which can easily be implemented as diagnostic rules. The inferring rules result from the splitting rules and are derived automatically from the decision tree, where one rule is generated for each terminal node. The rules have a syntax consisting of an antecedent (precondition) and a consequent (conclusion) part:

$$P(x) \Rightarrow C(x)$$

For all instances $x$ which fulfil the precondition $P(x)$, the conclusion $C(x)$ follows. This rule form is called "modus ponens", meaning: from $P(x)$ infer $C(x)$, or in other words "if $P(x)$ then $C(x)$". This rule is the formal description of the diagnostic process. The precondition may include several parts (single sub-conditions, for example clinical symptoms), $P(x) = p_1(x) \wedge p_2(x) \wedge \cdots \wedge p_n(x)$; if all the single sub-conditions $p_n(x)$ are true, then the precondition $P(x)$ is valid. The antecedent of a rule includes a condition for every node on the path from the root node to a specific terminal node, and the consequent of the rule is the class assigned to the terminal node. For the implementation on a computer, they are generally expressed as "IF-THEN" rules (IF {precondition} THEN {conclusion}). These rules represent knowledge in a form that is easily understandable for the human observer: it enables him to understand why a specific square element is assigned to benign common nevus or malignant melanoma. Such inferring rules are used as diagnostic rules. The rules can be compared with the diagnostic guidelines of the dermatopathologist. Furthermore, they enable the computer scientist to validate the gained knowledge and eventually to combine it with previously known facts. An example of such a (simplified) inferring rule is:

IF {standard deviation in frequency band (5) is high} THEN {benign common nevus}

According to the guidelines for the diagnosis, the rule can be translated as: if the tissue contains large and medium-sized structures, for example nevus cells grouped around basal structures, then it is nevus tissue. Generally, such rules are far more complex than necessary and therefore they are usually pruned to remove redundant tests. Pruning is a technique in machine learning that reduces the size of decision trees by removing sections of the tree that provide little power to classify instances. The goal of pruning is to reduce the complexity of the final classifier and to improve the classification accuracy by reducing overfitting and removing sections of the classifier that may be based on erroneous data. Pruning should reduce the size of a learning tree without reducing its predictive accuracy, as measured on a test set or by cross-validation.
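How such IF-THEN rules are read off a fitted tree can be sketched with scikit-learn (used here as a stand-in for the Salford CART package; the feature names and data are illustrative only):

```python
# Print a fitted decision tree as nested IF-THEN conditions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(4)
X = rng.random((200, 3))                     # stand-in feature vectors
y = (X[:, 0] > 0.5).astype(int)              # stand-in class labels

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=[
    "STD band 5", "energy band 0", "entropy band 7"]))   # illustrative names
```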

Study
The discrimination power of the texture features based on the wavelet transform and the performance of the CART algorithm are demonstrated in the following study (square size 512x512, i.e. whole images). Overall, 857 images of benign common nevi (408 images) and malignant melanoma (449 images) were used as the study set. To gain more insight into the classification performance, a percentage split was performed, using 66% of the dataset for training and the remaining instances (34%) as the test set. The classification results for 572 cases (276 benign common nevi, 296 malignant melanomas) in the training set and 285 cases (132 benign common nevi, 153 malignant melanomas) in the test set are shown in Table 1.
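For reference, sensitivity and specificity are obtained from the 2x2 confusion matrix of such a test set. The counts below are placeholders only (the study's actual per-cell counts are given in Table 1), chosen merely to match the stated test set sizes:

```python
# Sensitivity/specificity from a confusion matrix (hypothetical counts).
tp, fn = 140, 13    # melanomas detected / missed       (placeholders)
tn, fp = 115, 17    # nevi recognized / falsely flagged (placeholders)

sensitivity = tp / (tp + fn)     # fraction of melanomas detected
specificity = tn / (tn + fp)     # fraction of nevi recognized as benign
accuracy = (tp + tn) / (tp + fn + tn + fp)
print(f"{sensitivity:.3f} {specificity:.3f} {accuracy:.3f}")
```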

Discussion
Delayed recognition of skin malignancies puts the patient at risk of destructive growth and of death from the disease once the tumour has progressed to competence for metastasis. Therefore, preventive and periodical skin checkups are of special importance. Technological advancements in imaging systems have led to the development of confocal laser scanning microscopy (Ericson et al., 2008). This technique enables the examination of skin lesions in vivo, and significantly higher prediction success than reported for dermoscopic examination can be achieved for the diagnosis of melanoma (Rajadhyaksha, 2009). However, because the CLSM method is relatively new, there is still a lack of experience with the diagnostic features, and intensive training is necessary for the clinician. The study demonstrates the applicability of the automated diagnosis system for the discrimination of CLSM views of skin lesions (Table 1). Image analysis based on the wavelet transform, together with tree-based machine learning algorithms, provides a powerful tool for the automated diagnosis of CLSM images of skin lesions.
For the diagnosis of the CLSM views, architectural structures such as micro-anatomic structures and cell nests are used as guidelines by the derma pathologist. Therefore, features based on the spectral properties of the wavelet transform, which enable an exploration of architectural structures at different spatial scales, are suitable for the automatic analysis. The images of benign common nevi show pronounced architectural structures, whereas images of malignant melanoma show melanoma cells and connective tissue with few or no architectural structures. These guidelines are reflected by the wavelet coefficients inside the different frequency bands. The standard deviations of the wavelet coefficients in the lower and medium frequency bands show higher values for benign common nevi than for malignant melanoma tissue, indicating more pronounced structures at different orders of magnitude (Fig. 15). The energy of the lowest frequency band (large-scale architectural structures) is higher for the CLSM views of benign common nevi than for those of malignant melanoma.
The CART algorithm has the ability to capture the decision structure explicitly. This means that the decision rules generated from the tree are intelligible: they can be understood, discussed and explicitly used as diagnostic rules. Computer-aided diagnosis, providing automated decisions, can be used as an expert second opinion or can help and assist the non-experienced physician in the diagnostic procedure (Fig. 16). Automated diagnosis is the principal task of medical expert systems. Although handcrafted diagnostic rules in the inference machine of expert systems perform well in medical applications, machine learning has the advantage that the rules are generated automatically; this is valuable for systems where producing rules manually is too labour-intensive or where human expert knowledge and experience are lacking. Producing rules manually requires expert knowledge, whereas rules generated by machine learning algorithms represent knowledge that can be used to analyse and refine the diagnostic process.
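A minimal sketch of the wavelet-band features discussed above (standard deviation and energy of the coefficients per frequency band), here computed with the PyWavelets package; the wavelet family ('db4') and the number of decomposition levels are assumptions, as the chapter does not fix them:

```python
import numpy as np
import pywt

def wavelet_band_features(image, wavelet="db4", levels=5):
    """Standard deviation and energy of the coefficients in each band
    of a 2-D wavelet decomposition of one square element."""
    coeffs = pywt.wavedec2(image, wavelet, level=levels)
    features = {}
    # coeffs[0] is the approximation: the lowest frequency band, i.e.
    # large-scale architectural structures. Its energy is reported above
    # to be higher for benign common nevi than for malignant melanoma.
    features["energy_lowest_band"] = float(np.sum(coeffs[0] ** 2))
    # coeffs[1:] hold the detail bands, ordered from coarse to fine.
    for band, (cH, cV, cD) in enumerate(coeffs[1:], start=1):
        detail = np.concatenate([cH.ravel(), cV.ravel(), cD.ravel()])
        features[f"std_band_{band}"] = float(np.std(detail))
        features[f"energy_band_{band}"] = float(np.sum(detail ** 2))
    return features

# Example on a random stand-in for a 512x512 CLSM square element:
print(wavelet_band_features(np.random.rand(512, 512)))
```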
To verify the performance of the method and to interpret the diagnostic process, the automatically generated inferring rules are implemented in specially developed viewer software (Fig. 17). With this viewer, the classified square elements are indicated in the corresponding CLSM image in order to judge the performance of the analysis. For this purpose, square elements of size 128x128 are used, because they enable a good localization of the different regions in the images while the texture features still have enough discrimination power (a sensitivity of 88.12% for malignant melanoma and a specificity of 84.75% for benign common nevi). The procedure is illustrated in the case of malignant melanoma tissue (Fig. 17). Square elements resulting from terminal nodes with 100% discrimination power are taken as highly significant; they are drawn with red margins in the graphical user interface of the viewer and labelled with the number 1. Square elements with a discrimination power of 80-99% are drawn with green margins and labelled with the number 2. The highlighted elements mainly show polymorphic tumour cells with structural disarray. Because the method of confocal laser scanning microscopy is relatively new, the diagnostic features for the clinician have not yet been completely investigated. Therefore, the automated image analysis is a fundamental and important step towards the general assessment of CLSM images by the clinician. The viewer software is further used for the interaction of the clinician with the automated diagnostic system. The clinician can use it as a computer-aided tool for a second opinion, while a non-experienced user can consult the system, learn from it and improve his diagnostic skills. The results of the inferring mechanism (the classification) are visualized by the graphical user interface, enabling an interpretation and evaluation of the diagnostic process.
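The viewer's labelling scheme described above amounts to a simple mapping from the discrimination power of the classifying terminal node to a label and a margin colour. A sketch, with the function name hypothetical and the thresholds and colours taken from the text:

```python
def margin_label(discrimination_power):
    """Map a terminal node's discrimination power (class purity, in %)
    to the viewer's label number and margin colour."""
    if discrimination_power >= 100.0:
        return 1, "red"        # highly significant square element
    if 80.0 <= discrimination_power < 100.0:
        return 2, "green"
    return None, None          # below 80%: element is not highlighted

print(margin_label(100.0))  # -> (1, 'red')
print(margin_label(92.5))   # -> (2, 'green')
```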
In this review, we demonstrated how diagnostic guidelines and experience are represented by mathematical structures and implemented on a computer. To enable the human interpreter to interpret the results of the automated diagnosis, the output is visualized in an appropriate manner. Since the highlighted regions in the CLSM images show tissue structures which are in good accordance with the already known diagnostic guidelines, no explicit formulation of the diagnostic rules is necessary. Because the tissue features are based on the wavelet transform, reflecting the guidelines of the derma pathologists, the highlighted regions are self-explanatory. It should be noted, however, that besides the features accessible to visual interpretation, features not accessible to the human eye appear to contribute to the automated diagnostic process.

Conclusion
In conclusion, image analysis based on the wavelet transform, together with a tree-based machine learning algorithm, provides a powerful tool for the automated diagnosis of CLSM images of skin lesions. Already known but subjective CLSM criteria are objectively reproduced. The system enables the identification of highly significant parts in CLSM views of malignant melanoma. In a clinical application, the system can be used as a screening tool to improve preventive medical checkups and the early recognition of skin tumours. The automated decisions provided can be used as an expert second opinion and as a training system for inexperienced or student derma pathologists. In another clinical setting, the system may automatically pre-select the cases such that the critical cases are interpreted first by the clinician.